whd
9100a2d529
Remove pioneer/rally special cases ( #2621 )
Co-authored-by: akkomar <akkomar@users.noreply.github.com>
2024-10-16 19:26:16 +02:00
whd
2f5160c85c
Update documentation to remove reference to PBD ( #2528 )
2024-01-26 19:03:20 +00:00
Mikaël Ducharme
73c92c0f9e
chore(doc): Update ingestion diagram. ( #2490 )
2023-11-13 16:46:22 -06:00
Daniel Thorn
e9edfd30c0
Simplify proxy handling ( #2271 )
and remove logic for x-pipeline-proxy
Co-authored-by: whd <whd@users.noreply.github.com>
2023-06-26 17:19:02 +00:00
Jeff Klukas
46f7db0b17
Add links to batch Contextual Services data flows ( #2108 )
* Add links to batch Contextual Services data flows
* Lint and spelling
2022-06-15 11:38:23 -05:00
Jeff Klukas
f90917b490
Update description for new streaming jobs ( #2056 )
* Update description for new streaming jobs
Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
2022-04-21 09:31:51 -07:00
Jeff Klukas
988726268a
Refactor top-level docs out of README ( #2048 )
* Refactor top-level docs out of README
Follow-up to https://github.com/mozilla/gcp-ingestion/pull/2047 which
I had thought would update the rendered docs.
* spelling
2022-04-06 15:58:44 -04:00
Jeff Klukas
4f0d1b3021
Bug 1737861 Tag pings from BrowserStack ISP as automation ( #2031 )
* Bug 1737861 Tag pings from BrowserStack ISP as automation
See https://bugzilla.mozilla.org/show_bug.cgi?id=1757216
* Add non-tagged test
* lower automation in docs
2022-03-23 09:42:29 -04:00
Jeff Klukas
7815c1264c
Bug 1742172 Add SanitizeAttributes for timestamp granularity ( #1896 )
* Bug 1742172 Add SanitizeAttributes for timestamp granularity
See https://bugzilla.mozilla.org/show_bug.cgi?id=1742172 for more on the design here.
2021-11-22 16:37:22 -05:00
whd
c0dabae9e7
Bug 1736466 Add X-LB-Tags header ( #1880 )
2021-11-09 19:16:29 +00:00
Jeff Klukas
3632e88887
Another fix to diagram path ( #1826 )
2021-09-17 11:48:57 -04:00
Jeff Klukas
7b8cfa29c2
Fix diagram link ( #1825 )
2021-09-16 11:23:16 -04:00
Jeff Klukas
0f7c0e3529
Publish docs on ContextualServicesReporter ( #1824 )
* Publish docs on ContextualServicesReporter
* Apply suggestions from code review
Co-authored-by: Will Lachance <wlachance@mozilla.com>
* Link to data blog post
Co-authored-by: Will Lachance <wlachance@mozilla.com>
2021-09-16 10:57:45 -04:00
Ben Wu
34dbde4b3e
Create doc page for ctxsvc reporter ( #1801 )
2021-08-26 07:30:35 -07:00
Daniel Thorn
cfc826cf65
Upgrade ingestion-sink to java 11 ( #1791 )
2021-08-23 13:28:54 -04:00
Daniel Thorn
fa98ac0c8f
Remove support for Dataflow classic templates ( #1789 )
2021-08-17 12:02:23 -07:00
Jeff Klukas
d5fac21565
Bug 1719353 Support X-Foxsec-IP-Reputation header ( #1745 )
* Bug 1719353 Support X-Foxsec-IP-Reputation header
Co-authored-by: whd <whd@users.noreply.github.com>
2021-07-07 13:35:00 -04:00
Jeff Klukas
123956bc37
Bug 1711706 Support X-Telemetry-Agent header ( #1699 )
See https://bugzilla.mozilla.org/show_bug.cgi?id=1711706
2021-05-18 15:15:40 -04:00
Anthony Miyaguchi
1da3d3202b
Add basic documentation for rally decoder job ( #1673 )
* Add basic documentation for rally decoder job
* Update docs/ingestion-beam/rally-job.md
Co-authored-by: akkomar <akkomar@users.noreply.github.com>
* Update spelling and run prettier
Co-authored-by: akkomar <akkomar@users.noreply.github.com>
2021-05-10 21:15:50 +02:00
Daniel Thorn
494a2fc5b6
Add support for publishing Dataflow flex templates ( #1549 )
2021-04-08 10:01:42 -07:00
whd
93ff621946
Rename default branch ( #1636 )
2021-04-06 20:58:49 +00:00
Jeff Klukas
f96fb3b8c6
Update rendered docs in line with bigquery-etl ( #1611 )
Uses the `material` theme and some additional plugins.
2021-03-19 06:17:37 -07:00
Jeff Klukas
e1c97fdad5
Remove Memorystore and AET from architecture overview ( #1610 )
Both of these features have been removed from the infrastructure.
2021-03-18 14:31:40 -04:00
Jeff Klukas
1ea77ec0aa
Update pain_points.md ( #1594 )
* Update pain_points.md
* Prettier
2021-03-10 15:37:50 -05:00
Jeff Klukas
2c494f48eb
Update docs about deduplication ( #1577 )
* Update docs about deduplication
To match reality of https://bugzilla.mozilla.org/show_bug.cgi?id=1694764
and allow for clear communication to data users about the change.
A separate follow-up will be to remove code from the repository that
handles interaction with Redis.
* Apply suggestions from code review
Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
2021-02-25 15:57:37 -05:00
Daniel Thorn
bc7854b0b5
Format markdown with prettier and enforce in CI ( #1399 )
2020-09-10 13:04:04 -07:00
Daniel Thorn
f573460e9a
edge disk cleanup service ( #1322 )
2020-08-12 14:36:13 -07:00
Mark Reid
cd6548d2c5
Update index.md ( #1327 )
2020-08-06 10:59:26 -03:00
Jeff Klukas
0893023ea7
Bug 1655477 Support X-Source-Tags header ( #1328 )
2020-08-03 15:02:00 -04:00
Daniel Thorn
d375025ff9
add sink documentation ( #1212 )
2020-07-28 15:56:24 -07:00
Jeff Klukas
f15656f203
Annotate some lines on diagram
2020-06-12 10:47:47 -04:00
Jeff Klukas
764101dee8
Add error topic and sink
2020-06-12 10:47:47 -04:00
Jeff Klukas
e3f2d1c097
Update architecture overview
This corrects info on how AET support is deployed, updates some outdated info
(sinks are now all Kube jobs rather than Dataflow), and now mentions the
concept of pipeline families.
2020-06-12 10:47:47 -04:00
Jeff Klukas
e5b390c7b3
Update docs
2020-06-10 10:46:09 -04:00
Anna Scholtz
ffdc653b32
Remove --perDocTypeEnabledList and --perDocTypeDestination options
2020-04-01 11:39:11 -07:00
Jeff Klukas
32d6c45026
Implement proposal for flagging unwanted data
See [the proposal Google
doc](https://docs.google.com/document/d/1VWI-PsD9tFaKVWp5hC75lu65lrtgDEBhO4S7XYvHkqQ/edit#)
We refactor some code here, fill in some documentation,
add a new exception type, and also add a single new case of unwanted data,
fixing https://bugzilla.mozilla.org/show_bug.cgi?id=1612933
2020-02-27 15:49:29 -05:00
Jeff Klukas
5e345c193d
Specify 413
2020-02-06 13:53:27 -05:00
Jeff Klukas
ae09d47b68
Limit decoded payload size to 8 MB
Closes #776 and also addresses documentation request in
https://bugzilla.mozilla.org/show_bug.cgi?id=1605503
2020-02-06 13:53:27 -05:00
Jeff Klukas
539b7dee6b
Link to architecture section
2020-02-05 13:44:15 -05:00
Jeff Klukas
30d4707d1d
Update docs to match current state
This updates the docs to better reflect the names we actually use for
components, adapt to how most of our sinks are now using `ingestion-edge`,
etc.
2020-02-05 13:39:58 -05:00
Anna Scholtz
cad125dab8
Update instructions to download GeoLite2-City dataset and fix markdown format
2020-01-09 14:01:38 -08:00
Anthony Miyaguchi
65769c6208
Bug 1601139 - Update ingestion-beam integration workflow with sampled documents ( #1021 )
* Replace sampled-landfill with bq query
* Update samples for testing the pipeline
* Update documentation to include document sample
* Use CTE and remove reference to landfill
2019-12-06 12:24:02 -08:00
Daniel Thorn
192fc05d87
Replace maven inter-module deps with copying src ( #887 )
* Replace maven inter-module deps with copying src
* Update docs/ingestion-beam/sink-job.md
Co-Authored-By: Jeff Klukas <jeff@klukas.net>
2019-10-02 15:09:28 -07:00
William Lachance
3b9664deda
Create separate sections for sink, decoder, and republish jobs in ingestion-beam ( #845 )
This makes it easier to navigate through the ingestion-beam documentation,
which is getting rather long.
2019-09-24 12:40:45 -07:00
William Lachance
143b6bcab3
Update instructions for running individual unit tests in ingestion-beam ( #844 )
2019-09-24 12:10:40 -07:00
Anthony Miyaguchi
f25e6ce1c6
Add script to flatten schema hierarchy for updating bq datasets and tables ( #832 )
* Add script to flatten schema hierarchy for updating bq datasets and tables
* Rename default output name of download-sampled-landfill
* Update script to write to a single dataset
* Add comments in reference commands for running dataflow job
* Selectively mount GOOGLE_APPLICATION_CREDENTIALS into maven container
* Apply suggestions from code review
Co-Authored-By: whd <whd@users.noreply.github.com>
* Use bin/mvn script in testing workflow document
2019-09-17 15:05:01 -07:00
Anthony Miyaguchi
4ece5011d3
Revert "Add script to flatten schema hierarchy for updating bq datasets and tables ( #827 )"
This reverts commit cfbe9c0f04.
2019-09-16 15:13:04 -04:00
Anthony Miyaguchi
cfbe9c0f04
Add script to flatten schema hierarchy for updating bq datasets and tables ( #827 )
* Add script to flatten schema hierarchy for updating bq datasets and tables
* Update bin/mvn to mount GOOGLE_APPLICATION_CREDENTIALS
* Rename default output name of download-sampled-landfill
* Remove extra #s from reflowing comments
2019-09-16 11:46:39 -07:00
William Lachance
b0f74aad85
Add explicit building section for beam ( #825 )
2019-09-13 15:29:07 -04:00
William Lachance
4ac1578fbe
Add references to the class definitions for each ingestion-beam job in docs ( #790 )
Makes the mapping from source to implementation a little easier.
2019-09-03 13:49:19 -04:00