Commit graph

88 commits

Author SHA1 Message Date
whd 9100a2d529
Remove pioneer/rally special cases (#2621)
Co-authored-by: akkomar <akkomar@users.noreply.github.com>
2024-10-16 19:26:16 +02:00
whd 2f5160c85c
Update documentation to remove reference to PBD (#2528) 2024-01-26 19:03:20 +00:00
Mikaël Ducharme 73c92c0f9e
chore(doc): Update ingestion diagram. (#2490) 2023-11-13 16:46:22 -06:00
Daniel Thorn e9edfd30c0
Simplify proxy handling (#2271)
and remove logic for x-pipeline-proxy

Co-authored-by: whd <whd@users.noreply.github.com>
2023-06-26 17:19:02 +00:00
Jeff Klukas 46f7db0b17
Add links to batch Contextual Services data flows (#2108)
* Add links to batch Contextual Services data flows

* Lint and spelling
2022-06-15 11:38:23 -05:00
Jeff Klukas f90917b490
Update description for new streaming jobs (#2056)
* Update description for new streaming jobs

Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
2022-04-21 09:31:51 -07:00
Jeff Klukas 988726268a
Refactor top-level docs out of README (#2048)
* Refactor top-level docs out of README

Follow-up to https://github.com/mozilla/gcp-ingestion/pull/2047 which
I had thought would update the rendered docs.

* spelling
2022-04-06 15:58:44 -04:00
Jeff Klukas 4f0d1b3021
Bug 1737861 Tag pings from BrowserStack ISP as automation (#2031)
* Bug 1737861 Tag pings from BrowserStack ISP as automation

See https://bugzilla.mozilla.org/show_bug.cgi?id=1757216

* Add non-tagged test

* lower automation in docs
2022-03-23 09:42:29 -04:00
Jeff Klukas 7815c1264c
Bug 1742172 Add SanitizeAttributes for timestamp granularity (#1896)
* Bug 1742172 Add SanitizeAttributes for timestamp granularity

See https://bugzilla.mozilla.org/show_bug.cgi?id=1742172 for more on the design here.
2021-11-22 16:37:22 -05:00
whd c0dabae9e7
Bug 1736466 Add X-LB-Tags header (#1880) 2021-11-09 19:16:29 +00:00
Jeff Klukas 3632e88887
Another fix to diagram path (#1826) 2021-09-17 11:48:57 -04:00
Jeff Klukas 7b8cfa29c2
Fix diagram link (#1825) 2021-09-16 11:23:16 -04:00
Jeff Klukas 0f7c0e3529
Publish docs on ContextualServicesReporter (#1824)
* Publish docs on ContextualServicesReporter

* Apply suggestions from code review

Co-authored-by: Will Lachance <wlachance@mozilla.com>

* Link to data blog post

Co-authored-by: Will Lachance <wlachance@mozilla.com>
2021-09-16 10:57:45 -04:00
Ben Wu 34dbde4b3e
Create doc page for ctxsvc reporter (#1801) 2021-08-26 07:30:35 -07:00
Daniel Thorn cfc826cf65
Upgrade ingestion-sink to java 11 (#1791) 2021-08-23 13:28:54 -04:00
Daniel Thorn fa98ac0c8f
Remove support for Dataflow classic templates (#1789) 2021-08-17 12:02:23 -07:00
Jeff Klukas d5fac21565
Bug 1719353 Support X-Foxsec-IP-Reputation header (#1745)
* Bug 1719353 Support X-Foxsec-IP-Reputation header

Co-authored-by: whd <whd@users.noreply.github.com>
2021-07-07 13:35:00 -04:00
Jeff Klukas 123956bc37
Bug 1711706 Support X-Telemetry-Agent header (#1699)
See https://bugzilla.mozilla.org/show_bug.cgi?id=1711706
2021-05-18 15:15:40 -04:00
Anthony Miyaguchi 1da3d3202b
Add basic documentation for rally decoder job (#1673)
* Add basic documentation for rally decoder job

* Update docs/ingestion-beam/rally-job.md

Co-authored-by: akkomar <akkomar@users.noreply.github.com>

* Update spelling and run prettier

Co-authored-by: akkomar <akkomar@users.noreply.github.com>
2021-05-10 21:15:50 +02:00
Daniel Thorn 494a2fc5b6
Add support for publishing Dataflow flex templates (#1549) 2021-04-08 10:01:42 -07:00
whd 93ff621946
Rename default branch (#1636) 2021-04-06 20:58:49 +00:00
Jeff Klukas f96fb3b8c6
Update rendered docs in line with bigquery-etl (#1611)
Uses the `material` theme and some additional plugins.
2021-03-19 06:17:37 -07:00
Jeff Klukas e1c97fdad5
Remove Memorystore and AET from architecture overview (#1610)
Both of these features have been removed from the infrastructure.
2021-03-18 14:31:40 -04:00
Jeff Klukas 1ea77ec0aa
Update pain_points.md (#1594)
* Update pain_points.md

* Prettier
2021-03-10 15:37:50 -05:00
Jeff Klukas 2c494f48eb
Update docs about deduplication (#1577)
* Update docs about deduplication

To match reality of https://bugzilla.mozilla.org/show_bug.cgi?id=1694764
and allow for clear communication to data users about the change.

A separate follow-up will be to remove code from the repository that
handles interaction with Redis.

* Apply suggestions from code review

Co-authored-by: Daniel Thorn <dthorn@mozilla.com>

Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
2021-02-25 15:57:37 -05:00
Daniel Thorn bc7854b0b5
Format markdown with prettier and enforce in CI (#1399) 2020-09-10 13:04:04 -07:00
Daniel Thorn f573460e9a
edge disk cleanup service (#1322) 2020-08-12 14:36:13 -07:00
Mark Reid cd6548d2c5
Update index.md (#1327) 2020-08-06 10:59:26 -03:00
Jeff Klukas 0893023ea7
Bug 1655477 Support X-Source-Tags header (#1328) 2020-08-03 15:02:00 -04:00
Daniel Thorn d375025ff9
add sink documentation (#1212) 2020-07-28 15:56:24 -07:00
Jeff Klukas f15656f203 Annotate some lines on diagram 2020-06-12 10:47:47 -04:00
Jeff Klukas 764101dee8 Add error topic and sink 2020-06-12 10:47:47 -04:00
Jeff Klukas e3f2d1c097 Update architecture overview
This corrects info on how AET support is deployed, updates some outdated info
(sinks are now all Kube jobs rather than Dataflow), and now mentions the
concept of pipeline families.
2020-06-12 10:47:47 -04:00
Jeff Klukas e5b390c7b3 Update docs 2020-06-10 10:46:09 -04:00
Anna Scholtz ffdc653b32 Remove --perDocTypeEnabledList and --perDocTypeDestination options 2020-04-01 11:39:11 -07:00
Jeff Klukas 32d6c45026 Implement proposal for flagging unwanted data
See [the proposal Google
doc](https://docs.google.com/document/d/1VWI-PsD9tFaKVWp5hC75lu65lrtgDEBhO4S7XYvHkqQ/edit#)

We refactor some code here, fill in some documentation,
add a new exception type, and also add a single new case of unwanted data,
fixing https://bugzilla.mozilla.org/show_bug.cgi?id=1612933
2020-02-27 15:49:29 -05:00
Jeff Klukas 5e345c193d Specify 413 2020-02-06 13:53:27 -05:00
Jeff Klukas ae09d47b68 Limit decoded payload size to 8 MB
Closes #776 and also addresses documentation request in
https://bugzilla.mozilla.org/show_bug.cgi?id=1605503
2020-02-06 13:53:27 -05:00
Jeff Klukas 539b7dee6b Link to architecture section 2020-02-05 13:44:15 -05:00
Jeff Klukas 30d4707d1d Update docs to match current state
This updates the docs to better reflect the names we actually use for
components, adapt to how most of our sinks are now using `ingestion-edge`,
etc.
2020-02-05 13:39:58 -05:00
Anna Scholtz cad125dab8 Update instructions to download GeoLite2-City dataset and fix markdown format 2020-01-09 14:01:38 -08:00
Anthony Miyaguchi 65769c6208
Bug 1601139 - Update ingestion-beam integration workflow with sampled documents (#1021)
* Replace sampled-landfill with bq query

* Update samples for testing the pipeline

* Update documentation to include document sample

* Use CTE and remove reference to landfill
2019-12-06 12:24:02 -08:00
Daniel Thorn 192fc05d87
Replace maven inter-module deps with copying src (#887)
* Replace maven inter-module deps with copying src

* Update docs/ingestion-beam/sink-job.md

Co-Authored-By: Jeff Klukas <jeff@klukas.net>
2019-10-02 15:09:28 -07:00
William Lachance 3b9664deda Create seperate sections for sink, decoder, and republish jobs in ingestion-beam (#845)
This makes it easier to navigate through the ingestion-beam documentation,
which is getting rather long.
2019-09-24 12:40:45 -07:00
William Lachance 143b6bcab3 Update instructions for running individual unit tests in ingestion-beam (#844) 2019-09-24 12:10:40 -07:00
Anthony Miyaguchi f25e6ce1c6
Add script to flatten schema hierarchy for updating bq datasets and tables (#832)
* Add script to flatten schema hierarchy for updating bq datasets and tables

* Rename default output name of download-sampled-landfill

* Update script to write to a single dataset

* Add comments in reference commands for running dataflow job

* Selectively mount GOOGLE_APPLICATION_CREDENTIALS into maven container

* Apply suggestions from code review

Co-Authored-By: whd <whd@users.noreply.github.com>

* Use bin/mvn script in testing workflow document
2019-09-17 15:05:01 -07:00
Anthony Miyaguchi 4ece5011d3 Revert "Add script to flatten schema hierarchy for updating bq datasets and tables (#827)"
This reverts commit cfbe9c0f04.
2019-09-16 15:13:04 -04:00
Anthony Miyaguchi cfbe9c0f04
Add script to flatten schema hierarchy for updating bq datasets and tables (#827)
* Add script to flatten schema hierarchy for updating bq datasets and tables

* Update bin/mvn to mount GOOGLE_APPLICATION_CREDENTIALS

* Rename default output name of download-sampled-landfill

* Remove extra #s from reflowing comments
2019-09-16 11:46:39 -07:00
William Lachance b0f74aad85
Add explicit building section for beam (#825) 2019-09-13 15:29:07 -04:00
William Lachance 4ac1578fbe
Add references to the class definitions for each ingestion-beam job in docs (#790)
Makes the mapping from source to implementation a little easier.
2019-09-03 13:49:19 -04:00