Commit graph

39 commits

Author SHA1 Message Date
whd 9100a2d529
Remove pioneer/rally special cases (#2621)
Co-authored-by: akkomar <akkomar@users.noreply.github.com>
2024-10-16 19:26:16 +02:00
Mikaël Ducharme 73c92c0f9e
chore(doc): Update ingestion diagram. (#2490) 2023-11-13 16:46:22 -06:00
Daniel Thorn e9edfd30c0
Simplify proxy handling (#2271)
and remove logic for x-pipeline-proxy

Co-authored-by: whd <whd@users.noreply.github.com>
2023-06-26 17:19:02 +00:00
Jeff Klukas 4f0d1b3021
Bug 1737861 Tag pings from BrowserStack ISP as automation (#2031)
* Bug 1737861 Tag pings from BrowserStack ISP as automation

See https://bugzilla.mozilla.org/show_bug.cgi?id=1757216

* Add non-tagged test

* lower automation in docs
2022-03-23 09:42:29 -04:00
Jeff Klukas 7815c1264c
Bug 1742172 Add SanitizeAttributes for timestamp granularity (#1896)
* Bug 1742172 Add SanitizeAttributes for timestamp granularity

See https://bugzilla.mozilla.org/show_bug.cgi?id=1742172 for more on the design here.
2021-11-22 16:37:22 -05:00
whd c0dabae9e7
Bug 1736466 Add X-LB-Tags header (#1880) 2021-11-09 19:16:29 +00:00
Daniel Thorn fa98ac0c8f
Remove support for Dataflow classic templates (#1789) 2021-08-17 12:02:23 -07:00
Jeff Klukas d5fac21565
Bug 1719353 Support X-Foxsec-IP-Reputation header (#1745)
* Bug 1719353 Support X-Foxsec-IP-Reputation header

Co-authored-by: whd <whd@users.noreply.github.com>
2021-07-07 13:35:00 -04:00
Jeff Klukas 123956bc37
Bug 1711706 Support X-Telemetry-Agent header (#1699)
See https://bugzilla.mozilla.org/show_bug.cgi?id=1711706
2021-05-18 15:15:40 -04:00
Daniel Thorn 494a2fc5b6
Add support for publishing Dataflow flex templates (#1549) 2021-04-08 10:01:42 -07:00
whd 93ff621946
Rename default branch (#1636) 2021-04-06 20:58:49 +00:00
Jeff Klukas e1c97fdad5
Remove Memorystore and AET from architecture overview (#1610)
Both of these features have been removed from the infrastructure.
2021-03-18 14:31:40 -04:00
Jeff Klukas 1ea77ec0aa
Update pain_points.md (#1594)
* Update pain_points.md

* Prettier
2021-03-10 15:37:50 -05:00
Jeff Klukas 2c494f48eb
Update docs about deduplication (#1577)
* Update docs about deduplication

To match reality of https://bugzilla.mozilla.org/show_bug.cgi?id=1694764
and allow for clear communication to data users about the change.

A separate follow-up will be to remove code from the repository that
handles interaction with Redis.

* Apply suggestions from code review

Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
2021-02-25 15:57:37 -05:00
Daniel Thorn bc7854b0b5
Format markdown with prettier and enforce in CI (#1399) 2020-09-10 13:04:04 -07:00
Jeff Klukas 0893023ea7
Bug 1655477 Support X-Source-Tags header (#1328) 2020-08-03 15:02:00 -04:00
Jeff Klukas f15656f203 Annotate some lines on diagram 2020-06-12 10:47:47 -04:00
Jeff Klukas 764101dee8 Add error topic and sink 2020-06-12 10:47:47 -04:00
Jeff Klukas e3f2d1c097 Update architecture overview
This corrects info on how AET support is deployed, updates some outdated info
(sinks are now all Kube jobs rather than Dataflow), and now mentions the
concept of pipeline families.
2020-06-12 10:47:47 -04:00
Jeff Klukas e5b390c7b3 Update docs 2020-06-10 10:46:09 -04:00
Jeff Klukas 32d6c45026 Implement proposal for flagging unwanted data
See [the proposal Google
doc](https://docs.google.com/document/d/1VWI-PsD9tFaKVWp5hC75lu65lrtgDEBhO4S7XYvHkqQ/edit#)

We refactor some code here, fill in some documentation,
add a new exception type, and also add a single new case of unwanted data,
fixing https://bugzilla.mozilla.org/show_bug.cgi?id=1612933
2020-02-27 15:49:29 -05:00
Jeff Klukas 5e345c193d Specify 413 2020-02-06 13:53:27 -05:00
Jeff Klukas ae09d47b68 Limit decoded payload size to 8 MB
Closes #776 and also addresses documentation request in
https://bugzilla.mozilla.org/show_bug.cgi?id=1605503
2020-02-06 13:53:27 -05:00
Jeff Klukas 30d4707d1d Update docs to match current state
This updates the docs to better reflect the names we actually use for
components, adapt to how most of our sinks are now using `ingestion-edge`,
etc.
2020-02-05 13:39:58 -05:00
Anna Scholtz cad125dab8 Update instructions to download GeoLite2-City dataset and fix markdown format 2020-01-09 14:01:38 -08:00
William Lachance 095270056b
More doc tweaks (#740)
* Add a repo url

This will let people jump directly to editing the docs from the UI,
in case further tweaks are required.

* Fix omitted renaming of "edge server" to "edge service"

* Hide the prev/next buttons by default

They aren't very useful and cause the header to collapse vertically
in a very awkward way.
2019-08-13 13:53:37 -04:00
William Lachance 85863eef1e
Create an mkdocs-based site (#732)
This substantially reorganizes the documentation as an mkdocs site. Main
changes:

* All documentation is now browsable and searchable in a single site, with a
  handy table of contents on the side of each section
* Top-level README significantly slimmed down (just pointing to docs site)
* READMEs inside individual components removed (moved to subdirectories inside
  docs/ folder, accessible via top-level in docs site)
2019-08-12 17:06:19 -04:00
Jeff Klukas 51e99f7e6b
Coerce camelCase field names to snake_case in BQ sink (#689)
Fixes https://github.com/mozilla/gcp-ingestion/issues/671
2019-07-17 15:03:30 -04:00
Jeff Klukas a85073e2de
Bug 1559411 Reconfigure per-namespace republishing (#676)
See https://bugzilla.mozilla.org/show_bug.cgi?id=1559411
2019-06-27 15:50:22 -04:00
Jeff Klukas def75b13bd
Support republishing by random sample (#658)
Closes #650
2019-06-14 16:56:38 -04:00
Jeff Klukas 898ff6f28b
Document ParseProxy transform (#636)
* Document ParseProxy transform

A follow-up to #468 suggested in https://github.com/mozilla/gcp-ingestion/issues/496#issuecomment-497164953

* Respond to review comments

* Header details to ingestion-edge readme, etc.
2019-05-30 17:03:52 -04:00
Jeff Klukas ac0e3ee3e0
Docs update (#625)
Fixes https://github.com/mozilla/gcp-ingestion/issues/468

Co-Authored-By: whd <whd@users.noreply.github.com>
2019-05-29 13:40:43 -04:00
Jeff Klukas 9787889120
Factor Republisher and support per-namespace topics (#619)
Pocket has a need to ingest all doctypes associated with the `activity-stream`
namespace; it seems efficient to be able to deliver one topic rather than
multiple topics per docType.

We also take this chance to refactor the per-channel and per-doctype logic
out of Republisher proper into their own dedicated transforms.

As part of the refactor, we make the execution graph cleaner and a bit more
efficient. We now partition per channel or doctype first, rather than having
each configured channel and doctype branch directly off the initial input.
We also give the output transforms names specific to the channel or doctype.
2019-05-24 15:07:22 -04:00
Anthony Miyaguchi b2a64c304e
Add documentation on testing BigQuery from ingestion-beam (#543)
* Create documentation on testing BigQuery from ingestion-beam.

* Add a mermaid config file to the docs directory for overflow

* Fix spelling mistakes

* Move mermaid config to top-level .mermaid
2019-04-29 11:27:44 -07:00
Jeff Klukas d987f3246f Define republisher destinations at compile-time (#527)
Fixes #524
2019-04-12 06:34:07 +00:00
Jeff Klukas 700a46618a Republisher
Fixes #479 and #396

Adds the Republisher job as previously spec'd in `docs/architecture`.

Before this is deployed, we will need to create appropriate output topics
for the intended configuration.

Once this is deployed and stable, we can remove the decoded topic consumer
from Decoder. There will be a short overlap period where both the Decoder
and Republisher are marking messages as seen in Redis, but this shouldn't
cause any problems (other than the expense of two consumers).
2019-04-05 14:38:04 -04:00
Mark Reid ce949ca2b6 Update the colour for Monitoring Topics 2019-02-28 09:48:13 -05:00
Jeff Klukas e698c84204
Documentation for refactor with new Republisher job (#471)
* Documentation for refactor with new Republisher job

The Republisher job proposed in this PR would factor out MarkAsSeen
from Decoder which would lead to a more logical flow of data.
It also allows us to share the expense of the MarkAsSeen read with
the read needed to inspect message contents and republish to
smaller topics.

Specifically, the Republisher for structured ingestion would check for
the new debug header and publish messages containing that header to
a debug topic per the Glean request in #458.

The Republisher for telemetry data would randomly sample messages to
produce the monitoring topics discussed in #396.

* Update diagram and include docker wrapper
2019-02-27 17:19:04 -05:00
Daniel Thorn ce99185ee9 Add design documentation for whole pipeline (#114) 2018-11-08 08:44:33 -05:00