gcp-ingestion

Commit Graph

Author	SHA1	Message	Date
whd	9100a2d529	Remove pioneer/rally special cases (#2621 ) Co-authored-by: akkomar <akkomar@users.noreply.github.com>	2024-10-16 19:26:16 +02:00
Mikaël Ducharme	73c92c0f9e	chore(doc): Update ingestion diagram. (#2490 )	2023-11-13 16:46:22 -06:00
Daniel Thorn	e9edfd30c0	Simplify proxy handling (#2271 ) and remove logic for x-pipeline-proxy Co-authored-by: whd <whd@users.noreply.github.com>	2023-06-26 17:19:02 +00:00
Jeff Klukas	4f0d1b3021	Bug 1737861 Tag pings from BrowserStack ISP as automation (#2031 ) * Bug 1737861 Tag pings from BrowserStack ISP as automation See https://bugzilla.mozilla.org/show_bug.cgi?id=1757216 * Add non-tagged test * lower automation in docs	2022-03-23 09:42:29 -04:00
Jeff Klukas	7815c1264c	Bug 1742172 Add SanitizeAttributes for timestamp granularity (#1896 ) * Bug 1742172 Add SanitizeAttributes for timestamp granularity See https://bugzilla.mozilla.org/show_bug.cgi?id=1742172 for more on the design here.	2021-11-22 16:37:22 -05:00
whd	c0dabae9e7	Bug 1736466 Add X-LB-Tags header (#1880 )	2021-11-09 19:16:29 +00:00
Daniel Thorn	fa98ac0c8f	Remove support for Dataflow classic templates (#1789 )	2021-08-17 12:02:23 -07:00
Jeff Klukas	d5fac21565	Bug 1719353 Support X-Foxsec-IP-Reputation header (#1745 ) * Bug 1719353 Support X-Foxsec-IP-Reputation header Co-authored-by: whd <whd@users.noreply.github.com>	2021-07-07 13:35:00 -04:00
Jeff Klukas	123956bc37	Bug 1711706 Support X-Telemetry-Agent header (#1699 ) See https://bugzilla.mozilla.org/show_bug.cgi?id=1711706	2021-05-18 15:15:40 -04:00
Daniel Thorn	494a2fc5b6	Add support for publishing Dataflow flex templates (#1549 )	2021-04-08 10:01:42 -07:00
whd	93ff621946	Rename default branch (#1636 )	2021-04-06 20:58:49 +00:00
Jeff Klukas	e1c97fdad5	Remove Memorystore and AET from architecture overview (#1610 ) Both of these features have been removed from the infrastructure.	2021-03-18 14:31:40 -04:00
Jeff Klukas	1ea77ec0aa	Update pain_points.md (#1594 ) * Update pain_points.md * Prettier	2021-03-10 15:37:50 -05:00
Jeff Klukas	2c494f48eb	Update docs about deduplication (#1577 ) * Update docs about deduplication To match reality of https://bugzilla.mozilla.org/show_bug.cgi?id=1694764 and allow for clear communication to data users about the change. A separate follow-up will be to remove code from the repository that handles interaction with Redis. * Apply suggestions from code review Co-authored-by: Daniel Thorn <dthorn@mozilla.com> Co-authored-by: Daniel Thorn <dthorn@mozilla.com>	2021-02-25 15:57:37 -05:00
Daniel Thorn	bc7854b0b5	Format markdown with prettier and enforce in CI (#1399 )	2020-09-10 13:04:04 -07:00
Jeff Klukas	0893023ea7	Bug 1655477 Support X-Source-Tags header (#1328 )	2020-08-03 15:02:00 -04:00
Jeff Klukas	f15656f203	Annotate some lines on diagram	2020-06-12 10:47:47 -04:00
Jeff Klukas	764101dee8	Add error topic and sink	2020-06-12 10:47:47 -04:00
Jeff Klukas	e3f2d1c097	Update architecture overview This corrects info on how AET support is deployed, updates some outdated info (sinks are now all Kube jobs rather than Dataflow), and now mentions the concept of pipeline families.	2020-06-12 10:47:47 -04:00
Jeff Klukas	e5b390c7b3	Update docs	2020-06-10 10:46:09 -04:00
Jeff Klukas	32d6c45026	Implement proposal for flagging unwanted data See [the proposal Google doc](https://docs.google.com/document/d/1VWI-PsD9tFaKVWp5hC75lu65lrtgDEBhO4S7XYvHkqQ/edit#) We refactor some code here, fill in some documentation, add a new exception type, and also add a single new case of unwanted data, fixing https://bugzilla.mozilla.org/show_bug.cgi?id=1612933	2020-02-27 15:49:29 -05:00
Jeff Klukas	5e345c193d	Specify 413	2020-02-06 13:53:27 -05:00
Jeff Klukas	ae09d47b68	Limit decoded payload size to 8 MB Closes #776 and also addresses documentation request in https://bugzilla.mozilla.org/show_bug.cgi?id=1605503	2020-02-06 13:53:27 -05:00
Jeff Klukas	30d4707d1d	Update docs to match current state This updates the docs to better reflect the names we actually use for components, adapt to how most of our sinks are now using `ingestion-edge`, etc.	2020-02-05 13:39:58 -05:00
Anna Scholtz	cad125dab8	Update instructions to download GeoLite2-City dataset and fix markdown format	2020-01-09 14:01:38 -08:00
William Lachance	095270056b	More doc tweaks (#740 ) * Add a repo url This will let people jump directly to editing the docs from the UI, in case further tweaks are required. * Fix omitted renaming of "edge server" to "edge service" * Hide the prev/next buttons by default They aren't very useful and cause the header to collapse vertically in a very awkward way.	2019-08-13 13:53:37 -04:00
William Lachance	85863eef1e	Create an mkdocs-based site (#732 ) This substantially reorganizes the documentation as an mkdocs site. Main changes: * All documentation is now browseable and searchable in a single site, with handy table of contents on the side of each section * Top-level README significantly slimmed down (just pointing to docs site) * READMEs inside individual components removed (moved to subdirectories inside docs/ folder, accessible via top-level in docs site)	2019-08-12 17:06:19 -04:00
Jeff Klukas	51e99f7e6b	Coerce camelCase field names to snake_case in BQ sink (#689 ) Fixes https://github.com/mozilla/gcp-ingestion/issues/671	2019-07-17 15:03:30 -04:00
Jeff Klukas	a85073e2de	Bug 1559411 Reconfigure per-namespace republishing (#676 ) See https://bugzilla.mozilla.org/show_bug.cgi?id=1559411	2019-06-27 15:50:22 -04:00
Jeff Klukas	def75b13bd	Support republishing by random sample (#658 ) Closes #650	2019-06-14 16:56:38 -04:00
Jeff Klukas	898ff6f28b	Document ParseProxy transform (#636 ) * Document ParseProxy transform A follow-up to #468 suggested in https://github.com/mozilla/gcp-ingestion/issues/496#issuecomment-497164953 * Respond to review comments * Header details to ingestion-edge readme, etc.	2019-05-30 17:03:52 -04:00
Jeff Klukas	ac0e3ee3e0	Docs update (#625 ) Fixes https://github.com/mozilla/gcp-ingestion/issues/468 Co-Authored-By: whd <whd@users.noreply.github.com>	2019-05-29 13:40:43 -04:00
Jeff Klukas	9787889120	Factor Republisher and support per-namespace topics (#619 ) Pocket has a need to ingest all doctypes associated with the `activity-stream` namespace; it seems efficient to be able to deliver one topic rather than multiple topics per docType. We also take this chance to refactor the per-channel and per-doctype logic out of Republisher proper into their own dedicated transforms. As part of the refactor, we make the execution graph cleaner and a bit more efficient. We now partition per channel or doctype first, rather than having each configured channel and doctype branch directly off the initial input. We also give the output transforms names specific to the channel or doctype.	2019-05-24 15:07:22 -04:00
Anthony Miyaguchi	b2a64c304e	Add documentation on testing BigQuery from ingestion-beam (#543 ) * Create documentation on testing BigQuery from ingestion-beam. * Add a mermaid config file to the docs directory for overflow * Fix spelling mistakes * Move mermaid config to top-level .mermaid	2019-04-29 11:27:44 -07:00
Jeff Klukas	d987f3246f	Define republisher destinations at compile-time (#527 ) Fixes #524	2019-04-12 06:34:07 +00:00
Jeff Klukas	700a46618a	Republisher Fixes #479 and #396 Adds the Republisher job as previously spec'd in `docs/architecture`. Before this is deployed, we will need to create appropriate output topics for the intended configuration. Once this is deployed and stable, we can remove the decoded topic consumer from Decoder. There will be a short overlap period where both the Decoder and Republisher are marking messages as seen in Redis, but this shouldn't cause any problems (other than the expense of two consumers).	2019-04-05 14:38:04 -04:00
Mark Reid	ce949ca2b6	Update the colour for Monitoring Topics	2019-02-28 09:48:13 -05:00
Jeff Klukas	e698c84204	Documentation for refactor with new Republisher job (#471 ) * Documentation for refactor with new Republisher job The Republisher job proposed in this PR would factor out MarkAsSeen from Decoder which would lead to a more logical flow of data. It also allows us to share the expense of the MarkAsSeen read with the read needed to inspect message contents and republish to smaller topics. Specifically, the Republisher for structured ingestion would check for the new debug header and publish message containing that header to a debug topic per the Glean request in #458. The Republisher for telemetry data would randomly sample messages to produce the monitoring topics discussed in #396. * Update diagram and include docker wrapper	2019-02-27 17:19:04 -05:00
Daniel Thorn	ce99185ee9	Add design documentation for whole pipeline (#114 )	2018-11-08 08:44:33 -05:00

39 Commits