Edge Service Specification
This document specifies the behavior of the server that accepts submissions from HTTP clients, e.g. Firefox telemetry.
General Data Flow
HTTP submissions come in from the wild, hit a load balancer, then optionally an nginx proxy, then the HTTP edge server described in this document. Data is accepted via a POST/PUT request from clients, which the server will wrap in a PubSub message and forward to Google Cloud PubSub, where any further processing, analysis, and storage will be handled.
Namespaces
Namespaces are used to control the processing of data from different types of clients, from the metadata that is collected to the destinations where the data is written, processed and accessible. Data sent to a namespace that is not specifically configured is assumed to be in the non-Telemetry JSON format described here. To request a new namespace configuration, file a bug against the Data Platform Team with a short description of what the namespace will be used for and the desired configuration options.
Forwarding to the pipeline
The message is written to PubSub. If the message cannot be written to PubSub it is written to a disk queue that will periodically retry writing to PubSub.
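The fallback behavior above can be sketched as follows. This is a minimal illustration, not the service's actual implementation: `ForwardingQueue`, its method names, and the `publish` callable are hypothetical, and an in-memory deque stands in for the disk queue.

```python
from collections import deque
from typing import Callable, Deque


class ForwardingQueue:
    """Hypothetical stand-in for the disk queue (kept in memory here)."""

    def __init__(self, publish: Callable[[bytes], None]) -> None:
        # `publish` is assumed to raise an exception when PubSub is unreachable.
        self._publish = publish
        self._pending: Deque[bytes] = deque()

    def submit(self, message: bytes) -> bool:
        """Try PubSub first; queue the message if the publish fails."""
        try:
            self._publish(message)
            return True
        except Exception:
            self._pending.append(message)
            return False

    def retry_pending(self) -> int:
        """Meant to be called periodically; returns how many messages were delivered."""
        delivered = 0
        while self._pending:
            try:
                self._publish(self._pending[0])
            except Exception:
                break  # still unreachable; keep the remaining messages queued
            self._pending.popleft()
            delivered += 1
        return delivered

    def pending_count(self) -> int:
        return len(self._pending)
```

A message is removed from the queue only after a publish succeeds, so a failed retry leaves it in place for the next attempt.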
PubSub Message Schema
required string data // base64 encoded body
required group attributes {
required string submission_timestamp // server time, ISO 8601 with microseconds and timezone "Z", example: "2018-03-12T21:02:18.123456Z"
required string uri // example: "/submit/telemetry/6c49ec73-4350-45a0-9c8a-6c8f5aded0cf/main/Firefox/58.0.2/release/20180206200532"
required string protocol // example: "HTTP/1.1"
required string method // example: "POST"
optional string args // query parameters, example: "v=4"
// Headers
optional string remote_addr // usually a load balancer, example: "172.31.32.5"
optional string content_length // example: "4722"
optional string date // example: "Mon, 12 Mar 2018 21:02:18 GMT"
optional string dnt // example: "1"
optional string host // example: "incoming.telemetry.mozilla.org"
optional string user_agent // example: "pingsender/1.0"
optional string x_forwarded_for // example: "10.98.132.74, 103.3.237.12"
optional string x_pingsender_version // example: "1.0"
optional string x_debug_id // example: "my_debug_session_1"
optional string x_pipeline_proxy // time that the AWS->GCP tee received the message, ignored, example: "2018-03-12T21:02:18.123456Z"
optional string x_telemetry_agent // example: "Glean/0.40.0 (Kotlin on Android)"
optional string x_source_tags // example: "automation, other"
optional string x_foxsec_ip_reputation // example: "95"
optional string x_lb_tags // example: "TLSv1.3, 009C"
}
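As an illustration of the schema above, the following sketch assembles a message as a plain dictionary. The function name `build_message` and the header-name normalization (lowercase, with `-` replaced by `_`) are assumptions inferred from the attribute names; the real server may derive attributes differently.

```python
import base64
from typing import Dict, Mapping, Optional

# Header-derived attribute names listed in the schema above.
HEADER_ATTRIBUTES = frozenset({
    "content_length", "date", "dnt", "host", "user_agent", "x_forwarded_for",
    "x_pingsender_version", "x_debug_id", "x_pipeline_proxy",
    "x_telemetry_agent", "x_source_tags", "x_foxsec_ip_reputation", "x_lb_tags",
})


def build_message(
    body: bytes,
    uri: str,
    method: str,
    protocol: str,
    submission_timestamp: str,
    headers: Optional[Mapping[str, str]] = None,
    args: Optional[str] = None,
    remote_addr: Optional[str] = None,
) -> Dict[str, object]:
    """Wrap a request in the data/attributes shape described by the schema."""
    attributes: Dict[str, str] = {
        "submission_timestamp": submission_timestamp,
        "uri": uri,
        "protocol": protocol,
        "method": method,
    }
    if args:
        attributes["args"] = args
    if remote_addr:
        attributes["remote_addr"] = remote_addr
    for name, value in (headers or {}).items():
        # Assumed normalization: "X-Debug-ID" becomes "x_debug_id".
        key = name.lower().replace("-", "_")
        if key in HEADER_ATTRIBUTES:
            attributes[key] = value
    # The body is passed through untouched, only base64 encoded.
    return {"data": base64.b64encode(body).decode("ascii"), "attributes": attributes}
```

Headers outside the listed set are dropped in this sketch, and optional attributes are omitted rather than sent as empty strings.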
Server Request/Response
GET Request
Endpoint | Description |
---|---|
/__heartbeat__ | check if service is healthy, and can reach PubSub or has space to store requests on disk |
/__lbheartbeat__ | check if service is running |
/__version__ | return Dockerflow version object |
GET Response codes
- 200 - ok, check succeeded
- 204 - ok, check succeeded, no response body
- 404 - not found, check doesn't exist
- 500 - all is not well
- 507 - insufficient storage, should occur at some configurable limit before disk is full
POST/PUT Request
Treat POST and PUT the same. Accept POST or PUT to URLs of the form
^/submit/namespace/[/dimensions]$
Example Telemetry format:
/submit/telemetry/docId/docType/appName/appVersion/appUpdateChannel/appBuildID
Specific Telemetry example:
/submit/telemetry/ce39b608-f595-4c69-b6a6-f7a436604648/main/Firefox/61.0a1/nightly/20180328030202
Example non-Telemetry format:
/submit/namespace/docType/docVersion/docId
Specific non-Telemetry example:
/submit/eng-workflow/hgpush/1/2c3a0767-d84a-4d02-8a92-fa54a3376049
Note that docId above is a unique document ID, which is used for de-duping submissions. This is not intended to be the clientId field from Telemetry. docId is required and must be a UUID.
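The two example formats can be parsed with a sketch like the one below. It assumes that a path must contain exactly the segments shown in the examples and that the `telemetry` namespace is the only one using the Telemetry layout. The helper names are hypothetical, and the legacy endpoints are not covered.

```python
import uuid
from typing import Dict, Optional

TELEMETRY_FIELDS = (
    "docId", "docType", "appName", "appVersion", "appUpdateChannel", "appBuildID",
)
GENERIC_FIELDS = ("docType", "docVersion", "docId")


def is_uuid(value: str) -> bool:
    """Accept only the canonical hyphenated form, e.g. 2c3a0767-d84a-..."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def parse_submit_path(path: str) -> Optional[Dict[str, str]]:
    """Return the named path segments, or None where the server would answer 404."""
    parts = path.split("/")
    # A valid path splits into ["", "submit", namespace, ...].
    if len(parts) < 3 or parts[0] != "" or parts[1] != "submit":
        return None
    namespace, rest = parts[2], parts[3:]
    fields = TELEMETRY_FIELDS if namespace == "telemetry" else GENERIC_FIELDS
    if not namespace or len(rest) != len(fields) or not all(rest):
        return None
    parsed = dict(zip(fields, rest))
    if not is_uuid(parsed["docId"]):
        return None
    parsed["namespace"] = namespace
    return parsed
```

For example, `parse_submit_path("/submit/eng-workflow/hgpush/1/2c3a0767-d84a-4d02-8a92-fa54a3376049")` yields a dictionary whose `docType` is `hgpush`, while a Telemetry-format path sent to a non-Telemetry namespace yields `None`.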
Legacy Systems
Accept TLS Error Reports as POST or PUT to /submit/sslreports with no docType, docVersion, or docId.
Accept Stub Installer pings as GET to /stub/[docVersion]/[dimensions], with no docType or docId, and over both HTTP and HTTPS. Use POST/PUT Response codes, even though this endpoint is for GET requests.
POST/PUT Response codes
- 200 - ok, request accepted into the pipeline
- 400 - bad request, for example an unencoded space in the URL
- 404 - not found, for example using a telemetry format URL in a non-telemetry namespace or vice-versa
- 411 - missing content-length header
- 413 - request body too large (note that if we have badly-behaved clients that retry on 4XX, we should send back 202 on body/path too long).
- 414 - request path too long (see above)
- 500 - internal error
- 507 - insufficient storage, request failed because disk is full
Other Response codes
- 405 - wrong request type (anything other than GET|POST|PUT)
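The response codes above can be read as a sequence of checks. The sketch below is one possible ordering, not the one the server necessarily uses: the size limits, the `lenient` switch for the 202 note, and the function and parameter names are all hypothetical. The 500 case (unexpected internal errors) is not modeled.

```python
from typing import Optional

ALLOWED_METHODS = ("GET", "POST", "PUT")
MAX_PATH_BYTES = 2_000       # hypothetical limit
MAX_BODY_BYTES = 1_000_000   # hypothetical limit


def status_for_submission(
    method: str,
    path: str,
    content_length: Optional[int],
    route_found: bool,
    queue_full: bool = False,
    lenient: bool = False,
) -> int:
    """Pick a status code for a submission; the check order is an assumption."""
    if method not in ALLOWED_METHODS:
        return 405
    if " " in path:
        return 400  # e.g. an unencoded space in the URL
    if len(path.encode("utf-8")) > MAX_PATH_BYTES:
        return 202 if lenient else 414
    if content_length is None:
        return 411
    if content_length > MAX_BODY_BYTES:
        return 202 if lenient else 413
    if not route_found:
        return 404
    if queue_full:
        return 507
    return 200
```

With `lenient=True`, oversized bodies and paths are answered with 202 so that clients which retry on 4XX stop resending them.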
Other Considerations
Compression
It is not desirable to do decompression on the edge node. We want to pass along messages from the HTTP Edge node without "cracking the egg" of the payload.
We may also receive badly formed payloads, and we will want to track the incidence of such things.
Bad Messages
Since the actual message is not examined by the edge server, the only failures that occur are those defined by the response status codes above. Messages are only forwarded to the pipeline when a response code of 200 is returned to the client.
PubSub Topics
All messages for which a response code of 200 was returned are forwarded to a single PubSub topic for decoding and landfill.
GeoIP Lookups
No GeoIP lookup is performed by the edge server. If a client IP is available then the PubSub consumer performs the lookup and then discards the IP before the message is forwarded to a decoded PubSub topic.
Data Retention
The edge server only stores data when PubSub cannot be reached, and removes data after it is successfully written to PubSub. Down scaling will be disabled for the Kubernetes pod and cluster when data is being stored, so that data is not lost.
Submission Timestamp Format
submission_timestamp is formatted as ISO 8601 with microseconds and timezone, because it is compatible with BigQuery's Timestamp Type, so that the field doesn't need transformation.
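A minimal sketch of producing this format with the Python standard library is shown below. The function name is hypothetical, and rejecting naive datetimes is a choice made for this example.

```python
from datetime import datetime, timezone


def format_submission_timestamp(moment: datetime) -> str:
    """Render an aware datetime as UTC ISO 8601 with microseconds and a "Z" suffix."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # %f always emits six digits, which gives the microsecond precision.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
```

Calling it with `datetime(2018, 3, 12, 21, 2, 18, 123456, tzinfo=timezone.utc)` returns `"2018-03-12T21:02:18.123456Z"`, matching the example in the message schema.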