application-services/components/sync15
Ben Dean-Kawamura 8678932286 Roundtrip unknown fields in bookmark records
Store unknown fields in the mirror table (moz_bookmarks_synced) and send
them back in outgoing records.

Added tests for the use cases we need to support.

To help with this, I refactored incoming bookmark validation to use the
new `into_content_with_fixup()` method. This also makes payload
evolution easier, since we can now ingest all the unknown fields using
`serde(flatten)`.

`into_content_with_fixup()` is a new version of `into_content()` that
allows for a fixup stage where we massage the JSON data.  Refactored the
incoming bookmarks code to use this.
2023-03-28 13:02:49 -04:00

README.md

Low-level sync-1.5 helper component

This component contains utility code to be shared between different data stores that want to sync against a Firefox Sync v1.5 sync server. It handles things like encrypting/decrypting records, obtaining and using storage node auth tokens, and so on.

There are 2 key concepts to understand here - the implementation itself, and a Rust trait for a "syncable store" where component-specific logic lives - but before we dive into them, some preamble might help put things into context.

Nomenclature

  • The term "store" is generally used as the interface to the database - ie, the thing that gets and saves items. It can also be seen as supplying the API used by most consumers of the component. Note that the "places" component is alone in using the term "api" for this object.

  • The term "engine" (or ideally, "sync engine") is used for the thing that actually does the syncing for a store. Sync engines implement the SyncEngine trait - the trait is implemented either directly by a store, or by a new object that holds a reference to a store.
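
To make the two options concrete, here is a minimal sketch. Everything in it is hypothetical: `ToyEngine` is a tiny stand-in for the real SyncEngine trait (whose methods are different), and `ToyStore` / `ToyStoreEngine` are made-up names used only for illustration.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical, heavily simplified stand-in for the real SyncEngine trait.
trait ToyEngine {
    fn collection_name(&self) -> &'static str;
    // Number of items this engine would consider for syncing.
    fn item_count(&self) -> usize;
}

// The "store": the interface to the database (here, just a map in memory).
struct ToyStore {
    items: RefCell<HashMap<String, String>>,
}

impl ToyStore {
    fn new() -> Self {
        ToyStore { items: RefCell::new(HashMap::new()) }
    }

    // The kind of API most consumers of the component would use.
    fn save(&self, guid: &str, value: &str) {
        self.items.borrow_mut().insert(guid.to_string(), value.to_string());
    }
}

// Option 1: the store implements the engine trait directly.
impl ToyEngine for ToyStore {
    fn collection_name(&self) -> &'static str {
        "toy"
    }
    fn item_count(&self) -> usize {
        self.items.borrow().len()
    }
}

// Option 2: a separate engine object that holds a reference to the store.
struct ToyStoreEngine {
    store: Rc<ToyStore>,
}

impl ToyEngine for ToyStoreEngine {
    fn collection_name(&self) -> &'static str {
        "toy"
    }
    fn item_count(&self) -> usize {
        self.store.items.borrow().len()
    }
}
```

Either way, the syncing code only ever sees something that implements the trait; whether that is the store itself or a wrapper around it is a choice each component makes.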

Introduction and History

For many years Sync has worked exclusively against a "sync v1.5 server". This is a REST API described here. The important part is that the API is conceptually quite simple - there are arbitrary "collections" containing "records" indexed by a GUID, and lacking traditional database concepts like joins. Because the record is encrypted, there's very little scope for the server to be much smarter. Thus it's reasonably easy to create a fairly generic abstraction over the API that can be easily reused.

Back in the deep past, we found ourselves with 2 different components that needed to sync against a sync v1.5 server. The apps using these components didn't have schedulers or any UI for choosing what to sync - so these components just looked at the existing state of the engines on the server and synced if they were enabled.

This was also pre-megazord - the idea was that apps could choose from a "menu" of components to include - so we didn't really want to bind these components together. Therefore, there was no concept of "sync all" - instead, each of the components had to be synced individually. So this component started out as more of a "library" than a "component" which individual components could reuse - and each of these components was a "syncable store" (ie, a store which could supply a "sync engine").

Fast forward to Fenix and we needed a UI for managing all the engines supported there, and a single "sync now" experience etc - so we also have a sync_manager component - see its README for more. But even though it exists, there are still some parts of this component that reflect these early days - for example, it's still possible to sync just a single component using sync15 (ie, without going via the "sync manager"), although this isn't used and should be removed - the "sync manager" allows you to choose which engines to sync, so that should be used exclusively.

Metadata

There's some metadata associated with a sync. Some of the metadata is "global" to the app (eg, the enabled state of engines, information about what servers to use, etc) and some is specific to an engine (eg, timestamp of the server's collection for this engine, guids for the collections, etc).

We made the decision early on that no storage should be done by this component:

  • The "global" metadata should be stored by the application - but because it doesn't need to interpret the data, we do this with an opaque string (that is JSON, but the app should never assume or introspect that)

  • Each engine should store its own metadata in the same database as the data itself. That way we don't end up in the situation where, say, a database is moved between profiles, causing the metadata to refer to a completely different data set - if the database is moved or copied, the metadata comes with it.
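
The split described above can be sketched roughly as follows. This is only an illustration under assumptions: `AppPrefs`, `EngineDb` and the `last_sync_millis` key are hypothetical names, and a real engine would keep its metadata in a table of its own database rather than in an in-memory map.

```rust
use std::collections::HashMap;

// Application side (hypothetical): persists the "global" sync state as an
// opaque string and never looks inside it.
#[derive(Default)]
struct AppPrefs {
    persisted_sync_state: Option<String>,
}

impl AppPrefs {
    fn store_global_state(&mut self, opaque: String) {
        self.persisted_sync_state = Some(opaque);
    }

    // Handed back to the sync code unchanged on the next sync.
    fn global_state(&self) -> Option<&str> {
        self.persisted_sync_state.as_deref()
    }
}

// Engine side (hypothetical): the metadata lives next to the records, so
// copying or moving the "database" carries the metadata along with it.
#[derive(Clone, Default)]
struct EngineDb {
    records: HashMap<String, String>,
    meta: HashMap<String, String>,
}

impl EngineDb {
    fn set_last_sync_millis(&mut self, millis: i64) {
        self.meta
            .insert("last_sync_millis".to_string(), millis.to_string());
    }

    fn last_sync_millis(&self) -> Option<i64> {
        self.meta
            .get("last_sync_millis")
            .and_then(|s| s.parse().ok())
    }
}
```

In this sketch, cloning an `EngineDb` stands in for copying a database file: the clone has the same records and the same idea of when it last synced, which is the property the second bullet is after.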

Sync Implementation

The core implementation does all of the interaction with things like the tokenserver, the meta/global and info/collections collections, etc. It does all network interaction (ie, individual engines don't need to interact with the network at all), tracks things like whether the server is asking us to "backoff" due to operational concerns, manages encryption keys and the encryption itself, etc. The general flow of a sync - which interacts with the SyncEngine trait - is:

  • Does all pre-sync setup, such as checking meta/global, and whether the sync IDs on the server match the sync IDs we last saw (ie, to check whether something drastic has happened since we last synced)
  • Asks the engine about how to formulate the URL query params to obtain the records the engine cares about. In most cases, this will simply be "records since the last modified timestamp of the last sync".
  • Downloads and decrypts these records.
  • Passes these records to the engine for processing, and obtains records that should be uploaded to the server.
  • Encrypts these outgoing records and uploads them.
  • Tells the engine about the result of the upload (ie, the last-modified timestamp of the POST so it can be saved as engine metadata)
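
As a rough illustration of the flow above, here is a toy version that runs entirely in memory. None of these names come from sync15: `FlowEngine`, `FakeServer`, `EchoEngine` and `sync_one_engine` are hypothetical, the "encryption" is just string reversal, and the pre-sync setup step is left as a comment.

```rust
// Hypothetical, simplified engine interface: just enough for the flow.
trait FlowEngine {
    // Used to ask the server for "records newer than this".
    fn last_sync_millis(&self) -> i64;
    // Takes decrypted incoming records, returns records to upload.
    fn apply_incoming(&mut self, incoming: Vec<String>) -> Vec<String>;
    // Told the server timestamp of the upload so it can be saved as metadata.
    fn upload_finished(&mut self, server_modified_millis: i64);
}

// Stand-in for the server: (modified timestamp, "encrypted" payload) pairs.
struct FakeServer {
    records: Vec<(i64, String)>,
    now_millis: i64,
}

// Stand-in "encryption": reverses the string. This is NOT real cryptography.
fn toy_encrypt(plain: &str) -> String {
    plain.chars().rev().collect()
}

fn toy_decrypt(cipher: &str) -> String {
    cipher.chars().rev().collect()
}

// Runs one sync for one engine and returns how many records were uploaded.
fn sync_one_engine<E: FlowEngine>(engine: &mut E, server: &mut FakeServer) -> usize {
    // 1. Pre-sync setup (meta/global, sync ID checks) is omitted here.
    // 2. Ask the engine which records it cares about.
    let since = engine.last_sync_millis();
    // 3. "Download" and decrypt those records.
    let incoming: Vec<String> = server
        .records
        .iter()
        .filter(|(modified, _)| *modified > since)
        .map(|(_, payload)| toy_decrypt(payload))
        .collect();
    // 4. Let the engine process them and hand back the outgoing records.
    let outgoing = engine.apply_incoming(incoming);
    let uploaded = outgoing.len();
    // 5. Encrypt and "upload" the outgoing records.
    server.now_millis += 1;
    let modified = server.now_millis;
    for record in &outgoing {
        server.records.push((modified, toy_encrypt(record)));
    }
    // 6. Tell the engine the timestamp of the upload.
    engine.upload_finished(modified);
    uploaded
}

// A trivial engine: remembers what it has seen, uploads what is pending.
struct EchoEngine {
    last_sync: i64,
    seen: Vec<String>,
    pending: Vec<String>,
}

impl FlowEngine for EchoEngine {
    fn last_sync_millis(&self) -> i64 {
        self.last_sync
    }
    fn apply_incoming(&mut self, incoming: Vec<String>) -> Vec<String> {
        self.seen.extend(incoming);
        std::mem::take(&mut self.pending)
    }
    fn upload_finished(&mut self, server_modified_millis: i64) {
        self.last_sync = server_modified_millis;
    }
}
```

Note that the engine never touches the network or the encryption in this sketch; it only answers questions and processes plain records, which mirrors the division of labour described above.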

As above, the sync15 component really only deals with a single engine at a time. See the "sync manager" for how multiple engines are managed (but the tl;dr is that the "sync manager" leans on this very heavily, but knows about multiple engines and manages shared state).

The SyncEngine trait

The SyncEngine trait is where all logic specific to a collection lives. A "sync engine" implements (or provides) this trait to implement actual syncing.

For reasons, it actually lives in the sync-traits helper, but for the purposes of this document you should consider it as owned by sync15.

This is actually quite a simple trait - at a high level, it's really just concerned with:

  • Getting or setting some metadata the sync15 component has decided should be saved or fetched.

  • In a normal sync, taking some "incoming" records, processing them, and returning the "outgoing" records we should send to the server.

  • In some edge-cases, either a "wipe" (ie, actually delete everything, which almost never happens) or a "reset" (ie, pretend this engine has never before been synced)

And that's it!
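
To give a feel for the shape of such a trait, here is a deliberately tiny sketch. It is not the real SyncEngine trait: `MiniEngine`, `NotesEngine` and all method names and signatures are hypothetical, there is no conflict resolution, and treating "reset" as "clear the metadata and mark every local record for upload" is an assumption made for the example.

```rust
use std::collections::HashMap;

// Hypothetical trait covering the three concerns listed above.
trait MiniEngine {
    // Metadata the syncing code asks the engine to save or fetch.
    fn get_meta(&self, key: &str) -> Option<String>;
    fn set_meta(&mut self, key: &str, value: &str);
    // Normal sync: take incoming (guid, body) records, return outgoing ones.
    fn apply_incoming(&mut self, incoming: Vec<(String, String)>) -> Vec<(String, String)>;
    // Edge cases.
    fn wipe(&mut self);
    fn reset(&mut self);
}

#[derive(Default)]
struct NotesEngine {
    notes: HashMap<String, String>, // guid -> body
    dirty: Vec<String>,             // guids changed locally since last sync
    meta: HashMap<String, String>,
}

impl NotesEngine {
    fn add_local(&mut self, guid: &str, body: &str) {
        self.notes.insert(guid.to_string(), body.to_string());
        self.dirty.push(guid.to_string());
    }
}

impl MiniEngine for NotesEngine {
    fn get_meta(&self, key: &str) -> Option<String> {
        self.meta.get(key).cloned()
    }

    fn set_meta(&mut self, key: &str, value: &str) {
        self.meta.insert(key.to_string(), value.to_string());
    }

    fn apply_incoming(&mut self, incoming: Vec<(String, String)>) -> Vec<(String, String)> {
        // Naive merge: incoming records simply overwrite local ones.
        for (guid, body) in incoming {
            self.notes.insert(guid, body);
        }
        // Everything changed locally becomes an outgoing record.
        let dirty = std::mem::take(&mut self.dirty);
        let mut outgoing = Vec::new();
        for guid in dirty {
            if let Some(body) = self.notes.get(&guid) {
                let body = body.clone();
                outgoing.push((guid, body));
            }
        }
        outgoing
    }

    fn wipe(&mut self) {
        // Actually delete everything.
        self.notes.clear();
        self.dirty.clear();
        self.meta.clear();
    }

    fn reset(&mut self) {
        // Pretend we have never synced: forget the metadata, keep the data,
        // and (an assumption of this sketch) queue every record for upload.
        self.meta.clear();
        self.dirty = self.notes.keys().cloned().collect();
    }
}
```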