Added a fakespot query benchmark. Generalized some code so that it can
be shared between the ingest and query benchmarks and so that it will be
easy to add more query benchmarks.
Added Fakespot ingest benchmarks. Re-organized the list so that the
ingest and ingest-again benchmarks are next to each other. This makes
it easier to add benchmarks for a new suggestion type.
Added `bench=false` to Cargo.toml. This avoids CLI errors when
criterion-specific arguments are passed. For details, see
https://bheisler.github.io/criterion.rs/book/faq.html#cargo-bench-gives-unrecognized-option-errors-for-valid-command-line-options
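For reference, a minimal sketch of the relevant Cargo.toml entries
(target names here are illustrative):

```toml
# Disable the default libtest bench harness for the library target so
# that `cargo bench` doesn't forward Criterion-specific flags to it.
[lib]
bench = false

# Criterion benchmarks declare their own harness instead.
[[bench]]
name = "benchmarks"
harness = false
```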
This commit demotes the `TabsNotClosed` error case from a
`DeviceCommandError` variant to a `CloseTabsResult` variant specifically
for "close tabs".
`FirefoxAccount::close_tabs` only throws `FxaError`s for
account-related errors, as before. Successes and partial successes
are reported via `CloseTabsResult`.
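A rough sketch of the resulting shape, assuming the variant carries the
unclosed URLs as described above (exact names and fields may differ):

```rust
/// Result of sending a "close tabs" command to another device.
pub enum CloseTabsResult {
    /// Every URL was packed into a command and sent.
    Ok,
    /// Some URLs couldn't be sent; callers can retry these later.
    TabsNotClosed { urls: Vec<String> },
}
```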
This commit:
* Introduces a new `DeviceCommandError` type for
device command-related operations.
* Adds a `DeviceCommandError::TabsNotClosed` error variant, which
includes the URLs of any tabs that couldn't be sent in a
"close tabs" command.
* Reworks `FirefoxAccount::close_tabs` to pack the URLs into multiple
device commands when they would exceed the maximum payload size
(16 KB by default); see the sketch below.
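A minimal sketch of the packing idea, assuming a greedy split keyed off
the serialized JSON size (the helper name and strategy are illustrative,
not the actual implementation):

```rust
/// Greedily split `urls` into chunks whose JSON serialization stays under
/// `max_payload_size`, so each chunk can be sent as a separate command.
fn pack_urls(urls: &[String], max_payload_size: usize) -> Vec<Vec<String>> {
    let mut chunks: Vec<Vec<String>> = Vec::new();
    let mut current: Vec<String> = Vec::new();
    for url in urls {
        let mut candidate = current.clone();
        candidate.push(url.clone());
        let size = serde_json::to_string(&candidate)
            .map(|json| json.len())
            .unwrap_or(usize::MAX);
        if size > max_payload_size && !current.is_empty() {
            // This URL doesn't fit; start a new command payload.
            chunks.push(std::mem::take(&mut current));
        }
        current.push(url.clone());
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```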
Fetch the icons using a separate query, then manually join them with
the suggestion data. This has the advantage of not needing to update
the schema.
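A rough sketch of the two-query approach (table and column names here
are illustrative, not the real schema):

```rust
use std::collections::HashMap;

use rusqlite::{Connection, OptionalExtension, Result};

/// Fetch icons for the given suggestions in a second query, then attach
/// them to the suggestion data in Rust instead of JOINing in SQL.
fn fetch_icons(conn: &Connection, icon_ids: &[String]) -> Result<HashMap<String, Vec<u8>>> {
    let mut icons = HashMap::new();
    for id in icon_ids {
        let mut stmt = conn.prepare_cached("SELECT data FROM icons WHERE id = ?")?;
        if let Some(data) = stmt
            .query_row([id.as_str()], |row| row.get::<_, Vec<u8>>(0))
            .optional()?
        {
            icons.insert(id.clone(), data);
        }
    }
    Ok(icons)
}
```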
Also updated the suggest CLI a bit so that I could use it to test that
the code was working in practice.
The plan is to use the current code for the experiment, so we don't need
a feature flag anymore. The main point of the flag was to avoid
creating multiple migrations as the schema changed, but it looks like
the schema will be stable for a while.
`f64::{min, max}_value()` is deprecated in newer versions of Rust,
and the previous range check allowed millisecond values that can't be
represented in an `f64` without loss of precision.
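A sketch of the kind of bounds check this implies, assuming i64
millisecond values (the exact bounds and types in the real code may
differ):

```rust
/// Largest integer an f64 can represent exactly (2^53).
const MAX_EXACT_F64_INT: i64 = 1 << 53;

/// Check that a millisecond timestamp survives a round trip through f64
/// without losing precision.
fn is_representable_as_f64(millis: i64) -> bool {
    (-MAX_EXACT_F64_INT..=MAX_EXACT_F64_INT).contains(&millis)
}
```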
Create an FTS table for fakespot and populate it during ingestion.
Updated the `SuggestQuery` code to generate a string to use with the FTS
`MATCH` operator. I couldn't get DELETE triggers working for this, so we
now delete the FTS data in `drop_suggestions` instead.
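A rough sketch of the FTS5 pieces (the virtual table name, columns, and
query shape are illustrative, not the real schema):

```rust
use rusqlite::{Connection, Result};

/// Create the FTS5 index and run a MATCH query against it.
fn fts_sketch(conn: &Connection, match_query: &str) -> Result<Vec<i64>> {
    conn.execute_batch(
        "CREATE VIRTUAL TABLE IF NOT EXISTS fakespot_fts USING fts5(title);",
    )?;
    let mut stmt =
        conn.prepare("SELECT rowid FROM fakespot_fts WHERE fakespot_fts MATCH ?")?;
    let rows = stmt.query_map([match_query], |row| row.get(0))?;
    rows.collect()
}
```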
Also:
- Made the `clear_database` function delete the FTS data
- Added some more tests for the fakespot data
- Updated the `assert_schema_matches_new_database()` function to also
  check the triggers. Followed the rule of 3 and refactored things now
  that we're checking tables, indexes, and triggers.
The tabs and clients engines use the same logic to truncate a list of
items to fit within a size limit when serialized to JSON, though
they handle failures differently: the tabs engine sends the complete
list of tabs; the clients engine clears the list of commands.
We'll reuse the same core logic to pack URLs for closed synced tabs
into multiple commands, so let's factor it out into a shared
support crate.
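A minimal sketch of the shared idea, using JSON length as the size
measure (the function name and exact behavior are assumptions, not the
actual support crate API):

```rust
use serde::Serialize;

/// Drop items from the end of `items` until the JSON serialization fits
/// within `max_bytes`. Callers decide how to handle dropped items (e.g.
/// the clients engine clears its command list entirely).
fn truncate_to_fit<T: Serialize>(items: &mut Vec<T>, max_bytes: usize) {
    while !items.is_empty() {
        match serde_json::to_string(&items) {
            Ok(json) if json.len() <= max_bytes => return,
            _ => {
                items.pop();
            }
        }
    }
}
```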
Added a method on `SuggestStoreBuilder` to load an SQLite3 extension.
This will be needed to enable FTS5 on Desktop. It shouldn't be needed
on Android/iOS, since FTS5 is normally loaded by default.
I tested this with a simple .so file that I downloaded myself. However,
I couldn't figure out a good way to check in this code.
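For context, a sketch of what extension loading looks like at the
rusqlite level, which needs rusqlite's `load_extension` feature (this is
not necessarily how the builder method is implemented):

```rust
use rusqlite::{Connection, LoadExtensionGuard, Result};

/// Load an SQLite extension (e.g. a standalone FTS5 build) from `path`.
fn load_sqlite_extension(conn: &Connection, path: &str) -> Result<()> {
    // Extension loading is only enabled while the guard is alive.
    unsafe {
        let _guard = LoadExtensionGuard::new(conn)?;
        conn.load_extension(path, None)
    }
}
```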
Move some common functionality to `ingest_records_by_type()`:
- Updating the last stored modified time
- Dropping the old suggestion data for a record
This simplifies the code since we perform these tasks all in one place.
It also means we can remove the `ingest_record()` function and one level
of nested callbacks.
I think this is more reliable, since these steps now happen
unconditionally in the top-level function. For example, if a record gets
replaced with a new schema that we can't parse, we now delete the old
suggestions rather than keeping them in the DB like before.
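A rough sketch of the consolidated flow, with stand-in types for the
real records and DAO (names and signatures are assumptions based on the
description above):

```rust
/// Minimal stand-ins for the real types, just to illustrate the flow.
struct Record {
    id: String,
    last_modified: u64,
}

trait Dao {
    fn drop_suggestions(&mut self, record_id: &str) -> Result<(), String>;
    fn process_record(&mut self, record: &Record) -> Result<(), String>;
    fn put_last_modified(&mut self, when: u64) -> Result<(), String>;
}

/// The per-record bookkeeping happens unconditionally in one place, so a
/// record whose new payload we can't parse still gets its old
/// suggestions dropped.
fn ingest_records_by_type<D: Dao>(dao: &mut D, records: &[Record]) -> Result<(), String> {
    for record in records {
        dao.drop_suggestions(&record.id)?;
        dao.process_record(record)?;
        dao.put_last_modified(record.last_modified)?;
    }
    Ok(())
}
```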
We've seen shutdown hangs in the `drop_suggestions` function because
it runs slowly
(https://bugzilla.mozilla.org/show_bug.cgi?id=1895110). That function
has been sped up significantly, but it would still be good to check for
interruption to avoid the possibility of a hang.
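A sketch of the kind of check this implies, using a plain atomic flag
as a stand-in for the shared interrupt-handling machinery:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Stand-in for the real "operation was interrupted" error type.
struct Interrupted;

/// Delete suggestions in batches, checking the interrupt flag between
/// batches so a shutdown request doesn't have to wait for the whole
/// operation to finish.
fn drop_suggestions_in_batches(
    interrupted: &AtomicBool,
    batches: &[Vec<String>],
    mut delete_batch: impl FnMut(&[String]),
) -> Result<(), Interrupted> {
    for batch in batches {
        if interrupted.load(Ordering::Relaxed) {
            return Err(Interrupted);
        }
        delete_batch(batch);
    }
    Ok(())
}
```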
Added the `suggest-cli` crate that implements a simple clap CLI to
ingest and query suggestions. This can be run using `cargo suggest`.
My plan is to use this to test fakespot blocklist support.
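A minimal sketch of what such a clap CLI looks like (subcommand names
here are illustrative, not the actual `suggest-cli` interface; requires
clap's `derive` feature):

```rust
use clap::{Parser, Subcommand};

/// Hypothetical shape of the CLI.
#[derive(Parser)]
struct Cli {
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    /// Ingest suggestions from remote settings into the local database.
    Ingest,
    /// Query the ingested suggestions.
    Query { input: String },
}

fn main() {
    let cli = Cli::parse();
    match cli.command {
        Command::Ingest => println!("would call the ingest API here"),
        Command::Query { input } => println!("would query suggestions for {input:?}"),
    }
}
```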
Defined the `fakespot` feature and put most of the changes behind that
feature flag.
Updated the ingest code to use two remote settings clients, one for
fakespot and one for everything else. I think I like this setup going
forward, since it makes it easy for new suggestion types to have their
own collection if we decide that makes sense.
Implemented some very simple matching logic; the next step is for the
SNG team to replace it with the real matching logic. I tried to leave
`FAKESPOT-TODO` comment breadcrumbs where they need to update the code.
I just realized that the JS consumer wants to set the bucket name. They
currently use the `remote_settings_config` method to do this, but it's
deprecated, so we need to give them an alternative.
The main issue was that we had foreign keys without indexes on the
child key columns, which SQLite recommends adding in
https://www.sqlite.org/foreignkeys.html.
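For illustration, the kind of index that recommendation calls for
(table and column names here are made up, not the real schema):

```rust
use rusqlite::{Connection, Result};

/// Index the child key column of a foreign key so that deleting rows
/// from the parent table doesn't scan the whole child table.
fn add_child_key_index(conn: &Connection) -> Result<()> {
    conn.execute_batch(
        "CREATE INDEX IF NOT EXISTS keywords_suggestion_id ON keywords(suggestion_id);",
    )
}
```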
My first try was to add the indexes and run the benchmarks on my
machine:
- `ingest-again-amp-wikipedia` now completed successfully and took
  ~1000ms.
- `ingest-amp-wikipedia` (the test for ingesting the first time) increased
  from ~450ms to ~575ms. It seems that maintaining these indexes is
  non-trivial, but there's no other way to make reingestion work at a
  reasonable speed without a major rework of the system.
The second try was to remove the foreign keys altogether, which led to
an even bigger speedup:
- `ingest-again-amp-wikipedia` was now ~600ms.
- `ingest-amp-wikipedia` was now ~400ms, even faster than the
  baseline.
- Database size stayed the same as in `main`, whereas the previous
  change increased it from 19 MB to 26 MB.
We discussed the pros and cons of this in
https://github.com/mozilla/application-services/pull/6273 and decided to
go with dropping the foreign keys. This gives up some safety, but not
much, because:
- Ingestion is deterministic: if we set up the relationships correctly
  for one user, then we set them up correctly for all users.
- We regularly throw away all data in the tables and re-create them.
AMP/Wikipedia dominates the total ingestion time, which is why I
focused on this benchmark. The second slowest benchmark is amp-mobile,
which does very similar work with about half the data and finishes in
about half the time. After that, all other benchmarks were < 15ms.
Updated the `assert_schema_matches_new_database` method to also check
that the indexes match.
Added new `ingest-again-*` benchmarks that test re-ingesting all
suggestions with a full database. In particular, this tests the
`drop_suggestions` method that's causing us issues.
Right now the `ingest-again-amp-wikipedia` benchmark is extremely slow;
I've always had to interrupt it before it finishes.
Also, updated `cargo suggest-debug-ingestion-sizes` to print out the
total database size. I want to use this to check how new indexes affect
it.