We want to keep the in-tree remote-settings dumps up to date. For this, a GitHub Actions workflow
seems to be the easiest way for now. The workflow will check once a week if there is new data (through
the `cargo remote-settings dump-sync` command). And commit any changes to a new branch and open a PR
against main.
This adds an index on `geonames_alternates(geoname_id)` plus some geonames and
weather benchmarks that I used to verify the change. The bug has some benchmark
numbers and discussion. The index speeds up weather queries by as much as ~98%,
going from hundreds of ms in the worst case to ~10ms.
Adding two commands to the cargo remote-settings CLI, `dump-sync` and `dump-get`. This allows
to download a local dump of a set of collections, and keep it up to date with the remote version.
It's also possible to open a PR right away to update this file in the app-services repo.
`cargo remote-settings dump-sync --create-pr` will create a local branch and push it to the repo.
When trying to `get_records` from the component, it first checks if the database has some,
if not, it checks if the collection exists and takes it from the local file.
This addresses the bug and also makes one other change: I added a new method to
`BenchmarkWithInput` called `global_input()` that returns some data that's then
passed by reference to the benchmark's iterations. That's a little better than
creating global data in benchmark constructors because it means only the global
data for the current benchmark is in memory instead of the global data for all
benchmarks in the group. In the case of query benchmarks, the global data is the
store, so we avoid having N stores in memory at once, N = the number of
benchmarks in the group. (I'm adding quite a few weather benchmarks.) It also
made sense to rename `generate_input()` to `iteration_input()` to emphasize that
that method is called on each iteration.
I'm open to reverting this other change though if you disagree.
The speedup between this and the main branch when running
`time cargo suggest-bench -- query/query` isn't dramatic but it's a little
better:
```
main branch: ~83s
this branch: ~68s (~18% improvement)
```
This implements processing the main parts of the search configuration JSON into the Rust structure using serde_json.
Currently the processing applies to:
* The identifier and base properties of the engine records.
* The default engine records.
As variant and environment handling is not yet implemented, the filter function will return all engines defined in the configuration.
This commit:
* Introduces an options argument to the Places
`noteHistoryMetadataObservation()` functions.
* Adds an `if_page_missing` option that specifies what to do if
the page for the observation doesn't exist in the Places database.
The default behavior is to ignore the observation.
We'll use this option to only insert new pages into Places for the
initial `DocumentType` observation, and ignore all other observations
if the page isn't in Places.
This is prep work for bug 1927210. There should be no functional
changes, just an extra `$let` block at the top. I want to get the
indentation change out of the way first before doing any other changes.
Multiple components in application-services need to filter for the exact firefox version.
Instead of duplicating the code, we move it into its own support crate, which other components
can now make use of.
In the original GeoNames data, each alternate name has an associated
`iso_language` that indicates what kind of name it is. In addition to actual
language codes, it can also be `"abbr"` for an abbreviation ("NY") and a handful
of other values that all mean an airport code ("PDX"). We need to keep track of
the `iso_language` per alternate so that we don't match on an abbreviation or
airport code when it's the only term in the query. (Or, we could store an
`is_abbreviation` in the RS data, but that's worse in the long term, and it's
also good to be consistent with the original data where feasible IMO.)
I modified the schema and RS structs accordingly. In `DownloadedGeoname`, I
changed the name of `alternate_names` to `alternate_names_2` because, while I'd
like to uplift this change, it's not clear there's enough time. So we may need
to support the old `alternate_names` on 133.
The other main change is to `filter_map_chunks()`: The filter-map function is
now passed the chunk size, and weather uses it to tell whether the chunk is the
last one in the query string.
We should ignore punctuation when matching weather suggestions, e.g., the comma
in: "weather waterloo, ia".
This modifies `fetch_weather_suggestions()` so it breaks the query string into
words, removes punctuation, and passes the words into `filter_map_chunks()`. I
modified `filter_map_chunks()` accordingly so it takes a slice directly instead
of taking a string and then breaking it into words.
We do a fair amount of lowercasing the query string in a bunch of places, the
Yelp functions also break the query into words, and we may want to remove
punctuation for other suggestion types too, so I started down the path of
creating a `NormalizedQuery` struct to cache all those things, but the patch
ended up being much larger of course, so I might try that in a separate PR.
I had to remove some of the derived implementations on `Geoname` (`Eq`, `Hash`,
`PartialEq`) due to the addition of these `f64`s. That had a couple of
consequences: I also had to remove `Geoname::geoname_type` since nothing uses it
anymore, and that meant I also had to remove `GeonameType::Other`. Equality
comparison and hashing are now based on `Geoname::geoname_id`, which is probably
how it should have worked all along.
Let's update these to match the JS client:
15245fdb8e/extension/experiments/remotesettings/api.js (L8-L11)
This is a breaking change to the public API, but this API is not exposed
via UniFFI and it's only used in 1 place:
50bf235b9a/components/nimbus/src/stateful/client/mod.rs (L24-L35).
AFAICT, the only affect this will have is that the error message will
change slightly.
Fixed an error when downloading attachments in the new client code. I
think I broke it with my last commit.
Also removed the `.unwrap()` calls. It shouldn't matter because there's
no way for `.join()` to fail, but it feels silly calling `unwrap()` from
a function that returns a result.
As of mozilla/application-services#6413, `MozillaTestServices`
outputs all generated sources into `MozillaTestServices/Generated`.
Let's ignore them, so that they don't show up as untracked in Git.