Looking up the BidiClass property via icu_properties is more efficient than the
built-in version provided in unicode-bidi.
Differential Revision: https://phabricator.services.mozilla.com/D198449
This will cause `./mach vendor rust` to pull it into the tree.
Note that actually running `./mach vendor rust` requires the file-size limit in vendor_rust.py
to be temporarily raised (from 100K to 300K is enough) because of large files in the icu_properties
crate. This does not contribute significant bloat to the binary, though, because only actually-used
data ends up in the build.
Differential Revision: https://phabricator.services.mozilla.com/D198446
unicode-bidi more closely follows the UAX#9 recommendations for processing embedding controls
if they're not actually removed from the text; but this is not an actual spec requirement,
so either behavior is acceptable.
Differential Revision: https://phabricator.services.mozilla.com/D197891
Rather than being generated by Diplomat, this is hand-written to provide just the functionality needed in intl::Bidi,
to minimize the amount of glue code and ensure a close match to Gecko requirements.
Differential Revision: https://phabricator.services.mozilla.com/D197889
When reloading https://en.wikipedia.org/wiki/Barack_Obama, which is used by the
browsertime benchmark, `CountGraphemeClusters` is called around 3000 times.
In about half of those calls, `aText` is empty.
Adding a fast path for empty text therefore avoids many heap allocations
of `ICU4XGraphemeClusterBreakIteratorUtf16`.
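The idea can be sketched in Rust as follows. This is only an illustration, not
the Gecko code: `FakeBreakIterator`, `count_clusters` and `allocations` are
hypothetical names, and the stand-in iterator counts UTF-16 code points rather
than performing real grapheme cluster segmentation.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many stand-in iterators were created (illustration only).
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a heap-allocated break iterator.
/// It counts UTF-16 code points, which is NOT real grapheme segmentation.
struct FakeBreakIterator {
    text: Vec<u16>,
}

impl FakeBreakIterator {
    fn new(text: &[u16]) -> Box<Self> {
        ALLOCATIONS.fetch_add(1, Ordering::SeqCst);
        Box::new(FakeBreakIterator { text: text.to_vec() })
    }

    fn count(&self) -> usize {
        // A surrogate pair decodes to a single item.
        char::decode_utf16(self.text.iter().copied()).count()
    }
}

/// Hypothetical counterpart of a cluster-counting helper with a fast path.
fn count_clusters(text: &[u16]) -> usize {
    // Fast path: empty text has no clusters, so skip the allocation entirely.
    if text.is_empty() {
        return 0;
    }
    FakeBreakIterator::new(text).count()
}

/// Number of stand-in iterators created so far.
fn allocations() -> usize {
    ALLOCATIONS.load(Ordering::SeqCst)
}
```

Calling `count_clusters(&[])` returns 0 without touching the allocation
counter, while any non-empty input goes through the iterator as before.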
Differential Revision: https://phabricator.services.mozilla.com/D196008
When running Wikipedia's page in the browsertime benchmark, 0.5%-1% of the calls
to `LineBreaker::ComputeBreakPositions` have an aLength of 1. In that case,
the only break set by ICU4X's line segmenter is the SOT (start of text) break.
So we can add a fast path for this situation. `ICU4XLineBreakIterator*`
always allocates on the Rust heap, so this saves a few heap allocations.
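A minimal Rust sketch of such a fast path, under stated assumptions:
`slow_break_positions` is a hypothetical stand-in for the heap-allocating
segmenter (its "break after a space" rule is a toy rule, not UAX #14), and
`compute_break_positions` is an illustrative name, not the Gecko signature.

```rust
/// Hypothetical stand-in for a heap-allocating line segmenter. It always
/// reports a break at the start of text; the "break after a space" rule is
/// a toy rule used only to make the example observable.
fn slow_break_positions(text: &[u16], break_before: &mut [bool]) {
    // Simulated heap-allocated iterator state.
    let state: Box<[u16]> = text.to_vec().into_boxed_slice();
    for i in 0..state.len() {
        break_before[i] = i == 0 || state[i - 1] == u16::from(b' ');
    }
}

/// Fills `break_before[i]` with whether a break is reported before unit `i`.
fn compute_break_positions(text: &[u16], break_before: &mut [bool]) {
    assert_eq!(text.len(), break_before.len());
    // Fast path: with a single code unit, the start-of-text break is the
    // only one that can be reported, so no segmenter needs to be created.
    if text.len() == 1 {
        break_before[0] = true;
        return;
    }
    slow_break_positions(text, break_before);
}
```

The fast path writes exactly what the stand-in segmenter would produce for a
one-unit input, so callers see no behavioral difference.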
Differential Revision: https://phabricator.services.mozilla.com/D195523
This file shows up after running `update-icu4x.sh`. It is part of the downloaded
`icu_capi` crate. We should check it in for completeness even if it is not used.
Differential Revision: https://phabricator.services.mozilla.com/D195591
However, because `icu_capi` uses weak dependency syntax, which cargo vendor
doesn't recognize, this command will copy unnecessary crates. To avoid that, I
would like to use a modified version of icu_capi.
There is another issue: `icu_capi`'s C++ headers aren't compatible with
clang [*1], so we need a workaround for that.
ICU4X 1.3 also changes how data providers work with `icu_capi`.
Starting with ICU4X 1.3, there are new `icu_*_data` crates for customizing the
data file, instead of `icu_testdata`. So we have to add each data crate when
using `icu_capi`.
*1 https://github.com/llvm/llvm-project/issues/70162
Differential Revision: https://phabricator.services.mozilla.com/D192902
The matching behavior implemented in bug 1857742 did not quite follow the spec,
particularly with regard to language *ranges* (as used in the :lang() pseudo)
that are not themselves valid language *tags*.
This updates the LangTagCompare function to more correctly follow the RFC 4647
(BCP 47) "Extended Filtering" algorithm, and adjusts the relevant WPT tests (originally
from bug 1857742) to reflect the corrected behavior.
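For reference, the Extended Filtering steps from RFC 4647 section 3.3.2 can be
sketched in Rust as below. This is a standalone illustration, not the
LangTagCompare implementation; `extended_filter_matches` is a hypothetical
name, and no CSS-specific handling (such as empty ranges) is attempted.

```rust
/// Returns true if `tag` matches the extended language `range`
/// according to the RFC 4647 section 3.3.2 steps.
fn extended_filter_matches(range: &str, tag: &str) -> bool {
    let mut range_subtags = range.split('-');
    let mut tag_subtags = tag.split('-');

    // Step 1: the first subtags must match case-insensitively,
    // unless the range starts with the wildcard "*".
    let (first_range, first_tag) = match (range_subtags.next(), tag_subtags.next()) {
        (Some(r), Some(t)) => (r, t),
        _ => return false,
    };
    if first_range != "*" && !first_range.eq_ignore_ascii_case(first_tag) {
        return false;
    }

    // Steps 2-3: walk the remaining range subtags.
    for r in range_subtags {
        // 3A: a wildcard matches any sequence of subtags, so skip it.
        if r == "*" {
            continue;
        }
        loop {
            match tag_subtags.next() {
                // 3B: the tag ran out of subtags before the range did.
                None => return false,
                // 3C: subtags match; advance both range and tag.
                Some(t) if t.eq_ignore_ascii_case(r) => break,
                // 3D: a singleton (such as "x") may not be skipped.
                Some(t) if t.len() == 1 => return false,
                // 3E: skip this tag subtag and try the next one.
                Some(_) => {}
            }
        }
    }

    // Step 4: every range subtag was consumed, so the tag matches.
    true
}
```

With the examples given in the RFC, the range `de-*-DE` matches `de-DE`,
`de-Latn-DE` and `de-DE-x-goethe`, but not `de`, `de-x-DE` or `de-Deva`.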
Differential Revision: https://phabricator.services.mozilla.com/D194054