Bug 1847869 - Update ICU4X document. r=TYLin,platform-i18n-reviewers

Differential Revision: https://phabricator.services.mozilla.com/D185771
Makoto Kato 2023-08-10 06:51:38 +00:00
Parent bc45488c4e
Commit 778df10b83
3 changed files with 60 additions and 31 deletions


@@ -1,31 +0,0 @@
# Experimentation with ICU4X
We're currently experimenting with using [ICU4X](https://github.com/unicode-org/icu4x) in Gecko instead of ICU4C. This file documents the procedures for building with ICU4X. The current implementation is incomplete, and we hope to land code incrementally. This document also tracks the status of the experiment and what has landed in tree.
## Enabling ICU4X
To enable the ICU4X experimentation:
1. Add `ac_add_options --enable-icu4x` to your mozconfig.
2. Generate the locale data by running `./intl/update-icu4x.sh`.
3. Do a full build.
## Pieces of the ICU4X integration
#### Bundle the ICU4X data
The data is bundled directly into the binary in the `config/external/icu4x` directory. The "icu4xdata" is a separate library, which consists of an assembly file that directly includes the ICU4X locale binary data. Eventually this binary data will be directly accessed via ICU4X's StaticDataProvider.
The script `intl/update-icu4x.sh` can generate and update this binary data. At the time of this writing, this data is not checked in to source control because it is quite large, and there is no way to prune it by exporting only certain keys. See [unicode-org/icu4x#192](https://github.com/unicode-org/icu4x/issues/192) for the issue tracking support for splitting out keys.
#### Vendored ICU4X
At the time of this writing, ICU4X has been [added to the allow list for vendored code](https://searchfox.org/mozilla-central/rev/b24799980a929597dcc553cb0854aa6c960c82b5/python/mozbuild/mozbuild/vendor/vendor_rust.py#284-293), but the files still need to be vendored in. This is currently blocked because the crates include some large test data. There is a tracking issue [unicode-org/icu4x#849](https://github.com/unicode-org/icu4x/issues/849) for the ICU4X integration, but we might also find a way to prune the files in Gecko's vendoring code.
#### Static Data Provider
The data provider in ICU4X is the mechanism to load the locale-specific data. In [previous experiments](https://bugzilla.mozilla.org/show_bug.cgi?id=1713136) we used the `FsDataProvider` which hits the file system every time the APIs need any data. Future integrations should look into using the `StaticDataProvider`.
#### Building with FFIs
The final step in the ICU4X integration is to actually build and include the C++ FFI files. In the Summer 2021 experimentation we used manually built FFIs, but future experiments will rely on the [Diplomat](https://github.com/rust-diplomat/diplomat)-built FFIs.

intl/docs/icu4x.rst (new file, 59 lines)

@@ -0,0 +1,59 @@
#####
ICU4X
#####

This file documents the procedures for building with `ICU4X <https://github.com/unicode-org/icu4x>`__.

Enabling ICU4X
==============

#. Add ``ac_add_options --enable-icu4x`` to your mozconfig. (ICU4X is enabled by default.)
#. Do a full build.
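
A minimal mozconfig fragment for the first step might look like the following sketch. Because ICU4X is already enabled by default, the line only makes the setting explicit.

.. code:: bash

    # mozconfig: explicitly enable ICU4X (already the default).
    ac_add_options --enable-icu4x

After saving the mozconfig, running ``./mach build`` from the top of the source tree performs the build.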

Updating the bundled ICU4X data
===============================

ICU4X data is bundled directly as a Rust crate in ``intl/icu_testdata``. Although this crate has the same name as ICU4X's ``icu_testdata`` crate, it contains data customized for Gecko.

The script ``intl/update-icu4x.sh`` generates and updates this data. To add a new data type, modify this script and then run it.

For example, to generate data with ICU4X 1.2.0, CLDR 43, and ICU4C 73.1, run the following, where ``$TOPSRCDIR`` is the top of your source checkout. The baked data is generated into ``intl/icu_testdata/data/baked``.

.. code:: bash

    $ cd "$TOPSRCDIR/intl"
    $ ./update-icu4x.sh https://github.com/unicode-org/icu4x.git icu@1.2.0 43.0.0 release-73-1

ICU4X 1.3 will add a new "``compiled_data``" feature that replaces the customized ``icu_testdata`` crate. This document will be updated after the upgrade to 1.3.

Updating ICU4X
==============

When you update the ICU4X crates in Gecko, check ``Cargo.toml`` in Gecko's root directory. It may contain overrides that replace the crates.io versions with custom versions.
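
A quick way to review those overrides is sketched below. It assumes they are declared in a ``[patch.crates-io]`` table (Cargo's standard mechanism for overriding crates.io dependencies) and that the command is run from the top of the source tree.

.. code:: bash

    # Print the [patch.crates-io] table of the root Cargo.toml with line numbers.
    # -A 30 shows up to 30 lines after the table header; widen it if needed.
    grep -n -A 30 '^\[patch\.crates-io\]' Cargo.toml

Any ``icu*`` entries in the output are candidates to revisit alongside the new ICU4X version.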

C/C++ FFI
=========

ICU4X provides the ``icu_capi`` crate for C/C++ FFI. ``mozilla::intl::GetDataProvider`` returns the ``capi::ICU4XDataProvider`` from ``icu_capi``, which remains valid until shutdown.

Accessing the data provider from Rust
=====================================

``icu_testdata::any()`` returns a data provider that implements ``AnyProvider``.

To use the unstable data provider from Rust, add a function like the following to ``intl/icu_testdata/src/lib.rs``, then call ``icu_testdata::unstable()``.

.. code:: rust

    /// Returns the baked data provider for ICU4X's `*_unstable` constructors.
    pub fn unstable() -> UnstableDataProvider {
        UnstableDataProvider
    }

Adding new ICU4X features to Gecko
==================================

To reduce build time and binary size, Gecko embeds ICU4X with a minimal configuration. To add new features, update the following files:

#. Add the feature to the ``icu_capi`` entry in ``js/src/rust/shared/Cargo.toml``.
#. Modify ``intl/update-icu4x.sh`` to generate the ICU4X data for the feature.
#. Modify ``intl/icu_testdata/Cargo.toml`` to add the data for the newly enabled feature.
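
Once the three files are edited, the rest of the workflow might look like the following sketch. It assumes ``$TOPSRCDIR`` is the top of the source checkout and reuses the ICU4X 1.2.0 / CLDR 43 / ICU4C 73.1 arguments from the data update example; substitute the versions you are targeting.

.. code:: bash

    # Regenerate the baked data with the modified update script.
    cd "$TOPSRCDIR/intl"
    ./update-icu4x.sh https://github.com/unicode-org/icu4x.git icu@1.2.0 43.0.0 release-73-1

    # Rebuild Gecko so the new icu_capi feature and its data are compiled in.
    cd "$TOPSRCDIR"
    ./mach build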


@@ -24,3 +24,4 @@ use Mozilla's I18n APIs.
locale
dataintl
icu
icu4x