
utf8-ranges

This crate converts contiguous ranges of Unicode scalar values to UTF-8 byte ranges. This is useful when constructing byte-based automata from Unicode. Stated differently, it lets one embed UTF-8 decoding directly in one's automaton.

Dual-licensed under MIT or the UNLICENSE.

Documentation

https://docs.rs/utf8-ranges

Example

This shows how to convert a range of scalar values (e.g., the basic multilingual plane) to a sequence of byte-based character classes.

extern crate utf8_ranges;

use utf8_ranges::Utf8Sequences;

fn main() {
    for range in Utf8Sequences::new('\u{0}', '\u{FFFF}') {
        println!("{:?}", range);
    }
}

The output:

[0-7F]
[C2-DF][80-BF]
[E0][A0-BF][80-BF]
[E1-EC][80-BF][80-BF]
[ED][80-9F][80-BF]
[EE-EF][80-BF][80-BF]
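
The boundaries in this output follow from how UTF-8 encodes scalar values: each sequence ends where the encoding changes length or where the surrogate gap (U+D800 to U+DFFF) begins. As a rough check that uses only the standard library, the sketch below encodes a few boundary values and prints their bytes. The helper name `utf8_bytes` is made up for this illustration and is not part of this crate.

```rust
/// Encodes one Unicode scalar value as UTF-8 and returns its bytes.
/// (Hypothetical helper for illustration; not part of utf8-ranges.)
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    // Scalar values on either side of each boundary in the output above.
    let boundaries = [
        '\u{7F}', '\u{80}',     // one byte -> two bytes
        '\u{7FF}', '\u{800}',   // two bytes -> three bytes
        '\u{D7FF}', '\u{E000}', // last value before / first value after surrogates
        '\u{FFFF}',             // end of the basic multilingual plane
    ];
    for c in boundaries.iter() {
        println!("U+{:04X} -> {:02X?}", *c as u32, utf8_bytes(*c));
    }
}
```

For instance, U+D7FF encodes as ED 9F BF and U+E000 as EE 80 80, which is why the `ED` sequence stops at `9F` in its second byte.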

These ranges can then be used to build an automaton. Namely:

Any arbitrary sequence of bytes matches either exactly one of the sequences of ranges or none of them.
Every matching sequence of bytes is guaranteed to be valid UTF-8. (Erroneous encodings of surrogate codepoints in UTF-8 cannot match any of the byte ranges above.)
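
To make these two properties concrete, here is a minimal sketch of a matcher over the six sequences listed in the output above. It does not use this crate: the table is copied by hand from that output, and the names `ByteRange`, `BMP_SEQUENCES`, and `matching_sequence` are made up for this illustration. A real automaton would share states between sequences rather than scanning them one by one.

```rust
/// An inclusive byte range such as [80-BF]. (Hypothetical type alias.)
type ByteRange = (u8, u8);

/// The six sequences printed above for U+0000..=U+FFFF, copied by hand.
const BMP_SEQUENCES: &[&[ByteRange]] = &[
    &[(0x00, 0x7F)],
    &[(0xC2, 0xDF), (0x80, 0xBF)],
    &[(0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)],
    &[(0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)],
    &[(0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)],
    &[(0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)],
];

/// Returns the index of the sequence that `bytes` matches in full, if any.
/// A match requires the same length and every byte inside its range.
fn matching_sequence(bytes: &[u8]) -> Option<usize> {
    BMP_SEQUENCES.iter().position(|seq| {
        seq.len() == bytes.len()
            && seq.iter().zip(bytes).all(|(&(lo, hi), &b)| lo <= b && b <= hi)
    })
}

fn main() {
    // Valid encodings match exactly one sequence.
    println!("{:?}", matching_sequence("a".as_bytes()));
    println!("{:?}", matching_sequence("é".as_bytes()));
    println!("{:?}", matching_sequence("€".as_bytes()));
    // An encoded surrogate (U+D800 as ED A0 80) matches nothing.
    println!("{:?}", matching_sequence(&[0xED, 0xA0, 0x80]));
    // An overlong encoding of U+0000 (C0 80) matches nothing either.
    println!("{:?}", matching_sequence(&[0xC0, 0x80]));
}
```

Because the sequences are disjoint, the order in which they are tried does not affect the result.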