This used to work.
This still doesn't work fully, but it will have to do for now.
Change-Id: Idabfdf9b5ebffad098962b9d5c817e1febc09837
Reviewed-on: https://go-review.googlesource.com/9800
Reviewed-by: Nigel Tao <nigeltao@golang.org>
- CLDR now sports (optional) subversions.
- Added var for the common SerbianLatin locale.
- Allow for UNICODE_VERSION and CLDR_VERSION in internal/gen, as these
values cannot be passed on the command line to go generate.
Note that the display package is simplified and reduced in size. This is
a consequence of the latest version of CLDR reducing the complexity
of the language hierarchy and streamlining some of the differences
between languages.
Change-Id: I086679a73815a7bb0aca099ad73ad43994b57633
Reviewed-on: https://go-review.googlesource.com/9625
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Language matching for collation and search is subtly different from
the usual matching algorithm.
Changed collate to use it and implemented a constructor for search.
Change-Id: Id21400061668ae800d993b08bec4451388b1f82e
Reviewed-on: https://go-review.googlesource.com/9187
Reviewed-by: Nigel Tao <nigeltao@golang.org>
This brings this package in line with other packages in text and
makes it clearer that the functions in the package are transformers.
Note that the Bytes and String functions check for errors. An error
will almost never occur, though. The only case where a Transformer
will result in an error is if the user passes a Transformer that may
generate an error to If.
Change-Id: I38d8bba4575f6b2955d4d687fc7a3a1fdd0f46b4
Reviewed-on: https://go-review.googlesource.com/9130
Reviewed-by: Nigel Tao <nigeltao@golang.org>
The current options did not allow the UTF-16 encoding to be specified
according to RFC recommendations (as well as common practice). Added
UseBOM policy to fix this.
Implemented BOM policy internally as a bitmask of options. This makes
the semantics of the three options clearer (at least to me) at first
glance.
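As an illustration of the bitmask idea, here is a minimal sketch. All identifiers are hypothetical, and the split into write/accept/require bits is an assumption about how three policies could be composed, not a copy of the package internals.

```go
package main

import "fmt"

// bomPolicy is a bitmask of independent BOM behaviors. Every name in this
// sketch is hypothetical and chosen for illustration only.
type bomPolicy uint8

const (
	writeBOM   bomPolicy = 1 << iota // emit a BOM when encoding
	acceptBOM                        // honor a BOM when decoding
	requireBOM                       // fail decoding when no BOM is present

	// The three user-facing policies are combinations of the bits above.
	ignoreBOM = bomPolicy(0)
	useBOM    = writeBOM | acceptBOM
	expectBOM = writeBOM | acceptBOM | requireBOM
)

// has reports whether every bit in flag is set in p.
func (p bomPolicy) has(flag bomPolicy) bool { return p&flag == flag }

func main() {
	for _, p := range []bomPolicy{ignoreBOM, useBOM, expectBOM} {
		fmt.Printf("%03b write=%t accept=%t require=%t\n",
			p, p.has(writeBOM), p.has(acceptBOM), p.has(requireBOM))
	}
}
```

Expressed this way, each policy reads as a set of capabilities rather than as an opaque enumeration value.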
UTF16 now also assigns MIB constants to UTF-16 encodings, if possible.
Note that some configurations map to the same MIB identifier. RFC 2781
has requirements and recommendations. Some of the "configurations"
are merely recommendations, so multiple configurations could match.
Change-Id: Id2da4293b054cdd774c876d288127259afccffd1
Reviewed-on: https://go-review.googlesource.com/8395
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Added Map, Remove, and If transforms and Set interface.
Change-Id: Ia11d06efdf8186028bc89e0090d8c9d9db13f0ea
Reviewed-on: https://go-review.googlesource.com/8336
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Files are copied from text/collate/colltab to text/internal/colltab.
This starts a new, internal colltab package. Collation is moving to fractional
weights. Almost everything in colltab needs to be rewritten as a result,
but the contraction trie can be reused.
The package is in text/internal, instead of text/collate/internal as other
packages may need to use collation tables (e.g. search).
Change-Id: I55fb3f439a4924e807bb301ad29f8b485d54bf1f
Reviewed-on: https://go-review.googlesource.com/4550
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Proposal for the two most common index types.
Change-Id: I7759cb9823ededd4a975b3beb51b76bb3dcf2ff3
Reviewed-on: https://go-review.googlesource.com/8393
Reviewed-by: Nigel Tao <nigeltao@golang.org>
The implementation of this package will largely be based on
collate/colltab.Weighter.
Change-Id: Iecb35ee3e058669980315269d96a93a3a7db283f
Reviewed-on: https://go-review.googlesource.com/7927
Reviewed-by: Rob Pike <r@golang.org>
- Note that the decoupling also means packages can assign names to
encodings, something that is otherwise hard or impossible to do.
This is what started off these changes in the first place.
Did a bit of refactoring. Adding Register methods to all types would
have been cumbersome. The new approach, with a shared internal type,
alleviates this burden.
Adding the All slices makes Encodings discoverable at the
package level. These variables, including their references to all
tables, can still be purged by the linker if they are not used.
(Needs to be verified.)
Change-Id: I0ade994523a0f4126521bfff9e65962ec589b377
Reviewed-on: https://go-review.googlesource.com/7677
Reviewed-by: Nigel Tao <nigeltao@golang.org>
- ExpectBOM switches between UTF-16BE and UTF-16LE.
- It does not accept no BOM (this is a bug, or at least there
should be a mode where this is allowed).
- It does not allow falling back to UTF-8 (and it shouldn't).
- BOMOverride switches to any of UTF-8, UTF-16LE or UTF-16BE
based on the BOM.
- In absence of a BOM, it will switch to ANY encoding, being
the fallback encoding provided.
Following guidelines from W3C.
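The BOM-based switching described in the list above can be sketched as follows. The function name sniffBOM and the string labels are hypothetical; a real decoder would return an encoding value rather than a name.

```go
package main

import (
	"bytes"
	"fmt"
)

// sniffBOM inspects the start of b and returns the encoding implied by a
// byte order mark, together with the length of that mark in bytes. When no
// BOM is present it returns the caller-supplied fallback and 0. The string
// labels are placeholders for this sketch.
func sniffBOM(b []byte, fallback string) (enc string, n int) {
	switch {
	case bytes.HasPrefix(b, []byte{0xEF, 0xBB, 0xBF}):
		return "utf-8", 3
	case bytes.HasPrefix(b, []byte{0xFE, 0xFF}):
		return "utf-16be", 2
	case bytes.HasPrefix(b, []byte{0xFF, 0xFE}):
		return "utf-16le", 2
	}
	return fallback, 0
}

func main() {
	enc, n := sniffBOM([]byte{0xFF, 0xFE, 'h', 0}, "windows-1252")
	fmt.Println(enc, n)
	enc, n = sniffBOM([]byte("plain"), "windows-1252")
	fmt.Println(enc, n)
}
```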
Change-Id: I5eadbe4e5f74f21c8804529b9d2dd274a821882b
Reviewed-on: https://go-review.googlesource.com/8072
Reviewed-by: Nigel Tao <nigeltao@golang.org>
- Fixed wrong handling of single termination bytes
(the transform made no progress because it did not replace the byte
with U+FFFD and returned 0 for nSrc).
- Fixed gobbling of proper Unicode when preceded by an unpaired
low surrogate.
- Identified a few more deviations from the standard and documented
them (they require a bit more involved changes).
- Added some more TODOs with suggestions for making the transforms
more conformant.
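To make the first two fixes concrete, here is a simplified stand-alone decoder for big-endian UTF-16. It is a sketch with a hypothetical function name, written against unicode/utf16 rather than the Transformer interface, so it does not track nDst/nSrc; it only shows the intended replacement behavior.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

const replacement = '\uFFFD'

// decodeUTF16BE converts big-endian UTF-16 bytes to a string. A trailing
// single byte and any unpaired surrogate are each replaced by U+FFFD, and
// input is always consumed so the loop makes progress.
func decodeUTF16BE(src []byte) string {
	var out []rune
	for i := 0; i < len(src); {
		if len(src)-i < 2 {
			// A single trailing byte cannot form a code unit: replace it
			// and consume it.
			out = append(out, replacement)
			i++
			continue
		}
		u := rune(src[i])<<8 | rune(src[i+1])
		i += 2
		if !utf16.IsSurrogate(u) {
			out = append(out, u)
			continue
		}
		// u is a surrogate. Consume the next unit only if the two form a
		// valid high/low pair.
		if len(src)-i >= 2 {
			next := rune(src[i])<<8 | rune(src[i+1])
			if r := utf16.DecodeRune(u, next); r != replacement {
				out = append(out, r)
				i += 2
				continue
			}
		}
		// Unpaired surrogate: replace this unit only, so that a valid
		// code unit following it is not gobbled.
		out = append(out, replacement)
	}
	return string(out)
}

func main() {
	fmt.Printf("%q\n", decodeUTF16BE([]byte{0xDC, 0x00, 0x00, 0x41}))
	fmt.Printf("%q\n", decodeUTF16BE([]byte{0x00, 0x41, 0x00}))
}
```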
Change-Id: I0fc20ab6bf3435e9c84ca447125495e9137e532d
Reviewed-on: https://go-review.googlesource.com/7992
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Added package identifier for associating Encodings with canonical references
to CSS/CES standards.
Change-Id: Id7794a4d2b4ca168c68239b8cf9f259968b335a9
Reviewed-on: https://go-review.googlesource.com/7928
Reviewed-by: Nigel Tao <nigeltao@golang.org>
This is really just a one liner, but with lots of tables changing.
Change-Id: I1c57a013a8961e7e0d09ac0fb00d0678f79794e2
Reviewed-on: https://go-review.googlesource.com/7929
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Also added use case in display package.
Change-Id: Idf4bd518e9f5d8a9de6ac53b86dbf17165ed253b
Reviewed-on: https://go-review.googlesource.com/7792
Reviewed-by: Nigel Tao <nigeltao@golang.org>
This helps the repo maintainer upgrade packages to new versions
and also serves as documentation on dependencies.
@r: it would be convenient to also generate the core unicode tables
using this tool. It would be handy to slightly streamline the flags
of the unicode package for this. I can do this if changing some of
the semantics of the flags is acceptable.
Change-Id: Ib67161f7fe8724007d3c1479d5d173af65f646c7
Reviewed-on: https://go-review.googlesource.com/7495
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
See golang.org/cl/4131 for context.
Change-Id: I6ed8aefd50df5470b2016efc65602ea0395a93af
Reviewed-on: https://go-review.googlesource.com/7722
Reviewed-by: David Crawshaw <crawshaw@golang.org>
This also makes Fold, Narrow, and Widen more prominent in the godoc,
which is otherwise a bit of an issue with Variables.
Also added the type to the var declarations so that they get collated
at the Transformer definition.
Change-Id: I00822a49d7604600fa46df0bbda0086e1a82787c
Reviewed-on: https://go-review.googlesource.com/7542
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Reduces running time for ASCII by over 90%.
Ensured test coverage is 100% for the transforms.
Change-Id: I4954819c31ef5d1fb4a572c6519d0bc1d44343ab
Reviewed-on: https://go-review.googlesource.com/7492
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Remarks:
- Removed hasMapping bit by adding a dummy entry to the inverseData
table. This means that a zero index now always means no mapping.
- Slightly changed documentation of width kinds: the reality is not
entirely consistent with the definition. Also, documented some
unexpected properties, even though not inconsistent with the
definitions.
- Narrow now also maps from Ambiguous runes.
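The dummy-entry technique from the first remark can be sketched as below. The real tables are generated and indexed through a trie; this sketch substitutes a small map and hypothetical names purely to show why reserving index 0 removes the need for a separate hasMapping bit.

```go
package main

import "fmt"

// inverse holds replacement runes. Entry 0 is a dummy so that index 0 can
// be reserved to mean "no mapping". Table contents are illustrative only.
var inverse = []rune{
	0,   // dummy entry: never used as a replacement
	'A', // index 1
	'B', // index 2
}

// index maps a rune to its position in inverse. A lookup of a missing key
// yields the zero value 0, which is exactly the "no mapping" sentinel.
// A map stands in here for the trie used by the real package.
var index = map[rune]uint8{
	'\uFF21': 1, // FULLWIDTH LATIN CAPITAL LETTER A
	'\uFF22': 2, // FULLWIDTH LATIN CAPITAL LETTER B
}

// narrow returns the mapped rune, or r itself when there is no mapping.
func narrow(r rune) rune {
	if i := index[r]; i != 0 {
		return inverse[i]
	}
	return r
}

func main() {
	fmt.Printf("%c %c\n", narrow('\uFF21'), narrow('x'))
}
```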
Change-Id: I44eb320110e581cb764fb85234268bf13163a9e1
Reviewed-on: https://go-review.googlesource.com/7446
Reviewed-by: Nigel Tao <nigeltao@golang.org>
This will be used to add Wide and Narrow functionality. Needed to add an indirection to allow
for two mappings per entry. Alternatively one could have two tries. This would have resulted
in slightly more space usage (considering most blocks can be shared). This approach also
improves performance for mappings, as the EncodeRune is replaced with a simple byte copy.
Also simplified the implementation a bit now that we have some bits to spare.
Change-Id: I80d7f6cbcdbb8b0a7efcfab777ab9c88cbeecb91
Reviewed-on: https://go-review.googlesource.com/7180
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Handy, even though this package will undergo a major overhaul soon.
- Changed *Version vars to funcs to work around initialization issues.
- Changed regtest to be a test using the --long flag like several other
tests in the text repo.
- Use -u-va-posix for POSIX variant. (en-US-POSIX is not valid BCP 47;
a standard and correct way to represent this has only been converged
upon recently.)
Change-Id: I62ff17d50be946509487128e0c5dd3a50368e9fa
Reviewed-on: https://go-review.googlesource.com/7263
Reviewed-by: Nigel Tao <nigeltao@golang.org>
And prepare for more involved changes.
- Cleaned up imports.
- Made flags more consistent with other packages in text.
- Support old- and new-style collation default values.
- Use language.Compose.
Change-Id: Ieb5c63869f58606068c79a932a1530bd5dfdcdf5
Reviewed-on: https://go-review.googlesource.com/2755
Reviewed-by: Rob Pike <r@golang.org>
Added cldr package in the mix.
More usages of go/format converted.
Implemented variant of Nigel's suggestion to make a WriteFile func.
Change-Id: I73113bc0a9d6e350a86f20435215a054c772f9ed
Reviewed-on: https://go-review.googlesource.com/6923
Reviewed-by: Nigel Tao <nigeltao@golang.org>
This improves the throughput of the ASCII benchmark by over a factor of 10.
Using copy is about twice as fast as using a more trivial loop.
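A minimal sketch of an ASCII fast path built on copy. The transform shown (mapping fullwidth forms U+FF01..U+FF5E to ASCII) and both function names are assumptions picked for illustration; the point is only that a run of ASCII bytes is moved with one copy call instead of byte by byte.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// asciiPrefix returns the length of the leading run of ASCII bytes in src.
func asciiPrefix(src []byte) int {
	for i, b := range src {
		if b >= utf8.RuneSelf {
			return i
		}
	}
	return len(src)
}

// narrowFullwidth maps fullwidth forms U+FF01..U+FF5E to their ASCII
// counterparts and leaves everything else unchanged.
func narrowFullwidth(src []byte) []byte {
	dst := make([]byte, len(src)) // output is never longer than the input
	nDst := 0
	for nSrc := 0; nSrc < len(src); {
		// Fast path: move a whole run of ASCII bytes with a single copy.
		if n := asciiPrefix(src[nSrc:]); n > 0 {
			nDst += copy(dst[nDst:], src[nSrc:nSrc+n])
			nSrc += n
			continue
		}
		// Slow path: decode one rune and map it if it is a fullwidth form.
		r, size := utf8.DecodeRune(src[nSrc:])
		if r >= 0xFF01 && r <= 0xFF5E {
			dst[nDst] = byte(r - 0xFEE0)
			nDst++
		} else {
			// Copy the original bytes, including invalid UTF-8, unchanged.
			nDst += copy(dst[nDst:], src[nSrc:nSrc+size])
		}
		nSrc += size
	}
	return dst[:nDst]
}

func main() {
	fmt.Printf("%s\n", narrowFullwidth([]byte("abc\uFF21\uFF42!")))
}
```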
Change-Id: I270e3900faec353ded1fc066df9a26e1690f699c
Reviewed-on: https://go-review.googlesource.com/7128
Reviewed-by: Nigel Tao <nigeltao@golang.org>
I love go generate, but as it doesn't take command line flags, it is now
hard to select a local mirror without manually fiddling with commands.
Also, over time, all gen and maketable commands have gotten slightly
different command lines and local mirror structures. This change cleans
this up and makes it much easier to build tables.
Change-Id: I9d61f72447f43c45d52e1b5c2e3a6de7735687d4
Reviewed-on: https://go-review.googlesource.com/6591
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Need to update tests like unicode/norm to use them.
Change-Id: I45f30ac1c85b9f9c52c93582e6068a0f53a70ebc
Reviewed-on: https://go-review.googlesource.com/5581
Reviewed-by: Nigel Tao <nigeltao@golang.org>