Commit graph

161 commits

Author SHA1 Message Date
Marcel van Lohuizen c93e7c9fff collate/tools/colcmp: fix build breakage for darwin.
This used to work.
This still doesn't work fully, but it will have to do for now.

Change-Id: Idabfdf9b5ebffad098962b9d5c817e1febc09837
Reviewed-on: https://go-review.googlesource.com/9800
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-05-07 01:02:14 +00:00
Marcel van Lohuizen 6c3b324efd collate: fix test breakage
Silly me.

Change-Id: Id90499214e807782f6700eedef53e76c3603656f
Reviewed-on: https://go-review.googlesource.com/9750
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-05-06 00:53:39 +00:00
Marcel van Lohuizen cee5b80e82 text: upgrade to CLDR 27.0.1
- CLDR now sports (optional) subversions.
- Added var for the common SerbianLatin locale.
- Allow for UNICODE_VERSION and CLDR_VERSION in internal/gen, as these
  values cannot be passed on the command line to go generate.

Note that the display package is simplified and reduced in size. This is
a consequence of the latest version of CLDR reducing the complexity
of the language hierarchy and streamlining some of the differences
between languages.

Change-Id: I086679a73815a7bb0aca099ad73ad43994b57633
Reviewed-on: https://go-review.googlesource.com/9625
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-05-05 17:57:39 +00:00
Marcel van Lohuizen af4c2d73d0 search: added language matcher for collation and search.
Language matching for collation and search is subtly different from
the usual matching algorithm.

Changed collate to use it and implemented constructor for search.

Change-Id: Id21400061668ae800d993b08bec4451388b1f82e
Reviewed-on: https://go-review.googlesource.com/9187
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-04-23 08:23:31 +00:00
Marcel van Lohuizen 583b6acb6c search: generate search tables.
Modified collation maketables to generate search tables.

Change-Id: Ia7c3dbc980bc0daca0267c054401095b852917ab
Reviewed-on: https://go-review.googlesource.com/9131
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Marcel van Lohuizen <mpvl@golang.org>
2015-04-21 06:26:33 +00:00
Marcel van Lohuizen f7bc91ea2d runes: Added Transformer wrapper.
This brings this package in line with other packages in text and
makes it clearer that the functions in the package are transformers.

Note that the Bytes and String functions check for errors. An error
will almost never occur, though. The only case where a Transformer
will result in an error is if the user passes a Transformer that may
generate an error to If.

Change-Id: I38d8bba4575f6b2955d4d687fc7a3a1fdd0f46b4
Reviewed-on: https://go-review.googlesource.com/9130
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-04-21 06:12:18 +00:00
Marcel van Lohuizen 8d2a9d0829 encoding/unicode: new BOM policy for standard UTF-16.
The current options did not allow the UTF-16 encoding to be specified
according to RFC recommendations (as well as common practice). Added
UseBOM policy to fix this.

Implemented BOM policy internally as a bitmask of options. This makes
the semantics of the three options clearer (at least to me) at first
glance.

UTF16 now also assigns MIB constants to UTF-16 encodings, if possible.
Note that some configurations map to the same MIB identifier. RFC 2781
has requirements and recommendations. Some of the "configurations"
are merely recommendations, so multiple configurations could match.

Change-Id: Id2da4293b054cdd774c876d288127259afccffd1
Reviewed-on: https://go-review.googlesource.com/8395
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-04-20 07:59:04 +00:00
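Illustration for 8d2a9d0829: the message above describes the BOM policy as a bitmask of options. A minimal sketch of that idea follows. The type, the flag names, and the bit layout are assumptions made for illustration; they are not taken from the encoding/unicode source.

```go
package main

import "fmt"

// bomPolicy is a hypothetical bitmask type; all names here are illustrative
// and are not the identifiers used by the encoding/unicode package.
type bomPolicy uint8

const (
	writeBOM   bomPolicy = 1 << iota // emit a BOM when encoding
	acceptBOM                        // honor a BOM when decoding
	requireBOM                       // fail decoding if no BOM is present
)

// The three user-facing policies expressed as combinations of the bits.
const (
	ignoreBOM bomPolicy = 0
	useBOM              = writeBOM | acceptBOM
	expectBOM           = writeBOM | acceptBOM | requireBOM
)

// has reports whether every bit of flag is set in p.
func (p bomPolicy) has(flag bomPolicy) bool { return p&flag == flag }

func main() {
	for _, p := range []bomPolicy{ignoreBOM, useBOM, expectBOM} {
		fmt.Printf("%03b write=%t accept=%t require=%t\n",
			p, p.has(writeBOM), p.has(acceptBOM), p.has(requireBOM))
	}
}
```

Expressed this way, each policy differs from the next by exactly one bit, which is one plausible reading of why the message calls the semantics clearer at first glance.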
Marcel van Lohuizen c3788f3665 runes: new runes package.
Added Map, Remove, and If transforms and Set interface.

Change-Id: Ia11d06efdf8186028bc89e0090d8c9d9db13f0ea
Reviewed-on: https://go-review.googlesource.com/8336
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-04-20 07:21:46 +00:00
Marcel van Lohuizen fe7970425f internal/colltab: some cleanup suggested earlier by Nigel.
Change-Id: Iffc9f2267e5a2a76e1fd41efdc7d14f57fa0e73b
Reviewed-on: https://go-review.googlesource.com/8967
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-04-20 07:19:49 +00:00
Marcel van Lohuizen 4a2c3890cd text/internal/colltab: copy contract*.go files
Files are copied from text/collate/colltab to text/internal/colltab.
This starts a new, internal colltab package. Collation is moving to fractional
weights. Almost everything in colltab needs to be rewritten as a result,
but the contraction trie can be reused.

The package is in text/internal, instead of text/collate/internal as other
packages may need to use collation tables (e.g. search).

Change-Id: I55fb3f439a4924e807bb301ad29f8b485d54bf1f
Reviewed-on: https://go-review.googlesource.com/4550
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2015-04-17 09:18:23 +00:00
Marcel van Lohuizen 88b9f7e074 encoding/ianaindex|htmlindex: API proposal for indexes.
Proposal for the two most common index types.

Change-Id: I7759cb9823ededd4a975b3beb51b76bb3dcf2ff3
Reviewed-on: https://go-review.googlesource.com/8393
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-04-17 09:16:00 +00:00
Marcel van Lohuizen d48eb58d19 text/encoding/charmap: a few bug fixes in global vars
- need pointer references to implement interface
- wrong constant assignments.

Change-Id: I3e5c77b70ea57ea7288c940c46e1c5db80047810
Reviewed-on: https://go-review.googlesource.com/8253
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-04-01 06:09:25 +00:00
Marcel van Lohuizen a9f4d1a427 text/search: api proposal
The implementation of this package will largely be based on
collate/colltab.Weighter.

Change-Id: Iecb35ee3e058669980315269d96a93a3a7db283f
Reviewed-on: https://go-review.googlesource.com/7927
Reviewed-by: Rob Pike <r@golang.org>
2015-03-30 12:23:53 +00:00
Marcel van Lohuizen 2076e9cab4 text/encoding: add MIB types to encodings.
- note that the decoupling also means packages can assign names to
  encodings. Something that is otherwise hard or impossible to do.
  This is what started off these changes in the first place.

Did a bit of refactoring. Adding Register methods to all types would
have been cumbersome. The new approach, with a shared internal type,
alleviates this burden.

Adding the All slices makes Encodings discoverable at the
package level. These variables, including their reference to all
tables can still be purged by the linker if they are not used.
(Needs to be verified.)

Change-Id: I0ade994523a0f4126521bfff9e65962ec589b377
Reviewed-on: https://go-review.googlesource.com/7677
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-27 03:07:07 +00:00
Marcel van Lohuizen 7be3218a83 text/encoding/unicode: added BOMOverride.
- ExpectBOM switches between UTF-16BE and UTF-16LE.
  - It does not accept no BOM (this is a bug, or at least there
    should be a mode where this is allowed).
  - It does not allow falling back to UTF-8 (and it shouldn't).

- BOMOverride switches to any of UTF-8, UTF-16LE or UTF-16BE
  based on the BOM.
  - In absence of a BOM, it will switch to ANY encoding, being
    the fallback encoding provided.

Following guidelines from W3C.

Change-Id: I5eadbe4e5f74f21c8804529b9d2dd274a821882b
Reviewed-on: https://go-review.googlesource.com/8072
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-27 02:59:51 +00:00
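Illustration for 7be3218a83: the BOMOverride behavior described above (pick UTF-8, UTF-16LE or UTF-16BE from the BOM, otherwise use the supplied fallback) can be sketched with the standard library alone. The function name sniffBOM and the string labels are hypothetical. This is not the package's implementation, which wraps decoders rather than returning names.

```go
package main

import (
	"bytes"
	"fmt"
)

// sniffBOM is a hypothetical helper. It returns the encoding name implied by
// a leading byte order mark and the number of BOM bytes to skip. When no BOM
// is present it returns the caller's fallback and 0.
func sniffBOM(b []byte, fallback string) (name string, skip int) {
	switch {
	case bytes.HasPrefix(b, []byte{0xEF, 0xBB, 0xBF}):
		return "UTF-8", 3
	case bytes.HasPrefix(b, []byte{0xFE, 0xFF}):
		return "UTF-16BE", 2
	case bytes.HasPrefix(b, []byte{0xFF, 0xFE}):
		return "UTF-16LE", 2
	}
	return fallback, 0
}

func main() {
	name, skip := sniffBOM([]byte{0xFF, 0xFE, 'h', 0}, "windows-1252")
	fmt.Println(name, skip)

	name, skip = sniffBOM([]byte("plain"), "windows-1252")
	fmt.Println(name, skip)
}
```

The fallback is an arbitrary label here, mirroring the note that the fallback may be any encoding.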
Marcel van Lohuizen 5f741289c4 text/encoding/unicode: added tests and fixed bugs.
- Fixed wrong handling of single termination bytes
  (did not make progress by not replacing it with U+FFFD and
  returning 0 for nSrc).

- Fixed gobbling of proper Unicode when preceded by an unpaired
  low surrogate.

- Identified a few more deviations from the standard and documented
  them (they require a bit more involved changes).

- Added some more TODOs with suggestions for making the transforms
  more conformant.

Change-Id: I0fc20ab6bf3435e9c84ca447125495e9137e532d
Reviewed-on: https://go-review.googlesource.com/7992
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-25 06:02:29 +00:00
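Illustration for 5f741289c4: the second fix concerns valid text that follows an unpaired low surrogate. The standard library's unicode/utf16 package shows the expected outcome: the stray surrogate becomes U+FFFD and the code units after it are kept. The helper name decodeUnits is hypothetical, and this sketch says nothing about how the transformer in text/encoding/unicode is implemented.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// decodeUnits is a hypothetical helper that decodes UTF-16 code units with
// the standard library, which substitutes U+FFFD for unpaired surrogates.
func decodeUnits(u []uint16) string {
	return string(utf16.Decode(u))
}

func main() {
	// 0xDC00 is a low surrogate with no preceding high surrogate.
	s := decodeUnits([]uint16{0xDC00, 'o', 'k'})
	fmt.Printf("%+q\n", s)
}
```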
Marcel van Lohuizen 000f7931a1 text/encoding/internal/identifier:
Added package identifier for associating Encodings with canonical references
to CSS/CES standards.

Change-Id: Id7794a4d2b4ca168c68239b8cf9f259968b335a9
Reviewed-on: https://go-review.googlesource.com/7928
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-24 10:30:36 +00:00
Marcel van Lohuizen 017c4054c0 text/language: strip unneeded struct type qualifiers.
This is really just a one liner, but with lots of tables changing.

Change-Id: I1c57a013a8961e7e0d09ac0fb00d0678f79794e2
Reviewed-on: https://go-review.googlesource.com/7929
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-23 23:51:59 +00:00
Marcel van Lohuizen c6e0c33b23 text/internal/testtext: added code size measurer.
Also added use case in display package.

Change-Id: Idf4bd518e9f5d8a9de6ac53b86dbf17165ed253b
Reviewed-on: https://go-review.googlesource.com/7792
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-20 06:47:38 +00:00
Marcel van Lohuizen b6e95e7b86 text: added repository-wide generate tool.
This helps the repo maintainer upgrading packages to new versions
and also serves as documentation on dependencies.

@r: it would be convenient to also generate the core unicode tables
using this tool. It would be handy to slightly streamline the flags
of the unicode package for this. I can do this if changing some of
the semantics of the flags is acceptable.

Change-Id: Ib67161f7fe8724007d3c1479d5d173af65f646c7
Reviewed-on: https://go-review.googlesource.com/7495
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2015-03-20 06:39:55 +00:00
Josh Bleecher Snyder c6bc7e82e2 text: add codereview.cfg
See golang.org/cl/4131 for context.

Change-Id: I6ed8aefd50df5470b2016efc65602ea0395a93af
Reviewed-on: https://go-review.googlesource.com/7722
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-03-18 17:03:13 +00:00
Marcel van Lohuizen 26df76be81 text/internal/gen: generalized Open (needed for w3c data).
Change-Id: I6fe6e9983921a593eb706f829184b5f39b8246fe
Reviewed-on: https://go-review.googlesource.com/7671
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-17 05:10:38 +00:00
Marcel van Lohuizen 313fa8d603 text/width: added examples/ godoc improvements.
This also makes Fold, Narrow, and Widen more prominent in the godoc,
which is otherwise a bit of an issue with Variables.

Also added the type to the var declarations so that they get collated
at the Transformer definition.

Change-Id: I00822a49d7604600fa46df0bbda0086e1a82787c
Reviewed-on: https://go-review.googlesource.com/7542
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-16 03:24:05 +00:00
Marcel van Lohuizen cc9d297a7d text/width: added ASCII fast path for Narrow.
Reduces running time for ASCII by over 90%.
Ensured test coverage is 100% for the transforms.

Change-Id: I4954819c31ef5d1fb4a572c6519d0bc1d44343ab
Reviewed-on: https://go-review.googlesource.com/7492
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-13 04:02:53 +00:00
Marcel van Lohuizen 83ae7f6abc text/width: added support for narrow and wide mappings.
Remarks:
- Removed hasMapping bit by adding a dummy entry to the inverseData
  table. This means that a zero index now always means no mapping.
- Slightly changed documentation of width kinds: the reality is not
  entirely consistent with the definition. Also, documented some
  unexpected properties, even though not inconsistent with the
  definitions.
- Narrow now also maps from Ambiguous runes.

Change-Id: I44eb320110e581cb764fb85234268bf13163a9e1
Reviewed-on: https://go-review.googlesource.com/7446
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-13 02:06:02 +00:00
Marcel van Lohuizen f54a5874ba text/width: added inverse mapping
This will be used to add Wide and Narrow functionality. Needed to add an indirection to allow
for two mappings per entry. Alternatively one could have two tries. This would have resulted
in slightly more space usage (considering most blocks can be shared). This approach also
improves performance for mappings, as the EncodeRune is replaced with a simple byte copy.

Also simplified implementation a bit now we have some bits to spare.

Change-Id: I80d7f6cbcdbb8b0a7efcfab777ab9c88cbeecb91
Reviewed-on: https://go-review.googlesource.com/7180
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-12 01:57:53 +00:00
Marcel van Lohuizen d1927f6997 text/collate: use internal/gen
Handy, even though this package will undergo major overhaul soon.
- Changed *Version vars to funcs to work around initialization issues.
- Changed regtest to be a test using the --long flag like several other
tests in the text repo.
- Use -u-va-posix for POSIX variant. (us-EN-POSIX is not valid BCP 47;
a standard and correct way to represent this has only been converged
upon recently.)

Change-Id: I62ff17d50be946509487128e0c5dd3a50368e9fa
Reviewed-on: https://go-review.googlesource.com/7263
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-10 07:00:45 +00:00
Marcel van Lohuizen 8b847a42ba text/collate: support new-style default values.
And prepare for more involved changes.
- Cleaned up imports.
- Made flags more consistent with other packages in text.
- Support old- and new-style collation default values.
- Use language.Compose.

Change-Id: Ieb5c63869f58606068c79a932a1530bd5dfdcdf5
Reviewed-on: https://go-review.googlesource.com/2755
Reviewed-by: Rob Pike <r@golang.org>
2015-03-10 02:43:05 +00:00
Marcel van Lohuizen 7923bc82a1 text/internal/gen: more cleanup
Added cldr package in the mix.
More usages of go/format converted.
Implemented variant of Nigel's suggestion to make a WriteFile func.

Change-Id: I73113bc0a9d6e350a86f20435215a054c772f9ed
Reviewed-on: https://go-review.googlesource.com/6923
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-09 05:49:43 +00:00
Marcel van Lohuizen c4b40d98a1 text/width: added ASCII fast path.
This improves the throughput of the ASCII benchmark by over a factor of 10.
Using copy is about twice as fast as using a more trivial loop.

Change-Id: I270e3900faec353ded1fc066df9a26e1690f699c
Reviewed-on: https://go-review.googlesource.com/7128
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-09 04:13:28 +00:00
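Illustration for c4b40d98a1: an ASCII fast path of the kind described above can be sketched as "measure the leading ASCII run, then move it with one copy call". The function names are hypothetical and the slower per-rune path that would handle the remainder is omitted. The actual text/width code may be organized differently.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// asciiPrefix returns the length of the leading run of ASCII bytes in src.
func asciiPrefix(src []byte) int {
	for i, c := range src {
		if c >= utf8.RuneSelf {
			return i
		}
	}
	return len(src)
}

// copyASCII copies the leading ASCII run of src into dst with a single copy
// call and reports how many bytes were copied. It copies fewer bytes if dst
// is too short to hold the whole run.
func copyASCII(dst, src []byte) int {
	n := asciiPrefix(src)
	return copy(dst, src[:n])
}

func main() {
	dst := make([]byte, 16)
	// "\uff11" is FULLWIDTH DIGIT ONE, the first non-ASCII rune in the input.
	n := copyASCII(dst, []byte("abc\uff11"))
	fmt.Println(n, string(dst[:n]))
}
```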
Marcel van Lohuizen f3f2426fe0 text/language: updated tables with fresh IANA data.
Change-Id: I7807c1f28528a946f124b97a136b0dbd352deba5
Reviewed-on: https://go-review.googlesource.com/6922
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-06 08:47:21 +00:00
Marcel van Lohuizen 731229fe6e text/internal/gen: factored out common generation code.
I love go generate, but as it doesn't take command line flags, it is now
hard to select a local mirror without manually fiddling with commands.
Also, over time, all gen and maketable commands have gotten slightly
different command lines and local mirror structures. This change cleans
this up and makes it much easier to build tables.

Change-Id: I9d61f72447f43c45d52e1b5c2e3a6de7735687d4
Reviewed-on: https://go-review.googlesource.com/6591
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-05 15:18:24 +00:00
Marcel van Lohuizen c92eb3cd6e text/width: first stab at API and Fold transform.
Change-Id: I6977fce33d07c19cecca7af87a2eb3a2acc49831
Reviewed-on: https://go-review.googlesource.com/5580
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-02 10:45:06 +00:00
Marcel van Lohuizen 07e167dcad text/internal/testtext: added often-used constants in tests.
Need to update tests like unicode/norm to use them.

Change-Id: I45f30ac1c85b9f9c52c93582e6068a0f53a70ebc
Reviewed-on: https://go-review.googlesource.com/5581
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-02-25 08:08:54 +00:00
Andrew Gerrand dadbda34c6 doc: add CONTRIBUTING.md
Change-Id: Icd4a3c290b5fa9358042292db4f46a5e151e48c2
Reviewed-on: https://go-review.googlesource.com/5411
Reviewed-by: Minux Ma <minux@golang.org>
2015-02-20 05:00:44 +00:00
Marcel van Lohuizen ed53bb6dd6 text/width: tables for new package.
Change-Id: Id122f56fc4400dedd008b0f6db184559b8f57d90
Reviewed-on: https://go-review.googlesource.com/4730
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-02-18 09:47:25 +00:00
Marcel van Lohuizen 8ec34a0272 text/cases: API proposal for case folding.
Change-Id: I2f93b7401f63732c5c9978c6422f156fe6925d5f
Reviewed-on: https://go-review.googlesource.com/2754
Reviewed-by: Rob Pike <r@golang.org>
2015-02-11 22:11:27 +00:00
Marcel van Lohuizen 68280a7edb text/unicode/rangetable: added func Merge.
Change-Id: Ife6116a2b24502902b692cbed4369b1033915452
Reviewed-on: https://go-review.googlesource.com/4552
Reviewed-by: Rob Pike <r@golang.org>
2015-02-11 22:04:30 +00:00
Marcel van Lohuizen aa6e509d01 text/internal/colltab: added package comment
Change-Id: I6963165a20ac33bf01de01dd38d785bc1d49eb29
Reviewed-on: https://go-review.googlesource.com/4551
Reviewed-by: Rob Pike <r@golang.org>
2015-02-11 22:01:43 +00:00
Marcel van Lohuizen bfad311ce9 text/collate: move and rename
- colelem* -> collelem*
- Weigher -> Weighter

Change-Id: Ibe6586161e5fda993d99007ed5479a375e7b0b1d
Reviewed-on: https://go-review.googlesource.com/2756
Reviewed-by: Rob Pike <r@golang.org>
2015-01-14 22:53:35 +00:00
Marcel van Lohuizen cf8f4cf170 text/collate: added support for numeric sorting.
Change-Id: I724b6cdd133ab8d8ca07309baa24ef9010ff7458
Reviewed-on: https://go-review.googlesource.com/1300
Reviewed-by: Rob Pike <r@golang.org>
2015-01-14 15:36:27 +00:00
Alex Brainman c980adc4a8 text: add .gitattributes (fixes windows build)
Fixes #9281

Change-Id: If3aad25a9bc24f04d604a534bef4f2e08ce00ffc
Reviewed-on: https://go-review.googlesource.com/2070
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2014-12-23 06:16:57 +00:00
David Symonds ef0bf1da95 text: fix a few import comments in code generators.
Change-Id: I6d660ad7faf3567a6595c1b643b5874bb224dfcd
Reviewed-on: https://go-review.googlesource.com/1294
Reviewed-by: Andrew Gerrand <adg@golang.org>
2014-12-10 04:18:48 +00:00
David Symonds f3d52e0263 text: add import comments.
Change-Id: Ifdb6e8b968ae432c1e4c703c25ae034382c94cc9
Reviewed-on: https://go-review.googlesource.com/1243
Reviewed-by: Andrew Gerrand <adg@golang.org>
2014-12-09 22:46:39 +00:00
David Symonds 985ee5acfa remove codereview.cfg.
2014-12-08 10:59:03 +11:00
Marcel van Lohuizen b8ac178645 text: bunch of stylistic improvements. One added comment.
LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/179710043
2014-12-03 10:04:23 +01:00
Marcel van Lohuizen 4a20eb8c43 text/display: vetted some test failures due to data changes.
LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/184010043
2014-12-02 13:53:11 +01:00
Marcel van Lohuizen 6cb0e8bd63 text/cases: fixed import path that crossed.
LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/180450043
2014-12-02 13:52:48 +01:00
Marcel van Lohuizen 29c876dd07 text/transform: fixed test build
LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/176510043
2014-12-02 13:52:02 +01:00
Marcel van Lohuizen bb537491b9 text/cldr: added parser for new Collation Rule Syntax (CLDR 25 onwards)
LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/181730043
2014-11-24 11:46:39 +01:00