Commit Graph

19 Commits

Author SHA1 Message Date
Marcel van Lohuizen 4b139bd6df language: remove Currency type.
Users should use Currency type defined in currency package instead.

Also removed Currency from Coverage type. Note that Currency coverage
had not been implemented yet, so we're unlikely to break anyone here.

Change-Id: I094d3bbcec3a05481d627e497daf428191ad2eea
Reviewed-on: https://go-review.googlesource.com/16868
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2015-11-16 20:14:01 +00:00
Marcel van Lohuizen d91e619555 language: pass user options in matched tag
Applications should always use a single matched Tag per user session
for selecting the display language and language-specific services.
So far, the tag returned from a match was the original supported tag
and as such did not include any user options specified in the -u
extension. It was hard for a user to add this as well, as the Match
method does not return the preferred tag that resulted in the match.

The Match method now copies in the -u extension of the user's tag.
It is assumed that the set of supported tags do not specify a -u
section, which is fair to assume as these are typically needed only
in the next phase of a match.

Also:
- updated documentation
- changed interpretation of grandfathered tag en-GB-oed.

Change-Id: Icb6be5735fb8d7bbdee3469b6a691c7ee74eb16a
Reviewed-on: https://go-review.googlesource.com/16560
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2015-11-04 15:01:04 +00:00
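
The behaviour described in d91e619555 (carrying the user's -u extension over to the matched supported tag) can be sketched with plain string handling. This is only an illustration under stated assumptions, not the package's implementation: `uExtension` and `withUserOptions` are hypothetical helper names, tags are not validated, and supported tags are assumed to carry no extensions of their own, as the commit message itself assumes.

```go
package main

import (
	"fmt"
	"strings"
)

// uExtension returns the "u-..." extension of a BCP 47 tag in lower case,
// or "" if the tag has none. Hypothetical helper; it does not validate the tag.
func uExtension(tag string) string {
	parts := strings.Split(strings.ToLower(tag), "-")
	start := -1
	join := func(end int) string {
		if end-start < 2 {
			return "" // a bare "u" with nothing after it carries no options
		}
		return strings.Join(parts[start:end], "-")
	}
	for i := 1; i < len(parts); i++ {
		if len(parts[i]) != 1 {
			continue // not a singleton, so it belongs to the current section
		}
		switch {
		case start >= 0:
			return join(i) // the next singleton ends the -u extension
		case parts[i] == "x":
			return "" // private use section: no extensions can follow
		case parts[i] == "u":
			start = i
		}
	}
	if start >= 0 {
		return join(len(parts))
	}
	return ""
}

// withUserOptions appends the user's -u extension to the supported tag.
// Assumes the supported tag has no extensions or private use section.
func withUserOptions(supported, user string) string {
	if ext := uExtension(user); ext != "" {
		return supported + "-" + ext
	}
	return supported
}

func main() {
	fmt.Println(withUserOptions("de", "de-AT-u-co-phonebk"))
	fmt.Println(withUserOptions("en-US", "fr"))
}
```

With a single tag that already includes the user options, an application can pass the same value to every language-specific service in the session.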
Marcel van Lohuizen 3ce2b8897f language: use -u-va-posix instead of -x-posix.
Use CLDR variant. Collate already used this. This provides
a little bit more structure and plays nice with collation.

Note that maketables.go contained a bug that happened to work
given the current data in CLDR.

Change-Id: I40d07d7cd2a8615bbe0e223a074df5b701b7b833
Reviewed-on: https://go-review.googlesource.com/14805
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-09-22 06:45:18 +00:00
Marcel van Lohuizen b4ffa8e2f1 language: support en_US_POSIX
Also make grandfathered tags case-insensitive, as they should be, and
support the legacy en_US_POSIX. The latter is widely used in CLDR
so not supporting it was kind of a pain.

Note that checking for grandfathered tags is in the critical path.

Change-Id: Ie635ecdf56e5e9de95c9c451bc7afb6985587d66
Reviewed-on: https://go-review.googlesource.com/14555
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-09-15 11:33:57 +00:00
Marcel van Lohuizen 505f8b49cc internal/tag: factored out tag.Index
tag.Index will also be used by currency and possibly other packages.

gen now also writes types (if there is no stutter) for vars and const.
This allows strings to be of a specific type.

index data in language is now written as consts. No performance impact
was measured.

Change-Id: I1b63a5bc5e54264acd825000df5af67f8ae759a6
Reviewed-on: https://go-review.googlesource.com/13922
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-08-27 18:55:02 +00:00
Marcel van Lohuizen 6af233d0d5 text/language: upgraded to CLDR 26.
- CLDR 26 introduced a new type of legacy language alias from ISO-639.
  Instead of having three different tables of language replacements, there
  is now just one where each entry is marked with a replacement type.
- Moved to using go generate.
- Introduced code file shared by maketables and package to make the use
  of constants a bit more manageable. Expect more to move there over time.

LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/166640043
2014-11-19 13:06:59 +01:00
Marcel van Lohuizen 0106b12141 go.text/language: map grandfathered tags to ICU values, instead of CLDR.
The mapping is now hand-coded. This is fine as the table is small and
not likely to change.

ICU defines mappings for all grandfathered tags, whereas CLDR does not.
Furthermore, ICU maps zh-guoyu to cmn, instead of zh. cmn is slightly more
informative and arguably preferable. Note that the Matcher will map
cmn to zh without issue when appropriate. Also, canonicalizing with the
Macro option will map cmn to zh as well.

LGTM=roubert, r
R=roubert, r
CC=golang-codereviews
https://golang.org/cl/160790043
2014-10-14 08:15:55 +02:00
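
A hand-coded grandfathered table of the kind 0106b12141 describes might look roughly like the sketch below. Only the zh-guoyu to cmn mapping and the cmn to zh macrolanguage step come from the commit message; the other two entries are preferred values from the IANA language subtag registry, and the names `grandfathered`, `macro` and `canonicalize` are hypothetical. The case-insensitive lookup reflects the later commit b4ffa8e2f1.

```go
package main

import (
	"fmt"
	"strings"
)

// grandfathered is a small hand-coded table keyed by the lower-cased tag.
// Illustrative subset only.
var grandfathered = map[string]string{
	"zh-guoyu":   "cmn", // stated in the commit message
	"i-klingon":  "tlh", // IANA registry preferred value
	"art-lojban": "jbo", // IANA registry preferred value
}

// macro maps an individual language to its macrolanguage (illustrative subset).
var macro = map[string]string{"cmn": "zh"}

// canonicalize replaces a grandfathered tag and, if useMacro is set,
// maps the result to its macrolanguage. Other tags are returned unchanged.
func canonicalize(tag string, useMacro bool) string {
	if r, ok := grandfathered[strings.ToLower(tag)]; ok {
		tag = r
	}
	if useMacro {
		if m, ok := macro[tag]; ok {
			tag = m
		}
	}
	return tag
}

func main() {
	fmt.Println(canonicalize("zh-guoyu", false))
	fmt.Println(canonicalize("ZH-Guoyu", true))
}
```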
Marcel van Lohuizen d17d24a805 go.text/language: added TLD method to get the ccTLD equivalent of a region.
- Added Region.Canonicalize to make its use more practical in some cases.
  This is generally a useful operation and is warranted as a shortcut for
  converting to a Tag, canonicalizing and converting back.
- Added regionTypes lookup table to quickly determine if a region is a
  valid ccTLD. This table takes less space than the equivalent code needed
  to compute this from existing data. It also allows for other space and
  time optimizations for other routines.
- Added Region.IsGroup method as a counterpart to IsCountry.
- Cleaned up meaning of IsPrivateUse. Now is strictly private use as defined
  by BCP 47 and ISO. Internal mappings are ignored but can be detected using
  IsCountry or IsGroup.
- Moved splitting of IANA registry ranges to registry parsing time.
- For maketables, always print source URL in comment, not the local copy.

LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/144820043
2014-09-19 18:19:18 +02:00
Marcel van Lohuizen 5816925f42 go.text/cldr: Upgraded cldr, display, and language package to CLDR 25.
Collation is tricky and will be handled in a different CL.
display:
- Minor naming updates.

language:
- Added support for the provisional tag for Kosovo (XK).
- Hani script is no longer a special case. Removed respective code.
- Likely tags now specialize a group region to a single country.
  This requires a new table for group regions.
- addTags now specializes region groups whenever it can make an
  unambiguous choice.
- Matching changed format. maketables now supports both.
- Added Contains method to Region for determining whether a region is
  contained by another.

Do later:
- Cash rounding and decimals for currencies.

LGTM=r
R=r
CC=golang-codereviews, nigeltao
https://golang.org/cl/102120044
2014-06-04 10:35:00 +02:00
Marcel van Lohuizen cd75f379c5 go.text/language: added predefined common tag values.
The benefits of using predefined tags are:
- convenience for the user
- an additional layer of indirection for tag representations that may
  change in the future (e.g. pt to mean Brazilian Portuguese or just
  Portuguese).

Predefined values are only provided for a selection of tags. The selection
is based on CLDR data and augmented with languages for populous areas where
speakers do not typically also fluently speak a language that was already
included in the set (thereby increasing the likelihood of needing it) and
languages for which tags may be ambiguous.

We do not intend to provide similar constants for Regions, Scripts and Base
values. Specifically the set of Regions is much more "dynamic" than the set
of languages. The (Tag|Base|Script|Region|Currency)OrDie methods are provided
as a convenience for the user to specify values at startup.

Details:
- Went with names instead of codes to allow for a level of indirection on
  top of tags.
- Names for the tags are taken from CLDR for the en locale.
- Internal tag constants are now prefixed with '_' instead of lang_, reg,
  scr and cur. All tags are unique across these, so this does not pose
  a problem. It makes the predefined values for tags look a lot better in
  the godocs.

LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/59740047
2014-02-18 12:38:45 +01:00
Marcel van Lohuizen 73f318262f go.text/language: make Region.ISO3 return "ZZZ" in case of non-existing ISO3
code.
The occurrence of a non-existing ISO3 code is rather rare to the point it
seems better to just return something.

R=r
CC=golang-codereviews
https://golang.org/cl/55230043
2014-02-04 14:03:48 +01:00
Marcel van Lohuizen 22a6057374 go.text/language: the 'e' is kind of phonetically redundant, but I think
it is better to include it anyway.

R=r
CC=golang-codereviews
https://golang.org/cl/48230043
2014-01-07 11:57:59 +01:00
Marcel van Lohuizen b8c3e1927b go.text/language: fixed bug that caused EncodeM49 to panic for some values.
The added test doesn't test much, but mainly serves the purpose of asserting
no panic occurs for any value.

R=r
CC=golang-codereviews
https://golang.org/cl/48080043
2014-01-06 17:59:22 +01:00
Marcel van Lohuizen a36a459697 go.text/language: added a separate table for mapping from UN.M49 to region code.
It turned out that the mapping is a bit more complex than simply using
a single table, with the following features:
- blocks of sorted array of UN.M49 and region pairs.
- each block is indexed by the 3 msb of the 10-bit UN.M49 code.
- each block contains a sorted list of uint16s where the 7 msb are the 7 lsb
  of the UN.M49 code and the 9 lsb are the region code.
- total table size increase is 582 bytes
A more straightforward approach would have led to a table size of at least 1K,
up to 2K. However, the lookup code for these approaches is either not
substantially smaller or the table size is notably larger.

The table also includes a few more new entries from the IANA registry.

R=r
CC=golang-codereviews
https://golang.org/cl/44560043
2013-12-23 09:34:53 +01:00
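
The table layout described in a36a459697 can be sketched as follows: the 3 most significant bits of the 10-bit UN.M49 code select a block, and each uint16 entry stores the 7 low bits of the code in its upper 7 bits and a 9-bit region ID in its lower 9 bits. This is a minimal sketch rather than the package's generated table; the region IDs in `sample` are made up, and `build` and `lookup` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

const (
	numBlocks  = 8 // one block per value of the 3 msb of a 10-bit code
	lowBits    = 7 // low bits of the UN.M49 code stored in each entry
	regionBits = 9 // bits available for the region ID
	lowMask    = 1<<lowBits - 1
	regionMask = 1<<regionBits - 1
)

// sample maps UN.M49 codes to hypothetical internal region IDs (< 512).
var sample = map[int]uint16{
	1:   1,   // World
	2:   2,   // Africa
	276: 57,  // Germany
	528: 165, // Netherlands
	840: 300, // United States
}

// build packs the mapping into one sorted slice plus the start offset of
// each block. index[b] is the start of block b; index[numBlocks] is the end.
func build(m map[int]uint16) ([]uint16, [numBlocks + 1]int) {
	codes := make([]int, 0, len(m))
	for c := range m {
		codes = append(codes, c)
	}
	sort.Ints(codes)

	var entries []uint16
	var index [numBlocks + 1]int
	next := 0 // next block whose start offset still has to be recorded
	for _, c := range codes {
		for ; next <= c>>lowBits; next++ {
			index[next] = len(entries)
		}
		entries = append(entries, uint16(c&lowMask)<<regionBits|m[c])
	}
	for ; next <= numBlocks; next++ {
		index[next] = len(entries)
	}
	return entries, index
}

// lookup selects the block by the 3 msb and binary-searches the 7 lsb.
func lookup(entries []uint16, index [numBlocks + 1]int, m49 int) (uint16, bool) {
	b := m49 >> lowBits
	if m49 < 0 || b >= numBlocks {
		return 0, false
	}
	block := entries[index[b]:index[b+1]]
	key := uint16(m49 & lowMask)
	i := sort.Search(len(block), func(i int) bool { return block[i]>>regionBits >= key })
	if i < len(block) && block[i]>>regionBits == key {
		return block[i] & regionMask, true
	}
	return 0, false
}

func main() {
	entries, index := build(sample)
	for _, code := range []int{528, 840, 529} {
		r, ok := lookup(entries, index, code)
		fmt.Println(code, r, ok)
	}
}
```

Because entries within a block are sorted by the low 7 bits of the code, each lookup is a binary search over one block rather than over the whole table.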
Marcel van Lohuizen a7e91de037 go.text/language: canonicalize deprecated regions and scripts.
Analogous to deprecated languages, deprecated regions are represented with
their own internal codes and get canonicalized when necessary. The deprecated
regions were previously not recognized.
- refactored code for writing sorted maps.
- refactored region testing code in separate components to
  simplify tests.
- CLDR does not include the deprecated 3-letter ISO codes for all deprecated
  codes. We added them for completeness.
- Script deprecation is hard-coded. The CLDR data only contains one remapping.
  maketables.go checks that this is indeed the only one.
This adds about 250 bytes of data.

R=r
CC=golang-dev
https://golang.org/cl/19850043
2013-11-08 12:52:05 +01:00
Marcel van Lohuizen 240601eac0 go.text/language: change Zyyy to Zzzz as representation of undefined script.
This is a choice between more conformance to BCP 47 on the one hand and
Unicode and CLDR on the other hand.
The user can use the returned Confidence value to determine whether the
script was unspecified or explicitly specified as Zzzz (in the rare case the
user would care at all).
Updated comments.

R=r
CC=golang-dev
https://golang.org/cl/16020043
2013-10-24 15:04:51 +02:00
Marcel van Lohuizen 71ab14c455 go.text/language: revamped error handling:
- ValueError now exported as new type. ValueError retains the problematic value,
  allowing the user to inspect and correct it.
- Dynamically allocated errors returned in case of a syntax error are replaced
  by an error variable.
- Fixed bug: return error if a "u" extension has a type without a value.
- Added benchmarks for parsing code.
- Renamed MissingLikelyData to ErrMissingLikelyData to be consistent with other
  Go packages. This variable is not yet returned, so this change is not likely to cause
  a big issue.
- Removed Set type as long as there is no demand for it.

The code is measurably faster after removing the dynamically allocated errors.
A ValueError is 8 bytes and should not require allocation when passed as an error.
Returning a fixed error variable instead of a ValueError did not significantly improve
performance.

I considered returning a syntax error with the position at which the error occurred.
The extra management needed for this slowed down the code a bit, so I opted not to
support this. This could still be implemented if there turns out to be a need for it.

R=r, mpvl
CC=golang-dev
https://golang.org/cl/14162044
2013-10-08 19:51:01 +02:00
Marcel van Lohuizen 4a56690205 go.text/language: A few small changes:
- Added Tag methods to Base, Script and Region types to convert them into a proper tag.
- Factored out part of Canonicalize that does not remake the string (used in upcoming matcher code).
- Added "nb" -> "no" conversion in the tables to allow more consistency for code using these tables directly.
- Changed the short name used in some methods for type Base so that it consistently appears as "b" in the documentation.

R=r
CC=golang-dev
https://golang.org/cl/13647043
2013-09-23 11:03:22 +02:00
Marcel van Lohuizen b38db9f15a go.text/language: renaming of locale package:
- renamed package locale to language
- renamed type ID to Tag (language.Tag)
- renamed type Language to Base (language.Base)
- deleting locale package
- changed occurrences of "locale identifier" in comments to "language tag".
- renamed method variable names from id or loc to t when the receiver type is Tag.

R=r, nigeltao
CC=golang-dev
https://golang.org/cl/13468043
2013-09-05 11:16:24 +02:00