Update the public suffix list to the latest (October 15, 2013)
data from publicsuffix.org, which adds around 60 new gTLDs.
The .ar rules changed, so the corresponding tests are modified to
reflect this change in the list.
R=nigeltao
CC=golang-dev
https://golang.org/cl/14930048
Add a publicsuffix.PublicSuffix function.
This required moving the encoded node type bits from the nodes array
to the children array.
R=dr.volker.dobler, rsc
CC=golang-dev, rsleevi
https://golang.org/cl/7060046
On the full list (running gen.go with -subset=false):
Before, there were 6086 nodes (at 8 bytes per node). After,
there were 6086 nodes (at 4 bytes per node) plus 354 children entries
(at 4 bytes per entry). The difference is 22928 bytes.
In comparison, the (crushed) text is 21082 bytes, and for the curious,
the longest label is 36 bytes: "xn--correios-e-telecomunicaes-ghc29a".
All 32 bits in the nodes table are used, but there's wiggle room to
accommodate future changes to effective_tld_names.dat:
The largest children index is 353 (in 9 bits, so max is 511).
The largest node type is 2 (in 2 bits, so max is 3).
The largest text offset is 21080 (in 15 bits, so max is 32767).
The largest text length is 36 (in 6 bits, so max is 63).
benchmark                old ns/op    new ns/op    delta
BenchmarkPublicSuffix        19948        19744   -1.02%
R=dr.volker.dobler
CC=golang-dev
https://golang.org/cl/6999045
The tables were generated by:
go run gen.go -subset -version "subset of publicsuffix.org's effective_tld_names.dat, hg revision 05b11a8d1ace (2012-11-09)" >table.go
go run gen.go -subset -version "subset of publicsuffix.org's effective_tld_names.dat, hg revision 05b11a8d1ace (2012-11-09)" -test >table_test.go
The input data is subsetted so that code review is easier while still
covering the interesting * and ! rules. A follow-up changelist will
check in the unfiltered public suffix list.
Update golang/go#1960.
R=rsc, dr.volker.dobler
CC=golang-dev
https://golang.org/cl/6912045