* Avoid misdetecting windows-1252 English as windows-1254.
* Avoid misdetecting windows-1252 English as IBM866.
* Avoid misdetecting windows-1252 English as GBK or EUC-KR.
* Improve Chinese and Japanese detection by not giving single-byte encodings a score for a letter next to a digit.
* Improve Italian, Portuguese, Castilian, Catalan, and Galician detection by taking the use of ordinal indicators (ª and º) into account.
* Reduce lookup table size.
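The letter-next-to-digit and ordinal-indicator changes can be illustrated with a toy pair scorer. This is a hypothetical sketch, not chardetng's code: the byte classes, the scores, and the `letter_digit_bonus` flag are invented for illustration. It only shows how dropping the letter/digit bonus lowers a single-byte candidate's score on digits followed by non-ASCII bytes, while a digit followed by ª or º can still earn a score.

```python
# Hypothetical toy scorer, NOT chardetng's implementation: the byte classes,
# the scores, and the letter_digit_bonus flag are invented for illustration.

ORDINALS = {0xAA, 0xBA}  # windows-1252: 0xAA is "ª", 0xBA is "º"


def is_digit(b: int) -> bool:
    return 0x30 <= b <= 0x39


def is_letter(b: int) -> bool:
    # Toy rule: ASCII letters, plus 0xC0..0xFF except 0xD7 ("×") and 0xF7 ("÷"),
    # which is where windows-1252 puts most accented letters.
    ascii_letter = 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A
    return ascii_letter or (b >= 0xC0 and b not in (0xD7, 0xF7))


def pair_score(prev: int, curr: int, letter_digit_bonus: bool) -> int:
    if is_digit(prev) and curr in ORDINALS:
        # "1º", "2ª": an ordinal indicator right after a digit earns a score.
        return 1
    if (is_digit(prev) and is_letter(curr)) or (is_letter(prev) and is_digit(curr)):
        # Letter/digit adjacency scores only when the (old-style) bonus is on.
        return 1 if letter_digit_bonus else 0
    if is_letter(prev) and is_letter(curr) and (prev >= 0x80 or curr >= 0x80):
        # Reward pairs of letters that involve a non-ASCII letter.
        return 1
    return 0


def score(data: bytes, letter_digit_bonus: bool = False) -> int:
    # Sum the toy score over every adjacent byte pair.
    return sum(pair_score(a, b, letter_digit_bonus) for a, b in zip(data, data[1:]))
```

With the bonus, `score(b"2020\xc4\xea", letter_digit_bonus=True)` is 2; without it, `score(b"2020\xc4\xea")` is 1, because the digit followed by a high byte no longer counts in favor of the single-byte reading. `score(b"1\xba piano")` is 1 thanks to the digit followed by º.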
Differential Revision: https://phabricator.services.mozilla.com/D73237
* Properly take into account non-ASCII bytes at word boundaries for windows-1252. (Especially relevant for Italian, Catalan, Icelandic, and Faroese.)
* Move Estonian from the Baltic model to the Western model. This improves Estonian detection overall but causes š and ž to be misdecoded when the text is encoded as windows-1257, ISO-8859-13, or ISO-8859-4. (It would be possible to add a post-processing step to adjust for š and ž, but this would cause reloads given the way chardetng is integrated with Firefox.)
* Improve Thai accuracy a lot.
* Improve Vietnamese, Lithuanian, and Latvian accuracy a bit.
* Improve accuracy for most Central European languages a bit.
* Regress accuracy for some Central European languages a bit (as a side effect of fixing Italian and Catalan).
* Properly classify letters that ISO-8859-4 has but windows-1257 doesn't have in order to avoid misdetecting non-ISO-8859-4 input as ISO-8859-4.
* Improve character classification of windows-1254.
* Stop classifying bytes 0xA1 and above as space-like in order to prevent misdetection.
* Reduce binary size.
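The word-boundary and space-like changes can be illustrated with a toy windows-1252 byte classifier. This is a hypothetical sketch, not chardetng's code: the class names, the ranges, and the `words` and `word_final_bytes` helpers are assumptions for illustration. It treats only ASCII whitespace and 0xA0 (no-break space) as space-like, and it keeps non-ASCII letters such as 0xE0 ("à") attached to the word they start or end, so a word-final "à" in Italian remains visible to a model that scores word boundaries.

```python
# Hypothetical sketch, NOT chardetng's code: a toy windows-1252 byte classifier
# and word splitter. Class names and ranges are assumptions for illustration.

SPACE, LETTER, OTHER = "space", "letter", "other"


def classify(b: int) -> str:
    if b in (0x09, 0x0A, 0x0D, 0x20, 0xA0):
        # ASCII whitespace plus 0xA0 (no-break space in windows-1252).
        # Nothing at 0xA1 or above is treated as space-like.
        return SPACE
    if 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A:
        return LETTER
    if b >= 0xC0 and b not in (0xD7, 0xF7):
        # Accented letters such as 0xE0 "à"; 0xD7 "×" and 0xF7 "÷" are symbols.
        return LETTER
    # Digits, punctuation, and symbols such as 0xA1 "¡".
    return OTHER


def words(data: bytes) -> list[bytes]:
    """Split into maximal runs of LETTER bytes, keeping non-ASCII letters
    that sit at the start or end of a word."""
    out = []
    start = None
    for i, b in enumerate(data):
        if classify(b) == LETTER:
            if start is None:
                start = i
        elif start is not None:
            out.append(data[start:i])
            start = None
    if start is not None:
        out.append(data[start:])
    return out


def word_final_bytes(data: bytes) -> list[int]:
    # The bytes a boundary-aware model could score as word-final.
    return [w[-1] for w in words(data)]
```

For the windows-1252 bytes of "la città è bella", `words` returns `[b"la", b"citt\xe0", b"\xe8", b"bella"]`, so both 0xE0 and 0xE8 show up as word-final bytes, while `b"\xa1Hola!"` yields just `[b"Hola"]` because ¡ is classified as punctuation rather than as a space.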
Differential Revision: https://phabricator.services.mozilla.com/D63197
--HG--
extra : moz-landing-system : lando