6 commits

Author SHA1 Message Date
Henri Sivonen c55405f18e Bug 1678175 - Avoid detecting windows-1252 euro sign as GBK. r=m_kato
Differential Revision: https://phabricator.services.mozilla.com/D98005
2020-11-29 08:07:45 +00:00
Henri Sivonen c9cae42014 Bug 1631983 - Update chardetng to 0.1.9. r=m_kato
* Avoid misdetecting windows-1252 English as windows-1254.
* Avoid misdetecting windows-1252 English as IBM866.
* Avoid misdetecting windows-1252 English as GBK or EUC-KR.
* Improve Chinese and Japanese detection by not giving single-byte encodings score for letter next to digit.
* Improve Italian, Portuguese, Castilian, Catalan, and Galician detection by taking into account ordinal indicator use.
* Reduce lookup table size.

Differential Revision: https://phabricator.services.mozilla.com/D73237
2020-05-12 13:56:29 +00:00
Henri Sivonen 52a6fe2427 Bug 1615836 - Update chardetng to 0.1.6. r=emk
* Properly take into account non-ASCII bytes at word boundaries for windows-1252. (Especially relevant for Italian, Catalan, Icelandic, and Faroese.)
* Move Estonian from the Baltic model to the Western model. This improves overall Estonian detection but causes š and ž encoded as windows-1257, ISO-8859-13, or ISO-8859-4 to get misdecoded. (It would be possible to add a post-processing step to adjust for š and ž, but this would cause reloads given the way chardetng is integrated with Firefox.)
* Improve Thai accuracy a lot.
* Improve Vietnamese, Lithuanian, and Latvian accuracy a bit.
* Improve accuracy for most Central European languages a bit.
* Regress accuracy for some Central European languages a bit (as side effect of fixing Italian and Catalan).
* Properly classify letters that ISO-8859-4 has but windows-1257 doesn't have in order to avoid misdetecting non-ISO-8859-4 input as ISO-8859-4.
* Improve character classification of windows-1254.
* Avoid classifying byte 0xA1 or above as space-like to avoid misdetection.
* Reduce binary size.

Differential Revision: https://phabricator.services.mozilla.com/D63197

--HG--
extra : moz-landing-system : lando
2020-02-18 22:31:00 +00:00
Henri Sivonen 5c2bad25ab Bug 1551276 - Autodetect legacy encodings on unlabeled pages. r=emk
Differential Revision: https://phabricator.services.mozilla.com/D56362

--HG--
extra : moz-landing-system : lando
2019-12-12 17:50:19 +00:00
Oana Pop Rus df78d6011c Backed out changeset 0810ad586986 (bug 1551276) for wpt failures in ar-ISO-8859-6-late.tentative.html on a CLOSED TREE
2019-12-12 16:38:54 +02:00
Henri Sivonen 07527a83c9 Bug 1551276 - Autodetect legacy encodings on unlabeled pages. r=emk
Differential Revision: https://phabricator.services.mozilla.com/D56362

--HG--
extra : moz-landing-system : lando
2019-12-12 12:59:47 +00:00