mirror of https://github.com/mozilla/pjs.git
Bug 479334 - Firefox and Thunderbird need a better English language dictionary; r=gavin
Parent
51047b1d8b
Commit
9c409b2b4a
|
@ -1,259 +1,259 @@
|
|||
2007-11-24 Release
|
||||
|
||||
README file for en_US and en_CA Hunspell dictionaries
|
||||
|
||||
These dictionaries are created using the speller/make-hunspell-dict
|
||||
dictionary in SCOWL, SVN revision 47.
|
||||
|
||||
The NOSUGGEST flag was added to certain taboo words. While I made an
|
||||
honest attempt to flag the strongest taboo words with the NOSUGGEST
|
||||
flag, I MAKE NO GUARANTEE THAT I FLAGGED EVERY POSSIBLE TABOO WORD.
|
||||
The list was originally derived from Németh László, however I removed
|
||||
some words which, while being considered taboo by some dictionaries,
|
||||
are not really considered swear words in today's society.
|
||||
|
||||
You can find SCOWL and friends at http://wordlist.sourceforge.net/.
|
||||
Bug reports should go to the Issue Tracker found on the previously
|
||||
mentioned web site. General discussion should go to the
|
||||
wordlist-devel at sourceforge net mailing list.
|
||||
|
||||
COPYRIGHT, SOURCES, and CREDITS:
|
||||
|
||||
The en_US and en_CA dictionaries come directly from SCOWL (up to level
|
||||
60) and are thus under the same copyright as SCOWL. The affix file is
|
||||
a heavily modified version of the original english.aff file which was
|
||||
released as part of Geoff Kuenning's Ispell and as such is covered by
|
||||
his BSD license. Part of SCOWL is also based on Ispell thus the
|
||||
Ispell copyright is included with the SCOWL copyright.
|
||||
|
||||
The collective work is Copyright 2000-2007 by Kevin Atkinson as well
|
||||
as any of the copyrights mentioned below:
|
||||
|
||||
Copyright 2000-2007 by Kevin Atkinson
|
||||
|
||||
Permission to use, copy, modify, distribute and sell these word
|
||||
lists, the associated scripts, the output created from the scripts,
|
||||
and its documentation for any purpose is hereby granted without fee,
|
||||
provided that the above copyright notice appears in all copies and
|
||||
that both that copyright notice and this permission notice appear in
|
||||
supporting documentation. Kevin Atkinson makes no representations
|
||||
about the suitability of this array for any purpose. It is provided
|
||||
"as is" without express or implied warranty.
|
||||
|
||||
Alan Beale <biljir@pobox.com> also deserves special credit as he has,
|
||||
in addition to providing the 12Dicts package and being a major
|
||||
contributor to the ENABLE word list, given me an incredible amount of
|
||||
feedback and created a number of special lists (those found in the
|
||||
Supplement) in order to help improve the overall quality of SCOWL.
|
||||
|
||||
The 10 level includes the 1000 most common English words (according to
|
||||
the Moby (TM) Words II [MWords] package), a subset of the 1000 most
|
||||
common words on the Internet (again, according to Moby Words II), and
|
||||
frequency class 16 from Brian Kelk's "UK English Wordlist
|
||||
with Frequency Classification".
|
||||
|
||||
The MWords package was explicitly placed in the public domain:
|
||||
|
||||
The Moby lexicon project is complete and has
|
||||
been place into the public domain. Use, sell,
|
||||
rework, excerpt and use in any way on any platform.
|
||||
|
||||
Placing this material on internal or public servers is
|
||||
also encouraged. The compiler is not aware of any
|
||||
export restrictions so freely distribute world-wide.
|
||||
|
||||
You can verify the public domain status by contacting
|
||||
|
||||
Grady Ward
|
||||
3449 Martha Ct.
|
||||
Arcata, CA 95521-4884
|
||||
|
||||
grady@netcom.com
|
||||
grady@northcoast.com
|
||||
|
||||
The "UK English Wordlist With Frequency Classification" is also in the
|
||||
Public Domain:
|
||||
|
||||
Date: Sat, 08 Jul 2000 20:27:21 +0100
|
||||
From: Brian Kelk <Brian.Kelk@cl.cam.ac.uk>
|
||||
|
||||
> I was wondering what the copyright status of your "UK English
|
||||
> Wordlist With Frequency Classification" word list as it seems to
|
||||
> be lacking any copyright notice.
|
||||
|
||||
There were many many sources in total, but any text marked
|
||||
"copyright" was avoided. Locally-written documentation was one
|
||||
source. An earlier version of the list resided in a filespace called
|
||||
PUBLIC on the University mainframe, because it was considered public
|
||||
domain.
|
||||
|
||||
Date: Tue, 11 Jul 2000 19:31:34 +0100
|
||||
|
||||
> So are you saying your word list is also in the public domain?
|
||||
|
||||
That is the intention.
|
||||
|
||||
The 20 level includes frequency classes 7-15 from Brian's word list.
|
||||
|
||||
The 35 level includes frequency classes 2-6 and words appearing in at
|
||||
least 11 of 12 dictionaries as indicated in the 12Dicts package. All
|
||||
words from the 12Dicts package have had likely inflections added via
|
||||
my inflection database.
|
||||
|
||||
The 12Dicts package and Supplement is in the Public Domain.
|
||||
|
||||
The WordNet database, which was used in the creation of the
|
||||
Inflections database, is under the following copyright:
|
||||
|
||||
This software and database is being provided to you, the LICENSEE,
|
||||
by Princeton University under the following license. By obtaining,
|
||||
using and/or copying this software and database, you agree that you
|
||||
have read, understood, and will comply with these terms and
|
||||
conditions.:
|
||||
|
||||
Permission to use, copy, modify and distribute this software and
|
||||
database and its documentation for any purpose and without fee or
|
||||
royalty is hereby granted, provided that you agree to comply with
|
||||
the following copyright notice and statements, including the
|
||||
disclaimer, and that the same appear on ALL copies of the software,
|
||||
database and documentation, including modifications that you make
|
||||
for internal use or for distribution.
|
||||
|
||||
WordNet 1.6 Copyright 1997 by Princeton University. All rights
|
||||
reserved.
|
||||
|
||||
THIS SOFTWARE AND DATABASE IS PROVIDED "AS IS" AND PRINCETON
|
||||
UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
|
||||
IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PRINCETON
|
||||
UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES OF MERCHANT-
|
||||
ABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE
|
||||
LICENSED SOFTWARE, DATABASE OR DOCUMENTATION WILL NOT INFRINGE ANY
|
||||
THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.
|
||||
|
||||
The name of Princeton University or Princeton may not be used in
|
||||
advertising or publicity pertaining to distribution of the software
|
||||
and/or database. Title to copyright in this software, database and
|
||||
any associated documentation shall at all times remain with
|
||||
Princeton University and LICENSEE agrees to preserve same.
|
||||
|
||||
The 40 level includes words from Alan's 3esl list found in version 4.0
|
||||
of his 12dicts package. Like his other stuff the 3esl list is also in the
|
||||
public domain.
|
||||
|
||||
The 50 level includes Brian's frequency class 1, words appearing
|
||||
in at least 5 of 12 of the dictionaries as indicated in the 12Dicts
|
||||
package, and uppercase words in at least 4 of the previous 12
|
||||
dictionaries. A decent number of proper names is also included: The
|
||||
top 1000 male, female, and Last names from the 1990 Census report; a
|
||||
list of names sent to me by Alan Beale; and a few names that I added
|
||||
myself. Finally a small list of abbreviations not commonly found in
|
||||
other word lists is included.
|
||||
|
||||
The name files from the Census report are government documents, which I
|
||||
don't think can be copyrighted.
|
||||
|
||||
The file special-jargon.50 uses common.lst and word.lst from the
|
||||
"Unofficial Jargon File Word Lists" which is derived from "The Jargon
|
||||
File", all of which is in the Public Domain. This file also contains
|
||||
a few extra UNIX terms which are found in the file "unix-terms" in the
|
||||
special/ directory.
|
||||
|
||||
The 55 level includes words from Alan's 2of4brif list found in version
|
||||
4.0 of his 12dicts package. Like his other stuff the 2of4brif is also
|
||||
in the public domain.
|
||||
|
||||
The 60 level includes Brian's frequency class 0 and all words
|
||||
appearing in at least 2 of the 12 dictionaries as indicated by the
|
||||
12Dicts package. A large number of names are also included: The 4,946
|
||||
female names and the 3,897 male names from the MWords package.
|
||||
|
||||
The 70 level includes the 74,550 common dictionary words and the
|
||||
21,986 names list from the MWords package. The common dictionary words,
|
||||
like those from the 12Dicts package, have had all likely inflections
|
||||
added. The 70 level also includes the 5desk list from version 4.0 of
|
||||
the 12Dicts package, which is in the public domain.
|
||||
|
||||
The 80 level includes the ENABLE word list, all the lists in the
|
||||
ENABLE supplement package (except for ABLE), the "UK Advanced Cryptics
|
||||
Dictionary" (UKACD), the list of signature words from the YAWL package,
|
||||
and the 10,196 places list from the MWords package.
|
||||
|
||||
The ENABLE package, maintained by M\Cooper <thegrendel@theriver.com>,
|
||||
is in the Public Domain:
|
||||
|
||||
The ENABLE master word list, WORD.LST, is herewith formally released
|
||||
into the Public Domain. Anyone is free to use it or distribute it in
|
||||
any manner they see fit. No fee or registration is required for its
|
||||
use nor are "contributions" solicited (if you feel you absolutely
|
||||
must contribute something for your own peace of mind, the authors of
|
||||
the ENABLE list ask that you make a donation on their behalf to your
|
||||
favorite charity). This word list is our gift to the Scrabble
|
||||
community, as an alternate to "official" word lists. Game designers
|
||||
may feel free to incorporate the WORD.LST into their games. Please
|
||||
mention the source and credit us as originators of the list. Note
|
||||
that if you, as a game designer, use the WORD.LST in your product,
|
||||
you may still copyright and protect your product, but you may *not*
|
||||
legally copyright or in any way restrict redistribution of the
|
||||
WORD.LST portion of your product. This *may* under law restrict your
|
||||
rights to restrict your users' rights, but that is only fair.
|
||||
|
||||
UKACD, by J Ross Beresford <ross@bryson.demon.co.uk>, is under the
|
||||
following copyright:
|
||||
|
||||
Copyright (c) J Ross Beresford 1993-1999. All Rights Reserved.
|
||||
|
||||
The following restriction is placed on the use of this publication:
|
||||
if The UK Advanced Cryptics Dictionary is used in a software package
|
||||
or redistributed in any form, the copyright notice must be
|
||||
prominently displayed and the text of this document must be included
|
||||
verbatim.
|
||||
|
||||
There are no other restrictions: I would like to see the list
|
||||
distributed as widely as possible.
|
||||
|
||||
The 95 level includes the 354,984 single words and 256,772 compound
|
||||
words from the MWords package, ABLE.LST from the ENABLE Supplement,
|
||||
and some additional words found in my part-of-speech database that
|
||||
were not found anywhere else.
|
||||
|
||||
Accent information was taken from UKACD.
|
||||
|
||||
My VARCON package was used to create the American, British, and
|
||||
Canadian word list.
|
||||
|
||||
Since the original word lists used in the VARCON package came
|
||||
from the Ispell distribution they are under the Ispell copyright:
|
||||
|
||||
Copyright 1993, Geoff Kuenning, Granada Hills, CA
|
||||
All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
are met:
|
||||
|
||||
1. Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
2. Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
3. All modifications to the source code must be clearly marked as
|
||||
such. Binary redistributions based on modified source code
|
||||
must be clearly marked as modified versions in the documentation
|
||||
and/or other materials provided with the distribution.
|
||||
(clause 4 removed with permission from Geoff Kuenning)
|
||||
5. The name of Geoff Kuenning may not be used to endorse or promote
|
||||
products derived from this software without specific prior
|
||||
written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY GEOFF KUENNING AND CONTRIBUTORS ``AS
|
||||
IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
|
||||
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL GEOFF
|
||||
KUENNING OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
|
||||
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
|
||||
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
|
||||
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
|
||||
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
||||
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
|
||||
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
|
||||
POSSIBILITY OF SUCH DAMAGE.
|
||||
2008-12-05 Release
|
||||
|
||||
README file for en_US and en_CA Hunspell dictionaries
|
||||
|
||||
These dictionaries are created using the speller/make-hunspell-dict
|
||||
dictionary in SCOWL, SVN revision 74.
|
||||
|
||||
The NOSUGGEST flag was added to certain taboo words. While I made an
|
||||
honest attempt to flag the strongest taboo words with the NOSUGGEST
|
||||
flag, I MAKE NO GUARANTEE THAT I FLAGGED EVERY POSSIBLE TABOO WORD.
|
||||
The list was originally derived from Németh László, however I removed
|
||||
some words which, while being considered taboo by some dictionaries,
|
||||
are not really considered swear words in today's society.
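
As a rough illustration of how the NOSUGGEST flag described above
works, here is a minimal Python sketch. It assumes the affix file
declares the flag with a "NOSUGGEST <flag>" line and that flags are
single characters (the Hunspell default). The function names and the
sample "!" flag and entries are hypothetical and are not taken from
these dictionaries.

```python
def nosuggest_flag(aff_lines):
    """Return the flag declared by a 'NOSUGGEST <flag>' line, or None."""
    for line in aff_lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "NOSUGGEST":
            return parts[1]
    return None


def nosuggest_words(aff_lines, dic_lines):
    """List the words in dic_lines that carry the NOSUGGEST flag.

    Assumes single-character flags, so a membership test on the flag
    string after the '/' is sufficient.
    """
    flag = nosuggest_flag(aff_lines)
    if flag is None:
        return []
    flagged = []
    for line in dic_lines:
        word, _, flags = line.strip().partition("/")
        if flag in flags:
            flagged.append(word)
    return flagged


# Hypothetical sample data, for illustration only.
sample_aff = ["SET ISO8859-1", "NOSUGGEST !", "PFX A Y 1"]
sample_dic = ["3", "hello", "darn/!S", "walk/DGS"]
print(nosuggest_words(sample_aff, sample_dic))  # prints ['darn']
```

A word flagged this way is still accepted as correctly spelled; it is
only left out of the suggestion list.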
|
||||
|
||||
You can find SCOWL and friends at http://wordlist.sourceforge.net/.
|
||||
Bug reports should go to the Issue Tracker found on the previously
|
||||
mentioned web site. General discussion should go to the
|
||||
wordlist-devel at sourceforge net mailing list.
|
||||
|
||||
COPYRIGHT, SOURCES, and CREDITS:
|
||||
|
||||
The en_US and en_CA dictionaries come directly from SCOWL (up to level
|
||||
60) and are thus under the same copyright as SCOWL. The affix file is
|
||||
a heavily modified version of the original english.aff file which was
|
||||
released as part of Geoff Kuenning's Ispell and as such is covered by
|
||||
his BSD license. Part of SCOWL is also based on Ispell thus the
|
||||
Ispell copyright is included with the SCOWL copyright.
|
||||
|
||||
The collective work is Copyright 2000-2007 by Kevin Atkinson as well
|
||||
as any of the copyrights mentioned below:
|
||||
|
||||
Copyright 2000-2007 by Kevin Atkinson
|
||||
|
||||
Permission to use, copy, modify, distribute and sell these word
|
||||
lists, the associated scripts, the output created from the scripts,
|
||||
and its documentation for any purpose is hereby granted without fee,
|
||||
provided that the above copyright notice appears in all copies and
|
||||
that both that copyright notice and this permission notice appear in
|
||||
supporting documentation. Kevin Atkinson makes no representations
|
||||
about the suitability of this array for any purpose. It is provided
|
||||
"as is" without express or implied warranty.
|
||||
|
||||
Alan Beale <biljir@pobox.com> also deserves special credit as he has,
|
||||
in addition to providing the 12Dicts package and being a major
|
||||
contributor to the ENABLE word list, given me an incredible amount of
|
||||
feedback and created a number of special lists (those found in the
|
||||
Supplement) in order to help improve the overall quality of SCOWL.
|
||||
|
||||
The 10 level includes the 1000 most common English words (according to
|
||||
the Moby (TM) Words II [MWords] package), a subset of the 1000 most
|
||||
common words on the Internet (again, according to Moby Words II), and
|
||||
frequency class 16 from Brian Kelk's "UK English Wordlist
|
||||
with Frequency Classification".
|
||||
|
||||
The MWords package was explicitly placed in the public domain:
|
||||
|
||||
The Moby lexicon project is complete and has
|
||||
been place into the public domain. Use, sell,
|
||||
rework, excerpt and use in any way on any platform.
|
||||
|
||||
Placing this material on internal or public servers is
|
||||
also encouraged. The compiler is not aware of any
|
||||
export restrictions so freely distribute world-wide.
|
||||
|
||||
You can verify the public domain status by contacting
|
||||
|
||||
Grady Ward
|
||||
3449 Martha Ct.
|
||||
Arcata, CA 95521-4884
|
||||
|
||||
grady@netcom.com
|
||||
grady@northcoast.com
|
||||
|
||||
The "UK English Wordlist With Frequency Classification" is also in the
|
||||
Public Domain:
|
||||
|
||||
Date: Sat, 08 Jul 2000 20:27:21 +0100
|
||||
From: Brian Kelk <Brian.Kelk@cl.cam.ac.uk>
|
||||
|
||||
> I was wondering what the copyright status of your "UK English
|
||||
> Wordlist With Frequency Classification" word list as it seems to
|
||||
> be lacking any copyright notice.
|
||||
|
||||
There were many many sources in total, but any text marked
|
||||
"copyright" was avoided. Locally-written documentation was one
|
||||
source. An earlier version of the list resided in a filespace called
|
||||
PUBLIC on the University mainframe, because it was considered public
|
||||
domain.
|
||||
|
||||
Date: Tue, 11 Jul 2000 19:31:34 +0100
|
||||
|
||||
> So are you saying your word list is also in the public domain?
|
||||
|
||||
That is the intention.
|
||||
|
||||
The 20 level includes frequency classes 7-15 from Brian's word list.
|
||||
|
||||
The 35 level includes frequency classes 2-6 and words appearing in at
|
||||
least 11 of 12 dictionaries as indicated in the 12Dicts package. All
|
||||
words from the 12Dicts package have had likely inflections added via
|
||||
my inflection database.
|
||||
|
||||
The 12Dicts package and Supplement is in the Public Domain.
|
||||
|
||||
The WordNet database, which was used in the creation of the
|
||||
Inflections database, is under the following copyright:
|
||||
|
||||
This software and database is being provided to you, the LICENSEE,
|
||||
by Princeton University under the following license. By obtaining,
|
||||
using and/or copying this software and database, you agree that you
|
||||
have read, understood, and will comply with these terms and
|
||||
conditions.:
|
||||
|
||||
Permission to use, copy, modify and distribute this software and
|
||||
database and its documentation for any purpose and without fee or
|
||||
royalty is hereby granted, provided that you agree to comply with
|
||||
the following copyright notice and statements, including the
|
||||
disclaimer, and that the same appear on ALL copies of the software,
|
||||
database and documentation, including modifications that you make
|
||||
for internal use or for distribution.
|
||||
|
||||
WordNet 1.6 Copyright 1997 by Princeton University. All rights
|
||||
reserved.
|
||||
|
||||
THIS SOFTWARE AND DATABASE IS PROVIDED "AS IS" AND PRINCETON
|
||||
UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
|
||||
IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PRINCETON
|
||||
UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES OF MERCHANT-
|
||||
ABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE
|
||||
LICENSED SOFTWARE, DATABASE OR DOCUMENTATION WILL NOT INFRINGE ANY
|
||||
THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.
|
||||
|
||||
The name of Princeton University or Princeton may not be used in
|
||||
advertising or publicity pertaining to distribution of the software
|
||||
and/or database. Title to copyright in this software, database and
|
||||
any associated documentation shall at all times remain with
|
||||
Princeton University and LICENSEE agrees to preserve same.
|
||||
|
||||
The 40 level includes words from Alan's 3esl list found in version 4.0
|
||||
of his 12dicts package. Like his other stuff the 3esl list is also in the
|
||||
public domain.
|
||||
|
||||
The 50 level includes Brian's frequency class 1, words appearing
|
||||
in at least 5 of 12 of the dictionaries as indicated in the 12Dicts
|
||||
package, and uppercase words in at least 4 of the previous 12
|
||||
dictionaries. A decent number of proper names is also included: The
|
||||
top 1000 male, female, and Last names from the 1990 Census report; a
|
||||
list of names sent to me by Alan Beale; and a few names that I added
|
||||
myself. Finally a small list of abbreviations not commonly found in
|
||||
other word lists is included.
|
||||
|
||||
The name files from the Census report are government documents, which I
|
||||
don't think can be copyrighted.
|
||||
|
||||
The file special-jargon.50 uses common.lst and word.lst from the
|
||||
"Unofficial Jargon File Word Lists" which is derived from "The Jargon
|
||||
File", all of which is in the Public Domain. This file also contains
|
||||
a few extra UNIX terms which are found in the file "unix-terms" in the
|
||||
special/ directory.
|
||||
|
||||
The 55 level includes words from Alan's 2of4brif list found in version
|
||||
4.0 of his 12dicts package. Like his other stuff the 2of4brif is also
|
||||
in the public domain.
|
||||
|
||||
The 60 level includes Brian's frequency class 0 and all words
|
||||
appearing in at least 2 of the 12 dictionaries as indicated by the
|
||||
12Dicts package. A large number of names are also included: The 4,946
|
||||
female names and the 3,897 male names from the MWords package.
|
||||
|
||||
The 70 level includes the 74,550 common dictionary words and the
|
||||
21,986 names list from the MWords package. The common dictionary words,
|
||||
like those from the 12Dicts package, have had all likely inflections
|
||||
added. The 70 level also includes the 5desk list from version 4.0 of
|
||||
the 12Dicts package, which is in the public domain.
|
||||
|
||||
The 80 level includes the ENABLE word list, all the lists in the
|
||||
ENABLE supplement package (except for ABLE), the "UK Advanced Cryptics
|
||||
Dictionary" (UKACD), the list of signature words from the YAWL package,
|
||||
and the 10,196 places list from the MWords package.
|
||||
|
||||
The ENABLE package, maintained by M\Cooper <thegrendel@theriver.com>,
|
||||
is in the Public Domain:
|
||||
|
||||
The ENABLE master word list, WORD.LST, is herewith formally released
|
||||
into the Public Domain. Anyone is free to use it or distribute it in
|
||||
any manner they see fit. No fee or registration is required for its
|
||||
use nor are "contributions" solicited (if you feel you absolutely
|
||||
must contribute something for your own peace of mind, the authors of
|
||||
the ENABLE list ask that you make a donation on their behalf to your
|
||||
favorite charity). This word list is our gift to the Scrabble
|
||||
community, as an alternate to "official" word lists. Game designers
|
||||
may feel free to incorporate the WORD.LST into their games. Please
|
||||
mention the source and credit us as originators of the list. Note
|
||||
that if you, as a game designer, use the WORD.LST in your product,
|
||||
you may still copyright and protect your product, but you may *not*
|
||||
legally copyright or in any way restrict redistribution of the
|
||||
WORD.LST portion of your product. This *may* under law restrict your
|
||||
rights to restrict your users' rights, but that is only fair.
|
||||
|
||||
UKACD, by J Ross Beresford <ross@bryson.demon.co.uk>, is under the
|
||||
following copyright:
|
||||
|
||||
Copyright (c) J Ross Beresford 1993-1999. All Rights Reserved.
|
||||
|
||||
The following restriction is placed on the use of this publication:
|
||||
if The UK Advanced Cryptics Dictionary is used in a software package
|
||||
or redistributed in any form, the copyright notice must be
|
||||
prominently displayed and the text of this document must be included
|
||||
verbatim.
|
||||
|
||||
There are no other restrictions: I would like to see the list
|
||||
distributed as widely as possible.
|
||||
|
||||
The 95 level includes the 354,984 single words and 256,772 compound
|
||||
words from the MWords package, ABLE.LST from the ENABLE Supplement,
|
||||
and some additional words found in my part-of-speech database that
|
||||
were not found anywhere else.
|
||||
|
||||
Accent information was taken from UKACD.
|
||||
|
||||
My VARCON package was used to create the American, British, and
|
||||
Canadian word list.
|
||||
|
||||
Since the original word lists used in the VARCON package came
|
||||
from the Ispell distribution they are under the Ispell copyright:
|
||||
|
||||
Copyright 1993, Geoff Kuenning, Granada Hills, CA
|
||||
All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
are met:
|
||||
|
||||
1. Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
2. Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
3. All modifications to the source code must be clearly marked as
|
||||
such. Binary redistributions based on modified source code
|
||||
must be clearly marked as modified versions in the documentation
|
||||
and/or other materials provided with the distribution.
|
||||
(clause 4 removed with permission from Geoff Kuenning)
|
||||
5. The name of Geoff Kuenning may not be used to endorse or promote
|
||||
products derived from this software without specific prior
|
||||
written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY GEOFF KUENNING AND CONTRIBUTORS ``AS
|
||||
IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
|
||||
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL GEOFF
|
||||
KUENNING OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
|
||||
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
|
||||
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
|
||||
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
|
||||
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
||||
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
|
||||
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
|
||||
POSSIBILITY OF SUCH DAMAGE.
|
|
@ -1,5 +1,33 @@
|
|||
Mozilla has added some words to its English dictionary via mozilla_words.diff.
|
||||
Some of these words are regular words that are missing from the Hunspell
|
||||
dictionary, and some are brand names such as "SeaMonkey" that are valuable to
|
||||
Mozilla's users. The patch for these words is checked in alongside the
|
||||
dictionary. This patch should be re-applied if the dictionary is updated.
|
||||
README_mozilla
|
||||
|
||||
The dictionary en-US.dic is generated by merging the following dictionaries, found in the dictionary-sources subdirectory, using the merge-dictionaries bash script, which
|
||||
automatically patches, merges, sorts, and identifies duplicates.
|
||||
|
||||
hunspell-en_US-20081205.dic:
|
||||
|
||||
2008-12-05 Release, en_US Hunspell dictionary from http://wordlist.sourceforge.net/
|
||||
These dictionaries are created using the speller/make-hunspell-dict
|
||||
dictionary in SCOWL, SVN revision 74.
|
||||
|
||||
upstream-hunspell.diff:
|
||||
|
||||
Mozilla-specific additions to the upstream Hunspell dictionary. Some of
|
||||
these changes should be upstreamed, and others should probably just be removed
|
||||
(bug 499444).
|
||||
|
||||
chromium_en_US.dic_delta:
|
||||
|
||||
Chromium wordlist autogenerated by Google,
|
||||
svn - Revision 18580,
|
||||
of http://src.chromium.org/svn/trunk/src/chrome/third_party/hunspell/dictionaries/en_US.dic_delta
|
||||
|
||||
upstream-chromium.diff:
|
||||
|
||||
Patches the Chromium wordlist to remove junk and words that are redundant with the
|
||||
newer Hunspell dictionary we use.
|
||||
|
||||
mozilla-specific.txt:
|
||||
|
||||
Mozilla-specific words, separated out from Hunspell and Chromium word lists.
|
||||
"Firefox" goes here. (See bug 237921)
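
The merge, sort, and duplicate-detection steps described above can be
sketched in Python as follows. This is not the merge-dictionaries
script itself, which is a bash script that also applies the patches.
It is a minimal sketch that assumes each source has already been
patched and read into a list of "word" or "word/flags" entries with
any leading word-count line removed. The function names are
hypothetical.

```python
from collections import Counter


def parse_entry(line):
    """Split a .dic entry into (word, flags); flags is '' if absent."""
    word, _, flags = line.strip().partition("/")
    return word, flags


def merge_wordlists(*wordlists):
    """Merge several lists of .dic entries.

    Returns (merged, duplicates). merged is the sorted list of unique
    entries. duplicates is the sorted list of words that occur more
    than once, either as identical lines or with differing flags.
    """
    entries = [line.strip() for wl in wordlists for line in wl if line.strip()]
    counts = Counter(parse_entry(entry)[0] for entry in entries)
    duplicates = sorted(word for word, n in counts.items() if n > 1)
    merged = sorted(set(entries))
    return merged, duplicates


# Sample entries, for illustration only.
hunspell = ["diff/7", "grey"]
chromium = ["diff/7", "genomic", "genomic/7"]
merged, duplicates = merge_wordlists(hunspell, chromium)
print(merged)      # prints ['diff/7', 'genomic', 'genomic/7', 'grey']
print(duplicates)  # prints ['diff', 'genomic']
```

Python's default sort is by code point, so capitalized words sort
before lowercase ones; the real script's sort order may differ.
Words reported as duplicates with differing flags (such as "genomic"
and "genomic/7") would still need a manual decision on which entry
to keep.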
|
||||
|
||||
|
|
|
@ -0,0 +1,674 @@
|
|||
Abbas/6
|
||||
AbeBooks
|
||||
ACAD/m
|
||||
AcDbEntity
|
||||
acetyl
|
||||
acknowledgement
|
||||
acknowledgement/7
|
||||
actin
|
||||
ActiveX/6
|
||||
Acura/7
|
||||
acyl
|
||||
AddThis/6
|
||||
adipex
|
||||
admin/7
|
||||
ageing
|
||||
Agilent
|
||||
agonist
|
||||
ajax
|
||||
Alibaba
|
||||
aliphatic
|
||||
alkoxy
|
||||
altitudeMode
|
||||
aluminium
|
||||
Amapedia
|
||||
ambien
|
||||
america
|
||||
american
|
||||
amine
|
||||
analyse
|
||||
analysed
|
||||
Andhra
|
||||
Anglais
|
||||
anglais
|
||||
anime
|
||||
antisense
|
||||
antivirus/7
|
||||
API/7
|
||||
apoptoses
|
||||
apoptosis
|
||||
app/7
|
||||
Arabidopsis
|
||||
Arbre/6
|
||||
arg/7
|
||||
Arial/6
|
||||
arXiv/6
|
||||
aryl/7
|
||||
Athlon/7
|
||||
ATPase/6
|
||||
aurei
|
||||
aureus
|
||||
auteur/7
|
||||
Bacteriol
|
||||
Bahasa/6
|
||||
Bancorp
|
||||
Beckham
|
||||
Belkin/6
|
||||
Bellevue/6
|
||||
benzyl
|
||||
BibSonomy/6
|
||||
BibTeX/6
|
||||
bio/7
|
||||
biochem
|
||||
bioinformatic/7
|
||||
biophys
|
||||
biosyntheses
|
||||
biosynthesis
|
||||
biotech/6
|
||||
BizRate/6
|
||||
BlackBerry/6
|
||||
BlinkList/6
|
||||
Bloglines/6
|
||||
blogroll/7
|
||||
blonde/7
|
||||
Bloomberg/6
|
||||
blowjob/30
|
||||
Bluetooth/6
|
||||
Bollywood/6
|
||||
bool/7
|
||||
Brabois/6
|
||||
Bracknell
|
||||
Brasil/6
|
||||
Buenos
|
||||
Burkina/6
|
||||
Caicos/6
|
||||
Cambridgeshire/6
|
||||
cancelled
|
||||
cancelling
|
||||
carboxylic
|
||||
CareerBuilder/6
|
||||
carisoprodol
|
||||
Carlisle/6
|
||||
casa
|
||||
Casio/6
|
||||
Cassini/6
|
||||
cDNA
|
||||
celebs
|
||||
centro/6
|
||||
cerevisiae/7
|
||||
charset/7
|
||||
Chatham/6
|
||||
ChemPort/6
|
||||
Cheney/6
|
||||
Chennai/6
|
||||
chloro
|
||||
Choi/6
|
||||
Cialis/6
|
||||
Cingular/6
|
||||
Cisco/6
|
||||
Citebase/6
|
||||
CiteULike/6
|
||||
Citysearch/6
|
||||
Clarkson/6
|
||||
codec/7
|
||||
codon/7
|
||||
ColdFusion/6
|
||||
coli
|
||||
Comcast/6
|
||||
Computerworld/6
|
||||
conf
|
||||
config/7
|
||||
Connotea/6
|
||||
const/7
|
||||
Coulter/6
|
||||
Councillor/7
|
||||
counselled
|
||||
counselling
|
||||
Courriel/6
|
||||
craigslist/6
|
||||
crore/7
|
||||
CrossRef/6
|
||||
cryonic/7
|
||||
Ctrl
|
||||
cultivar/7
|
||||
Cumbria/6
|
||||
cyber
|
||||
cysteine/7
|
||||
cytokine/7
|
||||
cytokines/7
|
||||
Dansk
|
||||
Darfur/6
|
||||
Dari/6
|
||||
datasheet/7
|
||||
datatype/7
|
||||
Daytona/6
|
||||
DealTime/6
|
||||
Debian
|
||||
dehydrogenase/6
|
||||
Deloitte/6
|
||||
Denton/6
|
||||
desc
|
||||
Deseret/6
|
||||
designee
|
||||
Deutsche/6
|
||||
Deutschland/6
|
||||
deze/7
|
||||
Diaz/6
|
||||
diff/7
|
||||
diff/7
|
||||
Digg/7
|
||||
Digi
|
||||
dihydro
|
||||
Directgov/6
|
||||
DivX/6
|
||||
Dorset/6
|
||||
downloadable
|
||||
Dreamweaver/6
|
||||
DSpace/6
|
||||
DVDs
|
||||
Easton/6
|
||||
eBay/6
|
||||
eBook/7
|
||||
ECNext/6
|
||||
eCommerce/6
|
||||
EconPapers/6
|
||||
Elsevier/6
|
||||
Eminem/6
|
||||
EndNote/6
|
||||
Endocrinol
|
||||
Engadget/6
|
||||
Enron/6
|
||||
Envirofacts/6
|
||||
Epinions/6
|
||||
Epson/6
|
||||
Ericsson/6
|
||||
Eukaryota/7
|
||||
Euro/7
|
||||
euro/7
|
||||
eval/7
|
||||
exon/7
|
||||
expandera/7
|
||||
Expedia/6
|
||||
Facebook/6
|
||||
Faso/6
|
||||
Favorited
|
||||
Feng/6
|
||||
filmography
|
||||
FilmSpot/6
|
||||
financials
|
||||
FindArticles/6
|
||||
FindLaw/6
|
||||
fioricet/7
|
||||
Firefox/6
|
||||
Flickr/6
|
||||
Flickr/6
|
||||
flyer/7
|
||||
Forex/6
|
||||
forma
|
||||
FreeBSD/6
|
||||
Freeware/6
|
||||
freeware/7
|
||||
FrontPage/6
|
||||
fundraiser/7
|
||||
fundraising
|
||||
fundraising
|
||||
GaAs
|
||||
GameBase/6
|
||||
GameCube/6
|
||||
GameFAQs/6
|
||||
GameSpot/6
|
||||
Garmin/6
|
||||
gastroenterology
|
||||
GenBank/6
|
||||
GeneID/6
|
||||
genomic
|
||||
genomic/7
|
||||
gigabit/7
|
||||
Giuliani/6
|
||||
globalization
|
||||
Gloucestershire/6
|
||||
GmbH
|
||||
Google/6
|
||||
GrainGenes/6
|
||||
grande
|
||||
grantor/7
|
||||
grey
|
||||
Griffiths/7
|
||||
guestbook/7
|
||||
Guildford/6
|
||||
Hamas/6
|
||||
handheld/7
|
||||
Hassan/6
|
||||
healthcare
|
||||
hentai
|
||||
Hertfordshire/6
|
||||
hexane/7
|
||||
Hezbollah/6
|
||||
HighBeam/6
|
||||
Hillsborough/6
|
||||
holdem
|
||||
hotline/7
|
||||
Hotmail/6
|
||||
Houghton/6
|
||||
howto/7
|
||||
html
|
||||
httpd/6
|
||||
hydrocodone/7
|
||||
IETF/6
|
||||
IMDb/6
|
||||
IMDbPro/6
|
||||
IMDbTV/6
|
||||
Immunol
|
||||
inbox/6
|
||||
indices
|
||||
indie
|
||||
Infotrieve/6
|
||||
Ingenta/6
|
||||
IngentaConnect/6
|
||||
init/7
|
||||
inkjet/7
|
||||
InterPro/6
|
||||
Intl
|
||||
intl
|
||||
intranet/7
|
||||
iPhone/6
|
||||
iPod/6
|
||||
IPOs
|
||||
ischemia
|
||||
ischemic
|
||||
Italia
|
||||
Italiano
|
||||
iTunes/6
|
||||
Ivoire
|
||||
Japonica/6
|
||||
Jelsoft/6
|
||||
Jiang
|
||||
Joomla/6
|
||||
jpeg
|
||||
judgement
|
||||
Juni/6
|
||||
Karnataka/6
|
||||
kbps
|
||||
Kerala/6
|
||||
keygen
|
||||
Kijiji/6
|
||||
kinase
|
||||
Kitts/6
|
||||
Klicka/6
|
||||
Klicken/6
|
||||
KnowledgeStorm/6
|
||||
Kosovo/6
|
||||
Kuala/6
|
||||
labelled/12
|
||||
Lakers/6
|
||||
lang
|
||||
langue/7
|
||||
Lanka/6
|
||||
learnt
|
||||
LegCo/6
|
||||
Leica/6
|
||||
Lett/6
|
||||
Levitra/6
|
||||
LexisNexis/6
|
||||
Lexmark/6
|
||||
Libros/6
|
||||
licence//101
|
||||
licencee/7
|
||||
licences/14
|
||||
licencing/14
|
||||
licencor/7
|
||||
Lincolnshire/6
|
||||
LinkedIn/6
|
||||
Listmania/6
|
||||
LiveJournal/6
|
||||
localhost
|
||||
localisation/7
|
||||
locator/7
|
||||
Logitech/6
|
||||
LookSmart/6
|
||||
Lycos/6
|
||||
Macau/6
|
||||
Macromedia/6
|
||||
mailto
|
||||
mammalia
|
||||
manga
|
||||
Manga/6
|
||||
MapQuest/6
|
||||
MasterCard/6
|
||||
MathML/6
|
||||
Mbps
|
||||
McAfee/6
|
||||
McCann/6
|
||||
MediaWiki/6
|
||||
Medline/6
|
||||
meetup/7
|
||||
memberlist/7
|
||||
mesothelioma/6
|
||||
Metacafe/6
|
||||
metadata
|
||||
metadata/7
|
||||
meth
|
||||
methoxy
|
||||
mexico
|
||||
mgmt/7
|
||||
microbiol
|
||||
milf
|
||||
mins
|
||||
mitochondria
|
||||
mitochondrial
|
||||
mmol
|
||||
modelling
|
||||
modelling
|
||||
Mohammed/6
|
||||
Motorola/6
|
||||
Motorsport/7
|
||||
Mozilla/6
|
||||
mpeg
|
||||
mRNA
|
||||
msgid
|
||||
msgs
|
||||
msgstr
|
||||
multicast
|
||||
Mumbai/6
|
||||
murine
|
||||
musculus
|
||||
Muze/6
|
||||
MyCareerBuilder/6
|
||||
mySimon
|
||||
MySpace/6
|
||||
Myspace/6
|
||||
MySQL/6
|
||||
MyYahoo/6
|
||||
Nadu/6
|
||||
namespace
|
||||
namespace/7
|
||||
nano
|
||||
nanotechnology
|
||||
Napa/6
|
||||
Naruto/6
|
||||
Nasdaq/6
|
||||
Nederland/7
|
||||
NetBSD/6
|
||||
Neurol/6
|
||||
neurosci
|
||||
neuroscience/7
|
||||
Newsvine/6
|
||||
newswires
|
||||
NexTag/6
|
||||
Nextel/6
|
||||
NGOs/6
|
||||
nitro
|
||||
Niue/6
|
||||
Nokia/6
|
||||
Norsk/6
|
||||
Nortel/6
|
||||
nowrap
|
||||
Obama/6
|
||||
offence
|
||||
offences
|
||||
offline
|
||||
offline
|
||||
offsite
|
||||
Oleh/6
|
||||
oligonucleotide
|
||||
onsite
|
||||
OpenBSD/6
|
||||
opensource
|
||||
OpenURL/6
|
||||
OpenView/6
|
||||
organisation
|
||||
organisational
|
||||
organisations
|
||||
organised
|
||||
Oryza/6
|
||||
oxidase
|
||||
Oxley/6
|
||||
Palau/6
|
||||
Panasonic/6
|
||||
Papua/6
|
||||
param
|
||||
PARC
|
||||
PatentStorm/6
|
||||
Pathol/6
|
||||
PDAs
|
||||
PDFs
|
||||
permalink
|
||||
permalink/7
|
||||
permittee
|
||||
Peterborough/6
|
||||
Pfam/6
|
||||
pgSQL
|
||||
pharma
|
||||
pharmacol
|
||||
phentermine
|
||||
Phentermine/6
|
||||
Philips/6
|
||||
phonephone
|
||||
phosphorylation
|
||||
Photobucket/6
|
||||
Photoshop/6
|
||||
phpBB
|
||||
physiol
|
||||
pics
|
||||
pics
|
||||
pkgsrc
|
||||
Placemark/7
|
||||
playlist/7
|
||||
playlist/7
|
||||
PlayStation/6
|
||||
plugin/7
|
||||
plugin/7
|
||||
Polska
|
||||
Polski
|
||||
poly
|
||||
poly
|
||||
polynucleotide/7
|
||||
polypeptide/7
|
||||
popup/7
|
||||
portlet/7
|
||||
portlet/7
|
||||
PostgreSQL/6
|
||||
PostScript/6
|
||||
Poynter/6
|
||||
prev
|
||||
prev
|
||||
Prix
|
||||
PRNewswire/6
|
||||
proc
|
||||
proc
|
||||
prog
|
||||
Propecia/6
|
||||
propyl
|
||||
ProQuest/6
|
||||
Pseudomonas/7
|
||||
PubMed/6
|
||||
Qaeda/6
|
||||
Quicklinks/6
|
||||
QuickList/6
|
||||
QuickMix/6
|
||||
QuickTime/6
|
||||
Qwest/6
|
||||
Rebecca/6
|
||||
Reddit/6
|
||||
reductase/6
|
||||
RePEc/6
|
||||
RetCode/6
|
||||
Rhode/6
|
||||
ringtone/7
|
||||
Rolex/6
|
||||
Rolex/6
|
||||
rotatable
|
||||
rotatably
|
||||
Routledge/6
|
||||
Rumsfeld/6
|
||||
Saccharomyces/6
|
||||
Sams/6
|
||||
Samsung/6
|
||||
sapiens
|
||||
Sarbanes/6
|
||||
sativa
|
||||
savoir
|
||||
sbjct
|
||||
Schwarz/6
|
||||
Scopus/6
|
||||
Screenonline/6
|
||||
screensaver
|
||||
screenshot/7
|
||||
searchable
|
||||
Secunia/6
|
||||
seleccionar
|
||||
Sera
|
||||
serine
|
||||
setcmykcolor/7
|
||||
sgml
|
||||
SharePoint/6
|
||||
Sharma/6
|
||||
shemale/30
|
||||
Shen/6
|
||||
Shenzhen/6
|
||||
Shopzilla/6
|
||||
showtime/7
|
||||
Shrek/6
|
||||
Sigmer/7
|
||||
signalling
|
||||
signup/7
|
||||
Simpson/7
|
||||
Simpy/6
|
||||
Singh/6
|
||||
Sion
|
||||
sitemap/7
|
||||
sizeof
|
||||
Skype/6
|
||||
Slashdot/6
|
||||
slideshow/7
|
||||
smartphone/7
|
||||
SMEs/6
|
||||
SNPs/6
|
||||
Solaris/6
|
||||
Sony/6
|
||||
Sparc
|
||||
Spurl/6
|
||||
spyware/6
|
||||
spyware/7
|
||||
Starbucks/6
|
||||
stent/7
|
||||
struct/7
|
||||
StumbleUpon/6
|
||||
substituent/7
|
||||
Sunderland/6
|
||||
Suomi/6
|
||||
Svenska
|
||||
Swindon/6
|
||||
Symantec/6
|
||||
Symbian/6
|
||||
synthase/7
|
||||
sysop/7
|
||||
tagline/7
|
||||
Taliban/6
|
||||
taxa
|
||||
taxon
|
||||
Technic/7
|
||||
techno
|
||||
technol
|
||||
Technorati/6
|
||||
TechRepublic/6
|
||||
telecom/7
|
||||
telecom/7
|
||||
Tesco/6
|
||||
thaliana
|
||||
therebetween
|
||||
timeline/7
|
||||
TimesSelect/6
|
||||
Timor/6
|
||||
titre/7
|
||||
Topix/6
|
||||
trackback/7
|
||||
Tramadol/6
|
||||
Tramadol/6
|
||||
transcriptional
|
||||
transfected
|
||||
transgenic
|
||||
travelling
|
||||
Treo/6
|
||||
TripAdvisor/6
|
||||
tRNA/6
|
||||
TrustPass/6
|
||||
Tuto/6
|
||||
TWiki/6
|
||||
TWiki/6
|
||||
Uitleg/6
|
||||
Ultram/6
|
||||
undef
|
||||
UniProt/6
|
||||
URLs
|
||||
usergroup/7
|
||||
username/7
|
||||
username/7
|
||||
userpic/6
|
||||
util/7
|
||||
Utvid/6
|
||||
valitsemalla
|
||||
Valium/6
|
||||
Vandoeuvre/6
|
||||
vBulletin/6
|
||||
Vdata/6
|
||||
Verizon/6
|
||||
Verizon/6
|
||||
vertebrata
|
||||
vertices
|
||||
Vgroup/6
|
||||
vhosts
|
||||
Viagra/6
|
||||
Vicodin/6
|
||||
Virol/6
|
||||
Vodafone/6
|
||||
VoIP
|
||||
Warcraft/6
|
||||
Warwickshire/6
|
||||
Waterford/6
|
||||
WebBoard
|
||||
webcam/7
|
||||
webcam/7
|
||||
webcast/7
|
||||
webcast/7
|
||||
webdesign/7
|
||||
weblog/7
|
||||
weblog/7
|
||||
webpage/7
|
||||
Webshots/6
|
||||
webshots/7
|
||||
WebSphere/6
|
||||
Welch/6
|
||||
whitepaper/7
|
||||
widescreen/7
|
||||
WiFi/6
|
||||
wiki/7
|
||||
wiki/7
|
||||
Wikibooks/6
|
||||
Wikimedia/6
|
||||
Wikinews/6
|
||||
WikiPatents/6
|
||||
Wikipedia/6
|
||||
Wikiquote/6
|
||||
Wikisource/6
|
||||
Wiktionary/6
|
||||
wildcard/7
|
||||
Wiltshire/6
|
||||
wishlist/7
|
||||
wishlist/7
|
||||
WordPress/6
|
||||
workflow/7
|
||||
workflow/7
|
||||
WorldCat/6
|
||||
Xanax/6
|
||||
Xbox/6
|
||||
YouTube/6
|
||||
ZDNet/6
|
||||
Zhang/6
|
||||
Zhao/6
|
||||
Zhou/6
|
||||
Ziff/6
|
||||
Zope/6
|
||||
zShops/6
|
||||
Zune/6
|
||||
Zurich/6
|
|
@ -0,0 +1,34 @@
|
|||
#!/usr/bin/perl
#
# dupe-dictionary.pl
#
# This will find all duplicate words in a myspell/hunspell format .dic file.
# It ignores affix rules, so 'One/ADG' = 'One' = 'One/12'
#
# Returns error if dupes are found

use strict;
use warnings;

my %seen;
my $bad = 0;

print "Duplicated entries:\n";

while (<>) {
    my $key = (split /([\n\/])/)[0];
    if (($key ne "") && (exists $seen{$key})) {
        $seen{$key}++;
        print "$key\n";
        # print ord($key);
        $bad++;
    } else {
        $seen{$key} = 1;
    }
}

if ($bad == 0) {
    print "None!\n";
} else {
    die "Duplicates found!";
}
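For a quick check from the shell, the same idea (compare only the text before the first slash) can be approximated with standard tools. This is a sketch under stated assumptions, not a replacement for dupe-dictionary.pl: the function name `list_dupes` is hypothetical, it prints each duplicated word once rather than once per repeat, and it does not report failure through its exit status the way the Perl script does.

```shell
#!/bin/bash
# Hypothetical helper (not part of the commit): list words that occur more
# than once in a .dic-style word list, ignoring anything after the first "/".
# Reads the files given as arguments, or stdin when none are given.
list_dupes() {
    cut -d/ -f1 "$@" |   # keep the word, drop the affix flags
        grep -v '^$' |   # ignore empty lines, as the Perl script does
        sort |
        uniq -d          # print one copy of every repeated word
}
```

Running `list_dupes merged-list-sorted` would then print the repeated words, if any; an empty result means no duplicates were seen.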
|
File diff suppressed because it is too large
|
@ -0,0 +1,68 @@
|
|||
#!/bin/bash
#
# merge-dictionaries
# 15/Apr/2010, Matt Caywood (caywood@gmail.com)

# input files:
CHROMIUM_START=chromium_en_US.dic_delta
CHROMIUM_DIFF=upstream-chromium.diff
CHROMIUM_PATCHED=$CHROMIUM_START-patched
CHROMIUM_AFFIX_CONVERTED=$CHROMIUM_START-affix-converted

HUNSPELL_START=hunspell-en_US-20081205.dic
HUNSPELL_DIFF=upstream-hunspell.diff
HUNSPELL_PATCHED=$HUNSPELL_START-patched
HUNSPELL_PATCHED_STRIPPED=$HUNSPELL_PATCHED-stripped

MOZILLA_START=mozilla-specific.txt

MERGED_SORTED=merged-list-sorted
MERGED_FINISH=en-US.dic

rm -f $CHROMIUM_PATCHED $CHROMIUM_AFFIX_CONVERTED $HUNSPELL_PATCHED $HUNSPELL_PATCHED_STRIPPED $MERGED_SORTED
rm -f $MERGED_FINISH

# Patch Chromium ($CHROMIUM_START --> $CHROMIUM_PATCHED)
echo Patching Chromium dictionary
cp $CHROMIUM_START $CHROMIUM_PATCHED
patch $CHROMIUM_PATCHED $CHROMIUM_DIFF

# Patch Hunspell ($HUNSPELL_START --> $HUNSPELL_PATCHED)
echo Patching Hunspell dictionary
cp $HUNSPELL_START $HUNSPELL_PATCHED
patch $HUNSPELL_PATCHED $HUNSPELL_DIFF

# Chromium's dictionary uses numeric shortcuts from en-US.aff, so that /7 stands in for /MS etc.
# We need to replace these with the full alphabetic affix rules.
#
# This line only converts the 5 rules (of over 800!) that they are currently using.
# If more are added in the future, those affixes will need to be converted too, or they will not be handled.

echo Updating Chromium affixes
sed -e 's/6/M/g;s/7/MS/g;s/12/U/g;s/30/MS\!/g;s/251/\!/g' $CHROMIUM_PATCHED > $CHROMIUM_AFFIX_CONVERTED

# To check that the conversion was correct, search $CHROMIUM_AFFIX_CONVERTED for any digits left over after conversion.

if grep '[0-9]' "$CHROMIUM_AFFIX_CONVERTED"; then
    # bash has no 'warn' builtin; report on stderr instead
    printf 'Some affix rules may not have been converted\n\n' >&2
fi

# Strip old word count (first line) from $HUNSPELL_PATCHED
sed '1d' $HUNSPELL_PATCHED > $HUNSPELL_PATCHED_STRIPPED

# Combine dictionaries and sort
echo Combining dictionaries
sort $CHROMIUM_AFFIX_CONVERTED $HUNSPELL_PATCHED_STRIPPED $MOZILLA_START > $MERGED_SORTED

# Display any dupes.
perl dupe-dictionary.pl $MERGED_SORTED

# If that completed OK, add line count
if [ "$?" = "0" ]; then
    linecount=`cat $MERGED_SORTED | wc -l`
    echo Adding line count $linecount
    echo $linecount | cat - $MERGED_SORTED > $MERGED_FINISH
fi

# Clean up
rm -f $CHROMIUM_PATCHED $CHROMIUM_AFFIX_CONVERTED $HUNSPELL_PATCHED $HUNSPELL_PATCHED_STRIPPED $MERGED_SORTED
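The affix conversion in merge-dictionaries substitutes digits wherever they appear on a line, so it relies on no word containing a digit. A stricter variant, sketched below, rewrites only the flag field after the final slash. The mapping (6 to M, 7 to MS, 12 to U, 30 to MS!, 251 to !) is taken from the script's own sed expression; the function names `convert_affixes` and `leftover_numeric_flags` are hypothetical and are not part of the commit.

```shell
#!/bin/bash
# Hypothetical helper: convert Chromium's numeric affix aliases to alphabetic
# flags, but only when the number is the whole flag field at the end of the
# line, so a digit inside a word is left untouched.
convert_affixes() {
    sed -E \
        -e 's#/251$#/!#' \
        -e 's#/30$#/MS!#' \
        -e 's#/12$#/U#' \
        -e 's#/7$#/MS#' \
        -e 's#/6$#/M#'
}

# Hypothetical helper: print entries whose flag field still contains a digit,
# i.e. aliases that the mapping above does not cover.
leftover_numeric_flags() {
    grep -E '/[^/]*[0-9][^/]*$' || true
}
```

A possible use would be `convert_affixes < "$CHROMIUM_PATCHED" > "$CHROMIUM_AFFIX_CONVERTED"`, followed by `leftover_numeric_flags < "$CHROMIUM_AFFIX_CONVERTED"` to list anything that still needs a rule. Anchoring on the trailing flag field means an unknown alias such as `/101` is reported rather than partially rewritten.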
|
|
@ -0,0 +1,37 @@
|
|||
Bespin/M
|
||||
Bonsai/M
|
||||
Bugzilla/M
|
||||
Camino/M
|
||||
ChatZilla/M
|
||||
Composer/M
|
||||
Fennec/M
|
||||
Firefox/M
|
||||
Flock/M
|
||||
Gecko/M
|
||||
JavaScript/M
|
||||
Lightning/M
|
||||
Mozilla/M
|
||||
Necko/M
|
||||
Netscape/M
|
||||
NSPR/M
|
||||
NSS/M
|
||||
Penelope/M
|
||||
Prism/M
|
||||
Rhino/M
|
||||
SeaMonkey/M
|
||||
Snowl/M
|
||||
Songbird/M
|
||||
Sunbird/M
|
||||
SpiderMonkey/M
|
||||
Tamarin/M
|
||||
Thunderbird/M
|
||||
Tinderbox/M
|
||||
Ubiquity/M
|
||||
Venkman/M
|
||||
Weave/M
|
||||
XBL/M
|
||||
XPCOM/M
|
||||
XPConnect/M
|
||||
XPInstall/M
|
||||
XUL/M
|
||||
XULRunner/M
|
|
@ -0,0 +1,696 @@
|
|||
--- chromium_en_US.dic_delta 2009-10-01 01:04:06.000000000 -0700
|
||||
+++ edited_chromium_en_US.dic_delta 2009-10-01 01:01:32.000000000 -0700
|
||||
@@ -1,674 +1,332 @@
|
||||
Abbas/6
|
||||
AbeBooks
|
||||
-ACAD/m
|
||||
-AcDbEntity
|
||||
acetyl
|
||||
-acknowledgement
|
||||
-acknowledgement/7
|
||||
actin
|
||||
ActiveX/6
|
||||
Acura/7
|
||||
acyl
|
||||
AddThis/6
|
||||
-adipex
|
||||
-admin/7
|
||||
-ageing
|
||||
Agilent
|
||||
agonist
|
||||
-ajax
|
||||
Alibaba
|
||||
-aliphatic
|
||||
alkoxy
|
||||
-altitudeMode
|
||||
-aluminium
|
||||
-Amapedia
|
||||
-ambien
|
||||
-america
|
||||
-american
|
||||
-amine
|
||||
-analyse
|
||||
-analysed
|
||||
-Andhra
|
||||
-Anglais
|
||||
-anglais
|
||||
anime
|
||||
antisense
|
||||
antivirus/7
|
||||
API/7
|
||||
-apoptoses
|
||||
apoptosis
|
||||
-app/7
|
||||
Arabidopsis
|
||||
-Arbre/6
|
||||
-arg/7
|
||||
Arial/6
|
||||
arXiv/6
|
||||
aryl/7
|
||||
Athlon/7
|
||||
ATPase/6
|
||||
aurei
|
||||
aureus
|
||||
-auteur/7
|
||||
-Bacteriol
|
||||
-Bahasa/6
|
||||
Bancorp
|
||||
-Beckham
|
||||
+Beckham/6
|
||||
Belkin/6
|
||||
Bellevue/6
|
||||
benzyl
|
||||
BibSonomy/6
|
||||
BibTeX/6
|
||||
-bio/7
|
||||
-biochem
|
||||
bioinformatic/7
|
||||
-biophys
|
||||
biosyntheses
|
||||
biosynthesis
|
||||
biotech/6
|
||||
BizRate/6
|
||||
BlackBerry/6
|
||||
BlinkList/6
|
||||
Bloglines/6
|
||||
blogroll/7
|
||||
-blonde/7
|
||||
Bloomberg/6
|
||||
blowjob/30
|
||||
Bluetooth/6
|
||||
-Bollywood/6
|
||||
-bool/7
|
||||
-Brabois/6
|
||||
-Bracknell
|
||||
-Brasil/6
|
||||
-Buenos
|
||||
Burkina/6
|
||||
Caicos/6
|
||||
Cambridgeshire/6
|
||||
-cancelled
|
||||
-cancelling
|
||||
carboxylic
|
||||
CareerBuilder/6
|
||||
-carisoprodol
|
||||
Carlisle/6
|
||||
-casa
|
||||
-Casio/6
|
||||
Cassini/6
|
||||
cDNA
|
||||
celebs
|
||||
-centro/6
|
||||
+Centro/6
|
||||
cerevisiae/7
|
||||
-charset/7
|
||||
Chatham/6
|
||||
-ChemPort/6
|
||||
-Cheney/6
|
||||
-Chennai/6
|
||||
-chloro
|
||||
Choi/6
|
||||
-Cialis/6
|
||||
Cingular/6
|
||||
-Cisco/6
|
||||
-Citebase/6
|
||||
-CiteULike/6
|
||||
Citysearch/6
|
||||
Clarkson/6
|
||||
codec/7
|
||||
codon/7
|
||||
ColdFusion/6
|
||||
coli
|
||||
Comcast/6
|
||||
Computerworld/6
|
||||
-conf
|
||||
config/7
|
||||
Connotea/6
|
||||
-const/7
|
||||
-Coulter/6
|
||||
Councillor/7
|
||||
-counselled
|
||||
-counselling
|
||||
-Courriel/6
|
||||
-craigslist/6
|
||||
-crore/7
|
||||
-CrossRef/6
|
||||
-cryonic/7
|
||||
-Ctrl
|
||||
+Craigslist/6
|
||||
+cryonic
|
||||
cultivar/7
|
||||
Cumbria/6
|
||||
cyber
|
||||
-cysteine/7
|
||||
cytokine/7
|
||||
-cytokines/7
|
||||
-Dansk
|
||||
-Darfur/6
|
||||
-Dari/6
|
||||
datasheet/7
|
||||
-datatype/7
|
||||
Daytona/6
|
||||
DealTime/6
|
||||
-Debian
|
||||
dehydrogenase/6
|
||||
Deloitte/6
|
||||
Denton/6
|
||||
-desc
|
||||
Deseret/6
|
||||
designee
|
||||
-Deutsche/6
|
||||
-Deutschland/6
|
||||
-deze/7
|
||||
-Diaz/6
|
||||
-diff/7
|
||||
-diff/7
|
||||
Digg/7
|
||||
-Digi
|
||||
dihydro
|
||||
-Directgov/6
|
||||
DivX/6
|
||||
-Dorset/6
|
||||
downloadable
|
||||
Dreamweaver/6
|
||||
-DSpace/6
|
||||
DVDs
|
||||
Easton/6
|
||||
-eBay/6
|
||||
eBook/7
|
||||
-ECNext/6
|
||||
eCommerce/6
|
||||
-EconPapers/6
|
||||
Elsevier/6
|
||||
-Eminem/6
|
||||
EndNote/6
|
||||
-Endocrinol
|
||||
Engadget/6
|
||||
-Enron/6
|
||||
-Envirofacts/6
|
||||
Epinions/6
|
||||
-Epson/6
|
||||
-Ericsson/6
|
||||
Eukaryota/7
|
||||
-Euro/7
|
||||
-euro/7
|
||||
-eval/7
|
||||
exon/7
|
||||
-expandera/7
|
||||
Expedia/6
|
||||
Facebook/6
|
||||
Faso/6
|
||||
Favorited
|
||||
Feng/6
|
||||
filmography
|
||||
FilmSpot/6
|
||||
financials
|
||||
FindArticles/6
|
||||
FindLaw/6
|
||||
-fioricet/7
|
||||
-Firefox/6
|
||||
-Flickr/6
|
||||
Flickr/6
|
||||
flyer/7
|
||||
Forex/6
|
||||
forma
|
||||
FreeBSD/6
|
||||
-Freeware/6
|
||||
-freeware/7
|
||||
FrontPage/6
|
||||
-fundraiser/7
|
||||
-fundraising
|
||||
fundraising
|
||||
GaAs
|
||||
-GameBase/6
|
||||
GameCube/6
|
||||
GameFAQs/6
|
||||
GameSpot/6
|
||||
Garmin/6
|
||||
gastroenterology
|
||||
GenBank/6
|
||||
-GeneID/6
|
||||
-genomic
|
||||
-genomic/7
|
||||
-gigabit/7
|
||||
-Giuliani/6
|
||||
-globalization
|
||||
Gloucestershire/6
|
||||
GmbH
|
||||
-Google/6
|
||||
-GrainGenes/6
|
||||
grande
|
||||
-grantor/7
|
||||
-grey
|
||||
-Griffiths/7
|
||||
+Griffiths
|
||||
guestbook/7
|
||||
Guildford/6
|
||||
Hamas/6
|
||||
handheld/7
|
||||
Hassan/6
|
||||
-healthcare
|
||||
-hentai
|
||||
+hentai/251
|
||||
Hertfordshire/6
|
||||
-hexane/7
|
||||
-Hezbollah/6
|
||||
-HighBeam/6
|
||||
Hillsborough/6
|
||||
holdem
|
||||
-hotline/7
|
||||
Hotmail/6
|
||||
Houghton/6
|
||||
-howto/7
|
||||
-html
|
||||
-httpd/6
|
||||
-hydrocodone/7
|
||||
-IETF/6
|
||||
IMDb/6
|
||||
IMDbPro/6
|
||||
-IMDbTV/6
|
||||
-Immunol
|
||||
inbox/6
|
||||
-indices
|
||||
-indie
|
||||
-Infotrieve/6
|
||||
-Ingenta/6
|
||||
-IngentaConnect/6
|
||||
-init/7
|
||||
inkjet/7
|
||||
-InterPro/6
|
||||
-Intl
|
||||
-intl
|
||||
-intranet/7
|
||||
iPhone/6
|
||||
-iPod/6
|
||||
-IPOs
|
||||
+IPO/7
|
||||
ischemia
|
||||
ischemic
|
||||
-Italia
|
||||
-Italiano
|
||||
-iTunes/6
|
||||
Ivoire
|
||||
-Japonica/6
|
||||
-Jelsoft/6
|
||||
-Jiang
|
||||
-Joomla/6
|
||||
-jpeg
|
||||
-judgement
|
||||
-Juni/6
|
||||
+JPEG/7
|
||||
Karnataka/6
|
||||
kbps
|
||||
Kerala/6
|
||||
-keygen
|
||||
Kijiji/6
|
||||
kinase
|
||||
Kitts/6
|
||||
-Klicka/6
|
||||
-Klicken/6
|
||||
-KnowledgeStorm/6
|
||||
Kosovo/6
|
||||
Kuala/6
|
||||
labelled/12
|
||||
Lakers/6
|
||||
-lang
|
||||
-langue/7
|
||||
Lanka/6
|
||||
-learnt
|
||||
-LegCo/6
|
||||
Leica/6
|
||||
-Lett/6
|
||||
-Levitra/6
|
||||
LexisNexis/6
|
||||
Lexmark/6
|
||||
-Libros/6
|
||||
-licence//101
|
||||
-licencee/7
|
||||
-licences/14
|
||||
-licencing/14
|
||||
-licencor/7
|
||||
Lincolnshire/6
|
||||
LinkedIn/6
|
||||
-Listmania/6
|
||||
LiveJournal/6
|
||||
-localhost
|
||||
-localisation/7
|
||||
-locator/7
|
||||
Logitech/6
|
||||
LookSmart/6
|
||||
Lycos/6
|
||||
Macau/6
|
||||
Macromedia/6
|
||||
-mailto
|
||||
mammalia
|
||||
manga
|
||||
-Manga/6
|
||||
MapQuest/6
|
||||
-MasterCard/6
|
||||
MathML/6
|
||||
Mbps
|
||||
McAfee/6
|
||||
McCann/6
|
||||
-MediaWiki/6
|
||||
Medline/6
|
||||
meetup/7
|
||||
-memberlist/7
|
||||
mesothelioma/6
|
||||
Metacafe/6
|
||||
-metadata
|
||||
-metadata/7
|
||||
+metadata/6
|
||||
meth
|
||||
methoxy
|
||||
-mexico
|
||||
-mgmt/7
|
||||
-microbiol
|
||||
-milf
|
||||
-mins
|
||||
-mitochondria
|
||||
-mitochondrial
|
||||
-mmol
|
||||
-modelling
|
||||
-modelling
|
||||
-Mohammed/6
|
||||
-Motorola/6
|
||||
-Motorsport/7
|
||||
-Mozilla/6
|
||||
-mpeg
|
||||
+modeller/7
|
||||
+modelling/7
|
||||
+motorsport/7
|
||||
+MPEG/7
|
||||
mRNA
|
||||
-msgid
|
||||
-msgs
|
||||
-msgstr
|
||||
multicast
|
||||
-Mumbai/6
|
||||
murine
|
||||
musculus
|
||||
-Muze/6
|
||||
-MyCareerBuilder/6
|
||||
mySimon
|
||||
-MySpace/6
|
||||
-Myspace/6
|
||||
MySQL/6
|
||||
-MyYahoo/6
|
||||
Nadu/6
|
||||
-namespace
|
||||
namespace/7
|
||||
nano
|
||||
-nanotechnology
|
||||
Napa/6
|
||||
Naruto/6
|
||||
-Nasdaq/6
|
||||
-Nederland/7
|
||||
NetBSD/6
|
||||
-Neurol/6
|
||||
-neurosci
|
||||
-neuroscience/7
|
||||
Newsvine/6
|
||||
newswires
|
||||
NexTag/6
|
||||
Nextel/6
|
||||
-NGOs/6
|
||||
+NGO/7
|
||||
nitro
|
||||
Niue/6
|
||||
-Nokia/6
|
||||
-Norsk/6
|
||||
Nortel/6
|
||||
-nowrap
|
||||
-Obama/6
|
||||
-offence
|
||||
-offences
|
||||
-offline
|
||||
-offline
|
||||
offsite
|
||||
-Oleh/6
|
||||
oligonucleotide
|
||||
onsite
|
||||
OpenBSD/6
|
||||
-opensource
|
||||
-OpenURL/6
|
||||
-OpenView/6
|
||||
-organisation
|
||||
-organisational
|
||||
-organisations
|
||||
-organised
|
||||
Oryza/6
|
||||
oxidase
|
||||
Oxley/6
|
||||
-Palau/6
|
||||
-Panasonic/6
|
||||
Papua/6
|
||||
-param
|
||||
-PARC
|
||||
-PatentStorm/6
|
||||
-Pathol/6
|
||||
-PDAs
|
||||
-PDFs
|
||||
-permalink
|
||||
+PDA/7
|
||||
+PDF/7
|
||||
permalink/7
|
||||
permittee
|
||||
Peterborough/6
|
||||
-Pfam/6
|
||||
-pgSQL
|
||||
-pharma
|
||||
-pharmacol
|
||||
-phentermine
|
||||
-Phentermine/6
|
||||
Philips/6
|
||||
-phonephone
|
||||
-phosphorylation
|
||||
-Photobucket/6
|
||||
-Photoshop/6
|
||||
-phpBB
|
||||
-physiol
|
||||
-pics
|
||||
-pics
|
||||
-pkgsrc
|
||||
-Placemark/7
|
||||
-playlist/7
|
||||
+Photoshop/7
|
||||
playlist/7
|
||||
-PlayStation/6
|
||||
-plugin/7
|
||||
-plugin/7
|
||||
-Polska
|
||||
-Polski
|
||||
-poly
|
||||
-poly
|
||||
polynucleotide/7
|
||||
-polypeptide/7
|
||||
popup/7
|
||||
-portlet/7
|
||||
-portlet/7
|
||||
PostgreSQL/6
|
||||
PostScript/6
|
||||
Poynter/6
|
||||
-prev
|
||||
-prev
|
||||
Prix
|
||||
PRNewswire/6
|
||||
-proc
|
||||
-proc
|
||||
-prog
|
||||
-Propecia/6
|
||||
propyl
|
||||
ProQuest/6
|
||||
Pseudomonas/7
|
||||
PubMed/6
|
||||
Qaeda/6
|
||||
-Quicklinks/6
|
||||
QuickList/6
|
||||
-QuickMix/6
|
||||
QuickTime/6
|
||||
Qwest/6
|
||||
Rebecca/6
|
||||
Reddit/6
|
||||
reductase/6
|
||||
-RePEc/6
|
||||
-RetCode/6
|
||||
-Rhode/6
|
||||
ringtone/7
|
||||
-Rolex/6
|
||||
-Rolex/6
|
||||
rotatable
|
||||
rotatably
|
||||
Routledge/6
|
||||
-Rumsfeld/6
|
||||
Saccharomyces/6
|
||||
-Sams/6
|
||||
-Samsung/6
|
||||
sapiens
|
||||
Sarbanes/6
|
||||
sativa
|
||||
savoir
|
||||
-sbjct
|
||||
Schwarz/6
|
||||
-Scopus/6
|
||||
-Screenonline/6
|
||||
-screensaver
|
||||
+screensaver/7
|
||||
screenshot/7
|
||||
searchable
|
||||
Secunia/6
|
||||
-seleccionar
|
||||
-Sera
|
||||
serine
|
||||
-setcmykcolor/7
|
||||
-sgml
|
||||
SharePoint/6
|
||||
Sharma/6
|
||||
shemale/30
|
||||
Shen/6
|
||||
Shenzhen/6
|
||||
Shopzilla/6
|
||||
-showtime/7
|
||||
-Shrek/6
|
||||
-Sigmer/7
|
||||
signalling
|
||||
signup/7
|
||||
-Simpson/7
|
||||
-Simpy/6
|
||||
-Singh/6
|
||||
Sion
|
||||
sitemap/7
|
||||
-sizeof
|
||||
-Skype/6
|
||||
-Slashdot/6
|
||||
slideshow/7
|
||||
smartphone/7
|
||||
-SMEs/6
|
||||
-SNPs/6
|
||||
+SNP/7
|
||||
Solaris/6
|
||||
-Sony/6
|
||||
Sparc
|
||||
-Spurl/6
|
||||
-spyware/6
|
||||
spyware/7
|
||||
-Starbucks/6
|
||||
stent/7
|
||||
-struct/7
|
||||
StumbleUpon/6
|
||||
substituent/7
|
||||
Sunderland/6
|
||||
-Suomi/6
|
||||
-Svenska
|
||||
Swindon/6
|
||||
Symantec/6
|
||||
Symbian/6
|
||||
synthase/7
|
||||
-sysop/7
|
||||
tagline/7
|
||||
-Taliban/6
|
||||
taxa
|
||||
taxon
|
||||
-Technic/7
|
||||
-techno
|
||||
-technol
|
||||
Technorati/6
|
||||
TechRepublic/6
|
||||
telecom/7
|
||||
-telecom/7
|
||||
Tesco/6
|
||||
thaliana
|
||||
therebetween
|
||||
timeline/7
|
||||
-TimesSelect/6
|
||||
-Timor/6
|
||||
-titre/7
|
||||
-Topix/6
|
||||
trackback/7
|
||||
-Tramadol/6
|
||||
-Tramadol/6
|
||||
-transcriptional
|
||||
-transfected
|
||||
-transgenic
|
||||
-travelling
|
||||
Treo/6
|
||||
TripAdvisor/6
|
||||
tRNA/6
|
||||
-TrustPass/6
|
||||
-Tuto/6
|
||||
-TWiki/6
|
||||
-TWiki/6
|
||||
-Uitleg/6
|
||||
-Ultram/6
|
||||
-undef
|
||||
-UniProt/6
|
||||
URLs
|
||||
-usergroup/7
|
||||
-username/7
|
||||
username/7
|
||||
-userpic/6
|
||||
-util/7
|
||||
-Utvid/6
|
||||
-valitsemalla
|
||||
-Valium/6
|
||||
-Vandoeuvre/6
|
||||
-vBulletin/6
|
||||
-Vdata/6
|
||||
-Verizon/6
|
||||
-Verizon/6
|
||||
vertebrata
|
||||
-vertices
|
||||
-Vgroup/6
|
||||
-vhosts
|
||||
-Viagra/6
|
||||
Vicodin/6
|
||||
-Virol/6
|
||||
Vodafone/6
|
||||
VoIP
|
||||
Warcraft/6
|
||||
Warwickshire/6
|
||||
-Waterford/6
|
||||
-WebBoard
|
||||
webcam/7
|
||||
-webcam/7
|
||||
-webcast/7
|
||||
webcast/7
|
||||
webdesign/7
|
||||
weblog/7
|
||||
-weblog/7
|
||||
webpage/7
|
||||
-Webshots/6
|
||||
-webshots/7
|
||||
WebSphere/6
|
||||
-Welch/6
|
||||
whitepaper/7
|
||||
widescreen/7
|
||||
WiFi/6
|
||||
-wiki/7
|
||||
-wiki/7
|
||||
Wikibooks/6
|
||||
Wikimedia/6
|
||||
Wikinews/6
|
||||
WikiPatents/6
|
||||
-Wikipedia/6
|
||||
Wikiquote/6
|
||||
Wikisource/6
|
||||
Wiktionary/6
|
||||
wildcard/7
|
||||
Wiltshire/6
|
||||
wishlist/7
|
||||
-wishlist/7
|
||||
WordPress/6
|
||||
workflow/7
|
||||
-workflow/7
|
||||
WorldCat/6
|
||||
Xanax/6
|
||||
Xbox/6
|
||||
-YouTube/6
|
||||
ZDNet/6
|
||||
Zhang/6
|
||||
Zhao/6
|
||||
Zhou/6
|
||||
Ziff/6
|
||||
-Zope/6
|
||||
-zShops/6
|
||||
Zune/6
|
||||
-Zurich/6
|
File diff suppressed because it is too large
|
@ -110,17 +110,13 @@ SFX B e able [^aeiou]e
|
|||
SFX L Y 1
|
||||
SFX L 0 ment .
|
||||
|
||||
SFX i N 1
|
||||
SFX i us i us
|
||||
|
||||
REP 90
|
||||
REP 88
|
||||
REP a ei
|
||||
REP ei a
|
||||
REP a ey
|
||||
REP ey a
|
||||
REP ai ie
|
||||
REP ie ai
|
||||
REP alot a_lot
|
||||
REP are air
|
||||
REP are ear
|
||||
REP are eir
|
||||
|
@ -203,4 +199,3 @@ REP ss z
|
|||
REP shun tion
|
||||
REP shun sion
|
||||
REP shun cion
|
||||
REP sitted sat
|
||||
|
|
File diff suppressed because it is too large
|
@ -1,445 +0,0 @@
|
|||
--- en-US.aff Sat Nov 24 21:50:56 2007
|
||||
+++ en-US.aff Sat Nov 29 11:02:17 2008
|
||||
@@ -109,15 +109,19 @@ SFX B e able [^aeiou]e
|
||||
|
||||
SFX L Y 1
|
||||
SFX L 0 ment .
|
||||
|
||||
-REP 88
|
||||
+SFX i N 1
|
||||
+SFX i us i us
|
||||
+
|
||||
+REP 90
|
||||
REP a ei
|
||||
REP ei a
|
||||
REP a ey
|
||||
REP ey a
|
||||
REP ai ie
|
||||
REP ie ai
|
||||
+REP alot a_lot
|
||||
REP are air
|
||||
REP are ear
|
||||
REP are eir
|
||||
REP air are
|
||||
@@ -198,4 +202,5 @@ REP z ss
|
||||
REP ss z
|
||||
REP shun tion
|
||||
REP shun sion
|
||||
REP shun cion
|
||||
+REP sitted sat
|
||||
--- en-US.dic Sat Nov 24 21:50:56 2007
|
||||
+++ en-US.dic Sat Nov 29 11:03:25 2008
|
||||
@@ -1426,8 +1426,9 @@ Baotou/M
|
||||
Baptist/SM
|
||||
Baptiste/M
|
||||
Bar/H
|
||||
Barabbas
|
||||
+Barack/M
|
||||
Barb/MR
|
||||
Barbabas/M
|
||||
Barbabra/M
|
||||
Barbadian/SM
|
||||
@@ -1835,8 +1836,9 @@ Bible/MS
|
||||
Bic/M
|
||||
Biddie/M
|
||||
Biddle
|
||||
Biddy/M
|
||||
+Biden/MS
|
||||
Bidget/M
|
||||
Bierce
|
||||
Bigfoot/M
|
||||
Biggles/M
|
||||
@@ -2300,8 +2302,9 @@ Budweiser/M
|
||||
Buffalo/M
|
||||
Buffy/M
|
||||
Buford/M
|
||||
Bugatti/M
|
||||
+Bugzilla/M
|
||||
Buick/M
|
||||
Buiron/M
|
||||
Bujumbura/M
|
||||
Bukhara
|
||||
@@ -2503,8 +2506,9 @@ Cami/M
|
||||
Camila/M
|
||||
Camile/M
|
||||
Camilla/M
|
||||
Camille/M
|
||||
+Camino/M
|
||||
Cammi/M
|
||||
Cammie/M
|
||||
Cammy/M
|
||||
Camoens/M
|
||||
@@ -2942,8 +2946,9 @@ Chateaubriand/M
|
||||
Chattahoochee/M
|
||||
Chattanooga/M
|
||||
Chatterley/M
|
||||
Chatterton
|
||||
+ChatZilla/M
|
||||
Chaucer/M
|
||||
Chaunce/M
|
||||
Chauncey/M
|
||||
Chautauqua
|
||||
@@ -6325,8 +6330,9 @@ HS
|
||||
HST
|
||||
HT
|
||||
HTML/M
|
||||
HTTP
|
||||
+HTTPS
|
||||
HUD/M
|
||||
Ha
|
||||
Haas/M
|
||||
Habakkuk
|
||||
@@ -6418,9 +6424,9 @@ Haney/M
|
||||
Hangul/M
|
||||
Hangzhou/M
|
||||
Hank/M
|
||||
Hanna/M
|
||||
-Hannah
|
||||
+Hannah/M
|
||||
Hanni/SM
|
||||
Hannibal/M
|
||||
Hannie/M
|
||||
Hanny/M
|
||||
@@ -6945,8 +6951,9 @@ Hun/SM
|
||||
Hunfredo/M
|
||||
Hung
|
||||
Hungarian/SM
|
||||
Hungary/M
|
||||
+Hunspell/M
|
||||
Hunt/R
|
||||
Hunter/M
|
||||
Huntington/M
|
||||
Huntlee/M
|
||||
@@ -7358,9 +7365,9 @@ Jamaican/SM
|
||||
Jamal/M
|
||||
Jamar/M
|
||||
Jame/SM
|
||||
Jamel/M
|
||||
-James/M
|
||||
+James
|
||||
Jameson
|
||||
Jamestown
|
||||
Jamesy/M
|
||||
Jamey/M
|
||||
@@ -7377,9 +7384,9 @@ Jana/M
|
||||
Janacek/M
|
||||
Janaya/M
|
||||
Janaye/M
|
||||
Jandy/M
|
||||
-Jane
|
||||
+Jane/M
|
||||
Janean/M
|
||||
Janeczka/M
|
||||
Janeen/M
|
||||
Janek/M
|
||||
@@ -7390,9 +7397,9 @@ Janella/M
|
||||
Janelle/M
|
||||
Janene/M
|
||||
Janenna/M
|
||||
Janessa/M
|
||||
-Janet
|
||||
+Janet/M
|
||||
Janeta/M
|
||||
Janetta/M
|
||||
Janette/M
|
||||
Janeva/M
|
||||
@@ -9504,8 +9511,9 @@ Mandela
|
||||
Mandelbrot
|
||||
Mandi/M
|
||||
Mandie/M
|
||||
Mandingo
|
||||
+Mandriva/M
|
||||
Mandy/M
|
||||
Manet
|
||||
Manfred/M
|
||||
Manhattan/SM
|
||||
@@ -10619,8 +10627,9 @@ Myrtie/M
|
||||
Myrtle/M
|
||||
Myrvyn/M
|
||||
Myrwyn/M
|
||||
Mysore
|
||||
+MySpell/M
|
||||
Myst/M
|
||||
N'Djamena
|
||||
N/MD
|
||||
NAACP
|
||||
@@ -11159,8 +11168,9 @@ Oates
|
||||
Oaxaca/M
|
||||
Ob/MD
|
||||
Obadiah
|
||||
Obadias/M
|
||||
+Obama/MS
|
||||
Obed/M
|
||||
Obediah/M
|
||||
Oberlin/M
|
||||
Oberon
|
||||
@@ -11501,8 +11511,9 @@ Palestine/M
|
||||
Palestinian/SM
|
||||
Palestrina
|
||||
Paley
|
||||
Palikir/M
|
||||
+Palin/MS
|
||||
Palisades/M
|
||||
Pall/M
|
||||
Palladio
|
||||
Palm/MR
|
||||
@@ -13240,8 +13251,9 @@ Scythia
|
||||
Scythian
|
||||
Se/MH
|
||||
Seaborg
|
||||
Seagram/M
|
||||
+SeaMonkey/M
|
||||
Seamus/M
|
||||
Sean/M
|
||||
Seana/M
|
||||
Sears/M
|
||||
@@ -13979,8 +13991,9 @@ Sumner/M
|
||||
Sumter
|
||||
Sun/SM
|
||||
Sunbeam/M
|
||||
Sunbelt/M
|
||||
+Sunbird/M
|
||||
Sundanese/M
|
||||
Sundas
|
||||
Sunday/MS
|
||||
Sung
|
||||
@@ -14424,9 +14437,9 @@ Thieu/M
|
||||
Thimbu
|
||||
Thimphu
|
||||
Thom/M
|
||||
Thoma/SM
|
||||
-Thomas/M
|
||||
+Thomas
|
||||
Thomasa/M
|
||||
Thomasin/M
|
||||
Thomasina/M
|
||||
Thomasine/M
|
||||
@@ -14895,9 +14908,8 @@ Unitarian/MS
|
||||
Unitarianism/MS
|
||||
Unitas/M
|
||||
Unix/S
|
||||
Unukalhai/M
|
||||
-Ununtu/M
|
||||
Upanishads
|
||||
Updike
|
||||
Upjohn/M
|
||||
Upton/M
|
||||
@@ -15437,8 +15449,9 @@ Wiemar/M
|
||||
Wiesel/M
|
||||
Wiesenthal/M
|
||||
Wiggins
|
||||
Wigner/M
|
||||
+Wikipedia/M
|
||||
Wilberforce
|
||||
Wilbert/M
|
||||
Wilbur/M
|
||||
Wilburn/M
|
||||
@@ -17628,8 +17641,9 @@ arcana
|
||||
arcane/PY
|
||||
arch/PZTGVMDRSY
|
||||
archaeological/Y
|
||||
archaeologist/SM
|
||||
+archaeology/M
|
||||
archaic
|
||||
archaically
|
||||
archaism/MS
|
||||
archaist/MS
|
||||
@@ -17642,8 +17656,10 @@ archdiocesan
|
||||
archdiocese/SM
|
||||
archduchess/MS
|
||||
archduke/MS
|
||||
archenemy/SM
|
||||
+archeological/Y
|
||||
+archeologist/SM
|
||||
archeology/M
|
||||
archeopteryx
|
||||
archer/M
|
||||
archery/M
|
||||
@@ -18309,8 +18325,9 @@ awning/M
|
||||
awoke
|
||||
awoken
|
||||
awry
|
||||
ax/MDSG
|
||||
+axe/M
|
||||
axehead/S
|
||||
axeman
|
||||
axial/Y
|
||||
axillary
|
||||
@@ -20700,10 +20717,9 @@ cachet/MS
|
||||
cackle/MZGDRS
|
||||
cackler/M
|
||||
cacophonous
|
||||
cacophony/SM
|
||||
-cacti
|
||||
-cactus/M
|
||||
+cactus/Mi
|
||||
cad/SM
|
||||
cadaver/SM
|
||||
cadaverous
|
||||
caddish/YP
|
||||
@@ -25527,9 +25543,10 @@ dialectic/SM
|
||||
dialectical/Y
|
||||
dialectics/M
|
||||
dialer
|
||||
dialing/S
|
||||
-dialog/SM
|
||||
+dialog/SMGD
|
||||
+dialogue/SMRGD
|
||||
dialysis/M
|
||||
dialyzes
|
||||
diam
|
||||
diamante
|
||||
@@ -27384,8 +27401,9 @@ encumber/EGSD
|
||||
encumbered/U
|
||||
encumbrance/SM
|
||||
ency
|
||||
encyclical/SM
|
||||
+encyclopaedia
|
||||
encyclopedia/MS
|
||||
encyclopedic
|
||||
encyst/LSGD
|
||||
encystment/M
|
||||
@@ -29493,11 +29511,10 @@ foamy/RTP
|
||||
fob/SM
|
||||
fobbed
|
||||
fobbing
|
||||
focal/Y
|
||||
-focus's
|
||||
-focus/ADSG
|
||||
-focused/U
|
||||
+focus/ACGRSMBi
|
||||
+focused/UC
|
||||
fodder/SM
|
||||
foe/SM
|
||||
foetid
|
||||
fog's
|
||||
@@ -36323,9 +36340,8 @@ locatable/A
|
||||
locate/EAGNVDS
|
||||
location's/A
|
||||
location/ESM
|
||||
locational
|
||||
-loci
|
||||
lock/MDRSBZG
|
||||
locked/A
|
||||
locker/M
|
||||
locket/MS
|
||||
@@ -36339,9 +36355,9 @@ loco/S
|
||||
locomotion/M
|
||||
locomotive/MS
|
||||
locoweed/SM
|
||||
locum/S
|
||||
-locus/M
|
||||
+locus/Mi
|
||||
locust/SM
|
||||
locution/MS
|
||||
lode/MS
|
||||
lodestar/MS
|
||||
@@ -37405,8 +37421,9 @@ megalomania/M
|
||||
megalomaniac/SM
|
||||
megalopolis/MS
|
||||
megaparsec
|
||||
megaphone/DSMG
|
||||
+megapixel
|
||||
megastar/S
|
||||
megaton/SM
|
||||
megawatt/MS
|
||||
meiosis/M
|
||||
@@ -40140,9 +40157,9 @@ octant
|
||||
octave/MS
|
||||
octavo/MS
|
||||
octet/SM
|
||||
octogenarian/MS
|
||||
-octopus/MS
|
||||
+octopus/MSi
|
||||
ocular/MS
|
||||
oculist/SM
|
||||
odalisque/SM
|
||||
odd/STRYLP
|
||||
@@ -42230,8 +42247,10 @@ philosophical/Y
|
||||
philosophize/ZGDRS
|
||||
philosophizer/M
|
||||
philosophy/SM
|
||||
philter/MS
|
||||
+phish/DGS
|
||||
+phisher/MS
|
||||
phlebitis/M
|
||||
phlebotomy
|
||||
phlegm/M
|
||||
phlegmatic
|
||||
@@ -43400,10 +43419,11 @@ predominant/Y
|
||||
predominate/DSYG
|
||||
preemie/SM
|
||||
preeminence/M
|
||||
preeminent/Y
|
||||
-preempt/GVSD
|
||||
+preempt/GSD
|
||||
preemption/M
|
||||
+preemptive/Y
|
||||
preen/DSG
|
||||
preexist/DGS
|
||||
preexistence/M
|
||||
pref
|
||||
@@ -44994,8 +45014,9 @@ receivables/M
|
||||
receive/DRSZGB
|
||||
received/U
|
||||
receiver/M
|
||||
receivership/M
|
||||
+recency
|
||||
recension
|
||||
recent/YTP
|
||||
recentness/M
|
||||
receptacle/MS
|
||||
@@ -49102,8 +49123,9 @@ spaghetti/M
|
||||
spake
|
||||
spam/S
|
||||
spamblock/S
|
||||
spammed
|
||||
+spammer/MS
|
||||
spamming
|
||||
span/MS
|
||||
spandex/M
|
||||
spandrels
|
||||
@@ -53649,8 +53671,10 @@ unilateralist
|
||||
unimportance
|
||||
unimportant
|
||||
unimpressive
|
||||
uninhibited/Y
|
||||
+uninstall/GSBD
|
||||
+uninstaller/MS
|
||||
uninsured
|
||||
unintellectual
|
||||
unintelligent
|
||||
unintended
|
||||
@@ -53811,8 +53835,10 @@ unstamped
|
||||
unsteady/PTR
|
||||
unstinting/Y
|
||||
unstoppably
|
||||
unstrapping
|
||||
+unsubscribe/DGS
|
||||
+unsubscriber/MS
|
||||
unsubstantial
|
||||
unsubtle
|
||||
unsure/P
|
||||
unsuspecting/Y
|
||||
@@ -54984,9 +55010,10 @@ web/SM
|
||||
webbed
|
||||
webbing/M
|
||||
webfeet
|
||||
webfoot/M
|
||||
-webmaster/S
|
||||
+webmaster/MS
|
||||
+webmistress/S
|
||||
website/SM
|
||||
wed/AS
|
||||
wedded/A
|
||||
wedder
|
||||
@@ -55310,8 +55337,9 @@ wiglet/SM
|
||||
wigwag/SM
|
||||
wigwagged
|
||||
wigwagging
|
||||
wigwam/SM
|
||||
+wiki/SM
|
||||
wild/MRYSTP
|
||||
wildcat/MS
|
||||
wildcatted
|
||||
wildcatter/MS
|