Bug #319778 --> hunspell spell check engine. This is the eventual replacement to myspell. Currently *NPOTB*

thanks to nemeth (the author of hunspell) for making it possible for this to be a part of mozilla.

thanks to Ryan VanderMeulen for sheparding this patch along.

sr=mscott
This commit is contained in:
scott%scott-macgregor.org 2007-07-02 17:29:22 +00:00
Родитель af37c2dc72
Коммит a54331573e
27 изменённых файлов: 80691 добавлений и 0 удалений

Просмотреть файл

@ -0,0 +1,47 @@
# ****** BEGIN LICENSE BLOCK ******
# Version: MPL 1.1/GPL 2.0/LGPL 2.1
#
# The contents of this file are subject to the Mozilla Public License Version
# 1.1 (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# http://www.mozilla.org/MPL/
#
# Software distributed under the License is distributed on an "AS IS" basis,
# WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
# for the specific language governing rights and limitations under the
# License.
#
# The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
# and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
# are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
#
# Contributor(s): David Einstein (deinst@world.std.com)
# László Németh (nemethl@gyorsposta.hu)
# Ryan VanderMeulen (ryanvm@gmail.com)
#
# Alternatively, the contents of this file may be used under the terms of
# either the GNU General Public License Version 2 or later (the "GPL"), or
# the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
# in which case the provisions of the GPL or the LGPL are applicable instead
# of those above. If you wish to allow use of your version of this file only
# under the terms of either the GPL or the LGPL, and not to allow others to
# use your version of this file under the terms of the MPL, indicate your
# decision by deleting the provisions above and replace them with the notice
# and other provisions required by the GPL or the LGPL. If you do not delete
# the provisions above, a recipient may use your version of this file under
# the terms of any one of the MPL, the GPL or the LGPL.
#
# ****** END LICENSE BLOCK ******
DEPTH = ../../..
topsrcdir = @top_srcdir@
srcdir = @srcdir@
VPATH = @srcdir@
include $(DEPTH)/config/autoconf.mk
MODULE = hunspell
DIRS = src
include $(topsrcdir)/config/rules.mk

Просмотреть файл

@ -0,0 +1,76 @@
# ****** BEGIN LICENSE BLOCK ******
# Version: MPL 1.1/GPL 2.0/LGPL 2.1
#
# The contents of this file are subject to the Mozilla Public License Version
# 1.1 (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# http://www.mozilla.org/MPL/
#
# Software distributed under the License is distributed on an "AS IS" basis,
# WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
# for the specific language governing rights and limitations under the
# License.
#
# The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
# and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
# are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
#
# Contributor(s): David Einstein (deinst@world.std.com)
# László Németh (nemethl@gyorsposta.hu)
# Ryan VanderMeulen (ryanvm@gmail.com)
#
# Alternatively, the contents of this file may be used under the terms of
# either the GNU General Public License Version 2 or later (the "GPL"), or
# the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
# in which case the provisions of the GPL or the LGPL are applicable instead
# of those above. If you wish to allow use of your version of this file only
# under the terms of either the GPL or the LGPL, and not to allow others to
# use your version of this file under the terms of the MPL, indicate your
# decision by deleting the provisions above and replace them with the notice
# and other provisions required by the GPL or the LGPL. If you do not delete
# the provisions above, a recipient may use your version of this file under
# the terms of any one of the MPL, the GPL or the LGPL.
#
# ****** END LICENSE BLOCK ******
DEPTH = ../../../..
topsrcdir = @top_srcdir@
srcdir = @srcdir@
VPATH = @srcdir@
include $(DEPTH)/config/autoconf.mk
MODULE = hunspell
LIBRARY_NAME = hunspell_s
FORCE_STATIC_LIB = 1
LIBXUL_LIBRARY = 1
REQUIRES = xpcom \
string \
uconv \
unicharutil \
spellchecker \
xulapp \
$(NULL)
CPPSRCS = affentry.cpp \
affixmgr.cpp \
hashmgr.cpp \
suggestmgr.cpp \
csutil.cpp \
hunspell.cpp \
mozHunspell.cpp \
$(NULL)
ifdef MOZ_XUL_APP
CPPSRCS += mozHunspellDirProvider.cpp
endif
EXTRA_DSO_LDOPTS = \
$(LIBS_DIR) \
$(XPCOM_LIBS) \
$(NSPR_LIBS) \
$(MOZ_UNICHARUTIL_LIBS) \
$(NULL)
include $(topsrcdir)/config/rules.mk

Просмотреть файл

@ -0,0 +1,59 @@
******* BEGIN LICENSE BLOCK *******
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
*
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
* László Németh (nemethl@gyorsposta.hu)
* David Einstein (deinst@world.std.com)
* Ryan VanderMeulen (ryanvm@gmail.com)
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
******* END LICENSE BLOCK *******
Hunspell Version: 1.1.6
Hunspell Author: László Németh
MySpell Author: Kevin Hendricks & David Einstein
Hunspell is a spell checker and morphological analyser library. Hunspell
is based on OpenOffice.org's Myspell. Documentation, tests, and examples
are available at http://hunspell.sourceforge.net.
A special thanks and credit goes to Geoff Kuenning, the creator of Ispell.
MySpell's affix algorithms were based on those of Ispell, which should be
noted is copyright Geoff Kuenning et.al. and now available under a BSD-style
license. For more information on Ispell and affix compression in general,
please see: http://lasr.cs.ucla.edu/geoff/ispell.html (Ispell homepage)
An almost complete rewrite of MySpell for use by the Mozilla project was
developed by David Einstein. David was a significant help in improving MySpell.
Special thanks also goes to László Németh, who is the author of the Hungarian
dictionary and who developed and contributed the code to support compound words
in MySpell and fixed numerous problems with the encoding case conversion tables
along with rewriting MySpell as Hunspell and ensuring compatibility with the
Mozilla codebase.

Просмотреть файл

@ -0,0 +1,923 @@
/******* BEGIN LICENSE BLOCK *******
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
*
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
* László Németh (nemethl@gyorsposta.hu)
* David Einstein (deinst@world.std.com)
* Davide Prina
* Giuseppe Modugno
* Gianluca Turconi
* Simon Brouwer
* Noll Janos
* Biro Arpad
* Goldman Eleonora
* Sarlos Tamas
* Bencsath Boldizsar
* Halacsy Peter
* Dvornik Laszlo
* Gefferth Andras
* Nagy Viktor
* Varga Daniel
* Chris Halls
* Rene Engelhard
* Bram Moolenaar
* Dafydd Jones
* Harri Pitkanen
* Andras Timar
* Tor Lillqvist
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
******* END LICENSE BLOCK *******/
#ifndef MOZILLA_CLIENT
#include <cstdlib>
#include <cstring>
#include <cctype>
#include <cstdio>
#else
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>
#endif
#include "affentry.hxx"
#include "csutil.hxx"
#ifndef MOZILLA_CLIENT
#ifndef W32
using namespace std;
#endif
#endif
PfxEntry::PfxEntry(AffixMgr* pmgr, affentry* dp)
{
// register affix manager
pmyMgr = pmgr;
// set up its intial values
aflag = dp->aflag; // flag
strip = dp->strip; // string to strip
appnd = dp->appnd; // string to append
stripl = dp->stripl; // length of strip string
appndl = dp->appndl; // length of append string
numconds = dp->numconds; // number of conditions to match
opts = dp->opts; // cross product flag
// then copy over all of the conditions
memcpy(&conds.base[0],&dp->conds.base[0],SETSIZE*sizeof(conds.base[0]));
next = NULL;
nextne = NULL;
nexteq = NULL;
#ifdef HUNSPELL_EXPERIMENTAL
morphcode = dp->morphcode;
#endif
contclass = dp->contclass;
contclasslen = dp->contclasslen;
}
PfxEntry::~PfxEntry()
{
aflag = 0;
if (appnd) free(appnd);
if (strip) free(strip);
pmyMgr = NULL;
appnd = NULL;
strip = NULL;
if (opts & aeUTF8) {
for (int i = 0; i < 8; i++) {
if (conds.utf8.wchars[i]) free(conds.utf8.wchars[i]);
}
}
#ifdef HUNSPELL_EXPERIMENTAL
if (morphcode && !(opts & aeALIASM)) free(morphcode);
#endif
if (contclass && !(opts & aeALIASF)) free(contclass);
}
// add prefix to this word assuming conditions hold
char * PfxEntry::add(const char * word, int len)
{
char tword[MAXWORDUTF8LEN + 4];
if ((len > stripl) && (len >= numconds) && test_condition(word) &&
(!stripl || (strncmp(word, strip, stripl) == 0)) &&
((MAXWORDUTF8LEN + 4) > (len + appndl - stripl))) {
/* we have a match so add prefix */
char * pp = tword;
if (appndl) {
strcpy(tword,appnd);
pp += appndl;
}
strcpy(pp, (word + stripl));
return mystrdup(tword);
}
return NULL;
}
inline int PfxEntry::test_condition(const char * st)
{
int cond;
unsigned char * cp = (unsigned char *)st;
if (!(opts & aeUTF8)) { // 256-character codepage
for (cond = 0; cond < numconds; cond++) {
if ((conds.base[*cp++] & (1 << cond)) == 0) return 0;
}
} else { // UTF-8 encoding
unsigned short wc;
for (cond = 0; cond < numconds; cond++) {
// a simple 7-bit ASCII character in UTF-8
if ((*cp >> 7) == 0) {
// also check limit (end of word)
if ((!*cp) || ((conds.utf8.ascii[*cp++] & (1 << cond)) == 0)) return 0;
// UTF-8 multibyte character
} else {
// not dot wildcard in rule
if (!conds.utf8.all[cond]) {
if (conds.utf8.neg[cond]) {
u8_u16((w_char *) &wc, 1, (char *) cp);
if (conds.utf8.wchars[cond] &&
flag_bsearch((unsigned short *)conds.utf8.wchars[cond],
wc, (short) conds.utf8.wlen[cond])) return 0;
} else {
if (!conds.utf8.wchars[cond]) return 0;
u8_u16((w_char *) &wc, 1, (char *) cp);
if (!flag_bsearch((unsigned short *)conds.utf8.wchars[cond],
wc, (short)conds.utf8.wlen[cond])) return 0;
}
}
// jump to next UTF-8 character
for(cp++; (*cp & 0xc0) == 0x80; cp++);
}
}
}
return 1;
}
// check if this prefix entry matches
struct hentry * PfxEntry::checkword(const char * word, int len, char in_compound, const FLAG needflag)
{
int tmpl; // length of tmpword
struct hentry * he; // hash entry of root word or NULL
char tmpword[MAXWORDUTF8LEN + 4];
// on entry prefix is 0 length or already matches the beginning of the word.
// So if the remaining root word has positive length
// and if there are enough chars in root word and added back strip chars
// to meet the number of characters conditions, then test it
tmpl = len - appndl;
if ((tmpl > 0) && (tmpl + stripl >= numconds)) {
// generate new root word by removing prefix and adding
// back any characters that would have been stripped
if (stripl) strcpy (tmpword, strip);
strcpy ((tmpword + stripl), (word + appndl));
// now make sure all of the conditions on characters
// are met. Please see the appendix at the end of
// this file for more info on exactly what is being
// tested
// if all conditions are met then check if resulting
// root word in the dictionary
if (test_condition(tmpword)) {
tmpl += stripl;
if ((he = pmyMgr->lookup(tmpword)) != NULL) {
do {
if (TESTAFF(he->astr, aflag, he->alen) &&
// forbid single prefixes with pseudoroot flag
! TESTAFF(contclass, pmyMgr->get_pseudoroot(), contclasslen) &&
// needflag
((!needflag) || TESTAFF(he->astr, needflag, he->alen) ||
(contclass && TESTAFF(contclass, needflag, contclasslen))))
return he;
he = he->next_homonym; // check homonyms
} while (he);
}
// prefix matched but no root word was found
// if aeXPRODUCT is allowed, try again but now
// ross checked combined with a suffix
//if ((opts & aeXPRODUCT) && in_compound) {
if ((opts & aeXPRODUCT)) {
he = pmyMgr->suffix_check(tmpword, tmpl, aeXPRODUCT, (AffEntry *)this, NULL,
0, NULL, FLAG_NULL, needflag, in_compound);
if (he) return he;
}
}
}
return NULL;
}
// check if this prefix entry matches
struct hentry * PfxEntry::check_twosfx(const char * word, int len,
char in_compound, const FLAG needflag)
{
int tmpl; // length of tmpword
struct hentry * he; // hash entry of root word or NULL
char tmpword[MAXWORDUTF8LEN + 4];
// on entry prefix is 0 length or already matches the beginning of the word.
// So if the remaining root word has positive length
// and if there are enough chars in root word and added back strip chars
// to meet the number of characters conditions, then test it
tmpl = len - appndl;
if ((tmpl > 0) && (tmpl + stripl >= numconds)) {
// generate new root word by removing prefix and adding
// back any characters that would have been stripped
if (stripl) strcpy (tmpword, strip);
strcpy ((tmpword + stripl), (word + appndl));
// now make sure all of the conditions on characters
// are met. Please see the appendix at the end of
// this file for more info on exactly what is being
// tested
// if all conditions are met then check if resulting
// root word in the dictionary
if (test_condition(tmpword)) {
tmpl += stripl;
// prefix matched but no root word was found
// if aeXPRODUCT is allowed, try again but now
// cross checked combined with a suffix
if ((opts & aeXPRODUCT) && (in_compound != IN_CPD_BEGIN)) {
he = pmyMgr->suffix_check_twosfx(tmpword, tmpl, aeXPRODUCT, (AffEntry *)this, needflag);
if (he) return he;
}
}
}
return NULL;
}
#ifdef HUNSPELL_EXPERIMENTAL
// check if this prefix entry matches
char * PfxEntry::check_twosfx_morph(const char * word, int len,
char in_compound, const FLAG needflag)
{
int tmpl; // length of tmpword
char tmpword[MAXWORDUTF8LEN + 4];
// on entry prefix is 0 length or already matches the beginning of the word.
// So if the remaining root word has positive length
// and if there are enough chars in root word and added back strip chars
// to meet the number of characters conditions, then test it
tmpl = len - appndl;
if ((tmpl > 0) && (tmpl + stripl >= numconds)) {
// generate new root word by removing prefix and adding
// back any characters that would have been stripped
if (stripl) strcpy (tmpword, strip);
strcpy ((tmpword + stripl), (word + appndl));
// now make sure all of the conditions on characters
// are met. Please see the appendix at the end of
// this file for more info on exactly what is being
// tested
// if all conditions are met then check if resulting
// root word in the dictionary
if (test_condition(tmpword)) {
tmpl += stripl;
// prefix matched but no root word was found
// if aeXPRODUCT is allowed, try again but now
// ross checked combined with a suffix
if ((opts & aeXPRODUCT) && (in_compound != IN_CPD_BEGIN)) {
return pmyMgr->suffix_check_twosfx_morph(tmpword, tmpl,
aeXPRODUCT, (AffEntry *)this, needflag);
}
}
}
return NULL;
}
// check if this prefix entry matches
char * PfxEntry::check_morph(const char * word, int len, char in_compound, const FLAG needflag)
{
int tmpl; // length of tmpword
struct hentry * he; // hash entry of root word or NULL
char tmpword[MAXWORDUTF8LEN + 4];
char result[MAXLNLEN];
char * st;
*result = '\0';
// on entry prefix is 0 length or already matches the beginning of the word.
// So if the remaining root word has positive length
// and if there are enough chars in root word and added back strip chars
// to meet the number of characters conditions, then test it
tmpl = len - appndl;
if ((tmpl > 0) && (tmpl + stripl >= numconds)) {
// generate new root word by removing prefix and adding
// back any characters that would have been stripped
if (stripl) strcpy (tmpword, strip);
strcpy ((tmpword + stripl), (word + appndl));
// now make sure all of the conditions on characters
// are met. Please see the appendix at the end of
// this file for more info on exactly what is being
// tested
// if all conditions are met then check if resulting
// root word in the dictionary
if (test_condition(tmpword)) {
tmpl += stripl;
if ((he = pmyMgr->lookup(tmpword)) != NULL) {
do {
if (TESTAFF(he->astr, aflag, he->alen) &&
// forbid single prefixes with pseudoroot flag
! TESTAFF(contclass, pmyMgr->get_pseudoroot(), contclasslen) &&
// needflag
((!needflag) || TESTAFF(he->astr, needflag, he->alen) ||
(contclass && TESTAFF(contclass, needflag, contclasslen)))) {
if (morphcode) strcat(result, morphcode); else strcat(result,getKey());
if (he->description) {
if ((*(he->description)=='[')||(*(he->description)=='<')) strcat(result,he->word);
strcat(result,he->description);
}
strcat(result, "\n");
}
he = he->next_homonym;
} while (he);
}
// prefix matched but no root word was found
// if aeXPRODUCT is allowed, try again but now
// ross checked combined with a suffix
if ((opts & aeXPRODUCT) && (in_compound != IN_CPD_BEGIN)) {
st = pmyMgr->suffix_check_morph(tmpword, tmpl, aeXPRODUCT, (AffEntry *)this,
FLAG_NULL, needflag);
if (st) {
strcat(result, st);
free(st);
}
}
}
}
if (*result) return mystrdup(result);
return NULL;
}
#endif // END OF HUNSPELL_EXPERIMENTAL CODE
SfxEntry::SfxEntry(AffixMgr * pmgr, affentry* dp)
{
// register affix manager
pmyMgr = pmgr;
// set up its intial values
aflag = dp->aflag; // char flag
strip = dp->strip; // string to strip
appnd = dp->appnd; // string to append
stripl = dp->stripl; // length of strip string
appndl = dp->appndl; // length of append string
numconds = dp->numconds; // number of conditions to match
opts = dp->opts; // cross product flag
// then copy over all of the conditions
memcpy(&conds.base[0],&dp->conds.base[0],SETSIZE*sizeof(conds.base[0]));
rappnd = myrevstrdup(appnd);
#ifdef HUNSPELL_EXPERIMENTAL
morphcode = dp->morphcode;
#endif
contclass = dp->contclass;
contclasslen = dp->contclasslen;
}
SfxEntry::~SfxEntry()
{
aflag = 0;
if (appnd) free(appnd);
if (rappnd) free(rappnd);
if (strip) free(strip);
pmyMgr = NULL;
appnd = NULL;
strip = NULL;
if (opts & aeUTF8) {
for (int i = 0; i < 8; i++) {
if (conds.utf8.wchars[i]) free(conds.utf8.wchars[i]);
}
}
#ifdef HUNSPELL_EXPERIMENTAL
if (morphcode && !(opts & aeALIASM)) free(morphcode);
#endif
if (contclass && !(opts & aeALIASF)) free(contclass);
}
// add suffix to this word assuming conditions hold
char * SfxEntry::add(const char * word, int len)
{
char tword[MAXWORDUTF8LEN + 4];
/* make sure all conditions match */
if ((len > stripl) && (len >= numconds) && test_condition(word + len, word) &&
(!stripl || (strcmp(word + len - stripl, strip) == 0)) &&
((MAXWORDUTF8LEN + 4) > (len + appndl - stripl))) {
/* we have a match so add suffix */
strcpy(tword,word);
if (appndl) {
strcpy(tword + len - stripl, appnd);
} else {
*(tword + len - stripl) = '\0';
}
return mystrdup(tword);
}
return NULL;
}
inline int SfxEntry::test_condition(const char * st, const char * beg)
{
int cond;
unsigned char * cp = (unsigned char *) st;
if (!(opts & aeUTF8)) { // 256-character codepage
// Domolki affix algorithm
for (cond = numconds; --cond >= 0; ) {
if ((conds.base[*--cp] & (1 << cond)) == 0) return 0;
}
} else { // UTF-8 encoding
unsigned short wc;
for (cond = numconds; --cond >= 0; ) {
// go to next character position and check limit
if ((char *) --cp < beg) return 0;
// a simple 7-bit ASCII character in UTF-8
if ((*cp >> 7) == 0) {
if ((conds.utf8.ascii[*cp] & (1 << cond)) == 0) return 0;
// UTF-8 multibyte character
} else {
// go to first character of UTF-8 multibyte character
for (; (*cp & 0xc0) == 0x80; cp--);
// not dot wildcard in rule
if (!conds.utf8.all[cond]) {
if (conds.utf8.neg[cond]) {
u8_u16((w_char *) &wc, 1, (char *) cp);
if (conds.utf8.wchars[cond] &&
flag_bsearch((unsigned short *)conds.utf8.wchars[cond],
wc, (short) conds.utf8.wlen[cond])) return 0;
} else {
if (!conds.utf8.wchars[cond]) return 0;
u8_u16((w_char *) &wc, 1, (char *) cp);
if (!flag_bsearch((unsigned short *)conds.utf8.wchars[cond],
wc, (short)conds.utf8.wlen[cond])) return 0;
}
}
}
}
}
return 1;
}
// see if this suffix is present in the word
struct hentry * SfxEntry::checkword(const char * word, int len, int optflags,
AffEntry* ppfx, char ** wlst, int maxSug, int * ns, const FLAG cclass, const FLAG needflag,
const FLAG badflag)
{
int tmpl; // length of tmpword
struct hentry * he; // hash entry pointer
unsigned char * cp;
char tmpword[MAXWORDUTF8LEN + 4];
PfxEntry* ep = (PfxEntry *) ppfx;
// if this suffix is being cross checked with a prefix
// but it does not support cross products skip it
if (((optflags & aeXPRODUCT) != 0) && ((opts & aeXPRODUCT) == 0))
return NULL;
// upon entry suffix is 0 length or already matches the end of the word.
// So if the remaining root word has positive length
// and if there are enough chars in root word and added back strip chars
// to meet the number of characters conditions, then test it
tmpl = len - appndl;
// the second condition is not enough for UTF-8 strings
// it checked in test_condition()
if ((tmpl > 0) && (tmpl + stripl >= numconds)) {
// generate new root word by removing suffix and adding
// back any characters that would have been stripped or
// or null terminating the shorter string
strcpy (tmpword, word);
cp = (unsigned char *)(tmpword + tmpl);
if (stripl) {
strcpy ((char *)cp, strip);
tmpl += stripl;
cp = (unsigned char *)(tmpword + tmpl);
} else *cp = '\0';
// now make sure all of the conditions on characters
// are met. Please see the appendix at the end of
// this file for more info on exactly what is being // tested
// if all conditions are met then check if resulting
// root word in the dictionary
if (test_condition((char *) cp, (char *) tmpword)) {
#ifdef SZOSZABLYA_POSSIBLE_ROOTS
fprintf(stdout,"%s %s %c\n", word, tmpword, aflag);
#endif
if ((he = pmyMgr->lookup(tmpword)) != NULL) {
do {
// check conditional suffix (enabled by prefix)
if ((TESTAFF(he->astr, aflag, he->alen) || (ep && ep->getCont() &&
TESTAFF(ep->getCont(), aflag, ep->getContLen()))) &&
(((optflags & aeXPRODUCT) == 0) ||
TESTAFF(he->astr, ep->getFlag(), he->alen) ||
// enabled by prefix
((contclass) && TESTAFF(contclass, ep->getFlag(), contclasslen))
) &&
// handle cont. class
((!cclass) ||
((contclass) && TESTAFF(contclass, cclass, contclasslen))
) &&
// check only in compound homonyms (bad flags)
(!badflag || !TESTAFF(he->astr, badflag, he->alen)
) &&
// handle required flag
((!needflag) ||
(TESTAFF(he->astr, needflag, he->alen) ||
((contclass) && TESTAFF(contclass, needflag, contclasslen)))
)
) return he;
he = he->next_homonym; // check homonyms
} while (he);
// obsolote stemming code (used only by the
// experimental SuffixMgr:suggest_pos_stems)
// store resulting root in wlst
} else if (wlst && (*ns < maxSug)) {
int cwrd = 1;
for (int k=0; k < *ns; k++)
if (strcmp(tmpword, wlst[k]) == 0) cwrd = 0;
if (cwrd) {
wlst[*ns] = mystrdup(tmpword);
if (wlst[*ns] == NULL) {
for (int j=0; j<*ns; j++) free(wlst[j]);
*ns = -1;
return NULL;
}
(*ns)++;
}
}
}
}
return NULL;
}
// see if two-level suffix is present in the word
struct hentry * SfxEntry::check_twosfx(const char * word, int len, int optflags,
AffEntry* ppfx, const FLAG needflag)
{
int tmpl; // length of tmpword
struct hentry * he; // hash entry pointer
unsigned char * cp;
char tmpword[MAXWORDUTF8LEN + 4];
PfxEntry* ep = (PfxEntry *) ppfx;
// if this suffix is being cross checked with a prefix
// but it does not support cross products skip it
if ((optflags & aeXPRODUCT) != 0 && (opts & aeXPRODUCT) == 0)
return NULL;
// upon entry suffix is 0 length or already matches the end of the word.
// So if the remaining root word has positive length
// and if there are enough chars in root word and added back strip chars
// to meet the number of characters conditions, then test it
tmpl = len - appndl;
if ((tmpl > 0) && (tmpl + stripl >= numconds)) {
// generate new root word by removing suffix and adding
// back any characters that would have been stripped or
// or null terminating the shorter string
strcpy (tmpword, word);
cp = (unsigned char *)(tmpword + tmpl);
if (stripl) {
strcpy ((char *)cp, strip);
tmpl += stripl;
cp = (unsigned char *)(tmpword + tmpl);
} else *cp = '\0';
// now make sure all of the conditions on characters
// are met. Please see the appendix at the end of
// this file for more info on exactly what is being
// tested
// if all conditions are met then recall suffix_check
if (test_condition((char *) cp, (char *) tmpword)) {
if (ppfx) {
// handle conditional suffix
if ((contclass) && TESTAFF(contclass, ep->getFlag(), contclasslen))
he = pmyMgr->suffix_check(tmpword, tmpl, 0, NULL, NULL, 0, NULL, (FLAG) aflag, needflag);
else
he = pmyMgr->suffix_check(tmpword, tmpl, optflags, ppfx, NULL, 0, NULL, (FLAG) aflag, needflag);
} else {
he = pmyMgr->suffix_check(tmpword, tmpl, 0, NULL, NULL, 0, NULL, (FLAG) aflag, needflag);
}
if (he) return he;
}
}
return NULL;
}
#ifdef HUNSPELL_EXPERIMENTAL
// see if two-level suffix is present in the word
char * SfxEntry::check_twosfx_morph(const char * word, int len, int optflags,
AffEntry* ppfx, const FLAG needflag)
{
int tmpl; // length of tmpword
unsigned char * cp;
char tmpword[MAXWORDUTF8LEN + 4];
PfxEntry* ep = (PfxEntry *) ppfx;
char * st;
char result[MAXLNLEN];
*result = '\0';
// if this suffix is being cross checked with a prefix
// but it does not support cross products skip it
if ((optflags & aeXPRODUCT) != 0 && (opts & aeXPRODUCT) == 0)
return NULL;
// upon entry suffix is 0 length or already matches the end of the word.
// So if the remaining root word has positive length
// and if there are enough chars in root word and added back strip chars
// to meet the number of characters conditions, then test it
tmpl = len - appndl;
if ((tmpl > 0) && (tmpl + stripl >= numconds)) {
// generate new root word by removing suffix and adding
// back any characters that would have been stripped or
// or null terminating the shorter string
strcpy (tmpword, word);
cp = (unsigned char *)(tmpword + tmpl);
if (stripl) {
strcpy ((char *)cp, strip);
tmpl += stripl;
cp = (unsigned char *)(tmpword + tmpl);
} else *cp = '\0';
// now make sure all of the conditions on characters
// are met. Please see the appendix at the end of
// this file for more info on exactly what is being
// tested
// if all conditions are met then recall suffix_check
if (test_condition((char *) cp, (char *) tmpword)) {
if (ppfx) {
// handle conditional suffix
if ((contclass) && TESTAFF(contclass, ep->getFlag(), contclasslen)) {
st = pmyMgr->suffix_check_morph(tmpword, tmpl, 0, NULL, aflag, needflag);
if (st) {
if (((PfxEntry *) ppfx)->getMorph()) {
strcat(result, ((PfxEntry *) ppfx)->getMorph());
}
strcat(result,st);
free(st);
mychomp(result);
}
} else {
st = pmyMgr->suffix_check_morph(tmpword, tmpl, optflags, ppfx, aflag, needflag);
if (st) {
strcat(result, st);
free(st);
mychomp(result);
}
}
} else {
st = pmyMgr->suffix_check_morph(tmpword, tmpl, 0, NULL, aflag, needflag);
if (st) {
strcat(result, st);
free(st);
mychomp(result);
}
}
if (*result) return mystrdup(result);
}
}
return NULL;
}
#endif // END OF HUNSPELL_EXPERIMENTAL CODE
// get next homonym with same affix
struct hentry * SfxEntry::get_next_homonym(struct hentry * he, int optflags, AffEntry* ppfx,
const FLAG cclass, const FLAG needflag)
{
PfxEntry* ep = (PfxEntry *) ppfx;
while (he->next_homonym) {
he = he->next_homonym;
if ((TESTAFF(he->astr, aflag, he->alen) || (ep && ep->getCont() && TESTAFF(ep->getCont(), aflag, ep->getContLen()))) &&
((optflags & aeXPRODUCT) == 0 ||
TESTAFF(he->astr, ep->getFlag(), he->alen) ||
// handle conditional suffix
((contclass) && TESTAFF(contclass, ep->getFlag(), contclasslen))
) &&
// handle cont. class
((!cclass) ||
((contclass) && TESTAFF(contclass, cclass, contclasslen))
) &&
// handle required flag
((!needflag) ||
(TESTAFF(he->astr, needflag, he->alen) ||
((contclass) && TESTAFF(contclass, needflag, contclasslen)))
)
) return he;
}
return NULL;
}
#if 0
Appendix: Understanding Affix Code
An affix is either a prefix or a suffix attached to root words to make
other words.
Basically a Prefix or a Suffix is set of AffEntry objects
which store information about the prefix or suffix along
with supporting routines to check if a word has a particular
prefix or suffix or a combination.
The structure affentry is defined as follows:
struct affentry
{
unsigned short aflag; // ID used to represent the affix
char * strip; // string to strip before adding affix
char * appnd; // the affix string to add
unsigned char stripl; // length of the strip string
unsigned char appndl; // length of the affix string
char numconds; // the number of conditions that must be met
char opts; // flag: aeXPRODUCT- combine both prefix and suffix
char conds[SETSIZE]; // array which encodes the conditions to be met
};
Here is a suffix borrowed from the en_US.aff file. This file
is whitespace delimited.
SFX D Y 4
SFX D 0 e d
SFX D y ied [^aeiou]y
SFX D 0 ed [^ey]
SFX D 0 ed [aeiou]y
This information can be interpreted as follows:
In the first line has 4 fields
Field
-----
1 SFX - indicates this is a suffix
2 D - is the name of the character flag which represents this suffix
3 Y - indicates it can be combined with prefixes (cross product)
4 4 - indicates that sequence of 4 affentry structures are needed to
properly store the affix information
The remaining lines describe the unique information for the 4 SfxEntry
objects that make up this affix. Each line can be interpreted
as follows: (note fields 1 and 2 are as a check against line 1 info)
Field
-----
1 SFX - indicates this is a suffix
2 D - is the name of the character flag for this affix
3 y - the string of chars to strip off before adding affix
(a 0 here indicates the NULL string)
4 ied - the string of affix characters to add
5 [^aeiou]y - the conditions which must be met before the affix
can be applied
Field 5 is interesting. Since this is a suffix, field 5 tells us that
there are 2 conditions that must be met. The first condition is that
the next to the last character in the word must *NOT* be any of the
following "a", "e", "i", "o" or "u". The second condition is that
the last character of the word must end in "y".
So how can we encode this information concisely and be able to
test for both conditions in a fast manner? The answer is found
but studying the wonderful ispell code of Geoff Kuenning, et.al.
(now available under a normal BSD license).
If we set up a conds array of 256 bytes indexed (0 to 255) and access it
using a character (cast to an unsigned char) of a string, we have 8 bits
of information we can store about that character. Specifically we
could use each bit to say if that character is allowed in any of the
last (or first for prefixes) 8 characters of the word.
Basically, each character at one end of the word (up to the number
of conditions) is used to index into the conds array and the resulting
value found there says whether the that character is valid for a
specific character position in the word.
For prefixes, it does this by setting bit 0 if that char is valid
in the first position, bit 1 if valid in the second position, and so on.
If a bit is not set, then that char is not valid for that postion in the
word.
If working with suffixes bit 0 is used for the character closest
to the front, bit 1 for the next character towards the end, ...,
with bit numconds-1 representing the last char at the end of the string.
Note: since entries in the conds[] are 8 bits, only 8 conditions
(read that only 8 character positions) can be examined at one
end of a word (the beginning for prefixes and the end for suffixes.
So to make this clearer, lets encode the conds array values for the
first two affentries for the suffix D described earlier.
For the first affentry:
numconds = 1 (only examine the last character)
conds['e'] = (1 << 0) (the word must end in an E)
all others are all 0
For the second affentry:
numconds = 2 (only examine the last two characters)
conds[X] = conds[X] | (1 << 0) (aeiou are not allowed)
where X is all characters *but* a, e, i, o, or u
conds['y'] = (1 << 1) (the last char must be a y)
all other bits for all other entries in the conds array are zero
#endif

Просмотреть файл

@ -0,0 +1,187 @@
/******* BEGIN LICENSE BLOCK *******
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
*
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
* László Németh (nemethl@gyorsposta.hu)
* David Einstein (deinst@world.std.com)
* Davide Prina
* Giuseppe Modugno
* Gianluca Turconi
* Simon Brouwer
* Noll Janos
* Biro Arpad
* Goldman Eleonora
* Sarlos Tamas
* Bencsath Boldizsar
* Halacsy Peter
* Dvornik Laszlo
* Gefferth Andras
* Nagy Viktor
* Varga Daniel
* Chris Halls
* Rene Engelhard
* Bram Moolenaar
* Dafydd Jones
* Harri Pitkanen
* Andras Timar
* Tor Lillqvist
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
******* END LICENSE BLOCK *******/
#ifndef _AFFIX_HXX_
#define _AFFIX_HXX_
#include "atypes.hxx"
#include "baseaffix.hxx"
#include "affixmgr.hxx"
/* A Prefix Entry */
class PfxEntry : public AffEntry
{
AffixMgr* pmyMgr;
PfxEntry * next;
PfxEntry * nexteq;
PfxEntry * nextne;
PfxEntry * flgnxt;
public:
PfxEntry(AffixMgr* pmgr, affentry* dp );
~PfxEntry();
inline bool allowCross() { return ((opts & aeXPRODUCT) != 0); }
struct hentry * checkword(const char * word, int len, char in_compound,
const FLAG needflag = FLAG_NULL);
struct hentry * check_twosfx(const char * word, int len, char in_compound, const FLAG needflag = NULL);
char * check_morph(const char * word, int len, char in_compound,
const FLAG needflag = FLAG_NULL);
char * check_twosfx_morph(const char * word, int len,
char in_compound, const FLAG needflag = FLAG_NULL);
inline FLAG getFlag() { return aflag; }
inline const char * getKey() { return appnd; }
char * add(const char * word, int len);
inline short getKeyLen() { return appndl; }
inline const char * getMorph() { return morphcode; }
inline const unsigned short * getCont() { return contclass; }
inline short getContLen() { return contclasslen; }
inline PfxEntry * getNext() { return next; }
inline PfxEntry * getNextNE() { return nextne; }
inline PfxEntry * getNextEQ() { return nexteq; }
inline PfxEntry * getFlgNxt() { return flgnxt; }
inline void setNext(PfxEntry * ptr) { next = ptr; }
inline void setNextNE(PfxEntry * ptr) { nextne = ptr; }
inline void setNextEQ(PfxEntry * ptr) { nexteq = ptr; }
inline void setFlgNxt(PfxEntry * ptr) { flgnxt = ptr; }
inline int test_condition(const char * st);
};
/* A Suffix Entry */
class SfxEntry : public AffEntry
{
AffixMgr* pmyMgr;
char * rappnd;
SfxEntry * next;
SfxEntry * nexteq;
SfxEntry * nextne;
SfxEntry * flgnxt;
SfxEntry * l_morph;
SfxEntry * r_morph;
SfxEntry * eq_morph;
public:
SfxEntry(AffixMgr* pmgr, affentry* dp );
~SfxEntry();
inline bool allowCross() { return ((opts & aeXPRODUCT) != 0); }
struct hentry * checkword(const char * word, int len, int optflags,
AffEntry* ppfx, char ** wlst, int maxSug, int * ns,
// const FLAG cclass = FLAG_NULL, const FLAG needflag = FLAG_NULL, char in_compound=IN_CPD_NOT);
const FLAG cclass = FLAG_NULL, const FLAG needflag = FLAG_NULL, const FLAG badflag = 0);
struct hentry * check_twosfx(const char * word, int len, int optflags, AffEntry* ppfx, const FLAG needflag = NULL);
char * check_twosfx_morph(const char * word, int len, int optflags,
AffEntry* ppfx, const FLAG needflag = FLAG_NULL);
struct hentry * get_next_homonym(struct hentry * he);
struct hentry * get_next_homonym(struct hentry * word, int optflags, AffEntry* ppfx,
const FLAG cclass, const FLAG needflag);
inline FLAG getFlag() { return aflag; }
inline const char * getKey() { return rappnd; }
char * add(const char * word, int len);
inline const char * getMorph() { return morphcode; }
inline const unsigned short * getCont() { return contclass; }
inline short getContLen() { return contclasslen; }
inline const char * getAffix() { return appnd; }
inline short getKeyLen() { return appndl; }
inline SfxEntry * getNext() { return next; }
inline SfxEntry * getNextNE() { return nextne; }
inline SfxEntry * getNextEQ() { return nexteq; }
inline SfxEntry * getLM() { return l_morph; }
inline SfxEntry * getRM() { return r_morph; }
inline SfxEntry * getEQM() { return eq_morph; }
inline SfxEntry * getFlgNxt() { return flgnxt; }
inline void setNext(SfxEntry * ptr) { next = ptr; }
inline void setNextNE(SfxEntry * ptr) { nextne = ptr; }
inline void setNextEQ(SfxEntry * ptr) { nexteq = ptr; }
inline void setFlgNxt(SfxEntry * ptr) { flgnxt = ptr; }
inline int test_condition(const char * st, const char * begin);
};
#endif

Разница между файлами не показана из-за своего большого размера Загрузить разницу

Просмотреть файл

@ -0,0 +1,265 @@
/******* BEGIN LICENSE BLOCK *******
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
*
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
* László Németh (nemethl@gyorsposta.hu)
* David Einstein (deinst@world.std.com)
* Davide Prina
* Giuseppe Modugno
* Gianluca Turconi
* Simon Brouwer
* Noll Janos
* Biro Arpad
* Goldman Eleonora
* Sarlos Tamas
* Bencsath Boldizsar
* Halacsy Peter
* Dvornik Laszlo
* Gefferth Andras
* Nagy Viktor
* Varga Daniel
* Chris Halls
* Rene Engelhard
* Bram Moolenaar
* Dafydd Jones
* Harri Pitkanen
* Andras Timar
* Tor Lillqvist
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
******* END LICENSE BLOCK *******/
#ifndef _AFFIXMGR_HXX_
#define _AFFIXMGR_HXX_
#ifdef MOZILLA_CLIENT
#ifdef __SUNPRO_CC // for SunONE Studio compiler
using namespace std;
#endif
#include <stdio.h>
#else
#include <cstdio>
#endif
#include "atypes.hxx"
#include "baseaffix.hxx"
#include "hashmgr.hxx"
// check flag duplication
#define dupSFX (1 << 0)
#define dupPFX (1 << 1)
class AffixMgr
{
AffEntry * pStart[SETSIZE];
AffEntry * sStart[SETSIZE];
AffEntry * pFlag[CONTSIZE];
AffEntry * sFlag[CONTSIZE];
HashMgr * pHMgr;
char * trystring;
char * encoding;
struct cs_info * csconv;
int utf8;
int complexprefixes;
FLAG compoundflag;
FLAG compoundbegin;
FLAG compoundmiddle;
FLAG compoundend;
FLAG compoundroot;
FLAG compoundforbidflag;
FLAG compoundpermitflag;
int checkcompounddup;
int checkcompoundrep;
int checkcompoundcase;
int checkcompoundtriple;
FLAG forbiddenword;
FLAG nosuggest;
FLAG pseudoroot;
int cpdmin;
int numrep;
replentry * reptable;
int nummap;
mapentry * maptable;
int numbreak;
char ** breaktable;
int numcheckcpd;
replentry * checkcpdtable;
int numdefcpd;
flagentry * defcpdtable;
int maxngramsugs;
int nosplitsugs;
int sugswithdots;
int cpdwordmax;
int cpdmaxsyllable;
char * cpdvowels;
w_char * cpdvowels_utf16;
int cpdvowels_utf16_len;
char * cpdsyllablenum;
const char * pfxappnd; // BUG: not stateless
const char * sfxappnd; // BUG: not stateless
FLAG sfxflag; // BUG: not stateless
char * derived; // BUG: not stateless
AffEntry * sfx; // BUG: not stateless
AffEntry * pfx; // BUG: not stateless
int checknum;
char * wordchars;
unsigned short * wordchars_utf16;
int wordchars_utf16_len;
char * ignorechars;
unsigned short * ignorechars_utf16;
int ignorechars_utf16_len;
char * version;
char * lang;
int langnum;
FLAG lemma_present;
FLAG circumfix;
FLAG onlyincompound;
FLAG keepcase;
int checksharps;
int havecontclass; // boolean variable
char contclasses[CONTSIZE]; // flags of possible continuing classes (twofold affix)
flag flag_mode;
public:
AffixMgr(const char * affpath, HashMgr * ptr);
~AffixMgr();
struct hentry * affix_check(const char * word, int len,
const unsigned short needflag = (unsigned short) 0, char in_compound = IN_CPD_NOT);
struct hentry * prefix_check(const char * word, int len,
char in_compound, const FLAG needflag = FLAG_NULL);
inline int isSubset(const char * s1, const char * s2);
struct hentry * prefix_check_twosfx(const char * word, int len,
char in_compound, const FLAG needflag = FLAG_NULL);
inline int isRevSubset(const char * s1, const char * end_of_s2, int len);
struct hentry * suffix_check(const char * word, int len, int sfxopts, AffEntry* ppfx,
char ** wlst, int maxSug, int * ns, const FLAG cclass = FLAG_NULL,
const FLAG needflag = FLAG_NULL, char in_compound = IN_CPD_NOT);
struct hentry * suffix_check_twosfx(const char * word, int len,
int sfxopts, AffEntry* ppfx, const FLAG needflag = FLAG_NULL);
char * affix_check_morph(const char * word, int len,
const FLAG needflag = FLAG_NULL, char in_compound = IN_CPD_NOT);
char * prefix_check_morph(const char * word, int len,
char in_compound, const FLAG needflag = FLAG_NULL);
char * suffix_check_morph (const char * word, int len, int sfxopts, AffEntry * ppfx,
const FLAG cclass = FLAG_NULL, const FLAG needflag = FLAG_NULL, char in_compound = IN_CPD_NOT);
char * prefix_check_twosfx_morph(const char * word, int len,
char in_compound, const FLAG needflag = FLAG_NULL);
char * suffix_check_twosfx_morph(const char * word, int len,
int sfxopts, AffEntry * ppfx, const FLAG needflag = FLAG_NULL);
int expand_rootword(struct guessword * wlst, int maxn, const char * ts,
int wl, const unsigned short * ap, unsigned short al, char * bad, int);
short get_syllable (const char * word, int wlen);
int cpdrep_check(const char * word, int len);
int cpdpat_check(const char * word, int len);
int defcpd_check(hentry *** words, short wnum, hentry * rv, hentry ** rwords, char all);
int cpdcase_check(const char * word, int len);
inline int candidate_check(const char * word, int len);
struct hentry * compound_check(const char * word, int len,
short wordnum, short numsyllable, short maxwordnum, short wnum, hentry ** words,
char hu_mov_rule, int * cmpdstemnum, int * cmpdstem, char is_sug);
int compound_check_morph(const char * word, int len,
short wordnum, short numsyllable, short maxwordnum, short wnum, hentry ** words,
char hu_mov_rule, char ** result, char * partresult);
struct hentry * lookup(const char * word);
int get_numrep();
struct replentry * get_reptable();
int get_nummap();
struct mapentry * get_maptable();
int get_numbreak();
char ** get_breaktable();
char * get_encoding();
int get_langnum();
char * get_try_string();
const char * get_wordchars();
unsigned short * get_wordchars_utf16(int * len);
char * get_ignore();
unsigned short * get_ignore_utf16(int * len);
int get_compound();
FLAG get_compoundflag();
FLAG get_compoundbegin();
FLAG get_forbiddenword();
FLAG get_nosuggest();
// FLAG get_circumfix();
FLAG get_pseudoroot();
FLAG get_onlyincompound();
FLAG get_compoundroot();
FLAG get_lemma_present();
int get_checknum();
char * get_possible_root();
const char * get_prefix();
const char * get_suffix();
const char * get_derived();
const char * get_version();
const int have_contclass();
int get_utf8();
int get_complexprefixes();
char * get_suffixed(char );
int get_maxngramsugs();
int get_nosplitsugs();
int get_sugswithdots(void);
FLAG get_keepcase(void);
int get_checksharps(void);
private:
int parse_file(const char * affpath);
// int parse_string(char * line, char ** out, const char * name);
int parse_flag(char * line, unsigned short * out, const char * name);
int parse_num(char * line, int * out, const char * name);
// int parse_array(char * line, char ** out, unsigned short ** out_utf16,
// int * out_utf16_len, const char * name);
int parse_cpdsyllable(char * line);
int parse_reptable(char * line, FILE * af);
int parse_maptable(char * line, FILE * af);
int parse_breaktable(char * line, FILE * af);
int parse_checkcpdtable(char * line, FILE * af);
int parse_defcpdtable(char * line, FILE * af);
int parse_affix(char * line, const char at, FILE * af, char * dupflags);
int encodeit(struct affentry * ptr, char * cs);
int build_pfxtree(AffEntry* pfxptr);
int build_sfxtree(AffEntry* sfxptr);
int process_pfx_order();
int process_sfx_order();
AffEntry * process_pfx_in_order(AffEntry * ptr, AffEntry * nptr);
AffEntry * process_sfx_in_order(AffEntry * ptr, AffEntry * nptr);
int process_pfx_tree_to_list();
int process_sfx_tree_to_list();
int redundant_condition(char, char * strip, int stripl, const char * cond, char *);
};
#endif

Просмотреть файл

@ -0,0 +1,157 @@
/******* BEGIN LICENSE BLOCK *******
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
*
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
* László Németh (nemethl@gyorsposta.hu)
* David Einstein (deinst@world.std.com)
* Davide Prina
* Giuseppe Modugno
* Gianluca Turconi
* Simon Brouwer
* Noll Janos
* Biro Arpad
* Goldman Eleonora
* Sarlos Tamas
* Bencsath Boldizsar
* Halacsy Peter
* Dvornik Laszlo
* Gefferth Andras
* Nagy Viktor
* Varga Daniel
* Chris Halls
* Rene Engelhard
* Bram Moolenaar
* Dafydd Jones
* Harri Pitkanen
* Andras Timar
* Tor Lillqvist
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
******* END LICENSE BLOCK *******/
#ifndef _ATYPES_HXX_
#define _ATYPES_HXX_
#ifndef HUNSPELL_WARNING
#ifdef HUNSPELL_WARNING_ON
#define HUNSPELL_WARNING fprintf
#else
#define HUNSPELL_WARNING
#endif
#endif
// HUNSTEM def.
#define HUNSTEM
#include "csutil.hxx"
#include "hashmgr.hxx"
#define SETSIZE 256
#define CONTSIZE 65536
#define MAXWORDLEN 100
#define MAXWORDUTF8LEN (MAXWORDLEN * 4)
// affentry options
#define aeXPRODUCT (1 << 0)
#define aeUTF8 (1 << 1)
#define aeALIASF (1 << 2)
#define aeALIASM (1 << 3)
#define aeINFIX (1 << 4)
// compound options
#define IN_CPD_NOT 0
#define IN_CPD_BEGIN 1
#define IN_CPD_END 2
#define IN_CPD_OTHER 3
#define MAXLNLEN 8192 * 4
#define MINCPDLEN 3
#define MAXCOMPOUND 10
#define MAXACC 1000
#define FLAG unsigned short
#define FLAG_NULL 0x00
#define FREE_FLAG(a) a = 0
#define TESTAFF( a, b , c ) flag_bsearch((unsigned short *) a, (unsigned short) b, c)
struct affentry
{
char * strip;
char * appnd;
unsigned char stripl;
unsigned char appndl;
char numconds;
char opts;
unsigned short aflag;
union {
char base[SETSIZE];
struct {
char ascii[SETSIZE/2];
char neg[8];
char all[8];
w_char * wchars[8];
int wlen[8];
} utf8;
} conds;
#ifdef HUNSPELL_EXPERIMENTAL
char * morphcode;
#endif
unsigned short * contclass;
short contclasslen;
};
struct replentry {
char * pattern;
char * pattern2;
};
struct mapentry {
char * set;
w_char * set_utf16;
int len;
};
struct flagentry {
FLAG * def;
int len;
};
struct guessword {
char * word;
bool allow;
};
#endif

Просмотреть файл

@ -0,0 +1,87 @@
/******* BEGIN LICENSE BLOCK *******
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
*
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
* László Németh (nemethl@gyorsposta.hu)
* David Einstein (deinst@world.std.com)
* Davide Prina
* Giuseppe Modugno
* Gianluca Turconi
* Simon Brouwer
* Noll Janos
* Biro Arpad
* Goldman Eleonora
* Sarlos Tamas
* Bencsath Boldizsar
* Halacsy Peter
* Dvornik Laszlo
* Gefferth Andras
* Nagy Viktor
* Varga Daniel
* Chris Halls
* Rene Engelhard
* Bram Moolenaar
* Dafydd Jones
* Harri Pitkanen
* Andras Timar
* Tor Lillqvist
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
******* END LICENSE BLOCK *******/
#ifndef _BASEAFF_HXX_
#define _BASEAFF_HXX_
class AffEntry
{
public:
protected:
char * appnd;
char * strip;
unsigned char appndl;
unsigned char stripl;
char numconds;
char opts;
unsigned short aflag;
union {
char base[SETSIZE];
struct {
char ascii[SETSIZE/2];
char neg[8];
char all[8];
w_char * wchars[8];
int wlen[8];
} utf8;
} conds;
char * morphcode;
unsigned short * contclass;
short contclasslen;
};
#endif

Разница между файлами не показана из-за своего большого размера Загрузить разницу

Просмотреть файл

@ -0,0 +1,215 @@
/******* BEGIN LICENSE BLOCK *******
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
*
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
* László Németh (nemethl@gyorsposta.hu)
* David Einstein (deinst@world.std.com)
* Davide Prina
* Giuseppe Modugno
* Gianluca Turconi
* Simon Brouwer
* Noll Janos
* Biro Arpad
* Goldman Eleonora
* Sarlos Tamas
* Bencsath Boldizsar
* Halacsy Peter
* Dvornik Laszlo
* Gefferth Andras
* Nagy Viktor
* Varga Daniel
* Chris Halls
* Rene Engelhard
* Bram Moolenaar
* Dafydd Jones
* Harri Pitkanen
* Andras Timar
* Tor Lillqvist
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
******* END LICENSE BLOCK *******/
#ifndef __CSUTILHXX__
#define __CSUTILHXX__
// First some base level utility routines
#define NOCAP 0
#define INITCAP 1
#define ALLCAP 2
#define HUHCAP 3
#define HUHINITCAP 4
#define ONLYUPCASEFLAG 65535
typedef struct {
unsigned char l;
unsigned char h;
} w_char;
// convert UTF-16 characters to UTF-8
char * u16_u8(char * dest, int size, const w_char * src, int srclen);
// convert UTF-8 characters to UTF-16
int u8_u16(w_char * dest, int size, const char * src);
// sort 2-byte vector
void flag_qsort(unsigned short flags[], int begin, int end);
// binary search in 2-byte vector
int flag_bsearch(unsigned short flags[], unsigned short flag, int right);
// remove end of line char(s)
void mychomp(char * s);
// duplicate string
char * mystrdup(const char * s);
// duplicate reverse of string
char * myrevstrdup(const char * s);
// parse into tokens with char delimiter
char * mystrsep(char ** sptr, const char delim);
// parse into tokens with char delimiter
char * mystrsep2(char ** sptr, const char delim);
// parse into tokens with char delimiter
char * mystrrep(char *, const char *, const char *);
// append s to ends of every lines in text
void strlinecat(char * lines, const char * s);
// tokenize into lines with new line
int line_tok(const char * text, char *** lines);
// tokenize into lines with new line and uniq in place
char * line_uniq(char * text);
// change \n to c in place
char * line_join(char * text, char c);
// leave only last {[^}]*} pattern in string
char * delete_zeros(char * morphout);
// reverse word
int reverseword(char *);
// reverse word
int reverseword_utf(char *);
// character encoding information
struct cs_info {
unsigned char ccase;
unsigned char clower;
unsigned char cupper;
};
// Unicode character encoding information
struct unicode_info {
unsigned short c;
unsigned short cupper;
unsigned short clower;
};
struct unicode_info2 {
char cletter;
unsigned short cupper;
unsigned short clower;
};
int initialize_utf_tbl();
void free_utf_tbl();
unsigned short unicodetoupper(unsigned short c, int langnum);
unsigned short unicodetolower(unsigned short c, int langnum);
int unicodeisalpha(unsigned short c);
struct enc_entry {
const char * enc_name;
struct cs_info * cs_table;
};
// language to encoding default map
struct lang_map {
const char * lang;
const char * def_enc;
int num;
};
struct cs_info * get_current_cs(const char * es);
const char * get_default_enc(const char * lang);
// get language identifiers of language codes
int get_lang_num(const char * lang);
// get characters of the given 8bit encoding with lower- and uppercase forms
char * get_casechars(const char * enc);
// convert null terminated string to all caps using encoding
void enmkallcap(char * d, const char * p, const char * encoding);
// convert null terminated string to all little using encoding
void enmkallsmall(char * d, const char * p, const char * encoding);
// convert null terminated string to have intial capital using encoding
void enmkinitcap(char * d, const char * p, const char * encoding);
// convert null terminated string to all caps
void mkallcap(char * p, const struct cs_info * csconv);
// convert null terminated string to all little
void mkallsmall(char * p, const struct cs_info * csconv);
// convert null terminated string to have intial capital
void mkinitcap(char * p, const struct cs_info * csconv);
// convert first nc characters of UTF-8 string to little
void mkallsmall_utf(w_char * u, int nc, int langnum);
// convert first nc characters of UTF-8 string to capital
void mkallcap_utf(w_char * u, int nc, int langnum);
// get type of capitalization
int get_captype(char * q, int nl, cs_info *);
// get type of capitalization (UTF-8)
int get_captype_utf8(char * q, int nl, int langnum);
// strip all ignored characters in the string
void remove_ignored_chars_utf(char * word, unsigned short ignored_chars[], int ignored_len);
// strip all ignored characters in the string
void remove_ignored_chars(char * word, char * ignored_chars);
int parse_string(char * line, char ** out, const char * name);
int parse_array(char * line, char ** out,
unsigned short ** out_utf16, int * out_utf16_len, const char * name, int utf8);
#endif

Просмотреть файл

@ -0,0 +1,915 @@
/******* BEGIN LICENSE BLOCK *******
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
*
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
* László Németh (nemethl@gyorsposta.hu)
* David Einstein (deinst@world.std.com)
* Davide Prina
* Giuseppe Modugno
* Gianluca Turconi
* Simon Brouwer
* Noll Janos
* Biro Arpad
* Goldman Eleonora
* Sarlos Tamas
* Bencsath Boldizsar
* Halacsy Peter
* Dvornik Laszlo
* Gefferth Andras
* Nagy Viktor
* Varga Daniel
* Chris Halls
* Rene Engelhard
* Bram Moolenaar
* Dafydd Jones
* Harri Pitkanen
* Andras Timar
* Tor Lillqvist
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
******* END LICENSE BLOCK *******/
#ifndef MOZILLA_CLIENT
#include <cstdlib>
#include <cstring>
#include <cstdio>
#include <cctype>
#else
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>
#endif
#include "hashmgr.hxx"
#include "csutil.hxx"
#include "atypes.hxx"
#ifdef MOZILLA_CLIENT
#ifdef __SUNPRO_CC // for SunONE Studio compiler
using namespace std;
#endif
#else
#ifndef W32
using namespace std;
#endif
#endif
// build a hash table from a munched word list
HashMgr::HashMgr(const char * tpath, const char * apath)
{
tablesize = 0;
tableptr = NULL;
flag_mode = FLAG_CHAR;
complexprefixes = 0;
utf8 = 0;
langnum = 0;
lang = NULL;
enc = NULL;
csconv = 0;
ignorechars = NULL;
ignorechars_utf16 = NULL;
ignorechars_utf16_len = 0;
numaliasf = 0;
aliasf = NULL;
numaliasm = 0;
aliasm = NULL;
forbiddenword = FLAG_NULL; // forbidden word signing flag
load_config(apath);
int ec = load_tables(tpath);
if (ec) {
/* error condition - what should we do here */
HUNSPELL_WARNING(stderr, "Hash Manager Error : %d\n",ec);
if (tableptr) {
free(tableptr);
}
tablesize = 0;
}
}
HashMgr::~HashMgr()
{
if (tableptr) {
// now pass through hash table freeing up everything
// go through column by column of the table
for (int i=0; i < tablesize; i++) {
struct hentry * pt = &tableptr[i];
struct hentry * nt = NULL;
if (pt) {
if (pt->astr && !aliasf) free(pt->astr);
if (pt->word) free(pt->word);
#ifdef HUNSPELL_EXPERIMENTAL
if (pt->description && !aliasm) free(pt->description);
#endif
pt = pt->next;
}
while(pt) {
nt = pt->next;
if (pt->astr && !aliasf) free(pt->astr);
if (pt->word) free(pt->word);
#ifdef HUNSPELL_EXPERIMENTAL
if (pt->description && !aliasm) free(pt->description);
#endif
free(pt);
pt = nt;
}
}
free(tableptr);
}
tablesize = 0;
if (aliasf) {
for (int j = 0; j < (numaliasf); j++) free(aliasf[j]);
free(aliasf);
aliasf = NULL;
if (aliasflen) {
free(aliasflen);
aliasflen = NULL;
}
}
if (aliasm) {
for (int j = 0; j < (numaliasm); j++) free(aliasm[j]);
free(aliasm);
aliasm = NULL;
}
if (enc) free(enc);
if (lang) free(lang);
if (ignorechars) free(ignorechars);
if (ignorechars_utf16) free(ignorechars_utf16);
}
// lookup a root word in the hashtable
struct hentry * HashMgr::lookup(const char *word) const
{
struct hentry * dp;
if (tableptr) {
dp = &tableptr[hash(word)];
if (dp->word == NULL) return NULL;
for ( ; dp != NULL; dp = dp->next) {
if (strcmp(word,dp->word) == 0) return dp;
}
}
return NULL;
}
// add a word to the hash table (private)
int HashMgr::add_word(const char * word, int wl, unsigned short * aff,
int al, const char * desc, bool onlyupcase)
{
char * st = mystrdup(word);
bool upcasehomonym = false;
if (wl && !st) return 1;
if (ignorechars != NULL) {
if (utf8) {
remove_ignored_chars_utf(st, ignorechars_utf16, ignorechars_utf16_len);
} else {
remove_ignored_chars(st, ignorechars);
}
}
if (complexprefixes) {
if (utf8) reverseword_utf(st); else reverseword(st);
}
int i = hash(st);
struct hentry * dp = &tableptr[i];
if (dp->word == NULL) {
dp->wlen = (short) wl;
dp->alen = (short) al;
dp->word = st;
dp->astr = aff;
dp->next = NULL;
dp->next_homonym = NULL;
#ifdef HUNSPELL_EXPERIMENTAL
if (aliasm) {
dp->description = (desc) ? get_aliasm(atoi(desc)) : mystrdup(desc);
} else {
dp->description = mystrdup(desc);
if (desc && !dp->description) return 1;
if (dp->description && complexprefixes) {
if (utf8) reverseword_utf(dp->description); else reverseword(dp->description);
}
}
#endif
} else {
struct hentry* hp = (struct hentry *) malloc (sizeof(struct hentry));
if (!hp) return 1;
hp->wlen = (short) wl;
hp->alen = (short) al;
hp->word = st;
hp->astr = aff;
hp->next = NULL;
hp->next_homonym = NULL;
#ifdef HUNSPELL_EXPERIMENTAL
if (aliasm) {
hp->description = (desc) ? get_aliasm(atoi(desc)) : mystrdup(desc);
} else {
hp->description = mystrdup(desc);
if (desc && !hp->description) return 1;
if (dp->description && complexprefixes) {
if (utf8) reverseword_utf(hp->description); else reverseword(hp->description);
}
}
#endif
while (dp->next != NULL) {
if ((!dp->next_homonym) && (strcmp(hp->word, dp->word) == 0)) {
// remove hidden onlyupcase homonym
if (!onlyupcase) {
if ((dp->astr) && TESTAFF(dp->astr, ONLYUPCASEFLAG, dp->alen)) {
free(dp->astr);
dp->astr = hp->astr;
free(hp->word);
free(hp);
return 0;
} else {
dp->next_homonym = hp;
}
} else {
upcasehomonym = true;
}
}
dp=dp->next;
}
if (strcmp(hp->word, dp->word) == 0) {
// remove hidden onlyupcase homonym
if (!onlyupcase) {
if ((dp->astr) && TESTAFF(dp->astr, ONLYUPCASEFLAG, dp->alen)) {
free(dp->astr);
dp->astr = hp->astr;
free(hp->word);
free(hp);
return 0;
} else {
dp->next_homonym = hp;
}
} else {
upcasehomonym = true;
}
}
if (!upcasehomonym) {
dp->next = hp;
} else {
// remove hidden onlyupcase homonym
free(hp->word);
if (hp->astr) free(hp->astr);
free(hp);
}
}
return 0;
}
// add a custom dic. word to the hash table (public)
int HashMgr::put_word(const char * word, int wl, char * aff)
{
unsigned short * flags;
int al = 0;
if (aff) {
al = decode_flags(&flags, aff);
flag_qsort(flags, 0, al);
} else {
flags = NULL;
}
add_word(word, wl, flags, al, NULL, false);
return 0;
}
int HashMgr::put_word_pattern(const char * word, int wl, const char * pattern)
{
unsigned short * flags;
struct hentry * dp = lookup(pattern);
if (!dp || !dp->astr) return 1;
flags = (unsigned short *) malloc (dp->alen * sizeof(short));
memcpy((void *) flags, (void *) dp->astr, dp->alen * sizeof(short));
add_word(word, wl, flags, dp->alen, NULL, false);
return 0;
}
// walk the hash table entry by entry - null at end
struct hentry * HashMgr::walk_hashtable(int &col, struct hentry * hp) const
{
//reset to start
if ((col < 0) || (hp == NULL)) {
col = -1;
hp = NULL;
}
if (hp && hp->next != NULL) {
hp = hp->next;
} else {
col++;
hp = (col < tablesize) ? &tableptr[col] : NULL;
// search for next non-blank column entry
while (hp && (hp->word == NULL)) {
col ++;
hp = (col < tablesize) ? &tableptr[col] : NULL;
}
if (col < tablesize) return hp;
hp = NULL;
col = -1;
}
return hp;
}
// load a munched word list and build a hash table on the fly
int HashMgr::load_tables(const char * tpath)
{
int wl, al;
char * ap;
char * dp;
unsigned short * flags;
int captype;
// raw dictionary - munched file
FILE * rawdict = fopen(tpath, "r");
if (rawdict == NULL) return 1;
// first read the first line of file to get hash table size */
char ts[MAXDELEN];
if (! fgets(ts, MAXDELEN-1,rawdict)) {
HUNSPELL_WARNING(stderr, "error: empty dic file\n");
fclose(rawdict);
return 2;
}
mychomp(ts);
/* remove byte order mark */
if (strncmp(ts,"\xEF\xBB\xBF",3) == 0) {
memmove(ts, ts+3, strlen(ts+3)+1);
HUNSPELL_WARNING(stderr, "warning: dic file begins with byte order mark: possible incompatibility with old Hunspell versions\n");
}
if ((*ts < '1') || (*ts > '9')) HUNSPELL_WARNING(stderr, "error - missing word count in dictionary file\n");
tablesize = atoi(ts);
if (!tablesize) {
fclose(rawdict);
return 4;
}
tablesize = tablesize + 5 + USERWORD;
if ((tablesize %2) == 0) tablesize++;
// allocate the hash table
tableptr = (struct hentry *) calloc(tablesize, sizeof(struct hentry));
if (! tableptr) {
fclose(rawdict);
return 3;
}
for (int i=0; i<tablesize; i++) tableptr[i].word = NULL;
// loop through all words on much list and add to hash
// table and create word and affix strings
while (fgets(ts,MAXDELEN-1,rawdict)) {
mychomp(ts);
// split each line into word and morphological description
dp = strchr(ts,'\t');
if (dp) {
*dp = '\0';
dp++;
} else {
dp = NULL;
}
// split each line into word and affix char strings
// "\/" signs slash in words (not affix separator)
// "/" at beginning of the line is word character (not affix separator)
ap = strchr(ts,'/');
while (ap) {
if (ap == ts) {
ap++;
continue;
} else if (*(ap - 1) != '\\') break;
// replace "\/" with "/"
for (char * sp = ap - 1; *sp; *sp = *(sp + 1), sp++);
ap = strchr(ap,'/');
}
if (ap) {
*ap = '\0';
if (aliasf) {
int index = atoi(ap + 1);
al = get_aliasf(index, &flags);
if (!al) {
HUNSPELL_WARNING(stderr, "error - bad flag vector alias: %s\n", ts);
*ap = '\0';
}
} else {
al = decode_flags(&flags, ap + 1);
flag_qsort(flags, 0, al);
}
} else {
al = 0;
ap = NULL;
flags = NULL;
}
wl = strlen(ts);
// add the word and its index
if (add_word(ts,wl,flags,al,dp, false)) {
fclose(rawdict);
return 5;
}
// add decapizatalized forms to handle following cases
// OpenOffice.org -> OPENOFFICE.ORG
// CIA's -> CIA'S
captype = utf8 ? get_captype_utf8(ts, wl, langnum) : get_captype(ts, wl, csconv);
if (((captype == HUHCAP) || (captype == HUHINITCAP) ||
((captype == ALLCAP) && (flags != NULL))) &&
!((flags != NULL) && TESTAFF(flags, forbiddenword, al))) {
unsigned short * flags2 = (unsigned short *) malloc (sizeof(unsigned short *)* (al + 1));
memcpy(flags2, flags, al * sizeof(unsigned short *));
flags2[al] = ONLYUPCASEFLAG;
if (utf8) {
char st[MAXDELEN];
w_char w[MAXDELEN];
int wlen = u8_u16(w, MAXDELEN, ts);
mkallsmall_utf(w, wlen, langnum);
mkallcap_utf(w, 1, langnum);
u16_u8(st, MAXDELEN, w, wlen);
if (add_word(st,wl,flags2,al+1,dp, true)) {
fclose(rawdict);
return 5;
}
} else {
mkallsmall(ts, csconv);
mkinitcap(ts, csconv);
if (add_word(ts,wl,flags2,al+1,dp, true)) {
fclose(rawdict);
return 5;
}
}
}
}
fclose(rawdict);
return 0;
}
// the hash function is a simple load and rotate
// algorithm borrowed
int HashMgr::hash(const char * word) const
{
long hv = 0;
for (int i=0; i < 4 && *word != 0; i++)
hv = (hv << 8) | (*word++);
while (*word != 0) {
ROTATE(hv,ROTATE_LEN);
hv ^= (*word++);
}
return (unsigned long) hv % tablesize;
}
int HashMgr::decode_flags(unsigned short ** result, char * flags) {
int len;
switch (flag_mode) {
case FLAG_LONG: { // two-character flags (1x2yZz -> 1x 2y Zz)
len = strlen(flags);
if (len%2 == 1) HUNSPELL_WARNING(stderr, "error: length of FLAG_LONG flagvector is odd: %s\n", flags);
len = len/2;
*result = (unsigned short *) malloc(len * sizeof(short));
for (int i = 0; i < len; i++) {
(*result)[i] = (((unsigned short) flags[i * 2]) << 8) + (unsigned short) flags[i * 2 + 1];
}
break;
}
case FLAG_NUM: { // decimal numbers separated by comma (4521,23,233 -> 4521 23 233)
len = 1;
char * src = flags;
unsigned short * dest;
char * p;
for (p = flags; *p; p++) {
if (*p == ',') len++;
}
*result = (unsigned short *) malloc(len * sizeof(short));
dest = *result;
for (p = flags; *p; p++) {
if (*p == ',') {
*dest = (unsigned short) atoi(src);
if (*dest == 0) HUNSPELL_WARNING(stderr, "error: 0 is wrong flag id\n");
src = p + 1;
dest++;
}
}
*dest = (unsigned short) atoi(src);
if (*dest == 0) HUNSPELL_WARNING(stderr, "error: 0 is wrong flag id\n");
break;
}
case FLAG_UNI: { // UTF-8 characters
w_char w[MAXDELEN/2];
len = u8_u16(w, MAXDELEN/2, flags);
*result = (unsigned short *) malloc(len * sizeof(short));
memcpy(*result, w, len * sizeof(short));
break;
}
default: { // Ispell's one-character flags (erfg -> e r f g)
unsigned short * dest;
len = strlen(flags);
*result = (unsigned short *) malloc(len * sizeof(short));
dest = *result;
for (unsigned char * p = (unsigned char *) flags; *p; p++) {
*dest = (unsigned short) *p;
dest++;
}
}
}
return len;
}
unsigned short HashMgr::decode_flag(const char * f) {
unsigned short s = 0;
switch (flag_mode) {
case FLAG_LONG:
s = ((unsigned short) f[0] << 8) + (unsigned short) f[1];
break;
case FLAG_NUM:
s = (unsigned short) atoi(f);
break;
case FLAG_UNI:
u8_u16((w_char *) &s, 1, f);
break;
default:
s = (unsigned short) *((unsigned char *)f);
}
if (!s) HUNSPELL_WARNING(stderr, "error: 0 is wrong flag id\n");
return s;
}
char * HashMgr::encode_flag(unsigned short f) {
unsigned char ch[10];
if (f==0) return mystrdup("(NULL)");
if (flag_mode == FLAG_LONG) {
ch[0] = (unsigned char) (f >> 8);
ch[1] = (unsigned char) (f - ((f >> 8) << 8));
ch[2] = '\0';
} else if (flag_mode == FLAG_NUM) {
sprintf((char *) ch, "%d", f);
} else if (flag_mode == FLAG_UNI) {
u16_u8((char *) &ch, 10, (w_char *) &f, 1);
} else {
ch[0] = (unsigned char) (f);
ch[1] = '\0';
}
return mystrdup((char *) ch);
}
// read in aff file and set flag mode
int HashMgr::load_config(const char * affpath)
{
int firstline = 1;
// io buffers
char line[MAXDELEN+1];
// open the affix file
FILE * afflst;
afflst = fopen(affpath,"r");
if (!afflst) {
HUNSPELL_WARNING(stderr, "Error - could not open affix description file %s\n",affpath);
return 1;
}
// read in each line ignoring any that do not
// start with a known line type indicator
while (fgets(line,MAXDELEN,afflst)) {
mychomp(line);
/* remove byte order mark */
if (firstline) {
firstline = 0;
if (strncmp(line,"\xEF\xBB\xBF",3) == 0) memmove(line, line+3, strlen(line+3)+1);
}
/* parse in the try string */
if ((strncmp(line,"FLAG",4) == 0) && isspace(line[4])) {
if (flag_mode != FLAG_CHAR) {
HUNSPELL_WARNING(stderr, "error: duplicate FLAG parameter\n");
}
if (strstr(line, "long")) flag_mode = FLAG_LONG;
if (strstr(line, "num")) flag_mode = FLAG_NUM;
if (strstr(line, "UTF-8")) flag_mode = FLAG_UNI;
if (flag_mode == FLAG_CHAR) {
HUNSPELL_WARNING(stderr, "error: FLAG need `num', `long' or `UTF-8' parameter: %s\n", line);
}
}
if (strncmp(line,"FORBIDDENWORD",13) == 0) {
char * st = NULL;
if (parse_string(line, &st, "FORBIDDENWORD")) {
fclose(afflst);
return 1;
}
forbiddenword = decode_flag(st);
free(st);
}
if (strncmp(line, "SET", 3) == 0) {
if (parse_string(line, &enc, "SET")) {
fclose(afflst);
return 1;
}
if (strcmp(enc, "UTF-8") == 0) {
utf8 = 1;
#ifndef OPENOFFICEORG
#ifndef MOZILLA_CLIENT
initialize_utf_tbl();
#endif
#endif
} else csconv = get_current_cs(enc);
}
if (strncmp(line, "LANG", 4) == 0) {
if (parse_string(line, &lang, "LANG")) {
fclose(afflst);
return 1;
}
langnum = get_lang_num(lang);
}
/* parse in the ignored characters (for example, Arabic optional diacritics characters */
if (strncmp(line,"IGNORE",6) == 0) {
if (parse_array(line, &ignorechars, &ignorechars_utf16, &ignorechars_utf16_len, "IGNORE", utf8)) {
fclose(afflst);
return 1;
}
}
if ((strncmp(line,"AF",2) == 0) && isspace(line[2])) {
if (parse_aliasf(line, afflst)) {
fclose(afflst);
return 1;
}
}
#ifdef HUNSPELL_EXPERIMENTAL
if ((strncmp(line,"AM",2) == 0) && isspace(line[2])) {
if (parse_aliasm(line, afflst)) {
fclose(afflst);
return 1;
}
}
#endif
if (strncmp(line,"COMPLEXPREFIXES",15) == 0) complexprefixes = 1;
if (((strncmp(line,"SFX",3) == 0) || (strncmp(line,"PFX",3) == 0)) && isspace(line[3])) break;
}
if (csconv == NULL) csconv = get_current_cs("ISO8859-1");
fclose(afflst);
return 0;
}
/* parse in the ALIAS table */
int HashMgr::parse_aliasf(char * line, FILE * af)
{
if (numaliasf != 0) {
HUNSPELL_WARNING(stderr, "error: duplicate AF (alias for flag vector) tables used\n");
return 1;
}
char * tp = line;
char * piece;
int i = 0;
int np = 0;
piece = mystrsep(&tp, 0);
while (piece) {
if (*piece != '\0') {
switch(i) {
case 0: { np++; break; }
case 1: {
numaliasf = atoi(piece);
if (numaliasf < 1) {
numaliasf = 0;
aliasf = NULL;
aliasflen = NULL;
HUNSPELL_WARNING(stderr, "incorrect number of entries in AF table\n");
free(piece);
return 1;
}
aliasf = (unsigned short **) malloc(numaliasf * sizeof(unsigned short *));
aliasflen = (unsigned short *) malloc(numaliasf * sizeof(short));
if (!aliasf || !aliasflen) {
numaliasf = 0;
if (aliasf) free(aliasf);
if (aliasflen) free(aliasflen);
aliasf = NULL;
aliasflen = NULL;
return 1;
}
np++;
break;
}
default: break;
}
i++;
}
free(piece);
piece = mystrsep(&tp, 0);
}
if (np != 2) {
numaliasf = 0;
free(aliasf);
free(aliasflen);
aliasf = NULL;
aliasflen = NULL;
HUNSPELL_WARNING(stderr, "error: missing AF table information\n");
return 1;
}
/* now parse the numaliasf lines to read in the remainder of the table */
char * nl = line;
for (int j=0; j < numaliasf; j++) {
if (!fgets(nl,MAXDELEN,af)) return 1;
mychomp(nl);
tp = nl;
i = 0;
aliasf[j] = NULL;
aliasflen[j] = 0;
piece = mystrsep(&tp, 0);
while (piece) {
if (*piece != '\0') {
switch(i) {
case 0: {
if (strncmp(piece,"AF",2) != 0) {
numaliasf = 0;
free(aliasf);
free(aliasflen);
aliasf = NULL;
aliasflen = NULL;
HUNSPELL_WARNING(stderr, "error: AF table is corrupt\n");
free(piece);
return 1;
}
break;
}
case 1: {
aliasflen[j] = (unsigned short) decode_flags(&(aliasf[j]), piece);
flag_qsort(aliasf[j], 0, aliasflen[j]);
break;
}
default: break;
}
i++;
}
free(piece);
piece = mystrsep(&tp, 0);
}
if (!aliasf[j]) {
free(aliasf);
free(aliasflen);
aliasf = NULL;
aliasflen = NULL;
numaliasf = 0;
HUNSPELL_WARNING(stderr, "error: AF table is corrupt\n");
return 1;
}
}
return 0;
}
int HashMgr::is_aliasf() {
return (aliasf != NULL);
}
int HashMgr::get_aliasf(int index, unsigned short ** fvec) {
if ((index > 0) && (index <= numaliasf)) {
*fvec = aliasf[index - 1];
return aliasflen[index - 1];
}
HUNSPELL_WARNING(stderr, "error: bad flag alias index: %d\n", index);
*fvec = NULL;
return 0;
}
#ifdef HUNSPELL_EXPERIMENTAL
/* parse morph alias definitions */
int HashMgr::parse_aliasm(char * line, FILE * af)
{
if (numaliasm != 0) {
HUNSPELL_WARNING(stderr, "error: duplicate AM (aliases for morphological descriptions) tables used\n");
return 1;
}
char * tp = line;
char * piece;
int i = 0;
int np = 0;
piece = mystrsep(&tp, 0);
while (piece) {
if (*piece != '\0') {
switch(i) {
case 0: { np++; break; }
case 1: {
numaliasm = atoi(piece);
if (numaliasm < 1) {
HUNSPELL_WARNING(stderr, "incorrect number of entries in AM table\n");
free(piece);
return 1;
}
aliasm = (char **) malloc(numaliasm * sizeof(char *));
if (!aliasm) {
numaliasm = 0;
return 1;
}
np++;
break;
}
default: break;
}
i++;
}
free(piece);
piece = mystrsep(&tp, 0);
}
if (np != 2) {
numaliasm = 0;
free(aliasm);
aliasm = NULL;
HUNSPELL_WARNING(stderr, "error: missing AM alias information\n");
return 1;
}
/* now parse the numaliasm lines to read in the remainder of the table */
char * nl = line;
for (int j=0; j < numaliasm; j++) {
if (!fgets(nl,MAXDELEN,af)) return 1;
mychomp(nl);
tp = nl;
i = 0;
aliasm[j] = NULL;
piece = mystrsep(&tp, 0);
while (piece) {
if (*piece != '\0') {
switch(i) {
case 0: {
if (strncmp(piece,"AM",2) != 0) {
HUNSPELL_WARNING(stderr, "error: AM table is corrupt\n");
free(piece);
numaliasm = 0;
free(aliasm);
aliasm = NULL;
return 1;
}
break;
}
case 1: {
if (complexprefixes) {
if (utf8) reverseword_utf(piece);
else reverseword(piece);
}
aliasm[j] = mystrdup(piece);
break; }
default: break;
}
i++;
}
free(piece);
piece = mystrsep(&tp, 0);
}
if (!aliasm[j]) {
numaliasm = 0;
free(aliasm);
aliasm = NULL;
HUNSPELL_WARNING(stderr, "error: map table is corrupt\n");
return 1;
}
}
return 0;
}
int HashMgr::is_aliasm() {
return (aliasm != NULL);
}
char * HashMgr::get_aliasm(int index) {
if ((index > 0) && (index <= numaliasm)) return aliasm[index - 1];
HUNSPELL_WARNING(stderr, "error: bad morph. alias index: %d\n", index);
return NULL;
}
#endif

Просмотреть файл

@ -0,0 +1,121 @@
/******* BEGIN LICENSE BLOCK *******
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
*
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
* László Németh (nemethl@gyorsposta.hu)
* David Einstein (deinst@world.std.com)
* Davide Prina
* Giuseppe Modugno
* Gianluca Turconi
* Simon Brouwer
* Noll Janos
* Biro Arpad
* Goldman Eleonora
* Sarlos Tamas
* Bencsath Boldizsar
* Halacsy Peter
* Dvornik Laszlo
* Gefferth Andras
* Nagy Viktor
* Varga Daniel
* Chris Halls
* Rene Engelhard
* Bram Moolenaar
* Dafydd Jones
* Harri Pitkanen
* Andras Timar
* Tor Lillqvist
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
******* END LICENSE BLOCK *******/
#ifndef _HASHMGR_HXX_
#define _HASHMGR_HXX_
#include <cstdio>
#include "htypes.hxx"
enum flag { FLAG_CHAR, FLAG_LONG, FLAG_NUM, FLAG_UNI };
class HashMgr
{
int tablesize;
struct hentry * tableptr;
int userword;
flag flag_mode;
int complexprefixes;
int utf8;
unsigned short forbiddenword;
int langnum;
char * enc;
char * lang;
struct cs_info * csconv;
char * ignorechars;
unsigned short * ignorechars_utf16;
int ignorechars_utf16_len;
int numaliasf; // flag vector `compression' with aliases
unsigned short ** aliasf;
unsigned short * aliasflen;
int numaliasm; // morphological desciption `compression' with aliases
char ** aliasm;
public:
HashMgr(const char * tpath, const char * apath);
~HashMgr();
struct hentry * lookup(const char *) const;
int hash(const char *) const;
struct hentry * walk_hashtable(int & col, struct hentry * hp) const;
int put_word(const char * word, int wl, char * ap);
int put_word_pattern(const char * word, int wl, const char * pattern);
int decode_flags(unsigned short ** result, char * flags);
unsigned short decode_flag(const char * flag);
char * encode_flag(unsigned short flag);
int is_aliasf();
int get_aliasf(int index, unsigned short ** fvec);
#ifdef HUNSPELL_EXPERIMENTAL
int is_aliasm();
char * get_aliasm(int index);
#endif
private:
int load_tables(const char * tpath);
int add_word(const char * word, int wl, unsigned short * ap, int al,
const char * desc, bool onlyupcase);
int load_config(const char * affpath);
int parse_aliasf(char * line, FILE * af);
#ifdef HUNSPELL_EXPERIMENTAL
int parse_aliasm(char * line, FILE * af);
#endif
};
#endif

Просмотреть файл

@ -0,0 +1,84 @@
/******* BEGIN LICENSE BLOCK *******
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
*
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
* László Németh (nemethl@gyorsposta.hu)
* David Einstein (deinst@world.std.com)
* Davide Prina
* Giuseppe Modugno
* Gianluca Turconi
* Simon Brouwer
* Noll Janos
* Biro Arpad
* Goldman Eleonora
* Sarlos Tamas
* Bencsath Boldizsar
* Halacsy Peter
* Dvornik Laszlo
* Gefferth Andras
* Nagy Viktor
* Varga Daniel
* Chris Halls
* Rene Engelhard
* Bram Moolenaar
* Dafydd Jones
* Harri Pitkanen
* Andras Timar
* Tor Lillqvist
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
******* END LICENSE BLOCK *******/
#ifndef _HTYPES_HXX_
#define _HTYPES_HXX_
#define MAXDELEN 8192
#define ROTATE_LEN 5
#define ROTATE(v,q) \
(v) = ((v) << (q)) | (((v) >> (32 - q)) & ((1 << (q))-1));
// approx. number of user defined words
#define USERWORD 1000
struct hentry
{
short wlen;
short alen;
char wbeg[2];
char * word;
unsigned short * astr;
struct hentry * next;
struct hentry * next_homonym;
#ifdef HUNSPELL_EXPERIMENTAL
char * description;
#endif
};
#endif

Разница между файлами не показана из-за своего большого размера Загрузить разницу

Просмотреть файл

@ -0,0 +1,89 @@
/******* BEGIN LICENSE BLOCK *******
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
*
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
* László Németh (nemethl@gyorsposta.hu)
* David Einstein (deinst@world.std.com)
* Davide Prina
* Giuseppe Modugno
* Gianluca Turconi
* Simon Brouwer
* Noll Janos
* Biro Arpad
* Goldman Eleonora
* Sarlos Tamas
* Bencsath Boldizsar
* Halacsy Peter
* Dvornik Laszlo
* Gefferth Andras
* Nagy Viktor
* Varga Daniel
* Chris Halls
* Rene Engelhard
* Bram Moolenaar
* Dafydd Jones
* Harri Pitkanen
* Andras Timar
* Tor Lillqvist
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
******* END LICENSE BLOCK *******/
#ifndef _MYSPELLMGR_H_
#define _MYSPELLMGR_H_
#ifdef __cplusplus
extern "C" {
#endif
typedef struct Hunhandle Hunhandle;
Hunhandle *Hunspell_create(const char * affpath, const char * dpath);
void Hunspell_destroy(Hunhandle *pHunspell);
/* spell(word) - spellcheck word
* output: 0 = bad word, not 0 = good word
*/
int Hunspell_spell(Hunhandle *pHunspell, const char *);
char *Hunspell_get_dic_encoding(Hunhandle *pHunspell);
/* suggest(suggestions, word) - search suggestions
* input: pointer to an array of strings pointer and the (bad) word
* array of strings pointer (here *slst) may not be initialized
* output: number of suggestions in string array, and suggestions in
* a newly allocated array of strings (*slts will be NULL when number
* of suggestion equals 0.)
*/
int Hunspell_suggest(Hunhandle *pHunspell, char*** slst, const char * word);
#ifdef __cplusplus
}
#endif
#endif

Просмотреть файл

@ -0,0 +1,190 @@
/******* BEGIN LICENSE BLOCK *******
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
*
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
* László Németh (nemethl@gyorsposta.hu)
* David Einstein (deinst@world.std.com)
* Davide Prina
* Giuseppe Modugno
* Gianluca Turconi
* Simon Brouwer
* Noll Janos
* Biro Arpad
* Goldman Eleonora
* Sarlos Tamas
* Bencsath Boldizsar
* Halacsy Peter
* Dvornik Laszlo
* Gefferth Andras
* Nagy Viktor
* Varga Daniel
* Chris Halls
* Rene Engelhard
* Bram Moolenaar
* Dafydd Jones
* Harri Pitkanen
* Andras Timar
* Tor Lillqvist
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
******* END LICENSE BLOCK *******/
#include "hashmgr.hxx"
#include "affixmgr.hxx"
#include "suggestmgr.hxx"
#include "csutil.hxx"
#include "langnum.hxx"
#define SPELL_COMPOUND (1 << 0)
#define SPELL_FORBIDDEN (1 << 1)
#define SPELL_ALLCAP (1 << 2)
#define SPELL_NOCAP (1 << 3)
#define SPELL_INITCAP (1 << 4)
#define MAXSUGGESTION 15
#define MAXSHARPS 5
#ifdef W32
#define DLLTEST2_API __declspec(dllexport)
#endif
#ifndef _MYSPELLMGR_HXX_
#define _MYSPELLMGR_HXX_
#ifdef W32
class DLLTEST2_API Hunspell
#else
class Hunspell
#endif
{
AffixMgr* pAMgr;
HashMgr* pHMgr;
SuggestMgr* pSMgr;
char * encoding;
struct cs_info * csconv;
int langnum;
int utf8;
int complexprefixes;
char** wordbreak;
public:
/* Hunspell(aff, dic) - constructor of Hunspell class
* input: path of affix file and dictionary file
*/
Hunspell(const char * affpath, const char * dpath);
~Hunspell();
/* spell(word) - spellcheck word
* output: 0 = bad word, not 0 = good word
*
* plus output:
* info: information bit array, fields:
* SPELL_COMPOUND = a compound word
* SPELL_FORBIDDEN = an explicit forbidden word
* root: root (stem), when input is a word with affix(es)
*/
int spell(const char * word, int * info = NULL, char ** root = NULL);
/* suggest(suggestions, word) - search suggestions
* input: pointer to an array of strings pointer and the (bad) word
* array of strings pointer (here *slst) may not be initialized
* output: number of suggestions in string array, and suggestions in
* a newly allocated array of strings (*slts will be NULL when number
* of suggestion equals 0.)
*/
int suggest(char*** slst, const char * word);
char * get_dic_encoding();
/* handling custom dictionary */
int put_word(const char * word);
/* pattern is a sample dictionary word
* put word into custom dictionary with affix flags of pattern word
*/
int put_word_pattern(const char * word, const char * pattern);
/* other */
/* get extra word characters definied in affix file for tokenization */
const char * get_wordchars();
unsigned short * get_wordchars_utf16(int * len);
// struct cs_info * get_csconv();
// int utf16_isalpha(unsigned short c);
const char * get_version();
/* experimental functions */
#ifdef HUNSPELL_EXPERIMENTAL
/* suffix is an affix flag string, similarly in dictionary files */
int put_word_suffix(const char * word, const char * suffix);
/* morphological analysis */
char * morph(const char * word);
int analyze(char*** out, const char *word);
char * morph_with_correction(const char * word);
/* stemmer function */
int stem(char*** slst, const char * word);
/* spec. suggestions */
int suggest_auto(char*** slst, const char * word);
int suggest_pos_stems(char*** slst, const char * word);
char * get_possible_root();
#endif
private:
int cleanword(char *, const char *, int * pcaptype, int * pabbrev);
int cleanword2(char *, const char *, w_char *, int * w_len, int * pcaptype, int * pabbrev);
void mkinitcap(char *);
int mkinitcap2(char * p, w_char * u, int nc);
int mkinitsmall2(char * p, w_char * u, int nc);
void mkallcap(char *);
int mkallcap2(char * p, w_char * u, int nc);
void mkallsmall(char *);
int mkallsmall2(char * p, w_char * u, int nc);
struct hentry * checkword(const char *, int * info, char **root);
char * sharps_u8_l1(char * dest, char * source);
hentry * spellsharps(char * base, char *, int, int, char * tmp, int * info, char **root);
int is_keepcase(const hentry * rv);
int insert_sug(char ***slst, char * word, int ns);
};
#endif

Просмотреть файл

@ -0,0 +1,94 @@
/******* BEGIN LICENSE BLOCK *******
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
*
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
* László Németh (nemethl@gyorsposta.hu)
* David Einstein (deinst@world.std.com)
* Davide Prina
* Giuseppe Modugno
* Gianluca Turconi
* Simon Brouwer
* Noll Janos
* Biro Arpad
* Goldman Eleonora
* Sarlos Tamas
* Bencsath Boldizsar
* Halacsy Peter
* Dvornik Laszlo
* Gefferth Andras
* Nagy Viktor
* Varga Daniel
* Chris Halls
* Rene Engelhard
* Bram Moolenaar
* Dafydd Jones
* Harri Pitkanen
* Andras Timar
* Tor Lillqvist
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
******* END LICENSE BLOCK *******/
#ifndef _LANGNUM_HXX_
#define _LANGNUM_HXX_
/*
language numbers for language specific codes
see http://l10n.openoffice.org/languages.html
*/
enum {
LANG_ar=96,
LANG_az=100, // custom number
LANG_bg=41,
LANG_ca=37,
LANG_cs=42,
LANG_da=45,
LANG_de=49,
LANG_el=30,
LANG_en=01,
LANG_es=34,
LANG_eu=10,
LANG_fr=02,
LANG_gl=38,
LANG_hr=78,
LANG_hu=36,
LANG_it=39,
LANG_la=99, // custom number
LANG_lv=101, // custom number
LANG_nl=31,
LANG_pl=48,
LANG_pt=03,
LANG_ru=07,
LANG_sv=50,
LANG_tr=90,
LANG_uk=80,
LANG_xx=999
};
#endif

Просмотреть файл

@ -0,0 +1,491 @@
/******* BEGIN LICENSE BLOCK *******
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
*
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
* László Németh (nemethl@gyorsposta.hu)
* David Einstein (deinst@world.std.com)
* Michiel van Leeuwen (mvl@exedo.nl)
* Caolan McNamara (cmc@openoffice.org)
* Davide Prina
* Giuseppe Modugno
* Gianluca Turconi
* Simon Brouwer
* Noll Janos
* Biro Arpad
* Goldman Eleonora
* Sarlos Tamas
* Bencsath Boldizsar
* Halacsy Peter
* Dvornik Laszlo
* Gefferth Andras
* Nagy Viktor
* Varga Daniel
* Chris Halls
* Rene Engelhard
* Bram Moolenaar
* Dafydd Jones
* Harri Pitkanen
* Andras Timar
* Tor Lillqvist
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
******* END LICENSE BLOCK *******/
#include "mozHunspell.h"
#include "nsReadableUtils.h"
#include "nsXPIDLString.h"
#include "nsIObserverService.h"
#include "nsISimpleEnumerator.h"
#include "nsIDirectoryEnumerator.h"
#include "nsIFile.h"
#include "nsDirectoryServiceUtils.h"
#include "nsDirectoryServiceDefs.h"
#include "mozISpellI18NManager.h"
#include "nsICharsetConverterManager.h"
#include "nsUnicharUtilCIID.h"
#include "nsUnicharUtils.h"
#include "nsCRT.h"
#include <stdlib.h>
static NS_DEFINE_CID(kCharsetConverterManagerCID, NS_ICHARSETCONVERTERMANAGER_CID);
static NS_DEFINE_CID(kUnicharUtilCID, NS_UNICHARUTIL_CID);
NS_IMPL_ISUPPORTS3(mozHunspell,
mozISpellCheckingEngine,
nsIObserver,
nsISupportsWeakReference)
nsresult
mozHunspell::Init()
{
if (!mDictionaries.Init())
return NS_ERROR_OUT_OF_MEMORY;
LoadDictionaryList();
nsCOMPtr<nsIObserverService> obs =
do_GetService("@mozilla.org/observer-service;1");
if (obs) {
obs->AddObserver(this, "profile-do-change", PR_TRUE);
}
return NS_OK;
}
mozHunspell::~mozHunspell()
{
mPersonalDictionary = nsnull;
delete mHunspell;
}
/* attribute wstring dictionary; */
NS_IMETHODIMP mozHunspell::GetDictionary(PRUnichar **aDictionary)
{
NS_ENSURE_ARG_POINTER(aDictionary);
if (mDictionary.IsEmpty())
return NS_ERROR_NOT_INITIALIZED;
*aDictionary = ToNewUnicode(mDictionary);
return *aDictionary ? NS_OK : NS_ERROR_OUT_OF_MEMORY;
}
/* set the Dictionary.
* This also Loads the dictionary and initializes the converter using the dictionaries converter
*/
NS_IMETHODIMP mozHunspell::SetDictionary(const PRUnichar *aDictionary)
{
NS_ENSURE_ARG_POINTER(aDictionary);
if (mDictionary.Equals(aDictionary))
return NS_OK;
nsIFile* affFile = mDictionaries.GetWeak(nsDependentString(aDictionary));
if (!affFile)
return NS_ERROR_FILE_NOT_FOUND;
nsCAutoString dictFileName, affFileName;
// XXX This isn't really good. nsIFile->NativePath isn't safe for all
// character sets on Windows.
// A better way would be to QI to nsILocalFile, and get a filehandle
// from there. Only problem is that hunspell wants a path
nsresult rv = affFile->GetNativePath(affFileName);
NS_ENSURE_SUCCESS(rv, rv);
dictFileName = affFileName;
PRInt32 dotPos = dictFileName.RFindChar('.');
if (dotPos == -1)
return NS_ERROR_FAILURE;
dictFileName.SetLength(dotPos);
dictFileName.AppendLiteral(".dic");
// SetDictionary can be called multiple times, so we might have a
// valid mHunspell instance which needs cleaned up.
delete mHunspell;
mDictionary = aDictionary;
mHunspell = new Hunspell(affFileName.get(),
dictFileName.get());
if (!mHunspell)
return NS_ERROR_OUT_OF_MEMORY;
nsCOMPtr<nsICharsetConverterManager> ccm =
do_GetService(NS_CHARSETCONVERTERMANAGER_CONTRACTID, &rv);
NS_ENSURE_SUCCESS(rv, rv);
rv = ccm->GetUnicodeDecoder(mHunspell->get_dic_encoding(),
getter_AddRefs(mDecoder));
NS_ENSURE_SUCCESS(rv, rv);
rv = ccm->GetUnicodeEncoder(mHunspell->get_dic_encoding(),
getter_AddRefs(mEncoder));
NS_ENSURE_SUCCESS(rv, rv);
if (mEncoder)
mEncoder->SetOutputErrorBehavior(mEncoder->kOnError_Signal, nsnull, '?');
PRInt32 pos = mDictionary.FindChar('-');
if (pos == -1)
pos = mDictionary.FindChar('_');
if (pos == -1)
mLanguage.Assign(mDictionary);
else
mLanguage = Substring(mDictionary, 0, pos);
return NS_OK;
}
/* readonly attribute wstring language; */
NS_IMETHODIMP mozHunspell::GetLanguage(PRUnichar **aLanguage)
{
NS_ENSURE_ARG_POINTER(aLanguage);
if (mDictionary.IsEmpty())
return NS_ERROR_NOT_INITIALIZED;
*aLanguage = ToNewUnicode(mLanguage);
return *aLanguage ? NS_OK : NS_ERROR_OUT_OF_MEMORY;
}
/* readonly attribute boolean providesPersonalDictionary; */
NS_IMETHODIMP mozHunspell::GetProvidesPersonalDictionary(PRBool *aProvidesPersonalDictionary)
{
NS_ENSURE_ARG_POINTER(aProvidesPersonalDictionary);
*aProvidesPersonalDictionary = PR_FALSE;
return NS_OK;
}
/* readonly attribute boolean providesWordUtils; */
NS_IMETHODIMP mozHunspell::GetProvidesWordUtils(PRBool *aProvidesWordUtils)
{
NS_ENSURE_ARG_POINTER(aProvidesWordUtils);
*aProvidesWordUtils = PR_FALSE;
return NS_OK;
}
/* readonly attribute wstring name; */
NS_IMETHODIMP mozHunspell::GetName(PRUnichar * *aName)
{
return NS_ERROR_NOT_IMPLEMENTED;
}
/* readonly attribute wstring copyright; */
NS_IMETHODIMP mozHunspell::GetCopyright(PRUnichar * *aCopyright)
{
return NS_ERROR_NOT_IMPLEMENTED;
}
/* attribute mozIPersonalDictionary personalDictionary; */
NS_IMETHODIMP mozHunspell::GetPersonalDictionary(mozIPersonalDictionary * *aPersonalDictionary)
{
*aPersonalDictionary = mPersonalDictionary;
NS_IF_ADDREF(*aPersonalDictionary);
return NS_OK;
}
NS_IMETHODIMP mozHunspell::SetPersonalDictionary(mozIPersonalDictionary * aPersonalDictionary)
{
mPersonalDictionary = aPersonalDictionary;
return NS_OK;
}
struct AppendNewStruct
{
PRUnichar **dics;
PRUint32 count;
PRBool failed;
};
static PLDHashOperator
AppendNewString(const nsAString& aString, nsIFile* aFile, void* aClosure)
{
AppendNewStruct *ans = (AppendNewStruct*) aClosure;
ans->dics[ans->count] = ToNewUnicode(aString);
if (!ans->dics[ans->count]) {
ans->failed = PR_TRUE;
return PL_DHASH_STOP;
}
++ans->count;
return PL_DHASH_NEXT;
}
/* void GetDictionaryList ([array, size_is (count)] out wstring dictionaries, out PRUint32 count); */
NS_IMETHODIMP mozHunspell::GetDictionaryList(PRUnichar ***aDictionaries,
PRUint32 *aCount)
{
if (!aDictionaries || !aCount)
return NS_ERROR_NULL_POINTER;
AppendNewStruct ans = {
(PRUnichar**) NS_Alloc(sizeof(PRUnichar*) * mDictionaries.Count()),
0,
PR_FALSE
};
// This pointer is used during enumeration
mDictionaries.EnumerateRead(AppendNewString, &ans);
if (ans.failed) {
while (ans.count) {
--ans.count;
NS_Free(ans.dics[ans.count]);
}
NS_Free(ans.dics);
return NS_ERROR_OUT_OF_MEMORY;
}
*aDictionaries = ans.dics;
*aCount = ans.count;
return NS_OK;
}
void
mozHunspell::LoadDictionaryList()
{
mDictionaries.Clear();
nsresult rv;
nsCOMPtr<nsIProperties> dirSvc =
do_GetService(NS_DIRECTORY_SERVICE_CONTRACTID);
if (!dirSvc)
return;
nsCOMPtr<nsIFile> dictDir;
rv = dirSvc->Get(DICTIONARY_SEARCH_DIRECTORY,
NS_GET_IID(nsIFile), getter_AddRefs(dictDir));
if (NS_FAILED(rv)) {
// default to appdir/dictionaries
rv = dirSvc->Get(NS_XPCOM_CURRENT_PROCESS_DIR,
NS_GET_IID(nsIFile), getter_AddRefs(dictDir));
if (NS_FAILED(rv))
return;
dictDir->AppendNative(NS_LITERAL_CSTRING("dictionaries"));
}
LoadDictionariesFromDir(dictDir);
nsCOMPtr<nsISimpleEnumerator> dictDirs;
rv = dirSvc->Get(DICTIONARY_SEARCH_DIRECTORY_LIST,
NS_GET_IID(nsISimpleEnumerator), getter_AddRefs(dictDirs));
if (NS_FAILED(rv))
return;
PRBool hasMore;
while (NS_SUCCEEDED(dictDirs->HasMoreElements(&hasMore)) && hasMore) {
nsCOMPtr<nsISupports> elem;
dictDirs->GetNext(getter_AddRefs(elem));
dictDir = do_QueryInterface(elem);
if (dictDir)
LoadDictionariesFromDir(dictDir);
}
}
void
mozHunspell::LoadDictionariesFromDir(nsIFile* aDir)
{
nsresult rv;
PRBool check = PR_FALSE;
rv = aDir->Exists(&check);
if (NS_FAILED(rv) || !check)
return;
rv = aDir->IsDirectory(&check);
if (NS_FAILED(rv) || !check)
return;
nsCOMPtr<nsISimpleEnumerator> e;
rv = aDir->GetDirectoryEntries(getter_AddRefs(e));
if (NS_FAILED(rv))
return;
nsCOMPtr<nsIDirectoryEnumerator> files(do_QueryInterface(e));
if (!files)
return;
nsCOMPtr<nsIFile> file;
while (NS_SUCCEEDED(files->GetNextFile(getter_AddRefs(file))) && file) {
nsAutoString leafName;
file->GetLeafName(leafName);
if (!StringEndsWith(leafName, NS_LITERAL_STRING(".dic")))
continue;
nsAutoString dict(leafName);
dict.SetLength(dict.Length() - 4); // magic length of ".dic"
// check for the presence of the .aff file
leafName = dict;
leafName.AppendLiteral(".aff");
file->SetLeafName(leafName);
rv = file->Exists(&check);
if (NS_FAILED(rv) || !check)
continue;
#ifdef DEBUG_bsmedberg
printf("Adding dictionary: %s\n", NS_ConvertUTF16toUTF8(dict).get());
#endif
mDictionaries.Put(dict, file);
}
}
nsresult mozHunspell::ConvertCharset(const PRUnichar* aStr, char ** aDst)
{
NS_ENSURE_ARG_POINTER(aDst);
NS_ENSURE_TRUE(mEncoder, NS_ERROR_NULL_POINTER);
PRInt32 outLength;
PRInt32 inLength = nsCRT::strlen(aStr);
nsresult rv = mEncoder->GetMaxLength(aStr, inLength, &outLength);
NS_ENSURE_SUCCESS(rv, rv);
*aDst = (char *) nsMemory::Alloc(sizeof(char) * (outLength+1));
NS_ENSURE_TRUE(*aDst, NS_ERROR_OUT_OF_MEMORY);
rv = mEncoder->Convert(aStr, &inLength, *aDst, &outLength);
if (NS_SUCCEEDED(rv))
(*aDst)[outLength] = '\0';
return rv;
}
/* boolean Check (in wstring word); */
NS_IMETHODIMP mozHunspell::Check(const PRUnichar *aWord, PRBool *aResult)
{
NS_ENSURE_ARG_POINTER(aWord);
NS_ENSURE_ARG_POINTER(aResult);
NS_ENSURE_TRUE(mHunspell, NS_ERROR_FAILURE);
nsXPIDLCString charsetWord;
nsresult rv = ConvertCharset(aWord, getter_Copies(charsetWord));
NS_ENSURE_SUCCESS(rv, rv);
*aResult = mHunspell->spell(charsetWord);
if (!*aResult && mPersonalDictionary)
rv = mPersonalDictionary->Check(aWord, mLanguage.get(), aResult);
return rv;
}
/* void Suggest (in wstring word, [array, size_is (count)] out wstring suggestions, out PRUint32 count); */
NS_IMETHODIMP mozHunspell::Suggest(const PRUnichar *aWord, PRUnichar ***aSuggestions, PRUint32 *aSuggestionCount)
{
NS_ENSURE_ARG_POINTER(aSuggestions);
NS_ENSURE_ARG_POINTER(aSuggestionCount);
NS_ENSURE_TRUE(mHunspell, NS_ERROR_FAILURE);
nsresult rv;
*aSuggestionCount = 0;
nsXPIDLCString charsetWord;
rv = ConvertCharset(aWord, getter_Copies(charsetWord));
NS_ENSURE_SUCCESS(rv, rv);
char ** wlst;
*aSuggestionCount = mHunspell->suggest(&wlst, charsetWord);
if (*aSuggestionCount) {
*aSuggestions = (PRUnichar **)nsMemory::Alloc(*aSuggestionCount * sizeof(PRUnichar *));
if (*aSuggestions) {
PRUint32 index = 0;
for (index = 0; index < *aSuggestionCount && NS_SUCCEEDED(rv); ++index) {
// Convert the suggestion to utf16
PRInt32 inLength = nsCRT::strlen(wlst[index]);
PRInt32 outLength;
rv = mDecoder->GetMaxLength(wlst[index], inLength, &outLength);
if (NS_SUCCEEDED(rv))
{
(*aSuggestions)[index] = (PRUnichar *) nsMemory::Alloc(sizeof(PRUnichar) * (outLength+1));
if ((*aSuggestions)[index])
{
rv = mDecoder->Convert(wlst[index], &inLength, (*aSuggestions)[index], &outLength);
if (NS_SUCCEEDED(rv))
(*aSuggestions)[index][outLength] = 0;
}
else
rv = NS_ERROR_OUT_OF_MEMORY;
}
}
if (NS_FAILED(rv))
NS_FREE_XPCOM_ALLOCATED_POINTER_ARRAY(index, *aSuggestions); // free the PRUnichar strings up to the point at which the error occurred
}
else // if (*aSuggestions)
rv = NS_ERROR_OUT_OF_MEMORY;
}
NS_FREE_XPCOM_ALLOCATED_POINTER_ARRAY(*aSuggestionCount, wlst);
return rv;
}
NS_IMETHODIMP
mozHunspell::Observe(nsISupports* aSubj, const char *aTopic,
const PRUnichar *aData)
{
NS_ASSERTION(!strcmp(aTopic, "profile-do-change"),
"Unexpected observer topic");
LoadDictionaryList();
return NS_OK;
}

Просмотреть файл

@ -0,0 +1,113 @@
/******* BEGIN LICENSE BLOCK *******
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
*
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
* László Németh (nemethl@gyorsposta.hu)
* David Einstein (deinst@world.std.com)
* Michiel van Leeuwen (mvl@exedo.nl)
* Caolan McNamara (cmc@openoffice.org)
* Davide Prina
* Giuseppe Modugno
* Gianluca Turconi
* Simon Brouwer
* Noll Janos
* Biro Arpad
* Goldman Eleonora
* Sarlos Tamas
* Bencsath Boldizsar
* Halacsy Peter
* Dvornik Laszlo
* Gefferth Andras
* Nagy Viktor
* Varga Daniel
* Chris Halls
* Rene Engelhard
* Bram Moolenaar
* Dafydd Jones
* Harri Pitkanen
* Andras Timar
* Tor Lillqvist
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
******* END LICENSE BLOCK *******/
#ifndef mozHunspell_h__
#define mozHunspell_h__
#include "hunspell.hxx"
#include "mozISpellCheckingEngine.h"
#include "mozIPersonalDictionary.h"
#include "nsString.h"
#include "nsCOMPtr.h"
#include "nsIObserver.h"
#include "nsIUnicodeEncoder.h"
#include "nsIUnicodeDecoder.h"
#include "nsInterfaceHashtable.h"
#include "nsWeakReference.h"
#define MOZ_HUNSPELL_CONTRACTID "@mozilla.org/spellchecker/hunspell;1"
#define MOZ_HUNSPELL_CID \
/* 56c778e4-1bee-45f3-a689-886692a97fe7 */ \
{ 0x56c778e4, 0x1bee, 0x45f3, \
{ 0xa6, 0x89, 0x88, 0x66, 0x92, 0xa9, 0x7f, 0xe7 } }
class mozHunspell : public mozISpellCheckingEngine,
public nsIObserver,
public nsSupportsWeakReference
{
public:
NS_DECL_ISUPPORTS
NS_DECL_MOZISPELLCHECKINGENGINE
NS_DECL_NSIOBSERVER
mozHunspell() : mHunspell(nsnull) { }
virtual ~mozHunspell();
nsresult Init();
void LoadDictionaryList();
void LoadDictionariesFromDir(nsIFile* aDir);
// helper method for converting a word to the charset of the dictionary
nsresult ConvertCharset(const PRUnichar* aStr, char ** aDst);
protected:
nsCOMPtr<mozIPersonalDictionary> mPersonalDictionary;
nsCOMPtr<nsIUnicodeEncoder> mEncoder;
nsCOMPtr<nsIUnicodeDecoder> mDecoder;
// Hashtable matches dictionary name to .aff file
nsInterfaceHashtable<nsStringHashKey, nsIFile> mDictionaries;
nsString mDictionary;
nsString mLanguage;
Hunspell *mHunspell;
};
#endif

Просмотреть файл

@ -0,0 +1,178 @@
/******* BEGIN LICENSE BLOCK *******
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
*
* Contributor(s): Benjamin Smedberg (benjamin@smedbergs.us) (Original Code)
* László Németh (nemethl@gyorsposta.hu)
* Ryan VanderMeulen (ryanvm@gmail.com)
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
******* END LICENSE BLOCK *******/
#include "mozHunspellDirProvider.h"
#include "nsXULAppAPI.h"
#include "nsString.h"
#include "mozISpellCheckingEngine.h"
#include "nsICategoryManager.h"
NS_IMPL_ISUPPORTS2(mozHunspellDirProvider,
nsIDirectoryServiceProvider,
nsIDirectoryServiceProvider2)
NS_IMETHODIMP
mozHunspellDirProvider::GetFile(const char *aKey, PRBool *aPersist,
nsIFile* *aResult)
{
return NS_ERROR_FAILURE;
}
NS_IMETHODIMP
mozHunspellDirProvider::GetFiles(const char *aKey,
nsISimpleEnumerator* *aResult)
{
if (strcmp(aKey, DICTIONARY_SEARCH_DIRECTORY_LIST) != 0) {
return NS_ERROR_FAILURE;
}
nsCOMPtr<nsIProperties> dirSvc =
do_GetService(NS_DIRECTORY_SERVICE_CONTRACTID);
if (!dirSvc)
return NS_ERROR_FAILURE;
nsCOMPtr<nsISimpleEnumerator> list;
nsresult rv = dirSvc->Get(XRE_EXTENSIONS_DIR_LIST,
NS_GET_IID(nsISimpleEnumerator),
getter_AddRefs(list));
if (NS_FAILED(rv))
return rv;
nsCOMPtr<nsISimpleEnumerator> e = new AppendingEnumerator(list);
if (!e)
return NS_ERROR_OUT_OF_MEMORY;
*aResult = nsnull;
e.swap(*aResult);
return NS_SUCCESS_AGGREGATE_RESULT;
}
NS_IMPL_ISUPPORTS1(mozHunspellDirProvider::AppendingEnumerator,
nsISimpleEnumerator)
NS_IMETHODIMP
mozHunspellDirProvider::AppendingEnumerator::HasMoreElements(PRBool *aResult)
{
*aResult = mNext ? PR_TRUE : PR_FALSE;
return NS_OK;
}
NS_IMETHODIMP
mozHunspellDirProvider::AppendingEnumerator::GetNext(nsISupports* *aResult)
{
if (aResult)
NS_ADDREF(*aResult = mNext);
mNext = nsnull;
nsresult rv;
// Ignore all errors
PRBool more;
while (NS_SUCCEEDED(mBase->HasMoreElements(&more)) && more) {
nsCOMPtr<nsISupports> nextbasesupp;
mBase->GetNext(getter_AddRefs(nextbasesupp));
nsCOMPtr<nsIFile> nextbase(do_QueryInterface(nextbasesupp));
if (!nextbase)
continue;
nextbase->Clone(getter_AddRefs(mNext));
if (!mNext)
continue;
mNext->AppendNative(NS_LITERAL_CSTRING("dictionaries"));
PRBool exists;
rv = mNext->Exists(&exists);
if (NS_SUCCEEDED(rv) && exists)
break;
mNext = nsnull;
}
return NS_OK;
}
mozHunspellDirProvider::AppendingEnumerator::AppendingEnumerator
(nsISimpleEnumerator* aBase) :
mBase(aBase)
{
// Initialize mNext to begin
GetNext(nsnull);
}
NS_METHOD
mozHunspellDirProvider::Register(nsIComponentManager* aCompMgr,
nsIFile* aPath, const char *aLoaderStr,
const char *aType,
const nsModuleComponentInfo *aInfo)
{
nsresult rv;
nsCOMPtr<nsICategoryManager> catMan =
do_GetService(NS_CATEGORYMANAGER_CONTRACTID);
if (!catMan)
return NS_ERROR_FAILURE;
rv = catMan->AddCategoryEntry(XPCOM_DIRECTORY_PROVIDER_CATEGORY,
"spellcheck-directory-provider",
kContractID, PR_TRUE, PR_TRUE, nsnull);
return rv;
}
NS_METHOD
mozHunspellDirProvider::Unregister(nsIComponentManager* aCompMgr,
nsIFile* aPath,
const char *aLoaderStr,
const nsModuleComponentInfo *aInfo)
{
nsresult rv;
nsCOMPtr<nsICategoryManager> catMan =
do_GetService(NS_CATEGORYMANAGER_CONTRACTID);
if (!catMan)
return NS_ERROR_FAILURE;
rv = catMan->DeleteCategoryEntry(XPCOM_DIRECTORY_PROVIDER_CATEGORY,
"spellcheck-directory-provider",
PR_TRUE);
return rv;
}
char const *const
mozHunspellDirProvider::kContractID = "@mozilla.org/spellcheck/dir-provider;1";

Просмотреть файл

@ -0,0 +1,81 @@
/******* BEGIN LICENSE BLOCK *******
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
*
* Contributor(s): Benjamin Smedberg (benjamin@smedbergs.us) (Original Code)
* László Németh (nemethl@gyorsposta.hu)
* Ryan VanderMeulen (ryanvm@gmail.com)
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
******* END LICENSE BLOCK *******/
#ifndef mozHunspellDirProvider_h__
#define mozHunspellDirProvider_h__
#include "nsIDirectoryService.h"
#include "nsIGenericFactory.h"
#include "nsISimpleEnumerator.h"
class mozHunspellDirProvider :
public nsIDirectoryServiceProvider2
{
public:
NS_DECL_ISUPPORTS
NS_DECL_NSIDIRECTORYSERVICEPROVIDER
NS_DECL_NSIDIRECTORYSERVICEPROVIDER2
static NS_METHOD Register(nsIComponentManager* aCompMgr,
nsIFile* aPath, const char *aLoaderStr,
const char *aType,
const nsModuleComponentInfo *aInfo);
static NS_METHOD Unregister(nsIComponentManager* aCompMgr,
nsIFile* aPath, const char *aLoaderStr,
const nsModuleComponentInfo *aInfo);
static char const *const kContractID;
private:
class AppendingEnumerator : public nsISimpleEnumerator
{
public:
NS_DECL_ISUPPORTS
NS_DECL_NSISIMPLEENUMERATOR
AppendingEnumerator(nsISimpleEnumerator* aBase);
private:
nsCOMPtr<nsISimpleEnumerator> mBase;
nsCOMPtr<nsIFile> mNext;
};
};
#define HUNSPELLDIRPROVIDER_CID \
{ 0x64d6174c, 0x1496, 0x4ffd, \
{ 0x87, 0xf2, 0xda, 0x26, 0x70, 0xf8, 0x89, 0x34 } }
#endif // mozHunspellDirProvider

Разница между файлами не показана из-за своего большого размера Загрузить разницу

Просмотреть файл

@ -0,0 +1,151 @@
/******* BEGIN LICENSE BLOCK *******
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
*
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
* László Németh (nemethl@gyorsposta.hu)
* David Einstein (deinst@world.std.com)
* Davide Prina
* Giuseppe Modugno
* Gianluca Turconi
* Simon Brouwer
* Noll Janos
* Biro Arpad
* Goldman Eleonora
* Sarlos Tamas
* Bencsath Boldizsar
* Halacsy Peter
* Dvornik Laszlo
* Gefferth Andras
* Nagy Viktor
* Varga Daniel
* Chris Halls
* Rene Engelhard
* Bram Moolenaar
* Dafydd Jones
* Harri Pitkanen
* Andras Timar
* Tor Lillqvist
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
******* END LICENSE BLOCK *******/
#ifndef _SUGGESTMGR_HXX_
#define _SUGGESTMGR_HXX_
#define MAXSWL 100
#define MAXSWUTF8L (MAXSWL * 4)
#define MAX_ROOTS 100
#define MAX_WORDS 100
#define MAX_GUESS 100
#define MAXNGRAMSUGS 5
#define MINTIMER 500
#define MAXPLUSTIMER 500
#define NGRAM_IGNORE_LENGTH 0
#define NGRAM_LONGER_WORSE 1
#define NGRAM_ANY_MISMATCH 2
#include "atypes.hxx"
#include "affixmgr.hxx"
#include "hashmgr.hxx"
#include "langnum.hxx"
#include <time.h>
enum { LCS_UP, LCS_LEFT, LCS_UPLEFT };
class SuggestMgr
{
char * ctry;
int ctryl;
w_char * ctry_utf;
AffixMgr* pAMgr;
int maxSug;
struct cs_info * csconv;
int utf8;
int nosplitsugs;
int maxngramsugs;
int complexprefixes;
public:
SuggestMgr(const char * tryme, int maxn, AffixMgr *aptr);
~SuggestMgr();
int suggest(char*** slst, const char * word, int nsug);
int ngsuggest(char ** wlst, char * word, HashMgr* pHMgr);
int suggest_auto(char*** slst, const char * word, int nsug);
int suggest_stems(char*** slst, const char * word, int nsug);
int suggest_pos_stems(char*** slst, const char * word, int nsug);
char * suggest_morph(const char * word);
char * suggest_morph_for_spelling_error(const char * word);
private:
int testsug(char** wlst, const char * candidate, int wl, int ns, int cpdsuggest,
int * timer, time_t * timelimit);
int checkword(const char *, int, int, int *, time_t *);
int check_forbidden(const char *, int);
int capchars(char **, const char *, int, int);
int replchars(char**, const char *, int, int);
int doubletwochars(char**, const char *, int, int);
int forgotchar(char **, const char *, int, int);
int swapchar(char **, const char *, int, int);
int longswapchar(char **, const char *, int, int);
int movechar(char **, const char *, int, int);
int extrachar(char **, const char *, int, int);
int badchar(char **, const char *, int, int);
int twowords(char **, const char *, int, int);
int fixstems(char **, const char *, int);
int capchars_utf(char **, const w_char *, int wl, int, int);
int doubletwochars_utf(char**, const w_char *, int wl, int, int);
int forgotchar_utf(char**, const w_char *, int wl, int, int);
int extrachar_utf(char**, const w_char *, int wl, int, int);
int badchar_utf(char **, const w_char *, int wl, int, int);
int swapchar_utf(char **, const w_char *, int wl, int, int);
int longswapchar_utf(char **, const w_char *, int, int, int);
int movechar_utf(char **, const w_char *, int, int, int);
int mapchars(char**, const char *, int);
int map_related(const char *, int, char ** wlst, int, const mapentry*, int, int *, time_t *);
int map_related_utf(w_char *, int, int, char ** wlst, int, const mapentry*, int, int *, time_t *);
int ngram(int n, char * s1, const char * s2, int uselen);
int mystrlen(const char * word);
int equalfirstletter(char * s1, const char * s2);
int commoncharacterpositions(char * s1, const char * s2, int * is_swap);
void bubblesort( char ** rwd, int * rsc, int n);
void lcs(const char * s, const char * s2, int * l1, int * l2, char ** result);
int lcslen(const char * s, const char* s2);
};
#endif

Просмотреть файл

@ -0,0 +1,49 @@
OpenOffice.org Hunspell en_US dictionary
2007-03-20 release
--
This dictionary is based on a subset of the original
English wordlist created by Kevin Atkinson for Pspell
and Aspell and thus is covered by his original
LGPL license. The affix file is a heavily modified
version of the original english.aff file which was
released as part of Geoff Kuenning's Ispell and as
such is covered by his BSD license.
Thanks to both authors for there wonderful work.
ChangeLog
2007-03-20 nemeth AT OOo
- alot -> a_lot REP suggestion, add "a lot"
- add Mozilla words (blog, cafe, inline, online, eBay, PayPal, etc.)
- add cybercafé
- alias compression (saving 15 kB disk space + 0.8 MB memory)
Mozilla 355178 - add scot-free
Mozilla 374411 - add Scotty
Mozilla 359305 - add archaeology, archeological, archeologist
Mozilla 358754 - add doughnut
Mozilla 254814 - add gauging, canoeing, *canoing, proactively
Issue 71718 - remove *opthalmic, *opthalmology; *opthalmologic -> ophthalmologic
Issue 68550 - *estoppal -> estoppel
Issue 69345 - add tapenade
Issue 67975 - add assistive
Issue 63541 - remove *dessicate
Issue 62599 - add toolbar
2006-02-07 nemeth AT OOo
Issue 48060 - add ordinal numbers with COMPOUNDRULE (1st, 11th, 101st etc.)
Issue 29112, 55498 - add NOSUGGEST flags to taboo words
Issue 56755 - add sequitor (non sequitor)
Issue 50616 - add open source words (GNOME, KDE, OOo, OpenOffice.org)
Issue 56389 - add Mozilla words (Mozilla, Firefox, Thunderbird)
Issue 29110 - add okay
Issue 58468 - add advisors
Issue 58708 - add hiragana & katakana
Issue 60240 - add arginine, histidine, monovalent, polymorphism, pyroelectric, pyroelectricity
2005-11-01 dnaber AT OOo
Issue 25797 - add proven, advisor, etc.

Разница между файлами не показана из-за своего большого размера Загрузить разницу

Разница между файлами не показана из-за своего большого размера Загрузить разницу