зеркало из https://github.com/mozilla/pjs.git
Bug #319778 --> hunspell spell check engine. This is the eventual replacement to myspell. Currently *NPOTB*
thanks to nemeth (the author of hunspell) for making it possible for this to be a part of mozilla. thanks to Ryan VanderMeulen for sheparding this patch along. sr=mscott
This commit is contained in:
Родитель
a0e47657da
Коммит
988c3cbf34
|
@ -0,0 +1,47 @@
|
|||
# ****** BEGIN LICENSE BLOCK ******
|
||||
# Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
#
|
||||
# The contents of this file are subject to the Mozilla Public License Version
|
||||
# 1.1 (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
# http://www.mozilla.org/MPL/
|
||||
#
|
||||
# Software distributed under the License is distributed on an "AS IS" basis,
|
||||
# WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
# for the specific language governing rights and limitations under the
|
||||
# License.
|
||||
#
|
||||
# The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
# and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
# are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
#
|
||||
# Contributor(s): David Einstein (deinst@world.std.com)
|
||||
# László Németh (nemethl@gyorsposta.hu)
|
||||
# Ryan VanderMeulen (ryanvm@gmail.com)
|
||||
#
|
||||
# Alternatively, the contents of this file may be used under the terms of
|
||||
# either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
# the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
# in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
# of those above. If you wish to allow use of your version of this file only
|
||||
# under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
# use your version of this file under the terms of the MPL, indicate your
|
||||
# decision by deleting the provisions above and replace them with the notice
|
||||
# and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
# the provisions above, a recipient may use your version of this file under
|
||||
# the terms of any one of the MPL, the GPL or the LGPL.
|
||||
#
|
||||
# ****** END LICENSE BLOCK ******
|
||||
|
||||
DEPTH = ../../..
|
||||
topsrcdir = @top_srcdir@
|
||||
srcdir = @srcdir@
|
||||
VPATH = @srcdir@
|
||||
|
||||
include $(DEPTH)/config/autoconf.mk
|
||||
|
||||
MODULE = hunspell
|
||||
DIRS = src
|
||||
|
||||
include $(topsrcdir)/config/rules.mk
|
||||
|
|
@ -0,0 +1,76 @@
|
|||
# ****** BEGIN LICENSE BLOCK ******
|
||||
# Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
#
|
||||
# The contents of this file are subject to the Mozilla Public License Version
|
||||
# 1.1 (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
# http://www.mozilla.org/MPL/
|
||||
#
|
||||
# Software distributed under the License is distributed on an "AS IS" basis,
|
||||
# WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
# for the specific language governing rights and limitations under the
|
||||
# License.
|
||||
#
|
||||
# The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
# and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
# are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
#
|
||||
# Contributor(s): David Einstein (deinst@world.std.com)
|
||||
# László Németh (nemethl@gyorsposta.hu)
|
||||
# Ryan VanderMeulen (ryanvm@gmail.com)
|
||||
#
|
||||
# Alternatively, the contents of this file may be used under the terms of
|
||||
# either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
# the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
# in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
# of those above. If you wish to allow use of your version of this file only
|
||||
# under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
# use your version of this file under the terms of the MPL, indicate your
|
||||
# decision by deleting the provisions above and replace them with the notice
|
||||
# and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
# the provisions above, a recipient may use your version of this file under
|
||||
# the terms of any one of the MPL, the GPL or the LGPL.
|
||||
#
|
||||
# ****** END LICENSE BLOCK ******
|
||||
|
||||
DEPTH = ../../../..
|
||||
topsrcdir = @top_srcdir@
|
||||
srcdir = @srcdir@
|
||||
VPATH = @srcdir@
|
||||
|
||||
include $(DEPTH)/config/autoconf.mk
|
||||
|
||||
MODULE = hunspell
|
||||
LIBRARY_NAME = hunspell_s
|
||||
FORCE_STATIC_LIB = 1
|
||||
LIBXUL_LIBRARY = 1
|
||||
|
||||
REQUIRES = xpcom \
|
||||
string \
|
||||
uconv \
|
||||
unicharutil \
|
||||
spellchecker \
|
||||
xulapp \
|
||||
$(NULL)
|
||||
|
||||
CPPSRCS = affentry.cpp \
|
||||
affixmgr.cpp \
|
||||
hashmgr.cpp \
|
||||
suggestmgr.cpp \
|
||||
csutil.cpp \
|
||||
hunspell.cpp \
|
||||
mozHunspell.cpp \
|
||||
$(NULL)
|
||||
|
||||
ifdef MOZ_XUL_APP
|
||||
CPPSRCS += mozHunspellDirProvider.cpp
|
||||
endif
|
||||
|
||||
EXTRA_DSO_LDOPTS = \
|
||||
$(LIBS_DIR) \
|
||||
$(XPCOM_LIBS) \
|
||||
$(NSPR_LIBS) \
|
||||
$(MOZ_UNICHARUTIL_LIBS) \
|
||||
$(NULL)
|
||||
|
||||
include $(topsrcdir)/config/rules.mk
|
|
@ -0,0 +1,59 @@
|
|||
******* BEGIN LICENSE BLOCK *******
|
||||
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
*
|
||||
* The contents of this file are subject to the Mozilla Public License Version
|
||||
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
* http://www.mozilla.org/MPL/
|
||||
*
|
||||
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
* for the specific language governing rights and limitations under the
|
||||
* License.
|
||||
*
|
||||
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
*
|
||||
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
|
||||
* László Németh (nemethl@gyorsposta.hu)
|
||||
* David Einstein (deinst@world.std.com)
|
||||
* Ryan VanderMeulen (ryanvm@gmail.com)
|
||||
*
|
||||
* Alternatively, the contents of this file may be used under the terms of
|
||||
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
* of those above. If you wish to allow use of your version of this file only
|
||||
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
* use your version of this file under the terms of the MPL, indicate your
|
||||
* decision by deleting the provisions above and replace them with the notice
|
||||
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
* the provisions above, a recipient may use your version of this file under
|
||||
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||
*
|
||||
******* END LICENSE BLOCK *******
|
||||
|
||||
Hunspell Version: 1.1.6
|
||||
|
||||
Hunspell Author: László Németh
|
||||
MySpell Author: Kevin Hendricks & David Einstein
|
||||
|
||||
Hunspell is a spell checker and morphological analyser library. Hunspell
|
||||
is based on OpenOffice.org's Myspell. Documentation, tests, and examples
|
||||
are available at http://hunspell.sourceforge.net.
|
||||
|
||||
A special thanks and credit goes to Geoff Kuenning, the creator of Ispell.
|
||||
MySpell's affix algorithms were based on those of Ispell, which should be
|
||||
noted is copyright Geoff Kuenning et.al. and now available under a BSD-style
|
||||
license. For more information on Ispell and affix compression in general,
|
||||
please see: http://lasr.cs.ucla.edu/geoff/ispell.html (Ispell homepage)
|
||||
|
||||
An almost complete rewrite of MySpell for use by the Mozilla project was
|
||||
developed by David Einstein. David was a significant help in improving MySpell.
|
||||
|
||||
Special thanks also goes to László Németh, who is the author of the Hungarian
|
||||
dictionary and who developed and contributed the code to support compound words
|
||||
in MySpell and fixed numerous problems with the encoding case conversion tables
|
||||
along with rewriting MySpell as Hunspell and ensuring compatibility with the
|
||||
Mozilla codebase.
|
|
@ -0,0 +1,923 @@
|
|||
/******* BEGIN LICENSE BLOCK *******
|
||||
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
*
|
||||
* The contents of this file are subject to the Mozilla Public License Version
|
||||
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
* http://www.mozilla.org/MPL/
|
||||
*
|
||||
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
* for the specific language governing rights and limitations under the
|
||||
* License.
|
||||
*
|
||||
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
*
|
||||
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
|
||||
* László Németh (nemethl@gyorsposta.hu)
|
||||
* David Einstein (deinst@world.std.com)
|
||||
* Davide Prina
|
||||
* Giuseppe Modugno
|
||||
* Gianluca Turconi
|
||||
* Simon Brouwer
|
||||
* Noll Janos
|
||||
* Biro Arpad
|
||||
* Goldman Eleonora
|
||||
* Sarlos Tamas
|
||||
* Bencsath Boldizsar
|
||||
* Halacsy Peter
|
||||
* Dvornik Laszlo
|
||||
* Gefferth Andras
|
||||
* Nagy Viktor
|
||||
* Varga Daniel
|
||||
* Chris Halls
|
||||
* Rene Engelhard
|
||||
* Bram Moolenaar
|
||||
* Dafydd Jones
|
||||
* Harri Pitkanen
|
||||
* Andras Timar
|
||||
* Tor Lillqvist
|
||||
*
|
||||
* Alternatively, the contents of this file may be used under the terms of
|
||||
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
* of those above. If you wish to allow use of your version of this file only
|
||||
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
* use your version of this file under the terms of the MPL, indicate your
|
||||
* decision by deleting the provisions above and replace them with the notice
|
||||
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
* the provisions above, a recipient may use your version of this file under
|
||||
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||
*
|
||||
******* END LICENSE BLOCK *******/
|
||||
|
||||
#ifndef MOZILLA_CLIENT
|
||||
#include <cstdlib>
|
||||
#include <cstring>
|
||||
#include <cctype>
|
||||
#include <cstdio>
|
||||
#else
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <stdio.h>
|
||||
#include <ctype.h>
|
||||
#endif
|
||||
|
||||
#include "affentry.hxx"
|
||||
#include "csutil.hxx"
|
||||
|
||||
#ifndef MOZILLA_CLIENT
|
||||
#ifndef W32
|
||||
using namespace std;
|
||||
#endif
|
||||
#endif
|
||||
|
||||
|
||||
PfxEntry::PfxEntry(AffixMgr* pmgr, affentry* dp)
|
||||
{
|
||||
// register affix manager
|
||||
pmyMgr = pmgr;
|
||||
|
||||
// set up its intial values
|
||||
|
||||
aflag = dp->aflag; // flag
|
||||
strip = dp->strip; // string to strip
|
||||
appnd = dp->appnd; // string to append
|
||||
stripl = dp->stripl; // length of strip string
|
||||
appndl = dp->appndl; // length of append string
|
||||
numconds = dp->numconds; // number of conditions to match
|
||||
opts = dp->opts; // cross product flag
|
||||
// then copy over all of the conditions
|
||||
memcpy(&conds.base[0],&dp->conds.base[0],SETSIZE*sizeof(conds.base[0]));
|
||||
next = NULL;
|
||||
nextne = NULL;
|
||||
nexteq = NULL;
|
||||
#ifdef HUNSPELL_EXPERIMENTAL
|
||||
morphcode = dp->morphcode;
|
||||
#endif
|
||||
contclass = dp->contclass;
|
||||
contclasslen = dp->contclasslen;
|
||||
}
|
||||
|
||||
|
||||
PfxEntry::~PfxEntry()
|
||||
{
|
||||
aflag = 0;
|
||||
if (appnd) free(appnd);
|
||||
if (strip) free(strip);
|
||||
pmyMgr = NULL;
|
||||
appnd = NULL;
|
||||
strip = NULL;
|
||||
if (opts & aeUTF8) {
|
||||
for (int i = 0; i < 8; i++) {
|
||||
if (conds.utf8.wchars[i]) free(conds.utf8.wchars[i]);
|
||||
}
|
||||
}
|
||||
#ifdef HUNSPELL_EXPERIMENTAL
|
||||
if (morphcode && !(opts & aeALIASM)) free(morphcode);
|
||||
#endif
|
||||
if (contclass && !(opts & aeALIASF)) free(contclass);
|
||||
}
|
||||
|
||||
// add prefix to this word assuming conditions hold
|
||||
char * PfxEntry::add(const char * word, int len)
|
||||
{
|
||||
char tword[MAXWORDUTF8LEN + 4];
|
||||
|
||||
if ((len > stripl) && (len >= numconds) && test_condition(word) &&
|
||||
(!stripl || (strncmp(word, strip, stripl) == 0)) &&
|
||||
((MAXWORDUTF8LEN + 4) > (len + appndl - stripl))) {
|
||||
/* we have a match so add prefix */
|
||||
char * pp = tword;
|
||||
if (appndl) {
|
||||
strcpy(tword,appnd);
|
||||
pp += appndl;
|
||||
}
|
||||
strcpy(pp, (word + stripl));
|
||||
return mystrdup(tword);
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
|
||||
inline int PfxEntry::test_condition(const char * st)
|
||||
{
|
||||
int cond;
|
||||
unsigned char * cp = (unsigned char *)st;
|
||||
if (!(opts & aeUTF8)) { // 256-character codepage
|
||||
for (cond = 0; cond < numconds; cond++) {
|
||||
if ((conds.base[*cp++] & (1 << cond)) == 0) return 0;
|
||||
}
|
||||
} else { // UTF-8 encoding
|
||||
unsigned short wc;
|
||||
for (cond = 0; cond < numconds; cond++) {
|
||||
// a simple 7-bit ASCII character in UTF-8
|
||||
if ((*cp >> 7) == 0) {
|
||||
// also check limit (end of word)
|
||||
if ((!*cp) || ((conds.utf8.ascii[*cp++] & (1 << cond)) == 0)) return 0;
|
||||
// UTF-8 multibyte character
|
||||
} else {
|
||||
// not dot wildcard in rule
|
||||
if (!conds.utf8.all[cond]) {
|
||||
if (conds.utf8.neg[cond]) {
|
||||
u8_u16((w_char *) &wc, 1, (char *) cp);
|
||||
if (conds.utf8.wchars[cond] &&
|
||||
flag_bsearch((unsigned short *)conds.utf8.wchars[cond],
|
||||
wc, (short) conds.utf8.wlen[cond])) return 0;
|
||||
} else {
|
||||
if (!conds.utf8.wchars[cond]) return 0;
|
||||
u8_u16((w_char *) &wc, 1, (char *) cp);
|
||||
if (!flag_bsearch((unsigned short *)conds.utf8.wchars[cond],
|
||||
wc, (short)conds.utf8.wlen[cond])) return 0;
|
||||
}
|
||||
}
|
||||
// jump to next UTF-8 character
|
||||
for(cp++; (*cp & 0xc0) == 0x80; cp++);
|
||||
}
|
||||
}
|
||||
}
|
||||
return 1;
|
||||
}
|
||||
|
||||
|
||||
// check if this prefix entry matches
|
||||
struct hentry * PfxEntry::checkword(const char * word, int len, char in_compound, const FLAG needflag)
|
||||
{
|
||||
int tmpl; // length of tmpword
|
||||
struct hentry * he; // hash entry of root word or NULL
|
||||
char tmpword[MAXWORDUTF8LEN + 4];
|
||||
|
||||
// on entry prefix is 0 length or already matches the beginning of the word.
|
||||
// So if the remaining root word has positive length
|
||||
// and if there are enough chars in root word and added back strip chars
|
||||
// to meet the number of characters conditions, then test it
|
||||
|
||||
tmpl = len - appndl;
|
||||
|
||||
if ((tmpl > 0) && (tmpl + stripl >= numconds)) {
|
||||
|
||||
// generate new root word by removing prefix and adding
|
||||
// back any characters that would have been stripped
|
||||
|
||||
if (stripl) strcpy (tmpword, strip);
|
||||
strcpy ((tmpword + stripl), (word + appndl));
|
||||
|
||||
// now make sure all of the conditions on characters
|
||||
// are met. Please see the appendix at the end of
|
||||
// this file for more info on exactly what is being
|
||||
// tested
|
||||
|
||||
// if all conditions are met then check if resulting
|
||||
// root word in the dictionary
|
||||
|
||||
if (test_condition(tmpword)) {
|
||||
tmpl += stripl;
|
||||
if ((he = pmyMgr->lookup(tmpword)) != NULL) {
|
||||
do {
|
||||
if (TESTAFF(he->astr, aflag, he->alen) &&
|
||||
// forbid single prefixes with pseudoroot flag
|
||||
! TESTAFF(contclass, pmyMgr->get_pseudoroot(), contclasslen) &&
|
||||
// needflag
|
||||
((!needflag) || TESTAFF(he->astr, needflag, he->alen) ||
|
||||
(contclass && TESTAFF(contclass, needflag, contclasslen))))
|
||||
return he;
|
||||
he = he->next_homonym; // check homonyms
|
||||
} while (he);
|
||||
}
|
||||
|
||||
// prefix matched but no root word was found
|
||||
// if aeXPRODUCT is allowed, try again but now
|
||||
// ross checked combined with a suffix
|
||||
|
||||
//if ((opts & aeXPRODUCT) && in_compound) {
|
||||
if ((opts & aeXPRODUCT)) {
|
||||
he = pmyMgr->suffix_check(tmpword, tmpl, aeXPRODUCT, (AffEntry *)this, NULL,
|
||||
0, NULL, FLAG_NULL, needflag, in_compound);
|
||||
if (he) return he;
|
||||
}
|
||||
}
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
// check if this prefix entry matches
|
||||
struct hentry * PfxEntry::check_twosfx(const char * word, int len,
|
||||
char in_compound, const FLAG needflag)
|
||||
{
|
||||
int tmpl; // length of tmpword
|
||||
struct hentry * he; // hash entry of root word or NULL
|
||||
char tmpword[MAXWORDUTF8LEN + 4];
|
||||
|
||||
// on entry prefix is 0 length or already matches the beginning of the word.
|
||||
// So if the remaining root word has positive length
|
||||
// and if there are enough chars in root word and added back strip chars
|
||||
// to meet the number of characters conditions, then test it
|
||||
|
||||
tmpl = len - appndl;
|
||||
|
||||
if ((tmpl > 0) && (tmpl + stripl >= numconds)) {
|
||||
|
||||
// generate new root word by removing prefix and adding
|
||||
// back any characters that would have been stripped
|
||||
|
||||
if (stripl) strcpy (tmpword, strip);
|
||||
strcpy ((tmpword + stripl), (word + appndl));
|
||||
|
||||
// now make sure all of the conditions on characters
|
||||
// are met. Please see the appendix at the end of
|
||||
// this file for more info on exactly what is being
|
||||
// tested
|
||||
|
||||
// if all conditions are met then check if resulting
|
||||
// root word in the dictionary
|
||||
|
||||
if (test_condition(tmpword)) {
|
||||
tmpl += stripl;
|
||||
|
||||
// prefix matched but no root word was found
|
||||
// if aeXPRODUCT is allowed, try again but now
|
||||
// cross checked combined with a suffix
|
||||
|
||||
if ((opts & aeXPRODUCT) && (in_compound != IN_CPD_BEGIN)) {
|
||||
he = pmyMgr->suffix_check_twosfx(tmpword, tmpl, aeXPRODUCT, (AffEntry *)this, needflag);
|
||||
if (he) return he;
|
||||
}
|
||||
}
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
#ifdef HUNSPELL_EXPERIMENTAL
|
||||
// check if this prefix entry matches
|
||||
char * PfxEntry::check_twosfx_morph(const char * word, int len,
|
||||
char in_compound, const FLAG needflag)
|
||||
{
|
||||
int tmpl; // length of tmpword
|
||||
char tmpword[MAXWORDUTF8LEN + 4];
|
||||
|
||||
// on entry prefix is 0 length or already matches the beginning of the word.
|
||||
// So if the remaining root word has positive length
|
||||
// and if there are enough chars in root word and added back strip chars
|
||||
// to meet the number of characters conditions, then test it
|
||||
|
||||
tmpl = len - appndl;
|
||||
|
||||
if ((tmpl > 0) && (tmpl + stripl >= numconds)) {
|
||||
|
||||
// generate new root word by removing prefix and adding
|
||||
// back any characters that would have been stripped
|
||||
|
||||
if (stripl) strcpy (tmpword, strip);
|
||||
strcpy ((tmpword + stripl), (word + appndl));
|
||||
|
||||
// now make sure all of the conditions on characters
|
||||
// are met. Please see the appendix at the end of
|
||||
// this file for more info on exactly what is being
|
||||
// tested
|
||||
|
||||
// if all conditions are met then check if resulting
|
||||
// root word in the dictionary
|
||||
|
||||
if (test_condition(tmpword)) {
|
||||
tmpl += stripl;
|
||||
|
||||
// prefix matched but no root word was found
|
||||
// if aeXPRODUCT is allowed, try again but now
|
||||
// ross checked combined with a suffix
|
||||
|
||||
if ((opts & aeXPRODUCT) && (in_compound != IN_CPD_BEGIN)) {
|
||||
return pmyMgr->suffix_check_twosfx_morph(tmpword, tmpl,
|
||||
aeXPRODUCT, (AffEntry *)this, needflag);
|
||||
}
|
||||
}
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
// check if this prefix entry matches
|
||||
char * PfxEntry::check_morph(const char * word, int len, char in_compound, const FLAG needflag)
|
||||
{
|
||||
int tmpl; // length of tmpword
|
||||
struct hentry * he; // hash entry of root word or NULL
|
||||
char tmpword[MAXWORDUTF8LEN + 4];
|
||||
char result[MAXLNLEN];
|
||||
char * st;
|
||||
|
||||
*result = '\0';
|
||||
|
||||
// on entry prefix is 0 length or already matches the beginning of the word.
|
||||
// So if the remaining root word has positive length
|
||||
// and if there are enough chars in root word and added back strip chars
|
||||
// to meet the number of characters conditions, then test it
|
||||
|
||||
tmpl = len - appndl;
|
||||
|
||||
if ((tmpl > 0) && (tmpl + stripl >= numconds)) {
|
||||
|
||||
// generate new root word by removing prefix and adding
|
||||
// back any characters that would have been stripped
|
||||
|
||||
if (stripl) strcpy (tmpword, strip);
|
||||
strcpy ((tmpword + stripl), (word + appndl));
|
||||
|
||||
// now make sure all of the conditions on characters
|
||||
// are met. Please see the appendix at the end of
|
||||
// this file for more info on exactly what is being
|
||||
// tested
|
||||
|
||||
// if all conditions are met then check if resulting
|
||||
// root word in the dictionary
|
||||
|
||||
if (test_condition(tmpword)) {
|
||||
tmpl += stripl;
|
||||
if ((he = pmyMgr->lookup(tmpword)) != NULL) {
|
||||
do {
|
||||
if (TESTAFF(he->astr, aflag, he->alen) &&
|
||||
// forbid single prefixes with pseudoroot flag
|
||||
! TESTAFF(contclass, pmyMgr->get_pseudoroot(), contclasslen) &&
|
||||
// needflag
|
||||
((!needflag) || TESTAFF(he->astr, needflag, he->alen) ||
|
||||
(contclass && TESTAFF(contclass, needflag, contclasslen)))) {
|
||||
if (morphcode) strcat(result, morphcode); else strcat(result,getKey());
|
||||
if (he->description) {
|
||||
if ((*(he->description)=='[')||(*(he->description)=='<')) strcat(result,he->word);
|
||||
strcat(result,he->description);
|
||||
}
|
||||
strcat(result, "\n");
|
||||
}
|
||||
he = he->next_homonym;
|
||||
} while (he);
|
||||
}
|
||||
|
||||
// prefix matched but no root word was found
|
||||
// if aeXPRODUCT is allowed, try again but now
|
||||
// ross checked combined with a suffix
|
||||
|
||||
if ((opts & aeXPRODUCT) && (in_compound != IN_CPD_BEGIN)) {
|
||||
st = pmyMgr->suffix_check_morph(tmpword, tmpl, aeXPRODUCT, (AffEntry *)this,
|
||||
FLAG_NULL, needflag);
|
||||
if (st) {
|
||||
strcat(result, st);
|
||||
free(st);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if (*result) return mystrdup(result);
|
||||
return NULL;
|
||||
}
|
||||
#endif // END OF HUNSPELL_EXPERIMENTAL CODE
|
||||
|
||||
SfxEntry::SfxEntry(AffixMgr * pmgr, affentry* dp)
|
||||
{
|
||||
// register affix manager
|
||||
pmyMgr = pmgr;
|
||||
|
||||
// set up its intial values
|
||||
aflag = dp->aflag; // char flag
|
||||
strip = dp->strip; // string to strip
|
||||
appnd = dp->appnd; // string to append
|
||||
stripl = dp->stripl; // length of strip string
|
||||
appndl = dp->appndl; // length of append string
|
||||
numconds = dp->numconds; // number of conditions to match
|
||||
opts = dp->opts; // cross product flag
|
||||
|
||||
// then copy over all of the conditions
|
||||
memcpy(&conds.base[0],&dp->conds.base[0],SETSIZE*sizeof(conds.base[0]));
|
||||
|
||||
rappnd = myrevstrdup(appnd);
|
||||
|
||||
#ifdef HUNSPELL_EXPERIMENTAL
|
||||
morphcode = dp->morphcode;
|
||||
#endif
|
||||
contclass = dp->contclass;
|
||||
contclasslen = dp->contclasslen;
|
||||
}
|
||||
|
||||
|
||||
SfxEntry::~SfxEntry()
|
||||
{
|
||||
aflag = 0;
|
||||
if (appnd) free(appnd);
|
||||
if (rappnd) free(rappnd);
|
||||
if (strip) free(strip);
|
||||
pmyMgr = NULL;
|
||||
appnd = NULL;
|
||||
strip = NULL;
|
||||
if (opts & aeUTF8) {
|
||||
for (int i = 0; i < 8; i++) {
|
||||
if (conds.utf8.wchars[i]) free(conds.utf8.wchars[i]);
|
||||
}
|
||||
}
|
||||
#ifdef HUNSPELL_EXPERIMENTAL
|
||||
if (morphcode && !(opts & aeALIASM)) free(morphcode);
|
||||
#endif
|
||||
if (contclass && !(opts & aeALIASF)) free(contclass);
|
||||
}
|
||||
|
||||
// add suffix to this word assuming conditions hold
|
||||
char * SfxEntry::add(const char * word, int len)
|
||||
{
|
||||
char tword[MAXWORDUTF8LEN + 4];
|
||||
|
||||
/* make sure all conditions match */
|
||||
if ((len > stripl) && (len >= numconds) && test_condition(word + len, word) &&
|
||||
(!stripl || (strcmp(word + len - stripl, strip) == 0)) &&
|
||||
((MAXWORDUTF8LEN + 4) > (len + appndl - stripl))) {
|
||||
/* we have a match so add suffix */
|
||||
strcpy(tword,word);
|
||||
if (appndl) {
|
||||
strcpy(tword + len - stripl, appnd);
|
||||
} else {
|
||||
*(tword + len - stripl) = '\0';
|
||||
}
|
||||
return mystrdup(tword);
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
|
||||
inline int SfxEntry::test_condition(const char * st, const char * beg)
|
||||
{
|
||||
int cond;
|
||||
unsigned char * cp = (unsigned char *) st;
|
||||
if (!(opts & aeUTF8)) { // 256-character codepage
|
||||
// Domolki affix algorithm
|
||||
for (cond = numconds; --cond >= 0; ) {
|
||||
if ((conds.base[*--cp] & (1 << cond)) == 0) return 0;
|
||||
}
|
||||
} else { // UTF-8 encoding
|
||||
unsigned short wc;
|
||||
for (cond = numconds; --cond >= 0; ) {
|
||||
// go to next character position and check limit
|
||||
if ((char *) --cp < beg) return 0;
|
||||
// a simple 7-bit ASCII character in UTF-8
|
||||
if ((*cp >> 7) == 0) {
|
||||
if ((conds.utf8.ascii[*cp] & (1 << cond)) == 0) return 0;
|
||||
// UTF-8 multibyte character
|
||||
} else {
|
||||
// go to first character of UTF-8 multibyte character
|
||||
for (; (*cp & 0xc0) == 0x80; cp--);
|
||||
// not dot wildcard in rule
|
||||
if (!conds.utf8.all[cond]) {
|
||||
if (conds.utf8.neg[cond]) {
|
||||
u8_u16((w_char *) &wc, 1, (char *) cp);
|
||||
if (conds.utf8.wchars[cond] &&
|
||||
flag_bsearch((unsigned short *)conds.utf8.wchars[cond],
|
||||
wc, (short) conds.utf8.wlen[cond])) return 0;
|
||||
} else {
|
||||
if (!conds.utf8.wchars[cond]) return 0;
|
||||
u8_u16((w_char *) &wc, 1, (char *) cp);
|
||||
if (!flag_bsearch((unsigned short *)conds.utf8.wchars[cond],
|
||||
wc, (short)conds.utf8.wlen[cond])) return 0;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
return 1;
|
||||
}
|
||||
|
||||
|
||||
|
||||
// see if this suffix is present in the word
|
||||
struct hentry * SfxEntry::checkword(const char * word, int len, int optflags,
|
||||
AffEntry* ppfx, char ** wlst, int maxSug, int * ns, const FLAG cclass, const FLAG needflag,
|
||||
const FLAG badflag)
|
||||
{
|
||||
int tmpl; // length of tmpword
|
||||
struct hentry * he; // hash entry pointer
|
||||
unsigned char * cp;
|
||||
char tmpword[MAXWORDUTF8LEN + 4];
|
||||
PfxEntry* ep = (PfxEntry *) ppfx;
|
||||
|
||||
// if this suffix is being cross checked with a prefix
|
||||
// but it does not support cross products skip it
|
||||
|
||||
if (((optflags & aeXPRODUCT) != 0) && ((opts & aeXPRODUCT) == 0))
|
||||
return NULL;
|
||||
|
||||
// upon entry suffix is 0 length or already matches the end of the word.
|
||||
// So if the remaining root word has positive length
|
||||
// and if there are enough chars in root word and added back strip chars
|
||||
// to meet the number of characters conditions, then test it
|
||||
|
||||
tmpl = len - appndl;
|
||||
// the second condition is not enough for UTF-8 strings
|
||||
// it checked in test_condition()
|
||||
|
||||
if ((tmpl > 0) && (tmpl + stripl >= numconds)) {
|
||||
|
||||
// generate new root word by removing suffix and adding
|
||||
// back any characters that would have been stripped or
|
||||
// or null terminating the shorter string
|
||||
|
||||
strcpy (tmpword, word);
|
||||
cp = (unsigned char *)(tmpword + tmpl);
|
||||
if (stripl) {
|
||||
strcpy ((char *)cp, strip);
|
||||
tmpl += stripl;
|
||||
cp = (unsigned char *)(tmpword + tmpl);
|
||||
} else *cp = '\0';
|
||||
|
||||
// now make sure all of the conditions on characters
|
||||
// are met. Please see the appendix at the end of
|
||||
// this file for more info on exactly what is being // tested
|
||||
|
||||
// if all conditions are met then check if resulting
|
||||
// root word in the dictionary
|
||||
|
||||
if (test_condition((char *) cp, (char *) tmpword)) {
|
||||
|
||||
#ifdef SZOSZABLYA_POSSIBLE_ROOTS
|
||||
fprintf(stdout,"%s %s %c\n", word, tmpword, aflag);
|
||||
#endif
|
||||
if ((he = pmyMgr->lookup(tmpword)) != NULL) {
|
||||
do {
|
||||
// check conditional suffix (enabled by prefix)
|
||||
if ((TESTAFF(he->astr, aflag, he->alen) || (ep && ep->getCont() &&
|
||||
TESTAFF(ep->getCont(), aflag, ep->getContLen()))) &&
|
||||
(((optflags & aeXPRODUCT) == 0) ||
|
||||
TESTAFF(he->astr, ep->getFlag(), he->alen) ||
|
||||
// enabled by prefix
|
||||
((contclass) && TESTAFF(contclass, ep->getFlag(), contclasslen))
|
||||
) &&
|
||||
// handle cont. class
|
||||
((!cclass) ||
|
||||
((contclass) && TESTAFF(contclass, cclass, contclasslen))
|
||||
) &&
|
||||
// check only in compound homonyms (bad flags)
|
||||
(!badflag || !TESTAFF(he->astr, badflag, he->alen)
|
||||
) &&
|
||||
// handle required flag
|
||||
((!needflag) ||
|
||||
(TESTAFF(he->astr, needflag, he->alen) ||
|
||||
((contclass) && TESTAFF(contclass, needflag, contclasslen)))
|
||||
)
|
||||
) return he;
|
||||
he = he->next_homonym; // check homonyms
|
||||
} while (he);
|
||||
|
||||
// obsolote stemming code (used only by the
|
||||
// experimental SuffixMgr:suggest_pos_stems)
|
||||
// store resulting root in wlst
|
||||
} else if (wlst && (*ns < maxSug)) {
|
||||
int cwrd = 1;
|
||||
for (int k=0; k < *ns; k++)
|
||||
if (strcmp(tmpword, wlst[k]) == 0) cwrd = 0;
|
||||
if (cwrd) {
|
||||
wlst[*ns] = mystrdup(tmpword);
|
||||
if (wlst[*ns] == NULL) {
|
||||
for (int j=0; j<*ns; j++) free(wlst[j]);
|
||||
*ns = -1;
|
||||
return NULL;
|
||||
}
|
||||
(*ns)++;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
// see if two-level suffix is present in the word
|
||||
struct hentry * SfxEntry::check_twosfx(const char * word, int len, int optflags,
|
||||
AffEntry* ppfx, const FLAG needflag)
|
||||
{
|
||||
int tmpl; // length of tmpword
|
||||
struct hentry * he; // hash entry pointer
|
||||
unsigned char * cp;
|
||||
char tmpword[MAXWORDUTF8LEN + 4];
|
||||
PfxEntry* ep = (PfxEntry *) ppfx;
|
||||
|
||||
|
||||
// if this suffix is being cross checked with a prefix
|
||||
// but it does not support cross products skip it
|
||||
|
||||
if ((optflags & aeXPRODUCT) != 0 && (opts & aeXPRODUCT) == 0)
|
||||
return NULL;
|
||||
|
||||
// upon entry suffix is 0 length or already matches the end of the word.
|
||||
// So if the remaining root word has positive length
|
||||
// and if there are enough chars in root word and added back strip chars
|
||||
// to meet the number of characters conditions, then test it
|
||||
|
||||
tmpl = len - appndl;
|
||||
|
||||
if ((tmpl > 0) && (tmpl + stripl >= numconds)) {
|
||||
|
||||
// generate new root word by removing suffix and adding
|
||||
// back any characters that would have been stripped or
|
||||
// or null terminating the shorter string
|
||||
|
||||
strcpy (tmpword, word);
|
||||
cp = (unsigned char *)(tmpword + tmpl);
|
||||
if (stripl) {
|
||||
strcpy ((char *)cp, strip);
|
||||
tmpl += stripl;
|
||||
cp = (unsigned char *)(tmpword + tmpl);
|
||||
} else *cp = '\0';
|
||||
|
||||
// now make sure all of the conditions on characters
|
||||
// are met. Please see the appendix at the end of
|
||||
// this file for more info on exactly what is being
|
||||
// tested
|
||||
|
||||
// if all conditions are met then recall suffix_check
|
||||
|
||||
if (test_condition((char *) cp, (char *) tmpword)) {
|
||||
if (ppfx) {
|
||||
// handle conditional suffix
|
||||
if ((contclass) && TESTAFF(contclass, ep->getFlag(), contclasslen))
|
||||
he = pmyMgr->suffix_check(tmpword, tmpl, 0, NULL, NULL, 0, NULL, (FLAG) aflag, needflag);
|
||||
else
|
||||
he = pmyMgr->suffix_check(tmpword, tmpl, optflags, ppfx, NULL, 0, NULL, (FLAG) aflag, needflag);
|
||||
} else {
|
||||
he = pmyMgr->suffix_check(tmpword, tmpl, 0, NULL, NULL, 0, NULL, (FLAG) aflag, needflag);
|
||||
}
|
||||
if (he) return he;
|
||||
}
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
#ifdef HUNSPELL_EXPERIMENTAL
|
||||
// see if two-level suffix is present in the word
|
||||
char * SfxEntry::check_twosfx_morph(const char * word, int len, int optflags,
|
||||
AffEntry* ppfx, const FLAG needflag)
|
||||
{
|
||||
int tmpl; // length of tmpword
|
||||
unsigned char * cp;
|
||||
char tmpword[MAXWORDUTF8LEN + 4];
|
||||
PfxEntry* ep = (PfxEntry *) ppfx;
|
||||
char * st;
|
||||
|
||||
char result[MAXLNLEN];
|
||||
|
||||
*result = '\0';
|
||||
|
||||
// if this suffix is being cross checked with a prefix
|
||||
// but it does not support cross products skip it
|
||||
|
||||
if ((optflags & aeXPRODUCT) != 0 && (opts & aeXPRODUCT) == 0)
|
||||
return NULL;
|
||||
|
||||
// upon entry suffix is 0 length or already matches the end of the word.
|
||||
// So if the remaining root word has positive length
|
||||
// and if there are enough chars in root word and added back strip chars
|
||||
// to meet the number of characters conditions, then test it
|
||||
|
||||
tmpl = len - appndl;
|
||||
|
||||
if ((tmpl > 0) && (tmpl + stripl >= numconds)) {
|
||||
|
||||
// generate new root word by removing suffix and adding
|
||||
// back any characters that would have been stripped or
|
||||
// or null terminating the shorter string
|
||||
|
||||
strcpy (tmpword, word);
|
||||
cp = (unsigned char *)(tmpword + tmpl);
|
||||
if (stripl) {
|
||||
strcpy ((char *)cp, strip);
|
||||
tmpl += stripl;
|
||||
cp = (unsigned char *)(tmpword + tmpl);
|
||||
} else *cp = '\0';
|
||||
|
||||
// now make sure all of the conditions on characters
|
||||
// are met. Please see the appendix at the end of
|
||||
// this file for more info on exactly what is being
|
||||
// tested
|
||||
|
||||
// if all conditions are met then recall suffix_check
|
||||
|
||||
if (test_condition((char *) cp, (char *) tmpword)) {
|
||||
if (ppfx) {
|
||||
// handle conditional suffix
|
||||
if ((contclass) && TESTAFF(contclass, ep->getFlag(), contclasslen)) {
|
||||
st = pmyMgr->suffix_check_morph(tmpword, tmpl, 0, NULL, aflag, needflag);
|
||||
if (st) {
|
||||
if (((PfxEntry *) ppfx)->getMorph()) {
|
||||
strcat(result, ((PfxEntry *) ppfx)->getMorph());
|
||||
}
|
||||
strcat(result,st);
|
||||
free(st);
|
||||
mychomp(result);
|
||||
}
|
||||
} else {
|
||||
st = pmyMgr->suffix_check_morph(tmpword, tmpl, optflags, ppfx, aflag, needflag);
|
||||
if (st) {
|
||||
strcat(result, st);
|
||||
free(st);
|
||||
mychomp(result);
|
||||
}
|
||||
}
|
||||
} else {
|
||||
st = pmyMgr->suffix_check_morph(tmpword, tmpl, 0, NULL, aflag, needflag);
|
||||
if (st) {
|
||||
strcat(result, st);
|
||||
free(st);
|
||||
mychomp(result);
|
||||
}
|
||||
}
|
||||
if (*result) return mystrdup(result);
|
||||
}
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
#endif // END OF HUNSPELL_EXPERIMENTAL CODE
|
||||
|
||||
// get next homonym with same affix
|
||||
struct hentry * SfxEntry::get_next_homonym(struct hentry * he, int optflags, AffEntry* ppfx,
|
||||
const FLAG cclass, const FLAG needflag)
|
||||
{
|
||||
PfxEntry* ep = (PfxEntry *) ppfx;
|
||||
|
||||
while (he->next_homonym) {
|
||||
he = he->next_homonym;
|
||||
if ((TESTAFF(he->astr, aflag, he->alen) || (ep && ep->getCont() && TESTAFF(ep->getCont(), aflag, ep->getContLen()))) &&
|
||||
((optflags & aeXPRODUCT) == 0 ||
|
||||
TESTAFF(he->astr, ep->getFlag(), he->alen) ||
|
||||
// handle conditional suffix
|
||||
((contclass) && TESTAFF(contclass, ep->getFlag(), contclasslen))
|
||||
) &&
|
||||
// handle cont. class
|
||||
((!cclass) ||
|
||||
((contclass) && TESTAFF(contclass, cclass, contclasslen))
|
||||
) &&
|
||||
// handle required flag
|
||||
((!needflag) ||
|
||||
(TESTAFF(he->astr, needflag, he->alen) ||
|
||||
((contclass) && TESTAFF(contclass, needflag, contclasslen)))
|
||||
)
|
||||
) return he;
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
|
||||
#if 0
|
||||
|
||||
Appendix: Understanding Affix Code
|
||||
|
||||
|
||||
An affix is either a prefix or a suffix attached to root words to make
|
||||
other words.
|
||||
|
||||
Basically a Prefix or a Suffix is set of AffEntry objects
|
||||
which store information about the prefix or suffix along
|
||||
with supporting routines to check if a word has a particular
|
||||
prefix or suffix or a combination.
|
||||
|
||||
The structure affentry is defined as follows:
|
||||
|
||||
struct affentry
|
||||
{
|
||||
unsigned short aflag; // ID used to represent the affix
|
||||
char * strip; // string to strip before adding affix
|
||||
char * appnd; // the affix string to add
|
||||
unsigned char stripl; // length of the strip string
|
||||
unsigned char appndl; // length of the affix string
|
||||
char numconds; // the number of conditions that must be met
|
||||
char opts; // flag: aeXPRODUCT- combine both prefix and suffix
|
||||
char conds[SETSIZE]; // array which encodes the conditions to be met
|
||||
};
|
||||
|
||||
|
||||
Here is a suffix borrowed from the en_US.aff file. This file
|
||||
is whitespace delimited.
|
||||
|
||||
SFX D Y 4
|
||||
SFX D 0 e d
|
||||
SFX D y ied [^aeiou]y
|
||||
SFX D 0 ed [^ey]
|
||||
SFX D 0 ed [aeiou]y
|
||||
|
||||
This information can be interpreted as follows:
|
||||
|
||||
In the first line has 4 fields
|
||||
|
||||
Field
|
||||
-----
|
||||
1 SFX - indicates this is a suffix
|
||||
2 D - is the name of the character flag which represents this suffix
|
||||
3 Y - indicates it can be combined with prefixes (cross product)
|
||||
4 4 - indicates that sequence of 4 affentry structures are needed to
|
||||
properly store the affix information
|
||||
|
||||
The remaining lines describe the unique information for the 4 SfxEntry
|
||||
objects that make up this affix. Each line can be interpreted
|
||||
as follows: (note fields 1 and 2 are as a check against line 1 info)
|
||||
|
||||
Field
|
||||
-----
|
||||
1 SFX - indicates this is a suffix
|
||||
2 D - is the name of the character flag for this affix
|
||||
3 y - the string of chars to strip off before adding affix
|
||||
(a 0 here indicates the NULL string)
|
||||
4 ied - the string of affix characters to add
|
||||
5 [^aeiou]y - the conditions which must be met before the affix
|
||||
can be applied
|
||||
|
||||
Field 5 is interesting. Since this is a suffix, field 5 tells us that
|
||||
there are 2 conditions that must be met. The first condition is that
|
||||
the next to the last character in the word must *NOT* be any of the
|
||||
following "a", "e", "i", "o" or "u". The second condition is that
|
||||
the last character of the word must end in "y".
|
||||
|
||||
So how can we encode this information concisely and be able to
|
||||
test for both conditions in a fast manner? The answer is found
|
||||
but studying the wonderful ispell code of Geoff Kuenning, et.al.
|
||||
(now available under a normal BSD license).
|
||||
|
||||
If we set up a conds array of 256 bytes indexed (0 to 255) and access it
|
||||
using a character (cast to an unsigned char) of a string, we have 8 bits
|
||||
of information we can store about that character. Specifically we
|
||||
could use each bit to say if that character is allowed in any of the
|
||||
last (or first for prefixes) 8 characters of the word.
|
||||
|
||||
Basically, each character at one end of the word (up to the number
|
||||
of conditions) is used to index into the conds array and the resulting
|
||||
value found there says whether the that character is valid for a
|
||||
specific character position in the word.
|
||||
|
||||
For prefixes, it does this by setting bit 0 if that char is valid
|
||||
in the first position, bit 1 if valid in the second position, and so on.
|
||||
|
||||
If a bit is not set, then that char is not valid for that postion in the
|
||||
word.
|
||||
|
||||
If working with suffixes bit 0 is used for the character closest
|
||||
to the front, bit 1 for the next character towards the end, ...,
|
||||
with bit numconds-1 representing the last char at the end of the string.
|
||||
|
||||
Note: since entries in the conds[] are 8 bits, only 8 conditions
|
||||
(read that only 8 character positions) can be examined at one
|
||||
end of a word (the beginning for prefixes and the end for suffixes.
|
||||
|
||||
So to make this clearer, lets encode the conds array values for the
|
||||
first two affentries for the suffix D described earlier.
|
||||
|
||||
|
||||
For the first affentry:
|
||||
numconds = 1 (only examine the last character)
|
||||
|
||||
conds['e'] = (1 << 0) (the word must end in an E)
|
||||
all others are all 0
|
||||
|
||||
For the second affentry:
|
||||
numconds = 2 (only examine the last two characters)
|
||||
|
||||
conds[X] = conds[X] | (1 << 0) (aeiou are not allowed)
|
||||
where X is all characters *but* a, e, i, o, or u
|
||||
|
||||
|
||||
conds['y'] = (1 << 1) (the last char must be a y)
|
||||
all other bits for all other entries in the conds array are zero
|
||||
|
||||
|
||||
#endif
|
||||
|
|
@ -0,0 +1,187 @@
|
|||
/******* BEGIN LICENSE BLOCK *******
|
||||
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
*
|
||||
* The contents of this file are subject to the Mozilla Public License Version
|
||||
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
* http://www.mozilla.org/MPL/
|
||||
*
|
||||
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
* for the specific language governing rights and limitations under the
|
||||
* License.
|
||||
*
|
||||
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
*
|
||||
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
|
||||
* László Németh (nemethl@gyorsposta.hu)
|
||||
* David Einstein (deinst@world.std.com)
|
||||
* Davide Prina
|
||||
* Giuseppe Modugno
|
||||
* Gianluca Turconi
|
||||
* Simon Brouwer
|
||||
* Noll Janos
|
||||
* Biro Arpad
|
||||
* Goldman Eleonora
|
||||
* Sarlos Tamas
|
||||
* Bencsath Boldizsar
|
||||
* Halacsy Peter
|
||||
* Dvornik Laszlo
|
||||
* Gefferth Andras
|
||||
* Nagy Viktor
|
||||
* Varga Daniel
|
||||
* Chris Halls
|
||||
* Rene Engelhard
|
||||
* Bram Moolenaar
|
||||
* Dafydd Jones
|
||||
* Harri Pitkanen
|
||||
* Andras Timar
|
||||
* Tor Lillqvist
|
||||
*
|
||||
* Alternatively, the contents of this file may be used under the terms of
|
||||
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
* of those above. If you wish to allow use of your version of this file only
|
||||
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
* use your version of this file under the terms of the MPL, indicate your
|
||||
* decision by deleting the provisions above and replace them with the notice
|
||||
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
* the provisions above, a recipient may use your version of this file under
|
||||
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||
*
|
||||
******* END LICENSE BLOCK *******/
|
||||
|
||||
#ifndef _AFFIX_HXX_
|
||||
#define _AFFIX_HXX_
|
||||
|
||||
#include "atypes.hxx"
|
||||
#include "baseaffix.hxx"
|
||||
#include "affixmgr.hxx"
|
||||
|
||||
/* A Prefix Entry */
|
||||
|
||||
class PfxEntry : public AffEntry
|
||||
{
|
||||
AffixMgr* pmyMgr;
|
||||
|
||||
PfxEntry * next;
|
||||
PfxEntry * nexteq;
|
||||
PfxEntry * nextne;
|
||||
PfxEntry * flgnxt;
|
||||
|
||||
public:
|
||||
|
||||
PfxEntry(AffixMgr* pmgr, affentry* dp );
|
||||
~PfxEntry();
|
||||
|
||||
inline bool allowCross() { return ((opts & aeXPRODUCT) != 0); }
|
||||
struct hentry * checkword(const char * word, int len, char in_compound,
|
||||
const FLAG needflag = FLAG_NULL);
|
||||
|
||||
struct hentry * check_twosfx(const char * word, int len, char in_compound, const FLAG needflag = NULL);
|
||||
|
||||
char * check_morph(const char * word, int len, char in_compound,
|
||||
const FLAG needflag = FLAG_NULL);
|
||||
|
||||
char * check_twosfx_morph(const char * word, int len,
|
||||
char in_compound, const FLAG needflag = FLAG_NULL);
|
||||
|
||||
inline FLAG getFlag() { return aflag; }
|
||||
inline const char * getKey() { return appnd; }
|
||||
char * add(const char * word, int len);
|
||||
|
||||
inline short getKeyLen() { return appndl; }
|
||||
|
||||
inline const char * getMorph() { return morphcode; }
|
||||
|
||||
inline const unsigned short * getCont() { return contclass; }
|
||||
inline short getContLen() { return contclasslen; }
|
||||
|
||||
inline PfxEntry * getNext() { return next; }
|
||||
inline PfxEntry * getNextNE() { return nextne; }
|
||||
inline PfxEntry * getNextEQ() { return nexteq; }
|
||||
inline PfxEntry * getFlgNxt() { return flgnxt; }
|
||||
|
||||
inline void setNext(PfxEntry * ptr) { next = ptr; }
|
||||
inline void setNextNE(PfxEntry * ptr) { nextne = ptr; }
|
||||
inline void setNextEQ(PfxEntry * ptr) { nexteq = ptr; }
|
||||
inline void setFlgNxt(PfxEntry * ptr) { flgnxt = ptr; }
|
||||
|
||||
inline int test_condition(const char * st);
|
||||
};
|
||||
|
||||
|
||||
|
||||
|
||||
/* A Suffix Entry */
|
||||
|
||||
class SfxEntry : public AffEntry
|
||||
{
|
||||
AffixMgr* pmyMgr;
|
||||
char * rappnd;
|
||||
|
||||
SfxEntry * next;
|
||||
SfxEntry * nexteq;
|
||||
SfxEntry * nextne;
|
||||
SfxEntry * flgnxt;
|
||||
|
||||
SfxEntry * l_morph;
|
||||
SfxEntry * r_morph;
|
||||
SfxEntry * eq_morph;
|
||||
|
||||
public:
|
||||
|
||||
SfxEntry(AffixMgr* pmgr, affentry* dp );
|
||||
~SfxEntry();
|
||||
|
||||
inline bool allowCross() { return ((opts & aeXPRODUCT) != 0); }
|
||||
struct hentry * checkword(const char * word, int len, int optflags,
|
||||
AffEntry* ppfx, char ** wlst, int maxSug, int * ns,
|
||||
// const FLAG cclass = FLAG_NULL, const FLAG needflag = FLAG_NULL, char in_compound=IN_CPD_NOT);
|
||||
const FLAG cclass = FLAG_NULL, const FLAG needflag = FLAG_NULL, const FLAG badflag = 0);
|
||||
|
||||
struct hentry * check_twosfx(const char * word, int len, int optflags, AffEntry* ppfx, const FLAG needflag = NULL);
|
||||
|
||||
char * check_twosfx_morph(const char * word, int len, int optflags,
|
||||
AffEntry* ppfx, const FLAG needflag = FLAG_NULL);
|
||||
struct hentry * get_next_homonym(struct hentry * he);
|
||||
struct hentry * get_next_homonym(struct hentry * word, int optflags, AffEntry* ppfx,
|
||||
const FLAG cclass, const FLAG needflag);
|
||||
|
||||
|
||||
inline FLAG getFlag() { return aflag; }
|
||||
inline const char * getKey() { return rappnd; }
|
||||
char * add(const char * word, int len);
|
||||
|
||||
|
||||
inline const char * getMorph() { return morphcode; }
|
||||
|
||||
inline const unsigned short * getCont() { return contclass; }
|
||||
inline short getContLen() { return contclasslen; }
|
||||
inline const char * getAffix() { return appnd; }
|
||||
|
||||
inline short getKeyLen() { return appndl; }
|
||||
|
||||
inline SfxEntry * getNext() { return next; }
|
||||
inline SfxEntry * getNextNE() { return nextne; }
|
||||
inline SfxEntry * getNextEQ() { return nexteq; }
|
||||
|
||||
inline SfxEntry * getLM() { return l_morph; }
|
||||
inline SfxEntry * getRM() { return r_morph; }
|
||||
inline SfxEntry * getEQM() { return eq_morph; }
|
||||
inline SfxEntry * getFlgNxt() { return flgnxt; }
|
||||
|
||||
inline void setNext(SfxEntry * ptr) { next = ptr; }
|
||||
inline void setNextNE(SfxEntry * ptr) { nextne = ptr; }
|
||||
inline void setNextEQ(SfxEntry * ptr) { nexteq = ptr; }
|
||||
inline void setFlgNxt(SfxEntry * ptr) { flgnxt = ptr; }
|
||||
|
||||
inline int test_condition(const char * st, const char * begin);
|
||||
};
|
||||
|
||||
#endif
|
||||
|
||||
|
Разница между файлами не показана из-за своего большого размера
Загрузить разницу
|
@ -0,0 +1,265 @@
|
|||
/******* BEGIN LICENSE BLOCK *******
|
||||
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
*
|
||||
* The contents of this file are subject to the Mozilla Public License Version
|
||||
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
* http://www.mozilla.org/MPL/
|
||||
*
|
||||
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
* for the specific language governing rights and limitations under the
|
||||
* License.
|
||||
*
|
||||
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
*
|
||||
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
|
||||
* László Németh (nemethl@gyorsposta.hu)
|
||||
* David Einstein (deinst@world.std.com)
|
||||
* Davide Prina
|
||||
* Giuseppe Modugno
|
||||
* Gianluca Turconi
|
||||
* Simon Brouwer
|
||||
* Noll Janos
|
||||
* Biro Arpad
|
||||
* Goldman Eleonora
|
||||
* Sarlos Tamas
|
||||
* Bencsath Boldizsar
|
||||
* Halacsy Peter
|
||||
* Dvornik Laszlo
|
||||
* Gefferth Andras
|
||||
* Nagy Viktor
|
||||
* Varga Daniel
|
||||
* Chris Halls
|
||||
* Rene Engelhard
|
||||
* Bram Moolenaar
|
||||
* Dafydd Jones
|
||||
* Harri Pitkanen
|
||||
* Andras Timar
|
||||
* Tor Lillqvist
|
||||
*
|
||||
* Alternatively, the contents of this file may be used under the terms of
|
||||
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
* of those above. If you wish to allow use of your version of this file only
|
||||
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
* use your version of this file under the terms of the MPL, indicate your
|
||||
* decision by deleting the provisions above and replace them with the notice
|
||||
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
* the provisions above, a recipient may use your version of this file under
|
||||
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||
*
|
||||
******* END LICENSE BLOCK *******/
|
||||
|
||||
#ifndef _AFFIXMGR_HXX_
|
||||
#define _AFFIXMGR_HXX_
|
||||
|
||||
#ifdef MOZILLA_CLIENT
|
||||
#ifdef __SUNPRO_CC // for SunONE Studio compiler
|
||||
using namespace std;
|
||||
#endif
|
||||
#include <stdio.h>
|
||||
#else
|
||||
#include <cstdio>
|
||||
#endif
|
||||
|
||||
#include "atypes.hxx"
|
||||
#include "baseaffix.hxx"
|
||||
#include "hashmgr.hxx"
|
||||
|
||||
// check flag duplication
|
||||
#define dupSFX (1 << 0)
|
||||
#define dupPFX (1 << 1)
|
||||
|
||||
class AffixMgr
|
||||
{
|
||||
|
||||
AffEntry * pStart[SETSIZE];
|
||||
AffEntry * sStart[SETSIZE];
|
||||
AffEntry * pFlag[CONTSIZE];
|
||||
AffEntry * sFlag[CONTSIZE];
|
||||
HashMgr * pHMgr;
|
||||
char * trystring;
|
||||
char * encoding;
|
||||
struct cs_info * csconv;
|
||||
int utf8;
|
||||
int complexprefixes;
|
||||
FLAG compoundflag;
|
||||
FLAG compoundbegin;
|
||||
FLAG compoundmiddle;
|
||||
FLAG compoundend;
|
||||
FLAG compoundroot;
|
||||
FLAG compoundforbidflag;
|
||||
FLAG compoundpermitflag;
|
||||
int checkcompounddup;
|
||||
int checkcompoundrep;
|
||||
int checkcompoundcase;
|
||||
int checkcompoundtriple;
|
||||
FLAG forbiddenword;
|
||||
FLAG nosuggest;
|
||||
FLAG pseudoroot;
|
||||
int cpdmin;
|
||||
int numrep;
|
||||
replentry * reptable;
|
||||
int nummap;
|
||||
mapentry * maptable;
|
||||
int numbreak;
|
||||
char ** breaktable;
|
||||
int numcheckcpd;
|
||||
replentry * checkcpdtable;
|
||||
int numdefcpd;
|
||||
flagentry * defcpdtable;
|
||||
int maxngramsugs;
|
||||
int nosplitsugs;
|
||||
int sugswithdots;
|
||||
int cpdwordmax;
|
||||
int cpdmaxsyllable;
|
||||
char * cpdvowels;
|
||||
w_char * cpdvowels_utf16;
|
||||
int cpdvowels_utf16_len;
|
||||
char * cpdsyllablenum;
|
||||
const char * pfxappnd; // BUG: not stateless
|
||||
const char * sfxappnd; // BUG: not stateless
|
||||
FLAG sfxflag; // BUG: not stateless
|
||||
char * derived; // BUG: not stateless
|
||||
AffEntry * sfx; // BUG: not stateless
|
||||
AffEntry * pfx; // BUG: not stateless
|
||||
int checknum;
|
||||
char * wordchars;
|
||||
unsigned short * wordchars_utf16;
|
||||
int wordchars_utf16_len;
|
||||
char * ignorechars;
|
||||
unsigned short * ignorechars_utf16;
|
||||
int ignorechars_utf16_len;
|
||||
char * version;
|
||||
char * lang;
|
||||
int langnum;
|
||||
FLAG lemma_present;
|
||||
FLAG circumfix;
|
||||
FLAG onlyincompound;
|
||||
FLAG keepcase;
|
||||
int checksharps;
|
||||
|
||||
int havecontclass; // boolean variable
|
||||
char contclasses[CONTSIZE]; // flags of possible continuing classes (twofold affix)
|
||||
flag flag_mode;
|
||||
|
||||
public:
|
||||
|
||||
AffixMgr(const char * affpath, HashMgr * ptr);
|
||||
~AffixMgr();
|
||||
struct hentry * affix_check(const char * word, int len,
|
||||
const unsigned short needflag = (unsigned short) 0, char in_compound = IN_CPD_NOT);
|
||||
struct hentry * prefix_check(const char * word, int len,
|
||||
char in_compound, const FLAG needflag = FLAG_NULL);
|
||||
inline int isSubset(const char * s1, const char * s2);
|
||||
struct hentry * prefix_check_twosfx(const char * word, int len,
|
||||
char in_compound, const FLAG needflag = FLAG_NULL);
|
||||
inline int isRevSubset(const char * s1, const char * end_of_s2, int len);
|
||||
struct hentry * suffix_check(const char * word, int len, int sfxopts, AffEntry* ppfx,
|
||||
char ** wlst, int maxSug, int * ns, const FLAG cclass = FLAG_NULL,
|
||||
const FLAG needflag = FLAG_NULL, char in_compound = IN_CPD_NOT);
|
||||
struct hentry * suffix_check_twosfx(const char * word, int len,
|
||||
int sfxopts, AffEntry* ppfx, const FLAG needflag = FLAG_NULL);
|
||||
|
||||
char * affix_check_morph(const char * word, int len,
|
||||
const FLAG needflag = FLAG_NULL, char in_compound = IN_CPD_NOT);
|
||||
char * prefix_check_morph(const char * word, int len,
|
||||
char in_compound, const FLAG needflag = FLAG_NULL);
|
||||
char * suffix_check_morph (const char * word, int len, int sfxopts, AffEntry * ppfx,
|
||||
const FLAG cclass = FLAG_NULL, const FLAG needflag = FLAG_NULL, char in_compound = IN_CPD_NOT);
|
||||
|
||||
char * prefix_check_twosfx_morph(const char * word, int len,
|
||||
char in_compound, const FLAG needflag = FLAG_NULL);
|
||||
char * suffix_check_twosfx_morph(const char * word, int len,
|
||||
int sfxopts, AffEntry * ppfx, const FLAG needflag = FLAG_NULL);
|
||||
|
||||
int expand_rootword(struct guessword * wlst, int maxn, const char * ts,
|
||||
int wl, const unsigned short * ap, unsigned short al, char * bad, int);
|
||||
|
||||
short get_syllable (const char * word, int wlen);
|
||||
int cpdrep_check(const char * word, int len);
|
||||
int cpdpat_check(const char * word, int len);
|
||||
int defcpd_check(hentry *** words, short wnum, hentry * rv, hentry ** rwords, char all);
|
||||
int cpdcase_check(const char * word, int len);
|
||||
inline int candidate_check(const char * word, int len);
|
||||
struct hentry * compound_check(const char * word, int len,
|
||||
short wordnum, short numsyllable, short maxwordnum, short wnum, hentry ** words,
|
||||
char hu_mov_rule, int * cmpdstemnum, int * cmpdstem, char is_sug);
|
||||
|
||||
int compound_check_morph(const char * word, int len,
|
||||
short wordnum, short numsyllable, short maxwordnum, short wnum, hentry ** words,
|
||||
char hu_mov_rule, char ** result, char * partresult);
|
||||
|
||||
struct hentry * lookup(const char * word);
|
||||
int get_numrep();
|
||||
struct replentry * get_reptable();
|
||||
int get_nummap();
|
||||
struct mapentry * get_maptable();
|
||||
int get_numbreak();
|
||||
char ** get_breaktable();
|
||||
char * get_encoding();
|
||||
int get_langnum();
|
||||
char * get_try_string();
|
||||
const char * get_wordchars();
|
||||
unsigned short * get_wordchars_utf16(int * len);
|
||||
char * get_ignore();
|
||||
unsigned short * get_ignore_utf16(int * len);
|
||||
int get_compound();
|
||||
FLAG get_compoundflag();
|
||||
FLAG get_compoundbegin();
|
||||
FLAG get_forbiddenword();
|
||||
FLAG get_nosuggest();
|
||||
// FLAG get_circumfix();
|
||||
FLAG get_pseudoroot();
|
||||
FLAG get_onlyincompound();
|
||||
FLAG get_compoundroot();
|
||||
FLAG get_lemma_present();
|
||||
int get_checknum();
|
||||
char * get_possible_root();
|
||||
const char * get_prefix();
|
||||
const char * get_suffix();
|
||||
const char * get_derived();
|
||||
const char * get_version();
|
||||
const int have_contclass();
|
||||
int get_utf8();
|
||||
int get_complexprefixes();
|
||||
char * get_suffixed(char );
|
||||
int get_maxngramsugs();
|
||||
int get_nosplitsugs();
|
||||
int get_sugswithdots(void);
|
||||
FLAG get_keepcase(void);
|
||||
int get_checksharps(void);
|
||||
|
||||
private:
|
||||
int parse_file(const char * affpath);
|
||||
// int parse_string(char * line, char ** out, const char * name);
|
||||
int parse_flag(char * line, unsigned short * out, const char * name);
|
||||
int parse_num(char * line, int * out, const char * name);
|
||||
// int parse_array(char * line, char ** out, unsigned short ** out_utf16,
|
||||
// int * out_utf16_len, const char * name);
|
||||
int parse_cpdsyllable(char * line);
|
||||
int parse_reptable(char * line, FILE * af);
|
||||
int parse_maptable(char * line, FILE * af);
|
||||
int parse_breaktable(char * line, FILE * af);
|
||||
int parse_checkcpdtable(char * line, FILE * af);
|
||||
int parse_defcpdtable(char * line, FILE * af);
|
||||
int parse_affix(char * line, const char at, FILE * af, char * dupflags);
|
||||
|
||||
int encodeit(struct affentry * ptr, char * cs);
|
||||
int build_pfxtree(AffEntry* pfxptr);
|
||||
int build_sfxtree(AffEntry* sfxptr);
|
||||
int process_pfx_order();
|
||||
int process_sfx_order();
|
||||
AffEntry * process_pfx_in_order(AffEntry * ptr, AffEntry * nptr);
|
||||
AffEntry * process_sfx_in_order(AffEntry * ptr, AffEntry * nptr);
|
||||
int process_pfx_tree_to_list();
|
||||
int process_sfx_tree_to_list();
|
||||
int redundant_condition(char, char * strip, int stripl, const char * cond, char *);
|
||||
};
|
||||
|
||||
#endif
|
||||
|
|
@ -0,0 +1,157 @@
|
|||
/******* BEGIN LICENSE BLOCK *******
|
||||
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
*
|
||||
* The contents of this file are subject to the Mozilla Public License Version
|
||||
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
* http://www.mozilla.org/MPL/
|
||||
*
|
||||
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
* for the specific language governing rights and limitations under the
|
||||
* License.
|
||||
*
|
||||
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
*
|
||||
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
|
||||
* László Németh (nemethl@gyorsposta.hu)
|
||||
* David Einstein (deinst@world.std.com)
|
||||
* Davide Prina
|
||||
* Giuseppe Modugno
|
||||
* Gianluca Turconi
|
||||
* Simon Brouwer
|
||||
* Noll Janos
|
||||
* Biro Arpad
|
||||
* Goldman Eleonora
|
||||
* Sarlos Tamas
|
||||
* Bencsath Boldizsar
|
||||
* Halacsy Peter
|
||||
* Dvornik Laszlo
|
||||
* Gefferth Andras
|
||||
* Nagy Viktor
|
||||
* Varga Daniel
|
||||
* Chris Halls
|
||||
* Rene Engelhard
|
||||
* Bram Moolenaar
|
||||
* Dafydd Jones
|
||||
* Harri Pitkanen
|
||||
* Andras Timar
|
||||
* Tor Lillqvist
|
||||
*
|
||||
* Alternatively, the contents of this file may be used under the terms of
|
||||
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
* of those above. If you wish to allow use of your version of this file only
|
||||
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
* use your version of this file under the terms of the MPL, indicate your
|
||||
* decision by deleting the provisions above and replace them with the notice
|
||||
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
* the provisions above, a recipient may use your version of this file under
|
||||
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||
*
|
||||
******* END LICENSE BLOCK *******/
|
||||
|
||||
#ifndef _ATYPES_HXX_
|
||||
#define _ATYPES_HXX_
|
||||
|
||||
#ifndef HUNSPELL_WARNING
|
||||
#ifdef HUNSPELL_WARNING_ON
|
||||
#define HUNSPELL_WARNING fprintf
|
||||
#else
|
||||
#define HUNSPELL_WARNING
|
||||
#endif
|
||||
#endif
|
||||
|
||||
// HUNSTEM def.
|
||||
#define HUNSTEM
|
||||
|
||||
#include "csutil.hxx"
|
||||
#include "hashmgr.hxx"
|
||||
|
||||
#define SETSIZE 256
|
||||
#define CONTSIZE 65536
|
||||
#define MAXWORDLEN 100
|
||||
#define MAXWORDUTF8LEN (MAXWORDLEN * 4)
|
||||
|
||||
// affentry options
|
||||
#define aeXPRODUCT (1 << 0)
|
||||
#define aeUTF8 (1 << 1)
|
||||
#define aeALIASF (1 << 2)
|
||||
#define aeALIASM (1 << 3)
|
||||
#define aeINFIX (1 << 4)
|
||||
|
||||
// compound options
|
||||
#define IN_CPD_NOT 0
|
||||
#define IN_CPD_BEGIN 1
|
||||
#define IN_CPD_END 2
|
||||
#define IN_CPD_OTHER 3
|
||||
|
||||
#define MAXLNLEN 8192 * 4
|
||||
|
||||
#define MINCPDLEN 3
|
||||
#define MAXCOMPOUND 10
|
||||
|
||||
#define MAXACC 1000
|
||||
|
||||
#define FLAG unsigned short
|
||||
#define FLAG_NULL 0x00
|
||||
#define FREE_FLAG(a) a = 0
|
||||
|
||||
#define TESTAFF( a, b , c ) flag_bsearch((unsigned short *) a, (unsigned short) b, c)
|
||||
|
||||
struct affentry
|
||||
{
|
||||
char * strip;
|
||||
char * appnd;
|
||||
unsigned char stripl;
|
||||
unsigned char appndl;
|
||||
char numconds;
|
||||
char opts;
|
||||
unsigned short aflag;
|
||||
union {
|
||||
char base[SETSIZE];
|
||||
struct {
|
||||
char ascii[SETSIZE/2];
|
||||
char neg[8];
|
||||
char all[8];
|
||||
w_char * wchars[8];
|
||||
int wlen[8];
|
||||
} utf8;
|
||||
} conds;
|
||||
#ifdef HUNSPELL_EXPERIMENTAL
|
||||
char * morphcode;
|
||||
#endif
|
||||
unsigned short * contclass;
|
||||
short contclasslen;
|
||||
};
|
||||
|
||||
struct replentry {
|
||||
char * pattern;
|
||||
char * pattern2;
|
||||
};
|
||||
|
||||
struct mapentry {
|
||||
char * set;
|
||||
w_char * set_utf16;
|
||||
int len;
|
||||
};
|
||||
|
||||
struct flagentry {
|
||||
FLAG * def;
|
||||
int len;
|
||||
};
|
||||
|
||||
struct guessword {
|
||||
char * word;
|
||||
bool allow;
|
||||
};
|
||||
|
||||
#endif
|
||||
|
||||
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,87 @@
|
|||
/******* BEGIN LICENSE BLOCK *******
|
||||
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
*
|
||||
* The contents of this file are subject to the Mozilla Public License Version
|
||||
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
* http://www.mozilla.org/MPL/
|
||||
*
|
||||
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
* for the specific language governing rights and limitations under the
|
||||
* License.
|
||||
*
|
||||
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
*
|
||||
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
|
||||
* László Németh (nemethl@gyorsposta.hu)
|
||||
* David Einstein (deinst@world.std.com)
|
||||
* Davide Prina
|
||||
* Giuseppe Modugno
|
||||
* Gianluca Turconi
|
||||
* Simon Brouwer
|
||||
* Noll Janos
|
||||
* Biro Arpad
|
||||
* Goldman Eleonora
|
||||
* Sarlos Tamas
|
||||
* Bencsath Boldizsar
|
||||
* Halacsy Peter
|
||||
* Dvornik Laszlo
|
||||
* Gefferth Andras
|
||||
* Nagy Viktor
|
||||
* Varga Daniel
|
||||
* Chris Halls
|
||||
* Rene Engelhard
|
||||
* Bram Moolenaar
|
||||
* Dafydd Jones
|
||||
* Harri Pitkanen
|
||||
* Andras Timar
|
||||
* Tor Lillqvist
|
||||
*
|
||||
* Alternatively, the contents of this file may be used under the terms of
|
||||
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
* of those above. If you wish to allow use of your version of this file only
|
||||
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
* use your version of this file under the terms of the MPL, indicate your
|
||||
* decision by deleting the provisions above and replace them with the notice
|
||||
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
* the provisions above, a recipient may use your version of this file under
|
||||
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||
*
|
||||
******* END LICENSE BLOCK *******/
|
||||
|
||||
#ifndef _BASEAFF_HXX_
|
||||
#define _BASEAFF_HXX_
|
||||
|
||||
class AffEntry
|
||||
{
|
||||
public:
|
||||
|
||||
protected:
|
||||
char * appnd;
|
||||
char * strip;
|
||||
unsigned char appndl;
|
||||
unsigned char stripl;
|
||||
char numconds;
|
||||
char opts;
|
||||
unsigned short aflag;
|
||||
union {
|
||||
char base[SETSIZE];
|
||||
struct {
|
||||
char ascii[SETSIZE/2];
|
||||
char neg[8];
|
||||
char all[8];
|
||||
w_char * wchars[8];
|
||||
int wlen[8];
|
||||
} utf8;
|
||||
} conds;
|
||||
char * morphcode;
|
||||
unsigned short * contclass;
|
||||
short contclasslen;
|
||||
};
|
||||
|
||||
#endif
|
Разница между файлами не показана из-за своего большого размера
Загрузить разницу
|
@ -0,0 +1,215 @@
|
|||
/******* BEGIN LICENSE BLOCK *******
|
||||
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
*
|
||||
* The contents of this file are subject to the Mozilla Public License Version
|
||||
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
* http://www.mozilla.org/MPL/
|
||||
*
|
||||
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
* for the specific language governing rights and limitations under the
|
||||
* License.
|
||||
*
|
||||
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
*
|
||||
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
|
||||
* László Németh (nemethl@gyorsposta.hu)
|
||||
* David Einstein (deinst@world.std.com)
|
||||
* Davide Prina
|
||||
* Giuseppe Modugno
|
||||
* Gianluca Turconi
|
||||
* Simon Brouwer
|
||||
* Noll Janos
|
||||
* Biro Arpad
|
||||
* Goldman Eleonora
|
||||
* Sarlos Tamas
|
||||
* Bencsath Boldizsar
|
||||
* Halacsy Peter
|
||||
* Dvornik Laszlo
|
||||
* Gefferth Andras
|
||||
* Nagy Viktor
|
||||
* Varga Daniel
|
||||
* Chris Halls
|
||||
* Rene Engelhard
|
||||
* Bram Moolenaar
|
||||
* Dafydd Jones
|
||||
* Harri Pitkanen
|
||||
* Andras Timar
|
||||
* Tor Lillqvist
|
||||
*
|
||||
* Alternatively, the contents of this file may be used under the terms of
|
||||
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
* of those above. If you wish to allow use of your version of this file only
|
||||
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
* use your version of this file under the terms of the MPL, indicate your
|
||||
* decision by deleting the provisions above and replace them with the notice
|
||||
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
* the provisions above, a recipient may use your version of this file under
|
||||
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||
*
|
||||
******* END LICENSE BLOCK *******/
|
||||
|
||||
#ifndef __CSUTILHXX__
|
||||
#define __CSUTILHXX__
|
||||
|
||||
// First some base level utility routines
|
||||
|
||||
#define NOCAP 0
|
||||
#define INITCAP 1
|
||||
#define ALLCAP 2
|
||||
#define HUHCAP 3
|
||||
#define HUHINITCAP 4
|
||||
|
||||
#define ONLYUPCASEFLAG 65535
|
||||
|
||||
typedef struct {
|
||||
unsigned char l;
|
||||
unsigned char h;
|
||||
} w_char;
|
||||
|
||||
// convert UTF-16 characters to UTF-8
|
||||
char * u16_u8(char * dest, int size, const w_char * src, int srclen);
|
||||
|
||||
// convert UTF-8 characters to UTF-16
|
||||
int u8_u16(w_char * dest, int size, const char * src);
|
||||
|
||||
// sort 2-byte vector
|
||||
void flag_qsort(unsigned short flags[], int begin, int end);
|
||||
|
||||
// binary search in 2-byte vector
|
||||
int flag_bsearch(unsigned short flags[], unsigned short flag, int right);
|
||||
|
||||
// remove end of line char(s)
|
||||
void mychomp(char * s);
|
||||
|
||||
// duplicate string
|
||||
char * mystrdup(const char * s);
|
||||
|
||||
// duplicate reverse of string
|
||||
char * myrevstrdup(const char * s);
|
||||
|
||||
// parse into tokens with char delimiter
|
||||
char * mystrsep(char ** sptr, const char delim);
|
||||
// parse into tokens with char delimiter
|
||||
char * mystrsep2(char ** sptr, const char delim);
|
||||
|
||||
// parse into tokens with char delimiter
|
||||
char * mystrrep(char *, const char *, const char *);
|
||||
|
||||
// append s to ends of every lines in text
|
||||
void strlinecat(char * lines, const char * s);
|
||||
|
||||
// tokenize into lines with new line
|
||||
int line_tok(const char * text, char *** lines);
|
||||
|
||||
// tokenize into lines with new line and uniq in place
|
||||
char * line_uniq(char * text);
|
||||
|
||||
// change \n to c in place
|
||||
char * line_join(char * text, char c);
|
||||
|
||||
// leave only last {[^}]*} pattern in string
|
||||
char * delete_zeros(char * morphout);
|
||||
|
||||
// reverse word
|
||||
int reverseword(char *);
|
||||
|
||||
// reverse word
|
||||
int reverseword_utf(char *);
|
||||
|
||||
// character encoding information
|
||||
struct cs_info {
|
||||
unsigned char ccase;
|
||||
unsigned char clower;
|
||||
unsigned char cupper;
|
||||
};
|
||||
|
||||
// Unicode character encoding information
|
||||
struct unicode_info {
|
||||
unsigned short c;
|
||||
unsigned short cupper;
|
||||
unsigned short clower;
|
||||
};
|
||||
|
||||
struct unicode_info2 {
|
||||
char cletter;
|
||||
unsigned short cupper;
|
||||
unsigned short clower;
|
||||
};
|
||||
|
||||
int initialize_utf_tbl();
|
||||
void free_utf_tbl();
|
||||
unsigned short unicodetoupper(unsigned short c, int langnum);
|
||||
unsigned short unicodetolower(unsigned short c, int langnum);
|
||||
int unicodeisalpha(unsigned short c);
|
||||
|
||||
struct enc_entry {
|
||||
const char * enc_name;
|
||||
struct cs_info * cs_table;
|
||||
};
|
||||
|
||||
// language to encoding default map
|
||||
|
||||
struct lang_map {
|
||||
const char * lang;
|
||||
const char * def_enc;
|
||||
int num;
|
||||
};
|
||||
|
||||
struct cs_info * get_current_cs(const char * es);
|
||||
|
||||
const char * get_default_enc(const char * lang);
|
||||
|
||||
// get language identifiers of language codes
|
||||
int get_lang_num(const char * lang);
|
||||
|
||||
// get characters of the given 8bit encoding with lower- and uppercase forms
|
||||
char * get_casechars(const char * enc);
|
||||
|
||||
// convert null terminated string to all caps using encoding
|
||||
void enmkallcap(char * d, const char * p, const char * encoding);
|
||||
|
||||
// convert null terminated string to all little using encoding
|
||||
void enmkallsmall(char * d, const char * p, const char * encoding);
|
||||
|
||||
// convert null terminated string to have intial capital using encoding
|
||||
void enmkinitcap(char * d, const char * p, const char * encoding);
|
||||
|
||||
// convert null terminated string to all caps
|
||||
void mkallcap(char * p, const struct cs_info * csconv);
|
||||
|
||||
// convert null terminated string to all little
|
||||
void mkallsmall(char * p, const struct cs_info * csconv);
|
||||
|
||||
// convert null terminated string to have intial capital
|
||||
void mkinitcap(char * p, const struct cs_info * csconv);
|
||||
|
||||
// convert first nc characters of UTF-8 string to little
|
||||
void mkallsmall_utf(w_char * u, int nc, int langnum);
|
||||
|
||||
// convert first nc characters of UTF-8 string to capital
|
||||
void mkallcap_utf(w_char * u, int nc, int langnum);
|
||||
|
||||
// get type of capitalization
|
||||
int get_captype(char * q, int nl, cs_info *);
|
||||
|
||||
// get type of capitalization (UTF-8)
|
||||
int get_captype_utf8(char * q, int nl, int langnum);
|
||||
|
||||
// strip all ignored characters in the string
|
||||
void remove_ignored_chars_utf(char * word, unsigned short ignored_chars[], int ignored_len);
|
||||
|
||||
// strip all ignored characters in the string
|
||||
void remove_ignored_chars(char * word, char * ignored_chars);
|
||||
|
||||
int parse_string(char * line, char ** out, const char * name);
|
||||
|
||||
int parse_array(char * line, char ** out,
|
||||
unsigned short ** out_utf16, int * out_utf16_len, const char * name, int utf8);
|
||||
|
||||
#endif
|
|
@ -0,0 +1,915 @@
|
|||
/******* BEGIN LICENSE BLOCK *******
|
||||
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
*
|
||||
* The contents of this file are subject to the Mozilla Public License Version
|
||||
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
* http://www.mozilla.org/MPL/
|
||||
*
|
||||
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
* for the specific language governing rights and limitations under the
|
||||
* License.
|
||||
*
|
||||
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
*
|
||||
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
|
||||
* László Németh (nemethl@gyorsposta.hu)
|
||||
* David Einstein (deinst@world.std.com)
|
||||
* Davide Prina
|
||||
* Giuseppe Modugno
|
||||
* Gianluca Turconi
|
||||
* Simon Brouwer
|
||||
* Noll Janos
|
||||
* Biro Arpad
|
||||
* Goldman Eleonora
|
||||
* Sarlos Tamas
|
||||
* Bencsath Boldizsar
|
||||
* Halacsy Peter
|
||||
* Dvornik Laszlo
|
||||
* Gefferth Andras
|
||||
* Nagy Viktor
|
||||
* Varga Daniel
|
||||
* Chris Halls
|
||||
* Rene Engelhard
|
||||
* Bram Moolenaar
|
||||
* Dafydd Jones
|
||||
* Harri Pitkanen
|
||||
* Andras Timar
|
||||
* Tor Lillqvist
|
||||
*
|
||||
* Alternatively, the contents of this file may be used under the terms of
|
||||
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
* of those above. If you wish to allow use of your version of this file only
|
||||
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
* use your version of this file under the terms of the MPL, indicate your
|
||||
* decision by deleting the provisions above and replace them with the notice
|
||||
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
* the provisions above, a recipient may use your version of this file under
|
||||
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||
*
|
||||
******* END LICENSE BLOCK *******/
|
||||
|
||||
#ifndef MOZILLA_CLIENT
|
||||
#include <cstdlib>
|
||||
#include <cstring>
|
||||
#include <cstdio>
|
||||
#include <cctype>
|
||||
#else
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <stdio.h>
|
||||
#include <ctype.h>
|
||||
#endif
|
||||
|
||||
#include "hashmgr.hxx"
|
||||
#include "csutil.hxx"
|
||||
#include "atypes.hxx"
|
||||
|
||||
#ifdef MOZILLA_CLIENT
|
||||
#ifdef __SUNPRO_CC // for SunONE Studio compiler
|
||||
using namespace std;
|
||||
#endif
|
||||
#else
|
||||
#ifndef W32
|
||||
using namespace std;
|
||||
#endif
|
||||
#endif
|
||||
|
||||
// build a hash table from a munched word list
|
||||
|
||||
HashMgr::HashMgr(const char * tpath, const char * apath)
|
||||
{
|
||||
tablesize = 0;
|
||||
tableptr = NULL;
|
||||
flag_mode = FLAG_CHAR;
|
||||
complexprefixes = 0;
|
||||
utf8 = 0;
|
||||
langnum = 0;
|
||||
lang = NULL;
|
||||
enc = NULL;
|
||||
csconv = 0;
|
||||
ignorechars = NULL;
|
||||
ignorechars_utf16 = NULL;
|
||||
ignorechars_utf16_len = 0;
|
||||
numaliasf = 0;
|
||||
aliasf = NULL;
|
||||
numaliasm = 0;
|
||||
aliasm = NULL;
|
||||
forbiddenword = FLAG_NULL; // forbidden word signing flag
|
||||
load_config(apath);
|
||||
int ec = load_tables(tpath);
|
||||
if (ec) {
|
||||
/* error condition - what should we do here */
|
||||
HUNSPELL_WARNING(stderr, "Hash Manager Error : %d\n",ec);
|
||||
if (tableptr) {
|
||||
free(tableptr);
|
||||
}
|
||||
tablesize = 0;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
HashMgr::~HashMgr()
|
||||
{
|
||||
if (tableptr) {
|
||||
// now pass through hash table freeing up everything
|
||||
// go through column by column of the table
|
||||
for (int i=0; i < tablesize; i++) {
|
||||
struct hentry * pt = &tableptr[i];
|
||||
struct hentry * nt = NULL;
|
||||
if (pt) {
|
||||
if (pt->astr && !aliasf) free(pt->astr);
|
||||
if (pt->word) free(pt->word);
|
||||
#ifdef HUNSPELL_EXPERIMENTAL
|
||||
if (pt->description && !aliasm) free(pt->description);
|
||||
#endif
|
||||
pt = pt->next;
|
||||
}
|
||||
while(pt) {
|
||||
nt = pt->next;
|
||||
if (pt->astr && !aliasf) free(pt->astr);
|
||||
if (pt->word) free(pt->word);
|
||||
#ifdef HUNSPELL_EXPERIMENTAL
|
||||
if (pt->description && !aliasm) free(pt->description);
|
||||
#endif
|
||||
free(pt);
|
||||
pt = nt;
|
||||
}
|
||||
}
|
||||
free(tableptr);
|
||||
}
|
||||
tablesize = 0;
|
||||
|
||||
if (aliasf) {
|
||||
for (int j = 0; j < (numaliasf); j++) free(aliasf[j]);
|
||||
free(aliasf);
|
||||
aliasf = NULL;
|
||||
if (aliasflen) {
|
||||
free(aliasflen);
|
||||
aliasflen = NULL;
|
||||
}
|
||||
}
|
||||
if (aliasm) {
|
||||
for (int j = 0; j < (numaliasm); j++) free(aliasm[j]);
|
||||
free(aliasm);
|
||||
aliasm = NULL;
|
||||
}
|
||||
|
||||
if (enc) free(enc);
|
||||
if (lang) free(lang);
|
||||
|
||||
if (ignorechars) free(ignorechars);
|
||||
if (ignorechars_utf16) free(ignorechars_utf16);
|
||||
}
|
||||
|
||||
// lookup a root word in the hashtable
|
||||
|
||||
struct hentry * HashMgr::lookup(const char *word) const
|
||||
{
|
||||
struct hentry * dp;
|
||||
if (tableptr) {
|
||||
dp = &tableptr[hash(word)];
|
||||
if (dp->word == NULL) return NULL;
|
||||
for ( ; dp != NULL; dp = dp->next) {
|
||||
if (strcmp(word,dp->word) == 0) return dp;
|
||||
}
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
// add a word to the hash table (private)
|
||||
|
||||
int HashMgr::add_word(const char * word, int wl, unsigned short * aff,
|
||||
int al, const char * desc, bool onlyupcase)
|
||||
{
|
||||
char * st = mystrdup(word);
|
||||
bool upcasehomonym = false;
|
||||
if (wl && !st) return 1;
|
||||
if (ignorechars != NULL) {
|
||||
if (utf8) {
|
||||
remove_ignored_chars_utf(st, ignorechars_utf16, ignorechars_utf16_len);
|
||||
} else {
|
||||
remove_ignored_chars(st, ignorechars);
|
||||
}
|
||||
}
|
||||
if (complexprefixes) {
|
||||
if (utf8) reverseword_utf(st); else reverseword(st);
|
||||
}
|
||||
int i = hash(st);
|
||||
struct hentry * dp = &tableptr[i];
|
||||
if (dp->word == NULL) {
|
||||
dp->wlen = (short) wl;
|
||||
dp->alen = (short) al;
|
||||
dp->word = st;
|
||||
dp->astr = aff;
|
||||
dp->next = NULL;
|
||||
dp->next_homonym = NULL;
|
||||
#ifdef HUNSPELL_EXPERIMENTAL
|
||||
if (aliasm) {
|
||||
dp->description = (desc) ? get_aliasm(atoi(desc)) : mystrdup(desc);
|
||||
} else {
|
||||
dp->description = mystrdup(desc);
|
||||
if (desc && !dp->description) return 1;
|
||||
if (dp->description && complexprefixes) {
|
||||
if (utf8) reverseword_utf(dp->description); else reverseword(dp->description);
|
||||
}
|
||||
}
|
||||
#endif
|
||||
} else {
|
||||
struct hentry* hp = (struct hentry *) malloc (sizeof(struct hentry));
|
||||
if (!hp) return 1;
|
||||
hp->wlen = (short) wl;
|
||||
hp->alen = (short) al;
|
||||
hp->word = st;
|
||||
hp->astr = aff;
|
||||
hp->next = NULL;
|
||||
hp->next_homonym = NULL;
|
||||
#ifdef HUNSPELL_EXPERIMENTAL
|
||||
if (aliasm) {
|
||||
hp->description = (desc) ? get_aliasm(atoi(desc)) : mystrdup(desc);
|
||||
} else {
|
||||
hp->description = mystrdup(desc);
|
||||
if (desc && !hp->description) return 1;
|
||||
if (dp->description && complexprefixes) {
|
||||
if (utf8) reverseword_utf(hp->description); else reverseword(hp->description);
|
||||
}
|
||||
}
|
||||
#endif
|
||||
while (dp->next != NULL) {
|
||||
if ((!dp->next_homonym) && (strcmp(hp->word, dp->word) == 0)) {
|
||||
// remove hidden onlyupcase homonym
|
||||
if (!onlyupcase) {
|
||||
if ((dp->astr) && TESTAFF(dp->astr, ONLYUPCASEFLAG, dp->alen)) {
|
||||
free(dp->astr);
|
||||
dp->astr = hp->astr;
|
||||
free(hp->word);
|
||||
free(hp);
|
||||
return 0;
|
||||
} else {
|
||||
dp->next_homonym = hp;
|
||||
}
|
||||
} else {
|
||||
upcasehomonym = true;
|
||||
}
|
||||
}
|
||||
dp=dp->next;
|
||||
}
|
||||
if (strcmp(hp->word, dp->word) == 0) {
|
||||
// remove hidden onlyupcase homonym
|
||||
if (!onlyupcase) {
|
||||
if ((dp->astr) && TESTAFF(dp->astr, ONLYUPCASEFLAG, dp->alen)) {
|
||||
free(dp->astr);
|
||||
dp->astr = hp->astr;
|
||||
free(hp->word);
|
||||
free(hp);
|
||||
return 0;
|
||||
} else {
|
||||
dp->next_homonym = hp;
|
||||
}
|
||||
} else {
|
||||
upcasehomonym = true;
|
||||
}
|
||||
}
|
||||
if (!upcasehomonym) {
|
||||
dp->next = hp;
|
||||
} else {
|
||||
// remove hidden onlyupcase homonym
|
||||
free(hp->word);
|
||||
if (hp->astr) free(hp->astr);
|
||||
free(hp);
|
||||
}
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
// add a custom dic. word to the hash table (public)
|
||||
int HashMgr::put_word(const char * word, int wl, char * aff)
|
||||
{
|
||||
unsigned short * flags;
|
||||
int al = 0;
|
||||
if (aff) {
|
||||
al = decode_flags(&flags, aff);
|
||||
flag_qsort(flags, 0, al);
|
||||
} else {
|
||||
flags = NULL;
|
||||
}
|
||||
add_word(word, wl, flags, al, NULL, false);
|
||||
return 0;
|
||||
}
|
||||
|
||||
int HashMgr::put_word_pattern(const char * word, int wl, const char * pattern)
|
||||
{
|
||||
unsigned short * flags;
|
||||
struct hentry * dp = lookup(pattern);
|
||||
if (!dp || !dp->astr) return 1;
|
||||
flags = (unsigned short *) malloc (dp->alen * sizeof(short));
|
||||
memcpy((void *) flags, (void *) dp->astr, dp->alen * sizeof(short));
|
||||
add_word(word, wl, flags, dp->alen, NULL, false);
|
||||
return 0;
|
||||
}
|
||||
|
||||
// walk the hash table entry by entry - null at end
|
||||
struct hentry * HashMgr::walk_hashtable(int &col, struct hentry * hp) const
|
||||
{
|
||||
//reset to start
|
||||
if ((col < 0) || (hp == NULL)) {
|
||||
col = -1;
|
||||
hp = NULL;
|
||||
}
|
||||
|
||||
if (hp && hp->next != NULL) {
|
||||
hp = hp->next;
|
||||
} else {
|
||||
col++;
|
||||
hp = (col < tablesize) ? &tableptr[col] : NULL;
|
||||
// search for next non-blank column entry
|
||||
while (hp && (hp->word == NULL)) {
|
||||
col ++;
|
||||
hp = (col < tablesize) ? &tableptr[col] : NULL;
|
||||
}
|
||||
if (col < tablesize) return hp;
|
||||
hp = NULL;
|
||||
col = -1;
|
||||
}
|
||||
return hp;
|
||||
}
|
||||
|
||||
// load a munched word list and build a hash table on the fly
|
||||
int HashMgr::load_tables(const char * tpath)
|
||||
{
|
||||
int wl, al;
|
||||
char * ap;
|
||||
char * dp;
|
||||
unsigned short * flags;
|
||||
int captype;
|
||||
|
||||
// raw dictionary - munched file
|
||||
FILE * rawdict = fopen(tpath, "r");
|
||||
if (rawdict == NULL) return 1;
|
||||
|
||||
// first read the first line of file to get hash table size */
|
||||
char ts[MAXDELEN];
|
||||
if (! fgets(ts, MAXDELEN-1,rawdict)) {
|
||||
HUNSPELL_WARNING(stderr, "error: empty dic file\n");
|
||||
fclose(rawdict);
|
||||
return 2;
|
||||
}
|
||||
mychomp(ts);
|
||||
|
||||
/* remove byte order mark */
|
||||
if (strncmp(ts,"\xEF\xBB\xBF",3) == 0) {
|
||||
memmove(ts, ts+3, strlen(ts+3)+1);
|
||||
HUNSPELL_WARNING(stderr, "warning: dic file begins with byte order mark: possible incompatibility with old Hunspell versions\n");
|
||||
}
|
||||
|
||||
if ((*ts < '1') || (*ts > '9')) HUNSPELL_WARNING(stderr, "error - missing word count in dictionary file\n");
|
||||
tablesize = atoi(ts);
|
||||
if (!tablesize) {
|
||||
fclose(rawdict);
|
||||
return 4;
|
||||
}
|
||||
tablesize = tablesize + 5 + USERWORD;
|
||||
if ((tablesize %2) == 0) tablesize++;
|
||||
|
||||
// allocate the hash table
|
||||
tableptr = (struct hentry *) calloc(tablesize, sizeof(struct hentry));
|
||||
if (! tableptr) {
|
||||
fclose(rawdict);
|
||||
return 3;
|
||||
}
|
||||
for (int i=0; i<tablesize; i++) tableptr[i].word = NULL;
|
||||
|
||||
// loop through all words on much list and add to hash
|
||||
// table and create word and affix strings
|
||||
|
||||
while (fgets(ts,MAXDELEN-1,rawdict)) {
|
||||
mychomp(ts);
|
||||
// split each line into word and morphological description
|
||||
dp = strchr(ts,'\t');
|
||||
|
||||
if (dp) {
|
||||
*dp = '\0';
|
||||
dp++;
|
||||
} else {
|
||||
dp = NULL;
|
||||
}
|
||||
|
||||
// split each line into word and affix char strings
|
||||
// "\/" signs slash in words (not affix separator)
|
||||
// "/" at beginning of the line is word character (not affix separator)
|
||||
ap = strchr(ts,'/');
|
||||
while (ap) {
|
||||
if (ap == ts) {
|
||||
ap++;
|
||||
continue;
|
||||
} else if (*(ap - 1) != '\\') break;
|
||||
// replace "\/" with "/"
|
||||
for (char * sp = ap - 1; *sp; *sp = *(sp + 1), sp++);
|
||||
ap = strchr(ap,'/');
|
||||
}
|
||||
|
||||
if (ap) {
|
||||
*ap = '\0';
|
||||
if (aliasf) {
|
||||
int index = atoi(ap + 1);
|
||||
al = get_aliasf(index, &flags);
|
||||
if (!al) {
|
||||
HUNSPELL_WARNING(stderr, "error - bad flag vector alias: %s\n", ts);
|
||||
*ap = '\0';
|
||||
}
|
||||
} else {
|
||||
al = decode_flags(&flags, ap + 1);
|
||||
flag_qsort(flags, 0, al);
|
||||
}
|
||||
} else {
|
||||
al = 0;
|
||||
ap = NULL;
|
||||
flags = NULL;
|
||||
}
|
||||
|
||||
wl = strlen(ts);
|
||||
|
||||
// add the word and its index
|
||||
if (add_word(ts,wl,flags,al,dp, false)) {
|
||||
fclose(rawdict);
|
||||
return 5;
|
||||
}
|
||||
|
||||
// add decapizatalized forms to handle following cases
|
||||
// OpenOffice.org -> OPENOFFICE.ORG
|
||||
// CIA's -> CIA'S
|
||||
captype = utf8 ? get_captype_utf8(ts, wl, langnum) : get_captype(ts, wl, csconv);
|
||||
if (((captype == HUHCAP) || (captype == HUHINITCAP) ||
|
||||
((captype == ALLCAP) && (flags != NULL))) &&
|
||||
!((flags != NULL) && TESTAFF(flags, forbiddenword, al))) {
|
||||
unsigned short * flags2 = (unsigned short *) malloc (sizeof(unsigned short *)* (al + 1));
|
||||
memcpy(flags2, flags, al * sizeof(unsigned short *));
|
||||
flags2[al] = ONLYUPCASEFLAG;
|
||||
if (utf8) {
|
||||
char st[MAXDELEN];
|
||||
w_char w[MAXDELEN];
|
||||
int wlen = u8_u16(w, MAXDELEN, ts);
|
||||
mkallsmall_utf(w, wlen, langnum);
|
||||
mkallcap_utf(w, 1, langnum);
|
||||
u16_u8(st, MAXDELEN, w, wlen);
|
||||
if (add_word(st,wl,flags2,al+1,dp, true)) {
|
||||
fclose(rawdict);
|
||||
return 5;
|
||||
}
|
||||
} else {
|
||||
mkallsmall(ts, csconv);
|
||||
mkinitcap(ts, csconv);
|
||||
if (add_word(ts,wl,flags2,al+1,dp, true)) {
|
||||
fclose(rawdict);
|
||||
return 5;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fclose(rawdict);
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
||||
// the hash function is a simple load and rotate
|
||||
// algorithm borrowed
|
||||
|
||||
int HashMgr::hash(const char * word) const
|
||||
{
|
||||
long hv = 0;
|
||||
for (int i=0; i < 4 && *word != 0; i++)
|
||||
hv = (hv << 8) | (*word++);
|
||||
while (*word != 0) {
|
||||
ROTATE(hv,ROTATE_LEN);
|
||||
hv ^= (*word++);
|
||||
}
|
||||
return (unsigned long) hv % tablesize;
|
||||
}
|
||||
|
||||
int HashMgr::decode_flags(unsigned short ** result, char * flags) {
|
||||
int len;
|
||||
switch (flag_mode) {
|
||||
case FLAG_LONG: { // two-character flags (1x2yZz -> 1x 2y Zz)
|
||||
len = strlen(flags);
|
||||
if (len%2 == 1) HUNSPELL_WARNING(stderr, "error: length of FLAG_LONG flagvector is odd: %s\n", flags);
|
||||
len = len/2;
|
||||
*result = (unsigned short *) malloc(len * sizeof(short));
|
||||
for (int i = 0; i < len; i++) {
|
||||
(*result)[i] = (((unsigned short) flags[i * 2]) << 8) + (unsigned short) flags[i * 2 + 1];
|
||||
}
|
||||
break;
|
||||
}
|
||||
case FLAG_NUM: { // decimal numbers separated by comma (4521,23,233 -> 4521 23 233)
|
||||
len = 1;
|
||||
char * src = flags;
|
||||
unsigned short * dest;
|
||||
char * p;
|
||||
for (p = flags; *p; p++) {
|
||||
if (*p == ',') len++;
|
||||
}
|
||||
*result = (unsigned short *) malloc(len * sizeof(short));
|
||||
dest = *result;
|
||||
for (p = flags; *p; p++) {
|
||||
if (*p == ',') {
|
||||
*dest = (unsigned short) atoi(src);
|
||||
if (*dest == 0) HUNSPELL_WARNING(stderr, "error: 0 is wrong flag id\n");
|
||||
src = p + 1;
|
||||
dest++;
|
||||
}
|
||||
}
|
||||
*dest = (unsigned short) atoi(src);
|
||||
if (*dest == 0) HUNSPELL_WARNING(stderr, "error: 0 is wrong flag id\n");
|
||||
break;
|
||||
}
|
||||
case FLAG_UNI: { // UTF-8 characters
|
||||
w_char w[MAXDELEN/2];
|
||||
len = u8_u16(w, MAXDELEN/2, flags);
|
||||
*result = (unsigned short *) malloc(len * sizeof(short));
|
||||
memcpy(*result, w, len * sizeof(short));
|
||||
break;
|
||||
}
|
||||
default: { // Ispell's one-character flags (erfg -> e r f g)
|
||||
unsigned short * dest;
|
||||
len = strlen(flags);
|
||||
*result = (unsigned short *) malloc(len * sizeof(short));
|
||||
dest = *result;
|
||||
for (unsigned char * p = (unsigned char *) flags; *p; p++) {
|
||||
*dest = (unsigned short) *p;
|
||||
dest++;
|
||||
}
|
||||
}
|
||||
}
|
||||
return len;
|
||||
}
|
||||
|
||||
unsigned short HashMgr::decode_flag(const char * f) {
|
||||
unsigned short s = 0;
|
||||
switch (flag_mode) {
|
||||
case FLAG_LONG:
|
||||
s = ((unsigned short) f[0] << 8) + (unsigned short) f[1];
|
||||
break;
|
||||
case FLAG_NUM:
|
||||
s = (unsigned short) atoi(f);
|
||||
break;
|
||||
case FLAG_UNI:
|
||||
u8_u16((w_char *) &s, 1, f);
|
||||
break;
|
||||
default:
|
||||
s = (unsigned short) *((unsigned char *)f);
|
||||
}
|
||||
if (!s) HUNSPELL_WARNING(stderr, "error: 0 is wrong flag id\n");
|
||||
return s;
|
||||
}
|
||||
|
||||
char * HashMgr::encode_flag(unsigned short f) {
|
||||
unsigned char ch[10];
|
||||
if (f==0) return mystrdup("(NULL)");
|
||||
if (flag_mode == FLAG_LONG) {
|
||||
ch[0] = (unsigned char) (f >> 8);
|
||||
ch[1] = (unsigned char) (f - ((f >> 8) << 8));
|
||||
ch[2] = '\0';
|
||||
} else if (flag_mode == FLAG_NUM) {
|
||||
sprintf((char *) ch, "%d", f);
|
||||
} else if (flag_mode == FLAG_UNI) {
|
||||
u16_u8((char *) &ch, 10, (w_char *) &f, 1);
|
||||
} else {
|
||||
ch[0] = (unsigned char) (f);
|
||||
ch[1] = '\0';
|
||||
}
|
||||
return mystrdup((char *) ch);
|
||||
}
|
||||
|
||||
// read in aff file and set flag mode
|
||||
int HashMgr::load_config(const char * affpath)
|
||||
{
|
||||
int firstline = 1;
|
||||
|
||||
// io buffers
|
||||
char line[MAXDELEN+1];
|
||||
|
||||
// open the affix file
|
||||
FILE * afflst;
|
||||
afflst = fopen(affpath,"r");
|
||||
if (!afflst) {
|
||||
HUNSPELL_WARNING(stderr, "Error - could not open affix description file %s\n",affpath);
|
||||
return 1;
|
||||
}
|
||||
|
||||
// read in each line ignoring any that do not
|
||||
// start with a known line type indicator
|
||||
|
||||
while (fgets(line,MAXDELEN,afflst)) {
|
||||
mychomp(line);
|
||||
|
||||
/* remove byte order mark */
|
||||
if (firstline) {
|
||||
firstline = 0;
|
||||
if (strncmp(line,"\xEF\xBB\xBF",3) == 0) memmove(line, line+3, strlen(line+3)+1);
|
||||
}
|
||||
|
||||
/* parse in the try string */
|
||||
if ((strncmp(line,"FLAG",4) == 0) && isspace(line[4])) {
|
||||
if (flag_mode != FLAG_CHAR) {
|
||||
HUNSPELL_WARNING(stderr, "error: duplicate FLAG parameter\n");
|
||||
}
|
||||
if (strstr(line, "long")) flag_mode = FLAG_LONG;
|
||||
if (strstr(line, "num")) flag_mode = FLAG_NUM;
|
||||
if (strstr(line, "UTF-8")) flag_mode = FLAG_UNI;
|
||||
if (flag_mode == FLAG_CHAR) {
|
||||
HUNSPELL_WARNING(stderr, "error: FLAG need `num', `long' or `UTF-8' parameter: %s\n", line);
|
||||
}
|
||||
}
|
||||
if (strncmp(line,"FORBIDDENWORD",13) == 0) {
|
||||
char * st = NULL;
|
||||
if (parse_string(line, &st, "FORBIDDENWORD")) {
|
||||
fclose(afflst);
|
||||
return 1;
|
||||
}
|
||||
forbiddenword = decode_flag(st);
|
||||
free(st);
|
||||
}
|
||||
if (strncmp(line, "SET", 3) == 0) {
|
||||
if (parse_string(line, &enc, "SET")) {
|
||||
fclose(afflst);
|
||||
return 1;
|
||||
}
|
||||
if (strcmp(enc, "UTF-8") == 0) {
|
||||
utf8 = 1;
|
||||
#ifndef OPENOFFICEORG
|
||||
#ifndef MOZILLA_CLIENT
|
||||
initialize_utf_tbl();
|
||||
#endif
|
||||
#endif
|
||||
} else csconv = get_current_cs(enc);
|
||||
}
|
||||
if (strncmp(line, "LANG", 4) == 0) {
|
||||
if (parse_string(line, &lang, "LANG")) {
|
||||
fclose(afflst);
|
||||
return 1;
|
||||
}
|
||||
langnum = get_lang_num(lang);
|
||||
}
|
||||
|
||||
/* parse in the ignored characters (for example, Arabic optional diacritics characters */
|
||||
if (strncmp(line,"IGNORE",6) == 0) {
|
||||
if (parse_array(line, &ignorechars, &ignorechars_utf16, &ignorechars_utf16_len, "IGNORE", utf8)) {
|
||||
fclose(afflst);
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
|
||||
if ((strncmp(line,"AF",2) == 0) && isspace(line[2])) {
|
||||
if (parse_aliasf(line, afflst)) {
|
||||
fclose(afflst);
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
|
||||
#ifdef HUNSPELL_EXPERIMENTAL
|
||||
if ((strncmp(line,"AM",2) == 0) && isspace(line[2])) {
|
||||
if (parse_aliasm(line, afflst)) {
|
||||
fclose(afflst);
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
if (strncmp(line,"COMPLEXPREFIXES",15) == 0) complexprefixes = 1;
|
||||
if (((strncmp(line,"SFX",3) == 0) || (strncmp(line,"PFX",3) == 0)) && isspace(line[3])) break;
|
||||
}
|
||||
if (csconv == NULL) csconv = get_current_cs("ISO8859-1");
|
||||
fclose(afflst);
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* parse in the ALIAS table */
|
||||
int HashMgr::parse_aliasf(char * line, FILE * af)
|
||||
{
|
||||
if (numaliasf != 0) {
|
||||
HUNSPELL_WARNING(stderr, "error: duplicate AF (alias for flag vector) tables used\n");
|
||||
return 1;
|
||||
}
|
||||
char * tp = line;
|
||||
char * piece;
|
||||
int i = 0;
|
||||
int np = 0;
|
||||
piece = mystrsep(&tp, 0);
|
||||
while (piece) {
|
||||
if (*piece != '\0') {
|
||||
switch(i) {
|
||||
case 0: { np++; break; }
|
||||
case 1: {
|
||||
numaliasf = atoi(piece);
|
||||
if (numaliasf < 1) {
|
||||
numaliasf = 0;
|
||||
aliasf = NULL;
|
||||
aliasflen = NULL;
|
||||
HUNSPELL_WARNING(stderr, "incorrect number of entries in AF table\n");
|
||||
free(piece);
|
||||
return 1;
|
||||
}
|
||||
aliasf = (unsigned short **) malloc(numaliasf * sizeof(unsigned short *));
|
||||
aliasflen = (unsigned short *) malloc(numaliasf * sizeof(short));
|
||||
if (!aliasf || !aliasflen) {
|
||||
numaliasf = 0;
|
||||
if (aliasf) free(aliasf);
|
||||
if (aliasflen) free(aliasflen);
|
||||
aliasf = NULL;
|
||||
aliasflen = NULL;
|
||||
return 1;
|
||||
}
|
||||
np++;
|
||||
break;
|
||||
}
|
||||
default: break;
|
||||
}
|
||||
i++;
|
||||
}
|
||||
free(piece);
|
||||
piece = mystrsep(&tp, 0);
|
||||
}
|
||||
if (np != 2) {
|
||||
numaliasf = 0;
|
||||
free(aliasf);
|
||||
free(aliasflen);
|
||||
aliasf = NULL;
|
||||
aliasflen = NULL;
|
||||
HUNSPELL_WARNING(stderr, "error: missing AF table information\n");
|
||||
return 1;
|
||||
}
|
||||
|
||||
/* now parse the numaliasf lines to read in the remainder of the table */
|
||||
char * nl = line;
|
||||
for (int j=0; j < numaliasf; j++) {
|
||||
if (!fgets(nl,MAXDELEN,af)) return 1;
|
||||
mychomp(nl);
|
||||
tp = nl;
|
||||
i = 0;
|
||||
aliasf[j] = NULL;
|
||||
aliasflen[j] = 0;
|
||||
piece = mystrsep(&tp, 0);
|
||||
while (piece) {
|
||||
if (*piece != '\0') {
|
||||
switch(i) {
|
||||
case 0: {
|
||||
if (strncmp(piece,"AF",2) != 0) {
|
||||
numaliasf = 0;
|
||||
free(aliasf);
|
||||
free(aliasflen);
|
||||
aliasf = NULL;
|
||||
aliasflen = NULL;
|
||||
HUNSPELL_WARNING(stderr, "error: AF table is corrupt\n");
|
||||
free(piece);
|
||||
return 1;
|
||||
}
|
||||
break;
|
||||
}
|
||||
case 1: {
|
||||
aliasflen[j] = (unsigned short) decode_flags(&(aliasf[j]), piece);
|
||||
flag_qsort(aliasf[j], 0, aliasflen[j]);
|
||||
break;
|
||||
}
|
||||
default: break;
|
||||
}
|
||||
i++;
|
||||
}
|
||||
free(piece);
|
||||
piece = mystrsep(&tp, 0);
|
||||
}
|
||||
if (!aliasf[j]) {
|
||||
free(aliasf);
|
||||
free(aliasflen);
|
||||
aliasf = NULL;
|
||||
aliasflen = NULL;
|
||||
numaliasf = 0;
|
||||
HUNSPELL_WARNING(stderr, "error: AF table is corrupt\n");
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
int HashMgr::is_aliasf() {
|
||||
return (aliasf != NULL);
|
||||
}
|
||||
|
||||
int HashMgr::get_aliasf(int index, unsigned short ** fvec) {
|
||||
if ((index > 0) && (index <= numaliasf)) {
|
||||
*fvec = aliasf[index - 1];
|
||||
return aliasflen[index - 1];
|
||||
}
|
||||
HUNSPELL_WARNING(stderr, "error: bad flag alias index: %d\n", index);
|
||||
*fvec = NULL;
|
||||
return 0;
|
||||
}
|
||||
|
||||
#ifdef HUNSPELL_EXPERIMENTAL
|
||||
/* parse morph alias definitions */
|
||||
int HashMgr::parse_aliasm(char * line, FILE * af)
|
||||
{
|
||||
if (numaliasm != 0) {
|
||||
HUNSPELL_WARNING(stderr, "error: duplicate AM (aliases for morphological descriptions) tables used\n");
|
||||
return 1;
|
||||
}
|
||||
char * tp = line;
|
||||
char * piece;
|
||||
int i = 0;
|
||||
int np = 0;
|
||||
piece = mystrsep(&tp, 0);
|
||||
while (piece) {
|
||||
if (*piece != '\0') {
|
||||
switch(i) {
|
||||
case 0: { np++; break; }
|
||||
case 1: {
|
||||
numaliasm = atoi(piece);
|
||||
if (numaliasm < 1) {
|
||||
HUNSPELL_WARNING(stderr, "incorrect number of entries in AM table\n");
|
||||
free(piece);
|
||||
return 1;
|
||||
}
|
||||
aliasm = (char **) malloc(numaliasm * sizeof(char *));
|
||||
if (!aliasm) {
|
||||
numaliasm = 0;
|
||||
return 1;
|
||||
}
|
||||
np++;
|
||||
break;
|
||||
}
|
||||
default: break;
|
||||
}
|
||||
i++;
|
||||
}
|
||||
free(piece);
|
||||
piece = mystrsep(&tp, 0);
|
||||
}
|
||||
if (np != 2) {
|
||||
numaliasm = 0;
|
||||
free(aliasm);
|
||||
aliasm = NULL;
|
||||
HUNSPELL_WARNING(stderr, "error: missing AM alias information\n");
|
||||
return 1;
|
||||
}
|
||||
|
||||
/* now parse the numaliasm lines to read in the remainder of the table */
|
||||
char * nl = line;
|
||||
for (int j=0; j < numaliasm; j++) {
|
||||
if (!fgets(nl,MAXDELEN,af)) return 1;
|
||||
mychomp(nl);
|
||||
tp = nl;
|
||||
i = 0;
|
||||
aliasm[j] = NULL;
|
||||
piece = mystrsep(&tp, 0);
|
||||
while (piece) {
|
||||
if (*piece != '\0') {
|
||||
switch(i) {
|
||||
case 0: {
|
||||
if (strncmp(piece,"AM",2) != 0) {
|
||||
HUNSPELL_WARNING(stderr, "error: AM table is corrupt\n");
|
||||
free(piece);
|
||||
numaliasm = 0;
|
||||
free(aliasm);
|
||||
aliasm = NULL;
|
||||
return 1;
|
||||
}
|
||||
break;
|
||||
}
|
||||
case 1: {
|
||||
if (complexprefixes) {
|
||||
if (utf8) reverseword_utf(piece);
|
||||
else reverseword(piece);
|
||||
}
|
||||
aliasm[j] = mystrdup(piece);
|
||||
break; }
|
||||
default: break;
|
||||
}
|
||||
i++;
|
||||
}
|
||||
free(piece);
|
||||
piece = mystrsep(&tp, 0);
|
||||
}
|
||||
if (!aliasm[j]) {
|
||||
numaliasm = 0;
|
||||
free(aliasm);
|
||||
aliasm = NULL;
|
||||
HUNSPELL_WARNING(stderr, "error: map table is corrupt\n");
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
int HashMgr::is_aliasm() {
|
||||
return (aliasm != NULL);
|
||||
}
|
||||
|
||||
char * HashMgr::get_aliasm(int index) {
|
||||
if ((index > 0) && (index <= numaliasm)) return aliasm[index - 1];
|
||||
HUNSPELL_WARNING(stderr, "error: bad morph. alias index: %d\n", index);
|
||||
return NULL;
|
||||
}
|
||||
#endif
|
|
@ -0,0 +1,121 @@
|
|||
/******* BEGIN LICENSE BLOCK *******
|
||||
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
*
|
||||
* The contents of this file are subject to the Mozilla Public License Version
|
||||
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
* http://www.mozilla.org/MPL/
|
||||
*
|
||||
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
* for the specific language governing rights and limitations under the
|
||||
* License.
|
||||
*
|
||||
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
*
|
||||
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
|
||||
* László Németh (nemethl@gyorsposta.hu)
|
||||
* David Einstein (deinst@world.std.com)
|
||||
* Davide Prina
|
||||
* Giuseppe Modugno
|
||||
* Gianluca Turconi
|
||||
* Simon Brouwer
|
||||
* Noll Janos
|
||||
* Biro Arpad
|
||||
* Goldman Eleonora
|
||||
* Sarlos Tamas
|
||||
* Bencsath Boldizsar
|
||||
* Halacsy Peter
|
||||
* Dvornik Laszlo
|
||||
* Gefferth Andras
|
||||
* Nagy Viktor
|
||||
* Varga Daniel
|
||||
* Chris Halls
|
||||
* Rene Engelhard
|
||||
* Bram Moolenaar
|
||||
* Dafydd Jones
|
||||
* Harri Pitkanen
|
||||
* Andras Timar
|
||||
* Tor Lillqvist
|
||||
*
|
||||
* Alternatively, the contents of this file may be used under the terms of
|
||||
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
* of those above. If you wish to allow use of your version of this file only
|
||||
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
* use your version of this file under the terms of the MPL, indicate your
|
||||
* decision by deleting the provisions above and replace them with the notice
|
||||
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
* the provisions above, a recipient may use your version of this file under
|
||||
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||
*
|
||||
******* END LICENSE BLOCK *******/
|
||||
|
||||
#ifndef _HASHMGR_HXX_
|
||||
#define _HASHMGR_HXX_
|
||||
|
||||
#include <cstdio>
|
||||
#include "htypes.hxx"
|
||||
|
||||
enum flag { FLAG_CHAR, FLAG_LONG, FLAG_NUM, FLAG_UNI };
|
||||
|
||||
class HashMgr
|
||||
{
|
||||
int tablesize;
|
||||
struct hentry * tableptr;
|
||||
int userword;
|
||||
flag flag_mode;
|
||||
int complexprefixes;
|
||||
int utf8;
|
||||
unsigned short forbiddenword;
|
||||
int langnum;
|
||||
char * enc;
|
||||
char * lang;
|
||||
struct cs_info * csconv;
|
||||
char * ignorechars;
|
||||
unsigned short * ignorechars_utf16;
|
||||
int ignorechars_utf16_len;
|
||||
int numaliasf; // flag vector `compression' with aliases
|
||||
unsigned short ** aliasf;
|
||||
unsigned short * aliasflen;
|
||||
int numaliasm; // morphological desciption `compression' with aliases
|
||||
char ** aliasm;
|
||||
|
||||
|
||||
public:
|
||||
HashMgr(const char * tpath, const char * apath);
|
||||
~HashMgr();
|
||||
|
||||
struct hentry * lookup(const char *) const;
|
||||
int hash(const char *) const;
|
||||
struct hentry * walk_hashtable(int & col, struct hentry * hp) const;
|
||||
|
||||
int put_word(const char * word, int wl, char * ap);
|
||||
int put_word_pattern(const char * word, int wl, const char * pattern);
|
||||
int decode_flags(unsigned short ** result, char * flags);
|
||||
unsigned short decode_flag(const char * flag);
|
||||
char * encode_flag(unsigned short flag);
|
||||
int is_aliasf();
|
||||
int get_aliasf(int index, unsigned short ** fvec);
|
||||
#ifdef HUNSPELL_EXPERIMENTAL
|
||||
int is_aliasm();
|
||||
char * get_aliasm(int index);
|
||||
#endif
|
||||
|
||||
|
||||
private:
|
||||
int load_tables(const char * tpath);
|
||||
int add_word(const char * word, int wl, unsigned short * ap, int al,
|
||||
const char * desc, bool onlyupcase);
|
||||
int load_config(const char * affpath);
|
||||
int parse_aliasf(char * line, FILE * af);
|
||||
#ifdef HUNSPELL_EXPERIMENTAL
|
||||
int parse_aliasm(char * line, FILE * af);
|
||||
#endif
|
||||
|
||||
};
|
||||
|
||||
#endif
|
|
@ -0,0 +1,84 @@
|
|||
/******* BEGIN LICENSE BLOCK *******
|
||||
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
*
|
||||
* The contents of this file are subject to the Mozilla Public License Version
|
||||
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
* http://www.mozilla.org/MPL/
|
||||
*
|
||||
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
* for the specific language governing rights and limitations under the
|
||||
* License.
|
||||
*
|
||||
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
*
|
||||
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
|
||||
* László Németh (nemethl@gyorsposta.hu)
|
||||
* David Einstein (deinst@world.std.com)
|
||||
* Davide Prina
|
||||
* Giuseppe Modugno
|
||||
* Gianluca Turconi
|
||||
* Simon Brouwer
|
||||
* Noll Janos
|
||||
* Biro Arpad
|
||||
* Goldman Eleonora
|
||||
* Sarlos Tamas
|
||||
* Bencsath Boldizsar
|
||||
* Halacsy Peter
|
||||
* Dvornik Laszlo
|
||||
* Gefferth Andras
|
||||
* Nagy Viktor
|
||||
* Varga Daniel
|
||||
* Chris Halls
|
||||
* Rene Engelhard
|
||||
* Bram Moolenaar
|
||||
* Dafydd Jones
|
||||
* Harri Pitkanen
|
||||
* Andras Timar
|
||||
* Tor Lillqvist
|
||||
*
|
||||
* Alternatively, the contents of this file may be used under the terms of
|
||||
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
* of those above. If you wish to allow use of your version of this file only
|
||||
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
* use your version of this file under the terms of the MPL, indicate your
|
||||
* decision by deleting the provisions above and replace them with the notice
|
||||
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
* the provisions above, a recipient may use your version of this file under
|
||||
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||
*
|
||||
******* END LICENSE BLOCK *******/
|
||||
|
||||
#ifndef _HTYPES_HXX_
|
||||
#define _HTYPES_HXX_
|
||||
|
||||
#define MAXDELEN 8192
|
||||
|
||||
#define ROTATE_LEN 5
|
||||
|
||||
#define ROTATE(v,q) \
|
||||
(v) = ((v) << (q)) | (((v) >> (32 - q)) & ((1 << (q))-1));
|
||||
|
||||
// approx. number of user defined words
|
||||
#define USERWORD 1000
|
||||
|
||||
struct hentry
|
||||
{
|
||||
short wlen;
|
||||
short alen;
|
||||
char wbeg[2];
|
||||
char * word;
|
||||
unsigned short * astr;
|
||||
struct hentry * next;
|
||||
struct hentry * next_homonym;
|
||||
#ifdef HUNSPELL_EXPERIMENTAL
|
||||
char * description;
|
||||
#endif
|
||||
};
|
||||
|
||||
#endif
|
Разница между файлами не показана из-за своего большого размера
Загрузить разницу
|
@ -0,0 +1,89 @@
|
|||
/******* BEGIN LICENSE BLOCK *******
|
||||
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
*
|
||||
* The contents of this file are subject to the Mozilla Public License Version
|
||||
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
* http://www.mozilla.org/MPL/
|
||||
*
|
||||
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
* for the specific language governing rights and limitations under the
|
||||
* License.
|
||||
*
|
||||
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
*
|
||||
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
|
||||
* László Németh (nemethl@gyorsposta.hu)
|
||||
* David Einstein (deinst@world.std.com)
|
||||
* Davide Prina
|
||||
* Giuseppe Modugno
|
||||
* Gianluca Turconi
|
||||
* Simon Brouwer
|
||||
* Noll Janos
|
||||
* Biro Arpad
|
||||
* Goldman Eleonora
|
||||
* Sarlos Tamas
|
||||
* Bencsath Boldizsar
|
||||
* Halacsy Peter
|
||||
* Dvornik Laszlo
|
||||
* Gefferth Andras
|
||||
* Nagy Viktor
|
||||
* Varga Daniel
|
||||
* Chris Halls
|
||||
* Rene Engelhard
|
||||
* Bram Moolenaar
|
||||
* Dafydd Jones
|
||||
* Harri Pitkanen
|
||||
* Andras Timar
|
||||
* Tor Lillqvist
|
||||
*
|
||||
* Alternatively, the contents of this file may be used under the terms of
|
||||
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
* of those above. If you wish to allow use of your version of this file only
|
||||
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
* use your version of this file under the terms of the MPL, indicate your
|
||||
* decision by deleting the provisions above and replace them with the notice
|
||||
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
* the provisions above, a recipient may use your version of this file under
|
||||
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||
*
|
||||
******* END LICENSE BLOCK *******/
|
||||
|
||||
#ifndef _MYSPELLMGR_H_
|
||||
#define _MYSPELLMGR_H_
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
typedef struct Hunhandle Hunhandle;
|
||||
|
||||
Hunhandle *Hunspell_create(const char * affpath, const char * dpath);
|
||||
void Hunspell_destroy(Hunhandle *pHunspell);
|
||||
|
||||
/* spell(word) - spellcheck word
|
||||
* output: 0 = bad word, not 0 = good word
|
||||
*/
|
||||
int Hunspell_spell(Hunhandle *pHunspell, const char *);
|
||||
|
||||
char *Hunspell_get_dic_encoding(Hunhandle *pHunspell);
|
||||
|
||||
/* suggest(suggestions, word) - search suggestions
|
||||
* input: pointer to an array of strings pointer and the (bad) word
|
||||
* array of strings pointer (here *slst) may not be initialized
|
||||
* output: number of suggestions in string array, and suggestions in
|
||||
* a newly allocated array of strings (*slts will be NULL when number
|
||||
* of suggestion equals 0.)
|
||||
*/
|
||||
int Hunspell_suggest(Hunhandle *pHunspell, char*** slst, const char * word);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif
|
|
@ -0,0 +1,190 @@
|
|||
/******* BEGIN LICENSE BLOCK *******
|
||||
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
*
|
||||
* The contents of this file are subject to the Mozilla Public License Version
|
||||
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
* http://www.mozilla.org/MPL/
|
||||
*
|
||||
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
* for the specific language governing rights and limitations under the
|
||||
* License.
|
||||
*
|
||||
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
*
|
||||
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
|
||||
* László Németh (nemethl@gyorsposta.hu)
|
||||
* David Einstein (deinst@world.std.com)
|
||||
* Davide Prina
|
||||
* Giuseppe Modugno
|
||||
* Gianluca Turconi
|
||||
* Simon Brouwer
|
||||
* Noll Janos
|
||||
* Biro Arpad
|
||||
* Goldman Eleonora
|
||||
* Sarlos Tamas
|
||||
* Bencsath Boldizsar
|
||||
* Halacsy Peter
|
||||
* Dvornik Laszlo
|
||||
* Gefferth Andras
|
||||
* Nagy Viktor
|
||||
* Varga Daniel
|
||||
* Chris Halls
|
||||
* Rene Engelhard
|
||||
* Bram Moolenaar
|
||||
* Dafydd Jones
|
||||
* Harri Pitkanen
|
||||
* Andras Timar
|
||||
* Tor Lillqvist
|
||||
*
|
||||
* Alternatively, the contents of this file may be used under the terms of
|
||||
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
* of those above. If you wish to allow use of your version of this file only
|
||||
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
* use your version of this file under the terms of the MPL, indicate your
|
||||
* decision by deleting the provisions above and replace them with the notice
|
||||
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
* the provisions above, a recipient may use your version of this file under
|
||||
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||
*
|
||||
******* END LICENSE BLOCK *******/
|
||||
|
||||
#include "hashmgr.hxx"
|
||||
#include "affixmgr.hxx"
|
||||
#include "suggestmgr.hxx"
|
||||
#include "csutil.hxx"
|
||||
#include "langnum.hxx"
|
||||
|
||||
#define SPELL_COMPOUND (1 << 0)
|
||||
#define SPELL_FORBIDDEN (1 << 1)
|
||||
#define SPELL_ALLCAP (1 << 2)
|
||||
#define SPELL_NOCAP (1 << 3)
|
||||
#define SPELL_INITCAP (1 << 4)
|
||||
|
||||
#define MAXSUGGESTION 15
|
||||
#define MAXSHARPS 5
|
||||
|
||||
#ifdef W32
|
||||
#define DLLTEST2_API __declspec(dllexport)
|
||||
#endif
|
||||
|
||||
#ifndef _MYSPELLMGR_HXX_
|
||||
#define _MYSPELLMGR_HXX_
|
||||
|
||||
#ifdef W32
|
||||
class DLLTEST2_API Hunspell
|
||||
#else
|
||||
class Hunspell
|
||||
#endif
|
||||
{
|
||||
AffixMgr* pAMgr;
|
||||
HashMgr* pHMgr;
|
||||
SuggestMgr* pSMgr;
|
||||
char * encoding;
|
||||
struct cs_info * csconv;
|
||||
int langnum;
|
||||
int utf8;
|
||||
int complexprefixes;
|
||||
char** wordbreak;
|
||||
|
||||
public:
|
||||
|
||||
/* Hunspell(aff, dic) - constructor of Hunspell class
|
||||
* input: path of affix file and dictionary file
|
||||
*/
|
||||
|
||||
Hunspell(const char * affpath, const char * dpath);
|
||||
|
||||
~Hunspell();
|
||||
|
||||
/* spell(word) - spellcheck word
|
||||
* output: 0 = bad word, not 0 = good word
|
||||
*
|
||||
* plus output:
|
||||
* info: information bit array, fields:
|
||||
* SPELL_COMPOUND = a compound word
|
||||
* SPELL_FORBIDDEN = an explicit forbidden word
|
||||
* root: root (stem), when input is a word with affix(es)
|
||||
*/
|
||||
|
||||
int spell(const char * word, int * info = NULL, char ** root = NULL);
|
||||
|
||||
/* suggest(suggestions, word) - search suggestions
|
||||
* input: pointer to an array of strings pointer and the (bad) word
|
||||
* array of strings pointer (here *slst) may not be initialized
|
||||
* output: number of suggestions in string array, and suggestions in
|
||||
* a newly allocated array of strings (*slts will be NULL when number
|
||||
* of suggestion equals 0.)
|
||||
*/
|
||||
|
||||
int suggest(char*** slst, const char * word);
|
||||
char * get_dic_encoding();
|
||||
|
||||
/* handling custom dictionary */
|
||||
|
||||
int put_word(const char * word);
|
||||
|
||||
/* pattern is a sample dictionary word
|
||||
* put word into custom dictionary with affix flags of pattern word
|
||||
*/
|
||||
|
||||
int put_word_pattern(const char * word, const char * pattern);
|
||||
|
||||
/* other */
|
||||
|
||||
/* get extra word characters definied in affix file for tokenization */
|
||||
const char * get_wordchars();
|
||||
unsigned short * get_wordchars_utf16(int * len);
|
||||
|
||||
// struct cs_info * get_csconv();
|
||||
// int utf16_isalpha(unsigned short c);
|
||||
const char * get_version();
|
||||
|
||||
/* experimental functions */
|
||||
|
||||
#ifdef HUNSPELL_EXPERIMENTAL
|
||||
/* suffix is an affix flag string, similarly in dictionary files */
|
||||
|
||||
int put_word_suffix(const char * word, const char * suffix);
|
||||
|
||||
/* morphological analysis */
|
||||
|
||||
char * morph(const char * word);
|
||||
int analyze(char*** out, const char *word);
|
||||
|
||||
char * morph_with_correction(const char * word);
|
||||
|
||||
/* stemmer function */
|
||||
|
||||
int stem(char*** slst, const char * word);
|
||||
|
||||
/* spec. suggestions */
|
||||
int suggest_auto(char*** slst, const char * word);
|
||||
int suggest_pos_stems(char*** slst, const char * word);
|
||||
char * get_possible_root();
|
||||
#endif
|
||||
|
||||
private:
|
||||
int cleanword(char *, const char *, int * pcaptype, int * pabbrev);
|
||||
int cleanword2(char *, const char *, w_char *, int * w_len, int * pcaptype, int * pabbrev);
|
||||
void mkinitcap(char *);
|
||||
int mkinitcap2(char * p, w_char * u, int nc);
|
||||
int mkinitsmall2(char * p, w_char * u, int nc);
|
||||
void mkallcap(char *);
|
||||
int mkallcap2(char * p, w_char * u, int nc);
|
||||
void mkallsmall(char *);
|
||||
int mkallsmall2(char * p, w_char * u, int nc);
|
||||
struct hentry * checkword(const char *, int * info, char **root);
|
||||
char * sharps_u8_l1(char * dest, char * source);
|
||||
hentry * spellsharps(char * base, char *, int, int, char * tmp, int * info, char **root);
|
||||
int is_keepcase(const hentry * rv);
|
||||
int insert_sug(char ***slst, char * word, int ns);
|
||||
|
||||
};
|
||||
|
||||
#endif
|
|
@ -0,0 +1,94 @@
|
|||
/******* BEGIN LICENSE BLOCK *******
|
||||
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
*
|
||||
* The contents of this file are subject to the Mozilla Public License Version
|
||||
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
* http://www.mozilla.org/MPL/
|
||||
*
|
||||
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
* for the specific language governing rights and limitations under the
|
||||
* License.
|
||||
*
|
||||
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
*
|
||||
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
|
||||
* László Németh (nemethl@gyorsposta.hu)
|
||||
* David Einstein (deinst@world.std.com)
|
||||
* Davide Prina
|
||||
* Giuseppe Modugno
|
||||
* Gianluca Turconi
|
||||
* Simon Brouwer
|
||||
* Noll Janos
|
||||
* Biro Arpad
|
||||
* Goldman Eleonora
|
||||
* Sarlos Tamas
|
||||
* Bencsath Boldizsar
|
||||
* Halacsy Peter
|
||||
* Dvornik Laszlo
|
||||
* Gefferth Andras
|
||||
* Nagy Viktor
|
||||
* Varga Daniel
|
||||
* Chris Halls
|
||||
* Rene Engelhard
|
||||
* Bram Moolenaar
|
||||
* Dafydd Jones
|
||||
* Harri Pitkanen
|
||||
* Andras Timar
|
||||
* Tor Lillqvist
|
||||
*
|
||||
* Alternatively, the contents of this file may be used under the terms of
|
||||
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
* of those above. If you wish to allow use of your version of this file only
|
||||
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
* use your version of this file under the terms of the MPL, indicate your
|
||||
* decision by deleting the provisions above and replace them with the notice
|
||||
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
* the provisions above, a recipient may use your version of this file under
|
||||
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||
*
|
||||
******* END LICENSE BLOCK *******/
|
||||
|
||||
#ifndef _LANGNUM_HXX_
|
||||
#define _LANGNUM_HXX_
|
||||
|
||||
/*
|
||||
language numbers for language specific codes
|
||||
see http://l10n.openoffice.org/languages.html
|
||||
*/
|
||||
|
||||
enum {
|
||||
LANG_ar=96,
|
||||
LANG_az=100, // custom number
|
||||
LANG_bg=41,
|
||||
LANG_ca=37,
|
||||
LANG_cs=42,
|
||||
LANG_da=45,
|
||||
LANG_de=49,
|
||||
LANG_el=30,
|
||||
LANG_en=01,
|
||||
LANG_es=34,
|
||||
LANG_eu=10,
|
||||
LANG_fr=02,
|
||||
LANG_gl=38,
|
||||
LANG_hr=78,
|
||||
LANG_hu=36,
|
||||
LANG_it=39,
|
||||
LANG_la=99, // custom number
|
||||
LANG_lv=101, // custom number
|
||||
LANG_nl=31,
|
||||
LANG_pl=48,
|
||||
LANG_pt=03,
|
||||
LANG_ru=07,
|
||||
LANG_sv=50,
|
||||
LANG_tr=90,
|
||||
LANG_uk=80,
|
||||
LANG_xx=999
|
||||
};
|
||||
|
||||
#endif
|
|
@ -0,0 +1,491 @@
|
|||
/******* BEGIN LICENSE BLOCK *******
|
||||
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
*
|
||||
* The contents of this file are subject to the Mozilla Public License Version
|
||||
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
* http://www.mozilla.org/MPL/
|
||||
*
|
||||
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
* for the specific language governing rights and limitations under the
|
||||
* License.
|
||||
*
|
||||
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
*
|
||||
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
|
||||
* László Németh (nemethl@gyorsposta.hu)
|
||||
* David Einstein (deinst@world.std.com)
|
||||
* Michiel van Leeuwen (mvl@exedo.nl)
|
||||
* Caolan McNamara (cmc@openoffice.org)
|
||||
* Davide Prina
|
||||
* Giuseppe Modugno
|
||||
* Gianluca Turconi
|
||||
* Simon Brouwer
|
||||
* Noll Janos
|
||||
* Biro Arpad
|
||||
* Goldman Eleonora
|
||||
* Sarlos Tamas
|
||||
* Bencsath Boldizsar
|
||||
* Halacsy Peter
|
||||
* Dvornik Laszlo
|
||||
* Gefferth Andras
|
||||
* Nagy Viktor
|
||||
* Varga Daniel
|
||||
* Chris Halls
|
||||
* Rene Engelhard
|
||||
* Bram Moolenaar
|
||||
* Dafydd Jones
|
||||
* Harri Pitkanen
|
||||
* Andras Timar
|
||||
* Tor Lillqvist
|
||||
*
|
||||
* Alternatively, the contents of this file may be used under the terms of
|
||||
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
* of those above. If you wish to allow use of your version of this file only
|
||||
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
* use your version of this file under the terms of the MPL, indicate your
|
||||
* decision by deleting the provisions above and replace them with the notice
|
||||
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
* the provisions above, a recipient may use your version of this file under
|
||||
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||
*
|
||||
******* END LICENSE BLOCK *******/
|
||||
|
||||
#include "mozHunspell.h"
|
||||
#include "nsReadableUtils.h"
|
||||
#include "nsXPIDLString.h"
|
||||
#include "nsIObserverService.h"
|
||||
#include "nsISimpleEnumerator.h"
|
||||
#include "nsIDirectoryEnumerator.h"
|
||||
#include "nsIFile.h"
|
||||
#include "nsDirectoryServiceUtils.h"
|
||||
#include "nsDirectoryServiceDefs.h"
|
||||
#include "mozISpellI18NManager.h"
|
||||
#include "nsICharsetConverterManager.h"
|
||||
#include "nsUnicharUtilCIID.h"
|
||||
#include "nsUnicharUtils.h"
|
||||
#include "nsCRT.h"
|
||||
#include <stdlib.h>
|
||||
|
||||
static NS_DEFINE_CID(kCharsetConverterManagerCID, NS_ICHARSETCONVERTERMANAGER_CID);
|
||||
static NS_DEFINE_CID(kUnicharUtilCID, NS_UNICHARUTIL_CID);
|
||||
|
||||
NS_IMPL_ISUPPORTS3(mozHunspell,
|
||||
mozISpellCheckingEngine,
|
||||
nsIObserver,
|
||||
nsISupportsWeakReference)
|
||||
|
||||
nsresult
|
||||
mozHunspell::Init()
|
||||
{
|
||||
if (!mDictionaries.Init())
|
||||
return NS_ERROR_OUT_OF_MEMORY;
|
||||
|
||||
LoadDictionaryList();
|
||||
|
||||
nsCOMPtr<nsIObserverService> obs =
|
||||
do_GetService("@mozilla.org/observer-service;1");
|
||||
if (obs) {
|
||||
obs->AddObserver(this, "profile-do-change", PR_TRUE);
|
||||
}
|
||||
|
||||
return NS_OK;
|
||||
}
|
||||
|
||||
mozHunspell::~mozHunspell()
|
||||
{
|
||||
mPersonalDictionary = nsnull;
|
||||
delete mHunspell;
|
||||
}
|
||||
|
||||
/* attribute wstring dictionary; */
|
||||
NS_IMETHODIMP mozHunspell::GetDictionary(PRUnichar **aDictionary)
|
||||
{
|
||||
NS_ENSURE_ARG_POINTER(aDictionary);
|
||||
|
||||
if (mDictionary.IsEmpty())
|
||||
return NS_ERROR_NOT_INITIALIZED;
|
||||
|
||||
*aDictionary = ToNewUnicode(mDictionary);
|
||||
return *aDictionary ? NS_OK : NS_ERROR_OUT_OF_MEMORY;
|
||||
}
|
||||
|
||||
/* set the Dictionary.
|
||||
* This also Loads the dictionary and initializes the converter using the dictionaries converter
|
||||
*/
|
||||
NS_IMETHODIMP mozHunspell::SetDictionary(const PRUnichar *aDictionary)
|
||||
{
|
||||
NS_ENSURE_ARG_POINTER(aDictionary);
|
||||
|
||||
if (mDictionary.Equals(aDictionary))
|
||||
return NS_OK;
|
||||
|
||||
nsIFile* affFile = mDictionaries.GetWeak(nsDependentString(aDictionary));
|
||||
if (!affFile)
|
||||
return NS_ERROR_FILE_NOT_FOUND;
|
||||
|
||||
nsCAutoString dictFileName, affFileName;
|
||||
|
||||
// XXX This isn't really good. nsIFile->NativePath isn't safe for all
|
||||
// character sets on Windows.
|
||||
// A better way would be to QI to nsILocalFile, and get a filehandle
|
||||
// from there. Only problem is that hunspell wants a path
|
||||
|
||||
nsresult rv = affFile->GetNativePath(affFileName);
|
||||
NS_ENSURE_SUCCESS(rv, rv);
|
||||
|
||||
dictFileName = affFileName;
|
||||
PRInt32 dotPos = dictFileName.RFindChar('.');
|
||||
if (dotPos == -1)
|
||||
return NS_ERROR_FAILURE;
|
||||
|
||||
dictFileName.SetLength(dotPos);
|
||||
dictFileName.AppendLiteral(".dic");
|
||||
|
||||
// SetDictionary can be called multiple times, so we might have a
|
||||
// valid mHunspell instance which needs cleaned up.
|
||||
delete mHunspell;
|
||||
|
||||
mDictionary = aDictionary;
|
||||
|
||||
mHunspell = new Hunspell(affFileName.get(),
|
||||
dictFileName.get());
|
||||
if (!mHunspell)
|
||||
return NS_ERROR_OUT_OF_MEMORY;
|
||||
|
||||
nsCOMPtr<nsICharsetConverterManager> ccm =
|
||||
do_GetService(NS_CHARSETCONVERTERMANAGER_CONTRACTID, &rv);
|
||||
NS_ENSURE_SUCCESS(rv, rv);
|
||||
|
||||
rv = ccm->GetUnicodeDecoder(mHunspell->get_dic_encoding(),
|
||||
getter_AddRefs(mDecoder));
|
||||
NS_ENSURE_SUCCESS(rv, rv);
|
||||
|
||||
rv = ccm->GetUnicodeEncoder(mHunspell->get_dic_encoding(),
|
||||
getter_AddRefs(mEncoder));
|
||||
NS_ENSURE_SUCCESS(rv, rv);
|
||||
|
||||
|
||||
if (mEncoder)
|
||||
mEncoder->SetOutputErrorBehavior(mEncoder->kOnError_Signal, nsnull, '?');
|
||||
|
||||
PRInt32 pos = mDictionary.FindChar('-');
|
||||
if (pos == -1)
|
||||
pos = mDictionary.FindChar('_');
|
||||
|
||||
if (pos == -1)
|
||||
mLanguage.Assign(mDictionary);
|
||||
else
|
||||
mLanguage = Substring(mDictionary, 0, pos);
|
||||
|
||||
return NS_OK;
|
||||
}
|
||||
|
||||
/* readonly attribute wstring language; */
|
||||
NS_IMETHODIMP mozHunspell::GetLanguage(PRUnichar **aLanguage)
|
||||
{
|
||||
NS_ENSURE_ARG_POINTER(aLanguage);
|
||||
|
||||
if (mDictionary.IsEmpty())
|
||||
return NS_ERROR_NOT_INITIALIZED;
|
||||
|
||||
*aLanguage = ToNewUnicode(mLanguage);
|
||||
return *aLanguage ? NS_OK : NS_ERROR_OUT_OF_MEMORY;
|
||||
}
|
||||
|
||||
/* readonly attribute boolean providesPersonalDictionary; */
|
||||
NS_IMETHODIMP mozHunspell::GetProvidesPersonalDictionary(PRBool *aProvidesPersonalDictionary)
|
||||
{
|
||||
NS_ENSURE_ARG_POINTER(aProvidesPersonalDictionary);
|
||||
|
||||
*aProvidesPersonalDictionary = PR_FALSE;
|
||||
return NS_OK;
|
||||
}
|
||||
|
||||
/* readonly attribute boolean providesWordUtils; */
|
||||
NS_IMETHODIMP mozHunspell::GetProvidesWordUtils(PRBool *aProvidesWordUtils)
|
||||
{
|
||||
NS_ENSURE_ARG_POINTER(aProvidesWordUtils);
|
||||
|
||||
*aProvidesWordUtils = PR_FALSE;
|
||||
return NS_OK;
|
||||
}
|
||||
|
||||
/* readonly attribute wstring name; */
|
||||
NS_IMETHODIMP mozHunspell::GetName(PRUnichar * *aName)
|
||||
{
|
||||
return NS_ERROR_NOT_IMPLEMENTED;
|
||||
}
|
||||
|
||||
/* readonly attribute wstring copyright; */
|
||||
NS_IMETHODIMP mozHunspell::GetCopyright(PRUnichar * *aCopyright)
|
||||
{
|
||||
return NS_ERROR_NOT_IMPLEMENTED;
|
||||
}
|
||||
|
||||
/* attribute mozIPersonalDictionary personalDictionary; */
|
||||
NS_IMETHODIMP mozHunspell::GetPersonalDictionary(mozIPersonalDictionary * *aPersonalDictionary)
|
||||
{
|
||||
*aPersonalDictionary = mPersonalDictionary;
|
||||
NS_IF_ADDREF(*aPersonalDictionary);
|
||||
return NS_OK;
|
||||
}
|
||||
|
||||
NS_IMETHODIMP mozHunspell::SetPersonalDictionary(mozIPersonalDictionary * aPersonalDictionary)
|
||||
{
|
||||
mPersonalDictionary = aPersonalDictionary;
|
||||
return NS_OK;
|
||||
}
|
||||
|
||||
struct AppendNewStruct
|
||||
{
|
||||
PRUnichar **dics;
|
||||
PRUint32 count;
|
||||
PRBool failed;
|
||||
};
|
||||
|
||||
static PLDHashOperator
|
||||
AppendNewString(const nsAString& aString, nsIFile* aFile, void* aClosure)
|
||||
{
|
||||
AppendNewStruct *ans = (AppendNewStruct*) aClosure;
|
||||
ans->dics[ans->count] = ToNewUnicode(aString);
|
||||
if (!ans->dics[ans->count]) {
|
||||
ans->failed = PR_TRUE;
|
||||
return PL_DHASH_STOP;
|
||||
}
|
||||
|
||||
++ans->count;
|
||||
return PL_DHASH_NEXT;
|
||||
}
|
||||
|
||||
/* void GetDictionaryList ([array, size_is (count)] out wstring dictionaries, out PRUint32 count); */
|
||||
NS_IMETHODIMP mozHunspell::GetDictionaryList(PRUnichar ***aDictionaries,
|
||||
PRUint32 *aCount)
|
||||
{
|
||||
if (!aDictionaries || !aCount)
|
||||
return NS_ERROR_NULL_POINTER;
|
||||
|
||||
AppendNewStruct ans = {
|
||||
(PRUnichar**) NS_Alloc(sizeof(PRUnichar*) * mDictionaries.Count()),
|
||||
0,
|
||||
PR_FALSE
|
||||
};
|
||||
|
||||
// This pointer is used during enumeration
|
||||
mDictionaries.EnumerateRead(AppendNewString, &ans);
|
||||
|
||||
if (ans.failed) {
|
||||
while (ans.count) {
|
||||
--ans.count;
|
||||
NS_Free(ans.dics[ans.count]);
|
||||
}
|
||||
NS_Free(ans.dics);
|
||||
return NS_ERROR_OUT_OF_MEMORY;
|
||||
}
|
||||
|
||||
*aDictionaries = ans.dics;
|
||||
*aCount = ans.count;
|
||||
|
||||
return NS_OK;
|
||||
}
|
||||
|
||||
void
|
||||
mozHunspell::LoadDictionaryList()
|
||||
{
|
||||
mDictionaries.Clear();
|
||||
|
||||
nsresult rv;
|
||||
|
||||
nsCOMPtr<nsIProperties> dirSvc =
|
||||
do_GetService(NS_DIRECTORY_SERVICE_CONTRACTID);
|
||||
if (!dirSvc)
|
||||
return;
|
||||
|
||||
nsCOMPtr<nsIFile> dictDir;
|
||||
rv = dirSvc->Get(DICTIONARY_SEARCH_DIRECTORY,
|
||||
NS_GET_IID(nsIFile), getter_AddRefs(dictDir));
|
||||
if (NS_FAILED(rv)) {
|
||||
// default to appdir/dictionaries
|
||||
rv = dirSvc->Get(NS_XPCOM_CURRENT_PROCESS_DIR,
|
||||
NS_GET_IID(nsIFile), getter_AddRefs(dictDir));
|
||||
if (NS_FAILED(rv))
|
||||
return;
|
||||
|
||||
dictDir->AppendNative(NS_LITERAL_CSTRING("dictionaries"));
|
||||
}
|
||||
|
||||
LoadDictionariesFromDir(dictDir);
|
||||
|
||||
nsCOMPtr<nsISimpleEnumerator> dictDirs;
|
||||
rv = dirSvc->Get(DICTIONARY_SEARCH_DIRECTORY_LIST,
|
||||
NS_GET_IID(nsISimpleEnumerator), getter_AddRefs(dictDirs));
|
||||
if (NS_FAILED(rv))
|
||||
return;
|
||||
|
||||
PRBool hasMore;
|
||||
while (NS_SUCCEEDED(dictDirs->HasMoreElements(&hasMore)) && hasMore) {
|
||||
nsCOMPtr<nsISupports> elem;
|
||||
dictDirs->GetNext(getter_AddRefs(elem));
|
||||
|
||||
dictDir = do_QueryInterface(elem);
|
||||
if (dictDir)
|
||||
LoadDictionariesFromDir(dictDir);
|
||||
}
|
||||
}
|
||||
|
||||
void
|
||||
mozHunspell::LoadDictionariesFromDir(nsIFile* aDir)
|
||||
{
|
||||
nsresult rv;
|
||||
|
||||
PRBool check = PR_FALSE;
|
||||
rv = aDir->Exists(&check);
|
||||
if (NS_FAILED(rv) || !check)
|
||||
return;
|
||||
|
||||
rv = aDir->IsDirectory(&check);
|
||||
if (NS_FAILED(rv) || !check)
|
||||
return;
|
||||
|
||||
nsCOMPtr<nsISimpleEnumerator> e;
|
||||
rv = aDir->GetDirectoryEntries(getter_AddRefs(e));
|
||||
if (NS_FAILED(rv))
|
||||
return;
|
||||
|
||||
nsCOMPtr<nsIDirectoryEnumerator> files(do_QueryInterface(e));
|
||||
if (!files)
|
||||
return;
|
||||
|
||||
nsCOMPtr<nsIFile> file;
|
||||
while (NS_SUCCEEDED(files->GetNextFile(getter_AddRefs(file))) && file) {
|
||||
nsAutoString leafName;
|
||||
file->GetLeafName(leafName);
|
||||
if (!StringEndsWith(leafName, NS_LITERAL_STRING(".dic")))
|
||||
continue;
|
||||
|
||||
nsAutoString dict(leafName);
|
||||
dict.SetLength(dict.Length() - 4); // magic length of ".dic"
|
||||
|
||||
// check for the presence of the .aff file
|
||||
leafName = dict;
|
||||
leafName.AppendLiteral(".aff");
|
||||
file->SetLeafName(leafName);
|
||||
rv = file->Exists(&check);
|
||||
if (NS_FAILED(rv) || !check)
|
||||
continue;
|
||||
|
||||
#ifdef DEBUG_bsmedberg
|
||||
printf("Adding dictionary: %s\n", NS_ConvertUTF16toUTF8(dict).get());
|
||||
#endif
|
||||
|
||||
mDictionaries.Put(dict, file);
|
||||
}
|
||||
}
|
||||
|
||||
nsresult mozHunspell::ConvertCharset(const PRUnichar* aStr, char ** aDst)
|
||||
{
|
||||
NS_ENSURE_ARG_POINTER(aDst);
|
||||
NS_ENSURE_TRUE(mEncoder, NS_ERROR_NULL_POINTER);
|
||||
|
||||
PRInt32 outLength;
|
||||
PRInt32 inLength = nsCRT::strlen(aStr);
|
||||
nsresult rv = mEncoder->GetMaxLength(aStr, inLength, &outLength);
|
||||
NS_ENSURE_SUCCESS(rv, rv);
|
||||
|
||||
*aDst = (char *) nsMemory::Alloc(sizeof(char) * (outLength+1));
|
||||
NS_ENSURE_TRUE(*aDst, NS_ERROR_OUT_OF_MEMORY);
|
||||
|
||||
rv = mEncoder->Convert(aStr, &inLength, *aDst, &outLength);
|
||||
if (NS_SUCCEEDED(rv))
|
||||
(*aDst)[outLength] = '\0';
|
||||
|
||||
return rv;
|
||||
}
|
||||
|
||||
/* boolean Check (in wstring word); */
|
||||
NS_IMETHODIMP mozHunspell::Check(const PRUnichar *aWord, PRBool *aResult)
|
||||
{
|
||||
NS_ENSURE_ARG_POINTER(aWord);
|
||||
NS_ENSURE_ARG_POINTER(aResult);
|
||||
NS_ENSURE_TRUE(mHunspell, NS_ERROR_FAILURE);
|
||||
|
||||
nsXPIDLCString charsetWord;
|
||||
nsresult rv = ConvertCharset(aWord, getter_Copies(charsetWord));
|
||||
NS_ENSURE_SUCCESS(rv, rv);
|
||||
|
||||
*aResult = mHunspell->spell(charsetWord);
|
||||
|
||||
|
||||
if (!*aResult && mPersonalDictionary)
|
||||
rv = mPersonalDictionary->Check(aWord, mLanguage.get(), aResult);
|
||||
|
||||
return rv;
|
||||
}
|
||||
|
||||
/* void Suggest (in wstring word, [array, size_is (count)] out wstring suggestions, out PRUint32 count); */
|
||||
NS_IMETHODIMP mozHunspell::Suggest(const PRUnichar *aWord, PRUnichar ***aSuggestions, PRUint32 *aSuggestionCount)
|
||||
{
|
||||
NS_ENSURE_ARG_POINTER(aSuggestions);
|
||||
NS_ENSURE_ARG_POINTER(aSuggestionCount);
|
||||
NS_ENSURE_TRUE(mHunspell, NS_ERROR_FAILURE);
|
||||
|
||||
nsresult rv;
|
||||
*aSuggestionCount = 0;
|
||||
|
||||
nsXPIDLCString charsetWord;
|
||||
rv = ConvertCharset(aWord, getter_Copies(charsetWord));
|
||||
NS_ENSURE_SUCCESS(rv, rv);
|
||||
|
||||
char ** wlst;
|
||||
*aSuggestionCount = mHunspell->suggest(&wlst, charsetWord);
|
||||
|
||||
if (*aSuggestionCount) {
|
||||
*aSuggestions = (PRUnichar **)nsMemory::Alloc(*aSuggestionCount * sizeof(PRUnichar *));
|
||||
if (*aSuggestions) {
|
||||
PRUint32 index = 0;
|
||||
for (index = 0; index < *aSuggestionCount && NS_SUCCEEDED(rv); ++index) {
|
||||
// Convert the suggestion to utf16
|
||||
PRInt32 inLength = nsCRT::strlen(wlst[index]);
|
||||
PRInt32 outLength;
|
||||
rv = mDecoder->GetMaxLength(wlst[index], inLength, &outLength);
|
||||
if (NS_SUCCEEDED(rv))
|
||||
{
|
||||
(*aSuggestions)[index] = (PRUnichar *) nsMemory::Alloc(sizeof(PRUnichar) * (outLength+1));
|
||||
if ((*aSuggestions)[index])
|
||||
{
|
||||
rv = mDecoder->Convert(wlst[index], &inLength, (*aSuggestions)[index], &outLength);
|
||||
if (NS_SUCCEEDED(rv))
|
||||
(*aSuggestions)[index][outLength] = 0;
|
||||
}
|
||||
else
|
||||
rv = NS_ERROR_OUT_OF_MEMORY;
|
||||
}
|
||||
}
|
||||
|
||||
if (NS_FAILED(rv))
|
||||
NS_FREE_XPCOM_ALLOCATED_POINTER_ARRAY(index, *aSuggestions); // free the PRUnichar strings up to the point at which the error occurred
|
||||
}
|
||||
else // if (*aSuggestions)
|
||||
rv = NS_ERROR_OUT_OF_MEMORY;
|
||||
}
|
||||
|
||||
NS_FREE_XPCOM_ALLOCATED_POINTER_ARRAY(*aSuggestionCount, wlst);
|
||||
return rv;
|
||||
}
|
||||
|
||||
NS_IMETHODIMP
|
||||
mozHunspell::Observe(nsISupports* aSubj, const char *aTopic,
|
||||
const PRUnichar *aData)
|
||||
{
|
||||
NS_ASSERTION(!strcmp(aTopic, "profile-do-change"),
|
||||
"Unexpected observer topic");
|
||||
|
||||
LoadDictionaryList();
|
||||
|
||||
return NS_OK;
|
||||
}
|
|
@ -0,0 +1,113 @@
|
|||
/******* BEGIN LICENSE BLOCK *******
|
||||
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
*
|
||||
* The contents of this file are subject to the Mozilla Public License Version
|
||||
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
* http://www.mozilla.org/MPL/
|
||||
*
|
||||
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
* for the specific language governing rights and limitations under the
|
||||
* License.
|
||||
*
|
||||
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
*
|
||||
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
|
||||
* László Németh (nemethl@gyorsposta.hu)
|
||||
* David Einstein (deinst@world.std.com)
|
||||
* Michiel van Leeuwen (mvl@exedo.nl)
|
||||
* Caolan McNamara (cmc@openoffice.org)
|
||||
* Davide Prina
|
||||
* Giuseppe Modugno
|
||||
* Gianluca Turconi
|
||||
* Simon Brouwer
|
||||
* Noll Janos
|
||||
* Biro Arpad
|
||||
* Goldman Eleonora
|
||||
* Sarlos Tamas
|
||||
* Bencsath Boldizsar
|
||||
* Halacsy Peter
|
||||
* Dvornik Laszlo
|
||||
* Gefferth Andras
|
||||
* Nagy Viktor
|
||||
* Varga Daniel
|
||||
* Chris Halls
|
||||
* Rene Engelhard
|
||||
* Bram Moolenaar
|
||||
* Dafydd Jones
|
||||
* Harri Pitkanen
|
||||
* Andras Timar
|
||||
* Tor Lillqvist
|
||||
*
|
||||
* Alternatively, the contents of this file may be used under the terms of
|
||||
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
* of those above. If you wish to allow use of your version of this file only
|
||||
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
* use your version of this file under the terms of the MPL, indicate your
|
||||
* decision by deleting the provisions above and replace them with the notice
|
||||
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
* the provisions above, a recipient may use your version of this file under
|
||||
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||
*
|
||||
******* END LICENSE BLOCK *******/
|
||||
|
||||
#ifndef mozHunspell_h__
|
||||
#define mozHunspell_h__
|
||||
|
||||
#include "hunspell.hxx"
|
||||
#include "mozISpellCheckingEngine.h"
|
||||
#include "mozIPersonalDictionary.h"
|
||||
#include "nsString.h"
|
||||
#include "nsCOMPtr.h"
|
||||
#include "nsIObserver.h"
|
||||
#include "nsIUnicodeEncoder.h"
|
||||
#include "nsIUnicodeDecoder.h"
|
||||
#include "nsInterfaceHashtable.h"
|
||||
#include "nsWeakReference.h"
|
||||
|
||||
#define MOZ_HUNSPELL_CONTRACTID "@mozilla.org/spellchecker/hunspell;1"
|
||||
#define MOZ_HUNSPELL_CID \
|
||||
/* 56c778e4-1bee-45f3-a689-886692a97fe7 */ \
|
||||
{ 0x56c778e4, 0x1bee, 0x45f3, \
|
||||
{ 0xa6, 0x89, 0x88, 0x66, 0x92, 0xa9, 0x7f, 0xe7 } }
|
||||
|
||||
class mozHunspell : public mozISpellCheckingEngine,
|
||||
public nsIObserver,
|
||||
public nsSupportsWeakReference
|
||||
{
|
||||
public:
|
||||
NS_DECL_ISUPPORTS
|
||||
NS_DECL_MOZISPELLCHECKINGENGINE
|
||||
NS_DECL_NSIOBSERVER
|
||||
|
||||
mozHunspell() : mHunspell(nsnull) { }
|
||||
virtual ~mozHunspell();
|
||||
|
||||
nsresult Init();
|
||||
|
||||
void LoadDictionaryList();
|
||||
void LoadDictionariesFromDir(nsIFile* aDir);
|
||||
|
||||
// helper method for converting a word to the charset of the dictionary
|
||||
nsresult ConvertCharset(const PRUnichar* aStr, char ** aDst);
|
||||
|
||||
protected:
|
||||
|
||||
nsCOMPtr<mozIPersonalDictionary> mPersonalDictionary;
|
||||
nsCOMPtr<nsIUnicodeEncoder> mEncoder;
|
||||
nsCOMPtr<nsIUnicodeDecoder> mDecoder;
|
||||
|
||||
// Hashtable matches dictionary name to .aff file
|
||||
nsInterfaceHashtable<nsStringHashKey, nsIFile> mDictionaries;
|
||||
nsString mDictionary;
|
||||
nsString mLanguage;
|
||||
|
||||
Hunspell *mHunspell;
|
||||
};
|
||||
|
||||
#endif
|
|
@ -0,0 +1,178 @@
|
|||
/******* BEGIN LICENSE BLOCK *******
|
||||
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
*
|
||||
* The contents of this file are subject to the Mozilla Public License Version
|
||||
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
* http://www.mozilla.org/MPL/
|
||||
*
|
||||
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
* for the specific language governing rights and limitations under the
|
||||
* License.
|
||||
*
|
||||
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
*
|
||||
* Contributor(s): Benjamin Smedberg (benjamin@smedbergs.us) (Original Code)
|
||||
* László Németh (nemethl@gyorsposta.hu)
|
||||
* Ryan VanderMeulen (ryanvm@gmail.com)
|
||||
*
|
||||
* Alternatively, the contents of this file may be used under the terms of
|
||||
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
* of those above. If you wish to allow use of your version of this file only
|
||||
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
* use your version of this file under the terms of the MPL, indicate your
|
||||
* decision by deleting the provisions above and replace them with the notice
|
||||
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
* the provisions above, a recipient may use your version of this file under
|
||||
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||
*
|
||||
******* END LICENSE BLOCK *******/
|
||||
|
||||
#include "mozHunspellDirProvider.h"
|
||||
#include "nsXULAppAPI.h"
|
||||
#include "nsString.h"
|
||||
|
||||
#include "mozISpellCheckingEngine.h"
|
||||
#include "nsICategoryManager.h"
|
||||
|
||||
NS_IMPL_ISUPPORTS2(mozHunspellDirProvider,
|
||||
nsIDirectoryServiceProvider,
|
||||
nsIDirectoryServiceProvider2)
|
||||
|
||||
NS_IMETHODIMP
|
||||
mozHunspellDirProvider::GetFile(const char *aKey, PRBool *aPersist,
|
||||
nsIFile* *aResult)
|
||||
{
|
||||
return NS_ERROR_FAILURE;
|
||||
}
|
||||
|
||||
NS_IMETHODIMP
|
||||
mozHunspellDirProvider::GetFiles(const char *aKey,
|
||||
nsISimpleEnumerator* *aResult)
|
||||
{
|
||||
if (strcmp(aKey, DICTIONARY_SEARCH_DIRECTORY_LIST) != 0) {
|
||||
return NS_ERROR_FAILURE;
|
||||
}
|
||||
|
||||
nsCOMPtr<nsIProperties> dirSvc =
|
||||
do_GetService(NS_DIRECTORY_SERVICE_CONTRACTID);
|
||||
if (!dirSvc)
|
||||
return NS_ERROR_FAILURE;
|
||||
|
||||
nsCOMPtr<nsISimpleEnumerator> list;
|
||||
nsresult rv = dirSvc->Get(XRE_EXTENSIONS_DIR_LIST,
|
||||
NS_GET_IID(nsISimpleEnumerator),
|
||||
getter_AddRefs(list));
|
||||
if (NS_FAILED(rv))
|
||||
return rv;
|
||||
|
||||
nsCOMPtr<nsISimpleEnumerator> e = new AppendingEnumerator(list);
|
||||
if (!e)
|
||||
return NS_ERROR_OUT_OF_MEMORY;
|
||||
|
||||
*aResult = nsnull;
|
||||
e.swap(*aResult);
|
||||
return NS_SUCCESS_AGGREGATE_RESULT;
|
||||
}
|
||||
|
||||
NS_IMPL_ISUPPORTS1(mozHunspellDirProvider::AppendingEnumerator,
|
||||
nsISimpleEnumerator)
|
||||
|
||||
NS_IMETHODIMP
|
||||
mozHunspellDirProvider::AppendingEnumerator::HasMoreElements(PRBool *aResult)
|
||||
{
|
||||
*aResult = mNext ? PR_TRUE : PR_FALSE;
|
||||
return NS_OK;
|
||||
}
|
||||
|
||||
NS_IMETHODIMP
|
||||
mozHunspellDirProvider::AppendingEnumerator::GetNext(nsISupports* *aResult)
|
||||
{
|
||||
if (aResult)
|
||||
NS_ADDREF(*aResult = mNext);
|
||||
|
||||
mNext = nsnull;
|
||||
|
||||
nsresult rv;
|
||||
|
||||
// Ignore all errors
|
||||
|
||||
PRBool more;
|
||||
while (NS_SUCCEEDED(mBase->HasMoreElements(&more)) && more) {
|
||||
nsCOMPtr<nsISupports> nextbasesupp;
|
||||
mBase->GetNext(getter_AddRefs(nextbasesupp));
|
||||
|
||||
nsCOMPtr<nsIFile> nextbase(do_QueryInterface(nextbasesupp));
|
||||
if (!nextbase)
|
||||
continue;
|
||||
|
||||
nextbase->Clone(getter_AddRefs(mNext));
|
||||
if (!mNext)
|
||||
continue;
|
||||
|
||||
mNext->AppendNative(NS_LITERAL_CSTRING("dictionaries"));
|
||||
|
||||
PRBool exists;
|
||||
rv = mNext->Exists(&exists);
|
||||
if (NS_SUCCEEDED(rv) && exists)
|
||||
break;
|
||||
|
||||
mNext = nsnull;
|
||||
}
|
||||
|
||||
return NS_OK;
|
||||
}
|
||||
|
||||
mozHunspellDirProvider::AppendingEnumerator::AppendingEnumerator
|
||||
(nsISimpleEnumerator* aBase) :
|
||||
mBase(aBase)
|
||||
{
|
||||
// Initialize mNext to begin
|
||||
GetNext(nsnull);
|
||||
}
|
||||
|
||||
NS_METHOD
|
||||
mozHunspellDirProvider::Register(nsIComponentManager* aCompMgr,
|
||||
nsIFile* aPath, const char *aLoaderStr,
|
||||
const char *aType,
|
||||
const nsModuleComponentInfo *aInfo)
|
||||
{
|
||||
nsresult rv;
|
||||
|
||||
nsCOMPtr<nsICategoryManager> catMan =
|
||||
do_GetService(NS_CATEGORYMANAGER_CONTRACTID);
|
||||
if (!catMan)
|
||||
return NS_ERROR_FAILURE;
|
||||
|
||||
rv = catMan->AddCategoryEntry(XPCOM_DIRECTORY_PROVIDER_CATEGORY,
|
||||
"spellcheck-directory-provider",
|
||||
kContractID, PR_TRUE, PR_TRUE, nsnull);
|
||||
return rv;
|
||||
}
|
||||
|
||||
NS_METHOD
|
||||
mozHunspellDirProvider::Unregister(nsIComponentManager* aCompMgr,
|
||||
nsIFile* aPath,
|
||||
const char *aLoaderStr,
|
||||
const nsModuleComponentInfo *aInfo)
|
||||
{
|
||||
nsresult rv;
|
||||
|
||||
nsCOMPtr<nsICategoryManager> catMan =
|
||||
do_GetService(NS_CATEGORYMANAGER_CONTRACTID);
|
||||
if (!catMan)
|
||||
return NS_ERROR_FAILURE;
|
||||
|
||||
rv = catMan->DeleteCategoryEntry(XPCOM_DIRECTORY_PROVIDER_CATEGORY,
|
||||
"spellcheck-directory-provider",
|
||||
PR_TRUE);
|
||||
return rv;
|
||||
}
|
||||
|
||||
char const *const
|
||||
mozHunspellDirProvider::kContractID = "@mozilla.org/spellcheck/dir-provider;1";
|
|
@ -0,0 +1,81 @@
|
|||
/******* BEGIN LICENSE BLOCK *******
|
||||
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
*
|
||||
* The contents of this file are subject to the Mozilla Public License Version
|
||||
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
* http://www.mozilla.org/MPL/
|
||||
*
|
||||
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
* for the specific language governing rights and limitations under the
|
||||
* License.
|
||||
*
|
||||
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
*
|
||||
* Contributor(s): Benjamin Smedberg (benjamin@smedbergs.us) (Original Code)
|
||||
* László Németh (nemethl@gyorsposta.hu)
|
||||
* Ryan VanderMeulen (ryanvm@gmail.com)
|
||||
*
|
||||
* Alternatively, the contents of this file may be used under the terms of
|
||||
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
* of those above. If you wish to allow use of your version of this file only
|
||||
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
* use your version of this file under the terms of the MPL, indicate your
|
||||
* decision by deleting the provisions above and replace them with the notice
|
||||
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
* the provisions above, a recipient may use your version of this file under
|
||||
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||
*
|
||||
******* END LICENSE BLOCK *******/
|
||||
|
||||
#ifndef mozHunspellDirProvider_h__
|
||||
#define mozHunspellDirProvider_h__
|
||||
|
||||
#include "nsIDirectoryService.h"
|
||||
#include "nsIGenericFactory.h"
|
||||
#include "nsISimpleEnumerator.h"
|
||||
|
||||
class mozHunspellDirProvider :
|
||||
public nsIDirectoryServiceProvider2
|
||||
{
|
||||
public:
|
||||
NS_DECL_ISUPPORTS
|
||||
NS_DECL_NSIDIRECTORYSERVICEPROVIDER
|
||||
NS_DECL_NSIDIRECTORYSERVICEPROVIDER2
|
||||
|
||||
static NS_METHOD Register(nsIComponentManager* aCompMgr,
|
||||
nsIFile* aPath, const char *aLoaderStr,
|
||||
const char *aType,
|
||||
const nsModuleComponentInfo *aInfo);
|
||||
|
||||
static NS_METHOD Unregister(nsIComponentManager* aCompMgr,
|
||||
nsIFile* aPath, const char *aLoaderStr,
|
||||
const nsModuleComponentInfo *aInfo);
|
||||
|
||||
static char const *const kContractID;
|
||||
|
||||
private:
|
||||
class AppendingEnumerator : public nsISimpleEnumerator
|
||||
{
|
||||
public:
|
||||
NS_DECL_ISUPPORTS
|
||||
NS_DECL_NSISIMPLEENUMERATOR
|
||||
|
||||
AppendingEnumerator(nsISimpleEnumerator* aBase);
|
||||
|
||||
private:
|
||||
nsCOMPtr<nsISimpleEnumerator> mBase;
|
||||
nsCOMPtr<nsIFile> mNext;
|
||||
};
|
||||
};
|
||||
|
||||
#define HUNSPELLDIRPROVIDER_CID \
|
||||
{ 0x64d6174c, 0x1496, 0x4ffd, \
|
||||
{ 0x87, 0xf2, 0xda, 0x26, 0x70, 0xf8, 0x89, 0x34 } }
|
||||
|
||||
#endif // mozHunspellDirProvider
|
Разница между файлами не показана из-за своего большого размера
Загрузить разницу
|
@ -0,0 +1,151 @@
|
|||
/******* BEGIN LICENSE BLOCK *******
|
||||
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||
*
|
||||
* The contents of this file are subject to the Mozilla Public License Version
|
||||
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
* http://www.mozilla.org/MPL/
|
||||
*
|
||||
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||
* for the specific language governing rights and limitations under the
|
||||
* License.
|
||||
*
|
||||
* The Initial Developers of the Original Code are Kevin Hendricks (MySpell)
|
||||
* and Laszlo Nemeth (Hunspell). Portions created by the Initial Developers
|
||||
* are Copyright (C) 2002-2005 the Initial Developers. All Rights Reserved.
|
||||
*
|
||||
* Contributor(s): Kevin Hendricks (kevin.hendricks@sympatico.ca)
|
||||
* László Németh (nemethl@gyorsposta.hu)
|
||||
* David Einstein (deinst@world.std.com)
|
||||
* Davide Prina
|
||||
* Giuseppe Modugno
|
||||
* Gianluca Turconi
|
||||
* Simon Brouwer
|
||||
* Noll Janos
|
||||
* Biro Arpad
|
||||
* Goldman Eleonora
|
||||
* Sarlos Tamas
|
||||
* Bencsath Boldizsar
|
||||
* Halacsy Peter
|
||||
* Dvornik Laszlo
|
||||
* Gefferth Andras
|
||||
* Nagy Viktor
|
||||
* Varga Daniel
|
||||
* Chris Halls
|
||||
* Rene Engelhard
|
||||
* Bram Moolenaar
|
||||
* Dafydd Jones
|
||||
* Harri Pitkanen
|
||||
* Andras Timar
|
||||
* Tor Lillqvist
|
||||
*
|
||||
* Alternatively, the contents of this file may be used under the terms of
|
||||
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||
* of those above. If you wish to allow use of your version of this file only
|
||||
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||
* use your version of this file under the terms of the MPL, indicate your
|
||||
* decision by deleting the provisions above and replace them with the notice
|
||||
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||
* the provisions above, a recipient may use your version of this file under
|
||||
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||
*
|
||||
******* END LICENSE BLOCK *******/
|
||||
|
||||
#ifndef _SUGGESTMGR_HXX_
|
||||
#define _SUGGESTMGR_HXX_
|
||||
|
||||
#define MAXSWL 100
|
||||
#define MAXSWUTF8L (MAXSWL * 4)
|
||||
#define MAX_ROOTS 100
|
||||
#define MAX_WORDS 100
|
||||
#define MAX_GUESS 100
|
||||
#define MAXNGRAMSUGS 5
|
||||
|
||||
#define MINTIMER 500
|
||||
#define MAXPLUSTIMER 500
|
||||
|
||||
#define NGRAM_IGNORE_LENGTH 0
|
||||
#define NGRAM_LONGER_WORSE 1
|
||||
#define NGRAM_ANY_MISMATCH 2
|
||||
|
||||
#include "atypes.hxx"
|
||||
#include "affixmgr.hxx"
|
||||
#include "hashmgr.hxx"
|
||||
#include "langnum.hxx"
|
||||
#include <time.h>
|
||||
|
||||
enum { LCS_UP, LCS_LEFT, LCS_UPLEFT };
|
||||
|
||||
class SuggestMgr
|
||||
{
|
||||
char * ctry;
|
||||
int ctryl;
|
||||
w_char * ctry_utf;
|
||||
|
||||
AffixMgr* pAMgr;
|
||||
int maxSug;
|
||||
struct cs_info * csconv;
|
||||
int utf8;
|
||||
int nosplitsugs;
|
||||
int maxngramsugs;
|
||||
int complexprefixes;
|
||||
|
||||
|
||||
public:
|
||||
SuggestMgr(const char * tryme, int maxn, AffixMgr *aptr);
|
||||
~SuggestMgr();
|
||||
|
||||
int suggest(char*** slst, const char * word, int nsug);
|
||||
int ngsuggest(char ** wlst, char * word, HashMgr* pHMgr);
|
||||
int suggest_auto(char*** slst, const char * word, int nsug);
|
||||
int suggest_stems(char*** slst, const char * word, int nsug);
|
||||
int suggest_pos_stems(char*** slst, const char * word, int nsug);
|
||||
|
||||
char * suggest_morph(const char * word);
|
||||
char * suggest_morph_for_spelling_error(const char * word);
|
||||
|
||||
private:
|
||||
int testsug(char** wlst, const char * candidate, int wl, int ns, int cpdsuggest,
|
||||
int * timer, time_t * timelimit);
|
||||
int checkword(const char *, int, int, int *, time_t *);
|
||||
int check_forbidden(const char *, int);
|
||||
|
||||
int capchars(char **, const char *, int, int);
|
||||
int replchars(char**, const char *, int, int);
|
||||
int doubletwochars(char**, const char *, int, int);
|
||||
int forgotchar(char **, const char *, int, int);
|
||||
int swapchar(char **, const char *, int, int);
|
||||
int longswapchar(char **, const char *, int, int);
|
||||
int movechar(char **, const char *, int, int);
|
||||
int extrachar(char **, const char *, int, int);
|
||||
int badchar(char **, const char *, int, int);
|
||||
int twowords(char **, const char *, int, int);
|
||||
int fixstems(char **, const char *, int);
|
||||
|
||||
int capchars_utf(char **, const w_char *, int wl, int, int);
|
||||
int doubletwochars_utf(char**, const w_char *, int wl, int, int);
|
||||
int forgotchar_utf(char**, const w_char *, int wl, int, int);
|
||||
int extrachar_utf(char**, const w_char *, int wl, int, int);
|
||||
int badchar_utf(char **, const w_char *, int wl, int, int);
|
||||
int swapchar_utf(char **, const w_char *, int wl, int, int);
|
||||
int longswapchar_utf(char **, const w_char *, int, int, int);
|
||||
int movechar_utf(char **, const w_char *, int, int, int);
|
||||
|
||||
int mapchars(char**, const char *, int);
|
||||
int map_related(const char *, int, char ** wlst, int, const mapentry*, int, int *, time_t *);
|
||||
int map_related_utf(w_char *, int, int, char ** wlst, int, const mapentry*, int, int *, time_t *);
|
||||
int ngram(int n, char * s1, const char * s2, int uselen);
|
||||
int mystrlen(const char * word);
|
||||
int equalfirstletter(char * s1, const char * s2);
|
||||
int commoncharacterpositions(char * s1, const char * s2, int * is_swap);
|
||||
void bubblesort( char ** rwd, int * rsc, int n);
|
||||
void lcs(const char * s, const char * s2, int * l1, int * l2, char ** result);
|
||||
int lcslen(const char * s, const char* s2);
|
||||
|
||||
};
|
||||
|
||||
#endif
|
||||
|
|
@ -0,0 +1,49 @@
|
|||
OpenOffice.org Hunspell en_US dictionary
|
||||
2007-03-20 release
|
||||
--
|
||||
This dictionary is based on a subset of the original
|
||||
English wordlist created by Kevin Atkinson for Pspell
|
||||
and Aspell and thus is covered by his original
|
||||
LGPL license. The affix file is a heavily modified
|
||||
version of the original english.aff file which was
|
||||
released as part of Geoff Kuenning's Ispell and as
|
||||
such is covered by his BSD license.
|
||||
|
||||
Thanks to both authors for there wonderful work.
|
||||
|
||||
ChangeLog
|
||||
|
||||
2007-03-20 nemeth AT OOo
|
||||
|
||||
- alot -> a_lot REP suggestion, add "a lot"
|
||||
- add Mozilla words (blog, cafe, inline, online, eBay, PayPal, etc.)
|
||||
- add cybercafé
|
||||
- alias compression (saving 15 kB disk space + 0.8 MB memory)
|
||||
|
||||
Mozilla 355178 - add scot-free
|
||||
Mozilla 374411 - add Scotty
|
||||
Mozilla 359305 - add archaeology, archeological, archeologist
|
||||
Mozilla 358754 - add doughnut
|
||||
Mozilla 254814 - add gauging, canoeing, *canoing, proactively
|
||||
Issue 71718 - remove *opthalmic, *opthalmology; *opthalmologic -> ophthalmologic
|
||||
Issue 68550 - *estoppal -> estoppel
|
||||
Issue 69345 - add tapenade
|
||||
Issue 67975 - add assistive
|
||||
Issue 63541 - remove *dessicate
|
||||
Issue 62599 - add toolbar
|
||||
|
||||
2006-02-07 nemeth AT OOo
|
||||
|
||||
Issue 48060 - add ordinal numbers with COMPOUNDRULE (1st, 11th, 101st etc.)
|
||||
Issue 29112, 55498 - add NOSUGGEST flags to taboo words
|
||||
Issue 56755 - add sequitor (non sequitor)
|
||||
Issue 50616 - add open source words (GNOME, KDE, OOo, OpenOffice.org)
|
||||
Issue 56389 - add Mozilla words (Mozilla, Firefox, Thunderbird)
|
||||
Issue 29110 - add okay
|
||||
Issue 58468 - add advisors
|
||||
Issue 58708 - add hiragana & katakana
|
||||
Issue 60240 - add arginine, histidine, monovalent, polymorphism, pyroelectric, pyroelectricity
|
||||
|
||||
2005-11-01 dnaber AT OOo
|
||||
|
||||
Issue 25797 - add proven, advisor, etc.
|
Разница между файлами не показана из-за своего большого размера
Загрузить разницу
Разница между файлами не показана из-за своего большого размера
Загрузить разницу
Загрузка…
Ссылка в новой задаче