public class SpellChecker
extends java.lang.Object
implements java.io.Closeable
Spell Checker class (Main class)
(initially inspired by the David Spencer code).
Example Usage:
SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
// To index a field of a user index:
spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field));
// To index a file containing words:
spellchecker.indexDictionary(new PlainTextDictionary(new File("myfile.txt")));
String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);
Modifier and Type | Field and Description |
---|---|
static float | DEFAULT_ACCURACY: The default minimum score to use, if not specified by calling setAccuracy(float). |
static java.lang.String | F_SUGGEST: Field name for each word in the ngram index. |
static java.lang.String | F_WORD |
Constructor and Description |
---|
SpellChecker(Directory spellIndex): Use the given directory as a spell checker index with a LevensteinDistance as the default StringDistance. |
SpellChecker(Directory spellIndex, org.apache.lucene.search.spell.StringDistance sd): Use the given directory as a spell checker index. |
SpellChecker(Directory spellIndex, org.apache.lucene.search.spell.StringDistance sd, java.util.Comparator<org.apache.lucene.search.spell.SuggestWord> comparator): Use the given directory as a spell checker index with the given StringDistance measure and the given Comparator for sorting the results. |
Modifier and Type | Method and Description |
---|---|
void | clearIndex(): Removes all terms from the spell check index. |
void | close(): Close the IndexSearcher used by this SpellChecker. |
boolean | exist(java.lang.String word): Check whether the word exists in the index. |
float | getAccuracy(): The accuracy (minimum score) to be used, unless overridden in suggestSimilar(String, int, org.apache.lucene.index.IndexReader, String, boolean, float), to decide whether a suggestion is included or not. |
java.util.Comparator<org.apache.lucene.search.spell.SuggestWord> | getComparator() |
org.apache.lucene.search.spell.StringDistance | getStringDistance(): Returns the StringDistance instance used by this SpellChecker instance. |
Analyzer | getWordAnalyzer() |
void | indexDictionary(org.apache.lucene.search.spell.Dictionary dict, IndexWriterConfig config, boolean fullMerge): Indexes the data from the given Dictionary. |
void | setAccuracy(float acc): Sets the accuracy 0 < minScore < 1; default DEFAULT_ACCURACY. |
void | setComparator(java.util.Comparator<org.apache.lucene.search.spell.SuggestWord> comparator): Sets the Comparator for the SuggestWordQueue. |
void | setSpellIndex(Directory spellIndexDir): Use a different index as the spell checker index or re-open the existing index if spellIndex is the same value as given in the constructor. |
void | setStringDistance(org.apache.lucene.search.spell.StringDistance sd): Sets the StringDistance implementation for this SpellChecker instance. |
void | setWordAnalyzer(Analyzer wordAnalyzer) |
java.lang.String[] | suggestSimilar(java.lang.String target, int numSug): Suggest similar words. |
java.lang.String[] | suggestSimilar(java.lang.String target, int numSug, float accuracy): Suggest similar words. |
java.lang.String[] | suggestSimilar(java.lang.String target, int numSug, IndexReader ir, java.lang.String field, boolean morePopular): Deprecated. Use suggestSimilar(String, int, IndexReader, String, SuggestMode). |
java.lang.String[] | suggestSimilar(java.lang.String target, int numSug, IndexReader ir, java.lang.String field, boolean morePopular, float accuracy): Deprecated. Use suggestSimilar(String, int, IndexReader, String, SuggestMode, float). |
java.lang.String[] | suggestSimilar(java.lang.String target, int numSug, IndexReader ir, java.lang.String field, org.apache.lucene.search.spell.SuggestMode suggestMode) |
java.lang.String[] | suggestSimilar(java.lang.String target, int numSug, IndexReader ir, java.lang.String field, org.apache.lucene.search.spell.SuggestMode suggestMode, float accuracy): Suggest similar words (optionally restricted to a field of an index). |
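public static final float DEFAULT_ACCURACY
The default minimum score to use, if not specified by calling setAccuracy(float).
public static final java.lang.String F_SUGGEST
Field name for each word in the ngram index.
public static final java.lang.String F_WORD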
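The field summary refers to an "ngram index". As a rough illustration of what character n-grams of a word look like, here is a minimal sketch. The class and method names (NGramSketch, ngrams) are hypothetical, and this is not the code SpellChecker uses to build its index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of SpellChecker.
final class NGramSketch {
    /** Returns all contiguous character n-grams of the given size, in order. */
    static List<String> ngrams(String word, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        // Slide a window of width n across the word.
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("lucene", 2)); // [lu, uc, ce, en, ne]
        System.out.println(ngrams("lucene", 3)); // [luc, uce, cen, ene]
    }
}
```

Words that share many such fragments with the misspelt input are natural candidates for a suggestion, which is why an index over n-grams is useful for spell checking.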
public SpellChecker(Directory spellIndex, org.apache.lucene.search.spell.StringDistance sd) throws java.io.IOException
Use the given directory as a spell checker index.
Parameters:
spellIndex - the spell index directory
sd - the StringDistance measurement to use
Throws:
java.io.IOException - if the SpellChecker cannot open the directory
public SpellChecker(Directory spellIndex) throws java.io.IOException
Use the given directory as a spell checker index with a LevensteinDistance as the default StringDistance. The directory is created if it doesn't exist yet.
Parameters:
spellIndex - the spell index directory
Throws:
java.io.IOException - if the SpellChecker cannot open the directory
public SpellChecker(Directory spellIndex, org.apache.lucene.search.spell.StringDistance sd, java.util.Comparator<org.apache.lucene.search.spell.SuggestWord> comparator) throws java.io.IOException
Use the given directory as a spell checker index with the given StringDistance measure and the given Comparator for sorting the results.
Parameters:
spellIndex - the spelling index
sd - the distance
comparator - the comparator
Throws:
java.io.IOException - if there is a problem opening the index
public void setSpellIndex(Directory spellIndexDir) throws java.io.IOException
Use a different index as the spell checker index or re-open the existing index if spellIndex is the same value as given in the constructor.
Parameters:
spellIndexDir - the spell directory to use
Throws:
AlreadyClosedException - if the SpellChecker is already closed
java.io.IOException - if the SpellChecker cannot open the directory
public void setComparator(java.util.Comparator<org.apache.lucene.search.spell.SuggestWord> comparator)
Sets the Comparator for the SuggestWordQueue.
Parameters:
comparator - the comparator
public java.util.Comparator<org.apache.lucene.search.spell.SuggestWord> getComparator()
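To illustrate how a Comparator can control the order in which suggestions come back, the sketch below sorts a few candidates with two different comparators. The Suggestion class is a hypothetical stand-in and not Lucene's SuggestWord, and the scores and frequencies are made-up numbers chosen only for the demonstration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; Suggestion is NOT org.apache.lucene.search.spell.SuggestWord.
final class SuggestionOrderSketch {
    static final class Suggestion {
        final String text;
        final float score; // similarity to the target, higher is better
        final int freq;    // how often the word occurs, higher is more popular

        Suggestion(String text, float score, int freq) {
            this.text = text;
            this.score = score;
            this.freq = freq;
        }
    }

    /** Higher score first; ties broken by higher frequency, then alphabetically. */
    static final Comparator<Suggestion> BY_SCORE_THEN_FREQ = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        if (byScore != 0) {
            return byScore;
        }
        int byFreq = Integer.compare(b.freq, a.freq);
        if (byFreq != 0) {
            return byFreq;
        }
        return a.text.compareTo(b.text);
    };

    /** Most frequent word first, ignoring the score. */
    static final Comparator<Suggestion> BY_FREQ = (a, b) -> Integer.compare(b.freq, a.freq);

    /** Returns the suggestion texts sorted by the given comparator. */
    static List<String> order(List<Suggestion> in, Comparator<Suggestion> cmp) {
        List<Suggestion> copy = new ArrayList<>(in);
        copy.sort(cmp);
        List<String> out = new ArrayList<>();
        for (Suggestion s : copy) {
            out.add(s.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Suggestion> in = Arrays.asList(
                new Suggestion("spelt", 0.8f, 3),
                new Suggestion("smell", 0.6f, 50),
                new Suggestion("spell", 0.8f, 10));
        System.out.println(order(in, BY_SCORE_THEN_FREQ)); // [spell, spelt, smell]
        System.out.println(order(in, BY_FREQ));            // [smell, spell, spelt]
    }
}
```

Swapping the comparator changes which suggestion ends up first without changing which candidates were found.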
public void setStringDistance(org.apache.lucene.search.spell.StringDistance sd)
Sets the StringDistance implementation for this SpellChecker instance.
Parameters:
sd - the StringDistance implementation for this SpellChecker instance
public org.apache.lucene.search.spell.StringDistance getStringDistance()
Returns the StringDistance instance used by this SpellChecker instance.
Returns:
the StringDistance instance used by this SpellChecker instance
public void setAccuracy(float acc)
Sets the accuracy 0 < minScore < 1; default DEFAULT_ACCURACY.
Parameters:
acc - the new accuracy
public float getAccuracy()
The accuracy (minimum score) to be used, unless overridden in suggestSimilar(String, int, org.apache.lucene.index.IndexReader, String, boolean, float), to decide whether a suggestion is included or not.
public void setWordAnalyzer(Analyzer wordAnalyzer)
public Analyzer getWordAnalyzer()
public java.lang.String[] suggestSimilar(java.lang.String target, int numSug) throws java.io.IOException
Suggest similar words.
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best-matching spell-checked word from the hits that Lucene found, so one usually has to retrieve several suggestions in order to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one; set this value to at least 5 for a good suggestion.
Parameters:
target - the word you want a spell check done on
numSug - the number of suggested words
Throws:
java.io.IOException - if the underlying index throws an IOException
AlreadyClosedException - if the SpellChecker is already closed
See Also:
suggestSimilar(String, int, org.apache.lucene.index.IndexReader, String, boolean, float)
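The advice about numSug can be illustrated with a toy model of the two-step process: candidates are first ranked by how many character bigrams they share with the target, and only the survivors are re-ranked by edit distance. This is a simplified sketch under stated assumptions, not SpellChecker's actual retrieval logic; the class name, the bigram-overlap score, and the tiny word list are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only; the real index query and scoring are more involved.
final class TwoPhaseSuggestSketch {
    /** Set of character bigrams of a word. */
    static Set<String> bigrams(String w) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 2 <= w.length(); i++) {
            out.add(w.substring(i, i + 2));
        }
        return out;
    }

    /** Phase-1 score: number of distinct bigrams shared with the target. */
    static int overlap(String target, String candidate) {
        Set<String> shared = bigrams(target);
        shared.retainAll(bigrams(candidate));
        return shared.size();
    }

    /** Classic Levenshtein edit distance (insert, delete, substitute). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    static List<String> suggest(String target, List<String> dictionary, int numSug) {
        List<String> candidates = new ArrayList<>(dictionary);
        // Phase 1: rank by shared bigrams (ties alphabetical) and keep the top numSug.
        candidates.sort((x, y) -> {
            int byOverlap = Integer.compare(overlap(target, y), overlap(target, x));
            return byOverlap != 0 ? byOverlap : x.compareTo(y);
        });
        List<String> top = new ArrayList<>(candidates.subList(0, Math.min(numSug, candidates.size())));
        // Phase 2: re-rank the survivors by edit distance to the target.
        top.sort((x, y) -> Integer.compare(editDistance(target, x), editDistance(target, y)));
        return top;
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("because", "house", "horse", "mouse");
        System.out.println(suggest("hause", dict, 1)); // [because]
        System.out.println(suggest("hause", dict, 3)); // [house, mouse, because]
    }
}
```

With numSug == 1 the only survivor of the first phase is "because", which shares three bigrams with "hause" but is three edits away. Asking for more candidates lets "house", one edit away, reach the second phase and come out on top.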
public java.lang.String[] suggestSimilar(java.lang.String target, int numSug, float accuracy) throws java.io.IOException
Suggest similar words.
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best-matching spell-checked word from the hits that Lucene found, so one usually has to retrieve several suggestions in order to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one; set this value to at least 5 for a good suggestion.
Parameters:
target - the word you want a spell check done on
numSug - the number of suggested words
accuracy - the minimum score a suggestion must have in order to qualify for inclusion in the results
Throws:
java.io.IOException - if the underlying index throws an IOException
AlreadyClosedException - if the SpellChecker is already closed
See Also:
suggestSimilar(String, int, org.apache.lucene.index.IndexReader, String, boolean, float)
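The accuracy parameter acts as a minimum score. The sketch below shows one way such a cutoff could work, assuming the score is an edit distance normalized to the range 0 to 1 by the length of the longer word. That normalization, the inclusive comparison, and the class name are assumptions made for illustration; the score actually used depends on the configured StringDistance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only; the real score comes from the configured StringDistance.
final class AccuracyFilterSketch {
    /** Classic Levenshtein edit distance (insert, delete, substitute). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed score: 1 for identical words, falling toward 0 as more edits are needed. */
    static float similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0f;
        }
        return 1.0f - (float) editDistance(a, b) / longest;
    }

    /** Keeps only the candidates whose score reaches the given accuracy. */
    static List<String> filter(String target, List<String> candidates, float accuracy) {
        List<String> kept = new ArrayList<>();
        for (String c : candidates) {
            if (similarity(target, c) >= accuracy) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("house", "mouse", "because");
        System.out.println(filter("hause", candidates, 0.7f)); // [house]
        System.out.println(filter("hause", candidates, 0.5f)); // [house, mouse, because]
    }
}
```

Raising the accuracy trims weaker matches, so a strict value can leave fewer than numSug suggestions.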
@Deprecated public java.lang.String[] suggestSimilar(java.lang.String target, int numSug, IndexReader ir, java.lang.String field, boolean morePopular) throws java.io.IOException
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best-matching spell-checked word from the hits that Lucene found, so one usually has to retrieve several suggestions in order to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one; set this value to at least 5 for a good suggestion.
Uses the getAccuracy() value passed into the constructor as the accuracy.
Parameters:
target - the word you want a spell check done on
numSug - the number of suggested words
ir - the IndexReader of the user index (can be null; see the field parameter)
field - the field of the user index: if field is not null, the suggested words are restricted to the words present in this field
morePopular - return only the suggested words that are as frequent or more frequent than the searched word (only in restricted mode, that is, when indexReader != null and field != null)
Throws:
java.io.IOException - if the underlying index throws an IOException
AlreadyClosedException - if the SpellChecker is already closed
See Also:
suggestSimilar(String, int, IndexReader, String, SuggestMode, float)
@Deprecated public java.lang.String[] suggestSimilar(java.lang.String target, int numSug, IndexReader ir, java.lang.String field, boolean morePopular, float accuracy) throws java.io.IOException
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best-matching spell-checked word from the hits that Lucene found, so one usually has to retrieve several suggestions in order to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one; set this value to at least 5 for a good suggestion.
Parameters:
target - the word you want a spell check done on
numSug - the number of suggested words
ir - the IndexReader of the user index (can be null; see the field parameter)
field - the field of the user index: if field is not null, the suggested words are restricted to the words present in this field
morePopular - return only the suggested words that are as frequent or more frequent than the searched word (only in restricted mode, that is, when indexReader != null and field != null)
accuracy - the minimum score a suggestion must have in order to qualify for inclusion in the results
Throws:
java.io.IOException - if the underlying index throws an IOException
AlreadyClosedException - if the SpellChecker is already closed
See Also:
suggestSimilar(String, int, IndexReader, String, SuggestMode, float)
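The morePopular behavior described above ("as frequent or more frequent than the searched word") can be sketched with a plain frequency map standing in for the user index. The class name, the map, and the counts are hypothetical; the real method reads frequencies from the given IndexReader and field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: a Map plays the role of the user index's term frequencies.
final class MorePopularSketch {
    /** Keeps candidates that occur at least as often as the target word. */
    static List<String> keepAsOrMorePopular(String target, List<String> candidates,
                                            Map<String, Integer> freq) {
        int targetFreq = freq.getOrDefault(target, 0);
        List<String> kept = new ArrayList<>();
        for (String c : candidates) {
            if (freq.getOrDefault(c, 0) >= targetFreq) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = new HashMap<>();
        freq.put("hause", 2);  // the searched word itself occurs twice
        freq.put("house", 40);
        freq.put("mouse", 1);
        List<String> candidates = Arrays.asList("house", "mouse");
        System.out.println(keepAsOrMorePopular("hause", candidates, freq)); // [house]
    }
}
```

In this sketch a target that does not occur at all has frequency 0, so every candidate passes the filter.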
public java.lang.String[] suggestSimilar(java.lang.String target, int numSug, IndexReader ir, java.lang.String field, org.apache.lucene.search.spell.SuggestMode suggestMode) throws java.io.IOException
Throws:
java.io.IOException
public java.lang.String[] suggestSimilar(java.lang.String target, int numSug, IndexReader ir, java.lang.String field, org.apache.lucene.search.spell.SuggestMode suggestMode, float accuracy) throws java.io.IOException
Suggest similar words (optionally restricted to a field of an index).
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best-matching spell-checked word from the hits that Lucene found, so one usually has to retrieve several suggestions in order to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one; set this value to at least 5 for a good suggestion.
Parameters:
target - the word you want a spell check done on
numSug - the number of suggested words
ir - the IndexReader of the user index (can be null; see the field parameter)
field - the field of the user index: if field is not null, the suggested words are restricted to the words present in this field
suggestMode - (NOTE: if indexReader == null and/or field == null, then this is overridden with SuggestMode.SUGGEST_ALWAYS)
accuracy - the minimum score a suggestion must have in order to qualify for inclusion in the results
Throws:
java.io.IOException - if the underlying index throws an IOException
AlreadyClosedException - if the SpellChecker is already closed
public void clearIndex() throws java.io.IOException
Removes all terms from the spell check index.
Throws:
java.io.IOException
AlreadyClosedException - if the SpellChecker is already closed
public boolean exist(java.lang.String word) throws java.io.IOException
Check whether the word exists in the index.
Parameters:
word - the word to check
Throws:
java.io.IOException
AlreadyClosedException - if the SpellChecker is already closed
public final void indexDictionary(org.apache.lucene.search.spell.Dictionary dict, IndexWriterConfig config, boolean fullMerge) throws java.io.IOException
Indexes the data from the given Dictionary.
Parameters:
dict - Dictionary to index
config - IndexWriterConfig to use
fullMerge - whether or not the spellcheck index should be fully merged
Throws:
AlreadyClosedException - if the SpellChecker is already closed
java.io.IOException
public void close() throws java.io.IOException
Close the IndexSearcher used by this SpellChecker.
Specified by:
close in interface java.io.Closeable
Specified by:
close in interface java.lang.AutoCloseable
Throws:
java.io.IOException - if the close operation causes an IOException
AlreadyClosedException - if the SpellChecker is already closed
Copyright © 2009-2018 RONDHUIT Co.,Ltd. All Rights Reserved.