edu.udo.cs.wvtool.generic.stemmer
Class DictionaryStemmer
java.lang.Object
edu.udo.cs.wvtool.generic.stemmer.AbstractStemmer
edu.udo.cs.wvtool.generic.stemmer.DictionaryStemmer
- All Implemented Interfaces:
- SimpleStemmer, WVTStemmer, TokenEnumeration
public class DictionaryStemmer
- extends AbstractStemmer
A stemmer based on an explicit dictionary of pairs of terms and base forms.
Terms can also be described by regular expressions: every term in a text
that matches a given regular expression is assigned the user-specified base
form. Optionally, a fallback stemmer can be provided; it is called when a
term is not found in the dictionary.
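The lookup described above can be sketched as follows. This is a minimal, self-contained illustration and not the WVTool implementation: the class name DictionaryLookupSketch is hypothetical, and several details are assumptions the description does not state (exact terms are checked before regular expressions, an expression must match the whole term, expressions are tried in insertion order, and a term with no match and no fallback is returned unchanged).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a dictionary-based stemmer; not the WVTool class. */
final class DictionaryLookupSketch {
    // Exact term -> base form.
    private final Map<String, String> termMap = new LinkedHashMap<>();
    // Compiled regular expression -> base form, tried in insertion order (assumption).
    private final Map<Pattern, String> regexMap = new LinkedHashMap<>();
    // Fallback stemmer; may be null, in which case unknown terms are returned as-is (assumption).
    private final UnaryOperator<String> fallback;

    DictionaryLookupSketch(UnaryOperator<String> fallback) {
        this.fallback = fallback;
    }

    void addTermMapping(String term, String base) {
        termMap.put(term, base);
    }

    void addRegularExpression(String regex, String base) {
        regexMap.put(Pattern.compile(regex), base);
    }

    String getBase(String term) {
        // 1. Exact dictionary entry.
        String base = termMap.get(term);
        if (base != null) {
            return base;
        }
        // 2. First regular expression that matches the whole term.
        for (Map.Entry<Pattern, String> entry : regexMap.entrySet()) {
            if (entry.getKey().matcher(term).matches()) {
                return entry.getValue();
            }
        }
        // 3. Fallback stemmer, if one was supplied.
        return fallback != null ? fallback.apply(term) : term;
    }

    public static void main(String[] args) {
        // The lowercasing "fallback" is only a placeholder for a real stemmer.
        DictionaryLookupSketch stemmer =
                new DictionaryLookupSketch(t -> t.toLowerCase(Locale.ROOT));
        stemmer.addTermMapping("mice", "mouse");
        stemmer.addRegularExpression("walk(s|ed|ing)", "walk");

        System.out.println(stemmer.getBase("mice"));    // mouse
        System.out.println(stemmer.getBase("walking")); // walk
        System.out.println(stemmer.getBase("Tables"));  // tables
    }
}
```

With the real class, the corresponding calls would presumably be addTermMapping, addRegularExpression and getBase as listed in the method details; the matching order used by the library should be verified against its source.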
- Version:
- $Id$
- Author:
- Michael Wurst
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
termMap
private final java.util.Map termMap
regExprList
private final java.util.Map regExprList
additionalMap
private final java.util.Map additionalMap
addMappings
private final boolean addMappings
fallBackStemmer
private final SimpleStemmer fallBackStemmer
DictionaryStemmer
public DictionaryStemmer()
DictionaryStemmer
public DictionaryStemmer(java.io.Reader in_)
throws java.io.IOException
- Throws:
java.io.IOException
DictionaryStemmer
public DictionaryStemmer(java.io.Reader in_,
SimpleStemmer stemmer,
boolean addMappings)
throws java.io.IOException
- Parameters:
in_
- a reader for the dictionary file
stemmer
- a fallback stemmer that is applied if a word is not found in the dictionary
addMappings
- if true, mappings created implicitly by the fallback stemmer are added and can later be stored
- Throws:
java.io.IOException
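The effect of the addMappings flag can be illustrated with a small sketch. It is not taken from the WVTool source: the class RecordingFallbackSketch is hypothetical, the dictionary is passed as a Map rather than read from a Reader (the dictionary file format is not documented here), and the tab-separated output of writeAddedMappings is an assumed format chosen only for the example.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of recording fallback results; not the WVTool class. */
final class RecordingFallbackSketch {
    // Mappings supplied up front (stands in for the dictionary file).
    private final Map<String, String> termMap = new LinkedHashMap<>();
    // Mappings produced by the fallback stemmer while stemming.
    private final Map<String, String> additionalMap = new LinkedHashMap<>();
    private final UnaryOperator<String> fallback;
    private final boolean addMappings;

    RecordingFallbackSketch(Map<String, String> dictionary,
                            UnaryOperator<String> fallback,
                            boolean addMappings) {
        this.termMap.putAll(dictionary);
        this.fallback = fallback;
        this.addMappings = addMappings;
    }

    String getBase(String term) {
        String base = termMap.get(term);
        if (base == null) {
            // Reuse a mapping recorded by an earlier fallback call, if any.
            base = additionalMap.get(term);
        }
        if (base == null) {
            base = fallback.apply(term);
            if (addMappings) {
                // Remember what the fallback produced so it can be stored later.
                additionalMap.put(term, base);
            }
        }
        return base;
    }

    /** Writes one "term<TAB>base" line per recorded mapping (assumed format). */
    void writeAddedMappings(Writer out) throws IOException {
        for (Map.Entry<String, String> entry : additionalMap.entrySet()) {
            out.write(entry.getKey() + "\t" + entry.getValue() + "\n");
        }
        out.flush();
    }
}
```

In this sketch, terms already present in the dictionary never reach the fallback, so only genuinely new mappings are written out. With addMappings set to false, nothing is recorded and writeAddedMappings produces no output.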
containsLettersOnly
private boolean containsLettersOnly(java.lang.String s)
getBase
public java.lang.String getBase(java.lang.String s)
- Description copied from interface:
SimpleStemmer
- Produce the base form of a given term.
- Specified by:
getBase
in interface SimpleStemmer
- Specified by:
getBase
in class AbstractStemmer
- Parameters:
s
- a term
- Returns:
- the base form of the term
addTermMapping
public void addTermMapping(java.lang.String term,
java.lang.String base)
addRegularExpression
public void addRegularExpression(java.lang.String regExprStr,
java.lang.String base)
writeAddedMappings
public void writeAddedMappings(java.io.Writer out_)