edu.udo.cs.wvtool.generic.stemmer
Class DictionaryStemmer

java.lang.Object
  extended by edu.udo.cs.wvtool.generic.stemmer.AbstractStemmer
      extended by edu.udo.cs.wvtool.generic.stemmer.DictionaryStemmer
All Implemented Interfaces:
SimpleStemmer, WVTStemmer, TokenEnumeration

public class DictionaryStemmer
extends AbstractStemmer

A stemmer based on an explicit dictionary of term/base-form pairs. Terms can also be described by regular expressions: every term in a text that matches a given regular expression is assigned the user-given base form. Optionally, a fallback stemmer can be supplied; it is called when a term is not found in the dictionary.
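The lookup described above can be pictured as a small standalone class. This is a minimal sketch, not the wvtool implementation. The class name DictionaryLookupSketch is hypothetical, and the fallback is modeled as a java.util.function.UnaryOperator<String> rather than SimpleStemmer. It also assumes four things the description does not state: exact entries are tried before regular expressions, a pattern must match the whole term, patterns are tried in insertion order, and a term with no entry and no fallback is returned unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

// Hypothetical sketch of a dictionary stemmer's lookup order; not wvtool source code.
class DictionaryLookupSketch {
    // Exact term -> base form entries.
    private final Map<String, String> termMap = new LinkedHashMap<>();
    // Compiled pattern -> base form entries, tried in insertion order (assumption).
    private final Map<Pattern, String> regExprMap = new LinkedHashMap<>();
    // Used when neither an exact entry nor a pattern applies; may be null.
    private final UnaryOperator<String> fallback;

    DictionaryLookupSketch(UnaryOperator<String> fallback) {
        this.fallback = fallback;
    }

    void addTermMapping(String term, String base) {
        termMap.put(term, base);
    }

    void addRegularExpression(String regExprStr, String base) {
        // Compile once at registration time instead of on every lookup.
        regExprMap.put(Pattern.compile(regExprStr), base);
    }

    String getBase(String s) {
        // 1. Exact dictionary entry.
        String base = termMap.get(s);
        if (base != null) {
            return base;
        }
        // 2. First pattern that matches the whole term (assumption: full match).
        for (Map.Entry<Pattern, String> e : regExprMap.entrySet()) {
            if (e.getKey().matcher(s).matches()) {
                return e.getValue();
            }
        }
        // 3. Fallback stemmer, or the term unchanged if there is none (assumption).
        return fallback != null ? fallback.apply(s) : s;
    }
}
```

In this sketch an exact entry costs a single hash lookup, and the pattern list is scanned only on a miss. A long list of expressions therefore mainly slows down terms that are not in the dictionary.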

Version:
$Id$
Author:
Michael Wurst

Field Summary
private  java.util.Map additionalMap
           
private  boolean addMappings
           
private  SimpleStemmer fallBackStemmer
           
private  java.util.Map regExprList
           
private  java.util.Map termMap
           
 
Constructor Summary
DictionaryStemmer()
           
DictionaryStemmer(java.io.Reader in_)
           
DictionaryStemmer(java.io.Reader in_, SimpleStemmer stemmer, boolean addMappings)
           
 
Method Summary
 void addRegularExpression(java.lang.String regExprStr, java.lang.String base)
           
 void addTermMapping(java.lang.String term, java.lang.String base)
           
private  boolean containsLettersOnly(java.lang.String s)
           
 java.lang.String getBase(java.lang.String s)
          Produce the base form of a given term.
 void writeAddedMappings(java.io.Writer out_)
           
 
Methods inherited from class edu.udo.cs.wvtool.generic.stemmer.AbstractStemmer
hasMoreTokens, nextToken, stem
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

termMap

private final java.util.Map termMap

regExprList

private final java.util.Map regExprList

additionalMap

private final java.util.Map additionalMap

addMappings

private final boolean addMappings

fallBackStemmer

private final SimpleStemmer fallBackStemmer
Constructor Detail

DictionaryStemmer

public DictionaryStemmer()

DictionaryStemmer

public DictionaryStemmer(java.io.Reader in_)
                  throws java.io.IOException
Throws:
java.io.IOException

DictionaryStemmer

public DictionaryStemmer(java.io.Reader in_,
                         SimpleStemmer stemmer,
                         boolean addMappings)
                  throws java.io.IOException
Parameters:
in_ - a reader for the dictionary file
stemmer - a fallback stemmer that is applied if a word is not found in the dictionary
addMappings - if true, mappings created implicitly by the fallback stemmer are recorded so that they can be stored later.
Throws:
java.io.IOException
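What the addMappings flag implies can be sketched as follows. The field names mirror the Field Summary (termMap, additionalMap, fallBackStemmer, addMappings), but the logic is an assumption and not the library's code. The fallback is modeled as a UnaryOperator<String>, and addedMappings() is a hypothetical accessor added only to make the recorded pairs visible. The sketch records only dictionary misses, because the parameter description refers to mappings created by the fallback stemmer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the addMappings option; not wvtool source code.
class RecordingStemmerSketch {
    private final Map<String, String> termMap = new HashMap<>();
    // Mappings produced implicitly by the fallback stemmer, sorted for stable output.
    private final Map<String, String> additionalMap = new TreeMap<>();
    private final UnaryOperator<String> fallBackStemmer;
    private final boolean addMappings;

    RecordingStemmerSketch(UnaryOperator<String> fallBackStemmer, boolean addMappings) {
        this.fallBackStemmer = fallBackStemmer;
        this.addMappings = addMappings;
    }

    void addTermMapping(String term, String base) {
        termMap.put(term, base);
    }

    String getBase(String s) {
        String base = termMap.get(s);
        if (base != null) {
            return base; // dictionary hit: nothing new to record
        }
        base = fallBackStemmer.apply(s);
        if (addMappings) {
            additionalMap.put(s, base); // remember what the fallback produced
        }
        return base;
    }

    // Hypothetical accessor; the documented class exposes writeAddedMappings(Writer) instead.
    Map<String, String> addedMappings() {
        return Collections.unmodifiableMap(additionalMap);
    }
}
```

Recording fallback results this way lets a dictionary grow from actual usage. The recorded pairs can be reviewed and merged back into the dictionary file.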
Method Detail

containsLettersOnly

private boolean containsLettersOnly(java.lang.String s)

getBase

public java.lang.String getBase(java.lang.String s)
Description copied from interface: SimpleStemmer
Produce the base form of a given term.

Specified by:
getBase in interface SimpleStemmer
Specified by:
getBase in class AbstractStemmer
Parameters:
s - a term
Returns:
the base form of the term


addTermMapping

public void addTermMapping(java.lang.String term,
                           java.lang.String base)

addRegularExpression

public void addRegularExpression(java.lang.String regExprStr,
                                 java.lang.String base)
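The class description does not say whether a regular expression added here must match the entire term or only part of it. With java.util.regex the two behaviors differ, and the choice changes which terms receive the base form. The sketch below shows both standard modes side by side so you can compare them against the library's actual behavior. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper contrasting whole-term and substring matching.
class RegexMatchModes {
    // True only if the pattern covers the entire term (Matcher.matches).
    static boolean fullMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).matches();
    }

    // True if the pattern occurs anywhere inside the term (Matcher.find).
    static boolean partialMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).find();
    }
}
```

For example, run(s|ning)? fully matches "running" but not "rerun". A substring search, however, finds "run" inside "rerun", so under substring semantics "rerun" would also be mapped to the same base form.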

writeAddedMappings

public void writeAddedMappings(java.io.Writer out_)
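This page does not document the format that writeAddedMappings produces. The sketch below shows one way to serialize recorded term/base pairs to a java.io.Writer. The class name MappingWriterSketch and the tab-separated "term<TAB>base" line format are hypothetical choices, not wvtool's dictionary file format.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Map;

// Hypothetical sketch; the line format is an assumption, not wvtool's file format.
class MappingWriterSketch {
    static void writeMappings(Map<String, String> mappings, Writer out) throws IOException {
        for (Map.Entry<String, String> e : mappings.entrySet()) {
            out.write(e.getKey());
            out.write('\t');
            out.write(e.getValue());
            out.write('\n');
        }
        // Flush but do not close: the caller owns the Writer.
        out.flush();
    }
}
```

Passing a sorted map, such as a TreeMap, makes the output deterministic, which keeps diffs small when the file is kept under version control.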