edu.udo.cs.wvtool.main
Class WVTool

java.lang.Object
  extended by edu.udo.cs.wvtool.main.WVTool

public class WVTool
extends java.lang.Object

Main class of the Word Vector Tool. It exposes the tool's full functionality and can be called directly from Java code.

Version:
$Id: WVTool.java,v 1.4 2006/06/06 11:45:23 mjwurst Exp $
Author:
Michael Wurst

Field Summary
private static int DEFAULT_PRUNE_MAX
          upper boundary for automatic pruning
private static int DEFAULT_PRUNE_MIN
          lower boundary for automatic pruning
private  boolean skipErrors
          whether errors should be skipped
 
Constructor Summary
WVTool(boolean skipErrors)
          Create a new WVTool instance.
 
Method Summary
 WVTWordVector createVector(java.lang.String text, WVTDocumentInfo d, WVTConfiguration config, WVTWordList wordList)
          Create a single word vector.
 WVTWordVector createVector(java.lang.String text, WVTWordList wordList)
          Create an individual word vector from a String using TF/IDF weights and the standard configuration.
 void createVectors(WVTInputList input, WVTConfiguration config)
          Deprecated. Please use the method createVectors(WVTInputList input, WVTConfiguration config, int pruneMin, int pruneMax)
 void createVectors(WVTInputList input, WVTConfiguration config, int pruneMin, int pruneMax)
          Create a word list and then word vectors, both from the same input list.
 void createVectors(WVTInputList input, WVTConfiguration config, WVTWordList wordList)
          Create word vectors from an input list.
 WVTWordList createWordList(WVTInputList input, WVTConfiguration config)
          Create a word list from scratch based on the given texts.
 WVTWordList createWordList(WVTInputList input, WVTConfiguration config, java.util.List initialWords, boolean addWords)
          Create a word list based on an existing word list.
 void iterateWords(WVTInputList input, WVTConfiguration config, WVToolWordListener listener)
          Process the specified documents using the configured steps and send all encountered words to a listener class.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_PRUNE_MIN

private static final int DEFAULT_PRUNE_MIN
lower boundary for automatic pruning

See Also:
Constant Field Values


DEFAULT_PRUNE_MAX

private static final int DEFAULT_PRUNE_MAX
upper boundary for automatic pruning

See Also:
Constant Field Values


skipErrors

private boolean skipErrors
whether errors should be skipped

Constructor Detail

WVTool

public WVTool(boolean skipErrors)
Create a new WVTool instance.

Parameters:
skipErrors - if true, errors are skipped and only written to the error log; if false, an exception is thrown
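
The effect of the skipErrors flag can be illustrated with a minimal, self-contained sketch. It does not use WVTool itself: the class SkipErrorsSketch, its errorLog field, and the use of a null entry to stand for an unreadable document are all hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not part of WVTool: shows the skip-or-throw choice.
final class SkipErrorsSketch {
    private final boolean skipErrors;
    // Collected error messages (assumption: a plain in-memory log).
    final List<String> errorLog = new ArrayList<>();

    SkipErrorsSketch(boolean skipErrors) {
        this.skipErrors = skipErrors;
    }

    // Returns the number of documents processed successfully.
    // A null entry simulates a document that could not be read.
    int process(List<String> documents) {
        int processed = 0;
        for (int i = 0; i < documents.size(); i++) {
            if (documents.get(i) == null) {
                String message = "could not read document " + i;
                if (!skipErrors) {
                    throw new IllegalStateException(message);
                }
                errorLog.add(message); // log the problem and carry on
                continue;
            }
            processed++;
        }
        return processed;
    }
}
```

With skipErrors set to true, a batch run continues past unreadable documents; with false, the first failure stops processing.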

Method Detail

createWordList

public WVTWordList createWordList(WVTInputList input,
                                  WVTConfiguration config)
                           throws WVToolException
Create a word list from scratch based on the given texts.

Parameters:
input - the input list from which the word list is created
config - the underlying configuration
Returns:
a WVTWordList object
Throws:
WVToolException
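
What building a word list from scratch involves can be sketched without the library. The example below is an illustration only: the class WordListSketch is hypothetical, tokenization by splitting on non-word characters and lowercasing is an assumption (the real tool applies whatever steps the configuration defines), and the result is a simple map from word to document frequency.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-in, not part of WVTool.
final class WordListSketch {
    // Maps each distinct word to the number of texts it occurs in.
    static Map<String, Integer> documentFrequencies(List<String> texts) {
        Map<String, Integer> df = new TreeMap<>();
        for (String text : texts) {
            Set<String> seenInThisText = new HashSet<>();
            // Assumption: lowercase and split on runs of non-word characters.
            for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                // Count a word at most once per text.
                if (!token.isEmpty() && seenInThisText.add(token)) {
                    df.merge(token, 1, Integer::sum);
                }
            }
        }
        return df;
    }
}
```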


createWordList

public WVTWordList createWordList(WVTInputList input,
                                  WVTConfiguration config,
                                  java.util.List initialWords,
                                  boolean addWords)
                           throws WVToolException
Create a word list based on an existing word list.

Parameters:
input - the input list from which the word list is created
config - the underlying configuration
initialWords - initial list of words to use
addWords - whether words that appear in the texts but not in the initial list should be added to the list
Returns:
a WVTWordList object
Throws:
WVToolException
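
The interplay of initialWords and addWords can be sketched as follows. This is not the library's implementation: InitialWordsSketch is a hypothetical class, the tokenization is an assumption, and frequencies are omitted to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in, not part of WVTool.
final class InitialWordsSketch {
    // Starts from initialWords; new words from the texts are appended
    // only when addWords is true.
    static List<String> buildWordList(List<String> texts,
                                      List<String> initialWords,
                                      boolean addWords) {
        Set<String> words = new LinkedHashSet<>(initialWords); // keeps insertion order
        if (addWords) {
            for (String text : texts) {
                // Assumption: lowercase and split on runs of non-word characters.
                for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                    if (!token.isEmpty()) {
                        words.add(token);
                    }
                }
            }
        }
        return new ArrayList<>(words);
    }
}
```

Passing addWords as false keeps the vocabulary fixed, which is useful when the vectors must stay comparable with vectors created earlier from the same word list.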


createVectors

public void createVectors(WVTInputList input,
                          WVTConfiguration config,
                          int pruneMin,
                          int pruneMax)
                   throws WVToolException
Create a word list and then word vectors, both from the same input list.

Parameters:
input - the input list
config - the configuration
pruneMin - the minimum number of occurrences of a word to be considered
pruneMax - the maximum number of occurrences of a word to be considered
Throws:
WVToolException
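
Pruning by pruneMin and pruneMax can be pictured as a filter over word counts. The sketch below is an illustration under stated assumptions: PruneSketch is a hypothetical class, both bounds are treated as inclusive, and the counts are supplied as a plain map; the documentation above does not say whether the library's bounds are inclusive or which kind of count it uses.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in, not part of WVTool.
final class PruneSketch {
    // Keeps only words whose count lies in [pruneMin, pruneMax].
    // Assumption: both bounds are inclusive.
    static Map<String, Integer> prune(Map<String, Integer> counts,
                                      int pruneMin,
                                      int pruneMax) {
        Map<String, Integer> kept = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            int n = entry.getValue();
            if (n >= pruneMin && n <= pruneMax) {
                kept.put(entry.getKey(), n);
            }
        }
        return kept;
    }
}
```

A lower bound removes rare words such as misspellings, while an upper bound removes very frequent words that carry little discriminative information.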


createVectors

public void createVectors(WVTInputList input,
                          WVTConfiguration config)
                   throws WVToolException
Deprecated. Please use the method createVectors(WVTInputList input, WVTConfiguration config, int pruneMin, int pruneMax)

Create a word list and then word vectors, both from the same input list.

Parameters:
input - the input list
config - the configuration
Throws:
WVToolException


createVectors

public void createVectors(WVTInputList input,
                          WVTConfiguration config,
                          WVTWordList wordList)
                   throws WVToolException
Create word vectors from an input list.

Parameters:
input - the input list
config - the configuration
wordList - a word list (possibly containing document and class frequencies).
Throws:
WVToolException


createVector

public WVTWordVector createVector(java.lang.String text,
                                  WVTDocumentInfo d,
                                  WVTConfiguration config,
                                  WVTWordList wordList)
                           throws WVToolException
Create a single word vector.

Parameters:
text - the underlying text
d - information about the text
config - the configuration to use (only part of it is used)
wordList - the word list to use
Returns:
a WVTWordVector
Throws:
WVToolException


createVector

public WVTWordVector createVector(java.lang.String text,
                                  WVTWordList wordList)
                           throws WVToolException
Create an individual word vector from a String using TF/IDF weights and the standard configuration.

Parameters:
text - the underlying text
wordList - a wordlist (for IDF)
Returns:
a WVTWordVector
Throws:
WVToolException
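
To show why the word list is needed for IDF, here is a minimal TF/IDF sketch. It is one common textbook variant, not necessarily the formula WVTool uses: term frequency is the count divided by the number of tokens in the text, and inverse document frequency is the natural logarithm of (number of documents / document frequency). The class TfIdfSketch, its parameters, and the tokenization are hypothetical.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in, not part of WVTool.
final class TfIdfSketch {
    // Returns one weight per entry of words, in the same order.
    static double[] vector(String text,
                           List<String> words,
                           int[] documentFrequencies,
                           int numDocuments) {
        double[] v = new double[words.size()];
        int totalTokens = 0;
        // Assumption: lowercase and split on runs of non-word characters.
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            totalTokens++;
            int index = words.indexOf(token);
            if (index >= 0) {
                v[index] += 1.0; // raw term count for now
            }
        }
        for (int i = 0; i < v.length; i++) {
            if (totalTokens > 0 && documentFrequencies[i] > 0) {
                double tf = v[i] / totalTokens;
                double idf = Math.log((double) numDocuments / documentFrequencies[i]);
                v[i] = tf * idf;
            } else {
                v[i] = 0.0;
            }
        }
        return v;
    }
}
```

A word that occurs in every document gets an IDF of zero in this variant, so it contributes nothing to the vector regardless of how often it appears in the text.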


iterateWords

public void iterateWords(WVTInputList input,
                         WVTConfiguration config,
                         WVToolWordListener listener)
                  throws WVToolException
Process the specified documents using the configured steps and send all encountered words to a listener class. This method can be used to implement specialized applications that merely use the preprocessing steps of the tool instead of using the vectorization functions.

Parameters:
input - the input list
config - the configuration
listener - a callback object that is invoked for every processed document and word
Throws:
WVToolException
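
The callback pattern behind iterateWords can be sketched in isolation. The interface below is hypothetical: the method names documentStarted and word are invented for illustration and are not taken from WVToolWordListener, and the preprocessing is reduced to lowercasing and splitting on non-word characters.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical listener interface; method names are illustrative only.
interface WordListenerSketch {
    void documentStarted(int index);
    void word(String token);
}

// Hypothetical stand-in, not part of WVTool.
final class IterateWordsSketch {
    // Notifies the listener once per document, then once per word in it.
    static void iterate(List<String> texts, WordListenerSketch listener) {
        for (int i = 0; i < texts.size(); i++) {
            listener.documentStarted(i);
            // Assumption: lowercase and split on runs of non-word characters.
            for (String token : texts.get(i).toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    listener.word(token);
                }
            }
        }
    }
}
```

A listener of this kind lets an application reuse the preprocessing pipeline, for example to collect statistics or feed words into another system, without creating any vectors.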