edu.udo.cs.wvtool.generic.tokenizer
Class NGramTokenizer

java.lang.Object
  extended by edu.udo.cs.wvtool.generic.tokenizer.NGramTokenizer
All Implemented Interfaces:
WVTTokenizer, TokenEnumeration

public class NGramTokenizer
extends java.lang.Object
implements WVTTokenizer, TokenEnumeration

Creates tokens by building n-grams from the tokens received from an inner tokenizer.

Version:
$Id$
Author:
Michael Wurst
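
The description above does not say whether the n-grams are built from characters or from whole tokens. The following is a minimal sketch, assuming character n-grams within each inner token and assuming that tokens no longer than n are passed through unchanged. The class NGramSketch and the method charNGrams are hypothetical names used only for illustration; they are not part of WVTool.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {

    // Hypothetical helper (not a WVTool method): splits one token into
    // overlapping character n-grams of length n.
    // Assumption: a token with at most n characters is returned as-is.
    public static List<String> charNGrams(String token, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        if (token.length() <= n) {
            grams.add(token);
            return grams;
        }
        // Slide a window of width n over the token, one character at a time.
        for (int i = 0; i + n <= token.length(); i++) {
            grams.add(token.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(charNGrams("token", 3)); // prints [tok, oke, ken]
        System.out.println(charNGrams("an", 3));    // prints [an]
    }
}
```

If the actual class builds word-level n-grams instead, the same sliding-window idea would apply to a list of tokens rather than to the characters of one token.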

Field Summary
private  java.util.List currentTokens
          The tokens that are currently buffered.
private  TokenEnumeration input
           
private  int n
           
private  WVTTokenizer tokenizer
           
 
Constructor Summary
NGramTokenizer(int n, WVTTokenizer tokenizer)
           
 
Method Summary
 boolean hasMoreTokens()
          Determine whether there are tokens left in the Enumeration.
 java.lang.String nextToken()
          Return the next token from the stream.
private  void readNextToken()
          Read a token from the character stream and store it into currentTokens.
 TokenEnumeration tokenize(java.io.Reader source, WVTDocumentInfo d)
          Tokenize a character stream.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

currentTokens

private java.util.List currentTokens
The tokens that are currently buffered. This buffer is necessary to implement the semantics of TokenEnumeration.
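
One inner token can yield several n-grams, while TokenEnumeration hands out one token per call, which is presumably why a buffer is needed. The sketch below shows one way such a buffer could work. It is an illustration under assumptions, not the WVTool source: BufferedNGramEnumeration is a hypothetical class, a java.util.Iterator stands in for the inner TokenEnumeration, and character n-grams are assumed.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;

public class BufferedNGramEnumeration {

    private final Deque<String> buffer = new ArrayDeque<>();
    private final Iterator<String> inner; // stand-in for the inner TokenEnumeration
    private final int n;

    public BufferedNGramEnumeration(int n, Iterator<String> inner) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
        this.inner = inner;
        refill();
    }

    // Pull inner tokens until the buffer holds at least one n-gram
    // or the inner source is exhausted.
    private void refill() {
        while (buffer.isEmpty() && inner.hasNext()) {
            String token = inner.next();
            if (token.length() <= n) {
                // Assumption: short tokens pass through; empty ones are skipped.
                if (!token.isEmpty()) {
                    buffer.add(token);
                }
            } else {
                for (int i = 0; i + n <= token.length(); i++) {
                    buffer.add(token.substring(i, i + n));
                }
            }
        }
    }

    public boolean hasMoreTokens() {
        return !buffer.isEmpty();
    }

    // Returns the next n-gram, or null if there are no more tokens.
    public String nextToken() {
        String result = buffer.poll();
        if (buffer.isEmpty()) {
            refill();
        }
        return result;
    }

    public static void main(String[] args) {
        BufferedNGramEnumeration e =
                new BufferedNGramEnumeration(3, Arrays.asList("abcd", "xy").iterator());
        while (e.hasMoreTokens()) {
            System.out.println(e.nextToken());
        }
    }
}
```

Refilling eagerly, right after the buffer runs empty, keeps hasMoreTokens() a simple check on the buffer, so it never has to read from the inner source itself.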


n

private int n

input

private TokenEnumeration input

tokenizer

private WVTTokenizer tokenizer
Constructor Detail

NGramTokenizer

public NGramTokenizer(int n,
                      WVTTokenizer tokenizer)
Method Detail

tokenize

public TokenEnumeration tokenize(java.io.Reader source,
                                 WVTDocumentInfo d)
                          throws WVToolException
Description copied from interface: WVTTokenizer
Tokenize a character stream.

Specified by:
tokenize in interface WVTTokenizer
Parameters:
source - the Reader from which to get the character stream
d - the WVTDocumentInfo value, describing the document being processed
Returns:
a TokenEnumeration
Throws:
WVToolException
See Also:
WVTTokenizer.tokenize(Reader, WVTDocumentInfo)


readNextToken

private void readNextToken()
                    throws WVToolException
Read a token from the character stream and store it into currentTokens. If there are no more tokens left, store a null value.

Throws:
WVToolException


hasMoreTokens

public boolean hasMoreTokens()
Description copied from interface: TokenEnumeration
Determine whether there are tokens left in the Enumeration. If an error occurs, false is returned.

Specified by:
hasMoreTokens in interface TokenEnumeration
Returns:
true if there are tokens left; false if there are none or if an error occurs
See Also:
TokenEnumeration.hasMoreTokens()


nextToken

public java.lang.String nextToken()
                           throws WVToolException
Description copied from interface: TokenEnumeration
Return the next token from the stream.

Specified by:
nextToken in interface TokenEnumeration
Returns:
a String value, or null if there are no more tokens
Throws:
WVToolException
See Also:
TokenEnumeration.nextToken()