edu.udo.cs.wvtool.generic.tokenizer
Class SimpleTokenizer

java.lang.Object
  extended by edu.udo.cs.wvtool.generic.tokenizer.SimpleTokenizer
All Implemented Interfaces:
WVTTokenizer, TokenEnumeration

public class SimpleTokenizer
extends java.lang.Object
implements WVTTokenizer, TokenEnumeration

This class implements a simple tokenizer. All characters for which Character.isLetter() returns false are treated as separators and removed.
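
The separator rule described above can be illustrated with a minimal, standalone sketch. It is not the WVTool source: the class name LetterSplitSketch and the method split are hypothetical, and the sketch works on a String rather than a Reader to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of WVTool.
class LetterSplitSketch {

    // Collects maximal runs of letters. Every character for which
    // Character.isLetter() returns false ends the current run and is dropped.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        // Flush a token that runs up to the end of the input.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Digits, punctuation and whitespace all act as separators.
        System.out.println(split("Hello, world! 42abc x-ray"));
        // prints [Hello, world, abc, x, ray]
    }
}
```

Under this rule digits and hyphens split words, so "42abc" yields "abc" and "x-ray" yields two tokens.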

Version:
$Id: SimpleTokenizer.java,v 1.2 2006/06/06 11:45:24 mjwurst Exp $
Author:
Michael Wurst

Field Summary
private  java.lang.String currentToken
          The token currently provided.
private  java.io.Reader input
          The underlying character stream of the currently tokenized document
 
Constructor Summary
SimpleTokenizer()
           
 
Method Summary
 boolean hasMoreTokens()
          Determine whether there are tokens left in the Enumeration.
 java.lang.String nextToken()
          Return the next token from the stream.
private  void readNextToken()
          Read a token from the character stream and store it into currentToken.
 TokenEnumeration tokenize(java.io.Reader source, WVTDocumentInfo d)
          Tokenize a character stream.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

input

private java.io.Reader input
The underlying character stream of the currently tokenized document


currentToken

private java.lang.String currentToken
The token currently provided. This buffer is necessary to implement the semantics of TokenEnumeration.

Constructor Detail

SimpleTokenizer

public SimpleTokenizer()
Method Detail

tokenize

public TokenEnumeration tokenize(java.io.Reader source,
                                 WVTDocumentInfo d)
Description copied from interface: WVTTokenizer
Tokenize a character stream.

Specified by:
tokenize in interface WVTTokenizer
Parameters:
source - the Reader from which to get the character stream
d - the WVTDocumentInfo value, describing the document being processed
Returns:
a TokenEnumeration
See Also:
WVTTokenizer.tokenize(Reader, WVTDocumentInfo)


readNextToken

private void readNextToken()
Read a token from the character stream and store it in currentToken. If there are no more tokens left, store a null value.
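
The one-token lookahead that this method and the currentToken field describe could look roughly like the sketch below. It is an assumption-laden illustration, not the actual implementation: the class name LookaheadTokenizerSketch is hypothetical, the Reader is passed to a constructor instead of tokenize(), and a read error is simply treated as end of input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical illustration class; not part of WVTool.
class LookaheadTokenizerSketch {

    private final Reader input;
    private String currentToken; // null once the stream is exhausted

    LookaheadTokenizerSketch(Reader input) {
        this.input = input;
        readNextToken(); // pre-load the first token
    }

    // Reads the next run of letters into currentToken, or stores null
    // if the stream holds no further letters.
    private void readNextToken() {
        StringBuilder buf = new StringBuilder();
        try {
            int ch;
            while ((ch = input.read()) != -1) {
                if (Character.isLetter((char) ch)) {
                    buf.append((char) ch);
                } else if (buf.length() > 0) {
                    break; // a separator ends the current token
                }
            }
        } catch (IOException e) {
            // Assumption: a read error is treated like end of input.
            buf.setLength(0);
        }
        currentToken = buf.length() > 0 ? buf.toString() : null;
    }

    boolean hasMoreTokens() {
        return currentToken != null;
    }

    // Returns the buffered token and advances the lookahead.
    String nextToken() {
        String result = currentToken;
        if (result != null) {
            readNextToken();
        }
        return result;
    }

    public static void main(String[] args) {
        LookaheadTokenizerSketch t =
                new LookaheadTokenizerSketch(new StringReader("one, two; 3 three"));
        while (t.hasMoreTokens()) {
            System.out.println(t.nextToken()); // one, two, three (one per line)
        }
    }
}
```

Buffering one token ahead lets hasMoreTokens() answer without consuming input, which is why the field is needed even though the stream itself is read sequentially.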


hasMoreTokens

public boolean hasMoreTokens()
Description copied from interface: TokenEnumeration
Determine whether there are tokens left in the Enumeration. If an error occurs, false is returned.

Specified by:
hasMoreTokens in interface TokenEnumeration
Returns:
true if there are tokens left; false if there are none or an error occurred
See Also:
TokenEnumeration.hasMoreTokens()


nextToken

public java.lang.String nextToken()
Description copied from interface: TokenEnumeration
Return the next token from the stream.

Specified by:
nextToken in interface TokenEnumeration
Returns:
a String value, or null if there are no more tokens
See Also:
TokenEnumeration.nextToken()