java.lang.Object
  edu.udo.cs.wvtool.wordlist.WVTWordList
public class WVTWordList
This class represents a word list. It is used to store information about individual words, to count words and to calculate the vectors.
Field Summary | |
---|---|
private boolean |
appendWords
Indicates whether missing words should be added to the list |
private int |
numClasses
the number of possible class values |
private int |
numDocuments
the number of documents processed so far |
private int |
numLocalTerms
the number of terms processed in the current document so far |
private boolean |
updateOnlyCurrent
Indicates whether the document and class frequencies should be updated as well, or only the frequencies for the current document |
private java.util.List |
wordList
A sequential indexing structure, to ensure a fixed order of all words in the list |
private java.util.Map |
wordMap
A hash map used to find words efficiently |
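The pairing of `wordMap` and `wordList` described above can be illustrated with a minimal sketch. It is not the WVTool source: the class `WordIndexSketch` and its `indexOf` method are hypothetical names introduced here, and the behaviour of `appendWords` (unknown words are appended when it is true, rejected with -1 otherwise) is an assumption based only on the field descriptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the WVTool source.
final class WordIndexSketch {
    // Hash lookup: word -> position in wordList.
    private final Map<String, Integer> wordMap = new HashMap<>();
    // Sequential structure: position -> word, so the word order stays fixed.
    private final List<String> wordList = new ArrayList<>();
    // Whether unknown words are appended to the list (assumed semantics).
    private boolean appendWords = true;

    void setAppendWords(boolean appendWords) {
        this.appendWords = appendWords;
    }

    // Returns the index of the word, appending it if allowed;
    // returns -1 if the word is unknown and appending is disabled.
    int indexOf(String word) {
        Integer index = wordMap.get(word);
        if (index != null) {
            return index;
        }
        if (!appendWords) {
            return -1;
        }
        wordList.add(word);
        int newIndex = wordList.size() - 1;
        wordMap.put(word, newIndex);
        return newIndex;
    }

    String getWord(int index) {
        return wordList.get(index);
    }

    int getNumWords() {
        return wordList.size();
    }
}
```

Keeping both structures trades some memory for constant-time lookup by word and by index, which is presumably why the class holds a map and a list side by side.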
Constructor Summary | |
---|---|
WVTWordList(int numClasses)
Create a new instance of WVTWordList. |
|
WVTWordList(java.util.List words,
int numClasses)
|
|
WVTWordList(java.io.Reader in)
Create a new instance of WVTWordList by reading it from a stream. |
Method Summary | |
---|---|
void |
addWordOccurance(java.lang.String word)
Count an occurrence of the given word. |
void |
closeDocument(WVTDocumentInfo d)
Used to reset the calculation for individual documents after the given document has been processed. |
int[] |
getClassFrequencies(int classValue)
Get the document frequencies of documents having a given class value. |
int[] |
getDocumentFrequencies()
Get the document frequencies. |
int[] |
getFrequenciesForCurrentDocument()
Get the word frequencies for the document that is currently being processed. |
int |
getFrequencyByRank(int p)
Returns the document frequency of the word that is on the p-th rank, assuming that each word occupies exactly one rank. |
int |
getNumDocuments()
Returns the numDocuments. |
int |
getNumWords()
Return the number of words in the list. |
int |
getTermCountForCurrentDocument()
|
java.lang.String |
getWord(int index)
Returns the word with the given index. |
boolean |
isAppendWords()
Returns the appendWords. |
boolean |
isUpdateOnlyCurrent()
Returns the updateOnlyCurrent. |
void |
pruneByFrequency(int min,
int max)
Prune the word list by document frequencies. |
void |
setAppendWords(boolean appendWords)
Sets the appendWords. |
void |
setUpdateOnlyCurrent(boolean updateOnlyCurrent)
Sets the updateOnlyCurrent. |
void |
store(java.io.Writer out)
Write the wordlist to a stream. |
void |
storePlain(java.io.Writer out)
Write the wordlist to a stream without any additional info. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private java.util.Map wordMap
private java.util.List wordList
private int numClasses
private boolean appendWords
private boolean updateOnlyCurrent
private int numDocuments
private int numLocalTerms
Constructor Detail |
---|
public WVTWordList(int numClasses)
numClasses
- the number of possible class values
public WVTWordList(java.util.List words, int numClasses)
public WVTWordList(java.io.Reader in)
in
- the stream from which to read the information
Method Detail |
---|
public void addWordOccurance(java.lang.String word)
word
- the word
public void closeDocument(WVTDocumentInfo d)
d
- information about the document
public int[] getFrequenciesForCurrentDocument()
public int getTermCountForCurrentDocument()
public int[] getDocumentFrequencies()
public int[] getClassFrequencies(int classValue)
classValue
- the class value
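The counting cycle formed by the methods above (count occurrences, read the per-document frequencies, then close the document) can be sketched as follows. This is an assumption-laden illustration, not the library code: `CountingSketch` is a hypothetical class, its `closeDocument()` takes no `WVTDocumentInfo` argument, and class frequencies and the `updateOnlyCurrent` flag are left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the WVTool source.
final class CountingSketch {
    private final Map<String, Integer> wordMap = new HashMap<>();
    // One counter pair per word: [0] = frequency in current document, [1] = document frequency.
    private final List<int[]> counters = new ArrayList<>();
    private int numDocuments = 0;
    private int numLocalTerms = 0;

    // Counts one occurrence of the word in the current document,
    // adding the word to the list if it is not yet known.
    void addWordOccurance(String word) {
        Integer index = wordMap.get(word);
        if (index == null) {
            index = counters.size();
            wordMap.put(word, index);
            counters.add(new int[2]);
        }
        counters.get(index)[0]++;
        numLocalTerms++;
    }

    // Folds the current document into the document frequencies,
    // then resets all per-document state.
    void closeDocument() {
        for (int[] c : counters) {
            if (c[0] > 0) {
                c[1]++;
            }
            c[0] = 0;
        }
        numLocalTerms = 0;
        numDocuments++;
    }

    int[] getFrequenciesForCurrentDocument() {
        return column(0);
    }

    int[] getDocumentFrequencies() {
        return column(1);
    }

    int getTermCountForCurrentDocument() {
        return numLocalTerms;
    }

    int getNumDocuments() {
        return numDocuments;
    }

    // Copies one counter column into an array ordered by word index.
    private int[] column(int which) {
        int[] result = new int[counters.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = counters.get(i)[which];
        }
        return result;
    }
}
```

In this sketch a word contributes at most 1 to its document frequency per document, however often it occurs, which is the usual meaning of document frequency.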
public void store(java.io.Writer out)
out
- the stream to which to write the word list
public void storePlain(java.io.Writer out)
out
- the stream to which to write the word list
public boolean isAppendWords()
public boolean isUpdateOnlyCurrent()
public void setAppendWords(boolean appendWords)
appendWords
- The appendWords to set
public void setUpdateOnlyCurrent(boolean updateOnlyCurrent)
updateOnlyCurrent
- The updateOnlyCurrent to set
public int getNumDocuments()
public int getNumWords()
public void pruneByFrequency(int min, int max)
min
- minimal frequency; all words with a lower frequency will be deleted
max
- maximal frequency; all words with a higher frequency will be deleted
public int getFrequencyByRank(int p)
p
- the rank of the word, starting with 1 for the first rank
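The pruning and rank lookup described above can be illustrated with a small sketch. It is a hypothetical stand-in, not the WVTool implementation: `PruneSketch` works on a plain map from word to document frequency, and it assumes that rank 1 denotes the most frequent word and that both bounds are inclusive, neither of which this page states explicitly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the WVTool source.
final class PruneSketch {
    // Keeps only the words whose document frequency lies in [min, max].
    // Inclusive bounds are an assumption.
    static Map<String, Integer> pruneByFrequency(
            Map<String, Integer> documentFrequencies, int min, int max) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : documentFrequencies.entrySet()) {
            int df = e.getValue();
            if (df >= min && df <= max) {
                kept.put(e.getKey(), df);
            }
        }
        return kept;
    }

    // Returns the document frequency at 1-based rank p.
    // Assumes rank 1 is the most frequent word. Tied words still occupy
    // separate consecutive ranks, so each word has exactly one rank.
    static int getFrequencyByRank(Map<String, Integer> documentFrequencies, int p) {
        List<Integer> sorted = new ArrayList<>(documentFrequencies.values());
        sorted.sort(Collections.reverseOrder());
        return sorted.get(p - 1);
    }
}
```

A rank lookup of this kind is handy for choosing pruning thresholds, for example by reading off the frequency at a given rank and passing it as `max`.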
public java.lang.String getWord(int index)
index
- the index of the word