java.lang.Object
  edu.udo.cs.yale.example.AbstractDataRowReader
    edu.udo.cs.yale.example.FileDataRowReader
public class FileDataRowReader
FileDataRowReader implements a DataRowReader that reads DataRows from a file. It is the main data reader for many file formats (including CSV) and is used by the ExampleSource operator and the attribute editor.
This class supports reading data from multiple source files: each attribute (including special attributes such as labels and weights) may be read from a different file. Note that only as many lines are read as the shortest file contains, i.e. if one of the data source files has fewer lines than the others, only that number of data rows will be read.
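The "shortest file wins" rule can be illustrated with plain java.io readers. The following is a minimal sketch, not the class's actual readLine() implementation; the class LockstepReading and its methods are hypothetical names chosen for illustration. One line is taken from every source per data row, and reading stops as soon as any source is exhausted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class LockstepReading {

    /**
     * Reads one line from every reader. Returns null as soon as any reader
     * is exhausted, so the caller stops at the length of the shortest source.
     */
    static String[] readLineFromAll(BufferedReader[] readers) throws IOException {
        String[] lines = new String[readers.length];
        for (int i = 0; i < readers.length; i++) {
            lines[i] = readers[i].readLine();
            if (lines[i] == null) {
                return null; // this source has ended: no further complete rows
            }
        }
        return lines;
    }

    /** Collects rows (one line per source) until the shortest source runs out. */
    static List<String[]> readAllRows(BufferedReader[] readers) throws IOException {
        List<String[]> rows = new ArrayList<>();
        String[] lines;
        while ((lines = readLineFromAll(readers)) != null) {
            rows.add(lines);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Attribute values in one source, labels in another with one line fewer.
        BufferedReader values = new BufferedReader(new StringReader("1.0,2.0\n3.0,4.0\n5.0,6.0\n"));
        BufferedReader labels = new BufferedReader(new StringReader("yes\nno\n"));
        List<String[]> rows = readAllRows(new BufferedReader[] { values, labels });
        System.out.println(rows.size()); // prints 2
    }
}
```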
The column separators are defined by a regular expression (see java.util.regex.Pattern in the Java API). Quoting is supported but not recommended because it increases the runtime. The user should ensure that the separator characters do not occur inside the data columns. Please refer to YaleLineReader for further information.
Unknown attribute values can be marked with empty strings or "?".
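As a rough illustration of regex-based splitting and unknown-value markers, the sketch below uses java.util.regex.Pattern directly. It assumes no quoting and no comment handling, and it represents unknown values as null; YaleLineReader's actual behavior may differ, and ColumnSplitting and splitLine are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class ColumnSplitting {

    /**
     * Splits one line into columns using a separator regular expression and
     * maps the unknown-value markers "" and "?" to null. No quote handling.
     */
    static String[] splitLine(String line, Pattern separators) {
        // A limit of -1 keeps trailing empty columns, e.g. "a,b," -> ["a", "b", ""].
        String[] columns = separators.split(line, -1);
        for (int i = 0; i < columns.length; i++) {
            if (columns[i].isEmpty() || columns[i].equals("?")) {
                columns[i] = null; // unknown value
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Comma or semicolon, with optional surrounding whitespace.
        Pattern separators = Pattern.compile("\\s*[,;]\\s*");
        System.out.println(Arrays.toString(splitLine("1.5;?, red,", separators)));
        // prints [1.5, null, red, null]
    }
}
```

Passing a limit of -1 to Pattern.split matters here: without it, a missing last value would be silently dropped instead of being recognized as unknown.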
Field Summary | |
---|---|
private Attribute[] | attributes - The attribute descriptions. |
private static int | COLUMN_NR |
private java.lang.String[][] | currentData - This array holds the current data. |
private int[][] | dataSourceIndex - Array of size [number of attributes][2]. |
private boolean | eof - Remembers whether the end of a file has been reached. |
private int[] | expectedNumberOfColumns - The number of columns each data source is expected to provide. |
private static int | FILE_NR |
private java.io.BufferedReader[] | fileReader - The file readers. |
private boolean | lineRead - Remembers whether a line has already been read. |
private int | linesRead - The number of lines read so far (i.e. the number of examples). |
private int | maxNumber - The maximum number of examples to read (sampling). |
private RandomGenerator | random - The random generator used for sampling. |
private double | sampleRatio - The sample ratio. |
private YaleLineReader | yaleLineReader - This reader maps lines read from a file to Yale columns. |
Constructor Summary | |
---|---|
FileDataRowReader(DataRowFactory factory, java.util.List<AttributeDataSource> attributeDataSources, double sampleRatio, int sampleSize, java.lang.String separatorsRegExpr, char[] commentChars, boolean useQuotes, char decimalPointCharacter, RandomGenerator random) | Constructs a new FileDataRowReader. |
FileDataRowReader(DataRowFactory factory, java.util.List<AttributeDataSource> attributeDataSources, double sampleRatio, int sampleSize, java.lang.String separatorsRegExpr, char[] commentChars, boolean useQuotes, RandomGenerator random) | Constructs a new FileDataRowReader. |
Method Summary | |
---|---|
boolean | hasNext() - Checks whether another line exists and, if so, reads it. |
private void | initReader(DataRowFactory factory, java.util.List<AttributeDataSource> attributeDataSources, int sampleSize, java.lang.String separatorsRegExpr, char[] commentChars, boolean useQuotes, char decimalPointCharacter) - Reads the complete data. |
DataRow | next() - Returns the next example as a DataRow. |
private boolean | readLine() - Reads a line of data from all file readers. |
void | skipLine() - Skips the next line, if present. |
Methods inherited from class edu.udo.cs.yale.example.AbstractDataRowReader |
---|
getFactory, remove |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private static final int FILE_NR
private static final int COLUMN_NR
private java.io.BufferedReader[] fileReader
private Attribute[] attributes
private boolean eof
private boolean lineRead
private double sampleRatio
private int maxNumber
private int linesRead
private java.lang.String[][] currentData
private int[] expectedNumberOfColumns
private YaleLineReader yaleLineReader
private RandomGenerator random
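private int[][] dataSourceIndex
Array of size [number of attributes][2]. For each attribute i, the value of dataSourceIndex[i][FILE_NR] is used as an index into fileReader, and the value of dataSourceIndex[i][COLUMN_NR] specifies the index of the column to use for attribute i.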
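To make the two-level indexing concrete, this sketch resolves one raw string per attribute from currentData, assuming currentData[f] holds the columns read from file f. The values FILE_NR = 0 and COLUMN_NR = 1 are assumptions (the real constants are private and their values are not documented), and DataSourceIndexDemo is a hypothetical name.

```java
class DataSourceIndexDemo {

    // Assumed values; the real FILE_NR and COLUMN_NR constants are private.
    static final int FILE_NR = 0;
    static final int COLUMN_NR = 1;

    /**
     * Looks up the raw value of every attribute, where currentData[f] holds
     * the columns read from file f and dataSourceIndex[i] = {file, column}.
     */
    static String[] valuesForAttributes(String[][] currentData, int[][] dataSourceIndex) {
        String[] values = new String[dataSourceIndex.length];
        for (int i = 0; i < dataSourceIndex.length; i++) {
            int file = dataSourceIndex[i][FILE_NR];
            int column = dataSourceIndex[i][COLUMN_NR];
            values[i] = currentData[file][column];
        }
        return values;
    }

    public static void main(String[] args) {
        // File 0 holds two value columns, file 1 holds the label.
        String[][] currentData = { { "1.0", "2.0" }, { "yes" } };
        // Attribute 0 <- file 0, column 1; attribute 1 <- file 0, column 0;
        // attribute 2 (e.g. the label) <- file 1, column 0.
        int[][] dataSourceIndex = { { 0, 1 }, { 0, 0 }, { 1, 0 } };
        System.out.println(String.join(" ", valuesForAttributes(currentData, dataSourceIndex)));
        // prints "2.0 1.0 yes"
    }
}
```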
Constructor Detail |
---|
public FileDataRowReader(DataRowFactory factory, java.util.List<AttributeDataSource> attributeDataSources, double sampleRatio, int sampleSize, java.lang.String separatorsRegExpr, char[] commentChars, boolean useQuotes, RandomGenerator random) throws java.io.IOException
Parameters:
factory - the factory used to create data rows
attributeDataSources - the list of AttributeDataSources
sampleRatio - the fraction of examples to read; only used if sampleSize is -1
sampleSize - limits the sample to the first sampleSize lines read from the files; -1 means no limit, in which case sampleRatio is used
separatorsRegExpr - a regular expression describing the column separators of each line
commentChars - the characters that mark the rest of a line as a comment
useQuotes - whether quotes should be parsed; this slows down reading and should be avoided if possible
random - the random generator used for sampling
Throws:
java.io.IOException
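The interplay of sampleSize and sampleRatio described above can be sketched as a per-line accept/reject decision. This sketch assumes that ratio-based sampling keeps each line independently with probability sampleRatio, which the documentation does not state, and it uses java.util.Random in place of Yale's RandomGenerator; SamplingDecision, acceptLine and countAccepted are hypothetical names.

```java
import java.util.Random;

class SamplingDecision {

    /**
     * Decides whether the next line becomes a data row.
     *
     * @param rowsSoFar   number of rows accepted so far
     * @param sampleSize  fixed sample size, or -1 for no limit
     * @param sampleRatio probability of keeping a line; only used if sampleSize is -1
     */
    static boolean acceptLine(int rowsSoFar, int sampleSize, double sampleRatio, Random random) {
        if (sampleSize != -1) {
            // Fixed size: keep the first sampleSize lines, then stop.
            return rowsSoFar < sampleSize;
        }
        // No size limit: keep each line independently with probability sampleRatio.
        return random.nextDouble() < sampleRatio;
    }

    /** Counts how many of the given number of lines would be accepted. */
    static int countAccepted(int lines, int sampleSize, double sampleRatio, Random random) {
        int rows = 0;
        for (int line = 0; line < lines; line++) {
            if (acceptLine(rows, sampleSize, sampleRatio, random)) {
                rows++;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        System.out.println(countAccepted(10, 3, 0.5, random)); // prints 3 (sampleRatio ignored)
        System.out.println(countAccepted(1000, -1, 0.25, random)); // roughly 250; depends on the seed
    }
}
```

With ratio-based sampling the number of rows is itself random; only its expected value equals sampleRatio times the number of lines.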
public FileDataRowReader(DataRowFactory factory, java.util.List<AttributeDataSource> attributeDataSources, double sampleRatio, int sampleSize, java.lang.String separatorsRegExpr, char[] commentChars, boolean useQuotes, char decimalPointCharacter, RandomGenerator random) throws java.io.IOException
Parameters:
factory - the factory used to create data rows
attributeDataSources - the list of AttributeDataSources
sampleRatio - the fraction of examples to read; only used if sampleSize is -1
sampleSize - limits the sample to the first sampleSize lines read from the files; -1 means no limit, in which case sampleRatio is used
separatorsRegExpr - a regular expression describing the column separators of each line
commentChars - the characters that mark the rest of a line as a comment
useQuotes - whether quotes should be parsed; this slows down reading and should be avoided if possible
decimalPointCharacter - the character used as the decimal point separator
random - the random generator used for sampling
Throws:
java.io.IOException
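The decimalPointCharacter parameter matters for data that writes numbers such as 3,25. A minimal sketch, assuming the reader simply normalizes the configured character to '.' before parsing (the documentation does not say how the conversion is done); DecimalPointParsing and parseNumber are hypothetical names.

```java
class DecimalPointParsing {

    /**
     * Parses a numeric column that uses the given decimal point character,
     * e.g. ',' for "3,25", by normalizing it to '.' first.
     */
    static double parseNumber(String value, char decimalPointCharacter) {
        if (decimalPointCharacter != '.') {
            value = value.replace(decimalPointCharacter, '.');
        }
        return Double.parseDouble(value);
    }

    public static void main(String[] args) {
        System.out.println(parseNumber("3,25", ',')); // prints 3.25
        System.out.println(parseNumber("-0.5", '.')); // prints -0.5
    }
}
```

This sketch does not handle grouping separators: a value such as "1.234,5" would become "1.234.5" and make Double.parseDouble throw a NumberFormatException.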
Method Detail |
---|
public void skipLine()
private void initReader(DataRowFactory factory, java.util.List<AttributeDataSource> attributeDataSources, int sampleSize, java.lang.String separatorsRegExpr, char[] commentChars, boolean useQuotes, char decimalPointCharacter) throws java.io.IOException
Throws:
java.io.IOException
private boolean readLine() throws java.io.IOException
Throws:
java.io.IOException
public boolean hasNext()
public DataRow next()
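The lineRead and eof fields suggest that hasNext() reads ahead and next() consumes the buffered line. The sketch below shows that lookahead pattern for plain strings via java.util.Iterator<String>; it is an assumption about the design rather than the actual implementation, it returns lines instead of DataRows, and LookaheadLineIterator is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

class LookaheadLineIterator implements Iterator<String> {

    private final BufferedReader reader;
    private String pending;           // line fetched by hasNext() but not yet returned
    private boolean lineRead = false; // true while a line is buffered
    private boolean eof = false;      // true once the reader is exhausted

    LookaheadLineIterator(BufferedReader reader) {
        this.reader = reader;
    }

    @Override
    public boolean hasNext() {
        if (lineRead) return true; // a line is already buffered
        if (eof) return false;
        try {
            pending = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (pending == null) {
            eof = true;
            return false;
        }
        lineRead = true;
        return true;
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        lineRead = false; // consume the buffered line
        return pending;
    }

    /** Skips the next line, if present. */
    void skipLine() {
        if (hasNext()) lineRead = false;
    }

    public static void main(String[] args) {
        LookaheadLineIterator it = new LookaheadLineIterator(
                new BufferedReader(new StringReader("header\nrow1\nrow2\n")));
        it.skipLine(); // drop the header line
        while (it.hasNext()) {
            System.out.println(it.next()); // prints "row1", then "row2"
        }
    }
}
```

Buffering the line makes repeated calls to hasNext() idempotent: calling it twice in a row does not skip a line.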