edu.udo.cs.miningmart.compiler.utils
Class DrawSample
java.lang.Object
edu.udo.cs.miningmart.compiler.utils.Sampling
edu.udo.cs.miningmart.compiler.utils.DrawSample
- public class DrawSample
- extends edu.udo.cs.miningmart.compiler.utils.Sampling
This class encapsulates the frequently occurring task of sampling data
from a database table. It can, for instance, be instantiated by
operators that work on samples. Different constructors let the caller
leave the calculation of unknown values to this class, so it may,
for instance, be invoked with either a sample size or a sample
ratio.
- Version:
- $Id: DrawSample.java,v 1.3 2006/04/11 14:10:18 euler Exp $
- Author:
- Martin Scholz
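The class description says a sample can be requested either by size or by ratio, with the missing value calculated internally. The following is a minimal sketch of that relationship only, not the class's actual code. The class name SampleParametersSketch and both method names are hypothetical, and the clamping and rounding choices are assumptions.

```java
/** Hypothetical helper showing how sample size and sample ratio relate. */
final class SampleParametersSketch {

    /** Ratio implied by a requested sample size, clamped to [0, 1] (assumption). */
    static double ratioFor(long sampleSize, long rowCount) {
        if (rowCount <= 0) {
            return 0.0; // nothing to sample from
        }
        double ratio = (double) sampleSize / rowCount;
        return Math.max(0.0, Math.min(1.0, ratio));
    }

    /** Expected number of sampled rows implied by a ratio, rounded to the nearest row. */
    static long expectedSizeFor(double ratio, long rowCount) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be in [0, 1]");
        }
        return Math.round(ratio * rowCount);
    }

    public static void main(String[] args) {
        System.out.println(ratioFor(250, 1000));
        System.out.println(expectedSizeFor(0.25, 1000));
    }
}
```

A requested size larger than the table is mapped to a ratio of 1 in this sketch; how DrawSample itself treats that case is not stated in this documentation.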
Constructor Summary
DrawSample(Columnset sourceCs,
    java.util.Collection selectedColumns,
    java.lang.String destTable,
    java.lang.String tempTable,
    java.lang.Long rowcount,
    long sampleSize,
    java.lang.Long seed,
    CompilerDatabaseService db)
DrawSample(Columnset sourceCs,
    java.lang.String destTable,
    java.lang.String tempTable,
    double ratio,
    CompilerDatabaseService db)
- Default version of the constructor:
  Random numbers are not fixed by specifying the random seed.
DrawSample(Columnset sourceCs,
    java.lang.String destTable,
    java.lang.String tempTable,
    java.lang.Long rowcount,
    double ratio,
    java.lang.Long seed,
    CompilerDatabaseService db)
DrawSample(Columnset sourceCs,
    java.lang.String destTable,
    java.lang.String tempTable,
    java.lang.Long rowcount,
    long sampleSize,
    java.lang.Long seed,
    CompilerDatabaseService db)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
COMMIT_LIMIT
public static final long COMMIT_LIMIT
- See Also:
- Constant Field Values
materializedInput
protected boolean materializedInput
usingPostgres
protected boolean usingPostgres
usingOracle
protected boolean usingOracle
allRowIds
protected java.util.Vector allRowIds
thePrimaryKey
protected java.lang.String[] thePrimaryKey
numericDatatypeName
protected java.lang.String numericDatatypeName
rowIdentifierName
protected java.lang.String rowIdentifierName
DrawSample
public DrawSample(Columnset sourceCs,
java.lang.String destTable,
java.lang.String tempTable,
double ratio,
CompilerDatabaseService db)
throws M4CompilerError
- Default version of the constructor:
- Random numbers are not fixed by specifying the random seed.
- The number of rows is not known in advance.
- A ratio is given rather than a sample size.
- Parameters:
sourceCs - the source Columnset to draw a sample from
destTable - the name of the output table
tempTable - the name of the temporary table used by this class
ratio - the sample ratio, a value in [0, 1]
db - a reference to the thread's edu.udo.cs.miningmart.m4.core.utils.DB object
- Throws:
M4CompilerError - if the sampling fails.
DrawSample
public DrawSample(Columnset sourceCs,
java.lang.String destTable,
java.lang.String tempTable,
java.lang.Long rowcount,
double ratio,
java.lang.Long seed,
CompilerDatabaseService db)
throws M4CompilerError
- Parameters:
sourceCs - the source Columnset to draw a sample from
destTable - the name of the output table
tempTable - the name of the temporary table used by this class
rowcount - the number of rows in the source Columnset, or null if this value is not known in advance. In the latter case the number of rows is calculated by the class.
ratio - the sample ratio, a value in [0, 1]
seed - the random seed to be used, or null to use a "random" random seed
db - a reference to the thread's edu.udo.cs.miningmart.m4.core.utils.DB object
- Throws:
M4CompilerError - if the sampling fails.
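The seed parameter is either a fixed value or null for a "random" random seed. The following sketch shows one way such a nullable seed can be mapped onto java.util.Random. The class name SeedSketch and the method newGenerator are hypothetical, and whether DrawSample uses java.util.Random internally is an assumption.

```java
import java.util.Random;

/** Hypothetical illustration of the "seed or null" convention. */
final class SeedSketch {

    /** Returns a reproducible generator for a given seed, or a freshly seeded one for null. */
    static Random newGenerator(Long seed) {
        return (seed == null) ? new Random() : new Random(seed.longValue());
    }

    public static void main(String[] args) {
        Random first = newGenerator(42L);
        Random second = newGenerator(42L);
        // Two generators built from the same seed yield the same sequence.
        System.out.println(first.nextDouble() == second.nextDouble());
    }
}
```

Passing a fixed seed makes a sampling run repeatable, which helps when comparing results across experiments. Passing null gives a different sample on each run.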
DrawSample
public DrawSample(Columnset sourceCs,
java.lang.String destTable,
java.lang.String tempTable,
java.lang.Long rowcount,
long sampleSize,
java.lang.Long seed,
CompilerDatabaseService db)
throws M4CompilerError
- Parameters:
sourceCs - the source Columnset to draw a sample from
destTable - the name of the output table
tempTable - the name of the temporary table used by this class
sampleSize - the approximate number of tuples the sample will contain
rowcount - the number of rows in the source Columnset, or null if this value is not known in advance. In the latter case the number of rows is calculated by the class.
seed - the random seed to be used, or null to use a "random" random seed
db - a reference to the thread's edu.udo.cs.miningmart.m4.core.utils.DB object
- Throws:
M4CompilerError - if the sampling fails.
DrawSample
public DrawSample(Columnset sourceCs,
java.util.Collection selectedColumns,
java.lang.String destTable,
java.lang.String tempTable,
java.lang.Long rowcount,
long sampleSize,
java.lang.Long seed,
CompilerDatabaseService db)
throws M4CompilerError
- Parameters:
sourceCs - the source Columnset to draw a sample from
selectedColumns - a Collection with column names in upper case letters. Specifies the subset of columns of the source columnset to be contained in the sample table. null selects all columns.
destTable - the name of the output table
tempTable - the name of the temporary table used by this class
sampleSize - the approximate number of tuples the sample will contain
rowcount - the number of rows in the source Columnset, or null if this value is not known in advance. In the latter case the number of rows is calculated by the class.
seed - the random seed to be used, or null to use a "random" random seed
db - a reference to the thread's edu.udo.cs.miningmart.m4.core.utils.DB object
- Throws:
M4CompilerError - if the sampling fails.
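The selectedColumns parameter restricts the sample table to a subset of columns, with null meaning all columns. A minimal sketch of that rule follows. The class ColumnSelectionSketch and its method are hypothetical, and comparing source column names in upper case is an assumption derived from the parameter description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration of the "null means all columns" rule. */
final class ColumnSelectionSketch {

    /** Returns the source columns to keep, preserving their original order. */
    static List<String> columnsToKeep(List<String> sourceColumns, Collection<String> selectedColumns) {
        if (selectedColumns == null) {
            return new ArrayList<>(sourceColumns); // null selects every column
        }
        List<String> kept = new ArrayList<>();
        for (String column : sourceColumns) {
            // The selection is expected to hold upper-case names.
            if (selectedColumns.contains(column.toUpperCase(Locale.ROOT))) {
                kept.add(column);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("id", "Name", "age");
        System.out.println(columnsToKeep(source, new HashSet<>(Arrays.asList("ID", "AGE"))));
        System.out.println(columnsToKeep(source, null));
    }
}
```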
getDestTableName
public java.lang.String getDestTableName()
- Returns:
- name of the destination table
getNextBoolean
public boolean getNextBoolean()
- Returns:
- a random boolean value. The probability of receiving true is equal to the variable ratio specified in the constructor.
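A boolean that is true with probability ratio can be produced by comparing a uniform random number with the ratio. The sketch below illustrates that technique and why the resulting sample size is only approximate. The class BernoulliSketch is hypothetical, and this is not necessarily how getNextBoolean is implemented.

```java
import java.util.Random;

/** Hypothetical per-row coin flip: each row is kept with probability ratio. */
final class BernoulliSketch {
    private final Random random;
    private final double ratio;

    BernoulliSketch(double ratio, long seed) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be in [0, 1]");
        }
        this.ratio = ratio;
        this.random = new Random(seed);
    }

    /** nextDouble() is uniform in [0, 1), so this is true with probability ratio. */
    boolean nextBoolean() {
        return random.nextDouble() < ratio;
    }

    /** Counts how many of rowCount simulated rows would be selected. */
    long countSelected(long rowCount) {
        long selected = 0;
        for (long i = 0; i < rowCount; i++) {
            if (nextBoolean()) {
                selected++;
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // The count is close to, but usually not exactly, ratio * rowCount.
        System.out.println(new BernoulliSketch(0.25, 7L).countSelected(100_000));
    }
}
```

Because each row is decided independently, the number of selected rows varies around ratio times the row count, which is consistent with the constructors describing sampleSize as approximate.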
getSourceAttributes
public java.lang.String getSourceAttributes()
throws M4CompilerError
- Returns:
- a comma-separated list of the source columnset's attribute names "registered" as M4 columns.
- Throws:
M4CompilerError
getSourceAttributeDefinitions
public java.lang.String getSourceAttributeDefinitions()
throws M4CompilerError
- Returns:
- the list of attributes "registered" as columns, in the format necessary for an SQL SELECT statement. If columns are "virtual", then the SQL definition followed by the name is returned.
- Throws:
M4CompilerError
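The return value described above mixes plain column names with "definition followed by name" entries for virtual columns. A sketch of how such a SELECT list could be assembled follows. The classes AttributeDefinitionSketch and Col are hypothetical, and the exact output format, including the parentheses and the AS keyword, is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical builder for a SELECT list containing virtual columns. */
final class AttributeDefinitionSketch {

    /** Stand-in for a column; sqlDefinition is null for non-virtual columns. */
    static final class Col {
        final String name;
        final String sqlDefinition;

        Col(String name, String sqlDefinition) {
            this.name = name;
            this.sqlDefinition = sqlDefinition;
        }
    }

    /** Joins the columns into a comma-separated SELECT list. */
    static String selectList(List<Col> columns) {
        List<String> parts = new ArrayList<>();
        for (Col column : columns) {
            if (column.sqlDefinition == null) {
                parts.add(column.name);
            } else {
                // Virtual column: emit its SQL definition followed by its name.
                parts.add("(" + column.sqlDefinition + ") AS " + column.name);
            }
        }
        return String.join(", ", parts);
    }

    public static void main(String[] args) {
        List<Col> columns = Arrays.asList(
                new Col("ID", null),
                new Col("ANNUAL_SALARY", "SALARY * 12"));
        System.out.println("SELECT " + selectList(columns) + " FROM SOURCE_TABLE");
    }
}
```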
getSourceTableName
public java.lang.String getSourceTableName()
- Returns:
- name of the source columnset
getSourceTableColumns
public java.util.Collection getSourceTableColumns()
throws M4CompilerError
- Returns:
- collection of columns of the source columnset
- Throws:
M4CompilerError
getTempTableName
public java.lang.String getTempTableName()
- Returns:
- name of the temporary table to use
getRowCount
public long getRowCount()
- Returns:
- number of rows of the source columnset
getNextRandomDouble
public double getNextRandomDouble()
- Returns:
- a new uniformly distributed random double
dbWrite
protected void dbWrite(java.lang.String sql)
throws java.sql.SQLException,
DbConnectionClosed
- Parameters:
sql - an SQL string to be executed in the business database.
- Throws:
java.sql.SQLException - if the database operations fail.
DbConnectionClosed - if the database connection has been closed after a request to stop the thread.
deleteTable
protected void deleteTable(java.lang.String tableName)
throws M4CompilerError
- Helper method to delete a table and to ignore a possible
"table does not exist" exception.
- Throws:
M4CompilerError
- See Also:
edu.udo.cs.miningmart.m4.core.utils.DB#dropBusinessTable(String)
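The helper described above drops a table and tolerates the case where the table is absent. The sketch below shows the general pattern with a hypothetical SqlRunner interface standing in for the database service. It swallows every SQLException for brevity, which is broader than ignoring only "table does not exist" and is an assumption of this sketch.

```java
import java.sql.SQLException;

/** Hypothetical "drop and ignore failure" helper. */
final class DropIfExistsSketch {

    /** Stand-in for whatever executes SQL against the business database. */
    interface SqlRunner {
        void execute(String sql) throws SQLException;
    }

    /** Returns true if the table was dropped, false if the drop failed. */
    static boolean dropQuietly(SqlRunner runner, String tableName) {
        try {
            runner.execute("DROP TABLE " + tableName);
            return true;
        } catch (SQLException e) {
            // Simplification: a stricter version would inspect the
            // vendor-specific error code before ignoring the failure.
            return false;
        }
    }

    public static void main(String[] args) {
        SqlRunner missingTable = sql -> {
            throw new SQLException("table does not exist");
        };
        System.out.println(dropQuietly(missingTable, "TMP_SAMPLE"));
    }
}
```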
commit
protected void commit()
throws DbConnectionClosed,
java.sql.SQLException
- Throws:
DbConnectionClosed
java.sql.SQLException
Copyright © 2001-2005