edu.udo.cs.miningmart.compiler.utils
Class DrawSample

java.lang.Object
  extended by edu.udo.cs.miningmart.compiler.utils.Sampling
      extended by edu.udo.cs.miningmart.compiler.utils.DrawSample

public class DrawSample
extends edu.udo.cs.miningmart.compiler.utils.Sampling

This class encapsulates the frequently occurring task of sampling data from a database table. It can, for instance, be instantiated by operators that work on samples. The different constructors let the caller leave the calculation of unknown values to this class, so it can be invoked with either a sample size or a sample ratio.
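
The relationship between a sample size and a sample ratio can be illustrated with a small, self-contained sketch. It is not taken from the MiningMart sources: the class name SampleRatioSketch and both methods are hypothetical, and the clamping to [0, 1] is an assumption based on the documented range of the ratio parameter.

```java
/** Hypothetical illustration; not part of the MiningMart code base. */
final class SampleRatioSketch {

    /**
     * Derives a sample ratio from a desired sample size and a known row count.
     * The result is clamped to [0, 1] (assumption: a request larger than the
     * table simply selects every row).
     */
    static double ratioFor(long sampleSize, long rowCount) {
        if (rowCount <= 0) {
            return 0.0; // empty source: nothing to sample
        }
        double ratio = (double) sampleSize / rowCount;
        return Math.max(0.0, Math.min(1.0, ratio));
    }

    /** Expected number of sampled rows when each row is kept with probability ratio. */
    static double expectedSampleSize(long rowCount, double ratio) {
        return rowCount * ratio;
    }

    public static void main(String[] args) {
        double ratio = ratioFor(100, 1000);
        System.out.println("ratio = " + ratio);
        System.out.println("expected rows = " + expectedSampleSize(1000, ratio));
    }
}
```

If rows are kept independently with this probability, the resulting sample only approximately reaches the requested size, which would be consistent with the wording of the sampleSize parameter in the constructor details.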

Version:
$Id: DrawSample.java,v 1.3 2006/04/11 14:10:18 euler Exp $
Author:
Martin Scholz

Field Summary
protected  java.util.Vector allRowIds
           
static long COMMIT_LIMIT
           
protected  boolean materializedInput
           
protected  java.lang.String numericDatatypeName
           
protected  java.lang.String rowIdentifierName
           
protected  java.lang.String[] thePrimaryKey
           
protected  boolean usingOracle
           
protected  boolean usingPostgres
           
 
Constructor Summary
DrawSample(Columnset sourceCs, java.util.Collection selectedColumns, java.lang.String destTable, java.lang.String tempTable, java.lang.Long rowcount, long sampleSize, java.lang.Long seed, CompilerDatabaseService db)
           
DrawSample(Columnset sourceCs, java.lang.String destTable, java.lang.String tempTable, double ratio, CompilerDatabaseService db)
          Default version of the constructor: no random seed is specified, so the random numbers are not fixed.
DrawSample(Columnset sourceCs, java.lang.String destTable, java.lang.String tempTable, java.lang.Long rowcount, double ratio, java.lang.Long seed, CompilerDatabaseService db)
           
DrawSample(Columnset sourceCs, java.lang.String destTable, java.lang.String tempTable, java.lang.Long rowcount, long sampleSize, java.lang.Long seed, CompilerDatabaseService db)
           
 
Method Summary
protected  void commit()
           
protected  void dbWrite(java.lang.String sql)
           
protected  void deleteTable(java.lang.String tableName)
          Helper method to delete a table and to ignore a possible "table does not exist" exception.
 java.lang.String getDestTableName()
           
 boolean getNextBoolean()
           
 double getNextRandomDouble()
           
 long getRowCount()
           
 java.lang.String getSourceAttributeDefinitions()
           
 java.lang.String getSourceAttributes()
           
 java.util.Collection getSourceTableColumns()
           
 java.lang.String getSourceTableName()
           
 java.lang.String getTempTableName()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

COMMIT_LIMIT

public static final long COMMIT_LIMIT
See Also:
Constant Field Values

materializedInput

protected boolean materializedInput

usingPostgres

protected boolean usingPostgres

usingOracle

protected boolean usingOracle

allRowIds

protected java.util.Vector allRowIds

thePrimaryKey

protected java.lang.String[] thePrimaryKey

numericDatatypeName

protected java.lang.String numericDatatypeName

rowIdentifierName

protected java.lang.String rowIdentifierName
Constructor Detail

DrawSample

public DrawSample(Columnset sourceCs,
                  java.lang.String destTable,
                  java.lang.String tempTable,
                  double ratio,
                  CompilerDatabaseService db)
           throws M4CompilerError
Default version of the constructor: no random seed is specified, so the random numbers are not fixed.

Parameters:
sourceCs - the source Columnset to draw a sample from
destTable - the name of the output table
tempTable - the name of the temporary table used by this class
ratio - the sample ratio, a value in [0, 1].
db - a reference to the thread's edu.udo.cs.miningmart.m4.core.utils.DB object.
Throws:
M4CompilerError - if the sampling fails.

DrawSample

public DrawSample(Columnset sourceCs,
                  java.lang.String destTable,
                  java.lang.String tempTable,
                  java.lang.Long rowcount,
                  double ratio,
                  java.lang.Long seed,
                  CompilerDatabaseService db)
           throws M4CompilerError
Parameters:
sourceCs - the source Columnset to draw a sample from
destTable - the name of the output table
tempTable - the name of the temporary table used by this class
rowcount - the number of rows in the source Columnset, or null if this value is not known in advance, in which case the class calculates the number of rows itself.
ratio - the sample ratio, a value in [0, 1].
seed - the random seed to be used or null to use a "random" random seed.
db - a reference to the thread's edu.udo.cs.miningmart.m4.core.utils.DB object.
Throws:
M4CompilerError - if the sampling fails.

DrawSample

public DrawSample(Columnset sourceCs,
                  java.lang.String destTable,
                  java.lang.String tempTable,
                  java.lang.Long rowcount,
                  long sampleSize,
                  java.lang.Long seed,
                  CompilerDatabaseService db)
           throws M4CompilerError
Parameters:
sourceCs - the source Columnset to draw a sample from
destTable - the name of the output table
tempTable - the name of the temporary table used by this class
rowcount - the number of rows in the source Columnset, or null if this value is not known in advance, in which case the class calculates the number of rows itself.
sampleSize - the approximate number of tuples the sample will contain
seed - the random seed to be used or null to use a "random" random seed.
db - a reference to the thread's edu.udo.cs.miningmart.m4.core.utils.DB object.
Throws:
M4CompilerError - if the sampling fails.

DrawSample

public DrawSample(Columnset sourceCs,
                  java.util.Collection selectedColumns,
                  java.lang.String destTable,
                  java.lang.String tempTable,
                  java.lang.Long rowcount,
                  long sampleSize,
                  java.lang.Long seed,
                  CompilerDatabaseService db)
           throws M4CompilerError
Parameters:
sourceCs - the source Columnset to draw a sample from
selectedColumns - a Collection of column names in upper case letters. Specifies the subset of columns of the source columnset to be contained in the sample table. Passing null selects all columns.
destTable - the name of the output table
tempTable - the name of the temporary table used by this class
rowcount - the number of rows in the source Columnset, or null if this value is not known in advance, in which case the class calculates the number of rows itself.
sampleSize - the approximate number of tuples the sample will contain
seed - the random seed to be used or null to use a "random" random seed.
db - a reference to the thread's edu.udo.cs.miningmart.m4.core.utils.DB object.
Throws:
M4CompilerError - if the sampling fails.
Method Detail

getDestTableName

public java.lang.String getDestTableName()
Returns:
name of the destination table

getNextBoolean

public boolean getNextBoolean()
Returns:
a random boolean value. The probability of receiving true is equal to the variable ratio specified in the constructor.
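
A minimal sketch of how such a ratio-weighted boolean could be drawn is shown here. It is an illustration only: the class RatioCoinSketch and its members are hypothetical, and the use of java.util.Random is an assumption, since this page does not state which generator the class uses. A null seed stands for an unfixed seed, mirroring the seed parameter of the constructors.

```java
import java.util.Random;

/** Hypothetical illustration; not part of the MiningMart code base. */
final class RatioCoinSketch {
    private final Random random;
    private final double ratio;

    /** ratio must lie in [0, 1]; seed may be null to use an unfixed seed. */
    RatioCoinSketch(double ratio, Long seed) {
        if (!(ratio >= 0.0 && ratio <= 1.0)) {
            throw new IllegalArgumentException("ratio must be in [0, 1]: " + ratio);
        }
        this.ratio = ratio;
        this.random = (seed == null) ? new Random() : new Random(seed);
    }

    /** Uniformly distributed double in [0, 1). */
    double nextRandomDouble() {
        return random.nextDouble();
    }

    /** Returns true with probability equal to ratio. */
    boolean nextBoolean() {
        return nextRandomDouble() < ratio;
    }

    public static void main(String[] args) {
        RatioCoinSketch coin = new RatioCoinSketch(0.25, 7L);
        int hits = 0;
        for (int i = 0; i < 10000; i++) {
            if (coin.nextBoolean()) {
                hits++;
            }
        }
        System.out.println("true drawn " + hits + " times out of 10000");
    }
}
```

Fixing the seed makes the sequence of draws reproducible, which is useful when a sampling step has to be repeated with identical results.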

getSourceAttributes

public java.lang.String getSourceAttributes()
                                     throws M4CompilerError
Returns:
a comma-separated list of the source columnset's attribute names "registered" as M4 columns.
Throws:
M4CompilerError

getSourceAttributeDefinitions

public java.lang.String getSourceAttributeDefinitions()
                                               throws M4CompilerError
Returns:
the list of attributes "registered" as columns, in the format necessary for an SQL SELECT statement. If columns are "virtual", then the SQL definition followed by the name is returned.
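
The difference between plain and "virtual" columns in such a SELECT list can be sketched as follows. Everything here is hypothetical: the Column class, the selectList method and the example column names are invented for illustration, and the use of the AS keyword between the SQL definition and the name is an assumption, since this page only says that the definition is followed by the name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; not part of the MiningMart code base. */
final class SelectListSketch {

    /** A column with a name and, for virtual columns, an SQL definition. */
    static final class Column {
        final String name;
        final String sqlDefinition; // null for a plain (non-virtual) column

        Column(String name, String sqlDefinition) {
            this.name = name;
            this.sqlDefinition = sqlDefinition;
        }
    }

    /** Builds the comma-separated column part of an SQL SELECT statement. */
    static String selectList(List<Column> columns) {
        List<String> parts = new ArrayList<>();
        for (Column column : columns) {
            if (column.sqlDefinition == null) {
                parts.add(column.name);
            } else {
                // Virtual column: SQL definition followed by the column name.
                parts.add(column.sqlDefinition + " AS " + column.name);
            }
        }
        return String.join(", ", parts);
    }

    public static void main(String[] args) {
        List<Column> columns = Arrays.asList(
                new Column("AGE", null),
                new Column("INCOME_K", "INCOME / 1000"));
        System.out.println("SELECT " + selectList(columns) + " FROM SOURCE_TABLE");
    }
}
```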
Throws:
M4CompilerError

getSourceTableName

public java.lang.String getSourceTableName()
Returns:
name of the source columnset

getSourceTableColumns

public java.util.Collection getSourceTableColumns()
                                           throws M4CompilerError
Returns:
collection of columns of the source columnset
Throws:
M4CompilerError

getTempTableName

public java.lang.String getTempTableName()
Returns:
name of the temporary table to use

getRowCount

public long getRowCount()
Returns:
number of rows of the source columnset

getNextRandomDouble

public double getNextRandomDouble()
Returns:
a new uniformly distributed random double

dbWrite

protected void dbWrite(java.lang.String sql)
                throws java.sql.SQLException,
                       DbConnectionClosed
Parameters:
sql - an sql string to be executed in the business database.
Throws:
java.sql.SQLException - if the database operations fail.
DbConnectionClosed - if the database connection has been closed after a request to stop the thread.

deleteTable

protected void deleteTable(java.lang.String tableName)
                    throws M4CompilerError
Helper method to delete a table and to ignore a possible "table does not exist" exception.

Throws:
M4CompilerError
See Also:
edu.udo.cs.miningmart.m4.core.utils.DB#dropBusinessTable(String)
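
The general pattern of dropping a table while tolerating its absence might look like the sketch below. It does not reproduce the MiningMart implementation: DropTableSketch, SqlExecutor and dropQuietly are hypothetical names, and the executor interface stands in for a real database connection so that the example runs without one. Note that this simplified version ignores every SQLException, not only "table does not exist"; a production version would inspect the vendor-specific error code.

```java
import java.sql.SQLException;

/** Hypothetical illustration; not part of the MiningMart code base. */
final class DropTableSketch {

    /** Stand-in for whatever component executes SQL against the database. */
    interface SqlExecutor {
        void execute(String sql) throws SQLException;
    }

    /**
     * Tries to drop the table and reports whether the statement succeeded.
     * Any SQLException is swallowed (simplification, see the text above).
     */
    static boolean dropQuietly(SqlExecutor db, String tableName) {
        try {
            db.execute("DROP TABLE " + tableName);
            return true;
        } catch (SQLException e) {
            return false; // treated as "table did not exist"
        }
    }

    public static void main(String[] args) {
        SqlExecutor missingTable = sql -> {
            throw new SQLException("table does not exist");
        };
        System.out.println("dropped: " + dropQuietly(missingTable, "TMP_SAMPLE"));
    }
}
```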

commit

protected void commit()
               throws DbConnectionClosed,
                      java.sql.SQLException
Throws:
DbConnectionClosed
java.sql.SQLException


Copyright © 2001-2005