edu.udo.cs.yale.operator.io
Class DatabaseExampleSource

java.lang.Object
  extended by edu.udo.cs.yale.operator.Operator
      extended by edu.udo.cs.yale.operator.io.ResultSetExampleSource
          extended by edu.udo.cs.yale.operator.io.DatabaseExampleSource
All Implemented Interfaces:
ConfigurationListener

public class DatabaseExampleSource
extends ResultSetExampleSource

This operator reads an ExampleSet from an SQL database. The SQL query can be passed to Yale via a parameter or, in case of long SQL statements, in a separate file. Please note that column names are often case sensitive. Databases may behave differently here.

The most convenient way of defining the necessary parameters is the configuration wizard. The most important parameters (database URL and user name) will be automatically determined by this wizard and it is also possible to define the special attributes like labels or ids.

Please note that this operator supports two basic working modes:

  1. reading the data from the database and creating an example table in main memory
  2. keeping the data in the database and directly working on the database table

The latter mode is enabled by the parameter "work_on_database". Please note that this working mode is still regarded as experimental and errors might occur. To ensure that data changes are applied properly, the database working mode is only allowed on a single table, which must be defined with the parameter "table_name". If you encounter problems during data updates (e.g. messages that the result set is not updatable), you have to define a primary key for your table.
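Messages saying that a result set is not updatable typically originate from the JDBC driver. The following is a minimal sketch, assuming plain JDBC and not taken from this operator's source, of how an updatable result set can be requested and checked. The class UpdatableCheck and its methods are hypothetical names introduced only for illustration.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of Yale: illustrates the JDBC calls involved.
public class UpdatableCheck {

    // Requests a scrollable, updatable result set over a single table.
    // The table name is concatenated into the SQL text, so it must come
    // from a trusted source (table names cannot be bound as parameters).
    static ResultSet openForUpdate(Connection connection, String tableName) throws SQLException {
        Statement statement = connection.createStatement(
                ResultSet.TYPE_SCROLL_SENSITIVE,
                ResultSet.CONCUR_UPDATABLE);
        return statement.executeQuery("SELECT * FROM " + tableName);
    }

    // A driver may fall back to a read-only result set, for example when the
    // table has no primary key, so the granted concurrency is checked here.
    static boolean isUpdatable(ResultSet resultSet) throws SQLException {
        return resultSet.getConcurrency() == ResultSet.CONCUR_UPDATABLE;
    }
}
```

If isUpdatable returns false for a result set opened this way, defining a primary key on the table is the first thing to try, as described above.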

If you are not working directly on the database, the data will be read with an arbitrary SQL query statement (SELECT ... FROM ... WHERE ...) defined by "query" or "query_file". The memory mode is the recommended way of using this operator. This is especially important for subsequent operators such as learning schemes, which often load most or all of the data into main memory during the learning process anyway.
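The choice between an inline query and a query file can be sketched as follows. This is an illustration only, not the operator's actual getQuery() implementation: the class QueryResolver, the method resolveQuery and the table name "examples" are hypothetical, and the precedence of the inline query over the file is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Yale.
public class QueryResolver {

    // Returns the inline query if one is given, otherwise the trimmed
    // contents of the query file. Assumption: the inline query wins.
    static String resolveQuery(String query, Path queryFile) throws IOException {
        if (query != null && !query.trim().isEmpty()) {
            return query.trim();
        }
        if (queryFile != null) {
            byte[] bytes = Files.readAllBytes(queryFile);
            return new String(bytes, StandardCharsets.UTF_8).trim();
        }
        throw new IllegalArgumentException("Either a query or a query file must be given");
    }

    public static void main(String[] args) throws IOException {
        // Write a long statement to a temporary file, as one would for "query_file".
        Path file = Files.createTempFile("query", ".sql");
        Files.write(file, "SELECT id, label FROM examples WHERE id > 10\n"
                .getBytes(StandardCharsets.UTF_8));

        System.out.println(resolveQuery(null, file));
        System.out.println(resolveQuery("SELECT * FROM examples", file));

        Files.delete(file);
    }
}
```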

Warning
As the Java ResultSetMetaData interface does not provide information about the possible values of nominal attributes, the internal indices that the nominal values are mapped to depend on the order in which the values appear in the table. This can only cause problems when experiments are split up into a training experiment and an application or testing experiment. For learning schemes which are capable of handling nominal attributes, this is not a problem. If a learning scheme like an SVM is used with nominal data, Yale pretends that nominal attributes are numerical and uses the indices of the nominal values as their numerical values. An SVM may perform well if there are only two possible values. If a test set is read in another experiment, the nominal values may be assigned different indices, and hence the trained SVM is useless. This is not a problem for label attributes, since the classes can be specified using the classes parameter, so all learning schemes intended for use with nominal data are safe to use.
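The effect described in this warning can be reproduced with a small sketch. It does not use Yale's attribute classes; NominalIndexDemo and indexByAppearance are hypothetical names, and the mapping rule (each new value receives the next free index) is a simplified assumption about how indices are assigned in order of appearance.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo, not part of Yale.
public class NominalIndexDemo {

    // Assigns each distinct value the next free index, in order of first appearance.
    static Map<String, Integer> indexByAppearance(List<String> column) {
        Map<String, Integer> mapping = new LinkedHashMap<>();
        for (String value : column) {
            if (!mapping.containsKey(value)) {
                mapping.put(value, mapping.size());
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Same nominal attribute, read in two separate experiments.
        List<String> trainingColumn = Arrays.asList("red", "green", "red", "blue");
        List<String> testColumn = Arrays.asList("blue", "red", "green");

        System.out.println(indexByAppearance(trainingColumn)); // prints {red=0, green=1, blue=2}
        System.out.println(indexByAppearance(testColumn));     // prints {blue=0, red=1, green=2}
    }
}
```

A model that was trained with "red" encoded as 0 would see 1 for the same value in the test experiment, which is why a learner that treats the indices as numbers produces meaningless predictions in that case.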

Version:
$Id: DatabaseExampleSource.java,v 1.11 2006/03/27 13:22:00 ingomierswa Exp $
Author:
Simon Fischer, Ingo Mierswa, Timm Euler
To do:
Fix the above problem. This may not be possible to do efficiently, since it is not supported by the Java ResultSet interface.

Field Summary
private  DatabaseHandler dbAccess
           
 
Constructor Summary
DatabaseExampleSource(OperatorDescription description)
           
 
Method Summary
 IOObject[] apply()
          Implement this method in subclasses.
private  void disconnect()
           
 void experimentFinished()
          Called at the end of the experiment.
private  DatabaseHandler getConnectedDatabaseHandler()
           
 java.util.List<ParameterType> getParameterTypes()
          Returns a list of ParameterTypes describing the parameters of this operator.
private  java.lang.String getQuery()
           
 java.sql.ResultSet getResultSet()
          This method reads the file whose name is given, extracts the database access information and the query from it and executes the query.
 void setNominalValues(java.util.List attributeList, java.sql.ResultSet resultSet, Attribute label)
          Since the ResultSet does not provide information about possible values of nominal attributes, subclasses must set these by implementing this method.
private  void setNominalValuesForLabel(Attribute label)
           
 
Methods inherited from class edu.udo.cs.yale.operator.io.ResultSetExampleSource
createExampleSet, getInputClasses, getOutputClasses
 
Methods inherited from class edu.udo.cs.yale.operator.Operator
addError, addValue, addWarning, apply, checkDeprecations, checkIO, checkProperties, clearErrorList, cloneOperator, createExperimentTree, createExperimentTree, createFromXML, createMarkedExperimentTree, delete, experimentStarts, getAddOnlyAdditionalOutput, getApplyCount, getDeliveredOutputClasses, getDeprecationInfo, getDesiredInputClasses, getErrorList, getExperiment, getInnerOperatorsXML, getInput, getInput, getInput, getInputDescription, getIOContainerForInApplyLoopBreakpoint, getName, getNumberOfSteps, getOperatorClassName, getOperatorDescription, getParameter, getParameterAsBoolean, getParameterAsColor, getParameterAsDouble, getParameterAsFile, getParameterAsInt, getParameterAsString, getParameterList, getParameters, getParameterType, getParent, getStartTime, getStatus, getUserDescription, getValue, getValues, getXML, hasBreakpoint, hasBreakpoint, hasInput, inApplyLoop, isEnabled, isParameterSet, logMessage, performAdditionalChecks, register, remove, rename, resume, setBreakpoint, setEnabled, setExperiment, setInput, setListParameter, setOperatorParameters, setParameter, setParameters, setParent, setUserDescription, toString, writeXML
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

dbAccess

private DatabaseHandler dbAccess
Constructor Detail

DatabaseExampleSource

public DatabaseExampleSource(OperatorDescription description)
Method Detail

setNominalValues

public void setNominalValues(java.util.List attributeList,
                             java.sql.ResultSet resultSet,
                             Attribute label)
                      throws UndefinedParameterError
Description copied from class: ResultSetExampleSource
Since the ResultSet does not provide information about possible values of nominal attributes, subclasses must set these by implementing this method.

Specified by:
setNominalValues in class ResultSetExampleSource
Parameters:
attributeList - List of Attribute
Throws:
UndefinedParameterError


setNominalValuesForLabel

private void setNominalValuesForLabel(Attribute label)
                               throws UndefinedParameterError
Throws:
UndefinedParameterError

apply

public IOObject[] apply()
                 throws OperatorException
Description copied from class: Operator
Implement this method in subclasses.

Overrides:
apply in class ResultSetExampleSource
Throws:
OperatorException


getQuery

private java.lang.String getQuery()
                           throws OperatorException
Throws:
OperatorException

getConnectedDatabaseHandler

private DatabaseHandler getConnectedDatabaseHandler()
                                             throws OperatorException,
                                                    java.sql.SQLException
Throws:
OperatorException
java.sql.SQLException

getResultSet

public java.sql.ResultSet getResultSet()
                                throws OperatorException
This method reads the file whose name is given, extracts the database access information and the query from it and executes the query. The query result is returned as a ResultSet.

Specified by:
getResultSet in class ResultSetExampleSource
Throws:
OperatorException


experimentFinished

public void experimentFinished()
Description copied from class: Operator
Called at the end of the experiment. The default implementation does nothing.

Overrides:
experimentFinished in class Operator


disconnect

private void disconnect()

getParameterTypes

public java.util.List<ParameterType> getParameterTypes()
Description copied from class: Operator
Returns a list of ParameterTypes describing the parameters of this operator. The default implementation returns an empty list if no input objects can be retained; otherwise it returns special parameters for those input objects which can be prevented from being consumed.

Overrides:
getParameterTypes in class ResultSetExampleSource



Copyright © 2001-2006