edu.udo.cs.yale.operator.learner.igss
Class IteratingGSS

java.lang.Object
  extended by edu.udo.cs.yale.operator.Operator
      extended by edu.udo.cs.yale.operator.learner.AbstractLearner
          extended by edu.udo.cs.yale.operator.learner.igss.IteratingGSS
All Implemented Interfaces:
ConfigurationListener, Learner

public class IteratingGSS
extends AbstractLearner

This operator implements the IteratingGSS algorithm presented in the diploma thesis 'Effiziente Entdeckung unabhaengiger Subgruppen in grossen Datenbanken' ('Efficient Discovery of Independent Subgroups in Large Databases') at the Department of Computer Science, University of Dortmund.

Version:
$Id: IteratingGSS.java,v 1.2 2006/10/02 21:45:26 ingomierswa Exp $
Author:
Dirk Dach

Field Summary
private  java.util.LinkedList<Hypothesis> bestList
          Stores the k best hypotheses.
static java.lang.String[] CRITERION_TYPES
           
private  double currentDelta
          Remaining delta
private  double epsilon
          Parameter epsilon of the GSS algorithm
private  double exampleFactor
          Factor needed by example_criterion.
static int FIRST_TYPE_INDEX
           
private  boolean forceIterations
          Always make all iterations?
private  IGSSResult gssResult
          stores all results
private  int iterations
          The number of iterations for the IGSS algorithm.
private  Attribute label
          The label attribute
private  int large
          Number of random experiments before a normal approximation is used.
static int LAST_TYPE_INDEX
           
private  int maxComplexity
          Maximum hypothesis complexity
private  Result maxRest
          Best of the hypotheses not among the k best
 int MIN_MODEL_NUMBER
          minimal model number for example_criterion
private  double min_utility_pruning
          Minimum utility used for pruning
private  double min_utility_useful
          Minimum utility a hypothesis needs to be considered useful
private  Result minBest
          Worst of the k best hypotheses
private  int minComplexity
          Minimum hypothesis complexity
private  int numberOfSolutions
          Parameter k of the GSS algorithm
private  RandomGenerator random
          global random generator
private  Attribute[] regularAttributes
          The regular attributes
private  boolean rejectionSampling
          Use rejection sampling or weights directly.
private  boolean resetWeights
          Reset weights after complexity increase?
private  Hypothesis seed
          First hypothesis used to create all others.
private  int stepsize
          Parameter stepsize of the IGSS algorithm
private  Utility theUtility
          The utility function
private  double totalPositiveWeight
          Total positive weight used by GSS
private  double totalWeight
          Total weight used by GSS
static int TYPE_BEST_UTILITY
           
static int TYPE_EXAMPLE
           
static int TYPE_UTILITY
           
static int TYPE_WORST_UTILITY
           
private  boolean useBinomial
          Indicates if Binomial should be used before increasing complexity.
private  int useful_criterion
          the useful criterion for the IGSS algorithm
private  boolean useKBS
          Indicates whether knowledge-based sampling (KBS) should be used.
 
Constructor Summary
IteratingGSS(OperatorDescription description)
          Must pass the given object to the superclass.
 
Method Summary
 IOObject[] apply()
          Trains a model using an ExampleSet from the input.
 java.util.LinkedList<Hypothesis> generate(java.util.LinkedList<Hypothesis> oldHypothesis)
          Generates all successors of the hypotheses in the given list.
 java.lang.Class[] getOutputClasses()
          Returns the classes that are guaranteed to be returned by apply() as additional output.
 java.util.List<ParameterType> getParameterTypes()
          Returns a list of ParameterTypes describing the parameters of this operator.
 java.util.LinkedList<Result> gss(ExampleSet exampleSet, java.util.LinkedList<Hypothesis> hypothesisList, double delta, double epsilon)
          Returns the n best hypotheses with maximum error epsilon at confidence 1-delta.
 boolean isUseful(Result current, java.util.LinkedList<Result> otherResults, int criterion, ExampleSet exampleSet, int min_model_number)
          Test if the model is useful according to the given criterion.
 Model learn(ExampleSet exampleSet)
          Trains a model.
static double log2(double arg)
          Returns the base-2 logarithm of arg.
 java.util.LinkedList<Hypothesis> prune(java.util.LinkedList<Hypothesis> hypoList, double minUtility, double totalWeight, double totalPositiveWeight, double delta_p)
          Prunes the given list of hypotheses.
 ContingencyMatrix reweight(ExampleSet exampleSet, Model model, boolean normalize)
          Reweights the examples according to knowledge based sampling.
 boolean supportsCapability(LearnerCapability lc)
          Checks for Learner capabilities.
private  void updateLists(java.util.LinkedList<Hypothesis> hypothesisList, int n, double totalExampleWeight, double totalPositiveWeight, double delta_h_m)
          Updates bestList, maxRest and minBest.
 
Methods inherited from class edu.udo.cs.yale.operator.learner.AbstractLearner
checkLearnerCapabilities, getEstimatedPerformance, getInputClasses, getInputDescription, getOptimizationPerformance, getWeights, shouldCalculateWeights, shouldDeliverOptimizationPerformance, shouldEstimatePerformance
 
Methods inherited from class edu.udo.cs.yale.operator.Operator
addError, addValue, addWarning, apply, checkDeprecations, checkIO, checkProperties, clearErrorList, cloneOperator, createExperimentTree, createExperimentTree, createFromXML, createMarkedExperimentTree, delete, experimentFinished, experimentStarts, getAddOnlyAdditionalOutput, getApplyCount, getDeliveredOutputClasses, getDeprecationInfo, getDesiredInputClasses, getErrorList, getExperiment, getInnerOperatorsXML, getInput, getInput, getInput, getIOContainerForInApplyLoopBreakpoint, getName, getNumberOfSteps, getOperatorClassName, getOperatorDescription, getParameter, getParameterAsBoolean, getParameterAsColor, getParameterAsDouble, getParameterAsFile, getParameterAsInt, getParameterAsString, getParameterList, getParameters, getParameterType, getParent, getStartTime, getStatus, getUserDescription, getValue, getValues, getXML, hasBreakpoint, hasBreakpoint, hasInput, inApplyLoop, isEnabled, isParameterSet, logMessage, performAdditionalChecks, register, remove, rename, resume, setBreakpoint, setEnabled, setExperiment, setInput, setListParameter, setOperatorParameters, setParameter, setParameters, setParent, setUserDescription, toString, writeXML
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface edu.udo.cs.yale.operator.learner.Learner
getName
 

Field Detail

CRITERION_TYPES

public static final java.lang.String[] CRITERION_TYPES

FIRST_TYPE_INDEX

public static final int FIRST_TYPE_INDEX
See Also:
Constant Field Values

TYPE_WORST_UTILITY

public static final int TYPE_WORST_UTILITY
See Also:
Constant Field Values

TYPE_UTILITY

public static final int TYPE_UTILITY
See Also:
Constant Field Values

TYPE_BEST_UTILITY

public static final int TYPE_BEST_UTILITY
See Also:
Constant Field Values

TYPE_EXAMPLE

public static final int TYPE_EXAMPLE
See Also:
Constant Field Values

LAST_TYPE_INDEX

public static final int LAST_TYPE_INDEX
See Also:
Constant Field Values

gssResult

private IGSSResult gssResult
stores all results


regularAttributes

private Attribute[] regularAttributes
The regular attributes


label

private Attribute label
The label attribute


theUtility

private Utility theUtility
The utility function


random

private RandomGenerator random
global random generator


seed

private Hypothesis seed
First hypothesis used to create all others.


totalWeight

private double totalWeight
Total weight used by GSS


totalPositiveWeight

private double totalPositiveWeight
Total positive weight used by GSS


bestList

private java.util.LinkedList<Hypothesis> bestList
Stores the k best hypotheses.


minBest

private Result minBest
Worst of the k best hypotheses


maxRest

private Result maxRest
Best of the hypotheses not among the k best


numberOfSolutions

private int numberOfSolutions
Parameter k of the GSS algorithm


currentDelta

private double currentDelta
Remaining delta


epsilon

private double epsilon
Parameter epsilon of the GSS algorithm


stepsize

private int stepsize
Parameter stepsize of the IGSS algorithm


maxComplexity

private int maxComplexity
Maximum hypothesis complexity


minComplexity

private int minComplexity
Minimum hypothesis complexity


min_utility_pruning

private double min_utility_pruning
Minimum utility used for pruning


min_utility_useful

private double min_utility_useful
Minimum utility a hypothesis needs to be considered useful


useKBS

private boolean useKBS
Indicates whether knowledge-based sampling (KBS) should be used.


useBinomial

private boolean useBinomial
Indicates if Binomial should be used before increasing complexity.


useful_criterion

private int useful_criterion
the useful criterion for the IGSS algorithm


forceIterations

private boolean forceIterations
Always make all iterations?


resetWeights

private boolean resetWeights
Reset weights after complexity increase?


exampleFactor

private double exampleFactor
Factor needed by example_criterion.


MIN_MODEL_NUMBER

public int MIN_MODEL_NUMBER
minimal model number for example_criterion


rejectionSampling

private boolean rejectionSampling
Use rejection sampling or weights directly.


large

private int large
Number of random experiments before a normal approximation is used.


iterations

private int iterations
The number of iterations for the IGSS algorithm.

Constructor Detail

IteratingGSS

public IteratingGSS(OperatorDescription description)
Must pass the given object to the superclass.

Method Detail

updateLists

private void updateLists(java.util.LinkedList<Hypothesis> hypothesisList,
                         int n,
                         double totalExampleWeight,
                         double totalPositiveWeight,
                         double delta_h_m)
Updates bestList, maxRest and minBest.
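
The relationship between bestList, minBest and maxRest described in the field documentation can be illustrated with a small stand-alone sketch. It is not taken from the YALE sources: the classes Scored and Split and the method split are hypothetical, and the sketch ranks by a plain utility value, whereas the real method also receives example weights and a delta parameter whose use is not documented here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; names and ranking rule are assumptions.
final class KBestSketch {
    /** A candidate with an already computed utility (stand-in for a scored hypothesis). */
    static final class Scored {
        final String name;
        final double utility;

        Scored(String name, double utility) {
            this.name = name;
            this.utility = utility;
        }
    }

    /** The k best candidates plus the two boundary elements. */
    static final class Split {
        final List<Scored> best;   // the k best candidates, best first
        final Scored minBest;      // worst of the k best, or null if none
        final Scored maxRest;      // best of the remaining ones, or null if none

        Split(List<Scored> best, Scored minBest, Scored maxRest) {
            this.best = best;
            this.minBest = minBest;
            this.maxRest = maxRest;
        }
    }

    /** Sorts a copy by descending utility and cuts it after at most k elements. */
    static Split split(List<Scored> candidates, int k) {
        List<Scored> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Scored s) -> s.utility).reversed());
        int cut = Math.max(0, Math.min(k, sorted.size()));
        List<Scored> best = new ArrayList<>(sorted.subList(0, cut));
        Scored minBest = best.isEmpty() ? null : best.get(cut - 1);
        Scored maxRest = cut < sorted.size() ? sorted.get(cut) : null;
        return new Split(best, minBest, maxRest);
    }
}
```

The gap between minBest and maxRest is what a sampling algorithm would typically compare against epsilon to decide whether the current k best can be reported; whether IteratingGSS uses exactly this test is an assumption.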


gss

public java.util.LinkedList<Result> gss(ExampleSet exampleSet,
                                        java.util.LinkedList<Hypothesis> hypothesisList,
                                        double delta,
                                        double epsilon)
                                 throws OperatorException
Returns the n best hypotheses with maximum error epsilon at confidence 1-delta.

Throws:
OperatorException
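
To give a feeling for what "maximum error epsilon with confidence 1-delta" means in terms of sample size, the following sketch computes the textbook Hoeffding bound for estimating a single mean of values in [0,1]. This is an assumption-laden illustration, not the bound used by gss: the actual bound depends on the utility function and on the number of hypotheses, neither of which is documented here. The class and method names are hypothetical.

```java
// Hypothetical illustration only; not the bound implemented by IteratingGSS.
final class SampleBoundSketch {
    private SampleBoundSketch() {}

    /**
     * Smallest n with 2 * exp(-2 * n * epsilon^2) <= delta, i.e. the number of
     * independent samples after which the empirical mean of a [0,1]-valued
     * quantity deviates from its true mean by less than epsilon with
     * probability at least 1 - delta (Hoeffding's inequality).
     */
    static long sampleSize(double epsilon, double delta) {
        if (epsilon <= 0.0 || delta <= 0.0 || delta >= 1.0) {
            throw new IllegalArgumentException("need epsilon > 0 and 0 < delta < 1");
        }
        return (long) Math.ceil(Math.log(2.0 / delta) / (2.0 * epsilon * epsilon));
    }

    public static void main(String[] args) {
        System.out.println(sampleSize(0.1, 0.05));
    }
}
```

Halving epsilon roughly quadruples the required sample size, while tightening delta only costs a logarithmic factor.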


reweight

public ContingencyMatrix reweight(ExampleSet exampleSet,
                                  Model model,
                                  boolean normalize)
                           throws OperatorException
Reweights the examples according to knowledge based sampling. Normalizes weights to [0,1] if the parameter normalize is set to true.

Throws:
OperatorException
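
The documentation does not say how weights are normalized to [0,1]. One plausible reading, shown here purely as an assumption, is to divide every weight by the largest weight. The class NormalizeSketch and its method are hypothetical, operate on a plain array instead of an ExampleSet, and expect non-negative weights.

```java
// Hypothetical illustration only; the real normalization rule is not documented.
final class NormalizeSketch {
    private NormalizeSketch() {}

    /**
     * Returns a copy of the non-negative weights scaled so that the largest
     * weight becomes 1.0. If all weights are zero, the copy is returned unchanged.
     */
    static double[] normalize(double[] weights) {
        double max = 0.0;
        for (double w : weights) {
            max = Math.max(max, w);
        }
        double[] result = weights.clone();
        if (max > 0.0) {
            for (int i = 0; i < result.length; i++) {
                result[i] /= max;
            }
        }
        return result;
    }
}
```

Dividing by the sum of the weights would also map them into [0,1]; which variant reweight uses cannot be determined from this page.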


apply

public IOObject[] apply()
                 throws OperatorException
Description copied from class: AbstractLearner
Trains a model using an ExampleSet from the input. Uses the method learn(ExampleSet).

Overrides:
apply in class AbstractLearner
Throws:
OperatorException


learn

public Model learn(ExampleSet exampleSet)
            throws OperatorException
Description copied from interface: Learner
Trains a model. This method should be called by apply() and is implemented by subclasses.

Throws:
OperatorException


isUseful

public boolean isUseful(Result current,
                        java.util.LinkedList<Result> otherResults,
                        int criterion,
                        ExampleSet exampleSet,
                        int min_model_number)
Test if the model is useful according to the given criterion.


prune

public java.util.LinkedList<Hypothesis> prune(java.util.LinkedList<Hypothesis> hypoList,
                                              double minUtility,
                                              double totalWeight,
                                              double totalPositiveWeight,
                                              double delta_p)
Prunes the given list of hypotheses. Every hypothesis whose upper utility bound is less than the parameter minUtility is pruned.
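
The pruning rule can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the upper utility bound is passed in as a function, because this page does not document how it is derived from totalWeight, totalPositiveWeight and delta_p. PruneSketch and its generic signature are hypothetical and not part of YALE.

```java
import java.util.LinkedList;
import java.util.function.ToDoubleFunction;

// Hypothetical illustration only; the bound computation is supplied by the caller.
final class PruneSketch {
    private PruneSketch() {}

    /** Keeps exactly those candidates whose upper utility bound is at least minUtility. */
    static <H> LinkedList<H> prune(LinkedList<H> candidates,
                                   ToDoubleFunction<H> upperBound,
                                   double minUtility) {
        LinkedList<H> kept = new LinkedList<>();
        for (H candidate : candidates) {
            // A bound strictly below minUtility means the candidate can never become useful.
            if (upperBound.applyAsDouble(candidate) >= minUtility) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

The sketch returns a new list and leaves its input untouched; whether the real method modifies hypoList in place is not stated.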


generate

public java.util.LinkedList<Hypothesis> generate(java.util.LinkedList<Hypothesis> oldHypothesis)
Generates all successors of the hypotheses in the given list.


log2

public static double log2(double arg)
Returns the base-2 logarithm of arg.
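
A base-2 logarithm is usually obtained from the natural logarithm by the change-of-base identity log2(x) = ln(x) / ln(2). The sketch below shows that identity; it is an assumption that the YALE method is implemented this way, and the class name Log2Sketch is hypothetical.

```java
// Hypothetical illustration only; not the YALE source.
final class Log2Sketch {
    private Log2Sketch() {}

    /** Base-2 logarithm via change of base; NaN for negative input, -Infinity for 0. */
    static double log2(double arg) {
        return Math.log(arg) / Math.log(2.0);
    }

    public static void main(String[] args) {
        System.out.println(log2(8.0));
    }
}
```

Because of floating-point rounding the result for exact powers of two is not guaranteed to be an exact integer, so comparisons should use a small tolerance.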


supportsCapability

public boolean supportsCapability(LearnerCapability lc)
Description copied from interface: Learner
Checks for Learner capabilities. Should return true if the given capability is supported.


getOutputClasses

public java.lang.Class[] getOutputClasses()
Description copied from class: Operator
Returns the classes that are guaranteed to be returned by apply() as additional output. Please note that input objects which should not be consumed must also be defined by this method (e.g. for preprocessing operators). The default behavior for input consumption is defined by Operator.getInputDescription(Class) and can be changed by overriding this method. Objects which are not consumed must not be defined as additional output in this method. May be null or an empty array (no additional output is produced).

Overrides:
getOutputClasses in class AbstractLearner


getParameterTypes

public java.util.List<ParameterType> getParameterTypes()
Description copied from class: Operator
Returns a list of ParameterTypes describing the parameters of this operator. The default implementation returns an empty list if no input objects can be retained; otherwise it returns special parameters for those input objects which can be prevented from being consumed.

Overrides:
getParameterTypes in class Operator



Copyright © 2001-2006