edu.udo.cs.yale.operator.learner.meta
Class BayesianBoosting

java.lang.Object
  extended by edu.udo.cs.yale.operator.Operator
      extended by edu.udo.cs.yale.operator.OperatorChain
          extended by edu.udo.cs.yale.operator.learner.meta.AbstractMetaLearner
              extended by edu.udo.cs.yale.operator.learner.meta.BayesianBoosting
All Implemented Interfaces:
ConfigurationListener, Learner

public class BayesianBoosting
extends AbstractMetaLearner

This operator trains an ensemble of classifiers for boolean target attributes. In each iteration the training set is reweighted so that previously discovered patterns and other kinds of prior knowledge are "sampled out" [Scholz/2005b]. An inner classifier, typically a rule or decision tree induction algorithm, is applied sequentially several times, and the resulting models are combined into a single global model. The maximum number of models to be trained is specified by the parameter iterations.

If the parameter rescale_label_priors is set, the example set is reweighted so that all classes are equally probable (or frequent). For two-class problems this turns the problem of fitting models to maximize weighted relative accuracy into the more common task of classifier induction [Scholz/2005a]. Applying a rule induction algorithm as the inner learner then makes subgroup discovery possible. This option is also recommended for data sets with class skew if a "very weak learner" such as a decision stump is used. If rescale_label_priors is not set, the operator performs boosting based on probability estimates.

The estimates used by this operator may either be computed on the same set that is used for training, or the training set may be split randomly in each iteration, so that a model is fitted on the first subset and the probabilities are estimated on the second. The first solution may be advantageous when data is scarce. Set the parameter ratio_internal_bootstrap to 1 to use the same set for training and for estimation. Set it to a value lower than 1 to use the specified fraction of the data for training and the remaining examples for probability estimation.

If the parameter allow_marginal_skews is not set, the support of each subset defined in terms of common base model predictions does not change from one iteration to the next. Analogously, the class priors do not change. This is the procedure originally described in [Scholz/2005b] in the context of subgroup discovery. Setting allow_marginal_skews to true leads to a procedure that changes the marginal weights/probabilities of subsets if this is beneficial in a boosting context, and that stratifies the two classes to be equally likely. As with AdaBoost, the total weight upper-bounds the training error in this case. This bound is reduced more quickly by the BayesianBoosting operator, however.

The operator requires an example set as its input. To sample out prior knowledge of a different form, another model can be provided as an optional additional input. The predictions of this model are used to produce an initial weighting of the training set. The output of the operator is a classification model applicable for estimating conditional class probabilities or for plain crisp classification. It contains up to the specified number of inner base models. If an optional initial model was provided, it is also stored in the output model so that the same initial weighting can be produced during model application.
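
The effect of ratio_internal_bootstrap described above can be illustrated with a minimal sketch. It is not the operator's code: the class InternalSplitSketch, the method split and the seed argument are hypothetical names introduced here, and the sketch assumes a plain random split without replacement, which is only one reading of the description.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration only; not part of the YALE API.
class InternalSplitSketch {

    /**
     * Returns two index lists: element 0 holds the examples used to fit the
     * base model, element 1 the examples used for probability estimation.
     * With ratio >= 1 both lists contain every example.
     */
    static List<List<Integer>> split(int numExamples, double ratio, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numExamples; i++) {
            indices.add(i);
        }
        List<List<Integer>> result = new ArrayList<>();
        if (ratio >= 1.0) {
            // Same set for training and for estimation.
            result.add(indices);
            result.add(new ArrayList<>(indices));
            return result;
        }
        // Random split: the first part trains, the remainder estimates.
        Collections.shuffle(indices, new Random(seed));
        int cut = (int) Math.round(ratio * numExamples);
        result.add(new ArrayList<>(indices.subList(0, cut)));
        result.add(new ArrayList<>(indices.subList(cut, numExamples)));
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = split(10, 0.7, 42L);
        System.out.println("training:   " + parts.get(0));
        System.out.println("estimation: " + parts.get(1));
    }
}
```

With a ratio of 0.7 and ten examples, seven indices go to training and three to estimation. A ratio of 1 reuses all ten for both purposes.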

Version:
$Id: BayesianBoosting.java,v 1.56 2006/04/14 15:14:32 ingomierswa Exp $
Author:
Martin Scholz

Field Summary
static java.lang.String ALLOW_MARGINAL_SKEWS
          Boolean parameter that switches between KBS (if set to false) and a boosting-like reweighting.
protected  int currentIteration
           
static java.lang.String EQUALLY_PROB_LABELS
          Boolean parameter to specify whether the label priors should be equally likely after the first iteration.
static java.lang.String INTERNAL_BOOTSTRAP
          Name of the flag indicating internal bootstrapping.
static double MIN_ADVANTAGE
          Discard models with an advantage of less than the specified value.
static java.lang.String NUM_OF_ITERATIONS
          Name of the variable specifying the maximal number of iterations of the learner.
private  double[] oldWeights
           
private  double performance
           
private  Model startModel
           
 
Constructor Summary
BayesianBoosting(OperatorDescription description)
          Constructor.
 
Method Summary
private  void applyPriorModel(ExampleSet trainingSet, java.util.List<BayBoostBaseModelInfo> modelInfo)
          Helper method applying the start model and adding it to the modelInfo collection.
private  double[] createNewWeightAttribute(ExampleSet exampleSet)
           
private  void debugMessage(WeightedPerformanceMeasures wp)
           
 int getNumberOfSteps()
          Returns the number of steps performed by this chain.
 java.util.List<ParameterType> getParameterTypes()
          Adds the parameters "number of iterations" and "model file".
private  boolean isModelUseful(ContingencyMatrix cm)
          Helper method to decide whether a model improves the training error enough to be considered.
 Model learn(ExampleSet exampleSet)
          Constructs a Model by repeatedly running a weak learner, reweighting the training example set accordingly, and combining the hypotheses using the available weighted performance values.
protected  double[] prepareWeights(ExampleSet exampleSet)
          Creates a weight attribute if not yet done.
private  void readOptionalParameters()
          Helper method reading a start model from the input if present.
private  void rescaleToEqualPriors(ExampleSet exampleSet, double[] currentPriors)
           
protected  double reweightExamples(WeightedPerformanceMeasures wp, ExampleSet exampleSet)
          This method reweights the example set with respect to the WeightedPerformanceMeasures object.
 boolean supportsCapability(LearnerCapability lc)
          Overrides the method of the super class.
protected  Model trainBaseModel(ExampleSet exampleSet)
          Runs the "embedded" learner on the example set and returns a model.
private  BayBoostModel trainBoostingModel(ExampleSet trainingSet, double[] classPriors)
          Main method for training the ensemble classifier.
 
Methods inherited from class edu.udo.cs.yale.operator.learner.meta.AbstractMetaLearner
apply, applyInnerLearner, checkLearnerCapabilities, getEstimatedPerformance, getInnerOperatorCondition, getInputClasses, getInputDescription, getMaxNumberOfInnerOperators, getMinNumberOfInnerOperators, getOutputClasses, getWeights, shouldCalculateWeights, shouldEstimatePerformance, shouldReturnInnerOutput
 
Methods inherited from class edu.udo.cs.yale.operator.OperatorChain
addAddListener, addOperator, addOperator, checkDeprecations, checkIO, checkNumberOfInnerOperators, checkProperties, clearErrorList, clearStepCounter, cloneOperator, countStep, createExperimentTree, delete, experimentFinished, experimentStarts, getAllInnerOperators, getCurrentStep, getIndexOfOperator, getInnerOperatorForName, getInnerOperatorsXML, getNumberOfAllOperators, getNumberOfChildrensSteps, getNumberOfOperators, getOperator, getOperatorFromAll, getOperators, isEnabled, performAdditionalChecks, removeAddListener, removeOperator, setEnabled, setExperiment
 
Methods inherited from class edu.udo.cs.yale.operator.Operator
addError, addValue, addWarning, apply, createExperimentTree, createFromXML, createMarkedExperimentTree, getAddOnlyAdditionalOutput, getApplyCount, getDeliveredOutputClasses, getDeprecationInfo, getDesiredInputClasses, getErrorList, getExperiment, getInput, getInput, getInput, getIOContainerForInApplyLoopBreakpoint, getName, getOperatorClassName, getOperatorDescription, getParameter, getParameterAsBoolean, getParameterAsColor, getParameterAsDouble, getParameterAsFile, getParameterAsInt, getParameterAsString, getParameterList, getParameters, getParameterType, getParent, getStartTime, getStatus, getUserDescription, getValue, getValues, getXML, hasBreakpoint, hasBreakpoint, hasInput, inApplyLoop, isParameterSet, logMessage, register, remove, rename, resume, setBreakpoint, setInput, setListParameter, setOperatorParameters, setParameter, setParameters, setParent, setUserDescription, toString, writeXML
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface edu.udo.cs.yale.operator.learner.Learner
getName
 

Field Detail

NUM_OF_ITERATIONS

public static final java.lang.String NUM_OF_ITERATIONS
Name of the variable specifying the maximal number of iterations of the learner.

See Also:
Constant Field Values


INTERNAL_BOOTSTRAP

public static final java.lang.String INTERNAL_BOOTSTRAP
Name of the flag indicating internal bootstrapping.

See Also:
Constant Field Values


EQUALLY_PROB_LABELS

public static final java.lang.String EQUALLY_PROB_LABELS
Boolean parameter to specify whether the label priors should be equally likely after the first iteration.

See Also:
Constant Field Values


ALLOW_MARGINAL_SKEWS

public static final java.lang.String ALLOW_MARGINAL_SKEWS
Boolean parameter that switches between KBS (if set to false) and a boosting-like reweighting.

See Also:
Constant Field Values


MIN_ADVANTAGE

public static final double MIN_ADVANTAGE
Discard models with an advantage of less than the specified value.

See Also:
Constant Field Values


startModel

private Model startModel

currentIteration

protected int currentIteration

performance

private double performance

oldWeights

private double[] oldWeights
Constructor Detail

BayesianBoosting

public BayesianBoosting(OperatorDescription description)
Constructor.

Method Detail

supportsCapability

public boolean supportsCapability(LearnerCapability lc)
Overrides the method of the super class. Returns true for polynominal class.

Specified by:
supportsCapability in interface Learner
Overrides:
supportsCapability in class AbstractMetaLearner


getParameterTypes

public java.util.List<ParameterType> getParameterTypes()
Adds the parameters "number of iterations" and "model file".

Overrides:
getParameterTypes in class Operator


getNumberOfSteps

public int getNumberOfSteps()
Description copied from class: OperatorChain
Returns the number of steps performed by this chain.

Overrides:
getNumberOfSteps in class AbstractMetaLearner
See Also:
OperatorChain.getNumberOfSteps()


learn

public Model learn(ExampleSet exampleSet)
            throws OperatorException
Constructs a Model by repeatedly running a weak learner, reweighting the training example set accordingly, and combining the hypotheses using the available weighted performance values. If the input contains a model, then this model is used as a starting point for weighting the examples.

Throws:
OperatorException
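
How the hypotheses are combined is not spelled out on this page. The following sketch shows one plausible scheme, a naive-Bayes-style product of class priors and per-model lift ratios, as it is commonly described for knowledge-based sampling. Whether the operator's output model works exactly this way is an assumption; the class LiftCombinationSketch, the method positiveProbability and the array layout are hypothetical.

```java
// Hypothetical illustration only; not the operator's model class.
class LiftCombinationSketch {

    /**
     * Estimates P(positive | predictions) for a two-class problem.
     *
     * @param priorPositive prior probability of the positive class (index 1)
     * @param lifts         lifts[i][y][p] is the lift ratio of base model i for
     *                      true class y given that the model predicts class p
     * @param predictions   predictions[i] is the class predicted by base model i
     */
    static double positiveProbability(double priorPositive, double[][][] lifts, int[] predictions) {
        double scorePositive = priorPositive;
        double scoreNegative = 1.0 - priorPositive;
        for (int i = 0; i < predictions.length; i++) {
            // Each base model multiplies the class scores by its lift ratios.
            scorePositive *= lifts[i][1][predictions[i]];
            scoreNegative *= lifts[i][0][predictions[i]];
        }
        // Normalize so that both class probabilities sum to one.
        return scorePositive / (scorePositive + scoreNegative);
    }

    public static void main(String[] args) {
        // One base model; predicting "positive" raises the positive class by 4/3.
        double[][][] lifts = {{{4.0 / 3.0, 2.0 / 3.0}, {2.0 / 3.0, 4.0 / 3.0}}};
        System.out.println(positiveProbability(0.5, lifts, new int[] {1}));
    }
}
```

A crisp classification follows by predicting the positive class whenever the returned estimate exceeds 0.5. With no base models the estimate falls back to the class prior.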


prepareWeights

protected double[] prepareWeights(ExampleSet exampleSet)
Creates a weight attribute if not yet done. It either backs up the old weights for restoring them later, or it fills the newly created attribute with the initial value of 1. If rescaling to equal class priors is activated then the weights are set accordingly.

Parameters:
exampleSet - the example set to be prepared
Returns:
a double[] array containing the class priors.
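
As a rough illustration of the weight initialization and the rescaling to equal class priors, consider the sketch below. It works on plain arrays rather than on an ExampleSet, and the class PrepareWeightsSketch and its signature are hypothetical. It also assumes that the returned priors are the ones observed before rescaling, which the description above does not state.

```java
// Hypothetical illustration on plain arrays; not the operator's code.
class PrepareWeightsSketch {

    /**
     * Sets every weight to 1 and, if requested, rescales the weights so that
     * each class carries the same total weight. Returns the class priors
     * observed before rescaling (an assumption of this sketch).
     */
    static double[] prepareWeights(int[] labels, double[] weights,
                                   int numClasses, boolean rescaleToEqualPriors) {
        double[] priors = new double[numClasses];
        for (int i = 0; i < labels.length; i++) {
            weights[i] = 1.0;
            priors[labels[i]] += 1.0 / labels.length;
        }
        if (rescaleToEqualPriors) {
            for (int i = 0; i < labels.length; i++) {
                // Each class ends up with total weight n / numClasses.
                weights[i] = 1.0 / (numClasses * priors[labels[i]]);
            }
        }
        return priors;
    }

    public static void main(String[] args) {
        int[] labels = {1, 1, 1, 0};
        double[] weights = new double[labels.length];
        double[] priors = prepareWeights(labels, weights, 2, true);
        System.out.println(java.util.Arrays.toString(priors));
        System.out.println(java.util.Arrays.toString(weights));
    }
}
```

In the example the minority class has one example out of four, so its single example receives the same total weight as the three majority examples together.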


createNewWeightAttribute

private double[] createNewWeightAttribute(ExampleSet exampleSet)

rescaleToEqualPriors

private void rescaleToEqualPriors(ExampleSet exampleSet,
                                  double[] currentPriors)

trainBaseModel

protected Model trainBaseModel(ExampleSet exampleSet)
                        throws OperatorException
Runs the "embedded" learner on the example set and returns a model.

Parameters:
exampleSet - an ExampleSet to train a model for
Returns:
a Model
Throws:
OperatorException


readOptionalParameters

private void readOptionalParameters()
Helper method reading a start model from the input if present.


applyPriorModel

private void applyPriorModel(ExampleSet trainingSet,
                             java.util.List<BayBoostBaseModelInfo> modelInfo)
                      throws OperatorException
Helper method applying the start model and adding it to the modelInfo collection.

Throws:
OperatorException


trainBoostingModel

private BayBoostModel trainBoostingModel(ExampleSet trainingSet,
                                         double[] classPriors)
                                  throws OperatorException
Main method for training the ensemble classifier.

Throws:
OperatorException


debugMessage

private void debugMessage(WeightedPerformanceMeasures wp)

reweightExamples

protected double reweightExamples(WeightedPerformanceMeasures wp,
                                  ExampleSet exampleSet)
                           throws OperatorException
This method reweights the example set with respect to the WeightedPerformanceMeasures object. Please note that the weights will not be reset at any time, because they continuously change from one iteration to the next. This method does not change the priors of the classes.

Parameters:
wp - the WeightedPerformanceMeasures to use
exampleSet - ExampleSet to be reweighted
Returns:
the total weight of examples as an error estimate
Throws:
OperatorException
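
One way to reweight without changing the class priors is to divide each weight by the lift ratio of its (label, prediction) cell, so that predictions and labels become independent under the new weights. The sketch below shows this for two classes on plain arrays. It is an assumption that the operator does exactly this, and the class LiftReweightingSketch and its signature are hypothetical. The variant enabled by allow_marginal_skews is not covered.

```java
// Hypothetical illustration on plain arrays; not the operator's code.
class LiftReweightingSketch {

    /**
     * Divides each weight by lift(prediction -> label) and returns the new
     * total weight. Labels and predictions hold the class indices 0 or 1.
     */
    static double reweight(int[] labels, int[] predictions, double[] weights) {
        double[][] joint = new double[2][2]; // joint[label][prediction], weighted
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            joint[labels[i]][predictions[i]] += weights[i];
            total += weights[i];
        }
        double newTotal = 0.0;
        for (int i = 0; i < weights.length; i++) {
            int y = labels[i];
            int p = predictions[i];
            double pLabel = (joint[y][0] + joint[y][1]) / total;
            double pPrediction = (joint[0][p] + joint[1][p]) / total;
            double lift = (joint[y][p] / total) / (pLabel * pPrediction);
            if (lift > 0.0) {
                // Over-represented cells (lift > 1) lose weight, others gain.
                weights[i] /= lift;
            }
            newTotal += weights[i];
        }
        return newTotal;
    }

    public static void main(String[] args) {
        int[] labels = {1, 1, 1, 0, 0, 0};
        int[] predictions = {1, 1, 0, 1, 0, 0};
        double[] weights = {1, 1, 1, 1, 1, 1};
        System.out.println(reweight(labels, predictions, weights));
        System.out.println(java.util.Arrays.toString(weights));
    }
}
```

In this scheme both the total weight of each class and the total weight of each prediction subset stay the same. Only the distribution of weight across the cells of the contingency matrix changes.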


isModelUseful

private boolean isModelUseful(ContingencyMatrix cm)
Helper method to decide whether a model improves the training error enough to be considered.

Parameters:
cm - the lift ratio matrix as returned by the getter of the WeightedPerformance class
Returns:
true iff the advantage is high enough to consider the model to be useful
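
This page does not say how the advantage is computed or what value MIN_ADVANTAGE has. The sketch below uses the usual boosting notion of advantage, the distance of the weighted error from 0.5, as a stand-in. The class ModelUsefulnessSketch, the threshold value 0.001 and the use of weighted counts instead of a lift ratio matrix are all assumptions made for illustration.

```java
// Hypothetical illustration only; the real check may be defined differently.
class ModelUsefulnessSketch {

    // Placeholder threshold; the actual constant value is not given in this document.
    static final double MIN_ADVANTAGE = 0.001;

    /**
     * @param weightedCounts weightedCounts[label][prediction] for two classes
     * @return true if the weighted error differs from 0.5 by at least MIN_ADVANTAGE
     */
    static boolean isModelUseful(double[][] weightedCounts) {
        double total = 0.0;
        double correct = 0.0;
        for (int y = 0; y < 2; y++) {
            for (int p = 0; p < 2; p++) {
                total += weightedCounts[y][p];
                if (y == p) {
                    correct += weightedCounts[y][p];
                }
            }
        }
        if (total <= 0.0) {
            return false; // no evidence at all
        }
        double error = 1.0 - correct / total;
        // A model far below chance is informative too, hence the absolute value.
        double advantage = Math.abs(0.5 - error);
        return advantage >= MIN_ADVANTAGE;
    }

    public static void main(String[] args) {
        System.out.println(isModelUseful(new double[][] {{40, 10}, {10, 40}}));
        System.out.println(isModelUseful(new double[][] {{25, 25}, {25, 25}}));
    }
}
```

A check of this kind lets the training loop drop base models that are no better than random guessing on the current weights.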



Copyright © 2001-2006