BayesianBoosting (Yale Class Documentation)

edu.udo.cs.yale.operator.learner.meta
Class BayesianBoosting

java.lang.Object
  edu.udo.cs.yale.operator.Operator
      edu.udo.cs.yale.operator.OperatorChain
          edu.udo.cs.yale.operator.learner.meta.AbstractMetaLearner
              edu.udo.cs.yale.operator.learner.meta.BayesianBoosting

All Implemented Interfaces:: ConfigurationListener, Learner

public class BayesianBoosting
extends AbstractMetaLearner

This operator trains an ensemble of classifiers for boolean target attributes. In each iteration the training set is reweighted, so that previously discovered patterns and other kinds of prior knowledge are "sampled out" [Scholz/2005b]. An inner classifier, typically a rule or decision tree induction algorithm, is applied sequentially several times, and the resulting models are combined into a single global model. The maximum number of models to train is specified by the parameter iterations.

If the parameter rescale_label_priors is set, the example set is reweighted so that all classes are equally probable (or frequent). For two-class problems this turns the problem of fitting models to maximize weighted relative accuracy into the more common task of classifier induction [Scholz/2005a]. Applying a rule induction algorithm as the inner learner then allows subgroup discovery. This option is also recommended for data sets with class skew if a "very weak learner" like a decision stump is used. If rescale_label_priors is not set, the operator performs boosting based on probability estimates.

The estimates used by this operator may either be computed on the same set that is used for training, or in each iteration the training set may be split randomly, so that a model is fitted on the first subset and the probabilities are estimated on the second. The first solution may be advantageous in situations where data is scarce. Set the parameter ratio_internal_bootstrap to 1 to use the same set for training and estimation. Set it to a value lower than 1 to use the specified fraction of the data for training and the remaining examples for probability estimation.

If the parameter allow_marginal_skews is not set, the support of each subset defined in terms of common base model predictions does not change from one iteration to the next. Analogously, the class priors do not change. This is the procedure originally described in [Scholz/2005b] in the context of subgroup discovery. Setting allow_marginal_skews to true leads to a procedure that changes the marginal weights/probabilities of subsets if this is beneficial in a boosting context, and stratifies the two classes to be equally likely. As for AdaBoost, the total weight upper-bounds the training error in this case. This bound is reduced more quickly by the BayesianBoosting operator, however.

The operator requires an example set as its input. To sample out prior knowledge of a different form, another model can be provided as an optional additional input. The predictions of this model are used to produce an initial weighting of the training set. The output of the operator is a classification model applicable for estimating conditional class probabilities or for plain crisp classification. It contains up to the specified number of inner base models. If an optional initial model was provided, it is also stored in the output model, so that the same initial weighting is produced during model application.
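
The marginal-preserving reweighting described above for the case where allow_marginal_skews is not set can be illustrated with a small, self-contained sketch. It is not the operator's code: the class and method names are hypothetical, and it assumes a lift-based update in which each example weight is divided by lift(h, y) = P(h, y) / (P(h) P(y)), where h is the base model's prediction and y the true label. Provided every (prediction, label) combination carries positive weight, this keeps the total weight of each prediction subset and of each class unchanged while making prediction and label independent under the new weights.

```java
/** Illustrative sketch only; names are hypothetical and this is not the YALE implementation. */
final class LiftReweightingSketch {

    /**
     * Returns new weights obtained by dividing each weight by the lift of the
     * example's (prediction, label) cell. Probabilities are estimated from the
     * current weights.
     */
    static double[] reweight(double[] weights, boolean[] predicted, boolean[] label) {
        double total = 0.0;
        double[][] joint = new double[2][2]; // joint[h][y]: weight per (prediction, label) cell
        for (int i = 0; i < weights.length; i++) {
            joint[predicted[i] ? 1 : 0][label[i] ? 1 : 0] += weights[i];
            total += weights[i];
        }
        double[] newWeights = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] == 0.0) {
                continue; // zero-weight examples stay at zero and avoid a 0/0 lift
            }
            int h = predicted[i] ? 1 : 0;
            int y = label[i] ? 1 : 0;
            double pJoint = joint[h][y] / total;
            double pPred = (joint[h][0] + joint[h][1]) / total;
            double pLabel = (joint[0][y] + joint[1][y]) / total;
            double lift = pJoint / (pPred * pLabel);
            newWeights[i] = weights[i] / lift;
        }
        return newWeights;
    }
}
```

Calling LiftReweightingSketch.reweight with unit weights, the base model's predictions, and the true labels yields weights under which the base model is no better than chance, so a subsequently trained model is pushed towards patterns that the earlier model did not capture.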

Version:: $Id: BayesianBoosting.java,v 1.56 2006/04/14 15:14:32 ingomierswa Exp $
Author:: Martin Scholz

Field Summary
`static java.lang.String`	`ALLOW_MARGINAL_SKEWS` Boolean parameter that switches between KBS (if set to false) and a boosting-like reweighting.
`protected int`	`currentIteration`
`static java.lang.String`	`EQUALLY_PROB_LABELS` Boolean parameter to specify whether the label priors should be equally likely after first iteration.
`static java.lang.String`	`INTERNAL_BOOTSTRAP` Name of the flag indicating internal bootstrapping.
`static double`	`MIN_ADVANTAGE` Discard models with an advantage of less than the specified value.
`static java.lang.String`	`NUM_OF_ITERATIONS` Name of the variable specifying the maximal number of iterations of the learner.
`private double[]`	`oldWeights`
`private double`	`performance`
`private Model`	`startModel`

Constructor Summary
`BayesianBoosting(OperatorDescription description)` Constructor.

Method Summary
`private void`	`applyPriorModel(ExampleSet trainingSet, java.util.List<BayBoostBaseModelInfo> modelInfo)` Helper method applying the start model and adding it to the modelInfo collection.
`private double[]`	`createNewWeightAttribute(ExampleSet exampleSet)`
`private void`	`debugMessage(WeightedPerformanceMeasures wp)`
`int`	`getNumberOfSteps()` Returns the number of steps performed by this chain.
`java.util.List<ParameterType>`	`getParameterTypes()` Adds the parameters "number of iterations" and "model file".
`private boolean`	`isModelUseful(ContingencyMatrix cm)` Helper method to decide whether a model improves the training error enough to be considered.
`Model`	`learn(ExampleSet exampleSet)` Constructs a `Model` repeatedly running a weak learner, reweighting the training example set accordingly, and combining the hypothesis using the available weighted performance values.
`protected double[]`	`prepareWeights(ExampleSet exampleSet)` Creates a weight attribute if not yet done.
`private void`	`readOptionalParameters()` Helper method reading a start model from the input if present.
`private void`	`rescaleToEqualPriors(ExampleSet exampleSet, double[] currentPriors)`
`protected double`	`reweightExamples(WeightedPerformanceMeasures wp, ExampleSet exampleSet)` This method reweights the example set with respect to the `WeightedPerformanceMeasures` object.
`boolean`	`supportsCapability(LearnerCapability lc)` Overrides the method of the super class.
`protected Model`	`trainBaseModel(ExampleSet exampleSet)` Runs the "embedded" learner on the example set and returns a model.
`private BayBoostModel`	`trainBoostingModel(ExampleSet trainingSet, double[] classPriors)` Main method for training the ensemble classifier.

Methods inherited from class edu.udo.cs.yale.operator.learner.meta.AbstractMetaLearner
`apply, applyInnerLearner, checkLearnerCapabilities, getEstimatedPerformance, getInnerOperatorCondition, getInputClasses, getInputDescription, getMaxNumberOfInnerOperators, getMinNumberOfInnerOperators, getOutputClasses, getWeights, shouldCalculateWeights, shouldEstimatePerformance, shouldReturnInnerOutput`

Methods inherited from class edu.udo.cs.yale.operator.OperatorChain
addAddListener, addOperator, addOperator, checkDeprecations, checkIO, checkNumberOfInnerOperators, checkProperties, clearErrorList, clearStepCounter, cloneOperator, countStep, createExperimentTree, delete, experimentFinished, experimentStarts, getAllInnerOperators, getCurrentStep, getIndexOfOperator, getInnerOperatorForName, getInnerOperatorsXML, getNumberOfAllOperators, getNumberOfChildrensSteps, getNumberOfOperators, getOperator, getOperatorFromAll, getOperators, isEnabled, performAdditionalChecks, removeAddListener, removeOperator, setEnabled, setExperiment

Methods inherited from class edu.udo.cs.yale.operator.Operator
addError, addValue, addWarning, apply, createExperimentTree, createFromXML, createMarkedExperimentTree, getAddOnlyAdditionalOutput, getApplyCount, getDeliveredOutputClasses, getDeprecationInfo, getDesiredInputClasses, getErrorList, getExperiment, getInput, getInput, getInput, getIOContainerForInApplyLoopBreakpoint, getName, getOperatorClassName, getOperatorDescription, getParameter, getParameterAsBoolean, getParameterAsColor, getParameterAsDouble, getParameterAsFile, getParameterAsInt, getParameterAsString, getParameterList, getParameters, getParameterType, getParent, getStartTime, getStatus, getUserDescription, getValue, getValues, getXML, hasBreakpoint, hasBreakpoint, hasInput, inApplyLoop, isParameterSet, logMessage, register, remove, rename, resume, setBreakpoint, setInput, setListParameter, setOperatorParameters, setParameter, setParameters, setParent, setUserDescription, toString, writeXML

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait`

Methods inherited from interface edu.udo.cs.yale.operator.learner.Learner
`getName`

Field Detail

NUM_OF_ITERATIONS

public static final java.lang.String NUM_OF_ITERATIONS

Name of the variable specifying the maximal number of iterations of the learner.

See Also:: Constant Field Values

INTERNAL_BOOTSTRAP

public static final java.lang.String INTERNAL_BOOTSTRAP

Name of the flag indicating internal bootstrapping.

See Also:: Constant Field Values
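
The random split controlled by ratio_internal_bootstrap, as described in the class documentation, might look like the following sketch. The class name, the method signature, and the explicit seed are assumptions made for illustration; how the operator actually draws the subsets is not documented here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative sketch only; names are hypothetical and this is not the YALE implementation. */
final class InternalSplitSketch {

    /**
     * Returns two lists of example indices: element 0 for model fitting and
     * element 1 for probability estimation. A ratio of 1 (or more) reuses the
     * full set for both purposes.
     */
    static List<List<Integer>> split(int numExamples, double ratio, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numExamples; i++) {
            indices.add(i);
        }
        List<List<Integer>> result = new ArrayList<>();
        if (ratio >= 1.0) {
            // Same examples are used for training and for estimation.
            result.add(indices);
            result.add(new ArrayList<>(indices));
            return result;
        }
        // Shuffle once, then cut at the requested fraction.
        Collections.shuffle(indices, new Random(seed));
        int cut = (int) Math.round(ratio * numExamples);
        result.add(new ArrayList<>(indices.subList(0, cut)));
        result.add(new ArrayList<>(indices.subList(cut, numExamples)));
        return result;
    }
}
```

With a ratio below 1 the two index lists are disjoint, so the probability estimates are computed on examples the base model has not seen, at the cost of fitting on less data.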

EQUALLY_PROB_LABELS

public static final java.lang.String EQUALLY_PROB_LABELS

Boolean parameter to specify whether the label priors should be equally likely after first iteration.

See Also:: Constant Field Values

ALLOW_MARGINAL_SKEWS

public static final java.lang.String ALLOW_MARGINAL_SKEWS

Boolean parameter that switches between KBS (if set to false) and a boosting-like reweighting.

See Also:: Constant Field Values

MIN_ADVANTAGE

public static final double MIN_ADVANTAGE

Discard models with an advantage of less than the specified value.

See Also:: Constant Field Values

startModel

private Model startModel

currentIteration

protected int currentIteration

performance

private double performance

oldWeights

private double[] oldWeights

Constructor Detail

BayesianBoosting

public BayesianBoosting(OperatorDescription description)

Constructor.

Method Detail

supportsCapability

public boolean supportsCapability(LearnerCapability lc)

Overrides the method of the super class. Returns true for polynominal class.

Specified by:: supportsCapability in interface Learner
Overrides:: supportsCapability in class AbstractMetaLearner

getParameterTypes

public java.util.List<ParameterType> getParameterTypes()

Adds the parameters "number of iterations" and "model file".

Overrides:: getParameterTypes in class Operator

getNumberOfSteps

public int getNumberOfSteps()

Description copied from class: OperatorChain

Returns the number of steps performed by this chain.

Overrides:: getNumberOfSteps in class AbstractMetaLearner

See Also:: OperatorChain.getNumberOfSteps()

learn

public Model learn(ExampleSet exampleSet)
            throws OperatorException

Constructs a Model repeatedly running a weak learner, reweighting the training example set accordingly, and combining the hypothesis using the available weighted performance values. If the input contains a model, then this model is used as a starting point for weighting the examples.

Throws:: OperatorException

prepareWeights

protected double[] prepareWeights(ExampleSet exampleSet)

Creates a weight attribute if not yet done. It either backs up the old weights for restoring them later, or it fills the newly created attribute with the initial value of 1. If rescaling to equal class priors is activated, the weights are set accordingly.

Parameters:: exampleSet - the example set to be prepared
Returns:: a double[] array containing the class priors.
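
As a rough illustration of what rescaling to equal class priors means, the following sketch computes weighted class priors and then rescales the weights so that every class carries the same total weight. The class and method names are hypothetical, labels are represented as integer class indices for simplicity, and the sketch assumes that every class occurs with positive weight; the operator's own helper methods may work differently.

```java
/** Illustrative sketch only; names are hypothetical and this is not the YALE implementation. */
final class EqualPriorsSketch {

    /** Returns the fraction of the total weight that falls on each class. */
    static double[] classPriors(double[] weights, int[] labels, int numClasses) {
        double[] priors = new double[numClasses];
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            priors[labels[i]] += weights[i];
            total += weights[i];
        }
        for (int c = 0; c < numClasses; c++) {
            priors[c] /= total;
        }
        return priors;
    }

    /**
     * Rescales the weights in place so that each class ends up with a share of
     * 1 / numClasses of the total weight. Assumes all priors are positive.
     */
    static void rescale(double[] weights, int[] labels, double[] priors) {
        int numClasses = priors.length;
        for (int i = 0; i < weights.length; i++) {
            weights[i] /= numClasses * priors[labels[i]];
        }
    }
}
```

For a two-class set with priors 0.75 and 0.25, examples of the frequent class are scaled down and examples of the rare class are scaled up, while the total weight stays the same.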

createNewWeightAttribute

private double[] createNewWeightAttribute(ExampleSet exampleSet)

rescaleToEqualPriors

private void rescaleToEqualPriors(ExampleSet exampleSet,
                                  double[] currentPriors)

trainBaseModel

protected Model trainBaseModel(ExampleSet exampleSet)
                        throws OperatorException

Runs the "embedded" learner on the example set and returns a model.

Parameters:: exampleSet - an ExampleSet to train a model for
Returns:: a Model
Throws:: OperatorException

readOptionalParameters

private void readOptionalParameters()

Helper method reading a start model from the input if present.

applyPriorModel

private void applyPriorModel(ExampleSet trainingSet,
                             java.util.List<BayBoostBaseModelInfo> modelInfo)
                      throws OperatorException

Helper method applying the start model and adding it to the modelInfo collection.

Throws:: OperatorException

trainBoostingModel

private BayBoostModel trainBoostingModel(ExampleSet trainingSet,
                                         double[] classPriors)
                                  throws OperatorException

Main method for training the ensemble classifier.

Throws:: OperatorException

debugMessage

private void debugMessage(WeightedPerformanceMeasures wp)

reweightExamples

protected double reweightExamples(WeightedPerformanceMeasures wp,
                                  ExampleSet exampleSet)
                           throws OperatorException

This method reweights the example set with respect to the WeightedPerformanceMeasures object. Please note that the weights will not be reset at any time, because they continuously change from one iteration to the next. This method does not change the priors of the classes.

Parameters:: wp - the WeightedPerformanceMeasures to use; exampleSet - ExampleSet to be reweighted
Returns:: the total weight of examples as an error estimate
Throws:: OperatorException

isModelUseful

private boolean isModelUseful(ContingencyMatrix cm)

Helper method to decide whether a model improves the training error enough to be considered.

Parameters:: cm - the lift ratio matrix as returned by the getter of the WeightedPerformance class
Returns:: true iff the advantage is high enough to consider the model to be useful
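
The idea of discarding models with too small an advantage can be sketched as follows. This is an assumption-laden illustration, not the operator's check: the documented method works on a lift ratio matrix, whereas this sketch uses the common boosting notion of advantage as the distance of the weighted error from 0.5 (random guessing on two balanced classes). The class name, the method names, and the threshold value are all hypothetical.

```java
/** Illustrative sketch only; names and the threshold are hypothetical, not taken from YALE. */
final class AdvantageSketch {

    /** Hypothetical threshold; the real MIN_ADVANTAGE value is not given in this document. */
    static final double HYPOTHETICAL_MIN_ADVANTAGE = 0.001;

    /** Distance of the weighted error rate from 0.5. */
    static double advantage(double[] weights, boolean[] predicted, boolean[] label) {
        double total = 0.0;
        double wrong = 0.0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i];
            if (predicted[i] != label[i]) {
                wrong += weights[i];
            }
        }
        double error = wrong / total;
        // A model that is consistently wrong is as informative as one that is
        // consistently right, hence the absolute value.
        return Math.abs(0.5 - error);
    }

    /** True if the advantage reaches the hypothetical threshold. */
    static boolean isUseful(double[] weights, boolean[] predicted, boolean[] label) {
        return advantage(weights, predicted, label) >= HYPOTHETICAL_MIN_ADVANTAGE;
    }
}
```

A base model whose weighted error is exactly 0.5 has an advantage of 0 under this definition and would be discarded.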