java.lang.Object
  edu.udo.cs.yale.operator.Operator
    edu.udo.cs.yale.operator.OperatorChain
      edu.udo.cs.yale.operator.learner.meta.AbstractMetaLearner
        edu.udo.cs.yale.operator.learner.meta.BayesianBoosting
public class BayesianBoosting
This operator trains an ensemble of classifiers for boolean target
attributes. In each iteration the training set is reweighted so that
previously discovered patterns and other kinds of prior knowledge are
"sampled out" [Scholz/2005b]. An inner classifier, typically a rule or
decision tree induction algorithm, is applied sequentially several times,
and the resulting models are combined into a single global model. The
maximum number of models to be trained is specified by the parameter
iterations.
If the parameter rescale_label_priors is set, then the example set is
reweighted so that all classes are equally probable (or frequent). For
two-class problems this turns the problem of fitting models to maximize
weighted relative accuracy into the more common task of classifier induction
[Scholz/2005a]. Using a rule induction algorithm as the inner learner allows
subgroup discovery. This option is also recommended for data sets with class
skew if a "very weak learner" such as a decision stump is used. If
rescale_label_priors is not set, then the operator performs boosting based
on probability estimates.
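To make the effect of rescale_label_priors concrete, here is a minimal sketch of reweighting a weighted example set so that every class carries the same total weight. It works on plain arrays rather than YALE's ExampleSet; the class EqualPriorRescaling and its method are hypothetical. The page lists a private helper rescaleToEqualPriors, but its body is not shown, so this is an illustration of the stated goal, not the operator's code.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: rescale example weights so that every class carries the
 * same total weight, keeping the relative weights within a class unchanged.
 */
class EqualPriorRescaling {

    /**
     * @param labels     class index of each example, in 0 .. numClasses-1
     * @param weights    current non-negative weight of each example
     * @param numClasses number of classes (2 for a boolean target)
     * @return new weights; each class with positive weight sums to totalWeight / numClasses
     */
    static double[] rescaleToEqualPriors(int[] labels, double[] weights, int numClasses) {
        double total = 0.0;
        double[] classWeight = new double[numClasses];
        for (int i = 0; i < labels.length; i++) {
            classWeight[labels[i]] += weights[i];
            total += weights[i];
        }
        double target = total / numClasses; // desired total weight per class
        double[] rescaled = new double[weights.length];
        for (int i = 0; i < labels.length; i++) {
            double cw = classWeight[labels[i]];
            // A class whose total weight is zero cannot be scaled up; its examples stay at 0.
            rescaled[i] = cw > 0.0 ? weights[i] * target / cw : 0.0;
        }
        return rescaled;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 0, 1};              // skewed: three negatives, one positive
        double[] weights = {1.0, 1.0, 1.0, 1.0};
        // Each negative gets 2/3 and the positive gets 2.0, so both classes sum to 2.0.
        System.out.println(Arrays.toString(rescaleToEqualPriors(labels, weights, 2)));
    }
}
```

Scaling within each class (instead of resampling) keeps the relative importance of examples of the same class, which matters once earlier iterations have already produced non-uniform weights.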
The estimates used by this operator may be computed either on the same set
that is used for training, or, in each iteration, the training set may be
split randomly so that a model is fitted on the first subset and the
probabilities are estimated on the second. The first option may be
advantageous when data is scarce. Set the parameter ratio_internal_bootstrap
to 1 to use the same set for training and estimation. Set it to a value lower
than 1 to use the specified fraction of the data for training and the
remaining examples for probability estimation.
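The split controlled by ratio_internal_bootstrap can be sketched as follows, under the assumption that the "first subset" is a uniformly random fraction of the examples drawn by shuffling indices. The class InternalSplit is hypothetical and works on example indices only; the page does not say how the operator actually draws the split.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch of a random split: a fraction `ratio` of the example
 * indices is used to fit the base model, the rest to estimate probabilities.
 * With ratio == 1 both roles use all examples.
 */
class InternalSplit {

    /** Index lists of the two subsets. */
    static final class Split {
        final List<Integer> training;
        final List<Integer> estimation;

        Split(List<Integer> training, List<Integer> estimation) {
            this.training = training;
            this.estimation = estimation;
        }
    }

    static Split split(int numExamples, double ratio, Random rng) {
        if (ratio <= 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be in (0, 1]");
        }
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < numExamples; i++) {
            all.add(i);
        }
        if (ratio == 1.0) {
            // Same examples for fitting and for estimation.
            return new Split(all, all);
        }
        // The caller would draw a fresh split like this in every boosting iteration.
        Collections.shuffle(all, rng);
        int cut = (int) Math.round(ratio * numExamples);
        return new Split(new ArrayList<>(all.subList(0, cut)),
                         new ArrayList<>(all.subList(cut, numExamples)));
    }

    public static void main(String[] args) {
        Split s = split(10, 0.7, new Random(42));
        System.out.println(s.training.size() + " for training, "
                + s.estimation.size() + " for estimation");
    }
}
```

The two subsets are disjoint when ratio is below 1, so the probability estimates are not computed on the examples the base model was fitted to; the price is that each part sees less data.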
If the parameter allow_marginal_skews is not set, then the support of each
subset defined in terms of common base model predictions does not change from
one iteration to the next. Analogously, the class priors do not change. This
is the procedure originally described in [Scholz/2005b] in the context of
subgroup discovery. Setting the allow_marginal_skews option to true leads to a
procedure that changes the marginal weights/probabilities of subsets if this
is beneficial in a boosting context, and stratifies the two classes to be
equally likely. As for AdaBoost, the total weight upper-bounds the training
error in this case. However, the BayesianBoosting operator reduces this bound
more quickly.
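The invariants stated for allow_marginal_skews unset (subset supports and class priors both unchanged) are satisfied by dividing each weight by the lift ratio of its (prediction, label) cell, i.e. multiplying by P(p)P(y)/P(p,y). The sketch below assumes a boolean base model and a boolean label and uses plain arrays; the class LiftReweighting is hypothetical, and this is a reconstruction consistent with the description above, not code taken from the operator.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of lift-based reweighting for a boolean base model and a
 * boolean label. An example in cell (prediction p, label y) has its weight
 * multiplied by P(p) * P(y) / P(p, y). Afterwards predictions and labels are
 * independent under the new weights, while the marginals P(p) and P(y) stay
 * the same (provided every cell of the 2x2 table has positive weight).
 */
class LiftReweighting {

    static double[] reweight(boolean[] predictions, boolean[] labels, double[] weights) {
        double total = 0.0;
        double[][] joint = new double[2][2]; // joint[p][y]: weight mass of each cell
        for (int i = 0; i < weights.length; i++) {
            joint[idx(predictions[i])][idx(labels[i])] += weights[i];
            total += weights[i];
        }
        double[] predMass = {joint[0][0] + joint[0][1], joint[1][0] + joint[1][1]};
        double[] labelMass = {joint[0][0] + joint[1][0], joint[0][1] + joint[1][1]};

        double[] result = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            int p = idx(predictions[i]);
            int y = idx(labels[i]);
            double cell = joint[p][y];
            // P(p)P(y)/P(p,y) written with unnormalized masses; an empty cell keeps weight 0.
            result[i] = cell > 0.0 ? weights[i] * predMass[p] * labelMass[y] / (cell * total) : 0.0;
        }
        return result;
    }

    private static int idx(boolean b) {
        return b ? 1 : 0;
    }

    public static void main(String[] args) {
        boolean[] predictions = {true, true, true, false, false};
        boolean[] labels = {true, true, false, false, true};
        double[] w = reweight(predictions, labels, new double[] {1, 1, 1, 1, 1});
        System.out.println(Arrays.toString(w)); // approximately [0.9, 0.9, 1.2, 0.8, 1.2]
    }
}
```

In the usage example, examples predicted true still sum to 3 and examples labeled true still sum to 3, but correctly predicted examples lose weight relative to misclassified ones. Applying the function again to its own output would leave the weights unchanged, since the prediction is then already independent of the label.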
The operator requires an example set as its input. To sample out prior
knowledge of a different form, another model can be provided as an optional
additional input. The predictions of this model are used to produce an
initial weighting of the training set. The output of the operator is a
classification model that can be used to estimate conditional class
probabilities or for plain crisp classification. It contains up to the
specified number of inner base models. If an initial model is provided, it is
also stored in the output model so that the same initial weighting is
produced during model application.
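One way such an ensemble can yield conditional class probabilities is to start from the prior odds of the positive class and multiply in, for each base model, the ratio of its lift values for the two labels given the prediction it made. The sketch below assumes this naive-Bayes-style combination and a 2x2 lift matrix per model; the class OddsCombination is hypothetical and the page does not document how BayBoostModel actually combines its base models. An optional initial model would simply be one more entry in the lifts array.

```java
/**
 * Hypothetical sketch: turn the predictions of several boolean base models into
 * an estimate of P(label = true). Each model m is summarized by a 2x2 lift
 * matrix lifts[m][p][y] = P(y | p) / P(y), and the prior odds are multiplied by
 * lifts[m][p][1] / lifts[m][p][0] for the prediction p the model makes.
 */
class OddsCombination {

    /**
     * @param priorPositive P(label = true) before any model is applied, strictly between 0 and 1
     * @param lifts         one lift matrix per base model, indexed [prediction][label]
     * @param predictions   the prediction of each base model for one example
     */
    static double positiveProbability(double priorPositive, double[][][] lifts, boolean[] predictions) {
        double odds = priorPositive / (1.0 - priorPositive);
        for (int m = 0; m < lifts.length; m++) {
            int p = predictions[m] ? 1 : 0;
            // Likelihood ratio of this model's prediction for label true vs. label false.
            odds *= lifts[m][p][1] / lifts[m][p][0];
        }
        return odds / (1.0 + odds);
    }

    public static void main(String[] args) {
        // With prior 0.5: prediction true -> P(true | p) = 0.75, prediction false -> 0.25.
        double[][] lift = {{1.5, 0.5}, {0.5, 1.5}};
        double[][][] twoModels = {lift, lift};
        System.out.println(positiveProbability(0.5, twoModels, new boolean[] {true, true}));
    }
}
```

For a single model the result equals P(y = true | p) exactly; multiplying the ratios of several models is only justified if each model is evaluated on a distribution from which the earlier models have been sampled out, which is the purpose of the reweighting described above. A crisp prediction follows by thresholding the probability at 0.5.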
Field Summary | |
---|---|
static java.lang.String | ALLOW_MARGINAL_SKEWS: Boolean parameter that switches between KBS (knowledge-based sampling, if set to false) and a boosting-like reweighting. |
protected int | currentIteration |
static java.lang.String | EQUALLY_PROB_LABELS: Boolean parameter to specify whether the label priors should be equally likely after the first iteration. |
static java.lang.String | INTERNAL_BOOTSTRAP: Name of the flag indicating internal bootstrapping. |
static double | MIN_ADVANTAGE: Discard models with an advantage of less than the specified value. |
static java.lang.String | NUM_OF_ITERATIONS: Name of the variable specifying the maximal number of iterations of the learner. |
private double[] | oldWeights |
private double | performance |
private Model | startModel |
Constructor Summary | |
---|---|
BayesianBoosting(OperatorDescription description) | Constructor. |
Method Summary | |
---|---|
private void | applyPriorModel(ExampleSet trainingSet, java.util.List<BayBoostBaseModelInfo> modelInfo): Helper method applying the start model and adding it to the modelInfo collection. |
private double[] | createNewWeightAttribute(ExampleSet exampleSet) |
private void | debugMessage(WeightedPerformanceMeasures wp) |
int | getNumberOfSteps(): Returns the number of steps performed by this chain. |
java.util.List<ParameterType> | getParameterTypes(): Adds the parameters "number of iterations" and "model file". |
private boolean | isModelUseful(ContingencyMatrix cm): Helper method to decide whether a model improves the training error enough to be considered. |
Model | learn(ExampleSet exampleSet): Constructs a Model by repeatedly running a weak learner, reweighting the training example set accordingly, and combining the hypotheses using the available weighted performance values. |
protected double[] | prepareWeights(ExampleSet exampleSet): Creates a weight attribute if not yet done. |
private void | readOptionalParameters(): Helper method reading a start model from the input if present. |
private void | rescaleToEqualPriors(ExampleSet exampleSet, double[] currentPriors) |
protected double | reweightExamples(WeightedPerformanceMeasures wp, ExampleSet exampleSet): This method reweights the example set with respect to the WeightedPerformanceMeasures object. |
boolean | supportsCapability(LearnerCapability lc): Overrides the method of the super class. |
protected Model | trainBaseModel(ExampleSet exampleSet): Runs the "embedded" learner on the example set and returns a model. |
private BayBoostModel | trainBoostingModel(ExampleSet trainingSet, double[] classPriors): Main method for training the ensemble classifier. |
Methods inherited from class edu.udo.cs.yale.operator.learner.meta.AbstractMetaLearner |
---|
apply, applyInnerLearner, checkLearnerCapabilities, getEstimatedPerformance, getInnerOperatorCondition, getInputClasses, getInputDescription, getMaxNumberOfInnerOperators, getMinNumberOfInnerOperators, getOutputClasses, getWeights, shouldCalculateWeights, shouldEstimatePerformance, shouldReturnInnerOutput |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Methods inherited from interface edu.udo.cs.yale.operator.learner.Learner |
---|
getName |
Field Detail |
---|
public static final java.lang.String NUM_OF_ITERATIONS
public static final java.lang.String INTERNAL_BOOTSTRAP
public static final java.lang.String EQUALLY_PROB_LABELS
public static final java.lang.String ALLOW_MARGINAL_SKEWS
public static final double MIN_ADVANTAGE
private Model startModel
protected int currentIteration
private double performance
private double[] oldWeights
Constructor Detail |
---|
public BayesianBoosting(OperatorDescription description)
Method Detail |
---|
public boolean supportsCapability(LearnerCapability lc)
Overrides the method of the super class.
Specified by: supportsCapability in interface Learner
Overrides: supportsCapability in class AbstractMetaLearner
public java.util.List<ParameterType> getParameterTypes()
Adds the parameters "number of iterations" and "model file".
Overrides: getParameterTypes in class Operator
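public int getNumberOfSteps()
Description copied from class: OperatorChain
Returns the number of steps performed by this chain.
Overrides: getNumberOfSteps in class AbstractMetaLearner
See Also: OperatorChain.getNumberOfSteps()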
public Model learn(ExampleSet exampleSet) throws OperatorException
Constructs a Model by repeatedly running a weak learner, reweighting the
training example set accordingly, and combining the hypotheses using the
available weighted performance values. If the input contains a model, then
this model is used as a starting point for weighting the examples.
Throws: OperatorException
protected double[] prepareWeights(ExampleSet exampleSet)
Creates a weight attribute if not yet done.
Parameters: exampleSet - the example set to be prepared
Returns: a double[] array containing the class priors.
private double[] createNewWeightAttribute(ExampleSet exampleSet)
private void rescaleToEqualPriors(ExampleSet exampleSet, double[] currentPriors)
protected Model trainBaseModel(ExampleSet exampleSet) throws OperatorException
Runs the "embedded" learner on the example set and returns a model.
Parameters: exampleSet - an ExampleSet to train a model for
Returns: a Model
Throws: OperatorException
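private void readOptionalParameters()
Helper method reading a start model from the input if present.
private void applyPriorModel(ExampleSet trainingSet, java.util.List<BayBoostBaseModelInfo> modelInfo) throws OperatorException
Helper method applying the start model and adding it to the modelInfo collection.
Throws: OperatorException
private BayBoostModel trainBoostingModel(ExampleSet trainingSet, double[] classPriors) throws OperatorException
Main method for training the ensemble classifier.
Throws: OperatorException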
private void debugMessage(WeightedPerformanceMeasures wp)
protected double reweightExamples(WeightedPerformanceMeasures wp, ExampleSet exampleSet) throws OperatorException
This method reweights the example set with respect to the
WeightedPerformanceMeasures object. Please note that the weights are never
reset, because they change continuously from one iteration to the next. This
method does not change the priors of the classes.
Parameters: wp - the WeightedPerformanceMeasures to use
exampleSet - the ExampleSet to be reweighted
Throws: OperatorException
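private boolean isModelUseful(ContingencyMatrix cm)
Helper method to decide whether a model improves the training error enough to be considered.
Parameters: cm - the lift ratio matrix as returned by the getter of the WeightedPerformance class
Returns: true iff the advantage is high enough to consider the model useful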
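The page does not define the "advantage" that isModelUseful compares against MIN_ADVANTAGE. As a purely hypothetical illustration, the sketch below measures it as the largest absolute deviation of any lift ratio from 1, since a model whose lift ratios are all 1 makes predictions independent of the label and carries no information. The class UsefulnessCheck and this definition are assumptions, not the operator's actual criterion.

```java
/**
 * Hypothetical usefulness test for a base model. The "advantage" used here
 * (largest absolute deviation of any lift ratio from 1) is an assumption made
 * for illustration; it is not taken from the BayesianBoosting source.
 */
class UsefulnessCheck {

    static boolean isModelUseful(double[][] liftMatrix, double minAdvantage) {
        double advantage = 0.0;
        for (double[] row : liftMatrix) {
            for (double lift : row) {
                // A lift of exactly 1 means the prediction carries no information about that label.
                advantage = Math.max(advantage, Math.abs(lift - 1.0));
            }
        }
        return advantage >= minAdvantage;
    }

    public static void main(String[] args) {
        double[][] uninformative = {{1.0, 1.0}, {1.0, 1.0}};
        double[][] informative = {{1.5, 0.5}, {0.5, 1.5}};
        System.out.println(isModelUseful(uninformative, 0.01) + " " + isModelUseful(informative, 0.01));
    }
}
```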