Operator SupportVectorMachineForRegression

A data mining operator. Values in TheTargetAttribute are used as target function values to train the SVM on examples that are formed with ThePredictingAttributes. All ThePredictingAttributes must belong to TheInputConcept. TheOutputAttribute contains the predicted values.

There are some SVM-specific parameters; the parameter table below gives reasonable values to choose if nothing is known about the data or SVMs. For KernelType, only the following String values are possible: dot, polynomial, neural, radial, anova. Dot is the linear kernel and can be taken as the default.
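To illustrate what two of the kernel types compute, here is a minimal Python sketch of the dot (linear) kernel and a radial basis function kernel in their textbook forms. The function names and the gamma argument are assumptions made for this illustration; this documentation does not state the exact kernel formulas or kernel-specific parameters the operator uses.

```python
import math


def dot_kernel(x, y):
    """Linear kernel: the plain inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def radial_kernel(x, y, gamma=1.0):
    """Textbook radial basis function kernel.

    gamma is a hypothetical width parameter, not one of the operator's
    documented parameters.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    print(dot_kernel([1.0, 2.0], [3.0, 4.0]))
    print(radial_kernel([1.0, 2.0], [1.0, 2.0]))
```

The dot kernel makes the SVM fit a linear function of ThePredictingAttributes, which is why it is a reasonable default when nothing is known about the data.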

This operator can use two different versions of the Support Vector Machine algorithm. One runs in main memory; it needs the parameter SampleSize to determine a maximum number of training examples. The other runs in the database; it is used if the optional parameter UseDB_SVM is set to the String true. When this version is used, an additional parameter TheKey is needed which gives the BaseAttribute whose column is the primary key of TheInputConcept. (TheKey can be left out only if the ColumnSet that belongs to TheInputConcept represents a table rather than a view.) The database algorithm restricts the possible kernel types to dot and radial. It can also use the parameter SampleSize.
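The constraints described above can be summarised as a small consistency check. The following Python sketch is only an illustration: the function check_svm_parameters, the dictionary representation of the parameters and the input_is_table flag are hypothetical and not part of the operator or its system.

```python
ALL_KERNELS = {"dot", "polynomial", "neural", "radial", "anova"}
DB_KERNELS = {"dot", "radial"}  # kernels supported by the database version


def check_svm_parameters(params, input_is_table=False):
    """Return a list of problems in a parameter dict (empty if none).

    params maps parameter names to their values; input_is_table says
    whether the ColumnSet of TheInputConcept is a table rather than a view.
    """
    problems = []
    kernel = params.get("KernelType")
    if kernel not in ALL_KERNELS:
        problems.append("KernelType must be one of " + ", ".join(sorted(ALL_KERNELS)))

    # The database version is selected only by the String "true".
    if params.get("UseDB_SVM") == "true":
        if kernel in ALL_KERNELS and kernel not in DB_KERNELS:
            problems.append("database version supports only dot and radial kernels")
        if "TheKey" not in params and not input_is_table:
            problems.append("TheKey is required when TheInputConcept is a view")
    elif "SampleSize" not in params:
        problems.append("main-memory version needs SampleSize")
    return problems


if __name__ == "__main__":
    print(check_svm_parameters({"KernelType": "anova", "UseDB_SVM": "true", "TheKey": "id"}))
```

Returning a list of messages rather than raising on the first problem lets a caller report every inconsistency at once.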

With the parameters LossFunctionPos and LossFunctionNeg, the loss function that is used for the regression can be biased such that predicting too high is more expensive (LossFunctionPos > LossFunctionNeg) or less expensive (LossFunctionNeg > LossFunctionPos) than predicting too low. If both values are equal, no bias is used. The parameter C balances training error against generalisation quality; positive values between 0.01 and 1000 have been used successfully in the literature. Epsilon limits the allowed error an example may produce; small values under 0.5 should be used.
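The effect of LossFunctionPos, LossFunctionNeg and Epsilon can be illustrated with a common form of asymmetric epsilon-insensitive loss. The Python sketch below is an assumption about the general shape of such a loss; the exact formula used by the operator is not given in this documentation, and the function name is hypothetical.

```python
def asymmetric_epsilon_loss(predicted, target, epsilon=0.1,
                            loss_pos=1.0, loss_neg=1.0):
    """Loss of one prediction under an asymmetric epsilon-insensitive scheme.

    Errors within +/- epsilon cost nothing. Beyond that, predicting too
    high is weighted by loss_pos and predicting too low by loss_neg.
    """
    error = predicted - target
    if error > epsilon:       # prediction too high
        return loss_pos * (error - epsilon)
    if error < -epsilon:      # prediction too low
        return loss_neg * (-error - epsilon)
    return 0.0                # inside the epsilon tube


if __name__ == "__main__":
    # Over-prediction by 1.0 with a doubled positive weight.
    print(asymmetric_epsilon_loss(2.0, 1.0, epsilon=0.5, loss_pos=2.0))
    # Under-prediction by 1.0 with the default negative weight.
    print(asymmetric_epsilon_loss(0.0, 1.0, epsilon=0.5))
```

With loss_pos greater than loss_neg, the same absolute error costs more when the prediction is too high, so training is pushed towards lower predictions; equal weights give the unbiased case.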



Parameters

Parameter                Object         Type    optional  min_arg  max_arg  looped  Remarks
TheInputConcept          Concept        Input   no        1        1        no      inherited
TheTargetAttribute       BaseAttribute  Input   no        1        1        yes     inherited
ThePredictingAttributes  BaseAttribute  Input   no        1                 yes
KernelType               Value          Input   no        1        1        yes     one of these values: dot, polynomial, neural, radial or anova; see explanation above
SampleSize               Value          Input   yes       0        1        yes     see explanation above
LossFunctionPos          Value          Input   no        1        1        yes     positive real; try 1.0
LossFunctionNeg          Value          Input   no        1        1        yes     positive real; try 1.0
C                        Value          Input   no        1        1        yes     positive real; try 1.0
Epsilon                  Value          Input   no        1        1        yes     positive real; try 0.1
TheOutputAttribute       BaseAttribute  Output  no        1        1        yes     inherited
UseDB_SVM                Value          Input   yes       0        1        yes
TheKey                   BaseAttribute  Input   yes       0        1        yes     optional

The operator SupportVectorMachineForRegression is loopable.