A data mining operator that trains a Support Vector Machine (SVM) for regression. The values of TheTargetAttribute serve as the target function values, and the training examples are formed from ThePredictingAttributes, all of which must belong to TheInputConcept. TheOutputAttribute contains the predicted values.
Several parameters are specific to SVMs; the table suggests reasonable values to use if nothing is known about the data or about SVMs. KernelType accepts only the following String values: dot, polynomial, neural, radial, anova. The dot kernel is the linear kernel and can be taken as the default.
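To make the kernel names concrete, the sketch below maps each KernelType string to a kernel function. The formulas and the extra parameters (degree, gamma, a, b) are assumptions based on common textbook definitions; the operator itself only exposes the kernel name, and its underlying implementation may define or parameterize these kernels differently. The helper name get_kernel is hypothetical.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]

# ASSUMPTION: textbook kernel formulas; degree, gamma, a and b are hypothetical
# parameters that the operator's table does not expose.

def dot(x: Vector, y: Vector) -> float:
    """Linear kernel: the plain inner product of x and y."""
    return float(sum(xi * yi for xi, yi in zip(x, y)))

def polynomial(x: Vector, y: Vector, degree: int = 2) -> float:
    """Polynomial kernel (x . y + 1)^degree."""
    return (dot(x, y) + 1.0) ** degree

def radial(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Radial basis kernel exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)

def neural(x: Vector, y: Vector, a: float = 1.0, b: float = 0.0) -> float:
    """Sigmoid ("neural") kernel tanh(a * x . y + b)."""
    return math.tanh(a * dot(x, y) + b)

def anova(x: Vector, y: Vector, gamma: float = 1.0, degree: int = 2) -> float:
    """ANOVA kernel (sum_i exp(-gamma * (x_i - y_i)^2))^degree."""
    total = sum(math.exp(-gamma * (xi - yi) ** 2) for xi, yi in zip(x, y))
    return total ** degree

KERNELS: Dict[str, Callable[..., float]] = {
    "dot": dot,
    "polynomial": polynomial,
    "neural": neural,
    "radial": radial,
    "anova": anova,
}

def get_kernel(kernel_type: str) -> Callable[..., float]:
    """Look up a kernel by its KernelType string; reject any other value."""
    try:
        return KERNELS[kernel_type]
    except KeyError:
        raise ValueError(f"unsupported KernelType: {kernel_type!r}") from None
```

For example, `get_kernel("dot")([1.0, 2.0], [3.0, 4.0])` returns 11.0, and `get_kernel("radial")` returns 1.0 for two identical vectors. Any string outside the five listed names is rejected, mirroring the restriction stated above.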
This operator can use two different versions of the Support Vector Machine algorithm. One runs in main memory and needs the parameter SampleSize, which sets the maximum number of training examples. The other runs in the database and is used if the optional parameter UseDB_SVM is set to the String true. This version needs the additional parameter TheKey, which gives the BaseAttribute whose column is the primary key of TheInputConcept. (TheKey can be omitted only if the ColumnSet belonging to TheInputConcept represents a table rather than a view.) The database version restricts the possible kernel types to dot and radial; it can also use the parameter SampleSize.
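The rules for choosing between the two versions can be summarized as a consistency check. This is a minimal sketch, assuming the parameters arrive as a mapping from the names in the table to String values; the function check_svm_params and its input_is_table flag are hypothetical and not part of the operator's interface.

```python
from typing import List, Mapping

ALL_KERNELS = {"dot", "polynomial", "neural", "radial", "anova"}
DB_KERNELS = {"dot", "radial"}  # the only kernels the database version supports

def check_svm_params(params: Mapping[str, str], input_is_table: bool) -> List[str]:
    """Return a list of problems; an empty list means the settings look consistent.

    HYPOTHETICAL helper: params maps table parameter names to String values;
    input_is_table is True if TheInputConcept's ColumnSet is a table, False for a view.
    """
    problems: List[str] = []
    kernel = params.get("KernelType")
    use_db_flag = params.get("UseDB_SVM", "false")  # optional; absent means in-memory
    if use_db_flag not in ("true", "false"):
        problems.append(f"UseDB_SVM must be 'true' or 'false', got {use_db_flag!r}")
    use_db = use_db_flag == "true"

    if kernel not in ALL_KERNELS:
        problems.append(f"unknown KernelType: {kernel!r}")
    elif use_db and kernel not in DB_KERNELS:
        problems.append(f"KernelType {kernel!r} is not available in the database SVM")

    if use_db:
        # TheKey may only be omitted when the input ColumnSet is a table, not a view.
        if "TheKey" not in params and not input_is_table:
            problems.append("TheKey is required when TheInputConcept is backed by a view")
    elif "SampleSize" not in params:
        # The in-memory version needs an upper bound on the number of training examples.
        problems.append("SampleSize is required for the in-memory SVM")
    return problems
```

For instance, requesting the anova kernel together with UseDB_SVM set to true yields one problem, because the database version only offers dot and radial.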
With the parameters LossFunctionPos and LossFunctionNeg, the loss function used for the regression can be biased so that predicting too high is more expensive (LossFunctionPos > LossFunctionNeg) or less expensive (LossFunctionNeg > LossFunctionPos) than predicting too low. If both values are equal, no bias is used. The parameter C balances training error against generalisation quality; positive values between 0.01 and 1000 have been used successfully in the literature. Epsilon sets the error an example may produce without being penalized; small values under 0.5 should be used.
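The interplay of LossFunctionPos, LossFunctionNeg, Epsilon and C can be illustrated with an asymmetric epsilon-insensitive loss inside the usual regularized SVR objective, shown here for a linear (dot-kernel) model. This is one common formulation, used only as a sketch; whether the operator's SVM uses exactly this form is an assumption, and the function names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

def asymmetric_eps_loss(prediction: float, target: float,
                        loss_pos: float, loss_neg: float, epsilon: float) -> float:
    """Epsilon-insensitive loss with separate weights for over- and under-prediction.

    Errors within +/- epsilon cost nothing; beyond that, predicting too high
    is weighted by loss_pos (LossFunctionPos) and predicting too low by
    loss_neg (LossFunctionNeg).
    """
    error = prediction - target
    if error > epsilon:        # predicted too high
        return loss_pos * (error - epsilon)
    if error < -epsilon:       # predicted too low
        return loss_neg * (-error - epsilon)
    return 0.0                 # inside the epsilon tube

def linear_svr_objective(w: Sequence[float], b: float,
                         examples: Iterable[Tuple[Sequence[float], float]],
                         C: float, loss_pos: float, loss_neg: float,
                         epsilon: float) -> float:
    """0.5 * ||w||^2 + C * (sum of losses): C trades training error against flatness."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    total_loss = 0.0
    for x, y in examples:
        prediction = sum(wi * xi for wi, xi in zip(w, x)) + b
        total_loss += asymmetric_eps_loss(prediction, y, loss_pos, loss_neg, epsilon)
    return regularizer + C * total_loss
```

With LossFunctionPos = 2.0, LossFunctionNeg = 1.0 and Epsilon = 0.1, overshooting the target by 1.0 costs 1.8, while undershooting it by 1.0 costs only 0.9. A larger C makes such training errors weigh more heavily relative to the regularization term.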
ParameterName | ObjType | Type | Remarks |
TheInputConcept | CON | IN | inherited |
TheTargetAttribute | BA | IN | inherited |
ThePredictingAttributes | BA List | IN | |
KernelType | V | IN | see explanation above |
SampleSize | V | IN | see explanation above |
LossFunctionPos | V | IN | positive real; try 1.0 |
LossFunctionNeg | V | IN | positive real; try 1.0 |
C | V | IN | positive real; try 1.0 |
Epsilon | V | IN | positive real; try 0.1 |
UseDB_SVM | V | IN | optional; one of true, false |
TheKey | BA | IN | optional |
TheOutputAttribute | BA | OUT | inherited |