mySVM - a support vector machine
by Stefan Rüping, rueping@ls8.cs.uni-dortmund.de
News
- A Java version of mySVM is part of the YaLE learning environment under the name JmySVM.
- If you are using a database to store your data, try mySVM/db, a Java implementation of mySVM designed to run inside the database.
- Download the latest release of mySVM (Version 2.1.4, June 24th, 2004)
- Download the binary version for Windows
- See a list of changes
About mySVM
mySVM is an implementation of the Support Vector Machine introduced by V. Vapnik (see [Vapnik/98a]). It is based on the optimization algorithm of SVMlight as described in [Joachims/99a]. mySVM can be used for pattern recognition, regression and distribution estimation.
License
This software is free only for non-commercial use. It must not be modified and distributed without prior permission of the author. The author is not responsible for implications from the use of this software.
If you are using mySVM for research purposes, please cite the software manual available from this site in your publications (Stefan Rüping (2000): mySVM-Manual, University of Dortmund, Lehrstuhl Informatik 8, http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/).
Installation
Installation under Unix
- Download mySVM.
- Create a new directory, change into it and unpack the files into this directory.
- On typical UN*X systems simply type make to compile mySVM. On other systems you have to call your C++ compiler manually.
If everything went right you should have a new subdirectory named bin and two files mysvm and predict in a subdirectory thereof.
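On a typical Unix system the whole installation boils down to a few commands (a sketch; the archive name mysvm.tar.gz is an assumption and may differ for the release you downloaded):

  mkdir mysvm                 # create a new directory
  cd mysvm                    # change into it
  tar xzf ../mysvm.tar.gz     # unpack the files (assumed archive name)
  make                        # compile mySVM
  ls bin/                     # mysvm and predict appear in a subdirectory below bin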
On some systems you might get an error message about sys/times.h. If you do, open the file globals.h and uncomment the line #undef use_time.
Installation under Windows
If you get the source code version, you have to compile mySVM yourself. First edit the file globals.h and uncomment the line #define windows 1. Compile the file learn.cpp to get the learning program and predict.cpp for the model application program. mySVM was tested under Visual C++ 6.0. You can also get the binary version.
Using mySVM
For a complete reference of mySVM have a look into the mySVM manual (Postscript, PDF). Here is a short user's guide:
- mysvm is used for training an SVM on a given example set and testing the results
- predict is used for predicting the functional value of new examples based on an already trained SVM.
The input of mySVM consists of
- a parameter definition
- a kernel definition
- one or more example sets
Input lines starting with "#" are treated as comments. The input can be given in one or more files. If no filename or the filename "-" is given, the input is read from stdin.
mysvm trains an SVM on the first given example set. The following example sets are used for testing (if their classification is given), or the functional value of the examples is computed (if no classification is given).
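For example, if the parameter and kernel definitions are stored in params.dat and the example sets in examples.dat, a training run could be started like this (a sketch; the file names are illustrative):

  mysvm params.dat examples.dat    # input split over two files
  mysvm - < input.dat              # read the complete input from stdin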
Parameter definition
The parameter definition lets the user choose the type of loss function, the optimizer parameters and the training algorithm to use.
The parameter definition starts with the line @parameters.
Global parameters:
pattern | use SVM for pattern recognition, y has to be in {-1,1}. |
regression | use regression SVM (default) |
nu float | use nu-SVM with the given value of nu instead of normal SVM (see [Schoelkopf/etal/2000a] for details on nu-SVMs). |
distribution | estimate the support of the distribution of the training examples (see [Schoelkopf/etal/99a]). Nu must be set! |
verbosity [1..5] | ranges from 1 (no messages) through 3 (default) to 5 (flood, for debugging only) |
scale | scale the training examples to mean 0 and variance 1 (default) |
no_scale | do not scale the training examples (may be numerically less stable!) |
format | set the default example file format. See the description in the example sets section below. |
delimiter | set the default attribute delimiter. See the description in the example sets section below. |
Loss function:
C float | the SVM complexity constant. If not set, 1/avg(K(x,x)) is used. |
L+ float | penalize positive deviation (prediction too high) by this factor |
L- float | penalize negative deviation (prediction too low) by this factor |
epsilon float | insensitivity constant. No loss if the prediction lies this close to the true value |
epsilon+ float | epsilon for positive deviation only |
epsilon- float | epsilon for negative deviation only |
quadraticLoss+ | use quadratic loss for positive deviation |
quadraticLoss- | use quadratic loss for negative deviation |
quadraticLoss | use quadratic loss for both positive and negative deviation |
Optimizer parameters:
working_set_size int | optimize this many examples in each iteration (default: 10) |
max_iterations int | stop after this many iterations |
shrink_const int | fix a variable to the bound if it is optimal for this many iterations |
is_zero float | numerical precision (default: 1e-10) |
descend float | require at least this much descent on the target function in each iteration |
convergence_epsilon float | precision on the KKT conditions (default: 1e-3 for pattern recognition and 1e-4 for regression) |
kernel_cache int | size of the cache for kernel evaluations in MB (default: 40) |
Training algorithms:
cross_validation int | do cross validation on the training examples with the given number of chunks |
cv_inorder | do cross validation in the order the examples are given in |
cv_window int | do cross validation by moving a window of the given number of chunks over the training data. (Implies cv_inorder) |
search_C [am] | find an optimal C in the range cmin to cmax, either adding ('a') or multiplying by ('m') cdelta in each step |
cmin | lower bound for search_C |
cmax | upper bound for search_C |
cdelta | step size for search_C |
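As an illustration, a parameter definition for pattern recognition could look like this (a sketch using the options listed above, one per line; all values are only examples):

  @parameters
  pattern
  C 1.0
  scale
  working_set_size 10
  convergence_epsilon 1e-3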
Kernel definition
The kernel definition lets you choose the type of kernel function to use and its parameters. It starts with the line @kernel.
name | kernel type | parameters |
dot | inner product | none |
polynomial | polynomial (x*y+1)^d | degree int |
radial | radial basis function exp(-gamma ||x-y||^2) | gamma float |
neural | two layered neural net tanh(a x*y+b) | a float, b float |
anova | (RBF) anova kernel | gamma float, degree int |
user | user definable kernel | param_i_1 ... param_i_5 int, param_f_1 ... param_f_5 float |
user2 | user definable kernel 2 | param_i, param_f |
sum_aggregation | sum of other kernels | number_parts int, range int int, followed by number_parts kernel definitions |
prod_aggregation | product of other kernels | number_parts int, range int int, followed by number_parts kernel definitions |
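For example, a radial basis function kernel with gamma 1 could be defined like this (a sketch, assuming the kernel name and its parameters are given one per line as in the parameter definition):

  @kernel
  radial
  gamma 1.0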
Example sets
An example set consists of the learning attributes for each example, its classification (for pattern recognition, -1 or 1) or functional value (for regression) and its Lagrangian multiplier. (Actually, you don't need to supply the Lagrangian multiplier for training, and you don't even have to supply the functional value for prediction. But you could.) The examples can be given in two different formats: dense and sparse. Note that you can change the data format with the format and delimiter parameters described below.
The example set definition starts with @examples. Note that each example has to be on a line of its own.
WARNING: When giving real numbers you can also use a comma instead of a decimal point ("1234,56" instead of "1234.56", German style). Therefore something like "1,234.56" does not work!
common parameters:
format F | Format of examples where F is either "sparse" or a string containing "x", "y" or "a". The format strings define the position of the attributes x, the functional value y and the Lagrangian multiplier a in an example. "x" has to be set. The default format is "yx", but you can set another default in the parameters definition. |
dimension int | number of attributes. If the dimension is not given it is set from the examples (maximum dimension in sparse format, dimension from the first line in dense format). |
number int | total number of examples. A warning is issued when a wrong number of examples is given |
b float | additional constant of the hyperplane |
delimiter char | character by which the attributes of an example are separated (default: space). You can set a default in the parameters section. Be careful if you set the delimiter to "," or "."! |
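For instance, the header of an example set section could look like this (a sketch; the values are illustrative):

  @examples
  format xy
  dimension 3
  number 100
  delimiter ,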
sparse format:
In the sparse data format, only non-zero attributes have to be given. For each non-zero attribute you give its attribute number (starting at 1) and its value, separated by a colon. The functional value is given by y:float (the "y:" is optional here!) and the Lagrangian multiplier by a:float.
Example: The following lines all define the same example:
- 1:-1 2:0 3:1.2 y:2 a:0
- 3:1.2 y:2 1:-1
- 3:1.2 2 1:-1
dense format:
The dense format consists of all attributes and (if defined so) the functional values and the Lagrangian multipliers listed in the order given by the format parameter.
Example: The following lines all define the same example as above:
- With "format yx" (default) : "2 -1 0 1.2"
- With "format xya" it is "-1 0 1.2 2 0"
- And with "format xy" and "delimiter ','" the example reads "-1,,1.2,2"
References
Schoelkopf/etal/2000a |
Schölkopf, Bernhard and Smola, Alex J. and Williamson, Robert C. and Bartlett, Peter L. (2000). New Support Vector Algorithms. Neural Computation, 12, pages 1207--1245. |
Schoelkopf/etal/99a |
Schölkopf, Bernhard and Williamson, Robert C. and Smola, Alex J. and Shawe-Taylor, John (2000). SV Estimation of a Distribution's Support. In Solla, S.A. and Leen, T.K. and Müller, K.-R., editors, Neural Information Processing Systems 12. MIT Press. |
Joachims/99a |
Joachims, Thorsten (1999). Making Large-Scale SVM Learning Practical. In Advances in Kernel Methods - Support Vector Learning, chapter 11. MIT Press. |
Scheffer/Joachims/99a |
Scheffer, Tobias and Joachims, Thorsten (1999). Expected Error Analysis for Model Selection. In International Conference on Machine Learning (ICML). |
Vapnik/98a |
Vapnik, V. (1998). Statistical Learning Theory. Wiley. |