mySVM
mySVM - a support vector machine
by Stefan Rüping, rueping@ls8.cs.uni-dortmund.de
News
 A Java version of mySVM is part of the YaLE learning environment under the name JmySVM.
 If you are using a database to store your data, try mySVM/db, a Java implementation of mySVM designed to run inside the database.
 Download the latest release of mySVM (Version 2.1.4, June 24th, 2004)
 Download the binary version for Windows
 See a list of changes
About mySVM
mySVM is an implementation of the Support Vector Machine introduced by V. Vapnik (see [Vapnik/98a]). It is based on the optimization algorithm of SVM^light as described in [Joachims/99a]. mySVM can be used for pattern recognition, regression and distribution estimation.
License
This software is free only for non-commercial use. It must not be modified or distributed without prior permission of the author. The author is not responsible for implications from the use of this software.
If you are using mySVM for research purposes, please cite the software manual available from this site in your publications (Stefan Rüping (2000): mySVM - Manual, University of Dortmund, Lehrstuhl Informatik 8, http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/).
Installation
Installation under Unix
 Download mySVM.
 Create a new directory, change into it and unpack the files into this directory.
 On typical UN*X systems simply type make to compile mySVM. On other systems you have to call your C++ compiler manually.
If everything went right you should have a new subdirectory named bin and two files mysvm and predict in a subdirectory thereof.
On some systems you might get an error message about sys/times.h. If you do, open the file globals.h and uncomment the line #undef use_time.
Installation under Windows
If you get the source code version, you have to compile mySVM yourself. First edit the file globals.h and uncomment the line #define windows 1. Compile the file learn.cpp to get the learning program and predict.cpp for the model application program. mySVM was tested under Visual C++ 6.0. You can also get the binary version.
Using mySVM
For a complete reference of mySVM have a look into the mySVM manual (Postscript, PDF). Here is a short user's guide:
 mysvm is used for training an SVM on a given example set and testing the results.
 predict is used for predicting the functional value of new examples based on an already trained SVM.
The input of mySVM consists of a parameter definition, a kernel definition and one or more example sets. Input lines starting with "#" are treated as commentary. The input can be given in one or more files. If no filenames or the filename "" are given, the input is read from stdin.
mysvm trains an SVM on the first given example set. The following example sets are used for testing (if their classification is given) or the functional value of the examples is computed (if no classification is given).
Parameter definition
The parameter definition lets the user choose the type of loss function, the optimizer parameters and the training algorithm to use.
The parameter definition starts with the line
@parameters.
Global parameters:
pattern - use SVM for pattern recognition, y has to be in {-1,+1}
regression - use regression SVM (default)
nu float - use nu-SVM with the given value of nu instead of the normal SVM (see [Schoelkopf/etal/2000a] for details on nu-SVMs)
distribution - estimate the support of the distribution of the training examples (see [Schoelkopf/etal/99a]). Nu must be set!
verbosity [1..5] - ranges from 1 (no messages) over 3 (default) to 5 (flood, for debugging only)
scale - scale the training examples to mean 0 and variance 1 (default)
no_scale - do not scale the training examples (may be numerically less stable!)
format - set the default example file format. See the description here.
delimiter - set the default delimiter for example files. See the description here.
Loss function:
C float - the SVM complexity constant. If not set, 1/avg(K(x,x)) is used
L+ float - penalize positive deviation (prediction too high) by this factor
L- float - penalize negative deviation (prediction too low) by this factor
epsilon float - insensitivity constant. No loss if prediction lies this close to the true value
epsilon+ float - epsilon for positive deviation only
epsilon- float - epsilon for negative deviation only
quadraticLoss+ - use quadratic loss for positive deviation
quadraticLoss- - use quadratic loss for negative deviation
quadraticLoss - use quadratic loss for both positive and negative deviation
Optimizer parameters:
working_set_size int - optimize this many examples in each iteration (default: 10)
max_iterations int - stop after this many iterations
shrink_const int - fix a variable to the bound if it is optimal for this many iterations
is_zero float - numerical precision (default: 1e-10)
descend float - make this much descend on the target function in each iteration
convergence_epsilon float - precision on the KKT conditions (default: 1e-3 for pattern recognition and 1e-4 for regression)
kernel_cache int - size of the cache for kernel evaluations in MB (default: 40)
Training algorithms
cross_validation int - do cross validation on the training examples with the given number of chunks
cv_inorder - do cross validation in the order the examples are given in
cv_window int - do cross validation by moving a window of the given number of chunks over the training data (implies cv_inorder)
search_C [a|m] - find an optimal C in the range of cmin to cmax by adding or multiplying the current C by cdelta
cmin - lower bound for search_C
cmax - upper bound for search_C
cdelta - step size for search_C
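Putting these pieces together, a parameter section for a simple pattern recognition task might look like this (the parameter names are taken from the tables above; the specific values are only illustrative):

```
@parameters
pattern
C 1.0
working_set_size 10
max_iterations 10000
convergence_epsilon 1e-3
verbosity 3
```

Any parameter that is left out keeps its default value.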
Kernel definition
The kernel definition lets you choose the type of kernel function to use and its parameters. It starts with the line
@kernel
name - kernel type - parameters
dot - inner product - none
polynomial - polynomial (x*y+1)^d - degree int
radial - radial basis function exp(-gamma ||x-y||^2) - gamma float
neural - two layered neural net tanh(a x*y+b) - a float, b float
anova - (RBF) anova kernel - gamma float, degree int
user - user definable kernel - param_i_1 ... param_i_5 int, param_f_1 ... param_f_5 float
user2 - user definable kernel 2 - param_i, param_f
sum_aggregation - sum of other kernels - number_parts int, range int int, followed by number_parts kernel definitions
prod_aggregation - product of other kernels - number_parts int, range int int, followed by number_parts kernel definitions
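Following the name and parameter scheme of the table, a kernel section for a radial basis function kernel could be written as (gamma 0.5 is an illustrative value):

```
@kernel
radial
gamma 0.5
```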
Example sets
An example set consists of the learning attributes for each example, its classification (for pattern recognition, +1 or -1) or functional value (for regression) and its lagrangian multiplier (actually, you don't need to supply the lagrangian multiplier for training and you don't even have to supply the functional value for prediction. But you could). The examples can be given in two different formats: dense and sparse. Note that you can change the data format with the format parameter described below.
The example set definition starts with @examples. Note that each example has to be on a line of its own.
WARNING: When giving real numbers you can also use a comma instead of a decimal point ("1234,56" instead of "1234.56", German style). Therefore something like "1,234.56" does not work!
common parameters:
format F - format of the examples, where F is either "sparse" or a string containing "x", "y" or "a". The format string defines the position of the attributes x, the functional value y and the lagrangian multiplier a in an example. "x" has to be set. The default format is "yx", but you can set another default in the parameters definition.
dimension int - number of attributes. If the dimension is not given, it is set from the examples (maximum dimension in sparse format, dimension from the first line in dense format).
number int - total number of examples. A warning is issued when a wrong number of examples is given.
b float - additional constant of the hyperplane.
delimiter char - character by which the attributes of an example are separated (default: space). You can set a default in the parameters section. Be careful if you set the delimiter to "," or "."!
sparse format:
In the sparse data format, only nonzero attributes have to be given. For each nonzero attribute you give its attribute number (starting at 1) and its value, separated by a colon. The functional value is given by y:float (the "y:" is optional here!) and the lagrangian multiplier by a:float.
Example: The following lines all define the same example:
 1:1 2:0 3:1.2 y:2 a:0
 3:1.2 y:2 1:1
 3:1.2 2 1:1
dense format
The dense format consists of all attributes and (if so defined) the functional values and the lagrangian multipliers, listed in the order given by the format parameter.
Example: The following lines all define the same example as above:
 With "format yx" (default) : "2 1 0 1.2"
 With "format xya" it is "1 0 1.2 2 0"
 And with "format xy" and "delimiter ','" the example reads "1,0,1.2,2"
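Combining the three sections, a minimal complete input file for a regression task in sparse format might look like this (all values are purely illustrative):

```
# mySVM input file (illustrative values)
@parameters
regression
C 1.0
epsilon 0.01

@kernel
dot

@examples
format sparse
dimension 3
1:1 3:1.2 y:2
2:0.5 3:2.0 y:-1
```

Such a file can then be passed to the mysvm program as a filename argument, or fed to it on stdin.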
References
Schoelkopf/etal/2000a
Schölkopf, Bernhard and Smola, Alex J. and Williamson, Robert C. and Bartlett, Peter L. (2000). New Support Vector Algorithms. Neural Computation, 12, pages 1207-1245.
Schoelkopf/etal/99a
Schölkopf, Bernhard and Williamson, Robert C. and Smola, Alex J. and Shawe-Taylor, John (2000). SV Estimation of a Distribution's Support. In Solla, S.A. and Leen, T.K. and Müller, K.-R., editor(s), Neural Information Processing Systems 12. MIT Press.
Joachims/99a
Joachims, Thorsten (1999). Making Large-Scale SVM Learning Practical. In Advances in Kernel Methods - Support Vector Learning, chapter 11. MIT Press.
Scheffer/Joachims/99a
Tobias Scheffer and Thorsten Joachims (1999). Expected Error Analysis for Model Selection. In International Conference on Machine Learning (ICML).
Vapnik/98a
V. Vapnik (1998). Statistical Learning Theory. Wiley.