mySVM - a support vector machine

by Stefan Rüping, rueping@ls8.cs.uni-dortmund.de


  • A Java version of mySVM is part of the YaLE learning environment under the name JmySVM.
  • If you are using a database to store your data, try mySVM/db, a Java implementation of mySVM designed to run inside the database.
  • Download the latest release of mySVM (Version 2.1.4, June 24th, 2004)
  • Download the binary version for Windows
  • See a list of changes

About mySVM

mySVM is an implementation of the Support Vector Machine introduced by V. Vapnik (see [Vapnik/98a]). It is based on the optimization algorithm of SVMlight as described in [Joachims/99a]. mySVM can be used for pattern recognition, regression and distribution estimation.


This software is free only for non-commercial use. It must not be modified and distributed without prior permission of the author. The author is not responsible for implications from the use of this software.

If you are using mySVM for research purposes, please cite the software manual available from this site in your publications (Stefan Rüping (2000): mySVM-Manual, University of Dortmund, Lehrstuhl Informatik 8, http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/).


Installation under Unix

  • Download mySVM.
  • Create a new directory, change into it and unpack the files into this directory.
  • On typical UN*X systems simply type make to compile mySVM. On other systems you have to call your C++ compiler manually.
If everything went right, you should have a new subdirectory named bin and two files, mysvm and predict, in a subdirectory thereof. On some systems you might get an error message about sys/times.h. If you do, open the file globals.h and uncomment the line #undef use_time.

Installation under Windows

If you get the source code version, you have to compile mySVM yourself. First edit the file globals.h and uncomment the line #define windows 1. Then compile the file learn.cpp to get the learning program and predict.cpp to get the model application program. mySVM was tested under Visual C++ 6.0. You can also get the binary version.

Using mySVM

For a complete reference of mySVM have a look at the mySVM manual (Postscript, PDF). Here is a short user's guide:
  • mysvm is used for training an SVM on a given example set and testing the results.
  • predict is used for predicting the functional value of new examples based on an already trained SVM.
The input of mySVM consists of a parameter definition, a kernel definition and the example sets, each described in its own section below. Input lines starting with "#" are treated as comments. The input can be given in one or more files. If no filename is given, or if the filename is "-", the input is read from stdin. mysvm trains an SVM on the first example set given. The following example sets are used for testing (if their classification is given), or the functional value of their examples is computed (if no classification is given).
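As a rough illustration of this layout, the following Python sketch assembles the text of an input file with the three sections in order. The helper name build_mysvm_input is hypothetical, and the "type" line used to select the kernel is an assumption that is not stated on this page; check the manual for the exact syntax before relying on it.

```python
def build_mysvm_input(parameters, kernel, examples):
    """Assemble the text of a mySVM-style input file from its three sections."""
    lines = ["# generated input file (lines starting with '#' are comments)"]
    lines.append("@parameters")
    for name, value in parameters.items():
        # Flags such as 'pattern' take no value; pass None for them.
        lines.append(name if value is None else f"{name} {value}")
    lines.append("@kernel")
    for name, value in kernel.items():
        lines.append(f"{name} {value}")
    lines.append("@examples")
    lines.append("format yx")
    for y, x in examples:
        # One example per line, functional value first to match "format yx".
        lines.append(" ".join(str(v) for v in [y, *x]))
    return "\n".join(lines) + "\n"


text = build_mysvm_input(
    parameters={"pattern": None, "C": 1.0},
    # ASSUMPTION: the kernel is selected with a 'type' line.
    kernel={"type": "radial", "gamma": 0.5},
    examples=[(1, [0.5, 1.2]), (-1, [-0.3, 0.0])],
)
print(text)
```

The resulting text could be written to a file and passed to mysvm as its first input file, or piped to it on stdin.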

Parameter definition

The parameter definition lets the user choose the type of loss function, the optimizer parameters and the training algorithm to use. The parameter definition starts with the line @parameters.

Global parameters:

pattern            use SVM for pattern recognition; y has to be in {-1,1}
regression         use regression SVM (default)
nu float           use nu-SVM with the given value of nu instead of the normal SVM (see [Schoelkopf/etal/2000a] for details on nu-SVMs)
distribution       estimate the support of the distribution of the training examples (see [Schoelkopf/etal/99a]); nu must be set!
verbosity [1..5]   ranges from 1 (no messages) through 3 (default) to 5 (flood, for debugging only)
scale              scale the training examples to mean 0 and variance 1 (default)
no_scale           do not scale the training examples (may be numerically less stable!)
format             set the default example file format; see the description under "Example sets"
delimiter          set the default delimiter for example files; see the description under "Example sets"
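To make the effect of the scale option concrete, here is a minimal Python sketch of scaling each attribute to mean 0 and variance 1. It is not taken from the mySVM sources: the function name scale_columns is hypothetical, and the use of the population variance (dividing by n) is an assumption.

```python
import math


def scale_columns(rows):
    """Scale each attribute (column) to mean 0 and variance 1."""
    n = len(rows)
    dim = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    stds = []
    for j in range(dim):
        # ASSUMPTION: population variance (divide by n, not n - 1).
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        # Constant columns are only centred to avoid dividing by zero.
        stds.append(math.sqrt(var) if var > 0 else 1.0)
    return [[(r[j] - means[j]) / stds[j] for j in range(dim)] for r in rows]


scaled = scale_columns([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

Attributes with very different ranges end up on a comparable scale, which is presumably why no_scale is flagged as numerically less stable.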

Loss function:

C float           the SVM complexity constant. If not set, 1/avg(K(x,x)) is used.
L+ float          penalize positive deviation (prediction too high) by this factor
L- float          penalize negative deviation (prediction too low) by this factor
epsilon float     insensitivity constant. No loss if the prediction lies this close to the true value
epsilon+ float    epsilon for positive deviation only
epsilon- float    epsilon for negative deviation only
quadraticLoss+    use quadratic loss for positive deviation
quadraticLoss-    use quadratic loss for negative deviation
quadraticLoss     use quadratic loss for both positive and negative deviation
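The following Python sketch shows one plausible reading of how these parameters combine into a per-example loss. It is an illustration under stated assumptions, not the mySVM implementation: the function name svm_loss, the argument names and the default values (factors of 1, epsilons of 0) are all hypothetical.

```python
def svm_loss(prediction, target, eps_plus=0.0, eps_minus=0.0,
             l_plus=1.0, l_minus=1.0, quad_plus=False, quad_minus=False):
    """Asymmetric epsilon-insensitive loss for a single example."""
    deviation = prediction - target
    if deviation > eps_plus:
        # Prediction too high: governed by L+, epsilon+ and quadraticLoss+.
        excess = deviation - eps_plus
        return l_plus * (excess ** 2 if quad_plus else excess)
    if deviation < -eps_minus:
        # Prediction too low: governed by L-, epsilon- and quadraticLoss-.
        excess = -deviation - eps_minus
        return l_minus * (excess ** 2 if quad_minus else excess)
    # Inside the insensitive zone: no loss.
    return 0.0


print(svm_loss(1.05, 1.0, eps_plus=0.1, eps_minus=0.1))
print(svm_loss(1.5, 1.0, eps_plus=0.1, l_plus=2.0))
```

Setting epsilon corresponds to using the same value for eps_plus and eps_minus, and quadraticLoss to setting both quadratic flags.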

Optimizer parameters:

working_set_size int        optimize this many examples in each iteration (default: 10)
max_iterations int          stop after this many iterations
shrink_const int            fix a variable to the bound if it is optimal for this many iterations
is_zero float               numerical precision (default: 1e-10)
descend float               make this much descent on the target function in each iteration
convergence_epsilon float   precision on the KKT conditions (default: 1e-3 for pattern recognition and 1e-4 for regression)
kernel_cache int            size of the cache for kernel evaluations in MB (default: 40)

Training algorithms

cross_validation int   do cross validation on the training examples with the given number of chunks
cv_inorder             do cross validation in the order the examples are given in
cv_window int          do cross validation by moving a window of the given number of chunks over the training data (implies cv_inorder)
search_C [am]          find an optimal C in the range from cmin to cmax by adding cdelta to the current C (a) or multiplying the current C by cdelta (m)
cmin                   lower bound for search_C
cmax                   upper bound for search_C
cdelta                 step size for search_C
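The sequence of C values that search_C walks through can be sketched as follows. This Python snippet only lists the candidates; how mySVM scores each candidate is not described here. The function name c_candidates is hypothetical, and starting the search at cmin is an assumption.

```python
def c_candidates(cmin, cmax, cdelta, mode):
    """List the C values between cmin and cmax for additive or multiplicative steps."""
    if mode not in ("a", "m"):
        raise ValueError("mode must be 'a' (add) or 'm' (multiply)")
    if cdelta <= 0 or (mode == "m" and (cdelta <= 1 or cmin <= 0)):
        # Guard against steps that would never reach cmax.
        raise ValueError("step would not move C towards cmax")
    values = []
    c = cmin  # ASSUMPTION: the search starts at the lower bound
    while c <= cmax:
        values.append(c)
        c = c + cdelta if mode == "a" else c * cdelta
    return values


print(c_candidates(1, 5, 2, "a"))
print(c_candidates(1, 100, 10, "m"))
```

Multiplicative steps cover several orders of magnitude with few training runs, while additive steps give a finer grid over a narrow range.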

Kernel definition

The kernel definition lets you choose the type of kernel function to use and its parameters. It starts with the line @kernel.

name               kernel type                                   parameters
dot                inner product                                 none
polynomial         polynomial (x*y+1)^d                          degree int
radial             radial basis function exp(-gamma ||x-y||^2)   gamma float
neural             two-layered neural net tanh(a x*y+b)          a float, b float
anova              (RBF) anova kernel                            gamma float, degree int
user               user-definable kernel                         param_i_1 ... param_i_5 int, param_f_1 ... param_f_5 float
user2              user-definable kernel 2                       param_i, param_f
sum_aggregation    sum of other kernels                          number_parts int, range int int, followed by number_parts kernel definitions
prod_aggregation   product of other kernels                      number_parts int, range int int, followed by number_parts kernel definitions
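The four kernels whose formulas are given in the table can be written down directly. The Python sketch below follows those formulas; the function names are chosen for illustration only. It also includes a hypothetical helper default_c that evaluates 1/avg(K(x,x)), the value listed for C when it is not set, without claiming this is exactly how mySVM computes it (for instance, with respect to scaling).

```python
import math


def dot(x, y):
    """Inner product x*y."""
    return sum(a * b for a, b in zip(x, y))


def polynomial(x, y, degree):
    """Polynomial kernel (x*y + 1)^d."""
    return (dot(x, y) + 1) ** degree


def radial(x, y, gamma):
    """Radial basis function exp(-gamma ||x-y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def neural(x, y, a, b):
    """Two-layered neural net kernel tanh(a x*y + b)."""
    return math.tanh(a * dot(x, y) + b)


def default_c(examples, kernel):
    """1 / avg(K(x, x)) over the given examples (hypothetical helper)."""
    avg = sum(kernel(x, x) for x in examples) / len(examples)
    return 1.0 / avg


print(polynomial([1, 2], [3, 4], degree=2))
print(default_c([[1, 0], [0, 2]], dot))
```

For the radial kernel K(x,x) is always 1, so this default evaluates to C = 1 regardless of the data.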

Example sets

An example set consists of the learning attributes for each example, its classification (for pattern recognition, -1 or 1) or functional value (for regression) and its Lagrangian multiplier. You do not need to supply the Lagrangian multiplier for training, and you do not even have to supply the functional value for prediction, but you can. The examples can be given in two different formats: dense and sparse. Note that you can change the data format with the format parameter described below.

An example set definition starts with @examples. Note that each example has to be on its own line.

WARNING: When giving real numbers you can also use a comma instead of a decimal point ("1234,56" instead of "1234.56", German style). Therefore something like "1,234.56" does not work!

common parameters:

format F          format of the examples, where F is either "sparse" or a string containing "x", "y" or "a". The format string defines the position of the attributes x, the functional value y and the Lagrangian multiplier a in an example. "x" has to be set. The default format is "yx", but you can set another default in the parameter definition.
dimension int     number of attributes. If the dimension is not given, it is set from the examples (maximum dimension in sparse format, dimension of the first line in dense format).
number int        total number of examples. A warning is issued when a wrong number of examples is given.
b float           additional constant of the hyperplane
delimiter char    character by which the attributes of an example are separated (default: space). You can set a default in the parameter definition. Be careful if you set the delimiter to "," or "."!

sparse format:

In the sparse data format, only non-zero attributes have to be given. For each non-zero attribute you give its attribute number (starting at 1) and its value, separated by a colon. The functional value is given by y:float (the "y:" is optional here!) and the lagrangian multiplier by a:float.

Example: The following lines all define the same example:

  • 1:-1 2:0 3:1.2 y:2 a:0
  • 3:1.2 y:2 1:-1
  • 3:1.2 2 1:-1
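To check that the three lines above really describe the same example, a small reader for the sparse format can be sketched in Python. The function name parse_sparse and the returned tuple layout are hypothetical; treating a missing Lagrangian multiplier as 0 is inferred from the example above, where "a:0" is given in the first line and omitted in the others.

```python
def parse_sparse(line):
    """Parse one sparse-format example into (attributes, y, alpha)."""
    attributes, y, alpha = {}, None, 0.0
    for token in line.split():
        key, sep, value = token.partition(":")
        if not sep:
            # A bare number is the functional value (the "y:" is optional).
            y = float(token)
        elif key == "y":
            y = float(value)
        elif key == "a":
            alpha = float(value)
        else:
            # Attribute numbers start at 1; zero values need not be stored.
            number = float(value)
            if number != 0.0:
                attributes[int(key)] = number
    return attributes, y, alpha


for line in ["1:-1 2:0 3:1.2 y:2 a:0", "3:1.2 y:2 1:-1", "3:1.2 2 1:-1"]:
    print(parse_sparse(line))
```

All three calls return the same attributes (1 and 3), the functional value 2 and a multiplier of 0.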

dense format:

The dense format consists of all attributes and (if defined so) the functional values and the lagrangian multipliers listed in the order given by the format parameter.

Example: The following lines all define the same example as above:

  • With "format yx" (default) : "2 -1 0 1.2"
  • With "format xya" it is "-1 0 1.2 2 0"
  • And with "format xy" and "delimiter ','" the example reads "-1,,1.2,2"
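The dense examples above can be read with a short Python sketch that walks through the format string. The function name parse_dense is hypothetical. Reading an empty field as 0 is an assumption inferred from the last example ("-1,,1.2,2"), and the number of attributes is derived from the line length rather than from the dimension parameter.

```python
def parse_dense(line, fmt="yx", delimiter=" "):
    """Parse one dense-format example into (x, y, alpha) according to fmt."""
    fields = line.split(delimiter)
    # Every letter except "x" takes exactly one field; the rest are attributes.
    n_x = len(fields) - (len(fmt) - 1)

    def number(text):
        # ASSUMPTION: an empty field stands for 0.
        return float(text) if text.strip() else 0.0

    x, y, alpha = [], None, 0.0
    pos = 0
    for letter in fmt:
        if letter == "x":
            x = [number(f) for f in fields[pos:pos + n_x]]
            pos += n_x
        elif letter == "y":
            y = number(fields[pos])
            pos += 1
        elif letter == "a":
            alpha = number(fields[pos])
            pos += 1
    return x, y, alpha


print(parse_dense("2 -1 0 1.2"))
print(parse_dense("-1 0 1.2 2 0", fmt="xya"))
print(parse_dense("-1,,1.2,2", fmt="xy", delimiter=","))
```

The sketch does not handle the German-style decimal comma mentioned in the warning above, which is one reason a "," delimiter needs care.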


Schoelkopf/etal/2000a Schölkopf, Bernhard and Smola, Alex J. and Williamson, Robert C. and Bartlett, Peter L. (2000). New Support Vector Algorithms. Neural Computation, 12, pages 1207-1245.
Schoelkopf/etal/99a Schölkopf, Bernhard and Williamson, Robert C. and Smola, Alex J. and Shawe-Taylor, John (2000). SV Estimation of a Distribution's Support. In Solla, S.A. and Leen, T.K. and Müller, K.-R., editor(s), Neural Information Processing Systems 12. MIT Press.
Joachims/99a Joachims, Thorsten (1999). Making Large-Scale SVM Learning Practical. In Advances in Kernel Methods - Support Vector Learning, chapter 11. MIT Press.
Scheffer/Joachims/99a Scheffer, Tobias and Joachims, Thorsten (1999). Expected Error Analysis for Model Selection. In International Conference on Machine Learning (ICML).
Vapnik/98a V. Vapnik (1998). Statistical Learning Theory. Wiley.