mySVM - a support vector machine
by Stefan Rüping, rueping@ls8.cs.uni-dortmund.de
News
- A Java version of mySVM is part of the YaLE learning environment under the name JmySVM.
- If you are using a database to store your data, try mySVM/db, a Java implementation of mySVM designed to run inside the database.
- Download the latest release of mySVM (Version 2.1.4, June 24th, 2004)
- Download the binary version for Windows
- See a list of changes
About mySVM
mySVM is an implementation of the Support Vector Machine introduced by V. Vapnik (see [Vapnik/98a]). It is based on the optimization algorithm of SVMlight as described in [Joachims/99a]. mySVM can be used for pattern recognition, regression and distribution estimation.
License
This software is free only for non-commercial use. It must not be modified and distributed without prior permission of the author. The author is not responsible for implications from the use of this software.
If you are using mySVM for research purposes, please cite the software manual available from this site in your publications (Stefan Rüping (2000): mySVM-Manual, University of Dortmund, Lehrstuhl Informatik 8, http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/).
Installation
Installation under Unix
- Download mySVM.
- Create a new directory, change into it and unpack the files into this directory.
- On typical UN*X systems simply type make to compile mySVM. On other systems you have to call your C++ compiler manually.
If everything went right you should have a new subdirectory named bin and two files mysvm and predict in a subdirectory thereof.
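On a typical Unix system the whole installation boils down to a few commands (a sketch; the archive name mysvm.tar.gz is an assumption and may differ for the release you downloaded):

  mkdir mysvm                 # create a new directory
  cd mysvm                    # change into it
  tar xzf ../mysvm.tar.gz     # unpack the files (assumed archive name)
  make                        # compile mySVM
  ls bin/                     # mysvm and predict appear in a subdirectory below bin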
On some systems you might get an error message about sys/times.h. If you do, open the file globals.h and uncomment the line #undef use_time.
Installation under Windows
If you get the source code version, you have to compile mySVM yourself. First edit the file globals.h and uncomment the line #define windows 1. Compile the file learn.cpp to get the learning program and predict.cpp for the model application program. mySVM was tested under Visual C++ 6.0. You can also get the binary version.
Using mySVM
For a complete reference of mySVM have a look into the mySVM manual (Postscript, PDF). Here is a short user's guide:
- mysvm is used for training an SVM on a given example set and testing the results
- predict is used for predicting the functional value of new examples based on an already trained SVM.
The input of mySVM consists of
- a parameter definition
- a kernel definition
- one or more example sets
Input lines starting with "#" are treated as comments. The input can be given in one or more files. If no filename or the filename "-" is given, the input is read from stdin.
mysvm trains an SVM on the first given example set. The following example sets are used for testing (if their classification is given), or the functional value of the examples is computed (if no classification is given).
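For example, if the parameter and kernel definitions are stored in params.dat and the example sets in examples.dat, a training run could be started like this (a sketch; the file names are illustrative):

  mysvm params.dat examples.dat    # input split over two files
  mysvm - < input.dat              # read the complete input from stdin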
Parameter definition
The parameter definition lets the user choose the type of loss function, the optimizer parameters and the training algorithm to use.
The parameter definition starts with the line @parameters.
Global parameters:
pattern | use SVM for pattern recognition, y has to be in {-1,1}. |
regression | use regression SVM (default) |
nu float | use nu-SVM with the given value of nu instead of normal SVM (see [Schoelkopf/etal/2000a] for details on nu-SVMs). |
distribution | estimate the support of the distribution of the training examples (see [Schoelkopf/etal/99a]). Nu must be set! |
verbosity [1..5] | ranges from 1 (no messages) through 3 (default) to 5 (flood, for debugging only) |
scale | scale the training examples to mean 0 and variance 1 (default) |
no_scale | do not scale the training examples (may be numerically less stable!) |
format | set the default example file format. See the description in the example sets section below. |
delimiter | set the default attribute delimiter. See the description in the example sets section below. |
Loss function:
C float | the SVM complexity constant. If not set, 1/avg(K(x,x)) is used. |
L+ float | penalize positive deviation (prediction too high) by this factor |
L- float | penalize negative deviation (prediction too low) by this factor |
epsilon float | insensitivity constant. No loss if the prediction lies this close to the true value |
epsilon+ float | epsilon for positive deviation only |
epsilon- float | epsilon for negative deviation only |
quadraticLoss+ | use quadratic loss for positive deviation |
quadraticLoss- | use quadratic loss for negative deviation |
quadraticLoss | use quadratic loss for both positive and negative deviation |
Optimizer parameters:
working_set_size int | optimize this many examples in each iteration (default: 10) |
max_iterations int | stop after this many iterations |
shrink_const int | fix a variable to the bound if it is optimal for this many iterations |
is_zero float | numerical precision (default: 1e-10) |
descend float | require at least this much descent on the target function in each iteration |
convergence_epsilon float | precision on the KKT conditions (default: 1e-3 for pattern recognition and 1e-4 for regression) |
kernel_cache int | size of the cache for kernel evaluations in MB (default: 40) |
Training algorithms:
cross_validation int | do cross validation on the training examples with the given number of chunks |
cv_inorder | do cross validation in the order the examples are given in |
cv_window int | do cross validation by moving a window of the given number of chunks over the training data. (Implies cv_inorder) |
search_C [am] | find an optimal C in the range cmin to cmax, either adding ('a') or multiplying by ('m') cdelta in each step |
cmin | lower bound for search_C |
cmax | upper bound for search_C |
cdelta | step size for search_C |
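As an illustration, a parameter definition for pattern recognition could look like this (a sketch using the options listed above, one per line; all values are only examples):

  @parameters
  pattern
  C 1.0
  scale
  working_set_size 10
  convergence_epsilon 1e-3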
Kernel definition
The kernel definition lets you choose the type of kernel function to use and its parameters. It starts with the line @kernel.
name | kernel type | parameters |
dot | inner product | none |
polynomial | polynomial (x*y+1)^d | degree int |
radial | radial basis function exp(-gamma ||x-y||^2) | gamma float |
neural | two layered neural net tanh(a x*y+b) | a float, b float |
anova | (RBF) anova kernel | gamma float, degree int |
user | user definable kernel | param_i_1 ... param_i_5 int, param_f_1 ... param_f_5 float |
user2 | user definable kernel 2 | param_i, param_f |
sum_aggregation | sum of other kernels | number_parts int, range int int, followed by number_parts kernel definitions |
prod_aggregation | product of other kernels | number_parts int, range int int, followed by number_parts kernel definitions |
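For example, a radial basis function kernel with gamma 1 could be defined like this (a sketch, assuming the kernel name and its parameters are given one per line as in the parameter definition):

  @kernel
  radial
  gamma 1.0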
Example sets
An example set consists of the learning attributes for each example, its classification (for pattern recognition, -1 or 1) or functional value (for regression) and its Lagrangian multiplier. (Actually, you don't need to supply the Lagrangian multiplier for training, and you don't even have to supply the functional value for prediction. But you could.) The examples can be given in two different formats: dense and sparse. Note that you can change the data format with the format and delimiter parameters described below.
The example set definition starts with @examples. Note that each example has to be on a line of its own.
WARNING: When giving real numbers you can also use a comma instead of a decimal point ("1234,56" instead of "1234.56", German style). Therefore something like "1,234.56" does not work!
common parameters:
format F | Format of examples where F is either "sparse" or a string containing "x", "y" or "a". The format strings define the position of the attributes x, the functional value y and the Lagrangian multiplier a in an example. "x" has to be set. The default format is "yx", but you can set another default in the parameters definition. |
dimension int | number of attributes. If the dimension is not given it is set from the examples (maximum dimension in sparse format, dimension from the first line in dense format). |
number int | total number of examples. A warning is issued when a wrong number of examples is given |
b float | additional constant of the hyperplane |
delimiter char | character by which the attributes of an example are separated (default: space). You can set a default in the parameters section. Be careful if you set the delimiter to "," or "."! |
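For instance, the header of an example set section could look like this (a sketch; the values are illustrative):

  @examples
  format xy
  dimension 3
  number 100
  delimiter ,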
sparse format:
In the sparse data format, only non-zero attributes have to be given. For each non-zero attribute you give its attribute number (starting at 1) and its value, separated by a colon. The functional value is given by y:float (the "y:" is optional here!) and the Lagrangian multiplier by a:float.
Example: The following lines all define the same example:
- 1:-1 2:0 3:1.2 y:2 a:0
- 3:1.2 y:2 1:-1
- 3:1.2 2 1:-1
dense format:
The dense format consists of all attributes and (if defined so) the functional values and the Lagrangian multipliers listed in the order given by the format parameter.
Example: The following lines all define the same example as above:
- With "format yx" (default) : "2 -1 0 1.2"
- With "format xya" it is "-1 0 1.2 2 0"
- And with "format xy" and "delimiter ','" the example reads "-1,,1.2,2"
References
Schoelkopf/etal/2000a |
Schölkopf, Bernhard and Smola, Alex J. and Williamson, Robert C. and Bartlett, Peter L. (2000). New Support Vector Algorithms. Neural Computation, 12, pages 1207--1245. |
Schoelkopf/etal/99a |
Schölkopf, Bernhard and Williamson, Robert C. and Smola, Alex J. and Shawe-Taylor, John (2000). SV Estimation of a Distribution's Support. In Solla, S.A. and Leen, T.K. and Müller, K.-R., editors, Neural Information Processing Systems 12. MIT Press. |
Joachims/99a |
Joachims, Thorsten (1999). Making Large-Scale SVM Learning Practical. In Advances in Kernel Methods - Support Vector Learning, chapter 11. MIT Press. |
Scheffer/Joachims/99a |
Scheffer, Tobias and Joachims, Thorsten (1999). Expected Error Analysis for Model Selection. In International Conference on Machine Learning (ICML). |
Vapnik/98a |
Vapnik, V. (1998). Statistical Learning Theory. Wiley. |