SVM^{light}: Support Vector Machine
Author: Thorsten
Joachims <thorsten@ls8.cs.unidortmund.de>
University of Dortmund,
Informatik,
AIUnit
Collaborative
Research Center on 'Complexity Reduction in Multivariate Data' (SFB475)
Version: 3.02
Date: 16.11.99
Overview
SVM^{light} is an implementation of Support Vector Machines
(SVMs) in C. The main features of the program are the following:

fast optimization algorithm

working set selection based on steepest feasible descent

"shrinking" heuristic

caching of kernel evaluations

use of folding in the linear case

includes algorithm for approximately training large transductive SVMs (TSVMs)

can train SVMs with cost models

handles many thousands of support vectors

handles several tenthousands of training examples

supports standard kernel functions and lets you define your own

uses sparse vector representation
There is also a regression support vector machine based on
SVM^{light} available at the AIUnit: mySVM.
Description
SVM^{light} is an implementation of Vapnik's Support Vector
Machine [Vapnik, 1995] for the problem of pattern
recognition. The optimization algorithm used in SVM^{light}
is described in [Joachims, 1999a]. The algorithm
has scalable memory requirements and can handle problems with many thousands
of support vectors efficiently. This new version also includes a new algorithm
for training largescale transductive SVMs. The algorithm proceeds by solving
a sequence of optimization problems lowerbounding the solution using a
form of local search. A detailed description of the algorithm can be found
in [Joachims, 1999c]. You can now also use
SVMs with cost models (see [Morik et al., 1999]).
The code has been used on a large range of problems, including text classification
[Joachims, 1999c][Joachims,
1998a], several image recognition tasks, and medical applications.
Many tasks have the property of sparse instance vectors. This implementation
makes use of this property which leads to a very compact and efficient
representation.
Source Code The source code is free for scientific use. Please
contact me, if you are planning to use the software for commercial
purposes. The software must not be modified and distributed without
prior permission of the author. If you use SVM^{light}
in your scientific work, please cite as

T. Joachims, Making largeScale SVM Learning Practical. Advances in Kernel
Methods  Support Vector Learning, B. Schölkopf and C. Burges and
A. Smola (ed.), MITPress, 1999.
[PDF]
[Postscript
(gz)]
I would also appreciate, if you sent me (a link to) your papers so that
I can learn about your research. The implementation was developed on Solaris
2.5 with gcc, but compiles also on SunOS 3.1.4, Solaris 2.7, Linux, IRIX, Windows NT, and Powermac (after small modifications, see FAQ). The source code is available at the following location:
Please send me email
and let me know that you got svmlight. I will put you on my mailing list
to inform you about new versions and bugfixes. SVM^{light}
comes with a quadratic programming tool for solving small intermediate
quadratic programming problems. It is based on the method of Hildreth and
D'Espo and solves small quadratic programs very efficiently. Nevertheless,
if for some reason you want to use another solver, the new version still
comes with an interface to PR_LOQO. The PR_LOQO
optimizer was written by A.
Smola. It can be requested from http://www.kernelmachines.org/code/prloqo.tar.gz.
Installation
To install SVM^{light} you need to download svm_light.tar.gz.
Create a new directory:
Move svm_light.tar.gz to this directory and unpack it with
gunzip c svm_light.tar.gz  tar xvf 
Now execute
which compiles the system and creates the two executables
svm_learn (learning module)
svm_classify (classification module)
If you do not want to use the builtin optimizer but PR_LOQO instead, create
a subdirectory in the svm_light directory with
and copy the files pr_loqo.c and pr_loqo.h (which you
received by email) in there. Now execute
If the system does not compile properly, check this FAQ.
How to use
This section explains how to use the SVM^{light} software. A good introduction to the theory of SVMs is Chris Burges' tutorial.
SVM^{light} consists of a learning module (svm_learn) and a classification module (svm_classify). The classification module can be used to apply the learned model to new examples. See also the examples below for how to use svm_learn and svm_classify.
svm_learn is called with the following parameters:
svm_learn [options] example_file model_file
Available options are:
General options:
?
> this help
v [0..3]
> verbosity level (default 1)
Learning options:
c float
> C: tradeoff between training error
and margin (default 1000)
j float
> Cost: costfactor, by which training errors on
positive examples outweight errors on negative
examples (default 1)
b [0,1]
> use biased hyperplane (i.e. x*w+b>0) instead
of unbiased hyperplane (i.e. x*w>0) (default 1)
i [0,1]
> remove inconsistent training examples
and retrain (default 0)
Transduction options:
p [0..1]
> fraction of unlabeled examples to be classified
into the positive class (default is the ratio of
positive and negative examples in the training data)
Kernel options:
t int
> type of kernel function:
0: linear (default)
1: polynomial (s a*b+c)^d
2: radial basis function exp(gamma ab^2)
3: sigmoid tanh(s a*b + c)
4: user defined kernel from kernel.h
d int
> parameter d in polynomial kernel
g float
> parameter gamma in rbf kernel
s float
> parameter s in sigmoid/poly kernel
r float
> parameter c in sigmoid/poly kernel
u string
> parameter of user defined kernel
Optimization options:
q [2..400] >
maximum size of QPsubproblems (default 10)
m [5..]
> size of cache for kernel evaluations in MB (default 40)
The larger the faster...
e float
> eps: Allow that error for termination criterion
[y [w*x+b]  1] >= eps (default 0.001)
h [5..]
> number of iterations a variable needs to be
optimal before considered for shrinking (default 100)
f [0,1]
> do final optimality check for variables removed
by shrinking. Although this test is usually
positive, there is no guarantee that the optimum
was found if the test is omitted. (default 1)
Output options:
l char
> file to write predicted labels of unlabeled
examples into after transductive learning
a char
> write all alphas to this file after learning
(in the same order as in the training set)
The input file example_file contains the training examples. The
first lines may contain comments and are ignored if they start with #.
Each of the following lines represents one training example and is of the
following format:
<class> .=. +1  1  0
<feature> .=. integer
<value> .=. real
<line> .=. <class> <feature>:<value> <feature>:<value>
... <feature>:<value>
The class label and each of the feature/value pairs are separated by a
space character. Feature/value pairs MUST be ordered by increasing feature
number. Features with value zero can be skipped. The +1 as class label
marks a positive example, 1 a negative example respectively. A class label
of 0 indicates that this example should be classified using transduction.
The predictions for the examples classified by transduction are written
to the file specified through the l option. The order of the predictions
is the same as in the training data.
The result of svm_learn is the model which is learned from
the training data in example_file. The model is written to model_file.
To classify test examples, svm_classify reads this file. svm_classify
is called with the following parameters:
svm_classify [options] example_file model_file output_file
Available options are:
h > Help.
v [0..3] > Verbosity level (default 2).
f [0,1] > 0: old output format of V1.0
1: output the value of decision function (default)
All test examples in example_file are classified and the predicted
classes are written to output_file. There is one line per test
example in output_file containing the value of the decision function
on that example. The test example file has the same format as the one for
svm_learn.
Again, <class> can have the value zero indicating unknown.
If you want to find out more, try this FAQ.
Getting started: an Example Problem
Inductive SVM
You will find an example text classification problem at
Download this file into your svm_light directory and unpack it with
gunzip c example1.tar.gz  tar xvf 
This will create a subdirectory example1. Documents are represented
as feature vectors. Each feature corresponds to a word stem (9947 features).
The task is to learn which Reuters
articles are about "corporate acquisitions". There are 1000 positive
and 1000 negative examples in the file train.dat. The file
test.dat
contains 600 test examples. The feature numbers correspond to the line
numbers in the file words. To run the example, execute the commands:
svm_learn example1/train.dat example1/model
svm_classify example1/test.dat example1/model example1/predictions
The accuracy on the test set is printed to stdout.
Transductive SVM
To try out the transductive learner, you can use the following dataset.
I compiled it from the same Reuters articles as used in the example for
the inductive SVM. The dataset consists of only 10 training examples (5
positive and 5 negative) and the same 600 test examples as above. You find
it at
Download this file into your svm_light directory and unpack it with
gunzip c example2.tar.gz  tar xvf 
This will create a subdirectory example2. To run the example,
execute the commands:
svm_learn example2/train_transduction.dat example2/model
svm_classify example2/test.dat example2/model example2/predictions
The classification module is called only to get the accuracy printed. The
transductive learner is invoced automatically, since train_transduction.dat
contains
unlabeled examples (i. e. the 600 test examples). You can compare the results
to those of the inductive SVM by running:
svm_learn example2/train_induction.dat example2/model
svm_classify example2/test.dat example2/model example2/predictions
The file train_induction.dat contains the same 10 (labeled) training
examples as train_transduction.dat.
Extensions and Additions
Questions and Bug Reports
If you find bugs or you have problems with the code you cannot solve by
yourself, please contact me via email <svmlight@ls8.cs.unidortmund.de>.
Disclaimer
This software is free only for noncommercial use. It must not be modified
and distributed without prior permission of the author. The author is not
responsible for implications from the use of this software.
History
V3.01 > V3.02
 Now examples can be read in correctly on SGIs.
V3.00 > V3.01
 Fixed rare convergence bug for Hildreth and D'Espo solver.
V2.01 > V3.00

Training algorithm for transductive Support Vector Machines.

Integrated core QPsolver based on the method of Hildreth and D'Espo.

Uses folding in the linear case, which speeds up linear SVM training by
an order of magnitude.

Allows linear cost models.

Faster in general.
V2.00 > V2.01
V1.00 > V2.00

Learning is much faster especially for large training sets.

Working set selection based on steepest feasible descent.

"Shrinking" heuristic.

Improved caching.

New solver for intermediate QPs.

Lets you set the size of the cache in MB.

Simplified output format of svm_classify.

Data files may contain comments.
V0.91 > V1.00

Learning is more than 4 times faster.

Smarter caching and optimization.

You can define your own kernel function.

Lets you set the size of the cache.

VCdim is now estimated based on the radius of the support vectors.

The classification module is more memory efficient.

The f2c library is available from here.

Adaptive precision tuning makes optimization more robust.

Includes some small bug fixes and is more robust.

Source code for SVM^{light}
V1.00
V0.9 > V0.91

Fixed small bug which appears for very small C. Optimization did not converge.
References
Joachims/99a 
Joachims, Thorsten (1999). Making largeScale SVM Learning Practical. In Advances in Kernel Methods  Support Vector Learning, chapter 11. MIT Press. [.ps.gz] [.pdf] 
Joachims/99c 
Thorsten Joachims (1999). Transductive Inference for Text Classification using Support Vector Machines. In International Conference on Machine Learning (ICML). . [.ps.gz] [.pdf] 
Scheffer/Joachims/99a 
Tobias Scheffer and Thorsten Joachims (1999). Expected Error Analysis for Model Selection. In International Conference on Machine Learning (ICML). . 
Joachims/98a 
Joachims, Thorsten (1998). Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In Claire N\'edellec and C\'eline Rouveirol, editor(s), Proceedings of the European Conference on Machine Learning, pages 137  142. Springer. [.ps.gz] [.pdf] 
Joachims/98c 
Thorsten Joachims (1998). Making largeScale SVM Learning Practical. Technical report, Universität Dortmund, LS VIIIReport. [.ps.gz] [.pdf] 
Scheffer/Joachims/98a 
Tobias Scheffer and Thorsten Joachims (1998). Estimating the expected error of empirical minimizers for model selection. Technical report, TUBerlin. [.ps] 
Cortes/Vapnik/95a 
Cortes, Corinna and Vapnik, Vladimir N. (1995). SupportVector Networks. Machine Learning Journal, 20 pages 273297. 
Vapnik/95a 
Vladimir N. Vapnik (1995). The Nature of Statistical Learning Theory. Springer. 
Other SVM Resources
Last modified July 7th, 2000 by Thorsten
Joachims <thorsten@ls8.cs.unidortmund.de> 