The Image Group – University of Copenhagen

The Image Group
Machine learning solutions



The Shark machine learning library is a modular C++ library for the design and optimization of adaptive systems.

[Shark] The library provides methods for regression, classification, and density estimation, including various kinds of neural networks and kernel methods, as well as general algorithms for nonlinear optimization, in particular single- and multi-objective evolutionary algorithms and gradient-based methods.

The most recent version of Shark can be downloaded from here.

Video lecture

Tobias Glasmachers' 2008 NIPS open source software workshop video lecture on Shark.

Paper describing the library

Christian Igel, Verena Heidrich-Meisner, and Tobias Glasmachers. Shark. Journal of Machine Learning Research 9, pp. 993-996, 2008

Maximum Likelihood Model Selection for SVMs

Adapting the hyperparameters of support vector machines (SVMs) is a challenging model selection problem, especially when flexible kernels are to be adapted and data are scarce. We present a coherent framework for regularized model selection of 1-norm soft margin SVMs for binary classification. We propose to use gradient-ascent on a likelihood function of the hyperparameters. The likelihood function is based on logistic regression for robustly estimating the class conditional probabilities and can be computed efficiently. Overfitting is an important issue in SVM model selection and can be addressed in our framework by incorporating suitable prior distributions over the hyperparameters.

Source code

Download example code for Shark version 2.

Paper describing the algorithm

Tobias Glasmachers and Christian Igel. Maximum Likelihood Model Selection for 1-Norm Soft Margin SVMs with Multiple Parameters. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(8), pp. 1522-1528, 2010

Training of Large-scale SVMs

[feasible region of optimization problem] Support vector machines are trained by solving constrained quadratic optimization problems. This is usually done with an iterative decomposition algorithm operating on a small working set of variables in every iteration. The training time strongly depends on the selection of these variables. We propose the maximum-gain working set selection algorithm for large scale quadratic programming. It is based on the idea to greedily maximize the progress in each single iteration. The algorithm takes second order information from cached kernel matrix entries into account. We proved the convergence to an optimal solution of a variant termed hybrid maximum-gain working set selection. This method has been empirically compared to the prominent most violating pair selection and the latest algorithm using second order information. For large training sets our new selection scheme is significantly faster.

Source code

Hybrid Maximum Gain working set selection algorithm implemented into LIBSVM 2.71. It is also implemented in the Shark machine learning library.

Supplementary material

This program checks all 2,834,352 cases considered for the proof of Lemma 8. However, in the meantime Tobias Glasmachers found a much more elegant proof:

Tobias Glasmachers. On Related Violating Pairs for Working Set Selection in SMO Algorithms. In M. Verleysen, ed.: 16th European Symposium on Artificial Neural Networks (ESANN 2008). Evere, Belgien: d-side publications, 2008

Paper describing the algorithm

Tobias Glasmachers and Christian Igel. Maximum-Gain Working Set Selection for SVMs. Journal of Machine Learning Research 7, pp. 1437-1466, 2006

Online and Active Learning for SVMs

Iterative learning algorithms that approximate the solution of support vector machines (SVMs) have two potential advantages. First, they allow for online and active learning. Second, for large datasets computing the exact SVM solution may be too time consuming and an efficient approximation can be preferable. The powerful LASVM proposed by A. Bordes, S. Ertekin, J. Weston, and L. Bottou iteratively approaches the exact SVM solution using sequential minimal optimization (SMO). It allows for efficient online and active learning. This algorithm can be considerably improved in speed and accuracy by replacing the working set selection in the SMO steps. We incorporated a second order working set selection strategy, which greedily aims at maximizing the progress in each single step.

Supplementary material

The official LASVM website can found here. The modified LASVM source code implementing our working set selection strategy can be downloaded here.


Antoine Bordes, Seyda Ertekin, Jason Weston, Léon Bottou. Fast Kernel Classifiers with Online and Active Learning. Journal of Machine Learning Research 5, pp. 1579-1619, 2005
Tobias Glasmachers and Christian Igel. Second Order SMO Improves SVM Online and Active Learning. Neural Computation 20(2), pp. 374–382, 2008

Gradient-based Optimization of Kernel-Target Alignment for Sequence Kernels Applied to Bacterial Gene Start Detection: Sequence Data

Extracting protein-encoding sequences from nucleotide sequences is an important task in bioinformatics. This requires to detect locations at which coding regions start. These locations are called translation initiation sites (TIS).

The TIS2007 data is a reliable data set designed to evaluate machine learning algorithms for automatic TIS detection. It is based on E. coli genes from the EcoGene database. Only entries with biochemically verified N-terminus were considered. The neighboring nucleotides were looked up in the GenBank file U00096.gbk . From the 732 positive examples associated negative examples were created. For the negative examples, sequences centered around a potential start codon were considered and accepted them if the codon is in-frame with one of the real start sites used as a positive case, its distance from a real TIS is less than 80 nucleotides, and no in-frame stop codon occurs in between. This data selection generates a difficult benchmark because the negative TISs in the data set are both in-frame with and in the neighborhood of the real TIS. Finally a set of 1248 negative examples was obtained. The length of each sequence is 50 nucleotides, with 32 located upstream and 18 downstream including the start codon.

To minimize sampling effects, 50 different partitionings of the data into training and test sets were generated. Each training set contains 400 sequences plus the associated negatives, the corresponding test set 332 sequences plus the associated negatives. Each line in a data file starts with the label, 1 or -1 for positive and negative examples, respectively, followed by the nucleotide sequence as ASCII string.

Source code

Gradient-based optimization of the kernel-target alignment is available as a standard SVM model selection method in the Shark library. For an implementation of various oligo kernel functions for the Shark library click here.

Supplementary material

All 50 data partitionings in training and test set.


Christian Igel, Tobias Glasmachers, Britta Mersch, Nico Pfeifer, and Peter Meinicke. Gradient-based Optimization of Kernel-Target Alignment for Sequence Kernels Applied to Bacterial Gene Start Detection. IEEE/ACM Transactions on Computational Biology and Bioinformatics 4(2), pp. 216-226, 2007
Britta Mersch, Tobias Glasmachers, Peter Meinicke, and Christian Igel. Evolutionary Optimization of Sequence Kernels for Detection of Bacterial Gene Starts. International Journal of Neural Systems 17(5), selected paper of ICANN 2006, pp. 369-381, 2007

The Elitist Covariance Matrix Adaptation Evolution Strategy


The covariance matrix adaptation evolution strategy (CMA-ES) is one of the most powerful evolutionary algorithms for real-valued optimization. We propose the an elitist version of this algorithm. For step size adaption, the algorithms used an improved 1/5-th success rule, which replaces the cumulative path length control in the standard CMA-ES.

We developed an incremental Cholesky update for the covariance matrix replacing the computational demanding and numerically involved decomposition of the covariance matrix. This rank-one update can replace the decomposition only for the update without evolution path and reduces the computational effort by a factor of n, where n is the problem dimension. The resulting (1+1)-Cholesky-CMA-ES is an elegant algorithm and the perhaps simplest evolution strategy with covariance matrix and step size adaptation.

Source code

There are an example program for the (1+1)-CMA-ES in the Shark library.

Papers describing the algorithm

Christian Igel, Thorsten Suttorp, and Nikolaus Hansen. A Computational Efficient Covariance Matrix Update and a (1+1)-CMA for Evolution Strategies. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2006), pp. 453-460, ACM Press
Thorsten Suttorp, Nikolaus Hansen, and Christian Igel. Efficient Covariance Matrix Update for Variable Metric Evolution Strategies. Machine Learning 75, pp. 167-197, 2009

Covariance Matrix Adaptation for Multi-objective Optimization


The covariance matrix adaptation evolution strategy (CMA-ES) is one of the most powerful evolutionary algorithms for real-valued single-objective optimization. We developed a variant of the CMA-ES for multi-objective optimization.

In the new multi-objective CMA-ES (MO-CMA-ES) a population of individuals that adapt their search strategy as in the elitist CMA-ES is maintained. These are subject to multi-objective selection. The selection is based on non-dominated sorting using either the crowding-distance or the contributing hypervolume as second sorting criterion. The MO-CMA-ES inherits important invariance properties, in particular invariance under rotation of the search space, from the original CMA-ES.

Source code

The MO-CMA-ES is part of the Shark library. There is a tutorial and an example program that can serve as starting points for using the MO-CMA-ES.

Papers describing the algorithm

Christian Igel, Nikolaus Hansen, and Stefan Roth. Covariance Matrix Adaptation for Multi-objective Optimization. Evolutionary Computation 15(1), pp. 1-28, 2007
Thorsten Suttorp, Nikolaus Hansen, and Christian Igel. Efficient Covariance Matrix Update for Variable Metric Evolution Strategies. Machine Learning, 2009

Rprop with Improved Weight-Backtracking

[Rprop Scheme] The Rprop (Resilient Backpropagation) algorithm proposed by Riedmiller and Braun is one of the best performing first-order learning methods. We introduce slight modifications of the algorithm that further improve its robustness and learning speed.

We suggest to use weight-backtracking for highly non-linear optimization problems. Weight-backtracking means partially retracting “unfavorable” previous update steps. Whether a parameter change was “unfavorable” or not is decided by a heuristic. We propose an improved weight-backtracking heuristic considering both the evolution of the partial derivatives and the value of the objective function.

Source code

The Rprop algorithms are implemented in the Shark library, which comes with several examples using the Rprop for optimization.


Christian Igel and Michael Hüsken. Empirical Evaluation of the Improved Rprop Learning Algorithm. Neurocomputing 50(C), pp. 105-123, 2003
Christian Igel and Michael Hüsken. Improving the Rprop Learning Algorithm. In H. Bothe and R. Rojas, eds.: Second International Symposium on Neural Computation (NC 2000), pp. 115-121, ICSC Academic Press, 2000)
Martin Riedmiller. Advanced supervised learning in multilayer perceptrons-from backpropagation to adaptive learning techniques. International Journal of Computer Standards and Interfaces 16(3), pp. 265-278, 1994.
Martin Riedmiller, Heinreich Braun. A direct adaptive method for faster backpropagation learning: the RPROP algorithm. In: Proceedings of the International Conference on Neural Networks, pp. 586-591, IEEE Press, 1993