The Versatility of Learning

The purpose of this tutorial is to demonstrate the versatility of Shark for various learning tasks. Drawing on a simple binary classification task similar to the one in the Hello World tutorial, we will cover five different learning methods in a single, consistent framework. The present tutorial assumes that the reader is already familiar with the concepts of models and trainers that have been treated in more detail, e.g., in the General Optimization Tasks tutorial.

We will start out with the (hopefully already familiar) structure of a supervised learning experiment:

#include <shark/Data/Dataset.h>
#include <shark/Data/Csv.h>
#include <shark/ObjectiveFunctions/Loss/ZeroOneLoss.h>
#include <iostream>

using namespace shark;

int main()
{
        // Load data, use 70% for training and 30% for testing.
        // The path is hard coded; make sure to invoke the executable
        // from a place where the data file can be found. It is located
        // under [shark]/examples/Supervised/data.
        ClassificationDataset traindata, testdata;
        importCSV(traindata, "data/quickstartData.csv", LAST_COLUMN, ' ');
        testdata = splitAtElement(traindata, 70 * traindata.numberOfElements() / 100);

        // TODO: define a model and a trainer

        trainer.train(model, traindata);

        Data<unsigned int> prediction = model(testdata.inputs());

        ZeroOneLoss<unsigned int> loss;
        double error_rate = loss(testdata.labels(), prediction);

        std::cout << "model: " << model.name() << std::endl
                << "trainer: " << trainer.name() << std::endl
                << "test error rate: " << error_rate << std::endl;
}

The program assumes that the comma-separated-value (CSV) file quickstartData.csv, containing the training and test data, is located in the sub-folder data/. The file content describes a two-class (binary) problem with labels 0 and 1. The program itself is still a stub, since the actual model and trainer declarations are missing. The ZeroOneLoss computes the classification error; it replaces the loop at the end of the Hello World tutorial.
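
For illustration, here is a rough, hand-written sketch of what the ZeroOneLoss computes, using only the variables of the stub above: it counts the fraction of test points for which the predicted label differs from the true label. It is a sketch only, not part of the program.

        // Sketch: equivalent hand-written error computation.
        std::size_t errors = 0;
        for (std::size_t i = 0; i != testdata.numberOfElements(); ++i) {
                if (prediction.element(i) != testdata.labels().element(i))
                        ++errors;
        }
        double error_rate = double(errors) / testdata.numberOfElements();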

Many Ways of Classifying Data

In the following we will demonstrate the versatility of Shark by inserting five different learning methods into the above program structure. In passing we will see how to circumvent typical pitfalls.

Linear Discriminant Analysis

Let us start with a classical linear method, namely linear discriminant analysis (LDA). We need two more includes

#include <shark/Models/LinearClassifier.h>
#include <shark/Algorithms/Trainers/LDA.h>

and in the place of the “TODO” comment we insert

LinearClassifier<> model;
LDA trainer;

That’s it! The program is ready to go. For build instructions refer to the Your Shark Programs tutorial. You can learn more on LDA in the Linear Discriminant Analysis tutorial.

Nearest Neighbor Classifier

Let’s move from the linear, parametric LDA approach to a non-linear, non-parametric approach. Arguably the simplest non-linear classifier is the nearest neighbor classifier. This classifier is special in that it does not require a trainer. Let’s remove the LDA code and insert the following code in the appropriate places:

#include <shark/Models/Trees/KDTree.h>
#include <shark/Models/NearestNeighborClassifier.h>
#include <shark/Algorithms/NearestNeighbors/TreeNearestNeighbors.h>


        unsigned int k = 3;   // number of neighbors
        KDTree<RealVector> tree(traindata.inputs());
        TreeNearestNeighbors<RealVector, unsigned int> algorithm(traindata, &tree);
        NearestNeighborClassifier<RealVector> model(&algorithm, k);

For the time being ignore the helper classes, unless you already know what they do. There is no explicit training step for nearest neighbor prediction and therefore no trainer object. So we remove the lines:

trainer.train(model, traindata);

        << "trainer: " << trainer.name() << std::endl

Everything should work right away. For more information on nearest neighbor classification see the Nearest Neighbor Classification tutorial.

You see, changing the learning method is really easy. So let’s try more.

Support Vector Machine

Our next candidate is a non-linear support vector machine (SVM). We will use a Gaussian radial basis function kernel:

#include <shark/Models/Kernels/GaussianRbfKernel.h>
#include <shark/Models/Kernels/KernelExpansion.h>
#include <shark/Algorithms/Trainers/CSvmTrainer.h>


        double gamma = 1.0;         // kernel bandwidth parameter
        double C = 10.0;            // regularization parameter
        GaussianRbfKernel<RealVector> kernel(gamma);
        KernelClassifier<RealVector> model(&kernel);
        CSvmTrainer<RealVector> trainer(
                        &kernel,
                        C,
                        true); /* true: train model with offset */

Quite simple, again. That’s it; you are ready to enjoy the power of non-linear SVM classification. Much more on SVMs can be found in the special SVM tutorials, starting with Support Vector Machines: First Steps.
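
The values of gamma and C above are ad hoc. As a hedged illustration (not part of the original example, and using the test set only for brevity where a proper experiment would use a validation set or cross-validation), one could compare a few candidate values of C by re-running training and evaluation in a loop:

        // Sketch: compare several regularization strengths C.
        ZeroOneLoss<unsigned int> zeroOneLoss;
        for (double c : {0.1, 1.0, 10.0, 100.0}) {
                KernelClassifier<RealVector> candidate(&kernel);
                CSvmTrainer<RealVector> candidateTrainer(&kernel, c, true);
                candidateTrainer.train(candidate, traindata);
                double err = zeroOneLoss(testdata.labels(), candidate(testdata.inputs()));
                std::cout << "C = " << c << ": test error " << err << std::endl;
        }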

Random Forest

There is more to explore in Shark. Let’s try a random forest instead:

#include <shark/Models/Trees/RFClassifier.h>
#include <shark/Algorithms/Trainers/RFTrainer.h>


        RFClassifier model;
        RFTrainer trainer;

This one is really straightforward. For an introduction to random forests see the Random Forest tutorial.

However, the attempt to compile this program results in an error message (or, depending on your compiler, a pile of hard-to-decipher messages involving template issues). What went wrong? The problem is that in Shark there exist (for good reasons) two different conventions for representing classification labels and predictions (also refer to the Label Formats tutorial). While many models output their predictions as unsigned integers (class indices), the RFClassifier outputs a RealVector holding one value per class. Here it contains two values, the larger of which indicates the predicted class. Thus, we have to turn the line

Data<unsigned int> prediction = model(testdata.inputs());

into

auto prediction = model(testdata.inputs());

Now the predictions are stored as RealVectors. These predictions are then fed into the ZeroOneLoss, so we change its declaration to

ZeroOneLoss<unsigned int, RealVector> loss;

where the first template parameter identifies the ground truth label type (the element type of testdata.labels()) and the second template parameter is the data type of the model predictions (it can be dropped if the two types coincide).
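
To make the connection between the two conventions explicit, here is a small sketch (not needed for the program itself) that recovers the predicted class of a single test point from its RealVector output by picking the index of the larger entry:

        // Sketch: convert the first RealVector prediction back to a class index.
        RealVector scores = prediction.element(0);   // one value per class
        unsigned int predictedClass = (scores(1) > scores(0)) ? 1u : 0u;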

Neural Network

As a final example let’s look at a more complex case, namely that of feed-forward neural network training. The most basic way of training these models is by gradient-based minimization of the training error (empirical risk), measured by some differentiable loss function such as the squared error or the cross entropy. The computation of the gradient is built into the neural network class (the back-propagation algorithm), but of course there are various options for solving the underlying optimization problem. The General Optimization Tasks tutorial touches on this topic. Here, for consistency with the previous examples, we will encapsulate the optimization process into the familiar model and trainer classes.

#include <shark/Models/FFNet.h>
#include <shark/ObjectiveFunctions/Loss/CrossEntropy.h>
#include <shark/ObjectiveFunctions/ErrorFunction.h>
#include <shark/Algorithms/GradientDescent/Rprop.h>
#include <shark/Algorithms/StoppingCriteria/MaxIterations.h>
#include <shark/Algorithms/Trainers/OptimizationTrainer.h>


        typedef FFNet<LogisticNeuron, LogisticNeuron> ModelType; // sigmoid transfer function for hidden and output neurons
        ModelType model;
        size_t N = inputDimension(traindata);
        size_t M = 10;
        model.setStructure(N, M, 2);         // N inputs (depends on the data),
                                             // M hidden neurons (depends on problem difficulty),
                                             // and two output neurons (two classes).
        initRandomUniform(model, -0.1, 0.1); // initialize with small random weights
        CrossEntropy trainloss;              // differentiable loss for neural network training
        IRpropPlus optimizer;                // gradient-based optimization algorithm
        MaxIterations<> stop(100);           // stop optimization after 100 Rprop steps
        OptimizationTrainer<ModelType, unsigned int> trainer(&trainloss, &optimizer, &stop);

Just like for random forests, FFNet objects output RealVectors and therefore must be used with the appropriate data containers and loss functions.
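
Concretely, the same two adjustments as in the random forest example apply here as well:

        auto prediction = model(testdata.inputs());    // predictions are RealVectors

        ZeroOneLoss<unsigned int, RealVector> loss;    // labels are unsigned int, predictions RealVectors
        double error_rate = loss(testdata.labels(), prediction);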

The important classes here are the loss function and the OptimizationTrainer. The loss function defines how the model is penalized when its output for a training point does not match the corresponding label. Model, loss, and training data can be combined into an ErrorFunction, which yields an objective function that can be minimized with an iterative optimizer until a stopping condition is met. The OptimizationTrainer is a simple wrapper class that spares us from setting up the ErrorFunction by hand: it keeps references to the loss, the optimizer, and the stopping criterion, and implements a straightforward iterative optimization loop in its train method. Feel free to use other (differentiable) loss functions for training, other (usually gradient-based) optimizers, and different stopping criteria. All of this can be done without changing the program structure. In particular, after all definitions have been made there will always be a model and a trainer, and that is all we need to care about in the end.
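
To make the role of these components concrete, here is a rough sketch, assuming Shark's ErrorFunction together with the optimizer and stopping-criterion interfaces used above, of the kind of loop that the OptimizationTrainer runs internally in its train method. It is a simplified sketch, not the library's actual implementation.

        // Sketch of an explicit training loop equivalent in spirit to
        // OptimizationTrainer::train (simplified; error handling omitted).
        ErrorFunction error(traindata, &model, &trainloss); // empirical risk of the model under the loss
        optimizer.init(error);
        stop.reset();
        do {
                optimizer.step(error);                       // one Rprop update of the weights
        } while (!stop.stop(optimizer.solution()));
        model.setParameterVector(optimizer.solution().point);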

What you learned

You should have learned the following aspects in this tutorial:

  • Shark is a versatile tool for machine learning. Changing the learning method requires only exchanging a few classes. All objects still conform to the same top level interfaces, such as AbstractModel and AbstractTrainer.
  • Nearly everything in Shark is templated. It is not always easy to get all template parameters right on the first attempt. Probably the best way of dealing with errors is to check the documentation of the template classes. The meaning of all template parameters should be documented there, and it often also becomes clear from the template parameter’s name.

You may not have understood all details, in particular those hidden in the various helper classes. If you are particularly interested in one of the methods, then please feel encouraged to go ahead and explore the documentation.

In any case you should have understood how all the different learning methods are expressed by means of adaptive models and corresponding trainers. Changing the learning method may involve changing the particular sub-class, but all relevant objects will still conform to the same top-level interfaces. Thus, only minimal changes to the surrounding code will be necessary, if any at all. This design offers a lot of flexibility, since changing the learning algorithm even late in a project is usually not a big deal.