# Training Binary Restricted Boltzmann Machines

## Introduction

Shark has a module for training restricted Boltzmann machines (RBMs) [Hinton2007]
[Welling2007]. All corresponding header files are located in the subdirectory
`<SHARK_SRC_DIR>/include/shark/Unsupervised/RBM/`. We assume that you have
already read the introduction to the RBM module, The RBM Module.

In the following, we will train and evaluate a binary RBM using Contrastive Divergence (CD-1) learning on a toy example. We choose this example as a starting point because its setup is quite common, and Shark provides a set of predefined types for it.

The example file for this tutorial can be found in `BinaryRBM.cpp`.

## Contrastive Divergence Learning – Theory
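
As a brief reminder, the standard textbook formulation of a binary RBM and of CD-k can be sketched as follows. The symbols $W$, $b$, $c$ (weights, visible bias, hidden bias) are notation assumed for this summary and are not taken from Shark's headers.

$$
E(v,h) = -\sum_{i,j} v_i W_{ij} h_j - \sum_i b_i v_i - \sum_j c_j h_j,
\qquad
p(v) = \frac{1}{Z} \sum_h e^{-E(v,h)}
$$

$$
\frac{\partial \log p(v)}{\partial W_{ij}}
= v_i \, p(h_j = 1 \mid v) - \sum_{v'} p(v') \, v'_i \, p(h_j = 1 \mid v')
$$

The second term sums over all visible states and is intractable for anything but tiny models. CD-k approximates it by a single sample $v^{(k)}$ obtained from $k$ steps of Gibbs sampling started at the training example $v$:

$$
\frac{\partial \log p(v)}{\partial W_{ij}}
\approx v_i \, p(h_j = 1 \mid v) - v^{(k)}_i \, p(h_j = 1 \mid v^{(k)})
$$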

## Contrastive Divergence Learning – Code

First, we need to include the following files:

```
//used for training the RBM
#include <shark/Unsupervised/RBM/BinaryRBM.h>
#include <shark/Algorithms/GradientDescent/SteepestDescent.h>
//the problem
#include <shark/Unsupervised/RBM/Problems/BarsAndStripes.h>
//for evaluation
#include <shark/Unsupervised/RBM/analytics.h>
#include <iostream>
```

As an example problem, we consider one of the predefined benchmark problems in `RBM/Problems/`, namely the Bars-and-Stripes data set [MacKay2002]:

```
BarsAndStripes problem;
UnlabeledData<RealVector> data = problem.data();
```
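
To give a feeling for what such a data set contains, here is a minimal, self-contained sketch that enumerates Bars-and-Stripes patterns on an n x n grid. The helper `barsAndStripesPatterns` is hypothetical and is not Shark's `BarsAndStripes` class, which may differ in grid size, ordering, and in whether duplicate patterns are kept.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// One pattern is a flattened n x n grid of 0/1 pixels, stored row by row.
using Pattern = std::vector<int>;

// Enumerate all distinct Bars-and-Stripes patterns on an n x n grid.
// Hypothetical helper for illustration, not part of Shark.
std::set<Pattern> barsAndStripesPatterns(std::size_t n) {
    assert(n > 0 && n < 16); // keep the 2^n enumeration small
    std::set<Pattern> patterns; // a set drops the duplicated all-0 and all-1 grids
    for (std::size_t bits = 0; bits < (std::size_t(1) << n); ++bits) {
        Pattern horizontal(n * n), vertical(n * n);
        for (std::size_t r = 0; r < n; ++r) {
            for (std::size_t c = 0; c < n; ++c) {
                // bit r decides whether the whole row r is on
                horizontal[r * n + c] = static_cast<int>((bits >> r) & 1);
                // bit c decides whether the whole column c is on
                vertical[r * n + c] = static_cast<int>((bits >> c) & 1);
            }
        }
        patterns.insert(horizontal);
        patterns.insert(vertical);
    }
    return patterns;
}
```

For a grid of side n this yields 2 * 2^n - 2 distinct patterns, because the all-zero and all-one grids appear both as a row pattern and as a column pattern.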

Now we can create the RBM. We have to define how many input variables (visible units/observable variables) our RBM shall have. This depends on the data set from which we want to learn, since the number of visible neurons has to correspond to the dimensionality of the training data. Further, we have to choose how many hidden neurons (latent variables) we want. Also, to construct the RBM, we need to choose a random number generator. Since RBM training is time consuming, we might later want to start several trials in separate instances. In this setup, being able to choose a random number generator is crucial. But now, let’s construct the beast:

```
size_t numberOfHidden = 32;//hidden units of the rbm
size_t numberOfVisible = problem.inputDimension();//visible units of the inputs
//create rbm with simple binary units
BinaryRBM rbm(random::globalRng);
rbm.setStructure(numberOfVisible,numberOfHidden);
```

Using the RBM, we can now construct the k-step Contrastive Divergence error function. Since we want to model Hinton’s famous algorithm, we set k to 1. Throughout the library we use the convention that the structure must be fully initialized before calling `setData`. This allows the gradient approximators to adjust their internal structures. For CD-k this is not crucial, but you should get used to it before trying more elaborate gradient approximators:

```
BinaryCD cd(&rbm);
cd.setK(1);
cd.setData(data);
```
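
To make the idea behind CD-1 concrete, the following self-contained sketch computes a CD-1 estimate of the weight gradient for a single training vector. It does not use Shark: `TinyRBM`, `hiddenProbs`, `visibleProbs`, `sample` and `cd1WeightGradient` are hypothetical names chosen for illustration, and Shark's `BinaryCD` may differ in sampling details, batching and sign convention. The sketch returns the ascent direction of the log-likelihood; an optimizer that minimizes would use its negative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // W[i][j]: visible unit i, hidden unit j

// Hypothetical minimal binary RBM, for illustration only.
struct TinyRBM {
    Mat W; // weights
    Vec b; // visible bias
    Vec c; // hidden bias
};

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// p(h_j = 1 | v) for all hidden units j
Vec hiddenProbs(const TinyRBM& rbm, const Vec& v) {
    assert(v.size() == rbm.b.size());
    Vec p(rbm.c.size());
    for (std::size_t j = 0; j < p.size(); ++j) {
        double a = rbm.c[j];
        for (std::size_t i = 0; i < v.size(); ++i) a += v[i] * rbm.W[i][j];
        p[j] = sigmoid(a);
    }
    return p;
}

// p(v_i = 1 | h) for all visible units i
Vec visibleProbs(const TinyRBM& rbm, const Vec& h) {
    assert(h.size() == rbm.c.size());
    Vec p(rbm.b.size());
    for (std::size_t i = 0; i < p.size(); ++i) {
        double a = rbm.b[i];
        for (std::size_t j = 0; j < h.size(); ++j) a += rbm.W[i][j] * h[j];
        p[i] = sigmoid(a);
    }
    return p;
}

// Draw independent binary states from the given probabilities.
Vec sample(const Vec& probs, std::mt19937& rng) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    Vec s(probs.size());
    for (std::size_t k = 0; k < s.size(); ++k) s[k] = uniform(rng) < probs[k] ? 1.0 : 0.0;
    return s;
}

// CD-1 estimate of d log p(v0) / dW: data term minus one-step reconstruction term.
Mat cd1WeightGradient(const TinyRBM& rbm, const Vec& v0, std::mt19937& rng) {
    Vec ph0 = hiddenProbs(rbm, v0);                // positive phase
    Vec h0 = sample(ph0, rng);                     // sample hidden state
    Vec v1 = sample(visibleProbs(rbm, h0), rng);   // one Gibbs step back to the visibles
    Vec ph1 = hiddenProbs(rbm, v1);                // negative phase
    Mat g(v0.size(), Vec(ph0.size()));
    for (std::size_t i = 0; i < v0.size(); ++i)
        for (std::size_t j = 0; j < ph0.size(); ++j)
            g[i][j] = v0[i] * ph0[j] - v1[i] * ph1[j];
    return g;
}
```

Passing the random number generator explicitly mirrors the motivation given above for choosing the generator when constructing the RBM: separate trials can use separate, reproducible generators.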

The RBM optimization problem is special in the sense that the error function cannot be
evaluated exactly for anything more complex than trivial toy problems, and the gradient can
only be estimated. This is reflected by the fact that all RBM derivatives have the flag
`HAS_VALUE` deactivated. Thus, most optimizers will not be able to optimize it. One which
is capable of optimizing it is the `SteepestDescent` algorithm, which we will use in the
following:

```
SteepestDescent optimizer;
optimizer.setMomentum(0);
optimizer.setLearningRate(0.1);
```
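
The two parameters set above have a simple interpretation. The sketch below shows one common formulation of gradient descent with momentum, which needs only a gradient estimate and never the function value. `MomentumDescent` is a hypothetical stand-in written for this tutorial; it is an assumption that Shark's `SteepestDescent` follows the same update rule in every detail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a gradient-only optimizer, not Shark's class.
struct MomentumDescent {
    double learningRate;
    double momentum;
    std::vector<double> path; // previous update, reused by the momentum term

    // One update: path = -learningRate * gradient + momentum * path; point += path
    void step(std::vector<double>& point, const std::vector<double>& gradient) {
        assert(point.size() == gradient.size());
        if (path.empty()) path.assign(point.size(), 0.0);
        for (std::size_t k = 0; k < point.size(); ++k) {
            path[k] = -learningRate * gradient[k] + momentum * path[k];
            point[k] += path[k];
        }
    }
};
```

With the momentum set to 0, as in the tutorial code, each step simply moves the parameters by the learning rate times the negative gradient estimate.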

Since our problem is small, we can actually evaluate the negative log-likelihood exactly. We use it to evaluate each trial after training and average the result over several trials:

```
unsigned int numIterations = 1000;//iterations for training
unsigned int numTrials = 10;//number of trials for training
double meanResult = 0;
for(unsigned int trial = 0; trial != numTrials; ++trial) {
    initRandomUniform(rbm, -0.1,0.1);
    cd.init();
    optimizer.init(cd);
    for(unsigned int iteration = 0; iteration != numIterations; ++iteration) {
        optimizer.step(cd);
    }
    //evaluate exact likelihood after training. this is only possible for small problems!
    double likelihood = negativeLogLikelihood(rbm,data);
    std::cout<<trial<<" "<<likelihood<<std::endl;
    meanResult +=likelihood;
}
meanResult /= numTrials;
```
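
Why exact evaluation is only feasible for small problems becomes clear from a brute-force sketch: the partition function requires a sum over all 2^n visible states. The code below is a self-contained illustration that does not use Shark. The function names are hypothetical, and it sums the negative log-likelihood over the data set, which is an assumption about the convention rather than a statement about Shark's `negativeLogLikelihood`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // W[i][j]: visible unit i, hidden unit j

// log(1 + exp(x)), written to avoid overflow for large x
double softplus(double x) {
    return x > 0 ? x + std::log1p(std::exp(-x)) : std::log1p(std::exp(x));
}

// Free energy F(v) of a binary RBM, so that p(v) = exp(-F(v)) / Z
double freeEnergy(const Mat& W, const Vec& b, const Vec& c, const Vec& v) {
    double f = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) f -= b[i] * v[i];
    for (std::size_t j = 0; j < c.size(); ++j) {
        double a = c[j];
        for (std::size_t i = 0; i < v.size(); ++i) a += v[i] * W[i][j];
        f -= softplus(a);
    }
    return f;
}

// log Z by enumerating all 2^n visible states (log-sum-exp for stability)
double logPartition(const Mat& W, const Vec& b, const Vec& c) {
    const std::size_t n = b.size();
    assert(n <= 20); // the enumeration grows exponentially in n
    const std::size_t states = std::size_t(1) << n;
    Vec negF(states);
    double maxNegF = -std::numeric_limits<double>::infinity();
    for (std::size_t s = 0; s < states; ++s) {
        Vec v(n);
        for (std::size_t i = 0; i < n; ++i) v[i] = static_cast<double>((s >> i) & 1);
        negF[s] = -freeEnergy(W, b, c, v);
        if (negF[s] > maxNegF) maxNegF = negF[s];
    }
    double sum = 0.0;
    for (std::size_t s = 0; s < states; ++s) sum += std::exp(negF[s] - maxNegF);
    return maxNegF + std::log(sum);
}

// Negative log-likelihood summed over all data points
double exactNegativeLogLikelihood(const Mat& W, const Vec& b, const Vec& c,
                                  const std::vector<Vec>& data) {
    const double logZ = logPartition(W, b, c);
    double nll = 0.0;
    for (const Vec& v : data) nll += freeEnergy(W, b, c, v) + logZ;
    return nll;
}
```

As a sanity check, an RBM with all parameters set to zero assigns the same probability to every visible state, so each data point contributes n * ln 2 to the result, where n is the number of visible units.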

Now we can print the results as usual with

```
std::cout << "RESULTS: " << std::endl;
std::cout << "======== " << std::endl;
std::cout << "mean negative log likelihood: " << meanResult << std::endl;
```

and the result will read something like

```
RESULTS:
========
mean negative log likelihood: 192.544
```