Sparse Autoencoder

This tutorial builds up on the previous Autoencoders tutorial. In the previous tutorials in the series on autoencoders, we have discussed to regularize autoencoders by either the number of hidden units, tying their weights, adding noise on the inputs, are dropping hidden units by setting them randomly to 0. There are more direct regularisation schemes which try to directly regularize the representation of the hidden units. This can be done by using a regularization term summing the KL-divergence

\[KL(\rho \| \rho_j) = \rho \log(\frac{\rho}{\rho_j}) + (1 - \rho) \log(\frac{1-\rho}{1-\rho_j})\]

over all hidden neurons indexed by \(j\). The regularization parameter \(\rho\) defines a target mean activation and the mean activation \(\rho_j\) of hidden neuron \(j\) over the whole training dataset is interpreted as activation probability. With this constraint, the optimisation problem can be stated as:

\[\min_{\theta} \frac 1 N \sum_i^N (\vec x_i - f_{\theta}(\vec x_i))^2 +\sum_{j=1}^m KL(\rho \| \rho_j)\]

The idea ofthe constraint is to make the hidden representation more sparse by giving the units a very small expected activation, which ideally is close to 0 most of the time. These autoencoders are therefore called sparse autoencoders.

For a detailed introduction to sparse autoencoders, we refer to the Stanford UFLDL Tutorial. In the following, we show how to solve the autoencoder exercise from this recommended tutorial using Shark.

Sparse Autoencoder in Shark

For this tutorial, the following (lengthy) list of includes is needed:

#include <shark/Data/Pgm.h> //for exporting the learned filters
#include <shark/Data/Csv.h>//for reading in the images as csv
#include <shark/Data/Statistics.h> //for normalization
#include <shark/ObjectiveFunctions/SparseAutoencoderError.h>//the error function performing the regularisation of the hidden neurons
#include <shark/Algorithms/GradientDescent/LBFGS.h>// the L-BFGS optimization algorithm
#include <shark/ObjectiveFunctions/Loss/SquaredLoss.h> // squared loss used for regression
#include <shark/ObjectiveFunctions/Regularizer.h> //L2 regulariziation

Edges in natural images

As an example of finding features with a sparse autoencoder, we will use natural images as input. We will then see how the autoencoder discovers edges as a good representation of natural images.

The images we will use are the same as in the Stanford Tutorial.

Generating a training set

In order to generate a training set, we will randomly select image patches of equal size (we use 8x8) and use each patch as a data point:

const unsigned int numsamples = 10000; //number of generated patches
const std::size_t w = 512;//width of loaded image
const std::size_t h = 512;//height of loaded image
const std::size_t psize = 8;//size of a patch

        // Read images
        UnlabeledData<RealVector> images;
        importCSV(images, "data/images.csv");
        unsigned int n = images.numberOfElements(); // number of images

        // Sample equal amount of patches per image
        size_t patchesPerImg = numsamples / n;
        typedef UnlabeledData<RealVector>::element_range::iterator ElRef;

        // Create patches
        vector<RealVector> patches;
        for (ElRef it = images.elements().begin(); it != images.elements().end(); ++it) {
                for (size_t i = 0; i < patchesPerImg; ++i) {
                        // Upper left corner of image
                        unsigned int ulx = random::discrete(random::globalRng, std::size_t(0),w-psize-1);
                        unsigned int uly = random::discrete(random::globalRng, std::size_t(0),w-psize-1);
                        // Transform 2d coordinate into 1d coordinate and get the sample
                        unsigned int ul = ulx * h + uly;
                        RealVector sample(psize * psize);
                        const RealVector& img = *it;
                        for (size_t j = 0; j < psize; ++j)
                                for (size_t k = 0; k < psize; ++k)
                                        sample(j * psize + k) = img(ul + k + j * h);
                        patches.push_back(sample);
                }
        }

        UnlabeledData<RealVector> samples = createDataFromRange(patches);

This creates a data set containing 10000 patches of size 8x8 transformed into a vector with 64 elements. Before we can use this data set we need to normalize the data points. We first truncate outliers to +/- 3 standard deviations, and then normalize to the range \([0.1, 0.9]\). We also need to make it a regression data set with same input and target values:

// zero mean
RealVector meanvec = mean(samples);
samples = transform(samples, Shift(-meanvec));

// Remove outliers outside of +/- 3 standard deviations
// and normalize to [0.1, 0.9]
RealVector pstd = 3 * sqrt(variance(samples));
samples = transform(samples, TruncateAndRescale(-pstd, pstd, 0.1, 0.9));

We encapsulate the above code in a function for brevity:

UnlabeledData<RealVector> getSamples()

Sparse autoencoder objective function

We first load our data and generate a dataset with labels equals to the points:

UnlabeledData<RealVector> samples = getSamples();
RegressionDataset data(samples, samples);

This does not create an additional copy of the samples but merely generates another reference to every single point. In order to create the objective function for the sparse autoencoder, we first need to create an instance of the Autoencoder class:

const unsigned int numhidden = 25;
const double rho = 0.01; // Sparsity parameter
const double beta = 6.0; // Regularization parameter
const double lambda = 0.0002; // Weight decay paramater

        Autoencoder<LogisticNeuron, LogisticNeuron> model;
        model.setStructure(psize * psize, numhidden);
        initializeFFNet(model);

We can also use the TiedAutoencoder here instead! We then need to add the sparsity constraint to the regression loss using the SparseAutoencoderError as well as atwo-norm-regularisation:

SquaredLoss<RealVector> loss;
SparseAutoencoderError error(data,&model, &loss, rho, beta);
// Add weight regularization
TwoNormRegularizer regularizer(error.numberOfVariables());
error.setRegularizer(lambda,&regularizer);

This creates the entire objective function for the sparse autoencoder, with sparsity constraint and weight regularization.

Training the autoencoder

In order to train the autoencoder we use the limited memory BFGS (L-BFGS) algorithm. We also need to chose a starting point for the optimization. For this we use values uniformly taken from \([-r, r]\) for the weights and \(0\) for the biases, with

\[r = \frac{\sqrt{6}}{n_{in} + n_{out} + 1}\]

where \(n_{in}\) and \(n_{out}\) is the number of input and output values per neuron.

The training is then done as follows:

const unsigned int maxIter = 400;

        LBFGS optimizer;
        error.init();
        optimizer.init(error);

        for (unsigned int i = 0; i < maxIter; ++i) {
                optimizer.step(error);
                cout << "Error: " << optimizer.solution().value << endl;
        }

In our trials we got final error values around 0.8 to 0.9.

Visualizing the autoencoder

After training, each row of the first weight matrix, \(W1\), will correspond to a feature learned by the autoencoder. To visualize these features, we export each row as an 8x8 PGM image using the PGM library of Shark:

exportFiltersToPGMGrid("features",model.encoderMatrix(),psize,psize);

We got the following result:

Plot of features learned by the autoencoder

Full example program

A complete program performing the above steps is SparseAETutorial.cpp.