Pretraining of Deep Neural Networks
==============================================

.. attention::
    This is an advanced topic.

Training deep neural networks is a challenge because plain gradient-based training easily gets
stuck in undesired local optima, which prevents the lower layers from learning useful features.
This problem can be partially circumvented by pretraining the layers in an unsupervised fashion
and thus initialising the network in a region of the error function which is easier to train
(or fine-tune) using steepest descent techniques.

In this tutorial we will implement the architecture presented in
"Deep Sparse Rectifier Neural Networks" [Glorot11]_. The authors propose a multi-layered feed
forward network with rectified linear hidden neurons, which is first pre-trained layerwise
using denoising autoencoders [Vincent08]_. Afterwards, the full network is trained in a
supervised fashion with L1-regularisation to enforce additional sparsity. Training denoising
autoencoders is outlined in detail in :doc:`./denoising_autoencoders` and supervised training
of a feed forward neural network is explained in :doc:`./ffnet`. This tutorial provides the
glue to bring both together.

Due to the complexity of the task, a number of includes are needed::

    //noisy autoencoder model and deep network
    #include <shark/Models/FFNet.h>// neural network for supervised training
    #include <shark/Models/Autoencoder.h>// the autoencoder to train unsupervised
    #include <shark/Models/ConcatenatedModel.h>// to concatenate Autoencoder with noise adding model

    //training the model
    #include <shark/ObjectiveFunctions/ErrorFunction.h>//the error function performing the regularisation of the hidden neurons
    #include <shark/ObjectiveFunctions/Loss/SquaredLoss.h>// squared loss used for unsupervised pre-training
    #include <shark/ObjectiveFunctions/Loss/CrossEntropy.h>// loss used for supervised training
    #include <shark/ObjectiveFunctions/Loss/ZeroOneLoss.h>// loss used for evaluation of performance
    #include <shark/ObjectiveFunctions/Regularizer.h>//L1 and L2 regularisation
    #include <shark/Algorithms/GradientDescent/SteepestDescent.h>//optimizer: simple gradient descent
    #include <shark/Algorithms/GradientDescent/Rprop.h>//optimizer for autoencoders

Deep Network Pre-training
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We will reuse the code of the denoising autoencoder tutorial to pre-train a deep neural
network, and we will create another helper function which initialises a deep neural network
using the denoising autoencoders. In a second step, supervised fine-tuning is applied: simple
gradient descent on the supervised learning goal, using the pre-trained network as the
starting point of the optimisation. The types of networks we use are::

    typedef Autoencoder<RectifierNeuron, LinearNeuron> AutoencoderModel;//type of autoencoder
    typedef FFNet<RectifierNeuron, LinearNeuron> Network;//final supervised trained structure

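The pre-training function defined below relies on a helper that trains a single autoencoder;
it is developed in the denoising autoencoder tutorial. For reference, a minimal sketch of such
a helper is given here. Note that the name ``trainAutoencoderModel`` and its exact signature
are assumptions, and the noise-injection model of the denoising variant is omitted, so this is
an illustration rather than the exact code of that tutorial::

    //sketch of a single-autoencoder training helper (signature assumed;
    //see the denoising autoencoder tutorial for the full, noise-injecting version)
    template<class AutoencoderModel>
    AutoencoderModel trainAutoencoderModel(
        UnlabeledData<RealVector> const& data,//unlabeled input data
        std::size_t numHidden,//number of hidden units of the autoencoder
        double regularisation,//strength of the two-norm regularisation
        std::size_t iterations//number of optimisation steps
    ){
        //create the autoencoder and initialise its weights with small random values
        AutoencoderModel model;
        model.setStructure(dataDimension(data), numHidden);
        initRandomNormal(model, 0.1);

        //reconstruct the inputs under the squared loss: labels are identical to the inputs
        LabeledData<RealVector,RealVector> trainSet(data, data);
        SquaredLoss<RealVector> loss;
        ErrorFunction error(trainSet, &model, &loss);
        TwoNormRegularizer regularizer(error.numberOfVariables());
        error.setRegularizer(regularisation, &regularizer);

        //optimise with Rprop and write the best parameters back into the model
        IRpropPlusFull optimizer;
        optimizer.init(error);
        for(std::size_t i = 0; i != iterations; ++i){
            optimizer.step(error);
        }
        model.setParameterVector(optimizer.solution().point);
        return model;
    }
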
First, we create a function to initialise the network. We start by training the autoencoders
for the two hidden layers: we take the original dataset and train the first autoencoder on it.
Next, we take its encoder layer - that is, the connection of the inputs to the hidden units -
and compute the feature vector for every point in the dataset using ``evalLayer``, a method
specific to autoencoders and feed forward networks. Finally, we create the autoencoder for the
next layer by training it on this feature dataset::

    Network unsupervisedPreTraining(
        UnlabeledData<RealVector> const& data,
        std::size_t numHidden1, std::size_t numHidden2, std::size_t numOutputs,
        double regularisation, std::size_t iterations
    ){
        //train the first hidden layer
        std::cout<<"training first layer"<<std::endl;
        AutoencoderModel layer = trainAutoencoderModel<AutoencoderModel>(
            data, numHidden1,
            regularisation, iterations
        );
        //compute the mapping onto the features of the first hidden layer
        UnlabeledData<RealVector> intermediateData = layer.evalLayer(0,data);

        //train the next layer
        std::cout<<"training second layer"<<std::endl;
        AutoencoderModel layer2 = trainAutoencoderModel<AutoencoderModel>(
            intermediateData, numHidden2,
            regularisation, iterations
        );

We can now create the pre-trained network from the autoencoders by creating a network with two
hidden layers, initialising all weights randomly, and then setting the first and second hidden
layers to the encoding layers of the autoencoders::

        //create the final network
        Network network;
        network.setStructure(dataDimension(data), numHidden1, numHidden2, numOutputs);
        initRandomNormal(network, 0.1);
        network.setLayer(0, layer.encoderMatrix(), layer.hiddenBias());
        network.setLayer(1, layer2.encoderMatrix(), layer2.hiddenBias());
        return network;

Supervised Training
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The supervised training part is overall the same as in previous tutorials, so we only show the
code here. We use the :doxy:`CrossEntropy` loss for classification and the
:doxy:`OneNormRegularizer` to enforce sparsity of the hidden activations. We again optimize
using :doxy:`IRpropPlusFull`::

    //model parameters
    std::size_t numHidden1 = 8;
    std::size_t numHidden2 = 8;
    //unsupervised hyper parameters
    double unsupRegularisation = 0.001;
    std::size_t unsupIterations = 100;
    //supervised hyper parameters
    double regularisation = 0.0001;
    std::size_t iterations = 200;

    //load data and split into training and test set
    LabeledData<RealVector,unsigned int> data = createProblem();
    data.shuffle();
    LabeledData<RealVector,unsigned int> test = splitAtElement(data, static_cast<std::size_t>(0.5*data.numberOfElements()));

    //unsupervised pre-training
    Network network = unsupervisedPreTraining(
        data.inputs(), numHidden1, numHidden2, numberOfClasses(data),
        unsupRegularisation, unsupIterations
    );

    //create the supervised problem: cross entropy loss with one-norm regularisation
    CrossEntropy loss;
    ErrorFunction error(data, &network, &loss);
    OneNormRegularizer regularizer(error.numberOfVariables());
    error.setRegularizer(regularisation, &regularizer);

    //optimize the model
    std::cout<<"training supervised model"<<std::endl;

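What remains is the optimisation loop itself and an evaluation of the resulting network. A
minimal sketch, assuming the usual Shark optimisation pattern and using the :doxy:`ZeroOneLoss`
included above for evaluation, could look like this::

    //run IRprop+ on the regularised cross entropy error
    IRpropPlusFull optimizer;
    optimizer.init(error);
    for(std::size_t i = 0; i != iterations; ++i){
        optimizer.step(error);
        std::cout<<i<<" "<<optimizer.solution().value<<std::endl;
    }
    //write the best parameters found back into the network
    network.setParameterVector(optimizer.solution().point);

    //evaluate classification performance on training and test data with the 0-1 loss
    ZeroOneLoss<unsigned int, RealVector> loss01;
    double trainError = loss01.eval(data.labels(), network(data.inputs()));
    double testError = loss01.eval(test.labels(), network(test.inputs()));
    std::cout<<"train error: "<<trainError<<" test error: "<<testError<<std::endl;

The solution found by the optimizer is written back into the network before evaluation, so the
same ``network`` object can afterwards be used for prediction or serialisation.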