Other Activation Functions

Activation functions are simple functions used to transform the outputs of another model. They appear frequently in the design of neural networks, which are cascades of linear or convolutional models, each followed by a nonlinear activation function.

There are currently three models for which an activation function can be specified.

  • LinearModel creates a model of the form

    \[f(x) = g(Ax+b)\]

    where g is the nonlinear activation function.

  • Conv2DModel behaves like LinearModel but uses a convolution to model the linear part.

  • NeuronLayer simply computes:

    \[f(x) = g(x)\]

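To make the formulas concrete, here is a minimal plain C++ sketch (an illustration of the equations above, not the Shark implementation) of what LinearModel computes: an affine map Ax+b followed by an elementwise nonlinearity g, here the logistic function. NeuronLayer corresponds to applying g alone, without the affine part.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // g: here the logistic function, applied to each output element
    double logistic(double x){ return 1.0 / (1.0 + std::exp(-x)); }

    // f(x) = g(Ax + b)
    std::vector<double> linearModel(
        std::vector<std::vector<double>> const& A,
        std::vector<double> const& b,
        std::vector<double> const& x
    ){
        std::vector<double> y(b);                    // offset b
        for (std::size_t i = 0; i != A.size(); ++i)
            for (std::size_t j = 0; j != x.size(); ++j)
                y[i] += A[i][j] * x[j];              // linear part Ax
        for (double& yi : y) yi = logistic(yi);      // elementwise activation g
        return y;
    }
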
The activation function is chosen as a template argument, e.g. LinearModel<RealVector, LogisticNeuron> chooses g as the logistic function (a construction sketch is given at the end of this section). Most activation functions are applied elementwise, but interactions between several outputs can also be modeled. The following neurons are implemented in Shark:

  • LinearNeuron: the identity \(f(x)=x\). Not a good choice for hidden neurons, but suitable for output neurons when the output is not bounded; it is the typical choice for regression tasks and when training with cross-entropy. It is also the default activation for LinearModel and Conv2DModel.
  • LogisticNeuron: a sigmoid (S-shaped) function with outputs in the range [0,1], defined as
\[f(x) = \frac 1 {1+e^{-x}}.\]
  • TanhNeuron: the hyperbolic tangent; it can be viewed as a rescaled version of the logistic function with outputs in the range [-1,1]. It has the formula
\[f(x) = \tanh(x) = \frac 2 {1+e^{-2x}}-1.\]
  • FastSigmoidNeuron: a sigmoidal function that is faster to evaluate than the previous two activation functions. It also has heavier tails, i.e., its gradient does not vanish as quickly:
\[f(x) = \frac x {1+|x|}.\]
  • RectifierNeuron: also known as the ReLU unit. The neuron's activation is kept at 0 for negative inputs and is linear for all positive values. A network of these neurons computes a piecewise linear function.
\[f(x) = \max(0,x).\]
  • NormalizerNeuron: normalizes each output by the sum of all activations. It assumes that the activations are positive and can thus be used to transform positive numbers (e.g., counts) into probabilities:
\[f_i(x) = \frac {x_i}{\sum_j x_j}.\]
  • SoftmaxNeuron: computes the softmax of its inputs. This is the correct function to transform the outputs of a model trained with CrossEntropy into proper class probabilities:

    \[f_i(x) = \frac {\exp(x_i)}{\sum_j \exp(x_j)}.\]
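
The last two neurons are the ones that model interactions between several outputs rather than acting elementwise. The following plain C++ sketch (again only an illustration of the formulas, not Shark code) contrasts them: NormalizerNeuron divides by the sum of the inputs and therefore requires positive inputs, while SoftmaxNeuron exponentiates first and thus accepts arbitrary real values such as the linear outputs of a model trained with CrossEntropy.

    #include <cmath>
    #include <vector>

    // f_i(x) = x_i / sum_j x_j   -- assumes all x_i > 0
    std::vector<double> normalizer(std::vector<double> x){
        double sum = 0.0;
        for (double xi : x) sum += xi;
        for (double& xi : x) xi /= sum;
        return x;
    }

    // f_i(x) = exp(x_i) / sum_j exp(x_j)
    // (a production implementation would subtract max(x) before
    // exponentiating to avoid overflow)
    std::vector<double> softmax(std::vector<double> x){
        double sum = 0.0;
        for (double& xi : x){ xi = std::exp(xi); sum += xi; }
        for (double& xi : x) xi /= sum;
        return x;
    }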
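
Finally, a usage sketch of how the template argument selects the activation when building a small network from LinearModel layers. This is only a sketch: the constructor signature (inputs, outputs, offset), the header paths, and the operator>> used to chain models are assumptions here; check the Shark API reference and the feed-forward network tutorial for the exact interface.

    #include <shark/Models/LinearModel.h>       // dense layer; header path assumed
    #include <shark/Models/ConcatenatedModel.h> // operator>> for chaining; assumed
    using namespace shark;

    int main(){
        std::size_t inputs = 16, hidden = 32, classes = 3;

        // hidden layer: f(x) = tanh(Ax + b)
        LinearModel<RealVector, TanhNeuron> layer1(inputs, hidden, true);

        // output layer: the default LinearNeuron, i.e. f(x) = Ax + b.
        // This is the typical choice when training with CrossEntropy; a
        // SoftmaxNeuron can be applied afterwards to turn the trained
        // outputs into class probabilities.
        LinearModel<RealVector> output(hidden, classes, true);

        // chain the layers into a single model (assumed operator>>)
        auto network = layer1 >> output;
    }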