Other Activation Functions
--------------------------------

Activation functions are simple functions used to transform the outputs of
another model. They appear frequently in the design of neural networks, which
are a cascade of linear or convolutional models, each followed by a nonlinear
activation function. There are currently three models for which an activation
function can be specified:

* :doxy:`LinearModel` creates a model of the form

  .. math::
    f(x) = g(Ax+b)

  where g is the nonlinear activation function.

* :doxy:`Conv2DModel` behaves like a linear model, but uses a convolution to model the linear part.

* :doxy:`NeuronLayer` simply computes

  .. math::
    f(x) = g(x).

The activation function is chosen as a template argument of the model; for
example, instantiating :doxy:`LinearModel` with :doxy:`LogisticNeuron` as the
activation argument chooses g as the logistic function (see the sketch at the
end of this section). Most activation functions are applied elementwise, but
interactions between several outputs can also be modeled. The following
neurons are implemented in Shark:

* :doxy:`LinearNeuron`: the identity :math:`f(x)=x`. It is not a good choice
  for hidden neurons, but suitable for output neurons when the output is not
  bounded. It is the typical choice for regression tasks and when training
  with the cross-entropy loss. It is also the default activation of
  :doxy:`LinearModel` and :doxy:`Conv2DModel`.

* :doxy:`LogisticNeuron`: a sigmoid (S-shaped) function with outputs in the
  range [0,1], defined as

  .. math::
    f(x) = \frac 1 {1+e^{-x}}.

* :doxy:`TanhNeuron`: the hyperbolic tangent, which can be viewed as a
  rescaled version of the logistic function with outputs in the range [-1,1]:

  .. math::
    f(x) = \tanh(x) = \frac 2 {1+e^{-2x}}-1.

* :doxy:`FastSigmoidNeuron`: a sigmoidal function that is faster to evaluate
  than the previous two activation functions. It also has "bigger tails",
  i.e., its gradient does not vanish as quickly:

  .. math::
    f(x) = \frac x {1+|x|}.

* :doxy:`RectifierNeuron`: also known as the ReLU unit. The activation is kept
  at 0 for negative inputs and is linear for all positive values. A network of
  these neurons is effectively a piecewise linear function:

  .. math::
    f(x) = \max(0,x).

* :doxy:`NormalizerNeuron`: normalizes each output by the sum of all
  activations. It assumes that the activations are positive and can thus be
  used to transform positive numbers (e.g. counts) into probabilities:

  .. math::
    f_i(x) = \frac {x_i}{\sum_j x_j}.

* :doxy:`SoftmaxNeuron`: computes the softmax activation of the inputs. This
  is the correct function to transform the outputs of a model trained with
  the cross-entropy loss into proper class probabilities:

  .. math::
    f_i(x) = \frac {\exp(x_i)}{\sum_j \exp(x_j)}.
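As a rough illustration of the template-argument mechanism mentioned above, the
following sketch constructs two layers that differ only in their activation
function. It is a minimal sketch, not a program from the Shark examples: the
header path and the constructor arguments (input dimension, output dimension,
offset flag) are assumptions, so consult the :doxy:`LinearModel` class
documentation for the exact interface.

.. code-block:: c++

    // Minimal sketch (assumed header path and constructor arguments):
    // two linear layers of the form f(x) = g(Ax + b) whose activation g is
    // selected by the second template argument.
    #include <shark/Models/LinearModel.h>
    using namespace shark;

    int main(){
        // g = max(0, .) via RectifierNeuron: 10 inputs, 5 outputs, with offset b
        LinearModel<RealVector, RectifierNeuron> reluLayer(10, 5, true);

        // Default activation (LinearNeuron), i.e. the plain map f(x) = Ax + b
        LinearModel<RealVector> linearLayer(10, 5, true);

        // The parameters A and b are not set here; a trainer or an
        // initialization routine would assign them before evaluation.
        return 0;
    }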
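The formulas above can also be checked numerically without the Shark classes.
The following self-contained C++ sketch evaluates the elementwise activations
at a few points and computes the softmax of a small vector; the function names
are chosen for this example only.

.. code-block:: c++

    // Plain C++ sketch of the activation formulas listed above;
    // it does not use the Shark neuron classes.
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    double logistic(double x)    { return 1.0 / (1.0 + std::exp(-x)); }
    double tanhAct(double x)     { return 2.0 / (1.0 + std::exp(-2.0 * x)) - 1.0; } // rescaled logistic
    double fastSigmoid(double x) { return x / (1.0 + std::abs(x)); }
    double rectifier(double x)   { return std::max(0.0, x); }

    // Softmax: f_i(x) = exp(x_i) / sum_j exp(x_j). Subtracting the maximum
    // beforehand does not change the result but avoids overflow.
    std::vector<double> softmax(std::vector<double> x){
        double m = *std::max_element(x.begin(), x.end());
        double sum = 0.0;
        for (double& xi : x){ xi = std::exp(xi - m); sum += xi; }
        for (double& xi : x) xi /= sum;
        return x;
    }

    int main(){
        for (double x : {-2.0, 0.0, 2.0})
            std::printf("x=% .1f  logistic=%.4f  tanh=%.4f  fastSigmoid=%.4f  relu=%.4f\n",
                        x, logistic(x), tanhAct(x), fastSigmoid(x), rectifier(x));

        std::vector<double> p = softmax({1.0, 2.0, 3.0});
        std::printf("softmax(1,2,3) = %.4f %.4f %.4f  (sums to one)\n", p[0], p[1], p[2]);
    }

Comparing ``tanhAct`` with ``std::tanh`` at the sample points also confirms
that the hyperbolic tangent is indeed a rescaled logistic function.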