Exercise 5.1 - Solution

Overtraining and Regularization:

Open the TensorFlow Playground (https://playground.tensorflow.org) and select the checkerboard pattern on the left as the dataset (see Exercise 3.3). As input features, select the two independent variables $x_1$ and $x_2$, and set the noise to $50\%$.

[Screenshot: the checkerboard dataset selected in the TensorFlow Playground]

Tasks

  1. Choose a deep (many layers) and wide (many nodes) network and train it for more than 1000 epochs. Comment on your observations.
  2. Apply L2 regularization to reduce overfitting. Try low and high regularization rates. What do you observe?
  3. Compare the effects of L1 and L2 regularization.

Solutions

Task 1

Choose a deep (many layers) and wide (many nodes) network and train it for more than 1000 epochs. Comment on your observations.

[Screenshot: decision boundary of the deep, wide network after more than 1000 epochs]

Overtraining is observed: the network learns the statistical fluctuations of the noisy training data instead of the underlying checkerboard pattern.
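
The effect can also be reproduced outside the Playground with a minimal Keras sketch. The XOR-style checkerboard data, the label-flip noise, and the particular architecture below are illustrative assumptions, not the Playground's exact setup:

```python
import numpy as np
import tensorflow as tf

# Illustrative stand-in for the Playground setup: 2x2 checkerboard (XOR)
# labels, with label-flip noise standing in for the 50% noise setting.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(1000, 2)).astype(np.float32)
y = (X[:, 0] * X[:, 1] > 0).astype(np.float32)  # checkerboard labels
flip = rng.random(1000) < 0.25                  # flip a fraction of labels
y[flip] = 1.0 - y[flip]

# Deep (many layers) and wide (many nodes per layer) network.
model = tf.keras.Sequential(
    [tf.keras.Input(shape=(2,))]
    + [tf.keras.layers.Dense(64, activation="tanh") for _ in range(6)]
    + [tf.keras.layers.Dense(1, activation="sigmoid")]
)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Train well past 1000 epochs; the growing gap between training and
# validation accuracy is the overtraining observed in the Playground.
history = model.fit(X, y, validation_split=0.3, epochs=1500, verbose=0)
print(f"train acc: {history.history['accuracy'][-1]:.2f}, "
      f"val acc: {history.history['val_accuracy'][-1]:.2f}")
```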

Task 2

Apply L2 regularization to reduce overfitting. Try low and high regularization rates. What do you observe?
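
For reference, L2 regularization adds a penalty on the squared weights to the training loss (the notation here is chosen for illustration: $E_0$ is the unregularized loss, $w_i$ are the weights, and $\lambda$ is the regularization rate set in the Playground):

$$E(\mathbf{w}) = E_0(\mathbf{w}) + \lambda \sum_i w_i^2$$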

Low regularization rates

[Screenshot: training result with a low L2 regularization rate]

For very low regularization rates ($\lambda \rightarrow 0$), the L2 penalty contributes almost nothing to the training loss. Thus, the network still overfits.

High regularization rates

[Screenshot: training result with a high L2 regularization rate]

For high regularization rates ($\lambda \gg 0$), training is dominated by the L2 penalty rather than the data term. Thus, all adaptive parameters are pushed towards zero, and the network can no longer learn the pattern.
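
This shrinkage can be read off the gradient-descent update for a single weight (with learning rate $\eta$ and the loss defined above): the penalty contributes a multiplicative decay factor, which dominates the data gradient when $\lambda$ is large:

$$w_i \;\leftarrow\; w_i - \eta \frac{\partial E}{\partial w_i} \;=\; (1 - 2\eta\lambda)\, w_i - \eta \frac{\partial E_0}{\partial w_i}$$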

Moderate regularization rates

[Screenshot: training result with a moderate L2 regularization rate]

For moderate regularization rates, no overtraining is observed: the network learns the underlying checkerboard pattern instead of the statistical fluctuations.
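
The same sweep can be sketched in Keras by attaching an L2 penalty to every weight matrix via `kernel_regularizer`; the specific rates below are illustrative, and `X` and `y` are reused from the Task 1 sketch:

```python
import tensorflow as tf

def make_model(l2_rate):
    # Same architecture as in the Task 1 sketch, with an L2 penalty
    # attached to every weight matrix via kernel_regularizer.
    return tf.keras.Sequential(
        [tf.keras.Input(shape=(2,))]
        + [tf.keras.layers.Dense(
               64, activation="tanh",
               kernel_regularizer=tf.keras.regularizers.l2(l2_rate))
           for _ in range(6)]
        + [tf.keras.layers.Dense(1, activation="sigmoid")]
    )

# Sweep low, moderate, and high regularization rates,
# mirroring the Playground experiment.
for rate in (1e-6, 1e-3, 1.0):
    model = make_model(rate)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    hist = model.fit(X, y, validation_split=0.3, epochs=500, verbose=0)
    print(f"lambda={rate:g}: "
          f"val acc = {hist.history['val_accuracy'][-1]:.2f}")
```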

Task 3

Compare the effects of L1 and L2 regularization.

[Screenshot: comparison of L1 and L2 regularization]

As observed in Tasks 1 and 2, L2 regularization with moderate regularization rates pushes the weights towards smaller values, but not exactly to zero.

In contrast, L1 regularization pushes unimportant weights exactly to zero. Therefore, L1 regularization can, in principle, be used as a feature-selection technique.
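
The different behavior follows from the penalty gradients: the L2 gradient is proportional to the weight itself, so it shrinks large weights strongly but never drives a weight exactly to zero, whereas the L1 (sub)gradient has constant magnitude $\lambda$ and can push small weights all the way to zero:

$$\frac{\partial}{\partial w_i}\,\lambda \sum_j w_j^2 = 2\lambda w_i, \qquad \frac{\partial}{\partial w_i}\,\lambda \sum_j |w_j| = \lambda \,\mathrm{sign}(w_i)$$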