Open the TensorFlow Playground (https://playground.tensorflow.org) and select the checkerboard pattern on the left as the dataset (see Exercise 3.3). As input features, select the two independent variables $x_1$ and $x_2$ and set the noise to $50\%$.

- Choose a deep (many layers) and wide (many nodes) network and train it for more than 1000 epochs. Comment on your observations.
- Apply L2 regularization to reduce overfitting. Try low and high regularization rates. What do you observe?
- Compare the effects of L1 and L2 regularization.

Apply L2 regularization to reduce overfitting. Try low and high regularization rates. What do you observe?

*For very low regularization rates ($\lambda \rightarrow 0$), the L2 norm penalty barely influences the training objective. Thus, the network still overfits.*

*For high regularization rates ($\lambda \gg 0$), the L2 norm penalty dominates the training objective. Thus, all adaptive parameters are pushed towards zero.*

*For moderate regularization rates, no overfitting can be observed.*
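The effect of the regularization rate can be reproduced outside the Playground. The following is a minimal sketch (all data and parameter values are hypothetical, chosen only for illustration): gradient descent on a linear model whose gradient includes the L2 penalty term $\lambda w$. A low rate leaves the weights essentially unregularized, while a high rate shrinks all of them strongly towards zero.

```python
import numpy as np

# Hypothetical toy problem: linear regression with a known weight vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ true_w + rng.normal(scale=0.5, size=200)

def train_l2(lam, lr=0.05, epochs=500):
    """Gradient descent on MSE + (lam/2) * ||w||^2."""
    w = np.zeros(5)
    for _ in range(epochs):
        # Gradient of the data loss plus the L2 penalty term lam * w.
        grad = X.T @ (X @ w - y) / len(y) + lam * w
        w -= lr * grad
    return w

for lam in (0.0, 0.1, 10.0):
    w = train_l2(lam)
    print(f"lambda = {lam:5.1f}   ||w||_2 = {np.linalg.norm(w):.3f}")
```

With $\lambda = 0$ the fit recovers the true weights; increasing $\lambda$ monotonically shrinks the weight norm, matching the behaviour observed in the Playground.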

Compare the effects of L1 and L2 regularization.

*As observed in Task 1 and Task 2, L2 regularization with moderate regularization rates pushes the weights to smaller values, but not exactly to zero.*

*In contrast, L1 regularization pushes certain unimportant weights exactly to zero. Therefore, L1 regularization can, in principle, be used as a **feature-selection technique**.*
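This qualitative difference can be sketched on a toy problem (again with hypothetical data, and with the L1 problem optimized via proximal gradient descent, i.e. soft-thresholding, rather than the Playground's optimizer): L1 drives the weights of uninformative features exactly to zero, while L2 merely makes them small.

```python
import numpy as np

# Hypothetical toy problem: only two of five features are informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
true_w = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ true_w + rng.normal(scale=0.5, size=200)

def train(lam, penalty, lr=0.05, epochs=2000):
    w = np.zeros(5)
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        if penalty == "l2":
            # L2: the penalty contributes a smooth gradient term lam * w.
            w -= lr * (grad + lam * w)
        else:
            # L1: gradient step on the data loss, then soft-thresholding,
            # which sets small weights exactly to zero.
            w = w - lr * grad
            w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

w_l1 = train(0.5, "l1")
w_l2 = train(0.5, "l2")
print("L1 weights:", np.round(w_l1, 3))  # uninformative weights exactly 0
print("L2 weights:", np.round(w_l2, 3))  # uninformative weights small, nonzero
```

The exact zeros produced by L1 are what make it usable for feature selection, whereas L2 only shrinks all weights proportionally.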