Sascha Spors, Professorship Signal Theory and Digital Signal Processing, Institute of Communications Engineering (INT), Faculty of Computer Science and Electrical Engineering (IEF), University of Rostock, Germany
Winter Semester 2023/24 (Master Course #24512)
Feel free to contact the lecturer: frank.schultz@uni-rostock.de
import numpy as np
import tensorflow as tf
from tensorflow import keras
print(
    "TF version",
    tf.__version__,
)
tf.keras.backend.set_floatx("float64") # we could use double precision
verbose = 1 # plot training status
# data set consists of the 4 conditions for the XOR logical table
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
print("X\n", X)
y = np.array([[0.0], [1.0], [1.0], [0.0]])
print("y\n", y)
# a simple XOR non-linear model with two layers is known from the textbook
# I. Goodfellow, Y. Bengio, A. Courville, Deep Learning. MIT Press, 2016, ch 6.1
# the model parameters are given in the book and it is stated that these
# belong to the global minimum for the mean squared error loss function
# layer 1 with relu activation and the weights/bias:
wl1 = np.array([[1, 1], [1, 1]])
bl1 = np.array([[0], [-1]])
# layer 2 with linear activation and the weights/bias:
wl2 = np.array([[1], [-2]])
bl2 = np.array([[0]])
# we could calculate the model predictions on the data in X
# layer 1 with two perceptrons: apply weights / bias
z1 = wl1.T @ X.T + bl1 # transpose input to be TF compatible
# layer 1 with two perceptrons: apply relu activation
z1[z1 < 0] = 0
# layer 2 with one perceptron: apply weights / bias
z2 = wl2.T @ z1 + bl2
# layer 2 with one perceptron: apply linear activation
y_pred = z2.T # transpose output to be TF compatible
print(y_pred)
print(y == y_pred) # check true and predicted data
The model is actually not easy to train to the global minimum, as it is unusual to train a binary classification problem with MSE loss and a linear output activation (both are rather typical for regression tasks).
So, we actually expect two numbers, 0 and 1, as model output. However, the linear activation yields real numbers as model output; in the optimum case these are exactly 0 and 1, but for non-optimally trained models they might only be close to 0 and 1 or even completely 'wrong'. So, the model needs to be trained exactly to the weights given above in order to exhibit the intended binary classification characteristics.
This is a nice toy example to see what model training can (and cannot) do on a rather simple problem. We should spend some time to really understand how the model output is calculated, i.e. how the model prediction works. Once we have understood this, we are ready to work with larger models.
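As a small added sketch (not part of the textbook example), the manual forward pass from above can be wrapped into a helper function; the name xor_forward is chosen here just for illustration.
def xor_forward(X, wl1, bl1, wl2, bl2):
    # layer 1 with two perceptrons: apply weights / bias, then relu activation
    z1 = np.maximum(wl1.T @ X.T + bl1, 0)
    # layer 2 with one perceptron: apply weights / bias, linear activation
    z2 = wl2.T @ z1 + bl2
    return z2.T  # transpose back to be TF compatible
print(xor_forward(X, wl1, bl1, wl2, bl2))  # should again yield [[0], [1], [1], [0]]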
epochs = 2**8
batch_size = X.shape[0]
optimizer = keras.optimizers.Adam()
loss = keras.losses.MeanSquaredError()
metrics = [keras.metrics.MeanSquaredError()]
model = keras.Sequential()
model.add(keras.Input(shape=(2,)))
model.add(keras.layers.Dense(2, activation="relu"))
model.add(keras.layers.Dense(1, activation="linear"))
model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
tw = 0
for w in model.trainable_weights:
    print(w)
    tw += np.prod(w.shape)
print("\ntrainable_weights:", tw, "\n")
We could initialize the model parameters close to (e.g. offset=1e-2) or even exactly at (offset=0) the optimum parameters that are known from above. To use this, set the if condition below to True.
With a robust gradient descent method, such as Adam, training should get close to or remain at the optimum parameters.
wl1 = np.array([[1, 1], [1, 1]])
bl1 = np.array([0, -1])
wl2 = np.array([[1], [-2]])
bl2 = np.array([0])
if True:
    offset = 1e-5
    model.set_weights([wl1 + offset, bl1 + offset, wl2 + offset, bl2])
model.get_weights()
model.fit(X, y, epochs=epochs, batch_size=batch_size, verbose=verbose)
model.summary()
print("model weights\n", model.get_weights())
results = model.evaluate(X, y, batch_size=X.shape[0], verbose=False)
y_pred = model.predict(X)
print(model.loss(y, y_pred))
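As an added cross-check, the mean squared error reported by the model can be recomputed with plain numpy.
mse_manual = np.mean((y - y_pred) ** 2)  # MSE by hand
print("manual MSE", mse_manual)  # should match model.loss(y, y_pred) from above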
def predict_class(y):
    y[y < 0.5], y[y >= 0.5] = 0, 1
print("real numbered model output\n", y_pred)
predict_class(y_pred)  # real numbered output -> classification (0,1) output
print("classification output\n", y_pred)
print("check true vs. predicted:\n", y == y_pred)