Sascha Spors, Professorship Signal Theory and Digital Signal Processing, Institute of Communications Engineering (INT), Faculty of Computer Science and Electrical Engineering (IEF), University of Rostock, Germany
Winter Semester 2023/24 (Master Course #24512)
Feel free to contact the lecturer: frank.schultz@uni-rostock.de
import numpy as np
import tensorflow as tf
from tensorflow import keras
print(
    "TF version",
    tf.__version__,
)
tf.keras.backend.set_floatx("float64") # we could use double precision
verbose = 1 # plot training status
# data set consists of the 4 conditions for the XOR logical table
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
print("X\n", X)
y = np.array([[0.0], [1.0], [1.0], [0.0]])
print("y\n", y)
# a simple XOR non-linear model with two layers is known from the textbook
# I. Goodfellow, Y. Bengio, A. Courville, Deep Learning. MIT Press, 2016, ch 6.1
# the model parameters are given in the book and it is stated that these
# belong to the global minimum for the mean squared error loss function
# layer 1 with relu activation and the weights/bias:
wl1 = np.array([[1, 1], [1, 1]])
bl1 = np.array([[0], [-1]])
# layer 2 with linear activation and the weights/bias:
wl2 = np.array([[1], [-2]])
bl2 = np.array([[0]])
# we could calculate the model predictions on the data in X
# layer 1 with two perceptrons: apply weights / bias
z1 = wl1.T @ X.T + bl1 # transpose input to be TF compatible
# layer 1 with two perceptrons: apply relu activation
z1[z1 < 0] = 0
# layer 2 with one perceptron: apply weights / bias
z2 = wl2.T @ z1 + bl2
# layer 2 with one perceptron: apply linear activation
y_pred = z2.T # transpose output to be TF compatible
print(y_pred)
print(y == y_pred) # check true and predicted data
The model is actually not easy to train to the global minimum, as it is unusual to train a binary classification problem with MSE loss and a linear output activation (both are rather typical for regression tasks).
So, we actually expect two numbers, 0 and 1, as model output. However, the linear activation yields real numbers as model output; in the optimum case these are exactly 0 and 1, but for non-optimally trained models they might only be close to 0 and 1 or even completely 'wrong'. So, the model needs to be trained exactly to the weights given above in order to exhibit the intended binary classification characteristics.
This is a nice toy example to see what model training can (and cannot) do on a rather simple problem. We should spend some time to really understand how the model output is calculated, i.e. how the model prediction works. Once we have understood this, we are ready to work with larger models.
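As a small added sketch (not part of the textbook example), the manual forward pass from above can be wrapped into a helper function; the name xor_forward is chosen here just for illustration.
def xor_forward(X, wl1, bl1, wl2, bl2):
    # layer 1 with two perceptrons: apply weights / bias, then relu activation
    z1 = np.maximum(wl1.T @ X.T + bl1, 0)
    # layer 2 with one perceptron: apply weights / bias, linear activation
    z2 = wl2.T @ z1 + bl2
    return z2.T  # transpose back to be TF compatible
print(xor_forward(X, wl1, bl1, wl2, bl2))  # should again yield [[0], [1], [1], [0]]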
epochs = 2**8
batch_size = X.shape[0]
optimizer = keras.optimizers.Adam()
loss = keras.losses.MeanSquaredError()
metrics = [keras.metrics.MeanSquaredError()]
model = keras.Sequential()
model.add(keras.Input(shape=(2,)))
model.add(keras.layers.Dense(2, activation="relu"))
model.add(keras.layers.Dense(1, activation="linear"))
model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
tw = 0
for w in model.trainable_weights:
    print(w)
    tw += np.prod(w.shape)
print("\ntrainable_weights:", tw, "\n")
We could initialize the model parameters close to (e.g. offset=1e-2) or even exactly at (offset=0) the optimum parameters that are known from above. To use this, set the if condition below to True.
With a robust gradient descent method, such as Adam, training should get close to or remain at the optimum parameters.
wl1 = np.array([[1, 1], [1, 1]])
bl1 = np.array([0, -1])
wl2 = np.array([[1], [-2]])
bl2 = np.array([0])
if True:
    offset = 1e-5
    model.set_weights([wl1 + offset, bl1 + offset, wl2 + offset, bl2])
model.get_weights()
model.fit(X, y, epochs=epochs, batch_size=batch_size, verbose=verbose)
model.summary()
print("model weights\n", model.get_weights())
results = model.evaluate(X, y, batch_size=X.shape[0], verbose=False)
y_pred = model.predict(X)
print(model.loss(y, y_pred))
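As an added cross-check, the mean squared error reported by the model can be recomputed with plain numpy.
mse_manual = np.mean((y - y_pred) ** 2)  # MSE by hand
print("manual MSE", mse_manual)  # should match model.loss(y, y_pred) from above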
def predict_class(y):
    y[y < 0.5], y[y >= 0.5] = 0, 1
print("real numbered model output\n", y_pred)
predict_class(y_pred)  # real numbered output -> classification (0,1) output
print("classification output\n", y_pred)
print("check true vs. predicted:\n", y == y_pred)