import warnings
warnings.filterwarnings('ignore')
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import watermark
%load_ext watermark
%matplotlib inline
%watermark -i -n -v -m -g -iv
Python implementation: CPython
Python version       : 3.10.9
IPython version      : 8.10.0

Compiler    : Clang 14.0.6
OS          : Darwin
Release     : 22.5.0
Machine     : x86_64
Processor   : i386
CPU cores   : 16
Architecture: 64bit

Git hash: 03a6cf9c5faed1d6e551b54357ff497cfa569fb9

watermark : 2.4.2
matplotlib: 3.7.0
numpy     : 1.23.5
plt.style.use('d4sci.mplstyle')
Let's start by setting up our training examples. We'll consider the simple NOT operator
X_NOT = np.ones((2, 2), dtype='float')
X_NOT[0, 1] = 0
Here the inputs are just 0 and 1, along with a bias column
X_NOT
array([[1., 0.],
       [1., 1.]])
And the outputs are just 1 and 0, respectively.
y_NOT = [1, 0]
For two-input binary operators we have four training examples and one extra input column:
X = np.ones((4, 3), dtype='float')
X[1, 2] = 0
X[2, 1] = 0
X[3, 1] = 0
X[3, 2] = 0
The first column is just the bias, always set to 1.
X
array([[1., 1., 1.],
       [1., 1., 0.],
       [1., 0., 1.],
       [1., 0., 0.]])
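As an aside, this truth-table matrix can also be built programmatically rather than by hand; a small sketch using itertools.product (the iteration order mirrors the rows above):

```python
import itertools
import numpy as np

# Enumerate the four input combinations in the same order as above:
# (1,1), (1,0), (0,1), (0,0), each prefixed with a bias of 1
rows = [(1.0, float(a), float(b)) for a, b in itertools.product([1, 0], repeat=2)]
X_check = np.array(rows)
print(X_check)
```

This scales directly to operators with more inputs by raising `repeat`.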
We'll take a look at two examples, AND and OR
y_AND = [1, 0, 0, 0]
y_OR = [1, 1, 1, 0]
The prediction function is simple: predict 1 if the activation value is positive and 0 otherwise.
def predict(weights, inputs):
    return (np.dot(inputs, weights) > 0).astype('int').flatten()
The training algorithm is also simple:
def train(weights, X, y, epochs=100):
    for _ in range(epochs):
        for i in range(len(y)):
            inputs = X[i, :]
            label = y[i]
            prediction = predict(weights, inputs)
            weights += (label - prediction) * inputs
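The weight update on the last line is the classic perceptron learning rule with the learning rate fixed at 1: $w \leftarrow w + (y_i - \hat{y}_i)~x_i$, where $\hat{y}_i$ is the current prediction for example $x_i$. Since both the label and the prediction are binary, the weights change only when the prediction is wrong, and always by exactly $\pm x_i$.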
In the NOT case, our perceptron is just a vector of 2 weights that we initialize to zero
weights_NOT = np.zeros(2)
Which can easily be trained
train(weights_NOT, X_NOT, y_NOT)
to find the weights
weights_NOT
array([ 1., -1.])
And we verify that it indeed does return the opposite value, as expected
predict(weights_NOT, X_NOT) == y_NOT
array([ True, True])
For AND and OR operators with two inputs, we must consider a third weight:
weights_AND = np.zeros(3)
weights_OR = np.zeros(3)
We can train them both quickly, just as we did before
train(weights_AND, X, y_AND)
train(weights_OR, X, y_OR)
And take a look at the resulting weights
weights_AND
array([-2., 1., 2.])
weights_OR
array([0., 1., 1.])
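As a quick sanity check, we can confirm that these weights reproduce the full AND and OR truth tables (a self-contained sketch repeating the definitions from above):

```python
import numpy as np

def predict(weights, inputs):
    # 1 if the activation is strictly positive, 0 otherwise
    return (np.dot(inputs, weights) > 0).astype('int').flatten()

# Truth-table inputs with a bias column, as above
X = np.array([[1., 1., 1.],
              [1., 1., 0.],
              [1., 0., 1.],
              [1., 0., 0.]])

# The trained weights printed above
weights_AND = np.array([-2., 1., 2.])
weights_OR = np.array([0., 1., 1.])

print(predict(weights_AND, X))  # [1 0 0 0]
print(predict(weights_OR, X))   # [1 1 1 0]
```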
Let's define some helper functions. First, one to draw the decision surface
def surface(weights, n=20):
    points = np.linspace(0, 1, n)
    xs = []
    ys = []
    zs = []
    for i in range(n):
        x = points[i]
        for j in range(n):
            y = points[j]
            point = [1, x, y]
            xs.append(x)
            ys.append(y)
            zs.append(np.dot(weights, point))
    return np.array(xs), np.array(ys), np.array(zs)
And a function to plot the perceptron output
def plot_output(weights, X, y, level=0, label='AND function'):
    font_size = plt.rcParams['font.size']
    plt.rcParams['font.size'] = 14
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    xs, ys, zs = surface(weights)
    colors = np.array(['blue'] * xs.shape[0])
    selector = (zs >= -1) & (zs <= 1)
    colors[zs > 0] = 'red'
    ax.scatter(X[:, 1], X[:, 2], y, c='gold', marker='*', s=1000, depthshade=False)
    ax.scatter(xs[selector], ys[selector], zs[selector], s=75, c=colors[selector], marker='.')
    grids = np.linspace(0, 1, 6)
    for i in range(6):
        ax.plot([0, 1], [grids[i], grids[i]], [level, level], 'darkgray')
        ax.plot([grids[i], grids[i]], [0, 1], [level, level], 'darkgray')
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_zlabel('output')
    ax.set_title(label)
    plt.rcParams['font.size'] = font_size
plot_output(weights_AND, X, y_AND, 0, 'AND function')
plot_output(weights_OR, X, y_OR, 0, 'OR function')
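The red/blue transition in these plots happens along the line where the activation is exactly zero, i.e. $w_0 + w_1 x + w_2 y = 0$. A small sketch solving for this decision boundary with the AND weights found above:

```python
import numpy as np

# Trained AND weights from above: bias, x and y coefficients
w0, w1, w2 = -2.0, 1.0, 2.0

# Along the decision boundary the activation vanishes:
# w0 + w1*x + w2*y = 0  =>  y = -(w0 + w1*x) / w2
x = np.linspace(0, 1, 5)
y = -(w0 + w1 * x) / w2

# Every boundary point has zero activation by construction
activations = w0 + w1 * x + w2 * y
print(activations)
```

Only the corner (1, 1) lies above this line, which is exactly why a single perceptron can represent AND.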
What if we want an XOR operator instead? We already saw that a single perceptron isn't able to learn it, but perhaps we can combine multiple operators.
From boolean logic, we know that:
y_XOR = [0, 1, 1, 0]
And that we can write: $XOR(x, y)=(x~OR~y)~AND~(NOT~x~OR~NOT~y)$
So we split our calculations into multiple parts. The original input is:
X
array([[1., 1., 1.],
       [1., 1., 0.],
       [1., 0., 1.],
       [1., 0., 0.]])
The first parenthesis is just:
X1 = predict(weights_OR, X)
The input for the second parenthesis is the element-wise negation of x and y (we use a new matrix so we don't overwrite our NOT training data):
X_neg = X.copy()
X_neg[:, 1] = predict(weights_NOT, X_neg[:, [0, 1]])
X_neg[:, 2] = predict(weights_NOT, X_neg[:, [0, 2]])
X_neg
array([[1., 0., 0.],
       [1., 0., 1.],
       [1., 1., 0.],
       [1., 1., 1.]])
And the second parenthesis is then
X2 = predict(weights_OR, X_neg)
X2
array([0, 1, 1, 1])
Combining these two outputs into an input matrix:
X3 = X.copy()
X3[:, 1] = X1
X3[:, 2] = X2
And finally
XOR = predict(weights_AND, X3)
XOR
array([0, 1, 1, 0])
As expected!