This project tackles the approximation of Hamiltonian functions using machine learning. In particular, we implement a training algorithm based on the Adam method combined with Stochastic Gradient Descent (SGD). The algorithm is first trained and tested on a set of given example functions, and thereafter trained on trajectories of Hamiltonian systems. Using the trained weights, the model then attempts to solve for unknown trajectories.
Section 1 will cover the declarations and implementations of necessary functions for generating synthetic data.
Section 2 will implement the neural network and the training algorithm with the Adam method and SGD. The section also covers testing of the trained network on the synthetic data and a comparison to the true values, as well as training and testing on the trajectory data.
Section 3 will first discuss how to calculate the gradient of a trained function, which is necessary for the implementation of numerical integrators. Thereafter, two numerical integrators, the symplectic Euler method and the Størmer-Verlet method, are implemented for both known and unknown Hamiltonians.
We start off by importing the necessary libraries and data. Note that the trajectories used throughout the entire project are the new trajectory data.
# Imports
import numpy as np #NumPy library for numerical computation in Python
import matplotlib.pyplot as plt # Main plotting package
# Necessary for surface plotting ______
from matplotlib import cm # Color maps
from matplotlib import colors
from mpl_toolkits.mplot3d import Axes3D
# _____________________________________
import intervals as interval # Used for defining domains of closed intervals
import sys # For printing and flushing
import time # Receive current time to calculate expected remaining time
%matplotlib notebook
import project_2_data_acquisition as dataFile # The given .ipynb for importing trajectory data, converted to .py
The following code defines the class infrastructure used to generate synthetic data from the given example functions. We also define a scaler class used to scale data, a helpful tool when implementing machine learning.
Furthermore, we define f1, f2, f3 and f4 as function objects based on $F(y)$ from the table on page 6 of the project description.
class domain:
'''
This class allows us to define multi-dimensional domains, and is used
for the example data generating functions.
The domain class is also used for the known Hamiltonians in Section 3.
dim : dimension, d in the table
intervals : intervals as described in table
excluded : potential points excluded from domain, e.g. (0,0) for last function from table
'''
def __init__(self, intervals):
self.dim = len(intervals)
self.intervals = intervals
self.excluded = [interval.empty() for i in range(self.dim)]
def __contains__(self, item):
'''
Overloads the *in* operator, i.e. checks if item is in self
'''
if np.isscalar(item):
if self.dim != 1: return False
return item in self.intervals[0]
if len(item) != self.dim: return False
contains = True
notInExcluded = False
for i in range(self.dim):
contains *= (item[i] in self.intervals[i])
notInExcluded += (item[i] not in self.excluded[i])
contains *= (notInExcluded > 0)
return contains
def __sub__(self, dom_to_remove):
'''
Remove subdomains from a domain.
This opens for a much simpler notation when setting domains, e.g.
domain([interval.closed(-1,1)]) - domain([interval.closed(-0.1, 0.1)]) for [-1,1]\[-0.1,0.1]
'''
if dom_to_remove.dim != self.dim:
raise Exception("Error: Domain dimension mismatch in subtract")
new_dom = domain(self.intervals)
for i in range(self.dim):
new_dom.excluded[i] = dom_to_remove.intervals[i]
if new_dom.intervals == new_dom.excluded:
return domain([interval.empty() for i in range(new_dom.dim)])
return new_dom
def drawRandom(self, n):
'''
Draw n random samples from within the domain, with a uniform probability distribution
'''
samples = []
while len(samples) < n:
sample = [np.random.uniform(self.intervals[k].lower, self.intervals[k].upper) for k in range(self.dim)]
if sample in self:
samples.append(sample)
return np.array(samples).T
def drawGrid(self, shape):
'''
Draw a grid of samples from the domain
shape : shape of the grid (tuple(d0))
-Note- these points might be in the excluded domain
'''
linspaces = tuple([np.linspace(self.intervals[k].lower, self.intervals[k].upper, shape[k]) for k in range(self.dim)])
return linspaces
class datGenFunctions:
'''
Class of example functions for generating data, as well as for known hamiltonians
d0 : input dimension
d : hidden layer dimension, usually 2*d0
F : function F(y), implemented inline with anonymous lambda functions
dom : domain (domain object)
'''
def __init__(self, d0, d, F, dom):
self.d0 = d0
self.d = d
self.F = F
self.dom = dom
def __call__(self, *y):
'''
Calls the function, and throws exception if y is outside of domain
returns : F(y)
'''
if np.isscalar(y) or np.isscalar(y[0]):
if not y in self.dom:
raise Exception("Error: y = "+str(y)+" outside of function domain")
else:
yy = np.array(y)
if not np.isscalar(y[0][0]):
shape = list(yy.shape)
yy = np.reshape(yy, (shape[0], np.prod(shape[1:])))
for row in yy.T:
if not row in self.dom:
raise Exception("Error: y = "+str(row)+" outside of function domain")
return self.F(*y)
class scaler:
'''
This class stores the scale parameters, so that a rescaling can be performed easily
ci : data to scale
[alpha, beta] : range to scale the data into
'''
def __init__(self, ci, alpha=0.2, beta=0.8):
# Store the scale parameters
self.a = np.amin(ci)
self.b = np.amax(ci)
self.alpha = alpha
self.beta = beta
self.scalefunc = (lambda c : 1/(self.b-self.a)*((self.b-c)*self.alpha+(c-self.a)*self.beta))
'''
scalefunc scales data using min-max method as described on page 6 of the Project description.
'''
def scale(self, ci):
'''Scale data'''
return self.scalefunc(ci)
def rescale(self, ci):
'''
Rescale data, the inverse function of scale
'''
return (self.a - self.b)/(self.alpha - self.beta) * ci + (self.alpha*self.b - self.a*self.beta)/(self.alpha - self.beta)
def gradientRescale(self, ci):
'''
Rescaler needed for rescaling the gradient,
this is the same as inverse of scalefunc but without shifting of c
'''
return (self.a - self.b)/(self.alpha - self.beta) * ci
def genData(funcObj, numInput=250):
'''
Generates random data within domain from given function
'''
y = funcObj.dom.drawRandom(numInput)
c = funcObj(*tuple(y))
sc_c = scaler(c)
sc_y = scaler(y)
return sc_c, y, sc_c.scale(c)
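To illustrate how these classes fit together, consider the following minimal usage sketch; the quadratic test function and the sample count are arbitrary choices made here for illustration only and are not part of the project data.
# Minimal usage sketch: a 1D function object on [-1, 1] and scaled synthetic data from it
dom_example = domain([interval.closed(-1, 1)])
g_example = datGenFunctions(1, 2, lambda y : 0.5*y**2, dom_example)
sc_example, y_example, c_example = genData(g_example, numInput=50)
print(y_example.shape)                   # (1, 50): 50 random points drawn from the domain
print(c_example.min(), c_example.max())  # the scaler maps c into [alpha, beta] = [0.2, 0.8]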
We continue by defining the necessary functions as given in the project description, that is
$$
\eta(x),\quad \eta'(x),\quad \sigma(x), \quad \sigma'(x), \quad \tilde{F}(Z_k, \omega, \mu, n),\quad \frac{\partial J}{\partial \mu},\quad \frac{\partial J}{\partial \omega}.
$$
An additional embed function is defined in order to fit the data points from a function of dimension $d_0$ to the shape $d$.
We chose $\eta(x) = \frac{1}{2} \left(1 + \tanh\left(\frac{x}{2}\right)\right)$ as the hypothesis function, simply because it gave us the most consistent results. The unknown Hamiltonian in particular proved to be more prone to divergence when using the identity function as the hypothesis function.
def eta(x):
'''
Hypothesis function
'''
return 0.5*(1+np.tanh(x/2)) # Alternative definition of eta(x)
#return x
def eta_derivative(x):
'''
Derivative of hypothesis function
'''
return (1/4)*(1-np.square(np.tanh(x/2))) #Follows from alternative definition of eta(x)
#return np.ones(x.shape)
def F_tilde(Z_k, omega, mu, n_samples):
return eta(np.transpose(Z_k)@omega+mu*np.ones((n_samples, 1)))
def P_k(omega, Z_k, mu, c, n_samples):
ypsilon = eta(Z_k.T@omega + mu*np.ones((Z_k.T@omega).shape))
return np.outer(omega, ((ypsilon-c)*eta_derivative(Z_k.T@omega + mu*np.ones((Z_k.T@omega).shape))).T)
def dJdMu(Z_k, omega, mu, c, n_samples):
z_minus_c = F_tilde(Z_k, omega, mu, n_samples)-c
eta_arg = (np.transpose(Z_k)@omega + mu*np.ones((n_samples, 1))).T
return eta_derivative(eta_arg)@z_minus_c
def dJdOmega(Z_k, omega, mu, c, n_samples):
return Z_k@((F_tilde(Z_k, omega, mu, n_samples)-c)*eta_derivative(np.transpose(Z_k)@omega + mu))
def sigma(x):
'''Sigmoid activation function'''
return np.tanh(x)
def sigmaDerivative(x):
'''Derivative of activation function'''
return 1-np.square(np.tanh(x))
def Z_k(Z_k_minus_1, h, W_k_minus_1, b_k_minus_1):
return Z_k_minus_1 + h*np.tanh(W_k_minus_1@Z_k_minus_1 + b_k_minus_1)
def P_k_minus_1(P_k, h, W_k, Z_k_minus_1, b_k):
return P_k + h*np.transpose(W_k)@(sigmaDerivative(W_k@Z_k_minus_1 + b_k)*P_k)
def embed(y, d):
'''This function embeds the input data in R^d, by setting all extra coordinates to 0'''
y = np.array(y)
return np.concatenate((y, np.zeros((d-y.shape[0],y.shape[1]))))
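As a quick sanity check of the derivatives defined above, they can be compared with central finite differences; this small snippet is only illustrative and not part of the original pipeline.
# Sanity check: compare eta_derivative and sigmaDerivative with central finite differences
x_check = np.linspace(-3, 3, 13)
h_fd = 1e-5
fd_eta = (eta(x_check + h_fd) - eta(x_check - h_fd)) / (2*h_fd)
fd_sigma = (sigma(x_check + h_fd) - sigma(x_check - h_fd)) / (2*h_fd)
print(np.max(np.abs(fd_eta - eta_derivative(x_check))))    # expected to be of order 1e-10
print(np.max(np.abs(fd_sigma - sigmaDerivative(x_check)))) # expected to be of order 1e-10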
The network class is the main class for storing information about a neural network. The class stores all learning parameters and the parameters for Adam descent, and defines the member functions train, showConvergence, getParams, checkTraining and grad, along with some helper functions (see the docstrings in the code). The addition operator is also overloaded so that one can sum two networks, e.g. $T$ and $V$.
The train function contains an implementation of the Adam algorithm and an option for Stochastic Gradient Descent (SGD), i.e. an optional input parameter $\bar{I} \ll I$ that chooses a different subset $(y_i)_{i=1}^{\bar{I}}$ of the points for each iteration, instead of all $(y_i)$.
showConvergence draws the residue plot, i.e. $J/I$ over each training iteration.
getParams returns the trained parameters and is only used for initializing the tester class, which is explained further on.
checkTraining simply confirms that a network has been trained, and runs before testing a network.
grad is the implementation of the gradient of the trained function, that is $\nabla_y F(y)$. See Section 3 for a detailed discussion of how this result is obtained.
class network:
'''
Network class - main class for storing values and weights.
Init Parameters:
K : Number of layers (int)
h : Step length (float)
tau: Learning parameter (float)
d : Number of dimensions for each layer (int)
d0 : Number of dimensions in input layer (int)
iterations : number of iterations (int)
hasTrained : toggles when training is done (bool)
We also define necessary parameters for Adam descent method;
these values are universal for all network objects.
'''
def __init__(self, K, h, tau, d, d0, iterations):
self.K = K
self.h = h
self.tau = tau
self.d = d
self.d0 = d0
self.iterations = iterations
self.hasTrained = False
# Params for Adam descent
self.m = [0, 0, 0, 0]
self.v = [0, 0, 0, 0]
self.beta_1 = 0.9
self.beta_2 = 0.999
self.alpha = 0.01
self.epsilon = 1e-8
def train(self, y, c, I_bar=False):
'''
Input: y-values, c-values, I_bar (optional) - the last one enables stochastic gradient descent
This function trains the network as described in the pseudocode of the Adam method,
and in addition has an optional implementation of stochastic gradient descent.
Toggles the hasTrained flag to True and saves the trained weights when completed.
'''
if not I_bar:
I_bar = len(c)
n_samples = len(c)
b = np.random.randn(self.K, self.d, 1)
W = np.random.randn(self.K, self.d, self.d)
mu = np.random.rand()
omega = np.random.randn(self.d,1)
Z = np.zeros((self.K+1, self.d, n_samples))
Z[0] = np.array(embed(y, self.d)[:, :n_samples])
c = np.expand_dims(c[:n_samples], axis=1)
J = []
dJdW = np.zeros((self.K, self.d, self.d))
dJdb = np.zeros((self.K, self.d, 1))
J_grad = [dJdW, dJdb, 0, 0]
U = [W, b, mu, omega]
startTime = time.time() # Starting a timer for keeping track of training progress
sys.stdout.write("Training")
sys.stdout.flush()
for n in range(1, self.iterations+1):
sys.stdout.write("\rTraining: " + str(round(n/self.iterations*100, 1)) + "% Estimated time remaining: " + str(round((time.time()-startTime)*self.iterations/n - (time.time()-startTime))) + " seconds ")
sys.stdout.flush()
#Clearing the list of P's from the previous iteration
P = np.zeros((self.K, self.d, n_samples))
#Running the images through all layers
for k in range(1, self.K+1):
Z[k] = Z_k(Z[k-1], self.h, U[0][k-1], U[1][k-1])
#Finding P for the last layer
P[-1] = P_k(U[3], Z[-1], U[2], c, n_samples)
#Stochastic gradient descent - choose random indices
if I_bar:
selected = np.random.default_rng().choice(n_samples, I_bar, replace=False)
else:
selected = np.arange(n_samples)
#Computing the contributions to grad(U) by mu and w
subZet = np.array([Z[self.K][:,i] for i in selected]).T #Pot. subset Z with indices according to SGD
subCet = np.array([c[i] for i in selected]) #like subZet with c
J_grad[2] = dJdMu(subZet, U[3], U[2], subCet, I_bar)
J_grad[3] = dJdOmega(subZet, U[3], U[2], subCet, I_bar)
#Backpropagating; finding P for all layers
for k in range(self.K-1, 0, -1):
P[k-1] = P_k_minus_1(P[k], self.h, U[0][k], Z[k-1], U[1][k])
#Calculating the contributions to grad(U) by W and b
for k in range(self.K):
sigmaArg = U[0][k]@Z[k] + U[1][k]
brackets = P[k]*sigmaDerivative(sigmaArg)
J_grad[0][k] = self.h*(np.array([brackets[:, i] for i in selected]).T@np.transpose(subZet))
J_grad[1][k] = self.h*np.array([brackets[:, i] for i in selected]).T@np.ones((I_bar, 1))
#Adam descent:
for i in range(len(J_grad)):
self.m[i] = self.beta_1 * self.m[i] + (1-self.beta_1) * J_grad[i]
self.v[i] = self.beta_2 * self.v[i] + (1-self.beta_2) * np.square(J_grad[i])
m_hat = self.m[i] / (1-self.beta_1**(n))
v_hat = self.v[i] / (1-self.beta_2**(n))
U[i] = U[i] - self.alpha*(m_hat/(np.sqrt(v_hat)+self.epsilon))
#Finding approximation
Ypsilon = F_tilde(Z[self.K], U[3], U[2], n_samples)
#Computing the average error
J.append(0.5*np.linalg.norm(Ypsilon - c)**2*(1/n_samples))
#---------------------------------------------------------------------------
sys.stdout.write("\r\rTraining: Done Runtime: " + str(round((time.time() - startTime), 2)) + " seconds \n")
self.W = U[0]
self.b = U[1]
self.mu = U[2]
self.omega = U[3]
self.J = J
self.hasTrained = True
def showConvergence(self, axs, plotType):
'''
Draws the residue plot, i.e. J/I over each training iteration.
plotType : {'log', 'linear'}, passed to set_yscale for a log- or linear-scaled y-axis (string)
'''
self.checkTraining()
axs.plot(self.J)
axs.yaxis.set_label_coords(-0.1,1.02)
axs.set_title("Residue plot")
axs.set_ylabel(r"$J/I$", rotation="horizontal", size="large")
axs.set_xlabel("Training iteration")
axs.set_yscale(plotType)
plt.show()
def getParams(self):
'''Returns trained parameters'''
self.checkTraining()
return self.W, self.b, self.mu, self.omega
def checkTraining(self):
'''Ensures that only trained networks can be tested.'''
if not self.hasTrained:
raise Exception("Error, neural network has not been trained")
def propagate(self, y):
'''
Propagate y through the network
returns : all Z and the output of the network
'''
self.checkTraining()
z = []
z.append(embed(np.array(y), self.d))
for k in range(self.K):
z.append(z[-1] + self.h*sigma(self.W[k]@(z[-1]) + self.b[k]))
f_tilde = eta((z[-1]).T@self.omega + self.mu)
return z, f_tilde
def grad(self, y):
'''
Compute the gradient of the neural network at y
y : point to compute the gradient
returns : first d0 elements of gradient at point y
'''
Z_list, f_tilde = self.propagate(y)
A = eta_derivative((np.dot(np.squeeze(self.omega), np.squeeze(Z_list[-1])) + self.mu))*self.omega
for k in range(self.K, 0, -1):
A = A + np.transpose(self.W[k-1, :])@(self.h*sigmaDerivative(self.W[k-1, :]@Z_list[k-1] + self.b[k-1]) * A)
return np.squeeze(A)[:self.d0]
def gradSum(self, other, p, q):
'''
Take the gradient of a sum of neural networks, analogous to grad(T+V)
'''
gradT = self.grad(p)
gradV = other.grad(q)
return gradT + gradV
def __add__(self, other):
'''
This allows us to effectively add two neural networks,
by propagating through each of them and adding the results.
Note: Only works with T + V, will not work with V + T
'''
def call(p, q):
_, T_tilde = self.propagate(p)
_, V_tilde = other.propagate(q)
return T_tilde + V_tilde
return call
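Since grad is reused by the numerical integrators in Section 3, it is worth verifying numerically. The sketch below is only illustrative: it trains a small throwaway network on $F(y) = \frac12 y^2$ (the reduced $K$, the iteration count and the point $y_0 = 0.3$ are arbitrary choices made here) and compares grad with a central finite difference of the propagated, scaled output. Both quantities are gradients of the scaled network output; rescaling with scaler.gradientRescale is done separately, as in the tester class below.
# Illustrative gradient check of network.grad against a central finite difference
dom_check = domain([interval.closed(-1, 1)])
f_check = datGenFunctions(1, 2, lambda y : 0.5*y**2, dom_check)
sc_check, y_check, c_check = genData(f_check, 100)
nn_check = network(K=10, h=0.1, tau=0.002, d=2, d0=1, iterations=200)
nn_check.train(y_check, c_check)
y0, h_fd = 0.3, 1e-5
_, f_plus = nn_check.propagate([[y0 + h_fd]])
_, f_minus = nn_check.propagate([[y0 - h_fd]])
fd_grad = (f_plus - f_minus) / (2*h_fd)  # finite difference of the scaled network output
print(np.squeeze(fd_grad), np.squeeze(nn_check.grad([[y0]])))  # the two values should agree closely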
The tester class takes a trained neural network, as well as testing data (input parameters y and c), and compares these. The class has two member functions, plot and trajectoryPlotter, where plot compares the test data with the network approximation of said data. Similarly, trajectoryPlotter is adapted to plot the trajectories from the test data and the network approximation. See the docstrings and inline comments in the code below for plotting details.
class tester:
'''
This class can run various tests on the trained neural network.
y : Input data to test (numpy ndarray)
c : Analytic result of y; F(y) (numpy ndarray)
sc : The scaler used to scale c in the training phase (scaler object)
nn : The neural network to test (network object)
funcObj : The function object that the network is trained on (datGenFunction object)
plotType : {'log', 'linear'} The y-scale of the convergence plot, passed to set_yscale. Default: 'log'
grid : The grid to plot 3D surfaces on, if applicable (tuple)
'''
def __init__(self, y, c, sc, nn, funcObj, plotType="log", grid = None):
if nn.d0 == 1:
self.fig, self.axs = plt.subplots(2, 1)
else:
self.fig, self.axs = plt.subplots(1)
self.axs = [self.axs]
# Plot the convergence of J from the training
nn.showConvergence(self.axs[0], plotType)
print("Last J from training: ", nn.J[-1])
# Get obtained parameters from training
W, b, mu, omega = nn.getParams()
self.y = np.squeeze(y)
if nn.d0 < 3:
# This line is only needed when we want to plot, hence disabled for d0 >= 3
y = np.reshape(np.array(np.meshgrid(*y)), (nn.d//2,len(y[-1])**(nn.d//2)))
# Propagate through network
z, f_tilde = nn.propagate(y)
# Rescale using the same scaler as the data was scaled with
f_tilde = sc.rescale(f_tilde)
# Write data to class attributes
self.c = c
self.f_tilde = f_tilde
self.d = nn.d
self.d0 = nn.d0
self.network = nn
self.grid = grid
self.funcObj = funcObj
self.sc = sc
def plot(self):
'''
Main function for performing test plots.
For d0 == 1, simply plots c(y) for the analytical and approximated values.
'''
if self.d0 == 1:
# Plot a comparison between the analytical and the neural network approximation
self.axs[1].plot(self.y, self.c, label="Analytical")
self.axs[1].plot(self.y, self.f_tilde, label="Neural network approx.")
self.fig.legend()
plt.show()
elif self.d0 == 2:
'''
In this case, the function plots three 3D surfaces: the first one being the analytical graph,
the second being the neural network approximation, and the third being the difference between them.
'''
# Set up a grid of y-values
if self.grid is not None:
y1, y2 = np.linspace(min(self.y[0]), max(self.y[0]), self.grid[0]), np.linspace(min(self.y[1]), max(self.y[1]), self.grid[1])
y1, y2 = np.meshgrid(y1, y2)
else:
y1, y2 = np.meshgrid(self.y[0], self.y[1])
self.c = self.funcObj(y1, y2)
self.f_tilde = np.reshape(self.f_tilde, self.grid)
# Make a figure
fig = plt.figure()
fig.suptitle("Test results")
# Containers for surfaces and axes
surfaces = []
axs = []
# ------- First plot ------
# Add the first subplot to the figure
axs.append(fig.add_subplot(3, 1, 1, projection="3d"))
# Plot the first surface
surfaces.append(axs[0].plot_surface(y1, y2, self.c, cmap=cm.Spectral,
linewidth=0, antialiased=False))
# Set title
axs[0].set_title("Analytic")
# Add a colorbar
#fig.colorbar(surf1, shrink=0.5, aspect=5)
# ------- Second plot ------
# Add the first subplot to the figure
axs.append(fig.add_subplot(3, 1, 2, projection="3d"))
# Plot a normal vector to the surface at a random point
randomIndex = np.random.randint(len(y1.flatten()))
x = y1.flatten()[randomIndex]
y = y2.flatten()[randomIndex]
z = self.f_tilde.flatten()[randomIndex]
axs[1].quiver(x,y,z,*tuple(self.sc.gradientRescale(self.network.grad([[x], [y]]))),-1,
label="Normal vector at\n" + str(np.around([x, y, z], 2)))
# Plot the second surface
surfaces.append(axs[1].plot_surface(y1, y2, self.f_tilde, cmap=cm.Spectral,
linewidth=0, antialiased=False))
fig.legend()
# Set title
axs[1].set_title("Neural network approx.")
# Find max and min value across the two first surfaces, for mapping colors
vmin = min(surface.get_array().min() for surface in surfaces)
vmax = max(surface.get_array().max() for surface in surfaces)
norm = colors.Normalize(vmin=vmin, vmax=vmax)
for sf in surfaces:
sf.set_norm(norm)
# Add colorbar, this will be shared for the first two surfaces
fig.colorbar(surfaces[0], ax=axs, orientation='vertical')
# ------- Third plot -------
# Add the third subplot
ax = fig.add_subplot(3, 1, 3, projection="3d")
# Plot the surface
surf3 = ax.plot_surface(y1, y2, np.absolute((self.f_tilde-self.c)/self.c.max()), cmap=cm.coolwarm,
linewidth=0, antialiased=False)
# Set title
ax.set_title("Relative difference")
# Add colorbar, the last surface gets its own colorbar
fig.colorbar(surf3, shrink=0.5, aspect=5)
plt.show()
# Print J for the test data
print("J for test data:", 0.5*np.linalg.norm(np.squeeze(self.f_tilde)-self.c)**2/np.prod(self.f_tilde.shape))
def trajectoryPlotter(self, axs, ylabel):
'''
Additional function for plotting trajectory data
'''
axs.plot(self.c, label="Trajectory data")
axs.plot(self.f_tilde, label="Neural network prediction")
axs.set_xlabel("Time step")
axs.set_ylabel(ylabel)
axs.legend()
plt.show()
# Print J for the test data
print("J for test data: ", 0.5*np.linalg.norm(np.squeeze(self.f_tilde)-self.c)**2/np.prod(self.f_tilde.shape))
Through systematic testing, we have found that the best results we can expect are around $J = 10^{-4}$ for the error during training, and $J = 10^{-3}$ for newly generated test data. Our optimization of the training parameters therefore aims to achieve these results fairly consistently, without an unnecessarily high amount of computation time.
* A number of layers $K = 30$ will be sufficient without too great an error. Although the error decreases by choosing $K > 30$, the impact of the additional layers will not justify the increase in computation time.
* A learning parameter of $\tau = 0.002$ seems to be a good value with reasonable convergence speed, given $d_0(F(y)) = 1$. However, where $d_0 > 1$, $\tau$ is set to $0.001$.
The final parameter to choose is the number of generated data points in the training phase, $I$. Using too many points greatly increases the computation time, while using too few points might result in overfitting. One thing to keep in mind when choosing this value is that training the network is essentially analogous to solving a system with $K \cdot d^2 + K \cdot d + 2$ unknowns and $I$ equations. Hence, as a rule of thumb, we want at least $I = K \cdot d^2 + K \cdot d + 2$ data points. We therefore landed on $I \approx 2 \cdot K \cdot d^2$.
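To make the rule of thumb concrete, the short calculation below evaluates it for the parameters used for the one-dimensional examples further down ($K = 30$, $d = 2$); it is purely illustrative.
# Counting unknowns vs. data points for K = 30, d = 2
K_ex, d_ex = 30, 2
n_unknowns = K_ex*d_ex**2 + K_ex*d_ex + 2  # K*d^2 + K*d + 2 = 182 unknowns
I_rule = 2*K_ex*d_ex**2                    # I ~ 2*K*d^2 = 240 data points
print(n_unknowns, I_rule)                  # 182 240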
We restate the given example functions for generating synthetic data: $$ \begin{array}{c|cccc} & d_0 & d & F(y) & \text{Domain}\\[6pt] F_1(y) & 1 & 2 & \frac12 y^2 & [-2,2] \\[6pt] F_2(y) & 1 & 2 & 1-\cos y & [-\frac{\pi}{3},\frac{\pi}{3}] \\[6pt] F_3(y) & 2 & 4 & \frac12 (y_1^2+y_2^2) & [-2,2]\times[-2,2] \\[6pt] F_4(y) & 2 & 4 & -\frac{1}{\sqrt{y_1^2+y_2^2}} & \mathbb{R}^2\backslash \{(0,0)\} \\ \end{array} $$ Note: for $F_4$ we set the domain to be $[-10, 10]^2\backslash [-\frac12,\frac12]^2$, as it is infeasible to simulate the entire real plane.
All example functions are implemented below as $F_i(y)=\tt{fi}$. The cells below output a log plot of the residue during training of the network, and a plot of the analytical $F_i(y)$ compared with the neural network approximation $\tilde{F}(y; \theta)$ at new test points $y$.
dom1 = domain([interval.closed(-2, 2)])
f1 = datGenFunctions(1, 2, lambda y : 0.5*y**2, dom1)
dom2 = domain([interval.closed(-np.pi/3, np.pi/3)])
f2 = datGenFunctions(1, 2, lambda y : 1-np.cos(y), dom2)
dom3 = domain([interval.closed(-2, 2), interval.closed(-2, 2)])
f3 = datGenFunctions(2, 4, lambda y1,y2 : 0.5*(y1**2+y2**2), dom3)
xy_plane = domain([interval.closed(-10, 10), interval.closed(-10, 10)]) #[-10, 10]x[-10, 10] set as the entire xy plane
zero = domain([interval.closed(-0.5,0.5), interval.closed(-0.5,0.5)])
dom4 = xy_plane - zero
f4 = datGenFunctions(2, 4, lambda y1,y2 : -1*1/np.sqrt(y1**2+y2**2), dom4)
#__________ f1(y) __________
sc1, y1, c1 = genData(f1, 200)
nn1 = network(K = 30,
tau = 0.002,
h = 0.1,
d = 2,
d0 = 1,
iterations = 600
)
nn1.train(y1, c1, 20)
y1 = np.linspace(-2, 2, 100)
c1 = f1(y1)
test1 = tester([y1], c1, sc1, nn1, f1)
test1.plot()
Training: Done Runtime: 4.41 seconds
Last J from training: 3.952400791439993e-05
J for test data: 0.0006689622141397997
Remarks on $F_1(y)$: $J/I$ decreases most dramatically over roughly the first $200$ iterations, and continues decreasing close to linearly with increasing iterations. Both the analytical and approximated values share the same shape and closely resemble each other. However, near the boundary points, $y \approx \pm 1.8$, the differences are more evident.
A final remark is that $J$ from the training is of order $10^{-5}$, whereas $J$ from testing is of order $10^{-4}$. Upon review, a difference of a factor $\approx 10$, as is the case here, does not necessarily constitute overfitting.
#__________ f2(y) __________
sc2, y2, c2 = genData(f2, 200)
nn2 = network(K = 30,
tau = 0.002,
h = 0.1,
d = 2,
d0 = 1,
iterations = 600
)
nn2.train(y2, c2, 20)
y2 = np.linspace(-np.pi/3, np.pi/3, 100)
c2 = f2(y2)
test2 = tester([y2], c2, sc2, nn2, f2)
test2.plot()
Training: Done Runtime: 4.54 seconds