The goal of this case study is to perform derivative pricing from a machine learning standpoint and use supervised regression-based model to learn the Black-Scholes option pricing model from simulated data.
In the supervised regression framework used for this case study, the derivative pricing problem is defined in the regression framework, where the predicted variable is the pricing of the option, and the predictor variables are the market data that are used as inputs to the Black-Scholes option pricing model
Options have been used in finance as means to hedge risk in a nonlinear manner. They are are also used by speculators in order to take leveraged bets in the financial markets. Historically, people have used the Black Scholes formula.
With
and
Where we have; Stock price $S$; Strike price $K$; Risk-free rate $r$; Annual dividend yield $q$; Time to maturity $\tau = T-t$ (represented as a unit-less fraction of one year); Volatility $\sigma$
In order to make the logic simpler, we define Moneyness as $M = K/S$ and look at the prices in terms of per unit of current stock price. We also set $q$ as $0$
This simplifes the formula down to the following
In the options market, there isn't a single value of volatility which gives us the correct price. We often find the volatility such that the output matches the price
In this exercise, we assume the the sturcture of the vol surface. In practice, we would source the data from a data vendor.
We use the following function to generate the option volatility surface
# Distribution functions
from scipy.stats import norm
# Load libraries
import numpy as np
import pandas as pd
import pandas_datareader.data as web
from matplotlib import pyplot
from pandas.plotting import scatter_matrix
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.neural_network import MLPRegressor
#Libraries for Deep Learning Models
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras.layers import LSTM
from keras.wrappers.scikit_learn import KerasRegressor
#Libraries for Statistical Models
import statsmodels.api as sm
#Libraries for Saving the Model
from pickle import dump
from pickle import load
# Time series Models
from statsmodels.tsa.arima_model import ARIMA
#from statsmodels.tsa.statespace.sarimax import SARIMAX
# Error Metrics
from sklearn.metrics import mean_squared_error
# Feature Selection
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2, f_regression
#Plotting
from pandas.plotting import scatter_matrix
from statsmodels.graphics.tsaplots import plot_acf
#Diable the warnings
import warnings
warnings.filterwarnings('ignore')
true_alpha = 0.1
true_beta = 0.1
true_sigma0 = 0.2
risk_free_rate = 0.05
def option_vol_from_surface(moneyness, time_to_maturity):
return true_sigma0 + true_alpha * time_to_maturity + true_beta * np.square(moneyness - 1)
def call_option_price(moneyness, time_to_maturity, option_vol):
d1=(np.log(1/moneyness)+(risk_free_rate+np.square(option_vol))*time_to_maturity)/(option_vol*np.sqrt(time_to_maturity))
d2=(np.log(1/moneyness)+(risk_free_rate-np.square(option_vol))*time_to_maturity)/(option_vol*np.sqrt(time_to_maturity))
N_d1 = norm.cdf(d1)
N_d2 = norm.cdf(d2)
return N_d1 - moneyness * np.exp(-risk_free_rate*time_to_maturity) * N_d2
N = 10000
Ks = 1+0.25*np.random.randn(N)
Ts = np.random.random(N)
Sigmas = np.array([option_vol_from_surface(k,t) for k,t in zip(Ks,Ts)])
Ps = np.array([call_option_price(k,t,sig) for k,t,sig in zip(Ks,Ts,Sigmas)])
Y = Ps
X = np.concatenate([Ks.reshape(-1,1), Ts.reshape(-1,1), Sigmas.reshape(-1,1)], axis=1)
dataset = pd.DataFrame(np.concatenate([Y.reshape(-1,1), X], axis=1),
columns=['Price', 'Moneyness', 'Time', 'Vol'])
dataset.head()
Price | Moneyness | Time | Vol | |
---|---|---|---|---|
0 | 0.211 | 0.934 | 0.660 | 0.266 |
1 | 0.038 | 1.011 | 0.065 | 0.207 |
2 | 0.163 | 1.187 | 0.883 | 0.292 |
3 | 0.091 | 1.245 | 0.648 | 0.271 |
4 | 0.193 | 0.813 | 0.125 | 0.216 |
pd.set_option('precision', 3)
dataset.describe()
Price | Moneyness | Time | Vol | |
---|---|---|---|---|
count | 10000.000 | 10000.000 | 1.000e+04 | 10000.000 |
mean | 0.177 | 1.005 | 5.048e-01 | 0.257 |
std | 0.133 | 0.249 | 2.875e-01 | 0.030 |
min | 0.000 | 0.123 | 2.516e-04 | 0.200 |
25% | 0.073 | 0.836 | 2.581e-01 | 0.232 |
50% | 0.159 | 1.007 | 5.038e-01 | 0.257 |
75% | 0.250 | 1.177 | 7.521e-01 | 0.281 |
max | 0.882 | 1.977 | 1.000e+00 | 0.358 |
dataset.hist(bins=50, sharex=False, sharey=False, xlabelsize=1, ylabelsize=1, figsize=(12,12))
pyplot.show()
We can see that the price has an interesting distribution with a spike at $0$
dataset.plot(kind='density', subplots=True, layout=(4,4), sharex=True, legend=True, fontsize=1, figsize=(15,15))
pyplot.show()
Next we look at the interaction between different variables
correlation = dataset.corr()
pyplot.figure(figsize=(10,10))
pyplot.title('Correlation Matrix')
sns.heatmap(correlation, vmax=1, square=True,annot=True,cmap='cubehelix')
<matplotlib.axes._subplots.AxesSubplot at 0x1e535c3cc88>
pyplot.figure(figsize=(15,15))
scatter_matrix(dataset,figsize=(12,12))
pyplot.show()
<Figure size 1080x1080 with 0 Axes>
We see some very interesting non linear analysis. This means that we expect our non linear models to do a better job than our linear models.
We use SelectKBest function from sklearn
bestfeatures = SelectKBest(k='all', score_func=f_regression)
fit = bestfeatures.fit(X,Y)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(['Moneyness', 'Time', 'Vol'])
#concat two dataframes for better visualization
featureScores = pd.concat([dfcolumns,dfscores],axis=1)
featureScores.columns = ['Specs','Score'] #naming the dataframe columns
featureScores.nlargest(10,'Score').set_index('Specs') #print 10 best features
Score | |
---|---|
Specs | |
Moneyness | 30032.266 |
Vol | 2156.177 |
Time | 1412.241 |
We observe that the moneyness is the most important variable for the price.
validation_size = 0.2
train_size = int(len(X) * (1-validation_size))
X_train, X_test = X[0:train_size], X[train_size:len(X)]
Y_train, Y_test = Y[0:train_size], Y[train_size:len(X)]
We use the prebuilt scikit models to run a K fold analysis on our training data. We then train the model on the full training data and use it for prediction of the test data. The parameters for the K fold analysis are defined as -
num_folds = 10
seed = 7
# scikit is moving away from mean_squared_error.
# In order to avoid confusion, and to allow comparison with other models, we invert the final scores
scoring = 'neg_mean_squared_error'
models = []
models.append(('LR', LinearRegression()))
models.append(('LASSO', Lasso()))
models.append(('EN', ElasticNet()))
models.append(('KNN', KNeighborsRegressor()))
models.append(('CART', DecisionTreeRegressor()))
models.append(('SVR', SVR()))
models.append(('MLP', MLPRegressor()))
# Boosting methods
models.append(('ABR', AdaBoostRegressor()))
models.append(('GBR', GradientBoostingRegressor()))
# Bagging methods
models.append(('RFR', RandomForestRegressor()))
models.append(('ETR', ExtraTreesRegressor()))
names = []
kfold_results = []
test_results = []
train_results = []
for name, model in models:
names.append(name)
## K Fold analysis:
kfold = KFold(n_splits=num_folds, random_state=seed)
#converted mean square error to positive. The lower the beter
cv_results = -1* cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
kfold_results.append(cv_results)
# Full Training period
res = model.fit(X_train, Y_train)
train_result = mean_squared_error(res.predict(X_train), Y_train)
train_results.append(train_result)
# Test results
test_result = mean_squared_error(res.predict(X_test), Y_test)
test_results.append(test_result)
msg = "%s: %f (%f) %f %f" % (name, cv_results.mean(), cv_results.std(), train_result, test_result)
print(msg)
LR: 0.000348 (0.000025) 0.000348 0.000397 LASSO: 0.017766 (0.000996) 0.017763 0.017695 EN: 0.017766 (0.000996) 0.017763 0.017695 KNN: 0.000015 (0.000005) 0.000007 0.000017 CART: 0.000010 (0.000002) 0.000000 0.000009 SVR: 0.004016 (0.000076) 0.004032 0.004056 MLP: 0.000069 (0.000065) 0.000018 0.000020 ABR: 0.000615 (0.000024) 0.000577 0.000582 GBR: 0.000019 (0.000001) 0.000015 0.000019 RFR: 0.000002 (0.000001) 0.000000 0.000002 ETR: 0.000000 (0.000000) 0.000000 0.000001
We being by looking at the Kfold analysis
fig = pyplot.figure()
fig.suptitle('Algorithm Comparison: Kfold results')
ax = fig.add_subplot(111)
pyplot.boxplot(kfold_results)
ax.set_xticklabels(names)
fig.set_size_inches(15,8)
pyplot.show()
In order to get a better view, we remove the LASSO and Elastic Net
# compare algorithms
fig = pyplot.figure()
ind = np.arange(len(names)-2) # the x locations for the groups
width = 0.35 # the width of the bars
fig.suptitle('Algorithm Comparison')
ax = fig.add_subplot(111)
pyplot.bar(ind - width/2, train_results[0:1] + train_results[3:], width=width, label='Train Error')
pyplot.bar(ind + width/2, test_results[0:1] + test_results[3:], width=width, label='Test Error')
fig.set_size_inches(15,8)
pyplot.legend()
ax.set_xticks(ind)
ax.set_xticklabels(names[0:1] + names[3:])
pyplot.show()
We see that the multilayer perceptron (MLP) algorithm does a lot better that the linear algorithm. However, the CART and the Forest methods do a very good job as well. Given MLP is one of the best models we perform the grid search for MLP model in the next step.
As shown in the chart above the MLP model is one of the best model, so we perform the model tuning. We perform a grid search with different combination of hidden layers in the MLP model.
'''
hidden_layer_sizes : tuple, length = n_layers - 2, default (100,)
The ith element represents the number of neurons in the ith
hidden layer.
'''
param_grid={'hidden_layer_sizes': [(20,), (50,), (20,20), (20, 30, 20)]}
model = MLPRegressor()
kfold = KFold(n_splits=num_folds, random_state=seed)
grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring=scoring, cv=kfold)
grid_result = grid.fit(X_train, Y_train)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
print("%f (%f) with: %r" % (mean, stdev, param))
Best: -0.000024 using {'hidden_layer_sizes': (20, 30, 20)} -0.000640 (0.000621) with: {'hidden_layer_sizes': (20,)} -0.000169 (0.000155) with: {'hidden_layer_sizes': (50,)} -0.000085 (0.000077) with: {'hidden_layer_sizes': (20, 20)} -0.000024 (0.000012) with: {'hidden_layer_sizes': (20, 30, 20)}
The best model is the model with 3 layers with 20, 30 and 20 nodes in each layer respectively.
# prepare model
model_tuned = MLPRegressor(hidden_layer_sizes=(20, 30, 20))
model_tuned.fit(X_train, Y_train)
MLPRegressor(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08, hidden_layer_sizes=(20, 30, 20), learning_rate='constant', learning_rate_init=0.001, max_fun=15000, max_iter=200, momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5, random_state=None, shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1, verbose=False, warm_start=False)
# estimate accuracy on validation set
# transform the validation dataset
predictions = model_tuned.predict(X_test)
print(mean_squared_error(Y_test, predictions))
3.08127276609567e-05
We see that the mean error (RMSE) is 3.08e-5 , which is less than a cent. Hence, the deep learning model does an excellent job of fitting the Black-Scholes option pricing model. The accuracy may be enhanced with more tuning.
Next, we make the process harder by trying to predict the price without the volatility data.
X = X[:, :2]
validation_size = 0.2
train_size = int(len(X) * (1-validation_size))
X_train, X_test = X[0:train_size], X[train_size:len(X)]
Y_train, Y_test = Y[0:train_size], Y[train_size:len(X)]
num_folds = 10
seed = 7
# scikit is moving away from mean_squared_error.
# In order to avoid confusion, and to allow comparison with other models, we invert the final scores
scoring = 'neg_mean_squared_error'
models = []
models.append(('LR', LinearRegression()))
#models.append(('LASSO', Lasso()))
#models.append(('EN', ElasticNet()))
models.append(('KNN', KNeighborsRegressor()))
models.append(('CART', DecisionTreeRegressor()))
models.append(('SVR', SVR()))
models.append(('MLP', MLPRegressor()))
# Boosting methods
models.append(('ABR', AdaBoostRegressor()))
models.append(('GBR', GradientBoostingRegressor()))
# Bagging methods
models.append(('RFR', RandomForestRegressor()))
models.append(('ETR', ExtraTreesRegressor()))
names = []
kfold_results = []
test_results = []
train_results = []
for name, model in models:
names.append(name)
## K Fold analysis:
kfold = KFold(n_splits=num_folds, random_state=seed)
#converted mean square error to positive. The lower the beter
cv_results = -1* cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
kfold_results.append(cv_results)
# Full Training period
res = model.fit(X_train, Y_train)
train_result = mean_squared_error(res.predict(X_train), Y_train)
train_results.append(train_result)
# Test results
test_result = mean_squared_error(res.predict(X_test), Y_test)
test_results.append(test_result)
msg = "%s: %f (%f) %f %f" % (name, cv_results.mean(), cv_results.std(), train_result, test_result)
print(msg)
LR: 0.002036 (0.000153) 0.002035 0.002126 KNN: 0.000015 (0.000005) 0.000007 0.000017 CART: 0.000011 (0.000004) 0.000000 0.000009 SVR: 0.005857 (0.000104) 0.005832 0.005886 MLP: 0.000069 (0.000109) 0.000008 0.000009 ABR: 0.000625 (0.000029) 0.000596 0.000610 GBR: 0.000020 (0.000001) 0.000015 0.000020 RFR: 0.000002 (0.000000) 0.000000 0.000002 ETR: 0.000000 (0.000000) 0.000000 0.000001
fig = pyplot.figure()
fig.suptitle('Algorithm Comparison: Kfold results')
ax = fig.add_subplot(111)
pyplot.boxplot(kfold_results)
ax.set_xticklabels(names)
fig.set_size_inches(15,8)
pyplot.show()
# compare algorithms
fig = pyplot.figure()
ind = np.arange(len(names)) # the x locations for the groups
width = 0.35 # the width of the bars
fig.suptitle('Algorithm Comparison')
ax = fig.add_subplot(111)
pyplot.bar(ind - width/2, train_results, width=width, label='Train Error')
pyplot.bar(ind + width/2, test_results, width=width, label='Test Error')
fig.set_size_inches(15,8)
pyplot.legend()
ax.set_xticks(ind)
ax.set_xticklabels(names)
pyplot.show()
We can see that the linear regression now does a worse job than before, this is expected since we have added a greater amount of non linearity.
have a very good performance overall.
option pricing formula for a call option to a high degree of accuracy which means that we can leverage the efficient numerical calculation of machine learning in the derivative pricing without relying on the impractical assumptions made in the traditional derivative pricing models.