Derivatives Pricing¶

The goal of this case study is to perform derivative pricing from a machine learning standpoint and use supervised regression-based model to learn the Black-Scholes option pricing model from simulated data.

Content¶

1. Problem Definition
2. Getting Started - Load Libraries and Dataset
3. Exploratory Data Analysis
- 3.1 Descriptive Statistics
- 3.2. Data Visualisation
4. Data Preparation and analysis
- 4.1.Feature Selection
5.Evaluate Algorithms and Models
- 5.1. Train/Test Split and evaluation metrics
- 5.2. Compare Models and Algorithms
6. Model Tuning and finalizing the model
7. Additional analysis: removing the volatilty data

1. Problem Definition¶

In the supervised regression framework used for this case study, the derivative pricing problem is defined in the regression framework, where the predicted variable is the pricing of the option, and the predictor variables are the market data that are used as inputs to the Black-Scholes option pricing model

Options have been used in finance as means to hedge risk in a nonlinear manner. They are are also used by speculators in order to take leveraged bets in the financial markets. Historically, people have used the Black Scholes formula.

$$ Se^{-q \tau}\Phi(d_1) - e^{-r \tau} K\Phi(d_2) \, $$

With

$$ d_1 = \frac{\ln(S/K) + (r - q + \sigma^2/2)\tau}{\sigma\sqrt{\tau}} $$

and

$$ d_2 = \frac{\ln(S/K) + (r - q - \sigma^2/2)\tau}{\sigma\sqrt{\tau}} = d_1 - \sigma\sqrt{\tau} $$

Where we have; Stock price $S$; Strike price $K$; Risk-free rate $r$; Annual dividend yield $q$; Time to maturity $\tau = T-t$ (represented as a unit-less fraction of one year); Volatility $\sigma$

In order to make the logic simpler, we define Moneyness as $M = K/S$ and look at the prices in terms of per unit of current stock price. We also set $q$ as $0$

This simplifes the formula down to the following

$$ e^{-q \tau}\Phi\left( \frac{- \ln(M) + (r+ \sigma^2/2 )\tau}{\sigma\sqrt{\tau}}\right) - e^{-r \tau} M\Phi\left( \frac{- \ln(M) + (r - \sigma^2/2)\tau}{\sigma\sqrt{\tau}} \right) \, $$

Vol Suface¶

In the options market, there isn't a single value of volatility which gives us the correct price. We often find the volatility such that the output matches the price

Simulation¶

In this exercise, we assume the the sturcture of the vol surface. In practice, we would source the data from a data vendor.

We use the following function to generate the option volatility surface

$$ \sigma(M, \tau) = \sigma_0 + \alpha\tau + \beta (M - 1)^2$$

2. Getting Started- Loading the data and python packages¶

2.1. Loading the python packages¶

Python Imports¶

In [21]:

# Distribution functions
from scipy.stats import norm


# Load libraries
import numpy as np
import pandas as pd
import pandas_datareader.data as web
from matplotlib import pyplot
from pandas.plotting import scatter_matrix
import seaborn as sns
from sklearn.preprocessing import StandardScaler

from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV

from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.neural_network import MLPRegressor

#Libraries for Deep Learning Models
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras.layers import LSTM
from keras.wrappers.scikit_learn import KerasRegressor

#Libraries for Statistical Models
import statsmodels.api as sm

#Libraries for Saving the Model
from pickle import dump
from pickle import load

# Time series Models
from statsmodels.tsa.arima_model import ARIMA
#from statsmodels.tsa.statespace.sarimax import SARIMAX

# Error Metrics
from sklearn.metrics import mean_squared_error

# Feature Selection
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2, f_regression


#Plotting 
from pandas.plotting import scatter_matrix
from statsmodels.graphics.tsaplots import plot_acf

In [22]:

#Diable the warnings
import warnings
warnings.filterwarnings('ignore')

2.2. Defining functions and parameters¶

True Parameters¶

In [23]:

true_alpha = 0.1
true_beta = 0.1
true_sigma0 = 0.2

In [24]:

risk_free_rate = 0.05

Vol and Option Pricing Functions¶

In [25]:

def option_vol_from_surface(moneyness, time_to_maturity):
    return true_sigma0 + true_alpha * time_to_maturity + true_beta * np.square(moneyness - 1)

def call_option_price(moneyness, time_to_maturity, option_vol):
    d1=(np.log(1/moneyness)+(risk_free_rate+np.square(option_vol))*time_to_maturity)/(option_vol*np.sqrt(time_to_maturity))
    d2=(np.log(1/moneyness)+(risk_free_rate-np.square(option_vol))*time_to_maturity)/(option_vol*np.sqrt(time_to_maturity))
    N_d1 = norm.cdf(d1)
    N_d2 = norm.cdf(d2)
    
    return N_d1 - moneyness * np.exp(-risk_free_rate*time_to_maturity) * N_d2

2.3. Data Generation¶

In [26]:

N = 10000

Ks = 1+0.25*np.random.randn(N)
Ts = np.random.random(N)
Sigmas = np.array([option_vol_from_surface(k,t) for k,t in zip(Ks,Ts)])
Ps = np.array([call_option_price(k,t,sig) for k,t,sig in zip(Ks,Ts,Sigmas)])

Set the Endog and Exog Data¶

In [27]:

Y = Ps

X = np.concatenate([Ks.reshape(-1,1), Ts.reshape(-1,1), Sigmas.reshape(-1,1)], axis=1)

dataset = pd.DataFrame(np.concatenate([Y.reshape(-1,1), X], axis=1),
                       columns=['Price', 'Moneyness', 'Time', 'Vol'])

3. Exploratory Data Analysis¶

3.1. Descriptive Statistics¶

In [28]:

dataset.head()

Out[28]:

	Price	Moneyness	Time	Vol
0	0.211	0.934	0.660	0.266
1	0.038	1.011	0.065	0.207
2	0.163	1.187	0.883	0.292
3	0.091	1.245	0.648	0.271
4	0.193	0.813	0.125	0.216

In [29]:

pd.set_option('precision', 3)
dataset.describe()

Out[29]:

	Price	Moneyness	Time	Vol
count	10000.000	10000.000	1.000e+04	10000.000
mean	0.177	1.005	5.048e-01	0.257
std	0.133	0.249	2.875e-01	0.030
min	0.000	0.123	2.516e-04	0.200
25%	0.073	0.836	2.581e-01	0.232
50%	0.159	1.007	5.038e-01	0.257
75%	0.250	1.177	7.521e-01	0.281
max	0.882	1.977	1.000e+00	0.358

3.2. Data Visualization¶

In [30]:

dataset.hist(bins=50, sharex=False, sharey=False, xlabelsize=1, ylabelsize=1, figsize=(12,12))
pyplot.show()

We can see that the price has an interesting distribution with a spike at $0$

In [31]:

dataset.plot(kind='density', subplots=True, layout=(4,4), sharex=True, legend=True, fontsize=1, figsize=(15,15))
pyplot.show()

Next we look at the interaction between different variables

In [32]:

correlation = dataset.corr()
pyplot.figure(figsize=(10,10))
pyplot.title('Correlation Matrix')
sns.heatmap(correlation, vmax=1, square=True,annot=True,cmap='cubehelix')

Out[32]:

<matplotlib.axes._subplots.AxesSubplot at 0x1e535c3cc88>

In [33]:

pyplot.figure(figsize=(15,15))
scatter_matrix(dataset,figsize=(12,12))
pyplot.show()

<Figure size 1080x1080 with 0 Axes>

We see some very interesting non linear analysis. This means that we expect our non linear models to do a better job than our linear models.

4. Data Preparation and Analysis¶

4.1. Univariate Feature Selection¶

We use SelectKBest function from sklearn

In [34]:

bestfeatures = SelectKBest(k='all', score_func=f_regression)
fit = bestfeatures.fit(X,Y)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(['Moneyness', 'Time', 'Vol'])
#concat two dataframes for better visualization 
featureScores = pd.concat([dfcolumns,dfscores],axis=1)
featureScores.columns = ['Specs','Score']  #naming the dataframe columns
featureScores.nlargest(10,'Score').set_index('Specs')  #print 10 best features

Out[34]:

	Score
Specs
Moneyness	30032.266
Vol	2156.177
Time	1412.241

We observe that the moneyness is the most important variable for the price.

5. Evaluate Algorithms and Models¶

5.1. Train Test Split and Evaluation Metrics¶

In [35]:

validation_size = 0.2

train_size = int(len(X) * (1-validation_size))
X_train, X_test = X[0:train_size], X[train_size:len(X)]
Y_train, Y_test = Y[0:train_size], Y[train_size:len(X)]

We use the prebuilt scikit models to run a K fold analysis on our training data. We then train the model on the full training data and use it for prediction of the test data. The parameters for the K fold analysis are defined as -

In [36]:

num_folds = 10
seed = 7
# scikit is moving away from mean_squared_error. 
# In order to avoid confusion, and to allow comparison with other models, we invert the final scores
scoring = 'neg_mean_squared_error' 

5.2. Compare Models and Algorithms¶

Linear Models and Regression Trees¶

In [37]:

models = []
models.append(('LR', LinearRegression()))
models.append(('LASSO', Lasso()))
models.append(('EN', ElasticNet()))
models.append(('KNN', KNeighborsRegressor()))
models.append(('CART', DecisionTreeRegressor()))
models.append(('SVR', SVR()))

Neural Network Predictor¶

In [38]:

models.append(('MLP', MLPRegressor()))

Boosting and Bagging Methods¶

In [39]:

# Boosting methods
models.append(('ABR', AdaBoostRegressor()))
models.append(('GBR', GradientBoostingRegressor()))
# Bagging methods
models.append(('RFR', RandomForestRegressor()))
models.append(('ETR', ExtraTreesRegressor()))

In [40]:

names = []
kfold_results = []
test_results = []
train_results = []
for name, model in models:
    names.append(name)
    
    ## K Fold analysis:
    
    kfold = KFold(n_splits=num_folds, random_state=seed)
    #converted mean square error to positive. The lower the beter
    cv_results = -1* cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
    kfold_results.append(cv_results)
    

    # Full Training period
    res = model.fit(X_train, Y_train)
    train_result = mean_squared_error(res.predict(X_train), Y_train)
    train_results.append(train_result)
    
    # Test results
    test_result = mean_squared_error(res.predict(X_test), Y_test)
    test_results.append(test_result)
    
    msg = "%s: %f (%f) %f %f" % (name, cv_results.mean(), cv_results.std(), train_result, test_result)
    print(msg)

LR: 0.000348 (0.000025) 0.000348 0.000397
LASSO: 0.017766 (0.000996) 0.017763 0.017695
EN: 0.017766 (0.000996) 0.017763 0.017695
KNN: 0.000015 (0.000005) 0.000007 0.000017
CART: 0.000010 (0.000002) 0.000000 0.000009
SVR: 0.004016 (0.000076) 0.004032 0.004056
MLP: 0.000069 (0.000065) 0.000018 0.000020
ABR: 0.000615 (0.000024) 0.000577 0.000582
GBR: 0.000019 (0.000001) 0.000015 0.000019
RFR: 0.000002 (0.000001) 0.000000 0.000002
ETR: 0.000000 (0.000000) 0.000000 0.000001

We being by looking at the Kfold analysis

In [41]:

fig = pyplot.figure()
fig.suptitle('Algorithm Comparison: Kfold results')
ax = fig.add_subplot(111)
pyplot.boxplot(kfold_results)
ax.set_xticklabels(names)
fig.set_size_inches(15,8)
pyplot.show()

In order to get a better view, we remove the LASSO and Elastic Net

In [42]:

# compare algorithms
fig = pyplot.figure()

ind = np.arange(len(names)-2)  # the x locations for the groups
width = 0.35  # the width of the bars

fig.suptitle('Algorithm Comparison')
ax = fig.add_subplot(111)
pyplot.bar(ind - width/2, train_results[0:1] + train_results[3:],  width=width, label='Train Error')
pyplot.bar(ind + width/2, test_results[0:1] + test_results[3:], width=width, label='Test Error')
fig.set_size_inches(15,8)
pyplot.legend()
ax.set_xticks(ind)
ax.set_xticklabels(names[0:1] + names[3:])
pyplot.show()

We see that the multilayer perceptron (MLP) algorithm does a lot better that the linear algorithm. However, the CART and the Forest methods do a very good job as well. Given MLP is one of the best models we perform the grid search for MLP model in the next step.

6. Model Tuning and finalising the model¶

As shown in the chart above the MLP model is one of the best model, so we perform the model tuning. We perform a grid search with different combination of hidden layers in the MLP model.

In [43]:

'''
hidden_layer_sizes : tuple, length = n_layers - 2, default (100,)
    The ith element represents the number of neurons in the ith
    hidden layer.
'''
param_grid={'hidden_layer_sizes': [(20,), (50,), (20,20), (20, 30, 20)]}
model = MLPRegressor()
kfold = KFold(n_splits=num_folds, random_state=seed)
grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring=scoring, cv=kfold)
grid_result = grid.fit(X_train, Y_train)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: -0.000024 using {'hidden_layer_sizes': (20, 30, 20)}
-0.000640 (0.000621) with: {'hidden_layer_sizes': (20,)}
-0.000169 (0.000155) with: {'hidden_layer_sizes': (50,)}
-0.000085 (0.000077) with: {'hidden_layer_sizes': (20, 20)}
-0.000024 (0.000012) with: {'hidden_layer_sizes': (20, 30, 20)}

The best model is the model with 3 layers with 20, 30 and 20 nodes in each layer respectively.

In [44]:

# prepare model
model_tuned = MLPRegressor(hidden_layer_sizes=(20, 30, 20))
model_tuned.fit(X_train, Y_train)

Out[44]:

MLPRegressor(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
             beta_2=0.999, early_stopping=False, epsilon=1e-08,
             hidden_layer_sizes=(20, 30, 20), learning_rate='constant',
             learning_rate_init=0.001, max_fun=15000, max_iter=200,
             momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
             power_t=0.5, random_state=None, shuffle=True, solver='adam',
             tol=0.0001, validation_fraction=0.1, verbose=False,
             warm_start=False)

In [45]:

# estimate accuracy on validation set
# transform the validation dataset
predictions = model_tuned.predict(X_test)
print(mean_squared_error(Y_test, predictions))

3.08127276609567e-05

We see that the mean error (RMSE) is 3.08e-5 , which is less than a cent. Hence, the deep learning model does an excellent job of fitting the Black-Scholes option pricing model. The accuracy may be enhanced with more tuning.

7. Additonal analysis: removing the volatilty data¶

Next, we make the process harder by trying to predict the price without the volatility data.

In [46]:

X = X[:, :2]

In [47]:

validation_size = 0.2

train_size = int(len(X) * (1-validation_size))
X_train, X_test = X[0:train_size], X[train_size:len(X)]
Y_train, Y_test = Y[0:train_size], Y[train_size:len(X)]

In [48]:

num_folds = 10
seed = 7
# scikit is moving away from mean_squared_error. 
# In order to avoid confusion, and to allow comparison with other models, we invert the final scores
scoring = 'neg_mean_squared_error' 

In [49]:

models = []
models.append(('LR', LinearRegression()))
#models.append(('LASSO', Lasso()))
#models.append(('EN', ElasticNet()))
models.append(('KNN', KNeighborsRegressor()))
models.append(('CART', DecisionTreeRegressor()))
models.append(('SVR', SVR()))

In [50]:

models.append(('MLP', MLPRegressor()))

In [51]:

# Boosting methods
models.append(('ABR', AdaBoostRegressor()))
models.append(('GBR', GradientBoostingRegressor()))
# Bagging methods
models.append(('RFR', RandomForestRegressor()))
models.append(('ETR', ExtraTreesRegressor()))

In [52]:

names = []
kfold_results = []
test_results = []
train_results = []
for name, model in models:
    names.append(name)
    
    ## K Fold analysis:
    
    kfold = KFold(n_splits=num_folds, random_state=seed)
    #converted mean square error to positive. The lower the beter
    cv_results = -1* cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
    kfold_results.append(cv_results)
    

    # Full Training period
    res = model.fit(X_train, Y_train)
    train_result = mean_squared_error(res.predict(X_train), Y_train)
    train_results.append(train_result)
    
    # Test results
    test_result = mean_squared_error(res.predict(X_test), Y_test)
    test_results.append(test_result)
    
    msg = "%s: %f (%f) %f %f" % (name, cv_results.mean(), cv_results.std(), train_result, test_result)
    print(msg)

LR: 0.002036 (0.000153) 0.002035 0.002126
KNN: 0.000015 (0.000005) 0.000007 0.000017
CART: 0.000011 (0.000004) 0.000000 0.000009
SVR: 0.005857 (0.000104) 0.005832 0.005886
MLP: 0.000069 (0.000109) 0.000008 0.000009
ABR: 0.000625 (0.000029) 0.000596 0.000610
GBR: 0.000020 (0.000001) 0.000015 0.000020
RFR: 0.000002 (0.000000) 0.000000 0.000002
ETR: 0.000000 (0.000000) 0.000000 0.000001

In [53]:

fig = pyplot.figure()
fig.suptitle('Algorithm Comparison: Kfold results')
ax = fig.add_subplot(111)
pyplot.boxplot(kfold_results)
ax.set_xticklabels(names)
fig.set_size_inches(15,8)
pyplot.show()

In [54]:

# compare algorithms
fig = pyplot.figure()

ind = np.arange(len(names))  # the x locations for the groups
width = 0.35  # the width of the bars

fig.suptitle('Algorithm Comparison')
ax = fig.add_subplot(111)
pyplot.bar(ind - width/2, train_results,  width=width, label='Train Error')
pyplot.bar(ind + width/2, test_results, width=width, label='Test Error')
fig.set_size_inches(15,8)
pyplot.legend()
ax.set_xticks(ind)
ax.set_xticklabels(names)
pyplot.show()

We can see that the linear regression now does a worse job than before, this is expected since we have added a greater amount of non linearity.

Summary¶

The linear regression model did not do as well as our non-linear models and the non-linear models

have a very good performance overall.

Artificial neural network (ANN) can reproduce the Black and Scholes

option pricing formula for a call option to a high degree of accuracy which means that we can leverage the efficient numerical calculation of machine learning in the derivative pricing without relying on the impractical assumptions made in the traditional derivative pricing models.