In this project, we look at long term exchange rates and predict them using the Purchasing Power Parity theory (PPP), the Real Exchange Rate theory (RER), a linear regression, a lasso regression and two neural networks. To compare the different models, we use their predictions to decide which currencies to buy and then compare the average rate of returns. We find that the PPP model performs the best and does give positive returns in some cases, the machine learning models do not perform well and all give negative returns.
Importing all the necessary packages.
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn import linear_model
from sklearn.model_selection import cross_val_score
from sklearn import tree
from sklearn.ensemble import RandomForestRegressor
from sklearn import neural_network
from sklearn import preprocessing, pipeline
from sklearn import metrics
import matplotlib.pyplot as plt
import altair as alt
First we load and clean the data.
def data_comp(variables):
'''
cleaning OECD data pass in the name of csv files you are using
'''
for n, var in enumerate(variables):
#read in file OECD data
df = pd.read_csv(f"{var}.csv")
#drop redundent columns and transform it into Year, Country, Variable
df.columns = df.iloc[2]
df.drop([0,1,2], inplace=True)
df.drop(df.columns[1:44], axis=1, inplace=True)
df.set_index('Country Name', inplace = True, drop = True)
df = df.T
df.index = [round(idx) for idx in df.index]
df.index = pd.to_datetime(df.index, format='%Y')
df = df.reset_index()
df = pd.melt(df, id_vars=['index'], value_vars=df.columns[1:], var_name='Country Name', value_name='Value')
df.set_index(['index', 'Country Name'], inplace=True)
df.index.names = ["Year", "Country Name"]
df.rename(columns={'Value': var}, inplace=True)
#making the dataframe and combining over all variables
if n == 0:
full_df = df
else:
full_df = pd.concat([full_df, df], axis =1)
return(full_df)
df_ML = data_comp(['GDP', 'Govt_debt_gdp', 'GDP growth rate','GNI per capita',
'Inflation', 'Lending rate', 'Net exports', 'Total_reserve', 'Forex rate'])
#writting to a csv
#full_df.to_csv('EX_rate_table.csv', index=True)
Here we construct the PPP and RER models. Based on the idea that in the long run we will converge back to an average real exchange rate value. While it doesnt always hold mean reversion seems to be the most consistent (and easiest!) way to predict exchange rates.
df_PPP_model= data_comp(['RER', 'Forex rate', 'PPP'])
df_PPP = pd.read_csv(f"PPP_US.csv")
df_PPP_model = df_PPP_model[df_PPP_model.notna().all(axis=1)]
print(df_PPP_model)
#making the contructed RER which I am defining from 456 notes as RER = (forgien exchnage rate * P_in country)/ P_in US
#here P_in country/P in US == PPP
df_PPP_model['con_RER'] = (df_PPP_model['Forex rate'] * df_PPP_model['PPP'])
#Now we have to see how this constructed RER compares with the reported RER and NomER (Forex rate)
#theretically in the long run these would be zero since the NOM*PPP == RER_normed
#Then I divided by 100 to get the percetage if it is negative that implies currency will appriciate
#if is positive it will depreiciate
#Clearly a lot of issues and gross assumptions
# 1. this only holds for inflation bearing countries which might be why issues arise in non advanced countries
# 2. not clear if mean reverting in long long
# 3. not clear if assumptions on PPP hold in long run
# 4. clealy issues in this not taking into consideration risks i.e. a forgein exchange might appear undervalued but that is because its risky
# 5. very US centric by norming to the US each period as the US curency itself might have issues
df =df_PPP_model.reset_index()
#df_PPP_model.to_csv('testing1.csv', index=True)
RER Forex rate PPP Year Country Name 2000-01-01 Armenia 74.010935 539.525833 0.234525 2001-01-01 Armenia 73.034791 555.078258 0.231919 2002-01-01 Armenia 73.665189 573.353333 0.226287 2003-01-01 Armenia 67.937520 578.762954 0.229938 2004-01-01 Armenia 71.634581 533.450833 0.258205 ... ... ... ... 2017-01-01 Zambia 94.221086 9.517500 0.440513 2018-01-01 Zambia 88.528850 10.458333 0.420550 2019-01-01 Zambia 79.795053 12.890000 0.360806 2020-01-01 Zambia 66.438740 18.344093 0.284940 2021-01-01 Zambia 69.764269 20.018487 0.319845 [1766 rows x 3 columns]
df= pd.read_csv(f"PPP_US.csv")
country_mapping = {
'ARM': 'Armenia', 'ATG': 'Antigua and Barbuda', 'AUS': 'Australia', 'BDI': 'Burundi', 'BGR': 'Bulgaria',
'BHR': 'Bahrain', 'BHS': 'Bahamas, The', 'BLZ': 'Belize', 'BOL': 'Bolivia', 'BRA': 'Brazil',
'CAF': 'Central African Republic', 'CAN': 'Canada', 'CHE': 'Switzerland', 'CHL': 'Chile', 'CHN': 'China',
'CIV': "Cote d'Ivoire", 'CMR': 'Cameroon', 'COD': 'Congo, Dem. Rep.', 'COL': 'Colombia', 'CRI': 'Costa Rica',
'CYP': 'Cyprus', 'CZE': 'Czechia', 'DMA': 'Dominica', 'DNK': 'Denmark', 'DOM': 'Dominican Republic',
'DZA': 'Algeria', 'FJI': 'Fiji', 'GAB': 'Gabon', 'GBR': 'United Kingdom', 'GEO': 'Georgia', 'GHA': 'Ghana',
'GMB': 'Gambia, The', 'GNQ': 'Equatorial Guinea', 'GRC': 'Greece', 'GRD': 'Grenada', 'GUY': 'Guyana',
'HKG': 'Hong Kong SAR, China', 'HRV': 'Croatia', 'HUN': 'Hungary', 'IRN': 'Iran, Islamic Rep.', 'ISL': 'Iceland',
'ISR': 'Israel', 'JPN': 'Japan', 'KNA': 'St. Kitts and Nevis', 'KOR': 'Korea, Rep.', 'LCA': 'St. Lucia',
'LSO': 'Lesotho', 'LVA': 'Latvia', 'MAR': 'Morocco', 'MDA': 'Moldova', 'MEX': 'Mexico', 'MKD': 'North Macedonia',
'MLT': 'Malta', 'MWI': 'Malawi', 'MYS': 'Malaysia', 'NGA': 'Nigeria', 'NIC': 'Nicaragua', 'NOR': 'Norway',
'NZL': 'New Zealand', 'PAK': 'Pakistan', 'PHL': 'Philippines', 'PNG': 'Papua New Guinea', 'POL': 'Poland',
'PRY': 'Paraguay', 'ROU': 'Romania', 'RUS': 'Russian Federation', 'SAU': 'Saudi Arabia', 'SGP': 'Singapore',
'SLB': 'Solomon Islands', 'SLE': 'Sierra Leone', 'SVK': 'Slovak Republic', 'SWE': 'Sweden', 'TGO': 'Togo',
'TTO': 'Trinidad and Tobago', 'TUN': 'Tunisia', 'UGA': 'Uganda', 'UKR': 'Ukraine', 'URY': 'Uruguay', 'USA': 'United States',
'VCT': 'St. Vincent and the Grenadines', 'VEN': 'Venezuela, RB', 'WSM': 'Samoa', 'ZAF': 'South Africa',
'ZMB': 'Zambia'}
def apply_country_mapping(location):
if location in country_mapping:
return country_mapping[location]
else:
return None
df['LOCATION'] = df['LOCATION'].apply(apply_country_mapping)
# Extract only the necessary columns
df = df[['LOCATION', 'TIME', 'Value']]
df['TIME'] = pd.to_datetime(df['TIME'], format='%Y')
df = df.rename(columns={'Value': 'PPP_for_ER'})
df = df.dropna()
df.set_index(['TIME', 'LOCATION'], inplace=True)
df.index.set_names(['Year', 'Country Name'], inplace=True)
df
PPP_for_ER | ||
---|---|---|
Year | Country Name | |
1960-01-01 | Australia | 0.700946 |
1961-01-01 | Australia | 0.701324 |
1962-01-01 | Australia | 0.687990 |
1963-01-01 | Australia | 0.698321 |
1964-01-01 | Australia | 0.708346 |
... | ... | ... |
2017-01-01 | Cameroon | 232.801392 |
2018-01-01 | Cameroon | 231.682690 |
2019-01-01 | Cameroon | 230.273904 |
2020-01-01 | Cameroon | 229.137013 |
2021-01-01 | Cameroon | 227.378726 |
1765 rows × 1 columns
df
df_merged = df.merge(df_PPP_model, left_index=True, right_index=True, how='inner')
#theretically in the long run these would be zero since the NOM*PPP == RER_normed
#so now we will calculate the difference between the constructed Real exchange rate where RER = PPP*NER
#versus the calculated PPP
#if the PPP*nom is higher than the PPP based on the exchange rate the currency would be overvalued
#id PPP *nom is lower than PPP the currency would be undervalued
#a lot of issues because I dont know the methodology of how the real excahnge rate of PPP was calculated for the country which I grabbed
#from OCED but still shows what we want to buy or sell depending on the year
df_merged['Final_diff'] = df_merged['con_RER'] - df_merged['PPP_for_ER']
#filtering out unstable ones
df_merged = df_merged[df_merged['Final_diff']>-0.1]
df_merged = df_merged[df_merged['Final_diff']<0.1]
print(df_merged['Final_diff'].dtype)
df_merged.sort_index(inplace=True)
# Select all rows with years before 2015 and including 2015
df_year_before_2015 = df_merged.loc[(slice(None, '2015-01-01'), slice(None)), :]
df_merged['diff_in_ave_RER'] = df_merged['RER'].div(df_year_before_2015.groupby('Country Name').mean()['RER'])
df_years={}
for yr in range(2000, 2022):
name = f'df_{yr}'
dt_year= pd.to_datetime(f'{yr}-01-01')
df_year = df_merged.loc[(dt_year, slice(None))]
df_years[yr] = df_year
#ones below 1 short if the 'diff in ave RER' agrees
float64
Below we calculate the returns from the RER model.
# Define the holdings() function
def holdings(row):
if row['diff_in_ave_RER'] > 1:
loan_amount = 1000 * row['Forex rate']
row['usd_amount'] = loan_amount/row['Forex rate']
row['home_currency_amount'] = -loan_amount
else:
loan_amount = 1000
row['usd_amount'] = -loan_amount
row['home_currency_amount'] = loan_amount * row['Forex rate']
return row
yr_past = 2015
yr_future = 2019
# Apply the function to the 'Amount' column in the DataFrame
#df_mean_RER_investment = df_years[yr_past].apply(holdings, axis=1)
#df_mean_RER_investment[f'{yr_future}_ER'] = df_years[yr_future]['Forex rate']
##df_mean_RER_investment['new_usd_amount'] = (1/df_mean_RER_investment[f'{yr_future}_ER'])*df_mean_RER_investment['home_currency_amount']
#df_mean_RER_investment['return_USD'] = df_mean_RER_investment['new_usd_amount'] + df_mean_RER_investment['usd_amount']
returns = {}
for yr_past in range(2015,2018):
list = []
for yr in range(1,5):
yr_future = yr_past+yr
df_mean_RER_investment = df_years[yr_past].apply(holdings, axis=1)
df_mean_RER_investment[f'{yr_future}_ER'] = df_years[yr_future]['Forex rate']
df_mean_RER_investment['new_usd_amount'] = (1/df_mean_RER_investment[f'{yr_future}_ER'])*df_mean_RER_investment['home_currency_amount']
df_mean_RER_investment['return_USD'] = df_mean_RER_investment['new_usd_amount'] + df_mean_RER_investment['usd_amount']
total_investment = df_mean_RER_investment['return_USD'].count() *1000
gains = df_mean_RER_investment['return_USD'].sum()
return_to = (((gains+total_investment)/total_investment-1)*100).round(3)
list.append(return_to)
returns[yr_past] = list
for i in range(0,4):
elements = [v[i] for v in returns.values()]
average = sum(elements) / len(elements)
print(f"Average Returns for {i+1} year holding period:", average)
Average Returns for 1 year holding period: 0.01100000000000001 Average Returns for 2 year holding period: 0.17499999999999996 Average Returns for 3 year holding period: -0.7763333333333334 Average Returns for 4 year holding period: -2.1503333333333337
As we can see above the average returns for the RER model are either close to 0 or negative. This indicates this rule isnt a good investment strategy.
Now we calculate the return for the PPP model.
# Define the holdings() function
def holdings(row):
if row['Final_diff'] > 0:
loan_amount = 1000 * row['Forex rate']
row['usd_amount'] = loan_amount/row['Forex rate']
row['home_currency_amount'] = -loan_amount
else:
loan_amount = 1000
row['usd_amount'] = -loan_amount
row['home_currency_amount'] = loan_amount * row['Forex rate']
return row
returns = {}
for yr_past in range(2009,2018):
list = []
for yr in range(1,5):
yr_future = yr_past+yr
df_mean_PPP_diff_investment = df_years[yr_past].apply(holdings, axis=1)
df_mean_PPP_diff_investment[f'{yr_future}_ER'] = df_years[yr_future]['Forex rate']
df_mean_PPP_diff_investment['new_usd_amount'] = (1/df_mean_PPP_diff_investment[f'{yr_future}_ER'])*df_mean_PPP_diff_investment['home_currency_amount']
df_mean_PPP_diff_investment['return_USD'] = df_mean_PPP_diff_investment['new_usd_amount'] + df_mean_PPP_diff_investment['usd_amount']
total_investment = df_mean_PPP_diff_investment['return_USD'].count() *1000
gains = df_mean_PPP_diff_investment['return_USD'].sum()
return_to = (((gains+total_investment)/total_investment-1)*100).round(3)
list.append(return_to)
returns[yr_past] = list
print(returns)
for i in range(0,4):
elements = [v[i] for v in returns.values()]
average = sum(elements) / len(elements)
print(f"Average Returns for {i+1} year holding period:", average)
{2009: [2.385, 5.431, 3.62, 2.171], 2010: [1.885, 1.378, 1.67, 2.385], 2011: [2.043, 2.201, 3.096, 7.39], 2012: [0.541, 1.369, 4.757, 4.861], 2013: [-0.107, -0.924, -0.556, -0.7], 2014: [-3.203, -2.152, -2.732, -3.082], 2015: [0.017, -0.53, 0.256, 1.198], 2016: [-1.394, -1.955, -0.692, 0.744], 2017: [-0.227, -0.12, 1.197, 1.476]} Average Returns for 1 year holding period: 0.21555555555555553 Average Returns for 2 year holding period: 0.5219999999999999 Average Returns for 3 year holding period: 1.1795555555555555 Average Returns for 4 year holding period: 1.8269999999999997
As we can see above the returns for the PPP model are better and are all positive. So if we systematically invested based on this PPP rule we could get a positive return (although given the risk a four year return of 1.8% is pretty terrible).
forex_data = pd.read_csv("EX_rate_table.csv")
forex_data.head()
Year | Country Name | GDP | Govt_debt_gdp | GDP growth rate | GNI per capita | Inflation | Lending rate | Net exports | Total_reserve | Forex rate | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2000-01-01 | Aruba | 1.873184e+09 | NaN | 7.616589 | 20390.0 | 4.044021 | 12.066667 | 3.097765e+08 | 235455000.0 | 1.79 |
1 | 2001-01-01 | Aruba | 1.896648e+09 | NaN | 4.195963 | 20510.0 | 2.883604 | 12.600000 | 4.304581e+08 | 321360614.5 | 1.79 |
2 | 2002-01-01 | Aruba | 1.962011e+09 | NaN | -0.957771 | 19290.0 | 3.315247 | 11.300000 | -1.377374e+08 | 374002933.0 | 1.79 |
3 | 2003-01-01 | Aruba | 2.044134e+09 | NaN | 1.120879 | 21120.0 | 3.656365 | 10.700000 | -2.437989e+07 | 336945700.0 | 1.79 |
4 | 2004-01-01 | Aruba | 2.254749e+09 | NaN | 7.281026 | 24010.0 | 2.529129 | 9.600000 | 4.535251e+08 | 339224000.0 | 1.79 |
forex_data.count()
Year 5852 Country Name 5852 GDP 5609 Govt_debt_gdp 1378 GDP growth rate 5529 GNI per capita 5285 Inflation 4992 Lending rate 2806 Net exports 3884 Total_reserve 3801 Forex rate 3954 dtype: int64
As we can see above, we lose about a two-thirds of our dataset if we include government debt to gdp ratio. This is why we'll drop this variable.
forex_data_2 = forex_data.drop(["Govt_debt_gdp"], axis = 1)
forex_data_2.dropna(inplace = True)
Since in the dataset the Foreign Exchange Rate is in Local Currency Units (LCU) per US$, we take the reciprocal to convert it to US$ per LCU.
It is natural for a country with a large GDP to have a high absolute value of net exports or total reserves, therefore we take these variables as percentages of GDP in order to control for the different sizes.
We divide GDP by a million so that scaling is less of an issue in our model.
#Converted Net exports and total reserves as percentages of gdp to normalise across countries.
forex_data_2["Forex rate"] = 1/forex_data_2["Forex rate"]
forex_data_2["Net Exports to GDP"] = forex_data_2["Net exports"]/forex_data_2["GDP"] * 100
forex_data_2["Total Reserve to GDP"] = forex_data_2["Total_reserve"]/forex_data_2["GDP"] * 100
forex_data_2["GDP per mil"] = forex_data_2["GDP"]/1_000_000
forex_data_2 = forex_data_2.drop(["Net exports", "Total_reserve", "GDP"], axis = 1)
forex_data_2.head()
Year | Country Name | GDP growth rate | GNI per capita | Inflation | Lending rate | Forex rate | Net Exports to GDP | Total Reserve to GDP | GDP per mil | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2000-01-01 | Aruba | 7.616589 | 20390.0 | 4.044021 | 12.066667 | 0.558659 | 16.537429 | 12.569772 | 1873.184358 |
1 | 2001-01-01 | Aruba | 4.195963 | 20510.0 | 2.883604 | 12.600000 | 0.558659 | 22.695729 | 16.943608 | 1896.648045 |
2 | 2002-01-01 | Aruba | -0.957771 | 19290.0 | 3.315247 | 11.300000 | 0.558659 | -7.020216 | 19.062222 | 1962.011173 |
3 | 2003-01-01 | Aruba | 1.120879 | 21120.0 | 3.656365 | 10.700000 | 0.558659 | -1.192676 | 16.483542 | 2044.134078 |
4 | 2004-01-01 | Aruba | 7.281026 | 24010.0 | 2.529129 | 9.600000 | 0.558659 | 20.114222 | 15.044870 | 2254.748603 |
forex_data_2['Year'].unique()
array(['2000-01-01', '2001-01-01', '2002-01-01', '2003-01-01', '2004-01-01', '2005-01-01', '2006-01-01', '2007-01-01', '2008-01-01', '2009-01-01', '2010-01-01', '2011-01-01', '2012-01-01', '2013-01-01', '2014-01-01', '2015-01-01', '2016-01-01', '2017-01-01', '2018-01-01', '2019-01-01', '2020-01-01', '2021-01-01'], dtype=object)
forex_data_2[forex_data_2['Country Name'] == "Vietnam"].head()
Year | Country Name | GDP growth rate | GNI per capita | Inflation | Lending rate | Forex rate | Net Exports to GDP | Total Reserve to GDP | GDP per mil | |
---|---|---|---|---|---|---|---|---|---|---|
5654 | 2000-01-01 | Vietnam | 6.787316 | 380.0 | -1.710337 | 10.550000 | 0.000071 | -0.561392 | 10.960010 | 31172.518403 |
5655 | 2001-01-01 | Vietnam | 6.192893 | 400.0 | -0.431545 | 9.420000 | 0.000068 | -0.278413 | 11.242292 | 32685.198735 |
5656 | 2002-01-01 | Vietnam | 6.320821 | 420.0 | 3.830828 | 9.061667 | 0.000065 | -5.144862 | 11.752907 | 35064.105501 |
5657 | 2003-01-01 | Vietnam | 6.899063 | 470.0 | 3.234648 | 9.480000 | 0.000064 | -8.492507 | 15.736500 | 39552.513316 |
5658 | 2004-01-01 | Vietnam | 7.536411 | 540.0 | 7.754947 | 9.722500 | 0.000064 | -6.953883 | 15.500315 | 45427.854693 |
Below we can the plots for our explanatory variables.
# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])
# The basic line
line = alt.Chart(forex_data_2).mark_line().encode(
alt.X('Year:T', title='Year'),
y='Total Reserve to GDP:Q',
color='Country Name:N',
tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Total Reserve to GDP:Q')]
)
# Transparent selectors across the chart. This is what tells us the x-value of the cursor.
selectors = alt.Chart(forex_data_2).mark_point().encode(
x='Year:T',
opacity=alt.value(0)
)
selectors_near = selectors.encode(
opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)
# Draw points on the line, and highlight based on selection.
points = line.mark_circle().encode(
opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)
# Draw text labels near the points, and highlight based on selection.
text = line.mark_text(align='left', dx=5, dy=-5).encode(
text=alt.condition(nearest, 'Total Reserve to GDP:Q', alt.value(' '))
)
# Draw a rule at the location of the selection.
rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(
x='Year:T'
).transform_filter(
nearest
)
# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
x='Year:T'
)
# Put the six layers into a chart and bind the data.
layers = alt.layer(
line, selectors, points, text, rules, vline
).properties(
width=1000, height=800,
title="Total Reserve to GDP for All Countries Over Time"
)
layers
/Users/uddhav/opt/anaconda3/envs/Py_2/lib/python3.8/site-packages/altair/utils/core.py:283: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead. for col_name, dtype in df.dtypes.iteritems():
# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])
# The basic line
line = alt.Chart(forex_data_2).mark_line().encode(
alt.X('Year:T', title='Year'),
y='Net Exports to GDP:Q',
color='Country Name:N',
tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Net Exports to GDP:Q')]
)
# Transparent selectors across the chart. This is what tells us the x-value of the cursor.
selectors = alt.Chart(forex_data_2).mark_point().encode(
x='Year:T',
opacity=alt.value(0)
)
selectors_near = selectors.encode(
opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)
# Draw points on the line, and highlight based on selection.
points = line.mark_circle().encode(
opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)
# Draw text labels near the points, and highlight based on selection.
text = line.mark_text(align='left', dx=5, dy=-5).encode(
text=alt.condition(nearest, 'Net Exports to GDP:Q', alt.value(' '))
)
# Draw a rule at the location of the selection.
rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(
x='Year:T'
).transform_filter(
nearest
)
# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
x='Year:T'
)
# Put the six layers into a chart and bind the data.
layers = alt.layer(
line, selectors, points, text, rules, vline
).properties(
width=1000, height=800,
title="Net Exports to GDP for All Countries Over Time"
)
layers
# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])
# The basic line
line = alt.Chart(forex_data_2).mark_line().encode(
alt.X('Year:T', title='Year'),
y='Lending rate:Q',
color='Country Name:N',
tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Lending rate:Q')]
)
# Transparent selectors across the chart. This is what tells us the x-value of the cursor.
selectors = alt.Chart(forex_data_2).mark_point().encode(
x='Year:T',
opacity=alt.value(0)
)
selectors_near = selectors.encode(
opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)
# Draw points on the line, and highlight based on selection.
points = line.mark_circle().encode(
opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)
# Draw text labels near the points, and highlight based on selection.
text = line.mark_text(align='left', dx=5, dy=-5).encode(
text=alt.condition(nearest, 'Lending rate:Q', alt.value(' '))
)
# Draw a rule at the location of the selection.
rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(
x='Year:T'
).transform_filter(
nearest
)
# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
x='Year:T'
)
# Put the six layers into a chart and bind the data.
layers = alt.layer(
line, selectors, points, text, rules, vline
).properties(
width=1000, height=800,
title="Lending rate for All Countries Over Time"
)
layers
# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])
# The basic line
line = alt.Chart(forex_data_2).mark_line().encode(
alt.X('Year:T', title='Year'),
y='Inflation:Q',
color='Country Name:N',
tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Inflation:Q')]
)
# Transparent selectors across the chart. This is what tells us the x-value of the cursor.
selectors = alt.Chart(forex_data_2).mark_point().encode(
x='Year:T',
opacity=alt.value(0)
)
selectors_near = selectors.encode(
opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)
# Draw points on the line, and highlight based on selection.
points = line.mark_circle().encode(
opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)
# Draw text labels near the points, and highlight based on selection.
text = line.mark_text(align='left', dx=5, dy=-5).encode(
text=alt.condition(nearest, 'Inflation:Q', alt.value(' '))
)
# Draw a rule at the location of the selection.
rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(
x='Year:T'
).transform_filter(
nearest
)
# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
x='Year:T'
)
# Put the six layers into a chart and bind the data.
layers = alt.layer(
line, selectors, points, text, rules, vline
).properties(
width=1000, height=800,
title="Inflation for All Countries Over Time"
)
layers
# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])
# The basic line
line = alt.Chart(forex_data_2).mark_line().encode(
alt.X('Year:T', title='Year'),
y='GNI per capita:Q',
color='Country Name:N',
# tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('GDP growth rate:Q')]
)
# Transparent selectors across the chart. This is what tells us the x-value of the cursor.
selectors = alt.Chart(forex_data_2).mark_point().encode(
x='Year:T',
opacity=alt.value(0)
)
selectors_near = selectors.encode(
opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)
# Draw points on the line, and highlight based on selection.
points = line.mark_circle().encode(
opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)
# Draw text labels near the points, and highlight based on selection.
text = line.mark_text(align='left', dx=5, dy=-5).encode(
text=alt.condition(nearest, 'GNI per capita:Q', alt.value(' '))
)
# Draw a rule at the location of the selection.
rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(
x='Year:T'
).transform_filter(
nearest
)
# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
x='Year:T'
)
# Put the six layers into a chart and bind the data.
layers = alt.layer(
line, selectors, points, text, rules, vline
).properties(
width=1000, height=800,
title="GNI per capita for All Countries Over Time"
)
layers
# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])
# The basic line
line = alt.Chart(forex_data_2).mark_line().encode(
alt.X('Year:T', title='Year'),
y='GDP growth rate:Q',
color='Country Name:N',
# tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('GDP growth rate:Q')]
)
# Transparent selectors across the chart. This is what tells us the x-value of the cursor.
selectors = alt.Chart(forex_data_2).mark_point().encode(
x='Year:T',
opacity=alt.value(0)
)
selectors_near = selectors.encode(
opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)
# Draw points on the line, and highlight based on selection.
points = line.mark_circle().encode(
opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)
# Draw text labels near the points, and highlight based on selection.
text = line.mark_text(align='left', dx=5, dy=-5).encode(
text=alt.condition(nearest, 'GDP growth rate:Q', alt.value(' '))
)
# Draw a rule at the location of the selection.
rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(
x='Year:T'
).transform_filter(
nearest
)
# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
x='Year:T'
)
# Put the six layers into a chart and bind the data.
layers = alt.layer(
line, selectors, points, text, rules, vline
).properties(
width=1000, height=800,
title="GDP growth rate for All Countries Over Time"
)
layers
# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])
# The basic line
line = alt.Chart(forex_data_2).mark_line().encode(
alt.X('Year:T', title='Year'),
y='GDP per mil:Q',
color='Country Name:N',
tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('GDP per mil:Q')]
)
# Transparent selectors across the chart. This is what tells us the x-value of the cursor.
selectors = alt.Chart(forex_data_2).mark_point().encode(
x='Year:T',
opacity=alt.value(0)
)
selectors_near = selectors.encode(
opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)
# Draw points on the line, and highlight based on selection.
points = line.mark_circle().encode(
opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)
# Draw text labels near the points, and highlight based on selection.
text = line.mark_text(align='left', dx=5, dy=-5).encode(
text=alt.condition(nearest, 'GDP per mil:Q', alt.value(' '))
)
# Draw a rule at the location of the selection.
rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(
x='Year:T'
).transform_filter(
nearest
)
# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
x='Year:T'
)
# Put the six layers into a chart and bind the data.
layers = alt.layer(
line, selectors, points, text, rules, vline
).properties(
width=1000, height=800,
title="GDP per mil for All Countries Over Time"
)
layers
# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])
# The basic line
line = alt.Chart(forex_data_2).mark_line().encode(
alt.X('Year:T', title='Year'),
y='Forex rate:Q',
color='Country Name:N',
tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Forex rate:Q')]
)
# Transparent selectors across the chart. This is what tells us the x-value of the cursor.
selectors = alt.Chart(forex_data_2).mark_point().encode(
x='Year:T',
opacity=alt.value(0)
)
selectors_near = selectors.encode(
opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)
# Draw points on the line, and highlight based on selection.
points = line.mark_circle().encode(
opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)
# Draw text labels near the points, and highlight based on selection.
text = line.mark_text(align='left', dx=5, dy=-5).encode(
text=alt.condition(nearest, 'Forex rate:Q', alt.value(' '))
)
# Draw a rule at the location of the selection.
rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(
x='Year:T'
).transform_filter(
nearest
)
# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
x='Year:T'
)
# Put the six layers into a chart and bind the data.
layers = alt.layer(
line, selectors, points, text, rules, vline
).properties(
width=1000, height=800,
title="Forex Rates for All Countries Over Time"
)
layers
# select_countries = ['Canada', 'Switzerland', 'China', 'United Kingdom', 'India', 'Japan', 'China', 'Mexico', 'Singapore', 'United States']
select_countries = ['Canada', 'Switzerland', 'United Kingdom', 'China', 'Singapore', 'United States']
select_forex_data_2 = forex_data_2[forex_data_2['Country Name'].isin(select_countries)]
# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover',
fields=['Year'])
# # The basic line
line = alt.Chart(select_forex_data_2).mark_line().encode(
alt.X('Year:T', title = 'year'),
y='Forex rate:Q',
color='Country Name:N',
tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Forex rate:Q')]
)
# Transparent selectors across the chart. This is what tells us the x-value of the cursor
selectors = alt.Chart(select_forex_data_2).mark_point().encode(
x='Year:T',
opacity=alt.value(0)
)
selectors_near = selectors.encode(
opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)
# Draw points on the line, and highlight based on selection
points = line.mark_circle().encode(
opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)
# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
text=alt.condition(nearest, 'Forex rate:Q', alt.value(' '))
)
# Draw a rule at the location of the selection
rules = alt.Chart(select_forex_data_2).mark_rule(color='gray').encode(
x='Year:T',
).transform_filter(
nearest
)
# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
x='Year:T'
)
# Put the five layers into a chart and bind the data
layers = alt.layer(
line, selectors, points, text, rules, vline
).properties(
width=1000, height=800,
title = "Forex Rates for Select Countries Over Time"
)
layers
We use 2015 as the cut-off for training and testing dataset.
#2015 as the cut-off year for testing.
forex_data_2_test = forex_data_2[forex_data_2['Year'].isin(['2015-01-01', '2016-01-01', '2017-01-01', '2018-01-01', '2019-01-01', '2020-01-01', '2021-01-01'])]
# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])
# The basic line
line = alt.Chart(forex_data_2_test).mark_line().encode(
alt.X('Year:T', title='Year'),
y='Forex rate:Q',
color='Country Name:N',
tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Forex rate:Q')]
)
# Transparent selectors across the chart. This is what tells us the x-value of the cursor.
selectors = alt.Chart(forex_data_2_test).mark_point().encode(
x='Year:T',
opacity=alt.value(0)
)
selectors_near = selectors.encode(
opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)
# Draw points on the line, and highlight based on selection.
points = line.mark_circle().encode(
opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)
# Draw text labels near the points, and highlight based on selection.
text = line.mark_text(align='left', dx=5, dy=-5).encode(
text=alt.condition(nearest, 'Forex rate:Q', alt.value(' '))
)
# Draw a rule at the location of the selection.
rules = alt.Chart(forex_data_2_test).mark_rule(color='gray').encode(
x='Year:T'
).transform_filter(
nearest
)
# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
x='Year:T'
)
# Put the six layers into a chart and bind the data.
layers = alt.layer(
line, selectors, points, text, rules, vline
).properties(
width=1000, height=800,
title="Forex Rates for All Countries TESTING DATA"
)
layers
forex_data_3 = forex_data_2.drop(forex_data_2_test.index)
forex_data_3.head()
Year | Country Name | GDP growth rate | GNI per capita | Inflation | Lending rate | Forex rate | Net Exports to GDP | Total Reserve to GDP | GDP per mil | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2000-01-01 | Aruba | 7.616589 | 20390.0 | 4.044021 | 12.066667 | 0.558659 | 16.537429 | 12.569772 | 1873.184358 |
1 | 2001-01-01 | Aruba | 4.195963 | 20510.0 | 2.883604 | 12.600000 | 0.558659 | 22.695729 | 16.943608 | 1896.648045 |
2 | 2002-01-01 | Aruba | -0.957771 | 19290.0 | 3.315247 | 11.300000 | 0.558659 | -7.020216 | 19.062222 | 1962.011173 |
3 | 2003-01-01 | Aruba | 1.120879 | 21120.0 | 3.656365 | 10.700000 | 0.558659 | -1.192676 | 16.483542 | 2044.134078 |
4 | 2004-01-01 | Aruba | 7.281026 | 24010.0 | 2.529129 | 9.600000 | 0.558659 | 20.114222 | 15.044870 | 2254.748603 |
forex_data_3_train_y = forex_data_3[["Forex rate"]]
forex_data_3_train_X = forex_data_3.drop(["Year", "Country Name","Forex rate", "GDP per mil"], axis = 1)
forex_data_2_test.head()
Year | Country Name | GDP growth rate | GNI per capita | Inflation | Lending rate | Forex rate | Net Exports to GDP | Total Reserve to GDP | GDP per mil | |
---|---|---|---|---|---|---|---|---|---|---|
15 | 2015-01-01 | Aruba | 3.543630 | 26180.0 | 0.474764 | 7.0 | 0.558659 | 10.277624 | 27.531731 | 2963.128492 |
16 | 2016-01-01 | Aruba | 2.116716 | 26650.0 | -0.931196 | 6.9 | 0.558659 | 11.178386 | 30.918669 | 2983.798883 |
17 | 2017-01-01 | Aruba | 5.482371 | 27720.0 | -1.028282 | 6.3 | 0.558659 | 8.646561 | 29.335673 | 3092.178771 |
18 | 2018-01-01 | Aruba | 5.257856 | 29030.0 | 3.626041 | 6.0 | 0.558659 | 8.975137 | 30.611363 | 3202.234637 |
19 | 2019-01-01 | Aruba | 0.635029 | 30000.0 | 4.257462 | 5.7 | 0.558659 | 9.547850 | 29.157277 | 3368.970253 |
forex_data_3_test_y = forex_data_2_test[["Forex rate"]]
forex_data_3_test_X = forex_data_2_test.drop(["Year","Country Name","Forex rate", "GDP per mil"], axis = 1)
Now we fit a linear regression.
#Linear regression
lr_mod = linear_model.LinearRegression()
lr_mod.fit(forex_data_3_train_X, forex_data_3_train_y)
LinearRegression()In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
LinearRegression()
mse_lr_train = metrics.mean_squared_error(forex_data_3_train_y, lr_mod.predict(forex_data_3_train_X))
mse_lr_test = metrics.mean_squared_error(forex_data_3_test_y, lr_mod.predict(forex_data_3_test_X))
print(f"The mean squred error in the training dataset is {mse_lr_train}")
print(f"The mean squred error in the testing dataset is {mse_lr_test}")
The mean squred error in the training dataset is 0.47435291908321897 The mean squred error in the testing dataset is 0.3015103066706014
The mean squred errors for our linear regression are given above. As we can see the mean squared error is higher for our training dataset than our testing dataset.
Now we fit our lasso regression, we use a 5 fold cross validation in order to choose the optimal $\alpha$ for our lasso regression.
#k=5 cv lasso
lasso_mod = linear_model.LassoCV(cv=5)
lasso_mod.fit(forex_data_3_train_X, forex_data_3_train_y.values.ravel())
LassoCV(cv=5)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
LassoCV(cv=5)
mse_lasso_train = metrics.mean_squared_error(forex_data_3_train_y, lasso_mod.predict(forex_data_3_train_X))
mse_lasso_test = metrics.mean_squared_error(forex_data_3_test_y, lasso_mod.predict(forex_data_3_test_X))
print(f"The mean squred error in the training dataset is {mse_lasso_train}")
print(f"The mean squred error in the testing dataset is {mse_lasso_test}")
The mean squred error in the training dataset is 0.49614895650008806 The mean squred error in the testing dataset is 0.23705686900933912
As we can see above the mean squared error for our training dataset is higher than that of our testing dataset.
Now we fit our first neural network with 3 hidden layers.
#The first neural network, 3 hidden layers.
nn_mod_1 = neural_network.MLPRegressor((50, 35, 20), max_iter = 10000)
nn_mod_1.fit(forex_data_3_train_X, forex_data_3_train_y.values.ravel())
MLPRegressor(hidden_layer_sizes=(50, 35, 20), max_iter=10000)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
MLPRegressor(hidden_layer_sizes=(50, 35, 20), max_iter=10000)
mse_nn_1_train = metrics.mean_squared_error(forex_data_3_train_y, nn_mod_1.predict(forex_data_3_train_X))
mse_nn_1_test = metrics.mean_squared_error(forex_data_3_test_y, nn_mod_1.predict(forex_data_3_test_X))
print(f"The mean squred error in the training dataset is {mse_nn_1_train}")
print(f"The mean squred error in the testing dataset is {mse_nn_1_test}")
The mean squred error in the training dataset is 0.6253439030142882 The mean squred error in the testing dataset is 0.5258991107828873
As we can see above the mean squared error is roughly the same for both our training and testing datasets, however this model does suffer from input scaling. This is why below we run a scaled version of our neural network.
#First neural netwrok but scaled. Sacled is better.
nn_mod_1_scaled = pipeline.make_pipeline(
preprocessing.StandardScaler(),
neural_network.MLPRegressor((50, 35, 20), max_iter = 10000))
nn_mod_1_scaled.fit(forex_data_3_train_X, forex_data_3_train_y.values.ravel())
Pipeline(steps=[('standardscaler', StandardScaler()), ('mlpregressor', MLPRegressor(hidden_layer_sizes=(50, 35, 20), max_iter=10000))])In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
Pipeline(steps=[('standardscaler', StandardScaler()), ('mlpregressor', MLPRegressor(hidden_layer_sizes=(50, 35, 20), max_iter=10000))])
StandardScaler()
MLPRegressor(hidden_layer_sizes=(50, 35, 20), max_iter=10000)
mse_nn_1_scaled_train = metrics.mean_squared_error(forex_data_3_train_y, nn_mod_1_scaled.predict(forex_data_3_train_X))
mse_nn_1_scaled_test = metrics.mean_squared_error(forex_data_3_test_y, nn_mod_1_scaled.predict(forex_data_3_test_X))
print(f"The mean squred error in the training dataset is {mse_nn_1_scaled_train}")
print(f"The mean squred error in the testing dataset is {mse_nn_1_scaled_test}")
The mean squred error in the training dataset is 0.054302709057648856 The mean squred error in the testing dataset is 0.40455307013410235
As we can see above the mean squared error for our training data is much lower than that of our testing data. This could mean that our model is overfitting.
Now we run our second neural network with 4 hidden layers.
#Second neural network but scaled.
nn_mod_2_scaled = pipeline.make_pipeline(
preprocessing.StandardScaler(),
neural_network.MLPRegressor((50, 35, 20, 10), max_iter = 10000))
nn_mod_2_scaled.fit(forex_data_3_train_X, forex_data_3_train_y.values.ravel())
Pipeline(steps=[('standardscaler', StandardScaler()), ('mlpregressor', MLPRegressor(hidden_layer_sizes=(50, 35, 20, 10), max_iter=10000))])In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
Pipeline(steps=[('standardscaler', StandardScaler()), ('mlpregressor', MLPRegressor(hidden_layer_sizes=(50, 35, 20, 10), max_iter=10000))])
StandardScaler()
MLPRegressor(hidden_layer_sizes=(50, 35, 20, 10), max_iter=10000)
mse_nn_2_scaled_train = metrics.mean_squared_error(forex_data_3_train_y, nn_mod_2_scaled.predict(forex_data_3_train_X))
mse_nn_2_scaled_test = metrics.mean_squared_error(forex_data_3_test_y, nn_mod_2_scaled.predict(forex_data_3_test_X))
print(f"The mean squred error in the training dataset is {mse_nn_2_scaled_train}")
print(f"The mean squred error in the testing dataset is {mse_nn_2_scaled_test}")
The mean squred error in the training dataset is 0.0456499285270549 The mean squred error in the testing dataset is 0.3405035580847174
As we can see above the mean squared error in our training data is lower than that of our testing data.
Now we can extract the predicted values for all our models and add it to our testing dataset.
#Extracting predicted values
lr_predict = lr_mod.predict(forex_data_3_test_X)
lasso_predict = lasso_mod.predict(forex_data_3_test_X)
nn_1_scaled_predict = nn_mod_1_scaled.predict(forex_data_3_test_X)
nn_2_scaled_predict = nn_mod_2_scaled.predict(forex_data_3_test_X)
forex_data_2_test.head()
Year | Country Name | GDP growth rate | GNI per capita | Inflation | Lending rate | Forex rate | Net Exports to GDP | Total Reserve to GDP | GDP per mil | |
---|---|---|---|---|---|---|---|---|---|---|
15 | 2015-01-01 | Aruba | 3.543630 | 26180.0 | 0.474764 | 7.0 | 0.558659 | 10.277624 | 27.531731 | 2963.128492 |
16 | 2016-01-01 | Aruba | 2.116716 | 26650.0 | -0.931196 | 6.9 | 0.558659 | 11.178386 | 30.918669 | 2983.798883 |
17 | 2017-01-01 | Aruba | 5.482371 | 27720.0 | -1.028282 | 6.3 | 0.558659 | 8.646561 | 29.335673 | 3092.178771 |
18 | 2018-01-01 | Aruba | 5.257856 | 29030.0 | 3.626041 | 6.0 | 0.558659 | 8.975137 | 30.611363 | 3202.234637 |
19 | 2019-01-01 | Aruba | 0.635029 | 30000.0 | 4.257462 | 5.7 | 0.558659 | 9.547850 | 29.157277 | 3368.970253 |
#Putting the predicted vals in dataset.
df = forex_data_2_test.assign(Linear_Predict = lr_predict)
df = df.assign(Lasso_Predict = lasso_predict)
df = df.assign(NN_1_Predict = nn_1_scaled_predict)
df = df.assign(NN_2_Predict = nn_2_scaled_predict)
df.head()
Year | Country Name | GDP growth rate | GNI per capita | Inflation | Lending rate | Forex rate | Net Exports to GDP | Total Reserve to GDP | GDP per mil | Linear_Predict | Lasso_Predict | NN_1_Predict | NN_2_Predict | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15 | 2015-01-01 | Aruba | 3.543630 | 26180.0 | 0.474764 | 7.0 | 0.558659 | 10.277624 | 27.531731 | 2963.128492 | 0.613864 | 0.618369 | 1.298603 | 0.235513 |
16 | 2016-01-01 | Aruba | 2.116716 | 26650.0 | -0.931196 | 6.9 | 0.558659 | 11.178386 | 30.918669 | 2983.798883 | 0.595682 | 0.625260 | 0.885949 | 0.245483 |
17 | 2017-01-01 | Aruba | 5.482371 | 27720.0 | -1.028282 | 6.3 | 0.558659 | 8.646561 | 29.335673 | 3092.178771 | 0.614444 | 0.640948 | 0.617596 | 0.065980 |
18 | 2018-01-01 | Aruba | 5.257856 | 29030.0 | 3.626041 | 6.0 | 0.558659 | 8.975137 | 30.611363 | 3202.234637 | 0.686940 | 0.660154 | 1.190500 | 0.478023 |
19 | 2019-01-01 | Aruba | 0.635029 | 30000.0 | 4.257462 | 5.7 | 0.558659 | 9.547850 | 29.157277 | 3368.970253 | 0.708518 | 0.674375 | 0.500876 | 0.415586 |
Now we want to check whether each model predicts the exchange rate is going to go up or down. We do this by taking the difference between the future prdeicted exchange rate (2016, 2017, 2018, 2019 and 2020) and the predicted 2015 exchange rate. We do this for different holding years, 1 year, 2 years and so on.
df_2 = df.copy()
for i in range(1,6):
grouped = df_2.groupby('Country Name')
df_2[f"lag_linear_{i}"] = grouped['Linear_Predict'].shift(i)
df_2[f"lag_lasso_{i}"] = grouped['Lasso_Predict'].shift(i)
df_2[f"lag_nn_1_{i}"] = grouped['NN_1_Predict'].shift(i)
df_2[f"lag_nn_2_{i}"] = grouped['NN_2_Predict'].shift(i)
df_4 = df_2.drop(["GDP per mil", "GDP growth rate", "GNI per capita", "Inflation", "Lending rate", "Net Exports to GDP", "Total Reserve to GDP"], axis = 1)
a = ["Linear_Predict", "Lasso_Predict", "NN_1_Predict", "NN_2_Predict"]
b = ["lag_linear", "lag_lasso", "lag_nn_1", "lag_nn_2"]
for (mod, lag) in zip(a,b):
for i in range(1,6):
df_4[f"diff_{mod}_{i}"] = df_4[f"{mod}"] - df_4[f"{lag}_{i}"]
df_4.head()
Year | Country Name | Forex rate | Linear_Predict | Lasso_Predict | NN_1_Predict | NN_2_Predict | lag_linear_1 | lag_lasso_1 | lag_nn_1_1 | ... | diff_NN_1_Predict_1 | diff_NN_1_Predict_2 | diff_NN_1_Predict_3 | diff_NN_1_Predict_4 | diff_NN_1_Predict_5 | diff_NN_2_Predict_1 | diff_NN_2_Predict_2 | diff_NN_2_Predict_3 | diff_NN_2_Predict_4 | diff_NN_2_Predict_5 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15 | 2015-01-01 | Aruba | 0.558659 | 0.613864 | 0.618369 | 1.298603 | 0.235513 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
16 | 2016-01-01 | Aruba | 0.558659 | 0.595682 | 0.625260 | 0.885949 | 0.245483 | 0.613864 | 0.618369 | 1.298603 | ... | -0.412653 | NaN | NaN | NaN | NaN | 0.009970 | NaN | NaN | NaN | NaN |
17 | 2017-01-01 | Aruba | 0.558659 | 0.614444 | 0.640948 | 0.617596 | 0.065980 | 0.595682 | 0.625260 | 0.885949 | ... | -0.268353 | -0.681006 | NaN | NaN | NaN | -0.179503 | -0.169533 | NaN | NaN | NaN |
18 | 2018-01-01 | Aruba | 0.558659 | 0.686940 | 0.660154 | 1.190500 | 0.478023 | 0.614444 | 0.640948 | 0.617596 | ... | 0.572904 | 0.304551 | -0.108102 | NaN | NaN | 0.412043 | 0.232540 | 0.242510 | NaN | NaN |
19 | 2019-01-01 | Aruba | 0.558659 | 0.708518 | 0.674375 | 0.500876 | 0.415586 | 0.686940 | 0.660154 | 1.190500 | ... | -0.689624 | -0.116720 | -0.385073 | -0.797726 | NaN | -0.062437 | 0.349606 | 0.170102 | 0.180073 | NaN |
5 rows × 47 columns
Now we make a different dataframe for each holding period.
c=['Year', 'Country Name', 'Forex rate']
for mod in a:
d = f"diff_{mod}_1"
c.append(d)
print(c)
['Year', 'Country Name', 'Forex rate', 'diff_Linear_Predict_1', 'diff_Lasso_Predict_1', 'diff_NN_1_Predict_1', 'diff_NN_2_Predict_1']
e=['Year', 'Country Name', 'Forex rate']
for mod in a:
f = f"diff_{mod}_2"
e.append(f)
print(e)
['Year', 'Country Name', 'Forex rate', 'diff_Linear_Predict_2', 'diff_Lasso_Predict_2', 'diff_NN_1_Predict_2', 'diff_NN_2_Predict_2']
g=['Year', 'Country Name', 'Forex rate']
for mod in a:
h = f"diff_{mod}_3"
g.append(h)
print(g)
['Year', 'Country Name', 'Forex rate', 'diff_Linear_Predict_3', 'diff_Lasso_Predict_3', 'diff_NN_1_Predict_3', 'diff_NN_2_Predict_3']
i=['Year', 'Country Name', 'Forex rate']
for mod in a:
j = f"diff_{mod}_4"
i.append(j)
print(i)
['Year', 'Country Name', 'Forex rate', 'diff_Linear_Predict_4', 'diff_Lasso_Predict_4', 'diff_NN_1_Predict_4', 'diff_NN_2_Predict_4']
k=['Year', 'Country Name', 'Forex rate']
for mod in a:
l = f"diff_{mod}_5"
k.append(l)
print(k)
['Year', 'Country Name', 'Forex rate', 'diff_Linear_Predict_5', 'diff_Lasso_Predict_5', 'diff_NN_1_Predict_5', 'diff_NN_2_Predict_5']
df_5 = df_4[c] #1 yr
df_6 = df_4[e] #2 yrs
df_7 = df_4[g] #3 yrs
df_8 = df_4[i] #4 yrs
df_9 = df_4[k] #5 yrs
df_5 = df_5[df_5['Year'].isin(['2015-01-01', '2016-01-01'])]
df_6 = df_6[df_6['Year'].isin(['2015-01-01', '2017-01-01'])]
df_7 = df_7[df_7['Year'].isin(['2015-01-01', '2018-01-01'])]
df_8 = df_8[df_8['Year'].isin(['2015-01-01', '2019-01-01'])]
df_9 = df_9[df_9['Year'].isin(['2015-01-01', '2020-01-01'])]
df_5.head()
Year | Country Name | Forex rate | diff_Linear_Predict_1 | diff_Lasso_Predict_1 | diff_NN_1_Predict_1 | diff_NN_2_Predict_1 | |
---|---|---|---|---|---|---|---|
15 | 2015-01-01 | Aruba | 0.558659 | NaN | NaN | NaN | NaN |
16 | 2016-01-01 | Aruba | 0.558659 | -0.018182 | 0.006891 | -0.412653 | 0.009970 |
59 | 2015-01-01 | Afghanistan | 0.016355 | NaN | NaN | NaN | NaN |
60 | 2016-01-01 | Afghanistan | 0.014735 | 0.049683 | -0.000586 | 0.047096 | 0.082432 |
103 | 2015-01-01 | Angola | 0.008329 | NaN | NaN | NaN | NaN |
Now we define a function which tells us whether to buy or short a currency based on whether the model predicts that the exchange rate is going to be higher or lower.
def buy_or_short(x):
if x >= 0:
return "Buy"
elif x < 0:
return "Short"
else:
return "NaN"
df_10 = df_5.copy()
df_10[['diff_Linear_Predict_1', 'diff_Lasso_Predict_1', 'diff_NN_1_Predict_1', 'diff_NN_2_Predict_1']] = df_10[['diff_Linear_Predict_1', 'diff_Lasso_Predict_1', 'diff_NN_1_Predict_1', 'diff_NN_2_Predict_1']].applymap(buy_or_short)
df_11 = df_6.copy()
df_11[['diff_Linear_Predict_2', 'diff_Lasso_Predict_2', 'diff_NN_1_Predict_2', 'diff_NN_2_Predict_2']] = df_11[['diff_Linear_Predict_2', 'diff_Lasso_Predict_2', 'diff_NN_1_Predict_2', 'diff_NN_2_Predict_2']].applymap(buy_or_short)
df_12 = df_7.copy()
df_12[['diff_Linear_Predict_3', 'diff_Lasso_Predict_3', 'diff_NN_1_Predict_3', 'diff_NN_2_Predict_3']] = df_12[['diff_Linear_Predict_3', 'diff_Lasso_Predict_3', 'diff_NN_1_Predict_3', 'diff_NN_2_Predict_3']].applymap(buy_or_short)
df_13 = df_8.copy()
df_13[['diff_Linear_Predict_4', 'diff_Lasso_Predict_4', 'diff_NN_1_Predict_4', 'diff_NN_2_Predict_4']] = df_13[['diff_Linear_Predict_4', 'diff_Lasso_Predict_4', 'diff_NN_1_Predict_4', 'diff_NN_2_Predict_4']].applymap(buy_or_short)
df_14 = df_9.copy()
df_14[['diff_Linear_Predict_5', 'diff_Lasso_Predict_5', 'diff_NN_1_Predict_5', 'diff_NN_2_Predict_5']] = df_14[['diff_Linear_Predict_5', 'diff_Lasso_Predict_5', 'diff_NN_1_Predict_5', 'diff_NN_2_Predict_5']].applymap(buy_or_short)
group = df_10.groupby('Country Name')
df_10['Forex rate lag'] = group['Forex rate'].shift(1)
df_10['Forex diff'] = df_10['Forex rate'] - df_10['Forex rate lag']
group = df_12.groupby('Country Name')
df_12['Forex rate lag'] = group['Forex rate'].shift(1)
df_12['Forex diff'] = df_12['Forex rate'] - df_12['Forex rate lag']
group = df_14.groupby('Country Name')
df_14['Forex rate lag'] = group['Forex rate'].shift(1)
df_14['Forex diff'] = df_14['Forex rate'] - df_14['Forex rate lag']
df_12.head(10)
Year | Country Name | Forex rate | diff_Linear_Predict_3 | diff_Lasso_Predict_3 | diff_NN_1_Predict_3 | diff_NN_2_Predict_3 | Forex rate lag | Forex diff | |
---|---|---|---|---|---|---|---|---|---|
15 | 2015-01-01 | Aruba | 0.558659 | NaN | NaN | NaN | NaN | NaN | NaN |
18 | 2018-01-01 | Aruba | 0.558659 | Buy | Buy | Short | Buy | 0.558659 | 0.000000 |
59 | 2015-01-01 | Afghanistan | 0.016355 | NaN | NaN | NaN | NaN | NaN | NaN |
103 | 2015-01-01 | Angola | 0.008329 | NaN | NaN | NaN | NaN | NaN | NaN |
106 | 2018-01-01 | Angola | 0.003955 | Buy | Short | Buy | Short | 0.008329 | -0.004374 |
125 | 2015-01-01 | Albania | 0.007939 | NaN | NaN | NaN | NaN | NaN | NaN |
128 | 2018-01-01 | Albania | 0.009260 | Buy | Buy | Buy | Short | 0.007939 | 0.001321 |
235 | 2015-01-01 | Armenia | 0.002092 | NaN | NaN | NaN | NaN | NaN | NaN |
238 | 2018-01-01 | Armenia | 0.002070 | Buy | Buy | Buy | Buy | 0.002092 | -0.000022 |
279 | 2015-01-01 | Antigua and Barbuda | 0.370370 | NaN | NaN | NaN | NaN | NaN | NaN |
Now we can calculate the average rate of return for each model for a 1 year, 3 years and 5 years holding period.
#Average rate of return for Linear reg and holding the currency for 1 year when model says to buy
df_linear_1 = df_10[['Year', 'Country Name', 'diff_Linear_Predict_1', 'Forex rate','Forex rate lag', 'Forex diff']]
df_linear_1_buy = df_linear_1[df_linear_1['diff_Linear_Predict_1'].isin(['Buy'])]
df_linear_1_buy['Return_Rate'] = df_linear_1_buy['Forex diff']/df_linear_1_buy['Forex rate lag'] * 100
linear_1_return = df_linear_1_buy['Return_Rate'].mean()
print(f"The average return for the linear regression model and 1 year holding period is {linear_1_return}")
The average return for the linear regression model and 1 year holding period is -6.682001712313225
/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/1716736160.py:7: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df_linear_1_buy['Return_Rate'] = df_linear_1_buy['Forex diff']/df_linear_1_buy['Forex rate lag'] * 100
#Average rate of return for Lasso reg and holding the currency for 1 year when model says to buy
df_lasso_1 = df_10[['Year', 'Country Name', 'diff_Lasso_Predict_1', 'Forex rate','Forex rate lag', 'Forex diff']]
df_lasso_1_buy = df_lasso_1[df_lasso_1['diff_Lasso_Predict_1'].isin(['Buy'])]
df_lasso_1_buy['Return_Rate'] = df_lasso_1_buy['Forex diff']/df_lasso_1_buy['Forex rate lag'] * 100
lasso_1_return = df_lasso_1_buy['Return_Rate'].mean()
print(f"The average return for the lasso regression model and 1 year holding period is {lasso_1_return}")
The average return for the lasso regression model and 1 year holding period is -2.2038092077826814
/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/339817703.py:7: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df_lasso_1_buy['Return_Rate'] = df_lasso_1_buy['Forex diff']/df_lasso_1_buy['Forex rate lag'] * 100
#Average rate of return for 1st NN and holding the currency for 1 year when model says to buy
df_nn_1_1 = df_10[['Year', 'Country Name', 'diff_NN_1_Predict_1', 'Forex rate','Forex rate lag', 'Forex diff']]
df_nn_1_1_buy = df_nn_1_1[df_nn_1_1['diff_NN_1_Predict_1'].isin(['Buy'])]
df_nn_1_1_buy['Return_Rate'] = df_nn_1_1_buy['Forex diff']/df_nn_1_1_buy['Forex rate lag'] * 100
nn_1_1_return = df_nn_1_1_buy['Return_Rate'].mean()
print(f"The average return for the first neural network and 1 year holding period is {nn_1_1_return}")
The average return for the first neural network and 1 year holding period is -4.228952725618851
/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/2819496941.py:7: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df_nn_1_1_buy['Return_Rate'] = df_nn_1_1_buy['Forex diff']/df_nn_1_1_buy['Forex rate lag'] * 100
#Average rate of return for 2nd NN and holding the currency for 1 year when model says to buy
df_nn_2_1 = df_10[['Year', 'Country Name', 'diff_NN_2_Predict_1', 'Forex rate','Forex rate lag', 'Forex diff']]
df_nn_2_1_buy = df_nn_2_1[df_nn_2_1['diff_NN_2_Predict_1'].isin(['Buy'])]
df_nn_2_1_buy['Return_Rate'] = df_nn_2_1_buy['Forex diff']/df_nn_2_1_buy['Forex rate lag'] * 100
nn_2_1_return = df_nn_2_1_buy['Return_Rate'].mean()
print(f"The average return for the second neural network and 1 year holding period is {nn_2_1_return}")
The average return for the second neural network and 1 year holding period is -5.661574799802205
/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/2864552994.py:7: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df_nn_2_1_buy['Return_Rate'] = df_nn_2_1_buy['Forex diff']/df_nn_2_1_buy['Forex rate lag'] * 100
#Average rate of return for Linear reg and holding the currency for 3 years when model says to buy
df_linear_3 = df_12[['Year', 'Country Name', 'diff_Linear_Predict_3', 'Forex rate','Forex rate lag', 'Forex diff']]
df_linear_3_buy = df_linear_3[df_linear_3['diff_Linear_Predict_3'].isin(['Buy'])]
df_linear_3_buy['Return_Rate'] = df_linear_3_buy['Forex diff']/df_linear_3_buy['Forex rate lag'] * 100
linear_3_return = df_linear_3_buy['Return_Rate'].mean()
print(f"The average return for the linear regression model and 3 years holding period is {linear_3_return}")
The average return for the linear regression model and 3 years holding period is -3.979679534791286
/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/3889504242.py:7: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df_linear_3_buy['Return_Rate'] = df_linear_3_buy['Forex diff']/df_linear_3_buy['Forex rate lag'] * 100
#Average rate of return for Lasso reg and holding the currency for 3 years when model says to buy
df_lasso_3 = df_12[['Year', 'Country Name', 'diff_Lasso_Predict_3', 'Forex rate','Forex rate lag', 'Forex diff']]
df_lasso_3_buy = df_lasso_3[df_lasso_3['diff_Lasso_Predict_3'].isin(['Buy'])]
df_lasso_3_buy['Return_Rate'] = df_lasso_3_buy['Forex diff']/df_lasso_3_buy['Forex rate lag'] * 100
lasso_3_return = df_lasso_3_buy['Return_Rate'].mean()
print(f"The average return for the lasso regression model and 3 years holding period is {lasso_3_return}")
The average return for the lasso regression model and 3 years holding period is -1.8921836297212749
/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/1695144407.py:7: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df_lasso_3_buy['Return_Rate'] = df_lasso_3_buy['Forex diff']/df_lasso_3_buy['Forex rate lag'] * 100
#Average rate of return for 1st NN and holding the currency for 3 years when model says to buy
df_nn_1_3 = df_12[['Year', 'Country Name', 'diff_NN_1_Predict_3', 'Forex rate','Forex rate lag', 'Forex diff']]
df_nn_1_3_buy = df_nn_1_3[df_nn_1_3['diff_NN_1_Predict_3'].isin(['Buy'])]
df_nn_1_3_buy['Return_Rate'] = df_nn_1_3_buy['Forex diff']/df_nn_1_3_buy['Forex rate lag'] * 100
nn_1_3_return = df_nn_1_3_buy['Return_Rate'].mean()
print(f"The average return for the first neural network and 3 years holding period is {nn_1_3_return}")
The average return for the first neural network and 3 years holding period is -5.250983001440971
/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/2278624619.py:7: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df_nn_1_3_buy['Return_Rate'] = df_nn_1_3_buy['Forex diff']/df_nn_1_3_buy['Forex rate lag'] * 100
#Average rate of return for 2nd NN and holding the currency for 3 years when model says to buy
df_nn_2_3 = df_12[['Year', 'Country Name', 'diff_NN_2_Predict_3', 'Forex rate','Forex rate lag', 'Forex diff']]
df_nn_2_3_buy = df_nn_2_3[df_nn_2_3['diff_NN_2_Predict_3'].isin(['Buy'])]
df_nn_2_3_buy['Return_Rate'] = df_nn_2_3_buy['Forex diff']/df_nn_2_3_buy['Forex rate lag'] * 100
nn_2_3_return = df_nn_2_3_buy['Return_Rate'].mean()
print(f"The average return for the second neural networl and 3 years holding period is {nn_2_3_return}")
The average return for the second neural networl and 3 years holding period is -5.548735188527842
/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/2694513607.py:7: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df_nn_2_3_buy['Return_Rate'] = df_nn_2_3_buy['Forex diff']/df_nn_2_3_buy['Forex rate lag'] * 100
#Average rate of return for Linear reg and holding the currency for 5 years when model says to buy
df_linear_5 = df_14[['Year', 'Country Name', 'diff_Linear_Predict_5', 'Forex rate','Forex rate lag', 'Forex diff']]
df_linear_5_buy = df_linear_5[df_linear_5['diff_Linear_Predict_5'].isin(['Buy'])]
df_linear_5_buy['Return_Rate'] = df_linear_5_buy['Forex diff']/df_linear_5_buy['Forex rate lag'] * 100
linear_5_return = df_linear_5_buy['Return_Rate'].mean()
print(f"The average return for the linear regression model and 5 years holding period is {linear_5_return}")
The average return for the linear regression model and 5 years holding period is -17.218054184651734
/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/1940798184.py:7: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df_linear_5_buy['Return_Rate'] = df_linear_5_buy['Forex diff']/df_linear_5_buy['Forex rate lag'] * 100
#Average rate of return for Lasso reg and holding the currency for 5 years when model says to buy
df_lasso_5 = df_14[['Year', 'Country Name', 'diff_Lasso_Predict_5', 'Forex rate','Forex rate lag', 'Forex diff']]
df_lasso_5_buy = df_lasso_5[df_lasso_5['diff_Lasso_Predict_5'].isin(['Buy'])]
df_lasso_5_buy['Return_Rate'] = df_lasso_5_buy['Forex diff']/df_lasso_5_buy['Forex rate lag'] * 100
lasso_5_return = df_lasso_5_buy['Return_Rate'].mean()
print(f"The average return for the lasso regression model and 5 years holding period is {lasso_5_return}")
The average return for the lasso regression model and 5 years holding period is -5.925652587401351
/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/885340572.py:7: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df_lasso_5_buy['Return_Rate'] = df_lasso_5_buy['Forex diff']/df_lasso_5_buy['Forex rate lag'] * 100
#Average rate of return for 1st NN and holding the currency for 5 years when model says to buy
df_nn_1_5 = df_14[['Year', 'Country Name', 'diff_NN_1_Predict_5', 'Forex rate','Forex rate lag', 'Forex diff']]
df_nn_1_5_buy = df_nn_1_5[df_nn_1_5['diff_NN_1_Predict_5'].isin(['Buy'])]
df_nn_1_5_buy['Return_Rate'] = df_nn_1_5_buy['Forex diff']/df_nn_1_5_buy['Forex rate lag'] * 100
nn_1_5_return = df_nn_1_5_buy['Return_Rate'].mean()
print(f"The average return for the first neural network and 5 years holding period is {nn_1_5_return}")
The average return for the first neural network and 5 years holding period is -9.570827108863499
/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/555516999.py:7: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df_nn_1_5_buy['Return_Rate'] = df_nn_1_5_buy['Forex diff']/df_nn_1_5_buy['Forex rate lag'] * 100
#Average rate of return for 2nd NN and holding the currency for 5 years when model says to buy
df_nn_2_5 = df_14[['Year', 'Country Name', 'diff_NN_2_Predict_5', 'Forex rate','Forex rate lag', 'Forex diff']]
df_nn_2_5_buy = df_nn_2_5[df_nn_2_5['diff_NN_2_Predict_5'].isin(['Buy'])]
df_nn_2_5_buy['Return_Rate'] = df_nn_2_5_buy['Forex diff']/df_nn_2_5_buy['Forex rate lag'] * 100
nn_2_5_return = df_nn_2_5_buy['Return_Rate'].mean()
print(f"The average return for the second neural network and 5 years holding period is {nn_2_5_return}")
The average return for the second neural network and 5 years holding period is -11.086691753890205
/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/1121885410.py:7: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df_nn_2_5_buy['Return_Rate'] = df_nn_2_5_buy['Forex diff']/df_nn_2_5_buy['Forex rate lag'] * 100
Below we can see the plots of these returns.
#Preparing Returns plot
df_return_rates = pd.DataFrame({
'Model':["Linear Model 1 Years Hold Return", "Lasso Model 1 Years Hold Return", "1st Neural Network 1 Years Hold Return", "2nd Neural Network 1 Years Hold Return",
"Linear Model 3 Years Hold Return", "Lasso Model 3 Years Hold Return", "1st Neural Network 3 Years Hold Return", "2nd Neural Network 3 Years Hold Return",
"Linear Model 5 Years Hold Return", "Lasso Model 5 Years Hold Return", "1st Neural Network 5 Years Hold Return", "2nd Neural Network 5 Years Hold Return"],
'Return Rate': [linear_1_return, lasso_1_return, nn_1_1_return, nn_1_3_return,
linear_3_return, lasso_3_return, nn_1_3_return, nn_2_3_return,
linear_5_return, lasso_5_return, nn_1_5_return, nn_2_5_return]
})
df_return_rates
Model | Return Rate | |
---|---|---|
0 | Linear Model 1 Years Hold Return | -6.682002 |
1 | Lasso Model 1 Years Hold Return | -2.203809 |
2 | 1st Neural Network 1 Years Hold Return | -4.228953 |
3 | 2nd Neural Network 1 Years Hold Return | -5.250983 |
4 | Linear Model 3 Years Hold Return | -3.979680 |
5 | Lasso Model 3 Years Hold Return | -1.892184 |
6 | 1st Neural Network 3 Years Hold Return | -5.250983 |
7 | 2nd Neural Network 3 Years Hold Return | -5.548735 |
8 | Linear Model 5 Years Hold Return | -17.218054 |
9 | Lasso Model 5 Years Hold Return | -5.925653 |
10 | 1st Neural Network 5 Years Hold Return | -9.570827 |
11 | 2nd Neural Network 5 Years Hold Return | -11.086692 |
df_return_rates_1 = df_return_rates.loc[df_return_rates['Model'].isin(["Linear Model 1 Years Hold Return", "Lasso Model 1 Years Hold Return", "1st Neural Network 1 Years Hold Return", "2nd Neural Network 1 Years Hold Return"])]
df_return_rates_3 = df_return_rates.loc[df_return_rates['Model'].isin(["Linear Model 3 Years Hold Return", "Lasso Model 3 Years Hold Return", "1st Neural Network 3 Years Hold Return", "2nd Neural Network 3 Years Hold Return"])]
df_return_rates_5 = df_return_rates.loc[df_return_rates['Model'].isin([ "Linear Model 5 Years Hold Return", "Lasso Model 5 Years Hold Return", "1st Neural Network 5 Years Hold Return", "2nd Neural Network 5 Years Hold Return"])]
#A plot for the rate of returns for the different ml models
# Create a chart object and specify the data source
chart_1 = alt.Chart(df_return_rates_1)
chart_3 = alt.Chart(df_return_rates_3)
chart_5 = alt.Chart(df_return_rates_5)
# Add a bar chart layer to display the average return rates for each model
bars_1 = chart_1.mark_bar().encode(
y='Model:N',
x='Return Rate:Q',
tooltip=[alt.Tooltip('Model:N', title="Model"),
alt.Tooltip('Return Rate:Q', title="Return Rate")]
).properties(width=1000, height = 200, title = "1 Year Holding Period Models")
bars_3 = chart_3.mark_bar().encode(
y='Model:N',
x='Return Rate:Q',
tooltip=[alt.Tooltip('Model:N', title="Model"),
alt.Tooltip('Return Rate:Q', title="Return Rate")]
).properties(width=1000, height = 200, title = "3 Year Holding Period Models")
bars_5 = chart_5.mark_bar().encode(
y='Model:N',
x='Return Rate:Q',
tooltip=[alt.Tooltip('Model:N', title="Model"),
alt.Tooltip('Return Rate:Q', title="Return Rate")]
).properties(width=1000, height = 200, title = "5 Year Holding Period Models")
# # Add a text layer to display the return rates as labels on the bars
# labels = bars.mark_text(
# align='right',
# baseline='bottom',
# dy=-2
# ).encode(
# text='Return Rate'
# )
# Combine the layers into a single chart and display it
alt.vconcat(bars_1, bars_3, bars_5).properties(title="Model Return Comparison")
/Users/uddhav/opt/anaconda3/envs/Py_2/lib/python3.8/site-packages/altair/utils/core.py:283: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead. for col_name, dtype in df.dtypes.iteritems():
As we can see across all holding periods, the lasso regression performs the best. The mediocre performance of the neural networks could be because they are overfitting and therefore performing worse out of sample, we can see this in the mean squared error results too.
We can see from the plots above that the returns are negative for all our machine learning models, while they were positive for the PPP model.
It is hard to predict the exchange rate even in the long run. This project has shown that theory does perform better than machine learning (based on our measuring method) in predicting exchange rates.