Theory Versus Machine Learning for Exchange Rate Predictions¶

Annabella Stoll-Dansereau, Silas Kwok, [Redacted]¶

In this project, we look at long term exchange rates and predict them using the Purchasing Power Parity theory (PPP), the Real Exchange Rate theory (RER), a linear regression, a lasso regression and two neural networks. To compare the different models, we use their predictions to decide which currencies to buy and then compare the average rate of returns. We find that the PPP model performs the best and does give positive returns in some cases, the machine learning models do not perform well and all give negative returns.

Importing all the necessary packages.

In [1]:

import pandas as pd
import numpy as np 
import seaborn as sns
from sklearn import linear_model
from sklearn.model_selection import cross_val_score
from sklearn import tree
from sklearn.ensemble import RandomForestRegressor
from sklearn import neural_network
from sklearn import preprocessing, pipeline
from sklearn import metrics
import matplotlib.pyplot as plt
import altair as alt

First we load and clean the data.

In [2]:

def data_comp(variables):

    '''
    cleaning OECD data pass in the name of csv files you are using

    '''
    for n, var in enumerate(variables):

        #read in file OECD data

        df =  pd.read_csv(f"{var}.csv")

        #drop redundent columns and transform it into Year, Country, Variable

        df.columns = df.iloc[2]
        df.drop([0,1,2], inplace=True)
        df.drop(df.columns[1:44], axis=1, inplace=True)
        df.set_index('Country Name', inplace = True, drop = True)
        df = df.T
        df.index = [round(idx) for idx in df.index]
        df.index = pd.to_datetime(df.index, format='%Y')
        df = df.reset_index()
        df = pd.melt(df, id_vars=['index'], value_vars=df.columns[1:], var_name='Country Name', value_name='Value')

        df.set_index(['index', 'Country Name'], inplace=True)
        df.index.names = ["Year", "Country Name"]
        df.rename(columns={'Value': var}, inplace=True)

        #making the dataframe and combining over all variables
        if n == 0: 
            full_df = df
        else: 
            full_df = pd.concat([full_df, df], axis =1)
    return(full_df)

df_ML = data_comp(['GDP', 'Govt_debt_gdp', 'GDP growth rate','GNI per capita',
            'Inflation', 'Lending rate', 'Net exports', 'Total_reserve', 'Forex rate'])

#writting to a csv
#full_df.to_csv('EX_rate_table.csv', index=True)

PPP and RER¶

Here we construct the PPP and RER models. Based on the idea that in the long run we will converge back to an average real exchange rate value. While it doesnt always hold mean reversion seems to be the most consistent (and easiest!) way to predict exchange rates.

In [3]:

df_PPP_model= data_comp(['RER', 'Forex rate', 'PPP'])
df_PPP = pd.read_csv(f"PPP_US.csv")

df_PPP_model = df_PPP_model[df_PPP_model.notna().all(axis=1)]
print(df_PPP_model)

#making the contructed RER which I am defining from 456 notes as RER =  (forgien exchnage rate * P_in country)/ P_in US
#here P_in country/P in US == PPP

df_PPP_model['con_RER'] = (df_PPP_model['Forex rate'] * df_PPP_model['PPP'])

#Now we have to see how this constructed RER compares with the reported RER and NomER (Forex rate)
#theretically in the long run these would be zero since the NOM*PPP == RER_normed
#Then I divided by 100 to get the percetage if it is negative that implies currency will appriciate
#if is positive it will depreiciate

#Clearly a lot of issues and gross assumptions
# 1. this only holds for inflation bearing countries  which might be why issues arise in non advanced countries
# 2. not clear if mean reverting in long long 
# 3. not clear if assumptions on PPP hold in long run
# 4. clealy issues in this not taking into consideration risks i.e. a forgein exchange might appear undervalued but that is because its risky
# 5. very US centric by norming to the US each period as the US curency itself might have issues

df =df_PPP_model.reset_index()
#df_PPP_model.to_csv('testing1.csv', index=True)

                               RER  Forex rate       PPP
Year       Country Name                                 
2000-01-01 Armenia       74.010935  539.525833  0.234525
2001-01-01 Armenia       73.034791  555.078258  0.231919
2002-01-01 Armenia       73.665189  573.353333  0.226287
2003-01-01 Armenia       67.937520  578.762954  0.229938
2004-01-01 Armenia       71.634581  533.450833  0.258205
...                            ...         ...       ...
2017-01-01 Zambia        94.221086    9.517500  0.440513
2018-01-01 Zambia        88.528850   10.458333  0.420550
2019-01-01 Zambia        79.795053   12.890000  0.360806
2020-01-01 Zambia        66.438740   18.344093  0.284940
2021-01-01 Zambia        69.764269   20.018487  0.319845

[1766 rows x 3 columns]

In [4]:

df= pd.read_csv(f"PPP_US.csv")

country_mapping = {

    'ARM': 'Armenia', 'ATG': 'Antigua and Barbuda', 'AUS': 'Australia', 'BDI': 'Burundi', 'BGR': 'Bulgaria', 
    'BHR': 'Bahrain', 'BHS': 'Bahamas, The', 'BLZ': 'Belize', 'BOL': 'Bolivia', 'BRA': 'Brazil', 
    'CAF': 'Central African Republic', 'CAN': 'Canada', 'CHE': 'Switzerland', 'CHL': 'Chile', 'CHN': 'China', 
    'CIV': "Cote d'Ivoire", 'CMR': 'Cameroon', 'COD': 'Congo, Dem. Rep.', 'COL': 'Colombia', 'CRI': 'Costa Rica', 
    'CYP': 'Cyprus', 'CZE': 'Czechia', 'DMA': 'Dominica', 'DNK': 'Denmark', 'DOM': 'Dominican Republic', 
    'DZA': 'Algeria', 'FJI': 'Fiji', 'GAB': 'Gabon', 'GBR': 'United Kingdom', 'GEO': 'Georgia', 'GHA': 'Ghana', 
    'GMB': 'Gambia, The', 'GNQ': 'Equatorial Guinea', 'GRC': 'Greece', 'GRD': 'Grenada', 'GUY': 'Guyana', 
    'HKG': 'Hong Kong SAR, China', 'HRV': 'Croatia', 'HUN': 'Hungary', 'IRN': 'Iran, Islamic Rep.', 'ISL': 'Iceland', 
    'ISR': 'Israel', 'JPN': 'Japan', 'KNA': 'St. Kitts and Nevis', 'KOR': 'Korea, Rep.', 'LCA': 'St. Lucia', 
    'LSO': 'Lesotho', 'LVA': 'Latvia', 'MAR': 'Morocco', 'MDA': 'Moldova', 'MEX': 'Mexico', 'MKD': 'North Macedonia', 
    'MLT': 'Malta', 'MWI': 'Malawi', 'MYS': 'Malaysia', 'NGA': 'Nigeria', 'NIC': 'Nicaragua', 'NOR': 'Norway', 
    'NZL': 'New Zealand', 'PAK': 'Pakistan', 'PHL': 'Philippines', 'PNG': 'Papua New Guinea', 'POL': 'Poland', 
    'PRY': 'Paraguay', 'ROU': 'Romania', 'RUS': 'Russian Federation', 'SAU': 'Saudi Arabia', 'SGP': 'Singapore', 
    'SLB': 'Solomon Islands', 'SLE': 'Sierra Leone', 'SVK': 'Slovak Republic', 'SWE': 'Sweden', 'TGO': 'Togo', 
    'TTO': 'Trinidad and Tobago', 'TUN': 'Tunisia', 'UGA': 'Uganda', 'UKR': 'Ukraine', 'URY': 'Uruguay', 'USA': 'United States',
    'VCT': 'St. Vincent and the Grenadines', 'VEN': 'Venezuela, RB', 'WSM': 'Samoa', 'ZAF': 'South Africa',
    'ZMB': 'Zambia'}

def apply_country_mapping(location):
    if location in country_mapping:
        return country_mapping[location]
    else:
        return None

df['LOCATION'] = df['LOCATION'].apply(apply_country_mapping)

# Extract only the necessary columns
df = df[['LOCATION', 'TIME', 'Value']]
df['TIME'] = pd.to_datetime(df['TIME'], format='%Y')

df = df.rename(columns={'Value': 'PPP_for_ER'})
df = df.dropna()
df.set_index(['TIME', 'LOCATION'], inplace=True)
df.index.set_names(['Year', 'Country Name'], inplace=True)
df

Out[4]:

		PPP_for_ER
Year	Country Name
1960-01-01	Australia	0.700946
1961-01-01	Australia	0.701324
1962-01-01	Australia	0.687990
1963-01-01	Australia	0.698321
1964-01-01	Australia	0.708346
...	...	...
2017-01-01	Cameroon	232.801392
2018-01-01	Cameroon	231.682690
2019-01-01	Cameroon	230.273904
2020-01-01	Cameroon	229.137013
2021-01-01	Cameroon	227.378726

1765 rows × 1 columns

In [5]:

df
df_merged = df.merge(df_PPP_model, left_index=True, right_index=True, how='inner')

#theretically in the long run these would be zero since the NOM*PPP == RER_normed
#so now we will calculate the difference between the constructed Real exchange rate where RER = PPP*NER
#versus the calculated PPP

#if the PPP*nom is higher than the PPP based on the exchange rate the currency would be overvalued
#id PPP *nom is lower than PPP the currency would be undervalued

#a lot of issues because I dont know the methodology of how the real excahnge rate of PPP was calculated for the country which I grabbed
#from OCED but still shows what we want to buy or sell depending on the year

df_merged['Final_diff'] = df_merged['con_RER'] - df_merged['PPP_for_ER']
#filtering out unstable ones
df_merged = df_merged[df_merged['Final_diff']>-0.1]
df_merged = df_merged[df_merged['Final_diff']<0.1]

print(df_merged['Final_diff'].dtype)
df_merged.sort_index(inplace=True)

# Select all rows with years before 2015 and including 2015
df_year_before_2015 = df_merged.loc[(slice(None, '2015-01-01'), slice(None)), :]




df_merged['diff_in_ave_RER'] = df_merged['RER'].div(df_year_before_2015.groupby('Country Name').mean()['RER'])

df_years={}
for yr in range(2000, 2022):
    name = f'df_{yr}'
    dt_year= pd.to_datetime(f'{yr}-01-01')
    df_year = df_merged.loc[(dt_year, slice(None))]
    df_years[yr] = df_year


#ones below 1 short if the 'diff in ave RER' agrees

float64

Below we calculate the returns from the RER model.

In [6]:

# Define the holdings() function


def holdings(row):
    if row['diff_in_ave_RER'] > 1:
        loan_amount = 1000 * row['Forex rate']
        row['usd_amount'] = loan_amount/row['Forex rate']
        row['home_currency_amount'] = -loan_amount
    else:
        loan_amount = 1000
        row['usd_amount'] = -loan_amount
        row['home_currency_amount'] = loan_amount * row['Forex rate']
    return row

yr_past = 2015
yr_future = 2019

# Apply the function to the 'Amount' column in the DataFrame
#df_mean_RER_investment = df_years[yr_past].apply(holdings, axis=1)
#df_mean_RER_investment[f'{yr_future}_ER'] = df_years[yr_future]['Forex rate']
##df_mean_RER_investment['new_usd_amount'] = (1/df_mean_RER_investment[f'{yr_future}_ER'])*df_mean_RER_investment['home_currency_amount'] 
#df_mean_RER_investment['return_USD'] = df_mean_RER_investment['new_usd_amount'] + df_mean_RER_investment['usd_amount']
returns = {}

for yr_past in range(2015,2018):
    list = []
    for yr in range(1,5):
        yr_future = yr_past+yr
        df_mean_RER_investment = df_years[yr_past].apply(holdings, axis=1)
        df_mean_RER_investment[f'{yr_future}_ER'] = df_years[yr_future]['Forex rate']
        df_mean_RER_investment['new_usd_amount'] = (1/df_mean_RER_investment[f'{yr_future}_ER'])*df_mean_RER_investment['home_currency_amount'] 
        df_mean_RER_investment['return_USD'] = df_mean_RER_investment['new_usd_amount'] + df_mean_RER_investment['usd_amount']
        total_investment = df_mean_RER_investment['return_USD'].count() *1000
        gains = df_mean_RER_investment['return_USD'].sum()
        return_to = (((gains+total_investment)/total_investment-1)*100).round(3)
        list.append(return_to)
    returns[yr_past] = list 

for i in range(0,4):

    elements = [v[i] for v in returns.values()]
    average = sum(elements) / len(elements)

    print(f"Average Returns for {i+1} year holding period:", average)

Average Returns for 1 year holding period: 0.01100000000000001
Average Returns for 2 year holding period: 0.17499999999999996
Average Returns for 3 year holding period: -0.7763333333333334
Average Returns for 4 year holding period: -2.1503333333333337

As we can see above the average returns for the RER model are either close to 0 or negative. This indicates this rule isnt a good investment strategy.

Now we calculate the return for the PPP model.

In [7]:

# Define the holdings() function


def holdings(row):
    if row['Final_diff'] > 0:
        loan_amount = 1000 * row['Forex rate']
        row['usd_amount'] = loan_amount/row['Forex rate']
        row['home_currency_amount'] = -loan_amount
    else:
        loan_amount = 1000
        row['usd_amount'] = -loan_amount
        row['home_currency_amount'] = loan_amount * row['Forex rate']
    return row

In [8]:

returns = {}

for yr_past in range(2009,2018):
    list = []
    for yr in range(1,5):
        yr_future = yr_past+yr
        df_mean_PPP_diff_investment = df_years[yr_past].apply(holdings, axis=1)
        df_mean_PPP_diff_investment[f'{yr_future}_ER'] = df_years[yr_future]['Forex rate']
        df_mean_PPP_diff_investment['new_usd_amount'] = (1/df_mean_PPP_diff_investment[f'{yr_future}_ER'])*df_mean_PPP_diff_investment['home_currency_amount'] 
        df_mean_PPP_diff_investment['return_USD'] = df_mean_PPP_diff_investment['new_usd_amount'] + df_mean_PPP_diff_investment['usd_amount']
        total_investment = df_mean_PPP_diff_investment['return_USD'].count() *1000
        gains = df_mean_PPP_diff_investment['return_USD'].sum()
        return_to = (((gains+total_investment)/total_investment-1)*100).round(3)
        list.append(return_to)
    returns[yr_past] = list
print(returns)  

for i in range(0,4):

    elements = [v[i] for v in returns.values()]
    average = sum(elements) / len(elements)

    print(f"Average Returns for {i+1} year holding period:", average)

{2009: [2.385, 5.431, 3.62, 2.171], 2010: [1.885, 1.378, 1.67, 2.385], 2011: [2.043, 2.201, 3.096, 7.39], 2012: [0.541, 1.369, 4.757, 4.861], 2013: [-0.107, -0.924, -0.556, -0.7], 2014: [-3.203, -2.152, -2.732, -3.082], 2015: [0.017, -0.53, 0.256, 1.198], 2016: [-1.394, -1.955, -0.692, 0.744], 2017: [-0.227, -0.12, 1.197, 1.476]}
Average Returns for 1 year holding period: 0.21555555555555553
Average Returns for 2 year holding period: 0.5219999999999999
Average Returns for 3 year holding period: 1.1795555555555555
Average Returns for 4 year holding period: 1.8269999999999997

As we can see above the returns for the PPP model are better and are all positive. So if we systematically invested based on this PPP rule we could get a positive return (although given the risk a four year return of 1.8% is pretty terrible).

Data for Machine Learning¶

In [9]:

forex_data = pd.read_csv("EX_rate_table.csv")

In [10]:

forex_data.head()

Out[10]:

	Year	Country Name	GDP	Govt_debt_gdp	GDP growth rate	GNI per capita	Inflation	Lending rate	Net exports	Total_reserve	Forex rate
0	2000-01-01	Aruba	1.873184e+09	NaN	7.616589	20390.0	4.044021	12.066667	3.097765e+08	235455000.0	1.79
1	2001-01-01	Aruba	1.896648e+09	NaN	4.195963	20510.0	2.883604	12.600000	4.304581e+08	321360614.5	1.79
2	2002-01-01	Aruba	1.962011e+09	NaN	-0.957771	19290.0	3.315247	11.300000	-1.377374e+08	374002933.0	1.79
3	2003-01-01	Aruba	2.044134e+09	NaN	1.120879	21120.0	3.656365	10.700000	-2.437989e+07	336945700.0	1.79
4	2004-01-01	Aruba	2.254749e+09	NaN	7.281026	24010.0	2.529129	9.600000	4.535251e+08	339224000.0	1.79

In [11]:

forex_data.count()

Out[11]:

Year               5852
Country Name       5852
GDP                5609
Govt_debt_gdp      1378
GDP growth rate    5529
GNI per capita     5285
Inflation          4992
Lending rate       2806
Net exports        3884
Total_reserve      3801
Forex rate         3954
dtype: int64

As we can see above, we lose about a two-thirds of our dataset if we include government debt to gdp ratio. This is why we'll drop this variable.

In [12]:

forex_data_2 = forex_data.drop(["Govt_debt_gdp"], axis = 1)

In [13]:

forex_data_2.dropna(inplace = True)

Since in the dataset the Foreign Exchange Rate is in Local Currency Units (LCU) per US$, we take the reciprocal to convert it to US$ per LCU.

It is natural for a country with a large GDP to have a high absolute value of net exports or total reserves, therefore we take these variables as percentages of GDP in order to control for the different sizes.

We divide GDP by a million so that scaling is less of an issue in our model.

In [14]:

#Converted Net exports and total reserves as percentages of gdp to normalise across countries.
forex_data_2["Forex rate"] = 1/forex_data_2["Forex rate"]
forex_data_2["Net Exports to GDP"] = forex_data_2["Net exports"]/forex_data_2["GDP"] * 100
forex_data_2["Total Reserve to GDP"] = forex_data_2["Total_reserve"]/forex_data_2["GDP"] * 100
forex_data_2["GDP per mil"] = forex_data_2["GDP"]/1_000_000
forex_data_2 = forex_data_2.drop(["Net exports", "Total_reserve", "GDP"], axis = 1)

In [15]:

forex_data_2.head()

Out[15]:

	Year	Country Name	GDP growth rate	GNI per capita	Inflation	Lending rate	Forex rate	Net Exports to GDP	Total Reserve to GDP	GDP per mil
0	2000-01-01	Aruba	7.616589	20390.0	4.044021	12.066667	0.558659	16.537429	12.569772	1873.184358
1	2001-01-01	Aruba	4.195963	20510.0	2.883604	12.600000	0.558659	22.695729	16.943608	1896.648045
2	2002-01-01	Aruba	-0.957771	19290.0	3.315247	11.300000	0.558659	-7.020216	19.062222	1962.011173
3	2003-01-01	Aruba	1.120879	21120.0	3.656365	10.700000	0.558659	-1.192676	16.483542	2044.134078
4	2004-01-01	Aruba	7.281026	24010.0	2.529129	9.600000	0.558659	20.114222	15.044870	2254.748603

In [16]:

forex_data_2['Year'].unique()

Out[16]:

array(['2000-01-01', '2001-01-01', '2002-01-01', '2003-01-01',
       '2004-01-01', '2005-01-01', '2006-01-01', '2007-01-01',
       '2008-01-01', '2009-01-01', '2010-01-01', '2011-01-01',
       '2012-01-01', '2013-01-01', '2014-01-01', '2015-01-01',
       '2016-01-01', '2017-01-01', '2018-01-01', '2019-01-01',
       '2020-01-01', '2021-01-01'], dtype=object)

In [17]:

forex_data_2[forex_data_2['Country Name'] == "Vietnam"].head()

Out[17]:

	Year	Country Name	GDP growth rate	GNI per capita	Inflation	Lending rate	Forex rate	Net Exports to GDP	Total Reserve to GDP	GDP per mil
5654	2000-01-01	Vietnam	6.787316	380.0	-1.710337	10.550000	0.000071	-0.561392	10.960010	31172.518403
5655	2001-01-01	Vietnam	6.192893	400.0	-0.431545	9.420000	0.000068	-0.278413	11.242292	32685.198735
5656	2002-01-01	Vietnam	6.320821	420.0	3.830828	9.061667	0.000065	-5.144862	11.752907	35064.105501
5657	2003-01-01	Vietnam	6.899063	470.0	3.234648	9.480000	0.000064	-8.492507	15.736500	39552.513316
5658	2004-01-01	Vietnam	7.536411	540.0	7.754947	9.722500	0.000064	-6.953883	15.500315	45427.854693

Below we can the plots for our explanatory variables.

In [18]:

# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])

# The basic line
line = alt.Chart(forex_data_2).mark_line().encode(
    alt.X('Year:T', title='Year'),
    y='Total Reserve to GDP:Q',
    color='Country Name:N',
    tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Total Reserve to GDP:Q')]
)

# Transparent selectors across the chart. This is what tells us the x-value of the cursor.
selectors = alt.Chart(forex_data_2).mark_point().encode(
    x='Year:T',
    opacity=alt.value(0)
)

selectors_near = selectors.encode(
    opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)

# Draw points on the line, and highlight based on selection.
points = line.mark_circle().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)

# Draw text labels near the points, and highlight based on selection.
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Total Reserve to GDP:Q', alt.value(' '))
)

# Draw a rule at the location of the selection.
rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(
    x='Year:T'
).transform_filter(
    nearest
)

# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
    x='Year:T'
)

# Put the six layers into a chart and bind the data.
layers = alt.layer(
    line, selectors, points, text, rules, vline
).properties(
    width=1000, height=800,
    title="Total Reserve to GDP for All Countries Over Time"
)

layers

/Users/uddhav/opt/anaconda3/envs/Py_2/lib/python3.8/site-packages/altair/utils/core.py:283: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for col_name, dtype in df.dtypes.iteritems():

Out[18]:

In [19]:

# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])

# The basic line
line = alt.Chart(forex_data_2).mark_line().encode(
    alt.X('Year:T', title='Year'),
    y='Net Exports to GDP:Q',
    color='Country Name:N',
    tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Net Exports to GDP:Q')]
)

# Transparent selectors across the chart. This is what tells us the x-value of the cursor.
selectors = alt.Chart(forex_data_2).mark_point().encode(
    x='Year:T',
    opacity=alt.value(0)
)

selectors_near = selectors.encode(
    opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)

# Draw points on the line, and highlight based on selection.
points = line.mark_circle().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)

# Draw text labels near the points, and highlight based on selection.
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Net Exports to GDP:Q', alt.value(' '))
)

# Draw a rule at the location of the selection.
rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(
    x='Year:T'
).transform_filter(
    nearest
)

# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
    x='Year:T'
)

# Put the six layers into a chart and bind the data.
layers = alt.layer(
    line, selectors, points, text, rules, vline
).properties(
    width=1000, height=800,
    title="Net Exports to GDP for All Countries Over Time"
)

layers

Out[19]:

In [20]:

# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])

# The basic line
line = alt.Chart(forex_data_2).mark_line().encode(
    alt.X('Year:T', title='Year'),
    y='Lending rate:Q',
    color='Country Name:N',
    tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Lending rate:Q')]
)

# Transparent selectors across the chart. This is what tells us the x-value of the cursor.
selectors = alt.Chart(forex_data_2).mark_point().encode(
    x='Year:T',
    opacity=alt.value(0)
)

selectors_near = selectors.encode(
    opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)

# Draw points on the line, and highlight based on selection.
points = line.mark_circle().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)

# Draw text labels near the points, and highlight based on selection.
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Lending rate:Q', alt.value(' '))
)

# Draw a rule at the location of the selection.
rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(
    x='Year:T'
).transform_filter(
    nearest
)

# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
    x='Year:T'
)

# Put the six layers into a chart and bind the data.
layers = alt.layer(
    line, selectors, points, text, rules, vline
).properties(
    width=1000, height=800,
    title="Lending rate for All Countries Over Time"
)

layers

Out[20]:

In [21]:

# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])

# The basic line
line = alt.Chart(forex_data_2).mark_line().encode(
    alt.X('Year:T', title='Year'),
    y='Inflation:Q',
    color='Country Name:N',
    tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Inflation:Q')]
)

# Transparent selectors across the chart. This is what tells us the x-value of the cursor.
selectors = alt.Chart(forex_data_2).mark_point().encode(
    x='Year:T',
    opacity=alt.value(0)
)

selectors_near = selectors.encode(
    opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)

# Draw points on the line, and highlight based on selection.
points = line.mark_circle().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)

# Draw text labels near the points, and highlight based on selection.
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Inflation:Q', alt.value(' '))
)

# Draw a rule at the location of the selection.
rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(
    x='Year:T'
).transform_filter(
    nearest
)

# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
    x='Year:T'
)

# Put the six layers into a chart and bind the data.
layers = alt.layer(
    line, selectors, points, text, rules, vline
).properties(
    width=1000, height=800,
    title="Inflation for All Countries Over Time"
)

layers

Out[21]:

In [22]:

# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])

# The basic line
line = alt.Chart(forex_data_2).mark_line().encode(
    alt.X('Year:T', title='Year'),
    y='GNI per capita:Q',
    color='Country Name:N',
    # tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('GDP growth rate:Q')]
)

# Transparent selectors across the chart. This is what tells us the x-value of the cursor.
selectors = alt.Chart(forex_data_2).mark_point().encode(
    x='Year:T',
    opacity=alt.value(0)
)

selectors_near = selectors.encode(
    opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)

# Draw points on the line, and highlight based on selection.
points = line.mark_circle().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)

# Draw text labels near the points, and highlight based on selection.
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'GNI per capita:Q', alt.value(' '))
)

# Draw a rule at the location of the selection.
rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(
    x='Year:T'
).transform_filter(
    nearest
)

# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
    x='Year:T'
)

# Put the six layers into a chart and bind the data.
layers = alt.layer(
    line, selectors, points, text, rules, vline
).properties(
    width=1000, height=800,
    title="GNI per capita for All Countries Over Time"
)

layers

Out[22]:

In [23]:

# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])

# The basic line
line = alt.Chart(forex_data_2).mark_line().encode(
    alt.X('Year:T', title='Year'),
    y='GDP growth rate:Q',
    color='Country Name:N',
    # tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('GDP growth rate:Q')]
)

# Transparent selectors across the chart. This is what tells us the x-value of the cursor.
selectors = alt.Chart(forex_data_2).mark_point().encode(
    x='Year:T',
    opacity=alt.value(0)
)

selectors_near = selectors.encode(
    opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)

# Draw points on the line, and highlight based on selection.
points = line.mark_circle().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)

# Draw text labels near the points, and highlight based on selection.
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'GDP growth rate:Q', alt.value(' '))
)

# Draw a rule at the location of the selection.
rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(
    x='Year:T'
).transform_filter(
    nearest
)

# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
    x='Year:T'
)

# Put the six layers into a chart and bind the data.
layers = alt.layer(
    line, selectors, points, text, rules, vline
).properties(
    width=1000, height=800,
    title="GDP growth rate for All Countries Over Time"
)

layers

Out[23]:

In [24]:

# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])

# The basic line
line = alt.Chart(forex_data_2).mark_line().encode(
    alt.X('Year:T', title='Year'),
    y='GDP per mil:Q',
    color='Country Name:N',
    tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('GDP per mil:Q')]
)

# Transparent selectors across the chart. This is what tells us the x-value of the cursor.
selectors = alt.Chart(forex_data_2).mark_point().encode(
    x='Year:T',
    opacity=alt.value(0)
)

selectors_near = selectors.encode(
    opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)

# Draw points on the line, and highlight based on selection.
points = line.mark_circle().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)

# Draw text labels near the points, and highlight based on selection.
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'GDP per mil:Q', alt.value(' '))
)

# Draw a rule at the location of the selection.
rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(
    x='Year:T'
).transform_filter(
    nearest
)

# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
    x='Year:T'
)

# Put the six layers into a chart and bind the data.
layers = alt.layer(
    line, selectors, points, text, rules, vline
).properties(
    width=1000, height=800,
    title="GDP per mil for All Countries Over Time"
)

layers

Out[24]:

In [25]:

# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])

# The basic line
line = alt.Chart(forex_data_2).mark_line().encode(
    alt.X('Year:T', title='Year'),
    y='Forex rate:Q',
    color='Country Name:N',
    tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Forex rate:Q')]
)

# Transparent selectors across the chart. This is what tells us the x-value of the cursor.
selectors = alt.Chart(forex_data_2).mark_point().encode(
    x='Year:T',
    opacity=alt.value(0)
)

selectors_near = selectors.encode(
    opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)

# Draw points on the line, and highlight based on selection.
points = line.mark_circle().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)

# Draw text labels near the points, and highlight based on selection.
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Forex rate:Q', alt.value(' '))
)

# Draw a rule at the location of the selection.
rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(
    x='Year:T'
).transform_filter(
    nearest
)

# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
    x='Year:T'
)

# Put the six layers into a chart and bind the data.
layers = alt.layer(
    line, selectors, points, text, rules, vline
).properties(
    width=1000, height=800,
    title="Forex Rates for All Countries Over Time"
)

layers

Out[25]:

In [26]:

# select_countries = ['Canada', 'Switzerland', 'China', 'United Kingdom', 'India', 'Japan', 'China', 'Mexico', 'Singapore', 'United States']
select_countries = ['Canada', 'Switzerland', 'United Kingdom', 'China', 'Singapore', 'United States']

In [27]:

select_forex_data_2 = forex_data_2[forex_data_2['Country Name'].isin(select_countries)]

In [28]:

# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover',
                        fields=['Year'])

# # The basic line
line = alt.Chart(select_forex_data_2).mark_line().encode(
    alt.X('Year:T', title = 'year'),
    y='Forex rate:Q',
    color='Country Name:N',
    tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Forex rate:Q')]
)


# Transparent selectors across the chart. This is what tells us the x-value of the cursor
selectors = alt.Chart(select_forex_data_2).mark_point().encode(
    x='Year:T',
    opacity=alt.value(0)

)

selectors_near = selectors.encode(
    opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)

# Draw points on the line, and highlight based on selection
points = line.mark_circle().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Forex rate:Q', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(select_forex_data_2).mark_rule(color='gray').encode(
    x='Year:T',
).transform_filter(
    nearest
)

# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
    x='Year:T'
)

# Put the five layers into a chart and bind the data
layers = alt.layer(
    line, selectors, points, text, rules, vline
).properties(
    width=1000, height=800,
    title = "Forex Rates for Select Countries Over Time"
)

layers

Out[28]:

We use 2015 as the cut-off for training and testing dataset.

In [29]:

#2015 as the cut-off year for testing.
forex_data_2_test = forex_data_2[forex_data_2['Year'].isin(['2015-01-01', '2016-01-01', '2017-01-01', '2018-01-01', '2019-01-01', '2020-01-01', '2021-01-01'])]

In [30]:

# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])

# The basic line
line = alt.Chart(forex_data_2_test).mark_line().encode(
    alt.X('Year:T', title='Year'),
    y='Forex rate:Q',
    color='Country Name:N',
    tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Forex rate:Q')]
)

# Transparent selectors across the chart. This is what tells us the x-value of the cursor.
selectors = alt.Chart(forex_data_2_test).mark_point().encode(
    x='Year:T',
    opacity=alt.value(0)
)

selectors_near = selectors.encode(
    opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))
).add_selection(nearest)

# Draw points on the line, and highlight based on selection.
points = line.mark_circle().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)

# Draw text labels near the points, and highlight based on selection.
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Forex rate:Q', alt.value(' '))
)

# Draw a rule at the location of the selection.
rules = alt.Chart(forex_data_2_test).mark_rule(color='gray').encode(
    x='Year:T'
).transform_filter(
    nearest
)

# Add vertical line at "Year" == 2015
vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(
    x='Year:T'
)

# Put the six layers into a chart and bind the data.
layers = alt.layer(
    line, selectors, points, text, rules, vline
).properties(
    width=1000, height=800,
    title="Forex Rates for All Countries TESTING DATA"
)

layers

Out[30]:

In [31]:

forex_data_3 = forex_data_2.drop(forex_data_2_test.index)

In [32]:

forex_data_3.head()

Out[32]:

	Year	Country Name	GDP growth rate	GNI per capita	Inflation	Lending rate	Forex rate	Net Exports to GDP	Total Reserve to GDP	GDP per mil
0	2000-01-01	Aruba	7.616589	20390.0	4.044021	12.066667	0.558659	16.537429	12.569772	1873.184358
1	2001-01-01	Aruba	4.195963	20510.0	2.883604	12.600000	0.558659	22.695729	16.943608	1896.648045
2	2002-01-01	Aruba	-0.957771	19290.0	3.315247	11.300000	0.558659	-7.020216	19.062222	1962.011173
3	2003-01-01	Aruba	1.120879	21120.0	3.656365	10.700000	0.558659	-1.192676	16.483542	2044.134078
4	2004-01-01	Aruba	7.281026	24010.0	2.529129	9.600000	0.558659	20.114222	15.044870	2254.748603

In [33]:

forex_data_3_train_y = forex_data_3[["Forex rate"]]

In [34]:

forex_data_3_train_X = forex_data_3.drop(["Year", "Country Name","Forex rate", "GDP per mil"], axis = 1)

In [35]:

forex_data_2_test.head()

Out[35]:

	Year	Country Name	GDP growth rate	GNI per capita	Inflation	Lending rate	Forex rate	Net Exports to GDP	Total Reserve to GDP	GDP per mil
15	2015-01-01	Aruba	3.543630	26180.0	0.474764	7.0	0.558659	10.277624	27.531731	2963.128492
16	2016-01-01	Aruba	2.116716	26650.0	-0.931196	6.9	0.558659	11.178386	30.918669	2983.798883
17	2017-01-01	Aruba	5.482371	27720.0	-1.028282	6.3	0.558659	8.646561	29.335673	3092.178771
18	2018-01-01	Aruba	5.257856	29030.0	3.626041	6.0	0.558659	8.975137	30.611363	3202.234637
19	2019-01-01	Aruba	0.635029	30000.0	4.257462	5.7	0.558659	9.547850	29.157277	3368.970253

In [36]:

forex_data_3_test_y = forex_data_2_test[["Forex rate"]]

In [37]:

forex_data_3_test_X = forex_data_2_test.drop(["Year","Country Name","Forex rate", "GDP per mil"], axis = 1)

Now we fit a linear regression.

In [38]:

#Linear regression
lr_mod = linear_model.LinearRegression()
lr_mod.fit(forex_data_3_train_X, forex_data_3_train_y)

Out[38]:

LinearRegression()

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

In [39]:

mse_lr_train = metrics.mean_squared_error(forex_data_3_train_y, lr_mod.predict(forex_data_3_train_X))
mse_lr_test = metrics.mean_squared_error(forex_data_3_test_y, lr_mod.predict(forex_data_3_test_X))
print(f"The mean squred error in the training dataset is {mse_lr_train}")
print(f"The mean squred error in the testing dataset is {mse_lr_test}")

The mean squred error in the training dataset is 0.47435291908321897
The mean squred error in the testing dataset is 0.3015103066706014

The mean squred errors for our linear regression are given above. As we can see the mean squared error is higher for our training dataset than our testing dataset.

Now we fit our lasso regression, we use a 5 fold cross validation in order to choose the optimal $\alpha$ for our lasso regression.

In [40]:

#k=5 cv lasso
lasso_mod = linear_model.LassoCV(cv=5)
lasso_mod.fit(forex_data_3_train_X, forex_data_3_train_y.values.ravel())

Out[40]:

LassoCV(cv=5)

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

In [41]:

mse_lasso_train = metrics.mean_squared_error(forex_data_3_train_y, lasso_mod.predict(forex_data_3_train_X))
mse_lasso_test = metrics.mean_squared_error(forex_data_3_test_y, lasso_mod.predict(forex_data_3_test_X))
print(f"The mean squred error in the training dataset is {mse_lasso_train}")
print(f"The mean squred error in the testing dataset is {mse_lasso_test}")

The mean squred error in the training dataset is 0.49614895650008806
The mean squred error in the testing dataset is 0.23705686900933912

As we can see above the mean squared error for our training dataset is higher than that of our testing dataset.

Now we fit our first neural network with 3 hidden layers.

In [42]:

#The first neural network, 3 hidden layers. 
nn_mod_1 = neural_network.MLPRegressor((50, 35, 20), max_iter = 10000)
nn_mod_1.fit(forex_data_3_train_X, forex_data_3_train_y.values.ravel())

Out[42]:

MLPRegressor(hidden_layer_sizes=(50, 35, 20), max_iter=10000)

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

In [43]:

mse_nn_1_train = metrics.mean_squared_error(forex_data_3_train_y, nn_mod_1.predict(forex_data_3_train_X))
mse_nn_1_test = metrics.mean_squared_error(forex_data_3_test_y, nn_mod_1.predict(forex_data_3_test_X))
print(f"The mean squred error in the training dataset is {mse_nn_1_train}")
print(f"The mean squred error in the testing dataset is {mse_nn_1_test}")

The mean squred error in the training dataset is 0.6253439030142882
The mean squred error in the testing dataset is 0.5258991107828873

As we can see above the mean squared error is roughly the same for both our training and testing datasets, however this model does suffer from input scaling. This is why below we run a scaled version of our neural network.

In [44]:

#First neural netwrok but scaled. Sacled is better. 
nn_mod_1_scaled = pipeline.make_pipeline(
    preprocessing.StandardScaler(),
    neural_network.MLPRegressor((50, 35, 20), max_iter = 10000))
nn_mod_1_scaled.fit(forex_data_3_train_X, forex_data_3_train_y.values.ravel())

Out[44]:

Pipeline(steps=[('standardscaler', StandardScaler()),
                ('mlpregressor',
                 MLPRegressor(hidden_layer_sizes=(50, 35, 20),
                              max_iter=10000))])

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

In [45]:

mse_nn_1_scaled_train = metrics.mean_squared_error(forex_data_3_train_y, nn_mod_1_scaled.predict(forex_data_3_train_X))
mse_nn_1_scaled_test = metrics.mean_squared_error(forex_data_3_test_y, nn_mod_1_scaled.predict(forex_data_3_test_X))
print(f"The mean squred error in the training dataset is {mse_nn_1_scaled_train}")
print(f"The mean squred error in the testing dataset is {mse_nn_1_scaled_test}")

The mean squred error in the training dataset is 0.054302709057648856
The mean squred error in the testing dataset is 0.40455307013410235

As we can see above the mean squared error for our training data is much lower than that of our testing data. This could mean that our model is overfitting.

Now we run our second neural network with 4 hidden layers.

In [46]:

#Second neural network but scaled.
nn_mod_2_scaled = pipeline.make_pipeline(
    preprocessing.StandardScaler(),
    neural_network.MLPRegressor((50, 35, 20, 10), max_iter = 10000))
nn_mod_2_scaled.fit(forex_data_3_train_X, forex_data_3_train_y.values.ravel())

Out[46]:

Pipeline(steps=[('standardscaler', StandardScaler()),
                ('mlpregressor',
                 MLPRegressor(hidden_layer_sizes=(50, 35, 20, 10),
                              max_iter=10000))])

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

In [47]:

mse_nn_2_scaled_train = metrics.mean_squared_error(forex_data_3_train_y, nn_mod_2_scaled.predict(forex_data_3_train_X))
mse_nn_2_scaled_test = metrics.mean_squared_error(forex_data_3_test_y, nn_mod_2_scaled.predict(forex_data_3_test_X))
print(f"The mean squred error in the training dataset is {mse_nn_2_scaled_train}")
print(f"The mean squred error in the testing dataset is {mse_nn_2_scaled_test}")

The mean squred error in the training dataset is 0.0456499285270549
The mean squred error in the testing dataset is 0.3405035580847174

As we can see above the mean squared error in our training data is lower than that of our testing data.

Now we can extract the predicted values for all our models and add it to our testing dataset.

In [48]:

#Extracting predicted values
lr_predict = lr_mod.predict(forex_data_3_test_X)
lasso_predict = lasso_mod.predict(forex_data_3_test_X)
nn_1_scaled_predict = nn_mod_1_scaled.predict(forex_data_3_test_X)
nn_2_scaled_predict = nn_mod_2_scaled.predict(forex_data_3_test_X)

In [49]:

forex_data_2_test.head()

Out[49]:

	Year	Country Name	GDP growth rate	GNI per capita	Inflation	Lending rate	Forex rate	Net Exports to GDP	Total Reserve to GDP	GDP per mil
15	2015-01-01	Aruba	3.543630	26180.0	0.474764	7.0	0.558659	10.277624	27.531731	2963.128492
16	2016-01-01	Aruba	2.116716	26650.0	-0.931196	6.9	0.558659	11.178386	30.918669	2983.798883
17	2017-01-01	Aruba	5.482371	27720.0	-1.028282	6.3	0.558659	8.646561	29.335673	3092.178771
18	2018-01-01	Aruba	5.257856	29030.0	3.626041	6.0	0.558659	8.975137	30.611363	3202.234637
19	2019-01-01	Aruba	0.635029	30000.0	4.257462	5.7	0.558659	9.547850	29.157277	3368.970253

In [50]:

#Putting the predicted vals in dataset.
df = forex_data_2_test.assign(Linear_Predict = lr_predict)
df = df.assign(Lasso_Predict = lasso_predict)
df = df.assign(NN_1_Predict = nn_1_scaled_predict)
df = df.assign(NN_2_Predict = nn_2_scaled_predict)

In [51]:

df.head()

Out[51]:

	Year	Country Name	GDP growth rate	GNI per capita	Inflation	Lending rate	Forex rate	Net Exports to GDP	Total Reserve to GDP	GDP per mil	Linear_Predict	Lasso_Predict	NN_1_Predict	NN_2_Predict
15	2015-01-01	Aruba	3.543630	26180.0	0.474764	7.0	0.558659	10.277624	27.531731	2963.128492	0.613864	0.618369	1.298603	0.235513
16	2016-01-01	Aruba	2.116716	26650.0	-0.931196	6.9	0.558659	11.178386	30.918669	2983.798883	0.595682	0.625260	0.885949	0.245483
17	2017-01-01	Aruba	5.482371	27720.0	-1.028282	6.3	0.558659	8.646561	29.335673	3092.178771	0.614444	0.640948	0.617596	0.065980
18	2018-01-01	Aruba	5.257856	29030.0	3.626041	6.0	0.558659	8.975137	30.611363	3202.234637	0.686940	0.660154	1.190500	0.478023
19	2019-01-01	Aruba	0.635029	30000.0	4.257462	5.7	0.558659	9.547850	29.157277	3368.970253	0.708518	0.674375	0.500876	0.415586

Now we want to check whether each model predicts the exchange rate is going to go up or down. We do this by taking the difference between the future prdeicted exchange rate (2016, 2017, 2018, 2019 and 2020) and the predicted 2015 exchange rate. We do this for different holding years, 1 year, 2 years and so on.

In [52]:

df_2 = df.copy()

In [53]:

for i in range(1,6):
    grouped = df_2.groupby('Country Name')
    df_2[f"lag_linear_{i}"] = grouped['Linear_Predict'].shift(i)
    df_2[f"lag_lasso_{i}"] = grouped['Lasso_Predict'].shift(i)
    df_2[f"lag_nn_1_{i}"] = grouped['NN_1_Predict'].shift(i)
    df_2[f"lag_nn_2_{i}"] = grouped['NN_2_Predict'].shift(i)

In [54]:

df_4 = df_2.drop(["GDP per mil", "GDP growth rate", "GNI per capita", "Inflation", "Lending rate", "Net Exports to GDP", "Total Reserve to GDP"], axis = 1)

In [55]:

a = ["Linear_Predict", "Lasso_Predict", "NN_1_Predict", "NN_2_Predict"]
b = ["lag_linear", "lag_lasso", "lag_nn_1", "lag_nn_2"]
for (mod, lag) in zip(a,b):
    for i in range(1,6):
        df_4[f"diff_{mod}_{i}"] = df_4[f"{mod}"] - df_4[f"{lag}_{i}"]

In [56]:

df_4.head()

Out[56]:

	Year	Country Name	Forex rate	Linear_Predict	Lasso_Predict	NN_1_Predict	NN_2_Predict	lag_linear_1	lag_lasso_1	lag_nn_1_1	...	diff_NN_1_Predict_1	diff_NN_1_Predict_2	diff_NN_1_Predict_3	diff_NN_1_Predict_4	diff_NN_1_Predict_5	diff_NN_2_Predict_1	diff_NN_2_Predict_2	diff_NN_2_Predict_3	diff_NN_2_Predict_4	diff_NN_2_Predict_5
15	2015-01-01	Aruba	0.558659	0.613864	0.618369	1.298603	0.235513	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
16	2016-01-01	Aruba	0.558659	0.595682	0.625260	0.885949	0.245483	0.613864	0.618369	1.298603	...	-0.412653	NaN	NaN	NaN	NaN	0.009970	NaN	NaN	NaN	NaN
17	2017-01-01	Aruba	0.558659	0.614444	0.640948	0.617596	0.065980	0.595682	0.625260	0.885949	...	-0.268353	-0.681006	NaN	NaN	NaN	-0.179503	-0.169533	NaN	NaN	NaN
18	2018-01-01	Aruba	0.558659	0.686940	0.660154	1.190500	0.478023	0.614444	0.640948	0.617596	...	0.572904	0.304551	-0.108102	NaN	NaN	0.412043	0.232540	0.242510	NaN	NaN
19	2019-01-01	Aruba	0.558659	0.708518	0.674375	0.500876	0.415586	0.686940	0.660154	1.190500	...	-0.689624	-0.116720	-0.385073	-0.797726	NaN	-0.062437	0.349606	0.170102	0.180073	NaN

5 rows × 47 columns

Now we make a different dataframe for each holding period.

In [57]:

c=['Year', 'Country Name', 'Forex rate']
for mod in a:
    d = f"diff_{mod}_1"
    c.append(d)

print(c)

['Year', 'Country Name', 'Forex rate', 'diff_Linear_Predict_1', 'diff_Lasso_Predict_1', 'diff_NN_1_Predict_1', 'diff_NN_2_Predict_1']

In [58]:

e=['Year', 'Country Name', 'Forex rate']
for mod in a:
    f = f"diff_{mod}_2"
    e.append(f)

print(e)

['Year', 'Country Name', 'Forex rate', 'diff_Linear_Predict_2', 'diff_Lasso_Predict_2', 'diff_NN_1_Predict_2', 'diff_NN_2_Predict_2']

In [59]:

g=['Year', 'Country Name', 'Forex rate']
for mod in a:
    h = f"diff_{mod}_3"
    g.append(h)

print(g)

['Year', 'Country Name', 'Forex rate', 'diff_Linear_Predict_3', 'diff_Lasso_Predict_3', 'diff_NN_1_Predict_3', 'diff_NN_2_Predict_3']

In [60]:

i=['Year', 'Country Name', 'Forex rate']
for mod in a:
    j = f"diff_{mod}_4"
    i.append(j)

print(i)

['Year', 'Country Name', 'Forex rate', 'diff_Linear_Predict_4', 'diff_Lasso_Predict_4', 'diff_NN_1_Predict_4', 'diff_NN_2_Predict_4']

In [61]:

k=['Year', 'Country Name', 'Forex rate']
for mod in a:
    l = f"diff_{mod}_5"
    k.append(l)

print(k)

['Year', 'Country Name', 'Forex rate', 'diff_Linear_Predict_5', 'diff_Lasso_Predict_5', 'diff_NN_1_Predict_5', 'diff_NN_2_Predict_5']

In [62]:

df_5 = df_4[c] #1 yr
df_6 = df_4[e] #2 yrs
df_7 = df_4[g] #3 yrs
df_8 = df_4[i] #4 yrs
df_9 = df_4[k] #5 yrs

In [63]:

df_5 = df_5[df_5['Year'].isin(['2015-01-01', '2016-01-01'])]
df_6 = df_6[df_6['Year'].isin(['2015-01-01', '2017-01-01'])]
df_7 = df_7[df_7['Year'].isin(['2015-01-01', '2018-01-01'])]
df_8 = df_8[df_8['Year'].isin(['2015-01-01', '2019-01-01'])]
df_9 = df_9[df_9['Year'].isin(['2015-01-01', '2020-01-01'])]

In [64]:

df_5.head()

Out[64]:

	Year	Country Name	Forex rate	diff_Linear_Predict_1	diff_Lasso_Predict_1	diff_NN_1_Predict_1	diff_NN_2_Predict_1
15	2015-01-01	Aruba	0.558659	NaN	NaN	NaN	NaN
16	2016-01-01	Aruba	0.558659	-0.018182	0.006891	-0.412653	0.009970
59	2015-01-01	Afghanistan	0.016355	NaN	NaN	NaN	NaN
60	2016-01-01	Afghanistan	0.014735	0.049683	-0.000586	0.047096	0.082432
103	2015-01-01	Angola	0.008329	NaN	NaN	NaN	NaN

Now we define a function which tells us whether to buy or short a currency based on whether the model predicts that the exchange rate is going to be higher or lower.

In [65]:

def buy_or_short(x):
    if x >= 0:
        return "Buy"
    elif x < 0:
        return "Short"
    else:
        return "NaN"

In [66]:

df_10 = df_5.copy()
df_10[['diff_Linear_Predict_1', 'diff_Lasso_Predict_1', 'diff_NN_1_Predict_1', 'diff_NN_2_Predict_1']] = df_10[['diff_Linear_Predict_1', 'diff_Lasso_Predict_1', 'diff_NN_1_Predict_1', 'diff_NN_2_Predict_1']].applymap(buy_or_short)
df_11 = df_6.copy()
df_11[['diff_Linear_Predict_2', 'diff_Lasso_Predict_2', 'diff_NN_1_Predict_2', 'diff_NN_2_Predict_2']] = df_11[['diff_Linear_Predict_2', 'diff_Lasso_Predict_2', 'diff_NN_1_Predict_2', 'diff_NN_2_Predict_2']].applymap(buy_or_short)
df_12 = df_7.copy()
df_12[['diff_Linear_Predict_3', 'diff_Lasso_Predict_3', 'diff_NN_1_Predict_3', 'diff_NN_2_Predict_3']] = df_12[['diff_Linear_Predict_3', 'diff_Lasso_Predict_3', 'diff_NN_1_Predict_3', 'diff_NN_2_Predict_3']].applymap(buy_or_short)
df_13 = df_8.copy()
df_13[['diff_Linear_Predict_4', 'diff_Lasso_Predict_4', 'diff_NN_1_Predict_4', 'diff_NN_2_Predict_4']] = df_13[['diff_Linear_Predict_4', 'diff_Lasso_Predict_4', 'diff_NN_1_Predict_4', 'diff_NN_2_Predict_4']].applymap(buy_or_short)
df_14 = df_9.copy()
df_14[['diff_Linear_Predict_5', 'diff_Lasso_Predict_5', 'diff_NN_1_Predict_5', 'diff_NN_2_Predict_5']] = df_14[['diff_Linear_Predict_5', 'diff_Lasso_Predict_5', 'diff_NN_1_Predict_5', 'diff_NN_2_Predict_5']].applymap(buy_or_short)

In [67]:

group = df_10.groupby('Country Name')
df_10['Forex rate lag'] = group['Forex rate'].shift(1)
df_10['Forex diff'] = df_10['Forex rate'] - df_10['Forex rate lag']

In [68]:

group = df_12.groupby('Country Name')
df_12['Forex rate lag'] = group['Forex rate'].shift(1)
df_12['Forex diff'] = df_12['Forex rate'] - df_12['Forex rate lag']

In [69]:

group = df_14.groupby('Country Name')
df_14['Forex rate lag'] = group['Forex rate'].shift(1)
df_14['Forex diff'] = df_14['Forex rate'] - df_14['Forex rate lag']

In [70]:

df_12.head(10)

Out[70]:

	Year	Country Name	Forex rate	diff_Linear_Predict_3	diff_Lasso_Predict_3	diff_NN_1_Predict_3	diff_NN_2_Predict_3	Forex rate lag	Forex diff
15	2015-01-01	Aruba	0.558659	NaN	NaN	NaN	NaN	NaN	NaN
18	2018-01-01	Aruba	0.558659	Buy	Buy	Short	Buy	0.558659	0.000000
59	2015-01-01	Afghanistan	0.016355	NaN	NaN	NaN	NaN	NaN	NaN
103	2015-01-01	Angola	0.008329	NaN	NaN	NaN	NaN	NaN	NaN
106	2018-01-01	Angola	0.003955	Buy	Short	Buy	Short	0.008329	-0.004374
125	2015-01-01	Albania	0.007939	NaN	NaN	NaN	NaN	NaN	NaN
128	2018-01-01	Albania	0.009260	Buy	Buy	Buy	Short	0.007939	0.001321
235	2015-01-01	Armenia	0.002092	NaN	NaN	NaN	NaN	NaN	NaN
238	2018-01-01	Armenia	0.002070	Buy	Buy	Buy	Buy	0.002092	-0.000022
279	2015-01-01	Antigua and Barbuda	0.370370	NaN	NaN	NaN	NaN	NaN	NaN

Now we can calculate the average rate of return for each model for a 1 year, 3 years and 5 years holding period.

In [71]:

#Average rate of return for Linear reg and holding the currency for 1 year when model says to buy

df_linear_1 = df_10[['Year', 'Country Name', 'diff_Linear_Predict_1', 'Forex rate','Forex rate lag', 'Forex diff']]

df_linear_1_buy = df_linear_1[df_linear_1['diff_Linear_Predict_1'].isin(['Buy'])]

df_linear_1_buy['Return_Rate'] = df_linear_1_buy['Forex diff']/df_linear_1_buy['Forex rate lag'] * 100

linear_1_return = df_linear_1_buy['Return_Rate'].mean()

print(f"The average return for the linear regression model and 1 year holding period is {linear_1_return}")

The average return for the linear regression model and 1 year holding period is -6.682001712313225

/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/1716736160.py:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_linear_1_buy['Return_Rate'] = df_linear_1_buy['Forex diff']/df_linear_1_buy['Forex rate lag'] * 100

In [72]:

#Average rate of return for Lasso reg and holding the currency for 1 year when model says to buy

df_lasso_1 = df_10[['Year', 'Country Name', 'diff_Lasso_Predict_1', 'Forex rate','Forex rate lag', 'Forex diff']]

df_lasso_1_buy = df_lasso_1[df_lasso_1['diff_Lasso_Predict_1'].isin(['Buy'])]

df_lasso_1_buy['Return_Rate'] = df_lasso_1_buy['Forex diff']/df_lasso_1_buy['Forex rate lag'] * 100

lasso_1_return = df_lasso_1_buy['Return_Rate'].mean()

print(f"The average return for the lasso regression model and 1 year holding period is {lasso_1_return}")

The average return for the lasso regression model and 1 year holding period is -2.2038092077826814

/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/339817703.py:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_lasso_1_buy['Return_Rate'] = df_lasso_1_buy['Forex diff']/df_lasso_1_buy['Forex rate lag'] * 100

In [73]:

#Average rate of return for 1st NN and holding the currency for 1 year when model says to buy

df_nn_1_1 = df_10[['Year', 'Country Name', 'diff_NN_1_Predict_1', 'Forex rate','Forex rate lag', 'Forex diff']]

df_nn_1_1_buy = df_nn_1_1[df_nn_1_1['diff_NN_1_Predict_1'].isin(['Buy'])]

df_nn_1_1_buy['Return_Rate'] = df_nn_1_1_buy['Forex diff']/df_nn_1_1_buy['Forex rate lag'] * 100

nn_1_1_return = df_nn_1_1_buy['Return_Rate'].mean()

print(f"The average return for the first neural network and 1 year holding period is {nn_1_1_return}")

The average return for the first neural network and 1 year holding period is -4.228952725618851

/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/2819496941.py:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_nn_1_1_buy['Return_Rate'] = df_nn_1_1_buy['Forex diff']/df_nn_1_1_buy['Forex rate lag'] * 100

In [74]:

#Average rate of return for 2nd NN and holding the currency for 1 year when model says to buy

df_nn_2_1 = df_10[['Year', 'Country Name', 'diff_NN_2_Predict_1', 'Forex rate','Forex rate lag', 'Forex diff']]

df_nn_2_1_buy = df_nn_2_1[df_nn_2_1['diff_NN_2_Predict_1'].isin(['Buy'])]

df_nn_2_1_buy['Return_Rate'] = df_nn_2_1_buy['Forex diff']/df_nn_2_1_buy['Forex rate lag'] * 100

nn_2_1_return = df_nn_2_1_buy['Return_Rate'].mean()

print(f"The average return for the second neural network and 1 year holding period is {nn_2_1_return}")

The average return for the second neural network and 1 year holding period is -5.661574799802205

/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/2864552994.py:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_nn_2_1_buy['Return_Rate'] = df_nn_2_1_buy['Forex diff']/df_nn_2_1_buy['Forex rate lag'] * 100

In [75]:

#Average rate of return for Linear reg and holding the currency for 3 years when model says to buy

df_linear_3 = df_12[['Year', 'Country Name', 'diff_Linear_Predict_3', 'Forex rate','Forex rate lag', 'Forex diff']]

df_linear_3_buy = df_linear_3[df_linear_3['diff_Linear_Predict_3'].isin(['Buy'])]

df_linear_3_buy['Return_Rate'] = df_linear_3_buy['Forex diff']/df_linear_3_buy['Forex rate lag'] * 100

linear_3_return = df_linear_3_buy['Return_Rate'].mean()

print(f"The average return for the linear regression model and 3 years holding period is {linear_3_return}")

The average return for the linear regression model and 3 years holding period is -3.979679534791286

/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/3889504242.py:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_linear_3_buy['Return_Rate'] = df_linear_3_buy['Forex diff']/df_linear_3_buy['Forex rate lag'] * 100

In [76]:

#Average rate of return for Lasso reg and holding the currency for 3 years when model says to buy

df_lasso_3 = df_12[['Year', 'Country Name', 'diff_Lasso_Predict_3', 'Forex rate','Forex rate lag', 'Forex diff']]

df_lasso_3_buy = df_lasso_3[df_lasso_3['diff_Lasso_Predict_3'].isin(['Buy'])]

df_lasso_3_buy['Return_Rate'] = df_lasso_3_buy['Forex diff']/df_lasso_3_buy['Forex rate lag'] * 100

lasso_3_return = df_lasso_3_buy['Return_Rate'].mean()

print(f"The average return for the lasso regression model and 3 years holding period is {lasso_3_return}")

The average return for the lasso regression model and 3 years holding period is -1.8921836297212749

/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/1695144407.py:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_lasso_3_buy['Return_Rate'] = df_lasso_3_buy['Forex diff']/df_lasso_3_buy['Forex rate lag'] * 100

In [77]:

#Average rate of return for 1st NN and holding the currency for 3 years when model says to buy

df_nn_1_3 = df_12[['Year', 'Country Name', 'diff_NN_1_Predict_3', 'Forex rate','Forex rate lag', 'Forex diff']]

df_nn_1_3_buy = df_nn_1_3[df_nn_1_3['diff_NN_1_Predict_3'].isin(['Buy'])]

df_nn_1_3_buy['Return_Rate'] = df_nn_1_3_buy['Forex diff']/df_nn_1_3_buy['Forex rate lag'] * 100

nn_1_3_return = df_nn_1_3_buy['Return_Rate'].mean()

print(f"The average return for the first neural network and 3 years holding period is {nn_1_3_return}")

The average return for the first neural network and 3 years holding period is -5.250983001440971

/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/2278624619.py:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_nn_1_3_buy['Return_Rate'] = df_nn_1_3_buy['Forex diff']/df_nn_1_3_buy['Forex rate lag'] * 100

In [78]:

#Average rate of return for 2nd NN and holding the currency for 3 years when model says to buy

df_nn_2_3 = df_12[['Year', 'Country Name', 'diff_NN_2_Predict_3', 'Forex rate','Forex rate lag', 'Forex diff']]

df_nn_2_3_buy = df_nn_2_3[df_nn_2_3['diff_NN_2_Predict_3'].isin(['Buy'])]

df_nn_2_3_buy['Return_Rate'] = df_nn_2_3_buy['Forex diff']/df_nn_2_3_buy['Forex rate lag'] * 100

nn_2_3_return = df_nn_2_3_buy['Return_Rate'].mean()

print(f"The average return for the second neural networl and 3 years holding period is {nn_2_3_return}")

The average return for the second neural networl and 3 years holding period is -5.548735188527842

/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/2694513607.py:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_nn_2_3_buy['Return_Rate'] = df_nn_2_3_buy['Forex diff']/df_nn_2_3_buy['Forex rate lag'] * 100

In [79]:

#Average rate of return for Linear reg and holding the currency for 5 years when model says to buy

df_linear_5 = df_14[['Year', 'Country Name', 'diff_Linear_Predict_5', 'Forex rate','Forex rate lag', 'Forex diff']]

df_linear_5_buy = df_linear_5[df_linear_5['diff_Linear_Predict_5'].isin(['Buy'])]

df_linear_5_buy['Return_Rate'] = df_linear_5_buy['Forex diff']/df_linear_5_buy['Forex rate lag'] * 100

linear_5_return = df_linear_5_buy['Return_Rate'].mean()

print(f"The average return for the linear regression model and 5 years holding period is {linear_5_return}")

The average return for the linear regression model and 5 years holding period is -17.218054184651734

/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/1940798184.py:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_linear_5_buy['Return_Rate'] = df_linear_5_buy['Forex diff']/df_linear_5_buy['Forex rate lag'] * 100

In [80]:

#Average rate of return for Lasso reg and holding the currency for 5 years when model says to buy

df_lasso_5 = df_14[['Year', 'Country Name', 'diff_Lasso_Predict_5', 'Forex rate','Forex rate lag', 'Forex diff']]

df_lasso_5_buy = df_lasso_5[df_lasso_5['diff_Lasso_Predict_5'].isin(['Buy'])]

df_lasso_5_buy['Return_Rate'] = df_lasso_5_buy['Forex diff']/df_lasso_5_buy['Forex rate lag'] * 100

lasso_5_return = df_lasso_5_buy['Return_Rate'].mean()

print(f"The average return for the lasso regression model and 5 years holding period is {lasso_5_return}")

The average return for the lasso regression model and 5 years holding period is -5.925652587401351

/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/885340572.py:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_lasso_5_buy['Return_Rate'] = df_lasso_5_buy['Forex diff']/df_lasso_5_buy['Forex rate lag'] * 100

In [81]:

#Average rate of return for 1st NN and holding the currency for 5 years when model says to buy

df_nn_1_5 = df_14[['Year', 'Country Name', 'diff_NN_1_Predict_5', 'Forex rate','Forex rate lag', 'Forex diff']]

df_nn_1_5_buy = df_nn_1_5[df_nn_1_5['diff_NN_1_Predict_5'].isin(['Buy'])]

df_nn_1_5_buy['Return_Rate'] = df_nn_1_5_buy['Forex diff']/df_nn_1_5_buy['Forex rate lag'] * 100

nn_1_5_return = df_nn_1_5_buy['Return_Rate'].mean()

print(f"The average return for the first neural network and 5 years holding period is {nn_1_5_return}")

The average return for the first neural network and 5 years holding period is -9.570827108863499

/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/555516999.py:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_nn_1_5_buy['Return_Rate'] = df_nn_1_5_buy['Forex diff']/df_nn_1_5_buy['Forex rate lag'] * 100

In [82]:

#Average rate of return for 2nd NN and holding the currency for 5 years when model says to buy

df_nn_2_5 = df_14[['Year', 'Country Name', 'diff_NN_2_Predict_5', 'Forex rate','Forex rate lag', 'Forex diff']]

df_nn_2_5_buy = df_nn_2_5[df_nn_2_5['diff_NN_2_Predict_5'].isin(['Buy'])]

df_nn_2_5_buy['Return_Rate'] = df_nn_2_5_buy['Forex diff']/df_nn_2_5_buy['Forex rate lag'] * 100

nn_2_5_return = df_nn_2_5_buy['Return_Rate'].mean()

print(f"The average return for the second neural network and 5 years holding period is {nn_2_5_return}")

The average return for the second neural network and 5 years holding period is -11.086691753890205

/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/1121885410.py:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_nn_2_5_buy['Return_Rate'] = df_nn_2_5_buy['Forex diff']/df_nn_2_5_buy['Forex rate lag'] * 100

Below we can see the plots of these returns.

In [83]:

#Preparing Returns plot
df_return_rates = pd.DataFrame({
    'Model':["Linear Model 1 Years Hold Return", "Lasso Model 1 Years Hold Return", "1st Neural Network 1 Years Hold Return", "2nd Neural Network 1 Years Hold Return",
            "Linear Model 3 Years Hold Return", "Lasso Model 3 Years Hold Return", "1st Neural Network 3 Years Hold Return", "2nd Neural Network 3 Years Hold Return",
            "Linear Model 5 Years Hold Return", "Lasso Model 5 Years Hold Return", "1st Neural Network 5 Years Hold Return", "2nd Neural Network 5 Years Hold Return"],
    'Return Rate': [linear_1_return, lasso_1_return, nn_1_1_return, nn_1_3_return,
                    linear_3_return, lasso_3_return, nn_1_3_return, nn_2_3_return,
                    linear_5_return, lasso_5_return, nn_1_5_return, nn_2_5_return]
})

In [84]:

df_return_rates

Out[84]:

	Model	Return Rate
0	Linear Model 1 Years Hold Return	-6.682002
1	Lasso Model 1 Years Hold Return	-2.203809
2	1st Neural Network 1 Years Hold Return	-4.228953
3	2nd Neural Network 1 Years Hold Return	-5.250983
4	Linear Model 3 Years Hold Return	-3.979680
5	Lasso Model 3 Years Hold Return	-1.892184
6	1st Neural Network 3 Years Hold Return	-5.250983
7	2nd Neural Network 3 Years Hold Return	-5.548735
8	Linear Model 5 Years Hold Return	-17.218054
9	Lasso Model 5 Years Hold Return	-5.925653
10	1st Neural Network 5 Years Hold Return	-9.570827
11	2nd Neural Network 5 Years Hold Return	-11.086692

In [85]:

df_return_rates_1 = df_return_rates.loc[df_return_rates['Model'].isin(["Linear Model 1 Years Hold Return", "Lasso Model 1 Years Hold Return", "1st Neural Network 1 Years Hold Return", "2nd Neural Network 1 Years Hold Return"])]
df_return_rates_3 = df_return_rates.loc[df_return_rates['Model'].isin(["Linear Model 3 Years Hold Return", "Lasso Model 3 Years Hold Return", "1st Neural Network 3 Years Hold Return", "2nd Neural Network 3 Years Hold Return"])]
df_return_rates_5 = df_return_rates.loc[df_return_rates['Model'].isin([ "Linear Model 5 Years Hold Return", "Lasso Model 5 Years Hold Return", "1st Neural Network 5 Years Hold Return", "2nd Neural Network 5 Years Hold Return"])]

In [86]:

#A plot for the rate of returns for the different ml models
# Create a chart object and specify the data source
chart_1 = alt.Chart(df_return_rates_1)
chart_3 = alt.Chart(df_return_rates_3)
chart_5 = alt.Chart(df_return_rates_5)

# Add a bar chart layer to display the average return rates for each model
bars_1 = chart_1.mark_bar().encode(
    y='Model:N',
    x='Return Rate:Q',
    tooltip=[alt.Tooltip('Model:N', title="Model"),
             alt.Tooltip('Return Rate:Q', title="Return Rate")]      
).properties(width=1000, height = 200, title = "1 Year Holding Period Models")

bars_3 = chart_3.mark_bar().encode(
    y='Model:N',
    x='Return Rate:Q',
    tooltip=[alt.Tooltip('Model:N', title="Model"),
             alt.Tooltip('Return Rate:Q', title="Return Rate")]   
).properties(width=1000, height = 200, title = "3 Year Holding Period Models")

bars_5 = chart_5.mark_bar().encode(
    y='Model:N',
    x='Return Rate:Q',
    tooltip=[alt.Tooltip('Model:N', title="Model"),
             alt.Tooltip('Return Rate:Q', title="Return Rate")]
).properties(width=1000, height = 200, title = "5 Year Holding Period Models")

# # Add a text layer to display the return rates as labels on the bars
# labels = bars.mark_text(
#     align='right',
#     baseline='bottom',
#     dy=-2
# ).encode(
#     text='Return Rate'
# )

# Combine the layers into a single chart and display it
alt.vconcat(bars_1, bars_3, bars_5).properties(title="Model Return Comparison")

/Users/uddhav/opt/anaconda3/envs/Py_2/lib/python3.8/site-packages/altair/utils/core.py:283: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for col_name, dtype in df.dtypes.iteritems():

Out[86]:

As we can see across all holding periods, the lasso regression performs the best. The mediocre performance of the neural networks could be because they are overfitting and therefore performing worse out of sample, we can see this in the mean squared error results too.

We can see from the plots above that the returns are negative for all our machine learning models, while they were positive for the PPP model.

It is hard to predict the exchange rate even in the long run. This project has shown that theory does perform better than machine learning (based on our measuring method) in predicting exchange rates.