USD - EUR: Exchange Rates 1999 - 2021

Data Set - About

The data set was downloaded from Kaggle on January 2021. It was put together by Daria Chemkaeva using data from the European Central Bank.

The data set is a collection of historic exchange rates from 1999-2021 of the EUR against other currencies, including the USD.

USD = USD exchange rate * EUR

Data Set - First Peek

Let's see what we are working with here!

\'Period\Unit:\' column:

  • could be used as the row index
  • contains a date in format YYYY-MM-DD
  • values are stored as strings

\'[US dollar ]\' column:

  • format is f.loat (ie. 1.2345)
  • has no missing values
  • values are stored as strings.
In [1]:
import pandas as pd
x_rates = pd.read_csv("euro-daily-hist_1999_2020.csv")
x_rates.head()
Out[1]:
Period\Unit: [Australian dollar ] [Bulgarian lev ] [Brazilian real ] [Canadian dollar ] [Swiss franc ] [Chinese yuan renminbi ] [Cypriot pound ] [Czech koruna ] [Danish krone ] ... [Romanian leu ] [Russian rouble ] [Swedish krona ] [Singapore dollar ] [Slovenian tolar ] [Slovak koruna ] [Thai baht ] [Turkish lira ] [US dollar ] [South African rand ]
0 2021-01-08 1.5758 1.9558 6.5748 1.5543 1.0827 7.9184 NaN 26.163 7.4369 ... 4.8708 90.8000 10.0510 1.6228 NaN NaN 36.8480 9.0146 1.2250 18.7212
1 2021-01-07 1.5836 1.9558 6.5172 1.5601 1.0833 7.9392 NaN 26.147 7.4392 ... 4.8712 91.2000 10.0575 1.6253 NaN NaN 36.8590 8.9987 1.2276 18.7919
2 2021-01-06 1.5824 1.9558 6.5119 1.5640 1.0821 7.9653 NaN 26.145 7.4393 ... 4.8720 90.8175 10.0653 1.6246 NaN NaN 36.9210 9.0554 1.2338 18.5123
3 2021-01-05 1.5927 1.9558 6.5517 1.5651 1.0803 7.9315 NaN 26.227 7.4387 ... 4.8721 91.6715 10.0570 1.6180 NaN NaN 36.7760 9.0694 1.2271 18.4194
4 2021-01-04 1.5928 1.9558 6.3241 1.5621 1.0811 7.9484 NaN 26.141 7.4379 ... 4.8713 90.3420 10.0895 1.6198 NaN NaN 36.7280 9.0579 1.2296 17.9214

5 rows × 41 columns

In [2]:
x_rates.tail()
Out[2]:
Period\Unit: [Australian dollar ] [Bulgarian lev ] [Brazilian real ] [Canadian dollar ] [Swiss franc ] [Chinese yuan renminbi ] [Cypriot pound ] [Czech koruna ] [Danish krone ] ... [Romanian leu ] [Russian rouble ] [Swedish krona ] [Singapore dollar ] [Slovenian tolar ] [Slovak koruna ] [Thai baht ] [Turkish lira ] [US dollar ] [South African rand ]
5694 1999-01-08 1.8406 NaN NaN 1.7643 1.6138 NaN 0.58187 34.938 7.4433 ... 1.3143 27.2075 9.1650 1.9537 188.8400 42.560 42.5590 0.3718 1.1659 6.7855
5695 1999-01-07 1.8474 NaN NaN 1.7602 1.6165 NaN 0.58187 34.886 7.4431 ... 1.3092 26.9876 9.1800 1.9436 188.8000 42.765 42.1678 0.3701 1.1632 6.8283
5696 1999-01-06 1.8820 NaN NaN 1.7711 1.6116 NaN 0.58200 34.850 7.4452 ... 1.3168 27.4315 9.3050 1.9699 188.7000 42.778 42.6949 0.3722 1.1743 6.7307
5697 1999-01-05 1.8944 NaN NaN 1.7965 1.6123 NaN 0.58230 34.917 7.4495 ... 1.3168 26.5876 9.4025 1.9655 188.7750 42.848 42.5048 0.3728 1.1790 6.7975
5698 1999-01-04 1.9100 NaN NaN 1.8004 1.6168 NaN 0.58231 35.107 7.4501 ... 1.3111 25.2875 9.4696 1.9554 189.0450 42.991 42.6799 0.3723 1.1789 6.9358

5 rows × 41 columns

In [3]:
x_rates.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5699 entries, 0 to 5698
Data columns (total 41 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Period\Unit:              5699 non-null   object 
 1   [Australian dollar ]      5699 non-null   object 
 2   [Bulgarian lev ]          5297 non-null   object 
 3   [Brazilian real ]         5431 non-null   object 
 4   [Canadian dollar ]        5699 non-null   object 
 5   [Swiss franc ]            5699 non-null   object 
 6   [Chinese yuan renminbi ]  5431 non-null   object 
 7   [Cypriot pound ]          2346 non-null   object 
 8   [Czech koruna ]           5699 non-null   object 
 9   [Danish krone ]           5699 non-null   object 
 10  [Estonian kroon ]         3130 non-null   object 
 11  [UK pound sterling ]      5699 non-null   object 
 12  [Greek drachma ]          520 non-null    object 
 13  [Hong Kong dollar ]       5699 non-null   object 
 14  [Croatian kuna ]          5431 non-null   object 
 15  [Hungarian forint ]       5699 non-null   object 
 16  [Indonesian rupiah ]      5699 non-null   object 
 17  [Israeli shekel ]         5431 non-null   object 
 18  [Indian rupee ]           5431 non-null   object 
 19  [Iceland krona ]          3292 non-null   float64
 20  [Japanese yen ]           5699 non-null   object 
 21  [Korean won ]             5699 non-null   object 
 22  [Lithuanian litas ]       4159 non-null   object 
 23  [Latvian lats ]           3904 non-null   object 
 24  [Maltese lira ]           2346 non-null   object 
 25  [Mexican peso ]           5699 non-null   object 
 26  [Malaysian ringgit ]      5699 non-null   object 
 27  [Norwegian krone ]        5699 non-null   object 
 28  [New Zealand dollar ]     5699 non-null   object 
 29  [Philippine peso ]        5699 non-null   object 
 30  [Polish zloty ]           5699 non-null   object 
 31  [Romanian leu ]           5637 non-null   float64
 32  [Russian rouble ]         5699 non-null   object 
 33  [Swedish krona ]          5699 non-null   object 
 34  [Singapore dollar ]       5699 non-null   object 
 35  [Slovenian tolar ]        2085 non-null   object 
 36  [Slovak koruna ]          2608 non-null   object 
 37  [Thai baht ]              5699 non-null   object 
 38  [Turkish lira ]           5637 non-null   float64
 39  [US dollar ]              5699 non-null   object 
 40  [South African rand ]     5699 non-null   object 
dtypes: float64(3), object(38)
memory usage: 1.8+ MB

Data Set - Clean Up 1

To be able to work the the data more easily I'm going to change a a few column names and convert some values into different data types.

time column will contain datetime values

In [4]:
# rename the columns - I prefer all lowercase and snakecase
new_col_names={'[US dollar ]':'us_dollar',
              'Period\\Unit:':'time'}
x_rates=x_rates.rename(columns=new_col_names)

#convert time string to datetime
x_rates['time'] = pd.to_datetime(x_rates['time'])

#sort the rows based on the time and re-index from the top
x_rates.sort_values('time', inplace=True)
x_rates.reset_index(drop=True, inplace=True)

Data Set - Clean Up 2

I want to make changes so that:

us_dollar column will contain float values

Some rows have '-' as a value which need to be removed from the data set.

I noticed the rows had '-' values because of the error traceback when trying to convert the entire column to float.

I checked for placeholder values using value_count() of the us_dollar series and found 62 occurances.

Another good check, though, is the describe method. In the end I only included this check because it resulted in a more descriptive before and after column comparision.

In [5]:
# check for dummy data: '-' used for missing values
print("USD column values BEFORE")
print(x_rates['us_dollar'].describe())
print()

# remove the dummy data and create a dataframe of just time and us_dollar
x_usd = x_rates.loc[x_rates['us_dollar'] != '-', ["time", "us_dollar"]].copy()
x_usd['us_dollar'] = x_usd['us_dollar'].astype(float)

#check the result
print("USD column values AFTER")
print(x_usd['us_dollar'].describe())
USD column values BEFORE
count     5699
unique    3528
top          -
freq        62
Name: us_dollar, dtype: object

USD column values AFTER
count    5637.000000
mean        1.199704
std         0.162328
min         0.825200
25%         1.098200
50%         1.202300
75%         1.320300
max         1.599000
Name: us_dollar, dtype: float64

Disclaimer on Variable Names

Because I tried to work ahead of the guidance I ended up with different variable names.

I am using my names if they are accurately descriptive and shorter ("Thank you!" -my wrists).

In this analysis x_usd is equivalent to dataframe euro_to_dollar in the instructions

Data Visualizations

Initial Plot Line

To start I will plot the us_dollar exchange rate against each time value (date).

In [6]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.plot(x_usd['time'], x_usd['us_dollar'])
plt.show()

Plot Line of the Rolling Mean

Plotting each of the data points results in a rugged plot line. To smooth the plot line and increase the data-ink ratio, I will apply a rolling mean of 30 days. This will facilitate longer-term trends to emerge from the daily fluctuations.

In [7]:
x_usd['rolling_mean'] = x_usd['us_dollar'].rolling(30).mean()

plt.plot(x_usd['time'], x_usd['rolling_mean'])
plt.show()

USD - EUR vs CAD - EUR Exchange Rate Comparision

Since the dataset includes multiple currencies, I would like to compare the historical USD-EUR exchange rates with CAD-EUR and visualize how closely the CAD performace follows the USD performance.

This is the focus of the data story I am interested in sharing insights about.

CAD - EUR Data Prep

This dataset will be prepared similarly to the USD data set.

In [8]:
new_col_names={'[Canadian dollar ]':'ca_dollar'}
x_rates=x_rates.rename(columns=new_col_names)

print("CAD column values BEFORE")
print(x_rates['ca_dollar'].describe())
print()

# remove the dummy data and create a dataframe of just time and us_dollar
x_cad = x_rates.loc[x_rates['ca_dollar'] != '-', ["time", "ca_dollar"]].copy()
x_cad['ca_dollar'] = x_cad['ca_dollar'].astype(float)

#heck the result
print("CAD column values AFTER")
print(x_cad['ca_dollar'].describe())
CAD column values BEFORE
count     5699
unique    2931
top          -
freq        62
Name: ca_dollar, dtype: object

CAD column values AFTER
count    5637.000000
mean        1.469395
std         0.099561
min         1.213900
25%         1.398700
50%         1.467600
75%         1.545300
max         1.812300
Name: ca_dollar, dtype: float64

CAD - EUR Rolling Mean Plot Line

The initial plot line of the CAD exchange rate rolling mean has a different shape than the USD plot. With the exception of a matching dip in 2002, no clear correlation is obvious.

In [9]:
x_cad['rolling_mean'] = x_cad['ca_dollar'].rolling(30).mean()

plt.plot(x_cad['time'], x_cad['rolling_mean'])
plt.show()

USD - and CAD - EUR Exchange Rates Plotted Together

Initial focus is to compare the shape of each plot. To visualize this I'm going to superimpose the separate plots on the same plot.

In [10]:
plt.plot(x_usd['time'],x_usd['rolling_mean'], label="USD", color='green')
plt.plot(x_cad['time'],x_cad['rolling_mean'], label='CAD', color='red')
plt.legend(title="Destination\nCurrency", loc='lower center', bbox_to_anchor=(0.515, 0.05))
plt.xlabel("Time")
plt.ylabel("Exchange Rate From EUR")
plt.title("On Par EUR to USD and CAD Exchange Rates 2008-2012")

#x coordinate needs to be a datetime object to match x-axis values
# the rest of values came by fiddling
plt.axvline(x=pd.to_datetime("2010-06-01"), ymin=0.595, ymax=0.72, linewidth=80, color='grey', alpha=0.2)
plt.show()

Gestalt Principles on Axes Plots

OK well that was fun but now I need to redo it all as an axes to be able to remove the frame / border

In [11]:
fig, (ax, ax2) = plt.subplots(figsize=(6,8), nrows=2, ncols=1)

ax.plot(x_usd['time'],x_usd['rolling_mean'], label="USD", color='green')
ax.plot(x_cad['time'],x_cad['rolling_mean'], label='CAD', color='red')
ax.legend(title="Destination\nCurrency", loc='lower center', bbox_to_anchor=(0.515, 0.05), frameon=False, title_fontsize="small")

remove = ["top", "right"]
for side in remove :
    ax.spines[side].set_visible(False)

# failed attempt at changing the date ticks ... 
# abandoned due to lack of need
# year_ticks = [pd.to_datetime("2000-01-01"), 
#               pd.to_datetime("2004-01-01"), 
#               pd.to_datetime("2008-01-01"), 
#               pd.to_datetime("2012-01-01"), 
#               pd.to_datetime("2016-01-01"), 
#               pd.to_datetime("2012-01-01")]
#ax.set_xticks(year_ticks)

ax.set_xlabel("Time")
ax.set_ylabel("Exchange Rate From EUR")
#x coordinate needs to be a datetime object to match x-axis values
# the rest of values came by fiddling
ax.axvline(x=pd.to_datetime("2010-06-01"), ymin=0.595, ymax=0.72, linewidth=80, color='grey', alpha=0.2)
ax.set_title("On Par EUR to USD and CAD Exchange Rates 2008-2013")
Out[11]:
Text(0.5, 1.0, 'On Par EUR to USD and CAD Exchange Rates 2008-2013')

Examining Plot Shapes

I was originally comparing shape when I discovered this anomaly of EUR-CAD rate being on par with the EUR-USD rate.

To look at the shape similarities outside of the abnormal 2008-2012 period, I will re-baseline the CAD values so that the lowest CAD rate matches the lowest USD rate.

I am doing this in a very simple way - I'm taking the difference between the minimum values of the USD and CAD rates and adding it to all the CAD rates. The result will be all the CAD rate values being shifted up.

The overlapping plots may reveal something interesting about the usual correlation between these EUR exchange rate fluctuations.

In [12]:
min_usd = x_usd["rolling_mean"].min()
min_cad = x_cad["rolling_mean"].min()
#normally cad return is higher than usd return when changing euros
min_diff = round(min_cad-min_usd,2)
x_cad["baselined"] = x_cad["rolling_mean"] - min_diff
x_cad["baselined"].describe()
Out[12]:
count    5608.000000
mean        1.088403
std         0.096622
min         0.851237
25%         1.018259
50%         1.086870
75%         1.166223
max         1.360690
Name: baselined, dtype: float64
In [13]:
x_usd["rolling_mean"].describe()
Out[13]:
count    5608.000000
mean        1.199748
std         0.161895
min         0.849797
25%         1.101631
50%         1.204373
75%         1.319978
max         1.574333
Name: rolling_mean, dtype: float64

Create Subsets for Plotting

In [14]:
#start1 = x_usd['time'].iloc[0]
end1 = pd.to_datetime("2007-01-01")
start2 = pd.to_datetime("2014-01-01")
#end2 = x_usd['time'].iloc[-1]

#first section
sub_usd1 = x_usd.copy()
sub_usd1 = sub_usd1[sub_usd1['time'] <= end1]
sub_cad1 = x_cad.copy()
sub_cad1 = sub_cad1[sub_cad1['time'] <= end1]

#last section
sub_usd2 = x_usd.copy()
sub_usd2 = sub_usd2[sub_usd2['time'] >= start2]
sub_cad2 = x_cad.copy()
sub_cad2 = sub_cad2[sub_cad2['time'] >= start2]

#middle section
sub_usd3 = x_usd.copy()
sub_usd3 = sub_usd3[sub_usd3['time'] >= end1]
sub_usd3 = sub_usd3[sub_usd3['time'] <= start2]
sub_cad3 = x_cad.copy()
sub_cad3 = sub_cad3[sub_cad3['time'] >= end1]
sub_cad3 = sub_cad3[sub_cad3['time'] <= start2]
In [17]:
fig, (ax, ax2) = plt.subplots(figsize=(6,8), nrows=2, ncols=1)

ax.plot(x_usd['time'],x_usd['rolling_mean'], 
#        label="USD", 
        color='green', alpha=0.2)
ax.plot(x_cad['time'],x_cad['rolling_mean'], 
#        label='CAD', 
        color='red', alpha=0.2)

ax.plot(sub_cad3['time'], sub_cad3['rolling_mean'],
         label="CAD", 
         color='red', alpha=0.8)

ax.plot(sub_usd3['time'], sub_usd3['rolling_mean'], 
         label="USD", 
         color='green', alpha=0.8)

ax.legend(title="Destination\nCurrency", 
          loc='lower center', 
          bbox_to_anchor=(0.515, 0.05), 
          #frameon=False, 
          framealpha=0.8, 
          title_fontsize="small")

remove = ["top", "right"]
for side in remove :
    ax.spines[side].set_visible(False)

#ax.set_xlabel("Time")
ax.set_ylabel("Exchange Rate From EUR")

#ax.axvline(x=pd.to_datetime("2010-06-01"), ymin=0.595, ymax=0.72, linewidth=80, color='grey', alpha=0.2)
ax.set_title("On Par EUR to USD and CAD Exchange Rates 2008-2013")

ax2.plot(x_usd['time'],x_usd['rolling_mean'], 
         color='green', alpha=0.2)
ax2.plot(x_cad['time'],x_cad['baselined'], 
         color='red', alpha=0.2)

ax2.plot(sub_usd1['time'], sub_usd1['rolling_mean'], 
         label="USD", 
         color='green', alpha=0.8)
ax2.plot(sub_usd2['time'], sub_usd2['rolling_mean'], 
         color='green', alpha=0.8)

ax2.plot(sub_cad1['time'], sub_cad1['baselined'],
         label="CAD Baselined", 
         color='red', alpha=0.8)
ax2.plot(sub_cad2['time'], sub_cad2['baselined'], 
         color='red', alpha=0.8)

ax2.legend(title="Destination\nCurrency", 
           bbox_to_anchor=(0.515, 0.85), 
           loc="upper center",
           title_fontsize="small", 
           facecolor="white", 
           framealpha=0.8, 
           fontsize="small")

remove = ["top", "right"]
for side in remove :
    ax2.spines[side].set_visible(False)

ax2.set_xlabel("Time")
ax2.set_ylabel("Exchange Rate From EUR")
#ax2.axvline(x=pd.to_datetime("2010-06-01"), 
#            ymin=0.05, ymax=.85, 
#            linewidth=94, 
#            color='grey', alpha=0.2)
ax2.set_title("USD and CAD Exchange Rates Otherwise Trend Together")
ax2.set_yticks([.9,1.1,1.3,1.5])
ax2.set_yticklabels([])

plt.plot()
Out[17]:
[]

Conclusions

The EUR - USD rate is typically stronger than the EUR - CAD rate. Nevertheless, the EUR - CAD rate tends to fluctuate in parallel with the EUR - USD. This was visually depicted by moving the entire EUR-CAD plot up to superimpose it over the EUR - USD exchange rates.

An exception to their fluctuations being just in parallel occurred in 2008 when the US Housing Market crashed and during the restabiliztion years after.

For this period the EUR to USD rate significantly dropped to nearly match the EUR to CAD rate with the exception of a short-lived strong rebound in the immediate wake of the crisis.

Areas for Improvement

  1. Improve application of Gestalt principles, particularly whether enclosure is needed to isolate the area of interest on the graphs
  2. Figure out why ax.text() method would prevent output and replace .title() with a title using ax.text() method
  3. Add space between the 2 plots
  4. Reconsider how to baseline the CAD data against the USD. Might involve creating USD baseline values
  5. Cut the top off the 2nd plot where only faint points reach these values. There is too much dead space there right now.