Due to the COVID-19 pandemic, the national government decreed mandatory preventive isolation starting March 24, 2020. The measure initially ran until April 13 but was extended to May 11. On that date an "intelligent preventive isolation" began, with the first steps toward economic reopening. It was originally set to end on May 25, then extended to May 31; on May 28 it was extended again, from June 1 to August 31, 2020, while keeping the reopening measures in place.
Throughout the application of the measure, media outlets and institutional bulletins have reported a reduction in air pollution levels in Bogotá. It has even been reported that, thanks to the decrease in particulate matter in the atmosphere, the peaks of Nevado del Ruiz and Nevado del Tolima can be seen from Bogotá again.
Nevado del Tolima: @GiovannyPulido
Nevado del Ruiz: @salamancayesid
However, has the mandatory preventive isolation measure really helped improve air quality? And if so, how much has air quality improved in Bogotá? To answer these questions, I took the data from all the air quality monitoring stations in Bogotá for the periods from January 1, 2019 to June 30, 2019 and from January 1, 2020 to June 30, 2020, in order to make a year-over-year comparison of daily averages.
The stations used are:
It should be clarified that not all stations measure the same parameters, so the parameters to be compared are:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.style as style
style.use('fivethirtyeight')
cs19 = pd.read_excel('2019/Carvajal-Sevillana.xlsx', header=0, na_values='----')
cs19.head()
Fecha | PM10 | OZONO | CO | SO2 | NO | NO2 | NOX | PM2.5 | |
---|---|---|---|---|---|---|---|---|---|
0 | 01-01-2019 01:00 | 24.1 | 9.7 | 0.5 | 0.3 | 10.9 | 11.8 | 22.7 | 19.7 |
1 | 01-01-2019 02:00 | 46.7 | 3.5 | 0.9 | 1.3 | 19.0 | 16.9 | 35.9 | 23.2 |
2 | 01-01-2019 03:00 | 98.3 | 2.2 | 0.8 | 0.2 | 10.1 | 18.8 | 28.9 | 37.9 |
3 | 01-01-2019 04:00 | 54.6 | 2.6 | 0.9 | 0.0 | 22.4 | 18.0 | 40.4 | 38.9 |
4 | 01-01-2019 05:00 | 46.1 | 2.0 | 0.7 | NaN | 11.9 | 18.7 | 30.6 | 35.0 |
cs19.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4320 entries, 0 to 4319
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Fecha   4320 non-null   object
 1   PM10    3784 non-null   float64
 2   OZONO   4304 non-null   float64
 3   CO      4015 non-null   float64
 4   SO2     2700 non-null   float64
 5   NO      4186 non-null   float64
 6   NO2     4185 non-null   float64
 7   NOX     4186 non-null   float64
 8   PM2.5   4097 non-null   float64
dtypes: float64(8), object(1)
memory usage: 303.9+ KB
cs19['Fecha'] = cs19['Fecha'].str.replace('24:00', '00:00')
cs19['Fecha'] = pd.to_datetime(cs19.Fecha, format="%d-%m-%Y %H:%M")
cs19['day'] = cs19['Fecha'].dt.day
cs19['month'] = cs19['Fecha'].dt.month
cs19['year'] = cs19['Fecha'].dt.year
cs19['Monthday'] = (cs19['month'] * 100) + cs19['day']
cs19.head()
Fecha | PM10 | OZONO | CO | SO2 | NO | NO2 | NOX | PM2.5 | day | month | year | Monthday | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2019-01-01 01:00:00 | 24.1 | 9.7 | 0.5 | 0.3 | 10.9 | 11.8 | 22.7 | 19.7 | 1 | 1 | 2019 | 101 |
1 | 2019-01-01 02:00:00 | 46.7 | 3.5 | 0.9 | 1.3 | 19.0 | 16.9 | 35.9 | 23.2 | 1 | 1 | 2019 | 101 |
2 | 2019-01-01 03:00:00 | 98.3 | 2.2 | 0.8 | 0.2 | 10.1 | 18.8 | 28.9 | 37.9 | 1 | 1 | 2019 | 101 |
3 | 2019-01-01 04:00:00 | 54.6 | 2.6 | 0.9 | 0.0 | 22.4 | 18.0 | 40.4 | 38.9 | 1 | 1 | 2019 | 101 |
4 | 2019-01-01 05:00:00 | 46.1 | 2.0 | 0.7 | NaN | 11.9 | 18.7 | 30.6 | 35.0 | 1 | 1 | 2019 | 101 |
cs19.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4320 entries, 0 to 4319
Data columns (total 13 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Fecha     4320 non-null   datetime64[ns]
 1   PM10      3784 non-null   float64
 2   OZONO     4304 non-null   float64
 3   CO        4015 non-null   float64
 4   SO2       2700 non-null   float64
 5   NO        4186 non-null   float64
 6   NO2       4185 non-null   float64
 7   NOX       4186 non-null   float64
 8   PM2.5     4097 non-null   float64
 9   day       4320 non-null   int64
 10  month     4320 non-null   int64
 11  year      4320 non-null   int64
 12  Monthday  4320 non-null   int64
dtypes: datetime64[ns](1), float64(8), int64(4)
memory usage: 438.9 KB
Some of the pollutants have NaN values; let's find the percentage of NaN values per column.
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']
for col in cols:
    # fraction of missing hourly records per pollutant
    nans = cs19[col].isnull().sum()
    perc = round((nans / len(cs19)) * 100, 2)
    print(col + ': ' + str(perc) + '% NaN values')
PM10: 12.41% NaN values
PM2.5: 5.16% NaN values
NO: 3.1% NaN values
NOX: 3.1% NaN values
NO2: 3.12% NaN values
CO: 7.06% NaN values
SO2: 37.5% NaN values
OZONO: 0.37% NaN values
Once the Excel file was loaded into a DataFrame, the empty values indicated with '----' were replaced by NaN. The hour 24:00 was also changed to 00:00, because pandas only accepts hours from 00:00 to 23:00.
A column called Monthday was created to make grouping and plotting easier: the day and month were extracted from the original date column so the data can be grouped into 24-hour averages. I compute these averages myself because, in the air quality network's system, the 24-hour average can appear empty if even a single hourly value of the day is missing. This column will also help organize the charts.
Later on, a Month column is created in which the month number is turned into the month's name. This will also help organize the charts.
None of the columns of interest have complete data. The most critical case is SO2, with 37.5% of the data missing. This is due to problems with the equipment on the monitoring network: the sensor may have been damaged, or the data may not have been sent from the station to the server. Whatever the cause, it is an indicator that the stations are not working properly, and it means that for SO2 the sample isn't representative.
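As an aside, the NaN-percentage loop above can be collapsed into a single vectorized expression: the mean of the boolean NaN mask is exactly the fraction of missing values. A quick sketch on a toy frame (the `toy` data here is made up for illustration):

```python
import numpy as np
import pandas as pd

# toy frame with gaps, standing in for a station export
toy = pd.DataFrame({'PM10': [24.1, np.nan, 98.3, np.nan],
                    'SO2': [0.3, 1.3, np.nan, 0.0]})

# mean of the boolean mask = fraction of NaN values per column
nan_pct = (toy.isnull().mean() * 100).round(2)
print(nan_pct['PM10'], nan_pct['SO2'])  # 50.0 25.0
```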
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']
fig, ax = plt.subplots(1, 8, figsize=(25, 6))
fig.suptitle('Distribution of pollutants')
for n, value in enumerate(cols):
    sns.boxplot(data=cs19[value], ax=ax[n])
    ax[n].set_title(value)
I generated a box plot for each pollutant, hoping to find a way to fill the NaN data. I'm not going to touch the outliers, because these might be concentrations that really were that high due to different anomalies: wildfires or building fires, wind influence, traffic. In the end, that is exactly what I'm going to compare: the influence of a huge anomaly against normal data.
I decided to fill the NaN data of each pollutant with that pollutant's overall mean.
Also, a quick read of each pollutant's concentrations:
Both PM10 and PM2.5 measure particulate matter under 10 µm and 2.5 µm respectively. With most of the data close to or over the maximum allowed level, it is fair to say that air quality in Bogotá isn't good, although a more thorough analysis is needed.
I'm not going to repeat this exploration for every station, just for this one. The final analysis will follow this structure, and I'll also explain the health and environmental impact of each pollutant.
As I said, I'm going to use the mean of each pollutant to fill its NaN values.
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']
for col in cols:
    # fill each pollutant's gaps with its own overall mean
    mean = round(cs19[col].mean(), 1)
    cs19[col] = cs19[col].fillna(mean)
cs19.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4320 entries, 0 to 4319
Data columns (total 13 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Fecha     4320 non-null   datetime64[ns]
 1   PM10      4320 non-null   float64
 2   OZONO     4320 non-null   float64
 3   CO        4320 non-null   float64
 4   SO2       4320 non-null   float64
 5   NO        4320 non-null   float64
 6   NO2       4320 non-null   float64
 7   NOX       4320 non-null   float64
 8   PM2.5     4320 non-null   float64
 9   day       4320 non-null   int64
 10  month     4320 non-null   int64
 11  year      4320 non-null   int64
 12  Monthday  4320 non-null   int64
dtypes: datetime64[ns](1), float64(8), int64(4)
memory usage: 438.9 KB
To make the analysis, I decided to calculate the daily mean value of each pollutant at each station. I could have downloaded the daily means from the Air Quality Network, but since some hours have NaN values the system doesn't calculate the mean for those days, so I might have ended up with even more NaN values and a sample that isn't representative.
To calculate the mean value for each day, I'm going to group by Monthday and apply mean as the aggregate function.
After that, I'm going to create a Month column with the month as a name.
cs19 = round(cs19.groupby('Monthday').mean(numeric_only=True), 2).reset_index()
cs19.head()
Monthday | PM10 | OZONO | CO | SO2 | NO | NO2 | NOX | PM2.5 | day | month | year | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 101 | 41.36 | 11.23 | 0.67 | 2.06 | 19.47 | 12.17 | 31.65 | 22.05 | 1 | 1 | 2019 |
1 | 102 | 42.44 | 10.75 | 0.81 | 1.58 | 37.05 | 15.13 | 52.17 | 19.58 | 2 | 1 | 2019 |
2 | 103 | 42.41 | 4.96 | 1.11 | 1.40 | 58.27 | 16.69 | 74.93 | 22.82 | 3 | 1 | 2019 |
3 | 104 | 52.11 | 4.58 | 1.21 | 3.09 | 61.28 | 16.77 | 78.08 | 30.83 | 4 | 1 | 2019 |
4 | 105 | 64.16 | 7.28 | 1.29 | 3.50 | 60.05 | 19.23 | 79.31 | 29.38 | 5 | 1 | 2019 |
cs19['Month'] = (cs19['Monthday'] // 100)
months = {1:'January', 2:'February', 3:'March', 4:'April', 5:'May', 6:'June'}
#Remember that I'm only analyzing the first 6 months of each year
cs19['Month'] = cs19['Month'].map(months)
cs19.head()
Monthday | PM10 | OZONO | CO | SO2 | NO | NO2 | NOX | PM2.5 | day | month | year | Month | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 101 | 41.36 | 11.23 | 0.67 | 2.06 | 19.47 | 12.17 | 31.65 | 22.05 | 1 | 1 | 2019 | January |
1 | 102 | 42.44 | 10.75 | 0.81 | 1.58 | 37.05 | 15.13 | 52.17 | 19.58 | 2 | 1 | 2019 | January |
2 | 103 | 42.41 | 4.96 | 1.11 | 1.40 | 58.27 | 16.69 | 74.93 | 22.82 | 3 | 1 | 2019 | January |
3 | 104 | 52.11 | 4.58 | 1.21 | 3.09 | 61.28 | 16.77 | 78.08 | 30.83 | 4 | 1 | 2019 | January |
4 | 105 | 64.16 | 7.28 | 1.29 | 3.50 | 60.05 | 19.23 | 79.31 | 29.38 | 5 | 1 | 2019 | January |
cs19.set_index('day')
Monthday | PM10 | OZONO | CO | SO2 | NO | NO2 | NOX | PM2.5 | month | year | Month | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
day | ||||||||||||
1 | 101 | 41.36 | 11.23 | 0.67 | 2.06 | 19.47 | 12.17 | 31.65 | 22.05 | 1 | 2019 | January |
2 | 102 | 42.44 | 10.75 | 0.81 | 1.58 | 37.05 | 15.13 | 52.17 | 19.58 | 1 | 2019 | January |
3 | 103 | 42.41 | 4.96 | 1.11 | 1.40 | 58.27 | 16.69 | 74.93 | 22.82 | 1 | 2019 | January |
4 | 104 | 52.11 | 4.58 | 1.21 | 3.09 | 61.28 | 16.77 | 78.08 | 30.83 | 1 | 2019 | January |
5 | 105 | 64.16 | 7.28 | 1.29 | 3.50 | 60.05 | 19.23 | 79.31 | 29.38 | 1 | 2019 | January |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
25 | 625 | 61.89 | 5.39 | 1.21 | 6.95 | 62.59 | 26.28 | 88.84 | 28.35 | 6 | 2019 | June |
26 | 626 | 67.61 | 6.42 | 1.38 | 3.28 | 75.26 | 23.28 | 98.55 | 32.37 | 6 | 2019 | June |
27 | 627 | 65.70 | 9.25 | 1.29 | 3.61 | 49.66 | 22.26 | 71.10 | 35.72 | 6 | 2019 | June |
28 | 628 | 51.09 | 5.82 | 1.58 | 6.37 | 48.80 | 20.90 | 69.80 | 29.08 | 6 | 2019 | June |
29 | 629 | 48.99 | 4.92 | 1.60 | 6.51 | 48.80 | 20.90 | 69.80 | 28.70 | 6 | 2019 | June |
180 rows × 12 columns
cs20 = pd.read_excel('2020/Carvajal-Sevillana.xlsx', na_values='----')
cs20['Fecha'] = cs20['Fecha'].str.replace('24:00', '00:00')
cs20['Fecha'] = pd.to_datetime(cs20.Fecha, format="%d-%m-%Y %H:%M")
cs20['day'] = cs20['Fecha'].dt.day
cs20['month'] = cs20['Fecha'].dt.month
cs20['year'] = cs20['Fecha'].dt.year
cs20['Monthday'] = (cs20['month'] * 100) + cs20['day']
cs20.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4344 entries, 0 to 4343
Data columns (total 13 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Fecha     4344 non-null   datetime64[ns]
 1   PM10      4021 non-null   float64
 2   OZONO     3965 non-null   float64
 3   CO        3863 non-null   float64
 4   SO2       4295 non-null   float64
 5   NO        2644 non-null   float64
 6   NO2       2643 non-null   float64
 7   NOX       2644 non-null   float64
 8   PM2.5     4077 non-null   float64
 9   day       4344 non-null   int64
 10  month     4344 non-null   int64
 11  year      4344 non-null   int64
 12  Monthday  4344 non-null   int64
dtypes: datetime64[ns](1), float64(8), int64(4)
memory usage: 441.3 KB
In the 2020 data from the Carvajal-Sevillana station we find problems similar to the 2019 data: some pollutants have many NaN values that affect the sample. Let's see the percentage of NaN data per pollutant.
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']
for col in cols:
    nans = cs20[col].isnull().sum()
    perc = round((nans / len(cs20)) * 100, 2)
    print(col + ': ' + str(perc) + '% NaN values')
PM10: 7.44% NaN values
PM2.5: 6.15% NaN values
NO: 39.13% NaN values
NOX: 39.13% NaN values
NO2: 39.16% NaN values
CO: 11.07% NaN values
SO2: 1.13% NaN values
OZONO: 8.72% NaN values
The 2020 data is worse than I thought, and worse than the 2019 data: the percentage of NaN values increased for most pollutants, notably NO, NOX, NO2 and CO (SO2 actually improved). Again I will fill the gaps with mean values; here I use the 2019 means, so both years are filled from the same baseline.
There might be explanations for this increase besides equipment failures and maintenance. 2020 is the start of the new Mayor's four-year term, and there might have been politics involved, but that is just speculation.
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']
for col in cols:
    # fill the 2020 gaps with the 2019 means, so both years share a baseline
    mean = round(cs19[col].mean(), 1)
    cs20[col] = cs20[col].fillna(mean)
cs20.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4344 entries, 0 to 4343
Data columns (total 13 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Fecha     4344 non-null   datetime64[ns]
 1   PM10      4344 non-null   float64
 2   OZONO     4344 non-null   float64
 3   CO        4344 non-null   float64
 4   SO2       4344 non-null   float64
 5   NO        4344 non-null   float64
 6   NO2       4344 non-null   float64
 7   NOX       4344 non-null   float64
 8   PM2.5     4344 non-null   float64
 9   day       4344 non-null   int64
 10  month     4344 non-null   int64
 11  year      4344 non-null   int64
 12  Monthday  4344 non-null   int64
dtypes: datetime64[ns](1), float64(8), int64(4)
memory usage: 441.3 KB
cs20 = round(cs20.groupby('Monthday').mean(numeric_only=True), 2).reset_index()
cs20['Month'] = (cs20['Monthday'] // 100)
months = {1:'January', 2:'February', 3:'March', 4:'April', 5:'May', 6:'June'}
cs20['Month'] = cs20['Month'].map(months)
cs20.head()
Monthday | PM10 | OZONO | CO | SO2 | NO | NO2 | NOX | PM2.5 | day | month | year | Month | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 101 | 42.09 | 16.60 | 0.58 | 1.48 | 48.8 | 20.9 | 69.8 | 37.42 | 1 | 1 | 2020 | January |
1 | 102 | 34.49 | 16.87 | 0.60 | 1.50 | 48.8 | 20.9 | 69.8 | 21.43 | 2 | 1 | 2020 | January |
2 | 103 | 45.45 | 16.27 | 0.98 | 3.08 | 48.8 | 20.9 | 69.8 | 26.83 | 3 | 1 | 2020 | January |
3 | 104 | 51.84 | 21.17 | 0.90 | 4.13 | 48.8 | 20.9 | 69.8 | 32.92 | 4 | 1 | 2020 | January |
4 | 105 | 29.48 | 21.40 | 0.48 | 1.59 | 48.8 | 20.9 | 69.8 | 16.38 | 5 | 1 | 2020 | January |
After cleaning both datasets, it's time to compare them, pollutant by pollutant.
fig = plt.figure(figsize=(15,20))
fig.suptitle('PM10 2019 Vs 2020')
ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['PM10'], label='2019')
ax1.plot(jan20['day'], jan20['PM10'], label='2020')
ax1.set_title('January')
plt.legend()
ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['PM10'], label='2019')
ax2.plot(feb20['day'], feb20['PM10'], label='2020')
ax2.set_title('February')
plt.legend()
ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['PM10'], label='2019')
ax3.plot(mar20['day'], mar20['PM10'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March')
plt.legend()
ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['PM10'], label='2019')
ax4.plot(apr20['day'], apr20['PM10'], label='2020')
ax4.set_title('April')
plt.legend()
ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['PM10'], label='2019')
ax5.plot(may20['day'], may20['PM10'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May')
plt.legend()
ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['PM10'], label='2019')
ax6.plot(jun20['day'], jun20['PM10'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()
In the months before mandatory preventive isolation was declared, PM10 behaved similarly in both years, with some exceptions: at the end of January and the beginning of February, the 2019 levels were higher. A search for events during that period showed an air quality alert caused by wildfires in those days.
At the beginning of the mandatory preventive isolation the levels were almost the same; then, for almost a month, they decreased, with a few higher days that coincide with the first economic reopening measures. Since the intelligent isolation began, the levels move up and down but stay below last year's.
June shows almost the same behavior, except for the period between June 22nd and 26th, when Saharan dust moved through the region and caused an increase in particulate matter levels.
fig = plt.figure(figsize=(15,20))
fig.suptitle('PM2.5 2019 Vs 2020')
ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['PM2.5'], label='2019')
ax1.plot(jan20['day'], jan20['PM2.5'], label='2020')
ax1.set_title('January')
plt.legend()
ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['PM2.5'], label='2019')
ax2.plot(feb20['day'], feb20['PM2.5'], label='2020')
ax2.set_title('February')
plt.legend()
ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['PM2.5'], label='2019')
ax3.plot(mar20['day'], mar20['PM2.5'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March')
plt.legend()
ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['PM2.5'], label='2019')
ax4.plot(apr20['day'], apr20['PM2.5'], label='2020')
ax4.set_title('April')
plt.legend()
ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['PM2.5'], label='2019')
ax5.plot(may20['day'], may20['PM2.5'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May')
plt.legend()
ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['PM2.5'], label='2019')
ax6.plot(jun20['day'], jun20['PM2.5'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()
PM2.5 behaves similarly to PM10: the 2020 values tend to be lower, with a few days where they were higher, and since the declaration of isolation the levels decreased considerably. This happens because the main sources of these particles are vehicle emissions, fuel combustion and dust, among others. Again, the levels rise considerably during the passage of the Saharan dust.
There's a flat line for almost 10 days in June; this is probably due to a stretch of NaN data that was filled with mean values.
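Flat stretches like that are a direct artifact of the constant mean fill. A gentler option for a future iteration is time-based interpolation, which fills each gap from its neighbouring hours; a minimal sketch on a made-up hourly series:

```python
import numpy as np
import pandas as pd

# made-up hourly series with a one-hour gap
s = pd.Series([10.0, np.nan, 14.0],
              index=pd.date_range('2019-01-01', periods=3, freq='H'))

# interpolate linearly along the time index instead of a constant mean fill
filled = s.interpolate(method='time')
print(filled.tolist())  # roughly [10.0, 12.0, 14.0]
```

For long gaps (like the 10 days above) this would still draw a straight line, but short gaps keep the shape of the day instead of snapping to the overall mean.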
fig = plt.figure(figsize=(15,20))
fig.suptitle('Ozone 2019 Vs 2020')
ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['OZONO'], label='2019')
ax1.plot(jan20['day'], jan20['OZONO'], label='2020')
ax1.set_title('January')
plt.legend()
ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['OZONO'], label='2019')
ax2.plot(feb20['day'], feb20['OZONO'], label='2020')
ax2.set_title('February')
plt.legend()
ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['OZONO'], label='2019')
ax3.plot(mar20['day'], mar20['OZONO'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March')
plt.legend()
ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['OZONO'], label='2019')
ax4.plot(apr20['day'], apr20['OZONO'], label='2020')
ax4.set_title('April')
plt.legend()
ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['OZONO'], label='2019')
ax5.plot(may20['day'], may20['OZONO'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May')
plt.legend()
ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['OZONO'], label='2019')
ax6.plot(jun20['day'], jun20['OZONO'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()
Ozone is a secondary pollutant: it isn't emitted directly by vehicles or fuel combustion, but is formed through photochemical reactions in the atmosphere between primary pollutants (NOx, SOx, VOCs) and sunlight.
Ozone levels are high on hot days with low humidity, when the wind is light or the air is stagnant. Looking at the period between the mandatory isolation and the intelligent isolation, the levels were higher than last year: with little traffic in the city the air was stagnant, and the weather in those days was hot. After the intelligent isolation, which is really an economic reopening, the levels are only slightly higher; there is traffic, but not as much as on a normal day without the pandemic.
Note: VOCs (COVs in Spanish) are volatile organic compounds.
fig = plt.figure(figsize=(15,20))
fig.suptitle('CO 2019 Vs 2020')
ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['CO'], label='2019')
ax1.plot(jan20['day'], jan20['CO'], label='2020')
ax1.set_title('January')
plt.legend()
ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['CO'], label='2019')
ax2.plot(feb20['day'], feb20['CO'], label='2020')
ax2.set_title('February')
plt.legend()
ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['CO'], label='2019')
ax3.plot(mar20['day'], mar20['CO'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March')
plt.legend()
ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['CO'], label='2019')
ax4.plot(apr20['day'], apr20['CO'], label='2020')
ax4.set_title('April')
plt.legend()
ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['CO'], label='2019')
ax5.plot(may20['day'], may20['CO'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May')
plt.legend()
ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['CO'], label='2019')
ax6.plot(jun20['day'], jun20['CO'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()
Carbon monoxide is produced by the incomplete combustion of carbon-containing compounds when there isn't enough oxygen to generate CO2, as in an engine's combustion chamber. The levels were almost the same before the isolation measure, then decreased, and after the intelligent isolation they are slightly lower; again, there was hardly any traffic during the isolation, and after the intelligent isolation the traffic is still below that of a normal day.
fig = plt.figure(figsize=(15,20))
fig.suptitle('SO2 2019 Vs 2020')
ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['SO2'], label='2019')
ax1.plot(jan20['day'], jan20['SO2'], label='2020')
ax1.set_title('January')
plt.legend()
ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['SO2'], label='2019')
ax2.plot(feb20['day'], feb20['SO2'], label='2020')
ax2.set_title('February')
plt.legend()
ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['SO2'], label='2019')
ax3.plot(mar20['day'], mar20['SO2'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March')
plt.legend()
ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['SO2'], label='2019')
ax4.plot(apr20['day'], apr20['SO2'], label='2020')
ax4.set_title('April')
plt.legend()
ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['SO2'], label='2019')
ax5.plot(may20['day'], may20['SO2'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May')
plt.legend()
ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['SO2'], label='2019')
ax6.plot(jun20['day'], jun20['SO2'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()
Sulfur dioxide is released by the combustion of coal, oil, diesel or gas. Most of the vehicle fleet in Bogotá runs on petrol; the exceptions are public transport, cargo vehicles and some vehicles such as pickups. Bogotá, like most cities in the world, regulates the transit of cargo vehicles through the city during the day, limiting it to certain hours to reduce environmental and traffic impacts.
The levels dropped even before the isolation measure, but I can't say much more, because this pollutant wasn't measured during most of 2019.
fig = plt.figure(figsize=(15,20))
fig.suptitle('NO 2019 Vs 2020')
ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['NO'], label='2019')
ax1.plot(jan20['day'], jan20['NO'], label='2020')
ax1.set_title('January')
plt.legend()
ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['NO'], label='2019')
ax2.plot(feb20['day'], feb20['NO'], label='2020')
ax2.set_title('February')
plt.legend()
ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['NO'], label='2019')
ax3.plot(mar20['day'], mar20['NO'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March')
plt.legend()
ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['NO'], label='2019')
ax4.plot(apr20['day'], apr20['NO'], label='2020')
ax4.set_title('April')
plt.legend()
ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['NO'], label='2019')
ax5.plot(may20['day'], may20['NO'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May')
plt.legend()
ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['NO'], label='2019')
ax6.plot(jun20['day'], jun20['NO'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()
Before the isolation measure, at least in January, the levels were similar for part of the month; then we have uncertainty in the data. Between the mandatory isolation and the intelligent isolation the levels were lower. Since the declaration of intelligent isolation the levels have increased slightly, but most remain below last year's.
This compound is generated by fuel combustion, mainly from traffic and cargo vehicles.
fig = plt.figure(figsize=(15,20))
fig.suptitle('NOx 2019 Vs 2020')
ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['NOX'], label='2019')
ax1.plot(jan20['day'], jan20['NOX'], label='2020')
ax1.set_title('January')
plt.legend()
ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['NOX'], label='2019')
ax2.plot(feb20['day'], feb20['NOX'], label='2020')
ax2.set_title('February')
plt.legend()
ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['NOX'], label='2019')
ax3.plot(mar20['day'], mar20['NOX'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March')
plt.legend()
ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['NOX'], label='2019')
ax4.plot(apr20['day'], apr20['NOX'], label='2020')
ax4.set_title('April')
plt.legend()
ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['NOX'], label='2019')
ax5.plot(may20['day'], may20['NOX'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May')
plt.legend()
ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['NOX'], label='2019')
ax6.plot(jun20['day'], jun20['NOX'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()
NOx is a combined measure of NO and NO2; around 54% of the emissions of this pollutant come from transportation fuels. In January the levels were slightly higher this year than last year. Between the mandatory isolation and the intelligent isolation the levels were lower, although on some days they were similar. This trend continued after the intelligent isolation.
fig = plt.figure(figsize=(15,20))
fig.suptitle('NO2 2019 Vs 2020')
ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['NO2'], label='2019')
ax1.plot(jan20['day'], jan20['NO2'], label='2020')
ax1.set_title('January')
plt.legend()
ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['NO2'], label='2019')
ax2.plot(feb20['day'], feb20['NO2'], label='2020')
ax2.set_title('February')
plt.legend()
ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['NO2'], label='2019')
ax3.plot(mar20['day'], mar20['NO2'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March')
plt.legend()
ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['NO2'], label='2019')
ax4.plot(apr20['day'], apr20['NO2'], label='2020')
ax4.set_title('April')
plt.legend()
ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['NO2'], label='2019')
ax5.plot(may20['day'], may20['NO2'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May')
plt.legend()
ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['NO2'], label='2019')
ax6.plot(jun20['day'], jun20['NO2'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()
This pollutant is the product of the oxidation of NO; it's also an enhancer of particulate matter, especially PM2.5, and when it reacts with UV light it's a precursor of ozone. In January there were high concentrations of NO2, just like NO, and the values were higher than in 2019. Then we have a period of uncertainty in the data. Between the mandatory isolation and the intelligent isolation the levels were lower, at times with the same behavior as in 2019. Since the declaration of intelligent isolation there has been an up-and-down pattern. This might be because the oxidation of NO into NO2 takes time, and when the NO2 reacts with UV light the ozone levels grow, as we saw in the ozone graph.
Air quality is important for human life. The COVID-19 pandemic (a respiratory disease) has forced us to take very restrictive measures, such as months of mandatory isolation, to flatten the contagion curve. This has impacts on health, the economy and the environment.
Here we found that most of the criteria pollutants had lower levels than last year during the mandatory isolation period, with the exception of ozone, which increased mostly because the air was stagnant due to the lack or reduction of traffic in the city. This pollutant can cause cough, asthma and bronchitis, and high exposure can produce permanent lung damage.
This might be a huge problem, especially in low-income areas with high air pollution, like the one where this station is located.
def cleaning_function(df):
    #Transform dates
    df['Fecha'] = df['Fecha'].str.replace('24:00', '00:00')
    df['Fecha'] = pd.to_datetime(df.Fecha, format="%d-%m-%Y %H:%M")
    df['day'] = df['Fecha'].dt.day
    df['month'] = df['Fecha'].dt.month
    df['year'] = df['Fecha'].dt.year
    df['Monthday'] = (df['month'] * 100) + df['day']
    #Filling NaN with each pollutant's own mean (skip the date column)
    for col in df.columns.drop('Fecha'):
        mean = round(df[col].mean(), 1)
        df[col] = df[col].fillna(mean)
    #Grouping by day to get the daily means
    df = round(df.groupby('Monthday').mean(numeric_only=True), 2).reset_index()
    #Month column with the month name
    df['Month'] = (df['Monthday'] // 100)
    months = {1:'January', 2:'February', 3:'March', 4:'April', 5:'May', 6:'June'}
    df['Month'] = df['Month'].map(months)
    return df
Another thing I should do is find the days with levels over the maximum allowed limit in both years and compare them.
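A first sketch of that check, with a helper name of my own (exceedance_days) and 24-hour limits written from memory of Resolución 2254 de 2017, so the exact values are worth double-checking:

```python
import pandas as pd

# assumed 24-hour limits in µg/m³ (verify against Resolución 2254 de 2017)
LIMITS = {'PM10': 75, 'PM2.5': 37}

def exceedance_days(daily_df, pollutant):
    """Count how many daily means exceed the 24-hour limit."""
    return int((daily_df[pollutant] > LIMITS[pollutant]).sum())

# toy daily means to show the idea; the real input would be cs19 or cs20
daily = pd.DataFrame({'PM10': [60.0, 80.0, 90.0], 'PM2.5': [20.0, 40.0, 30.0]})
print(exceedance_days(daily, 'PM10'), exceedance_days(daily, 'PM2.5'))  # 2 1
```

Running it on cs19 and cs20 and comparing the two counts per pollutant would quantify the improvement in one number per year.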
I think a better way to plot concentration vs. time is to use the real dates from the system. To do that I would need to extract the day and month from the date column and rebuild them in a new column as a datetime type, but I haven't worked out how to do it yet.
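One possible approach, sketched on a toy frame: rebuild a date from the month and day columns using a common dummy year, so both years land on the same datetime axis and matplotlib can format the ticks. The plot_date name is mine, and 2000 is an arbitrary choice (being a leap year, it also accepts February 29):

```python
import pandas as pd

# toy frame with the month/day columns the cleaning step already produces
toy = pd.DataFrame({'month': [1, 2, 6], 'day': [15, 3, 24]})

# pandas assembles datetimes from a frame that has year/month/day columns,
# so add a constant dummy year and convert
toy['plot_date'] = pd.to_datetime(toy.assign(year=2000)[['year', 'month', 'day']])
print(toy['plot_date'].dt.strftime('%d-%b').tolist())
```

Plotting `plot_date` on the x axis for both years would then replace the day-number axis used in the charts above.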