Due to the COVID-19 pandemic, the national government decreed mandatory preventive isolation starting March 24, 2020. The measure initially ran until April 13 but was extended to May 11. From that date, an "intelligent preventive isolation" began, under which the first economic reopening measures were introduced. This phase was originally set to end on May 25 but was extended to May 31. However, on May 28 the mandatory preventive isolation was extended again, from June 1 to August 31, 2020, while keeping the economic reopening measures in place.
Throughout the application of the measure, media outlets and institutional bulletins have reported a reduction in air pollution levels in Bogotá. It has even been reported that, thanks to the decrease in particulate matter in the atmosphere, the peaks of the Nevado del Ruiz and the Nevado del Tolima can once again be seen from Bogotá.
Nevado del Tolima: @GiovannyPulido
Nevado del Ruiz: @salamancayesid
However, has the mandatory preventive isolation really helped improve air quality? And if so, by how much has air quality improved in Bogotá? To answer these questions, I took the data from all the air quality monitoring stations in Bogotá for the periods from January 1 to June 30 of 2019 and of 2020, in order to make a year-over-year comparison of daily averages.
The stations used are:
It should be clarified that not all stations measure the same parameters, so the parameters to be compared are:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.style as style
style.use('fivethirtyeight')
cs19 = pd.read_excel(r'2019\Carvajal-Sevillana.xlsx', header=0, na_values='----')
cs19.head()
cs19.info()
cs19['Fecha'] = cs19['Fecha'].str.replace('24:00', '00:00')
cs19['Fecha'] = pd.to_datetime(cs19.Fecha, format="%d-%m-%Y %H:%M")
cs19['day'] = cs19['Fecha'].dt.day
cs19['month'] = cs19['Fecha'].dt.month
cs19['year'] = cs19['Fecha'].dt.year
cs19['Monthday'] = (cs19['month'] * 100) + cs19['day']
cs19.head()
cs19.info()
Some of the pollutants have NaN values, let's find the percentage of NaN values
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']
for col in cols:
    values = cs19[col].isnull().sum()
    length = len(cs19)
    perc = round((values / length) * 100, 2)
    print(col + ': ' + str(perc) + '% NaN values')
Once the Excel file was loaded into a DataFrame, the empty values indicated with '----' were replaced by NaN. The time was also changed from 24:00 to 00:00, because pandas expects hours in the range 00:00 to 23:00.
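One caveat worth noting: a plain string replace assigns 24:00 to the start of the *same* day, while in hourly station exports 24:00 usually means midnight at the *end* of the day. The sketch below, on a hypothetical two-row sample (the dates are made up), shows one way those rows could be rolled forward if the strict timestamp were needed:

```python
import pandas as pd

# Hypothetical two-row sample mimicking the station export format.
raw = pd.Series(['01-01-2019 23:00', '01-01-2019 24:00'])

# A plain string replace maps 24:00 onto 00:00 of the SAME day...
naive = pd.to_datetime(raw.str.replace('24:00', '00:00'),
                       format='%d-%m-%Y %H:%M')

# ...but 24:00 denotes midnight at the END of the day, so those rows
# can be shifted forward by one day to restore the intended timestamp:
is_midnight = raw.str.endswith('24:00')
fixed = naive + pd.to_timedelta(is_midnight.astype(int), unit='D')
```

For daily averages the effect is small (one hour lands on the neighbouring day), which is why the simple replace is kept in the analysis itself.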
A column called Monthday was created to make graphing easier. The day was extracted from the original date column so the data could be grouped to obtain a 24-hour average, since in the air quality network's system the 24-hour average can appear as empty if even a single hourly value in the day is missing. This column will also help organize the graphs.
A Month column was also created, in which the month was transformed from a number to its name. This will also help organize the charts.
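The two derived columns can be sketched on a toy pair of dates (hypothetical values, not from the dataset). The integer key month*100 + day sorts chronologically within a year, and pandas can also produce month names directly, as an alternative to mapping numbers through a dict by hand:

```python
import pandas as pd

# Hypothetical dates standing in for the Fecha column.
dates = pd.to_datetime(pd.Series(['2019-01-15', '2019-06-02']))

# month*100 + day yields a sortable integer key such as 115 or 602.
monthday = dates.dt.month * 100 + dates.dt.day

# pandas can also produce month names directly, an alternative to
# mapping numbers through a dict by hand:
names = dates.dt.month_name()
```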
It can be seen that none of the columns of interest have complete data. The most critical case is SO2, for which 37.5% of the data is missing. This is due to problems with the equipment in the monitoring network: the sensor may have been damaged, or the data may not have been transmitted from the station to the server. Whatever the cause, it is an indicator that the stations are not working properly, and it means that for SO2 the sample is not representative.
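As a side note, the same percentages can be obtained without an explicit loop. A minimal sketch on a toy frame (the values are made up, not station data):

```python
import pandas as pd
import numpy as np

# Toy frame standing in for the station data (values are made up).
df = pd.DataFrame({'PM10': [10.0, np.nan, 12.0, np.nan],
                   'SO2': [np.nan, np.nan, np.nan, 1.0]})

# isnull() yields booleans; the column-wise mean of those booleans is
# exactly the fraction of missing rows, so no explicit loop is needed.
nan_pct = (df.isnull().mean() * 100).round(2)
```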
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']
fig, ax = plt.subplots(1, 8, figsize=(25, 6))
fig.suptitle('Distribution of pollutants')
for n, value in enumerate(cols):
    sns.boxplot(data=cs19[value], ax=ax[n])
    ax[n].set_title(value)
I generated a box plot for each pollutant, hoping to find a way to fill the NaN data. I am not going to touch the outliers, because these might be concentrations that really were high due to different anomalies: wildfires or building fires, wind influence, traffic. In the end, that is exactly what I am going to compare: the influence of a huge anomaly against normal data.
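For reference, the points a box plot draws as outliers follow the standard 1.5×IQR rule. A minimal sketch on a toy series (hypothetical readings) showing how those points could be identified programmatically:

```python
import pandas as pd

# Toy pollutant series with one extreme reading (hypothetical values).
s = pd.Series([10.0, 12.0, 11.0, 13.0, 95.0])

# The usual boxplot rule flags points beyond 1.5*IQR from the quartiles.
q1, q3 = s.quantile([0.25, 0.75])
iqr = q3 - q1
outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]
```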
I decided to use the overall mean of each pollutant to fill its NaN values.
A quick note on the concentration of each pollutant:
Both PM10 and PM2.5 measure particulate matter under 10 µm and 2.5 µm, respectively. With most of the data close to or above the maximum allowed level, it is possible to say that the air quality in Bogotá is not good; however, a more thorough analysis is needed.
I am not going to repeat this exploration for every station, just for this one. The final analysis will follow the same structure, and I will also explain the health and environmental impact of each pollutant.
As I said, I am going to use the mean of each pollutant to fill its NaN values.
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']
for col in cols:
    mean = round(cs19[col].mean(), 1)
    cs19[col] = cs19[col].fillna(mean)
cs19.info()
To make the analysis, I decided to calculate the daily mean value for each pollutant at each station. I could have downloaded the daily values from the Air Quality Network, but since some hours have NaN values the system does not calculate the mean for those days, so I would have had even more NaN values and the sample might not have been representative.
To calculate the mean value for each day, I am going to group by Monthday and apply the mean as an aggregation function.
After that, I am going to create a Month column and transform the month numbers into names.
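The difference between the network's behaviour and this approach can be sketched on a toy frame (hypothetical readings): `groupby().mean()` skips NaN, so a day with a gap still gets an average, whereas the network's stricter rule would leave it empty:

```python
import pandas as pd
import numpy as np

# Toy hourly readings: day 101 has a gap, day 102 is complete.
df = pd.DataFrame({'Monthday': [101, 101, 101, 102, 102, 102],
                   'PM10': [40.0, np.nan, 44.0, 50.0, 52.0, 54.0]})

# groupby().mean() skips NaN, so day 101 still gets a value (42.0).
daily = df.groupby('Monthday')['PM10'].mean()

# To mimic the network's stricter rule (a day with any gap is empty),
# an aggregation that demands complete hours could be used instead:
strict = df.groupby('Monthday')['PM10'].agg(
    lambda s: s.mean() if s.notna().all() else np.nan)
```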
cs19 = round(cs19.groupby('Monthday').mean(numeric_only=True), 2).reset_index()
cs19.head()
cs19['Month'] = (cs19['Monthday'] // 100)
months = {1:'January', 2:'February', 3:'March', 4:'April', 5:'May', 6:'June'}
#Remember that i'm only analyzing for the first 6 months of each year
cs19['Month'] = cs19['Month'].map(months)
cs19.head()
cs20 = pd.read_excel(r'2020\Carvajal-Sevillana.xlsx', na_values='----')
cs20['Fecha'] = cs20['Fecha'].str.replace('24:00', '00:00')
cs20['Fecha'] = pd.to_datetime(cs20.Fecha, format="%d-%m-%Y %H:%M")
cs20['day'] = cs20['Fecha'].dt.day
cs20['month'] = cs20['Fecha'].dt.month
cs20['year'] = cs20['Fecha'].dt.year
cs20['Monthday'] = (cs20['month'] * 100) + cs20['day']
cs20.info()
In the 2020 data from the Carvajal-Sevillana station we find problems similar to those in the 2019 data: some pollutants have several NaN values that affect the sample. Let's see the percentage of NaN data per pollutant.
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']
for col in cols:
    values = cs20[col].isnull().sum()
    length = len(cs20)
    perc = round((values / length) * 100, 2)
    print(col + ': ' + str(perc) + '% NaN values')
The 2020 data is worse than I thought, and worse than the 2019 data. The percentage of NaN values increased for most pollutants, such as NO, NOX, NO2 and CO, although for SO2 it actually decreased. Again, I will fill the NaN data with the mean of each pollutant.
There might be different explanations for this increase besides equipment failures and maintenance. 2020 is the beginning of the new Mayor's four-year term, and there might have been politics involved, but this is just a thought.
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']
for col in cols:
    mean = round(cs20[col].mean(), 1)
    cs20[col] = cs20[col].fillna(mean)
cs20.info()
cs20 = round(cs20.groupby('Monthday').mean(numeric_only=True), 2).reset_index()
cs20['Month'] = (cs20['Monthday'] // 100)
months = {1:'January', 2:'February', 3:'March', 4:'April', 5:'May', 6:'June'}
cs20['Month'] = cs20['Month'].map(months)
cs20.head()
After cleaning both datasets, we can compare the pollutant levels year over year.
months_order = ['January', 'February', 'March', 'April', 'May', 'June']
# Event markers: (day, label, color) for the months where something happened.
events = {'March': (24, 'Isolation', 'green'),
          'May': (11, 'Intelligent Isolation', 'green'),
          'June': (24, 'Sahara Dust', 'yellow')}
fig, axes = plt.subplots(6, 1, figsize=(15, 20))
fig.suptitle('PM10 2019 Vs 2020')
for ax, month in zip(axes, months_order):
    m19 = cs19[cs19['Month'] == month]
    m20 = cs20[cs20['Month'] == month]
    ax.plot(m19['day'], m19['PM10'], label='2019')
    ax.plot(m20['day'], m20['PM10'], label='2020')
    if month in events:
        day, label, color = events[month]
        ax.axvline(day, label=label, c=color)
    ax.set_title(month)
    ax.legend()
In the months before the declaration of mandatory preventive isolation, PM10 levels behaved similarly, with some exceptions: at the end of January and the beginning of February, the 2019 levels were higher. A search for events during this period showed that in those days there was an air quality alert due to wildfires.
At the beginning of the mandatory preventive isolation the levels were almost the same, but over the following month they decreased. There were a few days when the levels were higher; these days coincide with the first economic reopening measures. Since the intelligent isolation began, the levels have fluctuated but remain lower than last year's.
June shows almost the same behavior, with the exception of the period between June 22nd and 26th. During those days the Saharan dust plume moved through the region, which caused an increase in particulate matter levels.
months_order = ['January', 'February', 'March', 'April', 'May', 'June']
events = {'March': (24, 'Isolation', 'green'),
          'May': (11, 'Intelligent Isolation', 'green'),
          'June': (24, 'Sahara Dust', 'yellow')}
fig, axes = plt.subplots(6, 1, figsize=(15, 20))
fig.suptitle('PM2.5 2019 Vs 2020')
for ax, month in zip(axes, months_order):
    m19 = cs19[cs19['Month'] == month]
    m20 = cs20[cs20['Month'] == month]
    ax.plot(m19['day'], m19['PM2.5'], label='2019')
    ax.plot(m20['day'], m20['PM2.5'], label='2020')
    if month in events:
        day, label, color = events[month]
        ax.axvline(day, label=label, c=color)
    ax.set_title(month)
    ax.legend()
PM2.5 behaves similarly to PM10: 2020 values tend to be lower, with some days where the levels were higher. However, since the declaration of isolation the levels decreased considerably. This happens because most of the sources of these particulates are vehicle emissions, fuel combustion, dust, and other human activities. Again, the levels increase considerably during the passage of the Saharan dust.
There is a flat line for almost 10 days in June; this is probably due to a high concentration of NaN values that were filled with the mean.
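That flat stretch is a known artifact of mean imputation. A minimal sketch on a toy series (hypothetical readings) of an alternative that would avoid it:

```python
import pandas as pd

# Toy series with a two-hour gap, standing in for PM2.5 readings.
s = pd.Series([20.0, None, None, 26.0])

# Mean imputation flattens every gap to one constant value...
filled_mean = s.fillna(round(s.mean(), 1))

# ...while linear interpolation follows the trend between neighbours,
# avoiding the artificial flat stretches visible in the June plot.
filled_interp = s.interpolate()
```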
months_order = ['January', 'February', 'March', 'April', 'May', 'June']
events = {'March': (24, 'Isolation', 'green'),
          'May': (11, 'Intelligent Isolation', 'green'),
          'June': (24, 'Sahara Dust', 'yellow')}
fig, axes = plt.subplots(6, 1, figsize=(15, 20))
fig.suptitle('Ozone 2019 Vs 2020')
for ax, month in zip(axes, months_order):
    m19 = cs19[cs19['Month'] == month]
    m20 = cs20[cs20['Month'] == month]
    ax.plot(m19['day'], m19['OZONO'], label='2019')
    ax.plot(m20['day'], m20['OZONO'], label='2020')
    if month in events:
        day, label, color = events[month]
        ax.axvline(day, label=label, c=color)
    ax.set_title(month)
    ax.legend()