Due to the COVID-19 pandemic, the national government decreed mandatory preventive isolation starting March 24, 2020. The measure initially ran until April 13 but was extended to May 11. From that date, an "intelligent preventive isolation" began, under which the first economic reopening measures were introduced. This phase was originally set to end on May 25 but was extended to May 31. However, on May 28 the mandatory preventive isolation was extended again, from June 1 to August 31, 2020, while keeping the economic reopening measures in place.
Throughout the application of the measure, media outlets and institutional bulletins have reported a reduction in air pollution levels in Bogotá. It has even been reported that, thanks to the decrease in particulate matter in the atmosphere, the peaks of the Nevado del Ruiz and the Nevado del Tolima can once again be seen from Bogotá.
Nevado del Tolima: @GiovannyPulido
Nevado del Ruiz: @salamancayesid
However, has the mandatory preventive isolation really helped improve air quality? And if so, by how much has air quality improved in Bogotá? To answer these questions, I took the data from all the air quality monitoring stations in Bogotá for the periods from January 1 to June 30 of 2019 and of 2020, in order to make a year-over-year comparison of daily averages.
The stations used are:
It should be clarified that not all stations measure the same parameters, so the parameters to be compared are:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.style as style
style.use('fivethirtyeight')
cs19 = pd.read_excel(r'2019\Carvajal-Sevillana.xlsx', header=0, na_values='----')
cs19.head()
cs19.info()
cs19['Fecha'] = cs19['Fecha'].str.replace('24:00', '00:00')
cs19['Fecha'] = pd.to_datetime(cs19.Fecha, format="%d-%m-%Y %H:%M")
cs19['day'] = cs19['Fecha'].dt.day
cs19['month'] = cs19['Fecha'].dt.month
cs19['year'] = cs19['Fecha'].dt.year
cs19['Monthday'] = (cs19['month'] * 100) + cs19['day']
cs19.head()
cs19.info()
Some of the pollutants have NaN values, let's find the percentage of NaN values
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']
for col in cols:
    values = cs19[col].isnull().sum()
    length = len(cs19)
    perc = round((values / length) * 100, 2)
    print(col + ': ' + str(perc) + '% NaN values')
Once the Excel file was loaded into a DataFrame, the empty values indicated with '----' were replaced by NaN. The time was also changed from 24:00 to 00:00, because pandas expects hours in the range 00:00 to 23:00.
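One caveat worth noting: a plain string replace assigns 24:00 to the start of the *same* day, while in hourly station exports 24:00 usually means midnight at the *end* of the day. The sketch below, on a hypothetical two-row sample (the dates are made up), shows one way those rows could be rolled forward if the strict timestamp were needed:

```python
import pandas as pd

# Hypothetical two-row sample mimicking the station export format.
raw = pd.Series(['01-01-2019 23:00', '01-01-2019 24:00'])

# A plain string replace maps 24:00 onto 00:00 of the SAME day...
naive = pd.to_datetime(raw.str.replace('24:00', '00:00'),
                       format='%d-%m-%Y %H:%M')

# ...but 24:00 denotes midnight at the END of the day, so those rows
# can be shifted forward by one day to restore the intended timestamp:
is_midnight = raw.str.endswith('24:00')
fixed = naive + pd.to_timedelta(is_midnight.astype(int), unit='D')
```

For daily averages the effect is small (one hour lands on the neighbouring day), which is why the simple replace is kept in the analysis itself.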
A column called Monthday was created to make graphing easier. The day was extracted from the original date column so the data could be grouped to obtain a 24-hour average, since in the air quality network's system the 24-hour average can appear as empty if even a single hourly value in the day is missing. This column will also help organize the graphs.
A Month column was also created, in which the month was transformed from a number to its name. This will also help organize the charts.
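The two derived columns can be sketched on a toy pair of dates (hypothetical values, not from the dataset). The integer key month*100 + day sorts chronologically within a year, and pandas can also produce month names directly, as an alternative to mapping numbers through a dict by hand:

```python
import pandas as pd

# Hypothetical dates standing in for the Fecha column.
dates = pd.to_datetime(pd.Series(['2019-01-15', '2019-06-02']))

# month*100 + day yields a sortable integer key such as 115 or 602.
monthday = dates.dt.month * 100 + dates.dt.day

# pandas can also produce month names directly, an alternative to
# mapping numbers through a dict by hand:
names = dates.dt.month_name()
```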
It can be seen that none of the columns of interest have complete data. The most critical case is SO2, for which 37.5% of the data is missing. This is due to problems with the equipment in the monitoring network: the sensor may have been damaged, or the data may not have been transmitted from the station to the server. Whatever the cause, it is an indicator that the stations are not working properly, and it means that for SO2 the sample is not representative.
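As a side note, the same percentages can be obtained without an explicit loop. A minimal sketch on a toy frame (the values are made up, not station data):

```python
import pandas as pd
import numpy as np

# Toy frame standing in for the station data (values are made up).
df = pd.DataFrame({'PM10': [10.0, np.nan, 12.0, np.nan],
                   'SO2': [np.nan, np.nan, np.nan, 1.0]})

# isnull() yields booleans; the column-wise mean of those booleans is
# exactly the fraction of missing rows, so no explicit loop is needed.
nan_pct = (df.isnull().mean() * 100).round(2)
```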
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']
fig, ax = plt.subplots(1, 8, figsize=(25, 6))
fig.suptitle('Distribution of pollutants')
for n, value in enumerate(cols):
    sns.boxplot(data=cs19[value], ax=ax[n])
    ax[n].set_title(value)
I generated a box plot for each pollutant, hoping to find a way to fill the NaN data. I am not going to touch the outliers, because these might be concentrations that really were high due to different anomalies: wildfires or building fires, wind influence, traffic. In the end, that is exactly what I am going to compare: the influence of a huge anomaly against normal data.
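For reference, the points a box plot draws as outliers follow the standard 1.5×IQR rule. A minimal sketch on a toy series (hypothetical readings) showing how those points could be identified programmatically:

```python
import pandas as pd

# Toy pollutant series with one extreme reading (hypothetical values).
s = pd.Series([10.0, 12.0, 11.0, 13.0, 95.0])

# The usual boxplot rule flags points beyond 1.5*IQR from the quartiles.
q1, q3 = s.quantile([0.25, 0.75])
iqr = q3 - q1
outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]
```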
I decided to use the overall mean of each pollutant to fill its NaN values.
A quick note on the concentration of each pollutant:
Both PM10 and PM2.5 measure particulate matter under 10 µm and 2.5 µm, respectively. With most of the data close to or above the maximum allowed level, it is possible to say that the air quality in Bogotá is not good; however, a more thorough analysis is needed.
I am not going to repeat this exploration for every station, just for this one. The final analysis will follow the same structure, and I will also explain the health and environmental impact of each pollutant.
As I said, I am going to use the mean of each pollutant to fill its NaN values.
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']
for col in cols:
    mean = round(cs19[col].mean(), 1)
    cs19[col] = cs19[col].fillna(mean)
cs19.info()
To make the analysis, I decided to calculate the daily mean value for each pollutant at each station. I could have downloaded the daily values from the Air Quality Network, but since some hours have NaN values the system does not calculate the mean for those days, so I would have had even more NaN values and the sample might not have been representative.
To calculate the mean value for each day, I am going to group by Monthday and apply the mean as an aggregation function.
After that, I am going to create a Month column and transform the month numbers into names.
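The difference between the network's behaviour and this approach can be sketched on a toy frame (hypothetical readings): `groupby().mean()` skips NaN, so a day with a gap still gets an average, whereas the network's stricter rule would leave it empty:

```python
import pandas as pd
import numpy as np

# Toy hourly readings: day 101 has a gap, day 102 is complete.
df = pd.DataFrame({'Monthday': [101, 101, 101, 102, 102, 102],
                   'PM10': [40.0, np.nan, 44.0, 50.0, 52.0, 54.0]})

# groupby().mean() skips NaN, so day 101 still gets a value (42.0).
daily = df.groupby('Monthday')['PM10'].mean()

# To mimic the network's stricter rule (a day with any gap is empty),
# an aggregation that demands complete hours could be used instead:
strict = df.groupby('Monthday')['PM10'].agg(
    lambda s: s.mean() if s.notna().all() else np.nan)
```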
cs19 = round(cs19.groupby('Monthday').mean(numeric_only=True), 2).reset_index()
cs19.head()
cs19['Month'] = (cs19['Monthday'] // 100)
months = {1:'January', 2:'February', 3:'March', 4:'April', 5:'May', 6:'June'}
#Remember that i'm only analyzing for the first 6 months of each year
cs19['Month'] = cs19['Month'].map(months)
cs19.head()
cs20 = pd.read_excel(r'2020\Carvajal-Sevillana.xlsx', na_values='----')
cs20['Fecha'] = cs20['Fecha'].str.replace('24:00', '00:00')
cs20['Fecha'] = pd.to_datetime(cs20.Fecha, format="%d-%m-%Y %H:%M")
cs20['day'] = cs20['Fecha'].dt.day
cs20['month'] = cs20['Fecha'].dt.month
cs20['year'] = cs20['Fecha'].dt.year
cs20['Monthday'] = (cs20['month'] * 100) + cs20['day']
cs20.info()
In the 2020 data from the Carvajal-Sevillana station we find problems similar to those in the 2019 data: some pollutants have several NaN values that affect the sample. Let's see the percentage of NaN data per pollutant.
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']
for col in cols:
    values = cs20[col].isnull().sum()
    length = len(cs20)
    perc = round((values / length) * 100, 2)
    print(col + ': ' + str(perc) + '% NaN values')
The 2020 data is worse than I thought, and worse than the 2019 data. The percentage of NaN values increased for most pollutants, such as NO, NOX, NO2 and CO, although for SO2 it actually decreased. Again, I will fill the NaN data with the mean of each pollutant.
There might be different explanations for this increase besides equipment failures and maintenance. 2020 is the beginning of the new Mayor's four-year term, and there might have been politics involved, but this is just a thought.
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']
for col in cols:
    mean = round(cs20[col].mean(), 1)
    cs20[col] = cs20[col].fillna(mean)
cs20.info()
cs20 = round(cs20.groupby('Monthday').mean(numeric_only=True), 2).reset_index()
cs20['Month'] = (cs20['Monthday'] // 100)
months = {1:'January', 2:'February', 3:'March', 4:'April', 5:'May', 6:'June'}
cs20['Month'] = cs20['Month'].map(months)
cs20.head()
After cleaning both datasets, we can compare the pollutant levels year over year.
months_order = ['January', 'February', 'March', 'April', 'May', 'June']
# Event markers: (day, label, color) for the months where something happened.
events = {'March': (24, 'Isolation', 'green'),
          'May': (11, 'Intelligent Isolation', 'green'),
          'June': (24, 'Sahara Dust', 'yellow')}
fig, axes = plt.subplots(6, 1, figsize=(15, 20))
fig.suptitle('PM10 2019 Vs 2020')
for ax, month in zip(axes, months_order):
    m19 = cs19[cs19['Month'] == month]
    m20 = cs20[cs20['Month'] == month]
    ax.plot(m19['day'], m19['PM10'], label='2019')
    ax.plot(m20['day'], m20['PM10'], label='2020')
    if month in events:
        day, label, color = events[month]
        ax.axvline(day, label=label, c=color)
    ax.set_title(month)
    ax.legend()
In the months before the declaration of mandatory preventive isolation, PM10 levels behaved similarly, with some exceptions: at the end of January and the beginning of February, the 2019 levels were higher. A search for events during this period showed that in those days there was an air quality alert due to wildfires.
At the beginning of the mandatory preventive isolation the levels were almost the same, but over the following month they decreased. There were a few days when the levels were higher; these days coincide with the first economic reopening measures. Since the intelligent isolation began, the levels have fluctuated but remain lower than last year's.
June shows almost the same behavior, with the exception of the period between June 22nd and 26th. During those days the Saharan dust plume moved through the region, which caused an increase in particulate matter levels.
months_order = ['January', 'February', 'March', 'April', 'May', 'June']
events = {'March': (24, 'Isolation', 'green'),
          'May': (11, 'Intelligent Isolation', 'green'),
          'June': (24, 'Sahara Dust', 'yellow')}
fig, axes = plt.subplots(6, 1, figsize=(15, 20))
fig.suptitle('PM2.5 2019 Vs 2020')
for ax, month in zip(axes, months_order):
    m19 = cs19[cs19['Month'] == month]
    m20 = cs20[cs20['Month'] == month]
    ax.plot(m19['day'], m19['PM2.5'], label='2019')
    ax.plot(m20['day'], m20['PM2.5'], label='2020')
    if month in events:
        day, label, color = events[month]
        ax.axvline(day, label=label, c=color)
    ax.set_title(month)
    ax.legend()
PM2.5 behaves similarly to PM10: 2020 values tend to be lower, with some days where the levels were higher. However, since the declaration of isolation the levels decreased considerably. This happens because most of the sources of these particulates are vehicle emissions, fuel combustion, dust, and other human activities. Again, the levels increase considerably during the passage of the Saharan dust.
There is a flat line for almost 10 days in June; this is probably due to a high concentration of NaN values that were filled with the mean.
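That flat stretch is a known artifact of mean imputation. A minimal sketch on a toy series (hypothetical readings) of an alternative that would avoid it:

```python
import pandas as pd

# Toy series with a two-hour gap, standing in for PM2.5 readings.
s = pd.Series([20.0, None, None, 26.0])

# Mean imputation flattens every gap to one constant value...
filled_mean = s.fillna(round(s.mean(), 1))

# ...while linear interpolation follows the trend between neighbours,
# avoiding the artificial flat stretches visible in the June plot.
filled_interp = s.interpolate()
```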
months_order = ['January', 'February', 'March', 'April', 'May', 'June']
events = {'March': (24, 'Isolation', 'green'),
          'May': (11, 'Intelligent Isolation', 'green'),
          'June': (24, 'Sahara Dust', 'yellow')}
fig, axes = plt.subplots(6, 1, figsize=(15, 20))
fig.suptitle('Ozone 2019 Vs 2020')
for ax, month in zip(axes, months_order):
    m19 = cs19[cs19['Month'] == month]
    m20 = cs20[cs20['Month'] == month]
    ax.plot(m19['day'], m19['OZONO'], label='2019')
    ax.plot(m20['day'], m20['OZONO'], label='2020')
    if month in events:
        day, label, color = events[month]
        ax.axvline(day, label=label, c=color)
    ax.set_title(month)
    ax.legend()