The quarantine imposed due to the COVID-19 pandemic has improved air quality in Bogotá, Colombia?

Due to the COVID-19 pandemic, the national government decreed the mandatory preventive isolation measure from March 24, 2020, initially the measure went until April 13 but was extended until May 11. As of this date, an "intelligent preventive isolation" began, in which measures were started for the economic reopening. This measure originally went until May 25 but was extended until May 31. However, on May 28, the mandatory preventive isolation measure was extended again from June 1 to August 31, 2020, but maintaining the economic reopening measures.

Throughout the application of the measure, a reduction in the levels of air pollution in Bogotá has been reported in the media and in institutional bulletins. It has even been reported that thanks to the decrease in particulate matter in the atmosphere, the peaks of the Nevado del Ruiz and the Nevado del Tolima have been seen again from Bogotá.

Tolima.jpg Nevado del Tolima: @GiovannyPulido

ruiz.jpg Nevado del Ruiz: @salamancayesid

However, has the mandatory preventive isolation measure really helped improve air quality? And if so, how much has air quality improved in Bogotá? To solve these questions, I took the data from all the air quality monitoring stations in Bogotá during the periods from January 1, 2019 to June 30, 2019 and from January 1, 2020 to June 30, 2020 in order to Make an annual comparison on daily averages.

The stations used are:

  • Castilla - Sevillana
  • Centro de Alto Rendimiento
  • Fontibon
  • Guaymaral
  • Kennedy
  • Las Ferias
  • Ministerio de Ambiente
  • Movil Carrera 7ma
  • Puente Aranda
  • San Cristobal
  • Suba
  • Tunal
  • Usaquen

It should be clarified that not all stations measure the same parameters, so the parameters to be compared are:

  • PM10: Particulate matter under 10 µm
  • PM2.5: particulate matter under 2.5 µm
  • NO: Nitric oxide
  • NOX: Generic term for the nitrogen oxides that are relevant in air pollution (NO and NO2)
  • NO2: Nitrogen Dioxide
  • CO: Carbon Monoxide
  • CO2: Carbon Doxide
  • SO2: Sulfur Dioxide
  • O3: Ozone

Data Cleaning for Carvajal-Sevillana station, 2019 Data

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import matplotlib.style as style
style.use('fivethirtyeight')
In [2]:
cs19 = pd.read_excel('2019\Carvajal-Sevillana.xlsx', header=0, na_values='----')
cs19.head()
Out[2]:
Fecha PM10 OZONO CO SO2 NO NO2 NOX PM2.5
0 01-01-2019 01:00 24.1 9.7 0.5 0.3 10.9 11.8 22.7 19.7
1 01-01-2019 02:00 46.7 3.5 0.9 1.3 19.0 16.9 35.9 23.2
2 01-01-2019 03:00 98.3 2.2 0.8 0.2 10.1 18.8 28.9 37.9
3 01-01-2019 04:00 54.6 2.6 0.9 0.0 22.4 18.0 40.4 38.9
4 01-01-2019 05:00 46.1 2.0 0.7 NaN 11.9 18.7 30.6 35.0
In [3]:
cs19.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4320 entries, 0 to 4319
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Fecha   4320 non-null   object 
 1   PM10    3784 non-null   float64
 2   OZONO   4304 non-null   float64
 3   CO      4015 non-null   float64
 4   SO2     2700 non-null   float64
 5   NO      4186 non-null   float64
 6   NO2     4185 non-null   float64
 7   NOX     4186 non-null   float64
 8   PM2.5   4097 non-null   float64
dtypes: float64(8), object(1)
memory usage: 303.9+ KB
In [4]:
cs19['Fecha'] = cs19['Fecha'].str.replace('24:00', '00:00')
cs19['Fecha'] = pd.to_datetime(cs19.Fecha, format="%d-%m-%Y %H:%M")
cs19['day'] = cs19['Fecha'].dt.day
cs19['month'] = cs19['Fecha'].dt.month
cs19['year'] = cs19['Fecha'].dt.year
cs19['Monthday'] = (cs19['month'] * 100) + cs19['day']
cs19.head()
Out[4]:
Fecha PM10 OZONO CO SO2 NO NO2 NOX PM2.5 day month year Monthday
0 2019-01-01 01:00:00 24.1 9.7 0.5 0.3 10.9 11.8 22.7 19.7 1 1 2019 101
1 2019-01-01 02:00:00 46.7 3.5 0.9 1.3 19.0 16.9 35.9 23.2 1 1 2019 101
2 2019-01-01 03:00:00 98.3 2.2 0.8 0.2 10.1 18.8 28.9 37.9 1 1 2019 101
3 2019-01-01 04:00:00 54.6 2.6 0.9 0.0 22.4 18.0 40.4 38.9 1 1 2019 101
4 2019-01-01 05:00:00 46.1 2.0 0.7 NaN 11.9 18.7 30.6 35.0 1 1 2019 101
In [5]:
cs19.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4320 entries, 0 to 4319
Data columns (total 13 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   Fecha     4320 non-null   datetime64[ns]
 1   PM10      3784 non-null   float64       
 2   OZONO     4304 non-null   float64       
 3   CO        4015 non-null   float64       
 4   SO2       2700 non-null   float64       
 5   NO        4186 non-null   float64       
 6   NO2       4185 non-null   float64       
 7   NOX       4186 non-null   float64       
 8   PM2.5     4097 non-null   float64       
 9   day       4320 non-null   int64         
 10  month     4320 non-null   int64         
 11  year      4320 non-null   int64         
 12  Monthday  4320 non-null   int64         
dtypes: datetime64[ns](1), float64(8), int64(4)
memory usage: 438.9 KB

Some of the pollutants have NaN values, let's find the percentage of NaN values

In [6]:
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']

for col in cols:
    values = cs19[col].isnull().sum()
    length = 4320
    perc = round((values / length) * 100, 2)
    print(col + ': ' + str(perc) + '% NaN values')
PM10: 12.41% NaN values
PM2.5: 5.16% NaN values
NO: 3.1% NaN values
NOX: 3.1% NaN values
NO2: 3.12% NaN values
CO: 7.06% NaN values
SO2: 37.5% NaN values
OZONO: 0.37% NaN values

Once the excel file was transformed into a DataFrame, the empty values indicated with '----' were replaced by NaN, the time was also changed from 24:00 to 00:00 because for pandas the hours range are from 00:00 to 23:00.

A column called monthday was created in order make it easy to graph, the day was extracted from the original date column to group the data and thus obtain an average of 24 hours, since the system of the air quality network the average of 24 hours can appear as empty if there was only one empty data in the day. Similarly, this column will help us organize the graph in a better way.

A column Month was also created in which the month was transformed from number to letters. This will also help organize the charts.

It can be seen that none of the columns of interest have complete data. The most critical case is with SO2 since 37.5% of the data is empty. This is due to problems with the equipment on the monitoring network, the sensor may have been damaged, or the data may not have been sent from the station to the server. Whatever the problem may be, it is an indicator that the stations are not working properly. Also that means that for SO2 the sample isn't representative.

In [7]:
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']

fig, ax = plt.subplots(1, 8, figsize=(25, 6))
fig.suptitle('Distribution of pollutants')
for n, value in enumerate(cols):
    sns.boxplot(data=cs19[value], ax=ax[n])
    ax[n].set_title(value)

I generated a box plot for each pollutant hoping to find a way to fill NaN data. I'm not going to touch the outliers because these might be concentrations that were really high thanks to different anomalies, like wildfires or fires in buildings, wind influence, traffic. And in the end that is what i'm going to compare, the influence of a huge anomaly against normal data.

I decided to use the total mean of each pollutant to fill the NaN data in each pollutant.

Also making a fast analysis about the concentration of each pollutant:

  • PM10: The range of the data goes from 0 to 150, with 75% of the data between 50 and 80(?) µg/m3, the maximum allowed level for this pollutant is 75 µg/m3 according to colombian law (Resolución 2254 de 2017, Ministerio del Medio Ambiente)
  • PM2.5: The rage of the data goes from 0 to 80, with 75% of the data between 25 and 50 µg/m3, the maximum allowed level for this pollutant is 37 µg/m3.

Both PM10 and PM2.5 measure particulate matter under 10 and 2.5 µm, with most of the data close and over the max allowed level it is possible to say that the air quality in Bogotá isn't good, however a better analysis is needed.

  • NO: Most of the data is between 25 and 60-75 ppb, the colombian law doesn't consider this as a criteria air pollutant but it is a pollutant that can be involved in acid rain and ozone depletion
  • NOX: Most of the data is between 50 and 100 ppb, this isn't a criteria air pollutant
  • NO2: Most of the values are between 18 and 25 ppb, the max allowed level in Colombia is 200 µg/m3 per hour. A conversion factor for this pollutant is 1 ppb = 1.88µg/m3, that means the values are between 33.8 and 47 µg/m3
  • CO: Most of the values are really low, between 1 and 1.5 ppb, the max allowed level is 35000 µg/m3 per hour, again a conversion factor is used (1 ppb = 1.145 µg/m3) that means the values are between 1 and 1.71 µg/m3
  • SO2: Most of the values are between 5 and 10 ppb, the max allowed level is 50 µg/m3. With a conversion factor of 1ppb = 2.62 µg/m3 that means that the values are between 13.1 and 26.2 µg/m3
  • Ozone: This air pollutant is the one with the less concentration in this station, with most of the levels between 2 and 10 ppb, the law allows a concentration of 100 µg/m3 in 8 hours (12.5 µg/m3 per hour). Using a conversion factor of 1 ppb = 2.00 µg/m3, we have values between 4 and 20 µg/m3 per hour

I'm not going to make this analysis for every station, just for this one. The final analysis will be like this and i'm also going to explain the impact in health and environment of each pollutant

Filling NaN values

As i said i'm going to use the mean of each pollutant to fill NaN values

In [8]:
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']

for col in cols:
    mean = round(cs19[col].mean(), 1)
    cs19[col].fillna(mean, inplace=True)
    
cs19.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4320 entries, 0 to 4319
Data columns (total 13 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   Fecha     4320 non-null   datetime64[ns]
 1   PM10      4320 non-null   float64       
 2   OZONO     4320 non-null   float64       
 3   CO        4320 non-null   float64       
 4   SO2       4320 non-null   float64       
 5   NO        4320 non-null   float64       
 6   NO2       4320 non-null   float64       
 7   NOX       4320 non-null   float64       
 8   PM2.5     4320 non-null   float64       
 9   day       4320 non-null   int64         
 10  month     4320 non-null   int64         
 11  year      4320 non-null   int64         
 12  Monthday  4320 non-null   int64         
dtypes: datetime64[ns](1), float64(8), int64(4)
memory usage: 438.9 KB

Organizing Data

To make the analysis i decided to calculate the dialy mean value for each polutant in each station. I could have downloaded them from the Air Quality Network but as some hours have NaN values the system doesn't calculate the mean value for that day, so i might have had more NaN values and the sample could haven't been representative.

To calculate the mean value for each day, i'm going to group by Monthday and apply mean as an aggregate function.

After that i'm going to create a Month column and transform it into letters.

In [9]:
cs19 = round(cs19.groupby('Monthday').mean(), 2).reset_index()
cs19.head()
Out[9]:
Monthday PM10 OZONO CO SO2 NO NO2 NOX PM2.5 day month year
0 101 41.36 11.23 0.67 2.06 19.47 12.17 31.65 22.05 1 1 2019
1 102 42.44 10.75 0.81 1.58 37.05 15.13 52.17 19.58 2 1 2019
2 103 42.41 4.96 1.11 1.40 58.27 16.69 74.93 22.82 3 1 2019
3 104 52.11 4.58 1.21 3.09 61.28 16.77 78.08 30.83 4 1 2019
4 105 64.16 7.28 1.29 3.50 60.05 19.23 79.31 29.38 5 1 2019
In [10]:
cs19['Month'] = (cs19['Monthday'] // 100)
months = {1:'January', 2:'February', 3:'March', 4:'April', 5:'May', 6:'June'}
#Remember that i'm only analyzing for the first 6 months of each year
cs19['Month'] = cs19['Month'].map(months)
cs19.head()
Out[10]:
Monthday PM10 OZONO CO SO2 NO NO2 NOX PM2.5 day month year Month
0 101 41.36 11.23 0.67 2.06 19.47 12.17 31.65 22.05 1 1 2019 January
1 102 42.44 10.75 0.81 1.58 37.05 15.13 52.17 19.58 2 1 2019 January
2 103 42.41 4.96 1.11 1.40 58.27 16.69 74.93 22.82 3 1 2019 January
3 104 52.11 4.58 1.21 3.09 61.28 16.77 78.08 30.83 4 1 2019 January
4 105 64.16 7.28 1.29 3.50 60.05 19.23 79.31 29.38 5 1 2019 January
In [11]:
cs19.set_index('day')
Out[11]:
Monthday PM10 OZONO CO SO2 NO NO2 NOX PM2.5 month year Month
day
1 101 41.36 11.23 0.67 2.06 19.47 12.17 31.65 22.05 1 2019 January
2 102 42.44 10.75 0.81 1.58 37.05 15.13 52.17 19.58 1 2019 January
3 103 42.41 4.96 1.11 1.40 58.27 16.69 74.93 22.82 1 2019 January
4 104 52.11 4.58 1.21 3.09 61.28 16.77 78.08 30.83 1 2019 January
5 105 64.16 7.28 1.29 3.50 60.05 19.23 79.31 29.38 1 2019 January
... ... ... ... ... ... ... ... ... ... ... ... ...
25 625 61.89 5.39 1.21 6.95 62.59 26.28 88.84 28.35 6 2019 June
26 626 67.61 6.42 1.38 3.28 75.26 23.28 98.55 32.37 6 2019 June
27 627 65.70 9.25 1.29 3.61 49.66 22.26 71.10 35.72 6 2019 June
28 628 51.09 5.82 1.58 6.37 48.80 20.90 69.80 29.08 6 2019 June
29 629 48.99 4.92 1.60 6.51 48.80 20.90 69.80 28.70 6 2019 June

180 rows × 12 columns

Data cleaning for Carvajal-Sevillana, 2020 Data

In [12]:
cs20 = pd.read_excel('2020\Carvajal-Sevillana.xlsx', na_values='----')

cs20['Fecha'] = cs20['Fecha'].str.replace('24:00', '00:00')
cs20['Fecha'] = pd.to_datetime(cs20.Fecha, format="%d-%m-%Y %H:%M")
cs20['day'] = cs20['Fecha'].dt.day
cs20['month'] = cs20['Fecha'].dt.month
cs20['year'] = cs20['Fecha'].dt.year
cs20['Monthday'] = (cs20['month'] * 100) + cs20['day']

cs20.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4344 entries, 0 to 4343
Data columns (total 13 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   Fecha     4344 non-null   datetime64[ns]
 1   PM10      4021 non-null   float64       
 2   OZONO     3965 non-null   float64       
 3   CO        3863 non-null   float64       
 4   SO2       4295 non-null   float64       
 5   NO        2644 non-null   float64       
 6   NO2       2643 non-null   float64       
 7   NOX       2644 non-null   float64       
 8   PM2.5     4077 non-null   float64       
 9   day       4344 non-null   int64         
 10  month     4344 non-null   int64         
 11  year      4344 non-null   int64         
 12  Monthday  4344 non-null   int64         
dtypes: datetime64[ns](1), float64(8), int64(4)
memory usage: 441.3 KB

In the dataset for the 2020 data from the Carvajal-Sevillana station we find similar problems than the 2019 data, some pollutants have several NaN values that affect the sample. Let's see the percentage of NaN data per pollutant

In [13]:
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']

for col in cols:
    values = cs20[col].isnull().sum()
    length = 4344
    perc = round((values / length) * 100, 2)
    print(col + ': ' + str(perc) + '% NaN values')
PM10: 7.44% NaN values
PM2.5: 6.15% NaN values
NO: 39.13% NaN values
NOX: 39.13% NaN values
NO2: 39.16% NaN values
CO: 11.07% NaN values
SO2: 1.13% NaN values
OZONO: 8.72% NaN values

The 2020 data is worse than i thought and is worse than 2019 data. The percentage of NaN values increased, not only in SO2, wich actually decreased, but in other pollutants like NO, NOX, NO2, CO. Again i will fill the NaN data with the mean of each pollutant.

There might been different explanations for this increase besides failures and maintenance. 2020 is the beginning of the new Mayor's 4 year period, and there might have been politics involved behind this, but this is just a thinking

In [14]:
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']

for col in cols:
    mean = round(cs19[col].mean(), 1)
    cs20[col].fillna(mean, inplace=True)
    
cs20.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4344 entries, 0 to 4343
Data columns (total 13 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   Fecha     4344 non-null   datetime64[ns]
 1   PM10      4344 non-null   float64       
 2   OZONO     4344 non-null   float64       
 3   CO        4344 non-null   float64       
 4   SO2       4344 non-null   float64       
 5   NO        4344 non-null   float64       
 6   NO2       4344 non-null   float64       
 7   NOX       4344 non-null   float64       
 8   PM2.5     4344 non-null   float64       
 9   day       4344 non-null   int64         
 10  month     4344 non-null   int64         
 11  year      4344 non-null   int64         
 12  Monthday  4344 non-null   int64         
dtypes: datetime64[ns](1), float64(8), int64(4)
memory usage: 441.3 KB
In [15]:
cs20 = round(cs20.groupby('Monthday').mean(), 2).reset_index()
cs20['Month'] = (cs20['Monthday'] // 100)
months = {1:'January', 2:'February', 3:'March', 4:'April', 5:'May', 6:'June'}
cs20['Month'] = cs20['Month'].map(months)
cs20.head()
Out[15]:
Monthday PM10 OZONO CO SO2 NO NO2 NOX PM2.5 day month year Month
0 101 42.09 16.60 0.58 1.48 48.8 20.9 69.8 37.42 1 1 2020 January
1 102 34.49 16.87 0.60 1.50 48.8 20.9 69.8 21.43 2 1 2020 January
2 103 45.45 16.27 0.98 3.08 48.8 20.9 69.8 26.83 3 1 2020 January
3 104 51.84 21.17 0.90 4.13 48.8 20.9 69.8 32.92 4 1 2020 January
4 105 29.48 21.40 0.48 1.59 48.8 20.9 69.8 16.38 5 1 2020 January

After cleaning both datasets

PM10

In [16]:
fig = plt.figure(figsize=(15,20))
fig.suptitle('PM10 2019 Vs 2020')

ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['PM10'], label='2019')
ax1.plot(jan20['day'], jan20['PM10'], label='2020')
ax1.set_title('January')
plt.legend()

ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['PM10'], label='2019')
ax2.plot(feb20['day'], feb20['PM10'], label='2020')
ax2.set_title('February') 
plt.legend()

ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['PM10'], label='2019')
ax3.plot(mar20['day'], mar20['PM10'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March') 
plt.legend()

ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['PM10'], label='2019')
ax4.plot(apr20['day'], apr20['PM10'], label='2020')
ax4.set_title('April') 
plt.legend()

ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['PM10'], label='2019')
ax5.plot(may20['day'], may20['PM10'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May') 
plt.legend()

ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['PM10'], label='2019')
ax6.plot(jun20['day'], jun20['PM10'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()
Out[16]:
<matplotlib.legend.Legend at 0x20fe541edc8>

Analysis PM10

In the months before the declaration of mandatory preventive isolation PM10 levels had similar behaviour with some exceptions, at the end of January and beginning of February the 2019 levels were higher. A search about events during this period of time showed that in those days there was an alert in air quality because of wildfires.

At the beginning of the mandatory preventive isolation the levels are almost the same, but for almost a month the levels decreased, but there were few days where the levels were higher, these days coincide with the first measures of economic aperture. Since the intelligent isolation the levels have an up and down behavior but are lower than last years levels.

June have almost the same behavior with the exception of the period between June 22nd and 26th, during these days the Sahara Dust moved through the region wich caused an increase in the levels of particulate matter

PM2.5

In [17]:
fig = plt.figure(figsize=(15,20))
fig.suptitle('PM2.5 2019 Vs 2020')

ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['PM2.5'], label='2019')
ax1.plot(jan20['day'], jan20['PM2.5'], label='2020')
ax1.set_title('January')
plt.legend()

ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['PM2.5'], label='2019')
ax2.plot(feb20['day'], feb20['PM2.5'], label='2020')
ax2.set_title('February') 
plt.legend()

ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['PM2.5'], label='2019')
ax3.plot(mar20['day'], mar20['PM2.5'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March') 
plt.legend()

ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['PM2.5'], label='2019')
ax4.plot(apr20['day'], apr20['PM2.5'], label='2020')
ax4.set_title('April') 
plt.legend()

ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['PM2.5'], label='2019')
ax5.plot(may20['day'], may20['PM2.5'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May') 
plt.legend()

ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['PM2.5'], label='2019')
ax6.plot(jun20['day'], jun20['PM2.5'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()
Out[17]:
<matplotlib.legend.Legend at 0x20fe5020e48>

Analysis PM2.5

PM2.5 behaves similary than PM10, 2020 values tend to be lower with some days were the levels were higher. However since the declaration of isolation the levels decreased considerably. This happens because most of the sources of this particulates are vehicles emissions, fuel combustion, dust, and other sources. Again the levels increase considerably during the movement of the Sahara Dust.

There's a flat line for almost 10 days in June, this is problably thanks to a high concentration of NaN data that were filled with mean values

Ozone

In [18]:
fig = plt.figure(figsize=(15,20))
fig.suptitle('Ozone 2019 Vs 2020')

ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['OZONO'], label='2019')
ax1.plot(jan20['day'], jan20['OZONO'], label='2020')
ax1.set_title('January')
plt.legend()

ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['OZONO'], label='2019')
ax2.plot(feb20['day'], feb20['OZONO'], label='2020')
ax2.set_title('February') 
plt.legend()

ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['OZONO'], label='2019')
ax3.plot(mar20['day'], mar20['OZONO'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March') 
plt.legend()

ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['OZONO'], label='2019')
ax4.plot(apr20['day'], apr20['OZONO'], label='2020')
ax4.set_title('April') 
plt.legend()

ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['OZONO'], label='2019')
ax5.plot(may20['day'], may20['OZONO'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May') 
plt.legend()

ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['OZONO'], label='2019')
ax6.plot(jun20['day'], jun20['OZONO'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()
Out[18]:
<matplotlib.legend.Legend at 0x20fe7276fc8>