#!/usr/bin/env python
# coding: utf-8

# # Has the quarantine imposed due to the COVID-19 pandemic improved air quality in Bogotá, Colombia?
# 
# Due to the COVID-19 pandemic, the national government decreed mandatory preventive isolation starting on March 24, 2020. The measure initially ran until April 13 but was extended until May 11. From that date an "intelligent preventive isolation" began, under which the first economic reopening measures were introduced. This phase was originally set to end on May 25 but was extended until May 31. However, on May 28 the mandatory preventive isolation was extended again, from June 1 to August 31, 2020, while keeping the economic reopening measures in place.
# 
# Throughout the application of the measure, a reduction in air pollution levels in Bogotá has been reported in the media and in institutional bulletins. It has even been reported that, thanks to the decrease in particulate matter in the atmosphere, the peaks of the Nevado del Ruiz and the Nevado del Tolima can be seen again from Bogotá.
# 
# ![Tolima.jpg](attachment:Tolima.jpg)
# Nevado del Tolima: @GiovannyPulido
# 
# ![ruiz.jpg](attachment:ruiz.jpg)
# Nevado del Ruiz: @salamancayesid
# 
# However, has the mandatory preventive isolation measure really helped improve air quality? And if so, how much has air quality improved in Bogotá? To answer these questions, I took the data from all the air quality monitoring stations in Bogotá for the periods from January 1, 2019 to June 30, 2019 and from January 1, 2020 to June 30, 2020, in order to make a year-over-year comparison of daily averages.
# 
# The stations used are:
# * Carvajal - Sevillana
# * Centro de Alto Rendimiento
# * Fontibon
# * Guaymaral
# * Kennedy
# * Las Ferias
# * Ministerio de Ambiente
# * Movil Carrera 7ma
# * Puente Aranda
# * San Cristobal
# * Suba
# * Tunal
# * Usaquen
# 
# It should be clarified that not all stations measure the same parameters, so the parameters to be compared are:
# * PM10: Particulate matter under 10 µm
# * PM2.5: Particulate matter under 2.5 µm
# * NO: Nitric oxide
# * NOX: Generic term for the nitrogen oxides relevant in air pollution (NO and NO2)
# * NO2: Nitrogen dioxide
# * CO: Carbon monoxide
# * CO2: Carbon dioxide
# * SO2: Sulfur dioxide
# * O3: Ozone
# 
# ## Data Cleaning for Carvajal-Sevillana station, 2019 Data

# In[1]:


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.style as style

style.use('fivethirtyeight')


# In[2]:


cs19 = pd.read_excel(r'2019\Carvajal-Sevillana.xlsx', header=0, na_values='----')
cs19.head()


# In[3]:


cs19.info()


# In[4]:


cs19['Fecha'] = cs19['Fecha'].str.replace('24:00', '00:00')
cs19['Fecha'] = pd.to_datetime(cs19.Fecha, format="%d-%m-%Y %H:%M")
cs19['day'] = cs19['Fecha'].dt.day
cs19['month'] = cs19['Fecha'].dt.month
cs19['year'] = cs19['Fecha'].dt.year
cs19['Monthday'] = (cs19['month'] * 100) + cs19['day']
cs19.head()


# In[5]:


cs19.info()


# Some of the pollutants have NaN values, so let's find the percentage of NaN values.

# In[6]:


cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']

for col in cols:
    values = cs19[col].isnull().sum()
    length = 4320
    perc = round((values / length) * 100, 2)
    print(col + ': ' + str(perc) + '% NaN values')
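# As a small aside, the same percentages can be computed without hard-coding the number of rows, which is a bit more robust if a station file has a different length. This is only a sketch of an alternative, not part of the original cleaning.

# In[ ]:


# isnull().mean() gives the fraction of missing values per column directly,
# using however many rows the file actually has
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']
print(cs19[cols].isnull().mean().mul(100).round(2))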
# Once the Excel file was loaded into a DataFrame, the empty values indicated with '----' were replaced by NaN. The hour 24:00 was also changed to 00:00, because pandas only accepts hours in the range 00:00 to 23:00.
# 
# A column called Monthday was created to make graphing easier. The day was extracted from the original date column so the data can be grouped into 24-hour averages, since in the air quality network's own system the 24-hour average can appear as empty if even a single hourly value is missing in the day. This column will also help organize the graphs.
# 
# A Month column was also created, in which the month was transformed from a number into its name. This will also help organize the charts.
# 
# It can be seen that none of the columns of interest have complete data. The most critical case is SO2, with 37.5% of the data missing. This is due to problems with the equipment of the monitoring network: the sensor may have been damaged, or the data may not have been sent from the station to the server. Whatever the cause, it is an indicator that the stations are not working properly. It also means that for SO2 the sample isn't representative.

# In[7]:


cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']

fig, ax = plt.subplots(1, 8, figsize=(25, 6))
fig.suptitle('Distribution of pollutants')

for n, value in enumerate(cols):
    sns.boxplot(data=cs19[value], ax=ax[n])
    ax[n].set_title(value)


# I generated a box plot for each pollutant hoping to find a way to fill the NaN data. I'm not going to touch the outliers, because these might be concentrations that really were high due to different anomalies such as wildfires, building fires, wind influence or traffic. In the end that is exactly what I'm going to compare: the influence of a huge anomaly against normal data.
# 
# I decided to fill the NaN values in each pollutant with that pollutant's overall mean.
# 
# A quick analysis of the concentration of each pollutant:
# * PM10: The range of the data goes from 0 to 150, with 75% of the data between 50 and about 80 µg/m3. The maximum allowed level for this pollutant is 75 µg/m3 according to Colombian law (Resolución 2254 de 2017, Ministerio del Medio Ambiente).
# * PM2.5: The range of the data goes from 0 to 80, with 75% of the data between 25 and 50 µg/m3. The maximum allowed level for this pollutant is 37 µg/m3.
# 
# Both PM10 and PM2.5 measure particulate matter under 10 and 2.5 µm. With most of the data close to or over the maximum allowed level, it is possible to say that the air quality in Bogotá isn't good; however, a better analysis is needed.
# 
# * NO: Most of the data is between 25 and 60-75 ppb. Colombian law doesn't consider this a criteria air pollutant, but it can be involved in acid rain and ozone depletion.
# * NOX: Most of the data is between 50 and 100 ppb. This isn't a criteria air pollutant either.
# * NO2: Most of the values are between 18 and 25 ppb. The maximum allowed level in Colombia is 200 µg/m3 per hour. Using a conversion factor of 1 ppb = 1.88 µg/m3, the values are between 33.8 and 47 µg/m3.
# * CO: Most of the values are really low, between 1 and 1.5 ppb. The maximum allowed level is 35000 µg/m3 per hour. Using a conversion factor of 1 ppb = 1.145 µg/m3, the values are between roughly 1.15 and 1.72 µg/m3.
# * SO2: Most of the values are between 5 and 10 ppb. The maximum allowed level is 50 µg/m3. With a conversion factor of 1 ppb = 2.62 µg/m3, the values are between 13.1 and 26.2 µg/m3.
# * Ozone: This is the pollutant with the lowest concentration at this station, with most of the levels between 2 and 10 ppb. The law allows a concentration of 100 µg/m3 over 8 hours (12.5 µg/m3 per hour). Using a conversion factor of 1 ppb = 2.00 µg/m3, we have values between 4 and 20 µg/m3 per hour.
# 
# I'm not going to repeat this analysis for every station, just for this one. The final analysis will follow this format, and I'm also going to explain the health and environmental impact of each pollutant.
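# The ppb to µg/m3 conversions quoted in the list above can be collected in one place so they don't have to be repeated by hand. This is just an illustrative sketch; `factors` and `ppb_to_ugm3` are not part of the original notebook, and the factors are simply the ones cited in the bullet points.

# In[ ]:


# ppb -> µg/m3 conversion factors quoted in the list above
factors = {'NO2': 1.88, 'CO': 1.145, 'SO2': 2.62, 'OZONO': 2.00}

def ppb_to_ugm3(value_ppb, pollutant):
    # Multiply a concentration in ppb by that pollutant's conversion factor
    return round(value_ppb * factors[pollutant], 2)

print(ppb_to_ugm3(25, 'NO2'))  # 47.0 µg/m3, the upper end of the NO2 range above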
# ### Filling NaN values
# 
# As I said, I'm going to use the mean of each pollutant to fill its NaN values.

# In[8]:


cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']

for col in cols:
    mean = round(cs19[col].mean(), 1)
    cs19[col].fillna(mean, inplace=True)

cs19.info()


# ### Organizing Data
# 
# To make the analysis I decided to calculate the daily mean value for each pollutant at each station. I could have downloaded the daily values from the Air Quality Network, but since its system doesn't calculate the mean for a day when some hours are missing, I might have ended up with even more NaN values and the sample might not have been representative.
# 
# To calculate the mean value for each day, I'm going to group by Monthday and apply mean as the aggregate function.
# 
# After that I'm going to create a Month column and transform the month number into its name.

# In[9]:


cs19 = round(cs19.groupby('Monthday').mean(), 2).reset_index()
cs19.head()


# In[10]:


cs19['Month'] = (cs19['Monthday'] // 100)
months = {1: 'January', 2: 'February', 3: 'March', 4: 'April', 5: 'May', 6: 'June'}
# Remember that I'm only analyzing the first 6 months of each year
cs19['Month'] = cs19['Month'].map(months)
cs19.head()


# In[11]:


# Display only; the result is not assigned, so 'day' stays available as a column
cs19.set_index('day')


# ## Data cleaning for Carvajal-Sevillana, 2020 Data

# In[12]:


cs20 = pd.read_excel(r'2020\Carvajal-Sevillana.xlsx', na_values='----')
cs20['Fecha'] = cs20['Fecha'].str.replace('24:00', '00:00')
cs20['Fecha'] = pd.to_datetime(cs20.Fecha, format="%d-%m-%Y %H:%M")
cs20['day'] = cs20['Fecha'].dt.day
cs20['month'] = cs20['Fecha'].dt.month
cs20['year'] = cs20['Fecha'].dt.year
cs20['Monthday'] = (cs20['month'] * 100) + cs20['day']
cs20.info()


# The 2020 data from the Carvajal-Sevillana station shows problems similar to the 2019 data: some pollutants have many NaN values, which affects the sample. Let's see the percentage of NaN data per pollutant.

# In[13]:


cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']

for col in cols:
    values = cs20[col].isnull().sum()
    length = 4344
    perc = round((values / length) * 100, 2)
    print(col + ': ' + str(perc) + '% NaN values')


# The 2020 data is worse than I thought, and worse than the 2019 data. The percentage of NaN values increased for pollutants like NO, NOX, NO2 and CO, although it actually decreased for SO2. Again, I will fill the NaN data with the mean of each pollutant.
# 
# There might be different explanations for this increase besides equipment failures and maintenance. 2020 is the beginning of the new Mayor's four-year term, and there may have been politics behind it, but this is only speculation.
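# To make the year-over-year comparison of data coverage explicit, the two percentage loops above can be put side by side. This is only a sketch: `nan_perc` and `raw19` are illustrative names, and the raw 2019 hourly file is re-read here because cs19 has already been imputed and aggregated at this point.

# In[ ]:


def nan_perc(df, cols):
    # Percentage of missing values per column
    return df[cols].isnull().mean().mul(100).round(2)

raw19 = pd.read_excel(r'2019\Carvajal-Sevillana.xlsx', header=0, na_values='----')
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']
print(pd.DataFrame({'2019 % NaN': nan_perc(raw19, cols),
                    '2020 % NaN': nan_perc(cs20, cols)}))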
# In[14]:


cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']

for col in cols:
    # Fill each 2020 pollutant with its own 2020 mean (using cs20 here, not cs19,
    # so the imputation doesn't mix data from the two years)
    mean = round(cs20[col].mean(), 1)
    cs20[col].fillna(mean, inplace=True)

cs20.info()


# In[15]:


cs20 = round(cs20.groupby('Monthday').mean(), 2).reset_index()
cs20['Month'] = (cs20['Monthday'] // 100)
months = {1: 'January', 2: 'February', 3: 'March', 4: 'April', 5: 'May', 6: 'June'}
cs20['Month'] = cs20['Month'].map(months)
cs20.head()


# With both datasets cleaned, we can compare the pollutants year over year.
# 
# # PM10

# In[16]:


fig = plt.figure(figsize=(15,20))
fig.suptitle('PM10 2019 Vs 2020')

ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['PM10'], label='2019')
ax1.plot(jan20['day'], jan20['PM10'], label='2020')
ax1.set_title('January')
plt.legend()

ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['PM10'], label='2019')
ax2.plot(feb20['day'], feb20['PM10'], label='2020')
ax2.set_title('February')
plt.legend()

ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['PM10'], label='2019')
ax3.plot(mar20['day'], mar20['PM10'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March')
plt.legend()

ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['PM10'], label='2019')
ax4.plot(apr20['day'], apr20['PM10'], label='2020')
ax4.set_title('April')
plt.legend()

ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['PM10'], label='2019')
ax5.plot(may20['day'], may20['PM10'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May')
plt.legend()

ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['PM10'], label='2019')
ax6.plot(jun20['day'], jun20['PM10'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()


# # Analysis PM10
# 
# In the months before the declaration of mandatory preventive isolation, PM10 levels behaved similarly in both years, with some exceptions: at the end of January and the beginning of February the 2019 levels were higher. A search for events during that period showed that in those days there was an air quality alert because of wildfires.
# 
# At the beginning of the mandatory preventive isolation the levels were almost the same, but then they decreased for almost a month, with a few days where the levels were higher; those days coincide with the first economic reopening measures. Since the intelligent isolation the levels have gone up and down, but they remain lower than last year's levels.
# June shows almost the same behavior, with the exception of the period between June 22nd and 26th: during those days Saharan dust moved through the region, which caused an increase in particulate matter levels.
# 
# # PM2.5

# In[17]:


fig = plt.figure(figsize=(15,20))
fig.suptitle('PM2.5 2019 Vs 2020')

ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['PM2.5'], label='2019')
ax1.plot(jan20['day'], jan20['PM2.5'], label='2020')
ax1.set_title('January')
plt.legend()

ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['PM2.5'], label='2019')
ax2.plot(feb20['day'], feb20['PM2.5'], label='2020')
ax2.set_title('February')
plt.legend()

ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['PM2.5'], label='2019')
ax3.plot(mar20['day'], mar20['PM2.5'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March')
plt.legend()

ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['PM2.5'], label='2019')
ax4.plot(apr20['day'], apr20['PM2.5'], label='2020')
ax4.set_title('April')
plt.legend()

ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['PM2.5'], label='2019')
ax5.plot(may20['day'], may20['PM2.5'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May')
plt.legend()

ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['PM2.5'], label='2019')
ax6.plot(jun20['day'], jun20['PM2.5'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()


# # Analysis PM2.5
# 
# PM2.5 behaves similarly to PM10: the 2020 values tend to be lower, with some days where the levels were higher. Since the declaration of isolation the levels decreased considerably, because the main sources of these particulates are vehicle emissions, fuel combustion, dust and other similar sources. Again, the levels increased considerably during the passage of the Saharan dust.
# 
# There's a flat line for almost 10 days in June; this is probably due to a high concentration of NaN values that were filled with the mean.
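# One way to confirm that suspicion is to look for repeated daily means: a day whose hours were all imputed ends up exactly at the fill value, so identical daily values are a strong hint of imputation. A minimal sketch, assuming the flat stretch is in the 2020 series (the same check works on cs19):

# In[ ]:


# Daily June PM2.5 means that repeat exactly suggest heavily imputed days
june20 = cs20[cs20['Month'] == 'June']
repeats = june20['PM2.5'].value_counts()
print(repeats[repeats > 1])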
# # Ozone

# In[18]:


fig = plt.figure(figsize=(15,20))
fig.suptitle('Ozone 2019 Vs 2020')

ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['OZONO'], label='2019')
ax1.plot(jan20['day'], jan20['OZONO'], label='2020')
ax1.set_title('January')
plt.legend()

ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['OZONO'], label='2019')
ax2.plot(feb20['day'], feb20['OZONO'], label='2020')
ax2.set_title('February')
plt.legend()

ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['OZONO'], label='2019')
ax3.plot(mar20['day'], mar20['OZONO'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March')
plt.legend()

ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['OZONO'], label='2019')
ax4.plot(apr20['day'], apr20['OZONO'], label='2020')
ax4.set_title('April')
plt.legend()

ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['OZONO'], label='2019')
ax5.plot(may20['day'], may20['OZONO'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May')
plt.legend()

ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['OZONO'], label='2019')
ax6.plot(jun20['day'], jun20['OZONO'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()


# # Analysis Ozone
# 
# Ozone is a secondary pollutant: it isn't emitted directly by vehicles or fuel combustion, but is formed through photochemical reactions in the atmosphere between primary pollutants (NOx, SOx, VOCs) and sunlight.
# 
# Ozone levels are high on hot days with low humidity, when the wind is light or the air is stagnant. Between the mandatory isolation and the intelligent isolation the levels were higher than last year, because there was little traffic in the city, the air was stagnant, and the weather during those days was hot. After the intelligent isolation, which is really an economic reopening, the levels are still slightly higher:
# there is traffic again, but not as much as on a normal day without the pandemic.
# 
# Note: VOCs are volatile organic compounds (COVs in Spanish).
# 
# # Carbon Monoxide CO

# In[19]:


fig = plt.figure(figsize=(15,20))
fig.suptitle('CO 2019 Vs 2020')

ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['CO'], label='2019')
ax1.plot(jan20['day'], jan20['CO'], label='2020')
ax1.set_title('January')
plt.legend()

ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['CO'], label='2019')
ax2.plot(feb20['day'], feb20['CO'], label='2020')
ax2.set_title('February')
plt.legend()

ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['CO'], label='2019')
ax3.plot(mar20['day'], mar20['CO'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March')
plt.legend()

ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['CO'], label='2019')
ax4.plot(apr20['day'], apr20['CO'], label='2020')
ax4.set_title('April')
plt.legend()

ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['CO'], label='2019')
ax5.plot(may20['day'], may20['CO'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May')
plt.legend()

ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['CO'], label='2019')
ax6.plot(jun20['day'], jun20['CO'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()


# # Analysis CO
# 
# Carbon monoxide is produced by the incomplete combustion of carbon-containing compounds when there isn't enough oxygen to generate CO2, as happens in an engine's combustion chamber.
# The levels were almost the same before the isolation measure, then they decreased, and after the intelligent isolation they are slightly lower. Again, there was hardly any traffic during isolation, and after the intelligent isolation traffic is still not at the level of a normal day.
# 
# # SO2

# In[20]:


fig = plt.figure(figsize=(15,20))
fig.suptitle('SO2 2019 Vs 2020')

ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['SO2'], label='2019')
ax1.plot(jan20['day'], jan20['SO2'], label='2020')
ax1.set_title('January')
plt.legend()

ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['SO2'], label='2019')
ax2.plot(feb20['day'], feb20['SO2'], label='2020')
ax2.set_title('February')
plt.legend()

ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['SO2'], label='2019')
ax3.plot(mar20['day'], mar20['SO2'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March')
plt.legend()

ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['SO2'], label='2019')
ax4.plot(apr20['day'], apr20['SO2'], label='2020')
ax4.set_title('April')
plt.legend()

ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['SO2'], label='2019')
ax5.plot(may20['day'], may20['SO2'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May')
plt.legend()

ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['SO2'], label='2019')
ax6.plot(jun20['day'], jun20['SO2'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()


# # Analysis SO2
# 
# Sulfur dioxide is released by the combustion of coal, oil, diesel or gas. Most of the vehicle fleet in Bogotá uses petrol as fuel; the exceptions are public transport, cargo vehicles and some vehicles such as pickups. Bogotá, like most cities in the world, regulates the transit of cargo vehicles through the city during the day, limiting it to certain hours to reduce environmental and traffic impacts.
# 
# The levels dropped even before the isolation measure, but I can't say much more because this pollutant wasn't measured during most of 2019.
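# A quick way to see which months drive that gap is to compute the share of missing SO2 hours per month from the raw 2019 file. This is only a sketch; the file is re-read here (as in the earlier sketch) because cs19 has already been imputed and aggregated.

# In[ ]:


# Percentage of missing SO2 hours per month in the raw 2019 data
raw19 = pd.read_excel(r'2019\Carvajal-Sevillana.xlsx', header=0, na_values='----')
raw19['Fecha'] = pd.to_datetime(raw19['Fecha'].str.replace('24:00', '00:00'),
                                format="%d-%m-%Y %H:%M")
print(raw19['SO2'].isnull().groupby(raw19['Fecha'].dt.month).mean().mul(100).round(1))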
# # Nitric Oxide (NO)

# In[21]:


fig = plt.figure(figsize=(15,20))
fig.suptitle('NO 2019 Vs 2020')

ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['NO'], label='2019')
ax1.plot(jan20['day'], jan20['NO'], label='2020')
ax1.set_title('January')
plt.legend()

ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['NO'], label='2019')
ax2.plot(feb20['day'], feb20['NO'], label='2020')
ax2.set_title('February')
plt.legend()

ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['NO'], label='2019')
ax3.plot(mar20['day'], mar20['NO'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March')
plt.legend()

ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['NO'], label='2019')
ax4.plot(apr20['day'], apr20['NO'], label='2020')
ax4.set_title('April')
plt.legend()

ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['NO'], label='2019')
ax5.plot(may20['day'], may20['NO'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May')
plt.legend()

ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['NO'], label='2019')
ax6.plot(jun20['day'], jun20['NO'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()


# # Analysis NO
# 
# Before the isolation measure, at least in January, the levels were similar for part of the month; after that there is uncertainty in the data. Between the mandatory isolation and the intelligent isolation the levels were lower. Since the declaration of intelligent isolation the levels have increased slightly, but most days remain lower than last year.
# 
# This compound is generated by fuel combustion in traffic and in cargo vehicles.
# # NOX

# In[22]:


fig = plt.figure(figsize=(15,20))
fig.suptitle('NOx 2019 Vs 2020')

ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['NOX'], label='2019')
ax1.plot(jan20['day'], jan20['NOX'], label='2020')
ax1.set_title('January')
plt.legend()

ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['NOX'], label='2019')
ax2.plot(feb20['day'], feb20['NOX'], label='2020')
ax2.set_title('February')
plt.legend()

ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['NOX'], label='2019')
ax3.plot(mar20['day'], mar20['NOX'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March')
plt.legend()

ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['NOX'], label='2019')
ax4.plot(apr20['day'], apr20['NOX'], label='2020')
ax4.set_title('April')
plt.legend()

ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['NOX'], label='2019')
ax5.plot(may20['day'], may20['NOX'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May')
plt.legend()

ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['NOX'], label='2019')
ax6.plot(jun20['day'], jun20['NOX'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()


# # Analysis NOX
# 
# NOX is a combined measure of NO and NO2; around 54% of the emissions of this pollutant come from transportation fuels. In January the levels were slightly higher this year than last year. Between the mandatory isolation and the intelligent isolation the levels were lower, although there were days when the levels were similar. This trend continued after the intelligent isolation.
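# Since NOX is reported as the sum of NO and NO2 (all in ppb), a quick consistency check on the daily means can flag days where the mean imputation distorted one of the three series. This is only a rough sketch, not a formal validation:

# In[ ]:


# |NO + NO2 - NOX| per day in 2020; large values point to heavily imputed days
gap20 = (cs20['NO'] + cs20['NO2'] - cs20['NOX']).abs()
print(gap20.describe())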
# # NO2

# In[23]:


fig = plt.figure(figsize=(15,20))
fig.suptitle('NO2 2019 Vs 2020')

ax1 = fig.add_subplot(611)
jan19 = cs19[cs19['Month'] == 'January']
jan20 = cs20[cs20['Month'] == 'January']
ax1.plot(jan19['day'], jan19['NO2'], label='2019')
ax1.plot(jan20['day'], jan20['NO2'], label='2020')
ax1.set_title('January')
plt.legend()

ax2 = fig.add_subplot(612)
feb19 = cs19[cs19['Month'] == 'February']
feb20 = cs20[cs20['Month'] == 'February']
ax2.plot(feb19['day'], feb19['NO2'], label='2019')
ax2.plot(feb20['day'], feb20['NO2'], label='2020')
ax2.set_title('February')
plt.legend()

ax3 = fig.add_subplot(613)
mar19 = cs19[cs19['Month'] == 'March']
mar20 = cs20[cs20['Month'] == 'March']
ax3.plot(mar19['day'], mar19['NO2'], label='2019')
ax3.plot(mar20['day'], mar20['NO2'], label='2020')
ax3.axvline(24, label='Isolation', c='green')
ax3.set_title('March')
plt.legend()

ax4 = fig.add_subplot(614)
apr19 = cs19[cs19['Month'] == 'April']
apr20 = cs20[cs20['Month'] == 'April']
ax4.plot(apr19['day'], apr19['NO2'], label='2019')
ax4.plot(apr20['day'], apr20['NO2'], label='2020')
ax4.set_title('April')
plt.legend()

ax5 = fig.add_subplot(615)
may19 = cs19[cs19['Month'] == 'May']
may20 = cs20[cs20['Month'] == 'May']
ax5.plot(may19['day'], may19['NO2'], label='2019')
ax5.plot(may20['day'], may20['NO2'], label='2020')
ax5.axvline(11, label='Intelligent Isolation', c='green')
ax5.set_title('May')
plt.legend()

ax6 = fig.add_subplot(616)
jun19 = cs19[cs19['Month'] == 'June']
jun20 = cs20[cs20['Month'] == 'June']
ax6.plot(jun19['day'], jun19['NO2'], label='2019')
ax6.plot(jun20['day'], jun20['NO2'], label='2020')
ax6.axvline(24, label='Sahara Dust', c='yellow')
ax6.set_title('June')
plt.legend()


# # Analysis NO2
# 
# This pollutant is the product of the oxidation of NO; it also enhances the formation of particulate matter, especially PM2.5, and when it reacts with UV light it is a precursor of ozone. In January there were high concentrations of NO2, just as with NO, and the values were higher than in 2019. Then we have a period of uncertainty in the data. Between the mandatory isolation and the intelligent isolation the levels were lower, and at times behaved the same as in 2019. After the declaration of intelligent isolation the levels have gone up and down. This might be because the oxidation of NO into NO2 takes time, and when the NO2 reacts with UV light the ozone levels grow, as we saw in the ozone graph.
# 
# # General Analysis
# 
# Air quality is important for human life. The COVID-19 pandemic (a respiratory disease) has forced us to take very restrictive measures, such as months of mandatory isolation, to flatten the contagion curve. This has impacts on health, the economy and the environment.
# 
# Here we found that most of the criteria pollutants have had lower levels than last year during the period of mandatory isolation, with the exception of ozone, which has increased, mostly because the air has been stagnant due to the lack or reduction of traffic in the city. This pollutant can cause health effects such as cough, asthma and bronchitis, and high exposure can produce permanent lung damage.
# 
# This might be a huge problem, especially in low-income areas with high air pollution, such as the area where this station is located.
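# To put a rough number on "how much", we can compare the average daily levels from the start of the mandatory isolation (Monthday 324, i.e. March 24) to the end of June in both years. This is only a sketch, and the percentage change should be read as a rough indicator, since the NaN values were filled with means:

# In[ ]:


# Average daily levels from March 24 (Monthday 324) to June 30, 2019 vs 2020
cols = ['PM10', 'PM2.5', 'NO', 'NOX', 'NO2', 'CO', 'SO2', 'OZONO']
iso19 = cs19[cs19['Monthday'] >= 324][cols].mean()
iso20 = cs20[cs20['Monthday'] >= 324][cols].mean()
print(pd.DataFrame({'2019': iso19.round(2),
                    '2020': iso20.round(2),
                    '% change': ((iso20 - iso19) / iso19 * 100).round(1)}))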
# ## Things to do
# 
# 1. The original idea of this project was to analyze all 13 air quality stations in Bogotá. However, I faced different problems, such as needing a function to automate the data cleaning process. I wrote a function for this (below), although the first version did not work as I expected. The same goes for the plotting process: a function or a loop would make everything faster and clearer (see the sketch after this list).

# In[24]:


def cleaning_function(df):
    # Transform dates
    df['Fecha'] = df['Fecha'].str.replace('24:00', '00:00')
    df['Fecha'] = pd.to_datetime(df.Fecha, format="%d-%m-%Y %H:%M")
    df['day'] = df['Fecha'].dt.day
    df['month'] = df['Fecha'].dt.month
    df['year'] = df['Fecha'].dt.year
    df['Monthday'] = (df['month'] * 100) + df['day']

    # Fill NaN values with each column's mean (numeric columns only,
    # so the datetime column is left untouched)
    for col in df.select_dtypes(include='number').columns:
        df[col].fillna(round(df[col].mean(), 1), inplace=True)

    # Group by Monthday to get daily means
    df = round(df.groupby('Monthday').mean(numeric_only=True), 2).reset_index()

    # Month column with the month name (only the first 6 months are analyzed)
    months = {1: 'January', 2: 'February', 3: 'March',
              4: 'April', 5: 'May', 6: 'June'}
    df['Month'] = (df['Monthday'] // 100).map(months)

    # Return the cleaned DataFrame so the result can be assigned,
    # e.g. cs19 = cleaning_function(cs19)
    return df


# 2. Another thing I should do is find the days with levels over the maximum allowed level in both years and compare them (see the sketch below).
# 
# 3. I think a better way to plot concentration vs. time is to use the actual dates from the system. To do that I need to take the day and month extracted from the date column and rebuild them into a datetime column (see the sketch below).
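# For ideas 1 and 3, here is a minimal sketch of what a reusable plotting helper could look like, using the Monthday, Month and day columns created above. `plot_pollutant` and `add_date` are illustrative names, not functions from the original notebook, and this is only one possible approach.

# In[ ]:


def add_date(df, year):
    # Rebuild a real datetime column from the Monthday integer (e.g. 324 -> March 24)
    df = df.copy()
    df['Date'] = pd.to_datetime(pd.DataFrame({'year': year,
                                              'month': df['Monthday'] // 100,
                                              'day': df['Monthday'] % 100}))
    return df


def plot_pollutant(df19, df20, pollutant):
    # One subplot per month, comparing the 2019 and 2020 daily means of a pollutant
    month_names = ['January', 'February', 'March', 'April', 'May', 'June']
    fig, axes = plt.subplots(len(month_names), 1, figsize=(15, 20))
    fig.suptitle(pollutant + ' 2019 Vs 2020')
    for ax, month in zip(axes, month_names):
        m19 = df19[df19['Month'] == month]
        m20 = df20[df20['Month'] == month]
        ax.plot(m19['day'], m19[pollutant], label='2019')
        ax.plot(m20['day'], m20[pollutant], label='2020')
        ax.set_title(month)
        ax.legend()


# Example usage:
# cs19 = add_date(cs19, 2019)
# plot_pollutant(cs19, cs20, 'PM10')


# And for idea 2, a sketch that counts the days above the 24-hour limits mentioned at the beginning (75 µg/m3 for PM10 and 37 µg/m3 for PM2.5, from Resolución 2254 de 2017). The `limits` dictionary is only illustrative and covers the two particulate matter parameters.

# In[ ]:


# Days whose daily mean exceeds the 24-hour limit, per year
limits = {'PM10': 75, 'PM2.5': 37}  # µg/m3, Resolución 2254 de 2017
for pollutant, limit in limits.items():
    days19 = (cs19[pollutant] > limit).sum()
    days20 = (cs20[pollutant] > limit).sum()
    print(pollutant + ': ' + str(days19) + ' days over the limit in 2019, '
          + str(days20) + ' in 2020')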