Welcome to our notebook on Berkeley Air Quality!
In this notebook we will be looking at Air Quality Index (AQI) scores in the surrounding Berkeley, CA area. With so many pollutants in the air, especially as we head into the annual fire season, AQI becomes something we check on daily. For many of us, this AQI map is all too familiar. Throughout this module we will discuss how data can be used to visualize and uncover underlying trends in the world.
Let's get started!
Before we get started with the data, let's talk about what Jupyter Notebook is. This lab is set up in a Jupyter Notebook. Notebooks can contain anything from live code to written text, equations or visualizations. The content of a notebook is written into rectangular sections called cells.
There are two types of cells in Jupyter: code cells and markdown cells. Code cells, as you can imagine, contain code in Python, the programming language that we will be using throughout this notebook. Markdown cells, such as this one, contain written text. You can select any cell by clicking on it once.
'Running' a cell is similar to pressing 'Enter' on a calculator once you've typed in an expression; it computes all of the expressions contained within the cell.
To run a cell, you can do one of the following:
Running a markdown cell will embed the text into the notebook and running a code cell will evaluate the code and display its output under the cell.
Let's try it! Run the code cell below.
print("Hello World!")
Hello World!
You can add a cell by clicking Insert > Insert Cell Below and choosing the cell type in the drop down menu. Try adding a cell below to type in your name!
To delete a cell, click on the scissors at the top or Edit > Cut Cells. Delete the cell below.
print("Delete this!")
Delete this!
Important Tip: Every time you open a Jupyter notebook, it is extremely important to run all the cells from the beginning in order for the notebook to work.
Now that we have had a brief crash course on Jupyter Notebooks, let's dive into Berkeley AQI!
In this notebook we will look at data collected from PurpleAir, a company that manages a network of air quality sensors. The data from these sensors is collected to create maps like the one displayed above, which give an intuitive visualization of the air quality in a specific region. In the dataframe below, you will find several metrics that help us do this.
Before we begin, click Cell in the top toolbar, then Run All in the drop down.
import matplotlib.pyplot as plt
import numpy as np
import purpleair
import folium
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
from datetime import datetime
from IPython.display import clear_output
Before we begin looking at data collected from PurpleAir sensors, let's first take a look at what a sensor is and what it measures.
Below is a picture of a real PurpleAir Air Quality Sensor. These sensors can be mounted indoors or outdoors, and they track airborne particulate matter (PM) in real time using PMSX003 laser counters. Particulate matter can include things like dust, smoke, dirt and any other organic or inorganic particles in the air. With multiple sensors mounted in a region, PurpleAir can create a relatively accurate measure of AQI throughout the day as the air quality changes.
For more information on how sensors work, take a look at the official PurpleAir website here!
In order to work with the data, we need to pull it into our workspace. Fortunately, PurpleAir has created an API that allows users to pull in and work with their AQI data. In the code cell below we will import the purpleair package and use it to create a dataframe of data from all PurpleAir sensors, of which there are roughly 20,000!
Run the code cell below!
from purpleair.network import SensorList
p = SensorList()
df = p.to_dataframe(sensor_filter='all',
channel='parent')
Initialized 22,479 sensors!
The dataframe below contains all the sensor data as of the latest update. It contains everything from the geographical latitude and longitude of each sensor to the last time that sensor measured airborne PM.
# Displaying dataframe with all the PurpleAir Sensor data
df
parent | lat | lon | name | location_type | pm_2.5 | temp_f | temp_c | humidity | pressure | ... | last_update_check | created | uptime | is_owner | 10min_avg | 30min_avg | 1hour_avg | 6hour_avg | 1day_avg | 1week_avg | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
14633 | None | 37.275561 | -121.964134 | Hazelwood canary | outside | 0.81 | 65.0 | 18.333333 | 54.0 | 1008.57 | ... | None | None | None | False | 0.88 | 4.10 | 8.26 | 17.72 | 21.80 | 15.42 |
25999 | None | 30.053808 | -95.494643 | Villages of Bridgestone AQI | outside | 35.16 | 70.0 | 21.111111 | 70.0 | 1011.86 | ... | None | None | None | False | 34.71 | 33.59 | 32.77 | 26.03 | 16.30 | 14.88 |
14091 | None | 37.883620 | -122.070087 | WC Hillside | outside | 1.00 | 63.0 | 17.222222 | 57.0 | 1003.26 | ... | None | None | None | False | 2.28 | 2.72 | 3.16 | 16.25 | 25.80 | 22.91 |
108226 | None | 38.573703 | -121.439113 | "C" Street Air Shelter | inside | 4.76 | 78.0 | 25.555556 | 45.0 | 1015.66 | ... | None | None | None | False | 4.77 | 4.49 | 4.15 | 3.96 | 4.75 | 5.46 |
49409 | None | 18.759182 | 99.017172 | "First's Place" | outside | 40.83 | 87.0 | 30.555556 | 37.0 | 986.64 | ... | None | None | None | False | 45.70 | 49.98 | 50.08 | 50.61 | 46.32 | 32.53 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
64085 | None | 36.785883 | 127.157040 | 청룡동행정복지센터 | outside | 40.35 | 54.0 | 12.222222 | 46.0 | 1027.31 | ... | None | None | None | False | 37.24 | 38.78 | 41.66 | 57.59 | 63.40 | 45.20 |
64995 | None | 36.691324 | 126.585255 | 한서대학교 | outside | 40.64 | 63.0 | 17.222222 | 33.0 | 1017.02 | ... | None | None | None | False | 45.03 | 45.97 | 43.96 | 39.68 | 37.22 | 28.38 |
64093 | None | 36.710720 | 126.548390 | 해미읍성 | outside | 62.07 | 61.0 | 16.111111 | 44.0 | 1027.89 | ... | None | None | None | False | 57.93 | 56.06 | 52.51 | 43.19 | 40.98 | 31.83 |
29747 | None | 36.761236 | 127.395300 | 화덕보건진료소 | outside | 34.77 | 62.0 | 16.666667 | 36.0 | 1018.16 | ... | None | None | None | False | 40.36 | 49.29 | 52.26 | 49.02 | 46.31 | 38.67 |
98309 | None | 36.718003 | 126.926841 | 화천1리마을회관 | outside | 47.41 | 60.0 | 15.555556 | 41.0 | 1023.32 | ... | None | None | None | False | 46.77 | 48.10 | 47.79 | 48.12 | 47.79 | 35.08 |
22479 rows × 43 columns
Here is a breakdown of the dataframe above and what each column represents.
Column Name | Description |
---|---|
lat | The latitude coordinate of the location |
lon | The longitude coordinate of the location |
name | The name of the location |
location_type | The nature of the location (i.e. inside or outside) |
pm_2.5 | The level of fine particulate matter in the air of that location |
temp_f | The temperature of the location in degrees Fahrenheit |
temp_c | The temperature of the location in degrees Celsius |
humidity | The humidity percentage of the location |
pressure | The pressure index of the location (in millibars) |
last_seen | The last seen date and timestamp in UTC |
model | Model of the specific sensor |
flagged | Whether or not the channel was marked as flagged (usually based on a fault) |
age | Sensor data age (when data was last received) |
10min_avg | Average PM 2.5 AQI over the last 10 minutes |
30min_avg | Average PM 2.5 AQI over the last 30 minutes |
1hour_avg | Average PM 2.5 AQI over the last hour |
6hour_avg | Average PM 2.5 AQI over the last 6 hours |
1day_avg | Average PM 2.5 AQI over the last day |
1week_avg | Average PM 2.5 AQI over the last week |
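The temp_f and temp_c columns describe the same measurement in two units. As a small sanity check, the sketch below converts a few temp_f values from the dataframe preview above and compares them with the temp_c values shown in the same rows. The function name is our own and is not part of the purpleair package.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


# temp_f values copied from the dataframe preview above
for temp_f in (65.0, 70.0, 78.0):
    print(temp_f, "F is", round(fahrenheit_to_celsius(temp_f), 6), "C")
```

The printed values should line up with the temp_c column for the Hazelwood canary, Villages of Bridgestone AQI and "C" Street Air Shelter rows.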
While many of the column names are relatively straightforward, such as the "name" column (which displays the set name of the particular sensor), the "location_type" column (which indicates whether it is an indoor or outdoor sensor), etc., we would like to draw your attention to the "pm_2.5" column.
The "pm_2.5" column represents the level of airborne PM 2.5, in other words, airborne particles that have a diameter of 2.5 micrometers or less. At high levels, PM 2.5 particles can reduce visibility and cause the air to appear hazy. Tracking PM 2.5 is important because prolonged exposure to high levels of PM 2.5 particles can cause adverse health effects, and because PM 2.5 is one of the measures that the US Environmental Protection Agency (EPA) uses to calculate the local Air Quality Index (AQI).
QUESTION: Which item or object is closest to 1 micrometer?
a) The length of an ant
b) The diameter of a spider web
c) The length of a grain of rice
ANSWER: b) is the closest.
a) The length of an ant is typically 1 millimeter, which is 1,000 micrometers.
b) The diameter of a spider web is typically between 8 and 10 micrometers.
c) The length of a grain of rice is typically 6 millimeters, which is 6,000 micrometers.
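The unit conversions in the answer can be checked with a few lines of Python. This is only an illustrative sketch: the sizes are the typical values quoted in the answer, and we use 9 micrometers as a stand-in for the "8 to 10 micrometers" range.

```python
MICROMETERS_PER_MILLIMETER = 1_000

# Typical sizes from the answer above, all expressed in micrometers
sizes_in_micrometers = {
    "ant": 1 * MICROMETERS_PER_MILLIMETER,
    "spider web": 9,  # assumed midpoint of the 8 to 10 micrometer range
    "grain of rice": 6 * MICROMETERS_PER_MILLIMETER,
}

# Pick the object whose size is closest to 1 micrometer
closest = min(sizes_in_micrometers, key=lambda name: abs(sizes_in_micrometers[name] - 1))
print(closest)
```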
If you go to the PurpleAir website here, you should see a map of the surrounding Berkeley area. If you click on some of the sensors located on the UC Berkeley campus, you'll find that one of them is named "Le Conte Hall".
Let's take a closer look at the Le Conte Hall sensor! In the code cell below we filter the dataframe by the sensor name ("Le Conte Hall") to pick out the row that corresponds to the specific sensor we are looking for.
df[df['name'] == "Le Conte Hall"]
parent | lat | lon | name | location_type | pm_2.5 | temp_f | temp_c | humidity | pressure | ... | last_update_check | created | uptime | is_owner | 10min_avg | 30min_avg | 1hour_avg | 6hour_avg | 1day_avg | 1week_avg | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
77905 | None | 37.872589 | -122.257219 | Le Conte Hall | inside | 0.35 | 78.0 | 25.555556 | 32.0 | 1004.55 | ... | None | None | None | False | 0.28 | 0.18 | 0.3 | 4.67 | 11.98 | 9.49 |
1 rows × 43 columns
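If the filtering step feels a bit magical, it can help to see it on a tiny table. The sketch below builds a three-row dataframe by hand (a few values copied from the previews above, not the full PurpleAir data) and splits the filter into two steps: first a column of True/False values, then the rows where that column is True.

```python
import pandas as pd

# A hand-made miniature of the sensor dataframe
sensors = pd.DataFrame(
    {
        "name": ["Hazelwood canary", "WC Hillside", "Le Conte Hall"],
        "location_type": ["outside", "outside", "inside"],
        "pm_2.5": [0.81, 1.00, 0.35],
    },
    index=pd.Index([14633, 14091, 77905], name="id"),
)

# Step 1: compare every name to the one we want (gives True/False per row)
is_le_conte = sensors["name"] == "Le Conte Hall"

# Step 2: keep only the rows where the comparison is True
le_conte_row = sensors[is_le_conte]
print(le_conte_row)
```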
The row above gives us loads of information on the state of the AQI in Le Conte Hall at the present moment, but it would be nice to see the AQI information over time. Below is a dataframe that contains readings from the Le Conte Hall sensor over roughly the last 7 days. We get it by requesting one week of historical data from the sensor.
## data from Le Conte Hall sensor from the past week
from purpleair.sensor import Sensor
se = Sensor(77905)
le_conte = se.parent.get_historical(weeks_to_get=1,thingspeak_field='secondary')
le_conte['Date'] = [i.date().strftime("%d-%b-%Y") for i in le_conte['created_at']]
le_conte
created_at | 0.3um/dl | 0.5um/dl | 1.0um/dl | 2.5um/dl | 5.0um/dl | 10.0um/dl | PM1.0 (CF=ATM) ug/m3 | PM10 (CF=ATM) ug/m3 | Date | |
---|---|---|---|---|---|---|---|---|---|---|
entry_id | ||||||||||
295290 | 2021-12-01 00:00:48+00:00 | 1496.78 | 369.34 | 43.39 | 3.25 | 0.94 | 0.65 | 6.37 | 9.40 | 01-Dec-2021 |
295291 | 2021-12-01 00:02:48+00:00 | 1557.96 | 405.95 | 51.84 | 4.38 | 0.72 | 0.43 | 7.20 | 10.44 | 01-Dec-2021 |
295292 | 2021-12-01 00:04:48+00:00 | 1430.91 | 364.16 | 51.11 | 3.24 | 0.48 | 0.43 | 6.71 | 9.74 | 01-Dec-2021 |
295293 | 2021-12-01 00:06:48+00:00 | 1539.02 | 386.51 | 60.63 | 3.50 | 0.82 | 0.43 | 6.96 | 10.45 | 01-Dec-2021 |
295294 | 2021-12-01 00:08:48+00:00 | 1392.07 | 347.67 | 37.49 | 1.35 | 0.46 | 0.22 | 6.32 | 8.12 | 01-Dec-2021 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
300324 | 2021-12-07 23:50:34+00:00 | 964.21 | 210.06 | 23.83 | 1.35 | 0.22 | 0.00 | 3.65 | 4.78 | 07-Dec-2021 |
300325 | 2021-12-07 23:52:34+00:00 | 966.82 | 214.64 | 25.29 | 0.37 | 0.00 | 0.00 | 3.43 | 4.50 | 07-Dec-2021 |
300326 | 2021-12-07 23:54:35+00:00 | 945.52 | 202.40 | 15.53 | 0.22 | 0.00 | 0.00 | 2.88 | 3.71 | 07-Dec-2021 |
300327 | 2021-12-07 23:56:34+00:00 | 943.25 | 209.67 | 20.72 | 0.96 | 0.21 | 0.21 | 3.44 | 4.56 | 07-Dec-2021 |
300328 | 2021-12-07 23:58:34+00:00 | 987.24 | 222.53 | 31.91 | 1.32 | 0.22 | 0.22 | 3.89 | 5.39 | 07-Dec-2021 |
5039 rows × 10 columns
As you can see from the "created_at" column, a reading was taken roughly every two minutes over the past 7 days. The dataframe also contains counts of PM particles of different sizes: 0.3, 0.5, 1.0, 2.5, 5.0 and 10.0 micrometers.
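The two-minute spacing can be checked directly from the timestamps. The sketch below uses only the first five "created_at" values shown in the preview above (not the full week of data), measures the gap between consecutive readings, and builds the same kind of "Date" label that the code cell above creates.

```python
from datetime import datetime
from statistics import median

# First five timestamps copied from the dataframe preview above
created_at = [
    datetime.fromisoformat("2021-12-01 00:00:48+00:00"),
    datetime.fromisoformat("2021-12-01 00:02:48+00:00"),
    datetime.fromisoformat("2021-12-01 00:04:48+00:00"),
    datetime.fromisoformat("2021-12-01 00:06:48+00:00"),
    datetime.fromisoformat("2021-12-01 00:08:48+00:00"),
]

# Gap between each reading and the next one, in seconds
gaps_in_seconds = [
    (later - earlier).total_seconds()
    for earlier, later in zip(created_at, created_at[1:])
]
typical_gap_minutes = median(gaps_in_seconds) / 60
print(typical_gap_minutes)

# Same day-month-year label used for the "Date" column
date_labels = [t.date().strftime("%d-%b-%Y") for t in created_at]
print(date_labels[0])
```

On the full dataframe the gaps are not always exactly two minutes (compare the seconds in the first and last rows of the preview), which is why the median is used here instead of a single difference.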
While this dataframe is useful, there are too many rows of data (~5000) to look at! Below is a widget that plots a line graph of the PM 2.5 measure over a specific day.
The drop down bar allows you to pick which day you would like graphed, so go ahead and pick a day!
def f(date):
    # Select only the readings taken on the chosen day
    day = le_conte[le_conte['Date'] == date]
    fig = plt.figure(figsize=(13,3))
    plt.plot(day['created_at'], day["2.5um/dl"])
    plt.xlabel('Time')
    plt.ylabel('PM 2.5 Particle Count')
    plt.title('Le Conte Hall Sensor PM 2.5')
plt.rcParams["figure.figsize"] = (20,3)
# Build a drop down with one entry per day in the data
interact(f, date = list(le_conte['Date'].unique()));
The line plot above displays the date and hour along the x-axis and the PM 2.5 particle count along the y-axis.
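A plot is one way to deal with thousands of rows; a summary table is another. The sketch below groups readings by day and reports the highest and the average "2.5um/dl" value for each day. It uses only the ten rows visible in the preview above (five from 01-Dec-2021 and five from 07-Dec-2021), so the numbers describe those rows and not the full days.

```python
import pandas as pd

# The ten readings visible in the dataframe preview above
preview = pd.DataFrame(
    {
        "Date": ["01-Dec-2021"] * 5 + ["07-Dec-2021"] * 5,
        "2.5um/dl": [3.25, 4.38, 3.24, 3.50, 1.35, 1.35, 0.37, 0.22, 0.96, 1.32],
    }
)

# One row per day, with the maximum and mean particle count
daily_summary = preview.groupby("Date")["2.5um/dl"].agg(["max", "mean"])
print(daily_summary)
```

Running the same groupby on the full le_conte dataframe should give one summary row per day of the week.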
QUESTION: What is the highest index reading on the first time series plot?
Your answer here
QUESTION: What trends do you notice about the line plot?
Your answer here
QUESTION: Why do you think the index readings fluctuate from point to point?
Your answer here
While the line plots do show us a trend in the PM 2.5 count over time, we still have no clue how that translates to the AQI. The next section will discuss what AQI is and how it is calculated.
The AQI contains 6 categories that air quality can fall into. Each category covers a range of index values between 0 and 500, calculated from the region's PM 2.5 measure. The chart below is provided by the US Environmental Protection Agency (EPA) and shows the official AQI breakpoints (these breakpoints were revised in 2012).
For more information on how AQI Index is calculated, take a look at the AQI Index Factsheet provided by the EPA here!
QUESTION: What is the difference between the original and revised breakpoints?
Your answer here
QUESTION: At 3:00 on November 30th, 2021 the PM 2.5 reading is 12.5. What category does it fall into?
a) Good
b) Moderate
c) Unhealthy for Sensitive Groups
ANSWER: The category is Moderate because it falls into the 12.1 - 35.4 range.
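The lookup used in the answer above can be sketched in code. Within each category, the EPA maps a PM 2.5 concentration to an index value by linear interpolation between the category's breakpoints. The breakpoint values below are the 2012-revision values as we understand them (the Hazardous category is split into two sub-ranges), so please check them against the EPA chart before relying on them. The function name is our own, and rounding the concentration to one decimal place is a simplifying assumption.

```python
# (PM 2.5 low, PM 2.5 high, index low, index high, category)
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50, "Good"),
    (12.1, 35.4, 51, 100, "Moderate"),
    (35.5, 55.4, 101, 150, "Unhealthy for Sensitive Groups"),
    (55.5, 150.4, 151, 200, "Unhealthy"),
    (150.5, 250.4, 201, 300, "Very Unhealthy"),
    (250.5, 350.4, 301, 400, "Hazardous"),
    (350.5, 500.4, 401, 500, "Hazardous"),
]


def pm25_to_aqi(pm25):
    """Return (index value, category) for a PM 2.5 concentration."""
    concentration = round(pm25, 1)  # assumption: round to one decimal place
    for c_low, c_high, i_low, i_high, category in PM25_BREAKPOINTS:
        if c_low <= concentration <= c_high:
            # Linear interpolation between the two breakpoints
            slope = (i_high - i_low) / (c_high - c_low)
            return round(slope * (concentration - c_low) + i_low), category
    raise ValueError("PM 2.5 value is outside the 0 to 500.4 range")


print(pm25_to_aqi(12.5))
```

For the reading of 12.5 in the question, this gives an index value just above 51 in the Moderate category, matching the answer.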
Now that we know how sensors work, what they measure and how the AQI is calculated, let's see if we can create a visualization of AQI values that is a little closer to home!
First, let's find a group of sensors that are near UC Berkeley. The code cell below does just that. We use a range of longitude and latitude coordinates to decide whether to include or exclude a sensor.
## UC Berkeley,CA - Lat: 37.871666 / Lon: -122.272781
berkeleyData = df.loc[(df["lat"] >= 37.8) & (df["lat"] <= 37.9) & (df["lon"] >= -122.3) & (df["lon"] <= -122.2)]
berkeleyData = berkeleyData[["lat", "lon", "name", "location_type", "pm_2.5", "temp_f", "humidity", "pressure"]]
berkeleyData
lat | lon | name | location_type | pm_2.5 | temp_f | humidity | pressure | |
---|---|---|---|---|---|---|---|---|
id | ||||||||
20747 | 37.838977 | -122.205489 | 1000ft Montclair | outside | 1.16 | 56.0 | 78.0 | 979.92 |
81677 | 37.889085 | -122.264327 | 1044 Keith Ave, Berkeley | outside | 3.67 | 59.0 | 64.0 | 995.76 |
79125 | 37.882941 | -122.288017 | 1094 Tevlin St | inside | 0.35 | 75.0 | 40.0 | 1013.70 |
77685 | 37.801872 | -122.274582 | 10th and Washington | inside | 31.83 | 77.0 | 37.0 | 1014.57 |
37971 | 37.883729 | -122.290362 | 1128 Key Route Blvd, Albany CA | outside | 2.04 | 60.0 | 61.0 | 1014.15 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
26977 | 37.813427 | -122.282483 | Xanadu | inside | 0.00 | 84.0 | 32.0 | 1015.51 |
56281 | 37.813500 | -122.282971 | Xanadu | outside | 0.89 | 58.0 | 67.0 | 1015.93 |
62619 | 37.886177 | -122.272443 | Yolo | inside | 0.00 | 90.0 | 22.0 | 1006.36 |
75223 | 37.827560 | -122.205627 | Zinn Drive | outside | 2.02 | 56.0 | 68.0 | 986.10 |
123201 | 37.869390 | -122.245792 | zz | inside | 0.07 | 75.0 | 33.0 | 988.19 |
817 rows × 8 columns
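The same bounding-box idea can be tried on a handful of sensors to see which ones pass. The sketch below uses four rows copied from the previews above and the pandas `between` method, which keeps values that fall inside a range (both ends included). The helper name `in_bounding_box` is our own.

```python
import pandas as pd

# Four sensors copied from the dataframe previews above
sample = pd.DataFrame(
    {
        "name": ["1000ft Montclair", "Hazelwood canary", "Le Conte Hall", "Villages of Bridgestone AQI"],
        "lat": [37.838977, 37.275561, 37.872589, 30.053808],
        "lon": [-122.205489, -121.964134, -122.257219, -95.494643],
    }
)


def in_bounding_box(data, lat_range, lon_range):
    """Keep the rows whose coordinates fall inside both ranges."""
    inside = data["lat"].between(*lat_range) & data["lon"].between(*lon_range)
    return data[inside]


nearby = in_bounding_box(sample, lat_range=(37.8, 37.9), lon_range=(-122.3, -122.2))
print(nearby["name"].tolist())
```

Only the two sensors inside the box around Berkeley remain; the San Jose area and Texas sensors are filtered out.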
Now that we have a smaller subset of data to work with, the next step is to use the PM 2.5 measures to assign each sensor to an AQI Index Category and corresponding color.
# creating a column that indicates the AQI color code of each sensor
color_code = []
for i in berkeleyData["pm_2.5"].to_list():
    if i <= 12.0:
        color_code.append('green')       # Good
    elif 12.0 < i <= 35.4:
        color_code.append('yellow')      # Moderate
    elif 35.4 < i <= 55.4:
        color_code.append('orange')      # Unhealthy for Sensitive Groups
    elif 55.4 < i <= 150.4:
        color_code.append('red')         # Unhealthy
    elif 150.4 < i <= 250.4:
        color_code.append('purple')      # Very Unhealthy
    else:
        color_code.append('darkpurple')  # Hazardous
berkeleyData['code'] = color_code
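As an alternative to looping over the values, pandas can sort numbers into bins with `pd.cut`. The sketch below is one possible way to do this using the same breakpoint values and color names as the cell above; the helper name `pm25_to_color` is our own. Each bin includes its upper edge, so a value of exactly 12.0 is still green.

```python
import pandas as pd

PM25_UPPER_BOUNDS = [12.0, 35.4, 55.4, 150.4, 250.4]
COLORS = ["green", "yellow", "orange", "red", "purple", "darkpurple"]


def pm25_to_color(pm25):
    """Assign each PM 2.5 value in a Series to an AQI color."""
    # Open-ended first and last bins catch very low and very high values
    bins = [float("-inf"), *PM25_UPPER_BOUNDS, float("inf")]
    return pd.cut(pm25, bins=bins, labels=COLORS)


example = pd.Series([1.16, 12.0, 12.5, 31.83, 40.0, 100.0, 200.0, 300.0])
print(pm25_to_color(example).tolist())
```

One difference from the loop: a missing PM 2.5 value stays missing here, while the loop would place it in the last category.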
Our last step is to use the longitude and latitude coordinates to map the location of each sensor with its corresponding AQI color! The widget below contains two sliders. One represents the Latitude value and the other the Longitude value.
Slide the sliders left and right to display a mapping of the sensors in that latitude and longitude region, or use your cursor to drag the mapping area.
Hint: Berkeley, CA - Lat: 37.871666 / Lon: -122.272781
def map(Latitude ,Longitude):
    # Center the map on the coordinates chosen with the sliders
    m = folium.Map(width=500, height=400, location=[Latitude, Longitude])
    # Add one colored marker per sensor (every row, including the last one)
    for i in range(len(berkeleyData)):
        row = berkeleyData.iloc[i]
        folium.Marker(
            location=[row['lat'], row['lon']],
            popup=row['name'],
            icon=folium.Icon(color=row['code']),
        ).add_to(m)
    display(m)
interact(map, Latitude = (36, 38, 0.001) , Longitude = (-123, -121, 0.001));
## UC Berkeley,CA - Lat: 37.871666 / Lon: -122.272781
Now that we have created a map we can easily see what the AQI index is across the city!
QUESTION: What do you notice about the map?
Your answer here
Developed By: Melisa Esqueda, Maham Bawaney & Karalyn Chong