import pandas as pd
import folium
df = pd.read_csv('data/bfro_reports_geocoded.csv')
df.head()
 | observed | location_details | county | state | season | title | latitude | longitude | date | number | ... | precip_intensity | precip_probability | precip_type | pressure | summary | uv_index | visibility | wind_bearing | wind_speed | location |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Ed L. was salmon fishing with a companion in P... | East side of Prince William Sound | Valdez-Chitina-Whittier County | Alaska | Fall | NaN | NaN | NaN | NaN | 1261.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | heh i kinda feel a little dumb that im reporti... | the road is off us rt 80, i dont know the exit... | Warren County | New Jersey | Fall | NaN | NaN | NaN | NaN | 438.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | I was on my way to Claremont from Lebanon on R... | Close to Claremont down 120 not far from Kings... | Sullivan County | New Hampshire | Summer | Report 55269: Dawn sighting at Stevens Brook o... | 43.41549 | -72.33093 | 2016-06-07 | 55269.0 | ... | 0.001 | 0.7 | rain | 998.87 | Mostly cloudy throughout the day. | 6.0 | 9.70 | 262.0 | 0.49 | POINT(-72.33093000000001 43.415490000000005) |
3 | I was northeast of Macy Nebraska along the Mis... | Latitude & Longitude : 42.158230 -96.344197 | Thurston County | Nebraska | Spring | Report 59757: Possible daylight sighting of a ... | 42.15685 | -96.34203 | 2018-05-25 | 59757.0 | ... | 0.000 | 0.0 | NaN | 1008.07 | Partly cloudy in the morning. | 10.0 | 8.25 | 193.0 | 3.33 | POINT(-96.34203000000001 42.15685) |
4 | While this incident occurred a long time ago, ... | Ward County, Just outside of a the Minuteman T... | Ward County | North Dakota | Spring | Report 751: Hunter describes described being s... | 48.25422 | -101.31660 | 2000-04-21 | 751.0 | ... | NaN | NaN | rain | 1011.47 | Partly cloudy until evening. | 6.0 | 10.00 | 237.0 | 11.14 | POINT(-101.3166 48.254220000000004) |
5 rows × 29 columns
This is great! The observed column holds the text of each report, and alongside it we have the location details, county, state, the season of the sighting, the report title, latitude and longitude, the date, and more.
Let's take a look at the shape of the dataset.
df.shape
(4747, 29)
4,747 records. Not bad for a little analysis.
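One thing the first few rows hint at: reports 0 and 1 have NaN for latitude and longitude, so not every record can be placed on a map. Here is a minimal sketch of how you might count the usable rows. The helper name coordinate_coverage and the small sample frame are made up for illustration; they are not part of the dataset or of any library.

```python
import pandas as pd


def coordinate_coverage(reports: pd.DataFrame) -> pd.Series:
    """Count rows with and without a complete latitude/longitude pair."""
    # A row is mappable only if both coordinates are present
    has_coords = reports[["latitude", "longitude"]].notna().all(axis=1)
    return pd.Series(
        {
            "with_coords": int(has_coords.sum()),
            "without_coords": int((~has_coords).sum()),
        }
    )


# Hypothetical stand-in for the real data, mirroring the first rows of df.head()
sample = pd.DataFrame(
    {
        "state": ["Alaska", "New Jersey", "New Hampshire", "Nebraska"],
        "latitude": [None, None, 43.41549, 42.15685],
        "longitude": [None, None, -72.33093, -96.34203],
    }
)

coverage = coordinate_coverage(sample)
print(coverage)
```

On the real data you would call coordinate_coverage(df) instead, and use df.dropna(subset=['latitude', 'longitude']) to keep only the rows that can be plotted.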
Next let's see what the columns in the dataset are.
df.columns
Index(['observed', 'location_details', 'county', 'state', 'season', 'title', 'latitude', 'longitude', 'date', 'number', 'classification', 'geohash', 'temperature_high', 'temperature_mid', 'temperature_low', 'dew_point', 'humidity', 'cloud_cover', 'moon_phase', 'precip_intensity', 'precip_probability', 'precip_type', 'pressure', 'summary', 'uv_index', 'visibility', 'wind_bearing', 'wind_speed', 'location'], dtype='object')
OK, looks like we have plenty of geo data to plot maps with. Let's start by looking at which states have the most observations.
df['state'].value_counts()
Washington        563
California        402
Florida           292
Ohio              276
Oregon            241
Illinois          232
Texas             215
Michigan          208
Missouri          141
Georgia           121
Colorado          121
Kentucky          112
Pennsylvania      111
New York          102
West Virginia     100
Arkansas           95
Alabama            91
Tennessee          91
Oklahoma           85
Arizona            84
Idaho              81
Wisconsin          80
North Carolina     79
Indiana            75
Virginia           72
Minnesota          69
New Jersey         63
Utah               57
Iowa               56
Montana            45
Kansas             42
Louisiana          40
New Mexico         40
South Carolina     38
Maryland           34
Massachusetts      27
Wyoming            27
Mississippi        21
Alaska             20
Connecticut        15
Nebraska           15
Maine              14
New Hampshire      13
South Dakota       11
Vermont             9
Nevada              7
Rhode Island        5
Delaware            5
North Dakota        4
Name: state, dtype: int64
Washington and California sure love their Bigfoot, eh?
To make these counts easier to take in at a glance, let's plot them quickly. I'll use the Altair plotting library, but you can use whatever you want.
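Raw counts are easier to compare when you also see each state's share of all reports. The following is a rough sketch, assuming only that the frame has a state column; the function name top_states and the sample data are invented for this example.

```python
import pandas as pd


def top_states(reports: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n states with the most reports, with counts and shares."""
    counts = reports["state"].value_counts()  # sorted from most to fewest
    shares = (counts / counts.sum()).round(3)  # fraction of all reports
    summary = pd.DataFrame({"reports": counts, "share": shares})
    return summary.head(n)


# Hypothetical mini dataset, just to show the shape of the result
sample = pd.DataFrame(
    {"state": ["Washington"] * 3 + ["California"] * 2 + ["Ohio"]}
)

result = top_states(sample, n=2)
print(result)
```

With the real data, top_states(df, n=10) would give the ten states with the most reports along with their share of the total.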
import altair as alt
bf_states = alt.Chart(df).mark_bar().encode(
    x=alt.X('count(state):Q'),      # number of reports per state
    y=alt.Y('state:N', sort='-x')   # states ordered from most to fewest reports
)
bf_states