Most NTSB reports include a narrative written by the investigators. This notebook searches them for references to mast bumping.
import os
import pandas as pd
Read in the narratives.
%store -r input_dir
%store -r output_dir
def read_df(name):
    """Read a CSV file from the notebook's output directory."""
    return pd.read_csv(os.path.join(output_dir, name))
narratives = read_df("narratives.csv")
Join them to fatal U.S. helicopter accidents.
helicopter_by_accident = read_df("standardized-helicopters-by-accident.csv")
us_helicopter_by_accident = helicopter_by_accident[helicopter_by_accident.in_usa == True]
merged = pd.merge(
    us_helicopter_by_accident,
    narratives,
    on=["event_id", "aircraft_id"]
)
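`pd.merge` defaults to an inner join, so an accident survives only when a narrative shares both its `event_id` and its `aircraft_id`. The toy sketch below illustrates that behavior on invented rows with a hypothetical `narrative` column, and uses `indicator=True` on an outer join as one way to audit which rows drop out.

```python
import pandas as pd

# Invented stand-ins for the two CSV files; "narrative" is a hypothetical column name.
accidents = pd.DataFrame({
    "event_id": ["E1", "E1", "E2"],
    "aircraft_id": [1, 2, 1],
    "in_usa": [True, True, False],
})
narratives = pd.DataFrame({
    "event_id": ["E1", "E2", "E3"],
    "aircraft_id": [1, 1, 1],
    "narrative": ["mast bumping occurred", "engine failure", "no text"],
})

# Keep only U.S. accidents, mirroring the notebook's filter.
us = accidents[accidents.in_usa == True]

# Inner join (the default): only (E1, 1) matches on both keys.
merged = pd.merge(us, narratives, on=["event_id", "aircraft_id"])

# Outer join with indicator=True labels each row both / left_only / right_only.
audit = pd.merge(us, narratives, on=["event_id", "aircraft_id"], how="outer", indicator=True)
print(audit[["event_id", "aircraft_id", "_merge"]])
```

Here `(E1, 2)` is a U.S. accident with no narrative (`left_only`), while the `E2` and `E3` narratives have no matching U.S. accident (`right_only`).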
Search them for terms related to mast bumping.
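def search(df, string):
    """
    Searches the provided DataFrame's text columns for the provided string, ignoring case.
    Returns the matching rows as a new DataFrame.
    """
    result_rows = []
    # The .str accessor only works on text columns, which pandas stores as object dtype here.
    for c in df.dtypes[df.dtypes == 'object'].index:
        # regex=False treats the term as a literal string; na=False skips missing values.
        result_rows.append(df[df[c].str.lower().str.contains(string.lower(), regex=False, na=False)])
    # A row that matches in several columns appears once per column, so de-duplicate.
    return pd.concat(result_rows).drop_duplicates()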
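One possible alternative to `search`, sketched here as a hypothetical `search_mask` helper, builds a single boolean mask across all text columns instead of concatenating filtered copies. That keeps the original row order and avoids `drop_duplicates` collapsing rows that were already identical in the source. The sample frame is invented and is built with `dtype=object` so its text columns pass the object-dtype filter.

```python
import pandas as pd

def search_mask(df, string):
    """
    Hypothetical variant of search(): OR together one boolean mask per text
    column and return the matching rows once each, in their original order.
    """
    needle = string.lower()
    mask = pd.Series(False, index=df.index)
    for c in df.select_dtypes(include="object").columns:
        # regex=False keeps the term literal; na=False treats missing text as no match.
        mask |= df[c].str.lower().str.contains(needle, regex=False, na=False)
    return df[mask]

# Invented sample rows; None stands in for a missing narrative.
sample = pd.DataFrame({
    "event_id": ["E1", "E2", "E3"],
    "narrative": ["Mast Bumping was evident.", None, "Low rotor rpm."],
    "cause": ["mast bumping", "engine failure", None],
}, dtype=object)

print(search_mask(sample, "MAST BUMPING")["event_id"].tolist())  # ['E1']
```

Row `E1` matches in two columns but is returned once, without a de-duplication step.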
hits = pd.concat([
    search(merged, "mast bumping"),
    search(merged, "rocking"),
    search(merged, "vibration"),
    search(merged, "mast bump "),
]).drop_duplicates()
len(hits)
88
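The four separate searches could also be folded into a single pass by joining the terms into one regular expression, since `a|b|c` matches any one of the alternatives. This is a sketch under assumptions: `search_any` is a hypothetical helper, the sample rows are invented, and `re.escape` is used so each term is matched literally.

```python
import re
import pandas as pd

TERMS = ["mast bumping", "rocking", "vibration", "mast bump "]

def search_any(df, terms):
    """Hypothetical helper: rows where any text column contains any term, ignoring case."""
    # re.escape keeps each term literal; "|" lets the pattern match any one of them.
    pattern = "|".join(re.escape(t) for t in terms)
    mask = pd.Series(False, index=df.index)
    for c in df.select_dtypes(include="object").columns:
        mask |= df[c].str.contains(pattern, case=False, regex=True, na=False)
    return df[mask]

# Invented sample narratives.
sample = pd.DataFrame({
    "event_id": ["E1", "E2", "E3", "E4"],
    "narrative": [
        "The pilot reported severe VIBRATION.",
        "Evidence of a mast bump was found.",
        "Engine failure on takeoff.",
        None,
    ],
}, dtype=object)

print(search_any(sample, TERMS)["event_id"].tolist())  # ['E1', 'E2']
```

One design note: because `"mast bump "` ends in a space, a sentence ending in "mast bump." would not match it. A word-boundary pattern such as `r"mast bump\b"` is one possible alternative that still excludes "mast bumping".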
Output the result.
hits.sort_values("event_id", ascending=True).to_csv(os.path.join(output_dir, "searched-narratives.csv"), encoding="utf-8", index=False)
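To confirm the export, one option is to read the file straight back and check its order and columns. The sketch below assumes a throwaway temporary directory in place of the notebook's `output_dir` and a small invented `hits` frame.

```python
import os
import tempfile
import pandas as pd

# Invented stand-in for the notebook's search results.
hits = pd.DataFrame({"event_id": ["E3", "E1", "E2"], "aircraft_id": [1, 1, 2]})

with tempfile.TemporaryDirectory() as output_dir:
    path = os.path.join(output_dir, "searched-narratives.csv")
    # Same call as the notebook: sort by event_id and drop the pandas index.
    hits.sort_values("event_id", ascending=True).to_csv(path, encoding="utf-8", index=False)
    # Read it back before the temporary directory is deleted.
    written = pd.read_csv(path)

print(written["event_id"].tolist())  # ['E1', 'E2', 'E3']
```

Passing `index=False` keeps the row index out of the file, so the CSV reads back with only the original columns.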