Lets say we're interested in the characteristics of various aircraft, for the purpose of analyzing noise levels associated therewith. We may have several reasons for improving our fluency regarding what aircraft fly in and out of the nearest airport.
Below is a typical pandas function call. We're using Python to read an Excel spreadsheet, but maybe we don't want all the columns or have specific conversions we would like to perform.
pandas.read_excel(io, sheet_name=0, header=0, names=None, index_col=None, usecols=None, squeeze=False, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skiprows=None, nrows=None, na_values=None, keep_default_na=True, verbose=False, parse_dates=False, date_parser=None, thousands=None, comment=None, skipfooter=0, convert_float=True, mangle_dupe_cols=True, **kwds)
Resources:
import pandas as pd
import numpy as np
URL = "https://www.faa.gov/airports/engineering/aircraft_char_database/media/FAA-Aircraft-Char-Database-v2-201810.xlsx"
df = pd.read_excel(URL, sheet_name="Aircraft Database",
usecols=['Manufacturer', 'Model', 'Physical Class (Engine)', '# Engines'])
df.head()
Manufacturer | Model | Physical Class (Engine) | # Engines | |
---|---|---|---|---|
0 | Acro Sport | Acro Sport | Piston | 1 |
1 | Acro Sport | Acro Sport II | Piston | 1 |
2 | Acro Sport | Cougar | Piston | 1 |
3 | Acro Sport | Junior Ace | Piston | 1 |
4 | Acro Sport | Super Ace | Piston | 1 |
df.tail()
Manufacturer | Model | Physical Class (Engine) | # Engines | |
---|---|---|---|---|
2759 | Zlin | Zlin Z-50 | tbd | tbd |
2760 | Zlin | Zlin Z-526 Akrobat | tbd | tbd |
2761 | Zlin | Zlin Z-526 Trener Master | tbd | tbd |
2762 | Zlin | Zlin Z-626 | tbd | tbd |
2763 | Zlin | Zlin Z-726 Universal | tbd | tbd |
df.columns
Index(['Manufacturer', 'Model', 'Physical Class (Engine)', '# Engines'], dtype='object')
len(df)
2764
df.shape
(2764, 25)
import IPython
# Grouping by one factor
df_man = df.groupby('Manufacturer')
# Getting all methods from the groupby object:
meth = [method_name for method_name in dir(df_man)
if callable(getattr(df_man, method_name)) & ~method_name.startswith('_')]
# Printing the result
print(IPython.utils.text.columnize(meth))
agg corr cumsum get_group mean pct_change sem transform aggregate corrwith describe head median pipe shift tshift all count diff hist min plot size var any cov expanding idxmax ngroup prod skew apply cumcount ffill idxmin nth quantile std backfill cummax fillna last nunique rank sum bfill cummin filter mad ohlc resample tail boxplot cumprod first max pad rolling take
from keyword import kwlist
print(IPython.utils.text.columnize(kwlist))
False as break del finally if lambda pass while None assert class elif for import nonlocal raise with True async continue else from in not return yield and await def except global is or try
list(df_man.groups.items())[:10]
[('AESL', Int64Index([106, 107], dtype='int64')), ('AIDC', Int64Index([121, 122, 123, 124], dtype='int64')), ('AMX', Int64Index([216, 217], dtype='int64')), ('AMX International', Int64Index([218], dtype='int64')), ('ATR (Aérospatiale/Alenia)', Int64Index([248, 249, 250, 251, 252, 253, 254, 255], dtype='int64')), ('AVRO', Int64Index([300, 301, 302, 303, 304, 305, 306], dtype='int64')), ('Acro Sport', Int64Index([0, 1, 2, 3, 4, 5], dtype='int64')), ('Adam Aircraft Industries', Int64Index([6, 7], dtype='int64')), ('Aeritalia', Int64Index([8], dtype='int64')), ('Aermacchi / Macchi', Int64Index([9, 10, 11, 12, 13, 14, 15, 16], dtype='int64'))]
df_man.get_group("Acro Sport")
Model | Physical Class (Engine) | # Engines | |
---|---|---|---|
0 | Acro Sport | Piston | 1 |
1 | Acro Sport II | Piston | 1 |
2 | Cougar | Piston | 1 |
3 | Junior Ace | Piston | 1 |
4 | Super Ace | Piston | 1 |
5 | Super Acro Sport | Piston | 1 |
df_man.size()
Manufacturer AESL 2 AIDC 4 AMX 2 AMX International 1 ATR (Aérospatiale/Alenia) 8 .. Wing Aircraft 1 XTI Aircraft 1 Yakovlev / Jakovlev 24 Zivko Aeronautics Inc. 2 Zlin 42 Length: 424, dtype: int64
df_man.count() # how many not-missing values per column per group
Model | Physical Class (Engine) | # Engines | |
---|---|---|---|
Manufacturer | |||
AESL | 2 | 2 | 2 |
AIDC | 4 | 4 | 4 |
AMX | 2 | 2 | 2 |
AMX International | 1 | 1 | 1 |
ATR (Aérospatiale/Alenia) | 8 | 8 | 8 |
... | ... | ... | ... |
Wing Aircraft | 1 | 1 | 1 |
XTI Aircraft | 1 | 1 | 1 |
Yakovlev / Jakovlev | 24 | 24 | 24 |
Zivko Aeronautics Inc. | 2 | 2 | 2 |
Zlin | 42 | 42 | 42 |
424 rows × 3 columns
df_man.nunique() # how many unique values per column per group
Model | Physical Class (Engine) | # Engines | |
---|---|---|---|
Manufacturer | |||
AESL | 2 | 1 | 1 |
AIDC | 4 | 1 | 1 |
AMX | 2 | 1 | 1 |
AMX International | 1 | 1 | 1 |
ATR (Aérospatiale/Alenia) | 8 | 1 | 1 |
... | ... | ... | ... |
Wing Aircraft | 1 | 1 | 1 |
XTI Aircraft | 1 | 1 | 1 |
Yakovlev / Jakovlev | 21 | 2 | 2 |
Zivko Aeronautics Inc. | 2 | 1 | 1 |
Zlin | 42 | 1 | 1 |
424 rows × 3 columns
from IPython.display import YouTubeVideo
YouTubeVideo("MjHpMCIvwsY")