import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Let's read a CSV file from the web directly into Pandas.
data = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/mpg.csv')
data
Info about the dataframe:
data.info()
Detailed memory usage:
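A minimal sketch of one way to get that breakdown. The small `data` frame here is a hand-typed stand-in (an assumption, so the snippet runs without a download); with the real frame only the last lines matter.

```python
import pandas as pd

# Stand-in frame (assumption): illustrative rows shaped like the mpg dataset.
data = pd.DataFrame({
    "mpg": [18.0, 15.0, 24.0, 26.0],
    "cylinders": [8, 8, 4, 4],
    "origin": ["usa", "usa", "japan", "europe"],
})

# Bytes per column. For object-dtype string columns, deep=True also counts
# the Python strings themselves rather than only the references to them.
shallow = data.memory_usage()
deep = data.memory_usage(deep=True)
print(deep)

# The same deep measurement, reported in the info() layout.
data.info(memory_usage="deep")
```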
You can ask for summary statistics per column, and often for the whole dataframe at once:
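For instance, a short sketch using `describe()` and `mean()`. The `data` frame is again a hand-typed stand-in (an assumption) so the snippet is self-contained.

```python
import pandas as pd

# Stand-in frame (assumption): illustrative rows shaped like the mpg dataset.
data = pd.DataFrame({
    "mpg": [18.0, 15.0, 24.0, 26.0],
    "cylinders": [8, 8, 4, 4],
    "origin": ["usa", "usa", "japan", "europe"],
})

# Whole dataframe: count, mean, std, min, quartiles, and max
# for every numeric column (string columns are skipped by default).
summary = data.describe()
print(summary)

# Single column.
mpg_mean = data["mpg"].mean()
print(mpg_mean)

# Whole dataframe again, one statistic at a time.
print(data.mean(numeric_only=True))
```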
Plots are easy; try .plot, .plot.scatter, or .hist:
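A sketch of the three plot types side by side. The stand-in `data` frame and the choice of `weight` against `mpg` for the scatter plot are assumptions made for illustration; the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in frame (assumption): illustrative rows shaped like the mpg dataset.
data = pd.DataFrame({
    "mpg": [18.0, 15.0, 24.0, 26.0, 46.6],
    "weight": [3504, 3693, 2372, 1835, 2110],
})

# One axes per plot so the three do not draw on top of each other.
fig, axes = plt.subplots(1, 3, figsize=(12, 3))

data["mpg"].plot(ax=axes[0])                          # line plot of one column
data.plot.scatter(x="weight", y="mpg", ax=axes[1])    # scatter of two columns
data["mpg"].hist(ax=axes[2])                          # histogram of one column
```

In a notebook each of these calls is usually a one-liner in its own cell, and the `ax=` argument can be dropped.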
You can select subsets, such as mpg > 42:
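A minimal sketch of that selection with a boolean mask; the `data` frame is a hand-typed stand-in (an assumption).

```python
import pandas as pd

# Stand-in frame (assumption): illustrative rows shaped like the mpg dataset.
data = pd.DataFrame({
    "mpg": [18.0, 24.0, 46.6, 44.3],
    "name": ["chevrolet chevelle malibu", "toyota corona mark ii",
             "mazda glc", "vw rabbit c (diesel)"],
})

# The comparison yields a boolean Series; indexing with it keeps the True rows.
mask = data["mpg"] > 42
efficient = data[mask]
print(efficient)
```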
Or you can use groupby('cylinders') to work on per-cylinder groups:
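As a sketch, here is a per-group mean and a per-group row count; the `data` frame is a hand-typed stand-in (an assumption).

```python
import pandas as pd

# Stand-in frame (assumption): illustrative rows shaped like the mpg dataset.
data = pd.DataFrame({
    "mpg": [18.0, 15.0, 24.0, 26.0, 46.6],
    "cylinders": [8, 8, 4, 4, 4],
})

groups = data.groupby("cylinders")

# Average mpg within each cylinder count.
per_cyl = groups["mpg"].mean()
print(per_cyl)

# Number of rows in each group.
counts = groups.size()
print(counts)
```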
If you want legends, it's no longer one line, but still simple:
for name, grp in data.groupby('cylinders').mpg:
    grp.hist(label=name)
plt.legend();
The category type is better for data.origin, and saves memory too!
data.origin.astype('category')
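To see the memory effect, one can compare the two representations directly. This is a sketch on a synthetic column of repeated labels (an assumption standing in for `data.origin`); the saving comes from storing each distinct label once plus a small integer code per row.

```python
import pandas as pd

# Synthetic stand-in (assumption): three labels repeated many times.
origin = pd.Series(["usa", "japan", "europe"] * 100, name="origin")

as_strings = origin.memory_usage(deep=True)
as_category = origin.astype("category").memory_usage(deep=True)

print(as_strings, as_category)
```

The benefit depends on how often values repeat; a column of mostly unique strings gains little.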
You can select using operators or isin:
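A short sketch of both styles on a hand-typed stand-in frame (an assumption). Note that element-wise conditions combine with `&` and `|`, and each condition needs its own parentheses.

```python
import pandas as pd

# Stand-in frame (assumption): illustrative rows shaped like the mpg dataset.
data = pd.DataFrame({
    "mpg": [18.0, 15.0, 24.0, 26.0, 46.6],
    "cylinders": [8, 8, 4, 4, 4],
    "origin": ["usa", "usa", "japan", "europe", "japan"],
})

# Operators: spell out each alternative.
mask_ops = (data["origin"] == "japan") | (data["origin"] == "europe")

# isin: the same test against a list of allowed values.
mask_isin = data["origin"].isin(["japan", "europe"])

# Conditions on different columns combine the same way.
small_and_frugal = data[(data["cylinders"] == 4) & (data["mpg"] > 25)]
print(data[mask_isin])
print(small_and_frugal)
```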
Now, let's convert the name into make and model:
cars = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/mpg.csv',
                   dtype={"origin":"category"})
...
# cars["make"], cars["model"] = makemodel[0].astype('category'), makemodel[1]
# del cars["name"]
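The step that builds `makemodel` is not shown here. One possible way to get a two-column frame of that shape, assuming the make is simply the first word of the name, is `str.split` with `n=1` and `expand=True`. This is a sketch on a hand-typed stand-in frame, not necessarily the approach originally used.

```python
import pandas as pd

# Stand-in frame (assumption): a few names in the style of the mpg dataset.
cars = pd.DataFrame({
    "name": ["chevrolet chevelle malibu", "buick skylark 320", "toyota corona mark ii"],
})

# Assumption: first word is the make, everything after the first space is the model.
# n=1 splits only once; expand=True returns a DataFrame with columns 0 and 1.
makemodel = cars["name"].str.split(n=1, expand=True)

cars["make"] = makemodel[0].astype("category")
cars["model"] = makemodel[1]
del cars["name"]
print(cars)
```

A name with no space would leave the model missing, so real data may need extra cleanup.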
We can put make and model together again:
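A sketch of the recombination, assuming `make` is a category column and `model` holds strings as above; the frame is a hand-typed stand-in.

```python
import pandas as pd

# Stand-in frame (assumption): make as category, model as plain strings.
cars = pd.DataFrame({
    "make": pd.Categorical(["chevrolet", "toyota"]),
    "model": ["chevelle malibu", "corona mark ii"],
})

# Categorical columns do not support "+", so convert make to strings first.
full_name = cars["make"].astype(str) + " " + cars["model"]
print(full_name)
```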
Math: To convert mpg to liters per 100 kilometers:
$$ \mathrm{lp100km} = \frac{1}{\mathrm{mpg}} \cdot 62.1371 \, \frac{\mathrm{miles}}{100\,\mathrm{km}} \cdot 3.78541 \, \frac{\mathrm{liters}}{\mathrm{gallon}} $$
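Applied to a column, the formula is a single vectorized expression. The sketch below uses a hand-typed stand-in frame, and the constant and column names are illustrative choices.

```python
import pandas as pd

MILES_PER_100KM = 62.1371    # miles in 100 km
LITERS_PER_GALLON = 3.78541  # liters in one US gallon

# Stand-in frame (assumption): a few mpg values.
data = pd.DataFrame({"mpg": [18.0, 24.0, 46.6]})

# lp100km = (1 / mpg) * miles per 100 km * liters per gallon, roughly 235.21 / mpg.
data["lp100km"] = MILES_PER_100KM * LITERS_PER_GALLON / data["mpg"]
print(data)
```

Because the relationship is a reciprocal, higher mpg maps to lower liters per 100 km.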
Also see:
Not covered above: