To start plotting, add these to your import statements at the top of your file:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt # sometimes we want to tweak plots
| # | Step | Note |
|---|---|---|
| 0 | Ask a question about the data | Ex: What is the distribution of unemployment in each state? |
| 1 | Question → what the plot should look like | Draw it on paper! |
| 2 | Plot appearance → which plot function/options to use | Find a `pd` or `sns` plot example that looks like that. |
| 3 | The function dictates how the data should be formatted before you call the plot function | Key: Wide or tall? |
- `sns` examples, and more help on "how can I make it" questions
- `seaborn` tutorial page
- `seaborn` gallery

With `seaborn`, I usually use syntax that looks something like this for graphing. (Delete the "<" and ">" and replace the inside with what you need.) You'll see many examples in this chapter that deviate from this, usually because you don't need to explicitly declare `data`, or because `x` is assumed to be all variables in the dataset.
sns.<function>(data = <dataframe> [optional data functions],
x = '<varname>', y = '<varname>',
[optional arguments for specific plots] )
Tip:

1. Sometimes I add `.query()` after the dataframe name to filter outliers.
2. Sometimes I add `.sample()` afterwards to plot a more manageable amount of data.
Example:
sns.boxplot(data=ccm.query('td_a < 1 & td_a > 0'),
x='td_a')
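Here is a minimal runnable sketch of that pattern. The `ccm_demo` dataframe and its values are made up for illustration (it only stands in for `ccm`), and the sample size of 500 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display; omit in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for `ccm`: td_a is a made-up ratio with some outliers
rng = np.random.default_rng(0)
ccm_demo = pd.DataFrame({"td_a": rng.normal(loc=0.3, scale=0.4, size=5000)})

# Optional data functions: .query() drops the outliers,
# then .sample() thins what is left to a manageable number of rows
plot_data = ccm_demo.query("td_a < 1 & td_a > 0").sample(n=500, random_state=0)

# Same call shape as the template: data first, then the variable(s)
ax = sns.boxplot(data=plot_data, x="td_a")
```

Chaining the data functions inside the `data=` argument (as in the example above) works the same way; pulling them out into `plot_data` just makes it easier to inspect what you are about to plot.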
Generally, to plot in Python:

1. Pick a `sns` function.
2. Plot with `pd` or `sns` plotting functions. `pandas`'s plotting functions are simple and good for early-stage exploration and some simple graphics (bar, "barh", scatter, and density), but `seaborn` has many more built-in options, has simpler syntax, and is easier to use, IMO.
3. Add `matplotlib` commands after the main plot function. Matplotlib is a full-powered (but confusing as heck) graphing package. In fact, both `pandas` and `seaborn` are just using `matplotlib`, but they hide the gory details for us. Thanks, `seaborn`!
Warning: After syntax errors, **most graphing pain comes from insufficient data wrangling.** Most plotting functions have assumptions about how the data is shaped. Data might be unwieldy, but we can control it.
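To make the `pandas` / `seaborn` / `matplotlib` division of labor concrete, here is a small sketch on made-up data. The dataframe, column names, and labels are all hypothetical; the point is only the order of operations: main plot function first, `matplotlib` tweaks after.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: y is a noisy linear function of x
rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.normal(size=200)})
df["y"] = 2 * df["x"] + rng.normal(size=200)

# Quick early-stage look with pandas' built-in plotting (makes its own figure)
ax_pd = df.plot.scatter(x="x", y="y")

# Same picture with seaborn on a fresh figure...
plt.figure()
ax_sns = sns.scatterplot(data=df, x="x", y="y")

# ...followed by matplotlib commands to tweak it
plt.title("y vs. x (made-up data)")
plt.xlabel("x (hypothetical units)")
```

Both plotting calls hand back a `matplotlib` Axes object, which is why the `plt` commands (or methods on `ax_sns`) can keep modifying the figure afterwards.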
How do we wrangle our data to make plot functions happy?
- `seaborn` expects long (tall) data. Long data is generally better for data analysis and visualization (even aside from `seaborn`'s assumptions).
- If you use a `pandas` plot function, you might have to reshape (temporarily) your data to the wider "output shape" that corresponds to the graph type you're generating.