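The options include seaborn, which is concise and efficient; altair, which is declarative; voila, for one-click generation; and dash, which requires writing no React.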
%load_ext autoreload
%autoreload 2
%matplotlib inline
from matplotlib.font_manager import _rebuild
_rebuild()
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid", {"font.sans-serif": ["SimHei", "Arial"]})
import pandas_alive
import pandas as pd
import numpy as np
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
iris = sns.load_dataset("iris")
tips = sns.load_dataset("tips")
df_covid = pd.read_json("3.data-viz/timeseries.json")
df_covid.index = pd.DatetimeIndex(df_covid.iloc[:, 0].apply(lambda _: _["date"]))
df_covid.index.name = "日期"
df_covid = df_covid.applymap(lambda _: int(_["confirmed"]))
df_covid.replace(0, np.nan, inplace=True)
top20 = df_covid.iloc[-1].sort_values().tail(20).index
df_covid = df_covid[top20]
data = np.random.multivariate_normal(mean=[0, 0], cov=[[5, 2], [2, 2]], size=2000)
data = pd.DataFrame(data, columns=["x", "y"])
plt.figure(figsize=(6, 6))
for col in "xy":
    plt.hist(data[col], density=True, alpha=0.5)
Besides histograms, kernel density estimation (KDE) gives a smooth estimate of each variable's distribution. Seaborn provides this through sns.kdeplot:
plt.figure(figsize=(6, 6))
for col in "xy":
    sns.kdeplot(data[col], fill=True)  # fill replaces the deprecated shade argument
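To make the idea behind the smooth curve concrete, here is a minimal sketch of a one-dimensional Gaussian KDE written directly in NumPy. The function name `gaussian_kde_1d`, the seeded sample, and the choice of Scott's rule of thumb for the bandwidth are assumptions made for illustration; this is not a description of what kdeplot does internally.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` at `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Scott's rule of thumb for 1-D data (an assumption in this sketch).
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per sample, evaluated at every grid point.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Average the bumps and rescale so the curve integrates to one.
    return kernels.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(0)  # seeded only for repeatability
samples = rng.normal(loc=0.0, scale=np.sqrt(5), size=500)
grid = np.linspace(-15, 15, 601)
density = gaussian_kde_1d(samples, grid)

# Riemann-sum check that the estimate behaves like a probability density.
area = float(np.sum(density) * (grid[1] - grid[0]))
print(f"area under the estimated density: {area:.3f}")
```

A smaller bandwidth follows the sample more closely and gives a bumpier curve; a larger one smooths more aggressively.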
sns.distplot (deprecated since seaborn 0.11 in favor of sns.histplot with kde=True) overlays the histogram and the KDE:
plt.figure(figsize=(6, 6))
for col in "xy":
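    # distplot is deprecated since seaborn 0.11; histplot with kde=True is its replacement.
    sns.histplot(data[col], kde=True, stat="density")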
Passing two-dimensional data to kdeplot produces a two-dimensional density plot:
plt.figure(figsize=(6, 6))
sns.kdeplot(data=data, x="x", y="y");
sns.jointplot shows the joint distribution of two variables together with the marginal distribution of each:
with sns.axes_style("white"):
    sns.jointplot(x="x", y="y", data=data, kind="kde")
jointplot accepts further arguments. For example, kind="hex" draws a hexagonally binned histogram instead:
with sns.axes_style("white"):
    sns.jointplot(x="x", y="y", data=data, kind="hex")
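The tilt visible in the joint plots comes from the off-diagonal term of the covariance matrix used to generate the sample. As a rough numerical companion to the pictures, the sketch below compares the correlation implied by that matrix with the correlation of a freshly drawn sample. The seeded generator and the variable names are assumptions for illustration, so the numbers will not match the notebook's unseeded data exactly.

```python
import numpy as np

cov = np.array([[5.0, 2.0], [2.0, 2.0]])
# Correlation implied by the covariance matrix: cov_xy / (sd_x * sd_y).
expected_corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

rng = np.random.default_rng(0)  # seeded only for repeatability
sample = rng.multivariate_normal(mean=[0, 0], cov=cov, size=2000)
# Sample correlation between the two columns.
sample_corr = np.corrcoef(sample[:, 0], sample[:, 1])[0, 1]

print(f"implied: {expected_corr:.3f}, sample: {sample_corr:.3f}")
```

With 2000 points the sample value should land close to the implied one, which is why the contours and hexagons lean along a clear diagonal.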
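sns.pairplot explores the correlations between the dimensions of a multidimensional dataset. For example, Fisher's iris dataset records petal and sepal measurements for three iris species: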
sns.pairplot(iris, hue="species");
sns.FacetGrid draws histograms for subsets of the data. For example, take the dataset of tips received by restaurant staff:
tips["tip_pct"] = 100 * tips["tip"] / tips["total_bill"]
tips.head()
|  | total_bill | tip | sex | smoker | day | time | size | tip_pct |
|---|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 5.944673 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 16.054159 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 | 16.658734 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 13.978041 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 14.680765 |
grid = sns.FacetGrid(tips, row="sex", col="time", margin_titles=True, height=4)
grid.map(plt.hist, "tip_pct", bins=np.linspace(0, 40, 15))
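Conceptually, each panel of the grid is a histogram of one subset of the rows. The sketch below reproduces that split without any plotting, using a pandas groupby on the same two keys. The `toy` frame is a small made-up stand-in for the tips data, and its values are illustrative only.

```python
import numpy as np
import pandas as pd

# Small made-up stand-in for the tips data; the values are illustrative only.
toy = pd.DataFrame({
    "sex": ["Female", "Male", "Male", "Female", "Male", "Female"],
    "time": ["Dinner", "Dinner", "Lunch", "Lunch", "Dinner", "Lunch"],
    "tip_pct": [6.0, 16.0, 17.0, 14.0, 15.0, 22.0],
})
bins = np.linspace(0, 40, 15)  # same bin edges as the grid above

# One histogram per (sex, time) combination, i.e. one per facet.
facet_counts = {
    key: np.histogram(group["tip_pct"], bins=bins)[0]
    for key, group in toy.groupby(["sex", "time"])
}

for key, counts in facet_counts.items():
    print(key, counts.tolist())
```

Sharing the bin edges across all subsets is what makes the panels directly comparable.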
sns.catplot shows how categorical data is distributed. Its kind parameter selects the underlying plotting function:

Categorical scatterplots:

- stripplot (with kind="strip"; the default)
- swarmplot (with kind="swarm")

Categorical distribution plots:

- boxplot (with kind="box")
- violinplot (with kind="violin")
- boxenplot (with kind="boxen")

Categorical estimate plots:

- pointplot (with kind="point")
- barplot (with kind="bar")
- countplot (with kind="count")

def show_factor(kind="strip"):
    # Plot total_bill by day, split by sex, with the requested catplot kind.
    g = sns.catplot(x="day", y="total_bill", hue="sex", kind=kind, data=tips, height=7)
    g.set_axis_labels("Day", "Total bill")
    # Move the legend outside the plotting area.
    g._legend.set_bbox_to_anchor((1.1, 0.5))
show_factor()
show_factor(kind="swarm")
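The estimate kinds differ from the scatter and distribution kinds in that they reduce each category to a single number. As a hedged illustration, the sketch below computes by hand the per-category count that kind="count" tallies and the per-category mean that kind="bar" and kind="point" draw by default. The `toy` frame and its values are made up for this example, and the error bars those plots add are left out.

```python
import pandas as pd

# Made-up stand-in for the tips data; the values are illustrative only.
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Sat", "Sat", "Sat"],
    "total_bill": [10.0, 20.0, 15.0, 30.0, 20.0, 40.0],
})

# What kind="count" tallies: the number of observations per category.
counts = toy["day"].value_counts()

# What kind="bar" and kind="point" show by default: the mean per category.
means = toy.groupby("day")["total_bill"].mean()

print(counts.to_dict())
print(means.to_dict())
```

Calling show_factor(kind="bar") on the real data draws the same kind of per-day means, split further by sex.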