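Functions for drawing linear regression models
Fitting different kinds of models
Regression lines conditioned on other variables
Controlling the size and shape of the plot
Plotting regression lines in other contexts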
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
np.random.seed(sum(map(ord, "regression")))
%matplotlib inline
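tips = sns.load_dataset("tips") # all example datasets are hosted at https://github.com/mwaskom/seaborn-data
The sns.regplot() and sns.lmplot() functions
sns.regplot(x="total_bill", y="tip", data=tips) # by default draws a scatter plot, a linear regression line, and a 95% confidence interval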
<matplotlib.axes._subplots.AxesSubplot at 0x9de94a8>
sns.lmplot(x="total_bill", y="tip", data=tips) # by default draws a scatter plot, a linear regression line, and a 95% confidence interval
<seaborn.axisgrid.FacetGrid at 0xa049da0>
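Data
Parameters
The parameters of sns.regplot() are largely a subset of those of sns.lmplot(), so the examples below use sns.lmplot()
sns.lmplot(x="size", y="tip", data=tips) # when a variable takes discrete values, the scatter plot overlaps heavily and the distribution of the data is hard to see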
<seaborn.axisgrid.FacetGrid at 0xa1aa940>
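Adding noise (jitter) to a discrete variable
sns.lmplot(x="size", y="tip", data=tips, x_jitter=.05) # x_jitter adds noise to the discrete variable; the noise only affects how the scatter plot is drawn, not the regression fit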
<seaborn.axisgrid.FacetGrid at 0xb8bc358>
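The idea behind jitter can be sketched with NumPy alone. This is a minimal illustration on synthetic data (not the tips dataset), and it assumes uniform noise in [-x_jitter, x_jitter]; the variable names are made up for the example. The jittered copy is meant for drawing only, while the fit uses the original values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a discrete predictor (hypothetical data, not tips["size"]).
x = np.repeat(np.arange(1, 7), 20).astype(float)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)

# Jitter a copy of x for display only; the amount is assumed to be uniform noise.
x_jitter = 0.05
x_display = x + rng.uniform(-x_jitter, x_jitter, size=x.size)

# The regression is fit on the original, unjittered x.
slope, intercept = np.polyfit(x, y, 1)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

Because x_display never enters np.polyfit, the fitted line is the same whatever jitter amount is chosen.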
Apply this function to each unique value of x and plot the resulting estimate. This is useful when x is a discrete variable. If x_ci is not None, this estimate will be bootstrapped and a confidence interval will be drawn.
sns.lmplot(x="size", y="tip", data=tips, x_estimator=np.mean) # x_ci defaults to "ci", so a confidence interval for y is drawn at each discrete value (the vertical bars through each point)
<seaborn.axisgrid.FacetGrid at 0xba25320>
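What x_estimator computes can be approximated by hand: apply the estimator to the y values at each unique x, then bootstrap it for an interval. The sketch below uses synthetic data and a hypothetical helper, estimate_per_level, with a percentile bootstrap; it is an illustration of the idea, not seaborn's own code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with a discrete x (hypothetical, not the tips dataset).
x = np.repeat(np.array([1, 2, 3, 4]), 30)
y = 1.0 + 0.5 * x + rng.normal(scale=0.5, size=x.size)


def estimate_per_level(x, y, estimator=np.mean, n_boot=1000, ci=95):
    """Return {level: (estimate, ci_low, ci_high)} for each unique x."""
    results = {}
    for level in np.unique(x):
        values = y[x == level]
        # Resample with replacement and re-apply the estimator each time.
        idx = rng.integers(0, values.size, size=(n_boot, values.size))
        boots = np.array([estimator(values[i]) for i in idx])
        tail = (100 - ci) / 2
        low, high = np.percentile(boots, [tail, 100 - tail])
        results[int(level)] = (float(estimator(values)), float(low), float(high))
    return results


summary = estimate_per_level(x, y)
for level, (est, low, high) in summary.items():
    print(f"x={level}: estimate={est:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```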
anscombe = sns.load_dataset("anscombe")
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'I'"),
           ci=None, scatter_kws={"s": 80}) # ci controls the confidence interval (None turns it off); the scatter_kws dict is passed on to plt.scatter() (here it sets the marker size)
<seaborn.axisgrid.FacetGrid at 0xbd20e48>
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'II'"),
ci=None, scatter_kws={"s": 80})
<seaborn.axisgrid.FacetGrid at 0xbf3ccc0>
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'II'"),
           order=2, ci=None, scatter_kws={"s": 80}) # order=2 fits and draws a second-order polynomial regression
<seaborn.axisgrid.FacetGrid at 0xbbfca90>
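When order is greater than 1, the seaborn documentation says the fit is estimated with numpy.polyfit. The sketch below compares a first-order and a second-order fit on synthetic, exactly quadratic data (made-up coefficients, not the Anscombe values) to show why the higher order matters for curved relationships.

```python
import numpy as np

# Synthetic, exactly quadratic data (hypothetical coefficients).
x = np.arange(4, 15, dtype=float)
y = -0.1 * x**2 + 2.5 * x - 5.0

linear = np.polyfit(x, y, 1)     # analogous to order=1
quadratic = np.polyfit(x, y, 2)  # analogous to order=2

# Residual sum of squares for each fit.
rss_linear = float(np.sum((y - np.polyval(linear, x)) ** 2))
rss_quadratic = float(np.sum((y - np.polyval(quadratic, x)) ** 2))

print(f"RSS linear:    {rss_linear:.3f}")
print(f"RSS quadratic: {rss_quadratic:.3e}")
```

The linear fit leaves a systematic residual, while the quadratic fit recovers the generating coefficients up to floating-point error.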
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'III'"),
           ci=None, scatter_kws={"s": 80}) # this dataset contains an outlier
<seaborn.axisgrid.FacetGrid at 0xc2a0240>
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'III'"),
           robust=True, ci=None, scatter_kws={"s": 80}) # robust=True down-weights observations with large residuals, giving a more stable line when outliers are present (requires statsmodels)
<seaborn.axisgrid.FacetGrid at 0xc3831d0>
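To make "down-weighting large residuals" concrete, here is a rough sketch of one common robust technique: iteratively reweighted least squares with Huber weights. seaborn delegates robust fitting to statsmodels, so this is not its implementation; the helper huber_irls, the tuning constant 1.345, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def huber_irls(x, y, k=1.345, n_iter=50):
    """Fit y = intercept + slope * x with Huber weights (illustrative only)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from ordinary least squares
    for _ in range(n_iter):
        resid = y - X @ beta
        # Robust scale estimate from the median absolute deviation.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        scale = max(scale, 1e-8)  # guard against a zero scale
        u = np.abs(resid) / scale
        # Weight 1 for small residuals, shrinking weight for large ones.
        w = np.where(u <= k, 1.0, k / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta  # (intercept, slope)


# Synthetic near-linear data with a single outlier (hypothetical values).
x = np.arange(4, 15, dtype=float)
y = 3.0 + 0.5 * x + 0.05 * np.sin(x)
y[9] += 5.0  # the outlier

ols_slope = np.polyfit(x, y, 1)[0]
robust_intercept, robust_slope = huber_irls(x, y)
print(f"OLS slope:    {ols_slope:.3f}")
print(f"robust slope: {robust_slope:.3f}")
```

The underlying slope here is 0.5; the ordinary fit is pulled toward the outlier, while the reweighted fit stays closer to the bulk of the points.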
tips["big_tip"] = (tips.tip / tips.total_bill) > .15
sns.lmplot(x="total_bill", y="big_tip", data=tips,
y_jitter=.03)
<seaborn.axisgrid.FacetGrid at 0xc80ccc0>
sns.lmplot(x="total_bill", y="big_tip", data=tips,
           logistic=True, y_jitter=.03) # logistic=True fits and draws a logistic regression curve, which suits a binary y (requires statsmodels)
<seaborn.axisgrid.FacetGrid at 0xcd955f8>
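For a binary response, the fitted curve estimates the probability that y = 1 for a given x. The following sketch fits a one-variable logistic regression with Newton's method on synthetic data. It is only an illustration under assumed data and names; seaborn relies on statsmodels for the real fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary outcome whose probability rises with x (hypothetical data).
x = rng.uniform(0, 10, size=200)
p_true = 1 / (1 + np.exp(-(x - 5)))
y = (rng.random(200) < p_true).astype(float)

# Newton's method for the logistic log-likelihood.
X = np.column_stack([np.ones_like(x), x])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-(X @ beta)))
    w = p * (1 - p)
    grad = X.T @ (y - p)
    hess = X.T @ (X * w[:, None])
    beta = beta + np.linalg.solve(hess, grad)

intercept, slope = beta

# Predicted probabilities always stay strictly between 0 and 1.
grid = np.array([0.0, 5.0, 10.0])
probs = 1 / (1 + np.exp(-(intercept + slope * grid)))
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
print("P(y=1) at x=0, 5, 10:", np.round(probs, 3))
```

Unlike a straight line fitted to 0/1 data, these predictions cannot leave the [0, 1] range, which is the reason to prefer logistic=True for a variable such as big_tip.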
sns.lmplot(x="total_bill", y="tip", data=tips,
           lowess=True) # lowess=True draws a locally weighted (LOWESS) regression curve; it is computationally intensive, so no confidence interval is drawn
<seaborn.axisgrid.FacetGrid at 0xc0d99e8>
sns.residplot()
sns.residplot(x="x", y="y", data=anscombe.query("dataset == 'I'"),
              scatter_kws={"s": 80}) # ideally the residuals scatter randomly above and below the line y=0
<matplotlib.axes._subplots.AxesSubplot at 0xd1ba978>
sns.residplot(x="x", y="y", data=anscombe.query("dataset == 'II'"),
              scatter_kws={"s": 80}) # if the residuals show a clear pattern, a simple linear regression is not appropriate
<matplotlib.axes._subplots.AxesSubplot at 0xd4212e8>
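The quantity a residual plot shows can be computed directly: regress y on x, then subtract the fitted values. This minimal sketch uses synthetic curved data (not the Anscombe values) to show the kind of structure that signals a poor linear fit.

```python
import numpy as np

# Synthetic curved data (hypothetical coefficients).
x = np.arange(4, 15, dtype=float)
y = -0.1 * x**2 + 2.5 * x - 5.0

# Fit a straight line and compute the residuals.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

for xi, ri in zip(x, residuals):
    print(f"x={xi:4.1f}  residual={ri:+.2f}")
```

The residuals are negative at both ends and positive in the middle. That arch is a pattern rather than random scatter around zero, so the straight line is missing curvature in the data.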
sns.lmplot() and the FacetGrid object
sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips) # draws both regression lines on the same axes, distinguished by color
<seaborn.axisgrid.FacetGrid at 0xd733fd0>
sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips,
           markers=["o", "x"], palette="Set1") # markers sets the marker shape for each hue level; palette sets the colors
<seaborn.axisgrid.FacetGrid at 0xd8fee80>
sns.lmplot(x="total_bill", y="tip", hue="smoker", col="time", data=tips) # col conditions on a fourth variable, one column of facets per level
<seaborn.axisgrid.FacetGrid at 0xdb90940>
sns.lmplot(x="total_bill", y="tip", hue="smoker",
           col="time", row="sex", data=tips) # col and row together condition on a fifth variable as well
<seaborn.axisgrid.FacetGrid at 0xdff1940>
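Conditioning on a categorical variable amounts to fitting one regression per level of that variable. A minimal NumPy sketch of the idea, using synthetic data and made-up group labels (not the tips columns):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with two groups that have different slopes (hypothetical).
group = np.array(["yes", "no"] * 50)
x = rng.uniform(5, 50, size=group.size)
true_slope = np.where(group == "yes", 0.10, 0.15)
y = 1.0 + true_slope * x + rng.normal(scale=0.5, size=group.size)

# One independent linear fit per group level, as a hue split would show.
fits = {}
for level in np.unique(group):
    mask = group == level
    slope, intercept = np.polyfit(x[mask], y[mask], 1)
    fits[str(level)] = (float(slope), float(intercept))

for level, (slope, intercept) in fits.items():
    print(f"group={level}: slope={slope:.3f}, intercept={intercept:.3f}")
```

Each subset gets its own line, so differences in slope or intercept between groups become visible. Facets created with col and row apply the same split to further variables.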
lmplot() uses regplot() internally and takes most of its parameters
regplot() is an axes-level function, so it draws directly onto an axes (either the currently active axes or the one provided by the ax parameter)
lmplot() is a figure-level function and creates its own figure, which is managed through a FacetGrid
f, ax = plt.subplots(figsize=(5, 6))
sns.regplot(x="total_bill", y="tip", data=tips, ax=ax) # regplot() is axes-level: to control the figure size, create the figure and axes yourself and pass the axes via ax
<matplotlib.axes._subplots.AxesSubplot at 0xe5b1630>
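sns.lmplot(x="total_bill", y="tip", col="day", data=tips,
           col_wrap=2, height=3) # height and aspect control the height and aspect ratio of each facet, not the size of the whole figure (height was named size before seaborn 0.9)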
<seaborn.axisgrid.FacetGrid at 0xef44780>
sns.lmplot(x="total_bill", y="tip", col="day", data=tips,
           aspect=.5) # height is the height of each facet; aspect * height is the width of each facet
<seaborn.axisgrid.FacetGrid at 0xf61d470>
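The facet arithmetic can be worked out by hand. The helper below, facet_grid_figsize, is a hypothetical function written for this note. It assumes the overall figure is simply the grid of facets, and it ignores any extra width a legend might add.

```python
import math


def facet_grid_figsize(n_facets, height=5.0, aspect=1.0, col_wrap=None):
    """Approximate (width, height) in inches of a faceted figure."""
    if col_wrap is None:
        ncol, nrow = n_facets, 1
    else:
        ncol = col_wrap
        nrow = math.ceil(n_facets / col_wrap)
    facet_width = height * aspect  # width of a single facet
    return (float(ncol * facet_width), float(nrow * height))


# The "day" column of tips has four levels.
print(facet_grid_figsize(4, height=3, col_wrap=2))  # 2 x 2 grid of 3-inch facets
print(facet_grid_figsize(4, height=5, aspect=0.5))  # 1 x 4 grid of narrow facets
```

With col_wrap=2 and height=3 the four facets form a 2 x 2 grid of roughly 6 x 6 inches; with aspect=.5 and the default height of 5, each facet is 2.5 inches wide and the single row is about 10 x 5 inches.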
sns.jointplot() with kind="reg"
sns.jointplot(x="total_bill", y="tip", data=tips, kind="reg") # kind="reg" makes sns.jointplot() draw a regression line on the joint axes
<seaborn.axisgrid.JointGrid at 0xfc816a0>
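A PairGrid object with its map method, or sns.pairplot() with the kind parameter; note how this differs from regression lines conditioned on other variables
Conditioning (hue, col, row): shows how the relationship between two variables changes across the levels of a third variable
PairGrid / pairplot: shows the relationships between several different variables and one variable
g = sns.PairGrid(tips, x_vars=["total_bill", "size"], y_vars=["tip"],
                 height=5, aspect=.8) # create a PairGrid object (height was named size before seaborn 0.9)
g.map(sns.regplot) # the PairGrid map method gives the same result as setting kind="reg" in pairplot() below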
<seaborn.axisgrid.PairGrid at 0xe2150f0>
sns.pairplot(tips, x_vars=["total_bill", "size"], y_vars=["tip"],
             height=5, aspect=.8, kind="reg") # x_vars and y_vars select which variables go on the columns and rows of the grid
<seaborn.axisgrid.PairGrid at 0x10fee3c8>
sns.pairplot(tips, x_vars=["total_bill", "size"], y_vars=["tip"],
             hue="smoker", height=5, aspect=.8, kind="reg") # pairplot() also accepts hue to draw regression lines conditioned on another variable
<seaborn.axisgrid.PairGrid at 0x11232eb8>