AdaBoost算法:
输入:训练数据集$T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}$,其中$x_{i} \in \mathcal{X} \subseteq R^{n}, y_{i} \in \mathcal{Y} = \left\{ +1, -1 \right\}, i = 1, 2, \cdots, N$;弱学习算法
输出:分类器$G\left(x\right)$
2.1 使用具有权值分布$D_{m}$的训练数据集学习,得到基本分类器
\begin{align*} \\ & G_{m}\left(x\right): \mathcal{X} \to \left\{ -1, +1\right\} \end{align*}
2.2 计算$G_{m}\left(x\right)$在训练数据集上的分类误差率
\begin{align*} \\& e_{m} = P\left(G_{m}\left(x_{i}\right) \neq y_{i}\right)
\\ & = \sum_{i=1}^{N} w_{mi} I \left(G_{m}\left(x_{i}\right) \neq y_{i} \right) \end{align*}
2.3 计算$G_{m} \left(x\right)$的系数
\begin{align*} \\ & \alpha_{m} = \dfrac{1}{2} \log \dfrac{1-e_{m}}{e_{m}} \end{align*}
2.4 更新训练数据集的权值分布
\begin{align*} \\ & D_{m+1}=\left(w_{m+1,1},\cdots,w_{m+1,i},\cdots,w_{m+1,N}\right)
\\ & w_{m+1,i} = \dfrac{w_{mi}}{Z_{m}} \exp \left(- \alpha_{m} y_{i} G_{m}\left(x_{i}\right)\right),
\\ & \quad \quad = \left\{
\begin{aligned}
\ & \dfrac{w_{mi}}{Z_{m}} \exp \left(- \alpha_{m} \right), G_{m}\left(x_{i}\right) = y_{i}
\\ & \dfrac{w_{mi}}{Z_{m}} \exp \left( \alpha_{m} \right), G_{m}\left(x_{i}\right) \neq y_{i}
\end{aligned}
\right. \quad i=1,2,\cdots,N \end{align*}
其中,$Z_{m}$是规范化因子
\begin{align*} \\ & Z_{m}= \sum_{i=1}^{N} w_{mi} \exp \left(- \alpha_{m} y_{i}, G_{m}\left(x_{i}\right)\right)\end{align*}
3. 构建基本分类器的线性组合
\begin{align*} \\ & f \left( x \right) = \sum_{m=1}^{M} \alpha_{m} G_{m} \left( x \right) \end{align*}
得到最终分类器
\begin{align*} \\ & G\left(x\right) = sign\left(f\left(x\right)\right)=sign\left(\sum_{m=1}^{M} \alpha_{m} G_{m} \left( x \right)\right) \end{align*}
加法模型 \begin{align*} \\ & f \left( x \right) = \sum_{m=1}^{M} \beta_{m} b\left(x;\gamma_{m}\right) \end{align*} 其中,$b\left(x;\gamma_{m}\right)$为基函数,$\beta_{m}$为基函数系数,$\gamma_{m}$为基函数参数。
在给定训练数据及损失函数$L\left(y,f\left(x\right)\right)$的条件下,学习加法模型$f\left(x\right)$成为经验风险极小化问题 \begin{align*} \\ & \min_{\beta_{m},\gamma_{m}} \sum_{i=1}^{N} L \left( y_{i}, \sum_{m=1}^{M} \beta_{m} b\left(x_{i};\gamma_{m}\right) \right) \end{align*}
学习加法模型,从前向后每一步只学习一个基函数及其系数,即每步只优化 \begin{align*} \\ & \min_{\beta,\gamma} \sum_{i=1}^{N} L \left( y_{i}, \beta b\left(x_{i};\gamma\right) \right) \end{align*}
前向分布算法:
输入:训练数据集$T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}$,损失函数$L\left(y,f\left(x\right)\right)$;基函数集$\left\{b\left(x;\gamma\right)\right\}$
输出:加法模型$f\left(x\right)$
2.1 极小化损失函数
\begin{align*} \\ & \left(\beta_{m},\gamma_{m}\right) = \arg \min_{\beta,\gamma} \sum_{i=1}^{N} L \left( y_{i},f_{m-1} \left(x_{i}\right) + \beta b\left(x_{i};\gamma \right)\right) \end{align*}
得到参数$\beta_{m},\gamma_{m}$
2.2 更新
\begin{align*} \\& f_{m} \left(x\right) = f_{m-1} \left(x\right) + \beta_{m} b\left(x;\gamma_{m}\right) \end{align*}
3. 得到加法模型
\begin{align*} \\ & f \left( x \right) = f_{M} \left( x \right) = \sum_{m=1}^{M} \beta_{m} b \left( x; \gamma_{m} \right) \end{align*}
训练数据集
\begin{align*} \\& T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\} \end{align*}
其中,$x_{i} \in \mathcal{X} \subseteq R^{n}, y_{i} \in \mathcal{Y} \subseteq R, i = 1, 2, \cdots, N$。
将输入空间$\mathcal{X}$划分为$J$个互不相交的区域$R_{1},R_{2},\cdots,R_{J}$,且在每个区域上确定输出的常量$c_{j}$,则回归树 \begin{align*} \\& T \left(x; \varTheta\right) = \sum_{j=1}^{J} c_{j} I \left(x \in R_{j}\right) \end{align*} 其中,参数$\varTheta = \left\{ \left(R_{1}, c_{1}\right),\left(R_{2}, c_{2}\right),\cdots,\left(R_{J}, c_{J}\right) \right\}$表示树的区域划分和各区域上的常数。$J$是回归树的负责度即叶结点个数。
回归提升树使用前向分布算法 \begin{align*} \\& f_{0}=0 \\ & f_{m}\left(x\right) = f_{m-1}\left(x\right) + T \left(x; \varTheta_{m}\right) \\ & f_{M} = \sum_{m=1}^{M} T \left(x; \varTheta_{m}\right) \end{align*}
在前向分布算法的第$m$步给定当前模型$f_{m-1}\left(x\right)$,模型参数 \begin{align*} \\& \hat \varTheta_{m} = \arg \min_{\varTheta_{m}} \sum_{i=1}^{N} L \left( y_{i}, f_{m-1}\left(x_{i}\right) + T \left( x_{i}; \varTheta_{m} \right) \right) \end{align*} 得到第$m$棵树的参数$\hat \varTheta_{m}$
当采用平方误差损失函数 \begin{align*} \\& L \left( y, f_{m-1}\left(x\right)+T\left(x;\varTheta_{m}\right)\right) \\ & = \left[y-f_{m-1}\left(x\right)-T\left(x;\varTheta_{m}\right)\right]^{2} \\ & = \left[r-T\left(x;\varTheta_{m}\right)\right]^{2}\end{align*} 其中,$r=y-f_{m-1}\left(x\right)$是当前模型拟合数据的残差。
回归提升树算法:
输入:训练数据集$T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\},x_{i} \in \mathcal{X} \subseteq R^{n}, y_{i} \in \mathcal{Y} \subseteq R, i = 1, 2, \cdots, N$
输出:回归提升树$f_{M}\left(x\right)$
2.1 计算残差
\begin{align*} \\ & r_{mi}=y_{i}-f_{m-1}\left(x_{i}\right),\quad i=1,2,\cdots,N \end{align*}
2.2 拟合残差$r_{mi}$学习一个回归树,得到$T\left(x;\varTheta_{m}\right)$
2.3 更新$f_{m}=f_{m-1}\left(x\right)+T\left(x;\varTheta_{m}\right)$
3. 得到回归提升树
\begin{align*} \\ & f_{M} \left( x \right) = \sum_{m=1}^{M} T \left(x;\varTheta_{m}\right) \end{align*}
梯度提升算法:
输入:训练数据集$T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\},x_{i} \in \mathcal{X} \subseteq R^{n}, y_{i} \in \mathcal{Y} \subseteq R, i = 1, 2, \cdots, N$,损失函数$L\left(y,f\left(x\right)\right)$
输出:回归树$\hat f\left(x\right)$
2.1 对$i=1,2,\cdots,N$计算
\begin{align*} \\ & r_{mi}=- \left[ \dfrac {\partial L \left(y_{i},f\left(x_{i}\right) \right)}{\partial f \left(x_{i} \right)}\right]_{f\left(x\right)=f_{m-1}\left(x\right)} \end{align*}
2.2 对$r_{mi}$拟合回归树,得到第$m$棵树的叶结点区域$R_{mj},j=1,2,\cdots,J$
2.3 对$j=1,2,\cdots,J$计算
\begin{align*} \\ & c_{mj}=\arg \min_{c} \sum_{x_{i} \in R_{mj}} L \left( y_{i},f_{m-1} \left(x_{i}\right)+c \right) \end{align*}
2.4 更新$f_{m}\left(x\right)= f_{m-1}\left(x\right) + \sum_{j=1}^{J} c_{mj} I \left(x \in R_{mj} \right)$
3. 得到回归树
\begin{align*} \\ & \hat f \left( x \right) = f_{M} \left( x \right) = \sum_{m=1}^{M} \sum_{j=1}^{J} c_{mj} I \left( x \in R_{mj} \right) \end{align*}