#!/usr/bin/env python
# coding: utf-8

# **Equations**
# 
# *This notebook lists all the equations in the book.*
# 
# **Warning**: GitHub's notebook viewer does not render equations properly. Run Jupyter locally to view this notebook, or use [nbviewer](http://nbviewer.jupyter.org/github/rickiepark/handson-ml/blob/master/book_equations.ipynb).

# # Chapter 1
# **Equation 1-1: A simple linear model**
# 
# $
# \text{life_satisfaction} = \theta_0 + \theta_1 \times \text{GDP_per_capita}
# $

# # Chapter 2
# **Equation 2-1: Root Mean Square Error (RMSE)**
# 
# $
# \text{RMSE}(\mathbf{X}, h) = \sqrt{\frac{1}{m}\sum\limits_{i=1}^{m}\left(h(\mathbf{x}^{(i)}) - y^{(i)}\right)^2}
# $
# 
# 
# **Notations (page 72):**
# 
# $
#   \mathbf{x}^{(1)} = \begin{pmatrix}
#   -118.29 \\
#   33.91 \\
#   1,416 \\
#   38,372
#   \end{pmatrix}
# $
# 
# 
# $
#   y^{(1)}=156,400
# $
# 
# 
# $
#   \mathbf{X} = \begin{pmatrix}
#   (\mathbf{x}^{(1)})^T \\
#   (\mathbf{x}^{(2)})^T\\
#   \vdots \\
#   (\mathbf{x}^{(1999)})^T \\
#   (\mathbf{x}^{(2000)})^T
#   \end{pmatrix} = \begin{pmatrix}
#   -118.29 & 33.91 & 1,416 & 38,372 \\
#   \vdots & \vdots & \vdots & \vdots \\
#   \end{pmatrix}
# $
# 
# 
# **Equation 2-2: Mean Absolute Error (MAE)**
# 
# $
# \text{MAE}(\mathbf{X}, h) = \frac{1}{m}\sum\limits_{i=1}^{m}\left| h(\mathbf{x}^{(i)}) - y^{(i)} \right|
# $
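# 
# A minimal sketch of Equations 2-1 and 2-2 in plain Python. The function names and the sample values are assumptions made for illustration; the predictions stand in for $h(\mathbf{x}^{(i)})$.

```python
import math

def rmse(predictions, labels):
    """Root mean square error (Equation 2-1)."""
    m = len(labels)
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(predictions, labels)) / m)

def mae(predictions, labels):
    """Mean absolute error (Equation 2-2)."""
    m = len(labels)
    return sum(abs(p - y) for p, y in zip(predictions, labels)) / m

# Hypothetical predictions h(x) and labels y, chosen only for illustration.
predictions = [150_000.0, 160_000.0, 170_000.0]
labels = [156_400.0, 158_000.0, 180_000.0]
print(rmse(predictions, labels), mae(predictions, labels))
```

# The RMSE is never smaller than the MAE, because squaring gives large errors more weight.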
# 
# **The $\ell_k$ norm (page 74):**
# 
# $ \left\| \mathbf{v} \right\| _k = (\left| v_0 \right|^k + \left| v_1 \right|^k + \dots + \left| v_n \right|^k)^{\frac{1}{k}} $
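# 
# The norm above can be sketched in a few lines. The helper name `lk_norm` and the example vector are assumptions; $k = 1$ gives the Manhattan norm and $k = 2$ the Euclidean norm.

```python
def lk_norm(v, k):
    """l_k norm of a vector given as a list of numbers (k >= 1)."""
    return sum(abs(x) ** k for x in v) ** (1.0 / k)

v = [3.0, -4.0]  # arbitrary example vector
print(lk_norm(v, 1))  # Manhattan norm
print(lk_norm(v, 2))  # Euclidean norm
```
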
# 

# # Chapter 3
# **Equation 3-1: Precision**
# 
# $
# \text{precision} = \cfrac{TP}{TP + FP}
# $
# 
# 
# **Equation 3-2: Recall**
# 
# $
# \text{recall} = \cfrac{TP}{TP + FN}
# $
# 
# 
# **Equation 3-3: $F_1$ score**
# 
# $
# F_1 = \cfrac{2}{\cfrac{1}{\text{precision}} + \cfrac{1}{\text{recall}}} = 2 \times \cfrac{\text{precision}\, \times \, \text{recall}}{\text{precision}\, + \, \text{recall}} = \cfrac{TP}{TP + \cfrac{FN + FP}{2}}
# $
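# 
# As a quick check of Equations 3-1 to 3-3, the sketch below computes the $F_1$ score as a harmonic mean and compares it with the count-based form. The counts are hypothetical and the function names are assumptions.

```python
def precision(tp, fp):
    """Equation 3-1."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Equation 3-2."""
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (Equation 3-3)."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Hypothetical confusion-matrix counts, chosen only for illustration.
tp, fp, fn = 80, 20, 40
harmonic = f1_score(tp, fp, fn)
from_counts = tp / (tp + (fn + fp) / 2)  # last form of Equation 3-3
print(harmonic, from_counts)
```
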
# 
# 

# # Chapter 4
# **Equation 4-1: Linear Regression model prediction**
# 
# $
# \hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n
# $
# 
# 
# **Equation 4-2: Linear Regression model prediction (vectorized form)**
# 
# $
# \hat{y} = h_{\boldsymbol{\theta}}(\mathbf{x}) = \boldsymbol{\theta}^T \cdot \mathbf{x}
# $
# 
# 
# **Equation 4-3: MSE cost function for a Linear Regression model**
# 
# $
# \text{MSE}(\mathbf{X}, h_{\boldsymbol{\theta}}) = \dfrac{1}{m} \sum\limits_{i=1}^{m}{(\boldsymbol{\theta}^T  \mathbf{x}^{(i)} - y^{(i)})^2}
# $
# 
# 
# **Equation 4-4: Normal Equation**
# 
# $
# \hat{\boldsymbol{\theta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}
# $
# 
# 
# **Partial derivative notation (page 165):**
# 
# $\frac{\partial}{\partial \theta_j} \text{MSE}(\boldsymbol{\theta})$
# 
# 
# **Equation 4-5: Partial derivatives of the cost function**
# 
# $
# \dfrac{\partial}{\partial \theta_j} \text{MSE}(\boldsymbol{\theta}) = \dfrac{2}{m}\sum\limits_{i=1}^{m}(\boldsymbol{\theta}^T \mathbf{x}^{(i)} - y^{(i)})\, x_j^{(i)}
# $
# 
# 
# **Equation 4-6: Gradient vector of the cost function**
# 
# $
# \nabla_{\boldsymbol{\theta}}\, \text{MSE}(\boldsymbol{\theta}) =
# \begin{pmatrix}
#  \frac{\partial}{\partial \theta_0} \text{MSE}(\boldsymbol{\theta}) \\
#  \frac{\partial}{\partial \theta_1} \text{MSE}(\boldsymbol{\theta}) \\
#  \vdots \\
#  \frac{\partial}{\partial \theta_n} \text{MSE}(\boldsymbol{\theta})
# \end{pmatrix}
#  = \dfrac{2}{m} \mathbf{X}^T (\mathbf{X} \boldsymbol{\theta} - \mathbf{y})
# $
# 
# 
# **Equation 4-7: Gradient Descent step**
# 
# $
# \boldsymbol{\theta}^{(\text{next step})} = \boldsymbol{\theta} - \eta \nabla_{\boldsymbol{\theta}}\, \text{MSE}(\boldsymbol{\theta})
# $
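# 
# The gradient of Equation 4-6 and the step of Equation 4-7 can be combined into a short Batch Gradient Descent loop. This is a sketch under stated assumptions: plain Python lists instead of a matrix library, a noise-free toy dataset, and a learning rate and iteration count picked for this example only.

```python
def mse_gradient(theta, X, y):
    """Gradient of the MSE cost, (2/m) X^T (X theta - y), as in Equation 4-6."""
    m = len(X)
    errors = [sum(t * x for t, x in zip(theta, row)) - target
              for row, target in zip(X, y)]
    return [2.0 / m * sum(err * row[j] for err, row in zip(errors, X))
            for j in range(len(theta))]

def gradient_descent(X, y, eta=0.1, n_iterations=2000):
    """Repeat the step of Equation 4-7, starting from theta = 0."""
    theta = [0.0] * len(X[0])
    for _ in range(n_iterations):
        grad = mse_gradient(theta, X, y)
        theta = [t - eta * g for t, g in zip(theta, grad)]
    return theta

# Toy data generated from y = 4 + 3 x_1 (no noise); x_0 = 1 is the bias feature.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [4.0, 7.0, 10.0, 13.0]
theta = gradient_descent(X, y)
print(theta)  # close to [4.0, 3.0]
```

# On this dataset the loop recovers the generating parameters; a larger learning rate could make it diverge.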
# 
# 
# $ O(\frac{1}{\epsilon}) $
# 
# 
# $ \hat{y} = 0.56 x_1^2 + 0.93 x_1 + 1.78 $
# 
# 
# $ y = 0.5 x_1^2 + 1.0 x_1 + 2.0 + \text{Gaussian noise} $
# 
# 
# $ \dfrac{(n+d)!}{d!\,n!} $
# 
# 
# $ \alpha \sum_{i=1}^{n}{{\theta_i}^2}$
# 
# 
# **Equation 4-8: Ridge Regression cost function**
# 
# $
# J(\boldsymbol{\theta}) = \text{MSE}(\boldsymbol{\theta}) + \alpha \dfrac{1}{2}\sum\limits_{i=1}^{n}\theta_i^2
# $
# 
# 
# **Equation 4-9: Ridge Regression closed-form solution**
# 
# $
# \hat{\boldsymbol{\theta}} = (\mathbf{X}^T \mathbf{X} + \alpha \mathbf{A})^{-1} \mathbf{X}^T \mathbf{y}
# $
# 
# 
# **Equation 4-10: Lasso Regression cost function**
# 
# $
# J(\boldsymbol{\theta}) = \text{MSE}(\boldsymbol{\theta}) + \alpha \sum\limits_{i=1}^{n}\left| \theta_i \right|
# $
# 
# 
# **Equation 4-11: Lasso Regression subgradient vector**
# 
# $
# g(\boldsymbol{\theta}, J) = \nabla_{\boldsymbol{\theta}}\, \text{MSE}(\boldsymbol{\theta}) + \alpha
# \begin{pmatrix}
#   \operatorname{sign}(\theta_1) \\
#   \operatorname{sign}(\theta_2) \\
#   \vdots \\
#   \operatorname{sign}(\theta_n) \\
# \end{pmatrix} \quad \text{where } \operatorname{sign}(\theta_i) =
# \begin{cases}
# -1 & \text{if } \theta_i < 0 \\
# 0 & \text{if } \theta_i = 0 \\
# +1 & \text{if } \theta_i > 0
# \end{cases}
# $
# 
# 
# **Equation 4-12: Elastic Net cost function**
# 
# $
# J(\boldsymbol{\theta}) = \text{MSE}(\boldsymbol{\theta}) + r \alpha \sum\limits_{i=1}^{n}\left| \theta_i \right| + \dfrac{1 - r}{2} \alpha \sum\limits_{i=1}^{n}{\theta_i^2}
# $
# 
# 
# **Equation 4-13: Logistic Regression model estimated probability (vectorized form)**
# 
# $
# \hat{p} = h_{\boldsymbol{\theta}}(\mathbf{x}) = \sigma(\boldsymbol{\theta}^T \mathbf{x})
# $
# 
# 
# **Equation 4-14: Logistic function**
# 
# $
# \sigma(t) = \dfrac{1}{1 + \exp(-t)}
# $
# 
# 
# **Equation 4-15: Logistic Regression model prediction**
# 
# $
# \hat{y} =
# \begin{cases}
#   0 & \text{if } \hat{p} < 0.5 \\
#   1 & \text{if } \hat{p} \geq 0.5
# \end{cases}
# $
# 
# 
# **Equation 4-16: Cost function of a single training instance**
# 
# $
# c(\boldsymbol{\theta}) =
# \begin{cases}
#   -\log(\hat{p}) & \text{if } y = 1 \\
#   -\log(1 - \hat{p}) & \text{if } y = 0
# \end{cases}
# $
# 
# 
# **Equation 4-17: Logistic Regression cost function (log loss)**
# 
# $
# J(\boldsymbol{\theta}) = -\dfrac{1}{m} \sum\limits_{i=1}^{m}{\left[ y^{(i)} \log\left(\hat{p}^{(i)}\right) + (1 - y^{(i)}) \log\left(1 - \hat{p}^{(i)}\right)\right]}
# $
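# 
# A small sketch of the logistic function (Equation 4-14) and the log loss (Equation 4-17). The scores and labels are hypothetical, and the function names are assumptions; probabilities must lie strictly between 0 and 1 for the logarithms to be defined.

```python
import math

def sigmoid(t):
    """Logistic function (Equation 4-14)."""
    return 1.0 / (1.0 + math.exp(-t))

def log_loss(probabilities, labels):
    """Average log loss over m instances (Equation 4-17)."""
    m = len(labels)
    total = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(probabilities, labels))
    return -total / m

# Hypothetical scores theta^T x and labels, chosen only for illustration.
scores = [2.0, -1.0, 0.0]
labels = [1, 0, 1]
probabilities = [sigmoid(s) for s in scores]
print(log_loss(probabilities, labels))
```
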
# 
# 
# **Equation 4-18: Logistic cost function partial derivatives**
# 
# $
# \dfrac{\partial}{\partial \theta_j} J(\boldsymbol{\theta}) = \dfrac{1}{m}\sum\limits_{i=1}^{m}\left(\sigma(\boldsymbol{\theta}^T \mathbf{x}^{(i)}) - y^{(i)}\right)\, x_j^{(i)}
# $
# 
# 
# **Equation 4-19: Softmax score for class k**
# 
# $
# s_k(\mathbf{x}) = ({\boldsymbol{\theta}^{(k)}})^T \mathbf{x}
# $
# 
# 
# **Equation 4-20: Softmax function**
# 
# $
# \hat{p}_k = \sigma\left(\mathbf{s}(\mathbf{x})\right)_k = \dfrac{\exp\left(s_k(\mathbf{x})\right)}{\sum\limits_{j=1}^{K}{\exp\left(s_j(\mathbf{x})\right)}}
# $
# 
# 
# **Equation 4-21: Softmax Regression classifier prediction**
# 
# $
# \hat{y} = \underset{k}{\operatorname{argmax}} \, \sigma\left(\mathbf{s}(\mathbf{x})\right)_k = \underset{k}{\operatorname{argmax}} \, s_k(\mathbf{x}) = \underset{k}{\operatorname{argmax}} \, \left( ({\boldsymbol{\theta}^{(k)}})^T \mathbf{x} \right)
# $
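# 
# Equations 4-20 and 4-21 can be sketched as follows. The scores are hypothetical, classes are indexed from 0 here (the equations index them from 1), and subtracting the maximum score is a numerical-stability choice of this sketch, not something the equations require.

```python
import math

def softmax(scores):
    """Softmax of a list of class scores s_k(x) (Equation 4-20)."""
    # Subtracting the max leaves the result unchanged but avoids overflow.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(scores):
    """Index of the class with the highest score (Equation 4-21)."""
    return max(range(len(scores)), key=lambda k: scores[k])

scores = [1.0, 3.0, 0.5]  # hypothetical scores for K = 3 classes
probabilities = softmax(scores)
print(probabilities, predict(scores))
```

# Because the exponential is increasing, the class with the highest score is also the class with the highest probability.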
# 
# 
# **Equation 4-22: Cross entropy cost function**
# 
# $
# J(\boldsymbol{\Theta}) = - \dfrac{1}{m}\sum\limits_{i=1}^{m}\sum\limits_{k=1}^{K}{y_k^{(i)}\log\left(\hat{p}_k^{(i)}\right)}
# $
# 
# **Cross entropy between two probability distributions $p$ and $q$ (page 196):**
# 
# $ H(p, q) = -\sum\limits_{x}p(x) \log q(x) $
# 
# 
# **Equation 4-23: Cross entropy gradient vector for class _k_**
# 
# $
# \nabla_{\boldsymbol{\theta}^{(k)}} \, J(\boldsymbol{\Theta}) = \dfrac{1}{m} \sum\limits_{i=1}^{m}{ \left ( \hat{p}^{(i)}_k - y_k^{(i)} \right ) \mathbf{x}^{(i)}}
# $
# 

# # Chapter 5
# **Equation 5-1: Gaussian RBF**
# 
# $
# {\displaystyle \phi_{\gamma}(\mathbf{x}, \boldsymbol{\ell})} = {\displaystyle \exp({\displaystyle -\gamma \left\| \mathbf{x} - \boldsymbol{\ell} \right\|^2})}
# $
# 
# 
# **Equation 5-2: Linear SVM classifier prediction**
# 
# $
# \hat{y} = \begin{cases}
#  0 & \text{if } \mathbf{w}^T \mathbf{x} + b < 0 \\
#  1 & \text{if } \mathbf{w}^T \mathbf{x} + b \geq 0
# \end{cases}
# $
# 
# 
# **Equation 5-3: Hard margin linear SVM classifier objective**
# 
# $
# \begin{split}
# &\underset{\mathbf{w}, b}{\operatorname{minimize}}\,{\frac{1}{2}\mathbf{w}^T \mathbf{w}} \\
# &\text{subject to} \quad t^{(i)}(\mathbf{w}^T \mathbf{x}^{(i)} + b) \ge 1 \quad \text{for } i = 1, 2, \dots, m
# \end{split}
# $
# 
# 
# **Equation 5-4: Soft margin linear SVM classifier objective**
# 
# $
# \begin{split}
# &\underset{\mathbf{w}, b, \mathbf{\zeta}}{\operatorname{minimize}}\,{\dfrac{1}{2}\mathbf{w}^T \mathbf{w} + C \sum\limits_{i=1}^m{\zeta^{(i)}}}\\
# &\text{subject to} \quad t^{(i)}(\mathbf{w}^T \mathbf{x}^{(i)} + b) \ge 1 - \zeta^{(i)} \quad \text{and} \quad \zeta^{(i)} \ge 0 \quad \text{for } i = 1, 2, \dots, m
# \end{split}
# $
# 
# 
# **Equation 5-5: QP problem**
# 
# $
# \begin{split}
# \underset{\mathbf{p}}{\text{minimize}} \, & \dfrac{1}{2} \mathbf{p}^T \mathbf{H} \mathbf{p} \, + \, \mathbf{f}^T \mathbf{p}  \\
# \text{subject to} \, & \mathbf{A} \mathbf{p} \le \mathbf{b} \\
# \text{where } &
# \begin{cases}
#   \mathbf{p} \, \text{ is an }n_p\text{-dimensional vector (} n_p = \text{number of model parameters),}\\
#   \mathbf{H} \, \text{ is an }n_p \times n_p \text{ matrix,}\\
#   \mathbf{f} \, \text{ is an }n_p\text{-dimensional vector,}\\
#   \mathbf{A} \, \text{ is an } n_c \times n_p \text{ matrix (}n_c = \text{number of constraints),}\\
#   \mathbf{b} \, \text{ is an }n_c\text{-dimensional vector.}
# \end{cases}
# \end{split}
# $
# 
# 
# **Equation 5-6: Dual form of the linear SVM objective**
# 
# $
# \begin{split}
# &\underset{\mathbf{\alpha}}{\operatorname{minimize}} \,
# \dfrac{1}{2}\sum\limits_{i=1}^{m}{
#   \sum\limits_{j=1}^{m}{
#   \alpha^{(i)} \alpha^{(j)} t^{(i)} t^{(j)} {\mathbf{x}^{(i)}}^T \mathbf{x}^{(j)}
#   }
# } \, - \, \sum\limits_{i=1}^{m}{\alpha^{(i)}}\\
# &\text{subject to} \quad \alpha^{(i)} \ge 0 \quad \text{for } i = 1, 2, \dots, m
# \end{split}
# $
# 
# 
# **Equation 5-7: From the dual solution to the primal solution**
# 
# $
# \begin{split}
# &\hat{\mathbf{w}} = \sum_{i=1}^{m}{\hat{\alpha}}^{(i)}t^{(i)}\mathbf{x}^{(i)}\\
# &\hat{b} = \dfrac{1}{n_s}\sum\limits_{\scriptstyle i=1 \atop {\scriptstyle {\hat{\alpha}}^{(i)} > 0}}^{m}{\left(t^{(i)} - ({\hat{\mathbf{w}}}^T \mathbf{x}^{(i)})\right)}
# \end{split}
# $
# 
# 
# **Equation 5-8: Second-degree polynomial mapping**
# 
# $
# \phi\left(\mathbf{x}\right) = \phi\left( \begin{pmatrix}
#   x_1 \\
#   x_2
# \end{pmatrix} \right) = \begin{pmatrix}
#   {x_1}^2 \\
#   \sqrt{2} \, x_1 x_2 \\
#   {x_2}^2
# \end{pmatrix}
# $
# 
# 
# **Equation 5-9: Kernel trick for a second-degree polynomial mapping**
# 
# $
# \begin{split}
# \phi(\mathbf{a})^T \phi(\mathbf{b}) & \quad = \begin{pmatrix}
#   {a_1}^2 \\
#   \sqrt{2} \, a_1 a_2 \\
#   {a_2}^2
#   \end{pmatrix}^T \begin{pmatrix}
#   {b_1}^2 \\
#   \sqrt{2} \, b_1 b_2 \\
#   {b_2}^2
# \end{pmatrix} = {a_1}^2 {b_1}^2 + 2 a_1 b_1 a_2 b_2 + {a_2}^2 {b_2}^2 \\
#  & \quad = \left( a_1 b_1 + a_2 b_2 \right)^2 = \left( \begin{pmatrix}
#   a_1 \\
#   a_2
# \end{pmatrix}^T \begin{pmatrix}
#     b_1 \\
#     b_2
#   \end{pmatrix} \right)^2 = (\mathbf{a}^T \mathbf{b})^2
# \end{split}
# $
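# 
# The identity in Equation 5-9 is easy to check numerically. In this sketch the helper names and the two example vectors are assumptions; `phi` is the mapping of Equation 5-8.

```python
import math

def phi(x):
    """Second-degree polynomial mapping of a 2D vector (Equation 5-8)."""
    x1, x2 = x
    return [x1 ** 2, math.sqrt(2) * x1 * x2, x2 ** 2]

def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def poly_kernel(a, b):
    """(a^T b)^2, which equals phi(a)^T phi(b) without ever computing phi."""
    return dot(a, b) ** 2

a, b = [1.0, 2.0], [3.0, -1.0]  # arbitrary example vectors
print(dot(phi(a), phi(b)), poly_kernel(a, b))
```

# Both values agree up to floating-point rounding, yet the kernel only needs a 2D dot product.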
# 
# **In the text about the kernel trick (page 220):**
# 
# [...] the dot product of the transformed vectors can simply be replaced by $ ({\mathbf{x}^{(i)}}^T \mathbf{x}^{(j)})^2 $.
# 
# 
# **Equation 5-10: Common kernels**
# 
# $
# \begin{split}
# \text{Linear:} & \quad K(\mathbf{a}, \mathbf{b}) = \mathbf{a}^T \mathbf{b} \\
# \text{Polynomial:} & \quad K(\mathbf{a}, \mathbf{b}) = \left(\gamma \mathbf{a}^T \mathbf{b} + r \right)^d \\
# \text{Gaussian RBF:} & \quad K(\mathbf{a}, \mathbf{b}) = \exp({\displaystyle -\gamma \left\| \mathbf{a} - \mathbf{b} \right\|^2}) \\
# \text{Sigmoid:} & \quad K(\mathbf{a}, \mathbf{b}) = \tanh\left(\gamma \mathbf{a}^T \mathbf{b} + r\right)
# \end{split}
# $
# 
# **Equation 5-11: Making predictions with a kernelized SVM**
# 
# $
# \begin{split}
# h_{\hat{\mathbf{w}}, \hat{b}}\left(\phi(\mathbf{x}^{(n)})\right) & = \,\hat{\mathbf{w}}^T \phi(\mathbf{x}^{(n)}) + \hat{b} = \left(\sum_{i=1}^{m}{\hat{\alpha}}^{(i)}t^{(i)}\phi(\mathbf{x}^{(i)})\right)^T \phi(\mathbf{x}^{(n)}) + \hat{b}\\
#  & = \, \sum_{i=1}^{m}{\hat{\alpha}}^{(i)}t^{(i)}\left(\phi(\mathbf{x}^{(i)})^T \phi(\mathbf{x}^{(n)})\right)  + \hat{b}\\
#  & = \sum\limits_{\scriptstyle i=1 \atop {\scriptstyle {\hat{\alpha}}^{(i)} > 0}}^{m}{\hat{\alpha}}^{(i)}t^{(i)} K(\mathbf{x}^{(i)}, \mathbf{x}^{(n)}) + \hat{b}
# \end{split}
# $
# 
# 
# **Equation 5-12: Computing the bias term using the kernel trick**
# 
# $
# \begin{split}
# \hat{b} & = \dfrac{1}{n_s}\sum\limits_{\scriptstyle i=1 \atop {\scriptstyle {\hat{\alpha}}^{(i)} > 0}}^{m}{\left(t^{(i)} - {\hat{\mathbf{w}}}^T \phi(\mathbf{x}^{(i)})\right)} = \dfrac{1}{n_s}\sum\limits_{\scriptstyle i=1 \atop {\scriptstyle {\hat{\alpha}}^{(i)} > 0}}^{m}{\left(t^{(i)} - {
#  \left(\sum_{j=1}^{m}{\hat{\alpha}}^{(j)}t^{(j)}\phi(\mathbf{x}^{(j)})\right)
#  }^T \phi(\mathbf{x}^{(i)})\right)}\\
#  & = \dfrac{1}{n_s}\sum\limits_{\scriptstyle i=1 \atop {\scriptstyle {\hat{\alpha}}^{(i)} > 0}}^{m}{\left(t^{(i)} - 
# \sum\limits_{\scriptstyle j=1 \atop {\scriptstyle {\hat{\alpha}}^{(j)} > 0}}^{m}{
#   {\hat{\alpha}}^{(j)} t^{(j)} K(\mathbf{x}^{(i)},\mathbf{x}^{(j)})
# }
# \right)}
# \end{split}
# $
# 
# 
# **Equation 5-13: Linear SVM classifier cost function**
# 
# $
# J(\mathbf{w}, b) = \dfrac{1}{2} \mathbf{w}^T \mathbf{w} \, + \, C {\displaystyle \sum\limits_{i=1}^{m}\max\left(0, 1 - t^{(i)}(\mathbf{w}^T \mathbf{x}^{(i)} + b) \right)}
# $
# 
# 
# 

# # Chapter 6
# **Equation 6-1: Gini impurity**
# 
# $
# G_i = 1 - \sum\limits_{k=1}^{n}{{p_{i,k}}^2}
# $
# 
# 
# **Equation 6-2: CART cost function for classification**
# 
# $
# \begin{split}
# &J(k, t_k) = \dfrac{m_{\text{left}}}{m}G_\text{left} + \dfrac{m_{\text{right}}}{m}G_{\text{right}}\\
# &\text{where }\begin{cases}
# G_\text{left/right} \text{ measures the impurity of the left/right subset,}\\
# m_\text{left/right} \text{ is the number of instances in the left/right subset.}
# \end{cases}
# \end{split}
# $
# 
# **Entropy computation example (page 232):**
# 
# $ -\frac{49}{54}\log_2(\frac{49}{54}) - \frac{5}{54}\log_2(\frac{5}{54}) $
# 
# 
# **Equation 6-3: Entropy**
# 
# $
# H_i = -\sum\limits_{k=1 \atop p_{i,k} \ne 0}^{n}{{p_{i,k}}\log_2(p_{i,k})}
# $
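# 
# Equations 6-1 and 6-3 can be sketched from per-class instance counts. The function names are assumptions; the counts reuse the 49/54 and 5/54 ratios of the entropy example above, with a zero count for a third class added as an assumption to show that empty classes are skipped.

```python
import math

def gini(class_counts):
    """Gini impurity of a node from its per-class instance counts (Equation 6-1)."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

def entropy(class_counts):
    """Entropy of a node in bits, skipping empty classes (Equation 6-3)."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)

counts = [0, 49, 5]
print(gini(counts), entropy(counts))
```

# A pure node has impurity 0 under both measures, and a 50/50 split of two classes gives a Gini impurity of 0.5 and an entropy of 1 bit.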
# 
# 
# **Equation 6-4: CART cost function for regression**
# 
# $
# J(k, t_k) = \dfrac{m_{\text{left}}}{m}\text{MSE}_\text{left} + \dfrac{m_{\text{right}}}{m}\text{MSE}_{\text{right}} \quad
# \text{where }
# \begin{cases}
# \text{MSE}_{\text{node}} = \dfrac{1}{m_{\text{node}}}\sum\limits_{\scriptstyle i \in \text{node}}(\hat{y}_{\text{node}} - y^{(i)})^2\\
# \hat{y}_\text{node} = \dfrac{1}{m_{\text{node}}}\sum\limits_{\scriptstyle i \in \text{node}}y^{(i)}
# \end{cases}
# $
# 

# # Chapter 7
# 
# **Equation 7-1: Weighted error rate of the $j^\text{th}$ predictor**
# 
# $
# r_j = \dfrac{\displaystyle \sum\limits_{\textstyle {i=1 \atop \hat{y}_j^{(i)} \ne y^{(i)}}}^{m}{w^{(i)}}}{\displaystyle \sum\limits_{i=1}^{m}{w^{(i)}}} \quad
# \text{where }\hat{y}_j^{(i)}\text{ is the }j^{\text{th}}\text{ predictor's prediction for the }i^{\text{th}}\text{ instance.}
# $
# 
# **Equation 7-2: Predictor weight**
# 
# $
# \begin{split}
# \alpha_j = \eta \log{\dfrac{1 - r_j}{r_j}}
# \end{split}
# $
# 
# 
# **Equation 7-3: Weight update rule**
# 
# $
# \begin{split}
# & w^{(i)} \leftarrow
# \begin{cases}
# w^{(i)} & \text{if } \hat{y}_j^{(i)} = y^{(i)}\\
# w^{(i)} \exp(\alpha_j) & \text{if } \hat{y}_j^{(i)} \ne y^{(i)}
# \end{cases} \\
# & \text{for } i = 1, 2, \dots, m \\
# \end{split}
# $
# 
# **In the text (page 256):**
# 
# Then all the instance weights are normalized (i.e., divided by $ \sum_{i=1}^{m}{w^{(i)}} $).
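# 
# One AdaBoost round (Equations 7-1 to 7-3 followed by the normalization just described) might look like the sketch below. The function name, the toy predictions and the default $\eta = 1$ are assumptions; the error rate must lie strictly between 0 and 1 for the logarithm to be defined.

```python
import math

def adaboost_round(weights, predictions, labels, eta=1.0):
    """Return (error rate, predictor weight, normalized instance weights)."""
    wrong = [p != y for p, y in zip(predictions, labels)]
    # Equation 7-1: weighted error rate of this predictor.
    r = sum(w for w, bad in zip(weights, wrong) if bad) / sum(weights)
    # Equation 7-2: predictor weight.
    alpha = eta * math.log((1 - r) / r)
    # Equation 7-3: boost the weights of misclassified instances only.
    boosted = [w * math.exp(alpha) if bad else w
               for w, bad in zip(weights, wrong)]
    total = sum(boosted)
    return r, alpha, [w / total for w in boosted]

# Hypothetical predictions for m = 4 instances with uniform initial weights.
weights = [0.25, 0.25, 0.25, 0.25]
r, alpha, new_weights = adaboost_round(weights, [1, 0, 1, 1], [1, 1, 1, 1])
print(r, alpha, new_weights)
```

# After the round, the single misclassified instance carries half of the total weight.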
# 
# 
# **Equation 7-4: AdaBoost predictions**
# 
# $
# \hat{y}(\mathbf{x}) = \underset{k}{\operatorname{argmax}}{\sum\limits_{\scriptstyle j=1 \atop \scriptstyle \hat{y}_j(\mathbf{x}) = k}^{N}{\alpha_j}} \quad \text{where }N\text{ is the number of predictors.}
# $
# 
# 
# 

# # Chapter 8
# 
# **Equation 8-1: Principal components matrix**
# 
# $
# \mathbf{V} =
# \begin{pmatrix}
#   \mid & \mid & & \mid \\
#   \mathbf{c_1} & \mathbf{c_2} & \cdots & \mathbf{c_n} \\
#   \mid & \mid & & \mid
# \end{pmatrix}
# $
# 
# 
# **Equation 8-2: Projecting the training set down to _d_ dimensions**
# 
# $
# \mathbf{X}_{d\text{-proj}} = \mathbf{X} \mathbf{W}_d
# $
# 
# 
# **Equation 8-3: PCA inverse transformation, back to the original number of dimensions**
# 
# $
# \mathbf{X}_{\text{recovered}} = \mathbf{X}_{d\text{-proj}} {\mathbf{W}_d}^T
# $
# 
# 
# **Equation 8-4: LLE step 1: linearly modeling local relationships**
# 
# $
# \begin{split}
# & \hat{\mathbf{W}} = \underset{\mathbf{W}}{\operatorname{argmin}}{\displaystyle \sum\limits_{i=1}^{m}} \left\|\mathbf{x}^{(i)} - \sum\limits_{j=1}^{m}{w_{i,j}}\mathbf{x}^{(j)}\right\|^2\\
# & \text{subject to }
# \begin{cases}
#   w_{i,j}=0 & \text{if } \mathbf{x}^{(j)} \text{ is not one of the } k \text{ nearest neighbors of } \mathbf{x}^{(i)}\\
#   \sum\limits_{j=1}^{m}w_{i,j} = 1 & \text{for } i=1, 2, \dots, m
# \end{cases}
# \end{split}
# $
# 
# **In the text (page 290):**
# 
# [...] the distance between $\mathbf{z}^{(i)}$ and $ \sum_{j=1}^{m}{\hat{w}_{i,j}\mathbf{z}^{(j)}} $ must be minimized.
# 
# 
# **Equation 8-5: LLE step 2: reducing dimensionality while preserving relationships**
# 
# $
# \hat{\mathbf{Z}} = \underset{\mathbf{Z}}{\operatorname{argmin}}{\displaystyle \sum\limits_{i=1}^{m}} \left\|\mathbf{z}^{(i)} - \sum\limits_{j=1}^{m}{\hat{w}_{i,j}}\mathbf{z}^{(j)}\right\|^2
# $
# 

# # Chapter 9
# 
# **Equation 9-1: ReLU function**
# 
# $
# h_{\mathbf{w}, b}(\mathbf{X}) = \max(\mathbf{X} \mathbf{w} + b, 0)
# $

# # Chapter 10
# 
# **Equation 10-1: Common step functions used in Perceptrons**
# 
# $
# \begin{split}
# \operatorname{heaviside}(z) =
# \begin{cases}
# 0 & \text{if } z < 0\\
# 1 & \text{if } z \ge 0
# \end{cases} & \quad\quad
# \operatorname{sgn}(z) =
# \begin{cases}
# -1 & \text{if } z < 0\\
# 0 & \text{if } z = 0\\
# +1 & \text{if } z > 0
# \end{cases}
# \end{split}
# $
# 
# 
# **Equation 10-2: Perceptron learning rule (weight update)**
# 
# $
# {w_{i,j}}^{(\text{next step})} = w_{i,j} + \eta (y_j - \hat{y}_j) x_i
# $
# 
# 
# **In the text (page 342):**
# 
# This matrix is initialized randomly, using a truncated normal (Gaussian) distribution with a standard deviation of $ 2 / \sqrt{n_\text{inputs} + n_\text{neurons}} $.
# 

# # Chapter 11
# **Equation 11-1: Xavier initialization (when using the logistic activation function)**
# 
# $
# \begin{split}
# & \text{Normal distribution with mean 0 and standard deviation }
# \sigma = \sqrt{\dfrac{2}{n_\text{inputs} + n_\text{outputs}}}\\
# & \text{Or a uniform distribution between } -r \text{ and } +r \text{, with }
# r = \sqrt{\dfrac{6}{n_\text{inputs} + n_\text{outputs}}}
# \end{split}
# $
# 
# **In the text (page 356):**
# 
# When the number of input connections is roughly equal to the number of output connections, simpler equations are used (e.g., $ \sigma = 1 / \sqrt{n_\text{inputs}} $ or $ r = \sqrt{3} / \sqrt{n_\text{inputs}} $).
# 
# **Table 11-1: Initialization parameters for each type of activation function**
# 
# * Logistic, uniform distribution: $ r = \sqrt{\dfrac{6}{n_\text{inputs} + n_\text{outputs}}} $
# * Logistic, normal distribution: $ \sigma = \sqrt{\dfrac{2}{n_\text{inputs} + n_\text{outputs}}} $
# * Hyperbolic tangent, uniform distribution: $ r = 4 \sqrt{\dfrac{6}{n_\text{inputs} + n_\text{outputs}}} $
# * Hyperbolic tangent, normal distribution: $ \sigma = 4 \sqrt{\dfrac{2}{n_\text{inputs} + n_\text{outputs}}} $
# * ReLU and its variants, uniform distribution: $ r = \sqrt{2} \sqrt{\dfrac{6}{n_\text{inputs} + n_\text{outputs}}} $
# * ReLU and its variants, normal distribution: $ \sigma = \sqrt{2} \sqrt{\dfrac{2}{n_\text{inputs} + n_\text{outputs}}} $
# 
# **Equation 11-2: ELU activation function**
# 
# $
# \operatorname{ELU}_\alpha(z) =
# \begin{cases}
# \alpha(\exp(z) - 1) & \text{if } z < 0\\
# z & \text{if } z \ge 0
# \end{cases}
# $
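# 
# A direct transcription of Equation 11-2, as a sketch: the default $\alpha = 1$ and the sample inputs are assumptions made for illustration.

```python
import math

def elu(z, alpha=1.0):
    """ELU activation (Equation 11-2)."""
    return alpha * (math.exp(z) - 1.0) if z < 0 else z

for z in (-5.0, -1.0, 0.0, 2.0):
    print(z, elu(z))
```

# For negative inputs the output approaches $-\alpha$ instead of being cut off at 0.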
# 
# 
# **Equation 11-3: Batch Normalization algorithm**
# 
# $
# \begin{split}
# 1.\quad & \mathbf{\mu}_B = \dfrac{1}{m_B}\sum\limits_{i=1}^{m_B}{\mathbf{x}^{(i)}}\\
# 2.\quad & {\mathbf{\sigma}_B}^2 = \dfrac{1}{m_B}\sum\limits_{i=1}^{m_B}{(\mathbf{x}^{(i)} - \mathbf{\mu}_B)^2}\\
# 3.\quad & \hat{\mathbf{x}}^{(i)} = \dfrac{\mathbf{x}^{(i)} - \mathbf{\mu}_B}{\sqrt{{\mathbf{\sigma}_B}^2 + \epsilon}}\\
# 4.\quad & \mathbf{z}^{(i)} = \gamma \hat{\mathbf{x}}^{(i)} + \beta
# \end{split}
# $
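# 
# The four steps of Equation 11-3 can be sketched for a single scalar feature. This is a simplification: real layers apply the same steps to every feature, and the default values of `gamma`, `beta` and `epsilon` as well as the mini-batch are assumptions.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, epsilon=1e-5):
    """Batch Normalization of one scalar feature over a mini-batch."""
    m_b = len(batch)
    mu = sum(batch) / m_b                                 # step 1: mean
    var = sum((x - mu) ** 2 for x in batch) / m_b         # step 2: variance
    x_hat = [(x - mu) / math.sqrt(var + epsilon)          # step 3: normalize
             for x in batch]
    return [gamma * xh + beta for xh in x_hat]            # step 4: scale and shift

batch = [1.0, 2.0, 3.0, 4.0]  # hypothetical inputs to one neuron
z = batch_norm(batch)
print(z)
```

# With `gamma = 1` and `beta = 0` the outputs have zero mean and a variance very close to 1.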
# 
# **In the text (page 364):**
# 
# [...] given a new value $v$, the moving average $\hat{v}$ is updated with the following equation:
# 
# $ \hat{v} \gets \hat{v} \times \text{momentum} + v \times (1 - \text{momentum}) $
# 
# **Equation 11-4: Momentum algorithm**
# 
# 1. $\mathbf{m} \gets \beta \mathbf{m} - \eta \nabla_\boldsymbol{\theta}J(\boldsymbol{\theta})$
# 2. $\boldsymbol{\theta} \gets \boldsymbol{\theta} + \mathbf{m}$
# 
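# **In the text (page 377):**
# 
# You can easily verify that if the gradient remains constant, the terminal velocity (i.e., the maximum size of the weight updates) is equal to that gradient multiplied by the learning rate $ \eta $ multiplied by $ \frac{1}{1 - \beta} $.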
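# 
# The terminal velocity can be checked by iterating step 1 of Equation 11-4 with a constant gradient. The scalar setting, the gradient value and the hyperparameters below are assumptions chosen for this sketch.

```python
def momentum_velocity(gradient, eta=0.01, beta=0.9, n_steps=500):
    """Iterate m <- beta * m - eta * gradient with a constant scalar gradient."""
    m = 0.0
    for _ in range(n_steps):
        m = beta * m - eta * gradient
    return m

gradient = 2.0  # hypothetical constant gradient
terminal = -0.01 * gradient / (1 - 0.9)  # -eta * gradient * 1 / (1 - beta)
print(momentum_velocity(gradient), terminal)
```

# With $\beta = 0.9$ the update ends up 10 times larger than a plain Gradient Descent step (in magnitude).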
# 
# **Equation 11-5: Nesterov Accelerated Gradient algorithm**
# 
# 1. $\mathbf{m} \gets \beta \mathbf{m} - \eta \nabla_\boldsymbol{\theta}J(\boldsymbol{\theta} + \beta \mathbf{m})$
# 2. $\boldsymbol{\theta} \gets \boldsymbol{\theta} + \mathbf{m}$
# 
# **Equation 11-6: AdaGrad algorithm**
# 
# 1. $\mathbf{s} \gets \mathbf{s} + \nabla_\boldsymbol{\theta}J(\boldsymbol{\theta}) \otimes \nabla_\boldsymbol{\theta}J(\boldsymbol{\theta})$
# 2. $\boldsymbol{\theta} \gets \boldsymbol{\theta} - \eta \, \nabla_\boldsymbol{\theta}J(\boldsymbol{\theta}) \oslash {\sqrt{\mathbf{s} + \epsilon}}$
# 
# **In the text (page 381):**
# 
# This vectorized form is equivalent to computing $s_i \gets s_i + \left( \dfrac{\partial J(\boldsymbol{\theta})}{\partial \theta_i} \right)^2$ for each element $s_i$ of the vector $\mathbf{s}$.
# 
# **In the text (page 381):**
# 
# This vectorized form is equivalent to computing $ \theta_i \gets \theta_i - \eta \, \dfrac{\partial J(\boldsymbol{\theta})}{\partial \theta_i} \dfrac{1}{\sqrt{s_i + \epsilon}} $ for all parameters $\theta_i$ (simultaneously).
# 
# **Equation 11-7: RMSProp algorithm**
# 
# 1. $\mathbf{s} \gets \beta \mathbf{s} + (1 - \beta ) \nabla_\boldsymbol{\theta}J(\boldsymbol{\theta}) \otimes \nabla_\boldsymbol{\theta}J(\boldsymbol{\theta})$
# 2. $\boldsymbol{\theta} \gets \boldsymbol{\theta} - \eta \, \nabla_\boldsymbol{\theta}J(\boldsymbol{\theta}) \oslash {\sqrt{\mathbf{s} + \epsilon}}$
# 
# 
# **Equation 11-8: Adam algorithm**
# 
# 1. $\mathbf{m} \gets \beta_1 \mathbf{m} - (1 - \beta_1) \nabla_\boldsymbol{\theta}J(\boldsymbol{\theta})$
# 2. $\mathbf{s} \gets \beta_2 \mathbf{s} + (1 - \beta_2) \nabla_\boldsymbol{\theta}J(\boldsymbol{\theta}) \otimes \nabla_\boldsymbol{\theta}J(\boldsymbol{\theta})$
# 3. $\mathbf{m} \gets \left(\dfrac{\mathbf{m}}{1 - {\beta_1}^T}\right)$
# 4. $\mathbf{s} \gets \left(\dfrac{\mathbf{s}}{1 - {\beta_2}^T}\right)$
# 5. $\boldsymbol{\theta} \gets \boldsymbol{\theta} + \eta \, \mathbf{m} \oslash {\sqrt{\mathbf{s} + \epsilon}}$
# 
# **In the text (page 393):**
# 
# Typically, $\left\| \mathbf{w} \right\|_2$ is computed after each training step, and $\mathbf{w}$ is then clipped $ \left( \mathbf{w} \gets \mathbf{w} \dfrac{r}{\left\| \mathbf{w} \right\|_2} \right)$ if needed.

# # Chapter 13
# 
# **Equation 13-1: Computing the output of a neuron in a convolutional layer**
# 
# $
# z_{i,j,k} = b_k + \sum\limits_{u = 0}^{f_h - 1} \, \, \sum\limits_{v = 0}^{f_w - 1} \, \, \sum\limits_{k' = 0}^{f_{n'} - 1} \, \, x_{i', j', k'} \times w_{u, v, k', k}
# \quad \text{where }
# \begin{cases}
# i' = i \times s_h + u \\
# j' = j \times s_w + v
# \end{cases}
# $
# 
# **Equation 13-2: Local response normalization (LRN)**
# 
# $
# b_i = a_i  \left(k + \alpha \sum\limits_{j=j_\text{low}}^{j_\text{high}}{{a_j}^2} \right)^{-\beta} \quad \text{where }
# \begin{cases}
#   j_\text{high} = \min\left(i + \dfrac{r}{2}, f_n-1\right) \\
#   j_\text{low} = \max\left(0, i - \dfrac{r}{2}\right)
# \end{cases}
# $

# # Chapter 14
# 
# **Equation 14-1: Output of a recurrent layer for a single instance**
# 
# $
# \mathbf{y}_{(t)} = \phi\left({\mathbf{W}_x}^T{\mathbf{x}_{(t)}} + {{\mathbf{W}_y}^T\mathbf{y}_{(t-1)}} + \mathbf{b} \right)
# $
# 
# 
# **Equation 14-2: Outputs of a layer of recurrent neurons for all instances in a mini-batch**
# 
# $
# \begin{split}
# \mathbf{Y}_{(t)} & = \phi\left(\mathbf{X}_{(t)}  \mathbf{W}_{x} + \mathbf{Y}_{(t-1)} \mathbf{W}_{y} + \mathbf{b} \right) \\
# & = \phi\left(
# \left[\mathbf{X}_{(t)} \quad \mathbf{Y}_{(t-1)} \right]
#  \mathbf{W} + \mathbf{b} \right) \quad \text{ where } \mathbf{W}=
# \left[ \begin{matrix}
#   \mathbf{W}_x\\
#   \mathbf{W}_y
# \end{matrix} \right]
# \end{split}
# $
# 
# **In the text (page 494):**
# 
# The output sequence is then evaluated using a cost function $ C(\mathbf{Y}_{(t_\text{min})}, \mathbf{Y}_{(t_\text{min}+1)}, \dots, \mathbf{Y}_{(t_\text{max})}) $ (where $t_\text{min}$ and $t_\text{max}$ are the first and last output time steps, not counting the ignored outputs).
# 
# **Equation 14-3: LSTM computations**
# 
# $
# \begin{split}
# \mathbf{i}_{(t)}&=\sigma({\mathbf{W}_{xi}}^T \mathbf{x}_{(t)} + {\mathbf{W}_{hi}}^T \mathbf{h}_{(t-1)} + \mathbf{b}_i)\\
# \mathbf{f}_{(t)}&=\sigma({\mathbf{W}_{xf}}^T \mathbf{x}_{(t)} + {\mathbf{W}_{hf}}^T \mathbf{h}_{(t-1)} + \mathbf{b}_f)\\
# \mathbf{o}_{(t)}&=\sigma({\mathbf{W}_{xo}}^T \mathbf{x}_{(t)} + {\mathbf{W}_{ho}}^T \mathbf{h}_{(t-1)} + \mathbf{b}_o)\\
# \mathbf{g}_{(t)}&=\operatorname{tanh}({\mathbf{W}_{xg}}^T \mathbf{x}_{(t)} + {\mathbf{W}_{hg}}^T \mathbf{h}_{(t-1)} + \mathbf{b}_g)\\
# \mathbf{c}_{(t)}&=\mathbf{f}_{(t)} \otimes \mathbf{c}_{(t-1)} \, + \, \mathbf{i}_{(t)} \otimes \mathbf{g}_{(t)}\\
# \mathbf{y}_{(t)}&=\mathbf{h}_{(t)} = \mathbf{o}_{(t)} \otimes \operatorname{tanh}(\mathbf{c}_{(t)})
# \end{split}
# $
# 
# 
# **Equation 14-4: GRU computations**
# 
# $
# \begin{split}
# \mathbf{z}_{(t)}&=\sigma({\mathbf{W}_{xz}}^T \mathbf{x}_{(t)} + {\mathbf{W}_{hz}}^T \mathbf{h}_{(t-1)}) \\
# \mathbf{r}_{(t)}&=\sigma({\mathbf{W}_{xr}}^T \mathbf{x}_{(t)} + {\mathbf{W}_{hr}}^T \mathbf{h}_{(t-1)}) \\
# \mathbf{g}_{(t)}&=\operatorname{tanh}\left({\mathbf{W}_{xg}}^T \mathbf{x}_{(t)} + {\mathbf{W}_{hg}}^T (\mathbf{r}_{(t)} \otimes \mathbf{h}_{(t-1)})\right) \\
# \mathbf{h}_{(t)}&=(1-\mathbf{z}_{(t)}) \otimes \mathbf{h}_{(t-1)} + \mathbf{z}_{(t)} \otimes \mathbf{g}_{(t)}
# \end{split}
# $

# # Chapter 15
# 
# **Equation 15-1: Kullback–Leibler divergence**
# 
# $
# D_{\mathrm{KL}}(P\|Q) = \sum\limits_{i} P(i) \log \dfrac{P(i)}{Q(i)}
# $
# 
# 
# **Equation 15-2: KL divergence between the target sparsity $p$ and the actual sparsity $q$**
# 
# $
# D_{\mathrm{KL}}(p\|q) = p \, \log \dfrac{p}{q} + (1-p) \log \dfrac{1-p}{1-q}
# $
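# 
# Equation 15-2 can be evaluated directly, as in this sketch. The target sparsity of 0.1 and the candidate actual sparsities are hypothetical; both arguments must lie strictly between 0 and 1.

```python
import math

def kl_divergence(p, q):
    """KL divergence between target sparsity p and actual sparsity q (Equation 15-2)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

target = 0.1  # target sparsity p
for actual in (0.1, 0.3, 0.6):  # hypothetical actual sparsities q
    print(actual, kl_divergence(target, actual))
```

# The divergence is zero when the two sparsities match and grows as the actual sparsity moves away from the target.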
# 
# **In the text (page 544):**
# 
# A common variant is to train the encoder to output $\gamma = \log\left(\sigma^2\right)$ rather than $\sigma$. $\sigma$ can then be computed easily as $ \sigma = \exp\left(\dfrac{\gamma}{2}\right) $.

# # Chapter 16
# 
# **Equation 16-1: Bellman Optimality Equation**
# 
# $
# V^*(s) = \underset{a}{\max}\sum\limits_{s'}{T(s, a, s') [R(s, a, s') + \gamma \cdot V^*(s')]} \quad \text{for all }s
# $
# 
# **Equation 16-2: Value Iteration algorithm**
# 
# $
#   V_{k+1}(s) \gets \underset{a}{\max}\sum\limits_{s'}{T(s, a, s') [R(s, a, s') + \gamma \cdot V_k(s')]} \quad \text{for all }s
# $
# 
# **Equation 16-3: Q-Value Iteration algorithm**
# 
# $
#   Q_{k+1}(s, a) \gets \sum\limits_{s'}{T(s, a, s') [R(s, a, s') + \gamma \cdot \underset{a'}{\max}\,{Q_k(s',a')}]} \quad \text{for all } (s,a)
# $
# 
# **In the text (page 574):**
# 
# Once you have the optimal Q-Values, defining the optimal policy $\pi^{*}(s)$ is simple: when the agent is in state $s$, it should choose the action with the highest Q-Value.
# 
# $ \pi^{*}(s) = \underset{a}{\operatorname{argmax}} \, Q^*(s, a) $
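# 
# Q-Value Iteration (Equation 16-3) and the greedy policy above can be sketched on a tiny MDP. Everything about this MDP (two states, two actions, deterministic transitions, a single reward of +1, $\gamma = 0.5$) is a made-up example, and the nested-list layout `T[s][a][s2]` is an assumption of this sketch.

```python
def q_value_iteration(T, R, gamma, n_iterations=100):
    """Q-Value Iteration for a small MDP.

    T[s][a][s2] is a transition probability and R[s][a][s2] a reward.
    """
    n_states, n_actions = len(T), len(T[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_iterations):
        # The right-hand side only reads the previous Q (synchronous update).
        Q = [[sum(T[s][a][s2] * (R[s][a][s2] + gamma * max(Q[s2]))
                  for s2 in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
    return Q

def greedy_policy(Q):
    """pi*(s) = argmax_a Q*(s, a), with actions indexed from 0."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in Q]

# Hypothetical MDP: action 0 stays put, action 1 moves to the other state.
# Only moving from state 0 to state 1 is rewarded (+1).
T = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[[0.0, 0.0], [0.0, 1.0]],
     [[0.0, 0.0], [0.0, 0.0]]]
Q = q_value_iteration(T, R, gamma=0.5)
print(Q, greedy_policy(Q))
```

# In this example the greedy policy moves in both states, so that the agent keeps returning to state 0 to collect the reward.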
# 
# **Equation 16-4: TD Learning algorithm**
# 
# $
# V_{k+1}(s) \gets (1-\alpha)V_k(s) + \alpha\left(r + \gamma \cdot V_k(s')\right)
# $
# 
# **Equation 16-5: Q-Learning algorithm**
# 
# $
# Q_{k+1}(s, a) \gets (1-\alpha)Q_k(s,a) + \alpha\left(r + \gamma \cdot \underset{a'}{\max} \, Q_k(s', a')\right)
# $
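# 
# A single Q-Learning update (Equation 16-5) for one observed transition, as a sketch. The Q-table layout `Q[s][a]`, the default hyperparameters and the sample transition are assumptions.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Update Q[s][a] in place for one transition (s, a, r, s_next)."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] = (1 - alpha) * Q[s][a] + alpha * target
    return Q[s][a]

# Hypothetical Q-table with 2 states and 2 actions.
Q = [[0.0, 0.0], [1.0, 2.0]]
print(q_learning_update(Q, s=0, a=1, r=1.0, s_next=1))
```

# Only the entry for the visited state-action pair changes; the rest of the table is left untouched.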
# 
# **Equation 16-6: Q-Learning using an exploration function**
# 
# $
# Q(s, a) \gets (1-\alpha)Q(s,a) + \alpha\left(r + \gamma \, \underset{a'}{\max}f(Q(s', a'), N(s', a'))\right)
# $
# 
# **Equation 16-7: Target Q-Value**
# 
# $
# \begin{split}
# y(s, a) = r + \gamma\,\max_{a'}\,Q_\boldsymbol\theta(s', a')
# \end{split}
# $

# # Appendix A
# 
# In the text:
# 
# $
# \mathbf{H} =
# \begin{pmatrix}
# \mathbf{H'} & 0 & \cdots\\
# 0 & 0 & \\
# \vdots & & \ddots
# \end{pmatrix}
# $
# 
# 
# $
# \mathbf{A} =
# \begin{pmatrix}
# \mathbf{A'} & \mathbf{I}_m \\
# \mathbf{0} & -\mathbf{I}_m
# \end{pmatrix}
# $
# 
# 
# $ 1 - \left(\frac{1}{5}\right)^2 - \left(\frac{4}{5}\right)^2 $
# 
# 
# $ 1 - \left(\frac{1}{2}\right)^2 - \left(\frac{1}{2}\right)^2 $
# 
# 
# $ \frac{2}{5} \times $
# 
# 
# $ \frac{3}{5} \times 0 $

# # Appendix C

# In the text:
# 
# $ (\hat{x}, \hat{y}) $
# 
# 
# $ \hat{\alpha} $
# 
# 
# $ (\hat{x}, \hat{y}, \hat{\alpha}) $
# 
# 
# $
# \begin{cases}
# \frac{\partial}{\partial x}g(x, y, \alpha) = 2x - 3\alpha\\
# \frac{\partial}{\partial y}g(x, y, \alpha) = 2 - 2\alpha\\
# \frac{\partial}{\partial \alpha}g(x, y, \alpha) = -3x - 2y - 1\\
# \end{cases}
# $
# 
# 
# $ 2\hat{x} - 3\hat{\alpha} = 2 - 2\hat{\alpha} = -3\hat{x} - 2\hat{y} - 1 = 0 $
# 
# 
# $ \hat{x} = \frac{3}{2} $
# 
# 
# $ \hat{y} = -\frac{11}{4} $
# 
# 
# $ \hat{\alpha} = 1 $
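# 
# The stationary point above can be verified by substituting it into the three partial derivatives. This sketch uses exact fractions; the function name `partials` is an assumption, and only the derivatives listed above are used (the function $g$ itself is not needed).

```python
from fractions import Fraction

def partials(x, y, alpha):
    """The three partial derivatives of g(x, y, alpha) listed above."""
    return (2 * x - 3 * alpha,      # dg/dx
            2 - 2 * alpha,          # dg/dy
            -3 * x - 2 * y - 1)     # dg/dalpha

x_hat, y_hat, alpha_hat = Fraction(3, 2), Fraction(-11, 4), Fraction(1)
print(partials(x_hat, y_hat, alpha_hat))
```

# All three derivatives vanish at $(\hat{x}, \hat{y}, \hat{\alpha})$, as required for a stationary point.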
# 
# 
# **Equation C-1: Generalized Lagrangian for the hard margin problem**
# 
# $
# \begin{split}
# \mathcal{L}(\mathbf{w}, b, \mathbf{\alpha}) = \frac{1}{2}\mathbf{w}^T \mathbf{w} - \sum\limits_{i=1}^{m}{\alpha^{(i)} \left(t^{(i)}(\mathbf{w}^T \mathbf{x}^{(i)} + b) - 1\right)} \\
# \text{with } \alpha^{(i)} \ge 0 \quad \text{for } i = 1, 2, \dots, m
# \end{split}
# $
# 
# **In the text:**
# 
# $ (\hat{\mathbf{w}}, \hat{b}, \hat{\mathbf{\alpha}}) $
# 
# 
# $ t^{(i)}((\hat{\mathbf{w}})^T \mathbf{x}^{(i)} + \hat{b}) \ge 1 \quad \text{for } i = 1, 2, \dots, m $
# 
# 
# $ {\hat{\alpha}}^{(i)} \ge 0 \quad \text{for } i = 1, 2, \dots, m $
# 
# 
# $ {\hat{\alpha}}^{(i)} = 0 $
# 
# 
# $ t^{(i)}((\hat{\mathbf{w}})^T \mathbf{x}^{(i)} + \hat{b}) = 1 $
# 
# 
# $ {\hat{\alpha}}^{(i)} = 0 $
# 
# 
# **Equation C-2: Partial derivatives of the generalized Lagrangian**
# 
# $
# \begin{split}
# \nabla_{\mathbf{w}}\mathcal{L}(\mathbf{w}, b, \mathbf{\alpha}) = \mathbf{w} - \sum\limits_{i=1}^{m}\alpha^{(i)}t^{(i)}\mathbf{x}^{(i)}\\
# \dfrac{\partial}{\partial b}\mathcal{L}(\mathbf{w}, b, \mathbf{\alpha}) = -\sum\limits_{i=1}^{m}\alpha^{(i)}t^{(i)}
# \end{split}
# $
# 
# 
# **Equation C-3: Properties of the stationary points**
# 
# $
# \begin{split}
# \hat{\mathbf{w}} = \sum_{i=1}^{m}{\hat{\alpha}}^{(i)}t^{(i)}\mathbf{x}^{(i)}\\
# \sum_{i=1}^{m}{\hat{\alpha}}^{(i)}t^{(i)} = 0
# \end{split}
# $
# 
# 
# **Equation C-4: Dual form of the SVM problem**
# 
# $
# \begin{split}
# \mathcal{L}(\hat{\mathbf{w}}, \hat{b}, \mathbf{\alpha}) = \dfrac{1}{2}\sum\limits_{i=1}^{m}{
#   \sum\limits_{j=1}^{m}{
#   \alpha^{(i)} \alpha^{(j)} t^{(i)} t^{(j)} {\mathbf{x}^{(i)}}^T \mathbf{x}^{(j)}
#   }
# } \quad - \quad \sum\limits_{i=1}^{m}{\alpha^{(i)}}\\
# \text{with } \alpha^{(i)} \ge 0 \quad \text{for } i = 1, 2, \dots, m
# \end{split}
# $
# 
# **In the text:**
# 
# $ \hat{\mathbf{\alpha}} $
# 
# 
# $ {\hat{\alpha}}^{(i)} \ge 0 $
# 
# 
# $ \hat{\mathbf{\alpha}} $
# 
# 
# $ \hat{\mathbf{w}} $
# 
# 
# $ \hat{b} $
# 
# 
# $ \hat{b} = t^{(k)} - ({\hat{\mathbf{w}}}^T \mathbf{x}^{(k)}) $
# 
# 
# **Equation C-5: Bias term estimation using the dual form**
# 
# $
# \hat{b} = \dfrac{1}{n_s}\sum\limits_{\scriptstyle i=1 \atop {\scriptstyle {\hat{\alpha}}^{(i)} > 0}}^{m}{\left[t^{(i)} - {\hat{\mathbf{w}}}^T \mathbf{x}^{(i)}\right]}
# $

# # Appendix D

# **Equation D-1: Partial derivatives of $f(x,y)$**
# 
# $
# \begin{split}
# \dfrac{\partial f}{\partial x} & = \dfrac{\partial(x^2y)}{\partial x} + \dfrac{\partial y}{\partial x} + \dfrac{\partial 2}{\partial x} = y \dfrac{\partial(x^2)}{\partial x} + 0 + 0 = 2xy \\
# \dfrac{\partial f}{\partial y} & = \dfrac{\partial(x^2y)}{\partial y} + \dfrac{\partial y}{\partial y} + \dfrac{\partial 2}{\partial y} = x^2 + 1 + 0 = x^2 + 1 \\
# \end{split}
# $
# 
# **In the text:**
# 
# $ \frac{\partial g}{\partial x} = 0 + (0 \times x + y \times 1) = y $
# 
# 
# $ \frac{\partial x}{\partial x} = 1 $
# 
# 
# $ \frac{\partial y}{\partial x} = 0 $
# 
# 
# $ \frac{\partial (u \times v)}{\partial x} = \frac{\partial v}{\partial x} \times u + \frac{\partial u}{\partial x} \times v  $
# 
# 
# $ \frac{\partial g}{\partial x} = 0 + (0 \times x + y \times 1)  $
# 
# 
# $ \frac{\partial g}{\partial x} = y $
# 
# 
# **Equation D-2: Derivative of a function $h(x)$ at point $x_0$**
# 
# $
# \begin{split}
# h'(x_0) & = \underset{\textstyle x \to x_0}{\lim}\dfrac{h(x) - h(x_0)}{x - x_0}\\
#       & = \underset{\textstyle \epsilon \to 0}{\lim}\dfrac{h(x_0 + \epsilon) - h(x_0)}{\epsilon}
# \end{split}
# $
# 
# 
# **Equation D-3: A few operations with dual numbers**
# 
# $
# \begin{split}
# &\lambda(a + b\epsilon) = \lambda a + \lambda b \epsilon\\
# &(a + b\epsilon) + (c + d\epsilon) = (a + c) + (b + d)\epsilon \\
# &(a + b\epsilon) \times (c + d\epsilon) = ac + (ad + bc)\epsilon + (bd)\epsilon^2 = ac + (ad + bc)\epsilon\\
# \end{split}
# $
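# 
# The operations of Equation D-3 are enough for a small forward-mode autodiff sketch. The `Dual` class is a hypothetical helper written for this example, and the function $f(x, y) = x^2 y + y + 2$ is assumed from the terms differentiated in Equation D-1; the derivatives are evaluated at the point $(3, 4)$.

```python
class Dual:
    """Dual number a + b*epsilon, with epsilon**2 = 0 (Equation D-3)."""

    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps, since eps**2 = 0.
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    __rmul__ = __mul__

def f(x, y):
    # Assumed from Equation D-1: f(x, y) = x^2 y + y + 2.
    return x * x * y + y + 2

df_dx = f(Dual(3.0, 1.0), Dual(4.0)).b  # epsilon attached to x
df_dy = f(Dual(3.0), Dual(4.0, 1.0)).b  # epsilon attached to y
print(df_dx, df_dy)
```

# The epsilon coefficient of the result is the partial derivative with respect to whichever input carried the epsilon, matching $2xy$ and $x^2 + 1$ from Equation D-1.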
# 
# **In the text:**
# 
# $ \frac{\partial f}{\partial x}(3, 4) $
# 
# 
# $ \frac{\partial f}{\partial y}(3, 4) $
# 
# 
# **Equation D-4: Chain rule**
# 
# $
# \dfrac{\partial f}{\partial x} = \dfrac{\partial f}{\partial n_i} \times \dfrac{\partial n_i}{\partial x}
# $
# 
# **In the text:**
# 
# $ \frac{\partial f}{\partial n_7} = 1 $
# 
# 
# $ \frac{\partial f}{\partial n_5} = \frac{\partial f}{\partial n_7} \times \frac{\partial n_7}{\partial n_5} $
# 
# 
# $ \frac{\partial f}{\partial n_7} = 1 $
# 
# 
# $ \frac{\partial n_7}{\partial n_5} $
# 
# 
# $ \frac{\partial n_7}{\partial n_5} = 1 $
# 
# 
# $ \frac{\partial f}{\partial n_5} = 1 \times 1 = 1 $
# 
# 
# $ \frac{\partial f}{\partial n_4} = \frac{\partial f}{\partial n_5} \times \frac{\partial n_5}{\partial n_4} $
# 
# 
# $ \frac{\partial n_5}{\partial n_4} = n_2 $
# 
# 
# $ \frac{\partial f}{\partial n_4} = 1 \times n_2 = 4 $
# 
# 
# $ \frac{\partial f}{\partial x} = 24 $
# 
# 
# $ \frac{\partial f}{\partial y} = 10 $

# # Appendix E

# **Equation E-1: Probability that the $i^\text{th}$ neuron will output 1**
# 
# $
# p\left(s_i^{(\text{next step})} = 1\right) \, = \, \sigma\left(\frac{\textstyle \sum\limits_{j = 1}^N{w_{i,j}s_j + b_i}}{\textstyle T}\right)
# $
# 
# **In the text:**
# 
# $ \mathbf{x}' $
# 
# 
# $ \mathbf{h}' $
# 
# 
# **Equation E-2: Contrastive divergence (CD) weight update**
# 
# $
# w_{i,j} \gets w_{i,j} + \eta(\mathbf{x} \cdot \mathbf{h}^T - \mathbf{x}' \cdot \mathbf{h}'^T)
# $

# # Glossary
# 
# In the text:
# 
# $\ell _1$
# 
# 
# $\ell _2$
# 
# 
# $\ell _k$
# 
# 
# $ \chi^2 $
# 

# If all these equations make your eyes hurt, let's finish with the single most beautiful equation of all. No, it is not $E = mc^2$; it is Euler's identity:

# $e^{i\pi}+1=0$