Distributions

Discrete Distribution
Continuous Distribution

1. Discrete Distribution

1.1. Bernoulli Distribution

Simulating a coin toss in Python where the probability of heads ($p$) is 0.6

In [1]:

from scipy import stats

p = 0.6
bernoulli_dist = stats.bernoulli(p)

In [2]:

p_tail = bernoulli_dist.pmf(0)
p_head = bernoulli_dist.pmf(1)

In [3]:

print('Prob of tail:', p_tail)
print('Prob of head:', p_head)

Prob of tail: 0.4
Prob of head: 0.6

In [4]:

trials = bernoulli_dist.rvs(10)  # 10 samples drawn from the Bernoulli distribution
trials

Out[4]:

array([1, 1, 0, 1, 1, 1, 1, 0, 1, 1])
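As a quick sanity check (a sketch, not part of the original notebook), the fraction of heads across many simulated tosses should approach $p$, and the distribution's mean and variance should be $p$ and $p(1-p)$. The sample size of 10,000 and the seed `random_state=0` are arbitrary choices made for reproducibility.

```python
import numpy as np
from scipy import stats

p = 0.6
bernoulli_dist = stats.bernoulli(p)

# Draw 10,000 tosses; the fixed seed makes the result reproducible.
samples = bernoulli_dist.rvs(size=10_000, random_state=0)

# The mean of 0/1 outcomes is the observed fraction of heads.
head_ratio = samples.mean()
print('Observed fraction of heads:', head_ratio)

# Theoretical mean p and variance p * (1 - p).
print('Mean:', bernoulli_dist.mean(), 'Variance:', bernoulli_dist.var())
```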

1.2. Binomial Distribution

Simulating $n$ rolls of a die in Python, where the probability ($p$) of rolling a 3 is $\frac{1}{6}$

In [5]:

n = 10  # roll the die 10 times
p = 1/6  # probability of rolling a 3

binom_dist = stats.binom(n, p)

In [6]:

import numpy as np

# Probability that a 3 comes up exactly 0, 1, 2, ..., 9 times (k = 0 to 9)
trials = binom_dist.pmf(np.arange(10))
trials

Out[6]:

array([1.61505583e-01, 3.23011166e-01, 2.90710049e-01, 1.55045360e-01,
       5.42658759e-02, 1.30238102e-02, 2.17063503e-03, 2.48072575e-04,
       1.86054431e-05, 8.26908584e-07])

In [7]:

# Round to 5 decimal places for readability
[round(x, 5) for x in trials]

Out[7]:

[0.16151,
 0.32301,
 0.29071,
 0.15505,
 0.05427,
 0.01302,
 0.00217,
 0.00025,
 2e-05,
 0.0]
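The values above follow the binomial PMF $P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$. The minimal sketch below evaluates this formula directly with `math.comb` and compares it with `stats.binom(n, p).pmf`; the helper name `binom_pmf` is hypothetical, introduced only for illustration.

```python
import math
import numpy as np
from scipy import stats

def binom_pmf(k, n, p):
    """Hypothetical helper: binomial PMF from the closed-form formula."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 1/6
manual = np.array([binom_pmf(k, n, p) for k in range(n + 1)])
library = stats.binom(n, p).pmf(np.arange(n + 1))

# Both approaches should agree up to floating-point rounding.
print(np.allclose(manual, library))
# Summed over every outcome k = 0..n, the probabilities add up to 1.
print(manual.sum())
```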

By the central limit theorem, the binomial distribution increasingly resembles a normal distribution as $n$ grows

In [8]:

import matplotlib.pyplot as plt
import seaborn as sns
sns.set(font_scale=1.5)

In [9]:

binom_dist1 = stats.binom(20, 0.5)  # n = 20, p = 0.5
binom_dist2 = stats.binom(20, 0.7)  # n = 20, p = 0.7
binom_dist3 = stats.binom(40, 0.5)  # n = 40, p = 0.5

k = np.arange(41)  # cover every possible outcome 0..40 of the n = 40 case
plt.plot(k, binom_dist1.pmf(k), 'o-b')
plt.plot(k, binom_dist2.pmf(k), 'd-r')
plt.plot(k, binom_dist3.pmf(k), 's-g')
plt.title('Binomial distribution')
plt.legend(['p=0.5 and n=20', 'p=0.7 and n=20', 'p=0.5 and n=40'])
plt.xlabel('X')
plt.ylabel('P(X)')

[Figure: Binomial distribution PMF for p=0.5 and n=20, p=0.7 and n=20, p=0.5 and n=40 (x-axis X, y-axis P(X))]
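To make the normal-approximation claim concrete, here is a sketch that compares the binomial PMF with a normal PDF of the same mean $np$ and standard deviation $\sqrt{np(1-p)}$ (the de Moivre-Laplace approximation, a special case of the central limit theorem). The largest pointwise gap should shrink as $n$ grows; the values of $n$ tested and the helper name `max_normal_gap` are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

def max_normal_gap(n, p):
    """Largest |Binomial(n, p) PMF - matching normal PDF| over k = 0..n."""
    k = np.arange(n + 1)
    mean = n * p
    std = np.sqrt(n * p * (1 - p))
    binom_pmf = stats.binom(n, p).pmf(k)
    normal_pdf = stats.norm(mean, std).pdf(k)
    return np.max(np.abs(binom_pmf - normal_pdf))

gaps = {n: max_normal_gap(n, 0.5) for n in (10, 20, 40, 80)}
for n, gap in gaps.items():
    print(f'n={n:3d}  max |binomial - normal| = {gap:.5f}')
```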

1.3. Poisson Distribution

In [10]:

poisson_dist1 = stats.poisson(5)
poisson_dist2 = stats.poisson(10)
poisson_dist3 = stats.poisson(25)
n = np.arange(50)

plt.plot(n, poisson_dist1.pmf(n), 'o-b')
plt.plot(n, poisson_dist2.pmf(n), 'd-r')
plt.plot(n, poisson_dist3.pmf(n), 's-g')
plt.title('Poisson distribution - PMF')
plt.legend([r'$\lambda$=5', r'$\lambda$=10', r'$\lambda$=25'])  # must match the lambdas above
plt.xlabel('X')
plt.ylabel('P(X)')

[Figure: Poisson distribution - PMF for $\lambda$ = 5, 10, 25 (x-axis X, y-axis P(X))]
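The Poisson PMF is $P(X=k)=\frac{\lambda^k e^{-\lambda}}{k!}$, and both its mean and its variance equal $\lambda$. The short sketch below checks both facts against `stats.poisson`; the value $\lambda = 5$ mirrors the first curve above and the range of $k$ is an arbitrary choice.

```python
import math
import numpy as np
from scipy import stats

lam = 5
poisson_dist = stats.poisson(lam)

# PMF from the closed-form formula for k = 0..19.
k = np.arange(20)
manual = np.array([lam**i * math.exp(-lam) / math.factorial(i) for i in range(20)])
print(np.allclose(manual, poisson_dist.pmf(k)))

# Mean and variance are both equal to lambda.
mean, var = poisson_dist.stats(moments='mv')
print(float(mean), float(var))
```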

2. Continuous Distribution

2.1. Normal Distribution

In [11]:

x = np.arange(-10, 10, 0.1)

mu = -2
sigma = 0.7
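norm_dist1 = stats.norm(mu, sigma)  # normal distribution with mean mu, standard deviation sigma
norm_dist2 = stats.norm()  # standard normal distribution (mean 0, standard deviation 1)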
y1 = norm_dist1.pdf(x)
y2 = norm_dist2.pdf(x)

In [12]:

plt.plot(x, y1)
plt.plot(x, y2)
plt.title('Normal Distribution - PDF')
plt.legend([r'$\mu$=-2, $\sigma$=0.7', r'$\mu$=0, $\sigma$=1'])  # raw strings keep the backslashes intact
plt.xlabel('X')
plt.ylabel('pdf(X)')

[Figure: Normal Distribution - PDF for $\mu$=-2, $\sigma$=0.7 and $\mu$=0, $\sigma$=1 (x-axis X, y-axis pdf(X))]
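`stats.norm(mu, sigma)` takes the mean as `loc` and the standard deviation as `scale`. The sketch below shows that any normal PDF is a shifted and rescaled standard normal PDF, $f(x)=\frac{1}{\sigma}\varphi\left(\frac{x-\mu}{\sigma}\right)$, and that about 68% of the probability lies within one standard deviation of the mean.

```python
import numpy as np
from scipy import stats

mu, sigma = -2, 0.7
x = np.arange(-10, 10, 0.1)

norm_dist1 = stats.norm(loc=mu, scale=sigma)
std_normal = stats.norm()  # mu = 0, sigma = 1

# Standardize x, then rescale the standard normal density by 1 / sigma.
z = (x - mu) / sigma
print(np.allclose(norm_dist1.pdf(x), std_normal.pdf(z) / sigma))

# Probability mass within one standard deviation of the mean.
within_1sd = norm_dist1.cdf(mu + sigma) - norm_dist1.cdf(mu - sigma)
print(round(within_1sd, 4))
```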

2.2. Exponential Distribution

In [13]:

x = np.arange(0, 3, 0.1)

exp_dist1 = stats.expon(scale=0.5)
exp_dist2 = stats.expon(scale=1)
exp_dist3 = stats.expon(scale=1.5)
y1 = exp_dist1.pdf(x)
y2 = exp_dist2.pdf(x)
y3 = exp_dist3.pdf(x)

In [14]:

plt.plot(x, y1)
plt.plot(x, y2)
plt.plot(x, y3)
plt.title('Exponential Distribution - PDF')
plt.legend([r'scale=0.5 ($\lambda$=2)', r'scale=1.0 ($\lambda$=1)', r'scale=1.5 ($\lambda$=2/3)'])  # scale = 1 / lambda
plt.xlabel('X')
plt.ylabel('pdf(X)')

[Figure: Exponential Distribution - PDF for scale = 0.5, 1.0, 1.5 (x-axis X, y-axis pdf(X))]
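In SciPy's `stats.expon`, the `scale` argument is $1/\lambda$, so the PDF is $f(x)=\lambda e^{-\lambda x}$ with $\lambda = 1/\text{scale}$, and the mean equals `scale`. The sketch below checks this for the three scales plotted above.

```python
import numpy as np
from scipy import stats

x = np.arange(0, 3, 0.1)

results = {}
for scale in (0.5, 1.0, 1.5):
    lam = 1 / scale  # rate parameter lambda
    dist = stats.expon(scale=scale)
    manual_pdf = lam * np.exp(-lam * x)  # f(x) = lambda * exp(-lambda * x)
    results[scale] = (np.allclose(dist.pdf(x), manual_pdf), dist.mean())
    print(f'scale={scale}, lambda={lam:.3f}, '
          f'pdf matches: {results[scale][0]}, mean={dist.mean()}')
```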

2.3. Lognormal Distribution

Probability density function of the lognormal distribution, for $x > 0$: $$f(x,s)=\frac{1}{sx\sqrt{2\pi}}\exp\left(-\frac{\log^2(x)}{2s^2}\right)$$ where $s$ is the shape parameter.
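The formula can be evaluated directly and compared with `stats.lognorm(s).pdf`. The sketch below also checks the defining property that $\log X$ is normally distributed with mean 0 and standard deviation $s$ when $X$ follows this lognormal distribution. The helper `lognorm_pdf` is hypothetical, written only for illustration.

```python
import numpy as np
from scipy import stats

def lognorm_pdf(x, s):
    """Hypothetical helper: lognormal PDF from the formula above (x > 0)."""
    return np.exp(-np.log(x)**2 / (2 * s**2)) / (s * x * np.sqrt(2 * np.pi))

x = np.arange(0.1, 3, 0.1)  # start above 0 because log(0) is undefined
s = 1.0

print(np.allclose(lognorm_pdf(x, s), stats.lognorm(s).pdf(x)))

# P(X <= x) equals P(log X <= log x) under the normal distribution N(0, s).
print(np.allclose(stats.lognorm(s).cdf(x), stats.norm(0, s).cdf(np.log(x))))
```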

In [15]:

x = np.arange(0, 3, 0.1)

lognorm_dist1 = stats.lognorm(1)
lognorm_dist2 = stats.lognorm(2)
lognorm_dist3 = stats.lognorm(3)
y1 = lognorm_dist1.pdf(x)
y2 = lognorm_dist2.pdf(x)
y3 = lognorm_dist3.pdf(x)

In [16]:

plt.plot(x, y1)
plt.plot(x, y2)
plt.plot(x, y3)
plt.title('Lognormal Distribution - PDF')
plt.legend(['$s$=1', '$s$=2', '$s$=3'])
plt.xlabel('X')
plt.ylabel('pdf(X)')

[Figure: Lognormal Distribution - PDF for $s$ = 1, 2, 3 (x-axis X, y-axis pdf(X))]

2.4. Beta Distribution

In [17]:

import matplotlib as mpl

mpl.rcParams.update({'font.size': 22})

x = np.linspace(0, 1, 1000)  # the Beta distribution only takes values between 0 and 1

plt.figure(figsize=(10, 10))
plt.subplot(221)
plt.fill(x, stats.beta(1.0001, 1.0001).pdf(x))
plt.ylim(0, 6)
plt.title('a = 1, b = 1')

plt.subplot(222)
plt.fill(x, stats.beta(4, 2).pdf(x))
plt.ylim(0, 6)
plt.title('a = 4, b = 2, mode={}'.format((4-1)/(4+2-2)))

plt.subplot(223)
plt.fill(x, stats.beta(8, 4).pdf(x))
plt.ylim(0, 6)
plt.title('a = 8, b = 4, mode={}'.format((8-1)/(8+4-2)))

plt.subplot(224)
plt.fill(x, stats.beta(30, 12).pdf(x))
plt.ylim(0, 6)
plt.title('a = 30, b = 12, mode={}'.format((30-1)/(30+12-2)))

[Figure: four Beta PDF panels: a = 1, b = 1; a = 4, b = 2, mode=0.75; a = 8, b = 4, mode=0.7; a = 30, b = 12, mode=0.725]

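Reading each panel as a belief about the parameter $p$:
Panel 1: $p$ cannot be estimated, since every value between 0 and 1 is equally likely.
Panel 2: $p = 0.75$ is the most likely value.
Panel 3: $p = 0.70$ is the most likely value.
Panel 4: $p = 0.725$ is the most likely value.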
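For $a, b > 1$ the mode of a Beta$(a, b)$ distribution is $\frac{a-1}{a+b-2}$, the formula used in the panel titles. The sketch below checks it against the location of the PDF's maximum on a fine grid (the grid resolution is an arbitrary choice) and also prints the standard deviation, which shrinks from panel 2 to panel 4 as $a + b$ grows.

```python
import numpy as np
from scipy import stats

x = np.linspace(0, 1, 100_001)  # grid spacing of 1e-5

results = []
for a, b in [(4, 2), (8, 4), (30, 12)]:
    dist = stats.beta(a, b)
    mode_formula = (a - 1) / (a + b - 2)
    mode_grid = x[np.argmax(dist.pdf(x))]  # x where the PDF peaks on the grid
    results.append((mode_formula, mode_grid, dist.std()))
    print(f'a={a}, b={b}: mode formula={mode_formula:.4f}, '
          f'grid argmax={mode_grid:.4f}, std={dist.std():.4f}')
```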

2.5. Gamma Distribution

In [18]:

x = np.linspace(0, 10, 100)  # the Gamma distribution takes values from 0 to infinity; only [0, 10] is plotted

plt.figure(figsize=(10, 10))
plt.subplot(221)
plt.plot(x, stats.gamma(9).pdf(x))
plt.ylim(0, 0.4)
plt.title('a = 9, b = 1, mode={}'.format((9-1)/(1)))

plt.subplot(222)
plt.plot(x, stats.gamma(6).pdf(x))
plt.ylim(0, 0.4)
plt.title('a = 6, b = 1, mode={}'.format((6-1)/(1)))

plt.subplot(223)
plt.plot(x, stats.gamma(3).pdf(x))
plt.ylim(0, 0.4)
plt.title('a = 3, b = 1, mode={}'.format((3-1)/(1)))

plt.subplot(224)
plt.plot(x, stats.gamma(2).pdf(x))
plt.ylim(0, 0.4)
plt.title('a = 2, b = 1, mode={}'.format((2-1)/(1)))

[Figure: four Gamma PDF panels: a = 9, b = 1, mode=8.0; a = 6, b = 1, mode=5.0; a = 3, b = 1, mode=2.0; a = 2, b = 1, mode=1.0]

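Panel 1: the parameter is most likely 8 (low precision: the density is widely spread).
Panel 2: the parameter is most likely 5 (low precision).
Panel 3: the parameter is most likely 2 (high precision).
Panel 4: the parameter is most likely 1 (high precision).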
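With shape $a$ and rate $b$ (SciPy's `scale` is $1/b$), the Gamma distribution has mode $(a-1)/b$ for $a \ge 1$ and variance $a/b^2$. With $b = 1$ as in the panels, the variance equals $a$, which is why the panels with larger $a$ are more spread out. A sketch checking both quantities on a fine grid (the grid range and spacing are arbitrary choices):

```python
import numpy as np
from scipy import stats

b = 1.0  # rate parameter; SciPy uses scale = 1 / b
x = np.linspace(0, 20, 200_001)  # grid spacing of 1e-4

results = {}
for a in (9, 6, 3, 2):
    dist = stats.gamma(a, scale=1 / b)
    mode_grid = x[np.argmax(dist.pdf(x))]  # location of the PDF's peak on the grid
    results[a] = (mode_grid, dist.var())
    print(f'a={a}: mode formula={(a - 1) / b}, grid argmax={mode_grid:.3f}, '
          f'variance={dist.var()}')
```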