!pip install --upgrade scipy
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns
import math
from scipy import stats
from scipy.stats import (norm, chi2, t, f, bernoulli, binom, nbinom, geom,
                         poisson, uniform, randint, expon, gamma, beta,
                         weibull_min, hypergeom, shapiro, pearsonr, normaltest,
                         anderson, spearmanr, kendalltau, chi2_contingency,
                         ttest_ind, ttest_rel, mannwhitneyu, wilcoxon, kruskal,
                         friedmanchisquare)
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.stattools import kpss
from statsmodels.stats.weightstats import ztest
from scipy.integrate import quad
from IPython.display import display, Latex
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter(action='ignore', category=FutureWarning)
Statistical Hypothesis: A statistical hypothesis is usually a statement about a set of parameters of a population distribution.
$H_0$ (Null Hypothesis): The null hypothesis is a statistical hypothesis to be tested and accepted or rejected in favor of an alternative.
$H_1$ (Alternative Hypothesis): An alternative hypothesis is an opposing theory in relation to the null hypothesis.
Type $\mathrm{I}$ Error: A type $\mathrm{I}$ error results if the test rejects $H_0$ when $H_0$ is in fact true.
$\alpha = P(reject\ H_0\ |\ H_0\ is\ true)$
Type $\mathrm{II}$ Error: A type $\mathrm{II}$ error results if the test accepts $H_0$ when $H_0$ is false.
$\beta = P(accept\ H_0\ |\ H_0\ is\ not\ true)$
Significance Level: Whenever $H_0$ is true, its probability of being rejected is never greater than $\alpha$. The value $\alpha$, called the level of significance of the test, is usually set in advance, with commonly chosen values being $\alpha = 0.1, 0.05, 0.005$.
P-value: The P-value, or calculated probability, is the probability of obtaining the observed results, or more extreme ones, when the null hypothesis ($H_0$) of a study question is true; the definition of 'extreme' depends on how the hypothesis is being tested.
If the P-value is less than the chosen significance level, you reject the null hypothesis, i.e., you accept that your sample gives reasonable evidence to support the alternative hypothesis.
| | $H_0$ is actually true | $H_0$ is not true |
|---|---|---|
| Accept $H_0$ | $1-\alpha$ | $\beta$ |
| Reject $H_0$ | $\alpha$ | $1-\beta$ |
Steps of hypothesis testing:
A. Two Tailed Test:
Suppose that $X_1, X_2, ..., X_n$ is a sample of size $n$ from a normal distribution having an unknown mean $\mu$ and a known variance $\sigma^2$.
$X_1, X_2, ..., X_n \sim N( \mu, \sigma^2)$
$H_0:\ \mu = \mu_0$
$H_1:\ \mu \neq \mu_0$
$\\ $
Testing statistics:
$Z_0 \equiv \frac{\overline{X}-\mu_0}{\frac{\sigma}{\sqrt{n}}}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$-\ Z_{\frac{\alpha}{2}}\ <\ Z_0 = \frac{\overline{X}-\mu_0}{\frac{\sigma}{\sqrt{n}}}\ <\ Z_{\frac{\alpha}{2}}$
$\mu_0\ -\ Z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\ <\ \overline{X}\ <\ \mu_0\ +\ Z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$
$|\overline{X}-\mu_0| < Z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$
P_value $ = 2 \times P(Z \geq |Z_0|) > \alpha$
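The two-tailed decision rule above can be checked numerically with scipy (a minimal sketch; the summary numbers `x_bar`, `mu0`, `sigma`, `n` are illustrative, not taken from the text):

```python
import numpy as np
from scipy.stats import norm

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)  # Z_{alpha/2}, the two-tailed cutoff

# illustrative sample summary
x_bar, mu0, sigma, n = 10.3, 10.0, 1.0, 25
z0 = (x_bar - mu0) / (sigma / np.sqrt(n))  # test statistic Z_0
p_value = 2 * (1 - norm.cdf(abs(z0)))      # two-tailed P-value
reject = abs(z0) > z_crit                  # equivalent to p_value < alpha
```

Rejecting when $|Z_0| > Z_{\frac{\alpha}{2}}$ and rejecting when the P-value falls below $\alpha$ are the same decision rule.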
B. One-sided tests:
Right-tailed test:
Suppose that $X_1, X_2, ..., X_n$ is a sample of size $n$ from a normal distribution having an unknown mean $\mu$ and a known variance $\sigma^2$.
$X_1, X_2, ..., X_n \sim N( \mu, \sigma^2)$
$H_0:\ \mu = \mu_0\ \quad or \quad H_0:\ \mu \leq \mu_0$
$H_1:\ \mu > \mu_0$
$\\ $
Testing statistics:
$Z_0 \equiv \frac{\overline{X}-\mu_0}{\frac{\sigma}{\sqrt{n}}}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$ -\infty <\ Z_0 = \frac{\overline{X}-\mu_0}{\frac{\sigma}{\sqrt{n}}}\ <\ Z_{\alpha} $
$- \infty \ <\ \overline{X}\ <\ \mu_0\ +\ Z_{\alpha} \frac{\sigma}{\sqrt{n}}$
P_value $ = P(Z \geq Z_0) > \alpha$
Left-tailed test:
Suppose that $X_1, X_2, ..., X_n$ is a sample of size $n$ from a normal distribution having an unknown mean $\mu$ and a known variance $\sigma^2$.
$X_1, X_2, ..., X_n \sim N( \mu, \sigma^2)$
$H_0:\ \mu = \mu_0 \quad or \quad H_0:\ \mu \geq \mu_0$
$H_1:\ \mu < \mu_0$
$\\ $
Testing statistics:
$Z_0 \equiv \frac{\overline{X}-\mu_0}{\frac{\sigma}{\sqrt{n}}}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$-\ Z_{\alpha}\ <\ Z_0 = \frac{\overline{X}-\mu_0}{\frac{\sigma}{\sqrt{n}}}\ <\ \infty $
$\ \mu_0\ -\ Z_{\alpha} \frac{\sigma}{\sqrt{n}}\ <\ \overline{X}\ <\ \infty$
P_value $ = P(Z \leq Z_0) > \alpha$
class mean_with_known_variance:
    """
    Parameters
    ----------
    null_mean : u0
    population_sd : known standard deviation of the population
    n : optional, number of sample members
    alpha : significance level
    type_t : 'two_tailed', 'right_tailed', 'left_tailed'
    Sample_mean : optional, mean of the sample
    data : optional, if you do not know the sample mean, just pass the data
    """
    def __init__(self, null_mean, population_sd, alpha, type_t, n=0., Sample_mean=0., data=None):
        self.Sample_mean = Sample_mean
        self.null_mean = null_mean
        self.population_sd = population_sd
        self.type_t = type_t
        self.n = n
        self.alpha = alpha
        self.data = data
        if data is not None:
            self.Sample_mean = np.mean(list(data))
            self.n = len(list(data))
        self.__test()

    def __test(self):
        # z statistic: (x_bar - mu0) / (sigma / sqrt(n))
        test_statistic = (self.Sample_mean - self.null_mean) / (self.population_sd / np.sqrt(self.n))
        print('test_statistic:', test_statistic, '\n')
        if self.type_t == 'two_tailed':
            p_value = 2 * (1 - norm.cdf(abs(test_statistic)))
        elif self.type_t == 'left_tailed':
            p_value = norm.cdf(test_statistic)
        else:  # right_tailed
            p_value = 1 - norm.cdf(test_statistic)
        print('P_value:', p_value, '\n')
        if p_value < self.alpha:
            print(f'Since p_value < {self.alpha}, reject null hypothesis.')
        else:
            print(f'Since p_value > {self.alpha}, the null hypothesis cannot be rejected.')
mean_with_known_variance(type_t='two_tailed', Sample_mean=8.75, n=16, null_mean=10, population_sd=1, alpha=0.05);
test_statistic: -5.0
P_value: 5.733031438470704e-07
Since p_value < 0.05, reject null hypothesis.
np.random.seed(1)
data = np.random.normal(0,1,100)
mean_with_known_variance(type_t='left_tailed', data=data, null_mean=0.1, population_sd=1, alpha=0.05);
test_statistic: -0.394171479243013
P_value: 0.34672722046703075
Since p_value > 0.05, the null hypothesis cannot be rejected.
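You can also run this test with statsmodels' ztest (imported above). One caveat: ztest estimates the standard deviation from the sample rather than using the known population $\sigma$, so for this example its statistic matches the class output only approximately.

```python
import numpy as np
from statsmodels.stats.weightstats import ztest

np.random.seed(1)
data = np.random.normal(0, 1, 100)

# alternative='smaller' corresponds to the left-tailed H1: mu < 0.1
test_statistic, p_value = ztest(data, value=0.1, alternative='smaller')
print(f'Test_statistic = {test_statistic}, p_value = {p_value}')
```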
A. Two Tailed Test:
Suppose that $X_1, X_2, ..., X_n$ is a sample of size $n$ from a normal distribution having an unknown mean $\mu$ and an unknown variance $\sigma^2$.
$X_1, X_2, ..., X_n \sim N( \mu, \sigma^2)$
$H_0:\ \mu = \mu_0$
$H_1:\ \mu \neq \mu_0$
$\\ $
Testing statistics:
$t_0 \equiv \frac{\overline{X}-\mu_0}{\frac{S}{\sqrt{n}}} \sim t_{n-1}$
$S = \sqrt{\frac{\sum_{i=1}^n\ (x_i\ -\ \overline{x})^2}{n-1}}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$-\ t_{\frac{\alpha}{2},n-1}\ <\ t_0 = \frac{\overline{X}-\mu_0}{\frac{S}{\sqrt{n}}} <\ t_{\frac{\alpha}{2},n-1}$
$\mu_0\ -\ t_{\frac{\alpha}{2},n-1} \frac{S}{\sqrt{n}}\ <\ \overline{X}\ <\ \mu_0\ +\ t_{\frac{\alpha}{2},n-1} \frac{S}{\sqrt{n}}$
$|\overline{X}-\mu_0| < t_{\frac{\alpha}{2},n-1} \frac{S}{\sqrt{n}}$
P_value $ = 2 \times P(t_{n-1} \geq |t_0|) > \alpha$
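The cutoff $t_{\frac{\alpha}{2},n-1}$ can be computed with scipy's `t.ppf` (a small sketch; `alpha` and `n` are illustrative values):

```python
from scipy.stats import t

alpha, n = 0.05, 16
t_crit = t.ppf(1 - alpha / 2, df=n - 1)  # t_{alpha/2, n-1}
# accept H0 while -t_crit < t0 < t_crit
print(t_crit)
```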
B. One-sided tests:
Right-tailed test:
Suppose that $X_1, X_2, ..., X_n$ is a sample of size $n$ from a normal distribution having an unknown mean $\mu$ and an unknown variance $\sigma^2$.
$X_1, X_2, ..., X_n \sim N( \mu, \sigma^2)$
$H_0:\ \mu = \mu_0\ \quad or \quad H_0:\ \mu \leq \mu_0$
$H_1:\ \mu > \mu_0$
$\\ $
Testing statistics:
$t_0 \equiv \frac{\overline{X}-\mu_0}{\frac{S}{\sqrt{n}}} \sim t_{n-1}$
$S = \sqrt{\frac{\sum_{i=1}^n\ (x_i\ -\ \overline{x})^2}{n-1}}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$ -\infty <\ t_0 = \frac{\overline{X}-\mu_0}{\frac{S}{\sqrt{n}}}\ <\ t_{\alpha, n-1} $
$- \infty \ <\ \overline{X}\ <\ \mu_0\ +\ t_{\alpha, n-1} \frac{S}{\sqrt{n}}$
P_value $ = P(t_{n-1} \geq t_0) > \alpha$
Left-tailed test:
Suppose that $X_1, X_2, ..., X_n$ is a sample of size $n$ from a normal distribution having an unknown mean $\mu$ and an unknown variance $\sigma^2$.
$X_1, X_2, ..., X_n \sim N( \mu, \sigma^2)$
$H_0:\ \mu = \mu_0 \quad or \quad H_0:\ \mu \geq \mu_0$
$H_1:\ \mu < \mu_0$
$\\ $
Testing statistics:
$t_0 \equiv \frac{\overline{X}-\mu_0}{\frac{S}{\sqrt{n}}} \sim t_{n-1}$
$S = \sqrt{\frac{\sum_{i=1}^n\ (x_i\ -\ \overline{x})^2}{n-1}}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$ -\ t_{\alpha, n-1} <\ t_0 = \frac{\overline{X}-\mu_0}{\frac{S}{\sqrt{n}}}\ <\ \infty$
$\ \mu_0\ -\ t_{\alpha, n-1} \frac{S}{\sqrt{n}} <\ \overline{X}\ <\ \infty $
P_value $ = P(t_{n-1} \leq t_0) > \alpha$
class mean_with_unknown_variance:
    """
    Parameters
    ----------
    null_mean : u0
    n : optional, number of sample members
    S : optional, sample standard deviation
    alpha : significance level
    type_t : 'two_tailed', 'right_tailed', 'left_tailed'
    Sample_mean : optional, mean of the sample
    data : optional, if you do not know the sample mean and S, just pass the data
    """
    def __init__(self, null_mean, alpha, type_t, n=0., S=0., Sample_mean=0., data=None):
        self.Sample_mean = Sample_mean
        self.S = S
        self.null_mean = null_mean
        self.type_t = type_t
        self.n = n
        self.alpha = alpha
        self.data = data
        if data is not None:
            self.Sample_mean = np.mean(list(data))
            self.S = np.std(list(data), ddof=1)  # sample standard deviation
            self.n = len(list(data))
        self.__test()

    def __test(self):
        # t statistic: (x_bar - mu0) / (S / sqrt(n)), with n-1 degrees of freedom
        test_statistic = (self.Sample_mean - self.null_mean) / (self.S / np.sqrt(self.n))
        print('test_statistic:', test_statistic, '\n')
        if self.type_t == 'two_tailed':
            p_value = 2 * (1 - t.cdf(abs(test_statistic), df=self.n - 1))
        elif self.type_t == 'left_tailed':
            p_value = t.cdf(test_statistic, df=self.n - 1)
        else:  # right_tailed
            p_value = 1 - t.cdf(test_statistic, df=self.n - 1)
        print('P_value:', p_value, '\n')
        if p_value < self.alpha:
            print(f'Since p_value < {self.alpha}, reject null hypothesis.')
        else:
            print(f'Since p_value > {self.alpha}, the null hypothesis cannot be rejected.')
mean_with_unknown_variance(type_t='two_tailed', Sample_mean=14.5, S=0.3, n=9, null_mean=14.2, alpha=0.05);
test_statistic: 3.0000000000000075
P_value: 0.017071681233782554
Since p_value < 0.05, reject null hypothesis.
np.random.seed(1)
data = np.random.normal(0,1,100)
mean_with_unknown_variance(type_t='two_tailed', data=data, null_mean=0.1, alpha=0.05);
test_statistic: -0.4430807396299341
P_value: 0.6586740373655937
Since p_value > 0.05, the null hypothesis cannot be rejected.
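You can also do this test using scipy. scipy.stats.ttest_1samp calculates the T-test for the mean of one sample (the alternative parameter requires scipy >= 1.6):

```python
import numpy as np
from scipy.stats import ttest_1samp

np.random.seed(1)
data = np.random.normal(0, 1, 100)

alpha = 0.05
test_statistic, p_value = ttest_1samp(data, popmean=0.1, alternative='two-sided')
print(f'Test_statistic = {test_statistic}, p_value = {p_value}', '\n')
if p_value < alpha:
    print(f'Since p_value < {alpha}, reject null hypothesis.')
else:
    print(f'Since p_value > {alpha}, the null hypothesis cannot be rejected.')
```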
A. Two Tailed Test:
Suppose that $X_1, X_2, ..., X_{n_x}$ is a sample of size $n_x$ from a normal distribution having an unknown mean $\mu_x$ and a known variance $\sigma^2_x$.
$Y_1, Y_2, ..., Y_{n_y}$ is a sample of size $n_y$ from a normal distribution having an unknown mean $\mu_y$ and a known variance $\sigma^2_y$.
$X_1, X_2, ..., X_{n_x} \sim N( \mu_x, \sigma^2_x)$
$Y_1, Y_2, ..., Y_{n_y} \sim N( \mu_y, \sigma^2_y)$
$H_0:\ \mu_x = \mu_y$
$H_1:\ \mu_x \neq \mu_y$
$\\ $
Testing statistics:
$Z_0 \equiv \frac{\overline{X}-\overline{Y} -\ 0}{\sqrt{(\frac{\sigma^2_x}{n_x}) + (\frac{\sigma^2_y}{n_y})}}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$-\ Z_{\frac{\alpha}{2}}\ <\ Z_0 = \frac{\overline{X}-\overline{Y} -\ 0}{\sqrt{(\frac{\sigma^2_x}{n_x}) + (\frac{\sigma^2_y}{n_y})}}\ <\ Z_{\frac{\alpha}{2}}$
$-\ Z_{\frac{\alpha}{2}} {\sqrt{(\frac{\sigma^2_x}{n_x}) + (\frac{\sigma^2_y}{n_y})}}\ <\ \overline{X}\ - \overline{Y}\ <\ Z_{\frac{\alpha}{2}} {\sqrt{(\frac{\sigma^2_x}{n_x}) + (\frac{\sigma^2_y}{n_y})}}$
$|\overline{X}-\overline{Y}| < Z_{\frac{\alpha}{2}} {\sqrt{(\frac{\sigma^2_x}{n_x}) + (\frac{\sigma^2_y}{n_y})}}$
P_value $ = 2 \times P(Z \geq |Z_0|) > \alpha$
B. One-sided tests:
Right-tailed test:
Suppose that $X_1, X_2, ..., X_{n_x}$ is a sample of size $n_x$ from a normal distribution having an unknown mean $\mu_x$ and a known variance $\sigma^2_x$.
$Y_1, Y_2, ..., Y_{n_y}$ is a sample of size $n_y$ from a normal distribution having an unknown mean $\mu_y$ and a known variance $\sigma^2_y$.
$X_1, X_2, ..., X_{n_x} \sim N( \mu_x, \sigma^2_x)$
$Y_1, Y_2, ..., Y_{n_y} \sim N( \mu_y, \sigma^2_y)$
$H_0:\ \mu_x = \mu_y\ \quad or \quad H_0:\ \mu_x \leq \mu_y$
$H_1:\ \mu_x > \mu_y$
$\\ $
Testing statistics:
$Z_0 \equiv \frac{\overline{X}-\overline{Y} - 0}{\sqrt{(\frac{\sigma^2_x}{n_x}) + (\frac{\sigma^2_y}{n_y})}}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$ -\infty <\ Z_0 = \frac{\overline{X}-\overline{Y} - 0}{\sqrt{(\frac{\sigma^2_x}{n_x}) + (\frac{\sigma^2_y}{n_y})}}\ <\ Z_{\alpha} $
$- \infty \ <\ \overline{X}\ - \overline{Y}\ <\ Z_{\alpha} {\sqrt{(\frac{\sigma^2_x}{n_x}) + (\frac{\sigma^2_y}{n_y})}}$
P_value $ = P(Z \geq Z_0) > \alpha$
Left-tailed test:
Suppose that $X_1, X_2, ..., X_{n_x}$ is a sample of size $n_x$ from a normal distribution having an unknown mean $\mu_x$ and a known variance $\sigma^2_x$.
$Y_1, Y_2, ..., Y_{n_y}$ is a sample of size $n_y$ from a normal distribution having an unknown mean $\mu_y$ and a known variance $\sigma^2_y$.
$X_1, X_2, ..., X_{n_x} \sim N( \mu_x, \sigma^2_x)$
$Y_1, Y_2, ..., Y_{n_y} \sim N( \mu_y, \sigma^2_y)$
$H_0:\ \mu_x = \mu_y \quad or \quad H_0:\ \mu_x \geq \mu_y$
$H_1:\ \mu_x < \mu_y$
$\\ $
Testing statistics:
$Z_0 \equiv \frac{\overline{X}-\overline{Y} - 0}{\sqrt{(\frac{\sigma^2_x}{n_x}) + (\frac{\sigma^2_y}{n_y})}}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$-\ Z_{\alpha}\ <\ Z_0 = \frac{\overline{X}-\overline{Y} - 0}{\sqrt{(\frac{\sigma^2_x}{n_x}) + (\frac{\sigma^2_y}{n_y})}} <\ \infty $
$\ -Z_{\alpha} {\sqrt{(\frac{\sigma^2_x}{n_x}) + (\frac{\sigma^2_y}{n_y})}}\ <\ \overline{X} - \overline{Y}\ <\ \infty$
P_value $ = P(Z \leq Z_0) > \alpha$
If you want to test the hypotheses below, only the test statistic changes; the other equations stay the same.
$H_0:\ \mu_x - k\ \mu_y = d $
$H_1:\ \mu_x - k\ \mu_y \neq d$
$\\ $
Testing statistics:
$Z_0 \equiv \frac{\overline{X}-\ k\ \overline{Y} -\ d}{\sqrt{(\frac{\sigma^2_x}{n_x}) + k^2(\frac{\sigma^2_y}{n_y})}}$
class equal_means_with_known_variances:
    """
    Parameters
    ----------
    diff_mean_null : difference between the means of the two populations in the null hypothesis
    population_v1 : known variance of population 1
    population_v2 : known variance of population 2
    n1 : optional, number of sample1 members
    n2 : optional, number of sample2 members
    alpha : significance level
    type_t : 'two_tailed', 'right_tailed', 'left_tailed'
    Sample_mean1 : optional, mean of sample1
    Sample_mean2 : optional, mean of sample2
    k : optional, coefficient of the second mean in the null hypothesis (default 1)
    data1 : optional, if you do not know Sample_mean1, just pass the first sample
    data2 : optional, if you do not know Sample_mean2, just pass the second sample
    """
    def __init__(self, diff_mean_null, population_v1, population_v2, alpha, type_t, k=1,
                 n1=0., n2=0., Sample_mean1=0., Sample_mean2=0., data1=None, data2=None):
        self.Sample_mean1 = Sample_mean1
        self.Sample_mean2 = Sample_mean2
        self.diff_mean_null = diff_mean_null
        self.population_v1 = population_v1
        self.population_v2 = population_v2
        self.type_t = type_t
        self.n1 = n1
        self.n2 = n2
        self.k = k
        self.alpha = alpha
        self.data1 = data1
        self.data2 = data2
        if data1 is not None:
            self.Sample_mean1 = np.mean(list(data1))
            self.n1 = len(list(data1))
        if data2 is not None:
            self.Sample_mean2 = np.mean(list(data2))
            self.n2 = len(list(data2))
        self.__test()

    def __test(self):
        # z statistic: (x1_bar - k*x2_bar - d) / sqrt(v1/n1 + k^2 * v2/n2)
        test_statistic = (self.Sample_mean1 - self.k * self.Sample_mean2 - self.diff_mean_null) / \
                         np.sqrt(self.population_v1 / self.n1 + (self.k**2) * self.population_v2 / self.n2)
        print('test_statistic:', test_statistic, '\n')
        if self.type_t == 'two_tailed':
            p_value = 2 * (1 - norm.cdf(abs(test_statistic)))
        elif self.type_t == 'left_tailed':
            p_value = norm.cdf(test_statistic)
        else:  # right_tailed
            p_value = 1 - norm.cdf(test_statistic)
        print('P_value:', p_value, '\n')
        if p_value < self.alpha:
            print(f'Since p_value < {self.alpha}, reject null hypothesis.')
        else:
            print(f'Since p_value > {self.alpha}, the null hypothesis cannot be rejected.')
data1 = np.multiply([61.1,58.2,62.3,64,59.7,66.2,57.8,61.4,62.2,63.6], 100)
data2 = np.multiply([62.2,56.6,66.4,56.2,57.4,58.4,57.6,65.4], 100)
np.mean(data1), np.mean(data2)
(6165.0, 6002.5)
equal_means_with_known_variances(diff_mean_null = 0, population_v1 = 4000**2, population_v2 = 6000**2, n1 = 10, n2 = 8,
alpha = 0.05, type_t = 'two_tailed', k = 1, Sample_mean1 = 6165, Sample_mean2 = 6002.5);
test_statistic: 0.06579432682703693
P_value: 0.947541572987298
Since p_value > 0.05, the null hypothesis cannot be rejected.
equal_means_with_known_variances(diff_mean_null = 0, population_v1 = 4000**2, population_v2 = 6000**2,
alpha = 0.05, type_t = 'two_tailed', k = 1, data1 = data1, data2 = data2);
test_statistic: 0.06579432682703693
P_value: 0.947541572987298
Since p_value > 0.05, the null hypothesis cannot be rejected.
A. Two Tailed Test:
Suppose that $X_1, X_2, ..., X_{n_x}$ is a sample of size $n_x$ from a normal distribution having an unknown mean $\mu_x$ and an unknown variance $\sigma^2_x$.
$Y_1, Y_2, ..., Y_{n_y}$ is a sample of size $n_y$ from a normal distribution having an unknown mean $\mu_y$ and an unknown variance $\sigma^2_y$.
$X_1, X_2, ..., X_{n_x} \sim N( \mu_x, \sigma^2_x)$
$Y_1, Y_2, ..., Y_{n_y} \sim N( \mu_y, \sigma^2_y)$
$H_0:\ \mu_x = \mu_y$
$H_1:\ \mu_x \neq \mu_y$
$\\ $
Testing statistics:
$t_0 \equiv \frac{\overline{X}-\ \overline{Y}\ -\ 0}{{S_p} \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}} \sim t_{{n_x}+{n_y}-2}$
$S_p^2 = \frac{(n_x-1)S_x^2 + (n_y-1)S_y^2}{n_x+n_y-2}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$-\ t_{\frac{\alpha}{2},{n_x} + {n_y}-2}\ <\ t_0 = \frac{\overline{X}-\ \overline{Y}\ -\ 0}{{S_p} \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}} <\ t_{\frac{\alpha}{2},{n_x} + {n_y}-2}$
$-\ t_{\frac{\alpha}{2},{n_x} + {n_y}-2} {{S_p} \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}}\ <\ \overline{X}\ - \overline{Y}\ <\ t_{\frac{\alpha}{2},{n_x} + {n_y}-2} {{S_p} \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}}$
$|\overline{X}-\overline{Y}| < t_{\frac{\alpha}{2},{n_x} + {n_y}-2} {{S_p} \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}}$
P_value $ = 2 \times P(t_{{n_x} + {n_y}-2} \geq |t_0|) > \alpha$
B. One-sided tests:
Right-tailed test:
Suppose that $X_1, X_2, ..., X_{n_x}$ is a sample of size $n_x$ from a normal distribution having an unknown mean $\mu_x$ and an unknown variance $\sigma^2_x$.
$Y_1, Y_2, ..., Y_{n_y}$ is a sample of size $n_y$ from a normal distribution having an unknown mean $\mu_y$ and an unknown variance $\sigma^2_y$.
$X_1, X_2, ..., X_{n_x} \sim N( \mu_x, \sigma^2_x)$
$Y_1, Y_2, ..., Y_{n_y} \sim N( \mu_y, \sigma^2_y)$
$H_0:\ \mu_x = \mu_y\ \quad or \quad H_0:\ \mu_x \leq \mu_y$
$H_1:\ \mu_x > \mu_y$
$\\ $
Testing statistics:
$t_0 \equiv \frac{\overline{X}-\ \overline{Y}\ -\ 0}{{S_p} \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}} \sim t_{{n_x}+{n_y}-2}$
$S_p^2 = \frac{(n_x-1)S_x^2 + (n_y-1)S_y^2}{n_x+n_y-2}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$ -\infty <\ t_0 = \frac{\overline{X}-\ \overline{Y}\ -\ 0}{{S_p} \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}}\ <\ t_{{\alpha},{n_x} + {n_y}-2} $
$- \infty \ <\ \overline{X}\ - \overline{Y}\ <\ t_{{\alpha},{n_x} + {n_y}-2} {{S_p} \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}}$
P_value $ = P(t_{{n_x} + {n_y}-2} \geq t_0) > \alpha$
Left-tailed test:
Suppose that $X_1, X_2, ..., X_{n_x}$ is a sample of size $n_x$ from a normal distribution having an unknown mean $\mu_x$ and an unknown variance $\sigma^2_x$.
$Y_1, Y_2, ..., Y_{n_y}$ is a sample of size $n_y$ from a normal distribution having an unknown mean $\mu_y$ and an unknown variance $\sigma^2_y$.
$X_1, X_2, ..., X_{n_x} \sim N( \mu_x, \sigma^2_x)$
$Y_1, Y_2, ..., Y_{n_y} \sim N( \mu_y, \sigma^2_y)$
$H_0:\ \mu_x = \mu_y \quad or \quad H_0:\ \mu_x \geq \mu_y$
$H_1:\ \mu_x < \mu_y$
$\\ $
Testing statistics:
$t_0 \equiv \frac{\overline{X}-\ \overline{Y}\ -\ 0}{{S_p} \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}} \sim t_{{n_x}+{n_y}-2}$
$S_p^2 = \frac{(n_x-1)S_x^2 + (n_y-1)S_y^2}{n_x+n_y-2}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$-\ t_{{\alpha},{n_x} + {n_y}-2} <\ t_0 = \frac{\overline{X}-\ \overline{Y}\ -\ 0}{{S_p} \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}} <\ \infty $
$\ -t_{{\alpha},{n_x} + {n_y}-2} {{S_p} \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}}\ <\ \overline{X} - \overline{Y}\ <\ \infty$
P_value $ = P(t_{{n_x} + {n_y}-2} \leq t_0) > \alpha$
class equal_means_with_unknown_variances:
    """
    Parameters
    ----------
    diff_mean_null : difference between the means of the two populations in the null hypothesis
    S1 : optional, standard deviation of sample1
    S2 : optional, standard deviation of sample2
    n1 : optional, number of sample1 members
    n2 : optional, number of sample2 members
    alpha : significance level
    type_t : 'two_tailed', 'right_tailed', 'left_tailed'
    Sample_mean1 : optional, mean of sample1
    Sample_mean2 : optional, mean of sample2
    data1 : optional, if you do not know Sample_mean1 and S1, just pass the first sample
    data2 : optional, if you do not know Sample_mean2 and S2, just pass the second sample
    """
    def __init__(self, diff_mean_null, alpha, type_t, n1=0., n2=0., S1=0., S2=0.,
                 Sample_mean1=0., Sample_mean2=0., data1=None, data2=None):
        self.Sample_mean1 = Sample_mean1
        self.Sample_mean2 = Sample_mean2
        self.diff_mean_null = diff_mean_null
        self.S1 = S1
        self.S2 = S2
        self.type_t = type_t
        self.n1 = n1
        self.n2 = n2
        self.alpha = alpha
        self.data1 = data1
        self.data2 = data2
        if data1 is not None:
            self.Sample_mean1 = np.mean(list(data1))
            self.S1 = np.std(list(data1), ddof=1)
            self.n1 = len(list(data1))
        if data2 is not None:
            self.Sample_mean2 = np.mean(list(data2))
            self.S2 = np.std(list(data2), ddof=1)
            self.n2 = len(list(data2))
        # pooled variance estimate S_p^2
        self.SP2 = ((self.n1 - 1) * self.S1**2 + (self.n2 - 1) * self.S2**2) / (self.n1 + self.n2 - 2)
        self.__test()

    def __test(self):
        # t statistic with n1+n2-2 degrees of freedom
        test_statistic = (self.Sample_mean1 - self.Sample_mean2 - self.diff_mean_null) / \
                         (np.sqrt(self.SP2) * np.sqrt(1 / self.n1 + 1 / self.n2))
        print('test_statistic:', test_statistic, '\n')
        if self.type_t == 'two_tailed':
            p_value = 2 * (1 - t.cdf(abs(test_statistic), df=self.n1 + self.n2 - 2))
        elif self.type_t == 'left_tailed':
            p_value = t.cdf(test_statistic, df=self.n1 + self.n2 - 2)
        else:  # right_tailed
            p_value = 1 - t.cdf(test_statistic, df=self.n1 + self.n2 - 2)
        print('P_value:', p_value, '\n')
        if p_value < self.alpha:
            print(f'Since p_value < {self.alpha}, reject null hypothesis.')
        else:
            print(f'Since p_value > {self.alpha}, the null hypothesis cannot be rejected.')
equal_means_with_unknown_variances(diff_mean_null = 0, S1 = np.sqrt(0.581), S2 = np.sqrt(0.778), n1 = 10, n2 = 12,
alpha = 0.05, type_t = 'left_tailed', Sample_mean1 = 6.45, Sample_mean2 = 7.125);
test_statistic: -1.8987297952365871
P_value: 0.03606193933099178
Since p_value < 0.05, reject null hypothesis.
data1 = [5.5,6,7,6,7.5,6,7.5,5.5,7,6.5]
data2 = [6.5,6,8.5,7,6.5,8,7.5,6.5,7.5,6,8.5,7]
np.mean(data1), np.mean(data2)
(6.45, 7.125)
equal_means_with_unknown_variances(diff_mean_null = 0, alpha = 0.05, type_t = 'left_tailed', data1 = data1, data2 = data2);
test_statistic: -1.8986953664603443
P_value: 0.03606431976959166
Since p_value < 0.05, reject null hypothesis.
You can also do this test using scipy.
scipy.stats.ttest_ind
Calculate the T-test for the means of two independent samples.
alpha = 0.05
Test_statistic, p_value = ttest_ind(data1, data2, alternative='less')
print(f'Test_statistic = {Test_statistic}, p_value = {p_value}', '\n')
if p_value < alpha:
print(f'Since p_value < {alpha}, reject null hypothesis.')
else:
print(f'Since p_value > {alpha}, the null hypothesis cannot be rejected.')
Test_statistic = -1.8986953664603443, p_value = 0.03606431976959166
Since p_value < 0.05, reject null hypothesis.
A. Two Tailed Test:
The data can be described by the $n$ pairs $(X_i, Y_i), i = 1, 2, ..., n$, where $X_i$ is the data before an action, and $Y_i$ is the respective data points after an action.
It is important to note that we cannot treat $X_1, ... , X_n$ and $Y_1, ... , Y_n$ as being independent samples.
$D_i = X_i − Y_i, \quad i = 1, ... , n$
$H_0 : \mu_D = 0$
$H_1 : \mu_D \neq 0$
$\\ $
Testing statistics:
$t_0 \equiv \frac{\overline{D}\ -\ 0}{\frac{S_D}{\sqrt{n}}} \sim t_{n-1}$
$S_D = \sqrt{\frac{\sum_{i=1}^n\ (D_i\ -\ \overline{D})^2}{n\ -\ 1}} \quad \quad \overline{D} = \frac{\sum_{i=1}^n\ D_i}{n}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$ -\ t_{\frac{{\alpha}}{2},{n-1}} <\ t_0 = \frac{\overline{D}\ -\ 0}{\frac{S_D}{\sqrt{n}}}\ <\ t_{\frac{{\alpha}}{2},{n-1}} $
$- t_{\frac{{\alpha}}{2},{n-1}} \frac{S_D}{\sqrt{n}} \ <\ \overline{D}\ <\ t_{\frac{{\alpha}}{2},{n-1}} \frac{S_D}{\sqrt{n}}$
$| \overline{D} |\ <\ t_{\frac{{\alpha}}{2},{n-1}} \frac{S_D}{\sqrt{n}}$
P_value $ = 2 \times P(t_{n-1} \geq |t_0|) > \alpha$
B. One-sided tests:
Right-tailed test:
The data can be described by the $n$ pairs $(X_i, Y_i), i = 1, 2, ..., n$, where $X_i$ is the data before an action, and $Y_i$ is the respective data points after an action.
It is important to note that we cannot treat $X_1, ... , X_n$ and $Y_1, ... , Y_n$ as being independent samples.
$D_i = X_i − Y_i, \quad i = 1, ... , n$
$H_0:\ \mu_D = 0\ \quad or \quad H_0:\ \mu_D \leq 0$
$H_1 : \mu_D > 0$
$\\ $
Testing statistics:
$t_0 \equiv \frac{\overline{D}\ -\ 0}{\frac{S_D}{\sqrt{n}}} \sim t_{n-1}$
$S_D = \sqrt{\frac{\sum_{i=1}^n\ (D_i\ -\ \overline{D})^2}{n\ -\ 1}} \quad \quad \overline{D} = \frac{\sum_{i=1}^n\ D_i}{n}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$ -\infty <\ t_0 = \frac{\overline{D}\ -\ 0}{\frac{S_D}{\sqrt{n}}}\ <\ t_{{\alpha},{n-1}} $
$- \infty \ <\ \overline{D}\ <\ t_{{\alpha},{n-1}} {\frac{S_D}{\sqrt{n}}}$
P_value $ = P(t_{n-1} \geq t_0) > \alpha$
Left-tailed test:
The data can be described by the $n$ pairs $(X_i, Y_i), i = 1, 2, ..., n$, where $X_i$ is the data before an action, and $Y_i$ is the respective data points after an action.
It is important to note that we cannot treat $X_1, ... , X_n$ and $Y_1, ... , Y_n$ as being independent samples.
$D_i = X_i − Y_i, \quad i = 1, ... , n$
$H_0:\ \mu_D = 0\ \quad or \quad H_0:\ \mu_D \geq 0$
$H_1 : \mu_D < 0$
$\\ $
Testing statistics:
$t_0 \equiv \frac{\overline{D}\ -\ 0}{\frac{S_D}{\sqrt{n}}} \sim t_{n-1}$
$S_D = \sqrt{\frac{\sum_{i=1}^n\ (D_i\ -\ \overline{D})^2}{n\ -\ 1}} \quad \quad \overline{D} = \frac{\sum_{i=1}^n\ D_i}{n}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$-\ t_{{\alpha},{n-1}} <\ t_0 = \frac{\overline{D}\ -\ 0}{\frac{S_D}{\sqrt{n}}} <\ \infty $
$\ -t_{{\alpha},{n-1}} {\frac{S_D}{\sqrt{n}}}\ <\ \overline{D}\ <\ \infty$
P_value $ = P(t_{n-1} \leq t_0) > \alpha$
class paired_test:
    """
    Parameters
    ----------
    null_mean : mean difference under the null hypothesis
    n : optional, number of sample members
    SD : optional, standard deviation of the differences D
    alpha : significance level
    D_bar : optional, mean of the differences D
    type_t : 'two_tailed', 'right_tailed', 'left_tailed'
    data1 : optional, if you do not know D_bar and SD, just pass the first sample
    data2 : optional, if you do not know D_bar and SD, just pass the second sample
    """
    def __init__(self, null_mean, alpha, type_t, n=0., SD=0., D_bar=0., data1=None, data2=None):
        self.D_bar = D_bar
        self.SD = SD
        self.null_mean = null_mean
        self.type_t = type_t
        self.n = n
        self.alpha = alpha
        self.data1 = data1
        self.data2 = data2
        if data1 is not None:
            D = np.subtract(data1, data2)  # paired differences
            self.D_bar = np.mean(list(D))
            self.SD = np.std(list(D), ddof=1)
            self.n = len(list(data1))
        self.__test()

    def __test(self):
        # t statistic: (D_bar - d0) / (S_D / sqrt(n)), with n-1 degrees of freedom
        test_statistic = (self.D_bar - self.null_mean) / (self.SD / np.sqrt(self.n))
        print('test_statistic:', test_statistic, '\n')
        if self.type_t == 'two_tailed':
            p_value = 2 * (1 - t.cdf(abs(test_statistic), df=self.n - 1))
        elif self.type_t == 'left_tailed':
            p_value = t.cdf(test_statistic, df=self.n - 1)
        else:  # right_tailed
            p_value = 1 - t.cdf(test_statistic, df=self.n - 1)
        print('P_value:', p_value, '\n')
        if p_value < self.alpha:
            print(f'Since p_value < {self.alpha}, reject null hypothesis.')
        else:
            print(f'Since p_value > {self.alpha}, the null hypothesis cannot be rejected.')
data1 = [7,6,10,16]
data2 = [9,10,14,18]
paired_test(null_mean = 0, n = 4, alpha = 0.05, type_t = 'two_tailed', data1=data1, data2=data2);
test_statistic: -5.196152422706632
P_value: 0.013846832988859026
Since p_value < 0.05, reject null hypothesis.
data1 = [30.5,18.5,24.5,32,16,15,23.5,25.5,28,18]
data2 = [23,21,22,28.5,14.5,15.5,24.5,21,23.5,16.5]
paired_test(null_mean = 0, n = 10, alpha = 0.05, type_t = 'right_tailed', data1=data1, data2=data2);
test_statistic: 2.2659493332258522
P_value: 0.02484551530199397
Since p_value < 0.05, reject null hypothesis.
You can also do this test using scipy.
scipy.stats.ttest_rel
Calculate the T-test on two related (paired) samples.
alpha = 0.05
Test_statistic, p_value = ttest_rel(data1, data2, alternative='greater')
print(f'Test_statistic = {Test_statistic}, p_value = {p_value}', '\n')
if p_value < alpha:
print(f'Since p_value < {alpha}, reject null hypothesis.')
else:
print(f'Since p_value > {alpha}, the null hypothesis cannot be rejected.')
Test_statistic = 2.2659493332258522, p_value = 0.024845515301993935
Since p_value < 0.05, reject null hypothesis.
A. Two Tailed Test:
Suppose that $X_1, X_2, ..., X_n$ is a sample of size $n$ from a normal distribution having an unknown mean $\mu$ and an unknown variance $\sigma^2$.
$X_1, X_2, ..., X_n \sim N( \mu, \sigma^2)$
$H_0:\ \sigma^2 = \sigma_0^2$
$H_1:\ \sigma^2 \neq \sigma_0^2$
$\\ $
Testing statistics:
$ \chi^2_0 \equiv \frac{(n-1)\ S^2}{\sigma_0^2}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$\ \chi^2_{1-\frac{\alpha}{2}, n-1}\ <\ \chi^2_0 = \frac{(n-1)\ S^2}{\sigma_0^2} \ <\chi^2_{\frac{\alpha}{2}, n-1} $
$\sigma_0^2\ \frac{\chi^2_{1-\frac{\alpha}{2}, n-1}}{n-1} <\ S^2 <\ \sigma_0^2\ \frac{\chi^2_{\frac{\alpha}{2}, n-1}}{n-1}$
P_value $ = 2 \times \min\left(P(\chi^2_{n-1} < \chi^2_0),\ 1-P(\chi^2_{n-1} < \chi^2_0)\right) > \alpha$
B. One-sided tests:
Right-tailed test:
Suppose that $X_1, X_2, ..., X_n$ is a sample of size $n$ from a normal distribution having an unknown mean $\mu$ and an unknown variance $\sigma^2$.
$X_1, X_2, ..., X_n \sim N( \mu, \sigma^2)$
$H_0:\ \sigma^2 = \sigma_0^2\ \quad or \quad H_0:\ \sigma^2 \leq \sigma_0^2$
$H_1:\ \sigma^2 > \sigma_0^2$
$\\ $
Testing statistics:
$ \chi^2_0 \equiv \frac{(n-1)\ S^2}{\sigma_0^2}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$ 0 <\ \chi^2_0 = \frac{(n-1)\ S^2}{\sigma_0^2}\ <\ \chi^2_{{\alpha}, n-1}$
$0 \ <\ S^2\ <\ \frac{\chi^2_{{\alpha}, n-1}\ \sigma_0^2}{n-1} $
P_value $ = P(\chi^2_{n-1} \geq \chi^2_0) > \alpha$
B. One-sided tests:
Left-tailed test:
Suppose that $X_1, X_2, ..., X_n$ is a sample of size $n$ from a normal distribution having an unknown mean $\mu$ and an unknown variance $\sigma^2$.
$X_1, X_2, ..., X_n \sim N( \mu, \sigma^2)$
$H_0:\ \sigma^2 = \sigma_0^2\ \quad or \quad H_0:\ \sigma^2 \geq \sigma_0^2$
$H_1:\ \sigma^2 < \sigma_0^2$
$\\ $
Testing statistics:
$ \chi^2_0 \equiv \frac{(n-1)\ S^2}{\sigma_0^2}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$ \chi^2_{1-{\alpha}, n-1} <\ \chi^2_0 = \frac{(n-1)\ S^2}{\sigma_0^2} <\ \infty$
$\frac{\chi^2_{1-{\alpha}, n-1}\ \sigma_0^2}{n-1} \ <\ S^2\ <\ \infty $
P_value $ = P(\chi^2_{n-1} \leq \chi^2_0) > \alpha$
class variance_normal:
    """
    Parameters
    ----------
    null_v : v0, the hypothesized variance sigma_0^2
    n : optional, number of sample members
    S2 : optional, sample variance
    alpha : significance level
    type_t : 'two_tailed', 'right_tailed', 'left_tailed'
    data : optional, if you do not know S2, just pass the data
    """
    def __init__(self, null_v, alpha, type_t, n=0., S2=0., data=None):
        self.S2 = S2
        self.null_v = null_v
        self.type_t = type_t
        self.n = n
        self.alpha = alpha
        self.data = data
        if data is not None:
            self.S2 = np.std(list(data), ddof=1)**2
            self.n = len(list(data))
        variance_normal.__test(self)

    def __test(self):
        test_statistic = ((self.n-1)*(self.S2)) / self.null_v
        print('test_statistic:', test_statistic, '\n')
        if self.type_t == 'two_tailed':
            a = 1-chi2.cdf(test_statistic, df=self.n-1)
            b = chi2.cdf(test_statistic, df=self.n-1)
            p_value = 2*np.min([a, b])
            print('P_value:', p_value, '\n')
        elif self.type_t == 'left_tailed':
            p_value = chi2.cdf(test_statistic, df=self.n-1)
            print('P_value:', p_value, '\n')
        else:
            p_value = 1-chi2.cdf(test_statistic, df=self.n-1)
            print('P_value:', p_value, '\n')
        if p_value < self.alpha:
            print(f'Since p_value < {self.alpha}, reject null hypothesis.')
        else:
            print(f'Since p_value > {self.alpha}, the null hypothesis cannot be rejected.')
variance_normal(null_v = 0.15**2, n = 20, alpha = 0.05, type_t = 'right_tailed', S2 = 0.025);
test_statistic: 21.111111111111114 P_value: 0.33069403418551535 Since p_value > 0.05, the null hypothesis cannot be rejected.
data = [11,10,9,10,10,11,11,10,12,9,7,9,11,10,11]
variance_normal(null_v = 2**2, alpha = 0.05, type_t = 'two_tailed', data = data);
test_statistic: 5.233333333333333 P_value: 0.035414043881992714 Since p_value < 0.05, reject null hypothesis.
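scipy has no built-in one-sample variance test, but the two-tailed result above can be reproduced directly from the $\chi^2$ distribution (a sketch of the same computation, using the same data):

```python
import numpy as np
from scipy.stats import chi2

data = [11, 10, 9, 10, 10, 11, 11, 10, 12, 9, 7, 9, 11, 10, 11]
null_v = 2**2                       # hypothesized variance sigma_0^2
n = len(data)
S2 = np.var(data, ddof=1)           # sample variance
chi2_0 = (n - 1) * S2 / null_v      # test statistic
# Two-tailed p-value: twice the smaller of the two tail probabilities.
p_value = 2 * min(chi2.cdf(chi2_0, df=n - 1), chi2.sf(chi2_0, df=n - 1))
print(chi2_0, p_value)
```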
A. Two Tailed Test:
Suppose that $X_1, X_2, ..., X_n$ is a sample of size $n_x$ from a normal distribution having an unknown mean $\mu_x$ and an unknown variance $\sigma^2_x$.
$Y_1, Y_2, ..., Y_n$ is a sample of size $n_y$ from a normal distribution having an unknown mean $\mu_y$ and an unknown variance $\sigma^2_y$.
$X_1, X_2, ..., X_n \sim N( \mu_x, \sigma^2_x)$
$Y_1, Y_2, ..., Y_n \sim N( \mu_y, \sigma^2_y)$
$H_0:\ \sigma_x^2 / \sigma_y^2 = 1$
$H_1:\ \sigma_x^2 / \sigma_y^2 \neq 1$
$\\ $
Testing statistics:
$ F_0 \equiv \frac{S^2_x}{S^2_y}$
$S^2_x = \frac{\sum_{i=1}^{n_x}\ (x_i\ -\ \overline{x})^2}{{n_x}-1}$
$S^2_y = \frac{\sum_{i=1}^{n_y}\ (y_i\ -\ \overline{y})^2}{{n_y}-1}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$F_{{1-\frac{\alpha}{2}}, n_x-1, n_y-1}\ <\ F_0 = \frac{S^2_x}{S^2_y}\ < F_{{\frac{\alpha}{2}}, n_x-1, n_y-1}$
P_value $ = 2 \times \min\ \left(P(F_{n_x-1,\,n_y-1} < F_0),\ 1 - P(F_{n_x-1,\,n_y-1} < F_0)\right) > \alpha$
B. One-sided tests:
Right-tailed test:
Suppose that $X_1, X_2, ..., X_n$ is a sample of size $n_x$ from a normal distribution having an unknown mean $\mu_x$ and an unknown variance $\sigma^2_x$.
$Y_1, Y_2, ..., Y_n$ is a sample of size $n_y$ from a normal distribution having an unknown mean $\mu_y$ and an unknown variance $\sigma^2_y$.
$X_1, X_2, ..., X_n \sim N( \mu_x, \sigma^2_x)$
$Y_1, Y_2, ..., Y_n \sim N( \mu_y, \sigma^2_y)$
$H_0:\ \sigma_x^2 / \sigma_y^2 = 1 \quad or \quad \sigma_x^2 / \sigma_y^2 \leq 1$
$H_1:\ \sigma_x^2 / \sigma_y^2 > 1$
$\\ $
Testing statistics:
$ F_0 \equiv \frac{S^2_x}{S^2_y}$
$S^2_x = \frac{\sum_{i=1}^{n_x}\ (x_i\ -\ \overline{x})^2}{{n_x}-1}$
$S^2_y = \frac{\sum_{i=1}^{n_y}\ (y_i\ -\ \overline{y})^2}{{n_y}-1}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$0\ <\ F_0 = \frac{S^2_x}{S^2_y}\ < F_{{\alpha}, n_x-1, n_y-1}$
P_value $ = P(F_{n_x-1,n_y-1} \geq F_0) > \alpha$
Left-tailed test:
Suppose that $X_1, X_2, ..., X_n$ is a sample of size $n_x$ from a normal distribution having an unknown mean $\mu_x$ and an unknown variance $\sigma^2_x$.
$Y_1, Y_2, ..., Y_n$ is a sample of size $n_y$ from a normal distribution having an unknown mean $\mu_y$ and an unknown variance $\sigma^2_y$.
$X_1, X_2, ..., X_n \sim N( \mu_x, \sigma^2_x)$
$Y_1, Y_2, ..., Y_n \sim N( \mu_y, \sigma^2_y)$
$H_0:\ \sigma_x^2 / \sigma_y^2 = 1 \quad or \quad \sigma_x^2 / \sigma_y^2 \geq 1$
$H_1:\ \sigma_x^2 / \sigma_y^2 < 1$
$\\ $
Testing statistics:
$ F_0 \equiv \frac{S^2_x}{S^2_y}$
$S^2_x = \frac{\sum_{i=1}^{n_x}\ (x_i\ -\ \overline{x})^2}{{n_x}-1}$
$S^2_y = \frac{\sum_{i=1}^{n_y}\ (y_i\ -\ \overline{y})^2}{{n_y}-1}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$F_{{1-\alpha}, n_x-1, n_y-1}\ <\ F_0 = \frac{S^2_x}{S^2_y}\ < \infty$
P_value $ = P(F_{n_x-1,n_y-1} \leq F_0) > \alpha$
class equal_variances:
    """
    Parameters
    ----------
    null_v : hypothesized variance ratio (1)
    n1 : optional, number of data1 members
    n2 : optional, number of data2 members
    S2X : optional, sample1 variance
    S2Y : optional, sample2 variance
    alpha : significance level
    type_t : 'two_tailed', 'right_tailed', 'left_tailed'
    data1 : optional, if you do not know S2X, just pass the data
    data2 : optional, if you do not know S2Y, just pass the data
    """
    def __init__(self, null_v, alpha, type_t, n1=0., n2=0., S2X=0., S2Y=0., data1=None, data2=None):
        self.S2X = S2X
        self.S2Y = S2Y
        self.null_v = null_v
        self.type_t = type_t
        self.n1 = n1
        self.n2 = n2
        self.alpha = alpha
        self.data1 = data1
        self.data2 = data2
        if data1 is not None:
            self.S2X = np.std(list(data1), ddof=1)**2  # fixed: was assigned to self.SX2
            self.n1 = len(list(data1))
        if data2 is not None:
            self.S2Y = np.std(list(data2), ddof=1)**2
            self.n2 = len(list(data2))
        equal_variances.__test(self)

    def __test(self):
        test_statistic = self.S2X / self.S2Y
        print('test_statistic:', test_statistic, '\n')
        if self.type_t == 'two_tailed':
            a = 1-stats.f.cdf(test_statistic, self.n1-1, self.n2-1)
            b = stats.f.cdf(test_statistic, self.n1-1, self.n2-1)
            p_value = 2*np.min([a, b])
            print('P_value:', p_value, '\n')
        elif self.type_t == 'left_tailed':
            p_value = stats.f.cdf(test_statistic, self.n1-1, self.n2-1)  # fixed: df2 was n2-2
            print('P_value:', p_value, '\n')
        else:
            p_value = 1-stats.f.cdf(test_statistic, self.n1-1, self.n2-1)  # fixed: df2 was n2-2
            print('P_value:', p_value, '\n')
        if p_value < self.alpha:
            print(f'Since p_value < {self.alpha}, reject null hypothesis.')
        else:
            print(f'Since p_value > {self.alpha}, the null hypothesis cannot be rejected.')
equal_variances(null_v = 1, n1 = 10, n2 = 12, alpha = 0.05, type_t = 'two_tailed', S2X = .14, S2Y = .28);
test_statistic: 0.5 P_value: 0.3075191907594375 Since p_value > 0.05, the null hypothesis cannot be rejected.
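The same two-tailed F-test can be reproduced directly from scipy.stats.f (a sketch; when you have the raw samples rather than summary variances, scipy.stats.bartlett or scipy.stats.levene are common built-in alternatives):

```python
from scipy.stats import f

S2X, S2Y = 0.14, 0.28   # sample variances
n1, n2 = 10, 12
F0 = S2X / S2Y          # test statistic
# Two-tailed p-value: twice the smaller of the two tail probabilities.
p_value = 2 * min(f.cdf(F0, n1 - 1, n2 - 1), f.sf(F0, n1 - 1, n2 - 1))
print(F0, p_value)
```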
A. Two Tailed Test:
Suppose that $X_1, X_2, ..., X_n$ is a sample of size $n$ (large) from a Bernoulli distribution having an unknown parameter $P$.
$X_1, X_2, ..., X_n \sim Ber(P)$
$H_0 : P = P_0$
$H_1 : P \neq P_0$
$\\ $
Testing statistics:
$ Z_0 \equiv \frac{\overline{X}\ -\ P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$-\ Z_{\frac{\alpha}{2}}\ <\ Z_0 = \frac{\overline{X}\ -\ P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}}\ < Z_{\frac{\alpha}{2}}$
$P_0\ -\ Z_{\frac{\alpha}{2}} \sqrt{\frac{P_0(1-P_0)}{n}} \ <\ \overline{X} \ < P_0\ +\ Z_{\frac{\alpha}{2}} \sqrt{\frac{P_0(1-P_0)}{n}}$
P_value $ = 2 \times P(Z \geq |Z_0|) > \alpha$
B. One-sided tests:
Right-tailed test:
Suppose that $X_1, X_2, ..., X_n$ is a sample of size $n$ (large) from a Bernoulli distribution having an unknown parameter $P$.
$X_1, X_2, ..., X_n \sim Ber(P)$
$H_0 : P = P_0 \quad or \quad P \leq P_0$
$H_1 : P > P_0$
$\\ $
Testing statistics:
$ Z_0 \equiv \frac{\overline{X}\ -\ P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$-\ \infty <\ Z_0 = \frac{\overline{X}\ -\ P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}}\ < Z_{\alpha}$
$-\ \infty \ <\ \overline{X} \ < P_0\ +\ Z_{\alpha} \sqrt{\frac{P_0(1-P_0)}{n}}$
P_value $ = P(Z \geq Z_0) > \alpha$
Left-tailed test:
Suppose that $X_1, X_2, ..., X_n$ is a sample of size $n$ (large) from a Bernoulli distribution having an unknown parameter $P$.
$X_1, X_2, ..., X_n \sim Ber(P)$
$H_0 : P = P_0 \quad or \quad P \geq P_0$
$H_1 : P < P_0$
$\\ $
Testing statistics:
$ Z_0 \equiv \frac{\overline{X}\ -\ P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$-\ Z_{\alpha} <\ Z_0 = \frac{\overline{X}\ -\ P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}}\ < \infty$
$\ P_0\ -\ Z_{\alpha} \sqrt{\frac{P_0(1-P_0)}{n}} \ <\ \overline{X} \ < \infty$
P_value $ = P(Z \leq Z_0) > \alpha$
class p_bernoulli:
    """
    Parameters
    ----------
    null_p : p0, the hypothesized proportion
    n : optional, number of sample members
    alpha : significance level
    type_t : 'two_tailed', 'right_tailed', 'left_tailed'
    Sample_mean : optional, mean of the sample (the sample proportion)
    data : optional, if you do not know the sample mean, just pass the data
    """
    def __init__(self, null_p, alpha, type_t, n=0., Sample_mean=0., data=None):
        self.Sample_mean = Sample_mean
        self.null_p = null_p
        self.type_t = type_t
        self.n = n
        self.alpha = alpha
        self.data = data
        if data is not None:
            self.Sample_mean = np.mean(list(data))
            self.n = len(list(data))
        p_bernoulli.__test(self)

    def __test(self):
        test_statistic = (self.Sample_mean - self.null_p) / np.sqrt(self.null_p * (1-self.null_p) / self.n)
        print('test_statistic:', test_statistic, '\n')
        if self.type_t == 'two_tailed':
            p_value = 2*(1-norm.cdf(abs(test_statistic)))
            print('P_value:', p_value, '\n')
        elif self.type_t == 'left_tailed':
            p_value = norm.cdf(test_statistic)
            print('P_value:', p_value, '\n')
        else:
            p_value = 1-norm.cdf(test_statistic)
            print('P_value:', p_value, '\n')
        if p_value < self.alpha:
            print(f'Since p_value < {self.alpha}, reject null hypothesis.')
        else:
            print(f'Since p_value > {self.alpha}, the null hypothesis cannot be rejected.')
p_bernoulli(null_p = 0.2, n = 100, alpha = 0.05, type_t = 'right_tailed', Sample_mean = 0.3);
test_statistic: 2.4999999999999996 P_value: 0.006209665325776159 Since p_value < 0.05, reject null hypothesis.
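The class above relies on the normal approximation, which is only valid for large $n$. As a cross-check on count data, scipy &gt;= 1.7 provides an exact binomial test (a sketch, assuming the sample proportion 0.3 came from 30 successes in $n = 100$ trials):

```python
from scipy.stats import binom, binomtest

# 30 successes in 100 trials, H1: p > 0.2, no normal approximation
res = binomtest(k=30, n=100, p=0.2, alternative='greater')
# The exact right-tailed p-value is P(X >= 30) for X ~ Bin(100, 0.2).
p_exact = binom.sf(29, 100, 0.2)
print(res.pvalue, p_exact)
```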
A. Two Tailed Test:
Suppose that $X_1, X_2, ..., X_{n_x}$ is a sample of size $n_x$ (large) from a Bernoulli distribution having an unknown parameter $P_x$ and $Y_1, Y_2, ..., Y_{n_y}$ is a sample of size $n_y$ (large) from a Bernoulli distribution having an unknown parameter $P_y$.
$X_1, X_2, ..., X_{n_x} \sim Ber(P_x)$
$Y_1, Y_2, ..., Y_{n_y} \sim Ber(P_y)$
$H_0 : P_x = P_y$
$H_1 : P_x \neq P_y$
$\\ $
Testing statistics:
$ Z_0 \equiv \frac{\overline{X}\ -\ \overline{Y}\ -\ 0}{\sqrt{\frac{\widehat{p}(1- \widehat{p})}{n_x} + \frac{\widehat{p}(1-\widehat{p})}{n_y}}}$
$\widehat{p} = \frac{n_x \overline{X}\ +\ n_y \overline{Y}}{n_x\ +\ n_y}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$-\ Z_{\frac{\alpha}{2}}\ <\ Z_0 = \frac{\overline{X}\ -\ \overline{Y}\ -\ 0}{\sqrt{\frac{\widehat{p}(1- \widehat{p})}{n_x} + \frac{\widehat{p}(1-\widehat{p})}{n_y}}}\ < Z_{\frac{\alpha}{2}}$
$-\ Z_{\frac{\alpha}{2}} \sqrt{\frac{\widehat{p}(1- \widehat{p})}{n_x} + \frac{\widehat{p}(1-\widehat{p})}{n_y}}\ <\ \overline{X}\ -\ \overline{Y} < Z_{\frac{\alpha}{2}} \sqrt{\frac{\widehat{p}(1- \widehat{p})}{n_x} + \frac{\widehat{p}(1-\widehat{p})}{n_y}}$
P_value $ = 2 \times P(Z \geq |Z_0|) > \alpha$
B. One-sided tests:
Right-tailed test:
Suppose that $X_1, X_2, ..., X_{n_x}$ is a sample of size $n_x$ (large) from a Bernoulli distribution having an unknown parameter $P_x$ and $Y_1, Y_2, ..., Y_{n_y}$ is a sample of size $n_y$ (large) from a Bernoulli distribution having an unknown parameter $P_y$.
$X_1, X_2, ..., X_{n_x} \sim Ber(P_x)$
$Y_1, Y_2, ..., Y_{n_y} \sim Ber(P_y)$
$H_0 : P_x = P_y \quad or \quad P_x \leq P_y$
$H_1 : P_x > P_y$
$\\ $
Testing statistics:
$ Z_0 \equiv \frac{\overline{X}\ -\ \overline{Y}\ -\ 0}{\sqrt{\frac{\widehat{p}(1- \widehat{p})}{n_x} + \frac{\widehat{p}(1-\widehat{p})}{n_y}}}$
$\widehat{p} = \frac{n_x \overline{X}\ +\ n_y \overline{Y}}{n_x\ +\ n_y}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$-\ \infty\ <\ Z_0 = \frac{\overline{X}\ -\ \overline{Y}\ -\ 0}{\sqrt{\frac{\widehat{p}(1- \widehat{p})}{n_x} + \frac{\widehat{p}(1-\widehat{p})}{n_y}}}\ < Z_{\alpha}$
$-\ \infty\ <\ \overline{X}\ -\ \overline{Y} < Z_{\alpha} \sqrt{\frac{\widehat{p}(1- \widehat{p})}{n_x} + \frac{\widehat{p}(1-\widehat{p})}{n_y}}$
P_value $ = P(Z \geq Z_0) > \alpha$
Left-tailed test:
Suppose that $X_1, X_2, ..., X_{n_x}$ is a sample of size $n_x$ (large) from a Bernoulli distribution having an unknown parameter $P_x$ and $Y_1, Y_2, ..., Y_{n_y}$ is a sample of size $n_y$ (large) from a Bernoulli distribution having an unknown parameter $P_y$.
$X_1, X_2, ..., X_{n_x} \sim Ber(P_x)$
$Y_1, Y_2, ..., Y_{n_y} \sim Ber(P_y)$
$H_0 : P_x = P_y \quad or \quad P_x \geq P_y$
$H_1 : P_x < P_y$
$\\ $
Testing statistics:
$ Z_0 \equiv \frac{\overline{X}\ -\ \overline{Y}\ -\ 0}{\sqrt{\frac{\widehat{p}(1- \widehat{p})}{n_x} + \frac{\widehat{p}(1-\widehat{p})}{n_y}}}$
$\widehat{p} = \frac{n_x \overline{X}\ +\ n_y \overline{Y}}{n_x\ +\ n_y}$
$\\ $
Significance level = $\alpha$
We accept $H_0$ if:
$-\ Z_{\alpha}\ <\ Z_0 = \frac{\overline{X}\ -\ \overline{Y}\ -\ 0}{\sqrt{\frac{\widehat{p}(1- \widehat{p})}{n_x} + \frac{\widehat{p}(1-\widehat{p})}{n_y}}}\ < \infty$
$-\ Z_{\alpha} \sqrt{\frac{\widehat{p}(1- \widehat{p})}{n_x} + \frac{\widehat{p}(1-\widehat{p})}{n_y}} <\ \overline{X}\ -\ \overline{Y} < \infty$
P_value $ = P(Z \leq Z_0) > \alpha$
class p_bernoulli_two_populations:
    """
    Parameters
    ----------
    null_p : hypothesized difference in proportions (0)
    n1 : optional, number of data1 members
    n2 : optional, number of data2 members
    alpha : significance level
    type_t : 'two_tailed', 'right_tailed', 'left_tailed'
    Sample_mean1 : optional, sample1 proportion
    Sample_mean2 : optional, sample2 proportion
    data1 : optional, if you do not know Sample_mean1, just pass the data
    data2 : optional, if you do not know Sample_mean2, just pass the data
    """
    def __init__(self, null_p, alpha, type_t, n1=0., n2=0., Sample_mean1=0., Sample_mean2=0., data1=None, data2=None):
        self.Sample_mean1 = Sample_mean1
        self.Sample_mean2 = Sample_mean2
        self.null_p = null_p
        self.type_t = type_t
        self.n1 = n1
        self.n2 = n2
        self.alpha = alpha
        self.data1 = data1
        self.data2 = data2
        if data1 is not None:
            self.Sample_mean1 = np.mean(list(data1))
            self.n1 = len(list(data1))
        if data2 is not None:
            self.Sample_mean2 = np.mean(list(data2))
            self.n2 = len(list(data2))
        p_bernoulli_two_populations.__test(self)

    def __test(self):
        p_hat = ((self.n1 * self.Sample_mean1) + (self.n2 * self.Sample_mean2)) / (self.n1 + self.n2)
        test_statistic = (self.Sample_mean1 - self.Sample_mean2) / np.sqrt((p_hat*(1-p_hat)/self.n1) + (p_hat*(1-p_hat)/self.n2))
        print('test_statistic:', test_statistic, '\n')
        if self.type_t == 'two_tailed':
            p_value = 2*(1-norm.cdf(abs(test_statistic)))
            print('P_value:', p_value, '\n')
        elif self.type_t == 'left_tailed':
            p_value = norm.cdf(test_statistic)
            print('P_value:', p_value, '\n')
        else:
            p_value = 1-norm.cdf(test_statistic)
            print('P_value:', p_value, '\n')
        if p_value < self.alpha:
            print(f'Since p_value < {self.alpha}, reject null hypothesis.')
        else:
            print(f'Since p_value > {self.alpha}, the null hypothesis cannot be rejected.')
p_bernoulli_two_populations(null_p = 0, n1 = 200, n2 = 200, alpha = 0.05, Sample_mean1 = 150/200,
Sample_mean2 = 170/200, type_t = 'two_tailed');
test_statistic: -2.4999999999999996 P_value: 0.012419330651552318 Since p_value < 0.05, reject null hypothesis.
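Because $Z_0^2$ follows a $\chi^2$ distribution with one degree of freedom, the two-tailed test above is equivalent to a chi-square test on the 2x2 contingency table of successes and failures; chi2_contingency (already imported) reproduces it when Yates' continuity correction is disabled:

```python
from scipy.stats import chi2_contingency

# Successes/failures: 150 of 200 in group 1 vs 170 of 200 in group 2
table = [[150, 50], [170, 30]]
chi2_stat, p_value, dof, expected = chi2_contingency(table, correction=False)
print(chi2_stat, p_value)  # chi2_stat equals Z0**2 = 6.25
```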