Berdasarkan isu #124: anfrek: Log Normal 2 Parameter

Referensi Isu:

Soetopo, W., Montarcih, L., Press, U. B., & Media, U. (2017). Rekayasa Statistika untuk Teknik Pengairan. Universitas Brawijaya Press. https://books.google.co.id/books?id=TzVTDwAAQBAJ
Soewarno. (1995). hidrologi: Aplikasi Metode Statistik untuk Analisa Data.NOVA.

Deskripsi Isu:

Mencari nilai ekstrim dengan kala ulang tertentu. Penerapan ini bisa digunakan untuk hujan rancangan atau debit banjir rancangan.

Diskusi Isu:

#156 - Bagaimana menghitung periode ulang distribusi (analisis frekuensi) tanpa melihat tabel?

Strategi:

Akan mengikuti fungsi log pearson #126 seperti pada manual.

PERSIAPAN DAN DATASET¶

In [1]:

import numpy as np
import pandas as pd
from scipy import stats

In [2]:

# contoh data diambil dari buku
# hidrologi: Aplikasi Metode Statistik untuk Analisa Data hal. 144

_DEBIT = [
    58.3, 50.5, 46.0, 41.8, 38.2, 37.9, 37.7, 35.3, 35.2, 33.4, 31.9, 
    31.1, 30.9, 30.1, 28.8, 24.7, 23.6, 23.5, 23.1, 22.5, 21.1, 20.5, 
    20.5, 20.3, 20.2, 18.7, 17.2, 14.9, 12.4, 11.8
]

data = pd.DataFrame(
    data=_DEBIT, columns=['debit'], index=range(1, 31)
)

data

Out[2]:

	debit
1	58.3
2	50.5
3	46.0
4	41.8
5	38.2
6	37.9
7	37.7
8	35.3
9	35.2
10	33.4
11	31.9
12	31.1
13	30.9
14	30.1
15	28.8
16	24.7
17	23.6
18	23.5
19	23.1
20	22.5
21	21.1
22	20.5
23	20.5
24	20.3
25	20.2
26	18.7
27	17.2
28	14.9
29	12.4
30	11.8

TABEL¶

Terdapat 1 tabel untuk modul hk124 yaitu:

t_normal_sw: Tabel nilai $k$ dari Tabel 3.3 Nilai Variabel Reduksi Gauss. Sumber: hidrologi: Aplikasi Metode Statistik untuk Analisa Data.

Dalam modul hk124 nilai $k$ akan dibangkitkan menggunakan scipy secara default. Mohon diperhatikan jika ingin menggunakan nilai $k$ yang berasal dari sumber lain.

In [3]:

# Tabel Nilai Variabel Reduksi Gauss
# Dari buku hidrologi: Aplikasi Metode Statistik untuk Analisa Data. hal.119

# KODE: SW

_DATA_SW = [
    [1.001, 0.999, -3.050],
    [1.005, 0.995, -2.580],
    [1.010, 0.990, -2.330],
    [1.050, 0.950, -1.640],
    [1.110, 0.900, -1.280],
    [1.250, 0.800, -0.840],
    [1.330, 0.750, -0.670],
    [1.430, 0.700, -0.520],
    [1.670, 0.600, -0.250],
    [2.000, 0.500, 0.000],
    [2.500, 0.400, 0.250],
    [3.330, 0.300, 0.520],
    [4.000, 0.250, 0.670],
    [5.000, 0.200, 0.840],
    [10.000, 0.100, 1.280],
    [20.000, 0.050, 1.640],
    [50.000, 0.200, 2.050],
    [100.000, 0.010, 2.330],
    [200.000, 0.005, 2.580],
    [500.000, 0.002, 2.880],
    [1000.000, 0.001, 3.090],
]

_COL_SW = ['periode_ulang', 'peluang', 'k']

t_normal_sw = pd.DataFrame(
    data=_DATA_SW, columns=_COL_SW
)
t_normal_sw

Out[3]:

	periode_ulang	peluang	k
0	1.001	0.999	-3.05
1	1.005	0.995	-2.58
2	1.010	0.990	-2.33
3	1.050	0.950	-1.64
4	1.110	0.900	-1.28
5	1.250	0.800	-0.84
6	1.330	0.750	-0.67
7	1.430	0.700	-0.52
8	1.670	0.600	-0.25
9	2.000	0.500	0.00
10	2.500	0.400	0.25
11	3.330	0.300	0.52
12	4.000	0.250	0.67
13	5.000	0.200	0.84
14	10.000	0.100	1.28
15	20.000	0.050	1.64
16	50.000	0.200	2.05
17	100.000	0.010	2.33
18	200.000	0.005	2.58
19	500.000	0.002	2.88
20	1000.000	0.001	3.09

KODE¶

In [4]:

def _find_k_in_table(return_period, table):
    x = table.periode_ulang
    y = table.k
    return np.interp(return_period, x, y)

def _calc_prob_in_table(k, table):
    x = table.k
    y = table.peluang
    return np.interp(k, x, y)

In [5]:

def find_K(return_period, source='scipy'):
    if source.lower() == 'soewarno':
        return _find_k_in_table(return_period, t_normal_sw)
    elif source.lower() == 'scipy':
        return_period = np.array(return_period)
        return stats.norm.ppf(1 - 1/return_period)

def calc_x_lognormal(x, return_period=[5], source='scipy', show_stat=False):
    return_period = np.array(return_period)
    y = np.log10(x)
    y_mean = np.mean(y)
    y_std = np.std(y, ddof=1)
    n = len(y)

    k = find_K(return_period, source=source)

    if show_stat:
        print(f'y_mean = {y_mean:.5f}')
        print(f'y_std = {y_std:.5f}')
        print(f'k = {k}')

    val_y = y_mean + k * y_std
    val_x = np.power(10, val_y)
    return val_x

def freq_lognormal(
    df, col=None,
    return_period=[2, 5, 10, 20, 25, 50, 100], show_stat=False, source='scipy',
    col_name='Log Normal', index_name='Kala Ulang'):

    col = df.columns[0] if col is None else col

    x = df[col].copy()

    arr = calc_x_lognormal(
        x, return_period=return_period, show_stat=show_stat, source=source)

    result = pd.DataFrame(
        data=arr, index=return_period, columns=[col_name]
    )

    result.index.name = index_name
    return result

def calc_prob(k, source='scipy'):
    if source.lower() == 'soewarno':
        k = np.array(k)
        return 1 - _calc_prob_in_table(k, t_normal_sw)
    elif source.lower() == 'scipy':
        return stats.norm.cdf(k)

FUNGSI¶

Fungsi `find_K(return_period, ...)`¶

Function: find_K(return_period, source='scipy')

Fungsi find_K(...) digunakan untuk mencari nilai $K$ dari berbagai sumber berdasarkan kala ulang.

Argumen Posisi:
- return_period: kala ulang. Dapat diisi dengan scalar atau array_like.
Argumen Opsional:
- source: sumber nilai $k$, scipy (default). Sumber yang dapat digunakan antara lain: Soewarno ('soewarno').

In [6]:

find_K(10)

Out[6]:

1.2815515655446004

In [7]:

find_K([2, 5, 10], source='soewarno')

Out[7]:

array([0.  , 0.84, 1.28])

In [8]:

# perbandingan antara masing-masing sumber

_rp = [2, 5, 10, 15, 20, 25, 27, 50, 100]
source_test = ['soewarno', 'scipy']

for _source in source_test:
    print(f'k {_source:10}= {find_K(_rp, source=_source)}')

k soewarno  = [0.         0.84       1.28       1.46       1.64       1.70833333
 1.73566667 2.05       2.33      ]
k scipy     = [0.         0.84162123 1.28155157 1.50108595 1.64485363 1.75068607
 1.78615556 2.05374891 2.32634787]

Fungsi `calc_x_lognormal(x, ...)`¶

Function: calc_x_lognormal(x, return_period=[5], source='scipy', show_stat=False)

Fungsi calc_x_lognormal(...) digunakan untuk mencari besar $X$ berdasarkan kala ulang (return period), yang hasilnya dalam bentuk numpy.array.

Argumen Posisi:
- x: array.
Argumen Opsional:
- return_period: kala ulang (tahun), [5] (default).
- source: sumber nilai $k$, 'scipy' (default). Sumber yang dapat digunakan antara lain: Soewarno ('soewarno').
- show_stat: menampilkan parameter statistik. False (default).

In [9]:

calc_x_lognormal(data.debit)

Out[9]:

array([37.23254445])

In [10]:

calc_x_lognormal(data.debit, show_stat=True)

y_mean = 1.42638
y_std = 0.17174
k = [0.84162123]

Out[10]:

array([37.23254445])

In [11]:

calc_x_lognormal(data.debit, return_period=[5, 10, 15, 20, 21], show_stat=True)

y_mean = 1.42638
y_std = 0.17174
k = [0.84162123 1.28155157 1.50108595 1.64485363 1.66839119]

Out[11]:

array([37.23254445, 44.30748222, 48.32593502, 51.15300628, 51.63135738])

Fungsi `freq_lognormal(df, ...)`¶

Function: freq_lognormal(df, col=None, return_period=[2, 5, 10, 20, 25, 50, 100], show_stat=False, source='scipy', col_name='Log Normal')

Fungsi freq_lognormal(...) merupakan fungsi kembangan lebih lanjut dari calc_x_lognormal(...) yang menerima input pandas.DataFrame dan memiliki luaran berupa pandas.DataFrame.

Argumen Posisi:
- df: pandas.DataFrame.
Argumen Opsional:
- col: nama kolom, None (default). Jika tidak diisi menggunakan kolom pertama dalam df sebagai data masukan.
- return_period: kala ulang (tahun), [2, 5, 10, 20, 25, 50, 100] (default).
- source: sumber nilai $k$, 'scipy' (default). Sumber yang dapat digunakan antara lain: Soewarno ('soewarno').
- show_stat: menampilkan parameter statistik. False (default).
- col_name: nama kolom luaran, Log Normal (default).

In [12]:

freq_lognormal(data)

Out[12]:

	Log Normal
Kala Ulang
2	26.692012
5	37.232544
10	44.307482
20	51.153006
25	53.339262
50	60.130596
100	66.974904

In [13]:

freq_lognormal(data, source='soewarno', col_name='Log Normal (Soewarno)')

Out[13]:

	Log Normal (Soewarno)
Kala Ulang
2	26.692012
5	37.208682
10	44.280305
20	51.054919
25	52.453355
50	60.041518
100	67.071701

In [14]:

freq_lognormal(data, 'debit', source='scipy', col_name=f'Log Normal (scipy)', show_stat=True)

y_mean = 1.42638
y_std = 0.17174
k = [0.         0.84162123 1.28155157 1.64485363 1.75068607 2.05374891
 2.32634787]

Out[14]:

	Log Normal (scipy)
Kala Ulang
2	26.692012
5	37.232544
10	44.307482
20	51.153006
25	53.339262
50	60.130596
100	66.974904

In [15]:

_res = []

for _s in ['soewarno', 'scipy']:
    _res += [freq_lognormal(data, 'debit', source=_s, col_name=f'Log Normal ({_s})')]

pd.concat(_res, axis=1)

Out[15]:

	Log Normal (soewarno)	Log Normal (scipy)
Kala Ulang
2	26.692012	26.692012
5	37.208682	37.232544
10	44.280305	44.307482
20	51.054919	51.153006
25	52.453355	53.339262
50	60.041518	60.130596
100	67.071701	66.974904

Fungsi `calc_prob(k, ...)`¶

Function: calc_prob(k, source='scipy')

Fungsi calc_prob(...) digunakan untuk mencari nilai peluang/probabilitas dari berbagai sumber berdasarkan nilai $K$.

Argumen Posisi:
- k: $K$ (faktor frekuensi).
Argumen Opsional:
- source: sumber nilai peluang, scipy (default). Sumber yang dapat digunakan antara lain: Soewarno ('soewarno').

Catatan: Fungsi ini sama saja dengan yang di modul hk172 atau #172 (Anfrek: Normal).

In [16]:

calc_prob(-0.25)

Out[16]:

0.4012936743170763

In [17]:

calc_prob(0.52, source='soewarno')

Out[17]:

0.7

In [18]:

# perbandingan antara masing-masing sumber

_k = [
    -3.09, -2.58, -2.33, -1.67,  0.  ,  0.84,  1.28,  1.5 ,  1.64,
    1.75,  1.79,  2.05,  2.33
]

source_test = ['soewarno', 'scipy']

for _source in source_test:
    print(f'prob {_source:10}= {calc_prob(_k, source=_source)}')

prob soewarno  = [0.001      0.005      0.01       0.04826087 0.5        0.8
 0.9        0.93055556 0.95       0.9097561  0.89512195 0.8
 0.99      ]
prob scipy     = [0.00100078 0.00494002 0.00990308 0.04745968 0.5        0.79954581
 0.89972743 0.9331928  0.94949742 0.95994084 0.96327304 0.97981778
 0.99009692]

Changelog¶

- 20220323 - 1.1.0 - tambah argumen index_name="Kala Ulang" pada fungsi freq_lognormal() untuk penamaan index
- 20220316 - 1.0.3 - ubah fungsi _calc_prob(...) (hasil menjadi 1-P)
- 20220315 - 1.0.2 - ubah nama fungsi find_prob -> calc_prob(...)
- 20220314 - 1.0.1 - Tambah Fungsi find_prob(...)
- 20220310 - 1.0.0 - Initial

Source code in this notebook is licensed under a MIT License. Data in this notebook is licensed under a Creative Common Attribution 4.0 International.

PERSIAPAN DAN DATASET¶

TABEL¶

KODE¶

FUNGSI¶

Fungsi find_K(return_period, ...)¶

Fungsi calc_x_lognormal(x, ...)¶

Fungsi freq_lognormal(df, ...)¶

Fungsi calc_prob(k, ...)¶

Changelog¶

Copyright © 2022 Taruma Sakti Megariansyah¶

Fungsi `find_K(return_period, ...)`¶

Fungsi `calc_x_lognormal(x, ...)`¶

Fungsi `freq_lognormal(df, ...)`¶

Fungsi `calc_prob(k, ...)`¶

Copyright © 2022 Taruma Sakti Megariansyah ¶