# Benford's Law

## Purpose

This notebook takes an iterable (assumed to contain numbers) and plots the frequency of the leading digits of its values. According to Benford's Law (also called the first-digit law), a "natural dataset" should show the following distribution of leading digits:

| d | P(d) |
|---|-------|
| 1 | 30.1% |
| 2 | 17.6% |
| 3 | 12.5% |
| 4 | 9.7% |
| 5 | 7.9% |
| 6 | 6.7% |
| 7 | 5.8% |
| 8 | 5.1% |
| 9 | 4.6% |
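The percentages in the table follow from the standard statement of Benford's Law, P(d) = log10(1 + 1/d). As a minimal sketch, they can be reproduced as below; the function name `benford_expected` is chosen here for illustration and is not part of the notebook.

```python
import math

def benford_expected():
    """Return {digit: probability} for leading digits 1..9 under Benford's Law."""
    # P(d) = log10(1 + 1/d); the nine probabilities sum to 1.
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

expected = benford_expected()
for d, p in expected.items():
    print(f"{d}  {p:.1%}")
```

Rounded to one decimal place, these values match the table.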

## Application

In data science, departures from this pattern are used to flag possible fraud, primarily in tax data. The pattern can also be used to detect deepfakes or altered images.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

In [2]:
world = pd.read_csv('world_population_data.csv')
world.head()

Out[2]:
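|   | Country | Population_2020 | Yearly_Change | Net_Change | Density | Land_Area | Migrants | Fert_Rate | Med_Age | Urban_Pop | World_Share |
|---|---------|-----------------|---------------|------------|---------|-----------|----------|-----------|---------|-----------|-------------|
| 0 | China | 1,439,323,776 | 0.39% | 5,540,090 | 153 | 9,388,211 | -348,399 | 1.7 | 38 | 61% | 18.47% |
| 1 | India | 1,380,004,385 | 0.99% | 13,586,631 | 464 | 2,973,190 | -532,687 | 2.2 | 28 | 35% | 17.70% |
| 2 | United States | 331,002,651 | 0.59% | 1,937,734 | 36 | 9,147,420 | 954,806 | 1.8 | 38 | 83% | 4.25% |
| 3 | Indonesia | 273,523,615 | 1.07% | 2,898,047 | 151 | 1,811,570 | -98,955 | 2.3 | 30 | 56% | 3.51% |
| 4 | Pakistan | 220,892,340 | 2.00% | 4,327,022 | 287 | 770,880 | -233,379 | 3.6 | 23 | 35% | 2.83% |

In [3]: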
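def digit_widget(values):
    # Collect the leading (first nonzero) digit of each value.
    number_stash = []
    for num in values:
        # str() handles numbers as well as strings such as '1,439,323,776' or '-348,399'.
        # Assumption: signs, separators, and leading zeros are skipped.
        digit = next((ch for ch in str(num) if ch in '123456789'), None)
        if digit is None:
            continue  # no nonzero digit (e.g. 0 or NaN)
        number_stash.append(int(digit))
    fig, ax = plt.subplots()
    ax.set_xticks(range(1, 10))
    ax.set_yticks([0.10, 0.20, 0.30])
    # Unit-width bins centered on 1..9, so density=True gives each digit's relative frequency.
    ax.hist(number_stash, bins=np.arange(0.5, 10.5), density=True)
    plt.show()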

In [4]:
digit_widget(world['Population_2020'])

In [5]:
digit_widget(world['Migrants'])

In [6]:
digit_widget(world['Net_Change'])
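To judge the histograms numerically rather than by eye, the observed share of each leading digit can be placed next to its expected share under Benford's Law. The sketch below is one possible way to do that. The helper names `leading_digit` and `compare_to_benford` are hypothetical and not part of the notebook. It assumes values may arrive as formatted strings (as in `Out[2]`) and that the leading digit is the first nonzero digit. The sample uses only the five `Population_2020` values shown in `Out[2]`, which is far too few to draw any conclusion from.

```python
import math
from collections import Counter

def leading_digit(value):
    """Return the first nonzero digit of value as an int, or None if there is none."""
    for ch in str(value):
        if ch in "123456789":
            return int(ch)
    return None

def compare_to_benford(values):
    """Return {digit: (observed_share, expected_share)} for digits 1..9."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    counts = Counter(digits)
    total = len(digits)
    return {
        d: (counts[d] / total if total else 0.0, math.log10(1 + 1 / d))
        for d in range(1, 10)
    }

# Illustrative input only: the five Population_2020 values shown in Out[2].
sample = ["1,439,323,776", "1,380,004,385", "331,002,651", "273,523,615", "220,892,340"]
comparison = compare_to_benford(sample)
for d, (obs, exp) in comparison.items():
    print(f"{d}  observed {obs:.1%}  expected {exp:.1%}")
```

With the full column, `compare_to_benford(world['Population_2020'])` would give the same kind of table for every row in the dataset.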