Project: Mobile App for Lottery Addiction

Introduction

Many people start playing the lottery for fun, but for some this activity turns into a habit which eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending from their savings and loans, they start to accumulate debts, and eventually engage in desperate behaviors like theft.

A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.

For the first version of the app, they want us to focus on the 6/49 lottery and build functions that enable users to answer questions like:

  • What is the probability of winning the big prize with a single ticket?
  • What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
  • What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. The data set has data for 3,665 drawings, dating from 1982 to 2018.

The scenario we're following throughout this project is fictional — the main purpose is to practice applying probability and combinatorics (permutations and combinations) concepts in a setting that simulates a real-world scenario.

Main Functions

Throughout the project, we'll need to calculate probabilities and combinations repeatedly. We'll therefore start by writing two functions that we'll use often:

  • A function that calculates factorials; and
  • A function that calculates combinations.
In [1]:
def factorial(n):
    # Base case: 0! = 1
    if n == 0:
        return 1
    # Recursive case: n! = n * (n-1)!
    return n * factorial(n-1)


def combinations(n,k):
    # Ordered selections of k items out of n: n! / (n-k)!
    permutations = factorial(n) // factorial(n-k)
    # Dividing by k! discards the ordering; the division is exact, so // keeps the result an int
    return permutations // factorial(k)
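
As a quick sanity check, helpers like these can be compared with the standard library's `math.factorial` and `math.comb` (the latter is available from Python 3.8). The sketch below is only an illustration and not part of the original notebook: the names `factorial_iter` and `combinations_int` are hypothetical, and it assumes integer division is acceptable so that the result stays an exact `int`.

```python
import math


def factorial_iter(n):
    """Iterative factorial (hypothetical helper that avoids recursion)."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def combinations_int(n, k):
    """Number of ways to choose k items out of n, as an exact integer."""
    return factorial_iter(n) // (factorial_iter(k) * factorial_iter(n - k))


# Compare against the standard library implementations
assert factorial_iter(10) == math.factorial(10)
assert combinations_int(49, 6) == math.comb(49, 6)
print(combinations_int(49, 6))  # 13983816
```

An iterative factorial also sidesteps Python's recursion limit, which only matters for inputs far larger than 49.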

One ticket probability

We need to write a function that takes six unique numbers and prints the probability of winning the big prize.

The engineering team told us that we need to be aware of the following details when we write the function:

  • Inside the app, the user inputs six different numbers from 1 to 49.
  • Under the hood, the six numbers will come as a Python list and serve as an input to our function.
  • The engineering team wants the function to print the probability value in a friendly way — in a way that people without any probability training are able to understand.

Check ticket validity

We are going to write an interactive function to make sure that the user inputs six different numbers from 1 to 49.

We will use a 'try/except' block together with a 'while' loop so that users can retry until the input satisfies these conditions, and so that invalid input (for example, a non-integer value) does not crash the function.

The function prints a message that depends on what the user enters, and the list it returns serves as the input to the function 'one_ticket_probability'.

In [2]:
def check_validity():
    print("Please enter your 6 ticket numbers: ")
    print("*********************************") # output delimiter
    numbers=[]
    while len(numbers) < 6:
        try:
            number=input('Enter ticket number {}: '.format(len(numbers)+1))
            print("*********************************")
            # int() raises ValueError for non-integer input such as 'abc' or '3.5'
            number=int(number)
            if number not in range(1,50):
                print("The number must be in the range from 1 to 49.")
                print("*********************************")
            elif number in numbers:
                print("The number exists already.")
                print("*********************************")
            else:
                numbers.append(number)
        # Catch only conversion errors so that Ctrl+C can still stop the loop
        except ValueError:
            print("The input is not valid.")
            print("*********************************")
    return numbers
In [3]:
check_validity()
Please enter your 6 ticket numbers: 
*********************************
Enter ticket number 1: 1
*********************************
Enter ticket number 2: 2
*********************************
Enter ticket number 3: 3
*********************************
Enter ticket number 4: 4
*********************************
Enter ticket number 5: 5
*********************************
Enter ticket number 6: 6
*********************************
Out[3]:
[1, 2, 3, 4, 5, 6]
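
Because the function above reads from the keyboard, it is awkward to test automatically. One possible complement, sketched here under the assumption that the app passes the ticket as a Python list, is a pure function that applies the same rules and reports what is wrong. The name `validate_ticket` and its messages are hypothetical and do not come from the engineering team.

```python
def validate_ticket(numbers):
    """Return a list of problems found in a ticket; an empty list means valid.

    Hypothetical non-interactive counterpart to check_validity.
    """
    problems = []
    if len(numbers) != 6:
        problems.append("A ticket must contain exactly 6 numbers.")
    # bool is a subclass of int in Python, so it is excluded explicitly
    if any(isinstance(n, bool) or not isinstance(n, int) for n in numbers):
        problems.append("Every number must be an integer.")
        return problems  # the range check below needs integers
    if any(n < 1 or n > 49 for n in numbers):
        problems.append("Every number must be in the range from 1 to 49.")
    if len(set(numbers)) != len(numbers):
        problems.append("The numbers must all be different.")
    return problems


print(validate_ticket([1, 2, 3, 4, 5, 6]))   # []
print(validate_ticket([1, 1, 3, 4, 5, 60]))  # two problems: range and duplicates
```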

Below, we write the one_ticket_probability function, which calls check_validity to collect the user's ticket and prints the probability of winning the big prize as a percentage.

In [4]:
def one_ticket_probability():
    ticket = check_validity()
    possible_outcomes = combinations(49,6)
    successful_outcomes = 1
    chances = successful_outcomes * 100 / possible_outcomes
    print("Your chances to win the big prize is {:.8f}%. In other words, you have a 1 in 13,983,816 chances to win.".format(chances))
    print("*********************************")
    return ticket,chances   
In [5]:
one_ticket_probability()
Please enter your 6 ticket numbers: 
*********************************
Enter ticket number 1: 10
*********************************
Enter ticket number 2: 11
*********************************
Enter ticket number 3: 12
*********************************
Enter ticket number 4: 15
*********************************
Enter ticket number 5: 45
*********************************
Enter ticket number 6: 49
*********************************
Your chances to win the big prize is 0.00000715%. In other words, you have a 1 in 13,983,816 chances to win.
*********************************
Out[5]:
([10, 11, 12, 15, 45, 49], 7.151123842018516e-06)
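
The figure printed above does not depend on which six numbers are chosen, because every ticket is one outcome out of all possible combinations of 6 numbers from 49. A minimal sketch of that calculation without user input, assuming Python 3.8+ for `math.comb`, could look as follows; the function name `jackpot_chance` and its parameters are hypothetical. Deriving the "1 in N" figure from the computed total avoids hard-coding it in the message.

```python
import math


def jackpot_chance(pool=49, picks=6):
    """Return (chance in percent, number of possible tickets) for one ticket."""
    total = math.comb(pool, picks)
    return 100 / total, total


chance, total = jackpot_chance()
# The "1 in N" figure is formatted from the computed total
print("{:.8f}% (1 in {:,})".format(chance, total))  # 0.00000715% (1 in 13,983,816)
```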

Exploring the Canada lottery data set

The institute also wants us to consider the data coming from the national 6/49 lottery game in Canada. The data set contains historical data for 3,665 drawings, dating from 1982 to 2018 (the data set can be downloaded from here).

In [6]:
import pandas as pd
import numpy as np
data = pd.read_csv('649.csv')
print('The data set contains {} rows and {} columns.'.format(data.shape[0],data.shape[1]))
The data set contains 3665 rows and 11 columns.
In [7]:
data.head(5)
Out[7]:
   PRODUCT  DRAW NUMBER  SEQUENCE NUMBER  DRAW DATE  NUMBER DRAWN 1  NUMBER DRAWN 2  NUMBER DRAWN 3  NUMBER DRAWN 4  NUMBER DRAWN 5  NUMBER DRAWN 6  BONUS NUMBER
0      649            1                0  6/12/1982               3              11              12              14              41              43            13
1      649            2                0  6/19/1982               8              33              36              37              39              41             9
2      649            3                0  6/26/1982               1               6              23              24              27              39            34
3      649            4                0   7/3/1982               3               9              10              13              20              43            34
4      649            5                0  7/10/1982               5              14              21              31              34              47            45
In [8]:
data.tail(5)
Out[8]:
      PRODUCT  DRAW NUMBER  SEQUENCE NUMBER  DRAW DATE  NUMBER DRAWN 1  NUMBER DRAWN 2  NUMBER DRAWN 3  NUMBER DRAWN 4  NUMBER DRAWN 5  NUMBER DRAWN 6  BONUS NUMBER
3660      649         3587                0   6/6/2018              10              15              23              38              40              41            35
3661      649         3588                0   6/9/2018              19              25              31              36              46              47            26
3662      649         3589                0  6/13/2018               6              22              24              31              32              34            16
3663      649         3590                0  6/16/2018               2              15              21              31              38              49             8
3664      649         3591                0  6/20/2018              14              24              31              35              37              48            17
In [9]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
PRODUCT            3665 non-null int64
DRAW NUMBER        3665 non-null int64
SEQUENCE NUMBER    3665 non-null int64
DRAW DATE          3665 non-null object
NUMBER DRAWN 1     3665 non-null int64
NUMBER DRAWN 2     3665 non-null int64
NUMBER DRAWN 3     3665 non-null int64
NUMBER DRAWN 4     3665 non-null int64
NUMBER DRAWN 5     3665 non-null int64
NUMBER DRAWN 6     3665 non-null int64
BONUS NUMBER       3665 non-null int64
dtypes: int64(10), object(1)
memory usage: 315.0+ KB

Function for Historical Data Check

We're going to write a function that will enable users to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.

The engineering team told us that we need to be aware of the following details:

  • Inside the app, the user inputs six different numbers from 1 to 49.
  • Under the hood, the six numbers will come as a Python list and serve as an input to our function.

The engineering team wants us to write a function that prints:

  • the number of times the combination selected occurred in the Canada data set; and
  • the probability of winning the big prize in the next drawing with that combination.

First, we need to extract the six winning numbers of every drawing in the historical data set. For that purpose, we are going to write a function named extract_numbers that takes a row of the lottery dataframe as input and returns a set containing the six winning numbers. We will then use extract_numbers together with the DataFrame.apply() method to extract the winning numbers of all drawings.

In [10]:
def extract_numbers(row):
    winning_six = set()
    for i in range(4,10):
        winning_six.add(row.iloc[i])
    return winning_six  
In [11]:
data['winning_six'] = data.apply(extract_numbers,axis=1)
data.winning_six.head()
Out[11]:
0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
Name: winning_six, dtype: object

Below, we write the check_historical_occurrence function. It calls one_ticket_probability (which already collects the user's numbers), compares the ticket with the historical winning numbers, and prints the number of occurrences and the probability of winning in the next drawing.

In [12]:
def check_historical_occurrence(winning_six=data.winning_six):
    ticket=one_ticket_probability()
    numbers = set(ticket[0])
    # Compare the ticket with every historical draw passed in as winning_six
    matches = winning_six == numbers
    print("The combination {} has occurred {} time(s) previously.".format(numbers,matches.sum()))
    print("*********************************")
    if matches.sum()==0:
        # Use the computed probability instead of a hard-coded value
        print("That combination has never occurred. This doesn't mean it's more likely to occur now.\nYour chances to win the big prize in the next drawing are still {:.8f} %.".format(ticket[1]))
    else:
        print('Your chances to win the big prize in the next drawing with that combination are still {:.8f} %.'.format(ticket[1]))
    print("*********************************")
In [13]:
check_historical_occurrence()
Please enter your 6 ticket numbers: 
*********************************
Enter ticket number 1: 34
*********************************
Enter ticket number 2: 5
*********************************
Enter ticket number 3: 14
*********************************
Enter ticket number 4: 47
*********************************
Enter ticket number 5: 21
*********************************
Enter ticket number 6: 31
*********************************
Your chances to win the big prize is 0.00000715%. In other words, you have a 1 in 13,983,816 chances to win.
*********************************
The combination {34, 5, 14, 47, 21, 31} has occurred 1 time(s) previously.
*********************************
Your chances to win the big prize in the next drawing with that combination are still 0.00000715 %.
*********************************
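
The occurrence count itself does not require pandas. As a minimal sketch, the same lookup can be expressed with `collections.Counter` and `frozenset`, using only the first five drawings shown by `data.head()` earlier as sample data. The names `sample_draws`, `draw_counts` and `count_occurrences` are hypothetical.

```python
from collections import Counter

# Main numbers of the first five drawings, copied from the data.head() output
sample_draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
    [3, 9, 10, 13, 20, 43],
    [5, 14, 21, 31, 34, 47],
]

# frozenset is hashable and ignores order, so each draw can be a Counter key
draw_counts = Counter(frozenset(draw) for draw in sample_draws)


def count_occurrences(ticket, counts=draw_counts):
    """Number of past draws whose six numbers equal the ticket, in any order."""
    return counts[frozenset(ticket)]  # a Counter returns 0 for unseen keys


print(count_occurrences([34, 5, 14, 47, 21, 31]))  # 1
print(count_occurrences([1, 2, 3, 4, 5, 6]))       # 0
```

Building the counter once makes each later lookup a single dictionary access, which may matter if the app checks many tickets against the full history.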

Multi ticket probability

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning, so we're going to write a function that lets users calculate the chances of winning for any number of different tickets.

We've talked with the engineering team and they gave us the following information:

  • The user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play).
  • Our function will receive an integer between 1 and 13,983,816 (the maximum number of different tickets).
  • The function should print information about the probability of winning the big prize depending on the number of different tickets played.

We are going to write a function named multi_ticket_probability that prints the probability of winning the big prize depending on the number of different tickets played.

In [14]:
def multi_ticket_probability():
    possible_outcomes = combinations(49,6)
    while True:
        n=input('How many different tickets are you going to play: ')
        print('*********************************')
        try:
            if int(n) in range(1,13983817):
                chances = int(n)*100 / possible_outcomes
                print("Your chances to win the big prize by playing {} ticket(s) are {:.10f} %.".format(n,chances))
                print('*********************************')
                break
            else:
                print('Please enter a valid and reasonable number of tickets.')
                print('*********************************')
        except ValueError:
            print('Invalid number of tickets')
            print('*********************************')
In [15]:
multi_ticket_probability()
How many different tickets are you going to play: 910
*********************************
Your chances to win the big prize by playing 910 ticket(s) are 0.0065075227 %.
*********************************
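
Since every ticket is different, n tickets cover n of the 13,983,816 possible outcomes, so the chance grows linearly with n. A minimal sketch of that relationship as a pure function, assuming Python 3.8+ for `math.comb`, is shown below; the names `TOTAL_TICKETS` and `multi_ticket_chance` are hypothetical.

```python
import math

TOTAL_TICKETS = math.comb(49, 6)  # 13,983,816 different tickets exist


def multi_ticket_chance(n_tickets):
    """Chance (in percent) of winning the big prize with n different tickets."""
    if not 1 <= n_tickets <= TOTAL_TICKETS:
        raise ValueError("n_tickets must be between 1 and {:,}".format(TOTAL_TICKETS))
    return n_tickets * 100 / TOTAL_TICKETS


for n in (1, 910, 1_000_000, TOTAL_TICKETS):
    print("{:>10,} ticket(s): {:.10f} %".format(n, multi_ticket_chance(n)))
```

Only buying all 13,983,816 different tickets brings the chance to 100 %.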

Fewer winning numbers probability

In most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, users might be interested in knowing the probability of having two, three, four, or five winning numbers.

These are the engineering details we'll need to be aware of:

  • Inside the app, the user inputs:
    • six different numbers from 1 to 49; and
    • an integer between 2 and 5 that represents the number of winning numbers expected
  • Our function prints information about the probability of having the inputted number of winning numbers.
In [16]:
def probability_less_6():
    print("Calculate the probability of having two, three, four or five winning numbers.")
    print('*********************************')
    while True:  
        n = input('Enter a number between 2 and 5: ')
        print('*********************************')
        try : 
            if int(n) in [2,3,4,5]:
                combination_ticket = combinations(6,int(n)) 
                successful_outcomes = combination_ticket * combinations(49-6,6-int(n))
                possible_outcomes = combinations(49,6)
                proba_less_6 = successful_outcomes*100/possible_outcomes
                print('Your chances to have exactly {} winning numbers are {} %'.format(n,'{:.10f}'.format(proba_less_6)))
                print('*********************************')
                break
            else:
                print('This number is out of range.')
                print('*********************************')
        except ValueError:
            print('Invalid input')
            print('*********************************')
In [17]:
probability_less_6()
Calculate the probability of having two, three, four or five winning numbers.
*********************************
Enter a number between 2 and 5: 910
*********************************
This number is out of range.
*********************************
Enter a number between 2 and 5: 5
*********************************
Your chances to have exactly 5 winning numbers are 0.0018449900 %
*********************************
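
The calculation in probability_less_6 counts the tickets that share exactly k numbers with the draw: choose k of the 6 winning numbers, then choose the remaining 6 - k numbers from the 43 non-winning ones. A minimal non-interactive sketch of that counting argument, assuming Python 3.8+ for `math.comb`, follows; the function name `exactly_k_chance` and its parameters are hypothetical.

```python
import math


def exactly_k_chance(k, pool=49, picks=6):
    """Chance (in percent) that a ticket matches exactly k of the drawn numbers."""
    # k matches among the drawn numbers, picks - k among the non-drawn ones
    successful = math.comb(picks, k) * math.comb(pool - picks, picks - k)
    return successful * 100 / math.comb(pool, picks)


for k in range(2, 6):
    print("exactly {}: {:.10f} %".format(k, exactly_k_chance(k)))

# The cases k = 0..6 cover every possible ticket, so they add up to 100 %
print(round(sum(exactly_k_chance(k) for k in range(7)), 6))  # 100.0
```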

At least 2, 3, 4 or 5 winning numbers

We will create a function similar to probability_less_6 that calculates the probability of having at least two, three, four or five winning numbers. For instance, the probability of having at least four winning numbers is the sum of these three probabilities:

  • The probability of having exactly four winning numbers
  • The probability of having exactly five winning numbers
  • The probability of having exactly six winning numbers
In [18]:
def probability_at_least():
    print("Calculate the probability of at least two, three, four or five winning numbers.")
    print('*********************************')
    while True:  
        n = input('Enter a number between 2 and 5: ')
        print('*********************************')
        try : 
            if int(n) in [2,3,4,5]:
                possible_outcomes = combinations(49,6)
                chances_at_least=[]
                # "At least n" means exactly n, n+1, ..., 6 winning numbers
                for i in range(int(n),7):
                    ticket_combinations = combinations(6,i)
                    successful_outcomes = ticket_combinations * combinations(43,6-i)
                    at_least = successful_outcomes *100 / possible_outcomes
                    chances_at_least.append(at_least)
                print('Your chances to have at least {} winning numbers are {:.10f} %'.format(n,sum(chances_at_least)))
                print('*********************************')
                break
            else:
                print('This number is out of range.')
                print('*********************************')
        except ValueError: 
            print('Invalid number')
            print('*********************************')
In [19]:
probability_at_least()
Calculate the probability of at least two, three, four or five winning numbers.
*********************************
Enter a number between 2 and 5: 44
*********************************
This number is out of range.
*********************************
Enter a number between 2 and 5: 0
*********************************
This number is out of range.
*********************************
Enter a number between 2 and 5: 5
*********************************
Your chances to have at least 5 winning numbers are 0.0018521411 %
*********************************
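
The "at least" probability is the sum of the "exactly" probabilities from the requested count up to six. A minimal non-interactive sketch of that sum, assuming Python 3.8+ for `math.comb`, is given below; the function name `at_least_k_chance` and its parameters are hypothetical.

```python
import math


def at_least_k_chance(k, pool=49, picks=6):
    """Chance (in percent) that a ticket matches at least k of the drawn numbers."""
    # Add up the tickets with exactly k, k + 1, ..., picks matches
    successful = sum(
        math.comb(picks, i) * math.comb(pool - picks, picks - i)
        for i in range(k, picks + 1)
    )
    return successful * 100 / math.comb(pool, picks)


print("{:.10f} %".format(at_least_k_chance(5)))  # 0.0018521411 %
```

For five matches, 258 tickets match exactly five numbers and 1 ticket matches all six, so 259 of the 13,983,816 possible tickets qualify.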

Conclusion

For the first version of the app, we coded five main functions using interactive inputs:

  • one_ticket_probability(): calculates the probability of winning the big prize with a single ticket
  • check_historical_occurrence(): prints both the probability of winning and the number of historical occurrences in the Canada lottery data set
  • multi_ticket_probability(): calculates the probability for any number of tickets between 1 and 13,983,816
  • probability_less_6(): calculates the probability of having exactly two, three, four or five winning numbers
  • probability_at_least(): calculates the probability of having at least two, three, four or five winning numbers