{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"plt.style.use('fivethirtyeight')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Probability: Mathematical/Theoretical and Computational Approaches\n",
"\n",
"*Dan L. Nicolae* \n",
"\n",
"We will illustrate throughout this textbook how some problems can be solved either mathematically, or computationally (using simulations), or through a combination of mathematical/analytical and computational approaches. Each approach has its strengths and, in this chapter, we use the calculation versus estimation of probabilities to highlight some of them.\n",
"\n",
"We illustrate the concepts we want to introduce with a classic probability exercise called **the birthday problem**. Suppose you and a friend go to a party where there are 30 people (all unknown to both of you) and your friend wants to bet you that there are two people at that party who share their birthday. Would you be willing to take that bet? \n",
"\n",
"Your willingness to take the bet should depend on your chance of winning it. Which do you think is more likely: finding a pair with a shared birthday or having 30 distinct birthdays?\n",
"\n",
"We will answer this question using the language of probability: we will calculate the probability that at least two people in a group of 30 share a birthday. In the following sections, we will introduce the rules we need for deriving this probability and then show how to estimate it using simulations.\n",
"\n",
"Let's start with a simpler problem: what is the probability that two people share a birthday? Can you justify the following result?\n",
"\n",
"$$P(\\mbox{two random people have the same birthday}) ~=~ \\frac{1}{365}$$\n",
"\n",
"Think about the assumptions you implicitly or explicitly made in your justification. \n",
"\n",
"We will show in the next section that, for a group of $n$ people (with $2\\leq n\\leq 365$), the probability, $P_n$, that at least two share a birthday is given by:\n",
"\n",
"$$P_n ~=~ 1-\\frac{365\\times364\\times ...\\times (365-n+1)}{365^n}$$\n",
"\n",
"which can also be written as\n",
"\n",
"$$P_n ~=~ 1-\\frac{365}{365}\\times\\frac{364}{365}\\times\\frac{363}{365}\\times ...\\times \\frac{(365-n+1)}{365}$$\n",
"\n",
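"As a quick check of this formula, take $n=2$ and $n=3$; the first case recovers the $1/365$ result stated above (the decimal values are rounded):\n",
"\n",
"$$P_2 ~=~ 1-\\frac{365}{365}\\times\\frac{364}{365} ~=~ \\frac{1}{365} ~\\approx~ 0.0027, \\qquad P_3 ~=~ 1-\\frac{365}{365}\\times\\frac{364}{365}\\times\\frac{363}{365} ~\\approx~ 0.0082$$\n",
"\n",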
"The assumptions used to obtain the above formula for $P_n$ are:\n",
"\n",
"a. 365 days in a year;\n",
"\n",
"b. All days are equally likely to be a birthday;\n",
"\n",
"c. Subjects' birthdays are independent of each other (for example, no twins in the room).\n",
"\n",
"The function below calculates these probabilities. Note that, for computational reasons, we implement the second formula for $P_n$: consider how large $365^n$ is for $n = 30$ or $n = 50$."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# a function that calculates the probability P_n for 1 < n <= 365\n",
"# NOTE: the body is an assumed implementation of the second formula for P_n;\n",
"# rounding to 4 decimals is assumed, to match the table shown below\n",
"def birthday_prob(n):\n",
"    prob_distinct = 1.0\n",
"    for k in range(n):\n",
"        # person k+1 must avoid the k birthdays already taken\n",
"        prob_distinct *= (365 - k) / 365\n",
"    return round(1 - prob_distinct, 4)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Number of people Probability\n",
"0 2 0.0027\n",
"1 3 0.0082\n",
"2 4 0.0164\n",
"3 5 0.0271\n",
"4 6 0.0405\n",
"5 7 0.0562\n",
"6 8 0.0743"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Construct a data frame with the probabilities for a range of n's\n",
"number_people = np.arange(2, 61)\n",
"# evaluate the probability for each group size\n",
"b_probs = np.array([birthday_prob(n) for n in number_people])\n",
"\n",
"Birthday_df=pd.DataFrame(\n",
" {\"Number of people\":number_people,\n",
" \"Probability\":b_probs})\n",
"Birthday_df.head(7)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below, we plot these probabilities as a line graph to show the trend."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"