Probability: Definitions and Rules

This section is not a comprehensive introduction to probability; it is a reminder of the basic definitions and rules. Every statistics textbook has a chapter on probability that is more complete than this section. Readers who have not encountered probability before are encouraged to find a good introductory chapter; a suggested reference is given at the end of this section.

We start with some basic definitions, illustrated with three examples:

Random phenomenon: a phenomenon whose individual outcomes are uncertain; for example:

Roll a die and record the outcome. We do not know before rolling the die what the outcome will be.
The number of boys in 100 births in a Chicago hospital. The outcome is uncertain because we do not know whether the number of boys will be 50, or 40, or something else.
The set of birthdays in a group of 30 people.

Sample space (denoted by S): the set of all possible outcomes of a phenomenon; in the above examples:

S is the set of integers from 1 to 6: S=\(\{1,2,3,4,5,6\}\)
S is the set of integers from 0 to 100 (possible outcomes for the number of boys are 0, 1, 2, …, 100).
S is the set of all possible combinations of 30 birthdates.

Event (denoted by A or B here): An outcome or a set of outcomes of a random phenomenon; for example:

Rolling an even number: A=\(\{2,4,6\}\).
A is the event that fewer than half of the babies are boys. A is the set of integers from 0 to 49.
A is the event that at least two people share a birthday.

Mutually exclusive events: Events \(A\) and \(B\) are mutually exclusive (or disjoint) if they have no outcomes in common. Examples:

A is as above (rolling an even number) and B is rolling a 3.
A is as above (fewer than half of the babies are boys) and B is the event that the number of boys is between 60 and 70.
A is as above (at least two people share a birthday) and B is the event that there is a birthday to celebrate for every day in the month of March.

Complement of an event: The complement of an event \(A\) is the event that \(A\) does not occur, denoted by \(A^C\).

For the events \(A\) defined above:

\(A\) is as above (rolling an even number): \(A^C\) is rolling an odd number: \(A^C=\{1,3,5\}\)
\(A\) is as above (fewer than half of the babies are boys): \(A^C\) is the event that at least half of the babies are boys, or the set of integers from 50 to 100.
\(A\) is as above (at least two people share a birthday): \(A^C\) is the event that there are no shared birthdays.

Compound events: Events built from combinations of other events; for example, union and intersection.

Union: (\(A\) or \(B\)) = (\(A\cup B\)): set of all outcomes in \(A\), or in \(B\), or in both.

Intersection: (\(A\) and \(B\)) = (\(A\cap B\)): set of all outcomes that are in \(A\) and in \(B\).
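The set definitions above map directly onto Python's built-in `set` type. The following is a minimal sketch for the die example; the variable names are illustrative and not part of the original text.

```python
# Sample space and events for one roll of a six-sided die.
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # rolling an even number
B = {3}         # rolling a 3

A_complement = S - A                   # outcomes in S that are not in A
union = A | B                          # outcomes in A, or in B, or in both
intersection = A & B                   # outcomes in both A and B
mutually_exclusive = A.isdisjoint(B)   # True when A and B share no outcomes

print(A_complement, union, intersection, mutually_exclusive)
```

Here the intersection is empty, which is exactly what it means for A and B to be mutually exclusive.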

Definition of Probability#

Probabilities describe how likely events are and so probability models consist of:

A list of possible outcomes (sample space)
An assignment of probabilities \(P\) to each possible outcome

The frequentist interpretation of the probability of an event \(A\), \(\mbox{P}(A)\), is the long-run relative frequency of the event \(A\). Suppose you are interested in the probability of “Heads” when tossing a coin. In the frequentist interpretation, this probability is the limit of the relative frequency of “Heads” as the coin is tossed repeatedly. While you can imagine repeating a coin toss a large number of times (and some people have done it!), there are other events for which the intuition behind frequentist probabilities is not as evident. For example, what is the probability that it will rain next Sunday? This is where the Bayesian interpretation of probability, based on a subjective degree of belief, is more natural. In the Bayesian world, two people could have different viewpoints and assign different probabilities to the same event.
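The long-run relative frequency idea can be illustrated with a small simulation. This is only a sketch: it assumes a fair coin (probability 0.5 of “Heads”) and uses a seeded pseudo-random generator in place of physical tosses; the function name and the seed are hypothetical choices made for this example.

```python
import random

def relative_frequency_of_heads(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return the fraction of heads."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The relative frequency should settle near 0.5 as the number of tosses grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

With few tosses the relative frequency can be far from 0.5; with many tosses it is typically close to 0.5.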

Note that the rules below are universal: they hold under both the frequentist and the Bayesian interpretation.

Basic Probability Rules#

Given a sample space S and events \(A, B \subseteq S\), we have:

\(0 \le \mbox{P}(A) \le 1\)
\(\mbox{P}(S) = 1\)
\(\mbox{P}(A^C) = 1 - \mbox{P}(A)\)
\(\mbox{P}(A \cup B) = \mbox{P}(A) + \mbox{P}(B) - \mbox{P}(A \cap B)\)
Equally likely outcomes:

\[\mbox{P}(A)=\frac{\mbox{Number of outcomes in } A}{\mbox{Total number of outcomes}}\]

The last rule applies only to situations where all outcomes of an experiment are equally likely (for example, rolling a fair die).
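The rules above can be checked numerically for a fair die using the equally likely outcomes formula. The sketch below uses exact fractions; the helper `prob` and the event B (rolling more than 3) are assumptions introduced for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a fair die
A = {2, 4, 6}            # rolling an even number
B = {4, 5, 6}            # rolling more than 3 (illustrative event)

def prob(event):
    """P(event) when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

print(prob(S))                            # probability of the whole sample space
print(prob(S - A), 1 - prob(A))           # complement rule, both sides
print(prob(A | B))                        # union computed directly
print(prob(A) + prob(B) - prob(A & B))    # union via the addition rule
```

The two union values agree because the outcomes 4 and 6, which belong to both events, are subtracted once after being counted twice.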

Conditional Probability#

If \(\mbox{P}(B) \ne 0\), the conditional probability of event \(A\) given that \(B\) has occurred, denoted by \(\mbox{P}(A|B)\), is defined by

\[ \mbox{P}(A|B) = \frac{\mbox{P}(A \mbox{ and } B)}{\mbox{P}(B)}\]

[Figure: conditional probability]

Example:

Select one subject at random in the US;
A is the event that the subject read a book last week;
B is the event that the subject is a college student;
Consider P(A|B) versus P(A): the fraction of college students who read a book last week is likely different from the fraction of the US population who did so.

Multiplication rule: \(\mbox{P}(A \mbox{ and } B) = \mbox{P}(A|B) \mbox{P}(B)\). Note that this follows directly from the definition of conditional probability.
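As a small worked example, the definition of conditional probability and the multiplication rule can be evaluated for a fair die. This is a sketch under the equally likely outcomes assumption; the event B (rolling more than 3) and the helper names are hypothetical.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}            # rolling an even number
B = {4, 5, 6}            # rolling more than 3 (illustrative event)

def prob(event):
    """P(event) under equally likely outcomes."""
    return Fraction(len(event), len(S))

def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b); only defined when P(b) != 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be nonzero")
    return prob(a & b) / prob(b)

p_a_given_b = cond_prob(A, B)
print(prob(A), p_a_given_b)   # unconditional vs. conditional probability of A
```

Knowing that the roll exceeded 3 raises the probability of an even number from 1/2 to 2/3, since two of the three remaining outcomes (4 and 6) are even.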

Independence#

Events \(A\) and \(B\) are called independent if \(\mbox{P}(A|B) = \mbox{P}(A)\) (or equivalently, \(\mbox{P}(B|A) = \mbox{P}(B)\)).

Equivalent condition for independence:

\[\mbox{P}(A \mbox{ and } B) = \mbox{P}(A) \mbox{P}(B)\]
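The product condition gives a direct way to test independence. The sketch below checks it for events on a fair die; the events B and C and the helper names are assumptions chosen for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}            # rolling an even number
B = {1, 2, 3, 4}         # rolling at most 4 (illustrative event)
C = {4, 5, 6}            # rolling more than 3 (illustrative event)

def prob(event):
    """P(event) under equally likely outcomes."""
    return Fraction(len(event), len(S))

def independent(a, b):
    """True when P(a and b) equals P(a) * P(b)."""
    return prob(a & b) == prob(a) * prob(b)

print(independent(A, B))   # P(A and B) = 1/3 and P(A) P(B) = 1/2 * 2/3 = 1/3
print(independent(A, C))   # P(A and C) = 1/3 but P(A) P(C) = 1/2 * 1/2 = 1/4
```

A and B are independent, while A and C are not: learning that the roll exceeded 3 changes the probability of an even number.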

Bayes’ Theorem#

The following property follows directly from the definition of conditional probability and the multiplication rule:

\(\mbox{P}(A|B) = \frac{\mbox{P}(B|A) \mbox{P}(A)}{\mbox{P}(B)}\)

This is one of the most important rules in statistics and data science because it describes statistical learning, and provides a way to update a belief (probability) given additional evidence (data).
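To make the idea of updating concrete, here is a minimal sketch on a fair die, where Bayes' theorem can be compared with the direct definition of conditional probability. The events (A: rolling a six; B: rolling an even number) and the variable names are hypothetical choices for this example.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {6}                  # rolling a six (the belief being updated)
B = {2, 4, 6}            # evidence: the roll is even

def prob(event):
    """P(event) under equally likely outcomes."""
    return Fraction(len(event), len(S))

prior = prob(A)                      # P(A), before seeing any evidence
likelihood = prob(A & B) / prob(A)   # P(B|A)
evidence = prob(B)                   # P(B)

posterior = likelihood * prior / evidence   # Bayes' theorem: P(A|B)
direct = prob(A & B) / prob(B)              # definition of P(A|B)

print(prior, posterior, direct)
```

Learning that the roll is even moves the probability of a six from the prior 1/6 to the posterior 1/3, and the two ways of computing P(A|B) agree.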

The solution to the birthday problem#

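We will use the equally likely outcomes formula from the Basic Probability Rules above. This assumes 365 possible birthdays (leap days are ignored), each equally likely for every person. Note that, for \(n\) random subjects, the total number of outcomes (the number of possible combinations of birthdays) is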

\[365^n.\]

The number of outcomes that lead to a set of distinct birthdays is \(365\times364\times \cdots\times (365-n+1)\). The intuition comes from counting the outcomes in which all birthdays are distinct, as follows:

suppose you look at the people sequentially;
the first person can have any of the 365 birthdays without leading to matched birthdays;
the second person can have any birthday except that of the first person: so 364 possibilities;
the \(n\)-th person can have any birthday except the \((n-1)\) different birthdays of the previous people: so \((365-n+1)\) possibilities.

So the probability of having \(n\) distinct birthdays is:

\[\frac{365\times364\times \cdots\times (365-n+1)}{365^n}\]

The complement of this event is the event of interest (at least two people share a birthday), and so the probability of interest is:

\[P_n ~=~ 1-\frac{365\times364\times \cdots\times (365-n+1)}{365^n}\]
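The formula for \(P_n\) is easy to evaluate numerically. The sketch below multiplies the ratios one person at a time rather than forming the large numerator and denominator separately; the function name and the `days` parameter are assumptions introduced for this example.

```python
def prob_shared_birthday(n, days=365):
    """P_n: probability that at least two of n people share a birthday,
    assuming `days` equally likely birthdays."""
    p_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_distinct *= (days - k) / days
    return 1.0 - p_distinct

for n in (10, 23, 30, 50):
    print(n, round(prob_shared_birthday(n), 4))
```

For the group of 30 people used in the running example, this gives a probability of about 0.706; the probability first exceeds one half at \(n = 23\) (about 0.507).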

Reference.

OpenIntro Statistics (Chapter 3 on Probability). Available for download at https://www.openintro.org/book/os/.