1.3. Conditional Probability
Partial information
Conditional probability provides us with a way to reason about the outcome of an experiment, based on partial information.
- Given an experiment, a corresponding sample space, and a probability law, suppose that we know that the outcome is within some given event B.
- We wish to quantify the likelihood that the outcome also belongs to some other given event A.
- This quantity is the conditional probability of \(A\) given \(B\), denoted by \(P(A|B)\).
An appropriate definition of conditional probability when all outcomes are equally likely is given by
$$P(A|B) = \frac{\text{number of elements of} \ A\cap B}{\text{number of elements of} \ B}.$$
Generalized definition of conditional probability
$$ P(A|B)=\frac{P(A\cap B)}{P(B)},$$
where we assume that \(P(B) > 0\); the conditional probability is undefined if the conditioning event has zero probability.
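As a quick sanity check (not part of the original text), the counting and ratio forms of the definition can be sketched in Python for a finite sample space with equally likely outcomes; `cond_prob` is an illustrative helper name:

```python
from fractions import Fraction

def cond_prob(A, B):
    """P(A|B) for equally likely outcomes: |A intersect B| / |B|."""
    A, B = set(A), set(B)
    if not B:
        raise ValueError("P(A|B) is undefined when P(B) = 0")
    return Fraction(len(A & B), len(B))

# One roll of a fair die: P(even | outcome > 3) = |{4, 6}| / |{4, 5, 6}| = 2/3.
print(cond_prob({2, 4, 6}, {4, 5, 6}))  # 2/3
```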
Conditional Probabilities Specify a Probability Law
For a fixed event B, it can be verified that the conditional probabilities \(P(A|B)\) form a legitimate probability law that satisfies the three axioms.
- Indeed, non-negativity is clear.
- The normalization axiom is also satisfied: $$ P(\Omega |B)=\frac{P(\Omega \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1$$
- To verify the additivity axiom, we write for any two disjoint events \(A_{1}\) and \(A_{2}\),
$$P(A_{1}\cup A_{2}|B)
=\frac{P((A_{1}\cup A_{2})\cap B)}{P(B)}
=\frac{P((A_{1}\cap B)\cup (A_{2}\cap B))}{P(B)}
=\frac{P(A_{1}\cap B)+ P(A_{2}\cap B)}{P(B)}
=\frac{P(A_{1}\cap B)}{P(B)}+\frac{P(A_{2}\cap B)}{P(B)}
=P(A_{1}|B)+P(A_{2}|B)$$
where for the third equality, we used the fact that \(A_{1}\cap B\) and \(A_{2}\cap B\) are disjoint sets, and the additivity axiom for the probability law.
All general properties of probability laws remain valid. For example, the fact that
\(P(A \cup B) \leq P(A) + P(B)\)
implies the analogous conditional version
\(P(A \cup C | B) \leq P(A | B) + P(C|B).\)
Properties of Conditional Probability
- The conditional probability of an event \(A\), given an event \(B\) with \(P(B)>0\), is defined by $$P(A|B) = \frac{P(A \cap B)}{P(B)},$$ and specifies a new (conditional) probability law on the same sample space \(\Omega\).
- In particular, all properties of probability laws remain valid for conditional probability laws.
- Conditional probabilities can also be viewed as a probability law on a new universe \(B\), because all of the conditional probability is concentrated on \(B\).
- If the possible outcomes are finitely many and equally likely, then $$P(A|B) = \frac{\text{number of elements of} \ A \cap B}{\text{number of elements of} \ B}.$$
Three examples are given below.
Ex 1.6: toss a fair coin
Experiment: toss a fair coin three successive times.
To find: \(P(A|B)\)
$$A = \{\text{more heads than tails come up}\}, \quad B = \{\text{1st toss is a head}\}.$$
The sample space (8 sequences):
$$\Omega = \{\text{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}\},$$
which we assume to be equally likely.
The event \(B = \{\text{HHH, HHT, HTH, HTT}\}\), so \(P(B) = \frac{4}{8}\).
The event \(A \cap B = \{\text{HHH, HHT, HTH}\}\), so \(P(A \cap B) = \frac{3}{8}\).
Thus, the conditional probability \(P(A|B)\) is
$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{3/8}{4/8} = \frac{3}{4}.$$
Since all possible outcomes are equally likely here, we can also divide the number of elements of \(A \cap B\) (which is 3) by the number of elements of \(B\) (which is 4), obtaining the same result, 3/4.
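The enumeration in this example is small enough to verify with a short script (an illustrative sketch, not from the text):

```python
from itertools import product
from fractions import Fraction

# All 8 equally likely outcomes of three successive tosses of a fair coin.
outcomes = [''.join(t) for t in product('HT', repeat=3)]
A = {s for s in outcomes if s.count('H') > s.count('T')}  # more heads than tails
B = {s for s in outcomes if s[0] == 'H'}                  # 1st toss is a head
p = Fraction(len(A & B), len(B))
print(p)  # 3/4
```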
Ex 1.7: roll a fair 4-sided die
Experiment:
A fair 4-sided die is rolled twice, and we assume that all 16 possible outcomes are equally likely.
Let X and Y be the result of the 1st and the 2nd roll, respectively.
To find: \(P(A|B)\)
$$A = \{\max(X, Y) = m\}, \quad B = \{\min(X, Y) = 2\},$$ where \(m\) takes each of the values 1, 2, 3, 4.
The event \(B = \{\min(X, Y) = 2\}\) consists of 5 outcomes. The set \(A = \{\max(X, Y) = m\}\) shares with \(B\) two elements if \(m = 3\) or \(m = 4\), one element if \(m = 2\), and no element if \(m = 1\).
Thus, we have
$$P(\{\max(X, Y) = m\} \mid B) = \begin{cases}
2/5 & \text{if } m = 3 \text{ or } m = 4, \\
1/5 & \text{if } m = 2, \\
0 & \text{if } m = 1.
\end{cases}$$
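The 16-outcome sample space can also be enumerated directly (an illustrative check, not part of the original text):

```python
from itertools import product
from fractions import Fraction

rolls = set(product(range(1, 5), repeat=2))   # 16 equally likely (X, Y) pairs
B = {r for r in rolls if min(r) == 2}         # min(X, Y) = 2; this set has 5 elements
results = {}
for m in range(1, 5):
    A = {r for r in rolls if max(r) == m}     # max(X, Y) = m
    results[m] = Fraction(len(A & B), len(B))
print(results)  # {1: Fraction(0, 1), 2: Fraction(1, 5), 3: Fraction(2, 5), 4: Fraction(2, 5)}
```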
Using Conditional Probability for Modeling
When constructing probabilistic models for experiments that have a sequential character, it is often natural and convenient to first specify conditional probabilities and then use them to determine unconditional probabilities.
The rule \(P(A\cap B)=P(B)P(A|B)\) is often helpful in this process.
Ex 1.9: Radar Detection
Event A: Airplane is flying above
Event B: Something registers on radar screen
- If an aircraft is present in a certain area, a radar detects it and generates an alarm signal with probability 0.99.
- If an aircraft is not present, the radar generates a (false) alarm with probability 0.10.
- We assume that an aircraft is present with probability 0.05.
$$A = \{\text{an aircraft is present}\},$$
$$B = \{\text{the radar generates an alarm}\},$$
and consider also their complements
$$A^c = \{\text{an aircraft is not present}\},$$
$$B^c = \{\text{the radar does not generate an alarm}\}.$$
Each possible outcome corresponds to a leaf of the tree, and its probability is equal to the product of the probabilities associated with the branches in a path from the root to the corresponding leaf.
$$P(\text{not present, false alarm}) = P(A^c \cap B) = P(A^c)P(B|A^c) = 0.95 \times 0.10 = 0.095,$$
$$P(\text{present, no detection}) = P(A \cap B^c) = P(A)P(B^c|A) = 0.05 \times 0.01 = 0.0005.$$
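These two leaf probabilities can be checked numerically (an illustrative sketch; variable names are my own):

```python
from fractions import Fraction

p_present = Fraction(5, 100)     # P(A): an aircraft is present
p_detect = Fraction(99, 100)     # P(B | A): alarm given aircraft present
p_false = Fraction(10, 100)      # P(B | A^c): alarm given no aircraft

false_alarm = (1 - p_present) * p_false   # P(A^c and B) = P(A^c) P(B | A^c)
missed = p_present * (1 - p_detect)       # P(A and B^c) = P(A) P(B^c | A)
print(float(false_alarm), float(missed))  # 0.095 0.0005
```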
General rule for calculating probabilities with a tree-based sequential description
We have a general rule for calculating various probabilities in conjunction with a tree-based sequential description of an experiment.
- We set up the tree so that an event of interest is associated with a leaf.
- We view the occurrence of the event as a sequence of steps, namely, the traversals of the branches along the path from the root to the leaf.
- We record the conditional probabilities associated with the branches of the tree.
- We obtain the probability of a leaf by multiplying the probabilities recorded along the corresponding path of the tree.
In mathematical terms,
we are dealing with an event \(A\) which occurs if and only if each one of several events \(A_1, \cdots, A_n\) occurs.
The occurrence of A is viewed as an occurrence of A1, followed by the occurrence of A2, then of A3, etc., and it is visualized as a path with n branches, corresponding to the events \(A_1, \cdots , A_n\). The probability of A is given by the following rule
The intersection event \(A = A_1 \cap A_2 \cap \cdots \cap A_n\) is associated with a particular path on a tree that describes the experiment. We associate the branches of this path with the events \(A_1, \cdots , A_n\), and we record next to the branches the corresponding conditional probabilities.
The final node of the path corresponds to the intersection event A, and its probability is obtained by multiplying the conditional probabilities recorded along the branches of the path
$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)P(A_2|A_1) \cdots P(A_n | A_1 \cap A_2 \cap \cdots \cap A_{n-1} ).$$
Note that any intermediate node along the path corresponds to some intersection event, and its probability is obtained by multiplying the corresponding conditional probabilities up to that node.
Multiplication Rule
Assuming that all of the conditioning events have positive probability, we have $$ P(\bigcap_{i=1}^{n}A_{i}) = P(A_1)P(A_2|A_1)P(A_3|A_1 \cap A_2)\cdots P(A_n|\bigcap_{i=1}^{n-1}A_{i}).$$
The multiplication rule can be verified by writing
$$ P(\bigcap_{i=1}^{n}A_{i}) = P(A_1)\cdot \frac{P(A_1 \cap A_2)}{P(A_1)}\cdot \frac{P(A_1 \cap A_2 \cap A_3)}{P(A_1\cap A_2)}\cdots \frac{P(\bigcap_{i=1}^{n}A_{i})}{P(\bigcap_{i=1}^{n-1}A_{i})}.$$
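The telescoping argument can be illustrated on a small finite sample space (the events below are my own choices, for illustration only); the chain of conditional probabilities collapses to the probability of the intersection:

```python
from itertools import product
from fractions import Fraction

# A small finite model: two rolls of a fair 4-sided die, equally likely outcomes.
omega = set(product(range(1, 5), repeat=2))
A1 = {w for w in omega if w[0] >= 2}    # 1st roll is at least 2
A2 = {w for w in omega if w[1] >= 2}    # 2nd roll is at least 2
A3 = {w for w in omega if sum(w) >= 5}  # total is at least 5

def P(E):
    return Fraction(len(E), len(omega))

chain = (P(A1)
         * Fraction(len(A1 & A2), len(A1))             # P(A2 | A1)
         * Fraction(len(A1 & A2 & A3), len(A1 & A2)))  # P(A3 | A1 and A2)
print(chain == P(A1 & A2 & A3))  # True: the product telescopes
```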
Ex 1.10: Three cards are drawn from a 52-card deck without replacement.
To find: the probability that none of the three cards is a heart.
Assuming: at each step, each one of the remaining cards is equally likely to be picked.
Define the events
$$A_i = \{\text{the } i\text{th card is not a heart}\}, \quad i = 1, 2, 3.$$
We will calculate \(P(A_1 \cap A_2 \cap A_3)\), the probability that none of the three cards is a heart, using the multiplication rule
$$P(A_1 \cap A_2 \cap A_3) = P(A_1)P(A_2|A_1)P(A_3|A_1 \cap A_2).$$
There are 39 cards that are not hearts in the 52-card deck, so we have
$$P(A_1) = \frac{39}{52}, P(A_2|A_1) = \frac{38}{51}, P(A_3|A_1 \cap A_2) = \frac{37}{50}.$$
These probabilities are recorded along the corresponding branches of the tree describing the sample space, as shown in Fig. 1.11. The desired probability is now obtained by multiplying the probabilities recorded along the corresponding path of the tree:
$$P(A_1 \cap A_2 \cap A_3) = \frac{39}{52} \cdot \frac{38}{51} \cdot \frac{37}{50}.$$
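The same product can be computed by updating the counts after each draw (an illustrative sketch, not from the text):

```python
from fractions import Fraction

p, non_hearts, deck = Fraction(1), 39, 52
for _ in range(3):                   # three draws without replacement
    p *= Fraction(non_hearts, deck)  # P(next card is not a heart | none so far)
    non_hearts -= 1
    deck -= 1
print(p)  # 703/1700, i.e. (39/52)(38/51)(37/50)
```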
Ex 1.11: Dividing students randomly into 4 groups of 4
A class consisting of 4 graduate and 12 undergraduate students is randomly divided into 4 groups of 4.
To find: the probability that each group includes a graduate student
Let us denote the 4 graduate students by 1, 2, 3, 4, and consider the events
$$A_1 = \{\text{students 1 and 2 are in different groups}\},$$
$$A_2 = \{\text{students 1, 2 and 3 are in different groups}\},$$
$$A_3 = \{\text{students 1, 2, 3 and 4 are in different groups}\}.$$
We will calculate \(P(A_3)\) using the multiplication rule:
$$P(A_3) = P(A_1 \cap A_2 \cap A_3) = P(A_1)P(A_2|A_1)P(A_3|A_1 \cap A_2).$$
We have $$P(A_1) = \frac{12}{15},$$
since there are 12 student slots in groups other than the one of student 1, and there are 15 slots overall, excluding student 1.
$$P(A_2|A_1) = \frac{8}{14},$$
since there are 8 student slots in groups other than those of students 1 and 2, and there are 14 student slots, excluding students 1 and 2.
$$P(A_3|A_1 \cap A_2) = \frac{4}{13},$$
since there are 4 student slots in groups other than those of students 1, 2, and 3 and there are 13 student slots, excluding students 1, 2, and 3.
Thus, the desired probability is $$\frac{12}{15} \cdot \frac{8}{14} \cdot \frac{4}{13} = \frac{64}{455}.$$
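The slot-counting argument can be cross-checked against a direct Monte Carlo simulation of the random division (an illustrative sketch, not part of the text):

```python
import random
from fractions import Fraction

exact = Fraction(12, 15) * Fraction(8, 14) * Fraction(4, 13)  # = 64/455

rng = random.Random(0)
trials = 100_000
hits = 0
for _ in range(trials):
    slots = ['G'] * 4 + ['U'] * 12         # 4 graduate, 12 undergraduate students
    rng.shuffle(slots)                     # a random division into 4 groups of 4
    groups = [slots[i:i + 4] for i in range(0, 16, 4)]
    hits += all('G' in g for g in groups)  # every group gets a graduate student
print(float(exact), hits / trials)         # the estimate should be close to 64/455
```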
Ex 1.12: The Monty Hall Problem.
You are told that a prize is equally likely to be found behind any one of three closed doors in front of you.
You point to one of the doors. A friend opens for you one of the remaining two doors, after making sure that the prize is not behind it. At this point, you can stick to your initial choice, or switch to the other unopened door.
Stick or switch?
- (a) Under the strategy of no switching, your initial choice will determine whether you win or not, and the probability of winning is 1/3 because the prize is equally likely to be behind each door.
- (b) Under the strategy of switching, if the prize is behind the initially chosen door (probability 1/3), you do not win.
If it is not (probability 2/3), and given that another door without a prize has been opened for you, you will get to the winning door once you switch.
- (c) You first point to door 1. If door 2 is opened, you do not switch. If door 3 is opened, you switch.
- Under this strategy, there is insufficient information for determining the probability of winning. The answer depends on the way that your friend chooses which door to open. Let us consider two possibilities.
- Suppose that if the prize is behind door 1, your friend always chooses to open door 2. (If the prize is behind door 2 or 3, your friend has no choice.)
- If the prize is behind door 1, your friend opens door 2, you do not switch, and you win.
If the prize is behind door 2, your friend opens door 3, you switch, and you win.
If the prize is behind door 3, your friend opens door 2, you do not switch, and you lose.
Thus, the probability of winning is 2/3, so strategy (c) in this case is as good as strategy (b).
- Suppose now that if the prize is behind door 1, your friend is equally likely to open either door 2 or 3.
- If the prize is behind door 1 (probability 1/3), and if your friend opens door 2 (probability 1/2), you do not switch and you win (probability 1/6).
But if your friend opens door 3, you switch and you lose.
If the prize is behind door 2, your friend opens door 3, you switch, and you win (probability 1/3).
If the prize is behind door 3, your friend opens door 2, you do not switch and you lose.
Thus, the probability of winning is 1/6 + 1/3 = 1/2, so strategy (c) in this case is inferior to strategy (b).
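The three strategies and both host behaviors can be compared by simulation (a sketch under the assumptions above; `play` and `host_rule` are illustrative names, and you always point to door 1 first):

```python
import random

def play(strategy, host_rule, trials=100_000, seed=1):
    """Monty Hall simulation; you always point to door 1 first."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        prize = rng.randrange(1, 4)
        if prize == 1:
            # the host's choice matters only when the prize is behind your door
            opened = 2 if host_rule == 'always2' else rng.choice([2, 3])
        else:
            opened = 5 - prize                  # the only remaining non-prize door
        final = 1
        if strategy == 'switch' or (strategy == 'if3' and opened == 3):
            final = 5 - opened                  # the other unopened door
        wins += (final == prize)
    return wins / trials

print(play('stick', 'always2'))   # about 1/3, strategy (a)
print(play('switch', 'always2'))  # about 2/3, strategy (b)
print(play('if3', 'always2'))     # about 2/3, strategy (c), host always opens door 2
print(play('if3', 'random'))      # about 1/2, strategy (c), host opens 2 or 3 at random
```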
References
Monty Hall problem - Wikipedia
Bertsekas, D. P., & Tsitsiklis, J. N. (2008). Introduction to Probability, 2nd Edition. Athena Scientific.
Lecture 2: Conditioning and Bayes' Rule, Probabilistic Systems Analysis and Applied Probability, MIT OpenCourseWare (ocw.mit.edu).