Suppose that you are worried that you might have a rare disease. You decide to get tested, and suppose that the testing methods for this disease are correct 99 percent of the time (in other words, if you have the disease, it shows that you do with 99 percent probability, and if you don’t have the disease, it shows that you do not with 99 percent probability). Suppose this disease is actually quite rare, occurring randomly in the general population in only one of every 10,000 people.
If your test results come back positive, what are your chances that you actually have the disease?
Do you think it is approximately: (a) .99, (b) .90, (c) .10, or (d) .01?
Surprisingly, the answer is (d): there is less than a 1 percent chance that you have the disease!
Once you have seen the reasons for this surprising probability (below), explore how changing the parameters affects the outcome. Would the result be so surprising if the disease were more common? How would the probability change if the false positive rate and the false negative rate were allowed to differ?
The Math Behind the Fact:
This fact may be deduced using something called Bayes’ theorem, which helps us find the probability of event A given event B, written P(A|B), in terms of the probability of B given A, written P(B|A), and the probabilities of A and B:
P(A|B) = P(A) P(B|A) / P(B)
In this case, event A is the event that you have this disease, and event B is the event that you test positive. Thus P(B|not A) is the probability of a “false positive”: that you test positive even though you don’t have the disease.
Here, P(B|A) = .99 and P(A) = .0001, and P(B) may be derived by conditioning on whether event A does or does not occur: P(B) = P(B|A)P(A) + P(B|not A)P(not A), or .99*.0001 + .01*.9999 = .010098. Thus the ratio you get from Bayes’ theorem is .000099/.010098, or about .0098, which is less than 1 percent.
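For readers who want to check the arithmetic, here is a minimal Python sketch of the calculation. The function name and its parameter names (prior, sensitivity, specificity) are illustrative choices, not part of the original discussion; sensitivity stands for P(B|A) and specificity for P(not B|not A).

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Return P(A|B) by Bayes' theorem, where A = has disease, B = tests positive."""
    false_positive_rate = 1.0 - specificity  # P(B|not A)
    # P(B), obtained by conditioning on whether A does or does not occur
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive


# Numbers from the example: 1 in 10,000 prevalence, test correct 99% of the time
p = posterior_given_positive(prior=0.0001, sensitivity=0.99, specificity=0.99)
print(f"P(disease | positive) = {p:.4f}")
```

With the numbers above this should print a value of about .0098, matching the ratio computed by hand.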
The basic reason we get such a surprising result is that the disease is so rare that false positives greatly outnumber the people who truly have the disease. This can be seen by thinking about what we can expect in 1 million cases. Of those million, about 100 will have the disease, and about 99 of those will be correctly diagnosed as having it. The other 999,900 or so will not have the disease, but about 9,999 of them will be false positives (test results that are positive because of errors). So, if you test positive, then the likelihood that you actually have the disease is about 99/(99+9999), which gives the same fraction as above, approximately .0098 or less than 1 percent!
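The counting argument can also be carried out with whole numbers. The sketch below assumes a hypothetical population of exactly 1 million and treats the expected counts as exact; the variable names are illustrative.

```python
from fractions import Fraction

population = 1_000_000
sick = population // 10_000          # 1 in 10,000 have the disease: 100 people
true_positives = sick * 99 // 100    # 99% of the sick test positive: 99
healthy = population - sick          # 999,900 people without the disease
false_positives = healthy // 100     # 1% of the healthy test positive: 9,999

# Among everyone who tests positive, the share who are actually sick
share = Fraction(true_positives, true_positives + false_positives)
print(share, float(share))
```

Since 99 + 9999 = 10098 = 99 * 102, the fraction reduces to exactly 1/102, which is roughly .0098.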
Note that you can increase this probability by lowering the false positive rate, since it is the false positives that dominate the positive test results.
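To see how the parameters affect the outcome, one can recompute the probability for several false positive rates and several prevalences. This is only a sketch: the particular rates and prevalences tried below are arbitrary illustrative values, and the function name is hypothetical.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem."""
    p_positive = (true_positive_rate * prior
                  + false_positive_rate * (1.0 - prior))
    return true_positive_rate * prior / p_positive


# Lower the false positive rate, keeping prevalence at 1 in 10,000
by_fpr = {fpr: posterior(0.0001, 0.99, fpr) for fpr in (0.01, 0.001, 0.0001)}
for fpr, p in by_fpr.items():
    print(f"false positive rate {fpr}: P(disease | positive) = {p:.4f}")

# Make the disease more common, keeping the false positive rate at 1%
by_prior = {prior: posterior(prior, 0.99, 0.01) for prior in (0.0001, 0.01, 0.1)}
for prior, p in by_prior.items():
    print(f"prevalence {prior}: P(disease | positive) = {p:.4f}")
```

In both loops the probability should rise: a positive result becomes more convincing as false positives become rarer or as the disease becomes more common.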
Also note that these calculations wouldn’t hold if the disease were not independently and identically distributed throughout the population (e.g., in the case of cancer due to familial tendency, environmental factor, asbestos exposure, etc.).