Volume 7: Bayesian Analysis

Why You Can Never Be Sure of Anything
“Bet you $50 it hasn't.” – xkcd #1132
People don’t like thinking probabilistically. They crave certainty. That rock on the side of the road is definitely a rock. That apple is definitely red. It rained today, so the chances of rain were definitely 100%. The discussion of “what we know” is a philosophical topic, not a mathematical one. Math is useless if we want to know if our entire universe is a single atom in a larger dimension.
Bayesian Inference helps us to describe “what we learned.” In other words, we begin by knowing some things and then we gain information; now we know more. The difference between what we knew before and what we know now is our learning. Bayesian thinking can provide insight into how much a piece of information should change your knowledge.
But Bayes Theorem is also helpful for traders – or gamblers! Maybe you think the Lions have a 10% chance to win the next Super Bowl; I’ve incorporated more information and think it’s closer to 1%. If my information is better, I can expect to profit from a wager.
Who was Thomas Bayes? What is Bayes Theorem?
How is Frequentist probability different from Bayesian probability? Why do Frequentists and Bayesians fight so much?
How can I use Bayesian Statistics in my everyday life?
Who was Thomas Bayes? What is Bayes Theorem?
The story of the 18th century is the story of rising British world domination. After nearly 100 years of religious conflict and battles between King and Parliament, the Glorious Revolution of 1688 installed a new dynasty, settling the major points of dispute. After this, Britain would face no more major revolutions. The government would change over the next three centuries, but it would happen slowly and peacefully. Without the internal strife that plagued France, Germany, Spain and other European nations, Britain could focus on building the world’s first industrial economy and its finest navy. This would lead to untold wealth due partially to the benefits of colonization.
A small side benefit of this golden age was the ability of the British Empire(1) to afford a system of country pastors. Coming especially from the later sons of the lesser nobility, these parish priests were supposed to spread across the country, advancing the Church of England. But, for many of them, the life of a country preacher was not so arduous. Fortunately for us, many of their side pursuits were of great cultural and scientific value. Jonathan Swift and Joseph Priestley are two examples of great contributors to art and science whose day job was working for the Church of England(2).
Why do I go on at length about English country parish life in the 18th century(3)? Well, we just don’t know that much about Thomas Bayes, so describing his world is the best we can do. We think he might have been born in 1701 but we don’t really know(4). His father was a well-known preacher in the world described above. Thomas spent his early years working with his father before moving to his own parish in Kent (southeast England) around 1735. Today, we only know of two of his publications, and only one of those is about mathematics. I’ll quote from Wikipedia(5) – not for the information, but just to demonstrate how little we know:
“It is speculated that Bayes was elected as a Fellow of the Royal Society in 1742 on the strength of the Introduction to the Doctrine of Fluxions, as he is not known to have published any other mathematical works during his lifetime.
In his later years he took a deep interest in probability. Professor Stephen Stigler, historian of statistical science, thinks that Bayes became interested in the subject while reviewing a work written in 1755 by Thomas Simpson, but George Alfred Barnard thinks he learned mathematics and probability from a book by Abraham de Moivre. Others speculate he was motivated to rebut David Hume's anti-Christian An Enquiry Concerning Human Understanding.”
I count two times that we speculate, two where we “think that,” and once where we don’t know at all. What we do know is that his most lasting work, the eponymous Theorem, only came to light after his death(6).
After that little digression, we can get to the point; here is Bayes Theorem(7) in all its glory:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
In this equation, $P(X)$ means the probability of X and $P(X \mid Y)$ is the conditional probability of X, assuming Y(8). The letters “A” and “B” can represent anything, from a statement like “that man is six feet tall” to an event like “this coin flip will come up heads.” Remember that probability is the likelihood of something happening or being true; it is expressed as a number between 0 and 1 (or 0% and 100%). Let’s go through a simple example.
Let “B” be the event of me going to the grocery store tomorrow and “A” be the event of me buying eggs. Maybe the probability of going to the store is 50%. We can also say that, without knowing whether I went to the store, the probability of buying eggs is 40%. Let’s also say that if I bought eggs, there is a 90% probability I went to the grocery store(9). Then:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{0.9 \times 0.4}{0.5} = 0.72$$
What does this mean? In English, “if I go to the grocery store, the probability that I buy eggs is 72%(10).”
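The arithmetic of the egg example can be checked with a few lines of Python. This is only a minimal sketch: the function name `bayes_posterior` and its parameter names are made up for illustration, and the three input probabilities are the ones assumed in the example above.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive to condition on B")
    return p_b_given_a * p_a / p_b


# Values assumed in the text:
#   P(store | eggs) = 0.9, P(eggs) = 0.4, P(store) = 0.5
p_eggs_given_store = bayes_posterior(p_b_given_a=0.9, p_a=0.4, p_b=0.5)
print(f"P(eggs | store) = {p_eggs_given_store:.2f}")
```

Changing any one of the three inputs shows how sensitive the answer is to each assumption; for instance, a less likely trip to the store (a smaller `p_b`) pushes the conditional probability of buying eggs higher.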
Bayes Theorem itself is a trivial piece of mathematics; it can literally be proven on one side of an index card(11). But as mentioned above, we can use this simple formula to gain a deeper idea of the concept of learning. To demonstrate this, let’s use the example of a drug test, a common application of Bayesian Inference(12).
Let’s say that 1% of the population uses cocaine. You see a person on the street, but know nothing about them; the probability that he or she uses cocaine is therefore 1%. The person walks up and says to you “I took a drug test yesterday and it showed positive for cocaine use.” What is the updated probability that the person uses cocaine?
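You are tempted to say “100%” or very close – but this is wrong. Let’s guess that the “false positive” rate on cocaine tests is around 5% and the “false negative” around 2%(13). Let’s go back to Bayes – we’ll abbreviate the event “Tested positive” as “+” and Uses/Doesn’t Use as “U” and “DU” respectively.
$$P(U \mid +) = \frac{P(+ \mid U)\,P(U)}{P(+)}, \qquad P(U) = 0.01,\quad P(DU) = 0.99,\quad P(+ \mid U) = 0.98,\quad P(+ \mid DU) = 0.05$$
We also know that the probability of a positive test, $P(+)$, is equal to the probability of a non-user testing positive plus the probability of a user testing positive, so:
$$P(+) = P(+ \mid DU)\,P(DU) + P(+ \mid U)\,P(U) = 0.05 \times 0.99 + 0.98 \times 0.01 = 0.0593$$
So,
$$P(U \mid +) = \frac{0.98 \times 0.01}{0.0593} \approx 0.165$$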
We’ve learned that, given a person had a positive test for cocaine, the probability they are a user is only around 16.5%. This is about as likely as getting a “6” on a single roll of a die. If the person failed a second drug test, the probability would go to 79.5%, and a third would take it to 98.7%(14). But no matter how many tests were failed, the probability would never go to exactly 100%. This is an important point about Bayesian theory: unless you are already certain of something, no amount of additional information can make you completely certain(15).
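The repeated-test numbers can be reproduced by feeding each result back in as the next starting probability. The sketch below assumes the same guessed error rates as the text (5% false positive, 2% false negative) and, importantly, that each test's result is independent of the others once we know whether the person is a user; the helper name `update_on_positive` is invented for illustration.

```python
def update_on_positive(prior: float, p_pos_given_user: float,
                       p_pos_given_nonuser: float) -> float:
    """One Bayesian update after a positive test; returns P(user | +)."""
    # Total probability of a positive test: users plus non-users.
    p_pos = p_pos_given_user * prior + p_pos_given_nonuser * (1.0 - prior)
    return p_pos_given_user * prior / p_pos


prior = 0.01                 # assumed share of the population that uses
sensitivity = 1.0 - 0.02     # P(+ | user) = 1 - false negative rate
false_positive_rate = 0.05   # P(+ | non-user)

posteriors = []
p = prior
for _ in range(3):
    # Yesterday's posterior becomes today's prior.
    p = update_on_positive(p, sensitivity, false_positive_rate)
    posteriors.append(p)

for n, value in enumerate(posteriors, start=1):
    print(f"after {n} positive test(s): {value:.1%}")
```

Under these assumptions the loop prints 16.5%, 79.5% and 98.7%. Each further positive result moves the probability closer to 1, but the formula only reaches exactly 1 if the starting probability was already 1 or the false positive rate is exactly 0.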
How is Frequentist probability different? Why do Frequentists and Bayesians fight so much?