Volume 24: The 2018 Election, Who Projected It Best?

A log-loss comparative analysis of quantitative and qualitative 2018 U.S. House of Representatives election projections
“Well, how did your projections do?” – Dale Cohodes.
It will come as a shock to nobody that I maintained a personal set of projections for the recently completed elections to the House of Representatives. It may surprise you more to know that reviewing my projections alongside the so-called “professionals” gives us an excellent opportunity to think through one of our favorite topics—probability.
Elections are an interesting class of random event: probabilistic with a single trial and a discrete outcome. The tools we have to predict their outcome—polls, demographics, past voting patterns—result in distributions that include deviations from a mean. But no matter how much we’d like to, we cannot re-run the recent election in Georgia’s 7th or North Carolina’s 9th Congressional district, even though each was decided by fewer than 1,000 votes. And no matter how small the margin, the candidate with a plurality of the votes wins; a margin of 10,000 votes or 1,000,000 votes results in the same practical outcome. Elections are fundamentally different from random processes like flipping a coin or tomorrow’s high temperature.
Because of this, a simple question about forecast quality can be extended to provide insight into the general nature of probabilistic forecasts.
• What’s a good probabilistic forecast?
• Whose House projections were the best?
What’s a good probabilistic forecast?
Let’s start with the basics. We define a probabilistic forecast as a statement of the likelihood of the occurrence of a discrete event, made by a person (the forecaster) before the event is decided.(1) When a sports handicapper says the Wolverines have a 75% chance of winning their next game, that is a probabilistic forecast. When your local weatherman says there is a 50% chance of rain tomorrow, that is also a probabilistic forecast.(2)
We already know that probabilistic thinking is a skill the human mind does not necessarily possess. We are not good at translating concepts like “possible,” “likely,” and “almost certain” into quantitative likelihoods of occurrence. If we are told that the probability of something happening is 80%, and it doesn’t occur, we are frequently quite distraught. And maybe we should be. But a forecaster who places an 80% probability on an event that always happens is also not doing a very good job. Saying that there is an 80% chance of the sun rising tomorrow is not a show of forecasting skill, but rather a lack thereof.
So how do we know a good probabilistic forecast when we see one? Consider a weatherman(3) who says that there is a 50% chance of rain on Tuesday. If it rains, then the weatherman wasn’t wrong; it was clearly something in the realm of possibilities. But the rub is that, if the same prediction is made the following day, and the sun in fact comes out, the forecast is equally good—and equally bad. Over the two-day span, the forecasts did not add any informational value. A weather forecast that says day after day after day that the chance of rain is 50% is useless. Such a weatherman would soon be exited from your local television station, and they should be.
But let’s move to Phoenix, where it rains only 10% of the time on average.(4) Now a forecast showing a 50% chance of rain that is borne out is a fantastic one. On the other hand, if it doesn’t rain, then it isn’t such a bad prediction, as it almost never rains in Phoenix. A 2-day forecast showing a 50% chance of rain each day, one day of which is borne out, has a lot of value in the desert. Which brings us to a principle: the quality of a forecast depends on how different it is from the probability that would be assigned in its absence.(5) Showing a few different sets of Phoenix predictions gives us more information on which weathermen should keep their jobs.

First, let’s check against our prior. It rains in Phoenix 10% of the time, and we had one shower in ten days. Check; our expectation for long-run rain held up.(6)
Let’s start with Weatherman Ugly. These were some bad predictions. Not only did he think rain was likely on five dry days, but he also put a probability of 0% on the one day where it did rain.(7) This man is bad at his job; listening to him is literally worse than just going with the long-term average of 10% chance of rain every day.
Which is precisely what our Bad Weatherman did. These predictions were not as bad as those of his Uglier brother-in-forecasting, but they are also essentially useless. You don’t need a degree in meteorology or fancy weather radar to make these predictions. He should still be fired.
On the other hand, our Good Weatherman in fact did some strong work. It rained on one of the three days on which he thought it might rain; 33% realization on a 40% prediction isn’t bad. He also confidently predicted no rain on seven days and was correct on each. Using these predictions is far superior to simply relying on the long-run average.
Before we finally describe our metric for the quality of a probabilistic forecast,(8) let’s run through one more set of forecasters. For this, we go back to a wet sub-tropical climate where we can expect rain 50% of the time.

Our Bad Forecaster…well, he’s still doing his thing, going with the historical average. I’m getting tired of this guy. Our Good Weatherman puts up a reasonable showing. When he predicts a 60% chance of rain, it rains; when he predicts a 40% chance of rain, it doesn’t. It almost seems that he is better at this job than he thinks he is. He sorts the days correctly, but he lacks confidence. And that is exactly what makes our Better Forecaster better. Even though each individual day is still far from certain, these predictions are clearly better than the previous set. Perhaps these predictions should also be more confident; after all, on days when rain was likely, she was right 100% of the time, not 80%. But we’ve come far enough to state two principles of probabilistic forecasting:
A probabilistic forecast is “good” if it is better than a relevant, uninformed estimate.
A more certain forecast is better than a less certain forecast, if it is correct.
Now for our very simple metric of the quality of a probabilistic forecast: log-loss.(9) For a probabilistic forecast with probability p,
If the event occurs,
Log-loss = -1 * log ( p )
If the event does not occur,
Log-loss = -1 * log ( 1 – p )
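In code, the whole metric fits in a few lines. Here is a minimal Python sketch of that rule; the eps clip value is my own choice, for the reason noted just below:

import math

def log_loss(p, occurred, eps=1e-15):
    """Log-loss penalty for a single probabilistic forecast.

    p        -- the forecast probability that the event occurs
    occurred -- True if the event happened, False if it didn't
    eps      -- small clip so we never take the log of exactly 0
    """
    p = min(max(p, eps), 1 - eps)   # keep p strictly between 0 and 1
    return -math.log(p) if occurred else -math.log(1 - p)

# A confident 90% forecast that comes true costs very little...
print(round(log_loss(0.90, True), 3))   # 0.105
# ...while the same forecast on a miss costs a lot more.
print(round(log_loss(0.90, False), 3))  # 2.303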
That’s it. Just be sure that for your “log” you use the natural log (base e). Also, don’t use a probability of exactly 0% or 100%; nudge it inward by a very small amount of your choice.(10) Rather than describe what this looks like, let’s visualize it:

The first thing that we see is that log-loss is a penalty. See those big numbers for low-probability events that occur (and high-probability events that don’t)? You don’t want to be out there. Don’t make especially confident predictions that don’t come true. The two lines intersect at a probability of exactly 50%: put 50% on a coin flip and your penalty is the same no matter what happens.
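If you want to reproduce a curve like this yourself, here is a minimal matplotlib sketch; the styling choices are mine, not the original chart’s:

import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0.001, 0.999, 500)                  # forecast probabilities, kept away from 0 and 1
plt.plot(p, -np.log(p), label="event occurs")       # penalty when the event happens
plt.plot(p, -np.log(1 - p), label="event does not occur")
plt.axvline(0.5, color="gray", linestyle="--")      # the two curves cross at p = 0.5
plt.xlabel("forecast probability p")
plt.ylabel("log-loss")
plt.legend()
plt.show()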
Log-loss is especially useful when you sum it over several probabilistic forecasts.(11) We can do so for the first set of weather forecasters we considered earlier.

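Mechanically, the totals are just the per-day penalties added up. Here is a quick sketch of that sum, using stand-in daily probabilities that follow the patterns described above rather than the exact figures behind the table:

import math

def log_loss(p, occurred, eps=1e-15):
    p = min(max(p, eps), 1 - eps)
    return -math.log(p) if occurred else -math.log(1 - p)

# One ten-day Phoenix stretch: 1 = rain, 0 = dry (one rainy day out of ten).
outcomes = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# Illustrative stand-in forecasts, matching the descriptions above:
# Ugly calls rain likely on five dry days and 0% on the rainy one,
# Bad always uses the 10% base rate, Good says 40% on three days (one of which rains).
forecasters = {
    "Ugly": [0.70, 0.70, 0.00, 0.70, 0.70, 0.70, 0.10, 0.10, 0.10, 0.10],
    "Bad":  [0.10] * 10,
    "Good": [0.05, 0.05, 0.40, 0.40, 0.05, 0.05, 0.40, 0.05, 0.05, 0.05],
}

for name, probs in forecasters.items():
    total = sum(log_loss(p, day == 1) for p, day in zip(probs, outcomes))
    print(f"{name}: total log-loss = {total:.2f}")

With these stand-in numbers, Ugly’s total is dominated by the clipped 0% forecast on the rainy day, Bad lands in the middle, and Good comes out lowest.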
As we expected, our Good Forecaster did the best. One drawback of log-loss is that the number has little objective meaning. Our Good Weatherman had a total log-loss of 1.945, but that means nothing on its own. However, when we compare mult