The 2016 presidential election may have produced one of the worst forecasting disasters of the modern era. It is arguably in the same league as the sub-prime mortgage forecasts that led to the Great Recession. Both failures seem systemic, in that many independent forecasters made the same mistake, and both outcomes came as a tremendous shock. Until the Great Recession, there had never been so many defaults on AAA-rated bonds. And until the 2016 election, there had never been so many inaccurate major-media forecasts of who would win the presidency.
Until the evening of November 8, 2016, the poster child of horrible political forecasting was the Chicago Daily Tribune, whose November 3, 1948 front page incorrectly declared “Dewey Defeats Truman.”
By the morning of November 8, 2016, 16 out of 17 major forecasters had made an outright prediction that Hillary Clinton would defeat Donald Trump in the presidential election. As we all know, the predictions were wrong: Clinton won only 43% of the electoral vote. As anyone watching live coverage that Tuesday night could tell, the media was in shock. They did not see this one coming. Nor did voters themselves. While emotions ranged from glee to gloom, one thing was certain: the outcome was not expected.
What was reported
Consider the six presidential election win probabilities in the first “Technical” forecaster table. On the morning of November 8, these Clinton win probabilities ranged from 99% down to 71%; in other words, the forecasters assessed Trump’s chances of winning at 1% to 29%. Even the lowest figure, 71%, represents moderately high confidence in a Clinton win. A weak Clinton win prediction would carry a probability in the low to mid-50s. Not a single probability was in the 50s or 60s, so the confidence levels for all six technical predictions ranged from moderately high to very, very high.
In the “Electoral” forecaster table, 10 out of 11 predictions favored Clinton. However, 7 out of 11 forecasters indicated that the race was very close: they gave Clinton only a one- to three-point advantage over Trump. CNN went even further and said (albeit in a convoluted manner) that the race was too close to call.
Today’s conventional wisdom says the “Technical” forecasters are more scientific: these six use statistical models and publish confidence levels (i.e., probabilities) with their forecasts. Yet their predictions were less accurate than the “Electoral” predictions, where 8 out of 11 forecasts said the race was either very close or too close to call.
If the voting public had seen only the 8 conservative forecasts during the month prior to the election, the mood leading up to November 8 and on election night would have been very different. There would have been less confidence in the outcome and less shock Tuesday night, because we would have known all along that the election was very close.
The most accurate forecast in the months leading up to the election was that the presidential race was too close to call. We now know this with the benefit of hindsight. But was it possible to know it going into the final months of the campaign? This is the big question forecasters must now address.
I think the answer is yes: Forecasters should have declared the election too close to call. The following explains why.
High margins of error
The media consistently acknowledged there were battleground states during the last four to six months of the campaign. Depending on the month and the outlet, as many as 15 states made the list: AZ, CO, FL, GA, IA, MI, MN, NH, NV, NC, OH, PA, UT, VA, and WI.
Presidential election analyses in the battleground states often involved demographics: state total population (e.g., “What percent of eligible voters in the state will actually vote?”), age (e.g., “Will younger voters switch from Bernie to Hillary or vote independent?”), sex (e.g., “Will women strongly favor a female candidate?”), race (e.g., “Will white voters favor Trump?”), and ethnicity (e.g., “Will the Latino vote greatly favor Clinton?”).
Many analyses dug even deeper into demographics involving income and education levels. It was also common to see projections by voting district and county. These are very precise levels of election analysis.
It is essential for everyone who reports or reads poll results to understand this critical fact: Every incremental level of demographic detail increases forecast error.
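To see why, consider the textbook margin of error formula for a sampled proportion: slicing one statewide sample into demographic cells shrinks the sample behind each estimate, and the error grows as the square root of that shrinkage. A minimal sketch in Python (the sample sizes here are hypothetical):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Textbook 95%-confidence margin of error for a sampled proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# One statewide sample (hypothetical size)
print(f"statewide, n=1000: +/-{margin_of_error(1000):.1%}")  # ~ +/-3.1%

# The same sample sliced into ten demographic cells of ~100 respondents
print(f"one cell, n=100:   +/-{margin_of_error(100):.1%}")   # ~ +/-9.8%
```

Each further cut, say age within gender within county, divides the cells again, so estimates for the finest slices can carry margins several times larger than the headline figure.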
The best source for state population statistics with numerous fine cuts of geography and demography is the US Census Bureau, which publishes margins of error for its population and demographic estimates. Pollsters must include these margins of error in their calculations.
For example, the 2015 population estimate for Arizona is 4,710,448, with a margin of error of +/-14,258, or 0.3% of the estimate. A pollster uses the total Arizona population estimate to determine how many people to contact and ask, for example, whether they plan to vote in the upcoming election. If the sampling plan gives a 2% margin of error on the Arizona population, then the total margin of error is 2.3%: 2% sampling error plus 0.3% population estimate error.
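As a minimal sketch, that arithmetic in Python. One caveat: the total follows the article’s additive convention; independent error sources are often combined in quadrature instead, which gives a smaller total, so straight addition is the more conservative choice:

```python
# Arizona example, following the additive convention used in the text
population_estimate_error = 14_258 / 4_710_448  # ~0.3%, from the Census Bureau
sampling_error = 0.02                           # assumed 2% from the poll design

total_error = sampling_error + population_estimate_error
print(f"total margin of error: +/-{total_error:.1%}")  # +/-2.3%
```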
Another polling error needs to be considered: are survey respondents giving 100% accurate or honest answers to every question? People may not want to say whom they intend to vote for, or they may say they intend to vote when they do not. In an election as emotionally charged as this one, some people may have been unwilling to reveal how they intended to vote. The bottom line is that every survey question carries response error, and this needs to be included in the total error calculation. It seems quite likely that, in this election, response error was ignored or badly underestimated.
The table below shows error levels for the 15 battleground states. Columns two through five provide baseline error levels from the US Census on four variables. The assumption here is that pollsters must use state population and demographic estimates to establish their sampling targets; this is how they obtain a representative sample of the state’s population.
This table ignores error sources for surveys that also assess variables such as urban versus rural residence, college versus high school education, or income level. Those items would push baseline error well beyond the levels shown in the Base Line Error column.
The table then adds two more items: sampling error and response error. Sampling error reflects the number of people included in the poll; since surveys do not contact everyone, some error is unavoidable. And, as discussed above, we must also account for response error. Uniform sampling and response error rates of 2% each are entered in these two columns and added to the Base Line Error column to create the “Total Error” column.
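In code form, the Total Error column amounts to the following sketch. The 4.4-point baseline is back-calculated from the 8.4% Nevada total cited below, not taken from the actual table:

```python
SAMPLING_ERROR = 2.0  # uniform assumption, in percentage points
RESPONSE_ERROR = 2.0

def total_error(base_line_error):
    """Additive convention behind the table's Total Error column."""
    return base_line_error + SAMPLING_ERROR + RESPONSE_ERROR

# A 4.4-point baseline reproduces the 8.4% Nevada total cited below
print(total_error(4.4))  # 8.4
```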
The total error percentages in the last column are quite large compared to the 2.9% median margin of victory for either Clinton or Trump in these 15 states for 2016. In other words, it is quite likely that most polls did not detect victory margins that exceeded the true error margins for their polls. Therefore, the presidential election outcome was probably not forecastable. Even though a few states like Utah and Iowa started to show strong polling results for Trump, the election would still have been unpredictable because too many other battleground states were too close to call.
Let’s take Nevada as an example, where the total survey error is 8.4%. Suppose a poll tells us that 51% of eligible voters will vote for Clinton, 48% for Trump, and 1% for independents, giving Clinton a 3-point margin over Trump. Can we predict with any confidence that Clinton will win Nevada? NO! The survey error margin in Nevada is probably close to the +/-5.8% shown in the table, nearly double Clinton’s 3-point margin in the hypothetical survey. We would have to report that the Nevada outcome cannot be forecast from this survey.
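The underlying decision rule is simple enough to state in a few lines; a sketch:

```python
def verdict(lead_pts, total_error_pts):
    """A race is callable only when the lead exceeds the total error margin."""
    return "callable" if lead_pts > total_error_pts else "too close to call"

print(verdict(3.0, 5.8))  # the hypothetical Nevada poll above
print(verdict(1.0, 2.9))  # the Columbus Dispatch Ohio poll discussed next
```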
Consider an actual poll in Ohio. On November 6, the Columbus Dispatch predicted a 1% win for Clinton while simultaneously reporting a 2.9% margin of error. The correct report would have been that the outcome in Ohio was too close to call. Further, the 2.9% margin of error lacks credibility. How many people did the pollster survey out of Ohio’s 8.8 million eligible voters? What demographic cuts were used? Did the pollster assume 100% response accuracy?
Yes, but …
It is wrong to argue that population and demographic errors can be ignored; they are an essential part of determining how many people to contact. Further, the population estimates are a year old. How should they be expanded or contracted to match the most likely 2016 population and demographic levels? Such projections add yet more error that cannot be ignored.
It is probably wrong to argue that sampling error is less than 2% at the state level. It is very expensive to sample a state’s population across multiple demographic dimensions at a 90% to 95% confidence level without creating at least a 2% margin of error. When considering the numerous sampling variables that were examined during the election—age, gender, race, ethnicity, income, education, county—it is shocking that so many surveys reported overall error levels of 2% and 3%. This means they probably ignored population and demographic error sources.
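The expense is easy to quantify with the standard sample size formula for a proportion; a sketch:

```python
import math

def required_n(moe, p=0.5, z=1.96):
    """Respondents needed to hit a given margin of error on a proportion."""
    return math.ceil(z**2 * p * (1 - p) / moe**2)

print(required_n(0.02))           # 2401 for +/-2% at 95% confidence
print(required_n(0.02, z=1.645))  # 1692 for +/-2% at 90% confidence
```

Roughly 2,400 respondents are needed statewide for +/-2% at 95% confidence, and every demographic cell that must stand on its own needs that many respondents by itself, which is why such precision is rarely purchased.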
It is probably irresponsible to argue that response error is less than 2% in a hotly contested and contentious presidential election. Even if the only question is “Whom will you vote for?”, some people will not give an honest answer. Indeed, I have seen little research indicating that response error on sensitive political and economic issues is anything less than 2%.
One phenomenon that surveys often dismiss is the “feedback loop”: a voter may change their opinion or intention after they hear or read the survey results. In election polling, the operative principle should be “you can’t measure it without changing it.” For example, a voter may say they intend to vote, but then hear the survey results on the local news. If the results suggest a clear advantage for one candidate, the voter may abandon their intention to vote, rationalizing that voting hassles (e.g., being late for work, losing pay, registration complications, long lines, potential intimidation) outweigh a vote that “won’t make a difference anyway.” This is yet another level of error, and it should be factored into the total margin of error estimate.
One last technical point involves error calculations. Confidence intervals built on the standard error of the mean rely on a “t statistic,” which assumes the data are normally distributed. Our society is not normally distributed, especially on economic and political issues; it is much better described as bimodal, which violates the normality assumption. The implication is that forecasters must use even wider safety margins when interpreting bimodal economic and political survey results.
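A quick simulation illustrates the point. Two hypothetical electorates rate a candidate on a 0-to-10 scale: one clustered near the middle, one polarized into two camps near the extremes. The polarized population has far greater variance, so the same poll size yields a much wider interval around the mean (a sketch with made-up distribution parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # hypothetical poll size

# Unimodal electorate: opinions clustered near the middle of a 0-10 scale
unimodal = rng.normal(5.0, 1.5, n).clip(0, 10)
# Bimodal electorate: two polarized camps near the extremes
bimodal = np.concatenate([rng.normal(2.0, 1.0, n // 2),
                          rng.normal(8.0, 1.0, n - n // 2)]).clip(0, 10)

for name, sample in [("unimodal", unimodal), ("bimodal", bimodal)]:
    half_width = 1.96 * sample.std(ddof=1) / np.sqrt(n)  # ~95% interval
    print(f"{name}: mean={sample.mean():.2f}, "
          f"95% half-width=+/-{half_width:.3f}")
```

The bimodal interval comes out roughly twice as wide, which is the sense in which polarized populations demand wider safety margins.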
The consequences of bad forecasting are profound. What if the media had adhered to a stricter ethical code and paid closer attention to error margins? They would have reported the race was too close to call because the margins of error were wider than the victory margins detected in the polls. This would have placed more pressure on the appropriate parties—the candidates and the voters—to resolve the ambiguity in their own ways. The parties and their volunteers might have worked harder. Voters might have read more, viewed more or discussed more rigorously. Maybe more people would have voted. The list of hypothetical scenarios goes on and on.
There seems to be tremendous resistance to accurate poll reporting and interpretation. The media behaves as if accuracy were inconsistent with maximizing viewership or readership. Put bluntly, “too close to call” does not sell newspapers.
In the end, all a good forecaster can do is maintain integrity, insist on high degrees of statistical precision and accuracy from survey sources, pay very close attention to error levels, be conservative when weighing all possible error sources, and report “NF,” for non-forecastable, when survey results do not exceed the margin of error. Granted, people who pay for a poll do not want to hear “NF”; they want numbers. In that case, the forecaster can report a range instead.
Mark Blessington is President of Mark Blessington Inc., a sales and marketing consulting firm. Mark is the author of two recent books: Sales Forecasting—A Practical Guide (2015) and Sales Quotas: An Analytical Approach to Quota Setting (2014).