15 Forecasting
The ability to forecast is a core function in firms.
A range of problems can derail our ability to forecast - many of which we have seen during this unit - including errors due to the use of heuristics, overconfidence (particularly overprecision and overestimation), and noise.
Forecasting also suffers from the problem that knowledge is distributed across the firm. For instance, members of the sales team may get detailed knowledge on what customers think about the new product. How can you effectively gather this information for use in the forecast?
15.1 Expert Political Judgment
In Expert Political Judgment: How Good Is It? How Can We Know?, Tetlock (2005) reports on a project in which he asked a range of experts to predict future events. Because it took time to see how those forecasts panned out, the project ran for almost 20 years.
The basic methodology was to ask each participant to rate how likely each of three possible outcomes for a political or economic event was, on a scale of 0 to 10 (with, assuming some basic mathematical literacy, the ratings across the three options summing to 10). An example question might be whether a government will retain, lose or strengthen its position after the next election, or whether GDP growth will be below 1.75 per cent, between 1.75 per cent and 3.25 per cent, or above 3.25 per cent.
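To make this concrete, here is a minimal sketch of how one participant's ratings for a single question might be turned into the probabilities used for scoring. The example ratings are hypothetical, and dividing by 10 is an assumption about how the ratings were normalised:

```python
# Hypothetical ratings (0-10) for the three outcomes of one question,
# e.g. GDP growth below 1.75%, between 1.75% and 3.25%, or above 3.25%.
ratings = {"below": 2, "middle": 5, "above": 3}   # ratings sum to 10

# Assumed normalisation: divide by 10 to treat the ratings as probabilities.
probabilities = {outcome: rating / 10 for outcome, rating in ratings.items()}
print(probabilities)   # {'below': 0.2, 'middle': 0.5, 'above': 0.3}
```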
Once the results were in, Tetlock scored the participants on two dimensions - calibration and discrimination. To get a high calibration score, the frequency with which a participant predicts events to occur needs to match the frequency with which they actually occur. For instance, events predicted to occur with a 10 per cent probability need to occur around 10 per cent of the time, and so on. Because the experts made many judgments, these calculations could be made.
To score highly on discrimination, the participant needs to assign a probability of 1.0 to things that happen and 0 to things that don't. The closer predictions sit to the ends of the scale, the higher the discrimination score. A participant can be perfectly calibrated while being anything from a poor discriminator (a fence-sitter who hedges toward the middle) through to a perfect discriminator (someone who only uses the extreme values, and uses them correctly).
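Neither score is defined formally above, so the sketch below shows one way calibration and discrimination could be computed from a set of probability forecasts and realised outcomes. The binning and the particular formulas - a weighted calibration gap, and the "resolution" term of the Brier score decomposition for discrimination - are assumptions for illustration, not Tetlock's exact scoring rules:

```python
import numpy as np

def calibration_and_discrimination(probs, outcomes, n_bins=10):
    """Score probability forecasts against realised outcomes.

    probs    -- forecast probabilities in [0, 1]
    outcomes -- 1 if the event occurred, 0 otherwise
    Returns (calibration_error, discrimination).
    """
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    base_rate = outcomes.mean()
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_index = np.digitize(probs, edges[1:-1])   # assign each forecast to a bin

    calibration_error = 0.0   # 0 means predicted frequencies match observed ones
    discrimination = 0.0      # larger means forecasts separate hits from misses
    for b in range(n_bins):
        in_bin = bin_index == b
        if not in_bin.any():
            continue
        weight = in_bin.mean()                # share of forecasts in this bin
        observed = outcomes[in_bin].mean()    # how often those events happened
        predicted = probs[in_bin].mean()      # what was forecast, on average
        calibration_error += weight * abs(predicted - observed)
        discrimination += weight * (observed - base_rate) ** 2
    return calibration_error, discrimination

# A fence-sitter who always says 0.5 can be reasonably calibrated but shows no
# discrimination; a bolder forecaster who is usually right scores well on both.
probs = [0.9, 0.9, 0.8, 0.2, 0.1, 0.1, 0.7, 0.3]
outcomes = [1, 1, 1, 0, 0, 0, 1, 0]
print(calibration_and_discrimination(probs, outcomes))
```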
Tetlock’s analysis revealed that:
- Experts, who typically have a doctorate and average 12 years' experience in their field, barely outperform a strategy of assigning an equal probability of 33 per cent to each potential outcome.
- However, experts outperform unsophisticated forecasters (a role filled by Berkeley undergrads).
- The experts were not differentiated on a range of dimensions, such as years of experience or whether they were forecasting in their area of expertise. Subject matter expertise translates less into forecasting accuracy than into confidence.
The one dimension where forecast accuracy was differentiated is what Tetlock calls the fox-hedgehog continuum (borrowing from Isaiah Berlin). Hedgehogs know one big thing and aggressively expand that idea into all domains, whereas foxes know many small things, are skeptical of grand ideas and stitch together diverse, sometimes conflicting information. Foxes are more willing to change their minds in response to the unexpected, more likely to remember past mistakes, and more likely to see the case for opposing outcomes. Foxes outperformed on both measures, calibration and discrimination.
What is it about foxes and hedgehogs that leads to differences in performance? Among other reasons, Tetlock identified the following:
- Foxes are better Bayesians in that they update their beliefs in response to new evidence and in proportion to the extremity of the odds they placed on possible outcomes. They weren't perfect Bayesians, however - when surprised by a result, Tetlock calculated that foxes moved around 59 per cent of the prescribed amount, compared to 19 per cent for hedgehogs (see the sketch after this list). In some of the exercises, hedgehogs moved in the opposite direction.
- Foxes were also less prone to hindsight effects. Many experts recalled assigning higher probabilities to outcomes that materialised than they actually had. As Tetlock notes, it is hard to say someone got it wrong if they think they got it right.
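As a toy illustration of "moving the prescribed amount", the sketch below applies Bayes' rule to an expert's prior and then moves only a fraction of the way toward the full Bayesian posterior. The starting prior, the likelihood ratio, and the reading of the 59 and 19 per cent figures as fractions of the prior-to-posterior gap are assumptions made purely for illustration:

```python
def bayes_posterior(prior, likelihood_ratio):
    """Posterior probability after evidence with the given likelihood ratio
    P(evidence | outcome) / P(evidence | no outcome)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

def partial_update(prior, likelihood_ratio, fraction):
    """Move only `fraction` of the way from the prior to the Bayesian posterior."""
    prescribed = bayes_posterior(prior, likelihood_ratio)
    return prior + fraction * (prescribed - prior)

# An expert puts 80% on an outcome that fails to happen; suppose the surprise
# carries a likelihood ratio of 0.2 against their view (an assumed figure).
prior = 0.80
prescribed = bayes_posterior(prior, 0.2)     # ~0.44 -- the full Bayesian revision
fox = partial_update(prior, 0.2, 0.59)       # fox: ~59% of the prescribed move (~0.59)
hedgehog = partial_update(prior, 0.2, 0.19)  # hedgehog: ~19% of the move (~0.73)
print(round(prescribed, 2), round(fox, 2), round(hedgehog, 2))
```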
15.2 How likely is likely?
Andrew Mauboussin and Michael Mauboussin (2018) write in the Harvard Business Review:
In a famous example (at least, it’s famous if you’re into this kind of thing), in March 1951, the CIA’s Office of National Estimates published a document suggesting that a Soviet attack on Yugoslavia within the year was a “serious possibility.” Sherman Kent, a professor of history at Yale who was called to Washington, D.C. to co-run the Office of National Estimates, was puzzled about what, exactly, “serious possibility” meant. He interpreted it as meaning that the chance of attack was around 65%. But when he asked members of the Board of National Estimates what they thought, he heard figures from 20% to 80%. Such a wide range was clearly a problem, as the policy implications of those extremes were markedly different. Kent recognized that the solution was to use numbers, noting ruefully, “We did not use numbers…and it appeared that we were misusing the words.”
Not much has changed since then. Today, people in business, investing, and politics continue to use vague words to describe possible outcomes.
To examine this problem in more depth, team Mauboussin asked 1700 people to attach probabilities to a range of words or phrases. For instance, if a future event is likely to happen, what percentage of the time would you estimate it will happen? Or what if the future event has a real possibility of happening?
Unsurprisingly, as you can see in the figure of survey responses, the answers are all over the place.
More detailed results are available online, where you can also take the survey yourself.
What is the range of answers for an event that is "likely"? The 90% probability range for "likely" - that is, the range that 90% of the answers fell within, with 5% of the answers above and 5% below - was 55% to 90%. "Real possibility" had a probability range of between 20% and 80% - the phrase is nearly meaningless. Even "always" is ambiguous, with a probability range of 90% to 100%.
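A 90% probability range of this kind is simply the band between the 5th and 95th percentiles of the survey responses. Here is a minimal sketch of that calculation, using made-up responses for "likely" rather than the Mauboussins' actual data:

```python
import numpy as np

# Made-up responses (in per cent) to "how often does a 'likely' event happen?".
responses_for_likely = [60, 70, 75, 80, 65, 90, 55, 85, 70, 75]

# The 5th and 95th percentiles bound the range containing 90% of the answers.
low, high = np.percentile(responses_for_likely, [5, 95])
print(f'90% of respondents put "likely" between {low:.0f}% and {high:.0f}%')
```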
An interesting finding of the survey was that men and women differ in their interpretations. Women are more likely to take a phrase as indicating a higher probability.