25  Better forecasting

On this page, I examine techniques to improve forecasting accuracy, primarily through evidence from the Good Judgment Project and related research.

25.1 The Good Judgment Project

The Good Judgment Project emerged from a tournament conducted by the United States’ Intelligence Advanced Research Projects Activity (IARPA). The tournament pitted teams of forecasters against each other in predicting political and economic events. The result was a decisive victory for the Good Judgment Project team, so decisive that IARPA dropped the other teams for the later years of the tournament.

There were three unique features of the Good Judgment Project approach (Tetlock and Gardner 2016):

  • Aggregating many forecasts to gain wisdom of crowds, giving extra weight to top forecasters
  • Using an algorithm to “extremise” the forecast
  • Training forecasters to overcome biases.

The aggregation of forecasts draws on the power of statistical groups, which combine individual inputs to produce a forecast or decision. Voting is a simple example, which works well if the majority is right. There is strong evidence that statistical groups make better decisions than most individuals. A classic illustration comes from Francis Galton. Galton (1907) analysed the entries in a competition to guess the weight of an ox; the median of the individual estimates was only 0.8% from the actual figure.
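To make the aggregation step concrete, here is a minimal sketch of a weighted average of probability forecasts, with extra weight on a forecaster with a strong track record. This is an illustration only, not the Good Judgment Project’s actual algorithm, and the numbers are invented.

```python
def aggregate(forecasts, weights):
    """Weighted mean of probability forecasts; weights need not sum to 1."""
    total_weight = sum(weights)
    return sum(p * w for p, w in zip(forecasts, weights)) / total_weight

forecasts = [0.60, 0.70, 0.55, 0.90]  # individual probability estimates (invented)
weights = [1.0, 1.0, 1.0, 3.0]        # extra weight for the forecaster with the best record
print(round(aggregate(forecasts, weights), 3))  # 0.758
```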

Extremising shifts the aggregated forecast toward 0% or 100%. If the forecast is a 70% probability, bump it up to 85%; if it is 30%, cut it to 15%. This adjustment simulates what would happen if every forecaster had access to all of the available information, which would typically raise confidence. The effectiveness of extremising depends on diversity of information sources: if everyone holds the same information, there is no sharing of information to simulate.
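One way to extremise, sketched below, is to raise the odds of the aggregated forecast to a power greater than one, which pushes probabilities away from 50% while leaving 0%, 50% and 100% unchanged. This is an illustration of the general idea, not necessarily the transformation the project used, and the exponent here is arbitrary.

```python
def extremise(p, a=2.0):
    """Push probability p away from 0.5 by raising its odds to the power a (a > 1)."""
    return p**a / (p**a + (1 - p)**a)

for p in (0.3, 0.5, 0.7):
    print(p, "->", round(extremise(p), 2))
# 0.3 -> 0.16, 0.5 -> 0.5, 0.7 -> 0.84
```

With this exponent, a 70% forecast moves to roughly 84% and a 30% forecast to roughly 16%, in line with the example above.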

25.2 Training superforecasters

A particularly valuable element of the Good Judgment Project was their approach to training forecasters to overcome cognitive biases. While most attempts to debias decision makers show limited effectiveness, Mellers et al. (2014) and Chang et al. (2016) demonstrated that structured training can significantly improve forecasting accuracy.

25.2.1 Training methods and content

The training program ran throughout the four-year project but comprised less than an hour’s instruction each year. Participants received one of two training types.

The probabilistic reasoning training covered:

  • using reference classes and base rates to establish initial probabilities (see the sketch after this list)
  • leveraging statistical groups to aggregate estimates
  • applying statistical and mathematical models to structure predictions
  • developing self-awareness of common cognitive biases.
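The first item can be made concrete with a small sketch of my own, not material from the training itself: start from a reference-class base rate, then update on case-specific evidence using Bayes’ rule in odds form. The numbers are hypothetical.

```python
def update_from_base_rate(base_rate, likelihood_ratio):
    """Update a reference-class base rate with case-specific evidence.

    likelihood_ratio is P(evidence | event) / P(evidence | no event).
    Returns the posterior probability via Bayes' rule in odds form.
    """
    prior_odds = base_rate / (1 - base_rate)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Hypothetical numbers: a 20% base rate and evidence twice as likely if the event will occur.
print(round(update_from_base_rate(0.20, 2.0), 3))  # 0.333
```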

The scenario development training involved:

  • establishing coherent and logically consistent probabilities (ensuring options sum to 100%; see the sketch after this list)
  • systematically exploring underlying assumptions
  • identifying key drivers that could affect outcomes
  • considering best and worst case scenarios to expand thinking.
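To illustrate the coherence requirement in the first item (a sketch of my own, not part of the training materials), probabilities over mutually exclusive and exhaustive options can be rescaled so they sum to one:

```python
def normalise(probs):
    """Rescale probabilities of mutually exclusive, exhaustive options to sum to 1."""
    total = sum(probs)
    return [p / total for p in probs]

# Invented probabilities for four options that incoherently sum to 1.2.
options = [0.50, 0.30, 0.25, 0.15]
print([round(p, 2) for p in normalise(options)])  # [0.42, 0.25, 0.21, 0.12]
```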

25.3 Evidence of effectiveness

Both training approaches improved forecast accuracy by approximately 10% across all four years as measured by Brier scores, which assess the accuracy of probabilistic predictions.
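For a single binary question, one common formulation of the Brier score is the squared difference between the forecast probability and the outcome (1 if the event occurred, 0 otherwise), averaged across questions; lower is better. A minimal sketch with made-up numbers (the tournament’s exact scoring rule may differ in detail):

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between probability forecasts and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

forecasts = [0.9, 0.2, 0.6]  # predicted probabilities (made up)
outcomes = [1, 0, 0]         # what actually happened
print(round(brier_score(forecasts, outcomes), 3))  # 0.137
```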

The techniques from the training were evident in the habits of top performers (later dubbed “superforecasters”), who consistently:

  • Started with an “outside view” by establishing base rates before diving into details
  • Only then applied an “inside view” by incorporating information specific to the particular question
  • Showed greater sensitivity to scope and timeframes than average forecasters.

This last point deserves emphasis. When presented with questions about whether an event would occur within different timeframes (e.g., “within 6 months” versus “within 12 months”), average forecasters tended to give nearly identical probability estimates regardless of timeframe. In contrast, skilled forecasters adjusted their probabilities appropriately, recognising that longer timeframes typically increase the likelihood of events occurring.
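A stylised example of my own (not from the study) shows why the two answers should differ: if the event has a constant chance of occurring in each six-month window, a 30% chance within 6 months implies roughly a 51% chance within 12 months, so identical answers for both horizons signal insensitivity to scope.

```python
# Stylised illustration assuming a constant per-period hazard (my assumption, not the study's).
p_6_months = 0.30
p_12_months = 1 - (1 - p_6_months) ** 2  # chance the event occurs in at least one of the two windows
print(round(p_12_months, 2))  # 0.51
```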

25.3.1 Methodological concerns

Despite these promising results, Hauenstein et al. (2025) raised methodological concerns about the training studies. Their analysis suggested the reported effects might not be robust when controlling for extraneous variables. Specifically, trained and untrained forecasters often answered different questions at different times (ostensibly due to the training structure). This difference in questions and timing — rather than improved forecasting skill — could potentially explain the performance differences.

25.4 Scenario planning

In Expert Political Judgment (Chapter 21), Tetlock (2005) explored whether scenario planning techniques could improve forecasting. This investigation occurred in the context of his larger study where he had identified the fox-hedgehog distinction — with “foxes” (who draw on diverse information sources) typically outperforming “hedgehogs” (who rely on a single big idea).

In one experiment, Tetlock had experts branch high-level forecasts into multiple sub-scenarios. For the question of whether Canada would break up during the Quebec separatist referendum period, experts considered combinations of outcomes involving separatist party electoral success, referendum results, economic conditions, and levels of political acrimony, rather than simply predicting whether Quebec would separate.

This branching exercise uncovered a consistent bias in probability judgment. While experts assigned mathematically coherent probabilities to the basic question (summing to 1.0), their assessments of the component sub-scenarios revealed probability incoherence. The average sum of probabilities for the branched scenarios was 1.58 — mathematically impossible for mutually exclusive outcomes.

Most notably, this effect was strongest among foxes, whose typical forecasting advantages disappeared in this context. Their branched scenario probabilities summed to an average of 2.09, showing even greater incoherence than hedgehogs. When experts were subsequently required to adjust these probabilities to sum to one (a standard practice in scenario planning), foxes’ forecasting accuracy declined to match that of hedgehogs. The process designed to improve consideration of alternatives neutralised the performance edge that foxes had previously demonstrated.

This outcome suggests a qualification to the benefits of fox-like thinking. While drawing on diverse sources and perspectives generally improves forecasting, the cognitive demands of scenario branching may introduce distortions in probability judgments, particularly for those with more nuanced forecasting approaches. For organisations implementing scenario development, pre-mortems, or red teaming techniques, these potential effects on different cognitive styles deserve consideration.

If you want more on this topic, listen to the 80,000 Hours podcast episode in which Robert Wiblin (n.d.) interviews Philip Tetlock.