# 15 Models

A common tool for improving decision making is the use of models.

## 15.1 What is a model?

A model is a mathematical construct representing the process of interest, made up of a set of variables and the mathematical relationships between them. A model can be used to predict future or unseen outcomes.

An example of a model we discussed in Chapter 10 involved predicting the level of drop out in a school using variables such as attendance rates, family socio-economic status, the school’s average SAT score, and the degree of parental involvement in the child’s schooling. This set of variables could be input into the model to give an output of the predicted level of drop out.

A model is typically developed using a set of observations for which we know both the input and output variables. (This is often called the training data set.) We then use the model to predict the outcome for a new observation where we know the input variables but not the outcome of interest.
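As a minimal sketch of this train-then-predict workflow (the data and variable names are invented for illustration, echoing the school dropout example above):

```python
import numpy as np

# Hypothetical training data: one row per school, with two input
# variables, attendance rate and parental involvement (both 0-1).
X_train = np.array([
    [0.95, 0.8],
    [0.80, 0.4],
    [0.70, 0.3],
    [0.90, 0.9],
    [0.60, 0.2],
])
# Known outcome for each school: observed dropout rate (0-1).
y_train = np.array([0.05, 0.20, 0.30, 0.04, 0.40])

# "Train" a linear model by least squares (with an intercept column).
A = np.column_stack([np.ones(len(X_train)), X_train])
coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)

# Predict the outcome for a new school where we know the input
# variables but not the dropout rate.
x_new = np.array([1.0, 0.75, 0.5])  # intercept, attendance, involvement
predicted_dropout = float(x_new @ coef)
print(predicted_dropout)
```

The same two-step pattern - fit on observations with known outcomes, then apply to new observations - underlies every model discussed in this chapter.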

## 15.2 Intuition versus statistical prediction

There is a considerable literature showing that models typically match or outperform expert judgement.

### 15.2.1 An illustration: predicting the quality of Bordeaux wines

Wine has been produced in the Bordeaux region of France since the Romans planted the first vineyards there almost 2000 years ago. Today, Bordeaux is famous for its red wine, and it’s the source of some of the most expensive wine in the world.

Yet the quality and price of that wine vary considerably from year to year. So each year the question is asked: will this vintage be a classic, like the great 1961? Or will it be more like the disappointing vintage of 1960?

To answer those questions, four months after a new Bordeaux vintage is barrelled, the experts take their first taste. This early in its life, the immature wine tastes like swill. The wine is still over a year away from being bottled, and it’s years away from its prime.

Despite the difficulty in determining the quality of a wine when it is so young, the experts use these first tastes to predict how good this wine will be when it matures. The wineries hang on these first expert predictions of quality. The predictions appear in wine guides and drive early demand. They influence the price of the early wine sales.

The economist Orley Ashenfelter (2008) proposed an alternative way to predict wine quality: a simple statistical model. There were only three inputs to his model - the temperature during the summer growing season, the previous winter’s rainfall, and the harvest rainfall. Ashenfelter circulated the predictions derived from his model in a newsletter to a small circle of wine lovers.

You can see in this story two contrasting ways of informing or making a decision - expert or human judgement in the first case and a model in the second. Which approach worked better?

Ashenfelter’s model predicted more of the variation in vintage price than the expert judgements did (Ashenfelter and Jones 2013). This is despite the fact that those expert judgements themselves affected the price. When Ashenfelter added weather information to the expert predictions, he improved them. To top it off, he didn’t even need to taste the wine. He could make his predictions months before the experts.
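To illustrate the *form* of such a model (the function and coefficients below are invented for illustration; they are not Ashenfelter’s actual estimates):

```python
# A sketch of the form of a weather-based vintage model.
# Coefficients are invented: warmer growing seasons and wetter
# winters help quality; rain at harvest hurts it.
def predict_log_price(growing_season_temp_c: float,
                      winter_rainfall_mm: float,
                      harvest_rainfall_mm: float) -> float:
    """Predict (log) relative vintage price from three weather inputs."""
    return (-10.0
            + 0.6 * growing_season_temp_c
            + 0.001 * winter_rainfall_mm
            - 0.004 * harvest_rainfall_mm)

# A hot year with a dry harvest should score above a cool year
# with a wet harvest.
good_year = predict_log_price(18.0, 600.0, 50.0)
poor_year = predict_log_price(15.5, 400.0, 300.0)
print(good_year > poor_year)  # True
```

The point is the simplicity: three publicly observable inputs and a fixed linear rule, with no tasting required.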

### 15.2.2 Evidence

The story of Orley Ashenfelter’s prediction of wine quality is not an isolated example.

In 1954, the experimental psychologist Paul Meehl (2013) published *Clinical versus Statistical Prediction*. Meehl catalogued twenty empirical competitions between statistical methods and clinical judgement, involving prediction tasks such as academic results, psychiatric prognosis after electroshock therapy, and parole violation. The results were consistently either a victory for the statistical method or a tie with the clinical decision maker. In only one study could Meehl generously give a point to the experts.

Similarly, Grove et al. (2000) looked at 136 studies in medicine and psychiatry in which models had been compared to expert judgement. In 63 of these studies, the model was superior. In 65 there was a tie. This left 8 studies, out of 136, in which the expert was the better option.

This does not, however, mean that statistical methods are perfect. They have flaws. Models can be biased. There are many circumstances where they should not replace human decision making. Ultimately, decision making methods should be tested. Compare their accuracy. Examine the errors they make and the costs of those errors. And choose based on the empirical evidence that you can generate.

## 15.3 Why might some models outperform?

Many of the decision making problems that we discussed in the first module are eliminated by the use of models. By creating a formalised structure concerning what information is used and how it is incorporated, the heuristics that can lead to human error are removed as factors. A model’s confidence intervals can also be calculated and calibrated.

I will now illustrate this as it relates to the problem of noise that I discussed in Chapter 14.

### 15.3.1 Noise

One of the major reasons that models can outperform is the *noise* in human decision making. Models, in contrast to humans, are typically consistent, returning the same output each time they are given the same inputs.

An interesting implication of this difference in consistency is that models designed to *predict the decisions of human decision makers* typically outperform the decision makers whose decisions were used to develop the model (for example, Goldberg (1970)). When developing a predictive model, you normally develop it using the outcome of interest. For example, if seeking to predict whether a loan applicant will default, you would develop a model using borrowers’ characteristics and their long-term outcomes. In a technique called bootstrapping (not to be confused with the statistical term bootstrapping), you don’t use the loan outcomes to develop the model, but rather loan assessors’ historic predictions of whether each borrower would default. This use of predicted rather than actual default means that there will be errors in the data on which you are building your model. Despite this, bootstrapping can be effective, largely because it removes noise. For example, in one study, models developed from the decisions of clinical psychologists tended to outperform most of those same psychologists in differentiating psychotic from neurotic patients.
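A minimal simulation of this bootstrapping idea (all names and numbers are invented): we simulate a noisy assessor, fit a linear model to the assessor’s ratings rather than to actual loan outcomes, and note that the resulting model is perfectly consistent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loan applicants: an income score and a debt ratio.
X = rng.uniform(0, 1, size=(200, 2))

# Simulate a noisy human assessor: a stable underlying policy
# (income pushes the rating up, debt pushes it down) plus random
# occasion-to-occasion noise.
policy = np.array([0.7, -0.5])
judge_ratings = X @ policy + rng.normal(0, 0.2, size=200)

# "Bootstrapping" the judge: fit a linear model to the judge's
# *ratings*, not to actual loan outcomes.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, judge_ratings, rcond=None)

# The model recovers roughly the judge's underlying policy, minus
# the noise, and is perfectly consistent: the same applicant
# always receives the same score.
applicant = np.array([1.0, 0.6, 0.3])  # intercept, income, debt
assert float(applicant @ coef) == float(applicant @ coef)
print(np.round(coef[1:], 2))  # roughly the policy weights
```

The fitted model is the judge with the noise averaged out, which is exactly why it can beat the judge.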

## 15.4 A model example: The outside view

The outside view is a simple modelling approach in which you develop an estimate based on a class of similar previous cases. The outside view can overcome problems such as overplacement, overestimation, availability, representativeness, and anchoring and adjustment.

The following story provides an illustration.

### 15.4.1 Designing a textbook

Daniel Kahneman (2011) was involved in a project to develop a curriculum and textbook for a course in judgement and decision making. At the conclusion of a working session on the curriculum, Kahneman asked each of the participants in that session to write down how long they thought it would take to submit a draft of the textbook to the Department of Education.

Kahneman collected the estimates and wrote them on the board. All of the responses were between 18 and 30 months, a narrow band suggesting a couple of years of work to come.

Kahneman then asked the curriculum expert in the room about his past experience.

The expert started: “You know, I never realized this before, but in fact not all the teams at a stage comparable to ours ever did complete their task. A substantial fraction of the teams ended up failing to finish the job.”

Kahneman asked how large he estimated that fraction was.

The expert: “About 40%”

Kahneman: “Those who finished, how long did it take them?”

The expert: “I cannot think of any group that finished in less than seven years,” he replied, “nor any that took more than ten.”

Kahneman again: “When you compare our skills and resources to those of the other groups, how good are we? How would you rank us in comparison with these teams?”

“We’re below average,” he said, “but not by much.”

All of this came from someone who only moments earlier had estimated a time of similar magnitude as the rest of the group.

The textbook was ultimately delivered eight years later and was never used.

### 15.4.2 The inside versus the outside view

Contrast the two estimates of the time to complete the textbook.

First, we have an estimate of the time it would take from the perspective of those who know all of the details of the plan to develop the textbook. They have taken these details and turned them into an estimate. An estimate of this type - looking at the specifics of the case - is often called the inside view.

We also have a second estimate from the curriculum expert derived from other similar projects. It incorporated none of the specific details of this particular textbook. This is often called the outside view.

Contrasting the two, the inside view focuses on the specific circumstances and experiences, maybe with margin for caution. The outside view captures a bigger picture absent the detail. The inside view uses the specific information about the problem at hand. The outside view looks at whether there are similar situations - a reference class - that can provide a statistical basis for the judgement.

The problem with the inside view is that, while seemingly taking more into account, it effectively fails to account for unforeseen events that inevitably crop up during every project. There are many ways for plans to fail. Although most are improbable, the likelihood that something will go wrong in a big project is high. These problems are naturally contained in the outside view.

The result is that the outside view - ignoring the finer details of the project - can give us a better estimate.

### 15.4.3 Reference class forecasting

One way of using the outside view is reference class forecasting. When making a forecast, don’t just look at the specific circumstances of the case. Ask if there is a broader *reference class* of events that you can look at to see how they turned out. Base your forecast on the outcomes of the reference class rather than your own specific forecast. Or if you believe your case has some unique features, start with your reference class forecast, and cautiously adjust from there for any unique features of your case.

The power of reference class forecasting in corporate decision settings is well established. For example, Lovallo et al. (2012) examined private equity investment decisions and found that an outside view performed better than using a few analogies familiar to the decision maker.

Reference class forecasting can be particularly powerful in overcoming the planning fallacy. The planning fallacy is the tendency of people to underestimate the completion times and costs of difficult tasks (recall the circumstances in which we are “overconfident”), even when they know that most similar tasks have run late or over budget. An estimate based on a reference class provides a check on the project estimate.

Flyvbjerg (2008) catalogued the planning mistakes associated with large projects, such as IT and infrastructure builds. These projects consistently run over budget and over time, and are often based on overoptimistic assumptions about their value. For example, he found that cost forecasts were typically out by 44% for rail projects, 33% for bridges and tunnels, and 20% for roads. Demand for rail projects was typically overestimated by 51%.

Using a database of major projects he developed, Flyvbjerg has access to an effective reference class with which to adjust cost estimates for major projects. This has since been used in decision making for projects such as the Edinburgh Tram and London’s Crossrail project.

If we think of a reference-class forecast as a model, the variable that is input into the model is the reference class outcome, such as the mean time to completion of similar projects or the average cost overrun. In the simplest case, there is no mathematical transformation of that input: the output, such as the estimation of the time to completion of your project, simply equals the input.
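In code, this simplest outside-view model is almost trivially short (the project completion times below are hypothetical):

```python
from statistics import mean

def reference_class_forecast(reference_outcomes: list[float]) -> float:
    """Simplest outside-view model: the forecast for your case is
    just the mean outcome of the reference class, with no
    transformation of the input."""
    return mean(reference_outcomes)

# Hypothetical completion times (in years) of similar projects.
completion_times = [7.0, 8.0, 9.0, 10.0, 7.5]
print(reference_class_forecast(completion_times))  # 8.3
```

Any adjustment for the unique features of your own case would then be a cautious tweak to this baseline, not a replacement for it.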

For another perspective on the outside view, read Kahneman and Lovallo (2003) or listen to Russ Roberts’s (n.d.) interview with Bent Flyvbjerg on *EconTalk*.

### 15.4.4 Failure to use the outside view

The failure of the curriculum expert to initially use the outside view when estimating the time to complete the textbook was not an anomaly. There is ample evidence that people ignore the outside view even when it is right in front of them.

Freymuth and Ronan (2004) asked experimental participants to select treatment for a fictitious disease. Participants were told the efficacy of two different treatments, plus given an anecdote of a patient outcome for each.

When the participants were able to choose a treatment with a 90% success rate that was paired with a positive anecdote, they chose it 90% of the time (choosing a control treatment with 50% efficacy the remaining 10% of the time). But when paired with a negative anecdote, only 39% chose the 90% efficacy treatment. Similarly, a treatment with 30% efficacy paired with a negative anecdote was chosen only 7% of the time, but this increased to 78% when it was paired with a positive anecdote. The stories drowned out the base rate information.

## 15.5 Model complexity

There are markedly more complex analytical methods than reference class forecasting, involving vastly greater data, with which to make decisions. However, there is actually much power in simple analytical techniques.

### 15.5.1 Improper linear models

Robyn Dawes (1979) demonstrated the power of “improper linear models.”

A common statistical procedure when developing a model is multiple regression. The input variables and observed outcomes are entered into a formula that determines the optimal weighting of each input variable for predicting the output.

Dawes showed that a model that combines each of the variables with equal weight, rather than optimal weight, does just as well as multiple regression in many instances (and, as per the earlier examples, better than the humans they are matched against).

A large driver of this is the bias-variance trade-off we discussed earlier in the unit. The improper linear model with equal weights is biased, but it is less affected by random variation in the data that is used to develop it. It is biased but has lower variance than multiple regression.
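As a sketch of this comparison (all data here is simulated, and the weights and noise levels are invented for illustration), we can fit a multiple regression and an equal-weights model on the same small training sample and compare them out of sample:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated world: three predictors, all positively related to
# the outcome, plus substantial noise.
true_w = np.array([0.5, 0.3, 0.4])

# Small training sample - the regime where "optimal" weights
# pick up random variation in the data.
X = rng.normal(size=(30, 3))
y = X @ true_w + rng.normal(0, 1.0, size=30)

# "Proper" model: multiple regression estimates optimal weights.
A = np.column_stack([np.ones(30), X])
ols_coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# "Improper" model: standardise each predictor and give it equal
# weight, choosing only the sign (+1) in advance.
mu, sigma = X.mean(axis=0), X.std(axis=0)
def improper_predict(x):
    return ((x - mu) / sigma).sum(axis=1)

# Compare out of sample on fresh data from the same process.
X_new = rng.normal(size=(200, 3))
y_new = X_new @ true_w + rng.normal(0, 1.0, size=200)

ols_pred = np.column_stack([np.ones(200), X_new]) @ ols_coef
eq_pred = improper_predict(X_new)

r_ols = np.corrcoef(ols_pred, y_new)[0, 1]
r_eq = np.corrcoef(eq_pred, y_new)[0, 1]
print(round(r_ols, 2), round(r_eq, 2))
```

With a small training sample, the equal-weights model typically tracks the regression closely, consistent with Dawes’s point; the exact numbers depend on the simulated data.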

### 15.5.2 Simple heuristics

In *Simple Heuristics That Make Us Smart*, Czerlinski et al. (1999) describe a competition between some simple heuristics and multiple regression. Both were to predict outcomes across 20 environments, such as school dropout rates and fish fertility.

One simple heuristic in their competitions was “Take the Best”. This heuristic works through the cues in order of their validity in predicting the outcome. For example, if you want to know which of two schools has the higher dropout rate, start with the cue of highest validity. If attendance rate has the highest validity, and one school has lower attendance than the other, infer that the school with lower attendance has the higher dropout rate. If the attendance rates are the same, move to the next cue.
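A minimal sketch of Take the Best in code (the cue names, values, and validity ordering below are invented for illustration):

```python
def take_the_best(cues_a, cues_b, cue_order):
    """Return 'A', 'B', or 'tie'.

    cues_a / cues_b map cue names to values, coded so that a
    higher value points towards a higher outcome. cue_order lists
    cue names from highest to lowest validity.
    """
    for cue in cue_order:
        if cues_a[cue] > cues_b[cue]:
            return 'A'
        if cues_a[cue] < cues_b[cue]:
            return 'B'
        # Cue does not discriminate: fall through to the next one.
    return 'tie'

# Which of two hypothetical schools has the higher dropout rate?
# Values are coded so that higher means "points towards dropout".
order = ['low_attendance', 'low_ses', 'low_parental_involvement']
school_a = {'low_attendance': 0.3, 'low_ses': 0.8,
            'low_parental_involvement': 0.5}
school_b = {'low_attendance': 0.3, 'low_ses': 0.6,
            'low_parental_involvement': 0.9}

# Attendance ties, so the decision falls to the next cue.
print(take_the_best(school_a, school_b, order))  # A
```

Note that the heuristic stops at the first discriminating cue and ignores everything after it, which is precisely why it is cheap to compute.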

Depending on the precise specifications, the result of the competition across the 20 environments was either a victory for Take the Best or at least equal performance with multiple regression. This is impressive for something that is less computationally expensive and ignores much of the data (in other words, is biased).