Calibration: Weather forecasts

[Figure: forecast graph]

National Weather Service (NWS) example

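Confidence judgments about cities

For each pair, is the first city north or south (N/S), or east or west (E/W), of the second? Give your confidence in each answer.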

  1. Los Angeles N/S New York
  2. Los Angeles N/S Shanghai
  3. New York N/S Tokyo
  4. Paris N/S Toronto
  5. Paris N/S Tokyo
  6. Moscow N/S Vancouver
  7. New York N/S Rome
  8. Tehran N/S San Francisco
  9. Boston E/W Rio de Janeiro
  10. New York E/W Santiago
  11. Moscow E/W Cairo
  12. Tokyo E/W Canberra

Confidence judgments about cities: answers

  1. Los Angeles S New York
  2. Los Angeles N Shanghai
  3. New York N Tokyo
  4. Paris N Toronto
  5. Paris N Tokyo
  6. Moscow N Vancouver
  7. New York S Rome
  8. Tehran S San Francisco
  9. Boston W Rio de Janeiro
  10. New York W Santiago
  11. Moscow E Cairo
  12. Tokyo W Canberra

Calibration curves for everyone else

[Figure: calibration curves]
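A calibration curve of this kind can be computed by grouping answers by stated confidence and finding the percent correct at each level. The following is only a minimal sketch: the function name `calibration_curve` and the responses in `example` are made up for illustration and are not data from the studies discussed here.

```python
from collections import defaultdict

def calibration_curve(judgments):
    """judgments: iterable of (stated confidence in percent, answer was correct).
    Returns a sorted list of (confidence level, percent correct, number of items)."""
    by_level = defaultdict(list)
    for confidence, correct in judgments:
        by_level[confidence].append(correct)
    curve = []
    for level in sorted(by_level):
        results = by_level[level]
        pct_correct = 100 * sum(results) / len(results)
        curve.append((level, pct_correct, len(results)))
    return curve

# Hypothetical responses, invented for this example only.
example = [
    (50, True), (50, False),
    (70, True), (70, False),
    (90, True), (90, True), (90, False), (90, True),
    (100, True), (100, True), (100, False), (100, True),
]

for level, pct_correct, n in calibration_curve(example):
    print(f"said {level}%: {pct_correct:.0f}% correct (n={n})")
```

A perfectly calibrated judge would have percent correct equal to stated confidence at every level. In the invented data, answers given with 100% confidence are right only 75% of the time, which is the overconfidence pattern.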

Effect of difficulty

[Figure: hard vs. easy items]

Debiasing experiments (Koriat et al.; Hoch)

Four groups:

  1. Control.
  2. Think of reasons why you are right.
  3. Think of reasons why you are wrong.
  4. Think of reasons on both sides.

Extreme confidence was reduced in groups 3 and 4, but the effect was small.


Some people spend a lot of time thinking. "Should I do this or that?" I don't. I just decide what I want to do and do it. I don't go over and over decisions. My wife does, but I don't. It's a matter of self-confidence.

Dan Quayle
from The Making of a Senator, by Richard F. Fenno, Jr.

Regression to the mean

We are imperfect judges of confidence. We make errors in both directions. But when our judgment is extreme, errors can only go in one direction.

Thus, when we say 100%, any error will be in the direction of lower accuracy than 100%.

This is psychologically uninteresting, but it has great practical implications.
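The regression argument can be checked with a small enumeration. This sketch rests on two assumptions that are not from the studies themselves: the true chance of being correct takes one of a few levels (`TRUE_LEVELS`), and the stated confidence equals the true level plus a symmetric error (`ERRORS`), capped at 100%.

```python
from collections import defaultdict
from itertools import product

TRUE_LEVELS = [60, 70, 80, 90, 100]  # assumed true chance of being correct, in percent
ERRORS = [-10, 0, 10]                # assumed judgment errors, all equally likely

# Collect the true levels that lie behind each stated confidence.
true_by_stated = defaultdict(list)
for true_pct, error in product(TRUE_LEVELS, ERRORS):
    stated = min(100, true_pct + error)  # stated confidence cannot exceed 100
    true_by_stated[stated].append(true_pct)

# Average true accuracy behind each stated confidence level.
mean_true = {
    stated: sum(levels) / len(levels)
    for stated, levels in sorted(true_by_stated.items())
}

for stated, accuracy in mean_true.items():
    print(f"stated {stated}%: mean true accuracy {accuracy:.1f}%")
```

In the middle of the scale the errors cancel (stated 80% goes with a mean true accuracy of 80%). At stated 100% they cannot cancel, so the mean true accuracy falls below 100% even though the errors themselves are unbiased.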

Scoring of judgments of forecasters A and B

        Yes   No   No   Yes   Yes     Linear   Quadratic
Ea.                                   1.50     0.71
Eb.                                   1.50     0.95
Ea'.                                  1.00     1.00
Eb'.                                  1.00     1.00

The linear score (SUM) gives no incentive against exaggerating probabilities.

The quadratic (squared) score is "strictly proper": it encourages reporting your best guess.
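The difference between the two rules can be illustrated numerically. The sketch below assumes that both scores are penalties (lower is better): the absolute error for the linear rule and the squared error for the quadratic rule. The function names and the belief of 0.7 are hypothetical choices for this example, and the scaling may differ from the table above.

```python
def linear_penalty(report, outcome):
    """Absolute error between the reported probability and the outcome (1 or 0)."""
    return abs(outcome - report)

def quadratic_penalty(report, outcome):
    """Squared error between the reported probability and the outcome (1 or 0)."""
    return (outcome - report) ** 2

def expected_penalty(penalty, belief, report):
    """Penalty the forecaster expects, given a true belief that the event occurs."""
    return belief * penalty(report, 1) + (1 - belief) * penalty(report, 0)

def best_report(penalty, belief):
    """Report (on a grid of 0.00 to 1.00) that minimizes the expected penalty."""
    candidates = [i / 100 for i in range(101)]
    return min(candidates, key=lambda r: expected_penalty(penalty, belief, r))

belief = 0.7  # hypothetical: the forecaster really thinks the chance is 70%
print(best_report(linear_penalty, belief))     # 1.0
print(best_report(quadratic_penalty, belief))  # 0.7
```

Under the linear rule, a forecaster who believes 0.7 does best by reporting 1.0, so exaggeration is rewarded. Under the quadratic rule, the best report is the belief itself.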