The Behavioral Economics Guide 2022, edited by Alain Samson, begins with an essay by Dan Goldstein that offers an unnerving reminder for studies that compare only a few potential outcomes, rather than the full range (“Leveling up Applied Behavioral Economics”). He sets the stage:

You’re sitting in a workshop in a hotel somewhere in the world. You know the kind, with the U-shaped table and the dozen people and the bottle of sparkling water for every person. It’s 10 in the morning, someone’s presenting, and you’re having productive daydreams. You’re inspired, and you know because it’s 10 AM you’re about to have the best idea you’ll have all day. You hear something about probability weighting, that is, how people overweight small probabilities when they read them (as in the gamble studies on which prospect theory was built) but underweight small probabilities when they experience them (Hertwig et al., 2004). You start thinking about communicating probabilities with visual stimuli. You think that if people see visualizations of probabilities, it would be different than reading about them and different than experiencing them. Because frequency representations help people in other tasks (e.g., Walker et al., 2022), perhaps people seeing visualizations of probabilities as frequencies would cause them to neither overestimate nor underestimate the probabilities they represent. You think that if you can find a way to visually display probabilities as frequency-based icon arrays, without language or simulated experience, it might have a lot of applied uses and improve decision-making in other tasks such as mortgage borrowing, gambling, or investment.

So the idea goes something like this. Show people a grid like this one. Ask them to estimate the number of black squares, which can be viewed as a way of presenting a probability (in this case, 24 out of 100 or 24%). It’s not clear what will happen. Will people follow the common pattern of underweighting smaller probabilities and overweighting larger ones? Or will they be on average accurate in their predictions?

As Goldstein tells the story, you pick some values to test this theory, and you have a friend pick some values to test the theory. But when you get together to talk it over, you find that you have opposite results! How can this happen? The problem arises because you and your friend each looked at just a few results, not at the full range of possibilities from 0 to 100. When Goldstein and co-authors did a study with a full range of values, here’s what they found. Estimates of the number of boxes were pretty good at low levels under about 10; slightly overestimated at levels around 20; substantially underestimated from about 35 to 55; substantially overestimated from about 65 to 80; slightly underestimated at about 90; and then pretty accurate for high levels above 95.

In Goldstein’s hypothetical story, imagine that you tried out just a few values shown by the black boxes, while your friend tried out just a few values in the orange boxes. Each of you would be missing a big part of the puzzle. Clearly, looking at only a few values can be deeply misleading; it’s only by looking at all the potential outcomes that one can draw a conclusion here. Goldstein writes:

When you test all the values from 0 to 100, you get this very weird—but very reliable—up, down, up, down, up pattern. I believe it was first discovered by Shuford in 1961 (see also Hollands & Dyre, 2000). Since you tested low values under 20 and high values around 50 and 90, you saw overestimation at low values and underestimation at high values. However, because your friend tested low values around 30 and high values around 70, they saw the opposite, namely, underestimation at low values and overestimation at high values. The moral of the story is that looking at the world through the keyholes of a two-level design can give you a very misleading picture.
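To see how two small designs can disagree, here is a minimal Python sketch. It is not Goldstein's data: the curve below is a stylized stand-in whose signs follow the up-and-down pattern described above, while the magnitudes, the knot positions, and the names KNOTS, bias, direction, and two_level_design are all assumptions made for illustration.

```python
# Stylized bias curve as (true count, estimate minus true count) knots.
# Signs follow the pattern described in the text; the magnitudes are
# invented for illustration and are NOT the study's data.
KNOTS = [(0, 0.0), (10, 0.0), (20, 2.0), (35, -6.0), (55, -6.0),
         (65, 6.0), (80, 6.0), (90, -2.0), (95, 0.0), (100, 0.0)]


def bias(true_count):
    """Linearly interpolate the stylized bias at a true count in [0, 100]."""
    for (x0, y0), (x1, y1) in zip(KNOTS, KNOTS[1:]):
        if x0 <= true_count <= x1:
            return y0 + (y1 - y0) * (true_count - x0) / (x1 - x0)
    raise ValueError("true_count must be between 0 and 100")


def direction(true_count):
    """Label the sign of the stylized bias at one tested value."""
    b = bias(true_count)
    if b > 0:
        return "overestimate"
    if b < 0:
        return "underestimate"
    return "accurate"


def two_level_design(low, high):
    """Summarize what a study testing only two values would conclude."""
    return {"low": direction(low), "high": direction(high)}


# Hypothetical test values chosen to match the story: you test 15 and 50,
# your friend tests 30 and 70.
yours = two_level_design(15, 50)
friends = two_level_design(30, 70)
print("you:   ", yours)
print("friend:", friends)

# Scanning a coarse grid over the full range shows the alternating pattern.
full_range = [(n, direction(n)) for n in range(0, 101, 10)]
print(full_range)
```

Under these assumed values, the two designs report opposite conclusions even though both sample the same underlying curve; only the scan over the full range shows why.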

Any study that offers only a few selected options out of a broader range will face this potential problem. As a real-world example, consider the problem of a program that seeks to encourage people to save for retirement. You want to describe to people the benefits of saving. Is it better to emphasize to people the total value of their savings or how much they could receive per month in retirement benefits? The following pattern emerges:

If you ask people about which they find more satisfactory for retirement, a lump sum of money or an equivalent annuity, they often say the lump sum sounds more satisfactory. For example, people tend to say that a $100,000 lump sum seems more satisfactory than $500 / month for life … Upon hearing this, people might say, “What’s new there? Everybody knows that chopping up large amounts into monthly amounts makes them seem smaller. That’s why companies advertise their monthly instead of their annual prices! That’s why charities ask you to donate pennies per day!”

However, … when you ask about larger amounts of money, people find the lump sum less, not more, adequate. For example, $8,000 / month for life sounds more adequate than a $1.6 million lump sum. What happened to the conventional wisdom that monthly amounts seem like less? Where’s the pennies per day effect everyone knows about?

Again, a study that offers only a few options may give a misleading result.
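One detail worth checking is that the two quoted comparisons use the same ratio of lump sum to monthly payment, so the reversal cannot come from one offer being a better deal than the other. The short Python sketch below does that arithmetic. It ignores interest and life expectancy, which a real annuity calculation would include, and the names pairs and months_to_match are illustrative assumptions.

```python
# (lump sum in dollars, monthly payment in dollars) from the two quoted examples.
pairs = [(100_000, 500), (1_600_000, 8_000)]

# Months of payments needed to add up to the lump sum, with no discounting.
months_to_match = [lump / monthly for lump, monthly in pairs]

for (lump, monthly), months in zip(pairs, months_to_match):
    print(f"${lump:,} vs ${monthly:,}/month: {months:.0f} months "
          f"({months / 12:.1f} years)")
```

Both comparisons work out to 200 months of payments, so only the scale of the amounts differs between the small example and the large one.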

The volume includes descriptions of a number of areas of recent research in behavioral economics, a 42-page glossary of behavioral economics terms from “Action bias” to “Zero-price effect” (with references!), and pages of advertising for graduate programs in behavioral economics. As a sample, I’ll just mention one of the other research discussions, this one about the “FRESH framework” that applies to whether people show the self-control to meet long-run goals, by Kathleen D. Vohs and Avni M. Shah. From the abstract:

[W]e distilled the latest findings and advanced a set of guiding principles termed the FRESH framework: Fatigue, Reminders, Ease, Social influence, and Habits. Example findings reviewed include physicians giving out more prescriptions for opioids later in the workday compared to earlier (fatigue); the use of digital reminders to prompt people to re-engage with goals, such as for personal savings, from which they may have turned away (reminders); visual displays that give people data on their behavioral patterns so as to enable feedback and active monitoring (ease); the importance of geographically-local peers in changing behaviors such as residential water use (social influence); and digital and other tools that help people break the link between aspects of the environment and problematic behaviors (habits).