Every intro statistics class teaches that “correlation is not causation”: just because two patterns consistently move together (or consistently move in opposite directions), you can’t jump to the conclusion that A causes B. Perhaps B causes A, perhaps some third factor C is affecting both A and B, or perhaps, among the millions of possible patterns you could put side-by-side, the correlation between this particular A and B is just a fluky coincidence.

As part of the “credibility revolution” in empirical economics, researchers in the last 20 years or so have become much more careful in thinking about what kind of study would demonstrate causality. One approach is to set up an experiment in which some people are randomly assigned to a certain program, while others are not. For example, here are discussions of experiments about the effectiveness of preschool, health insurance, and subsidized employment. Another approach is to look for real-world situations where some randomness exists, and then use that as a “natural experiment.” As an example, I recently wrote about research on the effects of money bail that takes advantage of the fact that defendants are randomly assigned to judges, some of whom are tougher or more lenient in granting bail. Or in certain cities, admission to oversubscribed charter schools uses a lottery, so some students are randomly in the school and others are not; thus, one can study the effects of the school based on this randomness.

This search for an underlying random factor that allows a researcher to obtain an estimate of an underlying cause is called “identification.” It’s hard to overstate how much this change has affected empirical work in economics. Pretty much every published paper or seminar presentation has a discussion of the “identification strategy.” If you present correlations without such a strategy, you need to be very explicit that you are not drawing any causal inferences, just describing some patterns in the data.
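To see why identification matters, here is a minimal simulation sketching the logic. All names and numbers are illustrative assumptions, not from any real study: a hidden confounder (call it “motivation”) drives both program take-up and outcomes, so a naive comparison of participants and non-participants overstates the program’s effect, while random assignment recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = 2.0  # assumed true causal effect of the program

# Hidden confounder: drives both who joins the program and the outcome.
motivation = rng.normal(size=n)

# Observational world: more motivated people opt into the program.
takes_program = (motivation + rng.normal(size=n)) > 0
outcome_obs = true_effect * takes_program + 3.0 * motivation + rng.normal(size=n)

# Naive comparison of means confounds the program with motivation.
naive_estimate = outcome_obs[takes_program].mean() - outcome_obs[~takes_program].mean()

# Experimental world: assignment is a coin flip, independent of motivation.
assigned = rng.random(n) < 0.5
outcome_exp = true_effect * assigned + 3.0 * motivation + rng.normal(size=n)
rct_estimate = outcome_exp[assigned].mean() - outcome_exp[~assigned].mean()

print(f"true effect:    {true_effect:.2f}")
print(f"naive estimate: {naive_estimate:.2f}")  # biased well above the true effect
print(f"RCT estimate:   {rct_estimate:.2f}")    # close to the true effect
```

The point of an identification strategy is precisely to justify treating the real world like the second scenario rather than the first.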

There’s no real dispute that this greater thoughtfulness about how to infer causality is, overall, a good thing. However, one can question whether it has gone too far. Christopher J. Ruhm raised this question in his “Presidential Address: Shackling the Identification Police?” given to the Southern Economic Association last November. The talk doesn’t seem to be freely available online, but it has now been published in the April 2019 issue of the Southern Economic Journal (85:4, pp. 1016–1026) and is also available as an NBER working paper.

There are two main sets of concerns about the focus on looking for sources of experimental or natural randomness as a way of addressing questions of causality. One is that these approaches have issues of their own. For example, imagine a study in which people volunteer for a program and are then randomly assigned. It might easily be true that the volunteers are not a random sample of the entire population (after all, they are the ones with the connections to hear about the study and the motivation to apply), so the results of a study based on such a group may not generalize to the population as a whole. Ruhm acknowledges these issues, but they are not his main focus.

Ruhm’s concern is that when research economists obsess over identification and causality, they can end up focusing on small questions where they have a powerful argument for causality, while ignoring large questions where the dose of randomization needed to infer causality is difficult or even impossible to find. Ruhm writes:

I sent out the following query on social media (Facebook and Twitter) and email: “I would like to get your best examples of IMPORTANT microeconomic questions (in labor/health/public/environmental/education etc.) where clean identification is difficult or impossible to obtain.” Responses included the following.

• Effects of trade liberalization on the distribution of real wages.
• Contributions of location, preferences, local policy decisions, and luck to geographic differences in morbidity and mortality rates.
• Effects of the school climate and work environment on teacher and student outcomes.
• Importance of norms on firms’ wage setting.
• Extent to which economic factors explain the rise in obesity.
• Impact of family structure on child outcomes.
• Effects of inequality, child abuse, and domestic violence on later life outcomes.
• Social cost of a ton of SO2 emissions.
• Effect of race on healthcare use.
• Effect of climate change on agricultural productivity.

Ruhm argues that for a number of big picture questions, an approach which starts by demanding a nice clear source of randomness for clear identification of a causal factor is going to be too limiting. It can look at slices of the problem, but not the problem as a whole. He writes (footnotes and citations omitted):

For a more concrete indication of the value and limitations of experimental and quasiexperimental approaches, consider the case of the fatal drug epidemic, which is possibly the most serious public health problem in the United States today. To provide brief background, the number of U.S. drug deaths increased from 16,849 in 1999 to 63,632 in 2016 and they have been the leading cause of injury deaths since 2009. The rise in overdose mortality is believed to have been initially fueled by enormous increases in the availability of prescription opioids, with more recent growth dominated by heroin and fentanyl. However, some researchers argue that the underlying causes are economic and social decline (rather than supply factors) that have particularly affected disadvantaged Americans. What role can different methodological approaches play in increasing our understanding of this issue?

RCTs [randomized control trials] could be designed to test certain short-term interventions—such as comparing the efficacy of specific medication-assisted treatment options for drug addicts—but probably have limited broader applicability because randomization will not be practical for most potential policies and longer term effects will be difficult to evaluate. Quasi-experimental methods have provided useful information on specific interventions such as the effects of prescription drug monitoring programs and of policy changes like the legalization of medical marijuana. However, the challenges of using these strategies should not be understated because the results often depend on precise characteristics of the policies and the timing of implementation, which may be difficult to ascertain in practice. Moreover, although the estimated policy impacts are often reasonably large, they are dwarfed by the overall increase in fatal drug overdoses.

Efforts to understand the root causes of the drug epidemic are therefore likely to be resistant to clean identification and instead require an “all of the above” approach using experimental and quasiexperimental methods where possible, but also the accumulation of evidence from a variety of data sources and techniques, including descriptive and regression analyses that in isolation may fail to meet desired standards of causal inference but, hopefully, can be combined with other investigations to provide a compelling preponderance of evidence.

The relationship between smoking and lung cancer provides a striking example of an important question that was “answered” using strategies that would be viewed as unacceptable today by the identification police. The understanding of tobacco use as a major causal factor was not based upon RCTs involving humans but rather resulted from the accretion of evidence from a wide variety of sources including: bench science, animal experiments, and epidemiological evidence from nonrandomized prospective and retrospective studies. Quasi-experimental evidence was eventually provided (e.g., from analyses of changes in tobacco taxes) but long after the question had been largely resolved.

To summarize, clean identification strategies will frequently be extremely useful for examining the partial equilibrium effects of specific policies or outcomes—such as the effects of reducing class sizes from 30 to 20 students or the consequences of extreme deprivation in-utero—but will often be less successful at examining the big “what if” questions related to root causes or effects of major changes in institutions or policies.

In summing up, Ruhm writes:

Have the identification police become too powerful? The answer to this question is subjective and open to debate. However, I believe that it is becoming increasingly difficult to publish research on significant questions that lack sufficiently clean identification and, conversely, that research using quasi-experimental and (particularly) experimental strategies yielding high confidence but on questions of limited importance are more often being published. In talking with PhD students, I hear about training that emphasizes the search for discontinuities and policy variations, rather than on seeking to answer questions of fundamental importance. At professional presentations, experienced economists sometimes mention “correlational” or “reduced-form” approaches with disdain, suggesting that such research has nothing to add to the canon of applied economics.

Thus, Ruhm is pointing to a tradeoff. Researchers would like a study with a strong and defensible methodology, and also a study that addresses a big and important question. Tackling a big question by looking at a bunch of correlations or other descriptive evidence is going to have some genuine limitations–but at least it’s looking at fact patterns about a big question. Using a great methodology to tackle a small question will never provide more than a small answer–although there is, of course, a hope that if lots of researchers use great methods on small questions, the results may eventually form a body of evidence that supports broader conclusions. My own sense is that the subject of economics is hard enough to study that researchers should be willing to consider, with appropriate skepticism, a wide array of potential sources of insight.