Back in January, I posted about an article that was getting some attention in my world. Megan T. Stevenson is an active researcher in the criminal-justice-and-economics literature. She argues that among published studies that use randomized control trial methods to evaluate ways of reducing crime, most don't show a meaningful effect, and of those that do, the effect often isn't replicated in follow-up studies. She mulls over this finding in “Cause, Effect, and the Structure of the Social World” (forthcoming in the Boston University Law Review when the Review gets around to finalizing its later 2023 issues, pp. 2001-2027, but already available at the Review’s website).
The essay can feel disheartening, and so the editors of the online magazine Vital City asked a dozen or so social scientists to react. Here are a few reactions from the essays that caught my eye:
When studying policies for long-standing problems, like reducing crime or improving education, we should expect that the results will often be negative, because that is how reality works, a case made by Sherry Glied. She writes:
Most new ideas fail. When tested, they show null results, and when replicated, apparent findings disappear. This is a truth that is in no way limited to social policy. Social science RCTs are modeled on medical research — but fewer than 2% of all drugs that are investigated by academics in preclinical trials are ultimately approved for sale. A recent study found that just 1 in 5 drugs that were successful after Phase 1 trials made it through the FDA approval process.
Even after drugs are approved for sale at the completion of the complex FDA process (involving multiple RCTs), new evidence often emerges casting those initial results in doubt. There’s a 1 in 3 chance that an approved drug is assigned a black-box warning or similar caution post-approval. And in most cases, the effectiveness of a drug in real-world settings, where it is prescribed by harried physicians and taken by distracted patients, is much lower than its effectiveness in trial settings, where the investigative team is singularly focused on ensuring that the trial adheres to the sponsor’s conditions — or where an academic investigator is focused on publishing a first-class paper. Most of the time, new ideas and products don’t work in the physical world either — and a darned good thing that is, or we’d be changing up everything all the time …
In contrast, most social science problems are very, very old, and the ideas we have to address them generally employ technologies that have existed for a long time. Our forebears were not all fools — if these strategies were successful, they’d almost certainly have been implemented already (and many have been, so we take them for granted). Operating near the feasible margin means recognizing that, even when they work, interventions are likely to have very modest effects. … It would be worrisome if there were big, effective criminal justice interventions out there that we had missed for centuries. Perhaps we should start our analysis by recognizing that we stand on the shoulders of centuries of social reformers and are operating fairly close to the feasible margin.
It’s implausible to expect transformational change from a randomized trial, but incremental gains can be real and meaningful, argues Aaron Chalfin.
[W]hy should we expect randomized experiments to produce evidence of transformational social change? That seems an impossible standard given that our world is shaped by human nature and a variety of unforgiving social and political constraints. If change is hard and most interventions fail to change the world in transformational ways, then it stands to reason that randomized control trial (RCT) evidence should reflect this seemingly fundamental truth. The fact that most RCT evidence is associated with modest impacts at best matches our understanding of the structure of the social world and serves as a sign that research evidence, rather than being subject to researcher biases, is credible. … Is a 5% or 10% improvement in a given problem a transformation — or is the only true transformation a much bigger result than that? More to the point, why is transformation, which is in the eye of the beholder, the standard to which we must adhere? … Lots of stuff we try indeed doesn’t make much of a difference. But at the same time, RCTs have also led to genuine learning — both about what fails and what succeeds.
Positive incremental reforms do happen over time, and in fact can be better than leaping into an unknown big-picture change, argue Philip J. Cook and Jens Ludwig.
In these hyperpolarized times, there’s a growing view that the quality of life can’t be improved by modest policy interventions that are limited with respect to scope and scale. In this view, interventions need to be bold and broad, or the status quo will inevitably reassert itself. Of course, the evidence base for “bold and broad” interventions is often nonexistent, so what is really being advocated is a giant leap into the unknown — what one might call, for lack of a better term, the “you only live once” (YOLO) approach to policy. We disagree.
Ludwig and Cook point to examples of success, including the gradual spread of compulsory K-12 school attendance and a program in Chicago that reduced violence among young men with a combination of behavioral counseling and mentors.
Maybe the randomized control trial method shouldn’t be viewed as the “gold standard” methodology for causal evidence, a case made by Anna Harvey.
For those not familiar with a “randomized control trial,” the basic idea is that a group of people is randomly divided. Some get access to the program or the intervention or are treated in a certain way, while others do not. Because the division was random, a researcher can then simply compare outcomes between the treated and untreated groups. This approach is sometimes called a “gold standard” methodology because it’s straightforward and persuasive. But of course, no method is infallible. One can always ask questions like: “Was it really random?” “Was some charismatic person involved in the treatment in a way that won’t carry over to future projects?” “Was the sample size big enough to draw a reliable result?” “Did the researcher study a bunch of treatments, on a number of groups, but then only publish the few results that looked statistically significant?”
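To make the logic concrete, here is a minimal sketch in Python of the comparison an RCT licenses. Everything in it is invented for illustration (the sample size, the outcome scale, the assumed effect size); the point is only the sequence of random assignment followed by a difference in means.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical example: 1,000 people, an outcome measured on some scale,
# and a treatment assumed (purely for illustration) to have a small
# negative effect on that outcome.
n = 1_000
true_effect = -0.1
treated = rng.permutation(n) < n // 2      # random assignment: half treated
outcome = rng.normal(0.0, 1.0, size=n) + true_effect * treated

# Because assignment was random, the two groups are comparable on average,
# so the difference in mean outcomes estimates the treatment effect.
estimate = outcome[treated].mean() - outcome[~treated].mean()
print(f"estimated effect: {estimate:+.3f} (simulated true effect: {true_effect})")
```

Note that even in this clean simulated setting, a small true effect can easily be swamped by sampling noise, which is one reason null and non-replicating results are so common.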
Harvey points out that there are a number of “quasi-experimental” methods where the randomness is not designed by the researcher but instead emerges from a situation. For example, some public programs are rolled out in different places at different times, and if the roll-out is effectively random, one can compare across those places. Sometimes a program is set up so that everyone above a certain score can enter while those below it cannot; people just barely above the cutoff are likely to be quite similar to people just barely below it, so comparing the two groups can be informative. Or one can look at a trendline before and after a given event and see whether it has shifted.
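The cutoff-score design (what economists often call a regression discontinuity) can also be sketched in a few lines. Again, all numbers here are invented: a score determines program entry, the score itself affects the outcome, and the estimate comes from comparing people within a narrow band on either side of the cutoff.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical cutoff-score design: people scoring 50 or above enter the
# program. The score also influences the outcome directly, so naively
# comparing all participants to all non-participants would be misleading.
n = 20_000
score = rng.uniform(0, 100, size=n)
eligible = score >= 50
true_effect = -0.2                         # assumed program effect, for illustration
outcome = 0.01 * score + true_effect * eligible + rng.normal(0.0, 1.0, size=n)

# Restrict attention to a narrow band around the cutoff, where those just
# above and just below should be similar in everything except eligibility.
bandwidth = 2.0
near = np.abs(score - 50) < bandwidth
estimate = outcome[near & eligible].mean() - outcome[near & ~eligible].mean()
print(f"estimated effect near cutoff: {estimate:+.3f} (simulated true effect: {true_effect})")
```

The bandwidth is the key judgment call in a design like this: a narrower band makes the two groups more comparable but leaves fewer people to compare, so the estimate gets noisier.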
Participating in a study can itself benefit people by exposing them to expertise and experiences they would not otherwise have had, argues John Maki.
I can’t think of a reform I’ve worked on where the process wasn’t as, or arguably even more, valuable than the outcomes it produced. For instance, when I led the state of Illinois’ public safety grantmaking and research agency from 2015-2019, my colleagues and I created a multiyear funding opportunity for medium-sized cities to implement evidence-based programs to reduce gun violence. The award also required grantees to meet regularly with my staff and subject matter experts to talk about their experiences. While the funding ended several years ago, I still hear from people who were part of the program. They talk not so much about the outcomes their work produced but about the relationships they built with experts they otherwise would never have met, what they learned about working with state grantmakers and researchers, and what they learned about better engaging their community.
