Big Data in Political Campaigns

How does the collection and use of big data work in political campaigns? David W. Nickerson and Todd Rogers pull back the curtain and offer a glimpse of what’s been happening in “Political Campaigns and Big Data,” which appears in the Spring 2014 issue of the Journal of Economic Perspectives. Nickerson is a Notre Dame professor of political science who was “‘Director of Experiments’ in the Analytics Department in the 2012 re-election campaign of President Obama.” Rogers is a professor of public policy at Harvard’s Kennedy School who “co-founded the Analyst Institute, which uses field experiments and behavioral science insights to develop best practices in progressive political communications.” They write:

Over the past six years, campaigns have become increasingly reliant on analyzing large and detailed datasets to create the necessary predictions. While the adoption of these new analytic methods has not radically transformed how campaigns operate, the improved efficiency gives data-savvy campaigns a competitive advantage. This has led the political parties to engage in an arms race to leverage ever-growing volumes of data to create votes. This paper describes the utility and evolution of data in political campaigns. The techniques used as recently as a decade or two ago by political campaigns to predict the tendencies of citizens appear extremely rudimentary by current standards.

Like all articles in JEP back to the first issue in 1987, it is freely available courtesy of the American Economic Association. (Full disclosure: I’ve been managing editor of JEP back to that first issue in 1987.) Here are some points from their essay that jumped out at me.

The starting point for gathering data on potential voters is the publicly available file of official voters maintained in each state. As Nickerson and Rogers write: “The official voter file contains a wide range of information. In addition to personal information such as date of birth and gender, which are often valuable in developing predictive scores, voter files also contain contact information such as address and phone.” In addition, while the files of course don’t record who anyone voted for, they do show whether people voted, and by what method–say, on Election Day, or using some form of early or absentee voting.

This data can then be merged with data from other sources. Census data is available at the level of the voting precinct, showing “the average household income, average level of education, average number of children per household, and ethnic distribution” for that precinct.
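To make the mechanics concrete, here is a minimal sketch in Python of how individual voter-file records might be joined to precinct-level census averages. The column names and values are hypothetical illustrations, not fields from any actual voter file.

```python
# Hypothetical sketch: join voter-file records to precinct-level census averages.
import pandas as pd

# Individual records, as they might appear in a state voter file.
voter_file = pd.DataFrame({
    "voter_id": [101, 102, 103],
    "precinct": ["P-01", "P-01", "P-02"],
    "birth_year": [1958, 1990, 1975],
    "voted_2012": [True, False, True],
})

# Precinct-level averages, as census data might supply them.
census = pd.DataFrame({
    "precinct": ["P-01", "P-02"],
    "avg_household_income": [54_000, 71_000],
    "avg_years_education": [13.1, 14.6],
})

# Each voter inherits the averages of the precinct where they live.
merged = voter_file.merge(census, on="precinct", how="left")
print(merged)
```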

Additional data can be purchased from commercial firms. Nickerson and Rogers report that the most cost-effective data to purchase are updated phone numbers (because the phone numbers in the state voter registration files are often outdated after a few years) as well as data about “estimated years of education, home ownership status, and mortgage information.” Other information, while available, isn’t cost-effective to buy. They write: “In contrast, information on magazine subscriptions, car purchases, and other consumer tastes are relatively expensive to purchase from vendors, and also tend to be available for very few individuals. Given this limited coverage, this data tends not to be useful in constructing predictive scores for the entire population—and so campaigns generally avoid or limit purchases of this kind of consumer data.”

Finally, a major source of voter information is provided by voters themselves when they sign up at a candidate’s website or party website. Not only do people provide information directly, but the campaign can also keep track of what sorts of topics or messages cause people to respond by clicking on a link or donating money, so much can be learned about people in that way.

These sources of information have some interesting implications. Campaigns know more about those who vote, and who are politically active, than about those who don’t vote regularly or who are not politically active. Campaigns also tend to know more about their own supporters. Nickerson and Rogers write: “To the extent that predictive scores are useful and reveal true unobserved characteristics about citizens, it means that multiple organizations will produce predictive scores that recommend targeting the same sets of citizens. For example, some citizens might find themselves contacted many times, while other citizens—like those with low turnout behavior scores in 2012—might be ignored by nearly every campaign.”

After collecting and collating and coordinating all this data, the question is how to use it. Nickerson and Rogers point out that focusing on those who are already very likely to vote for you, or focusing on those who are already very likely to vote against you, tends to be a waste of money. Thus, one way that data can make a campaign more cost-effective is that it can minimize spending money on those who are unpersuadable or who are already persuaded. This also reduces the risk of “backlash,” in which attempts to encourage voting for your candidate rev up voters for the other side.
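As a toy illustration of this targeting logic, the sketch below contacts only citizens whose predicted support falls in a persuadable middle band. The thresholds and scores are assumptions for illustration, not values from the paper.

```python
# Hypothetical targeting rule: skip sure supporters and lost causes,
# and spend contact resources on the persuadable middle.
def should_contact(support_score: float,
                   low: float = 0.30,
                   high: float = 0.70) -> bool:
    """Return True if predicted support is neither a sure thing nor a lost cause."""
    return low <= support_score <= high

# Illustrative predicted-support scores for three citizens.
scores = {"Alice": 0.95, "Bob": 0.55, "Carol": 0.10}
targets = [name for name, s in scores.items() if should_contact(s)]
print(targets)  # prints ['Bob']: only the persuadable citizen is contacted
```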

Another possible advantage is that campaigns can run small-scale experiments about what messages or actions are likely to cause a certain slice of voters to take an action–clicking on a link, volunteering time, putting up a sign, giving money–that is likely to be correlated with voting for the candidate later on. When small-scale experiments have shown what steps are likely to be effective, then the approach can be used at larger scale. How effective can such steps be? They write: “Suppose a campaign’s persuasive communications has an average treatment effect of 2 percentage points—a number on the high end of persuasion effects observed in high-expense campaigns: that is, if half of citizens who vote already planned to vote for the candidate, 52 percent would support the candidate after the persuasive communication.”
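The arithmetic behind that quoted figure is just a difference in means between randomly assigned groups. The sketch below works it through with made-up vote counts; nothing here comes from the paper’s data.

```python
# Illustrative arithmetic: an average treatment effect of 2 percentage points
# is the difference in support rates between treatment and control groups.
control_support = 5000 / 10000   # 50% support among uncontacted citizens
treated_support = 5200 / 10000   # 52% support among contacted citizens

ate = treated_support - control_support
print(f"Average treatment effect: {ate:.1%}")  # prints 2.0%
```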

Nickerson and Rogers point out in their conclusion that using big data to drive campaigning, in a very real way, makes traditional boots-on-the-ground campaigning more important than ever. After all, the bottom line of the campaign is still to push for more of your voters to turn out. Big data can help a campaign allocate resources more cost-effectively, but the campaign still needs to do the actual work.

“The improved capability to target individual voters offers campaigns an opportunity to concentrate their resources where they will be most effective. This power, however, has not radically transformed the nature of campaign work. One could argue that the growing impact of data analytics in campaigns has amplified the importance of traditional campaign work. . . . Professional phone interviews are still used for message development and tracking, but they are also essential for developing predictive scores of candidate support and measuring changes in voter preferences in randomized experiments. Similarly, better targeting has made grassroots campaign tactics more efficient and therefore more cost competitive with mass communication forms of outreach. Volunteers still need to persuade skeptical neighbors, but they are now better able to focus on persuadable neighbors and use messages more likely to resonate. This leads to higher-quality interactions and (potentially) a more pleasant volunteer experience. So while savvy campaigns will harness the power of predictive scores, the scores will only help the campaigns that were already effective.”

Work Philosophy from Gabriel García Márquez

I’m often at least a few beats behind the tune on news that doesn’t involve economics or policy, so I just heard a few days ago that Gabriel García Márquez, who won the Nobel Prize in Literature in 1982 for One Hundred Years of Solitude and other works, died on April 17. I could see the genius in his work, but it was never among my favorites: the magic in his “magic realism” felt to me a little too contrived and mannered. But I was reading in English translation, not in Spanish, and what do I know about literature, anyway?

I do have a quotation from Márquez up on my office door that conveys a home truth about my own work life. It’s from an interview with him that was published in the Boston Review (March-April 1983, pp. 26-27), and later reprinted in the 2006 collection Conversations with Gabriel García Márquez, edited by Gene H. Bell-Villada (p. 137). He was asked how he felt about One Hundred Years of Solitude being used as required reading in college courses and cited by academics. Here’s part of his answer:

“On another occasion a sociologist from Austin, Texas, came to see me because he’d grown dissatisfied with his methods, found them arid, insufficient. So he asked me what my own method was. I told him I didn’t have a method. All I do is read a lot, think a lot, and rewrite constantly. It’s not a scientific thing.”

I’m the managing editor of an academic economics journal, and an occasional lecturer and writer. That ethic might serve as a useful motto for editors everywhere.

Farewell to Notes

When the first issue of the American Economic Review, which would become the preeminent research journal in academic economics, was published back in 1911, it devoted 13 pages to “Notes”–that is, news about the profession of economics. At a time when the number of academic economists was much smaller, and methods of broad-based communication were much slower, the “Notes” included mentions of conferences that had already happened, books that were soon to be published, contributions of historical papers to libraries, even the sabbatical plans of some prominent economists. When I took the job as managing editor of the Journal of Economic Perspectives in 1987, we inherited the “Notes” from the AER. But now, after a run of 103 years, the rise of the web means that the time has come to stop publishing conference announcements, calls for papers, awards, and the like in a quarterly journal–or indeed on paper at all.

In the just-released Spring 2014 issue of JEP, I commemorated the occasion with a “Farewell to Notes.” Here are the opening and closing paragraphs:

The great composer Johannes Brahms once remarked: “It is not difficult to compose; but it is incredibly difficult to let the superfluous notes drop under the table” (as quoted in Musgrave and Pascall 1987, p. 138). Here at the Journal of Economic Perspectives, the challenges of composing each issue remain, but the “Notes” have become superfluous, at least in their paper version.

The “Notes,” as those who lurk in these back pages of JEP know well, announce forthcoming conferences, calls for papers, awards, and the like. However, the Internet has made it obsolete to deliver such information on paper in a quarterly journal. … But as we say farewell to the print version of the “Notes,” a moment of remembrance seems appropriate. The first issue of the American Economic Review, published in 1911, found it worthwhile to devote 13 out of 219 total pages to “Notes.” …

Admittedly, the ending of the “Notes” section as printed within the covers of the Journal of Economic Perspectives doesn’t rank with some of the other great endings, like the revelation of what Citizen Kane meant by “Rosebud”; or “Forget it, Jake, it’s Chinatown”; or “Oh, Auntie Em, there’s no place like home!” But in its own small way, the end of the paper version of the “Notes” after its run of 103 years is one more sign of the remarkable changes in information and communication technology that surround us—and thus worth remarking.

Spring 2014 Journal of Economic Perspectives

The Spring 2014 issue of the Journal of Economic Perspectives is now freely available on-line, courtesy of the publisher, the American Economic Association. Indeed, not only this issue but all previous issues back to 1987 are available. (Full disclosure: I’ve been the Managing Editor since the journal started, so this issue is #108 for me.) I’ll probably blog about some of these articles in the next week or two. But for now, I’ll first list the table of contents, and then below will provide abstracts of articles and weblinks.

Symposium: Big Data

“Big Data: New Tricks for Econometrics,” by Hal R. Varian
“High-Dimensional Methods and Inference on Structural and Treatment Effects,” by Alexandre Belloni, Victor Chernozhukov and Christian Hansen
“Political Campaigns and Big Data,” by David W. Nickerson and Todd Rogers
“Privacy and Data-Based Research,” by Ori Heffetz and Katrina Ligett

Symposium: Global Supply Chains

“Slicing Up Global Value Chains,” by Marcel P. Timmer, Abdul Azeez Erumban, Bart Los, Robert Stehrer and Gaaitzen J. de Vries
“Five Facts about Value-Added Exports and Implications for Macroeconomics and Trade Research,” by Robert C. Johnson

Articles and Features

“Raj Chetty: 2013 Clark Medal Recipient,” by Martin Feldstein
“Fluctuations in Uncertainty,” by Nicholas Bloom
“The Market for Blood,” by Robert Slonim, Carmen Wang and Ellen Garbarino
“Retrospectives: The Cyclical Behavior of Labor Productivity and the Emergence of the Labor Hoarding Concept,” by Jeff E. Biddle
“Recommendations for Further Reading,” by Timothy Taylor
“Correction and Update: The Economic Effects of Climate Change,” by Richard S. J. Tol
“Farewell to Notes,” by Timothy Taylor

________________________

And here are the abstracts and links:

Symposium: Big Data


“Big Data: New Tricks for Econometrics,” by Hal R. Varian

Computers are now involved in many economic transactions and can capture data associated with these transactions, which can then be manipulated and analyzed. Conventional statistical and econometric techniques such as regression often work well, but there are issues unique to big datasets that may require different tools. First, the sheer size of the data involved may require more powerful data manipulation tools. Second, we may have more potential predictors than appropriate for estimation, so we need to do some kind of variable selection. Third, large datasets may allow for more flexible relationships than simple linear models. Machine learning techniques such as decision trees, support vector machines, neural nets, deep learning, and so on may allow for more effective ways to model complex relationships. In this essay, I will describe a few of these tools for manipulating and analyzing big data. I believe that these methods have a lot to offer and should be more widely known and used by economists.
Full-Text Access | Supplementary Materials

“High-Dimensional Methods and Inference on Structural and Treatment Effects,” by Alexandre Belloni, Victor Chernozhukov and Christian Hansen

Data with a large number of variables relative to the sample size—“high-dimensional data”—are readily available and increasingly common in empirical economics. High-dimensional data arise through a combination of two phenomena. First, the data may be inherently high dimensional in that many different characteristics per observation are available. For example, the US Census collects information on hundreds of individual characteristics and scanner datasets record transaction-level data for households across a wide range of products. Second, even when the number of available variables is relatively small, researchers rarely know the exact functional form with which the small number of variables enter the model of interest. Researchers are thus faced with a large set of potential variables formed by different ways of interacting and transforming the underlying variables. This paper provides an overview of how innovations in “data mining” can be adapted and modified to provide high-quality inference about model parameters. Note that we use the term “data mining” in a modern sense which denotes a principled search for “true” predictive power that guards against false discovery and overfitting, does not erroneously equate in-sample fit to out-of-sample predictive ability, and accurately accounts for using the same data to examine many different hypotheses or models.
Full-Text Access | Supplementary Materials

“Political Campaigns and Big Data,” by David W. Nickerson and Todd Rogers

Modern campaigns develop databases of detailed information about citizens to inform electoral strategy and to guide tactical efforts. Despite sensational reports about the value of individual consumer data, the most valuable information campaigns acquire comes from the behaviors and direct responses provided by citizens themselves. Campaign data analysts develop models using this information to produce individual-level predictions about citizens’ likelihoods of performing certain political behaviors, of supporting candidates and issues, and of changing their support conditional on being targeted with specific campaign interventions. The use of these predictive scores has increased dramatically since 2004, and their use could yield sizable gains to campaigns that harness them. At the same time, their widespread use effectively creates a coordination game with incomplete information between allied organizations. As such, organizations would benefit from partitioning the electorate to not duplicate efforts, but legal and political constraints preclude that possibility.
Full-Text Access | Supplementary Materials

“Privacy and Data-Based Research,” by Ori Heffetz and Katrina Ligett

What can we, as users of microdata, formally guarantee to the individuals (or firms) in our dataset, regarding their privacy? We retell a few stories, well-known in data-privacy circles, of failed anonymization attempts in publicly released datasets. We then provide a mostly informal introduction to several ideas from the literature on differential privacy, an active literature in computer science that studies formal approaches to preserving the privacy of individuals in statistical databases. We apply some of its insights to situations routinely faced by applied economists, emphasizing big-data contexts.
Full-Text Access | Supplementary Materials

Symposium: Global Supply Chains

“Slicing Up Global Value Chains,” by Marcel P. Timmer, Abdul Azeez Erumban, Bart Los, Robert Stehrer and Gaaitzen J. de Vries

In this paper, we “slice up the global value chain” using a decomposition technique that has recently become feasible due to the development of the World Input-Output Database. We trace the value added by all labor and capital that is directly and indirectly needed for the production of final manufacturing goods. The production systems of these goods are highly prone to international fragmentation as many stages can be undertaken in any country with little variation in quality. We seek to establish a series of facts concerning the global fragmentation of production that can serve as a starting point for future analysis. We describe four major trends. First, international fragmentation, as measured by the foreign value-added content of production, has rapidly increased since the early 1990s. Second, in most global value chains there is a strong shift towards value being added by capital and high-skilled labor, and away from less-skilled labor. Third, within global value chains, advanced nations increasingly specialize in activities carried out by high-skilled workers. Fourth, emerging economies surprisingly specialize in capital-intensive activities.
Full-Text Access | Supplementary Materials

“Five Facts about Value-Added Exports and Implications for Macroeconomics and Trade Research,” by Robert C. Johnson

Due to the rise of global supply chains, gross exports do not accurately measure the amount of value added exchanged between countries. I highlight five facts about differences between gross and value-added exports. These differences are large and growing over time, currently around 25 percent, and manufacturing trade looks more important, relative to services, in gross than value-added terms. These differences are also heterogeneous across countries and bilateral partners, and changing unevenly across countries and partners over time. Taking these differences into account enables researchers to obtain better quantitative answers to important macroeconomic and trade questions. I discuss how the facts inform analysis of the transmission of shocks across countries; the mechanics of trade balance adjustments; the impact of frictions on trade; the role of endowments and comparative advantage; and trade policy.
Full-Text Access | Supplementary Materials

Articles and Features

“Raj Chetty: 2013 Clark Medal Recipient,” by Martin Feldstein

Raj Chetty is eminently deserving of being awarded the John Bates Clark Medal at the age of 33. His research has transformed the field of public economics. His work is motivated by important public policy issues in the fields of taxation, social insurance, and public spending for education. He approaches his subjects with a creative redefinition of the problems that he studies, and his empirical methods often draw on experimental evidence or unprecedentedly large sets of integrated data. While his work is founded on basic microeconomics, he modifies this framework to take into account behavioral and institutional considerations. Chetty is a prolific scholar. It is difficult to summarize all of Chetty’s research or even to capture the details of his most significant papers. I have therefore chosen a selection of Chetty’s important papers dealing with taxation, social insurance, and education that contributed to his selection as the winner of the John Bates Clark Medal.
Full-Text Access | Supplementary Materials

“Fluctuations in Uncertainty,” by Nicholas Bloom

Uncertainty is an amorphous concept. It reflects uncertainty in the minds of consumers, managers, and policymakers about possible futures. It is also a broad concept, including uncertainty over the path of macro phenomena like GDP growth, micro phenomena like the growth rate of firms, and noneconomic events like war and climate change. In this essay, I address four questions about uncertainty. First, what are some facts and patterns about economic uncertainty? Both macro and micro uncertainty appear to rise sharply in recessions and fall in booms. Uncertainty also varies heavily across countries—developing countries appear to have about one-third more macro uncertainty than developed countries. Second, why does uncertainty vary during business cycles? Third, do fluctuations in uncertainty affect behavior? Fourth, has higher uncertainty worsened the Great Recession and slowed the recovery? Much of this discussion is based on research on uncertainty from the last five years, reflecting the recent growth of the literature.
Full-Text Access | Supplementary Materials

“The Market for Blood,” by Robert Slonim, Carmen Wang and Ellen Garbarino

Donating blood, “the gift of life,” is among the noblest activities and it is performed worldwide nearly 100 million times annually. The economic perspective presented here shows how the gift of life, albeit noble and often motivated by altruism, is heavily influenced by standard economic forces including supply and demand, economies of scale, and moral hazard. These forces, shaped by technological advances, have driven the evolution of blood donation markets from thin one-to-one “marriage markets” in which each recipient needed a personal blood donor, to thick, impersonalized, diffuse markets. Today, imbalances between aggregate supply and demand are a major challenge in blood markets, including excess supply after disasters and insufficient supply at other times. These imbalances are not unexpected given that the blood market operates without market prices and with limited storage length (about six weeks) for whole blood. Yet shifting to a system of paying blood donors seems a practical impossibility given attitudes toward paying blood donors and concerns that a paid system could compromise blood safety. Nonetheless, we believe that an economic perspective offers promising directions to increase supply and improve the supply and demand balance even in the presence of volunteer supply and with the absence of market prices.
Full-Text Access | Supplementary Materials

“Retrospectives: The Cyclical Behavior of Labor Productivity and the Emergence of the Labor Hoarding Concept,” by Jeff E. Biddle

The concept of “labor hoarding,” at least in its modern form, was first fully articulated in the early 1960s by Arthur Okun (1963). By the end of the 20th century, the concept of “labor hoarding” had become an accepted part of economists’ explanations of the workings of labor markets and of the relationship between labor productivity and economic fluctuations. The emergence of this concept involved the conjunction of three key elements: the fact that measured labor productivity was found to be procyclical, rising during expansions and falling during contractions; a perceived contradiction with the theory of the neoclassical firm in a competitive economy; and a possible explanation based on optimizing behavior on the part of firms. Each of these three elements—fact, contradiction, and explanation—has a history of its own, dating back to at least the opening decades of the twentieth century. Telling the story of the emergence of the modern labor hoarding concept requires recounting these three histories, histories that involve the work of economists motivated by diverse purposes and often not mainly, if at all, concerned with the questions that the labor hoarding concept was ultimately used to address. As a final twist to the story, the long-standing positive relationship between labor productivity and output in the US economy began to disappear in the late 1980s; and during the Great Recession, labor productivity rose while the economy contracted.
Full-Text Access | Supplementary Materials

“Recommendations for Further Reading,” by Timothy Taylor
Full-Text Access | Supplementary Materials

“Correction and Update: The Economic Effects of Climate Change,” by Richard S. J. Tol

Gremlins intervened in the preparation of my paper “The Economic Effects of Climate Change” published in the Spring 2009 issue of this journal. In Table 1 of that paper, titled “Estimates of the Welfare Impact of Climate Change,” minus signs were dropped from the two impact estimates, one by Plambeck and Hope (1996) and one by Hope (2006). In Figure 1 of that paper, titled “Fourteen Estimates of the Global Economic Impact of Climate Change,” and in the various analyses that support that figure, the minus sign was dropped from only one of the two estimates. The corresponding Table 1 and Figure 1 presented here correct these errors. Figure 2, titled “Twenty-One Estimates of the Global Economic Impact of Climate Change,” adds two overlooked estimates from before the time of the original 2009 paper and five more recent ones.
Full-Text Access | Supplementary Materials

“Farewell to Notes,” by Timothy Taylor
Full-Text Access | Supplementary Materials

Highway Patrol Traffic Enforcement

When trying to estimate the extent to which law enforcement efforts reduce crime, there’s a standard problem of thinking through cause and effect. If a neighborhood with a lot of crime gets more police, then a higher number of police will be correlated with higher crime rates–but the crime caused the higher police presence, not the other way around. There are many more traffic police out on New Year’s Eve, and also many more drunk drivers, but that doesn’t mean that the added police caused additional drunkenness. Ideally, researchers would find an experiment where the police presence was changed randomly, in a way that didn’t reflect crime levels, in some places but not others, so that the effect of police could be studied. But random variation in the assignment of police officers, without regard to crime levels, is not a popular policy for politicians to support.

But sometimes an event occurs that offers what researchers call a “natural experiment”–that is, a situation where a variation in police presence occurred for reasons that had nothing to do with crime levels. Gregory DeAngelo and Benjamin Hansen take advantage of such an opportunity to look at how highway traffic patrols, looking for speeders and reckless drivers, affect fatalities in their paper, “Life and Death in the Fast Lane: Police Enforcement and Traffic Fatalities.” It appears in the most recent issue of the American Economic Journal: Economic Policy (6:2, pp. 231–257). The AEJ: Policy is not freely available on-line, but many readers will have access through a library subscription.

Their bottom line: The highway patrol saves lives at a cost of about $309,000 per life. A standard metric among economists, the “value of a statistical life,” says that in the United States it is worth taking regulatory or law enforcement actions that reduce the risks of death when the costs of such actions are less than about $9 million per life.
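As a back-of-the-envelope check on that comparison, the sketch below divides the value-of-a-statistical-life benchmark by the estimated cost per life saved. The two dollar figures come from the paragraph above; the ratio itself is just illustrative arithmetic.

```python
# Back-of-the-envelope cost-effectiveness check using the figures cited above.
cost_per_life_saved = 309_000          # DeAngelo and Hansen's estimate
value_of_statistical_life = 9_000_000  # standard US benchmark

ratio = value_of_statistical_life / cost_per_life_saved
print(f"Benefit-cost ratio: {ratio:.0f}")  # roughly 29
# Each dollar of patrol spending buys about $29 of statistical-life value.
```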

Their story starts in 1997, when the state of Oregon passed a ballot proposition that placed sharp limits on property taxes. The state’s public finances suffered, and after a ballot proposition to raise some additional tax revenues failed in 2003, 117 out of 354 full-time roadway troopers were laid off. The number of traffic citations given for speeding or reckless driving on Oregon highways fell by 25 percent. At about the same time, the speed of Oregon drivers crept up. One measure comes from automated speed counters. Another comes from comparing the average speed someone was travelling when they received a speeding ticket to the posted speed.

The number of motor vehicle deaths in Oregon rose. Here’s a graph showing the number of incapacitating injuries or deaths in Oregon, using monthly data for the three years before and after the reduction in highway troopers. The data varies through the year, with more fatalities and accidents in the summer driving season. But you can see that the peaks get higher when the number of police is reduced.

There are various ways to estimate the effects of reducing the highway patrol in Oregon. One can look before and after the change. One can do a comparison with trends in neighboring states, like Washington and Idaho. The authors do both, and they also generate a “synthetic” control group of states that showed patterns similar to Oregon’s before the change; in this case, the synthetic control group turns out to be Idaho, Washington, Nevada, and West Virginia, with differing weights on these states (a minimal sketch of the weight-fitting idea appears after the quotation below). The exact estimates vary, but looking over a longer period of time, they argue:

“An analysis of the reduction in state police in Oregon since 1979 suggests that there would have been 2,167 fewer deaths over the 1979–2005 time span if the state police had maintained their original 1979 staffing levels. Moreover, if the police force were allowed to grow at the same rate as the increases in VMT [vehicle-miles travelled] (which would amount to a 360 percent increase over actual staffing levels in 2005), then there would have been 5,031 fewer traffic fatalities over 1979–2005.”
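For readers curious about the mechanics of the synthetic control approach mentioned above, here is a minimal sketch: choose nonnegative weights, summing to one, on donor states so that their weighted average tracks Oregon’s pre-period outcomes. All numbers below are made-up placeholders; the actual paper fits weights on far richer data.

```python
# Hypothetical sketch of synthetic control weight-fitting.
import numpy as np
from scipy.optimize import minimize

# Rows: pre-period observations; columns: donor states (ID, WA, NV, WV).
# These fatality-rate numbers are invented for illustration.
donors = np.array([
    [4.1, 3.8, 5.0, 4.4],
    [4.3, 3.9, 5.2, 4.6],
    [4.0, 3.7, 4.9, 4.5],
])
oregon = np.array([4.2, 4.4, 4.1])  # Oregon's pre-period outcomes

def loss(w):
    """Squared distance between Oregon and the weighted donor average."""
    return np.sum((oregon - donors @ w) ** 2)

n = donors.shape[1]
result = minimize(
    loss,
    x0=np.full(n, 1 / n),  # start from equal weights
    bounds=[(0, 1)] * n,   # each weight between 0 and 1
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1},  # weights sum to 1
)
print(dict(zip(["ID", "WA", "NV", "WV"], result.x.round(3))))
```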

Of course, one can debate how the Oregon numbers would extrapolate to other states, or whether the benefits of adding more highway patrol officers would be symmetric with the costs of reducing their number. But overall, the US has about 30,000 deaths from motor vehicle accidents each year, plus 1.5 million injuries. A few years ago, the Centers for Disease Control estimated that motor vehicle fatalities cost about $41 billion in medical care and lost work. That figure understates the total social costs, because it doesn’t place any value on the actual lives lost, nor on the suffering and other costs of motor vehicle injuries (as opposed to deaths). It seems very plausible that increasing the numbers of the highway patrol would be a cost-effective way of saving lives.