The Utterly Predictable Problem of Long-Run US Budget Deficits

For anyone who can do arithmetic, it did not come as a surprise that the "baby boom generation," born from 1946 up through the early 1960s, started turning 65 in 2011. Here's the pattern over time of the "Daily Average Number of People Turning 65." The jump of the boomer generation is marked.

Because two major federal spending programs are focused on older Americans–Social Security and Medicare–it has been utterly predictable for several decades that the long-run federal budget situation would come under strain at about this time. The figure on people turning 65 comes from a report by the US Government Accountability Office, "The Nation’s Fiscal Health: Action Is Needed to Address the Federal Government’s Fiscal Future" (April 2019).

Here's a breakdown of the GAO predictions on federal spending for the next 30 years. The "all else" category bumps up about 0.6% of GDP during this time, and normal politics could deal with that easily enough. Social Security spending is slated to bump up about 1% of GDP, which is a bigger problem. Still, some mixture of limits on benefits (like a later retirement age) and a modest bump in the payroll tax rate could address this. Indeed, it seems to me an indictment of the US political class, from both parties, that no forward-looking politician has built a movement around steps to "save Social Security."

But the projected rise in government health care spending of 3.2% of GDP is a challenge that no one seems to know how to fix. It's a combination of the rising share of older people, and in particular the rising share of the very-old who are more likely to face needs for nursing home and Alzheimer's care, combined with an overall trend toward higher per person spending on health care. As I've noted before, every dollar of government health care spending represents both care for a patient and income for a provider, and both groups will fight hard against cutbacks.

Meanwhile, this rise in spending, coupled with the assumption that tax revenues remain on their current trajectory, means higher government deficits and borrowing. Over time, this also means higher government interest payments. So if we lack the ability to control the rise in deficits, interest payments soar. In 2018, about 7.9% of federal spending was interest payments on past borrowing. By 2048, on these projections, about 22% of all federal spending will be interest payments on past debt. Of course, this also means that reducing the deficit with spending and tax changes has the added benefit of avoiding this soaring rise in interest payments.

As a side note, I thought a figure in the GAO report showing who holds US debt was interesting.  The report notes:

Domestic investors—consisting of domestic private investors, the Federal Reserve, and state and local governments—accounted for about 60 percent of federal debt held by the public as of June 2018, while international investors accounted for the remaining 40 percent. International investors include both private investors and foreign official institutions, such as central banks and national government-owned investment funds. Central banks hold foreign currency reserves to maintain exchange rates or to facilitate trade. Therefore, demand for foreign currency reserves can affect overall demand for U.S. Treasury securities. An economy open to international investment, such as the United States, can essentially borrow the surplus of savings of other countries to finance more investment than U.S. national saving would permit. The flow of foreign capital into the United States has gone into a variety of assets, including Treasury securities, corporate securities, and direct investment.

The arguments for restraint on federal borrowing are fairly well-known. There's the issue that too much debt means less ability to respond to a future recession, or some other crisis. There's an issue that high levels of federal borrowing soak up investment capital that might have been used more productively elsewhere in the economy. There's a concern that federal borrowing is financing a fundamental shift in the nature of the federal government: it used to be that most of government spending was about making investments in the future–infrastructure, research and development, education, and so on–but over the decades it has become more and more about cutting checks for immediate consumption in spending programs.

But at the moment, I won't argue for this case in any detail, or offer a list of policy options. Those who want to come to grips with the arguments should look at William Gale's just-published book, Fiscal Therapy: Curing America's Debt Addiction and Investing in the Future. For an overview of Gale's thinking, a useful starting point is the essay "Fiscal therapy: 12 framing facts and what they mean" (Brookings Institution, April 3, 2019).

Of course, I personally like some of Gale's proposals better than others. But overall, what I really like is that he takes the issue of rising government debt in the long-run seriously, and takes the responsibility of offering policy advice seriously. He doesn't wave his hands and assume that faster economic growth will bring in additional waves of tax revenue; or that "taxing the rich" is a magic elixir; or that the proportion of older people who are working will spike upward; or that we can ignore the deficit for five or ten or 15 years before taking some steps; or that government spending restraint needs nothing more than avoiding duplication, waste, and fraud. He uses mainstream estimates and makes concrete suggestions. For an outside analysis of his recommendations, John Ricco, Rich Prisinzano, and Sophie Shin run his proposals through the Penn Wharton Budget Model in "Analysis of Fiscal Therapy: Conventional and Dynamic Estimates" (April 9, 2019).

As a reminder, here's the pattern of US debt as a share of GDP since 1790. Through most of US history, the jumps in federal debt were about wars: the Revolutionary War, the Civil War, World Wars I and II. There's also a jump in the 1930s as part of fighting the Great Depression, and the more recent jump that was part of fighting the Great Recession.

But the aging of the US population and rising health care spending are going to take federal spending and deficits to new (peacetime) levels. On the present path, we will surpass the previous high of federal debt at 106% of GDP in about 15-20 years. It would be imprudent to wait and see what happens.

US Attitudes Toward Federal Taxes: A Rising Share of "About Right"

The Gallup Poll has been asking Americans since the 1950s whether they think that the income tax they pay is "too high" or "about right." The figure shows the responses over time including the 2019 poll, taken in early April. (The "don't know" and the "too low" answers are both quite small, and are not shown on the figure.)

[Figure: line graph of Americans’ opinions of the federal income tax they pay, since 1956]

What's interesting to me here is that from the late 1960s up through the 1990s, a healthy majority of Americans consistently viewed their income taxes as too high. But since around 2000, the gap has been much narrower. Indeed, the share of Americans saying that their income taxes are "about right" reached its highest historical level in 2018 and 2019.

In more detailed questions, the Gallup poll also asks about different income groups, and whether they are paying too much, a fair amount, or too little. Unsurprisingly, the general sentiment is that those with upper income levels could pay more. But there are some patterns I wouldn't necessarily have expected in the results, as well. Here are the results for what people think about what upper-income people are paying in federal taxes (for readability, I omitted the 2-4% in the "don't know" column):

What jumps out at me is that the proportion saying that upper-income people are paying a fair share of federal taxes was at its highest ever in 2019, while the share saying that upper-income people are paying "too little" was at its lowest ever. Yes, there's still a majority saying that upper-income people are paying "too little." But it's a shrinking majority. Given trends toward greater inequality of incomes over the 25 years shown in the table, I wouldn't have expected that pattern.

Here's the pattern for federal taxes paid by middle-income people:

Here, you can see the pattern from the figure above: that is, the share saying that middle-income people pay too much is falling, while the share saying that they pay their fair share seems to have risen over time.

For the share paid by lower-income people, the pattern looks like this:

Overall, the most common answer is that lower-income people pay "too much" in federal taxes. But it's interesting that the share saying that lower-income people pay "too little" is higher than the share saying that middle-income people pay "too little." The share saying that lower-income people pay "too little" in federal taxes was especially high in 2014 and 2015, although it's dropped off a little since then.

Of course, questions about whether a tax code is fair are going to be influenced by political partisanship. For example, here are some poll responses from the Pew Research Center. Their polling shows that the share of Americans thinking the federal tax system is very or moderately fair rose slightly over the last year. However, this modest overall rise is a result of Republicans being much more willing to say the tax system is fair than at any time in the last 20 years, and Democrats being much less willing to say so.

[Figure: Widest partisan gap in views of fairness of tax system in at least two decades]

As I've pointed out in the past, poll responses on economic questions like whether free trade offers benefits are also influenced dramatically by political preferences. During the past couple of years, as President Trump has inveighed against international trade and called for protectionism, Democrats have suddenly become much more positive about trade. One suspects that this pattern emerged more from anti-Trump feeling than from increased time spent reading economics textbooks.

Still, it's interesting to me that a plurality of Americans now see the federal income tax as "about right," while the proportion saying that upper-incomes pay too little is down, the proportion saying that middle-incomes pay a fair share is up, and the proportion saying that lower incomes pay too little has risen. Perhaps politicians who call for cutting taxes, or for dramatic tax increases, are refighting battles from the 1990s that are of less relevance to current voters.

The Captain Swing Riots; Workers and Threshing Machines in the 1830s

"Between the summer of 1830 and the summer of 1832, riots swept through the English countryside. Over no more than two years, 3,000 riots broke out – by far the largest case of popular unrest in England since 1700. During the riots, rural laborers burned down farmhouses, expelled overseers of the poor and sent threatening letters to landlords and farmers signed by the mythical character known as Captain Swing. Most of all, workers attacked and destroyed threshing machines."

Bruno Caprettini and Joachim Voth provide a readable overview of their research on the riots in "Rage against the machines: New technology and violent unrest in industrializing England," written as a Policy Brief for the UBS International Center of Economics in Society (2018, #2). They write:

"Threshing machines were used to thresh grain, especially wheat. Until the end of the 1700s, threshing grain was done manually and it was the principal form of employment in the countryside during the winter months. Starting from the Napoleonic Wars (1803-1815), threshing machines spread across England, replacing workers. Horse-driven or water-powered threshers could finish in a matter of weeks a task that would have normally kept workers busy for months. Their use arguably depressed the wages of rural workers."

Here's a figure showing locations of the Captain Swing riots:

The authors collect evidence about where threshing machines were being adopted based on newspaper advertisements for the sale of farms–which listed threshing machines at the farm as well as other property included with the sale. They show a correlation between the presence of more threshing machines and rioting. But as always, correlation doesn't necessarily mean causation. For example, perhaps areas where local workers were already more rebellious and uncooperative were more likely to adopt threshing machines, and the riots that followed only show why local farmers didn't want to deal with their local workers.

Thus, the authors also collect evidence on which areas had especially good soil for wheat, which makes using a thresher more likely, and which areas had water power available to run threshers. It turns out that these areas are also where the threshers were more likely to be adopted. So a more plausible explanation seems to be that the new technology was adopted where it was most likely to be effective, not because of pre-existing local stroppiness.

The Captain Swing riots are thus one more example, an especially vivid one, that new technologies which cause a lot of people to lose a way of earning income can be highly disruptive. The authors write: "The results suggest that in one of the most dramatic cases of labor unrest in recent history, labor-saving technology played a key role. While the past may not be an accurate guide to future upheavals, evidence from the days of Captain Swing serve as a reminder of how disruptive new, labor-saving technologies can be in economic, social and political terms."

Building Worker Skills in a Time of Rapid Technological Change

I'm congenitally suspicious of "this time is different" arguments, which often seem very quick to toss out historical experience for the sake of a lively narrative. So when I find myself in discussions of whether the present wave of technological change is unprecedented or unique, I often end up making the argument that while the new technologies are obviously different in their specifics from older technologies, the fact of technology leading to very dramatic disruptions of labor markets is not at all new. To me, the more interesting questions are about how the economy, government, and society react to that ongoing pattern of technological change.

Conor McKay, Ethan Pollack, and Alastair Fitzpayne offer a useful broad overview of these issues in "Automation and a Changing Economy," a two-part report written for the Aspen Institute Future of Work Initiative (April 2019). The first volume focuses on the theme "A Case for Action," with background on how technological change and automation have affected labor markets over time, while the second volume is "Policies for Shared Prosperity," with a list of policy options.

It may turn out to be true that the current wave of technological innovation is uniquely different in some ways. (It's very hard to disprove that something might happen!) But it's worth taking a moment to acknowledge that technologies of the past severely disrupted the US labor market, too. For example, here's a figure showing shifts in the pattern of US jobs over time: the dramatic rise in white-collar jobs, with falls in other areas.

And of course if one goes beyond broad skill categories and looks in more detail at jobs, the necessary skill mix has been changing quite substantially as well. Remember that in the 1970s, word-processing was mostly on typewriters; in the 1980s, written communications involved mail, photocopying, and sometimes fax machines; in the 1990s, no one carried a smartphone. It's not just changes in information technologies and the web, either. Workers across manufacturing and services jobs have had to learn how to use new generations of physical equipment as well.

Of course, we can noodle back and forth over how new technologies might have bigger effects on labor markets. The report has some discussion of these issues, and dinner parties for economists have been built on less. But the lesson I'd take away, to quote from the report, is: "Automation need not be any more disruptive in the future than it has been in the past to warrant increased policy intervention."

One key issue in navigating technological change is how workers can obtain the skills that employers want. And here a problem emerges, which is that although employers were a primary source of such training in the past, they have backed away from this role. The report notes (footnotes omitted):

Employers traditionally have been the largest source of funding for workforce training, but businesses are training fewer workers than in the past. From 1996 to 2008, the percentage of workers receiving employer-sponsored or on-the-job training fell 42 percent and 36 percent, respectively. This decline was widespread across industries, occupations, and demographic groups. …  More recent data on employer-provided training has been mixed. Data from the Society for Human Resource Management suggests that employer-provided tuition assistance has been falling in recent years, from 66 percent of surveyed businesses offering tuition assistance benefits in 2008 down to 53 percent in 2017. Meanwhile, data from the Association for Training & Development suggests that employer training investments have been roughly flat over the last decade. …

[A]s unions have lost power and membership, … businesses have had a freer hand to hire already trained external candidates, often leading to fewer within-firm career pathways and  higher turnover. …

Public sector investment has declined, too. For example, WIOA Title I state grants, which fund the core of the federal workforce development system, have been cut by over 40 percent since 2001. The program is currently underfunded by $367 million relative to its authorized levels. Government spending on training and other programs  to help workers navigate job transitions is now just 0.1 percent of GDP, lower than all other OECD countries except for Mexico and Chile, and less than half of what it was 30 years ago.

Why have employers backed away from providing training? The report notes:

Without intervention, business investment in workers may continue to decline. In a recent Accenture survey of 1,200 CEOs and other top executives, 74 percent said that  they plan to use artificial intelligence to automate tasks in their workplace over the next three years. Yet only three percent reported planning to significantly increase investments in training over the same time period.

In part, the decline in employer-provided training can be explained by changes in the employer-employee relationship over the past forty years. … If businesses plan to retain employees over a long period, they will benefit more directly from their training investments. But as relationships between workers and businesses become less stable and short-term, businesses have a difficult time capturing the return on their training investments. The result is less investment in training even as the workforce requires greater access to skills training.

Recent legislation could accelerate this trend. Businesses often have to choose between using workers or machines to accomplish a task. The 2017 Tax Cuts and Jobs Act allows businesses to immediately expense the full cost of equipment purchases—including automation technology—rather than deduct the cost of the equipment over a period of time. By reducing the after-tax cost of investing in physical capital but not providing a similar benefit for investments in human capital, the legislation may further shift business priorities away from worker training.

There are a number of ways one might seek to rebuild connections between employers and job training. The report suggests an employer tax break for spending money on employee training, similar to the tax break now given for investing in research and development. A complementary approach would be a dramatic expansion of the community college system, which has the advantage that it can train workers for a multiple-employer industry that is locally prominent. Yet another approach is a considerable expansion of apprenticeships. Yet another approach would be much greater support for "active labor market policies" that assist workers with job search and training.

A lot of the concern over adapting to technological change, and whether the economy is providing "good jobs" or devolving toward alternative "gig jobs," seems to me rooted in concerns about the kind of attachment that exists between workers and employers. It relates to the extent that workers feel engaged with their jobs, and to whether the worker and employer both have a plausible expectation that the job relationship is likely to persist for a time–allowing both of them to invest in acquiring skills with the possibility (or likelihood?) of a lasting connection in mind.

Ultimately, it will matter whether employers view their employees as imperfect robots, always on the verge of being replaced when the better robots eventually arrive, or whether they view their employees as worthy of investment in themselves. It's the difference between automation replacing workers, or complementing them.

Interview with Preston McAfee: Economists and Tech Companies

David A. Price interviews R. Preston McAfee in the most recent issue of Econ Focus from the Federal Reserve Bank of Richmond (Fourth Quarter 2018, pp. 18-23). From the introduction to the interview:

"Following a quarter-century career in academia at the California Institute of Technology, the University of Texas, and other universities, McAfee was among the first economists to move from academia to a major technology firm when he joined Yahoo in 2007 as chief economist. Many of the younger economists he recruited to Yahoo are now prominent in the technology sector. He moved to Google in 2012 as director of strategic technologies; in 2014, he joined Microsoft, where he served as chief economist until last year. McAfee combined his leadership roles in the industry with continued research, including on the economics of pricing, auctions, antitrust, and digital advertising. He is also an inventor or co-inventor on 11 patents in such wide-ranging areas as search engine advertising, automatically organizing collections of digital photographs, and adding user-defined gestures to mobile devices. While McAfee was still a professor in the 1990s, he and two Stanford University economists, Paul Milgrom and Robert Wilson, designed the first Federal Communications Commission auctions of spectrum."

Here are some comments from the interview that especially caught my eye–although the whole interview is worth reading:

On the antitrust and competition issues with the FAANG companies: 

Of course, a lot of the discussion today is focused on FAANG — Facebook, Apple, Amazon, Netflix, and Google. … First, let's be clear about what Facebook and Google monopolize: digital advertising. The accurate phrase is "exercise market power," rather than monopolize, but life is short. Both companies give away their consumer product; the product they sell is advertising. While digital advertising is probably a market for antitrust purposes, it is not in the top 10 social issues we face and possibly not in the top thousand. Indeed, insofar as advertising is bad for consumers, monopolization, by increasing the price of advertising, does a social good.

Amazon is in several businesses. In retail, Walmart's revenue is still twice Amazon's. In cloud services, Amazon invented the market and faces stiff competition from Microsoft and Google and some competition from others. In streaming video, they face competition from Netflix, Hulu, and the verticals like Disney and CBS. Moreover, there is a lot of great content being created; I conclude that Netflix's and Amazon's entry into content creation has been fantastic for the consumer. …

That leaves Apple, and the two places where I think we have a serious tech antitrust problem. We have become dependent on our phones, and Apple does a lot of things to lock in its users. The iMessage program and FaceTime are designed to force people into the Apple ecosystem. Also, Apple's app store is wielded strategically to lock in users (apps aren't portable), to prevent competition with Apple services, and to prevent apps that would facilitate a move to Android. My concern is that phones, on which we are incredibly dependent, are dominated by two firms that don't compete very strongly. While Android is clearly much more open than Apple, and has competing handset suppliers, consumers face switching costs that render them effectively monopolized. …

The second place I'm worried about significant monopolization is Internet service. In many places, broadband service is effectively monopolized. For instance, I have only one company that can deliver what anyone would reasonably describe as broadband to my house. The FCC says I have two, but one of these companies does not actually come to my street. I'm worried about that because I think broadband is a utility. You can't be an informed voter, you can't shop online, and you probably can't get through high school without decent Internet service today. So that's become a utility in the same way that electricity was in the 1950s. Our response to electricity was we either did municipal electricity or we did regulation of private provision. Either one of those works. That's what we need to do for broadband.

Using "double machine-learning" to separate seasonal and price effects

Like most computer firms, Microsoft runs sales on its Surface computers during back-to-school and the December holidays, which are also the periods when demand is highest. As a result, it is challenging to disentangle the effects of the price change from the seasonal change since the two are so closely correlated. My team at Microsoft developed and continues to use a technology to do exactly that and it works well. This technology is called "double ML," double machine learning, meaning it uses machine learning not once but twice.

This technique was originally created by some academic economists. Of course, as with everything that's created by academic economists, including me, when you go to apply it, it doesn't quite work. It almost works, but it doesn't quite work, so you have to change it to suit the circumstances.

What we do is first we build a model of ourselves, of how we set our prices. So our first model is going to not predict demand; it's just going to predict what decision-makers were doing in the past. It incorporates everything we know: prices of competing products, news stories, and lots of other data. That's the first ML. We're not predicting what demand or sales will look like, we're just modeling how we behaved in the past. Then we look at deviations between what happened in the market and what the model says we would have done. For instance, if it predicted we would charge $1,110, but we actually charged $1,000, that $110 difference is an experiment. Those instances are like controlled experiments, and we use them in the second process of machine learning to predict the actual demand. In practice, this has worked astoundingly well.
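For readers who want to see the mechanics, here is a minimal sketch of the textbook "partialling-out" form of double machine learning, run on made-up data. It is not the adapted version McAfee's team uses, which the interview does not spell out. The function names, the ridge regression standing in for a real ML model, and every number in the simulation are assumptions chosen for illustration.

```python
import numpy as np


def ridge_predict(X_train, y_train, X_test, alpha=1.0):
    """Fit ridge regression on the training rows and predict the test rows.

    A deliberately simple stand-in for whatever ML model one would really use.
    """
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(Xc.shape[1]),
                           Xc.T @ (y_train - y_mean))
    return y_mean + (X_test - x_mean) @ coef


def double_ml_price_effect(X, price, quantity, n_folds=2, seed=0):
    """Estimate the effect of price on quantity, holding the features X fixed."""
    n = len(price)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    price_surprise = np.empty(n)
    quantity_surprise = np.empty(n)
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # First ML step: predict the price that "would have" been set, keep the deviation.
        price_surprise[test] = price[test] - ridge_predict(X[train], price[train], X[test])
        # Second ML step: predict sales from the same features, keep the deviation.
        quantity_surprise[test] = quantity[test] - ridge_predict(X[train], quantity[train], X[test])
    # Regress the sales surprises on the price surprises (no intercept needed).
    return float(price_surprise @ quantity_surprise / (price_surprise @ price_surprise))


def simulate_sales(n=5000, seed=1):
    """Hypothetical data: sale seasons bring both lower prices and higher demand."""
    rng = np.random.default_rng(seed)
    season = rng.integers(0, 2, size=n).astype(float)  # 1 = back-to-school or holidays
    price = 1100.0 - 100.0 * season + rng.normal(0.0, 30.0, size=n)
    # The true price effect is set to -0.5 units sold per dollar (an assumption).
    quantity = 800.0 + 200.0 * season - 0.5 * price + rng.normal(0.0, 10.0, size=n)
    return season.reshape(-1, 1), price, quantity


if __name__ == "__main__":
    X, price, quantity = simulate_sales()
    print("naive slope:    ", round(np.polyfit(price, quantity, 1)[0], 2))
    print("double ML slope:", round(double_ml_price_effect(X, price, quantity), 2))
```

In this simulation the naive regression of sales on price overstates the price effect, because low prices and high seasonal demand arrive together. The residual-on-residual step uses only the pricing "surprises," which is the sense in which McAfee describes them as experiments.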

On the power of AI

AI is going to create lots of opportunities for firms in every industry. By AI, I mean machine learning, usually machine learning that has access to large volumes of data, which enables it to be very clever. 

We're going to see changes everywhere: from L'Oréal giving teenagers advice about what makeup works best for them to airplane design to logistics, everywhere you look within the economy.

Take agriculture. With AI, you can start spot-treating farms for insect infestation if you can detect insect infestations, rather than what we do today, which is spread the treatment broadly. With that ability to finely target, you may be able to reduce pesticides to 1 percent of what you're currently using, yet still make them more effective than they are today and have them not deteriorate so rapidly in terms of the bugs evolving around them.

For a recent article about "Economists (and Economics) in Tech Companies," interested readers may want to check the article by Susan Athey and Michael Luca in the Winter 2019 issue of the Journal of Economic Perspectives (33:1, pp. 209-30).

Snapshots of Trade Imbalances: US in Global Context

A substantial amount of the discussion of international trade issues starts from the premise that the United States has huge trade deficits and China has huge trade surpluses. But what if only half of that premise is true? Here are a couple of tables on trade balances that I've clipped out of the IMF's World Economic Outlook for April 2019. One shows national trade deficits and surpluses in dollars; the other shows them as a share of the nation's GDP. Of course, the 2019 figures are projections.

The US trade deficit is large, both in absolute dollars (-$469 billion in 2018) and as a share of GDP (-2.3% of GDP). Indeed, the imposition of tariffs by the Trump administration in 2018 is projected to be followed by a larger US trade deficit in 2019–which would tend to confirm the standard lesson that trade deficits and surpluses result from underlying macroeconomic patterns of  domestic consumption, saving, and investment, not from trade agreements.
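The "standard lesson" follows from national income accounting. As a simplified sketch, which ignores net income from abroad and uses textbook symbols that are not in the IMF tables ($Y$ for output, $C$ for private consumption, $G$ for government purchases, $I$ for investment, $S$ for national saving, $NX$ for net exports):

```latex
\begin{aligned}
Y &= C + I + G + NX \\
S &\equiv Y - C - G \\
\Rightarrow\quad NX &= S - I
\end{aligned}
```

Under these assumptions, a trade deficit ($NX < 0$) is the same thing as national saving falling short of domestic investment, so a tariff can change the overall balance only to the extent that it changes saving or investment.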

But China's trade surplus is not especially large, at about $49 billion in 2018, which is 0.4% of China's GDP. Indeed the IMF is projecting that China's small trade surpluses will turn into trade deficits by 2024. As China's population ages, it has been shifting toward becoming a higher-consumption society, and its trade surpluses have fallen accordingly.

What other countries have large trade deficits, like the US? And if it's not China, what countries have the large trade surpluses?

The US has by far the biggest trade deficit in absolute terms. When measured as a share of its economy, however, the US trade deficit is smaller than the trade deficits in the United Kingdom, Canada, South Africa (or the nations of sub-Saharan Africa as a group) and India.

When it comes to trade surpluses, the absolute size of the surpluses in Germany (+$403 billion), Japan (+$173 billion), Russia (+$114 billion) and Italy (+$53 billion) all outstrip the size of China's trade surplus (+$49 billion) in 2018. These economies are also smaller in size than China's, so as a percentage of GDP, their trade surpluses are larger than China's. Also, trade surpluses for the "other advanced economies" are large. This group is made up of the advanced economies outside the Group of Seven countries listed in the table and outside the euro area, so examples would include Korea, Australia, Norway, Sweden, Taiwan, and others.

One final note: Measures of international trade flows are imperfect. An obvious illustration of this point is that the world balance of trade must always be zero, by definition, because exports from any one location are imports for some other location. However, these tables show the world as having an overall trade surplus projected at $154 billion in 2019.
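A small sketch can make the accounting point concrete. The three-country flow matrix below is entirely hypothetical, as is the assumption that importers record 2 percent less than exporters report; neither comes from the IMF tables. The first function shows that balances cancel when every shipment is counted once on each side, and the second shows how a world "surplus" appears when the two sides keep separate books.

```python
import numpy as np

# Hypothetical flows: FLOWS[i][j] is the value shipped from country i to country j.
FLOWS = np.array([
    [0.0, 120.0, 80.0],
    [100.0, 0.0, 50.0],
    [90.0, 60.0, 0.0],
])


def true_balances(flows):
    """Exports minus imports for each country when every flow is measured once."""
    flows = np.asarray(flows, dtype=float)
    return flows.sum(axis=1) - flows.sum(axis=0)


def recorded_balances(recorded_exports, recorded_imports):
    """Balances when exporters and importers keep separate books.

    recorded_exports[i, j]: what country i says it shipped to country j.
    recorded_imports[i, j]: what country j says it received from country i.
    """
    recorded_exports = np.asarray(recorded_exports, dtype=float)
    recorded_imports = np.asarray(recorded_imports, dtype=float)
    return recorded_exports.sum(axis=1) - recorded_imports.sum(axis=0)


if __name__ == "__main__":
    print("consistent books, world total:", true_balances(FLOWS).sum())
    # Assumption for illustration: importers miss 2% of what exporters report.
    print("separate books, world total:  ", recorded_balances(FLOWS, 0.98 * FLOWS).sum())
```

With consistent books the individual balances differ by country but sum to zero. Once recorded imports fall short of recorded exports, the world as a whole appears to run a surplus, which is the kind of discrepancy visible in the tables.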

Have the Identification Police Become Overly Intrusive?

Every intro statistics class teaches "correlation is not causation"–that is, because two patterns consistently move together (or consistently opposite), you can't jump to the conclusion that A causes B. It might instead be that B causes A, that some alternative factor C is affecting both A and B, or that among all the millions of possible patterns you can put side-by-side, the correlation between this specific A and B is just a fluky coincidence.

As part of the \”credibility revolution\” in empirical economics, researchers in the last 20 years or so have become much more careful in thinking about what kind of a study would demonstrate causality. One approach is to set up an experiment in which some people are randomly assigned to a certain program, while others are not. For example, here are discussions of experiments about the effectiveness of preschool, health insurance, and subsidized employment. Another approach is to look for real-world situations where some randomness exists, and then use that as a \”natural experiment.\” As an example, I recently wrote about research on the effects of money bail which takes advantage of the fact that defendants are randomly assigned to judges, some of whom are tougher or more lenient in granting bail. Or in certain cities, admission to oversubscribed charter schools uses a lottery, so some students are randomly in the school and others are not. Thus, one can study the effects of bail or of charter schools based on this randomness.

This search for an underlying random factor that allows a researcher to obtain an estimate of an underlying cause is called \”identification.\” It\’s hard to overstate how much this change has affected empirical work in economics. Pretty much every published paper or seminar presentation has a discussion of the \”identification strategy.\” If you present correlations without such a strategy, you need to be very explicit that you are not drawing any causal inferences, just describing some patterns in the data.
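
To make the logic of identification concrete, here is a minimal simulation sketch. Everything in it is a made-up assumption for illustration: a hypothetical program with a true effect of 2.0, and an unobserved factor (call it "motivation") that raises outcomes and also makes people more likely to enroll when enrollment is voluntary. It is not drawn from any of the studies mentioned above.

```python
import random
import statistics

def simulate(n, randomized, true_effect=2.0, seed=0):
    """Difference in mean outcomes between participants and non-participants."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)  # hypothetical unobserved factor C
        if randomized:
            # Coin-flip assignment: enrollment is unrelated to motivation.
            in_program = rng.random() < 0.5
        else:
            # Self-selection: more motivated people are more likely to enroll.
            in_program = motivation + rng.gauss(0, 1) > 0
        # Outcome depends on the program, on motivation, and on luck.
        outcome = true_effect * in_program + 3.0 * motivation + rng.gauss(0, 1)
        (treated if in_program else control).append(outcome)
    return statistics.mean(treated) - statistics.mean(control)

naive = simulate(50_000, randomized=False)
experimental = simulate(50_000, randomized=True)
print(f"self-selected comparison: {naive:.2f}")
print(f"randomized comparison:    {experimental:.2f}")
```

Under these assumptions, the self-selected comparison should come out well above the true effect, because it mixes the effect of the program with the effect of motivation, while the randomized comparison should land close to 2.0.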

There\’s not any dispute that this greater thoughtfulness about how to infer causality is overall a good thing. However, one can question whether it has gone too far. Christopher J. Ruhm raised this question in his \”Presidential Address: Shackling the Identification Police?\” given to the Southern Economic Association last November. The talk doesn\’t seem to be freely available online, but it has now been published in the April 2019 issue of the Southern Economic Journal (85:4, pp. 1016–1026) and is also available as an NBER working paper.

There are two main sets of concerns about the focus on looking for sources of experimental or natural randomness, as a way of addressing issues about causality. One is that these approaches have issues of their own. For example, imagine a study where people volunteer to be in a program, and then are randomly assigned. It might easily be true that the volunteers are not a random sample of the entire population (after all, they are the ones with connections to hear about the study and motivation to apply), and so the results of a study based on such a group may not generalize to the population as a whole. Ruhm acknowledges these issues, but they are not his main focus.

Ruhm\’s concern is that when research economists obsess over the issue of identification and causality, they can end up focusing on small questions where they have a powerful argument for causality, but ignoring large questions where getting a nice dose of randomization so that causality can be inferred is difficult or even impossible. Ruhm writes:

I sent out the following query on social media (Facebook and Twitter) and email: “I would like to get your best examples of IMPORTANT microeconomic questions (in labor/health/public/environmental/education etc.) where clean identification is difficult or impossible to obtain.” Responses included the following.

  • Effects of trade liberalization on the distribution of real wages.
  • Contributions of location, preferences, local policy decisions, and luck to geographic differences in morbidity and mortality rates.
  • Effects of the school climate and work environment on teacher and student outcomes.
  • Importance of norms on firms’ wage setting.
  • Extent to which economic factors explain the rise in obesity.
  • Impact of family structure on child outcomes.
  • Effects of inequality, child abuse, and domestic violence on later life outcomes.
  • Social cost of a ton of SO2 emissions.
  • Effect of race on healthcare use.
  • Effect of climate change on agricultural productivity.

Ruhm argues that for a number of big picture questions, an approach which starts by demanding a nice clear source of randomness for clear identification of a causal factor is going to be too limiting. It can look at slices of the problem, but not the problem as a whole. He writes (footnotes and citations omitted):

For a more concrete indication of the value and limitations of experimental and quasiexperimental approaches, consider the case of the fatal drug epidemic, which is possibly the most serious public health problem in the United States today. To provide brief background, the number of U.S. drug deaths increased from 16,849 in 1999 to 63,632 in 2016 and they have been the leading cause of injury deaths since 2009. The rise in overdose mortality is believed to have been initially fueled by enormous increases in the availability of prescription opioids, with more recent growth dominated by heroin and fentanyl. However, some researchers argue that the underlying causes are economic and social decline (rather than supply factors) that have particularly affected disadvantaged Americans. What role can different methodological approaches play in increasing our understanding of this issue? 

RCTs [randomized control trials] could be designed to test certain short-term interventions—such as comparing the efficacy of specific medication-assisted treatment options for drug addicts—but probably have limited broader applicability because randomization will not be practical for most potential policies and longer term effects will be difficult to evaluate. Quasi-experimental methods have provided useful information on specific interventions such as the effects of prescription drug monitoring programs and the effects of [other policies], like the legalization of medical marijuana. However, the challenges of using these strategies should not be understated because the results often depend on precise characteristics of the policies and the timing of implementation, which may be difficult to ascertain in practice. Moreover, although the estimated policy impacts are often reasonably large, they are dwarfed by the overall increase in fatal drug overdoses.

Efforts to understand the root causes of the drug epidemic are therefore likely to be resistant to clean identification and instead require an “all of the above” approach using experimental and quasiexperimental methods where possible, but also the accumulation of evidence from a variety of data sources and techniques, including descriptive and regression analyses that in isolation may fail to meet desired standards of causal inference but, hopefully, can be combined with other investigations to provide a compelling preponderance of evidence.

The relationship between smoking and lung cancer provides a striking example of an important question that was “answered” using strategies that would be viewed as unacceptable today by the identification police. The understanding of tobacco use as a major causal factor was not based upon RCTs involving humans but rather resulted from the accretion of evidence from a wide variety of sources including: bench science, animal experiments, and epidemiological evidence from nonrandomized prospective and retrospective studies. Quasi-experimental evidence was eventually provided (e.g., from analyses of changes in tobacco taxes) but long after the question had been largely resolved. 

To summarize, clean identification strategies will frequently be extremely useful for examining the partial equilibrium effects of specific policies or outcomes—such as the effects of reducing class sizes from 30 to 20 students or the consequences of extreme deprivation in-utero—but will often be less successful at examining the big “what if” questions related to root causes or effects of major changes in institutions or policies.

In summing up, Ruhm writes:

Have the identification police become too powerful? The answer to this question is subjective and open to debate. However, I believe that it is becoming increasingly difficult to publish research on significant questions that lack sufficiently clean identification and, conversely, that research using quasi-experimental and (particularly) experimental strategies yielding high confidence but on questions of limited importance are more often being published. In talking with PhD students, I hear about training that emphasizes the search for discontinuities and policy variations, rather than on seeking to answer questions of fundamental importance. At professional presentations, experienced economists sometimes mention “correlational” or “reduced-form” approaches with disdain, suggesting that such research has nothing to add to the canon of applied economics.

Thus, Ruhm is pointing to a tradeoff. Researchers would like to have a study with a strong and defensible methodology, and also a study that addresses a big and important question. Tackling a big question by looking at a bunch of correlations or other descriptive evidence is going to have some genuine limitations–but at least it\’s looking at fact patterns about a big question. Using a great methodology to tackle a small question will never provide more than a small answer–although there is of course a hope that if lots of researchers use great methods on small questions, the results may eventually form a body of evidence that supports broader conclusions. My own sense is that the subject of economics is hard enough to study that researchers should be willing to consider, with appropriate skepticism, a wide array of potential sources of insight.

Four Snapshots of China\’s Growth and Inequality

Here\’s China\’s share of world population and the global economy since the start of its economic reforms. Since 1978, China\’s share of world population has declined mildly from 23% to about 19%. In those same 40 years, China\’s share of world GDP has risen dramatically from 3% to about 20%.

Here\’s a sense of this economic growth on a per adult basis. The vertical axis is in yuan, so for US readers one might want to divide by the exchange rate of roughly 6.5 yuan/dollar. But look at the annual growth rates of national income per adult–especially the average of 8.1% per year from 1998-2015.

These images are taken from an article by Thomas Piketty, Li Yang and Gabriel Zucman, \”Income inequality is growing fast in China and making it look more like the US: Study provides the first systematic estimates of the level and structure of China’s national wealth since the beginning of market reforms,\” which appears at the LSE Business Review website (April 1, 2019). It\’s a preview of their forthcoming research article in the American Economic Review. The main focus of their research has been to look at income and wealth inequality–and in particular, data on patterns of wealth in China has been hard to find.

Here\’s the pattern of China\’s national wealth over time, expressed as a share of national income. Wealth includes the value of companies, the value of the housing stock, and other assets. China\’s wealth as a share of national income has almost doubled since 1978. And all of the increase is a result of rising wealth by households, not government.

Finally, here\’s a figure showing China\’s shift in income inequality over time. China\’s economic growth has meant a larger share of income for the top 10%, and a falling share for the bottom 50%.

The authors write: \”To summarise, the level of inequality in China in the late 1970s used to be less than the European average – closer to those observed in the most egalitarian Nordic countries – but they are now approaching a level that is almost comparable with the USA.\” Of course, it\’s important to remember that in a Chinese economy that has been growing rapidly for decades, this doesn\’t mean that the bottom 50% have had stagnant growth in income or are actually worse off in absolute terms. It just means that the growth in incomes for the bottom half hasn\’t been as rapid as for the top 10%.

"Bias Has Been Overestimated at the Expense of Noise:" Daniel Kahneman

Daniel Kahneman (Nobel 2002) is of course known for his extensive work on behavioral biases and how they affect economic decisions. He\’s now working on a new book, together with Olivier Sibony and Cass Sunstein, in which he focuses instead on the concept of \”noise,\” and argues that bias has been overestimated at the expense of noise.

Here\’s a précis of Kahneman\’s current thinking on this and other topics, drawn from an interview with Tyler Cowen (both video and a transcript are available at \”Daniel Kahneman on Cutting Through the Noise,\” December 19, 2018).

KAHNEMAN: First of all, let me explain what I mean by noise. I mean, just randomness. And it’s true within individuals, but it’s especially true among individuals who are supposed to be interchangeable in, say, organizations. …

I’ll tell you where the experiment from which my current fascination with noise arose. I was working with an insurance company, and we did a very standard experiment. They constructed cases, very routine, standard cases. Expensive cases — we’re not talking of insuring cars. We’re talking of insuring financial firms for risk of fraud.

So you have people who are specialists in this. This is what they do. Cases were constructed completely realistically, the kind of thing that people encounter every day. You have 50 people reading a case and putting a dollar value on it.

I could ask you, and I asked the executives in the firm, and it’s a number that just about everybody agrees. Suppose you take two people at random, two underwriters at random. You average the premium they set, you take the difference between them, and you divide the difference by the average.

By what percentage do people differ? Well, would you expect people to differ? And there is a common answer that you find, when I just talk to people and ask them, or the executives had the same answer. It’s somewhere around 10 percent. That’s what people expect to see in a well-run firm.

Now, what we found was 50 percent, 5–0, which, by the way, means that those underwriters were absolutely wasting their time, in the sense of assessing risk. So that’s noise, and you find variability across individuals, which is not supposed to exist.

And you find variability within individuals, depending morning, afternoon, hot, cold. A lot of things influence the way that people make judgments: whether they are full, or whether they’ve had lunch or haven’t had lunch affects the judges, and things like that.

Now, it’s hard to say what there is more of, noise or bias. But one thing is very certain — that bias has been overestimated at the expense of noise. Virtually all the literature and a lot of public conversation is about biases. But in fact, noise is, I think, extremely important, very prevalent.

There is an interesting fact — that noise and bias are independent sources of error, so that reducing either of them improves overall accuracy. There is room for . . . and the procedures by which you would reduce bias and reduce noise are not the same. So that’s what I’m fascinated by these days.
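
For readers who want to see the arithmetic behind the noise measure Kahneman describes (take two underwriters, divide the difference in their premiums by the average of the two), here is a minimal sketch. The function name `noise_index` and the premium figures are hypothetical, and averaging over every possible pair is my own assumption about how to turn "two people at random" into a single number.

```python
import itertools
import statistics

def noise_index(premiums):
    """Average, over all pairs of judgments, of |a - b| divided by the pair's mean."""
    ratios = [
        abs(a - b) / ((a + b) / 2)
        for a, b in itertools.combinations(premiums, 2)
    ]
    return statistics.mean(ratios)

# Hypothetical premiums (in dollars) set by five underwriters for the same case.
premiums = [9_500, 16_000, 12_000, 21_000, 8_000]
print(f"typical gap between two underwriters: {noise_index(premiums):.0%}")
```

On this measure, two underwriters quoting 90 and 110 differ by 20 percent, while two quoting 75 and 125 differ by 50 percent, which is the size of gap Kahneman reports finding.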

COWEN: Do you see the wisdom of crowds as a way of addressing noise in business firms? So you take all the auditors, and you somehow construct a weighted average? …

KAHNEMAN: With respect to the underwriters, I would expect, certainly, that if you took 12 underwriters assessing the same risk, you would eliminate the noise. You would be left with bias, but you would eliminate one source of error, and the question is just price. Google, for example, when it hires people, they have a minimum of four individuals making independent assessments of each candidate. And that reduces the standard deviation of error at least by a factor of two.
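
The \”factor of two\” follows from a standard statistical result: if judgments are independent, the standard deviation of their average falls with the square root of the number of judges. Here is a small simulation sketch, assuming each judgment equals the truth plus a shared bias plus independent noise. The specific numbers (a bias of 5, noise with a standard deviation of 10) are invented for illustration.

```python
import random
import statistics

def panel_error_stats(panel_size, trials=20_000, bias=5.0, noise_sd=10.0, seed=1):
    """Simulate averaging independent judgments; return (mean error, SD of error)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # Each judge's error is the shared bias plus that judge's own noise.
        judgments = [bias + rng.gauss(0, noise_sd) for _ in range(panel_size)]
        errors.append(sum(judgments) / panel_size)
    return statistics.mean(errors), statistics.stdev(errors)

for size in (1, 4, 12):
    mean_error, sd_error = panel_error_stats(size)
    print(f"panel of {size:2d}: mean error {mean_error:5.2f}, SD of error {sd_error:5.2f}")
```

Under these assumptions, a panel of four should cut the standard deviation of the error roughly in half, and a panel of 12 should shrink it by a factor of about 3.5 rather than eliminating it entirely. The mean error stays near the bias in every case, which fits Kahneman\’s point that averaging leaves bias in place.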

COWEN: So is the business world, in general, adjusting for noise right now? Or only some highly successful firms?

KAHNEMAN: I don’t know enough about that. All I do know is that, when we pointed out the results, the bewildering results of the experiment on underwriters, and there was another unit — people who assess the size of claims. Again, actually, it’s more than 50 percent. Like 58 percent. The thing that was the most striking was that nobody in the organization had any idea that this was going on. It took people completely by surprise.

My guess now, that wherever people exercise judgment, there is noise. And, as a first rule, there is more noise than people expect, and there’s more noise than they can imagine because it’s very difficult to imagine that people have a very different opinion from yours when your opinion is right, which it is. …

COWEN: If you’re called in by a CEO to give advice — and I think sometimes you are — how can I reduce the noise in my decisions, the decisions of the CEO, when there’s not a simple way to average? The firm doesn’t have a dozen CEOs. What’s your advice? …

KAHNEMAN: [T]here is one thing that we know that improves the quality of judgment, I think. And this is to delay intuition. … Delaying intuition until the facts are in, at hand, and looking at dimensions of the problem separately and independently is a better use of information.

The problem with intuition is that it forms very quickly, so that you need to have special procedures in place to control it except in those rare cases …  where you have intuitive expertise. That’s true for athletes — they respond intuitively. It’s true for chess masters. It’s true for firefighters … I don’t think CEOs encounter many problems where they have intuitive expertise. They haven’t had the opportunity to acquire it, so they better slow down. … It’s not so much a matter of time because you don’t want people to get paralyzed by analysis. But it’s a matter of planning how you’re going to make the decision, and making it in stages, and not acting without an intuitive certainty that you are doing the right thing. But just delay it until all the information is available.

COWEN: And does noise play any useful roles, either in businesses or in broader society? Or is it just a cost we would like to minimize?

KAHNEMAN: There is one condition under which noise is very useful. If there is a selection process, evolution works on noise. You have random variation and then selection. But when there is no selection, noise is just a cost. … Bias and noise do not cover the universe. There are other categories.

Replacing LIBOR: An International Overview

LIBOR stands for \”London Interbank Offered Rate.\” For a long time, it was probably the most common benchmark interest rate in the world–that is, it was built into trillions of dollars worth of loans and financial contracts that if the LIBOR interest rate went up or down, the contract would adjust accordingly.

However, a huge scandal erupted back in 2010. Turns out that the LIBOR was not based on actual market transactions; instead, LIBOR was based on a survey in which someone at a bank gave a guess on what interest rate their bank would be charged if the bank wanted to borrow short-term from another bank on a given morning, in a particular currency. A few of the people responding to the survey were intentionally giving answers that pulled LIBOR up just a tiny bit one day, or pulled it down a tiny bit another day. Given that the LIBOR was linked to trillions of dollars in financial contracts, market traders who knew in advance about these shifts could and did reap fraudulent profits.

LIBOR tightened up its survey methods. But it clearly made sense to shift away from using a benchmark interest rate based on a survey, and instead to use one based on an actual market for short-term low-risk borrowing. Various committees formed to consider options. As I noted here about six weeks ago, the US is switching from LIBOR to SOFR–the Secured Overnight Financing Rate. I wrote: \”It refers to the cost of borrowing which is extremely safe, because the borrowing is only overnight, and there are Treasury securities used as collateral for the borrowing. The SOFR rate is based on a market with about $800 billion in daily transactions, and this kind of overnight borrowing doesn\’t just include banks, but covers a wider range of financial institutions. The New York Fed publishes the SOFR rate every morning at 8 eastern time.\”

But what about the switch away from LIBOR in the rest of the world? Andreas Schrimpf and Vladyslav Sushko describe what\’s happening in \”Beyond LIBOR: a primer on the new benchmark rates,\” which appears in the March 2019 issue of the BIS Quarterly Review (pp. 29-52). Here\’s a table showing the alternative risk-free rate (RFR) benchmarks being used with other currencies.

There are several big issues ahead in this area. One is that the LIBOR is actually going to be discontinued in 2021, so any loan or financial contract with a benchmark rate will have to migrate to something else. There will be literally trillions of dollars of contracts that need to shift in this way. Moreover, the LIBOR debacle has made a lot of financial industry participants think more carefully about exactly what benchmark interest rate may be appropriate in any given contract–for example, an appropriate benchmark might include not only an overnight risk-free rate, but also some built-in adjustment for other kinds of risks, including risks over different periods of time or risks at the firm or industry level.

For most of us, discussions of benchmark interest rates have a high MEGO (My Eyes Glaze Over) factor. But when I think in terms of trillions of dollars of loans and financial contracts around the world, all being adjusted in ways that are thoughtful but untested, I find it easier to pay attention to the subject.