The Internal Revenue Service receives something north of 100 million individual tax returns each year. So how does the IRS decide how to deploy its 6,500 auditors? It relies on computer programs to flag returns that seem more likely to be understating income. For example, a highly paid two-earner couple might have income well into the mid-six-figures, but if what’s on the tax form matches what their employers and financial institutions reported, there’s not likely to be much gain in auditing them (at least not without some additional information). Nowhere on the tax form is the race of the taxpayer specified, and thus it is impossible for the computer algorithm that decides who gets audited to take race into account in any explicit way. Nonetheless, it appears that the algorithm is auditing blacks substantially more than whites.

Hadi Elzayn, Evelyn Smith, Thomas Hertz, Arun Ramesh, Robin Fisher, Daniel E. Ho, and Jacob Goldin dig into the evidence in “Measuring and Mitigating Racial Disparities in Tax Audits” (Stanford Institute for Economic Policy Research, January 2023). They write: “Despite race-blind audit selection, we find that Black taxpayers are audited at 2.9 to 4.7 times the rate of non-Black taxpayers.” The research result has gotten considerable press coverage, like the recent “I.R.S. Acknowledges Black Americans Face More Audit Scrutiny” in the New York Times (May 15, 2023).

The method behind the study is interesting. Given that tax return and audit data doesn’t include race, on what basis can the researchers reach this conclusion? They infer race from data on names and where people live. The authors write:

Through a unique partnership with the Treasury Department, we investigate these
questions using comprehensive microdata on approximately 148 million tax returns and
780,000 audits. … To address the problem of missing race, we use Bayesian Improved First Name and Surname Geocoding (BIFSG), imputing race based on full name and census block groups (Imai and Khanna, 2016; Voicu, 2018). We then propose and implement a novel approach for bounding the true audit disparity by race from the (imperfectly measured) BIFSG proxy. By individually matching a subset of the tax data to self-identified race data from other administrative sources, we provide evidence that the assumptions underlying our bounding approach are satisfied in practice.
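The Bayesian combination at the heart of BIFSG can be sketched in a few lines. The probabilities below are invented for illustration (the actual method also folds in first-name frequencies and draws its tables from census data, none of which appears here); the point is only to show how Bayes’ rule blends a surname-based prior with a geography-based one:

```python
# Illustrative BIFSG-style imputation with made-up numbers.
# Assumes (for simplicity) that surname and geography are
# conditionally independent given race.

# Hypothetical P(race | surname) for one surname
p_race_given_surname = {"Black": 0.40, "non-Black": 0.60}
# Hypothetical P(race | census block group) for one block group
p_race_given_geo = {"Black": 0.70, "non-Black": 0.30}
# Hypothetical population shares P(race)
p_race = {"Black": 0.13, "non-Black": 0.87}

# Posterior is proportional to
# P(race | surname) * P(race | geo) / P(race)
unnormalized = {
    r: p_race_given_surname[r] * p_race_given_geo[r] / p_race[r]
    for r in p_race
}
total = sum(unnormalized.values())
posterior = {r: v / total for r, v in unnormalized.items()}
print(posterior)
```

With these invented inputs, the two signals reinforce each other and the posterior probability assigned to “Black” ends up well above either input alone; the paper’s bounding approach then works with such posteriors rather than treating them as ground truth.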

When the researchers dig down into the data, they find that the difference in audits by race arises almost entirely in one category: audit rates for the working poor who are claiming the Earned Income Tax Credit. They write: “Black taxpayers claiming the EITC are between 2.9 and 4.4 times as likely to be audited as non-Black EITC claimants. … We find that the disparity cannot be fully explained by racial differences in income, family size, or household structure, and that the observed audit disparity remains large after conditioning on these characteristics. For example, among unmarried men with children, Black EITC claimants are audited at more than twice the rate of their non-Black counterparts.”

The EITC audits are almost all “correspondence audits,” which means that the taxpayer gets a letter from the IRS with some questions, and if the taxpayer doesn’t write back with acceptable answers, the tax credit is denied.

Like many economists, I’m a fan of the Earned Income Tax Credit (as explained here). But I’ve also recognized that it has a long-standing problem: about one-fifth of the payments have often gone to those who did not qualify for them (as explained here). This problem arises from a combination of factors, ranging from complexity and uncertainty over whether households actually qualify to outright fraud (as discussed here). But again, it’s not obvious why these factors should affect blacks more than others.

The authors don’t have a definitive answer to this question, but they try to gain some insight by tinkering with the IRS algorithm that determines who gets audited, and then exploring how the mixture of audits would have shifted as a result. They show how “seemingly technocratic choices about algorithmic design” can lead to different results.

For example, it turns out that the IRS audit algorithm is calibrated (in part) to minimize the “no-change rate”–that is, the chance that an audit will not lead to any change in the amount of tax owed. This may seem reasonable enough, but consider two possible audits: one audit has a 95% chance of leading to a small change in taxes owed of less than $500. The other audit has a 10% chance of a large change in taxes owed of more than $10,000. Focusing on the larger payoffs will bring in more money: the second audit has the higher expected revenue, even though it is far more likely to produce no change at all. As the authors write: “[T]he taxpayers with the highest under-reported taxes tend to be non-Black, but the available data allow the classifier model to assign the highest probabilities of underreporting to more Black than non-Black taxpayers.”
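The tension between minimizing the no-change rate and maximizing revenue can be made concrete with back-of-the-envelope arithmetic. The dollar figures below are the point values implied by the example in the text ($500 and $10,000), not numbers from the paper:

```python
# Two stylized audit candidates from the example above.
# "p_change" is the probability the audit changes taxes owed;
# "amount" is the assumed size of that change in dollars.
audits = {
    "low-risk, small change": {"p_change": 0.95, "amount": 500},
    "high-risk, large change": {"p_change": 0.10, "amount": 10_000},
}

for name, a in audits.items():
    expected_revenue = a["p_change"] * a["amount"]
    no_change_rate = 1 - a["p_change"]
    print(f"{name}: expected revenue ${expected_revenue:.0f}, "
          f"no-change rate {no_change_rate:.0%}")
```

An algorithm scored on the no-change rate strongly prefers the first audit (5% no-change versus 90%), even though the second audit’s expected revenue is roughly twice as large ($1,000 versus $475).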

As another example, it seems that the algorithm emphasizes the possibility of “over-claiming of refundable tax credits rather than total under-reporting due to any error on the return.” One can imagine a possible political motive for this emphasis on over-claiming rather than under-reporting, but it’s not a way to collect more revenue.

Finally, these “correspondence audits” of the working poor who receive the Earned Income Tax Credit are relatively easy to automate: the algorithm flags them and the letters go out. But when most of us think about audits, what we have in mind is a detailed look at the finances of high-income folks, perhaps especially those who own complex businesses or have complex financial arrangements. Given existing economic inequalities by race, such audits would focus less on blacks. And plausible estimates suggest that audits focused in this way could raise $100 billion per year, just through enforcement of existing tax laws. But it takes highly skilled tax professionals to carry out such audits, and the IRS has a tough time holding on to people with the necessary skills and training.