Asking generative AI tools a question often produces a reasonable first draft of the desired output. Sure, it may contain inaccuracies and even hallucinations, but first drafts are always imperfect; it's up to the author to fix them. (You may say that your personal first drafts don't hallucinate. Really? You've never produced a first draft where you were sure you remembered a certain article or quotation or statistic, but when you checked it while revising the draft, you found your memory was flatly incorrect?) As someone who has worked for many years as an editor, I like to say that the new AI tools have devalued the ability to produce an OK first draft, because that can now be done so easily in many contexts, but upvalued the ability to add value by editing.
But if you ask AI about specific statistics, this "reasonable first draft" standard no longer applies: you don't want a "reasonable first draft" of statistics; you want the actual data from the relevant official source. If you don't ask in a careful way, the AI tool will not give you what you want.
James Tebrake, Bachir Boukherouaa, Jeff Danforth, and Niva Harikrishnan describe the problem and offer some solutions in "StatGPT: AI for Official Statistics" (International Monetary Fund, March 9, 2026). The authors carry out an experiment with ChatGPT and other AI tools, asking about basic data on annual economic growth rates in recent years for seven leading economies. They describe their process:
The prompt "Can you generate a table of economic growth rates for the G7 countries taking the data from the latest issue of the IMF's World Economic Outlook. Can you provide data for 2018 to 2025. Can you provide the output in a CSV file." was entered 10 times in the same conversation, 10 times in 10 different conversations, and 5 times in the same conversation with a copy of the latest World Economic Outlook loaded in memory (total of 25 prompts).
How accurate are the results? For what seems like a fairly basic query, they find:
Overall, ChatGPT provided a correct response 34 percent of the time when the prompts were entered into the same conversation. The level of accuracy declined to 17 percent when the request was made using unique conversations. When the latest publication of the World Economic Outlook was loaded into ChatGPT, the level of accuracy fell to 14 percent.
The authors offer two solutions. In the short term, they describe using a series of prompts that lead the AI tool from a broader perspective to a specific dataset, and then to specific data within that dataset, so that it retrieves the figures you actually want.
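The article does not reproduce the authors' exact prompt sequence, but the narrowing strategy they describe can be sketched roughly as follows. The wording of each prompt below is my own hypothetical illustration, not the authors' text:

```python
# Illustrative only: a hypothetical three-stage prompt sequence of the kind
# the authors describe -- start broad, narrow to a dataset, then ask for
# the specific data points within that dataset.
staged_prompts = [
    # Stage 1: establish the broader perspective (the official source).
    "What databases does the IMF publish as part of the World Economic Outlook?",
    # Stage 2: focus in on a specific dataset within that source.
    "In the latest World Economic Outlook database, which series reports "
    "annual real GDP growth by country?",
    # Stage 3: retrieve the specific data from that dataset.
    "Using only that series, list annual real GDP growth for each G7 country, "
    "2018 to 2025, and cite the database vintage you drew it from.",
]

# Each stage's prompt would be sent to the AI tool in turn, within one
# conversation, so later prompts build on the earlier answers.
for step, prompt in enumerate(staged_prompts, start=1):
    print(f"Step {step}: {prompt}")
```

The point of the staging is that by the final prompt, the tool has already been anchored to one named source and one named series, which leaves it less room to improvise plausible-looking numbers.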
For the longer term, they also dream of building "a true Global Trusted Data Commons—a comprehensive, AI-ready index of official statistics data …" Unsurprisingly to those who know me, I love this idea. Like it or not, lots of people are going to seek an understanding of economic statistics through AI tools. Creating an environment in which these tools will actually work is a public good I can support.
