A Warning Shot for Human Capital: Evidence of an AI Learning Penalty



Are we missing the big story on what AI means for human capital? I raised this question last year in my closing remarks at our AI and human capital conference. Alarming new evidence suggests we may be. A recent study from China finds that when students began using generative AI on their own, homework performance improved while exam performance plummeted.

Many people in the World Bank and elsewhere are carefully designing and evaluating AI tools to boost education and health services. My team is funding a series of research pilots by Bank teams around the globe, and I’m part of a working group that has put together a Generative AI Evaluation Playbook.

But while we have been carefully crafting our interventions, generative AI has been spreading rapidly through society, with effects that remain largely unmeasured. Anyone with a smartphone is already using AI via search engines, and smartphone ownership is nearly ubiquitous in most countries—even for teenagers. The new study suggests that the effects of this unbridled AI use could swamp the gains from our controlled applications. Daron Acemoglu pointed me to this study during a recent visit to the World Bank and cited it in a New York Times roundtable.

The study follows students in grades 7–12 in one Chinese county over a two-and-a-half-year period beginning in 2022. It compares changes in outcomes for students who started to use AI tools on their own with those who did not, exploiting variation in the timing of adoption to identify a causal effect.
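To make the comparison concrete, here is a minimal sketch of the simplest building block of this kind of design: a two-by-two difference-in-differences. It compares how adopters' average scores changed with how non-adopters' scores changed over the same period. The scores below are hypothetical, and the paper's actual estimator, which exploits staggered adoption timing, may well be more elaborate than this. The estimate is only causal under the assumption that, without AI, adopters' scores would have moved in parallel with non-adopters' scores.

```python
from statistics import mean


def diff_in_diff(adopt_pre, adopt_post, control_pre, control_post):
    """Two-by-two difference-in-differences estimate.

    Returns the change in the adopters' mean score minus the change in the
    non-adopters' mean score over the same period. The non-adopters' change
    stands in for what would have happened to adopters without AI
    (the parallel-trends assumption).
    """
    adopter_change = mean(adopt_post) - mean(adopt_pre)
    control_change = mean(control_post) - mean(control_pre)
    return adopter_change - control_change


# Hypothetical exam scores, for illustration only (not data from the study).
adopters_before = [80, 82, 78]
adopters_after = [66, 70, 62]
non_adopters_before = [79, 81, 80]
non_adopters_after = [78, 80, 79]

estimate = diff_in_diff(adopters_before, adopters_after,
                        non_adopters_before, non_adopters_after)
print(estimate)  # -13: adopters fell 14 points, non-adopters fell 1 point
```

Subtracting the non-adopters' change nets out anything that affected all students at once, such as a harder exam, but not shocks that hit only the students who chose to adopt AI.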

The authors find three striking effects of student AI use:

Time spent on homework fell by 30 percent
Homework scores rose by 18 percent
Monthly exam results plummeted by 20 percent after five months, and scores on two separate high-stakes entrance exams fell by 18 and 24 percent, respectively.

The figures at the end of this blog illustrate these results. These findings are for average scores across all subjects, with the largest effects for social science, followed by STEM and then languages.

These findings suggest that students used AI as a crutch for homework, which boosted their homework scores but left them learning far less.

The study appears to confirm fears that students using AI are bypassing the work essential to learning. We know learning, like physical training, requires effort. If you want to build muscle, you don’t bring a forklift to the gym. And freely available AI tools put figurative homework forklifts in the hands of anyone with an internet connection.

One way to gauge the effect’s magnitude is to compare it to the overall variation in student performance. By that yardstick, the fall in monthly exam scores after five months is 1.4 standard deviations (SD). This is an extraordinarily large effect by the standards of education research. As a point of comparison, a leading study of AI-based tutoring in Nigeria found a learning boost of 0.31 SD—less than ¼ of the negative effect estimated from broad AI use in the China study.
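The comparison works by expressing each effect in standard-deviation units: a score change divided by the standard deviation of student scores. Below is a small sketch of that arithmetic. The two effect sizes are the ones quoted above. The helper's name and the 10-point/20-point example are hypothetical illustrations, not figures from either study.

```python
def standardized_effect(mean_change: float, sd_of_scores: float) -> float:
    """Express a raw change in average scores in standard-deviation (SD) units."""
    return mean_change / sd_of_scores


# Hypothetical example: a 10-point drop on a test whose scores have an SD of 20 points.
example = standardized_effect(-10, 20)  # -0.5 SD

# Effect sizes quoted in the text, in SD units.
china_ai_use_effect_sd = -1.4       # fall in monthly exam scores after five months
nigeria_tutoring_effect_sd = 0.31   # learning gain from AI-based tutoring

# Size of the tutoring gain relative to the AI-use loss.
ratio = nigeria_tutoring_effect_sd / abs(china_ai_use_effect_sd)
print(f"{ratio:.2f}")  # 0.22, i.e. less than one-quarter
```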

One potential challenge to the causal interpretation is that other shocks may have both pushed students to use AI and reduced their academic performance. For example, in some parts of China, many children don’t live with their parents, who have migrated for work. As the recent World Bank human capital report highlights, those children see large drops in academic achievement. Could some students in this study have started to use AI in response to their parents migrating? If so, AI could appear responsible for learning losses that were caused by parental absence.

The authors point out that such shocks could not possibly be the entire story behind their findings, since students who do not use AI almost never experience a 20 percent decline in test scores. Nonetheless, they cannot rule out the possibility that shocks played some (likely very minor) role. As a result, the true effect of AI could be somewhat smaller than reported in the paper. I expect that as the paper goes through peer review, the authors will further consider this issue.

This careful and rigorous study is likely to serve as a wake-up call for parents and educators about the potential risks of AI for learning. Many more studies like this one are needed in other contexts to understand to what extent this is a generalized phenomenon.

The study’s implications extend beyond education. As AI becomes integrated into daily life, people increasingly use it for health information, skills development, career advice, and emotional support—often outside the structured applications that researchers can most readily evaluate (because technological revolutions cannot be randomized). The largest human capital effects of AI may come not from the tools we deliberately deploy, but from the ways people choose to use the technology on their own. Understanding this broader story of AI “in the wild” should be a central human capital research priority if we are to realize AI’s promise while protecting people from its risks.