David Didau: The Learning Spy

Mock Exams, regression to the mean, and the mirage of progress

How schools mistake randomness for rigour, and noise for learning

David Didau
Jun 21, 2025

There’s something beguiling about mock exams. They have the shape and heft of the real thing. They produce numbers. They seem to offer clarity. In a profession awash with uncertainty, mocks feel like firm ground: a data point we can clutch as proof of learning, impact and progress.

But this clarity is almost always an illusion.

Time after time, schools administer mock exams and interpret the results as meaningful indicators of what will happen in the future. A student scores poorly in November, then significantly better in February. We say, “The intervention worked.” Another performs well early on, then dips. “They’ve gone off the boil.” Both interpretations feel plausible, even confident. But both may be wrong.

As Michael Mauboussin warns in The Success Equation,1 we confuse skill with luck. We see outcomes - mock data, in this case - and instinctively assume they reflect ability, effort, or quality of teaching. But every performance is a mixture of skill and randomness. When luck plays a substantial role - as it invariably does in any single assessment - then extreme outcomes (very high or very low scores) are likely to move closer to average next time, even if nothing has changed.

But why does luck matter in the first place? Because no exam measures pure ability. Every assessment is a snapshot taken under specific conditions, and those conditions are never perfectly controlled. A student might be tired, distracted, ill, or anxious on the day. They might get a topic they revised thoroughly, or one they barely touched. The questions might play to their strengths or expose their blind spots. They might misread a command word, run out of time, or benefit from an educated guess that pays off. All of these variables - none of which have anything to do with underlying skill - contribute to the final mark.
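
To make the skill-and-luck mixture concrete, here is a minimal sketch in Python. Everything in it is hypothetical - the cohort, the numbers and the function names are invented for illustration, not drawn from real students: each student gets a stable “true” skill, and each mock score is that skill plus a random nudge from the sort of day-of-the-exam factors listed above.

```python
import random

random.seed(1)

# Hypothetical cohort: each student has a stable underlying skill.
# All numbers are invented purely to illustrate the skill-plus-luck idea.
students = [{"skill": random.gauss(5, 1.2)} for _ in range(1000)]

def mock_score(skill, luck_sd=1.5):
    """One mock result: underlying skill plus the random noise of the day."""
    return skill + random.gauss(0, luck_sd)

for s in students:
    s["mock1"] = mock_score(s["skill"])
    s["mock2"] = mock_score(s["skill"])  # skill unchanged; only the luck differs
```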


This is what Mauboussin calls the “luck component” of performance. It’s not magic or fate; it’s the accumulation of unpredictable, uncontrollable factors that influence outcomes despite our best efforts. The more noise there is in the system, the less any single result can be taken as a true measure of skill. And since assessments in schools tend to be short, infrequent, and high-stakes, they are especially vulnerable to this kind of distortion.

A student’s score on one mock tells you less about their true ability than you’d like to believe. It might reflect what they know. It might also reflect how they slept, what they had for breakfast, or whether they guessed correctly on Question 4. The danger arises when we treat these scores as definitive - as clear-cut indicators of understanding - when they are, in fact, rough approximations filtered through a fog of chance.

And yet we keep making bold inferences. We say, “He’s made progress,” “She’s slipping,” or “This set needs reteaching.” We act as though every change in data must have a cause. But in doing so, we forget that sometimes, the cause isn’t teaching or effort or motivation. Sometimes, it’s just luck. We continue to build narratives around the data in the belief that any improvement must be attributed to interventions and any decline to disengagement. We congratulate or chastise as if every result were a direct consequence of what happened in the classroom. In reality, we are often mistaking noise for signal. We’re drawing confident conclusions from what Mauboussin would call unstable outcomes: results shaped as much by context, timing, and chance as by anything instructional.

This is regression to the mean, a statistical inevitability we routinely ignore in education.


Want to do something more meaningful with assessments? Read this post ↓

Using assessment to improve the curriculum

The return to average: regression to the mean in action

Imagine two students: Callum and Amina.

Callum panics in his first English mock. He misreads the extract, leaves a question blank, and scores a 2. By the second mock, he’s more comfortable, remembers to plan, and scores a 5. Teachers are thrilled. “The tutoring paid off. He’s made three grades of progress!”

But consider this: some portion of Callum’s first score was bad luck: nerves, unfamiliarity, a topic he hadn’t revised. Some of his second score may just be the natural upward swing from a low baseline. Regression to the mean tells us that extreme results - either unusually good or unusually bad - tend not to repeat. This isn’t because anything has necessarily improved or declined, but because luck fluctuates, while skill tends to stay relatively stable. Over time, outcomes regress toward the true level of ability, especially when luck plays a larger role in performance.
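
Carrying on the hypothetical sketch above (still nothing but invented numbers), Callum’s pattern falls out of the simulation with no tutoring and no change in anyone’s skill: pick out the students with the worst first-mock results and, on average, their second-mock scores rise, simply because their bad luck doesn’t repeat.

```python
from statistics import mean

# Continuing the earlier sketch: no intervention, no change in anyone's skill.
worst = [s for s in students if s["mock1"] < 3]   # the "Callums"
best = [s for s in students if s["mock1"] > 7]    # the "Aminas"

print(mean([s["mock1"] for s in worst]), mean([s["mock2"] for s in worst]))
# The lowest first scores drift upward on the retake...

print(mean([s["mock1"] for s in best]), mean([s["mock2"] for s in best]))
# ...and the highest first scores drift downward, without anything "working".
```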

Amina, by contrast, gets an 8 in her first mock. She’s fluent, focused, and happens to get a question on a poem she knows well. In her second mock, the question’s more abstract, and she’s battling a cold. She scores a 6. “She’s coasting,” someone says. “She’s lost her edge.”

Or is this simply the other side of the same coin: an expected dip following an exceptionally good run? Just as Callum’s second score was buoyed by a swing of better fortune, Amina’s may have been pulled down by a dose of misfortune. The real shift isn’t necessarily in effort or understanding, but in how much chance is affecting the result.

When an outcome is shaped by both skill and luck, the greater the role of luck, the more quickly we should expect results to revert to the mean. In education, assessments are often brief and high-stakes, exactly the kind of environment in which luck looms large. One dodgy night’s sleep, a misread question, a surprise topic: any of these can swing a grade without reflecting any real change in what a student knows or can do.
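
One way to pin down “the greater the role of luck, the quicker the reversion” is the standard regression-to-the-mean rule of thumb: the best guess for the next score is the cohort average plus only the skill-driven fraction of the current gap from that average. The fraction, the cohort mean and the grades below are all assumed figures, chosen only to show the shape of the calculation.

```python
def expected_next_score(score, cohort_mean, skill_share):
    """Best guess for a retake: the mean, plus only the skill-driven share of
    the current distance from the mean. The lucky (or unlucky) share of that
    distance is not expected to repeat."""
    return cohort_mean + skill_share * (score - cohort_mean)

# Hypothetical figures: a cohort averaging grade 5 and a student currently on grade 8.
for skill_share in (0.9, 0.7, 0.5):  # more luck in the assessment -> smaller skill share
    print(skill_share, expected_next_score(8, 5, skill_share))
# 0.9 -> 7.7, 0.7 -> 7.1, 0.5 -> 6.5: the noisier the assessment,
# the further we should expect the retake to fall back toward the average.
```

The same arithmetic runs in reverse for a student below the mean, which is why Callum’s 2 was always the score most likely to climb.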

Data factory or data theatre?

We love mock exams because they churn out data. This is useful if you believe that numbers measure learning. And in many accountability systems, that belief has become orthodoxy. Heads of department are asked to show “impact”, teachers must “track progress”, and students are labelled red, amber, or green depending on where their mock grade sits in relation to some projected endpoint. Everyone acts like these numbers are stable, precise, and meaningful.
