Memory is messy - revision refines
Why last-minute preparation is likely to boost exam performance
Last-minute revision tends to divide opinion. Is it a sign of poor preparation or a useful final push? Does cramming undermine deep learning, or can it sharpen recall just in time? And what does the science of memory have to say about how and when knowledge becomes available under pressure? If students already know something, why does timing make such a difference to whether they can retrieve it? These questions matter, not just for how we advise pupils to revise, but for how we understand the role of memory in performance.1
We like to imagine that teaching deposits knowledge into neat compartments. Put the facts in, ask the questions later, get the answers out. But memory is not a filing cabinet. It’s not a video recorder and - importantly - it’s not a computer hard drive. It’s perhaps more like a web. Not the silky precision of a spider’s masterpiece, but something improvised and tangled, cross-wired by experience, skewed by competing points of salience, and indifferent to accuracy.
Psychologists have offered competing explanations for this. The older, fuzzier one is spreading activation. Think of memory as a network of nodes. When one node lights up, activation ripples across the system, waking up its neighbours. Say “nurse,” and suddenly “doctor,” “hospital,” and “clipboard” all start shimmering. Retrieval becomes more likely, but not always more accurate.2
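The “nurse lights up doctor” idea can be sketched in a few lines of code. This is a toy model only: the nodes, link weights, and decay value below are invented for illustration, not drawn from any published semantic network.

```python
# Toy spreading-activation model. Node names, weights, and the decay
# factor are invented for this sketch, not taken from real data.
NETWORK = {
    "nurse":    {"doctor": 0.8, "hospital": 0.7, "clipboard": 0.4},
    "doctor":   {"hospital": 0.6, "stethoscope": 0.5},
    "hospital": {"ward": 0.5},
}

def spread(source, strength=1.0, decay=0.5, depth=2):
    """Return the activation reaching each node within `depth` hops of `source`."""
    activation = {source: strength}
    frontier = {source: strength}
    for _ in range(depth):
        next_frontier = {}
        for node, act in frontier.items():
            for neighbour, weight in NETWORK.get(node, {}).items():
                passed = act * weight * decay  # activation weakens as it travels
                activation[neighbour] = activation.get(neighbour, 0.0) + passed
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + passed
        frontier = next_frontier
    return activation

print(spread("nurse"))
```

Saying “nurse” leaves close neighbours like “doctor” and “hospital” shimmering more strongly than distant ones like “ward”: exactly the pattern that makes retrieval more likely, but not necessarily more accurate.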
Then there’s Adaptive Control of Thought-Rational (ACT-R), John Anderson’s cognitive architecture, which tries to map the entire thinking process as a system of rational decisions. At the heart of ACT-R are two types of knowledge: declarative (facts, concepts, goals) and procedural (rules for what to do with them). The architecture models cognition as the interplay between activated memory and goal-driven rule selection. When a person tries to achieve something - solve a problem, recall a name, write an essay - the mind searches for a production rule whose conditions match the current state of memory and whose expected value outweighs its cost.3
ACT-R may be rule-based, but its rules are useless unless something’s already been activated. It assumes some chunks of memory are more readily available than others. What determines that availability is largely contextual and historical: how often the chunk has been used, how recently it was rehearsed, and how strongly it’s linked to the current goal or perceptual context. In other words, ACT-R, despite its structured facade, is parasitic on the logic of spreading activation. It inherits its biases and vulnerabilities. The system selects from what’s most readily available.
Consider how this plays out in classrooms. Ask a student what a metaphor is and they reply, “It’s when something’s exaggerated.” That’s not random error. Somewhere in their web of concepts, metaphor and hyperbole sit side by side, probably reinforced by some vague explanation of figurative language. The node for ‘metaphor’ fired and activation spread toward the most accessible neighbour. Spreading activation predicts the faulty connection. ACT-R explains how it became the basis for action.
Or take the thousands of students who, without ever being explicitly taught to do so, begin every essay with “In this essay I will…” This isn’t laziness. It’s a procedural habit, formed through repetition. Although it’s discouraged by teachers, it’s been activated in similar contexts so reliably that it fires by default and never gets overwritten by something stronger. Replacing it requires more than teaching a better opening. Rewiring students’ instinctive essay opener requires increasing the activation strength of the new rule, weakening the old one, and ensuring that the conditions for its execution are met in the right moment.
What does this look like? It looks like deliberate, distributed practice, not a one-off model paragraph or a vague instruction to ‘avoid clichés’, but regular opportunities to start analytical writing in context, with explicit prompts, immediate feedback, and reminders that reward fluency over formula. It means writing opening sentences to the same question repeatedly in multiple lessons, then judging which works best and why. It means low-stakes drills where they generate first lines from thesis statements, or spot and rework clunky openers before they become reflexes. It means reducing the cognitive cost of better choices so they can compete fairly.
Spaced practice makes sense here not simply because it increases retention, but because it boosts a chunk’s activation across time and context. Retrieval practice matters not just because it assesses knowledge, but because it strengthens the competition among nodes, making the desired ones more likely to win. Misconceptions don’t endure through ignorance, but because faulty connections in a student’s mental web are stronger, quicker to activate, and harder to dislodge. But these approaches assume there is both time and opportunity to make changes.
This network logic connects to other theories. Robert Bjork’s idea of desirable difficulties has gained traction because retrieval strengthens not just individual facts, but the structure around them. Schema theory, from Bartlett to Rumelhart, shows that new information is assimilated or distorted depending on the framework already in place. Daniel Kahneman’s dual-process (systems 1 and 2) model gives us the fast, automatic decisions that resemble ACT-R productions, while the slower, deliberative system sometimes intervenes, but often too late. Even connectionism, once pitched in opposition to rule-based models like ACT-R, shares the idea that associations are learned through experience and reinforcement.4 The theories may differ in elegance and emphasis, but the core insight is consistent: what gets retrieved depends on what’s connected.
Of course, both spreading activation and ACT-R have been criticised. Spreading activation lacks precision. It’s hard to measure, hard to constrain, and doesn’t offer clear predictions. It tells us why students remember, but not when they’ll do so or how to intervene. ACT-R, on the other hand, often overstates its rationality. The equations are elegant, but their empirical grounding is shaky. Whitehill has noted inconsistencies and gaps in the way ACT-R defines activation, log-odds, and decay. The theory also struggles with interruptions, overlapping goals, and non-linear tasks, things that feature in most classrooms.5
Despite these limitations, the combined model is useful. It helps explain why some concepts dominate pupils’ thinking even when they’re wrong. It shows why interleaving works, not just by spacing practice but by building stronger links between otherwise distant concepts. It also shows why feedback must do more than flag errors; it must reshape the mental architecture that controls what gets retrieved and what gets done.
The real value here is practical. If you’re designing a curriculum, don’t think of it as a linear sequence. Instead, think of it as network design. You’re not just deciding what to teach. You’re deciding what will compete for students’ attention. If they always retrieve “PEE” when asked to write analytically, that’s because we’ve made “PEE” the most activated node. If they reach for the quadratic formula when completing the square would be quicker, it’s not a failure of knowledge. It’s the system defaulting to the method most often practised.
Memory, then, is not neutral. It’s historical, contextual, and unavoidably selective. The aim of teaching should not just be to plant knowledge, but to engineer retrieval. That means shaping the structure of associations, training the system to fire the right rules, and practising them until they’re fast, cheap, and reliable.
Because when memory comes under pressure, it’s not what you know. It’s what fires first. That’s why last-minute revision can have such a powerful - if temporary - effect. It boosts the activation of specific chunks just before they’re needed, making them more likely to win the race for retrieval. The knowledge is already there, perhaps half-buried, but now it’s fresher, clearer, closer to the surface. Under exam conditions, that slight edge in activation strength can make the difference between a clear response and a panicked guess.
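The same base-level equation sketches why a last-minute review gives that edge - and why it’s temporary. The numbers below are invented: the same fact, studied a fortnight ago, with and without one extra rehearsal an hour before the exam.

```python
import math

# ACT-R's base-level equation, B = ln(sum of t_j ** -d), with the
# conventional decay d = 0.5. Rehearsal times (in hours) are invented.
def base_level(times_since_use, d=0.5):
    return math.log(sum(t ** -d for t in times_since_use))

# The same fact, studied three times roughly a fortnight ago...
without_cram = [300.0, 320.0, 340.0]
# ...versus that history plus one quick review an hour before the exam.
with_cram = without_cram + [1.0]

print(base_level(without_cram))
print(base_level(with_cram))
```

The single recent rehearsal dominates the sum, lifting activation well above the un-crammed baseline. But because t^-0.5 falls steeply, that advantage drains away within hours: cramming buys availability exactly when it’s needed, not durable learning.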
So if we know this, what are we going to do? Are we structuring revision to exploit this window of heightened activation, or are we still relying on vague advice to ‘just go over your notes’? Are we helping students rehearse the specific kinds of retrieval they’ll need under pressure, or leaving it to chance?
What if revision were treated not as recap, but as activation management? What if, instead of ploughing through everything one more time, we asked: what needs to fire first? What’s most likely to be confused? What’s most at risk of being forgotten?
The question isn’t just how well students know something. It’s whether it’s ready when it matters.
See here for my take on learning vs performance
The theory of spreading activation was first formalised by Collins and Quillian (1969) and later refined by Anderson (1983). It posits that concepts are stored as nodes in a semantic network, with activation spreading from one node to others via associative links. The strength and speed of activation depend on factors such as frequency, recency, and contextual relevance. See Collins, A. M., & Quillian, M. R. (1969). Retrieval time from semantic memory, and Anderson, J. R. (1983). The Architecture of Cognition.
ACT-R is a cognitive architecture developed by John R. Anderson to model human thought processes as a combination of declarative and procedural knowledge. It assumes cognition operates through production rules selected based on expected utility, with memory retrieval influenced by activation levels shaped by prior use and associative context. See Anderson, J. R. (1993). Rules of the Mind and Anderson, J. R. (1990). The Adaptive Character of Thought. (You have to request access for this one.)
Connectionist models, such as parallel distributed processing (PDP) frameworks developed by Rumelhart and McClelland in the 1980s, were initially positioned as alternatives to symbolic, rule-based models like ACT-R. These models emphasise distributed representations and learning via backpropagation rather than explicit rules. However, both traditions converge on the principle that cognition is shaped by patterns of association strengthened through experience. See Rumelhart, D. E., & McClelland, J. L. (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition
See Whitehill, J. (2013). Understanding ACT-R – an Outsider’s Perspective. Whitehill highlights a range of issues in the ACT-R literature, including imprecise mathematical definitions, inconsistent formulations of activation and decay, and ambiguity around key terms such as “log-odds.” He also points out that ACT-R’s single-goal-stack architecture lacks mechanisms for handling goal interruption or conflict, common features of real-world cognition, particularly in dynamic settings like classrooms.