What the biggest education experiment ever really tells us
Inside Project Follow Through: why methods matter, why implementation fails, and why Direct Instruction still divides the profession
In education, boldness is rarely matched by rigour, but occasionally the attempt is made. Back in the heady days of Lyndon Johnson’s War on Poverty, American policymakers launched what remains the largest, most ambitious educational experiment ever attempted: Project Follow Through. Its goal was deceptively simple: find out how best to teach disadvantaged children. For once, ideology was (at least officially) set aside in favour of data. Competing models of early childhood education would be implemented across dozens of sites, with uniform assessments and direct comparisons. The result? Well, it depends who you ask.
Follow Through enrolled over 20,000 children in the 1970s, with each participating school adopting one of several sponsored models over a four-year period. These models ranged from highly structured, skills-based methods (notably Direct Instruction) to more progressive, affective or “whole child” approaches.
Evaluation was handled by Abt Associates, who oversaw a national data collection effort involving standardised achievement tests (reading, arithmetic, spelling, vocabulary), cognitive assessments, and affective measures like self-concept and locus of control. A battery of well-intentioned psychometrics if ever there was one.
But of course, the cracks were there from the start:
Sites weren’t randomly assigned.
Implementation fidelity was unknown.
Politics shaped site selection.
Local control and community preferences complicated standardisation.
Still, for all these flaws, Follow Through produced an unprecedented body of evidence.
Abt’s analysis, led by Richard B. Anderson, produced cautious but striking results:
Most Follow Through sites failed to bring students to national norms. Only 19% of cohorts met national averages on even one academic measure.
The majority of outcomes were null — about 68% of the time, Follow Through students performed neither better nor worse than their non-participating peers.
In other words: for every occasion on which a Follow Through intervention raised achievement, there were at least five on which it made no difference, and negative outcomes occurred nearly twice as often as positive ones.
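To see how those ratios follow from the headline figures, here is a minimal back-of-envelope sketch in Python. The 68% null share is the reported figure; the assumption that negative outcomes were roughly twice as common as positive ones is inferred from the pattern described above, not a number quoted directly from the evaluation.

```python
# Back-of-envelope check on the Follow Through outcome split.
null_share = 0.68                      # reported: no significant difference

# Assumption (inferred, not quoted): negative outcomes occurred roughly
# twice as often as positive ones, so the remaining 32% splits 1:2.
positive_share = (1 - null_share) / 3  # ~11%
negative_share = 2 * positive_share    # ~21%

print(f"positive: {positive_share:.0%}, negative: {negative_share:.0%}")
print(f"null outcomes per positive one: {null_share / positive_share:.1f}")
```

On those assumptions, null results outnumber positive ones by more than six to one, which is consistent with the “at least five times” claim above.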
The clearest comparative data comes from Figure 5 of the Abt evaluation, which tracks outcomes across three domains: basic academic skills, cognitive-conceptual skills, and affective measures like self-concept. The picture that emerges is remarkably consistent.
On basic academic skills - reading, spelling, arithmetic, vocabulary - Direct Instruction stands alone. Its average effect size (~0.6 SD) was substantially higher than that of any other model, with most implementation sites - although, crucially, not all - showing clear positive gains. No other model approached this level of consistency. Some, like Behavior Analysis, demonstrated modest positive effects, but the majority clustered around zero or dipped into negative territory. Several site implementations of the alternative models performed markedly worse than no intervention at all. In short: Direct Instruction reliably delivered academic gains where others faltered.
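To put that ~0.6 SD figure in concrete terms, here is a quick sketch that converts the effect size into percentile movement, assuming (simplistically) that test scores are normally distributed:

```python
from statistics import NormalDist

# Translate a 0.6 SD effect size into percentile movement, assuming
# normally distributed test scores (a simplification).
effect_size = 0.6
percentile = NormalDist().cdf(effect_size) * 100
print(f"An average student moves from percentile 50 "
      f"to roughly percentile {percentile:.0f}.")  # ~73
```

An effect of that size shifts a typical student from the middle of the distribution to roughly the 73rd percentile, a large gain by the standards of education research.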
In the cognitive-conceptual domain - more complex problem-solving and higher-order reasoning - no model emerged as dominant. The effects here were weaker across the board, with most models producing highly variable outcomes. Even Direct Instruction, while slightly positive, saw diminishing returns as the focus shifted from basic skill acquisition to more abstract reasoning. Tellingly, the programs specifically designed to improve cognitive-conceptual skills performed worst of all. The very interventions most committed to fostering independent thinking and reasoning often delivered the least in measurable gains.
The affective results confounded common assumptions. Direct Instruction, often criticised for being rigid and uninspiring, actually produced some of the most positive outcomes on self-concept and achievement responsibility. By contrast, models designed specifically to enhance self-esteem tended to deliver flat or even slightly negative results. This suggests that academic success, rather than being driven by self-esteem, is more likely to cultivate it: the better your academic performance, the better you feel about your ability to perform academically.
Crucially, across all three domains, the variation within models - between different implementation sites - was often greater than the variation between models themselves. The same program could succeed spectacularly in one school and fail miserably in another. Implementation fidelity, teacher quality, training, and local conditions played a decisive role in determining whether any given model worked at a particular site.
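A toy simulation makes the point concrete. The numbers below are entirely invented (three hypothetical models, ten sites each) and simply show how site-to-site scatter within a model can dwarf the differences between model averages:

```python
import random

# Entirely synthetic illustration of within-model vs between-model
# variation; none of these numbers come from the Follow Through data.
random.seed(1)

model_means = {"Model A": 0.5, "Model B": 0.1, "Model C": -0.1}
site_sd = 0.4  # hypothetical spread of effect sizes across sites

site_effects = {name: [random.gauss(mean, site_sd) for _ in range(10)]
                for name, mean in model_means.items()}

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

between = variance(list(model_means.values()))
within = sum(variance(v) for v in site_effects.values()) / len(site_effects)

print(f"between-model variance: {between:.2f}")        # ~0.06
print(f"average within-model variance: {within:.2f}")  # typically larger
```

With these invented figures, the scatter among sites running the same model exceeds the gap between the models themselves, which is precisely the pattern the evaluation observed.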
Taken together, these findings point to an uncomfortable but robust conclusion: models that focus directly on building academic competence deliver the most reliable benefits, including for pupils’ self-concept. But success is fragile; even the strongest models require careful implementation. Without it, they are just as vulnerable to failure as their less effective counterparts.
Direct Instruction
The clearest finding concerns Direct Instruction. As the graph below illustrates, this model produced consistently higher rates of positive outcomes in reading, spelling, arithmetic, and vocabulary than any of its competitors. Its emphasis on tightly sequenced, highly explicit teaching delivered reliable academic gains where other approaches struggled. In contrast to the developmentalist hope that self-esteem would drive achievement, the pattern observed was the reverse: academic success appeared to foster improvements in self-concept.
Yet even here, context mattered. The effectiveness of Direct Instruction was not uniform. Some sites saw strong gains; others, much weaker results. The same instructional package, in different hands and settings, could yield widely varying outcomes. Implementation matters almost as much as programme design, and perhaps more. However, where other programmes produced a small number of notable successes but otherwise performed poorly, the reverse was true of DI.
What the critics said
Naturally, not everyone agreed. The evaluation itself became its own mini-war of ideology.
House et al. (1978) argued that Abt’s methods were deeply flawed: sloppy statistics, poor model classification, weak measurement. They found no convincing evidence that any model, Direct Instruction included, outperformed others.
Bereiter and Kurland (1981) strongly disagreed. Their reanalysis, carefully stripping out site mismatch noise, reaffirmed substantial advantages for Direct Instruction in basic skills.
Becker & Gersten (1982) provided follow-up studies showing that Direct Instruction students maintained advantages into 5th and 6th grade, particularly in reading and problem-solving.
And standing somewhat apart, Cathy Watkins (1997) put her finger on what remains, to this day, the most uncomfortable conclusion: even when effective methods are identified, the educational establishment resists them. Institutional inertia, ideological commitments, teacher preparation models, and economic self-interest all combined to ensure that Direct Instruction, despite its empirical support, would remain a marginalised curiosity rather than mainstream policy.
The hard truth
What emerges from Follow Through is not a tidy “what works” answer, but something much more troubling: Implementation trumps intention.
No model - not even Direct Instruction - succeeded everywhere.
Where it was implemented well, Direct Instruction worked spectacularly. Where it wasn’t, it floundered like everything else.
The default state of large-scale educational reform is noise, drift, and regression to the mean.
Poverty is remarkably resistant to shallow interventions.
What Follow Through ultimately revealed was not so much which method “works” but how hard it is to deliver any method consistently at scale.
Why this still matters
We still hear the call to scale what works. But Follow Through, if it tells us anything, shows how naive that is. The problem has never been lack of promising methods. The problem is reliably translating method into practice, site after site, year after year, under varying local conditions.
In truth, education policy would do well to stop asking “what works?” and start asking:
Under what conditions?
With what training?
At what cost?
How will fidelity be maintained?
What is the actual causal pathway between intervention and long-term success?
The lesson from Follow Through is not that Direct Instruction is a magic bullet. It isn’t. Such a thing could never exist: the most perfectly designed instructional programme won’t work if it’s implemented half-heartedly by teachers who don’t believe in it. However, it is the best bullet we’ve yet manufactured. We simply lack the will, institutional structures, and professional culture to load it into the gun.¹
¹ Apologies for the egregious gun metaphor; I’m afraid I got a bit carried away.