Thinking of schools as systems that operate beyond the hopes and intentions of their leaders can help explain why outcomes and intentions often misalign. If we assume school staff are well-intentioned, hardworking, and competent, it can be mystifying when exam results decline, teacher morale dips, or student behaviour worsens. Why do bad things happen in systems led by good people?
We know from life that good intentions can misfire. Someone tidies your messy desk and you lose a vital paper. A friend organises an event you dread. These are trivial, but the same pattern plays out in history: the Versailles Treaty, Prohibition, the Partition of India, the Marshall Plan, the War on Drugs, and No Child Left Behind were all efforts led by well-meaning people, and all had disastrous unintended consequences. The real question might be: why doesn’t this happen more often?
A common response is to assign blame. If we could just remove the incompetent or malicious, all would be well. This is the logic behind school inspection: if a school is underperforming, its leaders must be the problem. Expose them, replace them, and improvement will follow. But this is not only wrong; it’s foolish.
There may be a few schools led by tyrants or incompetents, but most schools, regardless of outcomes, are led by people doing their best. Leading a struggling school is hard. Success and failure are often influenced by luck. Punishing the unlucky for failing to control every variable deters others from taking on such roles.
Good intentions aren’t enough (we all know what the road to hell is paved with), but they do matter. If we assumed that well-intentioned leaders could be supported, mentored, and developed into effective ones, we might create a system with higher morale, less burnout, and better outcomes, especially for the most disadvantaged students.
To do this, we must move away from blame and towards responsibility. Schools are complex networks, not only of people, some of whom may unintentionally act against their own best interests, but also of policies, specifications, software, and other interrelated elements. This complexity means parts will always misalign. Schools aren’t tidy jigsaw puzzles; they’re often cobbled together with string. Sometimes literally.
Leaders must juggle all of this while also responding to external pressures: government mandates, media scrutiny, societal crises, emerging technologies. It’s inevitable that schools will sometimes struggle. While unintended outcomes aren’t always a leader’s fault, they are still their responsibility.
All of which is a long preamble to Stafford Beer’s famous maxim: “The purpose of a system is what it does.” POSIWID, for short. Beer argued that systems should be evaluated on their observable results. A system’s true purpose is revealed by its consistent outputs, regardless of what its designers claim.
No one intends for a school to produce poor outcomes. But if it consistently does, then intention is functionally irrelevant. Focusing on what the system does, rather than what it should do, helps us address discrepancies between aims and outcomes.
Applying POSIWID to Ofsted exposes the discrepancies between what the inspection process is meant to do and what it actually does.
What are Ofsted’s stated aims?
“To improve lives by raising standards in education and children’s social care”
To conduct inspections that help services improve
To apply proportionate regulation focused on risks
To inform policy and practice through inspection insights
To adapt to ensure relevant and effective oversight
To be transparent and engage stakeholders
To what extent are these aims fulfilled in practice?
What does Ofsted do?
Publishes a rubric for judging school performance
Conducts two-day inspections assessing leadership, behaviour, curriculum, and safeguarding
Publishes inspection reports with brief recommendations on how to improve on future inspections
Releases subject reviews and clarifies inspection practices [1]
Shapes school priorities - schools align teaching and assessment with what they believe Ofsted wants
Encourages performative compliance - schools prepare evidence to impress inspectors
Promotes high-stakes accountability - risk aversion, exam focus, and data obsession follow
Increases workload and stress - inspection anticipation spikes pressure, sometimes with tragic outcomes
Constrains curriculum innovation - schools avoid novel approaches that might not align with Ofsted frameworks
Reinforces system hierarchies - inspections influence parental choice and school reputation
Shapes public perception - inspection grades serve as shorthand, often lacking nuance
Fuels a consultancy industry advising schools on how to pass inspections. Though diminished, it still exists. [2]
Each of these can reasonably be argued to be consequences of Ofsted’s activities.
What does Ofsted not do?
Improve student outcomes directly - this is the work of school staff
Close performance gaps - despite intentions, these remain stubborn or widen
Make schools happier, safer places - while safeguarding is important, cliff-edge judgments likely don’t help
These repeated outcomes reveal Ofsted’s true functional purpose. Whether intended or not, inspection arguably distorts the system more than it improves it.
Ofsted presents itself as the gold standard for school evaluation. But through the lens of construct validity, problems emerge. Construct validity asks whether a test genuinely measures what it claims to. Inspectors believe they measure “quality of education”, a vague and complex construct; instead, they assess proxies: lesson snapshots, curriculum maps, and brief conversations with leaders, teachers and students. But can we meaningfully assess the quality of education a school provides in two days?
When inspections are supposed to reveal something as complex and long-term as the effectiveness of a school, we should ask: do Ofsted ratings correlate with other meaningful indicators? The answer is, at best, inconsistently. While high Ofsted grades sometimes align with good academic outcomes, the relationship is far from perfect. Blunt, high-stakes judgments are antithetical to consistent and valid measures of actual learning. Ofsted too often relies on unreliable proxies like short-term outcomes or performance during inspections, which tell us little about what pupils have genuinely learned or retained. This exposes the emperor’s new clothes: what passes for accountability is often little more than data theatre. If we’re serious about improving education, we need to stop mistaking performance for learning and start measuring what really matters.
Instead, too many inspections end up evaluating whether schools are good at being inspected, which is not the same thing as being good at educating. Worse still, inspection outcomes skew against disadvantaged schools: they are less likely to receive ‘Good’ or ‘Outstanding’ ratings, even after adjusting for prior attainment. If judgments track pupil intake more closely than teaching quality, we’re evaluating demographics, not education. This is a failure of discriminant validity: inspections measure the wrong things. A school might be good at being inspected but poor at educating. That’s a dangerous confusion.
To compound this, Ofsted’s impact is hardly neutral. High-stakes inspection distorts practice. Jane Perryman paints a bleak but familiar picture: under the watchful eye of inspectors, schools stop being places of learning and become stages for performance. The inspection regime, she argues, doesn’t just evaluate schools, it reshapes them. Teachers begin to teach not for understanding, but for optics. Leaders write policies for show, not for substance. In essence, schools are being judged on how well they simulate the illusion of effectiveness. It’s not about what works, it’s about what looks like it works. Perryman’s work reveals the corrosive impact of high-stakes accountability: it warps priorities, distorts identities, and rewards surface over depth.
None of this is to say that accountability is unimportant. But if we are to hold schools accountable, we need to be clear about what we’re holding them accountable for, and how we’re measuring it. If our tools can’t validly and reliably do that, then they’re not just ineffective, they’re dangerous. A poorly calibrated system of accountability risks incentivising the wrong behaviours, punishing the wrong schools, and rewarding performativity over substance.
In truth, Ofsted does achieve some of its aims: public visibility, pressure to improve, and a semblance of consistency. But if we’re serious about the quality of education rather than the appearance of it, then we need an accountability system with far stronger construct validity. Otherwise, we risk mistaking noise for signal and performance for learning. For all its good intentions, Ofsted has become something of a pantomime.
We must ask what a better system might look like.
A cybernetic alternative
One promising alternative lies not in reforming the current model, but in replacing its very operating logic. A cybernetic approach to inspection could help us move away from the language of accountability and towards the language of feedback and adaptation, with the goal being stability and improvement over time, achieved not through one-off interventions but through constant calibration.
Contrast that with the current model of school inspection: infrequent, high-stakes, and heavily dependent on surface-level proxies. In cybernetic terms, it is an open-loop system: there is no feedback path, so it has no capacity to learn. It provides judgment, not feedback; pressure, not precision. Unsurprisingly, it distorts rather than develops.
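To make the contrast concrete, here is a minimal sketch in Python. The numbers, the function names, and the 0.3 correction factor are invented purely for illustration, not a model of real schools; the point is simply that a one-off verdict changes nothing, while a feedback loop repeatedly measures the gap between goal and outcome and adjusts.

```python
# Minimal sketch of open-loop vs closed-loop control.
# All names and figures are illustrative only.

def open_loop(initial_quality: float) -> float:
    """One-off judgment: inspect once, label, walk away."""
    label = "good" if initial_quality >= 0.5 else "inadequate"
    print(f"Verdict: {label}")   # judgment is delivered...
    return initial_quality       # ...but nothing feeds back in

def closed_loop(initial_quality: float, goal: float, cycles: int) -> float:
    """Feedback loop: measure, compare with the goal, adjust, repeat."""
    quality = initial_quality
    for cycle in range(cycles):
        error = goal - quality   # gap between aim and outcome
        quality += 0.3 * error   # small corrective step each cycle
        print(f"Cycle {cycle}: quality={quality:.2f}, gap={error:.2f}")
    return quality

open_loop(0.4)                        # verdict delivered; quality unchanged
closed_loop(0.4, goal=0.9, cycles=5)  # quality converges towards the goal
```

In the open-loop version the verdict is the end of the story; in the closed-loop version each cycle of measurement and adjustment narrows the gap, which is what “constant calibration” means in practice.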
As we’ve seen, one of the foundational problems with the current framework is its failure to clearly define and validly measure the construct it claims to assess: ‘quality of education.’ Construct validity demands that we evaluate only what we have taught and only in ways that align with the underlying construct we wish to measure.
But what is quality? All too often, it’s operationalised through proxies: lesson snapshots, curriculum documents, books arranged in tidy piles. These indicators may be easy to observe, but they are rarely good measures of what pupils actually know, remember, and understand over time. In practice, inspection becomes a kind of educational theatre, performed by staff who have learned to play the game.
A cybernetic model would begin with better-defined constructs. It would privilege long-term retention over short-term compliance and would seek multiple sources of evidence over time: curriculum samples, student interviews, teacher reflections, and pupil work taken longitudinally. A functioning inspection system would acknowledge context without falling into relativism. It would ask: How well is this school serving its particular community? It would offer comparisons not across the entire system, but among similar schools. In doing so, it could help us understand what works in context, not just in theory.
Perhaps most importantly, a cybernetic system would reconceive the role of the inspector from auditor to professional interlocutor. Inspectors should be knowledgeable, credible, and current practitioners who engage schools in meaningful conversations about their curriculum, pedagogy, and pupil learning.
Rather than delivering summative judgments, inspections would be opportunities for formative insight. This is not to say accountability should disappear but that it should be made intelligent. Inspectors and schools would learn together, identifying strengths and weaknesses through evidence and shared expertise.
In cybernetic terms, the current system exhibits single-loop learning at best, tweaking the framework without questioning its assumptions. What we need is double-loop learning: the capacity for the system to reflect on its own logic, adjust its definitions of quality, and evolve in response to its failures.
[A] thermostat that automatically turns on the heat whenever the temperature in a room drops below 69°F is a good example of single-loop learning. A thermostat that could ask, "why am I set to 69°F?" and then explore whether or not some other temperature might more economically achieve the goal of heating the room would be engaged in double-loop learning.
Chris Argyris, ‘Teaching Smart People How to Learn’
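Argyris’s thermostat translates almost directly into code. Below is a toy sketch in Python: the 69°F set point comes from the quote, while the comfort floor and cost figure are invented for illustration. The single-loop device only corrects deviations from its set point; the double-loop device can also revise the set point itself.

```python
# Toy version of Argyris's thermostat. The comfort floor and cost
# figures are invented purely to show the two kinds of loop.

class SingleLoopThermostat:
    """Single-loop learning: correct deviations from a fixed set point."""
    def __init__(self, set_point: float = 69.0):
        self.set_point = set_point

    def step(self, room_temp: float) -> str:
        return "heat on" if room_temp < self.set_point else "heat off"

class DoubleLoopThermostat(SingleLoopThermostat):
    """Double-loop learning: also question whether the set point is right."""
    def review_goal(self, comfort_floor: float, cost_per_degree: float):
        # Second loop: is 69°F the right goal, or would a lower set point
        # keep the room comfortable at lower cost?
        if self.set_point > comfort_floor and cost_per_degree > 0:
            self.set_point = comfort_floor  # revise the goal itself

stat = DoubleLoopThermostat()
print(stat.step(66.0))  # "heat on": the first loop corrects the deviation
stat.review_goal(comfort_floor=67.0, cost_per_degree=0.12)
print(stat.set_point)   # 67.0: the second loop has revised the goal itself
```

Applied to inspection, the first loop tweaks the framework; the second questions whether the framework is measuring quality at all.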
Imagine an inspection process that built in structured reflection on how future inspections, both of a single school and across the system, should be conducted. This opens up the possibility of a recursive system in which both individual inspectors and Ofsted as a whole would constantly improve. Schools’ feedback on what went well and what felt wrong would be scrutinised, sifted for gold, and then fed back into policy and training.
For this to happen, the system itself must be accountable. And, if Ofsted is to be truly accountable, it must be willing to look in the mirror and ask whether it is doing more harm than good. Accountability shouldn’t be a cudgel wielded from on high, but a reciprocal arrangement rooted in mutual trust, shared expertise and transparency. That means opening up its processes to scrutiny, being honest about what it can and cannot measure, and having the humility to admit when it gets things wrong. It means acknowledging that schools are complex, messy systems—full of good people trying their best—and that real improvement comes not from fear, but from feedback. But, if inspections continue to result in gaming, performativity, and demoralisation, then the problem is not with the schools, it’s with the design of the inspection process.
It is tempting to think we can fix Ofsted by adjusting the criteria, smoothing the tone, or softening the consequences. But the problem is not just in the framework, it is in the logic of the model itself. A cybernetic approach offers an alternative. One rooted in feedback rather than finality. In learning rather than labelling. In substance rather than show. It wouldn’t be perfect but it might be better aligned with how schools, teachers, and pupils actually grow.
If an inspection system encourages looking good over being good, then its incentives are perverse. It’s time to stop playing the game and start designing a system better suited to tackling the problems endemic to improving education.
In Part 1 of this series looking at cybernetics and systems thinking, I discussed how recognising predictable problems flagged by systems theory might help school leaders avoid avoidable traps. In Part 2, I examined the structure of systems theory and how understanding these features could help prevent drift from our stated aims.
[1] While recent subject reviews have been generally very good, this has not always been the case. Regardless of my biases, this activity serves to elevate Ofsted as a ‘go to’ source for ‘best practice’, which can blur the line between inspection and prescription. It also creates a ‘preferred canon’ of educational thinking and research.
[2] To be completely fair, Sir Michael Wilshaw (HMCI 2012–2016) explicitly dismantled many of the systems causing this outcome and, although it still happens, there is far less of it as a direct result. That said, there continues to be a ‘gotcha’ culture that trains schools to game inspections.