This is the fourth post in a five-part series on Ofsted. You can read the previous post here and the next post here.
We have today published a new academic paper investigating how Ofsted inspection outcomes vary across inspectors with different characteristics. The research was supported by the Nuffield Foundation and uses data we have pulled together on approximately 30,000 school inspections conducted between September 2011 and August 2019.
You can read a full version of our academic working paper along with our responses to some FAQs about the research.
This fourth blog in the series provides an illustrative example of how inspection outcomes differ across two lead inspectors with very different characteristics.
Why are you looking at this?
In our previous blogs we have considered each characteristic of the lead inspector (or inspection team) in isolation.
Yet, in reality, lead inspectors differ on several characteristics at once. For instance, one school may be inspected by an inexperienced female HMI working as part of a team. In contrast, another may be inspected by an experienced male OI who is working alone.
How, then, does the combination of different characteristics of lead inspectors and the size of their team correlate with inspection outcomes?
Let’s take a look…
What do we find when looking at multiple characteristics jointly?
Table 1 provides the answer for primary schools, where we estimate the distribution of Overall Effectiveness judgements awarded by two different hypothetical inspectors. These estimates control – as far as possible – for background differences in the types of schools they are deployed to inspect. We focus on primary schools because the results presented in our previous blogs suggest this is where the biggest differences are.
As can be seen, a fairly substantial difference emerges.
Inspector A – a female HMI working with one other inspector – has a 49% chance of awarding a primary school an Inadequate or Requires Improvement grade.
This is notably higher than for Inspector B – a male OI working alone – where the chance of an Inadequate or Requires Improvement grade being awarded is only 32%.
The difference in outcomes between these two hypothetical inspectors is most notable at the Inadequate grade (13% for Inspector A versus 3% for Inspector B).
We also find a similar difference between these hypothetical inspectors in the outcomes from Ofsted’s short inspections (see blog 5 for further information about these).
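As a rough illustration of the Table 1 figures quoted above, the short sketch below expresses the gaps between the two hypothetical inspectors in percentage points. The variable and function names are invented for this example, and the implied Requires Improvement shares are back-of-envelope values derived from the rounded percentages in the text, not figures taken from the working paper.

```python
# Percentages quoted in the post for the two hypothetical primary-school
# inspectors. They are rounded, so anything derived from them is approximate.
inspectors = {
    "A": {"inadequate_or_ri": 49, "inadequate": 13},  # female HMI, one other inspector
    "B": {"inadequate_or_ri": 32, "inadequate": 3},   # male OI, working alone
}


def percentage_point_gap(measure):
    """Inspector A minus Inspector B on the given measure, in percentage points."""
    return inspectors["A"][measure] - inspectors["B"][measure]


def implied_requires_improvement(name):
    """Assumption: the RI share is the combined share minus the Inadequate share."""
    row = inspectors[name]
    return row["inadequate_or_ri"] - row["inadequate"]


gap_combined = percentage_point_gap("inadequate_or_ri")
gap_inadequate = percentage_point_gap("inadequate")

print(f"Gap in Inadequate or RI: {gap_combined} percentage points")
print(f"Gap in Inadequate alone: {gap_inadequate} percentage points")
for name in inspectors:
    print(f"Inspector {name}, implied RI share: {implied_requires_improvement(name)}%")
```

On these rounded figures, more than half of the combined gap comes from the Inadequate grade alone, which is consistent with the point that the difference is most notable there.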
The same caveats apply, however, as we discussed in our second blog. Our analysis has only been able to control for a limited set of background factors, and Ofsted might deploy our two hypothetical inspectors to different inspection tasks. For instance, they might be more likely to assign larger teams led by a female HMI to primary schools where there are safeguarding concerns (which we are unable to control for). This could explain at least part of the difference we observe in Table 1, including the large difference observed for Inadequate judgements.
It should be noted that the results presented in Table 1 represent quite an extreme example. We are comparing two hypothetical inspectors at opposite ends of the distribution (the most “lenient” and the “harshest”). Table 2 therefore presents an alternative view of the results, illustrating the probability of a primary school receiving an Inadequate or Requires Improvement judgement across eight different hypothetical inspectors.
And, as our previous blogs have shown, some characteristics of lead inspectors and their teams are more strongly associated with inspection outcomes than others. For the two hypothetical inspectors considered in Table 1, the HMI versus OI distinction will be responsible for a fair chunk of the difference observed.
Yet, in reality, schools will indeed receive inspections led by very different inspectors. They will differ not only in the characteristics we can observe in Tables 1 and 2, but also in many other potentially important characteristics that we cannot observe (e.g. personality types, experiences of leading challenging schools).
We therefore hope that the results presented in Table 1 are at least a useful thought experiment for readers, showing how much inspection outcomes may differ, in the extreme, across rather different lead inspectors.