If you’ve ever wondered whether the Big Brother of accountability is watching you: it is now. Ofsted has always carried out risk assessments of schools using pre-determined thresholds. It now has a new computerised algorithm to help it identify which good and outstanding schools it should be zooming in on.
Each risk assessment has two stages: the first, based on published data, is now driven by the newly developed algorithm; the second involves a wider look at the data by a senior HMI.
The new methodology is known as ‘supervised machine learning’.
“Supervised machine learning is a way of getting computers to make decisions that have not been explicitly programmed. A common application is classifying items into two or more groups.
In a typical application, there will be a large dataset called ‘training data’ for which we already know which groups the items belong to. This is used to train the machine learning algorithm to distinguish between unknown items.
Known inspection outcomes over the 2016/17 academic year were retrospectively predicted using a machine learning algorithm. The training dataset consisted of inspection outcomes and a wide range of other data that was available when the inspections took place, specifically:
- progress and attainment data from DfE
- school workforce census data
- Parent View responses.
The machine learning algorithm combined the data to make an optimum fit to the known inspection outcomes. The outcomes were coded as ‘Good or better’ or ‘Less than good’.
One drawback to machine learning can be that the predictions vary slightly according to the training dataset used. To help overcome this, we fitted a large number of models to slightly different training sets. In this way, we effectively produced a probability of a forthcoming inspection being less than good. This is our ‘raw risk score’, which takes a value between 0 and 1.”
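The approach Ofsted describes in that last paragraph, fitting a large number of models to slightly different training sets and reading the share of ‘less than good’ predictions as a probability, is essentially bootstrap aggregation (“bagging”). A minimal sketch in Python of that idea, using an invented single feature and simple decision stumps rather than Ofsted’s actual model, features, or data:

```python
import random

def fit_stump(xs, ys):
    """Pick the threshold t (from the observed values) that minimises
    training error for the rule: predict 1 ('less than good') when x < t."""
    best_t, best_err = xs[0], float("inf")
    for t in xs:
        err = sum(int(x < t) != y for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def raw_risk_score(train_x, train_y, new_x, n_models=200, seed=0):
    """Fit many stumps on bootstrap resamples of the training data; the
    fraction predicting 'less than good' is the raw risk score in [0, 1]."""
    rng = random.Random(seed)
    n = len(train_x)
    votes = 0
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        t = fit_stump([train_x[i] for i in idx], [train_y[i] for i in idx])
        votes += int(new_x < t)
    return votes / n_models

# Invented illustrative data: one 'progress' feature, label 1 = 'less than good'.
train_x = [-1.0, -0.8, -0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8]
train_y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

print(raw_risk_score(train_x, train_y, -0.9))  # near 1: high risk
print(raw_risk_score(train_x, train_y, 0.7))   # near 0: low risk
```

Because each resample differs slightly, the individual models disagree near the decision boundary, and averaging their votes turns a hard yes/no classification into a graded score between 0 and 1.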
As with all algorithms, the quality of what you get out depends on the quality of what you put in. My concern is not that Ofsted are using a computerised algorithm; in many ways this is arguably an improvement on the human bias inherent in any person-based assessment system. It is that the algorithm is built on inspection data from 2016/17 that has had no reliability testing wrapped around it. If you put unreliable data in, you’ll get unreliable data out; this isn’t the computer’s fault but a flaw in the design. If we are honest and accept that the inspection process has a degree of unreliability, then we have to be brave and strong enough to reduce the current high stakes and cliff edges.
A copy of the full paper is here: