Before you worry about results, consider the results of the results
I’ve participated in many discussions about the design of measurement systems, from examinations to performance management. While it is true that you can design measures to be closer or further from the real world entity you are interested in (the knowledge possessed by a student, for example, or the behaviour of a member of staff), and that this is a fascinating intellectual exercise, the practical value and harm of a measurement system does not seem solely tied to its validity. It is equally dependent on the consequences attached to the measurement.
This is a strong statement. And at the extremes it falls apart. It must matter what we choose to measure. Imagine if we judged the effectiveness of our school system solely by the number of trombone players it produced. This would be eccentric to say the least. Yet, at the same time, consider two different scenarios for the consequences of this measurement system. In the first scenario, there are grave consequences for any school that does not produce a certain high percentage of trombone players to a national standard. Head teachers are sacked; underperforming brass bands are forcibly academised. Compare this to a scenario where there are no heavy consequences attached to the number of trombone players: it does not affect jobs or status. Let’s say the number gets published annually in the statistical appendix of wind instrument competence. A couple of local papers do annual features on the best (and worst) schools in the country for trombone players.
In the first world, we would see maths and English lessons being shortened to make time for trombone lessons; we would see weak trombone players feeling inadequate about their skills and possibly even ‘managed out’ of institutions. The final year of primary school would involve a lot of trombone cramming. The assistant head teacher for brass instrument standards would attend frequent conferences on musical pedagogy and pore over the latest tips and fads. Aspirant parents would spend significant money on trombone lessons.
In the second world, there are probably a few more trombone players than in our own, but not much else is different. School leaders are able to use their professional judgement about how much curriculum and management time to devote to the topic; young people take pride in a wider variety of accomplishments; parents save their money. A wide range of instruments are played, alongside the more traditional subjects.
Consequences are a significant determinant of the real world impact of a system of measurement. For managers, as opposed to scientists, real world impact is the purpose of measurement. The measures exist to make something happen. And the consequences of those measurements determine the amount of attention paid and the extent of the behaviour change that results.
High consequence measurement systems reduce professional judgement, narrow activity, limit innovation and encourage gaming or even cheating. They erode common sense, breadth, courage and ethical judgement.
If you think you have found the perfect measure, such that unlimited attention and behaviour to increase the measure can only be a good thing, then you will shrug your shoulders at this problem. The trouble is, that measure does not exist. No practical measurement system captures the full range of things we value about an institution, recognises the nuanced trade-offs between priorities, adapts to changing circumstances or is immune to clever manipulation. In an ideal system, we rely on the common sense and integrity of those within it to navigate the inadequacies and limitations of our measures. But common sense and integrity are the very attributes that high consequences eliminate from the system. We encourage either unthinking or devious compliance, like some kind of modern-day sorcerer’s apprentice.
The map is not the territory. The measure is not the goal. Yes, what gets measured gets done. But you don’t really want just the measure at any cost. You want the messy reality that it imperfectly represents and you want other things too.
You can adapt the consequences of a measurement system by working with severity, breadth, frequency and discretion. The worst case is a brutal punishment attached to a single measure which is frequently and automatically applied. In such an environment you will get almost robotically stupid behaviour or astonishing degrees of corruption.
The other extreme is not much better: zero consequences for even the most egregious failure or misconduct, complex measures which confuse and obscure, drift and high levels of subjectivity and bias. Clearly there is a need to navigate a sensible middle course: with a range of consequences (including positive ones), a sensible basket of measures taken at sensible intervals and a balance of rules and discretion. There’s no standard recipe for this; you’ll need to figure it out as you go.
But beware the shadow of accountability. The shadow cast by a measure falls further than its reality. Thus, for example, you may have a severe consequence for a result which is very rare. It could apply to one per cent of the population in question, and the vast majority are unlikely to ever come near it. Yet you will find a surprising number of people worrying about it, including people who should not be devoting any time to it at all.
So the other requirements of an effective system of measurement and accountability are transparency and predictability. Participants need to understand the rules, understand their own position in relation to them and be able to predict the consequences of their choices and actions. They need to know if they are seriously at risk.
Moderate, pluralistic, intelligent, transparent and predictable… These are the hallmarks of a sensible measurement system. With those in place you can spend useful time honing the validity and reliability of the measures themselves. It is a waste of time to debate measures without also considering the consequences of those measurements.
After writing this blog, I came across a fascinating article by James Scott on the processes of “state legibility” which resonates with its theme. The section on scientific forestry in eighteenth-century Prussia and Saxony is exquisite. The pursuit of standardised measurement ultimately meant, in the most poetic irony, that “the utilitarian state could, quite literally, not see the real existing forest for the (commercial) trees”.