Got the T-shirt (a moderate tale)

Given that teacher assessment is a nonsense which lacks reliability, and that moderation cannot really reduce this unreliability, nor ensure that gradings are comparable, our moderation experience was about as good as it could be! It was thus:

We two Y6 teachers each submitted all our assessments, plus three children in each category (more ridiculous, inconsistent and confusable codes, here). From each category, one of those three was selected, plus another two children chosen at random. So, nine children from each class. We were told who these nine were a day in advance. Had we wanted to titivate, we could have, but with our ‘system’ it really wasn’t necessary.

The ‘system’ was basically to take the interim statements and assign each one a number. Since April, marking has involved annotating each piece of work with these numbers to indicate which criteria it met. It was far less onerous than it sounds and was surprisingly effective as formative assessment. I shall probably use something similar in future, even if I am not required to present evidence.

The moderator arrived this morning and gave us time to settle our classes whilst she perused our books. I had been skeptical. I had posted on Twitter that, though a moderator would have authority, I doubted they’d have more expertise, and I was concerned about having to argue points of grammar and assessment. I was wrong. We could hardly have asked for a better moderator. She knew her stuff: she was a Y6 teacher, and we had a common understanding of the grammar and the statements. She’d made it her business to sample moderation events as widely as possible and had therefore had the opportunity to see many examples of written work from a wide range of schools. She appreciated our system and the fact that all our written work since April had been done in one book.

Discussion and examination of the evidence by and large led to an agreed assessment. One child was raised from ‘working towards’; one, whom I had only recently and tentatively put forward as ‘greater depth’, was agreed not to have quite made it. The other 16 went through as previously assessed, along with all the others in the year group. Overall, my colleague and I were deemed to know what we were doing! We ought to, but a) the county moderation experience unsettled us and fed my ever-ready cynicism about the whole business, and b) I know how easy it is to be lulled into the false belief that what we’ve agreed is actually the ‘truth’ about where these pupils are. All we can say is that the three of us roughly agreed. The limited nature of the current criteria makes this an easier task than it was with the old levels (we still referred to the old levels!), but the error in the system makes it unusable for accountability or for future tracking. I’m most interested to see this year’s writing assessment results, particularly in moderated v non-moderated schools. Whatever they show, it won’t be a reliable assessment, but unfortunately it will still be used (for good or ill) by senior leaders and other agencies to make judgements about teaching.

Nevertheless, I’m quite relieved the experience was a positive one and gratified and somewhat surprised to have spent the day with someone with sense and expertise. How was it for you?






Shouldn’t we just say ‘no’?

I’m beginning to wonder why we are playing their game at all. Why are we not questioning the assumptions about what children should know or be able to do by a given year, as prescribed in the new curriculum and in the soon-to-be-published, rapidly cobbled-together new ‘descriptors’, a waste of time and paper? Have they based these on any actual research, other than what Michael Gove dimly remembered from his own school days?

We recently purchased some published assessments, partly, I’m sorry to say, on my suggestion that we needed something ‘external’ to help us measure progress, now that levels no longer work. It wasn’t what I really wanted – I favour a completely different approach involving sophisticated technology, personal learning and an open curriculum, but that’s another long story and potential PhD thesis! Applying these assessments, though, is beginning to look unethical, to say the least. I’ve always been a bit of a fan of ‘testing’ when it’s purposeful, aids memory and feeds back at the right level, but these tests are utterly demoralising for pupils and staff and I’m pretty sure that’s not a positive force in education. I’m not even sure that I want to be teaching the pupils to jump through those hoops that they’re just missing; I strongly suspect they are not even the right hoops – that there are much more important things to be doing in primary school that are in no way accounted for by the (currently inscrutable) attaining/not attaining/exceeding criteria of the new system.

So what do we do when we’re in the position of being told we have to do something that is basically antagonistic to all our principles? Are we really, after all this time, going to revert to telling pupils that they’re failures? It seems so. Historically, apart from the occasional union bleat, teachers in England have generally tried their best to do what they’re told, as if, like the ‘good’ pupils they might have been when they were at school, they believe and trust in authority. Milgram would have a field day. Fingers on buttons, folks!

Moderation still doesn’t tell us the weight of the pig.

The recent culture of leaving more and more of the assessment process in the hands of teachers raises the important question of reliability. Much research into teacher assessment, even by strong proponents of its advantages, reveals that it is inherently unreliable. We might have guessed this from our experience of human beings and the reliability of their subjective judgements! Reliable judgement is difficult even for quantitative measures, such as the weight of a pig at a fair, but much more so for qualitative aspects such as those described in the wording of rubrics. These rubrics are what we are currently working with in the English primary school system. We teachers are required to assess formatively and feed back; to sum up and evaluate; to report in an unbiased way; and, all the while, to be held accountable for progress, which we ourselves are expected to be measuring. Imagine, if you will, the aforementioned pig. Judge its weight yourself now, and then again when you have fed it for a month, but bear in mind that you will be accountable for the progress it has made. How reliable will either of these judgements be?

So, in an attempt to improve the reliability of teacher assessments (in order for them to serve high-stakes accountability purposes), we introduce the idea of moderation. This usually takes the form of a colleague or external moderator assisting in the judgement, based on the ‘evidence’ produced by the teacher. Now, whilst I can see the value of the moderation process to the teacher, if it involves discussing criteria and evidence with colleagues and supposed ‘experts’ (who, exactly?), I’m skeptical that simply introducing more people into the discussion will lead to greater reliability. The problem is that the external yardstick is still missing. Even if the teacher and everyone else involved in the moderation process agree on the level, objective or whatever measurement is required of us, we are still making subjective judgements. Are collective, subjective judgements any better than individual ones? Sometimes they may be, if they genuinely have the effect of moderating extremes. However, we also need to consider the impact of cultural drift. By this, I mean a group effect that reinforces bias, and this does have an impact on assessment. I am convinced that I witnessed this over the years in the assessment of writing, where the bar for attaining each level seemed to be continually raised by teachers afraid that they would be accused of inflating results – a real shame for the pupils who were being judged unfairly. In these instances, the moderation process doesn’t improve reliability; all it does is give a false sense of reliability, which is then resistant to criticism or appeal. This is where we all stand around staring at the pig and all agree that he looks a bit thinner than he should. Without a weighing device, we really do not know.

June 2016

I had a look back at this post – moderation being in the wind at the moment. I was interested in articles such as this one, and I wonder what it will take for us to stop such pointless, meaningless practices in education. Do we not know? Do some people still believe these things work? Isn’t it rather obvious that teacher assessment for high-stakes purposes is completely counter-productive, and that moderation can in no way be considered a strategy for achieving greater reliability?

I’d like to extend the ubiquitous pig metaphor now. In the case of primary writing moderation in 2016, it’s not even a case of staring at the pig and guessing its weight. We have a farmer who has a whole field of pigs – he has been told to guess all their weights, but he’d better not find more than 30% of them underweight! In order to make sure he doesn’t cheat, another farmer comes along, equally clueless, and tells him whether he thinks the farmer’s guesses are the same as his own guesses. The farmer next door doesn’t have to go through this pointless ritual. Strangely, that farmer’s pigs are all just a little fatter.

Another pointless consultation

The DfE are apparently ‘seeking views on draft performance descriptors for determining pupil attainment at the end of key stages 1 and 2’.

They have previously ‘sought views’ on the draft national curriculum and the assessment policy, which they acknowledged and then proceeded to largely ignore. I should imagine this will be no different. Needless to say, I still responded, as I did with the others, if only for the opportunity to point out how vague and meaningless their descriptors are.

My response in brief:

It is really important that you remove all vague terminology, such as ‘increasing’ or ‘wider’. In removing levels, you acknowledged the unreliability of the system and the difficulty teachers faced in agreeing levels. This document falls into the same trap. It would be far better to provide examples of what is expected at each key stage (and in each year) than these vague descriptions, some of which could apply to any level of study, from Reception to post-doctoral. Many teachers have worked for years on helping colleagues to understand exactly what was required to show a pupil’s attainment, and in one fell swoop the new curriculum has demolished all that work without replacing it with anything effective. Give us a standardised set of concrete examples and explanations (not exemplars of pupils’ work), along the lines of those provided by Kangaroo Maths when we were grappling with what the levels represented in the old curriculum. Give us some e-assessment software that will allow us to quickly determine and collate this information.

I did also want to say, ‘Give us some mid-20th-century textbooks, since that’s obviously the source of your “new” curriculum.’ In actual fact, this isn’t just a bitter jibe. A textbook would at least guide us through the current morass. We could really do with some clarity and consistency. I suggest a state-of-the-art information source written by actual experts, rather than the range of opportunistic publications which will be cobbled together by ill-prepared commercial companies jumping on this latest bandwagon.