Ed Select Committee report – improvements to come?

The Education Select Committee has published its report into the impact of the changes to primary assessment. It’s been an interesting journey from the point at which I submitted written evidence on primary assessment; I wrote a blog back in October doubting there would be much response, but in fact I was wrong. Not only did the Committee draw widely on practitioners, stakeholders and experts to give evidence, but the report suggests that they listened quite well and, more to the point, understood the gist of what we were all trying to say. For anyone who has followed assessment research, most of this is nothing new – similar things have been said for decades. Nevertheless, it’s gratifying to have some airing of the issues at this level.

Summative and formative assessment

The introduction to the report clarifies that the issues being tackled relate to summative assessment, not the ongoing process of formative assessment carried out by teachers. For me, this is a crucial point, since I have been trying, sometimes with some difficulty, to explain to teachers that the two purposes should not be confused. This is important because the original report on assessment without levels suggested that schools had ‘carte blanche’ to create their own systems. Whilst it also emphasised that purposes needed to be clear, many school systems were either extensions of formative assessment that failed to grasp the implications and requirements of summative purposes, or clumsy attempts to create tracking systems based on data that had not really been derived from reliable assessment!

Implementation and design

The report is critical of the timescale and the numerous mistakes made in the administration of the assessments. It is particularly critical of the STA, which it saw as chaotic and insufficiently independent. Furthermore, it criticises Ofqual for a lack of quality control, in spite of Ofqual’s own protestations that it had scrutinised the materials. The report recommends an independent panel to review the process in future.

This finding is pretty damning. This is not some tin-pot state setting up its first exams – how is incompetence becoming normal? In a climate of anti-expertise, I suppose it is to be expected. It will be very interesting to see if the recommendations have any effect in this area.

The Reading Test

The report took on board the widespread criticism of the 2016 Reading Test. The STA’s defence was that it had been properly trialled and had performed as expected. Nevertheless, the good news (possibly) is that the Department has supposedly “considered how this year’s test experience could be improved for pupils”.

Well we shall see on Monday! I really hope they manage to produce something that most pupils will at least find vaguely interesting to read. The 2016 paper was certainly the least well-received of all the practice papers we did this year.

Writing and teacher assessment

Teacher assessment of writing emerged as something that divided opinion. On the one hand there were quotes from heads who suggested that ‘teachers should be trusted’ to assess writing. My view is that they miss the point, and I was very happy to be quoted alongside Tim Oates as having deep reservations about teacher assessment. I’ve frequently argued against it for several reasons (even when moderation is involved), and I believe that those who propose it may be confusing the different purposes of assessment, or failing to see that it’s not about ‘trust’ but about fairness to all pupils – and about the unacceptable burden it places on teachers.

What is good to see, though, is how the Committee have responded to our suggested alternatives. Many of us referred to ‘Comparative Judgement’ as a possible way forward. The potential of comparative judgement as an assessment method is not new, but is gaining credibility and may offer some solutions – I’m glad to see it given space in the report. Something is certainly needed, as the way we currently assess writing is really not fit for purpose. At the very least, it seems we may return to a ‘best-fit’ model for the time being.

For more on Comparative Judgement, see:

Michael Tidd – The potential of Comparative Judgement in primary

Daisy Christodoulou – Comparative judgment: 21st century assessment

No More Marking

David Didau – 10 Misconceptions about Comparative Judgement
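For readers curious how comparative judgement turns pairwise decisions into scores: judges repeatedly pick the better of two scripts, and a model such as Bradley–Terry converts the accumulated wins into a scaled measure. Below is a minimal illustrative sketch in Python with made-up data – it is not any of the tools linked above, just the underlying idea:

```python
from collections import defaultdict

def bradley_terry(comparisons, iterations=200):
    """Estimate script quality from (winner, loser) judge decisions.

    Uses the standard MM update for the Bradley-Terry model: an item's
    strength is its win count divided by the sum, over opponents, of
    comparisons / (own strength + opponent strength).
    """
    items = {s for pair in comparisons for s in pair}
    wins = defaultdict(int)
    pair_n = defaultdict(int)  # comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_n[frozenset((winner, loser))] += 1

    strength = {s: 1.0 for s in items}
    for _ in range(iterations):
        new = {}
        for s in items:
            denom = sum(
                n / (strength[s] + strength[t])
                for pair, n in pair_n.items() if s in pair
                for t in pair if t != s
            )
            new[s] = wins[s] / denom if denom else strength[s]
        mean = sum(new.values()) / len(new)
        strength = {s: v / mean for s, v in new.items()}  # normalise
    return strength

# Hypothetical judging data: each tuple is (preferred script, other script).
decisions = [("A", "B"), ("A", "B"), ("B", "C"),
             ("A", "C"), ("B", "C"), ("A", "C")]
scores = bradley_terry(decisions)
# Scripts that win more comparisons end up with higher strengths.
```

In practice, tools like No More Marking handle the pairing, the judging interface and the scaling; the point of the sketch is only that many quick, holistic pairwise decisions can yield a rank order of scripts without a mark scheme.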

Support for schools

The report found that the changes were made without proper training or support. I think this is something of an understatement. Systems were changed radically without anything concrete to replace them. Schools were left to devise their own systems, and it’s hard to see how anyone could have failed to foresee that the results would be inconsistent and often inappropriate. As I said in the enquiry, there are thousands of primary schools finding thousands of different solutions. How can that be an effective national strategy, particularly as, by their own admission, schools lacked assessment expertise? Apparently some schools adopted commercial packages which were deemed ‘low quality’. This, too, is not a surprise. I know there are teachers and headteachers who strongly support the notion of ‘doing their own thing’, but I disagree, and I have referred to it in the past as the ‘pot-luck’ approach. Some ways of doing things will be better than others; what we need to do is make sure we are implementing the most effective methods, not leaving it to the whim of individuals. Several times, Michael Tidd has repeated that we were offered an ‘item bank’ to help teachers with ongoing assessment. The report reiterates this, but I don’t suggest we hold our collective breath.

High-stakes impact and accountability

I’m sure the members of the Assessment Reform Group, and other researchers of the 20th century, would be gratified to know that, this far down the line, we still need to point out the counter-productive nature of high-stakes assessment for accountability! Nevertheless, it’s good to see it re-emphasised in no uncertain terms, and the report is very clear about the impact on well-being and on the curriculum.

I’m not sure that their recommendation that OFSTED broadens its focus (again), particularly by including science as a core subject, is going to help. OFSTED has already reported on the parlous state of science in the curriculum, but the subject has continued to lose status since 2009, as a direct result of the assessment of the other subjects. What is assessed for accountability has status; what is not, does not. The ASE argues (and I totally understand why) that science was impoverished by the test at the end of the year. Nevertheless, science has been impoverished far more, subsequently, in spite of sporadic ‘success stories’ from some schools. This is a matter of record. (pdf)

Teacher assessment of science for any kind of reliable purpose is even more fraught with difficulties than the assessment of writing. The farce, last year, was schools trying to decide whether they really were going to give credence to the myth that their pupils had ‘mastered’ all 24 of the objectives, or whether they were going to ‘fail’ them. Added to this is the ongoing irony that primary science is still ‘sampled’ using an old-fashioned conventional test. Our inadequacy in assessing science is an area that is generally ignored or, to my great annoyance, completely unappreciated by bright-eyed believers who offer ‘simple’ solutions. I’ve suggested that complex subjects like science can only be adequately assessed using more sophisticated technology, but Edtech has stalled in the UK, so I hold out little hope for developments in primary schools!

When I think back to my comments to the enquiry, I wish I could have made myself clearer in some ways. I said that if we want assessment to enhance our pupils’ education, then what we currently have is not serving that purpose. At the time, we were told that if we wished to comment further on the problem of accountability, we could write to the Committee, which I did. The constant argument has always been ‘…but we need teachers to be accountable.’ I argued that they need to be accountable for the right things, and that a single yearly sample of small populations in test conditions did not ensure this. This was repeated by so many of those who wrote evidence for the Committee that it was obviously hard to ignore. The following extract from their recommendations is probably the key statement from the entire process. If something changes as a result of this, there might be a positive outcome after all.

Many of the negative effects of assessment are in fact caused by the use of results in the accountability system rather than the assessment system itself. Key Stage 2 results are used to hold schools to account at a system level, to parents, by Ofsted, and results are linked to teachers’ pay and performance. We recognise the importance of holding schools to account but this high-stakes system does not improve teaching and learning at primary school. (my bold)

Not good is sometimes good

I was reading Beth Budden’s blog on the cult of performativity in education and thinking of the many times I’ve thanked the gods no-one was watching a particular lesson. It’s gratifying that there is a growing perception that a single performance in a 40-minute session is no kind of measure of effectiveness – I’ve railed against that for many years. During observations, I’ve sometimes managed to carry off the performance (it’s always a hollow victory) and sometimes I haven’t (which always leads to pointless personal post-mortems). Lately I’ve managed to introduce the idea that I will give a full briefing on the lesson – the background, my rationale, the national curriculum context, the focus, the situation and so on – before any member of the leadership team sets foot in my classroom to make a formal observation. It’s been a long time coming, and it goes some way towards mitigating the performance effects. Not everyone in my school does it.

But what about the lessons that I really didn’t want anyone to watch? If they had been watched, would I have been recognised as a bad teacher? If I think about lessons that seemed pretty poor by my own judgement, they almost always led on to a better understanding overall. A recent example is a lesson I taught (nay, crammed) on the basics of electricity. It was a rush: the pupils needed to glean a fair amount of information in a short time from a number of sources, and the resulting writing showed that the topic was poorly understood by everyone. Of course, it was my fault, and I’d definitely have failed that lesson if I were grading myself. Fortunately I wasn’t being graded and nobody was watching. Fortunately, also, having looked at their confused writing on the subject, I could speak to the pupils the next day, tell them that I took responsibility for the work being below par, and say that we needed to address the myriad misconceptions that had arisen. We did. The subsequent work was excellent and suggested a far higher degree of understanding from all; I assumed that something had been learned. Nowhere in here was a ‘good lesson’, but somewhere in here was some actual education – and not just about electricity.



Final report of the Commission on Assessment without Levels – a few things.

I’ve read the report and picked out some things. This is not a detailed analysis, but more of a selection of pieces relevant to me and anyone else interested in primary education and assessment:

Our consultations and discussions highlighted the extent to which teachers are subject to conflicting pressures: trying to make appropriate use of assessment as part of the day-to-day task of classroom teaching, while at the same time collecting assessment data which will be used in very high stakes evaluation of individual and institutional performance. These conflicted purposes too often affect adversely the fundamental aims of the curriculum,

Many of us have been arguing that for years.

the system has been so conditioned by levels that there is considerable challenge in moving away from them. We have been concerned by evidence that some schools are trying to recreate levels based on the new national curriculum.

Some schools are hanging on to them like tin cans in the apocalypse.

levels also came to be used for in-school assessment between key stages in order to monitor whether pupils were on track to achieve expected levels at the end of key stages. This distorted the purpose of in-school assessment,

Whose fault was that?

There are three main forms of assessment: in-school formative assessment, which is used by teachers to evaluate pupils’ knowledge and understanding on a day-to-day basis and to tailor teaching accordingly; in-school summative assessment, which enables schools to evaluate how much a pupil has learned at the end of a teaching period; and nationally standardised summative assessment,

Try explaining that to those who believe teacher assessment through the year can be used for summative purposes at the end of the year.

many teachers found data entry and data management in their school burdensome.

I love it when it’s my own.

There is no intrinsic value in recording formative assessment;

More than that – it degrades the formative assessment itself.

the Commission recommends schools ask themselves what uses the assessments are intended to support, what the quality of the assessment information will be,

I don’t believe our trial system using FOCUS materials and assigning a score had much quality. It was too narrow and unreliable. We basically had to resort to levels to try to achieve some sort of reliability.

Schools should not seek to devise a system that they think inspectors will want to see;


Data should be provided to inspectors in the format that the school would ordinarily use to monitor the progress of its pupils

‘Ordinarily’ we used levels! This is why I think we need data based on internal summative assessments. I do not think we can just base it on a summative use of formative assessment information!

The Carter Review of Initial Teacher Training (ITT) identified assessment as the area of greatest weakness in current training programmes.

We should not expect staff (e.g. subject leaders) to devise assessment systems, without having had training in assessment.

The Commission recommends the establishment of a national item bank of assessment questions to be used both for formative assessment in the classroom, to help teachers evaluate understanding of a topic or concept, and for summative assessment, by enabling teachers to create bespoke tests for assessment at the end of a topic or teaching period.

But don’t hold your breath.

The Commission decided at the outset not to prescribe any particular model for in-school assessment. In the context of curriculum freedoms and increasing autonomy for schools, it would make no sense to prescribe any one model for assessment.

Which is where it is ultimately mistaken, since we are expected to be able to make comparisons across schools!

Schools should be free to develop an approach to assessment which aligns with their curriculum and works for their pupils and staff


Although levels were intended to define common standards of attainment, the level descriptors were open to interpretation. Different teachers could make different judgements

Well good grief! This is true of everything they’re expecting us to do in teacher assessment all the time.

Pupils compared themselves to others and often labelled themselves according to the level they were at. This encouraged pupils to adopt a mind-set of fixed ability, which was particularly damaging where pupils saw themselves at a lower level.

This is only going to be made worse, however, by the ‘meeting’ aspects of the new system.

Without levels, schools can use their own assessment systems to support more informative and productive conversations with pupils and parents. They can ensure their approaches to assessment enable pupils to take more responsibility for their achievements by encouraging pupils to reflect on their own progress, understand what their strengths are and identify what they need to do to improve.

Actually, that’s exactly what levels did do! However…

The Commission hopes that teachers will now build their confidence in using a range of formative assessment techniques as an integral part of their teaching, without the burden of unnecessary recording and tracking.

They hope?

Whilst summative tasks can be used for formative purposes, tasks that are designed to provide summative data will often not provide the best formative information. Formative assessment does not have to be carried out with the same test used for summative assessment, and can consist of many different and varied tasks and approaches. Similarly, formative assessments do not have to be measured using the same scale that is used for summative assessments.

OK – this is a key piece of information that is misunderstood by nearly everybody working within education.

However, the Commission strongly believes that a much greater focus on high quality formative assessment as an integral part of teaching and learning will have multiple benefits:

We need to make sure this is fully understood. We must avoid formalising what we think is ‘high quality formative assessment’ because that will become another burdensome and meaningless ritual. Don’t get me started on the Black Box!

The new national curriculum is founded on the principle that teachers should ensure pupils have a secure understanding of key ideas and concepts before moving onto the next phase of learning.

And they do mean 100% of the objectives.

The word mastery is increasingly appearing in assessment systems and in discussions about assessment. Unfortunately, it is used in a number of different ways and there is a risk of confusion if it is not clear which meaning is intended

By leading politicians, too. A common understanding of terms is rather important, don’t you think?

However, Ofsted does not expect to see any specific frequency, type or volume of marking and feedback;

OK, it’s been posted before, but it’s worth reiterating. Many subject leaders and headteachers are still fixated on marking.

On the other hand, standardised tests (such as those that produce a reading age) can offer very reliable and accurate information, whereas summative teacher assessment can be subject to bias.

Oh really? Then why haven’t we been given standardised tests, and why is there still so much emphasis on teacher assessment?

Some types of assessment are capable of being used for more than one purpose. However, this may distort the results, such as where an assessment is used to monitor pupil performance, but is also used as evidence for staff performance management. School leaders should be careful to ensure that the primary purpose of assessment is not distorted by using it for multiple purposes.

I made this point years ago.

Primary Science Assessment – Not Even Close, Yet

Assessment of primary science is something of a bugbear of mine. While I consider so-called ‘formative assessment’ (it should never have been called ‘assessment’) to be no more or less of a challenge than in the other core subjects, summative assessment of science is different. There is a multitude of research papers and writings on just how difficult it is to assess properly for any kind of measurement, particularly for tracking progress and for accountability purposes. In the UK, the decline of science since the demise of the KS2 SATs test has passed into legend. Check out OFSTED’s Maintaining Curiosity for an official account of just how dire the situation is. It’s now been six years since that event, however, and the protagonists in the world of science education and assessment have pretty much failed to come up with anything manageable and reliable. I’m not surprised; I think the job is almost impossible. However, I am surprised that they continue to try to fool themselves into thinking that it isn’t. Examples of advice from the most authoritative of sources are here and here, and I’m very appreciative of their efforts, but I look at these and my heart sinks. I can’t imagine these ideas being put into effective practice in real primary schools.

When I was pushing to try and influence the protagonists, before they finished their projects and put their suggestions out to teachers, I compiled a list of questions which I felt needed to be addressed in thinking about assessment in primary science. I see very little to give me hope that these have been addressed. My main concern is that there is a persistent belief in the ‘magic’ of teacher assessment and moderation, serving a high-stakes purpose.

  • Should we really be dissolving the formative/summative divide?
    • I have seen much confusion amongst teachers as to the purposes of assessment and they often conflate summative and formative, unwittingly, to the detriment of both.
  • Isn’t there more clarity needed on just how assessment can be made to serve different purposes?
    • Isn’t there a fair amount of debate about this in the literature?
  • How do we avoid serving neither very well?
  • How do we use formative information for summative purposes when this is often information gained in the early stages of learning and therefore not fair to pupils who may have progressed since its capture?
  • If summative assessments are to be used for high stakes purposes, how do we ensure that summarised, formative information really quantifies attainment and progress?
  • How can we avoid teachers always assessing instead of teaching?
Teacher assessment
  • Can we really resolve the issue of unreliability of teacher assessment when used in high-stakes settings?
  • Is it fair to expect teachers to carry out teacher assessments when they are directly impacted by the outcome of those assessments?
  • How do we make teacher assessment fair to all the children in the country if it is not standardised? – How do we avoid a ‘pot-luck’ effect for our pupils?
  • Have we really addressed the difficulty of assessing science as a multi-faceted subject?
  • How can we streamline this process?
  • How can we make sure it doesn’t feel to teachers as though they would be assessing science all the time?
Moderation and reliability
  • Are researchers assuming that moderation is a simple and effective ‘catch all’ to achieve reliability?
  • Do researchers know that this often feels like something that is done ‘to’ teachers and not part of a collaborative process?
    • This is a fraught process in many schools. It takes up an enormous amount of time and can be very emotional if judgements are being made and if there are disagreements. Moderation helps to moderate extremes, but can also lead groups in the wrong direction.
  • Will schools be able to give over the time required to adequately moderate science?
  • Is there really a good evidence base for the effectiveness of moderation on reliability?
  • Do we need to clarify the exact process of moderation?
  • Is ‘reliable’ something that is actually achievable by any assessment system? Should we not be talking about maximising rather than achieving reliability?
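On the reliability questions above: one way to make ‘maximising rather than achieving reliability’ concrete is to actually measure agreement between judges, for example with Cohen’s kappa, which corrects raw agreement for chance. This is a minimal sketch with entirely hypothetical teacher judgements (the band labels and data are invented for illustration):

```python
def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' judgements."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater judged at random with their
    # own observed category frequencies.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n)
        for c in categories
    )
    return (observed - expected) / (1 - expected)

# Two teachers independently banding the same eight pieces of work
# (WTS = working towards, EXS = expected, GDS = greater depth).
teacher_1 = ["WTS", "EXS", "EXS", "GDS", "EXS", "WTS", "GDS", "EXS"]
teacher_2 = ["WTS", "EXS", "GDS", "GDS", "EXS", "EXS", "GDS", "EXS"]
kappa = cohens_kappa(teacher_1, teacher_2)
# Kappa near 1 means strong agreement; near 0, no better than chance.
```

For these made-up data kappa works out at 0.6 – respectable, but well short of the near-perfect agreement a high-stakes use would demand, which is the nub of the fairness questions above.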


On Friday I went to Reading to attend the ASE (Association for Science Education) conference, and one of the sessions was run by an OFSTED HMI. I took some notes, which are below for your interest (!).
Looking at books to monitor progress
Apparently, since they rarely see any actual science going on, they tend to look in pupil books to see what science is happening. Hopefully we can point them in some more directions than that, e.g. pictures and videos etc. Talking to teachers and pupils would be nice.
Levels or not – so what?
They don’t care what we call the levels/degrees/grades/points etc. They want to know how we use assessment to identify whether or not individuals are making progress, how we identify those falling behind and what we do about it.
Evidence of feedback making a difference
It’s crucial to allow time to feed back to pupils and for them to respond in a way that shows that they have overcome misconceptions or improved understanding. This really needs to be built into the time we give to lessons. I know I have to do this, but I still tend to start ‘new’ lessons sometimes without thinking about whether I have finished the previous one and done all the follow up properly. My junior school teachers were brilliant at this. Why do we still need to be told?
General statements to parents will be fine.
Just like the ones we gave out after parents’ evening last time. We wrote a descriptive summary on each of the core subjects, instead of just giving them the level. They actually preferred it.
Heads up on schools paying lip service to evolution.
They’ve been given instructions to look out for schools teaching evolution but only because they ‘have to’ and giving any kind of weight to ‘alternative theories’ – these are not scientific theories – they are religious indoctrination by the back door.
Detailed formative and summative information
Show high expectations
Be careful in any ‘differentiation by task’, since this frequently consigns the lower attaining pupils to lower expectations. Pupils should have access to the curriculum relevant to their age. Good – because I’ve been saying this for years. Differentiation by preplanned task is counter-productive.
We need to have local cluster moderation
Or we’ll deceive ourselves about our assessments (?).
Make sure pupils finish what they start
Unfinished work is a dead giveaway that we’re not allowing for follow up time. Make sure we allow for pupils to finish in subsequent sessions.
Make sure the work is by and from the children
There should not be work by the teacher in the pupils’ books. Think about it – how much of the content of the books (backgrounds, printouts, learning intention decorations, worksheets, proformas etc.) is currently produced by you?
It should not look all the same
Avoid ‘production line’ outcomes. Pupils’ work should demonstrate individuality.
Writing up science is literacy
I think we knew that.
Use past papers to assess units
Interestingly – the use of ‘test’ papers in a constructive way and to give good feedback etc. is recommended.
He also said that OFSTED inspectors were not allowed to say how any teacher ‘should have’ done anything. That’s considered giving advice. He said that they should only say what happened, what was successful and what was missing or not successful. Hmm…