Ed Select Committee report – improvements to come?

The Education Select Committee has published its report into the impact of the changes to primary assessment. It’s been an interesting journey from the point at which I submitted written evidence on primary assessment; I wrote a blog back in October doubting there would be much response, but in fact I was wrong. Not only did the Committee draw widely on practitioners, stakeholders and experts for evidence, but the report suggests that they listened quite well and, more to the point, understood the gist of what we were all trying to say. For anyone who has followed assessment research, most of this is nothing new; similar things have been said for decades. Nevertheless, it’s gratifying to have some airing of the issues at this level.

Summative and formative assessment

The introduction to the report clarifies that the issues being tackled relate to summative assessment and not the ongoing process of formative assessment carried out by teachers. For me, this is a crucial point, since I have been trying, sometimes with difficulty, to explain to teachers that the two purposes should not be confused. This matters because the original report on assessment without levels suggested that schools had ‘carte blanche’ to create their own systems. Whilst it also emphasised that purposes needed to be clear, many school systems were either extensions of formative assessment that failed to grasp the implications and requirements of summative purposes, or clumsy attempts to create tracking systems based on data that had not been derived from reliable assessment!

Implementation and design

The report is critical of the timescale and the numerous mistakes made in administering the assessments. It is particularly critical of the STA, which it found chaotic and insufficiently independent. It also criticises Ofqual for a lack of quality control, in spite of Ofqual’s own protestations that it had scrutinised the materials. The report recommends an independent panel to review the process in future.

This finding is pretty damning. This is not some tin-pot state setting up its first exams – how is incompetence becoming normal? In a climate of anti-expertise, I suppose it is to be expected, but it will be very interesting to see if the recommendations have any effect in this area.

The Reading Test

The report took on board the widespread criticism of the 2016 Reading Test. The STA’s defence was that it had been properly trialled and had performed as expected. Nevertheless, the good news (possibly) is that the Department has supposedly “considered how this year’s test experience could be improved for pupils”.

Well, we shall see on Monday! I really hope they manage to produce something that most pupils will at least find vaguely interesting to read. The 2016 paper was certainly the least well received of all the practice papers we did this year.

Writing and teacher assessment

Teacher assessment of writing emerged as something that divided opinion. On the one hand there were quotes from heads who suggested that ‘teachers should be trusted’ to assess writing. My view is that they miss the point, and I was very happy to be quoted, alongside Tim Oates, as having deep reservations about teacher assessment. I’ve frequently argued against it for several reasons (even when moderation is involved), and I believe that those who propose it may be confusing the different purposes of assessment, or failing to see that it’s not about ‘trust’ but about fairness to all pupils and the unacceptable burden it places on teachers.

What is good to see, though, is how the Committee have responded to our suggested alternatives. Many of us referred to ‘Comparative Judgement’ as a possible way forward. Comparative judgement is not a new method, but it is gaining credibility and may offer some solutions, so I’m glad to see it given space in the report (I sketch the statistics behind it after the links below). Something is certainly needed, as the way we currently assess writing is really not fit for purpose. At the very least, it seems we may return to a ‘best-fit’ model for the time being.

For more on Comparative Judgement, see:

Michael Tidd – The potential of Comparative Judgement in primary

Daisy Christodoulou – Comparative judgment: 21st century assessment

No More Marking

David Didau – 10 Misconceptions about Comparative Judgement
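
For anyone curious about the mechanics, here is a rough sketch of the statistics underneath comparative judgement. The judgements and script names below are made up, and real tools such as No More Marking do considerably more, but the core idea is this: judges make repeated ‘which is better?’ decisions between pairs of scripts, and a statistical model (Bradley-Terry is a common choice) converts those wins and losses into a single scale of relative quality.

```python
# A toy sketch (my own illustration, not any official tool) of how
# comparative judgement turns pairwise decisions into a rank order.
# Each tuple records one judgement: (winner, loser) for a pair of scripts.
from collections import defaultdict

judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"),
              ("C", "D"), ("B", "D"), ("C", "A"), ("D", "B")]

scripts = {s for pair in judgements for s in pair}
wins = defaultdict(int)       # total wins per script
pairs = defaultdict(int)      # number of comparisons per unordered pair
for winner, loser in judgements:
    wins[winner] += 1
    pairs[frozenset((winner, loser))] += 1

# Fit a Bradley-Terry model by simple fixed-point iteration (Zermelo's
# algorithm): each script gets a 'strength', and the model says
# P(i beats j) = strength_i / (strength_i + strength_j).
strength = {s: 1.0 for s in scripts}
for _ in range(200):
    new = {}
    for s in scripts:
        denom = sum(pairs[frozenset((s, t))] / (strength[s] + strength[t])
                    for t in scripts if t != s)
        new[s] = wins[s] / denom
    total = sum(new.values())
    strength = {s: v / total for s, v in new.items()}   # normalise

for s in sorted(scripts, key=strength.get, reverse=True):
    print(f"script {s}: strength {strength[s]:.2f}")
```

The appeal is that no one has to agree absolute criteria in advance: reliability comes from many quick, relative decisions aggregated across judges.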

Support for schools

The report found that the changes were made without proper training or support. I think this is something of an understatement. Systems were changed radically without anything concrete to replace them. Schools were left to devise their own systems, and it’s difficult to see how anyone could have failed to foresee that the results would be inconsistent and often inappropriate. As I said in the enquiry, there are thousands of primary schools finding thousands of different solutions. How can that be an effective national strategy, particularly as, by their own admission, schools lacked assessment expertise? Apparently some schools adopted commercial packages which were deemed ‘low quality’. This, too, is not a surprise. I know that there are teachers and headteachers who strongly support the notion of ‘doing their own thing’, but I disagree and have referred to it in the past as the ‘pot-luck’ approach. Some ways of doing things are better than others; what we need to do is make sure we are implementing the most effective methods, not leaving it to the whim of individuals. Michael Tidd has several times repeated that we were offered an ‘item bank’ to help teachers with ongoing assessment. The report reiterates this, but I don’t suggest we hold our collective breath.

High-stakes impact and accountability

I’m sure the members of the Assessment Reform Group, and other researchers of the 20th century, would be gratified to know that this far down the line we still need to point out the counter-productive nature of high-stakes assessment for accountability! Nevertheless, it’s good to see it re-emphasised in no uncertain terms, and the report is very clear about the impact on well-being and on the curriculum. I’m not sure that their recommendation that Ofsted broadens its focus (again), particularly including science as a core subject, is going to help. Ofsted has already reported on the parlous state of science in the curriculum, but the subject has continued to lose status since 2009, as a direct result of the assessment of the other subjects. What is assessed for accountability has status; what is not, does not.

The ASE argues (and I totally understand why) that science was impoverished by the test at the end of the year. Nevertheless, science has been impoverished far more since the test’s removal, in spite of sporadic ‘success stories’ from some schools. This is a matter of record. (pdf). Teacher assessment of science for any kind of reliable purpose is even more fraught with difficulties than the assessment of writing. The farce, last year, was schools trying to decide whether they really were going to give credence to the myth that their pupils had ‘mastered’ all 24 of the objectives or whether they were going to ‘fail’ them. Added to this is the ongoing irony that primary science is still ‘sampled’ using an old-fashioned conventional test. Our inadequacy in assessing science is an area that is generally ignored or, to my great annoyance, completely unappreciated by bright-eyed believers who offer ‘simple’ solutions. I’ve suggested that complex subjects like science can only be adequately assessed using more sophisticated technology, but edtech has stalled in the UK, so I hold out little hope for developments in primary school!

When I think back to my comments to the enquiry, I wish I could have made myself clearer in some ways. I said that if we want assessment to enhance our pupils’ education, then what we currently have is not serving that purpose. At the time, we were told that if we wished to comment further on the problem of accountability, we could write to the Committee, which I did. The constant argument has always been ‘…but we need teachers to be accountable.’ I argued that they need to be accountable for the right things, and that a single yearly sample of small populations in test conditions does not ensure this. This was repeated by so many of those who submitted evidence to the Committee that it was obviously hard to ignore. The following extract from their recommendations is probably the key statement from the entire process. If something changes as a result of this, there might be a positive outcome after all.

Many of the negative effects of assessment are in fact caused by the use of results in the accountability system rather than the assessment system itself. Key Stage 2 results are used to hold schools to account at a system level, to parents, by Ofsted, and results are linked to teachers’ pay and performance. We recognise the importance of holding schools to account but this high-stakes system does not improve teaching and learning at primary school. (my bold)

Perverse incentives are real

I’ve just spent a few pleasurable hours looking at the science writing from my Y6 class. I say pleasurable because they’re very good writers this year (thanks, Mr M in Y5!), but also because there were elements of their writing that hinted at an education. Some children had picked up on, and correctly reinterpreted, the higher-level information I had given in reply to their questions on the chemistry of the investigation. All of them had made links with ‘the real world’ following the discussions we’d had.

It all sounds good, doesn’t it?

The sad truth is that, although I’m an advocate of education rather than mere attainment, the knowledge of what will and will not form part of the end-of-year measurement is still there, influencing my decisions and having a detrimental impact on my education of the children.

This is because, while I am marking their work, I am making decisions about feedback and whether to follow up misconceptions or take understanding further. Let’s remember that this is science. Although I personally view its study as crucial, and its neglect as the source of most of the world’s ills, it has nevertheless lost its status in the primary curriculum. So my thoughts are, ‘Why bother? This understanding will not form part of any final assessment, and no measurement of this will be used to judge the effectiveness of my teaching, nor of the school.’ Since this is true for science, still nominally a ‘core subject’, how much more so for the non-entities of art, music, DT, etc.? Is there any point in pursuing any of these subjects in primary school in an educational manner?

The argument, of course, is that we have an ethical responsibility as educators to educate; that teachers worth their salt should not be unduly swayed by the knowledge that a narrow set of criteria for a small population of pupils is used at the end of KS2 to judge our success or failure. It reminds me of the argument that senior leaders shouldn’t do things just for Ofsted. It’s an unreasonable argument. It’s like saying to the donkeys, ‘Here’s a carrot and a very big stick, but just act as you would if they weren’t there!’

I’m not in favour of scrapping tests and I’m no fan of teacher assessment, but it’s undeniable that what I teach is influenced by the KS2 SATs, and not all in a good way. The primary curriculum is vast; the attainment tests are narrow. It also brings into question all research based on using attainment data as a measure of success. Of course it’s true that the things they measure are important – they may even indicate something – but there are a lot of things which aren’t measured which may indicate a whole lot of other things.

I can’t see how we can value a proper primary education – how we can allow the pursuit of further understanding – if we set such tight boundaries on how we measure it. Testing is fine, but if it doesn’t measure what we value then we’ll only value what it measures. I’m resistant to that fact, but I’m not immune, and I’m sure I’m no different from any other primary teacher out there. Our assessment system has to change so that we can feel fine about educating our pupils and not think we’re wasting our time if we pursue an area that doesn’t count towards a final mark.


Primary Science Assessment – no miracles here

In April I wrote here on the draft science assessment guidance from the TAPS group. The final version is now out in the public domain (pdf), described thus:

“The Teacher Assessment in Primary Science (TAPS) project is a 3 year project based at Bath Spa University and funded by the Primary Science Teaching Trust (PSTT), which aims to develop support for a valid, reliable and manageable system of science assessment which will have a positive impact on children’s learning.”

I was vainly hoping for a miracle: valid, reliable AND manageable! Could they pull off the impossible? Well, if you read my original post, you’ll know that I had already abandoned that fantasy. I’m sorry to be so disappointed – I had wanted to be supportive, knowing the time, effort (money!) and best of intentions put into the project. Others may feel free to pull out the positive aspects, but here I am only going to point out some of the reasons why I feel so let down.

Manageable?

At first glance we could probably dismiss the guidance on the last of the three criteria straight away. Five layers and 22 steps would simply not look manageable to most primary school teachers. As subject leader, I’m particularly focussed on teaching science, and yet I would take one look at that pyramid and put it away for another day. Science has such low priority, regardless of the best efforts of primary science enthusiasts like myself, that any system which takes more time and effort than that given to the megaliths of English and Maths is highly unlikely to be embraced by class teachers. If we make assessment more complicated, why should we expect anything else? Did the team actually consider the time it would take to carry out all of the assessment steps for every science objective in the new curriculum? We do need to teach the subject, after all, even if we pretend that we can assess at every juncture.

Reliable?

In my previous post on this subject, I questioned the particular assessment philosophy of making formative assessment serve summative aims. I question it because I assert that it cannot: the idea is strongly contested in the research literature and runs counter to my own experience. More importantly, if we use AfL (assessment for learning/formative assessment) practices to generate summative data, then in no way can we expect that data to be reliable! Even the pupils recognise that it is unfair to make judgements about their science based on their ongoing work. Furthermore, if it is teacher assessment for high-stakes or data-driven purposes, then it cannot be considered reliable, even if the original purpose is summative. At the very least, the authors of this model should not be ignoring the research.

Valid?

Simply put, validity means ‘does what it says on the tin’ – hence the impossibility of assessing science adequately. I’m frequently irritated by the suggestion that we can ‘just do this’ in science. Even at primary school (or perhaps especially there) it’s a massive and complex domain. We purport to ‘assess pupils’ knowledge, skills and understanding’, but these are not simply achieved. At best we can touch on knowledge, where at least we can apply a common yardstick through testing. Skills may be observed, but there are so many variables in performance assessment that we immediately lose a good deal of reliability. Understanding can only be inferred, through a combination of lengthy procedures. Technology could address many of the problems of assessing science but, as I’ve complained before, England seems singularly uninterested in moving forward with this.

Still, you’d expect the examples at least to demonstrate what they mean teachers to understand by the term ‘valid’. Unfortunately they include some which blatantly don’t. Of course it’s always easy to nit-pick details, but the guidance offers a fine example of not assessing what you think you are assessing: ‘I can prove air exists’ (now there’s a can of worms!) could only result from assessing whether pupils can prove something about air, not from the actual assessment criterion, ‘to know air exists’ (really? In Year 5?).

1. Ongoing formative assessment

This is all about pupil and peer assessment, and it is full of discomforting old ideas and lingering catchphrases. I admit I’ve never been keen on WALTs or WILFs and their ilk. I prefer to be explicit about my expectations, and for the pupils to develop a genuine understanding of what they are doing, rather than cultivate ritualised, knee-jerk operations. Whilst I concede that this model focusses on assessment, it’s not very evident where the actual teaching takes place. Maybe we are meant to infer that it has already happened, but my concern is that this would not be obvious to many teachers. The guidance suggests, instead, that teachers ‘provide opportunities’, ‘involve pupils in discussions’, ‘study products’, ‘adapt their pace’ and ‘give feedback’. I would have liked to see something along the lines of ‘pick up on misconceptions and gaps in knowledge and then teach.’

Most disheartening is to see the persistence of ideas and rituals to do with peer assessment. Whilst peer assessment has come under some scrutiny recently for possibly not being as useful as has been claimed, I think it does have a place, but only with some provisos. In my experience, the most useful feedback comes not when we insist that it’s reduced to a basic format (tick a box, etc.) but when pupils can genuinely offer a thoughtful contribution. As such, it has to be monitored for misinformation; the pupils have to be trained to understand that their peers might be wrong, and this takes time. After fighting hard against mindless practices such as ‘two stars and a wish’, my heart sinks to find it yet again enshrined in something that is intended for primary teachers across the country.

2. Monitoring pupil progress

In this layer, we move from the daily activities which are considered part of ongoing, formative assessment to the expectation that teachers now use something to monitor ‘progress’. This involves considerable sleight of hand, and I would have to caution teachers and leadership against assuming that they can just do the things in the boxes. Let’s see:

TEACHERS BASE THEIR SUMMATIVE JUDGEMENTS OF PUPILS’ LEARNING ON A RANGE OF TYPES OF ACTIVITY

When? To get a good range, it would have to start early in the year, particularly if it is to include all the science coverage from the curriculum. In that case, summative judgements are not reliable, because the pupils should have progressed by the end of the year. If it takes place at the end of the year, do we include the work from the earlier part of the year? Do we ignore the areas covered up to February? If we don’t, do we have time to look at a range of types of activity in relation to everything they should have learned? Neither ongoing work nor teacher observation is reliable or fair if we need this to be used for actual comparative data.

TEACHERS TAKE PART IN MODERATION/DISCUSSION WITH EACH OTHER OF PUPILS’ WORK IN ORDER TO ALIGN JUDGEMENTS

Oh, how I despise the panacea of moderation! It is supposed to reduce threats to reliability, and I’m constantly calling it out in that regard. Here they state:

“Staff confidence in levelling is supported by regular moderation. The subject leader set up a series of 10 minute science moderation slots which take place within staff meetings across the year. Each slot consists of one class teacher bringing along some samples of work, which could be children’s writing, drawings or speech, and the staff agreeing a level for each piece. This led to lengthy discussions at first, but the process became quicker as staff developed knowledge of what to look for.”

Where to begin? Staff confidence does not mean increased reliability; all it does is reinforce group beliefs. Ten-minute slots within staff meetings are unrealistic, both in how long moderation actually takes and in the expectation that science will be given any slots at all. Whatever staff ‘agree’ cannot be considered reliable: a few samples of work are insufficient to agree anything; the staff may not have the science or assessment expertise to be qualified to make the judgement; more overtly confident members of staff may influence others, and there may be collective misunderstanding of the criteria or the attainment; and carrying out a ten-minute moderation for one pupil in one aspect of science does not translate to all the other pupils in all the aspects of science we are expected to assess. It might also have been a good idea to vet this document for mentions of levels, given that it was brought out to address their removal.
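
To make the point about confidence versus reliability concrete, here is a toy, entirely made-up illustration (nothing from the TAPS guidance itself): two teachers ‘level’ the same twenty pieces of work, agree on 70% of them, and leave the meeting feeling confident. Cohen’s kappa, a standard statistic for inter-rater agreement, corrects for the agreement we would expect by chance alone, and tells a much less comfortable story.

```python
# Hypothetical data: two teachers independently judge the same 20 pieces
# of work as working towards (WTS), expected standard (EXS) or greater
# depth (GDS). Raw agreement looks respectable; kappa does not.
from collections import Counter

teacher_a = ["WTS", "EXS", "EXS", "GDS", "EXS", "EXS", "WTS", "EXS",
             "EXS", "GDS", "EXS", "EXS", "EXS", "WTS", "EXS", "EXS",
             "GDS", "EXS", "EXS", "EXS"]
teacher_b = ["EXS", "EXS", "WTS", "GDS", "EXS", "EXS", "EXS", "EXS",
             "GDS", "EXS", "EXS", "EXS", "EXS", "WTS", "EXS", "EXS",
             "GDS", "EXS", "WTS", "EXS"]

n = len(teacher_a)
observed = sum(a == b for a, b in zip(teacher_a, teacher_b)) / n

# Chance agreement: the probability both raters pick the same category
# if each simply followed their own overall distribution of judgements.
freq_a, freq_b = Counter(teacher_a), Counter(teacher_b)
expected = sum((freq_a[c] / n) * (freq_b[c] / n)
               for c in set(teacher_a) | set(teacher_b))

kappa = (observed - expected) / (1 - expected)
print(f"raw agreement: {observed:.0%}, "
      f"chance agreement: {expected:.0%}, kappa: {kappa:.2f}")
```

A kappa of about 0.35 is usually read as only fair-to-moderate agreement; the headline 70% hides the fact that, with so many pieces judged ‘expected’, two raters would agree more than half the time by guessing alone.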

3. Summative reporting

A MANAGEABLE SYSTEM FOR RECORD-KEEPING IS IN OPERATION TO TRACK AND REPORT ON PUPILS’ LEARNING IN SCIENCE

I just want to laugh at this. I have some systems for record-keeping which are, in themselves, quite manageable once we have some real data. Where we have testable information, for example factual knowledge, they might also mean something; but as most of us will know, they quickly become a token gesture, because teachers do not have the time to gather sufficient evidence to back up every statement, and records soon turn into ‘rule of thumb’ exercises. I note that one of the examples in the guide is the use of the old APP rubric, which is no longer relevant to the new curriculum. We made the best of this in our school, in a way that I devised to be as sure of the level as possible, but even then we knew that our observations were best guesses. The recording system is only as good as the information which is entered, despite a widespread misconception that records and assessment are the same thing! I’m no longer surprised, although still dismayed, at the number of people who believe the statistics generated by the system.

I didn’t intend this to be a balanced analysis – I’d welcome other perspectives – and I apologise to all involved for my negativity, but we’re clearly still a long way from a satisfactory system of assessing primary science. The model cannot work unless we don’t care about reliability, validity or manageability; but in that case, we need no model. If we want a fair assessment of primary science, with data on pupils’ attainment and progress that we feel is dependable, then we need something else. In my view, that only begins to be attainable if we make creative use of technology. Otherwise, perhaps we have been led on a wild goose chase, pursuing something that may be neither desirable nor achievable.

Some aspects of science are amenable to testing, as they were in the SATs. I concede the argument that the tests were inadequate for assessing the whole of science, particularly the important parts of enquiry and practical skills, but I don’t believe anything we’ve been presented with since has been adequate either, and the loss of science’s status was not a reasonable pay-off. To be workable, assessment systems have to be as simple and sustainable as possible. Until we can address that, if we have to have tracking data (and that’s highly questionable), perhaps we should consider returning to testing to assess science knowledge and forget trying to obtain reliable data on performance and skills – descriptive reporting on these aspects may have to be sufficient for now.

Primary Science Assessment – Not Even Close, Yet

Assessment of primary science is something of a bugbear of mine. While I consider so-called ‘formative assessment’ (it should never have been called ‘assessment’) to be no more or less of a challenge than in the other core subjects, summative assessment of science is different. There is a multitude of research papers and writings on just how difficult it is to assess properly for any type of measurement, particularly to track progress and for accountability purposes. In the UK, the decline of science since the demise of the KS2 SATs test has passed into legend. Check out Ofsted’s Maintaining Curiosity for an official account of just how dire the situation is. It’s now been six years since that event, however, and the protagonists in the world of science education and assessment have pretty much failed to come up with anything manageable and reliable. I’m not surprised; I think the job is almost impossible. However, I am surprised that they continue to fool themselves into thinking that it isn’t. Examples of advice from the most authoritative of sources are here and here, and I’m very appreciative of their efforts, but I look at these and my heart sinks. I can’t imagine these ideas being put into effective practice in real primary schools.

When I was pushing to influence the protagonists, before they finished their projects and put their suggestions out to teachers, I compiled a list of questions which I felt needed to be addressed in thinking about assessment in primary science. I see very little to give me hope that these have been addressed. My main concern is that there is a persistent belief in the ‘magic’ of teacher assessment and moderation serving a high-stakes purpose.

Formative/summative
  • Should we really be dissolving the formative/summative divide?
    • I have seen much confusion amongst teachers as to the purposes of assessment and they often conflate summative and formative, unwittingly, to the detriment of both.
  • Isn’t there more clarity needed on just how assessment can be made to serve different purposes?
    • Isn’t there a fair amount of debate about this in the literature?
  • How do we avoid serving neither very well?
  • How do we use formative information for summative purposes when this is often information gained in the early stages of learning and therefore not fair to pupils who may have progressed since its capture?
  • If summative assessments are to be used for high stakes purposes, how do we ensure that summarised, formative information really quantifies attainment and progress?
  • How can we avoid teachers always assessing instead of teaching?
Teacher assessment
  • Can we really resolve the issue of unreliability of teacher assessment when used in high-stakes settings?
  • Is it fair to expect teachers to carry out teacher assessments when they are directly impacted by the outcome of those assessments?
  • How do we make teacher assessment fair to all the children in the country if it is not standardised? How do we avoid a ‘pot-luck’ effect for our pupils?
  • Have we really addressed the difficulty of assessing science as a multi-faceted subject?
  • How can we streamline this process?
  • How can we make sure it doesn’t feel to teachers as though they would be assessing science all the time?
Moderation and reliability
  • Are researchers assuming that moderation is a simple and effective ‘catch all’ to achieve reliability?
  • Do researchers know that this often feels like something that is done ‘to’ teachers and not part of a collaborative process?
    • This is a fraught process in many schools. It takes up an enormous amount of time and can be very emotional if judgements are being made and if there are disagreements. Moderation helps to moderate extremes, but can also lead groups in the wrong direction.
  • Will schools be able to give over the time required to adequately moderate science?
  • Is there really a good evidence base for the effectiveness of moderation on reliability?
  • Do we need to clarify the exact process of moderation?
  • Is ‘reliable’ something that is actually achievable by any assessment system? Should we not be talking about maximising rather than achieving reliability?

Is it really necessary to keep inventing our own wheels?

In other arenas I’ve written extensively on the loss of status of primary science in British schools following the removal of the end of Key Stage written assessments in 2009. In their latest report, ‘Maintaining Curiosity’, Ofsted point out the detrimental impact this appears to have had, if you take international comparisons such as PISA as a measure. In my discussions with primary school subject leaders for science, it’s clear that science is the ‘poor relation’ among the core subjects. How much is it tracked and monitored by senior management?

The difficulty has always been that science is really impossible to assess properly. It is a multi-dimensional, complex subject with a host of different aspects and a strongly practical component. If you believe you are reliably assessing your pupils in science, you are wrong. I have yet to be blown away by any of the models of science assessment I have seen during my research in schools in the UK and around the world, but I am convinced that software could help in so many ways. Unfortunately there seems to be no motivation to marry the educational world with the world of computer simulations, game software and interactive apps.

Currently, all I’m looking for is a responsive rubric for the new curriculum – read the statement, click on it, and the software does the boring part, so that the teacher can build up a written summary and analyse the data statistically for classes, groups, individuals and so on. I’m convinced an ‘IT expert’ could knock that out in an hour. It’s taking me a lot longer.
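
To show how little I’m actually asking for, here is a bare-bones sketch of the idea. Everything in it is hypothetical – the statement codes, pupil names and function names are placeholders of my own, and a real version would need a clickable interface and proper storage rather than function calls – but the core really is this small: store a judgement per pupil per statement, then let the software write the summaries and do the counting.

```python
# A minimal sketch of a 'responsive rubric': record whether each pupil
# is secure on each curriculum statement, then generate summaries and
# simple statistics automatically. All names below are placeholders.
from collections import defaultdict

STATEMENTS = {  # hypothetical statement codes -> wording
    "Y6-EL1": "can describe how living things are classified",
    "Y6-EL2": "can give reasons for classifying plants and animals",
    "Y6-L1":  "can explain that light appears to travel in straight lines",
}

records = defaultdict(dict)   # pupil -> {statement code: True/False}

def record(pupil: str, code: str, secure: bool) -> None:
    """The 'click': log whether a pupil has met a statement."""
    records[pupil][code] = secure

def pupil_summary(pupil: str) -> str:
    """The boring part: draft a written summary from the clicks."""
    met = [STATEMENTS[c] for c, ok in records[pupil].items() if ok]
    gaps = [STATEMENTS[c] for c, ok in records[pupil].items() if not ok]
    return (f"{pupil}: {len(met)}/{len(records[pupil])} statements secure. "
            f"Next steps: {'; '.join(gaps) if gaps else 'none recorded'}.")

def statement_stats(code: str) -> float:
    """Proportion of the class secure on one statement (for analysis)."""
    marks = [r[code] for r in records.values() if code in r]
    return sum(marks) / len(marks) if marks else 0.0

# Usage: a few clicks' worth of data, then the boring part done for us.
record("Pupil A", "Y6-EL1", True)
record("Pupil A", "Y6-L1", False)
record("Pupil B", "Y6-EL1", True)
print(pupil_summary("Pupil A"))
print(f"Y6-EL1 secure: {statement_stats('Y6-EL1'):.0%}")
```

The rest – groups, classes, trend lines – is just more of the same counting. The hard part has never been the software; it’s deciding what a ‘click’ should be allowed to mean.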