Assessment of primary science is something of a bugbear of mine. While I consider so-called ‘formative assessment’ (it should never have been called ‘assessment’) to be no more or less of a challenge than in the other core subjects, summative assessment of science is different. There is a multitude of research papers and writings on just how difficult it is to assess the subject properly for any kind of measurement, particularly for tracking progress and for accountability purposes. In the UK, the decline of science since the demise of the KS2 SATs test has passed into legend. Check out OFSTED’s Maintaining Curiosity for an official account of just how dire the situation is. It’s now been six years since that event, however, and the protagonists in the world of science education and assessment have pretty much failed to come up with anything manageable and reliable. I’m not surprised; I think the job is almost impossible. However, I am surprised that they continue to try to fool themselves into thinking that it isn’t. Examples of advice from the most authoritative of sources are here and here, and I’m very appreciative of their efforts, but I look at these and my heart sinks. I can’t imagine these ideas being put into effective practice in real primary schools.
When I was pushing to influence the protagonists, before they finished their projects and put their suggestions out to teachers, I compiled a list of questions that I felt needed to be addressed in thinking about assessment in primary science. I see very little to give me hope that these have been addressed. My main concern is that there is a persistent belief in the ‘magic’ of teacher assessment and moderation serving a high-stakes purpose.
- Should we really be dissolving the formative/summative divide?
  - I have seen much confusion amongst teachers as to the purposes of assessment; they often conflate summative and formative, unwittingly, to the detriment of both.
- Isn’t there more clarity needed on just how assessment can be made to serve different purposes?
  - Isn’t there a fair amount of debate about this in the literature?
- How do we avoid serving neither very well?
- How do we use formative information for summative purposes when this is often information gained in the early stages of learning, and therefore not fair to pupils who may have progressed since its capture?
- If summative assessments are to be used for high-stakes purposes, how do we ensure that summarised formative information really quantifies attainment and progress?
- How can we avoid teachers always assessing instead of teaching?
- Can we really resolve the issue of the unreliability of teacher assessment when used in high-stakes settings?
- Is it fair to expect teachers to carry out teacher assessments when they are directly impacted by the outcome of those assessments?
- How do we make teacher assessment fair to all the children in the country if it is not standardised? How do we avoid a ‘pot-luck’ effect for our pupils?
- Have we really addressed the difficulty of assessing science as a multi-faceted subject?
- How can we streamline this process?
- How can we make sure it doesn’t feel to teachers as though they would be assessing science all the time?
- Are researchers assuming that moderation is a simple and effective ‘catch-all’ to achieve reliability?
- Do researchers know that this often feels like something that is done ‘to’ teachers and not part of a collaborative process?
  - This is a fraught process in many schools. It takes up an enormous amount of time and can be highly emotional when judgements are being made and disagreements arise. Moderation helps to moderate extremes, but it can also lead groups in the wrong direction.
- Will schools be able to give over the time required to moderate science adequately?
- Is there really a good evidence base for the effectiveness of moderation on reliability?
- Do we need to clarify the exact process of moderation?
- Is ‘reliable’ something that is actually achievable by any assessment system? Should we not be talking about maximising rather than achieving reliability?