Testing their resilience

After nearly three decades of primary teaching, you’d think I’d have a secure philosophy, but we’re currently caught in the variable winds of different approaches – some considered more traditional, some more progressive, and often apparently contradictory. I can’t follow the narrow path of either, but I do question myself on what my own philosophy is. If anything, it’s that education is paramount – and by that I don’t mean ‘learning’, and I definitely don’t mean ‘attainment’, as I have blogged about here before.

I tried to give my class an analogy last week. I drew it as a bowl into which different pieces of knowledge were put. There’s no predicting what will be useful knowledge, but what is certain is that the more there is and the wider the range, the better equipped they will be to link it together in an effective way. This bears no relationship to fixed notions of attainment, cleverness, ability, SEND, class or all the other supposed divisions this country likes to impose on its citizens. Anybody can add to their bowl of knowledge at any time.

With that in mind, I want to ride roughshod over anything I see as an impediment to the education of each and every human being. It’s a global goal for good reason, and it’s not difficult to think of examples where the high-attaining but poorly educated have had a negative impact on all of us – and on the planet.

I try hard to explain to my pupils that the important thing is ‘knowing stuff’. They think it’s important to be ‘good at stuff’, and increasingly I’m stunned by some of the reactions and self-denigration I’m seeing: a small error causes a child to throw their book on the floor; not immediately ‘getting’ long division makes another child throw his hands up and begin to weep; the feeling that something is hard stops a third from ever tackling a new area. Simply talking about ‘growth mindset’ has no impact.

Is it really all down to the high-stakes testing culture we’re in? Though I totally agree with the arguments against it, I struggle to accept that it is the whole issue – mainly because there has been high-stakes testing for as long as I’ve been teaching, yet this feels like a new phenomenon. I don’t think I’ve previously encountered pupils with such a fragile sense of their own ability to overcome small setbacks, no matter how much I tell them that learning happens when we put right what we got wrong.

Recent whole-staff CPD seems to pull us in opposite directions. In some sessions, we’re told to challenge pupils to step outside their comfort zone; in others, that the stress of our system is damaging their mental health. It’s difficult to know when to persevere and when to back off, and I’ve always found it hard to do the latter. I’m a fan of tests, for example, even when they’re not embraced by every child in the class. In the past, my classes were tested regularly on old SATs papers without any of the symptoms I’m seeing recently. Tests help us establish what we can remember and identify what we don’t know. Furthermore, they aid memory. This is now common knowledge, though it wasn’t six years ago when I first came across Roediger’s work.

This year, a SATs-style arithmetic test caused Ferdy* to cry and put his head on the desk when we marked it. The second one caused him to wail when it was announced, and he was furious with himself again when we marked it. He wasn’t ‘doing badly’; he just wasn’t perfect. It was nearly impossible to help him with the misconceptions we could identify – it was as if he thought it was magic and he didn’t have the special ability. At this point it seemed like cruelty to force the poor lad to go through more. I was conscious, however, that it’s not actually torture – it’s just maths. I feel pretty sure that giving up and giving in – avoidance – reinforces the negativity and doesn’t help Ferdy in the long run. So I persevered and gave them all another test two weeks later, and then another. Ferdy has pretty much sorted out every misconception that was revealed in the first test. He correctly calculated all the long divisions – his nemesis. So it paid off. Ferdy’s sense of his ability to overcome obstacles is strengthened and he’s very pleased with himself. It’s a bit of a relief to me, too.

*He’s not really called Ferdy.

Assessment for Accountability – Taking the Biscuit

Thinking about assessment and accountability again. I adapted this from a letter I wrote to the then Ed Select Committee.

The problem of accountability

If we take it to be the case that teachers and schools need to be ‘held to account’, then we need to ask ourselves some questions.

Held to account for what?

The answer to this is crucial. For a long time we were held to account for pupil ‘attainment’. Recently there has been the reasonable suggestion that many factors outside our control impact on attainment, and that progress might be a better measure of how good or bad we are. The measurement of progress nevertheless remains a massive challenge, in spite of attempts to contextualise it or to use national trends for comparison: baseline data is not reliable (it’s ludicrous to believe that you can use the behaviour of 4-year-olds to derive data that will hold teachers to account at the end of KS2 – and beyond!); pupils do not make standard amounts of progress; the domains at the start and end of the progress measurement are different (GCSE art teachers, beware!); cohorts are different; statistical significance is difficult with small groups, and so on.
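To put a number on that last point, here is a toy simulation (entirely my own, with invented figures – not any official model): two schools with identical teaching quality, each with a cohort of 30, whose average ‘progress’ scores differ purely by chance.

```python
# Toy simulation: two schools with the same 'true' effect (zero), cohorts of 30.
# Pupil-level variation alone produces year-to-year gaps between the schools.
import random

random.seed(1)

def cohort_mean_progress(n=30, true_effect=0.0, noise_sd=5.0):
    """Mean 'progress score' for one cohort; noise_sd is an invented pupil-level spread."""
    return sum(random.gauss(true_effect, noise_sd) for _ in range(n)) / n

for year in range(2014, 2018):
    a, b = cohort_mean_progress(), cohort_mean_progress()
    print(f"{year}: school A {a:+.2f}, school B {b:+.2f}, gap {a - b:+.2f}")
```

Neither school is better, yet the printed gaps swing from year to year – exactly the noise that small-group accountability measures mistake for signal.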

Whilst ‘progress’ seems at first fairer and preferable to ‘attainment’, neither is sufficient for our purposes – not the way it is currently measured, and not when matched against the aspirations of the National Curriculum.

Does the current system even serve the right purpose?

In the drive to measure ‘attainment’ and ‘progress’, I think we sometimes forget that we are using these things as proxies for the quality of the education being provided. We need to return to the drawing board for how we might ensure this happens. Currently we use an assessment system that cannot do this; the measurement is too narrow, subject to chance variables, and too much driven by fear of failure, leading to all the perverse incentives the assessment experts have been writing about for so many decades. A quality primary education is not ensured by testing the very few items that are currently measured at the end of KS2, any more than a quality factory is ensured by eating one of its biscuits.

I think schools and teachers do want to provide a good education for their pupils, and a climate of fear is unnecessary and counterproductive. Real accountability must involve a move away from looking only at outcomes and focus instead on quality input: we need well-educated teachers with excellent, well-maintained subject knowledge; quality textbooks produced by experts and thoroughly vetted by the profession; online materials and required reading; use of evidence and avoidance of fads. Quality training is essential, as is the development and retention of teachers with expertise.

How can we ensure accountability where it counts?

Realistically, I don’t expect a quick move away from summative assessments for the purpose of accountability, in spite of all the arguments against. But I feel that we could address some of the issues that arise with the current system by generating and providing (to the DfE if need be) not less but more information:

  • Frequent, low-stakes tests help both teaching and learning – require/provide tests throughout the year, every year.
  • Fine grained, specific tests provide useful information – test what we want to know about – keep that data.
  • Assessing the same domain more than once and in different ways helps to reduce unreliability – do not rely on one single end of year test.
  • Testing earlier in the cycle gives useful feedback for teaching – do not wait until the end of the year or the end of the Key Stage.
  • Random selection from a broad range of criteria helps to reduce ‘teaching to the test’ – test knowledge in all curriculum areas without publishing a narrow list of criteria.
  • Use assessment experts and design assessments that test what we want pupils to know or do. Criteria need to be reasonable – not obscure and mystical as they have been recently.

If these aspects were applied to an assessment system throughout the primary phase, I believe we could enhance learning, improve accountability in what really matters and provide vast amounts of data.
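As a sketch of the ‘random selection from a broad range of criteria’ bullet above (the criterion and item names are invented for illustration), a test assembled by sampling from a broad, unpublished pool gives far less purchase to teaching-to-the-test:

```python
# Build a short test by sampling criteria from a broad pool, then one item per
# criterion. Criterion and item names are invented for illustration.
import random

item_bank = {
    "number: long division": ["item_101", "item_102", "item_103"],
    "geometry: properties of shapes": ["item_201", "item_202"],
    "science: circuits": ["item_301", "item_302", "item_303"],
    "reading: inference": ["item_401", "item_402"],
}

def build_test(bank, n_criteria=3, seed=None):
    """Any criterion may come up, so narrowing teaching to a published list pays off less."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(bank), k=n_criteria)
    return [(criterion, rng.choice(bank[criterion])) for criterion in chosen]

print(build_test(item_bank, seed=2024))
```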

We really need to make better use of technology at all stages; this is the only way in which we can feasibly make assessment serve multiple purposes. There would need to be a move away from the high stakes pass/fail system which is not fit for purpose, towards a timely monitoring and feedback system that could alert all stakeholders to issues and provide useful tools for intervention. Data collected from continuous low-stakes assessments provides a far more valid picture of teaching and learning.
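What might ‘timely monitoring’ look like? A hedged sketch (the window size and alert threshold here are invented): a rolling average over recent low-stakes checks, flagging a dip early instead of waiting for a single end-of-stage pass/fail event.

```python
# Rolling-average monitor over continuous low-stakes scores (fractions 0.0-1.0).
# Window size and threshold are illustrative only.
from collections import deque

def make_monitor(window=5, alert_below=0.6):
    scores = deque(maxlen=window)
    def record(score):
        scores.append(score)
        avg = sum(scores) / len(scores)
        return f"ALERT: rolling average {avg:.0%}" if avg < alert_below else f"ok ({avg:.0%})"
    return record

track = make_monitor()
for s in [0.8, 0.7, 0.55, 0.5, 0.4]:
    print(track(s))
```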

Ed Select Committee report – improvements to come?

The Education Select Committee has published its report into the impact of the changes to primary assessment. It’s been an interesting journey from the point at which I submitted written evidence on primary assessment; I wrote a blog back in October in which I doubted there would be much response, but in fact I was wrong. Not only did they seem to draw widely on practitioners, stakeholders and experts to give evidence, the report actually suggests that they listened quite well and, more to the point, understood the gist of what we were all trying to say. For anyone who has followed assessment research, most of this is nothing new – similar things have been said for decades. Nevertheless, it’s gratifying to have some airing of the issues at this level.

Summative and formative assessment

The introduction to the report clarifies that the issues being tackled relate to summative assessment and not the ongoing process of formative assessment carried out by teachers. For me, this is a crucial point, since I have been trying, sometimes with difficulty, to explain to teachers that the two purposes should not be confused. This matters because the original report on assessment without levels suggested that schools had carte blanche to create their own systems. Whilst it also emphasised that purposes needed to be clear, many school systems were either extensions of formative assessment that failed to grasp the implications and requirements of summative purposes, or clumsy attempts to create tracking systems based on data that had not been derived from reliable assessment!

Implementation and design

The report is critical of the timescale and the numerous mistakes made in the administration of the assessments. It is particularly critical of the STA, which was seen as chaotic and insufficiently independent. Furthermore, it criticises Ofqual for a lack of quality control, in spite of Ofqual’s own protestations that it had scrutinised the materials. The report recommends an independent panel to review the process in future.

This finding is pretty damning. This is not some tin-pot state setting up its first exams – how is incompetence becoming normal? In a climate of anti-expertise, I suppose it is to be expected, but it will be very interesting to see if the recommendations have any effect in this area.

The Reading Test

The report took on board the widespread criticism of the 2016 Reading Test. The STA defence was that it had been properly trialled and performed as expected. Nevertheless, the good news (possibly) is that the Department has supposedly “considered how this year’s test experience could be improved for pupils”.

Well we shall see on Monday! I really hope they manage to produce something that most pupils will at least find vaguely interesting to read. The 2016 paper was certainly the least well-received of all the practice papers we did this year.

Writing and teacher assessment

Teacher assessment of writing emerged as something that divided opinion. On the one hand there were quotes from heads who suggested that ‘teachers should be trusted’ to assess writing. My view is that they miss the point, and I was very happy to be quoted alongside Tim Oates as having deep reservations about teacher assessment. I’ve frequently argued against it for several reasons (even when moderation is involved) and I believe that those who propose it may be confusing the different purposes of assessment, or failing to see that it’s not about ‘trust’ but about fairness to all pupils – and that it places an unacceptable burden on teachers.

What is good to see, though, is how the Committee have responded to our suggested alternatives. Many of us referred to ‘Comparative Judgement’ as a possible way forward. The potential of comparative judgement as an assessment method is not new, but is gaining credibility and may offer some solutions – I’m glad to see it given space in the report. Something is certainly needed, as the way we currently assess writing is really not fit for purpose. At the very least, it seems we may return to a ‘best-fit’ model for the time being.

For more on Comparative Judgment, see:

Michael Tidd  The potential of Comparative Judgement in primary

Daisy Christodoulou Comparative judgment: 21st century assessment

No More Marking

David Didau  10 Misconceptions about Comparative Judgement

Support for schools

The report found that the changes were made without proper training or support. I think this is something of an understatement. Systems were changed radically without anything concrete to replace them. Schools were left to devise their own systems, and it’s difficult to see how anyone could have failed to foresee that this would be inconsistent and often inappropriate. As I said in the inquiry, there are thousands of primary schools finding thousands of different solutions. How can that be an effective national strategy, particularly as, by their own admission, schools lacked assessment expertise? Apparently some schools adopted commercial packages which were deemed ‘low quality’. This, too, is no surprise.

I know that there are teachers and headteachers who strongly support the notion of ‘doing their own thing’, but I disagree, and have referred to it in the past as the ‘pot-luck’ approach. There will be ways of doing things that are better than others. What we need to do is make sure we are implementing the most effective methods, not leaving it to the whim of individuals. Several times, Michael Tidd has repeated that we were offered an ‘item bank’ to help teachers with ongoing assessment. The report reiterates this, but I don’t suggest we hold our collective breath.

High-stakes impact and accountability

I’m sure the members of the Assessment Reform Group, and other researchers of the 20th century, would be gratified to know that this far down the line we still need to point out the counterproductive nature of high-stakes assessment for accountability! Nevertheless, it’s good to see it re-emphasised in no uncertain terms, and the report is very clear about the impact on well-being and on the curriculum. I’m not sure that their recommendation that Ofsted broadens its focus (again), particularly including science as a core subject, is going to help. Ofsted has already reported on the parlous state of science in the curriculum, but the subject has continued to lose status since 2009. This is a direct result of the assessment of the other subjects: what is assessed for accountability has status; what is not, does not. The ASE argues (and I totally understand why) that science was impoverished by the test at the end of the year. Nevertheless, science has been impoverished far more, subsequently, in spite of sporadic ‘success stories’ from some schools. This is a matter of record. (pdf)

Teacher assessment of science for any kind of reliable purpose is even more fraught with difficulties than the assessment of writing. The farce, last year, was schools trying to decide whether they really were going to give credence to the myth that their pupils had ‘mastered’ all 24 of the objectives, or whether they were going to ‘fail’ them. Added to this is the ongoing irony that primary science is still ‘sampled’ using an old-fashioned conventional test. Our inadequacy in assessing science is an area that is generally ignored or, to my great annoyance, completely unappreciated by bright-eyed believers who offer ‘simple’ solutions. I’ve suggested that complex subjects like science can only be adequately assessed using more sophisticated technology, but edtech has stalled in the UK, so I hold out little hope for developments in primary school!

When I think back to my comments to the inquiry, I wish I could have made myself clearer in some ways. I said that if we want assessment to enhance our pupils’ education, then what we currently have is not serving that purpose. At the time, we were told that if we wished to comment further on the problem of accountability, we could write to the Committee, which I did. The constant argument has always been ‘…but we need teachers to be accountable.’ I argued that they need to be accountable for the right things, and that a single yearly sample of small populations under test conditions does not ensure this. This was repeated by so many of those who wrote evidence for the Committee that it was obviously hard to ignore. The following extract from their recommendations is probably the key statement from the entire process. If something changes as a result of this, there might be a positive outcome after all.

Many of the negative effects of assessment are in fact caused by the use of results in the accountability system rather than the assessment system itself. Key Stage 2 results are used to hold schools to account at a system level, to parents, by Ofsted, and results are linked to teachers’ pay and performance. We recognise the importance of holding schools to account but this high-stakes system does not improve teaching and learning at primary school. (my bold)

Perverse incentives are real

I’ve just spent a few pleasurable hours looking at the science writing from my y6 class. I say pleasurable, because they’re very good writers this year (thanks Mr M in y5!), but also because there were elements of their writing that hinted at an education. Some children had picked up on, and correctly reinterpreted, the higher level information I had given in reply to their questions on the chemistry of the investigation. All of them had made links with ‘the real world’ following the discussions we’d had.

It all sounds good, doesn’t it?

The sad truth is that, in spite of the fact that I’m an advocate of education, not attainment, the knowledge of what will and will not form part of the end-of-year measurement is still there, influencing my decisions and having a detrimental impact on my education of the children.

This is because while I am marking their work, I am making decisions about feedback and whether to follow up misconceptions or take understanding further. Let’s remember that this is science. Although I personally view its study as crucial, and its neglect as the source of most of the world’s ills, it has nevertheless lost its status in the primary curriculum. So my thoughts are: ‘Why bother? This understanding will not form part of any final assessment, and no measurement of it will be used to judge the effectiveness of my teaching, nor of the school.’ Since this is true for science, still nominally a ‘core subject’, how much more so for the non-entities of art, music, DT, etc.? Is there any point in pursuing any of these subjects in primary school in an educational manner?

The argument, of course, is that we have an ethical responsibility as educators to educate. That teachers worth their salt should not be unduly swayed by the knowledge that a narrow set of criteria for a small population of pupils are used at the end of KS2 to judge our success or failure. It reminds me of the argument that senior leaders shouldn’t do things just for OFSTED. It’s an unreasonable argument. It’s like saying to the donkeys, ‘Here’s a carrot and a very big stick, but just act as you would if they weren’t there!’

I’m not in favour of scrapping tests and I’m no fan of teacher assessment, but it’s undeniable that what I teach is influenced by the KS2 SATs, and not all in a good way. The primary curriculum is vast; the attainment tests are narrow. It also brings into question all research based on using attainment data as a measure of success. Of course it’s true that the things they measure are important – they may even indicate something – but there are a lot of things which aren’t measured which may indicate a whole lot of other things.

I can’t see how we can value a proper primary education – how we can allow the pursuit of further understanding – if we set such tight boundaries on how we measure it. Testing is fine, but if it doesn’t measure what we value then we’ll only value what it measures. I’m resistant to that fact, but I’m not immune, and I’m sure I’m no different from any other primary teacher out there. Our assessment system has to change so that we can feel fine about educating our pupils and not think we’re wasting our time if we pursue an area that doesn’t count towards a final mark.

Primary assessment is more than a fiasco – it’s completely wrong

I’ve written my submission to the Education Committee’s inquiry on primary assessment, for what it’s worth. I can’t imagine that they’re interested in what we have to say, given that this government has ignored just about all the expert advice it has ever received or requested on nearly everything else. This country has ‘had enough of experts’, after all.

I won’t paste my submission here – there are various restrictions on publishing them elsewhere, it seems. However it’s a good time to get some thoughts off my chest. Primary assessment (and school-based assessment generally) has all gone a bit wrong. OK, a lot wrong. It’s so wrong that it’s actually very damaging. Conspiracy theorists might have good cause to think it is deliberate; my own cynicism is that it is underpinned by a string of incompetencies and a distinct failure to listen at all to any advice.

In thinking about why it has all gone wrong, I want to pose a possibly contentious question: should the attainment we are attempting to measure dominate all educational efforts and discourse? I’ve written before about my growing doubts about the over-emphasis on attainment and how I think it detracts from the deeper issue of education. The further we get down this line, particularly with the current nonsense about bringing back selective education, the more this crystallises for me. Just to be clear, this is not an anti-intellectual stance, nor a woolly, liberal, dumbing-down view. I fully embrace the idea that we should not put a ceiling on any kind of achievement for anybody. Having a goal and working towards it – having a way of demonstrating what you have achieved – is an admirable thing. What I find ridiculous is that the kind of attainment that is obsessing the nation doesn’t actually mean very much, and yet somehow we are all party to serving its ends.

Put it this way: tiny fluctuations in scores in a set of very narrow domains make headlines for pupils, teachers, schools, counties, etc. Every year we sweat over the percentages. If there’s a rise above the ‘expectation’, we breathe a sigh of relief. If, heaven forbid, we had a difficult cohort and a couple of boxes are in the ‘blue zone’, we dread the repercussions, because now we’re no longer an outstanding school. But, as Jack Marwood writes here, there’s no pattern. We’ve even begun to worry about whether we’re going to be labelled a ‘coasting school’! Good should be good enough, because the hysteria over these measures is sucking the life out of the most important resource – us. Of course the inspectorate needs to be on the lookout for genuinely bad schools. Are these really going to be so difficult to spot? Is it really the school that was well above average in 2014 and 2015 but dipped in 2016? Is the child who scores 99 on the scaled score so much more of a failure than the one who scored 101? Is our group of four pupil premium children getting well above average, in a small set of tests, an endorsement of our good teaching compared to another school’s four getting well below?
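To illustrate the 99-versus-101 point, here is a toy simulation (the error figures are invented, not the actual test’s reliability data): a pupil whose ‘true’ level sits exactly at the expected standard, resitting many hypothetical versions of the same test with a little measurement error.

```python
# A pupil whose true scaled score is exactly 100, observed with a small amount
# of measurement error. How often does chance alone put them below the line?
import random

random.seed(7)

def observed_scaled_score(true_score=100.0, error_sd=3.0):
    return round(random.gauss(true_score, error_sd))

trials = 10_000
below = sum(observed_scaled_score() < 100 for _ in range(trials))
print(f"'Failed to meet the standard' in {below / trials:.0%} of sittings")
```

On these invented numbers it comes out at a little under half of sittings – the same child, the same knowledge, a different side of the line.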

Attainment has become an arms race, and teachers, pupils and parents are caught in the crossfire. In spite of the ‘assessment without levels’ rhetoric, all our accountability processes are driven by a focus on attainment against one big level. This is incredibly destructive in my experience. Notwithstanding those self-proclaimed paragons of good practice who claim that they’ve got the balance right, what I’ve mainly seen in schools are teachers at their wits’ end, wondering what on earth they can do next (what miracle of intervention they can concoct) to ‘boost’ a group of ‘under-performing’ children to ‘meeting’, whilst maintaining any kind of integrity with regard to the children who have never been anywhere near it. I was recently told in a leadership meeting that all children should make the same amount of progress – that ‘middle achievers’ should be able to progress at the same rate as the ‘high achievers’. The opposite is true: the high achievers are where they are exactly because they made quicker progress, but the ‘middle achievers’ (and any other category – good grief!) will also get there, given time. And while all this talk of progress is on the table, let’s be honest – we’re talking about ‘attainment’ again: a measure taken from their KS2 assessments, aggregated, and compared to KS1 in a mystical algorithm.
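For anyone unfamiliar with the ‘mystical algorithm’, here is a minimal sketch of how a value-added progress measure of this general kind works – a pupil’s progress is their KS2 result minus the average KS2 result of pupils nationally who started from the same KS1 baseline. The groupings and numbers below are invented for illustration, not the DfE’s actual tables.

```python
# Invented national averages of KS2 scaled scores by KS1 prior-attainment group.
national_avg_ks2_by_ks1_group = {"low": 96.0, "middle": 101.0, "high": 107.0}

def pupil_progress(ks1_group, ks2_scaled_score):
    """Progress = pupil's KS2 score minus the average for their KS1 starting group."""
    return ks2_scaled_score - national_avg_ks2_by_ks1_group[ks1_group]

cohort = [("middle", 103), ("low", 94), ("high", 106), ("middle", 99)]
school_score = sum(pupil_progress(g, s) for g, s in cohort) / len(cohort)
print(f"School progress score: {school_score:+.2f}")  # tiny cohorts swing this wildly
```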

It’s not as if the issues surrounding assessment have never been considered. Just about all the pitfalls of the recent primary debacle have been written about endlessly, and frequently predicted. High-stakes testing has always been the villain of the piece: perverse incentives to teach to the test, narrowing of the curriculum, invalidity of the testing domain, unreliability/bias/downright cheating, etc. The problem is that the issues won’t go away, because testing is the wrong villain. Testing is only the blunt tool used to fashion the club of attainment with which to beat us (apologies for the extended metaphor).

I’m a big fan of testing. I read Roediger and Karpicke’s (pdf) research on the ‘testing effect’ in the early days, long before it became a fashionable catchphrase. I think we should test as many things in as many ways as we can: to enhance recall; to indicate understanding; to identify weaknesses; to demonstrate capacity; to achieve certification. I was all in favour of Nicky Morgan’s proposal to introduce an online tables test. What a great idea! Only – make it available all the time and don’t use the results against the pupil or the teacher. No, testing doesn’t cause the problem. The problem is caused by the narrow, selective nature, the timing, and the pressure of attaining an arbitrary ‘meeting expectations’ (one big level, post-levels). The backwash on the curriculum is immense. Nothing has any status any more: not art, not music, not D&T, not history nor geography, and certainly not science – that ‘core subject’ of yore! Some might argue that it’s because they’re not tested, and of course I agree up to a point, but the real issue is that they’re not seen as being important in terms of attainment.
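In the spirit of that tables-test proposal, a minimal sketch (entirely my own toy, not the DfE’s actual multiplication check): always available, instant feedback, and the result goes to the pupil and nowhere else.

```python
# A low-stakes, always-available tables check: feedback for the pupil only,
# nothing recorded for accountability.
import random

def tables_quiz(n_questions=5):
    right = 0
    for _ in range(n_questions):
        a, b = random.randint(2, 12), random.randint(2, 12)
        answer = input(f"{a} x {b} = ")
        if answer.strip() == str(a * b):
            right += 1
            print("Correct!")
        else:
            print(f"Not quite: {a} x {b} = {a * b}")
    print(f"{right}/{n_questions} – for your eyes only.")

if __name__ == "__main__":
    tables_quiz()
```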

I shall add a comment here on teacher assessment, just because it continues to drag on in primary assessment like some old ghost that refuses to stop rattling its chains. If teacher assessment is finally exorcised, I will be particularly grateful. It is an iniquitous, corrupted sop to those who believe ‘teachers are best placed to make judgements about their own pupils’. Of course they are – in the day to day running of their class and in the teaching of lessons – but teacher assessment should not be used in any way to measure attainment. I am not arguing that teachers are biased, that they make mistakes or inflate or deflate their assessments. I am arguing that there is simply no common yardstick and so these cannot be considered reliable. The ‘moderated’ writing debacle of 2016 should have put that fact squarely on the table for all doubters to see. Primary assessments are used in accountability. How can we expect teachers to make judgements that could be used against them in appraisal and in pay reviews?

I’m an idealist in education. I think that it has a purpose beyond the establishment of social groups for different purposes (leadership, administrative work, manual labour). I don’t think that it is best served by a focus on a narrow set of objectives and an over-zealous accountability practice based on dubious variations in attainment. I tried to sum up my proposals for the Education Committee, and I will try to sum up my summing up:

  • Stop using small variations in flawed attainment measures for accountability
  • Give us fine-grained, useful but low-stakes testing, for all (use technology)
  • If we have to measure, get rid of teacher assessment and give us lots of common, standardised tools throughout the primary phase
  • Give us all the same technology for tracking the above (how many thousands of teacher hours have been spent on this?)
  • If you have to have end of stage tests, listen to the advice of the experts and employ some experts in test design – the 2016 tests were simply awful
  • Include science
  • Be unequivocal in the purposes of assessment and let everybody know

I didn’t say ‘get rid of the end of key stage assessments altogether and let us focus again on educating our pupils’. Maybe I should have.

Got the T-shirt (a moderate tale)

Given that teacher assessment is a nonsense which lacks reliability, and that moderation cannot really reduce this, nor ensure that gradings are comparable, our moderation experience was about as good as it could be! It was thus:

We two Y6 teachers each submitted all our assessments and named three children in each category (more ridiculous, inconsistent and confusable codes, here). One of each trio was selected, plus another two from each category at random – nine children from each class. We were told who these nine were a day in advance. Had we wanted to titivate, we could have, but with our ‘system’ it really wasn’t necessary.
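For the record, the sampling as I understood it, sketched in code (the category codes and pupil names here are mine, for illustration):

```python
# One pupil picked from each named trio, plus two more drawn at random from
# that category's full class list: three categories x three pupils = nine.
import random

def moderation_sample(named_trios, class_lists, seed=None):
    rng = random.Random(seed)
    sample = []
    for category, trio in named_trios.items():
        picked = rng.choice(trio)
        others = [p for p in class_lists[category] if p != picked]
        sample += [picked] + rng.sample(others, k=2)
    return sample

named = {"WTS": ["Ann", "Ben", "Cal"], "EXS": ["Dee", "Eli", "Fay"], "GDS": ["Gus", "Hal", "Ida"]}
lists = {"WTS": ["Ann", "Ben", "Cal", "Jo"],
         "EXS": ["Dee", "Eli", "Fay", "Kit", "Lou"],
         "GDS": ["Gus", "Hal", "Ida", "Quin"]}
print(moderation_sample(named, lists, seed=1))  # nine names
```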

The ‘system’ was basically making use of the interim statements and assigning each of them a number. Marking since April has involved annotating each piece of work with these numbers to indicate each criterion. It was far less onerous than it sounds and was surprisingly effective in terms of formative assessment. I shall probably use something similar in the future, even if not required to present evidence.

The moderator arrived this morning and gave us time to settle our classes whilst she perused our books. I had been sceptical: I had posted on Twitter that though a moderator would have authority, I doubted they’d have more expertise, and I was concerned about arguing points of grammar and assessment. I was wrong. We could hardly have asked for a better moderator. She knew her stuff. She was a Y6 teacher. We had a common understanding of the grammar and the statements. She’d made it her business to sample moderation events as widely as possible and had therefore had the opportunity to see examples of written work from a wide range of schools. She appreciated our system and the fact that all our written work since April had been done in one book.

Discussion and examination of the evidence by and large led to an agreed assessment. One pupil was raised from working towards; another, whom I had tentatively (and only recently) put forward as ‘greater depth’, was agreed not to have quite made it. The other 16 went through as previously assessed, along with all the others in the year group. Overall, my colleague and I were deemed to know what we were doing! We ought to, but a) the county moderation experience unsettled us and fed my ever-ready cynicism about the whole business, and b) I know it’s easy to be lulled into a false belief that what we’ve agreed is actually the ‘truth’ about where these pupils are. All we can say is that we roughly agreed between the three of us. The limited nature of the current criteria makes this an easier task than the old levels (we still referred to the old levels!), but the error in the system makes it unusable for accountability or for future tracking. I’m most interested to see what the results of the writing assessment are this year – particularly in moderated v non-moderated schools. Whatever they are, they won’t be a reliable assessment but, unfortunately, they will still be used (for good or ill) by senior leaders, and other agencies, to make judgements about teaching.

Nevertheless, I’m quite relieved the experience was a positive one and gratified and somewhat surprised to have spent the day with someone with sense and expertise. How was it for you?

Trialling moderation

A quick one today to cover the ‘trialling moderation’ session this afternoon.

We had to bring all the documents and some samples of pupils’ writing, as expected.

Moderators introduced themselves. They seemed to be mainly Y6 teachers who were also subject leaders for English. Some had moderated before, but obviously not for the new standards.

The ‘feel’ from the introduction to the session was that this wasn’t as big a problem as we had all been making it out to be: we were definitely to use the interim statements, and ‘meeting’ was indeed equivalent to a 4b.

At my table, we expressed our distrust of this idea and our fear that very few of our pupils would meet expected standards. Work from the first pupil was shared and the criteria ticked off. We looked at about three pieces of work. It came out as ‘meeting’, even though I felt it was comparable to the exemplar, ‘Alex’. The second pupil, from the next school, was ‘nearly exceeding’. I wasn’t convinced. There were lots of extended pieces in beautiful handwriting, but the sentence structures were rather unsophisticated. There was arguably a lack of variety in the range and position of clauses and transitional phrases. There was no evidence of writing for any other curriculum area, such as science.

I put forward the work of a pupil I had previously thought to be ‘meeting’ but had then begun to doubt. I wanted clarification. Formerly, I would have put this pupil at a 4a/5c with the need to improve consistency of punctuation. Our books were the only ones on our table (and others) that had evidence of writing across the curriculum; we moved a few years ago to putting all work in a ‘theme book’ (it has its pros and cons!).

Unfortunately the session was ultimately pretty frustrating, as we didn’t get to agree on the attainment of my pupil; I was told that there needed to be evidence of the teaching process that had underpinned the writing in the books. That is to say, there should be the grammar exercises where we had taught such things as ‘fronted adverbials’, and then the written pieces in which that learning was evidenced. I challenged that and asked why we couldn’t just look at the writing, as we had done for the first pupil. By then the session was pretty much over. In spite of the moderator’s attempt to finish the moderation for me, we didn’t. The last part of the session was given over to the session leader coming over and asking if we felt OK about everything, and my reply that no, I didn’t: I still didn’t know which of the multiplicity of messages to listen to, and I hadn’t had my pupil’s work moderated. I had seen other pieces of work, but I didn’t trust the judgements that had been made.

The response was ‘what mixed messages?’ and the suggestion that it may take time for me to ‘get my head around it’ just like I must have had to do for the previous system. She seemed quite happy that the interim statements were broadly equivalent to a 4b and suggested that the government certainly wouldn’t want to see the data showing a drop in attainment. I suggested that if people were honest, that could be the only outcome.

My colleague didn’t fare much better. She deliberately brought samples from a pupil who writes little, but what he does write is accurate, stylish and mature. He had a range of pieces, but most of them were short. The moderator dismissed his work as insufficient evidence, but did inform my colleague that she would expect to see the whole range of text types, including poetry – because otherwise how would we show ‘figurative language and metaphor’?

I’m none the wiser but slightly more demoralised than before. One of my favourite writers from last year has almost given up writing altogether because he knows his dyslexia will prevent him from ‘meeting’. Judging the writing of pupils as effectively a pass or fail is heart-breaking. I know how much effort goes into their writing. I can see writers who have such a strong grasp of audience and style missing the mark by just a few of the criteria. It’s like being faced with a wall – if you can’t get over it, stop bothering.

We are likely to be doing a lot of writing over the next few weeks.

Final report of the Commission on Assessment without Levels – a few things.

I’ve read the report and picked out some things. This is not a detailed analysis, but more of a selection of pieces relevant to me and anyone else interested in primary education and assessment:

Our consultations and discussions highlighted the extent to which teachers are subject to conflicting pressures: trying to make appropriate use of assessment as part of the day-to-day task of classroom teaching, while at the same time collecting assessment data which will be used in very high stakes evaluation of individual and institutional performance. These conflicted purposes too often affect adversely the fundamental aims of the curriculum,

Many of us have been arguing that for years.

the system has been so conditioned by levels that there is considerable challenge in moving away from them. We have been concerned by evidence that some schools are trying to recreate levels based on the new national curriculum.

Some schools are hanging on to them like tin cans in the apocalypse.

levels also came to be used for in-school assessment between key stages in order to monitor whether pupils were on track to achieve expected levels at the end of key stages. This distorted the purpose of in-school assessment,

Whose fault was that?

There are three main forms of assessment: in-school formative assessment, which is used by teachers to evaluate pupils’ knowledge and understanding on a day-to-day basis and to tailor teaching accordingly; in-school summative assessment, which enables schools to evaluate how much a pupil has learned at the end of a teaching period; and nationally standardised summative assessment,

Try explaining that to those who believe teacher assessment through the year can be used for summative purposes at the end of the year.

many teachers found data entry and data management in their school burdensome.

I love it when it’s my own.

There is no intrinsic value in recording formative assessment;

More than that – it degrades the formative assessment itself.

the Commission recommends schools ask themselves what uses the assessments are intended to support, what the quality of the assessment information will be,

I don’t believe our trial system using FOCUS materials and assigning a score had much quality. It was too narrow and unreliable. We basically had to resort to levels to try to achieve some sort of reliability.

Schools should not seek to devise a system that they think inspectors will want to see;

!

Data should be provided to inspectors in the format that the school would ordinarily use to monitor the progress of its pupils

‘Ordinarily’ we used levels! This is why I think we need data based on internal summative assessments. I do not think we can just base it on a summative use of formative assessment information!

The Carter Review of Initial Teacher Training (ITT) identified assessment as the area of greatest weakness in current training programmes.

We should not expect staff (e.g. subject leaders) to devise assessment systems, without having had training in assessment.

The Commission recommends the establishment of a national item bank of assessment questions to be used both for formative assessment in the classroom, to help teachers evaluate understanding of a topic or concept, and for summative assessment, by enabling teachers to create bespoke tests for assessment at the end of a topic or teaching period.

But don’t hold your breath.

The Commission decided at the outset not to prescribe any particular model for in-school assessment. In the context of curriculum freedoms and increasing autonomy for schools, it would make no sense to prescribe any one model for assessment.

Which is where it is ultimately mistaken, since we are expected to be able to make comparisons across schools!

Schools should be free to develop an approach to assessment which aligns with their curriculum and works for their pupils and staff

We have a NATIONAL CURRICULUM!

Although levels were intended to define common standards of attainment, the level descriptors were open to interpretation. Different teachers could make different judgements

Well good grief! This is true of everything they’re expecting us to do in teacher assessment all the time.

Pupils compared themselves to others and often labelled themselves according to the level they were at. This encouraged pupils to adopt a mind-set of fixed ability, which was particularly damaging where pupils saw themselves at a lower level.

This is only going to be made worse, however, by the ‘meeting’ aspects of the new system.

Without levels, schools can use their own assessment systems to support more informative and productive conversations with pupils and parents. They can ensure their approaches to assessment enable pupils to take more responsibility for their achievements by encouraging pupils to reflect on their own progress, understand what their strengths are and identify what they need to do to improve.

Actually, that’s exactly what levels did do! However…

The Commission hopes that teachers will now build their confidence in using a range of formative assessment techniques as an integral part of their teaching, without the burden of unnecessary recording and tracking.

They hope?

Whilst summative tasks can be used for formative purposes, tasks that are designed to provide summative data will often not provide the best formative information. Formative assessment does not have to be carried out with the same test used for summative assessment, and can consist of many different and varied tasks and approaches. Similarly, formative assessments do not have to be measured using the same scale that is used for summative assessments.

OK – this is a key piece of information that is misunderstood by nearly everybody working within education.

However, the Commission strongly believes that a much greater focus on high quality formative assessment as an integral part of teaching and learning will have multiple benefits:

We need to make sure this is fully understood. We must avoid formalising what we think is ‘high quality formative assessment’ because that will become another burdensome and meaningless ritual. Don’t get me started on the Black Box!

The new national curriculum is founded on the principle that teachers should ensure pupils have a secure understanding of key ideas and concepts before moving onto the next phase of learning.

And they do mean 100% of the objectives.

The word mastery is increasingly appearing in assessment systems and in discussions about assessment. Unfortunately, it is used in a number of different ways and there is a risk of confusion if it is not clear which meaning is intended

By leading politicians too. A common understanding of terms is rather important, don’t you think?

However, Ofsted does not expect to see any specific frequency, type or volume of marking and feedback;

OK, it’s been posted before, but it’s worth reiterating. Many SLs and HTs are still fixated on marking.

On the other hand, standardised tests (such as those that produce a reading age) can offer very reliable and accurate information, whereas summative teacher assessment can be subject to bias.

Oh really? Then why haven’t we been given standardised tests and why is there still so much emphasis on TA?

Some types of assessment are capable of being used for more than one purpose. However, this may distort the results, such as where an assessment is used to monitor pupil performance, but is also used as evidence for staff performance management. School leaders should be careful to ensure that the primary purpose of assessment is not distorted by using it for multiple purposes.

I made this point years ago.

Unpicking just one tiny part of interim teacher assessment

We’ve been waiting, but not, I may say, with bated breath. There was no doubt in my mind that the descriptors would be less useful for measuring attainment than a freshly-caught eel. Let’s just look at Reading for KS2 and see how easy it would be to make judgements that would be fair across pupils, classes and schools.

The pupil can:
• read age-appropriate books with confidence and fluency (including whole novels)

  • which books are deemed age-appropriate?
  • define confidence
  • define fluency
  • novels?
  • compare it to the KS1 statement: read words accurately and fluently without overt sounding and blending, e.g. at over 90 words per minute

• read aloud with intonation that shows understanding

  • intonation does not imply understanding. My best orator from last year had no understanding of what he was reading so beautifully.

• work out the meaning of words from the context

  • how is this an end of KS2 requirement? This is what we do from the moment we start to read.

• explain and discuss their understanding of what they have read, drawing inference and justifying these with evidence

  • again – how do we extract the end of KS2 requirement from this? It could apply to Year 1 or PhD level.
  • compare the KS1 requirement: make inferences on the basis of what is said and done

• predict what might happen from details stated and implied

  • again – end of KS2 requirement?
  • compare to KS1 working at greater depth: predict what might happen on the basis of what has been read so far

• retrieve information from non-fiction

  • again – an end of KS2 requirement? To what extent? What level of non-fiction? What type of information? In what way? If a child cannot retrieve information from non-fiction, they are operating at a very much lower level than the end of the key stage.

• summarise main ideas, identifying key details and using quotations for illustration

  • to what extent? Again, this is also a degree level requirement

• evaluate how authors use language, including figurative language, considering the impact on the reader

  • to what extent?

• make comparisons within and across books.

  • what comparisons? ‘This book has animals and this book has machines.’
  • KS1 greater depth: make links between the book they are reading and other books they have read

I feel like I’ve been arguing for many years against the strength of mythological belief in the wonders of teacher assessment. Fortunately, it looks like, at long last, there is some recognition in this report that it cannot be used where reliability is an issue, e.g.

Some types of assessment are capable of being used for more than one purpose. However, this may distort the results, such as where an assessment is used to monitor pupil performance, but is also used as evidence for staff performance management. School leaders should be careful to ensure that the primary purpose of assessment is not distorted by using it for multiple purposes. (p 24)

and the attempt to create assessment statements from the national curriculum objectives is just one clear reason why that is true. Mike Tidd suggests that we are heading towards the demise of statutory teacher assessment used in this way. Good, because it’s been a nightmare we should be happy to wake from!

The nonsense of ‘teacher assessment’ – an analogy

As we approach the start of the new school year, some of us will be continuing to try to make a silk purse out of the sow’s ear of the new assessment requirements, ‘formally’ introduced last year. Whatever system individual schools decide to use to approach this farce, teachers will be expected to make judgements based on ‘teacher assessment’. Almost everywhere, this will be accepted without question, so I’m going to try to outline in simple terms just why I think it does not make sense.

I’m using a high-stakes analogy in which human judgement of performance needs to be seen to be as reliable as possible – the ‘execution’ score in competitive gymnastics, as follows:

  • Six independent, highly skilled judges
  • One individual is judged on one performance at a time (and within a limited time)
  • Each performance has a small number of clearly defined criteria
  • There is no conferring (or moderating!)
  • The maximum score is 10 and points are dropped for errors

These are pretty good conditions for a high degree of reliability and yet the judges still arrive at different scores. Because of that, the top and bottom scores are dropped and the remaining 4 are averaged. Even so, the resulting scores are often ‘disputed’, although queries and official objections are not allowed. The judges are not the coaches and will not be held to account for the performance of the gymnasts.
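The scoring procedure just described, as a few lines of code – drop the single highest and lowest of the six judges’ scores, then average the remaining four:

```python
# Trimmed mean of six judges' scores: discard the top and bottom score,
# then average the middle four.
def execution_score(judge_scores):
    if len(judge_scores) != 6:
        raise ValueError("expected six judges' scores")
    trimmed = sorted(judge_scores)[1:-1]  # drop lowest and highest
    return sum(trimmed) / len(trimmed)

print(execution_score([9.2, 9.0, 8.8, 9.1, 9.6, 8.5]))  # -> 9.025
```

Note how much machinery goes into squeezing reliability out of six expert judges scoring one short performance against a handful of criteria.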

Now let’s compare that with teacher assessment in an English primary school:

  • One class teacher, who is usually not an expert in the subject, the curriculum or assessment
  • 32 individuals are judged on multiple performances in multiple subjects throughout the year
  • There are hundreds of criteria (somewhere in the region of 130 for the core subjects in Year 5)
  • Reliability is expected to be improved by moderation and discussion (conferring!)
  • There is no way to eliminate outlying judgements
  • There is no transparent way to score or translate observations of performance into grades

In most schools, there will be some kind of tracking system whereby teachers are asked to make termly entries along the lines of ‘developing, meeting, exceeding’ and degrees thereof, culminating in a final decision which will indicate pupil attainment (readiness to move to the next stage) and teacher effectiveness for that year. In many cases, in spite of union objections, these judgements will form part of appraisal, promotion and performance-related pay. Is there any way, under those circumstances, that teacher assessment can be reliable enough for the high-stakes purposes expected in English primary schools?