
Transcript of Paul Black's talk

author: Paul Black
description: This transcript is from Paul Black's talk at the 2000 LSC PI Meeting.
published in: talk at LSC 2000 PI meeting
published: 2000
posted to site: 03/03/2000

Well, thank you for that welcome. It's always very difficult to know how you're going to live up to that sort of background, but since it's one of my own making, it's my own fault. Also, I'm honored that you've asked me to speak here, particularly because you've gone to the trouble of bringing me from Britain. This involves risks, like my speaking our old version of English, which might give you some problems. And of course, I'm suffering from a five hour jet lag, so it's after midnight now, and the interesting question for this evening is whether I fall asleep before I send you to sleep.

Anyway, in order that you can wake up, there's a notion that you should get to work now and warm the circuits by looking at these three questions.

FIRST: To what extent do the teachers in your project use formative or classroom-based assessment and why do they do so? Within that, of course, you might think about what those terms mean to you, and consider whether there's a shared meaning.

SECONDLY: On what evidence did you base your response?

THIRDLY: Is that evidence good enough?

So those are the questions which the organisers and myself have posed to you. And now I can sit down and let you get on with it. Next time I stand up, I'll go on for a bit longer. Thank you. There are about 15 minutes for this exercise.

(after the 15 minute break)

In my talk, I want to set out the answers to three questions, as follows :

(a) Is there evidence that improving formative assessment raises standards ?

(b) Is there evidence that there is room for improvement ?

(c) Is there evidence about how to improve formative assessment ?

I have first to make clear my definition of formative assessment. For me, the term has to be interpreted as encompassing all those activities undertaken by teachers, and/or by their students, which provide information to be used as feedback to modify the teaching and learning activities in which they are engaged. I draw attention to three features here. First, "all those activities": the implication here is of a wide range of activities - classroom dialogue, particularly questioning, seat work, homework, class tests, project reports, portfolios, and so on. Second, "and/or by their students": the student's role in formative assessment is particularly important, as I shall emphasise later. Third, "used as feedback to modify": only if assessment information is used like this can it be considered formative. If, for example, tests set on a study are not used to modify the learning in that study, then whilst they might have other uses, they won't be formative. That sets the meaning and the boundaries of what we're trying to talk about.

Now, when we look at the first question, whether the evidence is adequate, then I can say that my colleague Dylan Wiliam and myself reviewed the literature thoroughly in 1997 and published a review article in 1998 (Assessment in Education, Vol. 5, No. 1, pp. 7-74) and we found quite a lot of evidence to justify a positive response to question (a). All I can do here in a short time is give you the flavor of that by just showing you one or two examples. So here's the first example.

This is a piece of work by Frederiksen and White, done in California with 12 science classes in a couple of schools with children aged 12 to 13. They were linking this to evaluating a new course on forces in motion with practical work, emphasis on open inquiry projects, and with the pupils having the criteria of the marking explained to them.

In the experiment there were two matched sets of classes. Laboratory and associated work were the same for all of them. But for one special part of the time, the two sets differed. The experimental classes had reflective discussion about assessing their work in relationship to understanding the science, understanding inquiry and teamwork and communication. The formative assessment was self and peer assessment - they were evaluating what they'd achieved.

The control classes spent similar time discussing the design of the teaching and formulating ways to improve it. So they were doing a job, too, but it wasn't an assessment task for them.

And for all the classes, they used a comprehensive test of basic skills at the outset, three teachers marked their project reports, and there was a test of the physics concepts at the end.

Comparisons of the scores on the projects showed that overall there was a clear gain for the assessment students compared with the control classes. But the data are split into data for three different subgroups of students - those with low, medium and high scores respectively on the comprehensive test of basic skills. For those who were high, there was very little gain, but for the so-called low attainers the gain was of the order of two to three standard deviations. So what was achieved was not just increased learning gain, but also an enhancement of that gain for the lowest attainers and the narrowing therefore of the range of performance, something we would all dearly like to achieve. There was a similar result for the physics conceptual test scores: not very significant gains for the top guys, but quite big gains for those classified as low attainers.

I'd like to draw attention to two features of that study. First of all, the assessment practice that they were experimenting with focused on self and peer assessment. And secondly, the important outcome was not just better learning, but enhancement for the people at the lower end.

A quite different piece of work, the second example, was a meta-analysis by Lynn and Douglas Fuchs done in 1986. So it's quite an old study, published in the journal "Exceptional Children" (Vol. 53(3), pp. 199-208). They studied research reports on the effects of formative evaluation procedures on student achievement. What they did was to select studies with a control group and an experimental group, i.e. traditional experimental studies, with use of curriculum-based data collected at least twice a week and used as feedback for teachers. Decisions about acting on the evidence differed between the studies - there was no uniformly imposed way.

They found 29 studies, eliminated 8 of them because they thought their data wasn't good enough and analysed the other 21. Each one of those would have measured effects in several different ways, so they had 96 effect sizes in all.

The average effect size was 0.7. Effect size is a technical term : it is the ratio of the learning gain pre-post to the spread in the scores of the population. So you're shifting up, between the experimental and control groups, by 70 percent of the standard deviation of student scores. An effect size of 0.7 means that the upper 50 percent of the experimental group would exceed over three-quarters of the control group. That's a very large effect for any educational innovation.
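
The "three-quarters" figure can be checked with a short calculation. The sketch below is not from the talk: it assumes that scores in both groups are normally distributed with the same standard deviation, and the function name is made up for illustration. Under that assumption, the fraction of the control group scoring below the experimental group's mean is the standard normal cumulative distribution evaluated at the effect size.

```python
from statistics import NormalDist


def share_below_experimental_mean(effect_size: float) -> float:
    """Fraction of the control group scoring below the experimental mean.

    Assumes (for illustration only) that both groups are normally
    distributed with equal standard deviation, so that the experimental
    mean sits `effect_size` standard deviations above the control mean.
    """
    return NormalDist().cdf(effect_size)


# Effect sizes quoted in the talk: 0.4 (low end of range) and 0.7 (average).
for d in (0.4, 0.7):
    share = share_below_experimental_mean(d)
    print(f"effect size {d}: {share:.1%} of the control group falls below the experimental mean")
```

For an effect size of 0.7 this gives roughly 76 percent, which matches the statement that the upper half of the experimental group would exceed over three-quarters of the control group.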

Most of those studies were with handicapped children, for whom the effect size was 0.73, but they also analyzed data of comparable studies of non-handicapped children, where the effect size was about 0.63. Although this difference was small, it seemed to indicate again that those with the biggest disadvantage gained more.

They also looked at whether the teachers within these experiments had uniform rules and procedures which set out some systematic way to deal with the data, or whether they had no such rules and just used their judgement about how to respond. When they divided the results into two groups according to that criterion, they found that those who were using some systematic procedure (49 effect sizes) produced an average effect size of 0.91, whereas those using personal judgement with no systematic procedures (47 effect sizes) yielded a mean effect size of 0.42.
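
As a rough consistency check on those figures, one can combine the two subgroup means. The sketch below is not part of the original analysis: it assumes the overall average is a simple count-weighted mean of the 49 and 47 results, and the variable and function names are made up for illustration.

```python
# (number of results, mean effect size) for each subgroup quoted in the talk
subgroups = [
    (49, 0.91),  # systematic rules for acting on the data
    (47, 0.42),  # personal judgement, no systematic procedure
]


def pooled_mean(groups):
    """Count-weighted mean of subgroup means (an assumed pooling rule)."""
    total = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total


print(f"pooled mean effect size: {pooled_mean(subgroups):.2f}")
```

Under that assumption the pooled value comes out at about 0.67, in line with the overall average of 0.7 quoted earlier.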

There's a message there: it does show that it's not enough merely to collect assessment data. A lot depends on the detail of how you use it to look after the learning needs of children.

Overall in what we reviewed, we've stuck to looking only at studies with matched experimental and control groups, pre and post tests, frequent assessment, feedback to guide learning, and quantitative analysis to compare learning gains. Those are the things we tried to tease out. We found between 20 and 30 relevant studies. The one I just quoted is only one of those, and that itself includes a lot more studies. They all show a significant learning advantage from formative assessment practice. Effect sizes range from 0.4 to 0.7. An effect size of 0.7, if you're interested in TIMSS results, would change the USA ranking from the middle of the range of countries to amongst the top five, say, in mathematics, which might please those who set great store by such results - which I do not.

Several also show that low attainers and slow learners show the largest gains, and several involve emphasis on self and peer assessment by pupils. There's one reservation - if you look at these papers, you find that on the whole they lack detail of what actually happened in the classroom and of the tests used as the criteria of success.

So if you read these studies and say, "How can I do this?", you find there isn't enough there to help you imitate the work. And yet, to go back to being positive about it, they have a variety of ways of implementing formative assessment, not one way. They all differ in detail. Some don't focus on self assessment particularly, some do, and so on.

The fact that in spite of a variety of detail you get learning gains which are worth having shows that there must be something robust, something common to them which is of value. And that's what's in formative assessment.

That really deals with the first question. Yes, there's ample evidence. And that evidence is more widespread and more convincing in terms of learning gains than almost any other education experiment, certainly that we know about.

That leads onto question (b) - are teachers doing formative assessment, using it to good effect? The results of surveys relevant to that question, which is the second bit of the research that Dylan Wiliam and I reviewed, are quite depressing. A French researcher, talking about a survey of formative assessment in primary schools, said that "the criteria used were virtually invalid by external standards". He was talking about the poor quality of the assessment methods used.

In the state of Ontario where they had a new curriculum and tried to enhance formative assessment practices particularly, two authors evaluating the implementation said of the teachers : "Indeed, they pay lip service to it, but consider that its practice is unrealistic in the present educational context." We can't do it, the teachers are saying in spite of a state-wide initiative to try and improve it.

A survey of the new national curriculum in the U.K. looked in detail at how the science curriculum was being implemented. When its quite substantial report came out, I looked eagerly for the piece about formative assessment. And there was just one paragraph about it, and hardly anything more, and the verdict was " seriously in need of development".

Johnson and others in a survey of practices in one U.S. state said, "Those who worked in highly controlling situations are inclined to use blaming language intended to provide global negative descriptions of assessments and impersonal language." That's pretty gloomy : what lies behind it is the article's report of the oppressive nature of external testing and the way in which that had lowered the tone and the commitment to engage at all in assessment practices.

And finally, a more recent survey in British science classrooms by a couple of authors particularly concerned and interested in promoting formative assessment ended up saying "Why is the extent and nature of formative assessment in science so impoverished ?"

So the practice seems weak. It's not weak just in the U.S.A. It's not weak just in Britain. It's weak in most countries where it has been studied. In Britain, particularly, we have frequent inspections of schools by our Office for Standards in Education. You can read almost any one of their survey reports in the last several years, based on their inspectors going into schools and sitting in classrooms. Every British school is now inspected in some depth once every four years. And their reports regularly say that the weakest part of classroom work is the assessment practices.

The common qualities or features that are picked up in such rather pessimistic surveys are the following:

  • Assessments tend to encourage rote and superficial learning.

  • Questions used by teachers are not reviewed with their peers and teachers are not critical about what is being assessed. Of course, it's quite hard, as you know, to compose assessment questions that really test important aspects of understanding and teachers, even if they have the will and the training, don't have the time to develop and trial their own questions.

  • Many teachers can predict pupils' results on external tests, but they know little about their real learning needs. That's a commentary on what the external tests are doing, perhaps.

  • At a primary level particularly, there's a tendency to emphasise quantity rather than quality of the work.

  • The grading function is over-emphasised, and the learning function under-emphasised. This is a crucial point I want to get back to, for it implies the use of a normative rather than a criterion approach to assessment. Such an approach emphasises competition, not personal improvement, so that feedback teaches weak pupils that they lack "ability", so they're de-motivated and believe they're not able to learn.

A high point of my career in Britain was being attacked in an editorial in the Times newspaper. We had publicised some work about competitive marking and the bad effects of grading and how it ought to be abandoned. And of course, this editorial said "This guy's crazy. It's a hard, competitive world out there. Kids have got to get used to it. They've got to compete in school," and so on. I wasn't too worried about being attacked for saying what we were saying: what the newspaper's editorial did not consider was the effect on pupils or on people in the world outside if they didn't win in the competition. Competitions have winners and losers, and if you start losing at school, you begin to believe that you are a loser and you might then be on a downward path.

Now, okay, that completes the second bit. So we come to the third bit, question (c). Does the research give us guidelines, evidence about what to do?

Now, the very important point here is not only that what you read in these research results doesn't give you the detail you need to work in classrooms, but also that it couldn't. That is to say it is not the case that education researchers can produce recipes for teachers out of their work. I don't believe that they can. The reasons for that are very similar to the reasons why pure scientists can't produce stuff directly applicable to technology. It's well known by those who analyse the history of technology that when a scientific finding is to be turned into some practical application, it needs not simply to be applied, it needs to be transformed. It needs to be honed and reconstructed in order to match the needs of the practicalities - of production, of everyday use, and so on. Then it gets changed and becomes a different sort of knowledge. Similarly, research knowledge in education, if it's going to be useful in the classroom, has got to be transformed. Not just translated, but transformed into a shape that teachers can grasp, make use of and make their own. All you can do with research results of this sort is say there's a prima facie case that it is worthwhile to have teachers spending their time on trying to turn the ideas into practicality.

However, when teachers start engaging in this process, what they produce will have a variety of forms and shapes according to the particular contexts in which they teach. Because in teaching, as in most other aspects of real life, context is all and determines very markedly and sharply what's the best thing to do. But certainly as they do so, teachers will be formulating a new theory, working out new ways of doing things, indeed establishing new knowledge about instruction.

Having said all that, and given those reservations, I nevertheless am not going to sit down at this point and say nothing more for all sorts of reasons, including my pride. What I want to do now is talk about what indicators there can be. For formative assessment, there are some indicators, from the various research studies which we have reviewed, about things teachers should look into if they are to try to change their practice. So my question now is - what framework of ideas can one provide which might give an impetus to trying things out?

I'm going to tackle this question in two sections. One will be about the teacher's role, and then the second one will be about the student's role.

Let's talk about the teacher's role in formative assessment. I would do that under several headings. The first heading is choice of task. The point there is that you cannot add formative assessment as something stuck onto a pre-existing lesson plan or scheme of work. It's got to be built in. One has, for example, in the planning of a piece of work, to ask how it will provide opportunities for pupils to demonstrate where they are, what they know, what they are understanding. Opportunities for them to write, to talk, provocations or stimuli to them to open up about where they are, are essential, because until they do so, until they provide the evidence, there's nothing to work on, nothing for formative assessment to work with. So building it into the planning is essential. And to do it seriously means reconstructing the way lessons and schemes of work are put together.

Second is the quality of classroom dialogue. I'm talking there about the way in which the to and fro in the classroom of discussion and dialogue actually goes, what drives it, how it's put together. To illustrate that point, I can show you a piece of classroom dialogue. It's a real piece, taken from a transcript of a lesson by one of our researchers at King's.

It is the dialogue of a teacher with a small group of primary school children, talking about magnets. It has commentary added by the researcher. So here's the piece.

The teacher starts. "Why do you think the paper clips stick to the magnets ?"

Mark : "Because they are magnets," expressing surprise the teacher couldn't work that one out. (Laughter)

Teacher: "What do you think makes the magnets attract the paper clips ?"

Anthony: "I think there's glue in the magnet," being helpful, for he knows that another question means there must be more his teacher was trying to find out.

Notice what underlies this: there are unwritten conventions in classroom dialogue. Pupils and teachers know how you should behave, and the hidden messages that questions carry. This makes it hard to change things.

Teacher : "What makes you believe that, Anthony ?"

Anthony : "Because they stick."

Teacher: "Do you think the magnet is made of glue or another substance, Anthony ?" "Hmm," said Anthony with a ponderous look. "It's another substance, but I don't know what it is."

Matthew: "I do, I do," excitedly. "Let me say it. It's metal."

Teacher: "Yes, Matthew, that's a good name for the substance in the magnet. What do you think, Anthony?"

Anthony : "I knew that," defensively, " but I couldn't think of the word."

Teacher : "So why don't these things," pointing to some other articles, "stick to the magnet, then?"

Matthew : "Because they're plastic, of course."

The teacher then went on to something else.

Now, of course, it's very easy in retrospect when you're not under pressure of producing an answer to look at someone's live performance, have a laugh at it and criticise it. Nevertheless, having allowed for that, it is a deeply flawed piece of dialogue for several different reasons.

It is deeply flawed, first of all, because the whole conceptual underpinning of the piece, the line of questioning, is as bad as could be. To think that you can produce an explanation of why things are magnetic for primary children is a bit daunting. I think it's sophomore level or higher in undergraduate physics before you start talking about the unfilled levels in the 3d bands of the transition metals of the first long period. (Laughter) You can't quite get there in primary work (although you ought to avoid the idea that metals are magnetic - most of them are not).

So I'm not quite sure what the teacher had in mind in setting this questioning in the first place or what understanding underpinned that. Nevertheless, it's not a hopeless question and Anthony comes up with an idea. "I think there's glue." What happens to that idea? It probably scared the teacher, because it's not what he or she wanted, or expected. And the questioning then slowly chokes Anthony off. Instead of following up the glue, the teacher asks, "Do you think the magnet is made of glue or another substance, Anthony?" That's heavily cued - its message is "Anthony, you're on the wrong track. Think of something else." Anthony reads the message that he's on the wrong track and covers up, saying, "It's another substance, but I don't know what it is."

And Matthew, who is probably smart enough to keep quiet until he knows which way the wind's blowing, then comes right back, producing a word, the right word - metal. Okay. We're there. And now the teacher gives him praise, and then asks, "What do you think, Anthony?" It's a bit cruel not to leave Anthony alone. And he says, "I knew that, but I couldn't think of the word," and then we go on.

So the way in which the teacher is handling that dialogue is flawed. Now, why? Perhaps because the teacher is insecure in subject knowledge, wants the answer "metal", is thrown by something else coming up, and has to suppress it because he or she didn't know how to cope with it. You could have made something of it. You could have started saying, "Well, how would you make things stick with glue? Is what we see here with the magnet any different?" This could bring out the fact that there's action at a distance for magnets attracting these paper clips, whereas you don't get action at a distance with glue. At least I never found a glue like that, although I'm sure somebody's going to go and invent a magnetic glue. Perhaps that could have come into the discussion, too.

So there's that expectation, and then the nervousness which means you can't follow up on what the pupils said. You can't make any lesson about scientific thinking. And so it becomes a ritualistic dialogue. The notion that thinking will be rewarded by something interesting is extinguished. The ritual is established and episodes like this can put it more firmly into concrete.

Here's a quotation from a report of the UK Office for Standards in Education about their inspections: "The quality of teachers' questioning is very variable in the degree to which it extends pupils' thinking, draws out their ideas, and encourages them to volunteer points and explore further, thus providing evidence of achievement. Too often teachers engaged in closed questioning limiting pupils' responses, or even neglecting to take up the issues pupils raise, and ultimately failing to register how far they've understood the objectives of the work."

This sort of fractured discussion is quite common, certainly in our country, and there is a need to find ways of helping teachers be more critical of the way they're asking questions, the way they're dealing with dialogue.

Now, if we come back to surveying the teacher's role, that leads on to the next of my points, which is about questions, which brings in issues about recall and about wait time. If you have the sort of dialogue where a teacher asks a question, and then waits for an answer, and if none is forthcoming, breaks the silence with another question or statement, then one of the critical issues is how long you wait, how much silence you tolerate before you interrupt again. Most of you probably know about Mary Budd Rowe's original research, which others have replicated. If a researcher sits in classrooms and measures this time of waiting in silence, the average time is less than one second.

Now, a pupil cannot think of an answer which requires any thought, for which you have to work out some piece of understanding and then work out how to express it, in less than one second. So if teachers are not prepared to tolerate, as this evidence showed they weren't, extended periods of silence, e.g. at least a few seconds, they will never get answers to questions which require some thought. If they come in with another question, they may not get an answer to that, either, unless they simplify the question. How do you simplify the question? You narrow it down to a simple rote learning question, the only type of question which can be answered quickly.

So the quality of the dialogue is downgraded, and it becomes a 'ping pong' of a few students exchanging with the teacher rote answers to superficial questions - the research shows that normally it is seven or eight in a class of 20 or 30 who will answer the factual questions in this way. The teacher asks such questions, and the rapid answers keep the dialogue going. But there's very little thinking going on in such a classroom; the dialogue gives an appearance of involvement and progress, but it's not actually helping learning.

If you want to break into that, you've got to break into it quite radically. You've got to establish a different classroom culture. My own suggestion would be fewer questions, better quality, and more time spent on each. The teachers and the pupils have to learn to tolerate extended silence. Or a teacher may, when posing a question, require pupils to chat to one another about the answer before they offer the responses, and then each of several pupils may answer on behalf of a group.

When these ideas were put to the teachers that we're working with at the moment, they didn't at first believe that questioning had anything to do with formative assessment. But then they got into auditing their own questions and discussing the quality of questions and began to appreciate the problem. In consequence, they're now spending much more time in their lesson preparation trying to frame rich, valuable questions which they're going to give prominence to as part of the learning.

Another part of the teacher's role is testing. All I want to say about that is that frequent summative is not formative. Piling up scores or marks in a record may be useful for some purposes, but it's of no value in promoting learning in the short term. And tests should be short and rather close to the learning, because what you want to do with them is to feed back to the pupils some extra activity to remediate weaknesses exposed by the tests. You can only do that if it's close to the learning.

Now, feedback alone isn't enough. It's the quality of the feedback which actually determines whether or not it'll help learning. And in particular, not just marks only, but a focus on the individual improvement guided by assumptions about learning. I think it's necessary to expand on that one, and to do this, I want to bring into play another of the pieces of evidence.

This is one of several pieces of work done by a researcher named Ruth Butler in Israel. She had an experiment in which she had 132 pupils aged 11, in 12 classes in 4 schools. For the research analysis, she selected the results in the top quartile and the bottom quartile of each class. For the experiment, classes were divided into three groups, A, B, and C, each composed of four classes. They were all given the same teaching by two project students - so this is an artificial experimental situation. In that teaching, they gave the same aims and criteria to all the classes. All pupils were asked to produce the same work en route for marking, and this work was marked and given back during the learning. The differences between the three groups lay in the way the feedback was given on that work. One group, four classes, was only given comments on their work - no marks, just comments. And they showed a pre-post gain of 30 percent and they evaluated their interest in the subject as positive.

The second group was given marks only, no comments, just marks. They showed no gain in the learning. The top lot went up in interest, the bottom lot went down. Now, that's not surprising. If you simply give people marks, it doesn't give them any indication what to do, whereas if you've given them comments, it does give an indication of what to do. So it's not surprising that comments lead to better learning than marks. That in itself is a lesson.

The third group was given marks and comments together. They showed no gain in learning. The top lot went up in interest, the bottom lot went down. That is, whilst the first two groups showed that comments do help and marks do not, this third group shows that if you give both together, the marks wipe out the positive effects of the comments.

Now, I don't find that surprising, because one of my colleagues doing research in classrooms at one school was looking at this sort of thing and had a contract with the teachers that the teachers would mark the work, and she would then take the marked books and talk with the pupils about what they made of their marked books.

What she learned from the dialogue with the pupils was, first of all, the school had effort grades as well as achievement grades, and some of the pupils didn't know which was which.

Secondly, the pupil view of marked homework was if they got seven or eight out of ten, that was fine, but if they got two or three out of ten, they had to try harder next time. They didn't read the comments. The focus of interest was the marks only, not the learning lessons involved.

When the researcher reported back to the teacher and said, "This is what the pupils say," the teacher was very distressed. She said, "I spend hours marking this stuff. And you're telling me I'm wasting my time, it's not helping their learning, and they're not reading my comments." The researcher had to be honest in replying "Yes, I'm afraid that's true. Let's think about doing it differently. Not about you working harder. It's about working differently to help pupils to learn."

Such issues are being looked at in different ways by other researchers; the evidence does not just rest on one set of results. But I wanted to quote another piece of the same person's research. This is another experiment done by Ruth Butler and colleagues. In this experiment, she had four groups, one given comments, one given grades, one given praise, and one given no feedback.

On a post-test, the group given comments showed greater learning gains than any of the other three groups, these other three all scoring about the same. So for grades, praise or no feedback, there's no difference in the learning gains. The comments paid off. But she also had a test of what she called ego- involvement, which is a test of the extent to which pupils are introverted, looking at themselves asking "Am I doing okay?", comparing themselves with others, as opposed to being task centered, looking at what needs to be done in their learning work, and thinking about what they're going to do about it.

And in terms of the test of ego-involvement, the groups given grades and praise were high. That is, they were introverted, making comparisons with others and so on. The groups who were given comments or no feedback, on the other hand, were low in ego-involvement, more objectively concerned about their work. Getting away from that high ego-involvement and becoming more objective about the task is what pays off in terms of learning gains.

What lies behind all of that is that the whole culture we have of giving marks, grades, gold stars, merits and so on can actually be damaging to the learning of many pupils. It distracts attention from the real job, which for any one pupil should be "What do I need to do to improve my learning?" rather than looking at whether you're high or low compared with others. Comparisons are rewarding if you're high, but depressing if you're low. The classroom culture should play this aspect down and help pupils concentrate on what each needs to do to improve, with an accompanying message from the teacher that "I'm going to show you how to improve."

It's not, "I've labelled you as no good." It's just, "Where are you now? I'm going to help you to go the next step." That change of culture is important and requires a lot of explanation. Teachers we're working with are having trouble getting it across in their schools because the management of the school requires marks in the belief that these help raise standards. We're having to help them in arguing with the management that that isn't the best way to go.

This important emphasis on the quality of feedback opens up some broader and more difficult questions, which can be put something like this. Why is it that concentration on formative assessment and on giving feedback leads to learning gains? Why does it work? What is there about it that is fundamental to pupils' learning?

This is a theoretical question and I don't have good answers. But there are some clues about it which I think are worth exploring because they open up even more avenues. One of the ways of looking at it was produced by an Australian, Sadler, several years ago. He was working in the State of Queensland. In that state in 1982 external examinations to determine school leaving certificates were abolished and since then certification of pupils has relied solely on school-based assessment, with standards checked by collaboration between teachers in groups of schools forming local clusters. So teachers' assessments were important in that context.

Sadler drew attention to the fact that in a learning exercise, you have three main components. One is the aim of learning, another is the pupil's understanding at the outset. Between those two, there's a gap: the object of the learning process is to close that gap, so the third component is the action taken to close the gap.

Now, such action must be based on evidence. You want to identify what the gap is, so you need to know where you are and where you want to go. Then you want to take action to close the gap. On a constructivist view of learning, those engaged in these learning processes have to be active in the learning. The teacher can help, but the learning has to be done by the pupil - it can't be done for the pupil. You can't make people understand. You can help them to work out their understanding, and the constructivist view tells us you've got to start from where they are and work from that position.

The conclusion of this argument is that for formative assessment to be effective, the learning must involve self-assessment by the pupil. That has a lot of implications about how learning might best be achieved. In particular, it shows that it's not a happy accident that formative self-assessment enriches learning, but rather that self-assessment is a central feature.

That process opens up a couple of different avenues. One of the avenues is to say, "Well, okay, now we're talking about theory and theoretical ways and ways of improving learning. But wait a minute: aren't there a lot of different theories about how you improve learning?" And of course there are, and you've probably all heard of at least some of them.

Some talk about meta-cognition. Some talk of zones of proximal development following Vygotsky, and some have developed his thinking to explore the concept of scaffolding. There are people who talk about multiple intelligences, about the context- and content-specific nature of learning. There are people doing experiments on thinking skills. There's Schunk and Zimmerman emphasising self-regulated learning as the answer. There are arguments about the value of externalising productions and producing a product that you can discuss and learn from by having it out there. There's also emphasis that learning is a social activity and that any classroom will be a "community of practice". There's all the literature, related to the work that I just showed you from Ruth Butler, about self-esteem, self-concept, self-attribution.

That's a nice long list, and it could easily be made much longer. It might inspire and impress you, or it might scare you to death. I have to say it scares me, because although I can name the names, I don't understand the concepts in sufficient depth that I can perform a super synthesis to produce a unified model drawing on all of them.

But what's common to them is that they're all about helping people to learn. They're about learning to learn. They're putting the emphasis on how you make pupils better learners, not just teaching them a particular subject.

And secondly what's common to them, if you look in detail, is that they all involve learners taking responsibility, and they all need to have in it some mechanism whereby pupils do understand their learning goals and can take control of their own actions in trying to reach those goals. So what can we distil as the basis of formative assessment? The three notions, of the target, of knowing where you are, and of knowing how to cross from one to the other, are in fact implicit in most of those.

And since no teacher can actually work with ten theories at once and most teachers can't be expected to read them all to decide which is best, what you actually need, what will be useful to teachers, is some rather simple and basic theoretical model. I think formative assessment puts us on the road to developing that. I claim no more, except that it is an essential part of different ways of approaching improvement in learning.

But such argument opens up the different, more practical aspect, which concerns the student's role in learning. If it's fundamental that one must try to help a student to become a learner, to learn to learn, and that that is a basic message common to work on meta-cognition, on self-regulated learning, and so on, how do you go about it? What have you got to do?

Let me give you one example of what you've got to do. This is work in one school whose teachers were part of a group working with faculty at King's College to improve science teaching through better formative assessment, in the lower secondary school, that is with children aged 11 and 12. They chose to focus on self-assessment. In fact, they came up and said it had to be self-assessment at a time when I didn't understand very much about such ideas and was therefore concerned that they were being far too ambitious.

But their way of doing it was to introduce a means of self-assessment across a whole year group in their science lessons. For each topic they would give each pupil a sheet with three vertical columns. The first column set out the learning goals for that topic, the second was space for the pupils to put in comments opposite each goal, and the third was for comments by the teacher. For the second column, each pupil was asked to write a short statement about how he or she felt about each particular goal as the learning went on - their degree of confidence, what problems they were having with it, and so on.

This enterprise at first failed. Most pupils wrote nothing. The only pupils who could write much here were the best pupils anyway. The best pupils don't need the benefit of such initiatives, as has become clear from many of the studies that Dylan Wiliam and myself reviewed. One can only speculate: my speculation is that they have learnt the main 'tricks' about how to learn - that is what has made them the best pupils. It's those who have not yet got that trick who come out labelled as low attainers - they don't have any clear idea of what learning is about.

What the teachers did to provoke pupils into writing more was to add little rows of five boxes in the pupil's comment column opposite each goal statement. The pupils were told that each of them had got to colour in these boxes, colouring them all in if they were perfectly confident about the meaning of the goal and about their own understanding of it. They were to colour none if they had no idea what the goal statement meant, and to colour some intermediate number in if they felt they were part way there, but had some problems to be cleared up.

This device helped to accelerate the development of the new approach, because once pupils had done this, they were asked to explain. Why did you only colour half of them in? Why did you put none in? What were the problems? It took most of the school year before most pupils were fluent in expressing their self-assessments on such sheets. But once they'd got there, their comments provided feedback which was extremely valuable to their teachers.

Writing about this work, the teacher who started it said:

"There are big advantages in pupils assessing themselves, but they find this hard to do at first. The first assessment sheet at the beginning of the year 7 (for eleven year olds) is a very simple, practical exercise involving only four statements from AT1( AT1 is a component of our national curriculum for science about making an open investigation) designed so that pupils can have an easy induction into the practice of self-assessment. Most pupils are honest in their own assessment for most of the time. Some, especially the less able, don't like to admit that they're not coping, and sometimes say they understand when they do not. Teachers try to help by emphasising to each pupil that the record is a private document between the two of them and what matters is that the teacher can see where the pupil has problems so that help can be given where needed."

"There are likely to be variations in standards between pupils in filling in the boxes, but this doesn't matter very much - the main value comes from the combination of the box data and the pupil's comments. These together provide starting points for useful dialogue with teachers. Pupils have gotten noticeably better at their own assessment during the year. Many at the start wrote very few and very vague comments, but during the year, these change and become more explicit and perceptive, and so more useful."

Now, there are two interesting aspects to that development. The one is that the pupils are learning to be honest and fluent in assessing themselves, which helps the teacher. But the other one is in the structure of such sheets. The laying out of goal statements was making pupils think of their learning in terms of a set of aims, and structuring the way they thought about it in relationship to those aims. Now that is where the hard work lies - in finding ways to communicate the goals, which involves getting the language and the level of sophistication and complexity just right. As pupils get used to such a way of working, they are realising the whole idea that learning isn't just one damn thing after another. It actually has a structure, has a meaning which you can get hold of and take control of. That seems to be the radical change in thinking which they find difficult.

It took a year before these feedback sheets were really effective. Pupils were not dishonest or unwilling to assess themselves; the problem lay in their view of what the school learning procedure is about. There are quite a few pieces of research in which, for example, researchers have sat at the back of a lesson, listened to everything that happened, and at the end of that lesson, then interviewed some of the pupils as to what they thought the lesson was about. The researcher is at one with the teacher in seeing that the lesson had one or two main points, then several subsidiary points that followed from them, or helped to illustrate them, and then some illustrations and anecdotes. These elements had different levels of importance in the structure of the lesson.

To most pupils, things were not seen like that at all - the flow of the lesson was just one damn thing after another. The jokes, the anecdotes, the examples of the main principles were not distinguished in importance from one another - the structure is not perceived. There is a far more serious lack of communication than many teachers realise, and it is a surprise to them that the structure of ideas that they thought was obvious and clear in what they said had only come across to a fraction of their pupils. Until you get feedback, you don't see such things. And when you do, you find you've got a much harder job than you realise in getting pupils to try and make sense of it all.

This difficulty is expressed in many ways by different authors. Here's a Swiss author, Perrenoud, who unfortunately writes almost entirely in French so that it's difficult for me to understand what he's talking about. But he's a very subtle thinker - which is why the French is hard to translate.

"A number of pupils are content to get by. Every teacher who wants to practice formative assessment must reconstruct the teaching contract so as to counteract the habits acquired by pupils."

I think that encapsulates a very important message. You should not imagine that if you implement a new way of working that's going to be much better for learning, the pupils are going to clap their hands and join in with delight. No - every change is as threatening to them as it is to you. It's all pretty scary, taking part in innovations. And one has to have patience to try and change the culture of the classroom in the way that initiatives which change the pupils' role will require.

So if we put all these various ideas together, what practical suggestions can emerge? The basis for helpful suggestions is contained in what I've already told you, but I shall now go further and set out a few of the ideas which have come up in work that we're doing at the moment in a small-scale project with science and math teachers in just six schools, 24 teachers in all. They have worked out, partly following suggestions from me and my faculty colleagues, but very much by inventions of their own, different activities which they are finding useful. And these are all components out of which they're forging and reformulating their whole theory about how learning should go.

  • One is to prepare and audit classroom questions. I've referred to that already.

  • Another is to try and arrange that colleagues observe their lessons, but observation of lessons is very difficult. You've got to focus on what you're looking for. What the teacher-observer is looking for as they sit in a colleague's lesson is the quality of the feedback so they can report to that colleague afterwards in terms like: "What I spotted in the way you were asking for, getting and acting on feedback, is this, this, and this." Which goes back to looking at the quality of the questioning and the classroom atmosphere and assumptions in which it promotes useful dialogue.

  • On marked work ('marked' should be in inverted commas) they give no marks or grades, but comments only. There should only be a few comments, trying to make sure those comments are such that they can lead to action by or with the pupil, for there is no point in putting in a comment if a pupil can't do anything about it. It's no good saying "Try harder." You've got to say something about what trying harder involves. My colleague, Dylan Wiliam, tells a story about a comedian who wasn't doing very well. That is, his audience wasn't laughing at his jokes. So he went to a counsellor who listened to him very carefully and asked him all sorts of questions. And finally said, "I know what you need to do." He said, "Yes? What do I need to do?" He said, "You need to be funnier." (Laughter). So he gave up on going to counsellors.

  • Colleagues can exchange and discuss marked work. If you're trying to change from grades to comments then, as for any other change, you should not try and do it as a lone initiative. Teachers should try to learn from one another. And the way you do that is by looking at concrete things, not by abstract discussion. Swap some 'marked' books, books with comments, and talk about how you did it, and see how someone else does it.

  • Similarly, teachers share test questions: good class test questions are very valuable and hard to find. It's the cognitive quality of the questions, whether they get at the essential features of the learning in the subject, that actually determines whether they're going to be useful in giving feedback that helps learning.

  • There are also suggestions aimed directly at pupils. They can discuss questions asked of them in class in groups, so that the response comes from and on behalf of each group. This way, they can have time to think, share their thoughts, externalise them, and learn in the community before responding.

  • When assessing their own work, our pupils are being asked to label it with "traffic lights". "Traffic lights" is a term invented by the teachers. It's a simple device, a bit like those coloured-in boxes on the science sheets that I discussed earlier, only simpler. Pupils are asked to get into the habit of labelling their work with a red blob, a green blob, or a yellow blob. A green blob if it's okay, a red blob if they are really worried and stuck with it, a yellow blob if they're somewhere in between. The fact that their work is so labelled becomes a surprisingly powerful aid to classroom discussion and assessment.

  • Some teachers say, "Okay, you guys in yellow on this particular question get together and talk together about how you do it and what your difficulties are, so you can tell me together." Another teacher says, "If you've got a red on your particular thing, before I mark it, you go and talk with somebody who has got a green on it, somebody who thinks they understand it, and have some mutual cross-class discussions so you're helping one another."

  • Again, another teacher says, "When I get their homework, the bits I look for is where there's lots of red blobs all over the place. That's the stuff where I've got to really pay attention and I only look at a sample of a few of the ones that are green." So they're rapid indicators. There the pupils are giving you the information you need. Because if you don't get information from them, you've got to generate it all yourself, and your teacher colleagues are going to say, "How can we do all this with a class of 20 or 30 in the detail required?" The answer is, "You can't. You can do it if you recruit the pupils into doing it for themselves. And that will help their learning as well."

  • Pupils working in groups will mark one another's work and then the dialogue between them brings out the points which aren't clear to anybody and brings out a lot of peer help. Of course, having to explain why you did something is a very good way of making more concrete your understanding of it, or where your thinking has got stuck.

  • Pupils can make up test questions on the work completed. Questioning need not be just responding to other people's, but making up some of your own. There are several experiments in the literature where people have compared the results between matched experimental and control groups, where the control has been a group who have been answering other people's questions on the work, whilst the experiment has been a group in the same lessons who have been sat together trying to make up their own questions. The learning gains of those who made up the questions were greater than the gains of those who were answering other people's questions.

  • If you're attuned to notions of learning, to the idea that learning involves meta-cognition, you can say, "Yes, that's just what we expected." Because if you've got to make up a question, you've got to be meta-cognitive, that is to say you've got to get an overview of what the work's about, to get a clear vision of what its purposes are. If you're answering someone else's question, you don't necessarily have to do that sort of thinking. Pupils often seem to wander about trying to find a trick that will get the answer.

  • Pupils can prepare oral reports, and the class evaluates them. That's the issue that you might get with project work, with pupil investigations and experiments. You get much more value from such activities if pupils have to stand up and present an account of their work. A pupil's work becomes an object out there, once they have communicated it. Teachers ask pupils to sit in a group and evaluate their peers' presentations, say what was good about them, what was not so good, and suggest how it could be improved. When the thing's out there, the pupil can join in a discussion of his or her own products and see the work a bit more externally and objectively. And that's part of learning.

  • And finally, selected pupils can give a little talk at the end of a lesson. They try and answer, "What did we learn in this lesson?" And the class questions them. One of our teachers invented this; we wouldn't have dared to suggest it. I thought it was much too ambitious, but it did work with that particular class, and others are now trying it out. You select pupils and say, "At the end of this class, you're going to have to stand up for a few minutes and give a summary of the main things that were learned in this lesson. And then the others in the class can ask you questions about it. And if you can't answer their questions, you throw it out to the rest of the class, because there's probably somebody in there who can answer it." But the one rule that the teacher has is that she is not going to talk during that part at all. She is going to listen. Of very great importance at times is that teachers should listen, look, observe, and not try to 'deal with' everything. Because only then can a teacher begin to take in the evidence about where the class is. This teacher was doing this, whilst at the same time tackling the meta-cognitive problem, which is in part that pupils don't see the wood for the trees, they don't see the main points. She's developing her pupils by requiring them to do this, to take active parts in doing it, developing the skill of looking for the main points, the highlights, so that they can begin to grasp what it's all about.

All of the ideas which I have just listed have been tried out by some of the teachers and pupils we're working with. It's not the end of any possible list, but it is a useful starting point. The issue that still remains is the issue of how teachers can take this on board and forge it into new ways of working in classrooms in their own way. And the only answer to that is to encourage teachers to do it and support them while they take part in what must be a risky enterprise.

Thank you for your kind and close attention. (Applause)

__: Thank you for a very enlightening and thought-provoking talk. You've given us a great deal to think about. One of the things that it seems to me you're pointing out is that the quality of the classroom dialogue, the conversation that goes on between the teacher and the students, is extremely important. But you gave us a negative example.

Have you, in your work, developed some positive examples? It seems to me that it's not enough to ask teachers even to get together to try to generate good questions. We need some examples of powerful questions that lead to good classroom dialogues. Do they exist?

PB: Yes. I think they do. One of the ways to approach this is to first of all try and frame the whole lesson in terms of a question. That is, try and see what you're trying to do as addressing important questions, which is a good way of determining what is the significance or importance of doing this lesson topic at all. What sort of issues does it address, and why should these issues matter to anybody? And if you can put the whole thing in terms of a question and then its parts in terms of subsidiary questions, you're beginning to think of it in a different framework. That is quite a good way of beginning to get into questions which are about things that really matter.

The other clue, possibly, is that you should think in terms of questions to which any answer is correct, rather than questions to which you're either right or wrong. So if I ask you whether, say, electrical potential is an intensive or extensive quantity, unless you're a good physicist, you might be a bit worried about the answer. Because you know that what you say is going to be right or wrong. I've given you an option, and you might be caught out.

If I ask you, "What do you understand by the phrase 'electric potential' and tell me what you understand by the words extensive and intensive," then I'm inquiring about where you are. I'm interested in your understanding, and any answer will be correct, because you'll give me information that you have and I don't. And I'm not presuming to challenge you in any way. Nevertheless, I can work with that answer to develop your understanding of the topic. So you try and get into questions which can reassure the pupils that you are interested in their thinking and which would nevertheless explore that thinking. Get away from questions which are right/wrong questions.

Some troubles do arise because of the bad routine of 'ping pong', i.e. the rapid alternation between superficial questions and quick rote answers in the classroom - this does need right/wrong questions, and that's what's corrosive about that way of working. There is a book on questioning by an American author, J.T. Dillon, which brings out some of these points. The difficulty is, I think, that the book's treatment is very general and about generic questions. It doesn't give many examples of good things within particular subjects.

I would recommend, however, as an exercise, that teachers themselves do this. If first of all they can observe one another, they can gather and share actual questions that have been used, and then sit in a group treating these as anonymous examples to discuss and reflect on them. In such discussion, they should address such issues as: What's the point of asking that question? What would you learn from the answers? This way they might all develop their own criteria of quality. Nobody can produce off the shelf questions which you can go and use in your classroom. What you need is to sharpen up your idea of the quality so you can learn about your own questions.

__: A hallmark of inquiry science that we are all promoting is allowing, or better, stimulating children to begin, on the basis of their curiosity-driven exercises, experiments and explorations, to understand some things on their own. The old idea, prior to inquiry, and we all have fought it, I'm sure, was the teacher putting the "objectives" on the blackboard at the beginning of the lesson, thereby squelching the possibility that the child discovers anything.

There's an element of that, however, in the kinds of things you're talking about. The rubric with the little boxes, where the kids have to fill in all or some or none of the boxes, had along the side a summary of the learning that that child is supposed to have succeeded in. What I gather is that it's very important to sequence that kind of rubric, or that kind of exploration, in the whole process of learning so as to get a formative assessment, and yet at the same time, not abort a child's curiosity-driven discoveries of facts. And I wish you'd talk a little bit about that.

PB: I think I'll respond by saying that I would have a model of learning which has a variety of activities. The activity I was showing you was an activity fairly directed to understanding something about the atmosphere in a fairly closed way. And there's some virtue in that, but certainly not all the virtue in that.

If you go into much more open-ended inquiry, which is an essential part of learning, then what's required there for formative assessment is not for the pupils to get an understanding of the answers. Because as you're pointing out, there aren't answers in that sense. And if there were, you'd kill the activity.

But what they do need to get to is criteria of quality about what they do. Because there's no doubt that some children will do investigations which are of great quality and very rewarding, and others will not get very far. So what's required for self-assessment there is the criteria, the concepts of quality, so that you can assess yourself as to whether you're going in the right way or not.

Now that means breaking it down a little, I think. I would be a bit analytic about it, but the analytic frame would be quite different. For example, for people in investigations, it would be about how well defined is the original question. Or how have you taken a very vague idea, how have you managed, perhaps by iteration, not once and for all, but by trying things, to hone it down into something you could actually answer by your explorations.