ABC on Scoring Rubrics Development for Large Scale Performance Assessment in Mathematics and Science

author:	Westat
description:	As part of its technical assistance effort, Westat is developing an Occasional Papers series addressing issues of concern in doing outcome evaluation. The first of these papers, on the development of scoring rubrics, has now been completed and is available for use and comment. Suggestions for additional papers are welcome. Remember that Westat staff and their consultants are available to help you develop or review your outcome evaluation plans. NSF is providing the resources for this technical assistance. Please don't wait until the last minute to ask for help. To suggest themes for occasional papers or request technical assistance, please contact Joy Frechtling. She can be reached at frechtj1@westat.com or (301) 517-4006.
published in:	WESTAT
published:	05/01/2002
posted to site:	05/31/2001

3. Options for Scoring Rubric Development

Scoring rubric development is time-consuming and requires teamwork among developers and users. Depending on feasibility and available resources, there are three options for obtaining a scoring rubric: adopt, adapt, or do it yourself.

3.1 Adopt

The easiest way to obtain a rubric is to adopt an existing scoring rubric that exactly matches what you need for an item or product. This approach is most likely to work with general scoring rubrics (e.g., rubrics for speech presentations) and can be used with both analytic and holistic scoring procedures. The advantage is that adoption is quick and easy and requires little expertise in rubric development. The disadvantage is that an exact match can be difficult to find.

3.2 Adapt

When no existing rubric is an exact match, an alternative is to adapt an existing rubric using any of the following methods:

Modify or combine existing rubrics.

Reword parts of the rubric.

Drop or change one or more dimensions of an analytic rubric.

Omit criteria that are not relevant to the outcome you are measuring.

"Mix and match" dimensions from different rubrics.

Change the rubric for use at a different grade level.

Add a "no-response" category at the bottom of the scale.

Divide a holistic rubric into several dimensions.
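To make some of these adaptation methods concrete, the following Python sketch treats a rubric as plain data and implements three of them: omitting a criterion, mixing and matching dimensions from different rubrics, and adding a no-response category. The representation (a dictionary from dimension name to a score-to-descriptor scale), the function names, and the sample descriptors are hypothetical illustrations, not taken from any published rubric. The sketch also assumes that the no-response category goes one point below the lowest existing score (0 on a 1 to 4 scale).

```python
from typing import Dict, List, Tuple

# Hypothetical representation: an analytic rubric maps each dimension name
# to its scale, and each scale maps a score point to a level descriptor.
Scale = Dict[int, str]
Rubric = Dict[str, Scale]


def drop_dimensions(rubric: Rubric, *names: str) -> Rubric:
    """Copy the rubric, omitting dimensions not relevant to the outcome."""
    return {dim: dict(scale) for dim, scale in rubric.items() if dim not in names}


def mix_and_match(picks: List[Tuple[Rubric, str]]) -> Rubric:
    """Build a new rubric from (source rubric, dimension name) pairs."""
    combined: Rubric = {}
    for source, dim in picks:
        if dim in combined:
            raise ValueError(f"dimension {dim!r} picked twice")
        combined[dim] = dict(source[dim])  # copy so the source stays unchanged
    return combined


def add_no_response(rubric: Rubric, label: str = "No response") -> Rubric:
    """Add a no-response level one point below each scale's lowest score.

    Assumes the scales do not already contain a no-response level.
    """
    adapted: Rubric = {}
    for dim, scale in rubric.items():
        new_scale = {min(scale) - 1: label}
        new_scale.update(scale)
        adapted[dim] = new_scale
    return adapted


# Hypothetical source rubrics with abbreviated descriptors.
math_rubric: Rubric = {
    "Conceptual understanding": {1: "Little or none", 2: "Partial", 3: "Substantial", 4: "Thorough"},
    "Procedures": {1: "Many errors", 2: "Several errors", 3: "Minor errors", 4: "Accurate"},
    "Communication": {1: "Unclear", 2: "Partly clear", 3: "Mostly clear", 4: "Clear and complete"},
}
science_rubric: Rubric = {
    "Data interpretation": {1: "Misreads data", 2: "Partly correct", 3: "Mostly correct", 4: "Accurate and justified"},
}

# Omit an irrelevant criterion, borrow a dimension, then add a no-response level.
trimmed = drop_dimensions(math_rubric, "Communication")
adapted = add_no_response(mix_and_match([
    (trimmed, "Conceptual understanding"),
    (science_rubric, "Data interpretation"),
]))
```

Each function returns a new rubric rather than editing its input, so the original rubrics remain available for comparison during review.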

3.3 Do It Yourself

Districtwide or statewide standardized assessments often require item-specific scoring for their performance assessment items or tasks, and frequently no existing scoring rubric is available. Every item is new and measures specific skills based on the local standards. To align with those standards, the scoring rubrics need to reflect their content. Consequently, test developers have to develop scoring rubrics from scratch. The next section describes step-by-step procedures for developing a scoring rubric, again focusing on performance assessment items or tasks in standardized mathematics and science assessments.

4. Development of a Scoring Rubric for Performance Assessment Items in Mathematics and Science

This section briefly describes the common types of performance assessment items in standardized mathematics and science assessments and the general procedures for developing scoring rubrics for such items. It also offers tips for rubric development.

4.1 Common Types of Performance Assessment Items in Standardized Mathematics and Science Assessments

Performance assessment tasks can be used to assess a variety of learning outcomes. Higher-order thinking and the ability to connect and integrate knowledge are typical examples in mathematics and science assessments. Depending on the goals they serve, performance tasks may be short or lengthy and may use different item formats. In mathematics and science in particular, the most common types of performance tasks in standardized assessments include, but are not limited to, the following:

Short-answer

Extended-response or constructed-response

Product

Investigation

4.2 General Procedures for Scoring Rubric Development

1. Based on the test blueprint, members of the item development committee need to make a preliminary decision on the dimensions of the performance or product to be assessed.

Scoring rubric development is a team activity, especially for standardized assessments. In both theory and practice, item writing and rubric design form an integrated process. Based on the test blueprint and assessment standards, item writers should work together with the content advisory committee to decide on the dimensions of each performance or product to be assessed. For mathematics and science, holistic scoring is the usual procedure. Each item has its own specific scoring rubric, which is usually developed along with the item and is considered part of the item writers’ responsibilities.

2. Look at some actual examples of student work to see if you have omitted any important dimensions.

Before designing scoring rubrics, it is helpful to look at examples of student work to see the range of what students are actually able to do. When the first draft of a rubric is complete, reviewing student work again helps item writers verify the dimensions the rubric measures and the level of detail it requires.

3. Refine and consolidate your list of dimensions as needed.

This step is a natural follow-up to steps 1 and 2, and it serves as the basis for writing the formal scoring rubric. For holistic scoring in mathematics and science, an item usually has no more than two or three dimensions; most often it has only one.

4. Write a definition of each of the dimensions.

It is important to define the boundaries and meaning of each dimension or trait to be assessed, because the definition frames the skills and knowledge to be measured.

5. Develop a scale or a checklist on which you will record the presence or absence of the attributes of a quality product/performance.

In this critical step, rubric developers need to decide whether the scale is numerical, qualitative, or a combination of the two; what the maximum possible score is; and how each score level will be described.

Once the number of performance levels (or the maximum score) is decided, a description of performance at each score level gives the assigned score its meaning. A description can be as short as a sentence or as long as a short paragraph. The following strategies can help when developing a performance scale:

Start with the highest level of performance. Describe an outstanding product/performance. What characterizes the best possible performance of the task? This description serves as an anchor by defining the highest score point on your rating scale.

Then work on the lowest level of performance. Describe the worst possible product/performance. What characterizes the worst possible performance of the task? This description serves as the bottom anchor of your rating scale.

Once the top and bottom levels of performance are defined, it takes less effort to describe the characteristics of products/performances that fall at the intermediate points of the rating scale. Often these intermediate levels will include some major or minor flaws that prevent the product/performance from receiving a higher rating.

Avoid ambiguous wording in the descriptions, and keep each description succinct.

Avoid overlap between the characteristics of different performance levels. This is very important because it allows you to assign students of different abilities to the appropriate level. It also helps scorers during hand scoring.
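Some mechanical parts of step 5 can be checked automatically. The Python sketch below uses a hypothetical `HolisticScale` class that maps each score point to its description. It builds a 0 to 4 scale in the recommended order (top level first, then the bottom, then the intermediate points) and flags missing score points, empty descriptions, and levels that share identical wording. The class, its method name, and the sample descriptions are invented for illustration. Identical wording is only a crude proxy for overlap; whether the descriptions are unambiguous and distinct in substance still requires human judgment.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HolisticScale:
    """Hypothetical item-specific holistic scale: score point -> description."""

    levels: Dict[int, str]

    def problems(self) -> List[str]:
        """List structural problems; an empty list means none were detected.

        Only mechanics are checked here. Ambiguous or substantively
        overlapping descriptions still need human review.
        """
        if not self.levels:
            return ["scale has no score points"]
        issues: List[str] = []
        points = sorted(self.levels)
        # Every point between the lowest and highest should be described.
        expected = set(range(points[0], points[-1] + 1))
        missing = sorted(expected - set(points))
        if missing:
            issues.append(f"missing score points: {missing}")
        seen: Dict[str, int] = {}
        for point in points:
            # Normalize whitespace and case before comparing descriptions.
            text = " ".join(self.levels[point].split()).lower()
            if not text:
                issues.append(f"score {point} has no description")
            elif text in seen:
                issues.append(f"scores {seen[text]} and {point} share a description")
            else:
                seen[text] = point
        return issues


# Follow the recommended order: best performance, worst, then the middle.
scale = HolisticScale({})
scale.levels[4] = "Correct solution with a clear, complete explanation of the reasoning."
scale.levels[0] = "No relevant work; the response shows no understanding of the problem."
scale.levels[3] = "Correct approach with a minor error or a partly incomplete explanation."
scale.levels[2] = "Reasonable start, but a major flaw prevents a correct solution."
scale.levels[1] = "Some relevant work, but little understanding of the problem."

print(scale.problems())  # prints []
```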

6. Evaluate your rubric using the following questions.

Does the rubric relate to the outcome(s) being measured? Does it address anything extraneous?

Does the rubric cover the important dimensions of student performance defined by the standards?

Do the criteria reflect current conceptions of "excellence" in the standards?

Are the levels of performance well defined?

Is there a clear basis for assigning a score at each scale point?

Can the rubric be applied consistently by different scorers?

Can teachers, students, and parents understand the rubric?

Is the rubric developmentally appropriate?

If it is a general rubric, can it be applied to a variety of tasks?

Is the rubric fair and free from bias?

Is the rubric useful, feasible, manageable, and practical?

7. Peer review and pilot test the rubrics on actual samples of student work.

Always have your peers review the rubrics together with the item to make sure that they cover what was intended. Piloting helps establish that the rubric is practical to use and that scorers can generally agree on the score to assign to a given piece of student work. When pilot testing a rubric, it is best to use examples of work that span the entire scale, from very poor to very good.
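Whether scorers "can generally agree" is usually quantified during the pilot. The Python sketch below computes exact agreement, adjacent agreement (scores within one point), and Cohen's kappa for two scorers, and lists score points that no pilot sample received, which would signal that the samples do not span the whole scale. These statistics are a common choice in practice rather than a procedure this paper prescribes, and the function names and pilot scores are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def _check(a: Sequence[int], b: Sequence[int]) -> None:
    """Require two non-empty score lists for the same pilot samples."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two equal-length, non-empty score lists")


def exact_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Share of samples that both scorers gave the identical score."""
    _check(a, b)
    return sum(x == y for x, y in zip(a, b)) / len(a)


def adjacent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Share of samples whose two scores differ by at most one point."""
    _check(a, b)
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Exact agreement corrected for the agreement expected by chance."""
    _check(a, b)
    n = len(a)
    p_observed = exact_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the two scorers' shares at each score point.
    p_chance = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_chance == 1:
        raise ValueError("kappa is undefined when both scorers give every sample the same score")
    return (p_observed - p_chance) / (1 - p_chance)


def uncovered_levels(scores: Sequence[int], lowest: int, highest: int) -> List[int]:
    """Score points that no pilot sample received (gaps in the continuum)."""
    used = set(scores)
    return [p for p in range(lowest, highest + 1) if p not in used]


# Hypothetical pilot: two scorers independently score eight papers on a 0-4 scale.
scorer_a = [4, 3, 2, 0, 1, 3, 2, 4]
scorer_b = [4, 3, 1, 0, 1, 2, 2, 4]

print(exact_agreement(scorer_a, scorer_b))         # 0.75
print(adjacent_agreement(scorer_a, scorer_b))      # 1.0
print(round(cohens_kappa(scorer_a, scorer_b), 2))  # 0.69
print(uncovered_levels(scorer_a, 0, 4))            # []
```

In this made-up pilot the scorers agree exactly on 6 of 8 papers and never differ by more than one point. What level of agreement is acceptable is a decision for the assessment program; the sketch does not assume a threshold.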

8. Revise the rubric and try it out again.

It is unusual to get everything right the first time, so be prepared to revise a rubric. The analysis of the pilot test results and the evaluation questions in step 6 can serve as guides.

9. Share the sample rubric with teachers, students, and parents.

This may entail rewording the rubrics in less technical language. Give teachers and students a clear target to aim for by letting them know what constitutes quality work that meets the standards. Training teachers to use the rubric to score student work can be a powerful instructional tool. Sharing the rubric with students and parents will help them understand what is expected from students.

However, when test items are secure, sharing item-specific rubrics would disclose the items. One solution to this dilemma is to provide teachers with sample items and their rubrics so that they understand the assessment expectations.

4.3 Tips on Scoring Rubric Development

Perhaps the most common challenge is avoiding unclear language. Take the phrase "creative solution" in a rubric for a science investigation, for example. The word "creative" is difficult to define. If a rubric is to teach as well as evaluate, such terms must be defined for students. One possible approach is to discuss what the term "creative solution" means. Another is to list the ways in which students could meet the criterion, which gives students valuable information on how to begin developing a solution and avoids the need to define elusive terms.

A second challenge in rubric development is avoiding unnecessarily negative language. Again, take "creative solution" as an example. Rather than using words like "boring," the rubric should describe what was done in a mediocre solution and implicitly compare it with the highest level of quality. Students then know exactly what their problem is and how they can do better next time, not just that their solution lacked fresh ideas.

Articulating gradations of quality is often a challenge. It helps to spend adequate time thinking about the criteria and how best to combine them before defining the levels of quality. One helpful approach is to try the rubrics out, as described in step 7 above, before operational use.

4.4 Checklist of Scoring Rubric Development

This checklist, which summarizes the procedures of rubric construction, will help you verify each step in the process.

Procedures	Check Point
Have we decided how many dimensions to assess in the item?	[ ]
Have we looked at actual examples of student work?	[ ]
Have we refined and consolidated the list of dimensions?	[ ]
Have we written a definition for each dimension?	[ ]
Have we developed a scale of score points for the item?	[ ]
Have we compared the rubric with the evaluation questions? Does it meet all or most of the criteria?	[ ]
Have our peers reviewed the rubrics?	[ ]
Have we piloted the rubrics on actual samples of student work?	[ ]
Have we revised the rubrics and tried them out again?	[ ]
Have we shared the sample rubric with teachers, students, and parents?	[ ]

5. Sample Sources of Additional Information for Scoring Rubrics Development

With the increasing use of scoring rubrics in various assessments and evaluations, more and more scoring rubrics are being developed for different purposes. Much information on this topic has been published online and in print. Below are sample sources that rubric developers can use as references. They contain many examples of scoring rubrics for mathematics and science items.

Paper