- In Part I, we looked at how standards get selected for item development.
- Then, in Part II, we watched with bated breath while items were born.
- In the enthralling final chapter, also known as Part III, we examined the ways items get poked and prodded before they are placed into a test bank for use.
- But wait! There's more. A bonus post on how a test gets put together, including all the statistical goodies.
Imagine, if you will, a continuum of performance:
This could be a scale of any size you like. I seem to remember Rick Stiggins saying that research had shown we lose the ability to discriminate among more than seven categories. Most rubrics have three or four levels, so we'll use a 4-point scale for our discussion here:
Now, imagine a set of student work: perhaps explanations of how to solve a math problem, descriptions of an experimental design, or maybe even essays about Shakespeare. Whatever. Unless there's been some looky-looing, no two student performances will be identical. Sure, all of the kids could write answers that score well, but that doesn't mean they use the exact same words, organization, or other features you are looking for. Their work will be spread out across the "range" of the scale. Your job as a rangefinder is to determine two things.
First, you want to look at the work and decide what is acceptable within each score point. I know, I know. Going from a 1 to a 2 looks like an easy jump. But it's not.
There's a lot of space between points. What would the weakest possible "1" look like? I mean a piece of work so barely a "1" that it's killing you to give it even that much credit. What about a "middle" 1 and a "high" 1? It is astounding how a one-word change in your scoring tool (or even a one-word swap in an answer) can make all the difference in how an artifact of student learning measures up against the standards.
Second, you have to answer questions like these:
What is the difference between a "high 1" and a "low 2"? How do you officially draw the line between score points?
Sometimes, sets of papers illustrating each score point are put together before rangefinding. Other times, especially with a brand-new type of item or assessment, a rangefinding committee might look at groups of papers, order them from best to not-so-best, and then determine the score points and refine the scoring tool. My group? I pulled a few samples I thought were interesting. We used those to calibrate among ourselves and tweak the scoring guide. Then we applied it to every single sample that was turned in, sifting out exemplars and finalizing the scoring guide.
These conversations can lead to a variety of outcomes. Sometimes, you develop additional scoring rules to clarify how the rubric is applied. You will also want to find samples to use as exemplars. These are sets of papers that clearly illustrate how to apply the rubric (or, in some cases, "tricky" papers that get people into the nuances of scoring).
I can hear some of you. You're saying, "Who the heck cares about this crap?" Most of us educators don't, unless the test is big enough. We just don't have the resources to do this for every item that will cross a student's desk. At the state level, where you might be scoring tens of thousands of student responses (or at the national level, like the SAT or AP), rangefinding becomes important for validity reasons. In fact, the legislation that required the development of the assessments I'm working on also mandates that they can be "consistently scored" by educators. We must have a watertight scoring tool that a fourth-grade teacher (for example) in any classroom in the state can pick up, use, and get the same scores as any other teacher looking at the same work. But to do that, we first have to see what kids do with a prompt. We have to set them loose on an assessment, then reel in the products and see how they line up along the scoring tool we developed. It is inspiring to see all the creative things kids do, all the unanticipated ways they interpret directions, and all the papers you end up shaking a fist at (or cheering for).
Get thee to a rangefinding event. You'll never look at classroom work the same way again, and for all sorts of wonderful reasons.