Responses to Questions from LSC Conference Participants

author:	Joy Frechtling
submitter:	Joy Frechtling
published:	02/09/2001
posted to site:	02/09/2001

Submitted by Joy Frechtling of WESTAT

How do you justify treating "students" as the unit of analysis rather than "teachers" as the unit of analysis in a project designed to treat teachers?
This question identifies the classic mixed-level analysis problem inherent in a reform program like LSC. That is, projects are providing professional development to teachers, while ultimately needing to isolate the effects of their instruction on student performance. Indeed, the LSC projects are expected to have an impact on students, ostensibly improving their achievement in mathematics and science. Although the training is targeted at teachers, the expectation is that teachers will show improvement in their instruction, thereby benefiting the students. The teachers are the vehicles for the instruction, which is an intervening variable. It is therefore important to develop measures of implementation as well as measures of student achievement.
How can we get an accurate (valid) fix on the level/quality of curriculum implementation in a school district, school building, or individual classroom, so that effects can be accurately addressed?
This requires some good, old-fashioned hard work and collaboration. Begin by having a group of expert teachers who know the curriculum well brainstorm about what it means to implement this curriculum. Write down or record everything they say. Then have them go back through their output and identify observable or measurable indicators of implementation. Use this as the basis for developing a checklist of implementation indicators. Take that checklist to a separate, but equally skilled group of teachers and have them critique it. Use their critique as the basis for a pilot draft of a checklist that could be used by an observer in a classroom. Try it out. Does it provide you with the information you need?
The HRI Classroom Observation Protocol also provides an excellent extant model for determining quality of curriculum implementation. You could then conduct comparisons by selecting teachers who score on the extremes on the rating scale.
How can we get an accurate (valid) fix on the quality of the professional development experience when there are multiple workshops being conducted by multiple staffs?
You might consider a survey of participating teachers as an additional source of information. In addition, you may wish to develop a workshop quality checklist, based on the principles of adult learning theory. You may need to do a little background research to come up with a workable draft. Vet it with some respected PD folks that you know. After getting their feedback, pilot test it by using it to grade the various PD activities. You could also use a Likert-type scale for some of the items (e.g., "To what extent does this workshop ...?"). The principles of good adult instruction are well documented. Try the Association for Supervision and Curriculum Development (ASCD) as a source (http://www.ascd.org/).
To really answer the validity issue, you will need an independent external measure of the same variables to corroborate your findings. The HRI Classroom Observation Protocol might be a good source for that. The validity question is: To what extent do the two independent measures of PD quality yield similar results? Correlational evidence in the .75 range or higher should be good enough to establish validity if you have evidence that your measure is addressing the correct (most important) dimensions of adult instruction.
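The comparison of two independent measures can be sketched as a simple correlation check. This is only an illustration: the six pairs of workshop ratings below are hypothetical, the function name is an assumption, and the .75 cutoff is the guideline mentioned above rather than a formal standard.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical ratings of the same six workshops from two independent measures
checklist_scores = [3.0, 4.5, 2.0, 5.0, 3.5, 4.0]   # locally developed checklist
external_scores = [2.5, 4.0, 2.5, 4.5, 3.0, 4.5]    # independent external measure

r = pearson_r(checklist_scores, external_scores)
meets_guideline = r >= 0.75
print(f"r = {r:.2f}; meets the .75 guideline: {meets_guideline}")
```

With only a handful of workshops the correlation will be unstable, so a result like this is better read as one piece of evidence than as proof of validity.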
What if your "baseline" data occurs after students have already been in the program?
This is a difficult problem to overcome without a comparison or control group. Short of having a time machine at one's disposal, there is not much that can be done about this situation. However, we offer below several somewhat different but complementary suggestions that can help make the best of the situation.
- One could compare the students of teachers who had fully implemented the reform approach to the students of teachers who were minimal implementers.
- One could compare students who had teachers with extensive training to students with teachers who had minimal training.
- One could conduct a small study taking advantage of naturally occurring circumstances, such as teachers who were unable to participate in the training for some reason. The students of these untrained teachers could be compared to the students of trained teachers.
- One could identify schools similar to those in which your program exists that are using the same achievement measures and attempt to obtain data from them to see if there is a difference between achievement in their schools and your project schools. Be sure, however, that your measures are aligned with the curriculum, which may or may not be the case in your comparison schools.
- One could look at achievement relative to the norm group as a function of the length of time a student has been exposed to teachers involved in the project. You can either block on years of exposure or use regression analysis with years of exposure as a continuous covariate.
- If your district has high student mobility, another possibility is to do a sub-study using students that are new to the district, and collect baseline data on them.
- Finally, one might search for another test that can act as a "control" for prior achievement. You might even consider using a test in another subject, which, although not perfect, is better than nothing.
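The suggestion about years of exposure can be sketched in two ways: blocking (mean score per exposure group) and a simple regression with years of exposure as a continuous predictor. The records below are hypothetical, the function names are assumptions, and a real analysis would normally add other covariates and use a statistics package.

```python
from collections import defaultdict

def block_means(years, scores):
    """Mean score for each block of students with the same years of exposure."""
    groups = defaultdict(list)
    for yr, score in zip(years, scores):
        groups[yr].append(score)
    return {yr: sum(vals) / len(vals) for yr, vals in sorted(groups.items())}

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical records: (years with project teachers, score relative to the norm group)
records = [(0, 48.0), (0, 50.0), (1, 51.0), (1, 53.0),
           (2, 54.0), (2, 56.0), (3, 57.0), (3, 59.0)]
years = [rec[0] for rec in records]
scores = [rec[1] for rec in records]

means = block_means(years, scores)
intercept, slope = fit_line(years, scores)
print("Mean score by years of exposure:", means)
print(f"Estimated change per additional year of exposure: {slope:.2f}")
```

The slope estimates the average score difference associated with one more year of exposure; it does not by itself rule out other explanations for that difference.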
While committed to trying to find outcome data to inform the LSC effects, the notion of isolating an "LSC effect" in an effort that involves systemic reform seems daunting.
The statement is somewhat unclear, since LSC itself is a systemic program. But the questioner is correct that there are other factors beyond the direct professional development opportunities that projects are providing to teachers that can influence their instruction and the achievement of their students. What is needed are mechanisms for ruling out the possible effects of those other factors. If you are in a single district with little demographic variation among schools, most of those factors are not likely to vary a great deal from school to school or classroom to classroom. That is, most of the non-LSC systemic factors are likely to influence all participants pretty much equally. Therefore, although it would be difficult (if not impossible) to capture this information quantitatively, one could describe and discuss these factors qualitatively to provide a context within which to interpret one's student impact analyses. There are ways to do this using prior years' performance of similar groups as a comparison, if the district's data system can support them. Suggestions for doing so can be found in the section on control/comparison groups in the "Guidelines" document provided at the North Carolina workshop.

We have a fourth-grade assessment that we can use that aligns well with the reform mathematics curriculum, but how do we link the results to the fourth-grade teachers, or to all the teachers the students have had up to fourth grade?

There are really two questions here. First, linking fourth grade achievement results to fourth grade teachers can be done by simply comparing the performance of the fourth grade students in the LSC teachers' classes with those in non-LSC teachers' classes, provided that non-LSC teachers' classes are available for this purpose.
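One common way to express such a comparison is a standardized mean difference. The sketch below uses Cohen's d with a pooled standard deviation, which is our choice for illustration and not a method prescribed here; the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b) using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical fourth grade scores
lsc_class_scores = [52.0, 55.0, 58.0, 61.0, 64.0]       # students of LSC teachers
non_lsc_class_scores = [48.0, 51.0, 54.0, 57.0, 60.0]   # students of non-LSC teachers

d = cohens_d(lsc_class_scores, non_lsc_class_scores)
print(f"Standardized mean difference: {d:.2f}")
```

Because students are clustered within classrooms, a simple comparison like this treats students as independent and may overstate precision.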

The second question asks how to look at the "cumulative" effect of students' sequences of teachers in previous grades, as well as their fourth grade experience. This requires a more complex analysis, which would require tracking down the sequence of teachers that each student in the sample had. (Do not underestimate the time involved in doing this.) For example, teachers could be characterized as "LSC trained" or "not LSC trained" at the time that the student had that teacher. Thus, a few entries in a data set might look like this:

Student ID    2nd Grade Teacher    3rd Grade Teacher    4th Grade Teacher
1001          Not LSC trained      LSC trained          LSC trained
1002          LSC trained          Not LSC trained      Not LSC trained
1003          LSC trained          LSC trained          LSC trained
1004          Not LSC trained      Not LSC trained      Not LSC trained

The students in the example data set have a variety of exposures to LSC trained teachers over time. Student 1003 has had a three-year diet of LSC trained teachers, while student 1004 has not encountered any LSC trained teachers. Using these dummy variables in a regression model, one can produce effect sizes for the teacher in each year, and also produce a cumulative effect on students with different sequences of teachers (e.g., a trained-trained-trained sequence compared to a not-not-not sequence, etc.).
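A minimal sketch of the dummy-variable regression follows, written in plain Python so that it is self-contained. Students 1001 to 1004 carry the teacher sequences from the example data set; students 1005 to 1008 and all of the scores are hypothetical, added only because four students cannot separate the three grade-level effects. The coefficients are in raw score points, not standardized effect sizes.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design; some dummy variables are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    x = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

# 1 = LSC trained, 0 = not LSC trained, for the 2nd, 3rd, and 4th grade teacher
students = {
    1001: ((0, 1, 1), 57.0),
    1002: ((1, 0, 0), 52.0),
    1003: ((1, 1, 1), 59.0),
    1004: ((0, 0, 0), 50.0),
    1005: ((0, 1, 0), 53.0),  # hypothetical additional students
    1006: ((1, 0, 1), 56.0),
    1007: ((0, 0, 1), 54.0),
    1008: ((1, 1, 0), 55.0),
}
dummies = [seq for seq, _ in students.values()]
scores = [score for _, score in students.values()]

coefs = ols(dummies, scores)      # [intercept, grade 2, grade 3, grade 4]
cumulative = sum(coefs[1:])       # trained-trained-trained vs. not-not-not
print("Per-grade coefficients:", [round(c, 2) for c in coefs[1:]])
print(f"Cumulative difference: {cumulative:.2f}")
```

This additive model assumes the effect of a trained teacher in one grade does not depend on the other grades; interaction terms would be needed to test that assumption.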

The above response assumes that one has scores for individual students, and can identify which students obtained which scores. If you are not able to match test scores to students, then you may need to work at the classroom level. If your data are only available for a school as a whole, this could be problematic, unless whole schools are brought into the project at one time.

If you are not able to determine which mathematics teacher students had, you will not be able to do this type of analysis, unless whole schools are brought into the project at various points of time. That would allow you to compare schools with trained teachers to schools with teachers who have not been trained, unless whole grade levels are trained at one time.

How can we determine if the state test is aligned with the purpose of the reform curriculum?
Districts will need to determine for themselves whether or not the state test is aligned with their reform efforts. They will need to obtain the information about the state test from those responsible for developing the test. District staff will need to review this information, including sample items, if available, and determine if there is alignment with reform efforts.
If the state test is not aligned with the reform curriculum, then it is unlikely that it will be sensitive to the instruction being offered. In such an instance, one should try to identify and get permission to administer a measure that is well aligned. Alignment is the basic underpinning of validity. Alternatively, if some of the items in the test align well and others don't, you can ask your scoring service if it is possible to have those items scored separately. Expect to pay extra for this service.
In a multi-district project, how can a study design use different national standardized test results as indicative of project progress on student achievement?
There are four types of options to consider in such instances. First, although it is not a perfect solution, the normal curve equivalent (NCE) score, available as an option with most nationally standardized tests, will allow for making comparisons where you don't have the same measure. Be careful, though, to make sure that the test(s) align well with the curriculum as it is being taught in each district. Second, it is possible that some of the districts use other similar metrics (percent passing, percent at different proficiency levels, grade equivalency) that will enable one to conduct some cross-district analyses. Third, in some instances the large test publishers have already done the technical work to equate different tests into a single measure, and this is well worth checking out. Fourth, if the tests have not been equated, one should probably consider doing separate studies for each district, and then examining the pattern of effects across the districts to draw conclusions about the overall impact of the project.
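For readers who have only national percentile ranks, the conventional conversion to NCEs can be sketched as below. NCEs place the normal deviate of the percentile rank on a scale with mean 50 and standard deviation of about 21.06, so that NCEs of 1, 50, and 99 coincide with those percentile ranks. The function name is an assumption, and publisher-supplied NCEs should be preferred where they are available.

```python
from statistics import NormalDist

NCE_SD = 21.06  # conventional scaling constant for normal curve equivalents

def percentile_to_nce(percentile_rank):
    """Convert a national percentile rank (between 0 and 100, exclusive) to an NCE."""
    if not 0 < percentile_rank < 100:
        raise ValueError("percentile rank must be strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile_rank / 100)
    return 50 + NCE_SD * z

# Hypothetical percentile ranks from different districts' tests
for pr in (10, 25, 50, 75, 90):
    print(f"percentile {pr:>2} -> NCE {percentile_to_nce(pr):.1f}")
```

Unlike percentile ranks, NCEs form an equal-interval scale and can be averaged across students, which is what makes them usable for cross-test comparisons of this kind.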
Is it enough to just assess grades four and eight to generalize about the whole project?
Evaluations often use a subset of grades to represent the effects of an intervention on an entire project. In fact, many states test in only a few key grades and use the results to represent the progress of the entire system. In an ideal world, one would want to collect data for all grades involved in an LSC project. However, time and money issues can prevent one from doing the ideal. Using only grades 4 and 8 may be the best that can be done. However, one would want to be able to explain why factors other than LSC were not the possible reasons for changes in scores. Determining if it is reasonable to use the scores from just these two grade levels would also depend on what grade levels are a part of the project and how teachers are brought into it (e.g., one grade level at a time or randomly across all grade levels).
What happens when a standardized test (e.g., a statewide, grade-specific mathematics test) is revamped? Doesn't this cause problems with baseline data and ability to show trends over the years?
When a test of this nature is changed, information is typically provided with the new test to show one how to "cross-walk" from the old to the new to make valid comparisons. However, if this has not been done, it is common to use one test as the outcome and a different test as a control for prior achievement if you are using regression analysis.
What are some good measures of "teacher quality"?
Teacher quality is a complex phenomenon, and there is little consensus on what it is or how to measure it. For example, definitions range from those that focus on what should be taught and how knowledge should be imparted to the kinds of knowledge and training teachers should possess. There are, however, two broad elements that most observers agree characterize teacher quality: (1) teacher preparation and qualifications, and (2) teaching practices. The first refers to preservice learning (e.g., degrees held, teaching certification), teaching assignment (certification in subject(s) taught, teaching in-field), and continued learning (e.g., availability/extent of induction activities, mentoring, professional development activities, and collaboration with other teachers). The second refers to the actual behaviors and practices that teachers exhibit in their classrooms. Of course, these two elements of teacher quality are not independent; excellent teacher preparation and qualifications should lead to exemplary teaching behaviors and practices.
Jere Brophy and Tom Good have been conducting research in this area for about 35 years, and have published numerous articles on what constitutes good teaching. Brophy recently published an article synthesizing the characteristics of quality instruction. You may wish to conduct an on-line Web or ERIC search on these authors and this topic for ideas.
Another good source is Teacher Quality: A Report on the Preparation and Qualifications of Public School Teachers (Publication Code NCES 1999-080)¹. This report, based on a national sample of teachers, found that only 28 percent of them felt very well prepared to use student performance assessment techniques; 41 percent reported feeling very well prepared to implement new teaching methods; and 36 percent reported feeling very well prepared to implement state or district curriculum and performance standards. If you use a survey as a part of your evaluation, these are some items that you may wish to include, since national comparative data exist.
What timeline are we using for statistical results?
Timelines for evaluations should typically correspond to project timelines - i.e., pretest data should be collected just prior to or at the beginning of the project; posttest data at the end. However, not all of the LSC projects will have an opportunity to collect "true" pretest data, since some of the earlier projects have been operational for a period of time. Basically, we are now in a planning period for data that will be collected in the 2001-2002 school year, and in one or several subsequent years through the duration of the project, depending upon the evaluation design. The value of collecting data in the intervening years, rather than just at the conclusion of the project, is to obtain indications regarding whether changes are being made in the right direction. However, studies have shown that it can take two to three years for interventions to have substantial impact, so do not expect large changes early in the project.
¹ This report was prepared by Westat staff, and can be purchased from the U.S. Government Printing Office, Superintendent of Documents, Mail Stop: SSOP, Washington, DC 20402-9328. Alternatively, it can be viewed and/or downloaded from the following Web site: http://nces.ed.gov/pubs99/1999080.htm
