
Guidelines for Evaluating Student Outcomes

author: WESTAT
submitter: Joy Frechtling, WESTAT
published: 03/23/2001
posted to site: 03/23/2001

VII. Strengthening Your Study's Internal Validity

Studies that are assessing the effect of an educational program must give strong consideration to internal validity. Internal validity means that you have evidence that your program, and not other factors, was the cause of the outcomes. Such alternative explanations are known as threats to internal validity. A good study design helps to minimize these factors, but even the best studies have potential threats to internal validity. Thus it is the responsibility of the researcher to examine the study for any threats and to determine the likelihood that the threat, and not the treatment, was responsible for any differences in the outcomes.

In studies using comparison groups, the largest potential threat to internal validity is related to sample selection: the treatment and control groups are selected in different ways, resulting in bias. For example, in districts with high teacher turnover, untreated teachers might tend to be new to the profession while treated teachers would tend to be more experienced. Thus, there would be an inherent bias in a comparison of these two groups of teachers. For this reason, it is critical that you build into your research design some method to examine the initial equivalence of your treatment and control groups.
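The baseline-equivalence check described above can be sketched in a few lines. The data below are invented to mirror the teacher-turnover example; a standardized mean difference (Cohen's d) near zero suggests the groups started out comparable, while a large value signals exactly the kind of selection bias at issue.

```python
from statistics import mean, stdev

def standardized_mean_difference(treated, comparison):
    """Cohen's d: difference in group means divided by the pooled SD.
    Values near zero suggest the two groups started out comparable."""
    n_t, n_c = len(treated), len(comparison)
    pooled_var = (((n_t - 1) * stdev(treated) ** 2
                   + (n_c - 1) * stdev(comparison) ** 2)
                  / (n_t + n_c - 2))
    return (mean(treated) - mean(comparison)) / pooled_var ** 0.5

# Hypothetical years of teaching experience in a high-turnover district:
# treated teachers tend to be veterans, untreated teachers tend to be new.
treated = [12, 9, 15, 11, 8, 14, 10, 13]
comparison = [3, 5, 2, 6, 4, 1, 5, 3]

print(f"standardized mean difference: "
      f"{standardized_mean_difference(treated, comparison):.2f}")
# A value well above 1 indicates a large baseline imbalance.
```

The same computation applies to any pretest measure, not just teacher experience.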

The following example, while exaggerated, illustrates the threat of selection bias. Imagine we wanted to investigate the effect of taking calculus on mathematics achievement, with the hypothesis that students who take calculus will be better prepared in mathematics than students not taking calculus. To do this, we examine students' scores on the mathematics portion of the SAT relative to their score on the PSAT, comparing those who took calculus to those who did not. The results of the analysis show that those students taking calculus have much greater gains than the students not taking calculus. While taking calculus may lead to higher gains between the PSAT and the SAT, this study does not justify enrolling everyone in calculus in an attempt to raise mathematics achievement. Rather, it is very likely that the calculus students would have higher gains than the non-calculus students even if they hadn't taken the course, since the students who elect to take calculus tend to have a particularly high capacity to learn mathematics.
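A small simulation makes the calculus example concrete. In the sketch below (all numbers hypothetical), taking calculus has zero effect on score gains; the gains depend only on each student's latent ability. Yet because higher-ability students are more likely to enroll, the calculus group still shows a larger average gain.

```python
import random

random.seed(42)

# Each student has a latent math ability; higher-ability students are more
# likely to elect calculus. Crucially, the simulated PSAT-to-SAT "gain"
# depends only on ability -- calculus itself has NO effect here.
students = []
for _ in range(2000):
    ability = random.gauss(0, 1)
    takes_calculus = random.random() < (0.8 if ability > 0.5 else 0.2)
    gain = 20 + 15 * ability + random.gauss(0, 10)  # points, illustrative
    students.append((takes_calculus, gain))

calc_gains = [g for took, g in students if took]
no_calc_gains = [g for took, g in students if not took]

avg = lambda xs: sum(xs) / len(xs)
print(f"mean gain, calculus group:     {avg(calc_gains):.1f}")
print(f"mean gain, non-calculus group: {avg(no_calc_gains):.1f}")
# The gap between the groups arises purely from selection, not from the course.
```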

In their seminal work on research design, Campbell and Stanley identify eight threats to internal validity that could interact with the selection of your treatment and comparison groups. Of those, the four threats you are most likely to encounter in research on the effects of the LSC are:

  • History - students in one of your groups have an experience other than what your teacher enhancement program provides. For example, an exciting new museum-based science education program for elementary children, advertised in LSC professional development sessions, might be attended primarily by students of treated teachers. That program, rather than the LSC treatment, could be the primary reason for any difference in test scores between the groups.

  • Maturation - a change occurred simply as a result of the passage of time. For example, if the students in your treatment group are more advanced before the LSC treatment is implemented, they might develop at a faster rate than students who start off at a lower level.

  • Statistical regression - the tendency of those who score very high or very low on an initial measure to score closer to the mean when measured again. Therefore, you want to make sure that students at the extreme ends of your measurement scale are not concentrated in one of your groups.

  • Experimental mortality - considerable attrition occurs during the study, a particular problem if participants in the treatment and control groups drop out at different rates.
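Statistical regression, in particular, is easy to demonstrate by simulation. In the sketch below (invented scores), each observed score is a stable "true" score plus measurement noise, and nothing at all happens between the two test occasions. The lowest initial scorers still move back toward the mean on the second test, which is why an intervention targeted at low scorers can appear effective even when it is not.

```python
import random

random.seed(0)

# Each observed score = a stable true score + independent measurement noise.
# No treatment of any kind occurs between the two test occasions.
true_scores = [random.gauss(500, 80) for _ in range(5000)]
test1 = [t + random.gauss(0, 50) for t in true_scores]
test2 = [t + random.gauss(0, 50) for t in true_scores]

avg = lambda xs: sum(xs) / len(xs)
overall_mean = avg(test1)

# Select the 500 students who scored lowest on the first test.
low_idx = sorted(range(len(test1)), key=lambda i: test1[i])[:500]
low_t1 = avg([test1[i] for i in low_idx])
low_t2 = avg([test2[i] for i in low_idx])

print(f"overall mean on test 1:          {overall_mean:.0f}")
print(f"lowest scorers, test 1 average:  {low_t1:.0f}")
print(f"lowest scorers, test 2 average:  {low_t2:.0f}")
# With no treatment at all, the low scorers' average drifts toward the mean.
```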

Without random assignment of teachers to treatment groups and students to teachers, neither of which is typically feasible, there is no research design that can totally rule out these threats. However, there are three common methods researchers use to help reduce the possibility that these threats to validity are responsible for the study's outcome:

  • By argument - This is the easiest action but is also the weakest. If you have knowledge about the students and teachers in the groups, or about how they were selected, you can make the case that there was or was not a selection bias.

  • By measurement or observation - Sometimes you can measure the threat in order to subtract it out. For example, if you find that most of the students in your treatment group also are participating in an after-school program, you could compare their test scores to those of other students in the after-school program who are not in LSC classrooms, to measure the effect of the after-school program.

  • By analysis - Some threats can be addressed by advanced statistical analysis. Examples include computing an adjustment for the regression effect and using analysis of variance (ANOVA) to adjust for mortality. Regardless of how you choose to examine the possibility of selection bias in your study, you need to address this issue fully in your report.
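As an illustration of adjustment by analysis, the sketch below (hypothetical scores throughout) applies one common form, a covariance-style adjustment: each group's posttest mean is shifted to what it would be if both groups had started at the same pretest mean. Here the adjusted group difference is far smaller than the raw one, showing how much of the apparent effect was baseline difference.

```python
# Hypothetical (pretest, posttest) pairs for each group. The treated group
# starts out higher, so its raw posttest advantage overstates the effect.
treated = [(62, 78), (55, 70), (71, 85), (66, 80), (58, 74)]
comparison = [(48, 58), (52, 63), (45, 55), (50, 61), (55, 66)]

everyone = treated + comparison
pre = [p for p, _ in everyone]
n = len(everyone)
mean_pre = sum(pre) / n
mean_post = sum(q for _, q in everyone) / n

# Ordinary least-squares slope of posttest on pretest, pooled across groups.
slope = (sum((p - mean_pre) * (q - mean_post) for p, q in everyone)
         / sum((p - mean_pre) ** 2 for p in pre))

def adjusted_mean(group):
    """Group's posttest mean, adjusted to the grand pretest mean."""
    g_pre = sum(p for p, _ in group) / len(group)
    g_post = sum(q for _, q in group) / len(group)
    return g_post + slope * (mean_pre - g_pre)

raw_diff = (sum(q for _, q in treated) / len(treated)
            - sum(q for _, q in comparison) / len(comparison))
adj_diff = adjusted_mean(treated) - adjusted_mean(comparison)
print(f"raw posttest difference:      {raw_diff:.1f}")
print(f"adjusted posttest difference: {adj_diff:.1f}")
```

This is only a sketch; a real analysis would also report standard errors and check that the pre/post relationship is similar in both groups.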
