Experimental Design 1

Running Head: EXPERIMENTAL DESIGN

Experimental Design and Some Threats to

Experimental Validity: A Primer

Susan Skidmore

Texas A&M University

Paper presented at the annual meeting of the Southwest Educational

Research Association, New Orleans, Louisiana, February 6, 2008.

Abstract

Experimental designs are distinguished as the best method for answering questions involving causality. The purpose of the present paper is to explicate

the logic of experimental design and why it is so vital to questions that demand

causal conclusions. In addition, types of internal and external validity threats are

discussed. To emphasize the current interest in experimental designs, Evidence-

Based Practices (EBP) in medicine, psychology and education are highlighted.

Finally, cautionary statements regarding experimental designs are elucidated

with examples from the literature.

The No Child Left Behind Act (NCLB, 2001) demands “scientifically based research” as the basis for awarding many grants in education.

Specifically, the 107th Congress (2001) delineated scientifically-based research

as that which “is evaluated using experimental or quasi-experimental designs”.

Recognizing the increased interest and demand for scientifically-based research

in education policy and practice, the National Research Council released the

publication Scientific Research in Education (Shavelson & Towne, 2002) a year after the implementation of NCLB. Almost $5 billion has been channeled to

programs that provide scientifically-based evidence of effective instruction, such

as the Reading First Program (U. S. Department of Education, 2007). With

multiple methods available to education researchers, why does the U. S.

government show partiality to one particular method? The purpose of the

present paper is to explicate the logic of experimental design and why it is so

vital to questions that demand causal conclusions. In addition, types of internal

and external validity threats are discussed. To emphasize the current interest in

experimental designs, Evidence-Based Practices (EBP) in medicine, psychology

and education are highlighted. Finally, cautionary statements regarding

experimental designs are elucidated with examples from the literature.

Experimental Design

An experiment is “that portion of research in which variables are

manipulated and their effects upon other variables observed” (Campbell &

Stanley, 1963, p. 171). Stated another way, experiments are concerned with

an independent variable (IV) that causes or predicts the outcome of the

dependent variable (DV). Ideally, all other variables are eliminated, controlled or

distributed in such a way that a conclusion that the IV caused the DV is validly

justified.

Figure 1. Diagram of an experiment.

In Figure 1 above you can see that there are two groups. One group

receives some sort of manipulation that is thought (theoretically or from previous

research) to have an impact on the DV. This is known as the experimental group

because participants in this group receive some type of treatment that is

presumed to impact the DV. The other group, which does not receive a treatment

or instead receives some type of alternative treatment, provides the result of

what would have happened without experimental intervention (manipulation of

the IV).

So how do you determine whether participants will be in the control group

or the experimental group? The answer to this question is one of the

characteristics that underlie the strength of true experimental designs. True

experiments must have three essential characteristics: random assignment to

groups, an intervention given to at least one group and an alternate or no

intervention for at least one other group, and a comparison of group

performances on some post-intervention measurement (Gall, Gall, & Borg,

2005).

Participants in a true experimental design are randomly allocated to either

the control group or the experimental group. A caution is necessary here.

Random assignment is not equivalent to random sampling. Random sampling

determines who will be in the study, while random assignment determines in

which groups participants will be. Random assignment makes “samples

randomly similar to each other, whereas random sampling makes a sample

similar to a population” (Shadish, Cook, & Campbell, 2002, p. 248, emphasis in

original). Nonetheless, random assignment is extremely important. By randomly

assigning participants (or groups of participants) to either the experimental or

control group, each participant (or groups of participants) is as likely to be

assigned to one group as to the other (Gall et al., 2005). In other words, by giving

each participant an equal probability of being a member of each group, random

assignment equates the groups on all other factors, except for the intervention

that is being implemented, thereby ensuring that the experiment will produce

“unbiased estimates of the average treatment effect” (Rosenbaum, 1995, p. 37).

To be clear, the term “unbiased estimates” means that any difference between the effect observed in the study and the “true” population effect is due to chance (Shadish et al., 2002).
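The distinction between random sampling and random assignment can be made concrete with a short Python sketch (the population, its size, and the participant labels here are hypothetical illustrations, not drawn from the sources cited): random sampling decides who is in the study at all, and random assignment then decides which group each sampled participant joins.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible
population = [f"person_{i}" for i in range(1_000)]  # hypothetical population

# Random sampling: decides WHO is in the study.
sample = rng.sample(population, 10)

# Random assignment: decides WHICH GROUP each sampled participant joins.
assigned = sample[:]
rng.shuffle(assigned)
experimental, control = assigned[:5], assigned[5:]

print(len(sample), len(experimental), len(control))  # 10 5 5
```

Note that `rng.sample` makes the sample resemble the population, while `rng.shuffle` followed by the split makes the two groups resemble each other, which is exactly the contrast Shadish et al. (2002) draw.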

This equality-of-groups assertion is based on the construction of an infinite number of random assignments of participants (or groups of participants) to treatment groups in the study, not on the single random assignment in the particular study (Shadish et al., 2002). Thankfully, researchers do not have to conduct an infinite number of random assignments in an infinite number of studies for this assumption to hold. The equality-of-groups assumption is

supported in studies with large sample sizes, but not in studies with very small

sample sizes. This is true due to the law of large numbers. As Boger (2005)

explained, “If larger and larger samples are successively drawn from a population

and a running average calculated after each sample has been drawn, the

sequence of averages will converge to the mean, µ, of the population” (p. 175). Readers interested in exploring this concept further are directed to George Boger's article, which details how to create a spreadsheet simulation of the law of large numbers. In addition, a medical example is found in Observational Studies (Rosenbaum, 1995, pp. 13-15).
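A simulation in the spirit of Boger's spreadsheet exercise can be sketched in a few lines of Python (this is my own illustration, not taken from Boger's article): drawing successively from a population with a known mean and tracking the running average shows the convergence that the law of large numbers describes.

```python
import random

rng = random.Random(1)  # fixed seed so the illustration is reproducible

# Population: a fair coin (values 0 and 1), so the population mean is 0.5.
population_mean = 0.5

running_sum = 0.0
checkpoints = {}
for n in range(1, 100_001):
    running_sum += rng.random() < 0.5    # draw one observation (0 or 1)
    if n in (10, 1_000, 100_000):
        checkpoints[n] = running_sum / n  # running average after n draws

# The running average drifts toward the population mean as n grows.
for n, avg in checkpoints.items():
    print(n, round(abs(avg - population_mean), 4))
```

The printed deviations from µ shrink as the sample size grows, which is the same behavior the spreadsheet simulation is designed to display.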

To consider the case of small sample size, let us suppose that I have a

sample of 10 graduate students that I am going to randomly assign to one of two

treatment groups. The experimental group will have regularly scheduled graduate

advisor meetings to monitor students' educational progress. The control group

will not have regularly scheduled graduate advisor meetings. Just to see what

happens, I choose to do several iterations of this random assignment process. Of

course, I discover that the identity of the members in the groups across iterations

is wildly different.
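This iteration exercise is easy to reproduce in Python (a hypothetical sketch with made-up student labels): repeating the shuffle-and-split a handful of times with only 10 participants rarely produces the same experimental group twice.

```python
import random

students = [f"student_{i}" for i in range(10)]  # 10 hypothetical graduate students

def randomly_assign(participants, rng):
    """One random assignment: shuffle, then split into two groups of five."""
    shuffled = participants[:]
    rng.shuffle(shuffled)
    return frozenset(shuffled[:5]), frozenset(shuffled[5:])  # experimental, control

rng = random.Random(2)  # fixed seed so the sketch is reproducible
experimental_groups = [randomly_assign(students, rng)[0] for _ in range(5)]

# Count how many distinct experimental groups the five iterations produced.
distinct = len(set(experimental_groups))
print(distinct)
```

With 252 possible five-member experimental groups, almost every iteration yields a different membership, mirroring the "wildly different" outcome described above.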

Because most people are outliers on at least some variables (Thompson, 2006), there may be some observed differences that are due simply

to the variable characteristics of the members of the treatment groups. For

example, let‟s say that six of the ten graduate students are chronic

procrastinators, and might benefit greatly from regular scheduled visits with a

graduate advisor, while four of the ten graduate students are intrinsically

motivated and tend to experience increased anxiety with frequent graduate

advisor inquiries. If the random assignment process distributes these six

procrastinator graduate students equally among the two groups, a bias due to

this characteristic will not evidence itself in the results. If instead, due to chance, all four intrinsically motivated students end up in the experimental group, the results of the study may not be the same as they would have been had the groups been more evenly distributed. Very small sample sizes, therefore, would result in more pronounced differences between the groups that are not due to treatment effects, but instead are due to the variable characteristics of the members in the groups.
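The chance of such an unlucky split can be computed exactly (this calculation is my own illustration, not from the sources cited). With 10 students randomly split into two groups of five, the probability that all four intrinsically motivated students land in the same group follows from counting the equally likely splits:

```python
from math import comb

total_splits = comb(10, 5)  # 252 equally likely experimental groups of five

# All four motivated students in the experimental group: the fifth member
# must be one of the six procrastinators.
favorable = comb(6, 1)
p_all_in_experimental = favorable / total_splits
p_all_in_same_group = 2 * p_all_in_experimental  # by symmetry, all four could land in control instead

print(round(p_all_in_same_group, 4))  # about 0.0476, i.e. roughly 1 in 21
```

A roughly 1-in-21 chance of a badly imbalanced split is far from negligible, which is precisely why very small randomized studies remain vulnerable to chance differences.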

If instead I have a sample of 10,000 graduate students that I am going

to randomly assign to one of two treatment groups, the law of large numbers

works for me. As explained by Thompson et al. (2005), “The beauty of true

experiments is that the law of large numbers creates preintervention group

equivalencies on all variables, even variables that we do not realize are essential

to control” (p. 183). While there is still not identical membership across treatment

groups, and I still expect that the observed differences between the control group

and the experimental group are going to be due to any possible treatment effects

and to the error associated with the random assignment process, the expectation

of equality of groups is nevertheless reasonably approximated. In other words, I

expect the ratio of procrastinators to intrinsically motivated students to be

approximately the same across the two treatment groups. In fact, I expect

proportions of variables I am not even aware of to be the same, on average,

across treatment groups!

The larger sample size has greatly decreased the error due to chance

associated with the random assignment process. As you can see in Figure 2,

even if both of the sample studies produce identical treatment effects, the results

are not equally valid. The majority of the effect observed in the small sample

size study is actually due to error associated with the random assignment

process and not a result of the treatment. This effect due to error is greatly

reduced in the large sample size study.
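This shrinking error can be illustrated with a small simulation (a hypothetical Python sketch; the 60/40 procrastinator split mirrors the earlier example). With 10 students, the gap in procrastinator proportion between the two groups can be substantial; with 10,000 students, it is reliably tiny.

```python
import random

def proportion_gap(n_students, seed):
    """Randomly assign n_students (60% procrastinators) to two equal groups
    and return the absolute gap in procrastinator proportion between them."""
    rng = random.Random(seed)
    n_procrastinators = (6 * n_students) // 10
    students = [1] * n_procrastinators + [0] * (n_students - n_procrastinators)  # 1 = procrastinator
    rng.shuffle(students)
    half = n_students // 2
    experimental, control = students[:half], students[half:]
    return abs(sum(experimental) - sum(control)) / half

small_gap = proportion_gap(10, seed=0)
large_gap = proportion_gap(10_000, seed=0)
print(small_gap, large_gap)
```

In the 10-student case the gap can only take the values 0, 0.4, or 0.8, so a single unlucky shuffle can leave the groups badly imbalanced; in the 10,000-student case the gap is bounded near zero in practice, which is the law of large numbers at work.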

Figure…