Threats to validity of Research Design
The books by Campbell and Stanley (1963), Cook and Campbell (1979), and Shadish, Cook, and Campbell (2002) are considered seminal works in the field of experimental design. The following write-up is based upon their books, with my own examples and updated information inserted.
Problem and Background
Experimental method and essay-writing
Campbell and Stanley point out that adherence to experimentation dominated the field of education through the 1920s (the Thorndike era) but that this gave way to great pessimism and rejection by the late 1930s. However, it should be noted that the departure from experimentation to essay writing (from Thorndike to Gestalt psychology) was made most often by people already adept at the experimental tradition. Therefore, we must be aware of the past so that we avoid total rejection of any method, and instead take a serious look at the effectiveness and applicability of current and past methods without making false assumptions.
Replication
Lack of replicability is one of the major challenges in social science research. After replicating one hundred psychological studies, the Open Science Collaboration (OSC) (2015) found that a large portion of the replicated results were not as strong as the original reports in terms of significance (p values) and magnitude (effect sizes). Specifically, 97% of the original studies reported significant results (p < .05), but only 36% of the replicated studies yielded significant findings. Further, the average effect size of the replicated studies was only half that of the initial studies (mean r = 0.197 vs. mean r = 0.403).
Nonetheless, this problem is not surprising, because the initial analysis usually tends to overfit the model to the data. Needless to say, a theory remains inconclusive when replicated results are unstable and inconsistent. Multiple experimentation is more typical of science than a one-shot experiment. Experiments need replication and cross-validation at various times and under various conditions before the theory can be confirmed with confidence. In the past, the only option was to replicate the same experiment over and over. Today, however, the researcher can virtually repeat the study using a single sample by resampling. Specifically, many data mining software applications offer cross-validation and bootstrap forest. In cross-validation the data set is partitioned into many subsets and multiple analyses are run. In each run the model is built on some of the subsets (the "training" data) and checked against the rest, and thus the end result is considered a product of replicated experiments. In a similar vein, bootstrap forest randomly selects observations from the data and replicates the analysis many times. The conclusion is based on the convergence of these diverse results.
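The resampling idea can be sketched in a few lines of Python. This is only a minimal illustration, not the bootstrap forest algorithm itself: it applies the plain bootstrap to a single statistic (the difference between two group means). The scores, the function name `bootstrap_mean_diff`, and the number of resamples are all hypothetical choices made for the example.

```python
import random
import statistics

def bootstrap_mean_diff(treated, control, n_resamples=2000, seed=0):
    """Resample each group with replacement and collect the mean differences."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        diffs.append(statistics.mean(t) - statistics.mean(c))
    diffs.sort()
    # Approximate 95% percentile interval from the sorted resampled differences
    lower = diffs[round(0.025 * n_resamples)]
    upper = diffs[round(0.975 * n_resamples) - 1]
    return statistics.mean(diffs), (lower, upper)

# Hypothetical posttest scores for two small groups
treated = [78, 85, 69, 91, 74, 88, 80, 83, 77, 90]
control = [72, 70, 65, 80, 68, 75, 71, 79, 66, 74]

estimate, (low, high) = bootstrap_mean_diff(treated, control)
print(f"mean difference ~ {estimate:.2f}, 95% interval ~ ({low:.2f}, {high:.2f})")
```

If the interval stays away from zero across the resamples, the finding is at least stable within this sample; it is still no substitute for replication with new participants.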
Cumulative wisdom
An interesting point is that experiments pitting opposing theories against each other probably will not have clear-cut outcomes. In fact, different researchers might each observe something valid that represents a part of the truth. Adopting experimentation in education should not imply advocating a position incompatible with traditional wisdom. Rather, experimentation may be seen as a process of refining or enhancing this wisdom. Therefore, cumulative wisdom and scientific findings need not be opposing forces.
Factors Jeopardizing Internal and External Validity
Please note that validity discussed here is in the context of experimental design, not in the context of measurement.
- Internal validity refers specifically to whether an experimental treatment/condition makes a difference to the outcome or not, and whether there is sufficient evidence to substantiate the claim.
- External validity refers to the generalizability of the treatment/condition outcomes across various settings.
Efficacy and effectiveness
In medical studies, efficacy studies in experimental settings are usually conducted to address the issue of internal validity, whereas effectiveness studies in naturalistic settings (the "real" world) are employed to examine the external validity of the claim. Usually patients in experiments are highly selected whereas patients in the real world are not. For example, subjects in clinical trials usually have just the illness under study. Patients who have multiple health conditions are excluded from the study because those uncontrolled variables could muddle the research results. However, in the real world it is not unusual for patients to have multiple illnesses. As a result, a drug that works well in a lab setting may fail in the real world. Thus, medical researchers must take both internal validity and external validity into account while testing the goodness of a treatment. On one hand, efficacy studies aim to answer this question: Does the treatment work in a closed experimental environment? On the other hand, effectiveness studies attempt to address a different issue: Does the treatment work in a real-life situation? (Pittler & White, 1999).
Interestingly enough, the US drug approval and monitoring processes seem to compartmentalize efficacy and effectiveness. The US Food and Drug Administration (FDA) is responsible for approving drugs before they are released to the market. Rigorous experiments and hard data are required to gain the FDA's approval. But after the drugs are on the market, it takes other agencies to monitor their effectiveness. Contrary to popular belief, the FDA has no authority to recall unsafe drugs. Rather, the FDA can only suggest a voluntary recall. Several drugs that had been approved by the FDA were later recalled from the market (e.g., the anti-diabetic drug Avandia and the pain reliever Vioxx). This discrepancy between the results of lab tests and the real world led to an investigation by the Institute of Medicine (IOM). To close the gap between internal and external validity, the IOM committee recommended that the FDA take proactive steps to monitor the safety of approved drugs throughout their time on the market (Ramsey, 2012).
In recent years, the concepts of efficacy and effectiveness have also been utilized by educational researchers (Schneider, Carnoy, Kilpatrick, Schmidt, & Shavelson, 2007). Indeed, there is a concept similar to "effectiveness" in educational research: ecological validity. Educational researchers realize that it is impossible for teachers to block all interference by closing the door. Contrary to the experimental ideal that a good study is a "noiseless" one, a study is regarded as ecologically valid if it captures teachers' everyday experience as they are bombarded with numerous things (Black & Wiliam, 1998; Valli & Buese, 2007).
Which one is more important?
Whether internal validity or external validity is more important has been a controversial topic in the research community. Campbell and Stanley (1963) stated that although, ideally speaking, a good study should be strong in both types of validity, internal validity is indispensable and essential while the question of external validity is never completely answerable. External validity is concerned with whether the same result of a given study can be observed in other situations. Like inductive inference, this question will never be conclusive. No matter how many new cases concur with the previous finding, it takes just one counter-example to weaken the external validity of the study. In other words, Campbell and Stanley's statement implies that internal validity is more important than external validity. Cronbach (1982) opposed this notion. He argued that if a treatment is expected to be relevant to a broader context, the causal inference must go beyond the specific conditions. If the study lacks generalizability, then the so-called internally valid causal effect is useless to decision makers. In a similar vein, Briggs (2008) asserted that although statistical conclusion validity and internal validity together affirm a causal effect, construct validity and external validity are still necessary for generalizing a causal conclusion to other settings.
Factors which jeopardize internal validity
- History: the specific events which occur between the first and second measurement. The 2008 economic recession is a good example. Due to the budget crisis many schools cut back resources. A treatment implemented around that period of time may be affected by a lack of supporting infrastructure.
- Maturation: the processes within subjects which act as a function of the passage of time. For example, if the project lasts a long period of time, most participants may improve their performance regardless of treatment.
- Testing: the effects of taking a test on the outcomes of taking a second test. In other words, the pretest becomes a form of "treatment."
- Instrumentation: the changes in the instrument, observers, or scorers which may produce changes in outcomes.
- Statistical regression: this is also known as regression towards the mean. The phenomenon was first discovered by the British statistician Francis Galton in the 19th century. Contrary to popular belief, Galton found that tall parents do not necessarily have tall children. If the parent is extremely tall, the offspring tends to be closer to the average. This pattern was rediscovered by the Israeli-American psychologist Daniel Kahneman (2011) when he examined why rebuking pilots does not explain subsequent changes in flight performance. In the context of research design, the threat of regression towards the mean is caused by the selection of subjects on the basis of extreme scores or characteristics. If there are forty poor students in the treatment program, it is likely that they will show some improvement after the treatment, even if the treatment itself has no effect. However, if the students are extremely poor and thus are unresponsive to any treatment, then it is called the floor effect.
- Selection of subjects: the biases which may result from the selection of comparison groups. Randomization (random assignment) of group membership is a counter-measure against this threat. However, when the sample size is small, randomization may lead to Simpson's Paradox, which has been discussed in an earlier lesson.
- Experimental mortality: the loss of subjects. For example, a Web-based instruction project entitled Eruditio started with 161 subjects, and only 95 of them completed the entire module. Those who stayed in the project all the way to the end may have been more motivated to learn and thus achieved higher performance. The hidden variable, intention to treat, might skew the result.
- Selection-maturation interaction: the selection of comparison groups and maturation interacting which may lead to confounding outcomes, and erroneous interpretation that the treatment caused the effect.
- John Henry effect and Hawthorne effect: John Henry was a worker who outperformed a machine in an experimental setting because he was aware that his performance was being compared with that of a machine. The Hawthorne effect is similar to the John Henry effect in the sense that participants change their behaviors when they are aware of their role as research subjects. Between 1924 and 1932 the Hawthorne Works sponsored a study to examine how lighting would influence productivity. Researchers concluded that workers improved their productivity because they were being observed rather than because of better illumination. Hence, the Hawthorne effect is also known as the observer effect. However, recent research suggests that the evidence for the Hawthorne effect is scant (Paradis & Sutkin, 2017).
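The statistical regression threat listed above can be made concrete with a small simulation. The sketch below is illustrative only and rests on invented assumptions: each observed score is a stable ability plus independent measurement noise, 1,000 students are tested twice, no treatment is given, and the forty lowest pretest scorers are "selected" for a program.

```python
import random
import statistics

def simulate_regression_to_mean(n_students=1000, n_selected=40, seed=1):
    """Select the lowest pretest scorers and compare their pre and post means."""
    rng = random.Random(seed)
    ability = [rng.gauss(50, 10) for _ in range(n_students)]
    # Observed score = stable ability + independent noise on each occasion
    pretest = [a + rng.gauss(0, 10) for a in ability]
    posttest = [a + rng.gauss(0, 10) for a in ability]  # no treatment applied
    lowest = sorted(range(n_students), key=lambda i: pretest[i])[:n_selected]
    pre_mean = statistics.mean(pretest[i] for i in lowest)
    post_mean = statistics.mean(posttest[i] for i in lowest)
    return pre_mean, post_mean

pre_mean, post_mean = simulate_regression_to_mean()
print(f"selected group: pretest mean {pre_mean:.1f}, posttest mean {post_mean:.1f}")
```

The selected group "improves" on the posttest even though nothing was done to it, because part of what made its pretest scores extreme was noise that does not recur.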
Factors which jeopardize external validity
- Reactive or interaction effect of testing: a pretest might increase or decrease a subject's sensitivity or responsiveness to the experimental variable. Indeed, the effect of a pretest on subsequent tests has been empirically substantiated (Wilson & Putnam, 1982; Lana, 1959).
- Interaction effects of selection biases and the experimental variable
- Reactive effects of experimental arrangements: it is difficult to generalize to non-experimental settings if the effect was attributable to the experimental arrangement of the research.
- Multiple treatment interference: as multiple treatments are given to the same subjects, it is difficult to control for the effects of prior treatments.
Three Experimental Designs
To make things easier, the following will act as representations within particular designs:
- X: Treatment
- O: Observation or measurement
- R: Random assignment
The three experimental designs discussed in this section are:
The One Shot Case Study
There is a single group and it is studied only once. A group is introduced to a treatment or condition and then observed for changes which are attributed to the treatment.
The problems with this design are:
- A total lack of manipulation. Also, the scientific evidence is very weak in terms of making a comparison and recording contrasts.
- There is also a tendency to commit the fallacy of misplaced precision, where the researcher engages in tedious collection of specific detail, careful observation, testing, etc., and misinterprets this as obtaining solid research. However, a detailed data collection procedure should not be equated with a good design. In the chapter on design, measurement, and analysis, these three components are clearly distinguished from each other.
- History, maturation, selection, mortality, and interaction of selection and the experimental variable are potential threats against the internal validity of this design.
One Group Pre-Posttest Design
This is a presentation of a pretest, followed by a treatment, and then a posttest where the difference between O1 and O2 is explained by X:
O1 X O2
However, there exist threats to the validity of the above assertion:
- History: between O1 and O2 many events may have occurred apart from X to produce the differences in outcomes. The longer the time lapse between O1 and O2, the more likely history becomes a threat.
- Maturation: between O1 and O2 students may have grown older or their internal states may have changed, and therefore the differences obtained would be attributable to these changes as opposed to X. For example, if the US government had done nothing about the economic crisis that started in 2008 and had let it run its course (as Mitt Romney suggested), the economy might still have improved ten years later. In this case, it is problematic to compare the economy in 2021 with that in 2011 to determine whether a particular policy is effective; rather, the right way is to compare the economy in 2021 with the overall average (e.g., 2011 to 2021). In SPSS the default pairwise comparison contrasts each measure with the final measure, which may be misleading. In SAS the default contrast scheme is Deviation, in which each measure is compared to the grand mean of all measures (overall).
- Testing: the effect of giving the pretest itself may affect the outcomes of the second test (e.g., IQ tests taken a second time yield scores 3-5 points higher than the first time). In the social sciences, it has long been known that the process of measuring may change that which is being measured: the reactive effect occurs when the testing process itself leads to the change in behavior rather than being a passive record of behavior (reactivity: we want to use non-reactive measures when possible).
- Instrumentation: examples are in threats to validity above
- Statistical regression: or regression toward the mean. Time-reversed control analysis and direct examination for changes in population variability are proactive counter-measures against such misinterpretations of the result. If the researcher selects a very polarized sample consisting of extremely skillful and extremely poor students, the former group might either show no improvement (ceiling effect) or decrease their scores, and the latter might appear to show some improvement. Needless to say, this result is misleading, and to correct this type of misinterpretation, researchers may want to do a time-reversed (posttest-pretest) analysis to analyze the true treatment effects. Researchers may also exclude outliers from the analysis or adjust the scores by winsorizing (pulling the outliers towards the center of the distribution).
- Others: History, maturation, testing, instrumentation, interaction of testing and maturation, interaction of testing and the experimental variable, and interaction of selection and the experimental variable are also threats to validity for this design.
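As a small illustration of the winsorizing step mentioned in the list above, the following sketch clips the most extreme scores at each end to the nearest retained value. The 10% proportion, the function name `winsorize`, and the scores are assumptions for the example. Note that winsorizing only tames outliers; it does not by itself remove a regression artifact.

```python
def winsorize(scores, proportion=0.1):
    """Clip the lowest and highest `proportion` of scores to the nearest kept value."""
    n = len(scores)
    k = int(proportion * n)  # number of scores to clip at each end
    if k == 0:
        return list(scores)
    ordered = sorted(scores)
    low, high = ordered[k], ordered[n - k - 1]
    return [min(max(s, low), high) for s in scores]

# Hypothetical scores with one extreme value at each end
scores = [12, 48, 50, 51, 53, 55, 56, 58, 60, 97]
print(winsorize(scores))  # [48, 48, 50, 51, 53, 55, 56, 58, 60, 60]
```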
The Static Group Comparison
This is a two-group design, where one group is exposed to a treatment and the results are tested while a control group is not exposed to the treatment and is similarly tested in order to compare the effects of treatment. Threats to validity include:
- Selection: groups selected may actually be disparate prior to any treatment.
- Mortality: the differences between O1 and O2 may be because of the drop-out rate of subjects from a specific experimental group, which would cause the groups to be unequal.
- Others: Interaction of selection and maturation and interaction of selection and the experimental variable.
Three True Experimental Designs
The next three designs discussed are the most strongly recommended designs:
The Pretest-Posttest Control Group Design
This design takes on this form:
R O1 X O2
R O3 O4
This design controls for all of the seven threats to validity described in detail so far. An explanation of how this design controls for these threats is below.
- History: this is controlled in that the general history events which may have contributed to the O1 and O2 effects would also produce the O3 and O4 effects. However, this is true if and only if the experiment is run in a specific manner: the researcher may not test the treatment and control groups at different times and in vastly different settings as these differences may influence the results. Rather, the researcher must test the control and experimental groups concurrently. Intrasession history must also be taken into account. For example if the groups are tested at the same time, then different experimenters might be involved, and the differences between the experimenters may contribute to the effects.
In this case, a possible counter-measure is the randomization of experimental conditions, such as counter-balancing in terms of experimenter, time of day, day of the week, etc.
- Maturation and testing: these are controlled in the sense that they are manifested equally in both treatment and control groups.
- Instrumentation: this is controlled where conditions control for intrasession history, especially where the same tests are used. However, when different raters, observers or interviewers are involved, this becomes a potential problem. If there are not enough raters or observers to be randomly assigned to different experimental conditions, the raters or observers must be blind to the purpose of the experiment.
- Regression: this is controlled in terms of mean differences regardless of the extremity of scores or characteristics, provided that the treatment and control groups are randomly assigned from the same extreme pool. If this occurs, both groups will regress similarly, regardless of treatment.
- Selection: this is controlled by randomization.
- Mortality: this was said to be controlled in this design. However, unless the mortality rate is equal in the treatment and control groups, it is not possible to state with certainty that mortality did not contribute to the experimental results. Even when mortality is equal across groups, there remains a possibility of complex interactions that make the effects of drop-out differ between the two groups. Conditions between the two groups must remain similar: for example, if the treatment group must attend the treatment session, then the control group must also attend sessions where either no treatment occurs or a "placebo" treatment occurs. However, even then there remain possible threats to validity. For example, the mere presence of a "placebo" may produce an effect similar to the treatment: the placebo treatment must be somewhat believable and therefore may end up having similar results.
The factors described so far affect internal validity. These factors could produce changes, which may be interpreted as the result of the treatment. These are called main effects, which have been controlled in this design giving it internal validity.
However, in this design, there are threats to external validity (also called interaction effects because they involve the treatment and some other variable, the interaction of which causes the threat to validity). It is important to note here that external validity or generalizability always turns out to involve extrapolation into a realm not represented in one's sample.
In contrast, threats to internal validity are solvable by the logic of probability statistics, meaning that we can control for them based on probability statistics within the experiment conducted. On the other hand, external validity or generalizability can never be fully established logically, because we cannot logically extrapolate to different settings (Hume's truism that induction or generalization is never fully justified logically).
External threats include:
- Interaction of testing and X: because the interaction between taking a pretest and the treatment itself may affect the results of the experimental group, it is desirable to use a design which does not use a pretest.
- Interaction of selection and X: although selection is controlled for by randomly assigning subjects into experimental and control groups, there remains a possibility that the effects demonstrated hold true only for the population from which the experimental and control groups were selected. An example is a researcher who tries to recruit schools to observe, is turned down by nine, and is accepted by the tenth. The characteristics of the tenth school may be vastly different from those of the other nine, and therefore not representative of an average school. Therefore, in any report, the researcher should describe the population studied as well as any populations which rejected the invitation.
- Reactive arrangements: this refers to the artificiality of the experimental setting and the subject's knowledge that he is participating in an experiment. This situation is unrepresentative of the school setting or any natural setting, and can seriously impact the experiment results. To remediate this problem, experiments should be incorporated as variants of the regular curricula, tests should be integrated into the normal testing routine, and treatment should be delivered by regular staff with individual students.
Research should be conducted in schools in this manner: ideas for research should originate with teachers or other school personnel. The designs for this research should be worked out with someone expert at research methodology, and the research itself carried out by those who came up with the research idea. Results should be analyzed by the expert, and then the final interpretation delivered by an intermediary.
Tests of significance for this design: although this design may be developed and conducted appropriately, statistical tests of significance are not always used appropriately.
- Wrong statistic in common use: many researchers compute two t-tests, one for the pre-post difference in the experimental group and one for the pre-post difference in the control group. If the experimental t-test is statistically significant and the control t-test is not, the treatment is said to have an effect. However, this does not take into consideration how "close" the control group's t-test may really have been to significance. A better procedure is to run a 2×2 repeated-measures ANOVA, testing the pre-post difference as the within-subject factor, the group difference as the between-subject factor, and the interaction effect of both factors.
- Use of gain scores and covariance: the most used test is to compute pre-posttest gain scores for each group, and then to compute a t-test between the experimental and control groups on the gain scores. In addition, it is helpful to use randomized "blocking" or "leveling" on pretest scores because blocking can localize the within-subject variance, also known as the error variance. It is important to point out that gain scores are subject to the ceiling and floor effects. In the former the subjects start with a very high pretest score and in the latter the subjects have very poor pretest performance. In this case, analysis of covariance (ANCOVA) is usually preferable to a simple gain-score comparison.
- Statistics for random assignment of intact classrooms to treatments: when intact classrooms have been assigned at random to treatments (as opposed to individuals being assigned to treatments), class means are used as the basic observations, and treatment effects are tested against variations in these means. A covariance analysis would use pretest means as the covariate.
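The gain-score comparison described in the list above can be sketched as follows. This is a minimal illustration using only the standard library: it computes the pooled-variance (Student) t statistic on gain scores and its degrees of freedom, and leaves the p-value to a t table or a statistics package. The scores and the function name `gain_score_t` are hypothetical. With only two time points, this t-test on gains corresponds to the group-by-time interaction in the 2×2 repeated-measures layout (F = t²).

```python
import math
import statistics

def gain_score_t(pre_exp, post_exp, pre_ctl, post_ctl):
    """Pooled-variance t statistic comparing gain scores of two independent groups."""
    gains_exp = [post - pre for pre, post in zip(pre_exp, post_exp)]
    gains_ctl = [post - pre for pre, post in zip(pre_ctl, post_ctl)]
    n1, n2 = len(gains_exp), len(gains_ctl)
    m1, m2 = statistics.mean(gains_exp), statistics.mean(gains_ctl)
    v1, v2 = statistics.variance(gains_exp), statistics.variance(gains_ctl)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2  # t statistic and degrees of freedom

# Hypothetical pretest and posttest scores
pre_exp, post_exp = [50, 55, 60, 45, 52], [58, 61, 70, 50, 60]
pre_ctl, post_ctl = [51, 54, 59, 46, 53], [53, 55, 62, 47, 55]

t, df = gain_score_t(pre_exp, post_exp, pre_ctl, post_ctl)
print(f"t = {t:.3f}, df = {df}")
```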
The Solomon Four-Group Design
The design is as follows:
R O1 X O2
R O3 O4
R X O5
R O6
In this research design, subjects are randomly assigned into four different groups: experimental with both pre-posttests, experimental with no pretest, control with pre-posttests, and control without pretests. In this configuration, both the main effects of testing and the interaction of testing and the treatment are controlled. As a result, generalizability is improved and the effect of X is replicated in four different ways.
Statistical tests for this design: a good way to test the results is to treat pretesting as a second "treatment" factor and analyze the posttest scores with a 2×2 analysis of variance design: treatment (X versus no X) crossed with pretesting (pretested versus unpretested). Alternatively, the pretest, which is a form of pre-existing difference, can be used as a covariate in ANCOVA.
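To make the 2×2 layout of posttest scores concrete, the sketch below computes the three descriptive contrasts that the analysis of variance would test: the main effect of treatment, the main effect of pretesting, and their interaction. It stops at cell means and does not compute F ratios, which would also require the within-cell variance. The posttest scores and the function name `solomon_effects` are invented for the example.

```python
import statistics

def solomon_effects(o2, o4, o5, o6):
    """Descriptive 2x2 contrasts on posttest means of the four Solomon groups.

    o2: pretested + treatment    o4: pretested, no treatment
    o5: unpretested + treatment  o6: unpretested, no treatment
    """
    m2, m4, m5, m6 = (statistics.mean(g) for g in (o2, o4, o5, o6))
    return {
        "treatment": (m2 + m5) / 2 - (m4 + m6) / 2,  # X versus no X
        "testing": (m2 + m4) / 2 - (m5 + m6) / 2,    # pretested versus unpretested
        "interaction": (m2 - m4) - (m5 - m6),        # does pretesting change the effect of X?
    }

# Hypothetical posttest scores for the four groups
effects = solomon_effects(
    o2=[80, 82, 78, 84],
    o4=[70, 72, 68, 70],
    o5=[76, 78, 74, 76],
    o6=[69, 71, 67, 69],
)
print(effects)
```

A non-zero interaction contrast suggests that the pretest sensitizes subjects to the treatment, which is exactly the threat to external validity this design is meant to expose.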
The Posttest-Only Control Group Design
This design is as follows:
R X O1
R O2
This design can be viewed as the last two groups in the Solomon four-group design. It can be seen as controlling for testing both as a main effect and as an interaction but, unlike the Solomon design, it does not measure them. However, the measurement of these effects is not necessary to the central question of whether or not X had an effect. This design is appropriate when pretests are not acceptable.
Statistical tests for this design: the simplest form would be the t-test. However, covariance analysis and blocking on subject variables (prior grades, test scores, etc.) can be used to increase the power of the significance test, similar to what is provided by a pretest.
Discussion on causal inference and generalization
As illustrated above, Cook and Campbell devoted much effort to avoiding or reducing the threats against internal validity (cause and effect) and external validity (generalization). However, some widespread concepts may also contribute other types of threats against internal and external validity.
Some researchers downplay the importance of causal inference and assert the worth of understanding. This understanding includes "what," "how," and "why." However, is "why" considered a "cause and effect" relationship? If the question "why does X happen?" is asked and the answer is "Y happens," does it imply that "Y causes X"? If X and Y are only correlated, the answer does not address the question "why." Replacing "cause and effect" with "understanding" makes the conclusion confusing and misdirects researchers away from the issue of internal validity.
Some researchers apply a narrow approach to "explanation." In this view, an explanation is contextualized to only a particular case in a particular time and place, and thus generalization is considered inappropriate. In fact, an over-specific explanation might not explain anything at all. For example, if one asks, "Why does Alex Yu behave in that way?" the answer could be "because he is Alex Yu. He is a unique human being. He has a particular family background and a specific social circle." These "particular" statements are always right, and thereby misguide researchers away from the issue of external validity.
References
- Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy and Practice, 5, 7-74.
- Briggs, D. C. (2008). Comments on Slavin: Synthesizing causal inferences. Educational Researcher, 37, 15-22.
- Campbell, D. & Stanley, J. (1963). Experimental and quasi-experimental designs for research. Chicago, IL: Rand-McNally.
- Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Boston, MA: Houghton Mifflin Company.
- Cronbach, L. (1982). Designing evaluations of educational and social programs. San Francisco: Jossey-Bass.
- Kahneman, D. (2011). Thinking, fast and slow. New York, NY: Farrar, Straus and Giroux.
- Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251). doi: 10.1126/science.aac4716. Retrieved from http://science.sciencemag.org/content/349/6251/aac4716
- Paradis, E., & Sutkin, G. (2017). Beyond a good story: From Hawthorne Effect to reactivity in health professions education research. Medical Education, 51(1), 31-39.
- Pittler, M. H., & White, A. R. (1999). Efficacy and effectiveness. Focus on Alternative and Complementary Therapy, 4, 109–10. Retrieved from http://www.medicinescomplete.com/journals/fact/current/fact0403a02t01.htm
- Ramsey, L. (2012, May 2). U.S. needs to expand monitoring after drug approval. PharmPro. Retrieved from http://www.pharmpro.com/news/2012/05/us-needs-to-expand-monitoring-after-drug-approval/
- Schneider, B., Carnoy, M., Kilpatrick, J., Schmidt, W. H., & Shavelson, R. J. (2007). Estimating causal effects using experimental and observational designs: A think tank white paper. Washington, D.C.: American Educational Research Association.
- Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.
- Valli, L., & Buese, D. (2007). The changing roles of teachers in an era of high-stakes accountability. American Educational Research Journal, 44, 519-558.
date: 13 March 2018
Experimental Design in the Study of Crime Media and Popular Culture
Summary and Keywords
In the study of crime media and popular culture, researchers have a wide range of research methodologies at their disposal. Each methodology or standardized practice for producing knowledge involves an epistemological foundation and rules of evidence for making a claim, as well as a set of practices for generating evidence of the claim. The research methodology chosen is contingent upon the question being studied, as each methodology has strengths and weaknesses.
As the most stringent research design, experiments are unique because they are the only methodology able to establish causality. This is because experimental design’s major advantage is that researchers can control the environment, conditions, and variables that are being studied. However, experiments suffer from a major disadvantage as well: the precision and control utilized in experiments make it difficult to apply the findings to the real world, a property referred to as generalizability. This is especially pertinent in crime media and popular culture studies, where researchers are often interested in exploring how the criminal justice system, participants, and processes are socially constructed and how the mediated images impact our conceptualization of criminality and appropriate criminal justice system responses.
Keywords: experiment, experimental design, quantitative methodology, randomization, control and experimental groups, causation
The world of research design is divided into qualitative and quantitative approaches, as both offer techniques through which researchers can explore popular culture and media crime. For instance, the qualitative researcher may study metaphors in a dialogue or narrative structure within newspaper accounts of mass shootings, whereas the quantitative researcher may study the number of times the stories are covered in the news following a mass shooting event. Experimental design usually utilizes quantification, which is the process by which we attach the properties of the object or individual under study to the properties of numbers, making observations more explicit, easier to aggregate, and amenable to a variety of statistical analyses (Anderson, 2012; Maxfield & Babbie, 2016). In doing so, researchers test hypotheses or unproven ideas. For example, one of the more lasting scholarly questions is the influence of the media on viewers’ perceptions of the world. The cultivation hypothesis suggests that television is the primary source of information for viewers and that it has the power to dictate how we view the world. At a basic level, this means that people who watch a lot of media with violence will believe that violent crimes are common and maybe even increasing. This hypothesis has been tested in a variety of ways since being popularized by Gerbner and Gross (1976), and current research continues to debate the relationship between fictional depictions of violence and beliefs regarding crime today (Ferguson, 2013).
The Experiment: Features
True (also referred to as classical) experiments have three features: (a) two comparison groups, commonly known as an experimental group and control group; (b) random assignment to the two groups; and (c) the initial measurement of a dependent variable among subjects, followed by the re-measurement of a dependent variable among subjects, after they have been introduced to the independent variable. With each of these features, researchers are faced with important decisions regarding the design of their experiment.
Comparison Groups: Experimental and Control Groups
At the start of an experiment, several decisions regarding who will participate are necessary. Participants (or subjects) are just the people who take part in the research. First, who will the target population be? The target population refers to the group to which the results of the experiment will apply. Because we often cannot measure the entire target population, say all people who watch television, we need to decide how members of our target population, such as television viewers, will be selected for the experiment. We need to keep in mind the issue of generalizability. It should be possible to generalize from the sample of subjects chosen to the target population those subjects represent. Second, how will we ensure that our groups, the experimental and control groups, are equivalent? The group eventually exposed to the treatment is called the experimental group and the group not exposed is called the control group (first component). Equivalence refers to the attempt on the part of the researcher to select and assign subjects to groups that are comparable. Equivalence of groups is accomplished by randomization.
The second component of an experiment is random assignment. Randomization produces experimental and control groups that are statistically equivalent because each individual has an equal probability of being chosen from the population and of being assigned to any of the groups to be compared. There are several ways to achieve randomization. For example, if there are only two groups, randomization can be achieved by flipping a coin. If the coin lands heads up, the subject is assigned to group 1, and if the coin lands tails up, the subject is assigned to group 2. For experimental designs that use more than two groups, a random number table may be used. A random number table is a listing of random numbers that can be used to form groups at random. Stratified random sampling may also be used when the population divides into mutually exclusive groups (or subgroups) that are more homogeneous than the general population. In stratified random sampling, the researcher takes a random sample within each subgroup that is proportionate to that subgroup’s share of the population. Subgroups are often based on gender, age, or another meaningful category.
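The two approaches described above can be sketched in a few lines of Python. This is a minimal illustration rather than a procedure taken from any of the studies cited here: the function names, the `roster` of subjects, and the stratum labels are all hypothetical. Instead of flipping a coin for each subject, the sketch shuffles the list and deals subjects out in turn, which is equally random but also keeps the group sizes balanced.

```python
import random
from collections import defaultdict

GROUPS = ("experimental", "control")

def simple_assignment(subjects, groups=GROUPS, seed=None):
    """Shuffle the subjects, then deal them round-robin into the groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment = {g: [] for g in groups}
    for i, subject in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(subject)
    return assignment

def stratified_assignment(subjects, stratum_of, groups=GROUPS, seed=None):
    """Randomly assign subjects to groups separately within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)
    assignment = {g: [] for g in groups}
    for key in sorted(strata):          # fixed order keeps runs reproducible
        members = strata[key]
        rng.shuffle(members)
        for i, subject in enumerate(members):
            assignment[groups[i % len(groups)]].append(subject)
    return assignment

# Hypothetical roster: (id, gender, age band), four children per stratum.
roster = [(i, "girl" if i % 2 else "boy", "5-6" if i < 8 else "8-9")
          for i in range(16)]
by_stratum = stratified_assignment(roster, lambda s: (s[1], s[2]), seed=1)
```

Because assignment happens inside each gender-by-age stratum, both groups end up with the same mix of boys, girls, and age bands, which simple randomization only achieves on average.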
One classic study that used stratified random assignment tested whether exposing children to violent television scenes increased their aggressive behavior. At the Fels Research Institute in Yellow Springs, Ohio, Liebert and Baron (1972) selected 136 children (68 boys and 68 girls; 65 were 5 or 6 years old and 71 were 8 or 9 years old). Within gender and age groups, children were randomly assigned to either an experimental or a control group. Each child was individually taken to a waiting room and asked to wait. On a television monitor in the room, the children in the experimental group saw a violent sequence, while children in the control group saw a non-violent sequence. Next, each child was given the opportunity to press a “help” or “hurt” button, which would make a game that a child in the adjacent room was playing easier or harder. Compared with the children who viewed the non-violent sequence, the children who saw the violent sequence were significantly more likely to press the “hurt” button, making the game harder for the child in the next room (who did not actually exist).
The third component of an experiment is variables. Variables are used in an experiment to measure concepts from the real world. The process of moving from a concept or idea used in the real world to communicate and share mental images to a measurement in the experiment can be arduous. Examples of concepts in crime media and popular culture include aggression, fear of crime, and media violence. To be useful in an experiment, a construct must be operationalized. The process of operationalization defines concepts by describing how they will be measured. Once a concept has been converted from an abstract idea to a measurable quantity, it has become a variable. Variables are simply constructs that have been operationalized that can vary or take on different quantitative and qualitative values.
For instance, let us consider the operationalization of the construct of fear of crime. Given the importance of fear of crime in public and political awareness, several researchers have explored the use of the construct in crime studies. Results suggested that a range of social and demographic variables, including perceptions of risk and vulnerability, age, social class, geographic location, and experiences with criminal victimization, all influenced an individual’s construction of fear of crime, making it a difficult construct to measure. More importantly, measures of the fear of crime vary in their affective, cognitive, and behavioral construction. For example, Schafer, Huebner, and Bynum (2006) measured fear of crime by examining participants’ perceived safety. Perceived safety was operationalized into two variables: fear of personal victimization and fear of property victimization.
Independent and Dependent Variables
At the most basic level, an experiment examines the effect of an independent variable (X) on a dependent variable (Y). More complex designs may utilize a variety of variables. There are four types of variables that occur in experimental design: independent, dependent, covariate, and control variables. It can often be difficult to determine the role a variable plays in a complex experimental design. Here are a few rules to help identify a variable’s role in the design.
The independent variable is the one that is manipulated by the researcher. For this reason, the independent variable is also known as the treatment, the predictor variable, the experimental variable, the explanatory variable, or X. The independent variable takes the form of the presence or absence of an experimental stimulus. The manipulation of the independent variable is a necessary feature of an experiment. Moreover, the independent variable must occur before the dependent variable for a change in the latter to be identified as a true effect, which is a precondition for causality. By comparison, the dependent variable is the one tested for differences across the treatment conditions in order to reveal the outcome. The dependent variable is also known by a number of different names, such as criterion variable, response variable, or Y.
Covariates and Control Variables
The use of covariates is popular in experimental research. A covariate is complementary to the dependent variable. It is defined as any variable that is measurable and considered to have a statistical relationship with the dependent variable. A covariate is thus a possible predictive or explanatory variable of the dependent variable. For this reason, researchers may have theoretical interest in the consequences of a covariate.
However, researchers may also be interested in covariates because they may interact—one item has an effect on the other item—with the independent variables to obscure the true relationship between the dependent and the independent variables. For instance, Hayes-Smith and Levett (2011) reported a covariate relationship between the level of forensic evidence and viewing a television crime show when explaining jurors’ trial decisions. In this case, the independent variable was a trial vignette in which there were three levels of forensic evidence defined as no, low, or high. The dependent variable was trial decisions, in which participants were asked to render a verdict of guilty or not guilty, after reading their assigned trial vignette. The covariate was crime television show viewing or how often participants watched crime shows on television. The results suggested that those who often watched crime shows were more likely to favor the defense in the vignette, and in rendering verdicts, than those who did not. In this case, verdicts varied based on the combined influence of watching crime shows and level of forensic evidence.
In other cases, the researcher simply needs to be aware of covariates and make efforts to eliminate or minimize their effect; variables treated in this way are better known as control variables. This was the case with Salisch, Vogelgesang, Kristen, and Oppl (2011), who believed that ecological, family, or individual/child characteristics could be extraneous factors influencing the predilection for violent electronic games or aggression. Salisch and her colleagues (2011) conducted a longitudinal study to isolate the causal relationship between the preference for violent computer or video games and aggression (the dependent variable) among German schoolchildren. Third and fourth graders who preferred violent electronic games were more likely to be classified as openly aggressive by their friends and teachers. Because this was a longitudinal study, Salisch et al. (2011) continued to examine the schoolchildren for a year and found that the same schoolchildren who were identified as aggressive maintained a preference for violent computer and video games over time. Additional variables relating to ecological, family, and individual/child characteristics were then introduced. One such covariate was low school achievement. It was determined that this additional variable did not influence the selection of violent games by children who were identified as openly aggressive, so it served as a control variable. There are statistical means for measuring both control and covariate relationships that researchers can employ.
Media, Crime, and Popular Culture Common Variables
In media, crime, and popular culture studies there are six common classes of variables, including: message (text), mode (medium or technology), audience (demographics), reception (issues of interpretation), interaction (the ability of the audience to create the media text and vice versa), and outcomes (e.g., behavior, attitudes) (Anderson, 2012). Past research suggests some common patterns as to which variables serve as independent, dependent, covariate, or control variables. For example, usually, the behaviors and attitudes of media and popular culture consumption are defined as the dependent variable(s) in experiments, while the practices of the media and popular culture industry and the economics of media often serve as independent variables or causes for the attitudes and behaviors.
If the three requirements to an experiment are met, (a) two comparison groups, commonly known as an experimental group and control group, (b) random assignment to the two groups, and (c) variation in the independent variable before the assessment of a change in the dependent variable, then causation is possible in the experiment. Remember that the major advantage of experimental research is the ability to assess the relationship between two variables such that one (the independent variable) can be claimed to have caused the other (the dependent variable), otherwise known as causality.
According to Shadish, Cook, and Campbell (2002), there are three specific criteria necessary for causality. The first requirement in a causal relationship is that the cause must precede the effect in time. Generally, knowing which variable comes first gives one the direction of the causation. As a reminder, the independent variable (X) is the cause, and the dependent variable (Y) is the effect. This can be difficult to determine in crime and media research. Which comes first: violence on television or aggressive behavior? One of the more lasting debates in crime and media is the influence of violent television (or video games) on aggressive behavior. It is assumed that viewing television violence increases aggression, but what if aggressive people seek out violent programming? There is considerable research to suggest this is true. For instance, Slater, Henry, Swaim, and Anderson (2003) reported that violent media use in the form of television, video games, and the Internet among sixth- and seventh-grade adolescents predicted current levels of aggressive behavior, as well as levels of aggressive behavior two years later. Similarly, Vidal, Clemente, and Espinosa (2003) found that the greater the amount of violence watched by youths, the more they tended to enjoy the violence. These studies suggest a process in which violent media increases aggressive behavior, aggressive youth are increasingly attracted to violent media, violent media viewing becomes more enjoyable, and this enjoyment leads to greater violent media consumption. This reinforcing pattern has implications for establishing causality in youth media and aggression studies.
The second requirement in a causal relationship is that the two variables must be empirically correlated. In other words, one variable increases or decreases in value in a predictable manner along with an increase or decrease in the value of another variable. When researchers want to know if two variables are related to one another, they often apply a statistical formula to the data and compute a correlation coefficient, which illustrates the strength and direction of the relationship between the two variables. The coefficient ranges between +1 and –1. The sign before the number shows the direction of the relationship as either positive or negative. In a negative relationship, one variable increases as the other variable decreases, or vice versa. In a positive relationship, as one variable increases, the other increases. The more strongly two variables are correlated, the closer the coefficient is to +1 or –1. A 0 indicates that there is no linear relationship. Salisch et al.’s (2011) results indicated a moderate (strength), positive (direction) correlation of 0.28 between teachers identifying the schoolchildren as openly aggressive and the child’s preference for brutal or bloody plot lines in electronic games, indicating that the more aggressive the child was rated by the teacher, the more likely the child was to prefer violent electronic games.
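To make the coefficient concrete, the sketch below computes the Pearson correlation, the most common such coefficient, as the covariation of two variables divided by the product of their individual variation. The ratings are hypothetical values invented for illustration; they are not data from Salisch et al. (2011).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical teacher ratings of aggression and preference for violent games.
aggression_rating = [1, 2, 2, 3, 4, 5]
violent_game_preference = [2, 1, 3, 3, 5, 4]
r = pearson_r(aggression_rating, violent_game_preference)
```

A value of `r` near +1 or –1 indicates a strong linear relationship, while the sign gives its direction.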
The third requirement for a causal relationship is that the observed empirical correlation between two variables cannot be explained away as being due to the influence of some other variable that causes both of them. In other words, the relationship between X and Y may really be caused by Z, an extraneous variable (a variable in the experiment that is not being studied). Z represents any variable besides X that could really be causing a change in Y; such variables are called rival causal factors. Rival causal factors are variables other than X, or the treatment, that may be responsible for the relationship. Williams’s (1986) classic experiment on the introduction of television and aggressive behavior demonstrates the power of unexplained rival causal factors to complicate a potential relationship between the two. Williams used a natural experiment methodology, exploring a naturally occurring event, in this case the level of television exposure, in three Canadian communities (listed by pseudonyms): Notel, a town without access to television; Unitel, a town that received one television channel; and Multitel, a town that received four television channels. Using a pre-test, Williams (1986) confirmed that prior to the introduction of television in Notel, the children in the three towns did not differ in their levels of verbal or physical aggression. However, following the introduction of television in Notel, levels of physical and verbal aggression increased among the children of that town. So the question is: did the introduction of television in Notel cause the youth to become aggressive? It is not clear. It is plausible that the observed increases in aggression can be explained by undocumented social changes occurring in Notel at the same time television was introduced in the town (Kirsh, 2006). The inability of the research to exclude these rival causal factors makes it difficult to establish a causal link.
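A short simulation can show how a third variable produces a correlation between two variables that have no causal link. In this sketch, which uses entirely simulated data, Z drives both X and Y, and X has no effect on Y. The partial correlation, one standard way of statistically holding Z constant, is introduced here for illustration and is not a method attributed to the studies above.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """Correlation between x and y after removing the linear influence of z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(42)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]   # rival causal factor
x = [zi + rng.gauss(0, 1) for zi in z]    # X depends on Z only
y = [zi + rng.gauss(0, 1) for zi in z]    # Y depends on Z only, not on X

r_xy = pearson_r(x, y)                    # sizeable, although X never causes Y
r_xy_given_z = partial_r(x, y, z)         # close to zero once Z is held constant
```

The raw correlation between X and Y is substantial, yet it largely disappears when Z is controlled, which is the signature of a spurious relationship.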
By comparison, excluding rival causal factors demonstrates that the relationship between the variables is non-spurious. A spurious relationship is one in which the association between the independent and dependent variables is produced not by one causing the other but by other, unobserved variables. In criminology, these concerns are assessed through three types of validity: internal validity, which concerns factors within the study that may invalidate its findings; external validity, which concerns factors outside the study that make it difficult to generalize its findings to a larger population because the examined group is unique to the setting of the experiment; and construct validity, which refers to how closely the findings of the research represent the actual situation in the real world. Often we refer to these factors as threats to internal and external validity. Attempts to gain greater internal validity often come at the expense of external validity. For both, it is important to remember that they present possible ways that researchers might be wrong in inferring causation.
Threats to Internal Validity
When we are concerned with whether we are correct in inferring that a cause produced an effect, we are concerned with the validity of the causal inference. There are several threats or reasons that we might be incorrect in stating that some cause produced some effect, including history, maturation, testing, instrumentation, statistical regression, selection bias, and experimental mortality.
Historical events that take place during the course of the experiment may confound the experimental results. These can be identified as major disasters or newsworthy events related to the chief focus of the experiment. The best-known example is the September 11 terrorist attacks in the United States, which influenced the way most Americans conceptualized threat, security, and terrorism.
Maturation refers to the biological and psychological changes in the respondents that may not be due to experimental conditions. People are constantly growing and learning, and this process may affect the results of the experiment. It produces an apparent relationship between cause and effect that is not due to the independent variable in the experiment, but rather to a change in the subject or respondent. This maturation process was captured by Salisch et al. (2011) in their longitudinal study of the relationship between aggression and violent electronic games among schoolchildren. Because the study was longitudinal, these researchers were able to identify that, over time, a child identified as openly aggressive will likely increase their preference for violent electronic games.
Closely related to testing, instrumentation concerns the process of measurement in pre-testing and post-testing. Instrumentation issues involve changes in the measuring instrument between the beginning and the end of testing. Such changes reduce reliability and are common in criminal justice research that relies on secondary data sources.
Statistical regression is the tendency for groups that have been selected for a study on the basis of extreme scores to score closer to the average when they are measured again, otherwise known as regression toward the mean. For example, in an experiment involving reading instruction, children who were chosen for intervention because of their extremely low scores will tend to show considerably greater gains than those who scored near the average on the pre-test, whether or not the instruction had any effect. Thus, it will be difficult to determine whether the observed change reflects the influence of the study or simply the extremity of the initial scores, because the sample was asymmetrically chosen in the first place.
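Regression toward the mean is easy to reproduce with simulated data. In the sketch below, every number is invented: each child has a stable reading ability, and each test adds independent measurement error. No instruction of any kind is given between the two tests, yet the children selected for their extremely low pre-test scores still appear to improve.

```python
import random

rng = random.Random(7)
n = 2000

# Stable "true" ability plus independent measurement error on each test.
ability = [rng.gauss(100, 10) for _ in range(n)]
pre = [a + rng.gauss(0, 10) for a in ability]
post = [a + rng.gauss(0, 10) for a in ability]   # no treatment in between

def mean(values):
    return sum(values) / len(values)

# Select the lowest-scoring 10% on the pre-test, as an intervention study might.
cutoff = sorted(pre)[n // 10]
low_scorers = [i for i in range(n) if pre[i] < cutoff]

pre_mean_low = mean([pre[i] for i in low_scorers])
post_mean_low = mean([post[i] for i in low_scorers])
apparent_gain = post_mean_low - pre_mean_low     # positive despite no treatment
```

The apparent gain arises only because extreme pre-test scores partly reflect bad luck on that day, which is unlikely to repeat on the second test. A randomly assigned control group drawn from the same low scorers would show the same artifact, which is how a true experiment separates it from a treatment effect.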
Participant dropout is common, especially in studies that occur over a long period of time. This loss is referred to as experimental mortality or attrition. When participants drop out before the experiment is complete, it can affect the statistical comparisons and conclusions that may be drawn, because the groups may no longer be comparable. This is especially true if those in the treatment group (the group that determines change) are more likely to drop out than those in the control group (the group that establishes a baseline). It may be that those receiving the treatment are unwilling to remain in the study. Attrition also occurs in studies conducted over long periods of time (i.e., longitudinal studies) as participants move away, lose interest, or even die.
Threats to External Validity
External factors refer to rival causal factors that negatively affect external validity, or the ability to generalize experimental findings to other times, places, and people in the real world. This addresses the questions of whether results from experiments in one setting will be obtained in another setting and whether the treatment that worked for one population will work for another. In media research, one way to look at this is to ask whether the experimental situation is like the real world. In other words, are the research stimuli typical of the media content viewers would watch outside of the laboratory, or are they taken out of context and not representative of typical television viewing? In the same vein, are the behaviors measured in the experiment realistic, or are they artificial behavioral measures? While artificial measures may be used in the laboratory for a variety of practical and ethical reasons, they may also limit the generalizability of laboratory findings to the real world. For example, when studying the effects of media violence on audience aggression, researchers have frequently created artificial aggression measures because of the ethical and practical problems of encouraging aggressive outbursts in a laboratory environment. Jones (2002) has argued that aggressive displays in the laboratory are so different from aggression in real life that laboratory studies of aggression are meaningless due to their lack of external validity.
Jones’s (2002) and Kirsh’s (2006) critiques illustrate the give-and-take relationship between internal and external validity. For example, while experiments conducted in real life settings generally have greater external validity, they are difficult to monitor, resulting in less internal validity. Williams’ (1986) classical field experiment regarding the introduction of television in the Canadian community of Notel lacked controls for confounding variables, which called into question the causal link between television viewing and aggression. On the other hand, most of the classical experiments on television viewing and aggression have been critiqued on their inability to be generalized to the real world.
How well an attribute or skill taps into the quality we want to measure is often defined as construct validity. In terms of an experiment, construct validity refers to generalizing from the observations in the study to causal processes in the real world. The measures used in a research study must be a reliable and valid assessment of the proposed construct. For instance, are Twitter message exchanges with police forces (attribute) a valid assessment of perceived police legitimacy (construct) (see Grimmelikhuijsen & Meijer, 2015)? There is no such thing as a perfect variable. Most measures of a construct are incomplete or limited compared to the construct in the real world. How completely a measure represents a construct and how well it generalizes to the real world are important to the interpretation of the findings of the experiment and to the contribution that the experiment makes to the field of study.
Types of Experimental Designs
There are several experimental designs that researchers in the study of crime media and popular culture can choose from. Research designs are a way of controlling for the threats to validity discussed above. In choosing among them, researchers must weigh the advantages and disadvantages inherent in each design and, more realistically, the resources available to them in the form of participants, time, and space. It is helpful to review some of the main types of experimental designs, and it should be noted that there are many variations within each of these primary designs. The three general types of experimental design discussed in this section are: (a) experimental designs, which are characterized by random assignment to treatment and control groups with an assessment to determine change between the groups, and which include the classical, post-test only control group, and Solomon four group designs; (b) quasi-experimental designs, which lack random assignment and instead use matching or other means of obtaining equivalence between the groups, such as time-series or counterbalanced designs; and (c) pre-experimental designs, which do not use equivalent groups and include one- and two-group ex-post facto and one-group before-after designs.
Classical Experimental Design
The classical experimental design is the gold standard in scientific research and is often used to study the effects of media. For example, for decades research on media effects has sought to demonstrate a causal relationship between media violence and violent thoughts and behaviors in the real world through the use of the classical experiment. Typically, individuals are randomly assigned to either the experimental group or the control group. Subjects in the experimental group then watch a movie with violence, whereas the control group watches a movie with no violence. Both groups complete a questionnaire measuring their attitudes towards violence both before and after (i.e., pre- and post-test) watching the film. In the lab, other rival causes can be controlled. The aim is to see whether attitudes changed to a greater extent in the group that watched the violent film (the experimental group) than in the group that watched the non-violent film (the control group). Below is a diagram of a typical design of a classical experiment.
E    O1    X    O3
C    O2         O4
E/C = Random experimental and control group
O1&2 = Pre-test observations
X = Treatment
O3&4 = Post-test observations
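One common way to summarize a classical experiment is to compare how much each group changed between the pre-test and the post-test. The sketch below assumes, for illustration, that O1 and O3 are the experimental group's pre- and post-test scores and that O2 and O4 are the control group's; the function name and the attitude scores are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def gain_difference(o1, o3, o2, o4):
    """Experimental group's mean gain minus the control group's mean gain.

    o1, o3: experimental pre-test and post-test scores (assumed mapping)
    o2, o4: control pre-test and post-test scores (assumed mapping)
    """
    experimental_gain = mean(o3) - mean(o1)
    control_gain = mean(o4) - mean(o2)
    return experimental_gain - control_gain

# Hypothetical attitude-toward-violence scores for three subjects per group.
effect = gain_difference(o1=[3, 4, 5], o3=[5, 6, 7],
                         o2=[3, 4, 5], o4=[4, 5, 6])
```

Subtracting the control group's gain removes change that would have happened anyway, such as maturation or the effect of simply taking the questionnaire twice. Whether the remaining difference is larger than chance would still need a significance test.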
Other Classical Experimental Designs
Post-Test Only Control Group Design
In the post-test only control group design, there are two groups: one that gets the treatment and another that serves as the control. Neither group is pre-tested. Instead, the post-test performance of the experimental group is compared with that of the control group in a single test to determine whether the difference between them is statistically significant.
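The text does not say which significance test should be used for this comparison. As one illustrative option, the sketch below uses a permutation test, which fits randomized designs well because it asks how often a difference at least as large as the observed one would arise if the group labels were reshuffled at random. The function name and the scores in the usage example are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(treated, control, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in post-test means.

    Returns the observed difference (treated minus control) and an
    approximate p-value based on n_perm random relabelings.
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    k = len(treated)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # random relabeling of subjects
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if abs(diff) >= abs(observed) - 1e-12:   # tolerance for float rounding
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical post-test aggression scores.
difference, p_value = permutation_test([8, 9, 10, 11, 12], [1, 2, 3, 4, 5])
```

A small p-value suggests that random assignment alone is unlikely to have produced a gap this large.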
For instance, Bushman and Gibson (2011) examined whether violent video games influenced aggression over time. Their college student participants were divided into an experimental and a control group by random assignment. Participants in each group were then evenly divided again by random assignment, with half of the college students assigned to ruminate (or think deeply) about their gameplay. The experimental group played a violent video game (e.g., Mortal Kombat), whereas the control group played a non-violent video game (e.g., Guitar Hero). Immediately following their 20 minutes of gameplay, the college students were questioned about how emotionally arousing they found the video game. The following day, the college students assigned to ruminate completed a survey about their thoughts on their gameplay within the past day. A post-test in the form of a competitive reaction time task was then administered to all participants, with results analyzed separately by gender (note the lack of a pre-test to determine aggression level). Winners of this task were told to deliver a painful noise via headphones to the loser. Only male participants exhibited a relationship between violent video game play and aggression. Male participants who played violent video games and were told to ruminate about their gameplay were more aggressive than male participants who were not told to ruminate, whereas male participants who ruminated about their non-violent gameplay did not exhibit different aggression levels. On the whole, Bushman and Gibson (2011) showed that violent video gameplay can affect a male’s aggression levels 24 hours later, at least among the males who thought about their gameplay afterward. Shown below is a post-test only control group design.
E    X    O
C         O
E/C = Randomized experimental and control group
X = Treatment
O = Post-test observations
Solomon Four Group Design
Created by Richard L. Solomon in 1949, the Solomon four group design is often viewed as the purest research design. It combines the classical experimental design with the post-test only design. The Solomon design has four groups: the first two are similar to the classical experimental design and the second two are similar to the post-test only design (see the diagram below). The strength of the design is that it minimizes measurement issues that threaten internal and external validity and controls for the possible effects of the pre-test or of extraneous variables.
Respondents are randomly assigned to four total groups. Two groups receive the pre-test while two groups do not. Next, two groups (one with the pre-test and one without the pre-test) get one version of the treatment, while the other two undergo a different version of the treatment or receive no treatment (known as the control groups). All groups are then post-tested. In this way, the pre-testing effects can be examined. If the groups who were pre-tested differ in their post-test responses from those who were not pre-tested, the researcher can conclude that the pre-test was influential. The greatest drawback to this design is that it requires four distinct groups, which means more subjects, and possibly adds to the expense and complexity of the experiment. The results can also be difficult to interpret, due to the plethora of comparisons that are possible.
E1    O1    X    O3
C1    O2         O4
E2          X    O5
C2               O6
E1&2 = Randomized pre-tested and not pre-tested experimental groups
C1&2 = Randomized pre-tested and not pre-tested control groups
O1&2 = Pre-test observations
X = Treatment
O3–6 = Post-test observations
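The comparisons that the Solomon design makes possible can be sketched as simple contrasts among the four post-test means. The code below treats the design as a two-by-two layout, pre-tested or not by treated or not; the function name and the scores are hypothetical, and the contrasts are descriptive only, not significance tests.

```python
def mean(values):
    return sum(values) / len(values)

def solomon_effects(post):
    """Descriptive contrasts from the four post-test groups.

    post maps (pretested, treated) -> list of post-test scores.
    """
    m = {key: mean(scores) for key, scores in post.items()}
    # Average effect of the treatment across pre-tested and non-pre-tested groups.
    treatment = ((m[(True, True)] + m[(False, True)])
                 - (m[(True, False)] + m[(False, False)])) / 2
    # Average effect of having taken the pre-test.
    pretest = ((m[(True, True)] + m[(True, False)])
               - (m[(False, True)] + m[(False, False)])) / 2
    # Does the treatment effect differ depending on whether a pre-test was taken?
    interaction = ((m[(True, True)] - m[(True, False)])
                   - (m[(False, True)] - m[(False, False)]))
    return {"treatment": treatment, "pretest": pretest, "interaction": interaction}

# Hypothetical post-test scores for the four groups.
effects = solomon_effects({
    (True, True): [9, 11],    # pre-tested experimental group
    (True, False): [5, 7],    # pre-tested control group
    (False, True): [7, 9],    # non-pre-tested experimental group
    (False, False): [6, 6],   # non-pre-tested control group
})
```

A non-zero `pretest` or `interaction` value suggests that the pre-test itself influenced the post-test responses, which is the concern the design is built to detect.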
When laboratory experiments are not possible, quasi-experimental designs are utilized. Quasi-experimental designs are variations on the classical experimental design. Usually, they lack one of the key features of a classical experimental design: randomization, pre- or post-tests, or comparison groups. Most often, quasi-experiments depend on self-selection or administrative decisions to determine who is exposed to a treatment (Cook, 1990). The two major types of quasi-experimental design are (a) nonequivalent control group designs that have groups that are not created randomly, but are designated before the intervention, and (b) before and after designs that have both a pre-and post-test, but no comparison group, including repeated measures.
The main strength of random assignment is that it allows for the assumption of equivalence. Non-equivalent group designs are those in which it is not possible to create groups through randomization, thereby negating the assumption of equivalence. Randomization eliminates the potential for systematic bias, thereby making the groups equal, which is the desired outcome, but often subjects are chosen in other ways, which can potentially result in nonequivalent comparison groups, or which can create groups that are dissimilar. Utilizing a convenience sample of available respondents for an experiment is a common way to selection bias. For example, if a university professor were interested in exploring if students were more accepting of rape myths following the viewing of pornography, choosing volunteers from their classes may lead to selection bias as their volunteers may not be typical of the general population of students enrolled at universities. Differential attribution is a process of concern as it can create selection bias. A deselection process occurs where subjects drop out of their assigned groups, thus creating dissimilar groups, even though the researcher engaged in randomization.
Lacking random assignment and a control group, researchers often choose a comparison group to serve the same function through the process of matching (a substitute for randomization). Matching involves selecting subjects on the basis of certain key characteristics such as age, race, and gender so that the groups are as similar as possible on these factors. There are two primary forms of matching. The first, matching by constancy, makes a variable uniform for all the groups. For instance, if a researcher is interested in studying the impact of violent and non-violent pornography on aggression in college students, past research strongly suggests that gender is related to the levels and types of aggressive acts performed. To match the sample by constancy and control for gender effects, the researcher may perform the experiment with only one gender in the sample. In the second type of matching, called matching by pairing, subjects are paired based on a similar score on a relevant variable. Going back to our pornography example, if the researcher believed that a prior level of aggression may impact how a subject is influenced by watching violent and non-violent pornography, the researcher may administer a test of aggressiveness to all subjects and record their scores. Subjects would then be matched based on their similar aggression scores. This provides a means for the researcher to limit the influence of confounding variables, such as gender or previous levels of aggression, which may influence the results of the experiment. Below is a non-equivalent group design.
E/C = Non-random experimental and control group
O1&2 = Pre-test observations
X = Treatment
O3&4 = Post-test observations
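The matching-by-pairing procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subject ids, the aggression scores, and the function name match_by_pairing are hypothetical, and the greedy nearest-score rule is one simple way to form pairs, not a method prescribed by the studies cited here.

```python
def match_by_pairing(treated, pool):
    """Greedy one-to-one matching on a single pre-test score.

    treated, pool: dicts mapping a (hypothetical) subject id to a score.
    Returns a list of (treated_id, comparison_id) pairs.
    """
    available = dict(pool)  # copy so the caller's pool is not modified
    pairs = []
    # Process treated subjects from lowest to highest score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break  # no comparison candidates left to match
        # Pick the unused comparison subject whose score is closest.
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        pairs.append((t_id, c_id))
        del available[c_id]  # each comparison subject is used at most once
    return pairs


# Hypothetical aggression scores for a treated group and a candidate pool.
treated = {"t1": 12, "t2": 25, "t3": 31}
pool = {"c1": 10, "c2": 13, "c3": 24, "c4": 33, "c5": 40}
pairs = match_by_pairing(treated, pool)
print(pairs)
```

Matching on one score only balances the groups on that score; other confounding variables can still differ between the matched groups.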
Many of the designs discussed previously require the allocation of subjects to distinct treatments, but what if the population under exploration is small, or the researcher only has access to a small sample? One solution to this problem is a repeated measures design. Instead of recruiting a control group, each subject acts as their own control.
For example, Ward, Greenfield, and Colby (1996) were inspired by the trials of the police officers who beat Rodney King to explore how the formal features of slow motion and repetition, in a videotape of the beating that was at the center of the trial, influenced viewers. They designed a repeated measures study where each undergraduate student watched a clip of a real violent confrontation twice in one of four viewing-speed sequences: normal speed then slow speed, slow speed then normal speed, normal speed both times, or slow speed both times. In this format, researchers were able to determine whether changes in speed changed viewers’ perceptions of the attacks. As video footage of police and citizen interactions is increasingly used in court cases, the manipulation of the film has implications for viewers’ perceptions of the interaction. Below is a diagram of a counterbalanced repeated measures design. Counterbalancing is a method for controlling order effects in a repeated measures design, where each group receives the treatments in a different order, making for a stronger research design.
E/C = Non-random group
X1&2 = Treatments
O1&2 = Post-test observations
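As a rough sketch of counterbalancing, the Python below enumerates every possible order of the treatments and hands the orders out to subjects in rotation. The subject ids, treatment labels, and the function name counterbalance are assumptions made for illustration. It covers only full counterbalancing of distinct treatments, so the same-speed conditions in the Ward, Greenfield, and Colby study are not represented.

```python
from itertools import permutations


def counterbalance(subjects, treatments):
    """Assign each subject one ordering of the treatments, in rotation.

    Every possible order is used equally often when the number of
    subjects is a multiple of the number of orders.
    """
    orders = list(permutations(treatments))
    schedule = {}
    for i, subject in enumerate(subjects):
        schedule[subject] = orders[i % len(orders)]
    return schedule


# Hypothetical subjects and two viewing speeds as the treatments.
schedule = counterbalance(["p1", "p2", "p3", "p4"], ["normal", "slow"])
for subject, order in schedule.items():
    print(subject, order)
```

In practice the subject list would normally be shuffled before the orders are handed out, so that the order a subject receives does not depend on sign-up sequence.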
Field experiments are randomized trials conducted in real-world settings. Occasionally, naturally occurring conditions lend themselves to this type of study. Unique to media studies, field experiments gauge the effects of an actual media program on subjects who would ordinarily listen to or view it, measuring outcomes in an unobtrusive fashion. For example, Bond et al. (2012) studied the effects of Facebook on voter turnout. What they found was that while encouragement (e.g., reminders) had little impact on voting behavior, users who were shown which of their friends had voted were more likely to vote. By utilizing Facebook, which subjects already visit, the researchers were able to unobtrusively (i.e., through the use of public voting records) measure the influence Facebook had on users’ voting behavior.
One advantage of criminal justice field experiments is that they take place in real-world conditions, making their findings more likely to hold in the real world. The obvious disadvantage is that it is difficult to control for other sources of influence on the dependent variable, particularly when random assignment is not feasible. For this reason, many field studies are correlational rather than causal in explanation. However, field experiments are less artificial, and for this reason often have greater external validity or generalizability to the real world. A recent review of the use of field experiments suggests this is not as common a methodology as true or quasi-experiments (Green, Calfano, & Aronow, 2014).
Sometimes random assignment to different conditions is not possible. In those cases, researchers may opt for a pre-experimental design. All of the pre-experimental designs are unable to provide equivalence or assurance that the group being studied is a representative sample of the target population. Because of this inability, most of these designs present the relationship between the independent and dependent variable as a correlation rather than causation. In other words, researchers are able to establish a relationship between the two variables but are unable to be certain that one caused the other. For example, the one-group post-test-only design (or one-shot case study) has only one group that gets the treatment, followed by a post-test (see diagram below). In this case, a group may be shown a violent movie and then given an attitude test after the treatment has been administered. However, this particular design lacks a control group and therefore has considerable internal and external validity flaws.
X = Treatment
O = Post-test observation
In another option, the researcher may utilize a one-group pre-test and post-test design (see diagram below). In this model, one group gets the pre-test, treatment, and post-test, in that order. So, researchers may assess a group’s attitude towards violence, introduce a violent film, and then re-assess their attitudes. In this way, changes in individuals’ attitudes over time may be explored. The lack of a control group makes it impossible to rule out rival explanations for the observed changes. In particular, the threats of history, maturation, and regression toward the mean are likely in this design, and for those reasons, this design is not usually causally interpreted.
X = Treatment
O1 = Pre-test observation
O2 = Post-test observation
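Regression toward the mean in a one-group pre-test and post-test design can be illustrated with a small simulation. The sketch below is hypothetical throughout: the score scale, the noise level, the cutoff of 70, and the function name simulate_pre_post are assumptions chosen for illustration. No treatment effect is built in, yet subjects selected for extreme pre-test scores score closer to the average on the post-test.

```python
import random


def simulate_pre_post(n=10000, seed=1):
    """Simulate pre- and post-test scores with NO treatment effect.

    Each observed score is a stable true level plus independent
    measurement noise, so any pre/post change is noise alone.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        true_level = rng.gauss(50, 10)        # stable attitude level
        pre = true_level + rng.gauss(0, 10)   # noisy pre-test
        post = true_level + rng.gauss(0, 10)  # noisy post-test, no effect added
        records.append((pre, post))
    return records


records = simulate_pre_post()
cutoff = 70  # hypothetical rule: study only those who score high on the pre-test
selected = [(pre, post) for pre, post in records if pre >= cutoff]
mean_pre = sum(pre for pre, _ in selected) / len(selected)
mean_post = sum(post for _, post in selected) / len(selected)
print(f"selected: {len(selected)}, pre: {mean_pre:.1f}, post: {mean_post:.1f}")
```

Without a comparison group selected by the same rule, a drop like this could easily be mistaken for an effect of the treatment.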
Finally, the two-group post-test-only design eliminates the possibility of pre-test reactivity by studying both the experimental and control group after the experimental group has been exposed to the treatment (see diagram below). In some cases, the process of testing and re-testing confounds the experimental results, because subjects are no longer naïve about the subject matter on the post-test. Subjects may even begin to alter their responses toward more socially acceptable answers based on their sensitivity to the information, or the subjects may just get better at taking those types of tests. Consider an experiment in which the pre-test measure is a questionnaire about attitudes towards a particular video game. After playing the game, teenagers are asked to complete the same attitude measure they filled out before playing the game. Instead of reporting their true attitudes, teens may try to be consistent with their earlier responses. In this sense, the pre-test has sensitized them to the post-test. For this reason, researchers are sometimes willing to sacrifice the pre-test information in order to eliminate the risk of ruining the ability to say what causes the teens’ responses. On the other hand, the major problem with this design is the inability to determine if the two groups were initially equivalent.
X = Treatment
O = Post-test observations
Advantages and Disadvantages to Experiments
The primary advantage of an experimental research design is the ability to establish the existence of a cause and effect relationship. Although some theorists argue we can never know with certainty whether one variable causes a change in another, experiments give us the tools to explore the world around us and make causal inferences. Researchers are then able to determine if crime media messages bring about some change in people’s thoughts, attitudes, emotions, and behaviors. The degree of control over conditions afforded by certain experimental designs allows the researcher to establish causation rather than correlation between variables.
The laboratory also offers maximum control over the experimental conditions so that any significant differences found can be attributed to the introduction of the independent variables. Specific to media, crime, and popular culture research, experimenters can even dictate the exact content of the dialogue, along with the presentation of the characters. This greater control and precision is well suited to address specific research questions, such as how exposure to depictions of race and crime in television news impacts participants’ perceptions of the guilt and punishment of African American suspects (Mastro, Lapinski, Kopacz, & Behm-Morawitz, 2009).
In experiments, the conditions under which they are carried out, the manipulation of the variables, and how the variables are quantified are all clearly spelled out to make future replication possible. The idea that others can repeat the exercise suggests that the information produced is empirical or indicative of an independent, objective, material reality. In fact, experiments are often replicated under slightly different conditions to ensure that the results are not unique or to determine under what conditions the results change.
However, there are limitations to experiments. In the real world, many experiments would be unethical or immoral. For example, many participants in the famous Milgram experiment showed signs of extreme stress during the sessions, and critics have argued that the study exposed them to a risk of lasting psychological harm. The Milgram experiment on obedience to an authority figure began in July of 1961. Stanley Milgram, a Yale University psychologist, created the experiment to measure the willingness of the study’s participants to obey an authority figure when instructed to perform acts that would conflict with their personal conscience. Milgram recruited men from a range of occupations in the New Haven area. The participants were all assigned the “teacher” role during the study. The “teacher” would be asked to administer electric shocks of increasing intensity, from mild to severe, to the “learner” when the learner did not answer questions correctly. The “learner” was played by an actor who was employed by the experimenter. The “teacher” participants were given a small sample shock to give them a feeling of what they would be delivering to the learner. They were told that shock levels ranged from 15 to 450 volts; the actor imitated severe distress as the “voltage” was increased by the “teacher.” The “teacher” participants were encouraged to continue dispensing higher voltage even though the “learner” appeared to be in pain and asked to end the experiment. The results showed that 65% of the study’s participants were willing to progress to the maximum level of voltage available if instructed to do so. Milgram concluded that people obey an authority figure either out of fear or out of a desire to appear cooperative, even when acting against their belief and judgment.
It is unethical for researchers to inflict harm on participants, but how can researchers study important crime and media questions about violence and aggression without deceiving participants to believe they are engaging in a behavior that they really are not? This is a lasting ethical dilemma in experiments. Interestingly, many researchers have critiqued media experiments, not because of ethical considerations, but because they provide data that has limited application in the real world.
Experiments try to explain real-world phenomena by manipulating them under laboratory conditions. The manipulations imposed during the experiment to control rival causal factors may create artificial conditions that make it difficult to generalize results to other times, places, and people. For example, in a study evaluating the influence of televised violence on viewers’ behavior, a few minutes of violence, taken from a popular television program and shown in the laboratory to viewers, may eliminate the natural context and fail to reproduce a normal media experience. In many experiments, the stimuli are created by the experimenters, are shown in isolation from other types of content, and are followed by a behavior measure that is removed from the real social cues associated with that behavior. Coupled with the artificiality of having little choice in the programming or the ability to discuss the programming with co-viewers, this results in low external validity or generalizability to the real world.
Finally, an experimenter may unwittingly reveal the purpose of the study, leading the individuals participating in the study to produce the responses they think the experimenter wants them to produce, referred to as experimenter bias. This problem is sometimes countered through the use of a double-blind procedure. In this design, the person who assigns the subjects to experimental and control groups is not the same as the person who runs the experiment. The researcher running the experiment is “blind” to who is in the experimental and control groups, and therefore is less likely to unwittingly give clues to the expected outcomes.
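One way to picture the separation of roles in such a blinded procedure is the Python sketch below. It is a minimal illustration under stated assumptions: the subject ids, the opaque stimulus labels "A" and "B", and the function name blind_assignment are all hypothetical. The person who assigns subjects keeps the label key, and the person who runs the sessions sees only the run sheet.

```python
import random


def blind_assignment(subject_ids, seed=None):
    """Randomly split subjects into two conditions behind opaque labels.

    Returns (label_key, run_sheet):
      label_key  maps each label to its condition; kept by the assigner only.
      run_sheet  maps each subject to a label; this is all the runner sees.
    """
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)        # random order decides who lands in which half
    half = len(ids) // 2
    labels = ["A", "B"]
    rng.shuffle(labels)     # which label means "experimental" is also random
    label_key = {labels[0]: "experimental", labels[1]: "control"}
    run_sheet = {
        sid: (labels[0] if i < half else labels[1])
        for i, sid in enumerate(ids)
    }
    return label_key, run_sheet


# Eight hypothetical subjects; the seed is fixed only to make the demo repeatable.
label_key, run_sheet = blind_assignment([f"s{i}" for i in range(1, 9)], seed=42)
print(sorted(run_sheet.items()))  # what the session runner receives
```

The label key would be opened only after data collection ends, so the person running the sessions cannot cue subjects toward the expected outcome.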
In conclusion, although experiments have some limitations in external validity, they can, when designed correctly, offer a high level of control and address specific questions better than most other methods used to explore media crime and popular culture. Looking towards the future of research methodologies in media crime and popular culture, the use of experimental and nonexperimental formats together may increase in popularity, as multiple-methods designs can compensate for each other’s weaknesses. It is only by utilizing multi-method approaches that convergence (or triangulation) is possible. However, such an approach requires more expertise, time, and resources, which makes it more demanding. In closing, the choice to pursue a particular methodology in crime media and popular culture should be based on validity, and more importantly, the best method is the one that is best able to answer the research questions of interest.
Review of Literature and Primary Sources
Crime sells. It attracts viewers and produces advertising revenues. Its ability to produce revenues contributes to its pervasiveness. Research exploring the intersections of crime media and popular culture has taken place in a variety of mediums, including reality television (Cavender & Fishman, 1998; Rabe-Hemp, 2011), news coverage (Dixon & Azocar, 2007; Graziano, Schuck, & Martin, 2009; Mastro et al., 2009; McGinty, Webster, & Barry, 2013; Pelizzon, West, & Martha, 2010), top-rated television programs (Cavender, Bond-Maupin, & Jurik, 1999; Hayes-Smith & Levett, 2011; Robson & Silbey, 2012), movies (Bailey, Pollock, & Schroeder, 1998; Rafter, 2006), and comic books and magazines (Nyberg, 1998; Rabe-Hemp & Beichner, 2011). Crime content is very common in each of these mediums. For example, a study tracing crime stories in the evening news of 100 television stations found that 72 of the stations led their news programming with a crime story and that 33% of all stories covered on the evening news were crime related (Klite, Bardwell, & Salzman, 1997). Television research found that crime shows comprise 25% of prime-time television (Mandel, 1984). Finally, about 20% of all movies released annually are in the crime genre, but over half of all movies have significant crime content (Rafter, 2006).
Given the saturation of crime in media and popular culture, experiments on the influence that media has on the attitudes and behaviors of viewers, ranging from violence and aggression to voting behavior, are the most common type conducted in crime media and popular culture research (Jewkes, 2015). Research has consistently found that media influences aggressive and violent behaviors, which has led to a legislated rating system for evaluating the appropriateness of the media for audiences (Kirsh, 2006). Current research continues to evaluate the mediums, content, and audiences for which these findings are most pronounced.
While the influence of media on behavior is the most hotly debated question, it is not the only one explored by experimental research design. Another common area of research focuses on the misrepresentation of crime and criminals, the perpetuation of crime myths, and the comparison between media portrayals and reality. We know that mediated representations of crime bear little resemblance to the reality of crime, but because of their popularity, they have a tremendous impact on our concept of criminality and the appropriate responses to crime. These mediated representations also promote race, class, and gender stereotypes (Potter & Kappeler, 2012). Due to the popularity of crime media in popular culture, and its subsequent influence on social thought and policy, it continues to be an important area of research.
Links to Digital Materials
The ethical considerations of the famous Milgram experiment gave rise to many of the safeguards in place in social sciences research today. More details on the ethical concerns raised in Milgram’s famous experiment can be found at Stanley Milgram: The Man Who Shocked the World.
The FBI creates or obtains records as it carries out its responsibilities. Many of these records are publicly available for reading and research in the