  • 00:02

    NARRATOR: Have you ever done terribly on an exam you had prepared for diligently, or aced an exam without studying much because the questions focused on the small part of the material you did know? Whether it worked out to your benefit or your detriment, you probably realize that the exam did not accurately reflect your mastery of the subject.

  • 00:23

    DR. EVELYN BEHAR: We've all had the experience of not being particularly good at something and lucking out and getting really good at it that one time that we're tested, or, alternatively, of being really, really good at something and doing a terrible job that one time that we're tested. [Evelyn Behar, PhD, Associate Professor of Psychology, University of Illinois at Chicago]

  • 00:38

    NARRATOR: When it comes to scientific research, we want to avoid situations like these, in which results are misleading. We want to make sure that our measurements are accurate, and that our methods take into account any unusual circumstances, such as a student freezing when he sits for an exam.

  • 00:56

    DR. MARIANN WEIERICH: The real-world applicability of your results can be affected by how you design your study. If we don't do a good job of measuring our variables, we risk reaching inaccurate conclusions about those variables. [Mariann Weierich, PhD, Assistant Professor of Psychology, Hunter College CUNY]

  • 01:11

    NARRATOR: How results can be misleading-- problems with reliability and validity. [MUSIC PLAYING] This DVD will focus on the concepts of reliability and validity in scientific research, and how they affect the accuracy of the conclusions we are able to draw.

  • 01:33

    NARRATOR [continued]: If we understand these issues, we can reduce problems within our own studies, and understand where to be cautious when considering the results of studies we read. First, we will look at three types of reliability, and the problems with each of them that can lead us to mistaken conclusions. Inter-rater reliability, test-retest reliability,

  • 01:57

    NARRATOR [continued]: and internal consistency. Then, we will examine three types of validity, and the complications with, or threats to, each of them, which can be misleading. Construct validity, external validity, and internal validity.

  • 02:15

    DR. EVELYN BEHAR: Broadly speaking, reliability refers to how consistently and stably we're measuring our variables, whereas validity refers to how accurately we're measuring our variables. The more reliable and valid our measurements are, the more likely we are to draw valid conclusions about the topic that we're interested in.

  • 02:44

    DR. EVELYN BEHAR: There are many measurements in our everyday lives for which we want clearly good, consistent measurement. So for example, if you weigh yourself on one scale and find that you weigh 140 pounds, and then you weigh yourself on another scale and find that you weigh 155 pounds, then all of a sudden it calls into question the validity of that measurement.

  • 03:07

    DR. EVELYN BEHAR [continued]: The same goes for scientific research. You hope that, when you measure one variable, if you were to measure it again you would get the same measurement. And when this does not happen, then you really start to doubt the accuracy of your measurement. There are three types of reliability-- inter-rater reliability, test-retest reliability, and internal consistency.

  • 03:29

    DR. EVELYN BEHAR [continued]: Let's say that you have a singing contest that you're interested in running, and the grand prize winner gets $1 million. The decision of how many judges to employ maps onto what's called inter-rater reliability. The decision about how many times each participant should sing maps onto test-retest reliability. And the decision regarding how many different types of music

  • 03:51

    DR. EVELYN BEHAR [continued]: to have each participant sing maps onto what's called internal consistency. [MUSIC PLAYING]

  • 03:59

    NARRATOR: Inter-rater reliability. [MUSIC PLAYING]

  • 04:06

    DR. EVELYN BEHAR: Inter-rater reliability is the degree to which multiple judges, or raters, agree in their measurement of a particular variable. Let's say that you have four judges. If those four judges agree in their assessment of the singer who is performing-- so for example, on a scale of 1 to 10, the judges rate them an 8.0, an 8.5, an 8.5, and an 8.0,

  • 04:30

    DR. EVELYN BEHAR [continued]: then you have achieved high inter-rater reliability. On the other hand, if your judges don't agree with each other, if they give a particular contestant an 8.0, an 8.5, an 8.0, and a 1.5, then you've got a problem, because you have that one judge who clearly did not have high inter-rater reliability with the other judges.

  • 04:51

    DR. EVELYN BEHAR [continued]: And you might consider throwing out that judge's rating.
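
The judging example above can be checked with a few lines of Python. This is only a rough sketch: the function names and the 2-point tolerance are illustrative assumptions, not something stated in the program, and formal studies usually report a dedicated agreement statistic rather than a simple range.

```python
from statistics import median

def rating_spread(ratings):
    """Gap between the highest and lowest rating (smaller means more agreement)."""
    return max(ratings) - min(ratings)

def flag_outlier_judges(ratings, tolerance=2.0):
    """Return positions of judges whose rating differs from the median of the
    other judges by more than `tolerance` (an assumed, arbitrary cutoff)."""
    flagged = []
    for i, rating in enumerate(ratings):
        others = ratings[:i] + ratings[i + 1:]
        if abs(rating - median(others)) > tolerance:
            flagged.append(i)
    return flagged

# Ratings taken from the two scenarios described by the speaker.
agreeing = [8.0, 8.5, 8.5, 8.0]
disagreeing = [8.0, 8.5, 8.0, 1.5]

print(rating_spread(agreeing), flag_outlier_judges(agreeing))
print(rating_spread(disagreeing), flag_outlier_judges(disagreeing))
```

With the second set of ratings, only the fourth judge (position 3) is flagged, which matches the speaker's suggestion to reconsider that judge's rating.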

  • 04:54

    DR. MARIANN WEIERICH: In social science research, we often rely on raters, or coders, to rate the behaviors that are of interest. So for example, if we wanted to measure aggression in children after viewing a violent film, we might show the children the violent film and then put them in a situation where they might act in aggressive ways toward their peers. We would videotape those interactions,

  • 05:15

    DR. MARIANN WEIERICH [continued]: and then the raters or coders-- maybe four or five people-- would then code the behavior on the videotape according to the level of aggression. To the degree that the raters come up with similar ratings for levels of aggression in the observed behavior, we have good inter-rater reliability. If the numbers are vastly different, then we don't have good inter-rater reliability. [MUSIC PLAYING]

  • 05:39

    NARRATOR: Problems with inter-rater reliability. [MUSIC PLAYING] Inter-rater reliability is damaged if there are not enough raters, if the raters are not well-qualified or trained to make the necessary observations, or if the raters influence each other by discussing their observations.

  • 06:01

    DR. EVELYN BEHAR: If you only have one judge, you run the risk of selecting someone whose musical taste is simply poor. You also run the risk of the decision resting too heavily on that one judge's preferences. So for example, if that judge, for whatever reason, inherently prefers female voices to male voices, then the men in your competition will be at an unfair disadvantage.

  • 06:23

    DR. MARIANN WEIERICH: An example of a study in which inter-rater reliability might be quite low could be a study of interactions between a married couple. One rater might observe the husband's behavior toward the wife as being very controlling and domineering, but the other raters might not necessarily see the same thing.

  • 06:39

    DR. EVELYN BEHAR: You want to make sure that your raters are making their ratings independently. You don't want them to communicate with each other while they're making their ratings. Otherwise, you will essentially artificially inflate inter-rater reliability. [MUSIC PLAYING]

  • 06:58

    NARRATOR: Test-retest reliability. [MUSIC PLAYING] Test-retest reliability is the extent to which measuring something multiple times yields a consistent value.

  • 07:13

    DR. EVELYN BEHAR: For the singing contest, it would be best if you had each contestant sing multiple times, and not just once. They might run the risk of just having an off number if you only test them once, or they may luck out and do a really, really good job that one time, even though they really aren't so talented. By having each contestant perform three or four songs,

  • 07:33

    DR. EVELYN BEHAR [continued]: you're going to essentially get closer to the truth. If you have a contestant who performs four songs and scores a 7.0, 7.5, 7.0, and 6.5, then you have evidence of good test-retest reliability. This gives you some confidence that you have accurately measured this particular contestant's level of talent.

  • 07:56

    DR. EVELYN BEHAR [continued]: If, however, you have a contestant who scores a 7.5, 9.0, 3.5, and 1.0, then all of a sudden you have poor test-retest reliability, and it's difficult to really gauge what that person's level of talent really is.
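
One simple way to put a number on how much a contestant's scores move from one performance to the next is the standard deviation of those scores. The sketch below uses the two sets of scores from the example; the variable names are illustrative, and the standard deviation is only a stand-in here, since published studies more often report a correlation between testing occasions across many participants.

```python
from statistics import mean, stdev

# Scores across four songs, taken from the speaker's two contestants.
consistent = [7.0, 7.5, 7.0, 6.5]
erratic = [7.5, 9.0, 3.5, 1.0]

for label, scores in (("consistent", consistent), ("erratic", erratic)):
    # A small standard deviation means the repeated measurements agree closely.
    print(label, "mean:", mean(scores), "spread:", round(stdev(scores), 2))
```

The first contestant's scores stay within about half a point of their average, while the second contestant's scores vary by several points, so a single average says little about that contestant's talent.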

  • 08:11

    DR. MARIANN WEIERICH: In social science research, we often rely on individuals to provide self-report of things like symptoms, personality, and functioning. It's important that these self-reports are consistent over time. So for example, you might show one group of children violent movies and you might show another group of children non-violent movies, and ask them after each viewing

  • 08:31

    DR. MARIANN WEIERICH [continued]: how much they might be likely to hit someone who stole their candy. You could use this as a test of aggression.

  • 08:38

    NARRATOR: If we do this on five separate days and we ask the same question, we can expect the children to answer similarly after viewing a movie on the third day as they did on the second and on the first. If they answer very differently-- for example, if a child who watched a violent movie reports less aggressive feelings on the second and third days

  • 09:00

    NARRATOR [continued]: than on the first, fourth, and fifth-- then we have poor test-retest reliability. [MUSIC PLAYING] Problems with test-retest reliability. [MUSIC PLAYING] Sometimes it's hard to know whether we

  • 09:20

    NARRATOR [continued]: have a case of poor test-retest reliability or are simply studying a topic that has a great deal of fluctuation. For example, a participant may indicate high levels of depressed mood in an initial measurement and low levels a few hours later. We would have to wonder whether this change is due to a flaw in our research design-- perhaps we're asking the wrong questions,

  • 09:42

    NARRATOR [continued]: or observing the wrong behaviors-- or whether it is simply that human emotions are highly variable.

  • 09:48

    DR. MARIANN WEIERICH: If we're interested in attitudes on abortion, we might give the group of people a questionnaire on their attitudes toward abortion today, and then give them the same questionnaire two weeks from today. Because attitudes don't change very quickly, we wouldn't expect to see much change in scores over the two-week period.

  • 10:04

    DR. EVELYN BEHAR: If I were to measure suicidality in a group of depressed patients, it's likely that many people would state that they have high suicidality today but, if I measure them again in two weeks, it's likely that, for many of those individuals, their level of suicidality would have decreased. [MUSIC PLAYING]

  • 10:26

    NARRATOR: Internal consistency. [MUSIC PLAYING]

  • 10:33

    DR. EVELYN BEHAR: Internal consistency is the degree to which scores on different measures of the same variable agree with each other. In our singing competition, you would need to make a decision about how many different types of music each contestant should perform. If they were only to perform one type of music, then you run the risk of them lucking out.

  • 10:53

    DR. EVELYN BEHAR [continued]: If you have a contestant scoring similarly across different genres of music-- so for example, the contestant gets a 4.0 for country, a 4.5 for rock, a 4.0 for opera, and a 3.5 for pop-- then you have some confidence in the idea that you have accurately assessed that particular contestant's level of talent.

  • 11:16

    DR. EVELYN BEHAR [continued]: However, if you have a contestant who scores all over the place, really very different numbers across different genres of music, then you have low internal consistency, and you call into question your ability to truly measure that particular contestant's level of talent.

  • 11:32

    DR. MARIANN WEIERICH: In social science research, we usually try to measure the same behavior in several different ways. For example, if we think that somebody might be socially anxious, we might ask a series of different questions that are all meant to measure social anxiety. We might ask, for example, how nervous do you feel before you give a speech, how scared are you while you're giving a speech, how nervous

  • 11:52

    DR. MARIANN WEIERICH [continued]: do you feel before you have to stand up in front of a lot of people in a room. To the degree that the person's scores on all three of those items are in agreement with each other, that particular set of items has strong internal consistency. If the scores are vastly different, then the internal consistency is low. We can assess a variable using a variety of measures.

  • 12:13

    DR. MARIANN WEIERICH [continued]: And we call that a multi-method design. So for example, we might measure anxiety using the person's self-report of their anxiety, the measure of their heart rate during an anxiety-provoking situation, and then also a measurement of their behavior in the situation, such as failing to make eye contact or shaking visibly. All three of these measures independently

  • 12:34

    DR. MARIANN WEIERICH [continued]: are indices of anxiety. And if we have high internal consistency, we should see consistency in the scores on these three different measures for someone who is anxious. [MUSIC PLAYING]
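
The speakers do not name a statistic, but one widely used index of internal consistency for a set of questionnaire items is Cronbach's alpha. The sketch below is an illustration under that assumption; the function name and all response data are invented for the example (five hypothetical respondents answering three social-anxiety items on a 1 to 10 scale).

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # regroup scores by item
    item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(person) for person in responses])
    return k / (k - 1) * (1 - item_variances / total_variance)

# Hypothetical data: each row is one respondent's answers to three items.
items_agree = [[2, 3, 2], [5, 5, 4], [8, 7, 8], [4, 4, 5], [9, 9, 8]]
items_disagree = [[2, 9, 5], [5, 1, 8], [8, 4, 2], [4, 8, 9], [9, 3, 1]]

print(round(cronbach_alpha(items_agree), 2))     # items move together: alpha near 1
print(round(cronbach_alpha(items_disagree), 2))  # items do not move together: low alpha
```

When each person's three answers rise and fall together, alpha is close to 1. When the answers are unrelated, alpha drops sharply and can even be negative, which signals that the items are not measuring the same thing.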

  • 12:48

    NARRATOR: Problems with internal consistency. [MUSIC PLAYING] Using only one type of measurement may not be enough to ensure internal consistency. For example, if we're studying aggression and we ask people how aggressive are you, we are unlikely to reach reliable conclusions.

  • 13:10

    NARRATOR [continued]: We probably will want to ask people several questions that indicate their level of aggression, as well as observe their behavior and perhaps even measure their heart rate during frustrating experiences. We should find consistent levels when analyzing answers to questions such as how often do you lose your temper as we do when we observe participants' behavior,

  • 13:31

    NARRATOR [continued]: and when we measure heart rate during a frustrating task.

  • 13:34

    DR. EVELYN BEHAR: If you have more raters, then your inter-rater reliability will increase. If you have more observations, then your test-retest reliability will increase. And if you have more types of measurements, then your internal consistency will increase. [MUSIC PLAYING]

  • 13:51

    NARRATOR: Validity. [MUSIC PLAYING] Validity refers to the accuracy of measurement. Many aspects of our everyday lives demand high validity, from medical testing to construction. In scientific research, we can only reach accurate study results if we have valid measurements.

  • 14:14

    DR. MARIANN WEIERICH: There are three primary types of validity-- construct validity, external validity, and internal validity. [MUSIC PLAYING]

  • 14:26

    NARRATOR: Construct validity. [MUSIC PLAYING]

  • 14:32

    DR. EVELYN BEHAR: Construct validity is the degree to which you're actually measuring the thing you think you're measuring. If you're interested in measuring how addicted someone is to gambling, you might decide that gambling addiction can be measured by counting the number of times that someone goes to a casino, or by counting the amount of money that someone spends per month on gambling.

  • 14:54

    DR. EVELYN BEHAR [continued]: These definitions have what we call high construct validity. They probably are doing a pretty good job of measuring someone's gambling addiction. [MUSIC PLAYING]

  • 15:07

    NARRATOR: Problems with construct validity. [MUSIC PLAYING] Some things are easy to measure. But often, we have to figure out what aspect of our study can be quantified. In such cases, we may choose the wrong variable, one that does not sufficiently relate to the topic of our study.

  • 15:27

    DR. MARIANN WEIERICH: In general, your results are only as good as your construct validity. This is easier in some fields than others. For example, entomologists who study insects might be interested in fertility in these insects in different environments. They can use the number of eggs laid as an index of fertility. And that has pretty good construct validity. On the other hand, behavioral researchers

  • 15:48

    DR. MARIANN WEIERICH [continued]: often study variables such as intelligence. We can't open up people's heads and directly measure intelligence, so we have to try to use measures that we think are as close to the actual variable as possible.

  • 15:59

    DR. EVELYN BEHAR: If you decided that, in order to measure gambling addiction, you were going to count how many alcoholic beverages someone consumes in a week, that probably is going to give you very poor construct validity, because the number of alcoholic beverages that someone consumes in a week really doesn't have anything to do directly with someone's gambling addiction. [MUSIC PLAYING]

  • 16:23

    NARRATOR: External validity. [MUSIC PLAYING]

  • 16:28

    DR. EVELYN BEHAR: External validity is the degree to which the results that you found in your experiment would actually generalize to what happens in the real world. Let's say that you were interested in studying the impact of alcohol, drinking alcohol, on gambling behavior. And you randomly assigned your participants to drink either six ounces, 12 ounces, 24 ounces,

  • 16:50

    DR. EVELYN BEHAR [continued]: or no alcohol. And you measured how that affected how long they gambled for in a particular gambling task. And after your results were in and you determined what your findings were, you might wonder how good the external validity of your study was. You might ask yourself whether your participants behaved in your laboratory the same way that they

  • 17:12

    DR. EVELYN BEHAR [continued]: would behave in the real world. Maybe there was something about being in a laboratory with a researcher handing you alcohol that simply makes you behave very differently than you would in a real casino.

  • 17:23

    NARRATOR: If we design our experiment carefully, and if we have the resources to implement a sophisticated set-up, we can achieve greater external validity.

  • 17:32

    DR. MARIANN WEIERICH: Experimental realism is how realistic a study is. If you wanted to study drinking behavior in college students, you have a choice. You can bring the college students into a sterile-looking lab-- white walls, chairs, plain-looking paper cups with alcohol dosages-- or you can design a lab environment that very closely approximates

  • 17:54

    DR. MARIANN WEIERICH [continued]: an actual bar scene. In this case, the actual bar scene is more close to what might happen in the real world, as opposed to the sterile lab with the four white walls.

  • 18:06

    NARRATOR: In order to achieve high external validity, we also have to make sure that the participants in our study are a representative sample. That is, that they adequately represent the population we want to learn about. If our sample population consists only of college students, do the results reflect what we would find among younger children,

  • 18:27

    NARRATOR [continued]: senior citizens, and people who don't pursue higher education?

  • 18:30

    DR. EVELYN BEHAR: It may be that, if you're running your experiment on college students, they may behave very differently than people in the real, outside, everyday world might behave. And that would threaten your external validity.

  • 18:42

    NARRATOR: If we're testing the effectiveness of a new medication on patients at a private hospital in a wealthy area and we assume that our sample is representative of the entire population, we may reach misleading results because we haven't included any lower income patients. [MUSIC PLAYING]

  • 19:03

    NARRATOR [continued]: Internal validity. [MUSIC PLAYING]

  • 19:09

    DR. EVELYN BEHAR: Internal validity is the degree to which your independent variable is actually what caused a change in your dependent variable. As a reminder, an independent variable is the variable in an experiment that you manipulate. It's the variable that you expect to exert some change, some effect, on your outcome. The dependent variable is that outcome variable.

  • 19:31

    DR. EVELYN BEHAR [continued]: It's the variable that you end up measuring. And it is impacted by the independent variable. In our example about the effects of alcohol on gambling, the internal validity of our experiment would be very high if everything was exactly the same for all of those participants except the only thing that differed across those four conditions was the amount of alcohol.

  • 19:53

    DR. EVELYN BEHAR [continued]: If that's the case, then we can confidently conclude, for example, that drinking more alcohol leads to more gambling behavior, and our experiment is high in internal validity. [MUSIC PLAYING]

  • 20:08

    NARRATOR: Problems with internal validity. [MUSIC PLAYING]

  • 20:15

    DR. MARIANN WEIERICH: It's impossible to control for every other potential variable. In general, there are two ways to reduce the risk of preexisting group variables. The first is random assignment to condition, and the second is anticipating potential third variables and trying to control for them.
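
Random assignment to condition is straightforward to carry out in code. The following is a minimal sketch, not a procedure described in the program: the function name, participant IDs, and seed are illustrative assumptions, and the four conditions reuse the alcohol example from earlier in the transcript.

```python
import random

def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them out to conditions in turn so that
    group sizes differ by at most one."""
    rng = random.Random(seed)          # a fixed seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for position, person in enumerate(shuffled):
        groups[conditions[position % len(conditions)]].append(person)
    return groups

# Hypothetical participant IDs P01..P40.
participants = [f"P{n:02d}" for n in range(1, 41)]
conditions = ["no alcohol", "6 oz", "12 oz", "24 oz"]

groups = randomly_assign(participants, conditions, seed=1)
for condition, members in groups.items():
    print(condition, len(members), members[:3])
```

Because chance alone decides who lands in which group, no participant characteristic can systematically determine the condition a person receives.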

  • 20:31

    NARRATOR: Random assignment to condition reduces the risk of differences between groups because, according to the laws of probability, individuals with various characteristics, such as height or hay fever or high metabolism, are likely to be represented equally in each group. If we can anticipate any third variables, factors other than our independent variable that

  • 20:52

    NARRATOR [continued]: may cause a change in our dependent variable, then we should try to control for those factors. If they're not distributed equally, they can threaten the study's internal validity. For example, if we suspect, or know from previous experience, that people who wear glasses are more likely to have gambling addictions, then we would randomly assign participants

  • 21:14

    NARRATOR [continued]: to condition with a stipulation that every group would have the same number of participants who wear glasses. [MUSIC PLAYING] Campbell and Stanley-- threats to internal validity. [MUSIC PLAYING] In the 1960s, a pair of researchers

  • 21:36

    NARRATOR [continued]: named Donald T. Campbell and Julian C. Stanley identified threats to internal validity, and cautioned researchers to avoid them. Each of these threats to internal validity constitutes an alternative explanation for why our results emerged, why it may not be the independent variable that caused the change

  • 21:57

    NARRATOR [continued]: in the dependent variable. A good scientist is careful to think about these threats, avoid them if possible, and acknowledge the potential alternative explanations for the study's outcome. We will focus on six of these threats-- history and maturation, repeated testing, regression to the mean, reaction bias, attrition,

  • 22:24

    NARRATOR [continued]: and experimenter bias. [MUSIC PLAYING] History and maturation. [MUSIC PLAYING] During the course of a study, especially a longitudinal study, changes may occur on a large scale in the outside world,

  • 22:44

    NARRATOR [continued]: or in an unpreventable way to individuals.

  • 22:48

    DR. MARIANN WEIERICH: History refers to changes that happen in the larger world to just about everyone during the course of the study. For example, on 9/11 a lot of people had a shared history experience. And that experience-- their reactions to that experience might have impacted the way they responded in the course of studies that didn't have anything to do with 9/11.

  • 23:10

    NARRATOR: In a study of treatment for post-traumatic stress disorder, the events of 9/11 might have led participants to relapse, or in some other way interfered with determining the efficacy of the treatment. Not much can be done to avoid the effects of history on our study. The best we can do is to try to measure an event's impact globally so that we can compare that impact to changes

  • 23:31

    NARRATOR [continued]: within our study.

  • 23:34

    DR. MARIANN WEIERICH: Maturation refers to changes in the study sample that are unrelated to the independent variable but that might affect the dependent variable. For example, if you're studying irritability in children and the study session lasts six hours, the children are likely to become hungry during that time. Some of these effects can be handled. For example, making sure to have food available to the children.

  • 23:56

    DR. MARIANN WEIERICH [continued]: What we can't control for is children going, say, from late childhood into puberty. And so there will be differences in these children's scores that are attributable to puberty rather than the study. [MUSIC PLAYING] Repeated testing. [MUSIC PLAYING]

  • 24:17

    DR. MARIANN WEIERICH [continued]: The problem of repeated testing is that people tend to perform better the second time they're tested on a task. This could be due to learning, practice, or hypothesis guessing.

  • 24:29

    DR. EVELYN BEHAR: In learning, participants might actually learn something from taking a test the first time, and then end up doing better the second time. Practice applies oftentimes to tests of motor ability. So for example, if you had someone shooting baskets at a basketball court, they might do better the second time that they were tested relative to the first time, after they've warmed up and practiced a bit.

  • 24:51

    DR. EVELYN BEHAR [continued]: In hypothesis guessing, the participant actually figures out what it is that you're trying to measure. So for example, if you were measuring personality variables, the participant might figure this out and, the second time that you test him, he might actually report some more desirable personality traits. There's one very good way to decrease the risk of repeated testing.

  • 25:12

    DR. EVELYN BEHAR [continued]: And that's by having a control group that actually takes the pretest. You can have everybody in your sample take a pretest, and then randomly assign your participants to either receive some manipulation or not receive the manipulation. And then have them all take the post-test. And what this does is it controls for repeated testing, because all of your participants

  • 25:33

    DR. EVELYN BEHAR [continued]: are being measured twice. [MUSIC PLAYING]
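
The logic of the pretest/post-test control group design can be shown with simple arithmetic: both groups improve from taking the test twice, so the effect of the manipulation is estimated as the extra improvement in the group that received it. The scores below are invented for illustration, and the function name is an assumption.

```python
from statistics import mean

def mean_gain(pre, post):
    """Average improvement from pretest to post-test for one group."""
    return mean(after - before for before, after in zip(pre, post))

# Hypothetical scores for five participants per group.
treatment_pre = [50, 55, 60, 45, 52]
treatment_post = [62, 66, 70, 58, 63]
control_pre = [51, 54, 59, 46, 53]
control_post = [55, 57, 63, 50, 56]

treatment_gain = mean_gain(treatment_pre, treatment_post)
control_gain = mean_gain(control_pre, control_post)   # gain from retesting alone

# The manipulation's estimated effect is the gain beyond the retest gain.
estimated_effect = treatment_gain - control_gain
print(treatment_gain, control_gain, round(estimated_effect, 1))
```

Without the control group, the entire treatment-group gain might be credited to the manipulation, even though part of it would have occurred from repeated testing alone.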

  • 25:38

    NARRATOR: Regression to the mean. [MUSIC PLAYING] If you happen to do extremely well at something one time, it is unlikely that you'll surpass or even repeat that level of performance the next time. It's also unlikely that you'll do worse than an initial performance that is extremely poor.

  • 26:00

    NARRATOR [continued]: So the chances are that if you've gotten an extremely high score on a test or an extremely low score on a test, you'll score somewhere closer to the average of other people's scores if you take the test again. This is what is called regression to the mean, or statistical regression.

  • 26:17

    DR. MARIANN WEIERICH: The sophomore slump in baseball refers to when a second-year player performs worse than he did his first year. It is not the case that amazing rookies get so much worse in their second year, it's that their performance regresses to the mean. In science, you might want to see the effect of a medication on motor tics, so you find a group of people who have a really extreme frequency of tic behavior

  • 26:39

    DR. MARIANN WEIERICH [continued]: and you give them the medication. The number of tics, the frequency of tics might be reduced, but you can't be sure if the reduction in tic frequency is due to the medication, or if it's due to just a regression to the mean such that there is a less extreme frequency of tic behavior. One way to avoid the problem of regression to the mean is to use an appropriate control group-- for example, a placebo

  • 27:01

    DR. MARIANN WEIERICH [continued]: group-- because scores in that group are also likely to regress to the mean. [MUSIC PLAYING]
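
Regression to the mean can be demonstrated with a small simulation in which nothing changes between two tests except luck. This is a sketch under stated assumptions: each simulated person has a stable "true" level (mean 50, standard deviation 10) plus independent random noise on each test, and all names and numbers are invented for illustration.

```python
import random
from statistics import mean

def simulate_retest(n=10_000, seed=0):
    """Two test scores per person: the same true level plus fresh random noise."""
    rng = random.Random(seed)
    true_level = [rng.gauss(50, 10) for _ in range(n)]
    first = [level + rng.gauss(0, 10) for level in true_level]
    second = [level + rng.gauss(0, 10) for level in true_level]
    return first, second

test1, test2 = simulate_retest()

# Select the people with the most extreme (top 10%) scores on the first test.
cutoff = sorted(test1)[int(0.9 * len(test1))]
top = [i for i, score in enumerate(test1) if score >= cutoff]

top_first = mean(test1[i] for i in top)    # their average on test 1
top_second = mean(test2[i] for i in top)   # the same people on test 2
overall = mean(test1)

print(round(top_first, 1), round(top_second, 1), round(overall, 1))
```

The selected group scores lower on the second test even though no treatment was given, yet still above the overall average. This is why an untreated or placebo comparison group, selected the same way, is needed before crediting a medication with the change.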

  • 27:09

    NARRATOR: Reaction bias. [MUSIC PLAYING] When people know that they're being observed, they tend to behave differently than they normally would. This is called reaction bias. For example, if you knew that someone was keeping track of your television-watching habits,

  • 27:30

    NARRATOR [continued]: you might decide to watch less television, or you might change the type of programming you watched in order to seem more sophisticated.

  • 27:38

    DR. EVELYN BEHAR: When participants try to confirm the researcher's hypothesis, it might be that they want to feel normal, or it might be that they want to actually help the experimenter in some way. When participants try to disconfirm the experimenter's hypothesis, the participant wants to somehow feel different or unique, and doesn't want to look like he or she is part of some herd mentality.

  • 28:01

    DR. EVELYN BEHAR [continued]: When participants change their behavior because they're trying to be liked or admired by the experimenter, they change their behavior in order to try to appear more politically correct or socially desirable. If a researcher were running a study on corporal punishment of children, some parents might actually deny ever hitting their children simply because they want to appear to be good parents

  • 28:23

    DR. EVELYN BEHAR [continued]: in the eyes of the researcher.

  • 28:25

    NARRATOR: Reaction bias can be minimized by using one of several techniques. Use a control group that is active. In other words, let participants in both the experimental group and the control group believe there is a reason to change. For example, we can give the control group a placebo rather than letting them know that they're not receiving any treatment.

  • 28:47

    NARRATOR [continued]: Guarantee total anonymity. If possible, don't ask for participants' names, and let them place their questionnaire somewhere in a large pile of other questionnaires. Measure variables in a non-obvious way. If we're interested in the extent to which people avoid violent images, we may track their eye movements

  • 29:09

    NARRATOR [continued]: as we show them a series of violent and non-violent images, rather than asking them a direct question. If nothing else seems to work, we may choose to mislead participants about the goals of the experiment so that they will not guess our hypothesis. For example, we could administer an irrelevant questionnaire.

  • 29:30

    NARRATOR [continued]: Note, however, that if we lie at the outset, we must come clean about the true purpose of the study later. [MUSIC PLAYING] Attrition. [MUSIC PLAYING]

  • 29:46

    DR. MARIANN WEIERICH: Attrition refers to when participants who have been enrolled in a study, particularly longitudinal studies, drop out sometime after enrollment. The most common reasons for attrition are that people might move away, people pass away, or people lose interest in participating in the study. The problem is that it might not be the independent variable that caused a change in performance,

  • 30:07

    DR. MARIANN WEIERICH [continued]: it's that participants dropped out of the study, and the omission of their data makes it look like the group as a whole changed from pre-test to post-test. There are a few ways to reduce the risk of attrition. You can offer compensation, such as money, gift certificates. You might meet with the participants to tell them about the importance of the study, and make the participants feel like they're part of something

  • 30:28

    DR. MARIANN WEIERICH [continued]: bigger and important in the world. You might send reminders and notes to participants even when it's not time for the measurement, such as sending a birthday card, just to keep them sort of in touch with the study. [MUSIC PLAYING]

  • 30:43

    NARRATOR: Experimenter bias. [MUSIC PLAYING]

  • 30:49

    DR. EVELYN BEHAR: Experimenter bias occurs when the experimenter inadvertently behaves differently toward participants in different conditions of the experiment. By doing this, the experimenter causes the participants in each of those conditions to behave differently on the dependent variable. So what ends up happening is that the between-group differences that are found on the dependent variable

  • 31:10

    DR. EVELYN BEHAR [continued]: are due not to the independent variable, but to the actual experimenter's behavior. For example, let's say that an experimenter is interested in testing whether anxious individuals and non-anxious individuals differ in terms of their interpersonal skills, and every time an anxious individual arrives to the experiment, that experimenter treats that person very gently and very

  • 31:32

    DR. EVELYN BEHAR [continued]: nicely, knowing that the person is anxious, whereas every time a non-anxious participant shows up to the study, the experimenter just treats the person as they would treat anybody. The experimenter might conclude that individuals who are anxious actually have better interpersonal skills than individuals who are not anxious. [MUSIC PLAYING]

  • 31:59

    DR. EVELYN BEHAR: Another important issue is that of confounds. A confound is anything that varies systematically along with the independent variable that will cause some change in the dependent variable. Basically, you can think of it as a third variable. It's something that you didn't intend to vary across conditions, but it simply does. For example, let's say that I am interested in learning

  • 32:22

    DR. EVELYN BEHAR [continued]: whether smoking causes cancer. I recruit 5,000 smokers and 5,000 non-smokers, and start measuring everybody when they're 25 years old. And I follow them over a 50-year period. And at the end of that 50-year period, when everybody is 75 years old, I record the number of participants who developed cancer versus the number of participants who did not develop cancer in each of those conditions.

  • 32:46

    DR. EVELYN BEHAR [continued]: If I find that indeed smokers were more likely to have cancer than non-smokers were, it could be that smoking causes cancer. However, there are some potential confounds. It might be that smokers are more likely to spend a lot of time outdoors, and it's exposure to ultraviolet rays that's causing an increase in cancer risk.

  • 33:06

    DR. EVELYN BEHAR [continued]: Similarly, it may be that smokers are more likely to be coffee drinkers, and maybe coffee increases risk of cancer, and not the smoking itself. A responsible researcher always acknowledges that there's some risk that his or her results were impacted by a potential confound.
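
[Illustration, not part of the recording] The logic of the coffee example can be sketched with a small table of counts. Every number below is invented purely to show how a confound works and says nothing about the real relationship between smoking, coffee, and cancer. The sketch assumes a world in which cancer risk depends only on coffee drinking and in which smokers are more likely to drink coffee. The function name `cancer_rate` and the table layout are hypothetical.

```python
# (smoker, coffee_drinker) -> (participants, cancer_cases)
# Invented counts: risk is 20% for coffee drinkers and 5% for everyone else,
# regardless of smoking. Smokers are simply more likely to drink coffee.
counts = {
    (True, True): (800, 160),
    (True, False): (200, 10),
    (False, True): (200, 40),
    (False, False): (800, 40),
}

def cancer_rate(smoker, coffee=None):
    """Cancer rate for one smoking group, optionally within one coffee stratum."""
    cells = [
        cell for (s, c), cell in counts.items()
        if s == smoker and (coffee is None or c == coffee)
    ]
    participants = sum(n for n, _ in cells)
    cases = sum(k for _, k in cells)
    return cases / participants

# Ignoring coffee, smokers look far more likely to develop cancer.
overall_smokers = cancer_rate(True)
overall_nonsmokers = cancer_rate(False)
print("overall:", overall_smokers, "vs", overall_nonsmokers)

# Comparing like with like (same coffee habit), the difference disappears.
for coffee in (True, False):
    print("coffee =", coffee, ":", cancer_rate(True, coffee), "vs", cancer_rate(False, coffee))
```

Splitting the comparison by the suspected third variable, as the loop does, is one simple way to check whether an apparent group difference survives once that variable is held constant. It only works for confounds that were actually measured.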

  • 33:21

    DR. MARIANN WEIERICH: It is not possible to avoid all confounds in a study. They always exist, and we can't always know what all of them are. However, there are a few steps we can take to minimize the presence of confounds in our data. We can conduct true experiments, where the only thing that varies across conditions and across groups is the independent variable. We can give lots of thought to potential third variables

  • 33:44

    DR. MARIANN WEIERICH [continued]: that might influence our data in some way. And we can do the best we can to measure them to make sure they're equivalent across conditions. [MUSIC PLAYING]
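
[Illustration, not part of the recording] One common way to run a true experiment is random assignment, and the check described above (measuring a potential third variable to confirm it is equivalent across conditions) can be sketched as follows. This is a minimal simulation under stated assumptions: the pool size, the 50% share of coffee drinkers, the condition names, and the helper `coffee_share` are all hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical participant pool: about half are coffee drinkers.
participants = [{"coffee": rng.random() < 0.5} for _ in range(10_000)]

# Random assignment: a coin flip decides each participant's condition,
# so the assignment cannot depend on coffee drinking or any other trait.
for person in participants:
    person["condition"] = rng.choice(["treatment", "control"])

def coffee_share(condition):
    """Proportion of coffee drinkers among participants in one condition."""
    group = [p for p in participants if p["condition"] == condition]
    return sum(p["coffee"] for p in group) / len(group)

# Measure the potential third variable in each condition and compare.
share_treatment = coffee_share("treatment")
share_control = coffee_share("control")
print("coffee drinkers, treatment:", round(share_treatment, 3))
print("coffee drinkers, control:  ", round(share_control, 3))
```

With a large pool, the two proportions should come out close to each other, because chance alone decides who lands in which condition. In small samples the groups can still differ by accident, which is why measuring likely third variables remains worthwhile.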

  • 33:54

    NARRATOR: Conclusion. [MUSIC PLAYING] Whether you're doing scientific research or taking an exam for school, you want the results to reflect accurate measurements. You also want to be sure that the tests are measuring what you want them to be measuring. That is, that they have high validity,

  • 34:16

    NARRATOR [continued]: and that they're consistent across time. That is, that they have high reliability. If you get an A on an exam, it should reflect your knowledge of the subject matter, and it should be consistent with other grades you receive in the course.

  • 34:31

    DR. MARIANN WEIERICH: As a general rule, the more observations, raters, and types of measurement you have, the better.

  • 34:38

    NARRATOR: Careful and well-designed research minimizes the chances of misleading results and allows more accurate conclusions to be drawn, enhancing the real-world value of the research. As researchers, and as consumers of research, we need to be aware of the design flaws that can damage the value of research,

  • 34:58

    NARRATOR [continued]: and work to create studies that minimize these problems. [MUSIC PLAYING]

Video Info

Publisher: SAGE Publications, Inc

Publication Year: 2012

Video Type: Tutorial

Methods: Reliability, Validity

Keywords: accuracy; fluctuations; maturation; practices, strategies, and tools; Social anxiety

Segment Info

Segment Num.: 1


Dr. Evelyn Behar and Dr. Mariann Weierich discuss reliability and validity in research. Reliability refers to the consistency of the variables, and with more reliability, more valid conclusions can be drawn. Behar and Weierich discuss types of reliability and validity, and how problems can affect research.

How Results Can Be Misleading: Problems With Reliability and Validity
