The Pitfalls of Hypothesis Testing
Christopher B. Germann
Null Hypothesis Significance Testing
A Long-Standing Issue
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Chow, S. L. (1998). Précis of "Statistical significance: Rationale, validity, and utility". Behavioral and Brain Sciences, 21, 169–239.
Falk, R., & Greenbaum, C. W. (1995). Significance tests die hard. Theory and Psychology, 5, 75–98.
Lecoutre, M. P., Poitevineau, J., & Lecoutre, B. (2003). Even statisticians are not immune to misinterpretations of Null Hypothesis Significance Tests. International Journal of Psychology, 38, 37–45.
Loftus, G. R. (1991). On the tyranny of hypothesis testing in the social sciences. Contemporary Psychology, 36, 102–105.
Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5, 241–301.
Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103–115.
Morrison, D. E., & Henkel, R. E. (Eds.). (1970). The significance test controversy. Chicago: Aldine.
Rozeboom, W. W. (1960). The fallacy of the null hypothesis significance test. Psychological Bulletin, 57, 416–428.
Stevens, S. S. (1960). The predicament in design and significance. Contemporary Psychology, 9, 273–276.
Waller, N. G. (2004). The fallacy of the null hypothesis in soft psychology. Applied and Preventive Psychology, 11, 83–86.
Outline
Experiment
Bayes
Fisher
Neyman & Pearson
Fisher vs. Neyman & Pearson
Hybrid Theory
Publication bias
Solutions
Online experiment
Please visit the following URL:
http://irrationaldecisions.com/?page_id=159
Username and password are both: cognovo
A simple t-test
Suppose you have a treatment that you suspect may alter performance on a certain task. You compare the means of your control and experimental groups (say 20 subjects in each sample). Further, suppose you use a simple independent means t-test and your result is significant (t = 2.7, d.f. = 18, p = 0.01). Please mark each of the following statements as "true" or "false". "False" means that the statement does not follow logically from the above premises. Also note that several or none of the statements may be correct.
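As a rough check on the reported result, the following sketch computes a two-sided p-value from a t statistic by numerically integrating the Student t density, using only the standard library. The names `t_pdf` and `two_sided_p` are illustrative and do not come from the slides. Two independent groups of 20 would ordinarily give 38 degrees of freedom rather than the stated 18, so both are evaluated; in either case the p-value lies in the neighbourhood of 0.01.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3  # P(0 < T < |t|)
    return 2 * (0.5 - central_mass)


p_18 = two_sided_p(2.7, 18)  # degrees of freedom as stated on the slide
p_38 = two_sided_p(2.7, 38)  # degrees of freedom for two groups of 20
print(f"df=18: p = {p_18:.4f}")
print(f"df=38: p = {p_38:.4f}")
```

Whichever value is used, the p-value is a statement about the data under the null hypothesis, which is what the statements in the questionnaire probe.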
Reverend Thomas Bayes
(1701–1761)
Bayes' theorem was published posthumously by his friend Richard Price in 1763.
Conditional Probability
$P(A \cap B) = P(A \mid B)\,P(B)$
Pr(Ace & Black card) = Pr(Ace | Black) × Pr(Black) = 2/26 × 1/2 = 2/52
Conditional Probability
$P(B \cap A) = P(B \mid A)\,P(A)$
Pr(Black card & Ace) = Pr(Black | Ace) × Pr(Ace) = 2/4 × 1/13 = 2/52
NOTE: $P(A \mid B) \neq P(B \mid A)$. Here Pr(Ace | Black) = 2/26, whereas Pr(Black | Ace) = 2/4.
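The card arithmetic can be checked by brute-force counting. The sketch below enumerates a standard 52-card deck and computes the joint and conditional probabilities as exact fractions; the helper `prob` and the predicate names are illustrative, not taken from the slides.

```python
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = {"spades": "black", "clubs": "black", "hearts": "red", "diamonds": "red"}
deck = [(rank, suit) for rank in RANKS for suit in SUITS]


def prob(event, given=lambda card: True):
    """P(event | given), obtained by counting cards in a well-shuffled deck."""
    pool = [card for card in deck if given(card)]
    return Fraction(sum(1 for card in pool if event(card)), len(pool))


def is_ace(card):
    return card[0] == "A"


def is_black(card):
    return SUITS[card[1]] == "black"


def is_black_ace(card):
    return is_ace(card) and is_black(card)


p_joint = prob(is_black_ace)                  # 2/52
p_ace_given_black = prob(is_ace, is_black)    # 2/26
p_black_given_ace = prob(is_black, is_ace)    # 2/4

# Both factorisations give the same joint probability ...
assert p_joint == p_ace_given_black * prob(is_black)
assert p_joint == p_black_given_ace * prob(is_ace)
# ... but the two conditional probabilities are not interchangeable.
assert p_ace_given_black != p_black_given_ace
print(p_joint, p_ace_given_black, p_black_given_ace)
```

`Fraction` reduces its results, so 2/52 is displayed as 1/26, 2/26 as 1/13, and 2/4 as 1/2.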
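$P(A \cap B) = P(A \mid B)\,P(B)$
$P(B \cap A) = P(B \mid A)\,P(A)$
$P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$
BAYES' RULE
$P(A \mid B) = P(B \mid A)\,P(A) / P(B)$
posterior: $P(A \mid B)$; likelihood: $P(B \mid A)$; prior: $P(A)$; marginal evidence: $P(B)$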
$P(A \mid B) = P(B \mid A)\,P(A) / P(B)$
$P(H \mid e) = P(e \mid H)\,P(H) / P(e)$
A → H = hypothesis
B → e = experimental evidence
THE INVERSE PROBLEM OF PROBABILITY
Prior: Full deck! Pr(Ace | Black) = (2/4 × 1/13) / (1/2) = 2/26
Prior: No picture cards. Pr(Ace | Black) = (2/4 × 1/10) / (1/2) = 2/20
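The dependence on the prior can be made concrete with a few lines of code. This is a minimal sketch of Bayes' rule applied to the card example, once with the full-deck prior for an ace (4/52) and once with the picture cards removed (4/40); the function name `posterior` is illustrative.

```python
from fractions import Fraction


def posterior(likelihood, prior, evidence):
    """Bayes' rule: P(H | e) = P(e | H) * P(H) / P(e)."""
    return likelihood * prior / evidence


p_black_given_ace = Fraction(2, 4)  # likelihood P(e | H)
p_black = Fraction(1, 2)            # marginal evidence P(e), same in both decks

full_deck = posterior(p_black_given_ace, Fraction(4, 52), p_black)
no_pictures = posterior(p_black_given_ace, Fraction(4, 40), p_black)

print(full_deck)    # 1/13, i.e. 2/26
print(no_pictures)  # 1/10, i.e. 2/20
```

The likelihood and the evidence are identical in the two calls; only the prior differs, and the posterior moves with it.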
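2 – Alternative Hypotheses
$P(H \mid e) = P(e \mid H)\,P(H) / P(e)$
Your estimation of the state of the world depends on your priors (or prior beliefs).
$P(H_0 \mid e) = \dfrac{P(e \mid H_0)\,P(H_0)}{P(e)}$
$P(H_1 \mid e) = \dfrac{P(e \mid H_1)\,P(H_1)}{P(e)}$
$P(H_0 \mid e) = \dfrac{P(e \mid H_0)\,P(H_0)}{P(e \mid H_0)\,P(H_0) + P(e \mid H_1)\,P(H_1)}$
$P(H_1 \mid e) = \dfrac{P(e \mid H_1)\,P(H_1)}{P(e \mid H_0)\,P(H_0) + P(e \mid H_1)\,P(H_1)}$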
Bayesian Literacy
Example
1% of women over 40 get breast cancer.
80% of women with breast cancer have a positive mammography.
10% of women without breast cancer also have a positive mammography.
A woman tests positive. What is the probability that she has breast cancer?
$P(H_1 \mid e) = P(e \mid H_1)\,P(H_1) / P(e)$
$P(e \mid H_1) = 0.8$, $P(H_1) = 0.01$
$P(e) = P(e \mid H_0)\,P(H_0) + P(e \mid H_1)\,P(H_1) = (0.1)(0.99) + (0.8)(0.01) = 0.107 \approx 0.11$
$P(H_1 \mid e) = P(e \mid H_1)\,P(H_1) / P(e) = (0.8)(0.01) / 0.107 \approx 0.075$
$P(H_0 \mid e) = P(e \mid H_0)\,P(H_0) / P(e) = (0.1)(0.99) / 0.107 \approx 0.925$
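The mammography calculation is easy to reproduce. The sketch below applies Bayes' rule for two exhaustive hypotheses (cancer vs. no cancer) with the three figures given in the example; the function name `posterior_two_hypotheses` is illustrative.

```python
def posterior_two_hypotheses(prior_h1: float,
                             p_e_given_h1: float,
                             p_e_given_h0: float) -> float:
    """P(H1 | e) when H0 and H1 are the only two possibilities."""
    prior_h0 = 1 - prior_h1
    evidence = p_e_given_h1 * prior_h1 + p_e_given_h0 * prior_h0  # P(e)
    return p_e_given_h1 * prior_h1 / evidence


p_cancer_given_positive = posterior_two_hypotheses(
    prior_h1=0.01,      # 1% base rate
    p_e_given_h1=0.8,   # positive mammography given cancer
    p_e_given_h0=0.1,   # positive mammography given no cancer
)
print(f"P(cancer | positive) = {p_cancer_given_positive:.3f}")
print(f"P(no cancer | positive) = {1 - p_cancer_given_positive:.3f}")
```

Despite the 80% hit rate, fewer than one in ten women with a positive result has breast cancer, because the 1% base rate dominates.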
100 women over 40
  1 breast cancer (1%)
    0.8 positive (80%)
    0.2 negative (20%)
  99 no breast cancer (99%)
    9.9 positive (10%)
    89.1 negative (90%)
Many physicians commit the so-called "base rate fallacy" (neglecting the a priori probability).
See, for example, Bramwell, R., West, H., & Salmon, P. (2006). Health professionals' and users' interpretation of screening test results: An experimental study. British Medical Journal, 333, 284–286.
BAYES' THEOREM
The theory that would not die
Bayes' theorem cracked the Enigma code
(the modern Bayesian revival)
Hunted down Russian submarines
Emerged triumphant from two centuries of controversy! It is now heavily used in many fields...
Let me introduce...
Sir Ronald Fisher
(1890–1962)
He was an eminent English geneticist, statistician, and evolutionary biologist.
Professor of Eugenics at University College London,
later Chair of Genetics at the University of Cambridge
Fisher's seminal books
First published 1925
First published 1935
Fisher vs. Bayes
Fisher explicitly rejected the Bayesian approach but tacitly used Bayesian logic in his own reasoning.
Fisher's quasi-Bayesian view
Although he is often described as a frequentist, whatever phraseology he used, he always held that a significant result affects our confidence or degree of belief that the null hypothesis is false.
Fisherian null hypothesis testing
Set up a statistical null hypothesis (the null need not be a nil hypothesis, i.e., zero difference) and try to reject it.
The lady tasting tea
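The slides only name the tea-tasting experiment, so the following sketch fills in the classic design as an assumption: 8 cups, 4 with milk poured first, and the lady must pick out those 4. Under the null hypothesis of pure guessing, every selection of 4 cups is equally likely, which gives the chance probability directly. The function name `p_at_least` is illustrative.

```python
from math import comb

CUPS = 8        # assumed classic design, not stated on the slide
MILK_FIRST = 4  # cups with milk poured first


def p_at_least(k_correct: int) -> float:
    """P(at least k_correct milk-first cups identified) under pure guessing."""
    favourable = sum(
        comb(MILK_FIRST, k) * comb(CUPS - MILK_FIRST, MILK_FIRST - k)
        for k in range(k_correct, MILK_FIRST + 1)
    )
    return favourable / comb(CUPS, MILK_FIRST)


print(comb(CUPS, MILK_FIRST))      # number of equally likely selections
print(f"{p_at_least(4):.4f}")      # all four correct
print(f"{p_at_least(3):.4f}")      # three or more correct
```

Only a perfect selection falls below the conventional 5% level in this design; three correct out of four is quite likely under guessing.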
Why 5% significance level?
In The Design of Experiments, Fisher writes:
"It is usual and convenient for experimenters to take 5% as a standard level of significance, in the sense that they are prepared to ignore all results which fail to reach this standard..." (Fisher, 1935, p. 13)
"...no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas." (Fisher, 1956, p. 41)
However, Fisher was not the only one contributing to the development of NHST...
Neyman & Pearson
Jerzy Neyman (1894–1981)
Egon Pearson (1895–1980)
Virtually no statistics textbooks mention Neyman and Pearson in the context of NHST!
Neyman–Pearson decision theory
The 4 Outcomes

                       State of the World
Research Decision      H0 true                 H0 false
Reject H0              Type I error (α)        Correct rejection (1 − β) (power)
Accept H0              Correct acceptance      Type II error (β)
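Type 1 Error
[Figure: sampling distribution under H0 with a rejection region of area α/2 in each tail and a central area of 1 − α]
Pr(H1|W0): the probability that we conclude H1 is true given that the world is in state W0 = α
TYPE 1 error = α = probability of rejecting the null hypothesis incorrectly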
Power & Type 2 Error
[Figure: sampling distributions under H0 and H1; rejection regions of area α/2 in each tail of H0; 1 − β = POWER]
Effect Size
$\text{Effect size} = \dfrac{\bar{X}_1 - \bar{X}_0}{s / \sqrt{N}}$
Power Depends on the Effect Size
[Figure: H0 and H1 distributions with α/2 rejection regions; 1 − β = POWER; effect size = 1 s.d.]
[Figure: same layout; 1 − β = POWER; effect size = 3 s.d.]
Power Depends on Alpha
[Figure: two panels of H0 and H1 distributions drawn for different values of α, each with effect size = 0.5 s.d.; 1 − β = POWER]
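The two dependencies shown in the figures can be reproduced numerically. The sketch below assumes a two-sided z-test in which the H1 sampling distribution is shifted by a given number of standard deviations of the sampling distribution relative to H0; that model and the name `power_two_sided` are assumptions for illustration, since the slides do not state how the figures were generated.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_sided(shift: float, alpha: float = 0.05) -> float:
    """Power (1 - beta) of a two-sided z-test when H1 is `shift` s.d. from H0."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - shift)  # reject in the upper region
    lower_tail = STD_NORMAL.cdf(-z_crit - shift)     # reject in the lower region
    return upper_tail + lower_tail


# Power depends on the effect size (alpha fixed at 0.05).
for shift in (0.5, 1.0, 3.0):
    print(f"shift = {shift} s.d.: power = {power_two_sided(shift):.3f}")

# Power depends on alpha (effect size fixed at 0.5 s.d.).
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha}: power = {power_two_sided(0.5, alpha):.3f}")
```

With no shift at all the function returns α itself, which is the Type I error rate: rejecting a true null is the limiting case of "power" at zero effect.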
Neyman–Pearson Decision Rule
As with Fisher, NP introduced the idea of a decision rule: to reject the null hypothesis in favour of the alternative hypothesis (Rej), or NOT to reject the null hypothesis.
They did this in a particular way (not the only possibility):
Set $P(\mathrm{Rej} \mid H_0) = \alpha$.
Then maximise $P(\mathrm{Rej} \mid H_1)$.
Neyman–PearsonDecision
Rule
1.Setuptwostatisticalhypotheses,H
0
andH
1
,anddecideaboutα,β,and
samplesizebeforetheexperiment,basedonsubjectivecostbenefit
considerations.Thesedefinearejectionregionforeachhypothesis.
2.IfthedatafallsintotherejectionregionofH
0
,acceptH
1
;otherwise
acceptH
0
.Notethatacceptingahypothesisdoesnotmeanthatyou
“believe”init,butonlythatyouact“asifitweretrue”.
3.Theusefulnessoftheprocedureislimitedamongotherstosituations
whereyouhaveadisjunctionofhypotheses(e.g.,eitherμ
0
=8orμ
1
=10is
true)andwhereyoucanmakemeaningfulcostbenefittradeoffsfor
choosingalphaandbeta.
4.NotBayesian(nopriors),(butcanberelatedtoposteriorprobability).
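The steps above can be sketched for the slide's example of μ0 = 8 versus μ1 = 10. Everything else here is an assumption made for illustration: a one-sided z-test with known standard deviation σ = 4, α = 0.05, and β = 0.20. The function names `np_design` and `decide` are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def np_design(mu0: float, mu1: float, sigma: float, alpha: float, beta: float):
    """Sample size and critical sample mean for testing mu0 against mu1 > mu0."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(1 - beta)
    # Smallest n for which both error rates are at most alpha and beta.
    n = ceil(((z_alpha + z_beta) * sigma / (mu1 - mu0)) ** 2)
    critical_mean = mu0 + z_alpha * sigma / sqrt(n)
    return n, critical_mean


def decide(sample_mean: float, critical_mean: float) -> str:
    """Step 2: act as if H1 were true inside H0's rejection region, else H0."""
    return "accept H1" if sample_mean > critical_mean else "accept H0"


# sigma, alpha and beta are illustrative choices, fixed before the experiment.
n, critical_mean = np_design(mu0=8, mu1=10, sigma=4, alpha=0.05, beta=0.20)
print(n, round(critical_mean, 3))
print(decide(9.5, critical_mean))
print(decide(9.0, critical_mean))
```

Note that α, β, and n are all settled before any data are seen, and that the outcome is an action rather than a statement of belief.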
Fisher vs. Neyman & Pearson
Both camps in the controversy accused the other party of mechanical, thoughtless statistical inference.
Asymmetric vs. symmetric hypothesis testing
Fisher specified only one hypothesis, the null, and left unspecified the alternative hypothesis, typically the hypothesis the researcher is interested in. This made nonsignificance appear a negative, worthless, and disappointing result.
In Neyman–Pearson theory, by contrast, there is symmetry, and a conclusion is drawn from nonsignificance.
Hidden conflicts
These conflicting views are almost unknown to psychologists. Textbooks are uniformly silent.
They don't spell out the differences between Fisher, Neyman, and Pearson but instead present a hybrid version which neither party would have agreed upon.
The offspring of Fisher and NP:
The currently used hybrid theory
"A mishmash of Fisher and Neyman–Pearson, with invalid Bayesian interpretation" (Cohen, 1994, p. 998)
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
The Hybrid
1. Set up only one hypothesis (H0) (in accord with Fisher).
2. Use 5% as a convention for rejecting the null. If significant, accept your research hypothesis. Report the result as p < 0.05, p < 0.01, or p < 0.001 (whichever comes next to the obtained p-value).
3. Make a yes/no decision (congruent with NP, but in their theory the significance level is not fixed by convention but by thinking about α, β, and the sample size).
4. Always perform this procedure.
Criticism
Former APA president Paul Meehl (1978) put it strongly:
"I suggest to you that Sir Ronald has befuddled us, mesmerized us, and led us down the primrose path. I believe that the almost universal reliance on merely refuting the null hypothesis is one of the worst things that ever happened in the history of psychology." (p. 817)
Jacob Cohen argues that null hypothesis significance testing "not only fails to support the advance of psychology as a science but also has seriously impeded it." (Cohen, 1994, p. 997)
Gerd Gigerenzer
"Few researchers are aware that their own heroes rejected what they practice routinely. Awareness of the origins of the ritual and of its rejection could cause a virulent cognitive dissonance, in addition to dissonance with editors, reviewers, and dear colleagues. Suppression of conflicts and contradicting information is in the very nature of this social ritual." (Gigerenzer, 2004, p. 592)
Interim summary:
What is wrong with NHST?
It does not tell us what we want to know! What we want to know is "Given these data, what is the probability that H0 is true?"
What it tells us is "Given that H0 is true, what is the probability of these (or more extreme) data?"
These two statements are not the same, as has been pointed out many times over the years by Meehl (1978, 1986, 1990a, 1990b), Gigerenzer (1993), and Cohen (1990).
Analysis and discussion of experimental results
t = 2.7, d.f. = 18, p = 0.01
1. You have absolutely disproved the null hypothesis (that is, there is no difference between the population means).
2. You have found the probability of the null hypothesis being true.
3. You have absolutely proved your experimental hypothesis (that there is a difference between the population means).
4. You can deduce the probability of the experimental hypothesis being true.
5. You know, if you decide to reject the null hypothesis, the probability that you are making the wrong decision.
6. You have a reliable experimental finding in the sense that if, hypothetically, the experiment were repeated a great number of times, you would obtain a significant result on 99% of occasions.
Oakes, M. (1986). Statistical inference: A commentary for the social and the behavioral sciences. Chichester, England: Wiley.
Logical fallacies
Recall that a p-value is the probability of the observed data (or of more extreme data points), given that the null hypothesis H0 is true, defined in symbols as p(D|H0).
Statements 1 and 3 are easily detected as formal fallacies, because a significance test can never disprove the null hypothesis or prove the (undefined) experimental hypothesis. Statements 1 and 3 are instances of the illusion of certainty (Gigerenzer, 2002).
Logical fallacies
Statements 2 and 4 are also false. The probability p(D|H0) is not the same as p(H0|D), and more generally, a significance test does not provide a probability for a hypothesis.
Statement 5 also refers to a probability of a hypothesis. This is because if one rejects the null hypothesis, the only possibility of making a wrong decision is if the null hypothesis is true. Thus, it makes essentially the same claim as Statement 2 does, and both are incorrect.
Statement 6 amounts to the replication fallacy (Gigerenzer, 1993, 2000). Here, p = 1% is taken to imply that such significant data would reappear in 99% of the repetitions.
Replication fallacy
The belief that the level of significance gives information about the replicability of an experiment.
p(D|H0) does not imply anything about p(Replication).
The editor of the Journal of Experimental Psychology, Arthur Melton, stated that he used the level of significance reported in submitted papers as the measure of the "confidence that the results of the experiment would be repeatable under the conditions described" (Melton, 1962, p. 553).
Or, "If the statistical significance is at the 0.05 level ... the investigator can be confident with odds of 95 out of 100 that the observed difference will hold up in future investigations" (Nunnally, 1975, p. 195).
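One way to see why p = 0.01 does not translate into a 99% replication rate is to compute the chance that an identical repetition reaches significance under a deliberately optimistic assumption: that the true effect equals the effect observed in the first study. That assumption, the normal approximation, and the name `replication_probability` are introduced here for illustration and are not part of the slides.

```python
from statistics import NormalDist


def replication_probability(p_value: float, alpha: float = 0.05) -> float:
    """P(two-sided significance in an identical repetition), assuming the
    true effect equals the originally observed effect (normal approximation)."""
    z = NormalDist()
    z_observed = z.inv_cdf(1 - p_value / 2)  # z score matching the original p
    z_crit = z.inv_cdf(1 - alpha / 2)
    upper_tail = 1 - z.cdf(z_crit - z_observed)
    lower_tail = z.cdf(-z_crit - z_observed)
    return upper_tail + lower_tail


for p in (0.05, 0.01, 0.001):
    print(f"original p = {p}: P(replication significant) = "
          f"{replication_probability(p):.2f}")
```

Even in this favourable scenario, a result with p = 0.01 is expected to replicate at the 5% level only roughly three times out of four, and a result with p = 0.05 about half the time.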
Perpetuating statistical illusions
Guilford, J. P. (1942). Fundamental Statistics in Psychology and Education (3rd ed., 1956; 6th ed., 1978, with Fruchter, B.). McGraw-Hill, New York.
In Guilford's hands, p-values turn miraculously into Bayesian posterior probabilities:
"If the result comes out one way, the hypothesis is probably correct, if it comes out another way, the hypothesis is probably wrong" (p. 156).
80% of the statistics teachers shared illusions with their students!
The permanent illusion
Inverse probability
p(D|H0) ≠ p(H0|D)
Null hypothesis significance testing can only tell us p(D|H0), that is, the probability of the data given the null hypothesis.
We cannot derive the inverse (posterior Bayesian) probability p(H0|D), the probability of a hypothesis given the data!
We cannot even calculate p(D|HA), as we can in Neyman and Pearson's theory, because HA has not been specified.
Wishful Bayesian thinking
... is the belief that the level of significance, say .05, is the probability that the null hypothesis is correct.
Or, consequently, that 1 − .05 is the probability that the alternative hypothesis is correct.
Necessary illusions
We argue that such illusions are necessary to maintain the dream of mechanized inductive inference.
Without illusions, we would see clearly that the hybrid simply gives us p(D|H0) and nothing more!
Without illusions, the ritual would be easily recognized for what it is.
A parable concerning editorial policies
Type 1 error inflation: imagine tests on 1000 hypotheses, 100 of which are true.
The tests have a false positive rate of 5%. That means they produce 45 false positives (5% of 900). They have a power of .8, so they can confirm only 80 of the true hypotheses, producing 20 false negatives.
Not knowing what is false and what is not, the researcher sees 125 hypotheses as true, 45 of which are not. The negative results are much more reliable but less likely to be published.
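The bookkeeping in the parable can be written out as a short calculation. The sketch below takes the four numbers from the text (1000 hypotheses, 100 true, α = 5%, power = .8) and derives the counts together with the share of positive and negative results that are correct; the function name `screen` and the dictionary keys are illustrative.

```python
def screen(n_hypotheses: int, n_true: int, alpha: float, power: float) -> dict:
    """Expected outcome counts when every hypothesis is tested once."""
    n_false = n_hypotheses - n_true
    true_pos = power * n_true        # true hypotheses confirmed
    false_neg = n_true - true_pos    # true hypotheses missed
    false_pos = alpha * n_false      # false hypotheses "confirmed"
    true_neg = n_false - false_pos   # false hypotheses correctly rejected
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "true_negatives": true_neg,
        # Share of positive results that are actually true.
        "ppv": true_pos / (true_pos + false_pos),
        # Share of negative results that are actually false hypotheses.
        "npv": true_neg / (true_neg + false_neg),
    }


result = screen(n_hypotheses=1000, n_true=100, alpha=0.05, power=0.8)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Of the 125 positive results only 64% are correct, whereas about 98% of the 875 negative results are, which is the sense in which the negative results are "much more reliable".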
The logic of scientific inference
The lesson to be learned is that no single solution to the process of inductive inference exists.
A magic alternative to NHST, some other mechanical ritual to replace it, doesn't exist (Cohen, 1994).
Other Approaches
Confidence intervals
Maximum likelihood
Bayesian statistics
Avoid making piecewise decisions
NHST (inconsistent hybrid)
Visualising data
Forest Plots
[Figure: forest plot of the mean difference between test and control groups (axis from −2 to 2) for the studies Blumenthal 90, Yukna 98, Masters 96, Borghetti 93, Meadows 93, and Altiere 79]
Bayes Factors
$\dfrac{P(H_1 \mid e)}{P(H_0 \mid e)} = \dfrac{P(e \mid H_1)}{P(e \mid H_0)} \cdot \dfrac{P(H_1)}{P(H_0)}$
Posterior odds ratio = BAYES FACTOR × Prior odds ratio
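The odds form of Bayes' rule can be illustrated with the mammography figures used earlier in the deck (likelihoods 0.8 and 0.1, prior 0.01). This is a minimal sketch; the helper names `posterior_odds` and `odds_to_probability` are illustrative.

```python
def posterior_odds(bayes_factor: float, prior_odds: float) -> float:
    """Posterior odds ratio = Bayes factor x prior odds ratio."""
    return bayes_factor * prior_odds


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour of H1 into P(H1)."""
    return odds / (1 + odds)


bayes_factor = 0.8 / 0.1   # P(e | H1) / P(e | H0)
prior_odds = 0.01 / 0.99   # P(H1) / P(H0)

odds = posterior_odds(bayes_factor, prior_odds)
print(f"Bayes factor   = {bayes_factor:.1f}")
print(f"posterior odds = {odds:.4f}")
print(f"P(H1 | e)      = {odds_to_probability(odds):.3f}")
```

The evidence favours H1 by a factor of 8, yet the posterior probability stays below 8% because the prior odds are only 1 to 99. The Bayes factor measures the weight of the evidence separately from the prior.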
Conclusion
Statistical reasoning is an art and so demands both mathematical knowledge and informed judgment. When it is mechanized, as with the institutionalized hybrid logic, it becomes ritual, not reasoning.
Thank you for your attention!
Any questions are welcome...