Ubiquitous statistical fallacies
Protected versus unprotected pairwise comparisons
Christopher B. Germann (Ph.D., M.Sc., B.Sc. / Marie Curie Alumnus)
URL: https://christopher-germann.de
It is generally regarded as “best practice” to compute post hoc pairwise multiple comparisons only after a significant omnibus F-test. Many widely used textbooks either explicitly or implicitly advocate the use of such protected tests before post hoc comparisons are conducted (i.a., Kennedy & Bush, 1985; Maxwell & Delaney, 2004). That is, a 2-stage strategy is advocated, and it is widely adopted by researchers, as the literature attests. The 2-stage strategy makes post hoc pairwise comparisons conditional on a statistically significant omnibus F-test (hence the name protected test). However, this recommendation is not evidence based: there is no analytic or empirical evidence in support of the practice. To the contrary, it has been empirically demonstrated that the strategy can result in a significant inflation of α-error rates when some, but not all, of the population means differ (Keselman, Games, & Rogan, 1979). Further empirical evidence against the 2-stage (protected) testing strategy comes from a Monte Carlo analysis that explicitly compared protected versus unprotected testing procedures (Barnette & McLean, 2005). Independent of whether experimentwise or per-experiment α-control was applied, and independent of the α-error control technique used (i.e., Dunn-Šidák, Dunn-Bonferroni, Holm, Tukey’s HSD), unprotected tests performed significantly better than their protected counterparts; in particular, using the F-test as a “protected gateway” to pairwise comparisons proved overly conservative under the complete null hypothesis. The simulation results clearly show that protected tests should not be used.
Based on this evidence, it can be safely concluded that unprotected testing procedures should be preferred over 2-stage protected procedures. The conventional wisdom of conducting an omnibus test before post hoc comparisons are performed does not withstand empirical or mathematical scrutiny. The authors of the previously cited Monte Carlo simulation study conclude their paper with the following statement: “Only when one is willing to question our current practice can one be able to improve on it” (Barnette & McLean, 2005, p. 452).
A negligible exception was the Holm procedure in the case of per-experiment error control (but not in the case of experimentwise error control). In this specific constellation, an α of .10 was more accurate under a protected test than under an unprotected test; this difference in accuracy was smaller when α was .05 or .01.
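Since the Holm procedure figures in all of the comparisons above, its step-down logic can be sketched briefly (a generic Python illustration, not code from the cited study; the function name and example p-values are hypothetical): the p-values are sorted in ascending order and the i-th smallest is compared against α/(m − i + 1), stopping at the first non-rejection.

```python
# Generic sketch of Holm's step-down procedure (illustrative, not from the
# cited study): sort p-values ascending, compare the rank-r value (0-indexed)
# against alpha / (m - r), and retain all remaining hypotheses once one fails.
def holm_reject(pvalues, alpha=0.05):
    """Return a list of booleans: True where H0 is rejected by Holm's method."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down stops at the first non-rejection
    return reject

print(holm_reject([0.001, 0.04, 0.03, 0.20]))  # → [True, False, False, False]
```

In the example, 0.001 ≤ .05/4 is rejected, but the next-smallest p-value, 0.03, exceeds .05/3 ≈ .0167, so it and all larger p-values are retained.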
Barnette, J. J., & McLean, J. E. (2005). Type I error of four pairwise mean comparison procedures conducted as protected and unprotected tests. Journal of Modern Applied Statistical Methods, 4(2), 446–459.
Kennedy, J. J., & Bush, A. J. (1985). An introduction to the design and analysis of experiments in behavioral research. Lanham, MD: University Press of America.
Keselman, H. J., Games, P. A., & Rogan, J. C. (1979). Protecting the overall rate of
Type I errors for pairwise comparisons with an omnibus test statistic.
Psychological Bulletin, 86(4), 884–888. https://doi.org/10.1037/0033-
Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.