Ubiquitous statistical fallacies
Protected versus unprotected pairwise comparisons
Christopher B. Germann (Ph.D., M.Sc., B.Sc. / Marie Curie Alumnus)
URL: https://christopher-germann.de
It is generally regarded as “best practice” to compute post hoc pairwise multiple comparisons only after a significant omnibus F-test. Many widely used textbooks either explicitly or implicitly advocate the use of such protected tests before post hoc comparisons are conducted (i.a., Kennedy & Bush, 1985; Maxwell & Delaney, 2004). That is, a 2-stage strategy is advocated, and it is widely adopted by researchers, as the literature attests. The 2-stage strategy makes post hoc pairwise comparisons conditional on a statistically significant omnibus F-test (hence the name protected test). However, this recommendation is not evidence based: there is no analytic or empirical evidence in support of the practice. To the contrary, it has been empirically demonstrated that the strategy can result in a significant inflation of α-error rates when some, but not all, of the population means differ (Keselman, Games, & Rogan, 1979). Further empirical evidence against the 2-stage (protected) testing strategy comes from a Monte Carlo analysis that explicitly compared protected versus unprotected testing procedures (Barnette & McLean, 2005). Independent of whether experimentwise or per-experiment α-control was applied, and independent of the α-error control technique used (i.e., Dunn-Šidák, Dunn-Bonferroni, Holm, Tukey’s HSD), unprotected tests performed significantly better than their protected counterparts; in particular, using the F-test as a “protected gateway” to pairwise comparisons proved overly conservative under the complete null hypothesis. The simulation results clearly show that protected tests should not be used.
Based on this evidence, it can be safely concluded that unprotected testing procedures should be preferred over 2-stage protected procedures. The conventional wisdom of conducting an omnibus test before post hoc comparisons are performed does not withstand empirical or mathematical scrutiny. The authors of the previously cited Monte Carlo simulation study conclude their paper with the following statement: “Only when one is willing to question our current practice can one be able to improve on it” (Barnette & McLean, 2005, p. 452).
A negligible exception was the Holm procedure in the case of per-experiment error control (but not in the case of experimentwise error control). In this specific constellation, an α of .10 was more accurate under a protected test than under an unprotected test; this difference in accuracy was smaller when α was .05 or .01.
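Since the Holm procedure figures in all of the comparisons above, its step-down logic can be sketched briefly (a generic Python illustration, not code from the cited study; the function name and example p-values are hypothetical): the p-values are sorted in ascending order and the i-th smallest is compared against α/(m − i + 1), stopping at the first non-rejection.

```python
# Generic sketch of Holm's step-down procedure (illustrative, not from the
# cited study): sort p-values ascending, compare the rank-r value (0-indexed)
# against alpha / (m - r), and retain all remaining hypotheses once one fails.
def holm_reject(pvalues, alpha=0.05):
    """Return a list of booleans: True where H0 is rejected by Holm's method."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down stops at the first non-rejection
    return reject

print(holm_reject([0.001, 0.04, 0.03, 0.20]))  # → [True, False, False, False]
```

In the example, 0.001 ≤ .05/4 is rejected, but the next-smallest p-value, 0.03, exceeds .05/3 ≈ .0167, so it and all larger p-values are retained.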
Barnette, J. J., & McLean, J. E. (2005). Type I error of four pairwise mean comparison procedures conducted as protected and unprotected tests. Journal of Modern Applied Statistical Methods, 4(2), 446–459.
Kennedy, J. J., & Bush, A. J. (1985). An introduction to the design and analysis of experiments in behavioral research. Lanham, MD: University Press of America.
Keselman, H. J., Games, P. A., & Rogan, J. C. (1979). Protecting the overall rate of
Type I errors for pairwise comparisons with an omnibus test statistic.
Psychological Bulletin, 86(4), 884–888. https://doi.org/10.1037/0033-
Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.