α-correction for simultaneous statistical inference: Familywise error rate vs. per-family error rate | ॐ Homepage of Dr. Christopher B. Germann (Ph.D., M.Sc., B.Sc. / Marie Curie Alumnus)

α-correction for simultaneous statistical inference:

Familywise error rate vs. per-family error rate

A meta-analysis of more than 30,000 published articles indicated that fewer than 1% applied
α-corrections for multiple comparisons, even though the median number of hypothesis tests per
article was ≈ 9 (Conover, 1973; Derrick & White, 2017; Pratt, 1959). A crucial yet
underappreciated distinction is that between 1) the familywise (or experimentwise) error rate
(FWER) and 2) the per-family error rate (PFER). The FWER is the probability of making at least
one Type I error in a family of hypotheses. The PFER, on the other hand, is the number of
α-errors expected to occur in a family of hypotheses (in other words, the sum of the
probabilities of α-errors for all the hypotheses in the family). The per-comparison error rate
(PCER) is the probability of an α-error on a single comparison in the absence of any correction
for multiple comparisons (Benjamini & Hochberg, 1995). Moreover, the false discovery rate (FDR)
quantifies the expected proportion of "discoveries" (rejected null hypotheses) that are false
(incorrect rejections).

The majority of investigations focus on the FWER, while the PFER is largely ignored even
though it is at least equally important, if not more so (Barnette & Mclean, 2005; Kemp, 1975).
The experimentwise (EW) error rate does not take the possibility of multiple α-errors in the
same experiment into account. Per-experiment (PE) α-control techniques control α for all
comparisons (a priori and post hoc) in a given experiment. In other words, they consider all
α-errors that can occur in a given experiment. It has been persuasively argued that
per-experiment α-control is most relevant for pairwise hypothesis decision-making, even though
most textbooks (and researchers) focus on the experimentwise error rate, and that this almost
exclusive focus is not justifiable (Barnette & Mclean, 2005). The two approaches differ
substantially in the way they adjust α for multiple hypothesis tests. From a pragmatic point of
view, per-experiment error correction is much more closely aligned with prevailing research
practices: in most experiments it is not just the largest difference between conditions that is
of empirical interest, and most of the time all pairwise comparisons are computed. The EW error
rate treats each experiment as one test even though multiple comparisons might have been
conducted. A systematic Monte Carlo comparison of four adjustment methods showed that, if
experimentwise α-control is desired, Tukey’s HSD (as an unprotected test) is the most accurate
procedure. If the focus is on per-experiment α-control, the Dunn-Bonferroni procedure (again
unprotected) is the most accurate α-adjustment procedure (Barnette & Mclean, 2005).

References

Barnette, J. J., & Mclean, J. E. (2005). Type I error of four pairwise mean comparison procedures conducted as protected and unprotected tests. Journal of Modern Applied Statistical Methods, 4(2), 446–459. https://doi.org/10.22237/jmasm/1130803740

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B. https://doi.org/10.2307/2346101

Conover, W. J. (1973). On methods of handling ties in the Wilcoxon signed-rank test. Journal of the American Statistical Association, 68(344), 985–988. https://doi.org/10.1080/01621459.1973.10481460

Derrick, B., & White, P. (2017). Comparing two samples from an individual Likert question. International Journal of Mathematics and Statistics, 974–7117. Retrieved from http://eprints.uwe.ac.uk/30814 and http://www.ceser.in/ceserp/index.php/ijms

Kemp, K. E. (1975). Multiple comparisons: Comparisonwise versus experimentwise Type I error rates and their relationship to power. Journal of Dairy Science, 58(9), 1374–1378. https://doi.org/10.3168/jds.S0022-0302(75)84722-9

Pratt, J. W. (1959). Remarks on zeros and ties in the Wilcoxon signed rank procedures. Journal of the American Statistical Association, 54(287), 655–667. https://doi.org/10.1080/01621459.1959.10501526