Evaluation of statistical null-hypotheses
in a Bayesian framework
A ROPE & HDI-based decision algorithm
Christopher B. Germann (Ph.D., M.Sc., B.Sc. / Marie Curie Alumnus)
2018
URL: https://christopher-germann.de
In the majority of scientific research, it is conventional to try to reject
H₀. Bayesian parameter estimation can likewise be utilised to assess the
credibility of a given null hypothesis (e.g., μ₁ − μ₂ = 0). This can be
achieved by examining the posterior distribution of the plausible parameter
values (i.e., one simply checks whether the null value lies within the
credible interval of θ). If the null value departs from the most credible
parameter value estimates, it can be rejected in the classical Popperian
sense (Meehl, 1967; Rozeboom, 2005; Steiger, 2004). By contrast, if the
credible values are almost identical to the null value, then H₀ can also be
accepted, which removes the asymmetry inherent in NHST. To be more
explicit, Bayesian parameter estimation methods allow the researcher both
to accept and to reject a null value. Hence, the approach can be regarded as
a symmetrical hypothesis testing procedure.
Another significant logical problem associated with NHST is that
alternative theories can be expressed very imprecisely (if at all) and
still be “corroborated” by rejection of H₀. This problem is known in
philosophy of science as “Meehl’s paradox” (Carlin, Louis, & Carlin, 2009),
named after the ingenious psychologist and former APA president Paul Meehl
(see Rozeboom, 2005; Steiger, 2004). Differences of means that are
infinitesimally larger than zero can become statistically significant if n
is large enough. That is, given a large enough sample, any nonzero
difference, however small, can be found to be statistically significantly
greater than zero. Bayesian parameter estimation provides methods to
circumvent this particular issue by constructing a region of practical
equivalence (ROPE) around the null value (or any other parameter value of
interest). The ROPE is a bipolar interval that specifies a predefined range
of parameter values that are regarded as compatible with H₀. The definition
of the ROPE depends on the experiment at hand and involves a subjective
judgment on the part of the investigator. As n → ∞, the probability that
the difference of means is exactly zero is zero. Of theoretical interest is
instead the probability that the difference is too small to be of any
practical significance. In Bayesian estimation and decision theory, a
region of practical equivalence around zero is predefined. This allows one
to compute the exact probability that the true value of the difference lies
inside this predefined interval (Gelman, Carlin, Stern, & Rubin, 2004). In
the psychophysics experiment at hand, a difference of ± 0.01 in the visual
analogue scale ratings was considered too trivial to be of any theoretical
importance (ergo, the a priori specified ROPE was [−0.01, 0.01]).
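The dependence of statistical significance on n can be illustrated with a
minimal sketch. It assumes a two-sample z test with a known common standard
deviation, and it treats the observed difference as equal to a hypothetical
true difference of 0.001. The function name and all numerical values are
illustrative assumptions and are not taken from the experiment. Under these
assumptions the p value shrinks towards zero as n grows, although the
difference itself stays well inside a ROPE of [−0.01, 0.01].

```python
import math


def two_sided_p_value(mean_diff, sd, n_per_group):
    """Two-sided p value of a two-sample z test (known, equal SDs assumed)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # P(|Z| >= |z|) for a standard normal Z, via the complementary error function
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical values: a practically irrelevant difference and a unit SD.
tiny_difference = 0.001
sd = 1.0

for n in (100, 10_000, 1_000_000, 100_000_000):
    p = two_sided_p_value(tiny_difference, sd, n)
    print(f"n per group = {n:>11,d}   p = {p:.4g}")
```

The difference never changes; only the sample size does. A decision based on
a ROPE would treat the same difference as practically equivalent to zero at
every n.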
In addition to parameter estimation, the posterior distribution can be
utilised to make discrete decisions about specific hypotheses. Highest
density intervals (HDIs) contain rich distributional information about
parameters of interest. Moreover, an HDI can be utilised to facilitate
reasonable decisions about null values (i.e., the null hypothesis that
there is no difference between conditions V₀₀ and V₀₁). HDIs indicate which
values of θ are most credible/believable. Furthermore, the HDI width
conveys information regarding the certainty of beliefs in the parameter
estimate, i.e., it quantifies certainty vs. uncertainty. A wide HDI
signifies a large degree of uncertainty pertaining to the possible range of
values of θ, whereas a narrow HDI indicates a high degree of certainty with
regard to the credibility of the parameter values in the distribution. It
follows that the analyst can define a specific degree of certainty by
varying the probability mass (and hence the width) of the HDI. In other
words, the HDI comprises the most credible values of the estimated
parameters. For instance, for a 95% HDI, all parameter values inside the
interval (i.e., 95% of the total probability mass) have a higher
probability density (i.e., credibility/trustworthiness) than those outside
the interval (5% of the total mass). Moreover, the HDI contains valuable
distributional information, in contrast to classical frequentist confidence
intervals (CIs). A classical 95% CI carries no such information: it does
not indicate that values in its centre are any more likely than those
located at its outer extremes. Furthermore, a 95% CI does not contain the
95% most credible parameter values. The chosen terminology is in actuality
very misleading, as it gives the impression that the 95% CI carries
information about the confidence one can place in the values it contains
(which it does not). The related widely shared logical fallacies are
discussed in chapter xxx. The Bayesian HDI does what the CI pretends to do.
For example, a 95% HDI is based on a density distribution, meaning that
values in its centre are more likely than those at the margin, and the
total probability of parameter values within the HDI is 95%. The HDI
encompasses a large number of parameter values that are jointly credible,
given the empirical data. In other terms, the HDI provides distributions of
credible values of θ, not merely point estimates, as is the case with CIs.
Thus, the HDI can be considered a measure of the precision of the Bayesian
parameter estimation; it provides a summary of the distribution of the
credible values of θ. Another major advantage of HDIs over CIs is their
insensitivity with regard to sampling strategies and other data-collection
idiosyncrasies that distort (and oftentimes logically invalidate) the
interpretation of p values, and therefore of CIs (which are based on p
values). The statistical inadequacies of CIs (which are nowadays advertised
as an integral part of “the new statistics”) are discussed in greater
detail in chapter xxx.
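One common way to obtain an HDI in practice is to take the narrowest
interval that contains the desired share of draws from the posterior. The
sketch below assumes that posterior draws are available (e.g., from MCMC)
and that the posterior is unimodal. The function name hdi_from_samples and
the simulated gamma-distributed draws are hypothetical stand-ins and are not
part of the analysis reported here.

```python
import math
import random


def hdi_from_samples(samples, mass=0.95):
    """Narrowest interval covering `mass` of the draws (unimodal case)."""
    ordered = sorted(samples)
    n = len(ordered)
    n_inside = math.ceil(mass * n)  # number of draws the interval must span
    if not 2 <= n_inside <= n:
        raise ValueError("need 0 < mass <= 1 and enough samples")
    # Slide a window of n_inside consecutive draws and keep the narrowest one.
    best_low, best_high = ordered[0], ordered[n_inside - 1]
    for start in range(1, n - n_inside + 1):
        low, high = ordered[start], ordered[start + n_inside - 1]
        if high - low < best_high - best_low:
            best_low, best_high = low, high
    return best_low, best_high


# Hypothetical right-skewed posterior, simulated for illustration only.
random.seed(1)
draws = [random.gammavariate(2.0, 1.0) for _ in range(20_000)]

hdi_low, hdi_high = hdi_from_samples(draws, mass=0.95)

# Equal-tailed interval spanning the same number of draws, for comparison.
ordered = sorted(draws)
n_inside = math.ceil(0.95 * len(ordered))
tail = (len(ordered) - n_inside) // 2
et_low, et_high = ordered[tail], ordered[tail + n_inside - 1]

print(f"95% HDI:          [{hdi_low:.3f}, {hdi_high:.3f}]")
print(f"95% equal-tailed: [{et_low:.3f}, {et_high:.3f}]")
```

For a skewed posterior the two intervals differ: the HDI is never wider than
the equal-tailed interval covering the same number of draws, because it is
chosen as the narrowest such window.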
The specified HDI can also be utilised in order to decide which values of
θ are credible (given the empirical data). For this purpose, a “Region of
Practical Equivalence” (ROPE)¹ is constructed around the value of θ under
consideration. Suppose that a ROPE is defined for θ = 0 (i.e.,
μ₁ − μ₂ = 0). The ROPE defines a narrow interval which specifies values
that are deemed equivalent to θ = 0. That is, for all practical purposes,
values that lie within the Region of Practical Equivalence are regarded as
equivalent to θ = 0. The ROPE procedure allows flexibility in
decision-making which is not available in conventional procedures (e.g.,
NHST). Another significant advantage is that no correction for multiple
comparisons is needed because no p values are involved. In other words, the
analysis does not have to take α-inflation into account (Kruschke &
Vanpaemel, 2015). However, it should be emphasized that the Bayesian
procedure is not immune to α-errors (false alarms). The Bayesian analysis
(like any other class of analyses) can lead to fallacious conclusions if
the data are not representative of the population of interest (due to
sampling bias, response bias, or any number of other potentially
confounding factors).
¹ The literature contains a multifarious nomenclature to refer to “regions of practical equivalence”. Synonymous
terms are, inter alia: “smallest effect size of interest”, “range of equivalence”, “interval of clinical equivalence”,
and “indifference zone” (see Kruschke & Liddell, 2017).
The crucial analytic question is: Are any of the values within the ROPE
sufficiently credible given the empirical data at hand? This question can
be answered by consulting the HDI. We asserted in the previous paragraphs
that any value that falls within the highest density interval can be
declared reasonably credible/believable. It follows logically that a given
ROPE value is regarded as not credible if it does not lie within the HDI
and, vice versa, ROPE values that fall within the HDI are considered
credible. The heuristic “accept versus reject” decision rule based on the
HDI and the ROPE can thus be summarized with the following two statements:
“A parameter value is declared to be not credible, or rejected, if its entire ROPE
lies outside the 95% highest density interval (HDI) of the posterior distribution of
that parameter.”
“A parameter value is declared to be accepted for practical purposes if that value’s
ROPE completely contains the 95% HDI of the posterior of that parameter.”
(Dieudonne, 1970)
Expressed as a logical representation, the decision rule can be stated as
follows.
Equation 1. HDI and ROPE based decision algorithm for hypothesis testing.
$$P(\mathrm{HDI}_{0.95} \cap \mathrm{ROPE} = \emptyset \mid \text{data}) \in \{0, 1\}.$$
where ∈ denotes set membership, ∩ the intersection, and ∅ is the Bourbaki
notation (Festa, 1993, p. 22, content in brackets added) denoting the empty
set, which contains no elements.
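The two quoted statements can be sketched as a small function. This is a
minimal illustration and not the author's implementation: the function name,
the tuple representation of intervals, and the label "undecided" for the
case in which neither statement applies are assumptions made for this
example. The HDI limits in the usage lines are hypothetical.

```python
def hdi_rope_decision(hdi, rope):
    """Apply the HDI/ROPE rule to two closed intervals given as (low, high)."""
    hdi_low, hdi_high = hdi
    rope_low, rope_high = rope
    if hdi_low > hdi_high or rope_low > rope_high:
        raise ValueError("interval limits must be ordered as (low, high)")
    # Reject: the entire ROPE lies outside the HDI (empty intersection).
    if rope_high < hdi_low or rope_low > hdi_high:
        return "reject"
    # Accept: the ROPE completely contains the HDI.
    if rope_low <= hdi_low and hdi_high <= rope_high:
        return "accept"
    # Partial overlap: neither quoted statement applies.
    return "undecided"


rope = (-0.01, 0.01)  # the a priori ROPE around zero

# Hypothetical 95% HDIs for the difference of means:
print(hdi_rope_decision((0.020, 0.050), rope))    # reject
print(hdi_rope_decision((-0.004, 0.007), rope))   # accept
print(hdi_rope_decision((-0.005, 0.030), rope))   # undecided
```

Returning a third label rather than forcing a binary outcome keeps the case
of partial overlap visible, in which the data are not yet precise enough to
support either decision.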
A related question is: What is the probability that θ is enclosed by the
ROPE (i.e., has set membership in it)? This question can be posed as follows:
$$P(\theta \in \mathrm{ROPE} \mid \text{data}).$$
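Given draws from the posterior of θ, this probability can be approximated by
the share of draws that fall inside the ROPE. The following sketch assumes
that such draws are available. The function name and the simulated normal
posterior (mean 0.002, standard deviation 0.004) are hypothetical and serve
only to make the example runnable.

```python
import random


def probability_in_rope(samples, rope):
    """Monte Carlo estimate of P(theta in ROPE | data) from posterior draws."""
    rope_low, rope_high = rope
    if not samples:
        raise ValueError("at least one posterior draw is required")
    inside = sum(1 for value in samples if rope_low <= value <= rope_high)
    return inside / len(samples)


# Hypothetical posterior draws of the difference of means.
random.seed(7)
draws = [random.gauss(0.002, 0.004) for _ in range(50_000)]

p_rope = probability_in_rope(draws, rope=(-0.01, 0.01))
print(f"Estimated P(theta in ROPE | data) = {p_rope:.3f}")
```

Unlike the binary decision, this quantity is continuous, so it conveys how
strongly the posterior supports practical equivalence.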
The ROPE is specified by taking theoretical considerations and a priori
knowledge into account. The researcher must determine what “practically
equivalent” means in the specific experimental context at hand, that is,
which values around the landmark of zero are to be regarded as equal to
zero. This decision should ideally be made a priori and independently of
the empirical data observed in the current experimental situation. Hence,
the ROPE is a predetermined, fixed interval (i.e., a constant with no
variance). The 95% HDI, on the other hand, is entirely defined by the
postulated model and the empirical data.
As opposed to NHST, which can only reject the null, the ROPE based decision
procedure can both reject and accept it. The question becomes: Should we
accept the null value as indicated by the HDI/ROPE procedure? Given that
the limits of the ROPE are subjectively determined, one would like to know
what the conclusion would be if a ROPE with different bounds had been
specified. The posterior distribution in combination with the limits of the
95% HDI is de facto all that is needed to evaluate whether a different
(e.g., narrower) ROPE would still lead to the conclusion to accept the null
value.
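Such a sensitivity check needs only the HDI limits. The sketch below assumes
a ROPE that is symmetric around the null value and uses hypothetical HDI
limits; the function names are illustrative. It reports the narrowest ROPE
half-width that would still contain the HDI and therefore still lead to
acceptance of the null value.

```python
def smallest_accepting_half_width(hdi, null_value=0.0):
    """Smallest half-width of a symmetric ROPE that still contains the HDI."""
    hdi_low, hdi_high = hdi
    return max(abs(hdi_low - null_value), abs(hdi_high - null_value))


def rope_contains_hdi(hdi, half_width, null_value=0.0):
    """True if the ROPE [null - half_width, null + half_width] contains the HDI."""
    hdi_low, hdi_high = hdi
    return null_value - half_width <= hdi_low and hdi_high <= null_value + half_width


hdi = (-0.004, 0.007)  # hypothetical 95% HDI of the difference of means

print("narrowest accepting half-width:", smallest_accepting_half_width(hdi))
for half_width in (0.005, 0.0075, 0.01, 0.02):
    decision = "accept" if rope_contains_hdi(hdi, half_width) else "no acceptance"
    print(f"ROPE = [-{half_width}, {half_width}]: {decision}")
```

A reader who prefers a stricter ROPE can thus see directly whether the
conclusion survives, without access to the raw data.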
In sum, the discrete (binary) decision about the credibility of parameter
values based on the combination of HDI and ROPE indicates that there is no
difference between the means of experimental conditions V₀₀ and V₀₁. More
specifically, because the 95% HDI was contained within the ROPE, we
concluded that the difference between means is practically equivalent to
zero. It should be underscored that this is a pragmatic decision based on
Bayesian (propositional) logic and not a frequentist interpretation.
Moreover, it should be emphasized that the reduction of an information-rich
posterior probability distribution to a binary “yes versus no” decision is
based on several additional assumptions that are independent of the
informational value of the HDI. The HDI conveys valuable distributional
information about the parameter in question, independently of its auxiliary
role in deciding about a point hypothesis (i.e., whether μ₁ − μ₂ = 0).
Thus, reporting the exact 95% HDI allows the sceptical reader to construct
their own subjectively/empirically motivated ROPE for comparison.
References
Carlin, B. P., Louis, T. A., & Carlin, B. P. (2009). Bayesian methods for data
analysis. Chapman & Hall/CRC texts in statistical science series.
https://doi.org/10.1002/1521-3773(20010316)40:6<9823::AID-
ANIE9823>3.3.CO;2-C
Dieudonne, J. A. (1970). The Work of Nicholas Bourbaki. The American
Mathematical Monthly, 77, 134–145.
Festa, R. (1993). Bayesian Point Estimation, Verisimilitude, and
Immodesty. In Optimum Inductive Methods (pp. 38–47). Dordrecht:
Springer Netherlands. https://doi.org/10.1007/978-94-015-8131-8_4
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2004). Bayesian Data
Analysis. Chapman Texts in Statistical Science Series.
https://doi.org/10.1007/s13398-014-0173-7.2
Kruschke, J. K., & Liddell, T. M. (2017). The Bayesian New Statistics:
Hypothesis testing, estimation, meta-analysis, and power analysis
from a Bayesian perspective. Psychonomic Bulletin & Review.
https://doi.org/10.3758/s13423-016-1221-4
Kruschke, J. K., & Vanpaemel, W. (2015). Bayesian estimation in
hierarchical models. The Oxford Handbook of Computational and
Mathematical Psychology, 279–299.
https://doi.org/10.1093/oxfordhb/9780199957996.013.13
Meehl, P. E. (1967). Theory-Testing in Psychology and Physics: A
Methodological Paradox. Philosophy of Science, 34(2), 103–115.
https://doi.org/10.1086/288135
Rozeboom, W. W. (2005). Meehl on metatheory. Journal of Clinical
Psychology. https://doi.org/10.1002/jclp.20184
Steiger, J. H. (2004). Paul Meehl and the evolution of statistical methods
in psychology. Applied and Preventive Psychology, 11(1), 69–72.
https://doi.org/10.1016/j.appsy.2004.02.012