Evaluation of statistical null-hypotheses in a Bayesian framework

A ROPE & HDI-based decision algorithm

Christopher B. Germann (Ph.D., M.Sc., B.Sc. / Marie Curie Alumnus)

2018

URL: https://christopher-germann.de

HDI and ROPE based decision algorithm for hypothesis testing.

$$(\mathrm{HDI}_{0.95} \cap \mathrm{ROPE} = \emptyset \mid \mathrm{data}) \in \{0, 1\}.$$

In the majority of scientific research, it is conventional to try to reject $H_0$. Bayesian parameter estimation can likewise be utilised to assess the credibility of a given null hypothesis (e.g., $\mu_1 - \mu_2 = 0$). This can be achieved by examining the posterior distribution of the plausible parameter values (i.e., one simply checks whether the null value lies within the credible interval of θ). If the null value departs from the most credible parameter value estimates, it can be rejected in the classical Popperian sense (Meehl, 1967; Rozeboom, 2005; Steiger, 2004). By contrast, if the credible values are almost identical to the null value, then $H_0$ can also be accepted, in contrast to the asymmetry inherent to NHST. To be more explicit, Bayesian parameter estimation methods allow the researcher to both accept and reject a null value. Hence, it can be regarded as a symmetrical hypothesis testing procedure.

Another significant logical problem associated with NHST is that alternative theories can be expressed very imprecisely (if at all) and still be “corroborated” by rejection of $H_0$. This problem is known in the philosophy of science as “Meehl’s paradox” (Carlin, Louis, & Carlin, 2009), named after the ingenious psychologist and former APA president Paul Meehl (see Rozeboom, 2005; Steiger, 2004). Differences of means that are infinitesimally larger than zero can become statistically significant if n is large enough. That is, given a large enough sample, a difference of any magnitude can be declared statistically significantly greater than zero. Bayesian parameter estimation provides methods to circumvent this particular issue by constructing a region of practical equivalence (ROPE) around the null value (or any other parameter value of interest). The ROPE is a bipolar interval that specifies a predefined range of parameter values that are regarded as compatible with $H_0$. The definition of the ROPE depends on the experiment at hand, and it involves a subjective judgment on the part of the investigator. As n → ∞, the probability that the difference of means is exactly zero is zero. Of theoretical interest is the probability that the difference may be too small to be of any practical significance. In Bayesian estimation and decision theory, a region of practical equivalence around zero is therefore predefined. This allows one to compute the exact probability that the true value of the difference lies inside this predefined interval (Gelman, Carlin, Stern, & Rubin, 2004). In the psychophysics experiment at hand, a difference of ±0.01 in the visual analogue scale ratings was considered too trivial to be of any theoretical importance (ergo, the a priori specified ROPE was [−0.01, 0.01]).
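The dependence of statistical significance on n can be illustrated with a short numerical sketch. It assumes a two-sample z test with known, equal standard deviations and an observed difference that equals a hypothetical true difference of 0.001. The function name `two_sided_p` and all numbers are illustrative assumptions, not values from the experiment.

```python
import math

def two_sided_p(diff, sd, n_per_group):
    """Two-sided p value of a two-sample z test with known, equal SDs."""
    se = sd * math.sqrt(2.0 / n_per_group)      # standard error of the mean difference
    z = diff / se
    return math.erfc(abs(z) / math.sqrt(2.0))   # P(|Z| >= |z|) for a standard normal Z

# Hypothetical scenario: a practically negligible difference of 0.001 (SD = 1).
for n in (100, 10_000, 1_000_000, 100_000_000):
    p = two_sided_p(diff=0.001, sd=1.0, n_per_group=n)
    print(f"n per group = {n:>11,}: p = {p:.3g}")
```

Under these assumptions the same negligible difference moves from clearly non-significant to highly significant purely as a function of n, which is the situation a ROPE of ±0.01 is meant to guard against.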

In addition to parameter estimation, the posterior distribution can be utilised to make discrete decisions about specific hypotheses. Highest density intervals (HDIs) contain rich distributional information about the parameters of interest. Moreover, an HDI can be utilised to facilitate reasonable decisions about null values (i.e., the null hypothesis that there is no difference between conditions $V_{00}$ and $V_{01}$). HDIs indicate which values of θ are most credible/believable. Furthermore, the HDI width conveys information regarding the certainty of beliefs in the parameter estimate, i.e., it quantifies certainty vs. uncertainty. A wide HDI signifies a large degree of uncertainty pertaining to the possible range of values of θ, whereas a narrow HDI indicates a high degree of certainty with regard to the credibility of the parameter values in the distribution. It follows that the analyst can define a specific degree of certainty by varying the probability mass that the HDI covers (and thereby its width). In other words, the HDI comprises the set of most credible values of the estimated parameters. For instance, for a 95% HDI, all parameter values inside the interval (which together hold 95% of the total probability mass) have a higher probability density (i.e., credibility/trustworthiness) than those outside the interval (the remaining 5% of the total mass).

Moreover, the HDI contains valuable distributional information, in contrast to classical frequentist confidence intervals (CIs). A classical 95% CI carries no distributional information about the parameter, i.e., values in the centre of the confidence interval are not marked as any more probable than those located at its outer extremes. Furthermore, the range of a 95% CI does not contain 95% of the most credible parameter values. The chosen terminology is in actuality very misleading, as it gives the impression that the 95% CI carries information about the confidence that can be placed in the values it contains (which it does not). The related, widely shared logical fallacies are discussed in chapter xxx. The Bayesian HDI does what the CI pretends to do. For example, a 95% HDI is based on a density distribution, meaning that values in its centre are more likely than those at its margins, and the total probability of the parameter values within the HDI is 95%. The HDI encompasses a large number of parameter values that are jointly credible, given the empirical data. In other terms, the HDI summarises a distribution of credible values of θ, not merely two interval limits as is the case with CIs. Thus, the HDI can be considered a measure of the precision of the Bayesian parameter estimation: it provides a summary of the distribution of the credible values of θ. Another major advantage of HDIs over CIs is their insensitivity with regard to sampling strategies and other data-collection idiosyncrasies that distort (and oftentimes logically invalidate) the interpretation of p values, and therefore of CIs (which are based on p values). The statistical inadequacies of CIs (which are nowadays advertised as an integral part of “the new statistics”) are discussed in greater detail in chapter xxx.
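One common way to obtain an HDI in practice is to take the narrowest interval that contains the desired share of posterior samples. The following minimal sketch assumes that posterior samples of θ are available (here they are simulated from a normal distribution with hypothetical parameters) and that the posterior is unimodal. The function name `hdi` is an assumption for illustration.

```python
import math
import random

def hdi(samples, mass=0.95):
    """Return the narrowest interval holding `mass` of the samples (unimodal case)."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must contain
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

# Simulated stand-in for posterior samples of theta (hypothetical mean and SD).
rng = random.Random(1)
posterior = [rng.gauss(0.002, 0.003) for _ in range(20_000)]

lo, hi = hdi(posterior, mass=0.95)
print(f"95% HDI: [{lo:.4f}, {hi:.4f}], width = {hi - lo:.4f}")
```

Raising `mass` (e.g., to 0.99) widens the interval, and a more concentrated set of samples narrows it, which mirrors the relation between HDI width and certainty described above.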

The specified HDI can also be utilised in order to decide which values of θ are credible (given the empirical data). For this purpose, a “region of practical equivalence” (ROPE)¹ is constructed around the value of θ in question. Suppose that a ROPE is defined for θ = 0 (i.e., $\mu_1 - \mu_2 = 0$). The ROPE is a narrow interval which specifies the values that are deemed equivalent to θ = 0. That is, for all practical purposes, values that lie within the region of practical equivalence are regarded as equivalent to θ = 0. The ROPE procedure allows flexibility in decision-making which is not available in conventional procedures (e.g., NHST). Another significant advantage is that no correction for multiple comparisons is needed because no p values are involved. In other words, the analysis does not have to take α-inflation into account (Kruschke & Vanpaemel, 2015). However, it should be emphasized that the Bayesian procedure is not immune to α-errors (false alarms). The Bayesian analysis (like any other class of analyses) can lead to fallacious conclusions if the data are not representative of the population of interest (due to sampling bias, response bias, or any number of other potentially confounding factors).

¹ The literature contains a multifarious nomenclature for “regions of practical equivalence”. Synonymous terms are, inter alia, “smallest effect size of interest”, “range of equivalence”, “interval of clinical equivalence”, and “indifference zone” (but see Kruschke & Liddell, 2017).

The crucial analytic question is: Are any of the values within the ROPE sufficiently credible given the empirical data at hand? This question can be answered by consulting the HDI. We asserted in the previous paragraphs that any value that falls within the highest density interval can be declared reasonably credible/believable. It follows logically that a given ROPE value is regarded as incredible if it does not lie within the HDI and, vice versa, that ROPE values that fall within the HDI are considered credible. The heuristic “accept versus reject” decision rule based on the HDI and the ROPE can thus be summarized with the following two statements:

“A parameter value is declared to be not credible, or rejected, if its entire ROPE lies outside the 95% highest density interval (HDI) of the posterior distribution of that parameter.”

“A parameter value is declared to be accepted for practical purposes if that value’s ROPE completely contains the 95% HDI of the posterior of that parameter.”

(Dieudonne, 1970)

Expressed as a logical representation, the decision rule can be stated as follows.

Equation 1. HDI and ROPE based decision algorithm for hypothesis testing.

$$(\mathrm{HDI}_{0.95} \cap \mathrm{ROPE} = \emptyset \mid \mathrm{data}) \in \{0, 1\}.$$

where ∈ denotes set membership, ∩ the intersection, and ∅ is the Bourbaki notation (Festa, 1993, p. 22, content in bracket added) for the empty set, i.e., the set containing no elements.
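The two quoted statements and the emptiness check on the intersection of HDI and ROPE can be sketched as a small function. This is an illustration only: the function name `rope_decision` and the example HDI limits are hypothetical, and the label "undecided" for a partial overlap (a case that neither quoted statement covers) is an assumption of this sketch.

```python
def rope_decision(hdi, rope):
    """Apply the HDI/ROPE rule to two closed intervals given as (lower, upper)."""
    hdi_lo, hdi_hi = hdi
    rope_lo, rope_hi = rope
    # HDI and ROPE do not intersect: the null value is rejected.
    if hdi_hi < rope_lo or hdi_lo > rope_hi:
        return "reject"
    # ROPE completely contains the HDI: the null value is accepted for practical purposes.
    if rope_lo <= hdi_lo and hdi_hi <= rope_hi:
        return "accept"
    # Partial overlap: neither statement applies (assumed label).
    return "undecided"

rope = (-0.01, 0.01)  # ROPE limits as specified a priori in the text

# Hypothetical 95% HDIs, chosen only to exercise each branch.
print(rope_decision((0.020, 0.050), rope))   # HDI entirely outside the ROPE
print(rope_decision((-0.004, 0.006), rope))  # HDI entirely inside the ROPE
print(rope_decision((0.005, 0.030), rope))   # HDI overlaps the ROPE boundary
```

Keeping the third outcome explicit makes visible that the binary rule only yields a decision when the HDI lies entirely outside or entirely inside the ROPE.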

A related question is: What is the probability that θ is enclosed by the ROPE (i.e., has set membership)? This question can be posed as follows:

$$p(\theta \in \mathrm{ROPE} \mid \mathrm{data}).$$
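Given posterior samples of θ, this probability can be approximated by the share of samples that fall inside the ROPE. The sketch below is a Monte Carlo approximation under the assumption that such samples exist. The simulated posterior and the name `prob_in_rope` are hypothetical, whereas the ROPE limits of ±0.01 are those specified a priori for the experiment.

```python
import random

def prob_in_rope(samples, rope):
    """Share of posterior samples inside the closed interval `rope`."""
    lo, hi = rope
    inside = sum(1 for s in samples if lo <= s <= hi)
    return inside / len(samples)

# Simulated stand-in for posterior samples of theta (hypothetical mean and SD).
rng = random.Random(1)
posterior = [rng.gauss(0.002, 0.003) for _ in range(20_000)]

p_rope = prob_in_rope(posterior, rope=(-0.01, 0.01))
print(f"Estimated probability that theta lies in the ROPE: {p_rope:.3f}")
```

Unlike the binary rule of Equation 1, this quantity is graded, so it can be reported alongside the decision rather than instead of it.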

The ROPE is specified by taking theoretical considerations and prior knowledge into account. The researcher must determine what “practically equivalent” means in the specific experimental context at hand, that is, which values around the landmark of zero are to be regarded as equal to zero. This decision should ideally be made a priori and independently of the empirical data observed in the current experimental situation. Hence, the ROPE is a predetermined fixed interval (i.e., a constant with no variance). The 95% HDI, on the other hand, is entirely defined by the postulated model and the empirical data.

As opposed to NHST (which can only reject the null), the ROPE-based decision procedure can both reject and accept the null. The question becomes: Should we accept the null value as indicated by the HDI/ROPE procedure? Given that the limits of the ROPE are subjectively determined, one would like to know what the conclusion would be if we had specified a ROPE with different bounds. The posterior distribution in combination with the limits of the 95% HDI is de facto all that is needed to evaluate whether a different (e.g., narrower) ROPE would still lead to the conclusion to accept the null value.
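Such a sensitivity check needs nothing beyond the HDI limits. The following sketch assumes a ROPE that is symmetric around the null value and uses hypothetical HDI limits. The function names `accepts_null` and `smallest_accepting_half_width` are illustrative assumptions.

```python
def accepts_null(hdi, half_width, null_value=0.0):
    """True if a symmetric ROPE of the given half-width contains the whole HDI."""
    lo, hi = hdi
    return null_value - half_width <= lo and hi <= null_value + half_width

def smallest_accepting_half_width(hdi, null_value=0.0):
    """Narrowest symmetric ROPE (as a half-width) that still contains the HDI."""
    lo, hi = hdi
    return max(abs(lo - null_value), abs(hi - null_value))

reported_hdi = (-0.004, 0.006)  # hypothetical 95% HDI limits

# Re-evaluate the "accept" decision for several alternative ROPE half-widths.
for half_width in (0.002, 0.005, 0.01, 0.02):
    print(half_width, accepts_null(reported_hdi, half_width))

print("Narrowest accepting half-width:", smallest_accepting_half_width(reported_hdi))
```

With these hypothetical limits, any symmetric ROPE with a half-width of at least 0.006 would lead to the same "accept" conclusion, while narrower ROPEs would not.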

In sum, it can be concluded that the discrete (binary) decision about the credibility of parameter values based on the combination of HDI and ROPE indicates that there is no difference between the means of experimental conditions $V_{00}$ and $V_{01}$. More specifically, because the 95% HDI was contained within the ROPE, we concluded that the difference between means is practically equivalent to zero. It should be underscored that this is a pragmatic decision based on Bayesian (propositional) logic and not a frequentist interpretation. Moreover, it should be emphasized that the reduction of an information-rich posterior probability distribution to a binary “yes versus no” decision is based on several additional assumptions that are independent of the informational value of the HDI. The HDI conveys valuable distributional information about the parameter in question, independent of its auxiliary role in deciding about a point hypothesis (i.e., whether $\mu_1 - \mu_2 = 0$).

Thus, reporting the exact 95% HDI allows the sceptical reader to construct their own subjectively/empirically motivated ROPE for comparison.

References

Carlin, B. P., Louis, T. A., & Carlin, B. P. (2009). Bayesian methods for data analysis. Chapman & Hall/CRC Texts in Statistical Science Series.

Dieudonne, J. A. (1970). The Work of Nicholas Bourbaki. The American Mathematical Monthly, 77, 134–145.

Festa, R. (1993). Bayesian Point Estimation, Verisimilitude, and Immodesty. In Optimum Inductive Methods (pp. 38–47). Dordrecht: Springer Netherlands. https://doi.org/10.1007/978-94-015-8131-8_4

Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2004). Bayesian Data Analysis. Chapman Texts in Statistical Science Series.

Kruschke, J. K., & Liddell, T. M. (2017). The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin & Review. https://doi.org/10.3758/s13423-016-1221-4

Kruschke, J. K., & Vanpaemel, W. (2015). Bayesian estimation in hierarchical models. The Oxford Handbook of Computational and Mathematical Psychology, 279–299. https://doi.org/10.1093/oxfordhb/9780199957996.013.13

Meehl, P. E. (1967). Theory-Testing in Psychology and Physics: A Methodological Paradox. Philosophy of Science, 34(2), 103–115. https://doi.org/10.1086/288135

Rozeboom, W. W. (2005). Meehl on metatheory. Journal of Clinical Psychology. https://doi.org/10.1002/jclp.20184

Steiger, J. H. (2004). Paul Meehl and the evolution of statistical methods in psychology. Applied and Preventive Psychology, 11(1), 69–72. https://doi.org/10.1016/j.appsy.2004.02.012