Statistical Power & Sample Size – An interactive intuitive spatial metaphor

power = Pr(reject H0 | H1 is true).

This page displays an interactive visual metaphor that intuitively demonstrates the logic behind statistical power.
The app was programmed in ActionScript 2 / Shockwave Flash and therefore requires an external plugin, which you can download/install here.

The larger the zoom factor of the telescope (i.e., the larger the sample size), the larger the difference between the stars (i.e., the experimental groups) appears. When the sample is large enough, even minute differences become statistically significant.
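The same point can be made numerically with base R's power.t.test() (a minimal sketch, not part of the app itself): a true mean difference of only 0.05 standard deviations is essentially invisible in small samples, but becomes reliably detectable once the groups are large enough.

```r
# How large must each group be before a two-sample t-test detects
# a minute true difference (0.05 SD) with 80% power?
power.t.test(delta = 0.05, sd = 1, sig.level = 0.05, power = 0.80)$n
# roughly 6300 observations per group; with small n the same
# difference goes unnoticed almost every time
```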

Alternative format - no external plugin required


The relationship between sample size and sensitivity has important implications for normality testing: statistical normality tests are, generally speaking, insensitive with small sample sizes and overly sensitive with large ones.

Simulations in R

With real data, the null hypothesis of exact normality is never strictly true, so normality tests will essentially always reject it for large (though not insanely large) samples. And so, perversely, normality tests are arguably only useful for small samples, where they have lower power and less control over the Type I error rate.

The power of a statistical test depends on three factors:

  • the statistical significance criterion used in the test
  • the magnitude of the effect of interest in the population
  • the sample size used to detect the effect
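These three factors map directly onto the arguments of base R's power.t.test(); supplying any three of n, delta, sig.level, and power solves for the fourth:

```r
# Significance criterion (sig.level), effect magnitude (delta, in SD
# units), and sample size (n) jointly determine power.
power.t.test(n = 20, delta = 0.5, sd = 1, sig.level = 0.05)$power
# roughly 0.34: a medium effect is easy to miss with 20 per group

# Omitting n and fixing power solves for the required sample size:
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)$n
# roughly 64 per group
```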
# Shapiro-Wilk normality test
# Shapiro, S. S. & Wilk, M. B. (1965). An analysis of variance test
# for normality (complete samples). Biometrika.

x <- replicate(100, { # run 100 replications of each test
  c( # rnorm() draws from the normal distribution; adding the recycled
     # vector c(1, 0, 2, 0, 1) shifts the draws and makes the
     # resulting distribution non-normal
    shapiro.test(rnorm(10)   + c(1, 0, 2, 0, 1))$p.value,
    shapiro.test(rnorm(100)  + c(1, 0, 2, 0, 1))$p.value,
    shapiro.test(rnorm(1000) + c(1, 0, 2, 0, 1))$p.value,
    shapiro.test(rnorm(5000) + c(1, 0, 2, 0, 1))$p.value
    # shapiro.test() refuses sample sizes above 5000
  )
})
rownames(x) <- c("n10", "n100", "n1000", "n5000")


#########################################################################
rowMeans(x < 0.05) # the proportion of significant deviations from normality
#########################################################################

#"There is no excuse for failing to plot and look." J.W. Tukey, 1977

par(mfrow = c(2, 2))

for (n in c(10, 100, 1000, 5000)) {
  y <- rnorm(n)
  qqnorm(y, main = paste0("Normal Q-Q Plot n=", n),
         xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
  qqline(y) # reference line through the first and third quartiles
}

[Figure: normal Q-Q plots for n = 10, 100, 1000, and 5000]


The Galton board

[Image: portrait of Francis Galton]

The Galton board is a model of multiple binary decisions. By studying the Galton board, one can see how the techniques of probability can be used to predict the long-range behavior of completely random events. The Law of Large Numbers predicts roughly how many balls will end up in each bin after many trials. The Central Limit Theorem predicts with great accuracy what the overall distribution of balls will be. The concepts of the mean and standard deviation provide a vocabulary for comparing the results of a single trial to the expected results of many trials. These are classic, powerful ideas from the mathematical study of probability.
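A Galton board is easy to simulate in R (a minimal sketch): each ball makes a fixed number of independent left/right decisions, so its final bin follows a Binomial distribution, and with many balls the bin counts approach the normal curve predicted by the Central Limit Theorem.

```r
# Each ball makes k independent 50/50 decisions, so the number of
# rightward bounces (its final bin) is Binomial(k, 0.5).
set.seed(42)
k <- 12          # rows of pegs
balls <- 10000   # number of trials

bins <- rbinom(balls, size = k, prob = 0.5)

table(bins) / balls    # observed proportion of balls per bin
dbinom(0:k, k, 0.5)    # exact probabilities the proportions converge to
mean(bins); sd(bins)   # compare to k/2 = 6 and sqrt(k/4), about 1.73
```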

References