Numerous methods exist for multiple comparisons following a significant Analysis of Variance (ANOVA). Most involve pairwise comparisons to identify which specific means differ significantly. Among these, the best-known is Tukey's Honestly Significant Difference (HSD) test, which relies on the studentized range distribution (q).
Tukey's test is often considered the
gold standard, especially with unequal sample sizes or when confidence
intervals are required. However, for equal sample sizes where confidence
intervals are not the focus, the Student-Newman-Keuls
(SNK) test offers greater statistical power. This post
explores the SNK procedure.
While Tukey's test uses a single,
conservative critical value for all comparisons—making it robust against Type I
errors (false positives)—it can sometimes be overly cautious. A key practical
advantage, though, is its universal availability in statistical software, which
is not always the case for the SNK test.
2. Explanation of the SNK Procedure
Like Tukey's test, the SNK is based on the studentized range statistic
(q). Its key feature is that it is a sequential (stepwise) procedure. Imagine
we have four group means, ranked from smallest to largest:
x̄₁ < x̄₂ < x̄₃ < x̄₄
The SNK procedure works as follows:
1. Compare the largest and smallest means (a span of *m* = 4 means).
2. Next, compare the largest with the second smallest, and the second largest with the smallest (both spanning *m* = 3 means).
3. Finally, compare the remaining adjacent pairs (spanning *m* = 2 means).
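The ordering of these comparisons can be sketched in a few lines of Python. This is only an illustration, assuming the means are already ranked in ascending order; the function name `snk_comparisons` is hypothetical and does not come from any statistics library.

```python
def snk_comparisons(k):
    """List SNK comparisons for k ordered means, widest span first.

    Each entry is (m, low, high): the span m and the 0-based ranks of the
    smaller and the larger mean being compared.
    """
    comparisons = []
    for m in range(k, 1, -1):           # span: k, k-1, ..., 2
        for low in range(k - m + 1):    # every window of m adjacent ranks
            comparisons.append((m, low, low + m - 1))
    return comparisons


for m, low, high in snk_comparisons(4):
    print(f"m = {m}: mean {high + 1} vs mean {low + 1}")
```

With four means this lists one comparison at *m* = 4, two at *m* = 3, and three at *m* = 2, in the same order as the steps above.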
For each comparison, calculate a critical difference (dₘ):
dₘ = q(α, m, df) × √(MSres / r)
where:
· q(α, m, df) is the critical value from the studentized range distribution for significance level α, with m means in the range and df degrees of freedom (from the ANOVA residual).
· MSres is the residual mean square from the ANOVA.
· r is the number of observations per group (assuming balanced data).
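As a minimal sketch, the critical difference can be written as a small Python helper. The name `critical_difference` is hypothetical, and the function assumes the caller supplies q(α, m, df) from a printed table or from software (recent SciPy versions, for example, expose the distribution as `scipy.stats.studentized_range`).

```python
from math import sqrt


def critical_difference(q, ms_res, r):
    """Critical difference d_m = q * sqrt(MSres / r) for a balanced design.

    q      -- critical value q(alpha, m, df), supplied by the caller
    ms_res -- residual mean square from the ANOVA
    r      -- number of observations per group
    """
    if ms_res < 0 or r <= 0:
        raise ValueError("ms_res must be non-negative and r must be positive")
    return q * sqrt(ms_res / r)
```

Only q changes from one span to the next; the term √(MSres / r) is the same for every comparison in a balanced design.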
3. Example
Consider the (fictitious) data on blood pressure reduction presented in Table 1.
Table 1: Blood pressure reduction (mmHg)
These data were analyzed by ANOVA
(Table 2), where the F value was significant at the 5% level. Thus, at
least one mean differs from the others. The sample means are shown in Table 3.
Table 2: Analysis of variance (ANOVA)
Table 3: Means of blood pressure reduction (mmHg)
The largest mean is 29 (Group D), and the smallest is 2 (Control). We can calculate the critical difference dₘ for comparing these extremes, where *m* = 6. From the ANOVA, we have residual degrees of freedom (df) = 24 and MSres = 36.00. Each group has *r* = 5 observations. The critical value q(0.05, 6, 24) is 4.3727. Thus:
d₆ = 4.3727 × √(36.00 / 5) = 11.733
The observed difference between D and
Control is 29 - 2 = 27. Since 27 > 11.733, the difference is statistically
significant at the 5% level.
We then proceed to compare pairs spanning *m* = 5 means (D vs. B and A vs. Control). The new critical value is q(0.05, 5, 24) = 4.1663. Thus:
d₅ = 4.1663 × √(36.00 / 5) = 11.179
The differences between groups D and B (29 - 8 = 21) and between A and the control (21 - 2 = 19) both exceed 11.179, so both are significant at the 5% level.
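The arithmetic for these first two steps can be checked with a short script. It is a sketch that uses only the values quoted in the text (the means 29, 21, 8, and 2, MSres = 36.00, r = 5, and the two critical q values); the names `Q_CRIT`, `COMPARISONS`, and `check` are illustrative.

```python
from math import sqrt

MS_RES = 36.00            # residual mean square from the ANOVA
R = 5                     # observations per group
SE = sqrt(MS_RES / R)     # common factor sqrt(MSres / r)

# Critical q values quoted in the text for alpha = 0.05 and df = 24.
Q_CRIT = {6: 4.3727, 5: 4.1663}

# (larger group, smaller group, larger mean, smaller mean, span m)
COMPARISONS = [
    ("D", "Control", 29, 2, 6),
    ("D", "B", 29, 8, 5),
    ("A", "Control", 21, 2, 5),
]


def check(comparisons):
    """Return (pair, observed difference, critical difference, significant?)."""
    rows = []
    for hi_name, lo_name, hi_mean, lo_mean, m in comparisons:
        d_m = Q_CRIT[m] * SE
        diff = hi_mean - lo_mean
        rows.append((f"{hi_name} vs {lo_name}", diff, round(d_m, 3), diff > d_m))
    return rows


for row in check(COMPARISONS):
    print(row)
```

All three observed differences exceed their critical difference, which matches the conclusions stated above.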
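Note: The analysis does not stop here; it continues stepwise through the ordered means, testing spans of *m* = 4, 3, and 2 in turn, each with its own critical value q(0.05, m, 24). Once a range of means is found not significant, the means inside that range are not tested further.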
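The full stepwise pass can be sketched as follows. This is an illustrative implementation, not the author's code or a library routine: the function `snk` is hypothetical, it assumes balanced data, and it expects the caller to pass a lookup `q_crit` mapping each span m to q(α, m, df). It also applies the usual SNK stopping rule, under which any pair lying inside a range already declared non-significant is not tested and is reported as non-significant.

```python
from math import sqrt


def snk(means, q_crit, ms_res, r):
    """Stepwise SNK comparisons for a balanced design.

    means  -- dict mapping group name to sample mean
    q_crit -- dict mapping span m to the critical value q(alpha, m, df)
    ms_res -- residual mean square from the ANOVA
    r      -- observations per group
    Returns a dict {(larger group, smaller group): True if significant}.
    """
    ordered = sorted(means.items(), key=lambda item: item[1])
    k = len(ordered)
    se = sqrt(ms_res / r)
    non_significant = []   # (low, high) rank ranges already declared n.s.
    results = {}
    for m in range(k, 1, -1):                 # widest span first
        for low in range(k - m + 1):
            high = low + m - 1
            pair = (ordered[high][0], ordered[low][0])
            # Stopping rule: skip pairs inside a non-significant range.
            if any(a <= low and high <= b for a, b in non_significant):
                results[pair] = False
                continue
            diff = ordered[high][1] - ordered[low][1]
            significant = diff > q_crit[m] * se
            results[pair] = significant
            if not significant:
                non_significant.append((low, high))
    return results
```

Tracking the non-significant ranges explicitly is what distinguishes this from simply comparing every pair against the critical difference for its span.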
4. Comparison with Tukey’s Test
We performed pairwise comparisons using
both Tukey’s HSD and the SNK tests. The results are summarized in Table 5,
which groups means that are not statistically different from each other.
Table 5: Comparison of Tukey's test and the SNK test
The SNK test identified more significant
differences than Tukey's, demonstrating its greater power. For instance, SNK
detected that treatment D is significantly greater than A, a difference that
Tukey's test did not find.
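One way to see why the two tests disagree on D vs. A is to compare the cutoffs each applies to that pair. The sketch below rests on two assumptions not stated in the post: that D and A are adjacent in the ranking (so SNK uses *m* = 2), and that q(0.05, 2, 24) ≈ 2.919, a standard table value equal to √2 times the two-sided 5% t critical value for 24 degrees of freedom. Tukey's test applies q(0.05, 6, 24) to every pair.

```python
from math import sqrt

SE = sqrt(36.00 / 5)       # sqrt(MSres / r) from the example

Q_TUKEY = 4.3727           # q(0.05, 6, 24), quoted in the post
Q_SNK_ADJACENT = 2.919     # q(0.05, 2, 24): assumed table value, not from the post

diff_d_a = 29 - 21         # D minus A

tukey_cutoff = Q_TUKEY * SE        # same cutoff for every pair
snk_cutoff = Q_SNK_ADJACENT * SE   # cutoff for a span of m = 2

print(f"Tukey: {diff_d_a} > {tukey_cutoff:.3f}? {diff_d_a > tukey_cutoff}")
print(f"SNK:   {diff_d_a} > {snk_cutoff:.3f}? {diff_d_a > snk_cutoff}")
```

Under these assumptions the difference of 8 mmHg falls between the two cutoffs, so it clears the SNK threshold but not Tukey's.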
This increased power, however, comes
with a trade-off: a slightly higher risk of Type I error. Therefore, the choice
between Tukey and SNK must be carefully justified based on the study's
objectives and the desired balance between rigor (conservatism) and sensitivity
(power).
Feature | Tukey's HSD | Student-Newman-Keuls (SNK) |
Type | Single-step | Sequential (stepwise) |
Error Control | Controls experiment-wise error | Does not control experiment-wise error |
Power | Less powerful (conservative) | More powerful (liberal) |
Software Availability | Excellent | Limited |
Best For | Confirmatory analysis, unequal n | Exploratory analysis, equal n |
Translation & Editing Note: This post was translated from Portuguese and edited for clarity with the assistance of an AI language model. The statistical methodology, calculations, and conclusions were rigorously verified by the author.