Introduction
When comparing k populations using ANOVA, there are m = k(k-1)/2 possible
pairwise comparisons between means. If these comparisons were not pre-planned
(also known as unplanned or post-hoc comparisons) and
were chosen after the researcher examined the sample means, it is more
appropriate to use a test that controls the significance level for the entire
experiment, not just for an individual comparison.
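The count m = k(k-1)/2 can be checked by enumerating the pairs directly. The snippet below is a minimal sketch; the group labels are borrowed from the example later in this post, and the function name is purely illustrative.

```python
from itertools import combinations


def pairwise_comparisons(groups):
    """Return every unordered pair of group labels."""
    return list(combinations(groups, 2))


# Labels borrowed from the fictional example (Table 1)
groups = ["A", "B", "C", "D", "E", "Control"]

pairs = pairwise_comparisons(groups)
k = len(groups)
m = k * (k - 1) // 2  # closed-form count of pairwise comparisons

print(f"k = {k} groups give m = {m} pairwise comparisons")
for first, second in pairs:
    print(f"{first} vs {second}")
```

With six groups the number of comparisons is already 15, which is why the choice of error rate matters.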
Key Definitions
· Comparisonwise Error Rate (CER): The probability of committing a Type I error when comparing a single pair of means from a set of k means.
· Experimentwise Error Rate (EER) or Familywise Error Rate (FWER): The probability of committing at least one Type I error when performing all m pairwise comparisons from a set of k means. Two specific types are distinguished:
o Complete Null (EERC): The experimentwise error rate when all population means are truly equal.
o Partial Null (EERP): The experimentwise error rate when some means are equal and others are not.
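To get a feel for how quickly the experimentwise rate grows when only the comparisonwise rate is controlled, a rough approximation is 1 - (1 - α)^m. The sketch below assumes the m tests are independent, which pairwise comparisons sharing the same means are not, so the result is an order-of-magnitude guide rather than an exact error rate. The function name is illustrative.

```python
def approx_experimentwise_rate(alpha, k):
    """Approximate EER for all pairwise comparisons among k means.

    Assumption: the m = k(k-1)/2 tests are treated as independent,
    which is only a rough approximation for pairwise comparisons.
    """
    m = k * (k - 1) // 2
    return 1.0 - (1.0 - alpha) ** m


for k in (3, 4, 6, 10):
    rate = approx_experimentwise_rate(0.05, k)
    print(f"k = {k:2d}: approximate EER = {rate:.3f}")
```

Under this approximation, six groups tested at a 5% comparisonwise level give an experimentwise rate above 50%.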
The Trade-Off: Power vs. Protection
Tests that control the experimentwise error rate are conservative: they
reject the null hypothesis of equal means less easily, resulting in lower
statistical power. Conversely, tests that control only the comparisonwise error
rate are liberal: they find significance more easily and therefore have higher
power.
A Spectrum of Tests: From Liberal to Conservative
According to the classic classification by Winer (1962), multiple comparison
tests can be ordered from most liberal to most conservative as follows:
1. Duncan's Multiple Range Test (MRT)
2. Student-Newman-Keuls Test (SNK)
3. Fisher's Least Significant Difference (LSD)
4. Tukey's Honestly Significant Difference (HSD)
5. Scheffé's Test
This means that applying Duncan's test
will likely yield more statistically significant differences between means than
using Scheffé's test.
Illustrative Example with Fictional Data
Table 1 shows the mean and standard deviation of the blood pressure reduction
in each treatment group, and Table 2 shows the analysis of variance (ANOVA).
Table 1: Blood Pressure Reduction (mmHg) by Treatment Group
| Treatment | N | Mean (mmHg) | Standard deviation |
| A | 5 | 21 | 5.10 |
| B | 5 | 8 | 7.07 |
| C | 5 | 10 | 5.83 |
| D | 5 | 29 | 5.10 |
| E | 5 | 13 | 7.07 |
| Control | 5 | 2 | 5.48 |
Table 2: Analysis of Variance (ANOVA)
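The one-way ANOVA quantities can be recomputed from the summary statistics in Table 1 alone. The following is a minimal sketch, not the SAS procedure used for this post; the function and dictionary names are illustrative, and because the standard deviations in Table 1 are rounded, the results are approximate.

```python
# Summary statistics copied from Table 1: (n, mean, standard deviation)
table1 = {
    "A": (5, 21, 5.10),
    "B": (5, 8, 7.07),
    "C": (5, 10, 5.83),
    "D": (5, 29, 5.10),
    "E": (5, 13, 7.07),
    "Control": (5, 2, 5.48),
}


def anova_from_summary(groups):
    """One-way ANOVA sums of squares from per-group n, mean and SD."""
    k = len(groups)
    n_total = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total

    # Between-groups: spread of group means around the grand mean
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
    # Within-groups: pooled variability inside each group
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within  # mean square error (MSE)
    return {
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


result = anova_from_summary(table1)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The mean square error comes out at about 36 on 24 degrees of freedom, and these two numbers are what the critical values of the tests below depend on.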
· Duncan's MRT and SNK: Both tests use a different critical range for each comparison, depending on the number of ordered means (p) spanned by the two means being compared. Comparing the critical ranges shows that Duncan's test is more liberal, declaring significance more easily: for p > 2 its critical ranges are smaller than SNK's.
Table 3: Critical Ranges for Duncan's and Student-Newman-Keuls (SNK) Tests, by Number of Means in the Range (p)
| Test | p = 2 | p = 3 | p = 4 | p = 5 | p = 6 |
| Duncan's | 7.83 | 8.23 | 8.48 | 8.66 | 8.79 |
| SNK | 7.83 | 9.75 | 10.47 | 11.18 | 11.73 |
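Both sets of critical ranges come from the studentized range distribution; the tests differ only in the significance level applied at each p. SNK uses α for every p, while Duncan uses the looser level 1 - (1 - α)^(p-1). The sketch below assumes SciPy 1.7 or later (for `scipy.stats.studentized_range`) and an MSE of 36 on 24 error degrees of freedom, values obtained by pooling the standard deviations in Table 1 rather than taken from the SAS output. Computed values should be close to Table 3 but need not match it digit for digit.

```python
from math import sqrt

from scipy.stats import studentized_range

# Assumptions: MSE and error df pooled from Table 1, equal group sizes
MSE, DF_ERROR, N_PER_GROUP, ALPHA = 36.0, 24, 5, 0.05
SE_MEAN = sqrt(MSE / N_PER_GROUP)  # standard error of a single group mean


def snk_range(p):
    """SNK critical range: same alpha regardless of p."""
    return studentized_range.ppf(1 - ALPHA, p, DF_ERROR) * SE_MEAN


def duncan_range(p):
    """Duncan critical range: alpha relaxed to 1 - (1 - alpha)^(p - 1)."""
    alpha_p = 1 - (1 - ALPHA) ** (p - 1)
    return studentized_range.ppf(1 - alpha_p, p, DF_ERROR) * SE_MEAN


for p in range(2, 7):
    print(f"p = {p}: Duncan = {duncan_range(p):.2f}, SNK = {snk_range(p):.2f}")
```

For p = 2 the two levels coincide, which is why both tests share the same first critical range.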
· Fisher's LSD, Tukey's HSD, and Scheffé's Test: A comparison of the critical differences clearly shows the spectrum: Fisher's LSD is the most liberal (smallest critical difference), followed by Tukey's HSD, with Scheffé's test being the most conservative (largest critical difference).
Table 4: Critical Differences for Pairwise Comparison Tests
| Test | Critical difference |
| Fisher's LSD | 7.83 |
| Tukey's HSD | 11.73 |
| Scheffé's | 13.74 |
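The three critical differences in Table 4 follow from the usual textbook formulas for equal group sizes. The sketch below is an illustration under stated assumptions, not the SAS computation: MSE = 36 is the pooled variance implied by Table 1, and the three quantiles are standard tabulated values for α = 0.05 with 24 error degrees of freedom, hard-coded here so the snippet needs no statistics library.

```python
from math import sqrt

# Assumptions: pooled MSE from Table 1, n = 5 per group, k = 6 groups
MSE, N, K = 36.0, 5, 6

# Tabulated 5% critical values for 24 error degrees of freedom
T_CRIT = 2.0639  # two-sided Student t
Q_CRIT = 4.373   # studentized range for 6 means
F_CRIT = 2.6207  # F with (5, 24) degrees of freedom

se_diff = sqrt(2 * MSE / N)  # standard error of a difference of two means

lsd = T_CRIT * se_diff                      # Fisher's LSD
hsd = Q_CRIT * sqrt(MSE / N)                # Tukey's HSD
scheffe = sqrt((K - 1) * F_CRIT) * se_diff  # Scheffé, for a pairwise contrast

print(f"Fisher's LSD: {lsd:.2f}")
print(f"Tukey's HSD:  {hsd:.2f}")
print(f"Scheffé's:    {scheffe:.2f}")
```

All three share the same error term; only the multiplier changes, and it grows as the family of comparisons being protected widens.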
Practical Recommendations (Based on SAS/STAT 9.2 Manual)
1. Use the unprotected LSD test if you are interested in several individual
comparisons and are not concerned with multiple inferences.
2. For all pairwise comparisons, use Tukey's test.
3. For comparisons with a control group, use Dunnett's test.
Choosing the Right Test: A Decision Framework
Imagine an experiment with more than two
groups analyzed by a one-way ANOVA at a 5% significance level. For unplanned
comparisons, the researcher has several options:
· To control the experimentwise error rate at 5%, use Tukey's HSD (for all pairs) or Dunnett's test (vs. a control). The trade-off is a comparisonwise error rate below 5%, and therefore lower power.
· For higher power, use Fisher's LSD (unprotected), Duncan's MRT, or SNK. These maintain a roughly 5% comparisonwise error rate, but the experimentwise error rate will be much higher (depending on the number of treatments).
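The practical effect of this choice can be seen by applying the single critical differences from Table 4 to the group means in Table 1 and counting how many of the 15 pairs each test declares different. This is a minimal sketch with illustrative names; Duncan's MRT and SNK are left out because their critical range changes with the number of means spanned.

```python
from itertools import combinations

# Group means from Table 1 and critical differences from Table 4
means = {"A": 21, "B": 8, "C": 10, "D": 29, "E": 13, "Control": 2}
critical = {"Fisher's LSD": 7.83, "Tukey's HSD": 11.73, "Scheffé's": 13.74}


def significant_pairs(group_means, threshold):
    """Pairs whose absolute mean difference exceeds the critical difference."""
    return [
        (a, b)
        for a, b in combinations(group_means, 2)
        if abs(group_means[a] - group_means[b]) > threshold
    ]


counts = {}
for test, threshold in critical.items():
    pairs = significant_pairs(means, threshold)
    counts[test] = len(pairs)
    print(f"{test}: {len(pairs)} of 15 pairs significant")
```

On these data the liberal LSD flags 11 pairs, Tukey's HSD 6, and Scheffé's test 5.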
Context Matters
· Choose a Conservative Test (Tukey, Dunnett, planned LSD) when you need high confidence to reject H₀. This is crucial in fields like pharmacology, where recommending a new drug with unknown side effects requires strong evidence of its superiority over the standard treatment.
· Choose a Liberal Test (unprotected LSD, Duncan) when you need high discriminatory power. This is common in product testing or agronomy, where the primary goal is to detect any potential difference, and a false positive is less consequential than missing a real difference. Alternatively, using a conservative test like Tukey at a 10% significance level also increases power.
Final Considerations
· Scheffé's Test has excellent mathematical properties but is often considered excessively conservative for simple pairwise comparisons.
· Bonferroni Correction is best suited for a small, pre-defined number of comparisons, as it becomes overly conservative with a large number of tests.
· No Single Best Test: All procedures have advantages and disadvantages. While not exact, using a formal method for comparing means prevents conclusions from being entirely subjective. The researcher always has a margin of choice in both the selection of the test and the establishment of the significance level.
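The remark about the Bonferroni correction is easy to see numerically: each test is run at α divided by the number of comparisons, so the per-comparison level shrinks quickly as the family grows. A minimal sketch, with an illustrative function name:

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / n_comparisons


# 3 planned comparisons versus all pairs among 6 and 10 groups
for m in (3, 15, 45):
    level = bonferroni_alpha(0.05, m)
    print(f"{m:2d} comparisons: test each at alpha = {level:.4f}")
```

With a handful of planned comparisons the penalty is modest, but for all 15 pairs among six groups each test must reach roughly the 0.3% level.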
A Note on Software
The calculations for this guide were performed using SAS software. Results from
other software packages or hand calculations may show slight differences due to
rounding. Differences are typically more pronounced for the SNK test, as its
critical values are less standardized across different sources.