Sonia Vieira: Comparing Classical ANOVA with Welch's ANOVA

INTRODUCTION

Before applying a statistical test to a dataset, it is important to verify whether the assumptions required by the test are met. Since many researchers aim to compare means, Analysis of Variance (ANOVA) is a natural solution. However, they do not always state whether the assumptions for applying the F-test were satisfied.

Classical one-way ANOVA assumes that the groups being compared have the same variance. If the sample is small and group sizes differ, unequal variances can lead to incorrect conclusions.

A common suggestion is to transform the variable (usually by logarithm or square root), which helps stabilize and equalize variances. In addition, ANOVA is reasonably robust when its assumptions are not perfectly met, as long as the design is balanced and the sample size is large. Alternatively, a non-parametric test can be used, since such tests are less sensitive to unequal variances. However, these tests do not compare means and are not suitable when researchers wish to interpret average values.

A less commonly used but effective solution is Welch’s ANOVA, or W test, available in most statistical software. It is a modification of classical ANOVA that is more robust when the assumption of equal variances is violated, especially when the groups have different sizes.

1. The F-test in Classical ANOVA

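Let us recall the formulas used to compute the F statistic in a one-way ANOVA with groups of different sizes. Let y_ij be the observed value for the j-th unit in group i, r_i the number of units in group i, k the number of groups, N = Σ_i r_i the total number of observations, ȳ_i the mean of group i, and ȳ the mean of all data.

Sum of squares between groups (SSB):

    SSB = Σ_i r_i (ȳ_i − ȳ)²

Sum of squares within groups (SSW):

    SSW = Σ_i Σ_j (y_ij − ȳ_i)²

Note that larger groups (larger r_i) contribute more to the sums of squares.

Total sum of squares:

    SST = Σ_i Σ_j (y_ij − ȳ)² = SSB + SSW

The F statistic is calculated as:

    F = [SSB / (k − 1)] / [SSW / (N − k)]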

EXAMPLE

For illustration, suppose we have three groups, A, B, and C, with the following values:

    A: 10, 12, 14
    B: 8, 10
    C: 5, 6, 7, 8

Mean of all data:

ȳ = (10 + 12 + 14 + 8 + 10 + 5 + 6 + 7 + 8) / 9 = 80 / 9 ≈ 8.89

Group means:

    ȳ_A = (10 + 12 + 14) / 3 = 12
    ȳ_B = (8 + 10) / 2 = 9
    ȳ_C = (5 + 6 + 7 + 8) / 4 = 6.5

Between-group SS:

SSB = 3 × (12 − 8.89)² + 2 × (9 − 8.89)² + 4 × (6.5 − 8.89)² = 51.89

Within-group SS:

    A: (10 − 12)² + (12 − 12)² + (14 − 12)² = 8
    B: (8 − 9)² + (10 − 9)² = 2
    C: (5 − 6.5)² + (6 − 6.5)² + (7 − 6.5)² + (8 − 6.5)² = 5

SSW = 8 + 2 + 5 = 15

Total SS:

SST = 51.89 + 15 = 66.89

F statistic, with 3 − 1 = 2 and 9 − 3 = 6 degrees of freedom:

F = (51.89 / 2) / (15 / 6) = 25.94 / 2.5 = 10.38
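The hand calculation above can be checked with a few lines of code. What follows is only a minimal sketch using the Python standard library; the function name classical_anova and the dictionary layout are illustrative choices, not part of the original example.

```python
from statistics import fmean

# Data from the example above: three groups of different sizes.
groups = {"A": [10, 12, 14], "B": [8, 10], "C": [5, 6, 7, 8]}


def classical_anova(groups):
    """Return (SSB, SSW, F) for a one-way ANOVA with unequal group sizes."""
    all_values = [y for values in groups.values() for y in values]
    grand_mean = fmean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total number of observations

    # Between groups: each squared deviation is weighted by the group size.
    ssb = sum(len(v) * (fmean(v) - grand_mean) ** 2 for v in groups.values())
    # Within groups: squared deviations from each group's own mean.
    ssw = sum((y - fmean(v)) ** 2 for v in groups.values() for y in v)

    f_stat = (ssb / (k - 1)) / (ssw / (n_total - k))
    return ssb, ssw, f_stat


ssb, ssw, f_stat = classical_anova(groups)
print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, F = {f_stat:.2f}")
# prints: SSB = 51.89, SSW = 15.00, F = 10.38
```

Because the code does not round the overall mean to 8.89 before squaring, it also confirms that the rounding used in the hand calculation does not change the result at two decimal places.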

2. Welch’s ANOVA in a Completely Randomized Design (W Test)

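When the researchers’ aim is only to deal with unequal group sizes, classical ANOVA already accounts for this, because each group enters the sums of squares weighted by its size. Welch’s ANOVA, or W test, is an adaptation of classical ANOVA that handles both heteroscedasticity (unequal variances across groups) and unequal sample sizes. To do this, it weights each group by a factor that depends on both its size and its variance. The W statistic, also called Welch’s F, is:

    F_WELCH = [Σ_j w_j (ȳ_j − ȳ_w)² / (k − 1)] / [1 + (2(k − 2) / (k² − 1)) Σ_j (1 − w_j / W)² / (n_j − 1)]

Where:

    k is the number of groups
    n_j, ȳ_j and s_j² are the size, mean, and sample variance of group j
    w_j = n_j / s_j² is the weighting factor of group j
    W = Σ_j w_j is the sum of the weights
    ȳ_w = Σ_j w_j ȳ_j / W is the global weighted mean

The degrees of freedom are k − 1 for the numerator and, for the denominator:

    df₂ = (k² − 1) / [3 Σ_j (1 − w_j / W)² / (n_j − 1)]

The weighting factor w_j = n_j / s_j² gives more weight to larger groups (large n_j) and less weight to groups with high variance (large s_j²).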
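To make the weighting concrete, here is a minimal sketch of the W test in Python, using only the standard library. It follows the usual sequence of steps: weighting factors, global weighted mean, numerator, denominator, final value, and degrees of freedom. The function name welch_anova is a hypothetical helper. The data are groups A, B, and C from Section 1, reused purely for illustration; groups this small are not a realistic use case.

```python
from statistics import fmean, variance


def welch_anova(groups):
    """Return (F_WELCH, df1, df2) for a list of groups (lists of numbers).

    Assumes every group has at least two values and a non-zero variance;
    otherwise the weights n_j / s_j^2 are undefined.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [fmean(g) for g in groups]
    variances = [variance(g) for g in groups]  # sample variance (n_j - 1)

    # 1. Weighting factors: w_j = n_j / s_j^2
    weights = [n / s2 for n, s2 in zip(sizes, variances)]
    total_weight = sum(weights)

    # 2. Global weighted mean
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_weight

    # 3. Numerator: weighted between-group variation divided by k - 1
    numerator = sum(
        w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)

    # 4. Denominator, built from the correction term lam
    lam = sum(
        (1 - w / total_weight) ** 2 / (n - 1) for w, n in zip(weights, sizes)
    )
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

    # 5. Final value of the W test
    f_welch = numerator / denominator

    # 6. Degrees of freedom (the second one is usually not an integer)
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_welch, df1, df2


# Groups A, B and C from Section 1 (illustration only).
f_welch, df1, df2 = welch_anova([[10, 12, 14], [8, 10], [5, 6, 7, 8]])
print(f"F_WELCH = {f_welch:.2f}, df1 = {df1}, df2 = {df2:.2f}")
```

In this run, group C has the smallest variance and the largest size, so it receives the largest weight and pulls the weighted mean toward its own mean.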

EXAMPLE

The following example is taken from Charles Zaiontz’s website: https://real-statistics.com/one-way-analysis-of-variance-anova/welchs-procedure/

Let’s apply Welch’s ANOVA, breaking the calculation into steps. First, we’ll compute F_WELCH. Then, we will compute the degrees of freedom associated with this F value.

1. Weighting Factors

2. Global Weighted Mean

Calculations

3. Numerator of the W Test

Calculations

4. Denominator of the W Test (in parts)

5. Final Value of the W Test

6. Degrees of Freedom

7. Example conclusion

For this dataset, Welch’s ANOVA resulted in F = 4.32, with 2 and 11.7 degrees of freedom, which is statistically significant at the 5% level. Classical ANOVA, in contrast, yielded F = 2.11, with 2 and 24 degrees of freedom, which is not significant at the 5% level.

Review the dataset. Note the large variance differences among the groups. When variances are so heterogeneous, the result provided by Welch’s ANOVA is more reliable.

Important

Most statistical software packages provide both results — the classical ANOVA and Welch’s version.

Zaiontz, C. Welch’s ANOVA Test. Real Statistics Using Excel. https://real-statistics.com

Minitab. Methods and formulas for multiple comparisons in One-Way ANOVA. https://support.minitab.com/pt-br/minitab/help-and-how-to/statistical-modeling/anova/how-to/one-way-anova/methods-and-formulas/multiple-comparisons

Delacre, M.; Leys, C.; Mora, Y. L.; Lakens, D. Taking Parametric Assumptions Seriously: Arguments for the Use of Welch’s F-test instead of the Classical F-test in One-Way ANOVA. International Review of Social Psychology.

Sonia Vieira

Friday, July 11, 2025
