Friday, July 11, 2025

Comparing Classical ANOVA with Welch's ANOVA

                    

INTRODUCTION


Before applying a statistical test to a dataset, it is important to verify whether the assumptions required by the test are met. Since many researchers aim to compare means, Analysis of Variance (ANOVA) is a natural solution. However, they do not always state whether the assumptions for applying the F-test were satisfied.

Classical one-way ANOVA assumes that the groups being compared have the same variance. If the sample is small and group sizes differ, unequal variances can lead to incorrect conclusions.

A common suggestion is to transform the variable (usually with a logarithm or square root), which helps stabilize and equalize the variances. That said, ANOVA remains reasonably robust when its assumptions are not perfectly met, as long as the design is balanced and the sample size is large. Alternatively, a non-parametric test can be used, since such tests do not rely on the same variance assumption. However, these tests do not compare means and are not suitable when researchers wish to interpret average values.
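The variance-stabilizing effect of a logarithmic transformation can be illustrated with a small sketch. The two groups below are hypothetical (they are not data from this post) and were chosen so that the spread grows in proportion to the mean, which is the situation where a log transformation tends to help most.

```python
import math
import statistics

# Hypothetical data (not from the article): the second group is the first
# multiplied by 10, so its spread grows with its mean.
low = [1.0, 2.0, 3.0]
high = [10.0, 20.0, 30.0]


def variance_ratio(a, b):
    """Ratio of the larger sample variance to the smaller one."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)


raw_ratio = variance_ratio(low, high)
log_ratio = variance_ratio([math.log(x) for x in low],
                           [math.log(x) for x in high])

print(f"variance ratio, raw data:        {raw_ratio:.2f}")
print(f"variance ratio, log-transformed: {log_ratio:.2f}")
```

On the raw scale one variance is 100 times the other; after taking logarithms the two variances are essentially equal. Real data will rarely behave this neatly, so the variances should be checked again after any transformation.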

A less commonly used but effective solution is Welch’s ANOVA, or W test, available in most statistical software. It is a modification of classical ANOVA that is more robust when the assumption of equal variances is violated, especially when the groups have different sizes.

1. The F-test in Classical ANOVA


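Let us recall the formulas used to compute the F statistic in a one-way ANOVA with groups of different sizes. Let y_ij be the observed value for the j-th unit in group i, with k groups, r_i units in group i, and N units in total. Let ȳ_i be the mean of group i and ȳ the mean of all data.

Sum of squares between groups (SSB):

$$SSB = \sum_{i=1}^{k} r_i \, (\bar{y}_i - \bar{y})^2$$

Sum of squares within groups (SSW):

$$SSW = \sum_{i=1}^{k} \sum_{j=1}^{r_i} (y_{ij} - \bar{y}_i)^2$$

Note that larger groups (larger r_i) contribute more to the sums of squares.

Total sum of squares:

$$SST = SSB + SSW$$

The F statistic is calculated as:

$$F = \frac{SSB / (k - 1)}{SSW / (N - k)}$$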

EXAMPLE


For illustration, suppose we have three groups, A, B, and C, with the following values:

    A: 10, 12, 14
    B: 8, 10
    C: 5, 6, 7, 8

Mean of all data:

    ȳ = (10 + 12 + 14 + 8 + 10 + 5 + 6 + 7 + 8) / 9 = 80 / 9 ≈ 8.89

Group means:

    ȳA = (10 + 12 + 14) / 3 = 12
    ȳB = (8 + 10) / 2 = 9
    ȳC = (5 + 6 + 7 + 8) / 4 = 6.5

Between-group SS:

    SSB = 3 × (12 − 8.89)² + 2 × (9 − 8.89)² + 4 × (6.5 − 8.89)² ≈ 51.89

Within-group SS:

    A: (10 − 12)² + (12 − 12)² + (14 − 12)² = 8
    B: (8 − 9)² + (10 − 9)² = 2
    C: (5 − 6.5)² + (6 − 6.5)² + (7 − 6.5)² + (8 − 6.5)² = 5

    SSW = 8 + 2 + 5 = 15

Total SS:

    SST = 51.89 + 15 = 66.89

F statistic:

    F = (51.89 / 2) / (15 / 6) ≈ 25.94 / 2.5 ≈ 10.38
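The hand calculation above can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the function name `classical_anova_f` is a hypothetical helper written for this illustration, not part of any statistical package.

```python
from statistics import fmean

# Data from the example above.
groups = {
    "A": [10, 12, 14],
    "B": [8, 10],
    "C": [5, 6, 7, 8],
}


def classical_anova_f(samples):
    """Return (SSB, SSW, F) for a one-way ANOVA on a list of samples."""
    all_values = [y for sample in samples for y in sample]
    grand_mean = fmean(all_values)
    k = len(samples)            # number of groups
    n_total = len(all_values)   # total number of observations

    # Between-group SS: each group mean is weighted by its group size.
    ssb = sum(len(s) * (fmean(s) - grand_mean) ** 2 for s in samples)
    # Within-group SS: squared deviations from each group's own mean.
    ssw = sum((y - fmean(s)) ** 2 for s in samples for y in s)

    f_stat = (ssb / (k - 1)) / (ssw / (n_total - k))
    return ssb, ssw, f_stat


ssb, ssw, f_stat = classical_anova_f(list(groups.values()))
print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, SST = {ssb + ssw:.2f}, F = {f_stat:.2f}")
```

The code keeps the overall mean unrounded, so small differences in the last decimal place relative to a hand calculation with rounded intermediate values are to be expected.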

2. Welch’s ANOVA in a Completely Randomized Design (W Test)

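When the researchers’ aim is only to deal with unequal group sizes (without explicit weighting), classical ANOVA already accounts for this naturally. Welch’s ANOVA, or W test, is an adaptation of classical ANOVA that aims to handle both heteroscedasticity (unequal variances across groups) and unequal sample sizes. To do this, it uses weighting factors in the calculation of the sums of squares. Let’s walk through the calculation of the W test, or Welch’s F test. For k groups, indexed by j, the statistic is:

$$F_{WELCH} = \frac{\dfrac{1}{k-1} \displaystyle\sum_{j=1}^{k} w_j \, (\bar{y}_j - \bar{y}_w)^2}{1 + \dfrac{2(k-2)}{k^2-1} \displaystyle\sum_{j=1}^{k} \dfrac{1}{n_j - 1} \left(1 - \dfrac{w_j}{W}\right)^2}$$

Where n_j, ȳ_j and s_j² are the size, mean and sample variance of group j, and:

$$w_j = \frac{n_j}{s_j^2}, \qquad W = \sum_{j=1}^{k} w_j, \qquad \bar{y}_w = \frac{\sum_{j=1}^{k} w_j \, \bar{y}_j}{W}$$

The weighting factor w_j = n_j / s_j² gives more weight to larger groups (n_j) and less weight to groups with high variance (s_j²).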
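To make the role of the weights concrete, here is a minimal sketch of Welch's F in plain Python. It assumes the standard form of Welch's statistic and of its approximate denominator degrees of freedom; the function name `welch_anova` is a hypothetical helper for this illustration. Because it divides by the group variances, every group needs at least two observations and a non-zero variance. The data are groups A, B and C from the classical ANOVA example in section 1.

```python
from statistics import fmean, variance


def welch_anova(samples):
    """Return (F_welch, df1, df2) for a list of samples (one list per group)."""
    k = len(samples)
    sizes = [len(s) for s in samples]
    means = [fmean(s) for s in samples]
    variances = [variance(s) for s in samples]  # sample variance (n - 1)

    # Weighting factors w_j = n_j / s_j^2 and their total W.
    weights = [n / v for n, v in zip(sizes, variances)]
    total_weight = sum(weights)

    # Weighted global mean.
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_weight

    # Numerator: weighted between-group variation divided by k - 1.
    numerator = sum(w * (m - weighted_mean) ** 2
                    for w, m in zip(weights, means)) / (k - 1)

    # Term shared by the denominator and the second degrees of freedom.
    lam = sum((1 - w / total_weight) ** 2 / (n - 1)
              for w, n in zip(weights, sizes))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return numerator / denominator, df1, df2


# Groups A, B and C from the classical ANOVA example in section 1.
f_welch, df1, df2 = welch_anova([[10, 12, 14], [8, 10], [5, 6, 7, 8]])
print(f"F_Welch = {f_welch:.2f}, df1 = {df1}, df2 = {df2:.2f}")
```

The second degrees of freedom is generally not an integer, and with groups this small it is very low, which is part of the price paid for not pooling the variances.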

EXAMPLE


The following example is taken from Charles Zaiontz’s website: https://real-statistics.com/one-way-analysis-of-variance-anova/welchs-procedure/


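Let’s apply Welch’s ANOVA, breaking the calculation into steps. First, we’ll compute FWELCH. Then, we will compute the degrees of freedom associated with this F value. In the formulas below, k is the number of groups and ȳ_j is the mean of group j.

1. Weighting Factors

$$w_j = \frac{n_j}{s_j^2}$$

2. Global Weighted Mean

$$\bar{y}_w = \frac{\sum_{j} w_j \, \bar{y}_j}{W}, \qquad W = \sum_{j} w_j$$

3. Numerator of the W Test

$$\text{numerator} = \frac{1}{k-1} \sum_{j} w_j \, (\bar{y}_j - \bar{y}_w)^2$$

4. Denominator of the W Test (in parts)

$$\Lambda = \sum_{j} \frac{1}{n_j - 1} \left(1 - \frac{w_j}{W}\right)^2, \qquad \text{denominator} = 1 + \frac{2(k-2)}{k^2-1} \, \Lambda$$

5. Final Value of the W Test

$$F_{WELCH} = \frac{\text{numerator}}{\text{denominator}}$$

6. Degrees of Freedom

$$df_1 = k - 1, \qquad df_2 = \frac{k^2 - 1}{3 \Lambda}$$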

7. Example conclusion

For the data presented above, Welch’s ANOVA gave F = 4.32, with 2 and 11.7 degrees of freedom, which is statistically significant at the 5% level. Classical ANOVA, in contrast, yielded F = 2.11, with 2 and 24 degrees of freedom, which is not significant at the 5% level.

Review the dataset. Note the large variance differences among the groups. When variances are so heterogeneous, the result provided by Welch’s ANOVA is more reliable.

Important


Most statistical software packages provide both results — the classical ANOVA and Welch’s version.

                                  

Zaiontz, C. Welch’s ANOVA test. Real Statistics using Excel. https://real-statistics.com

https://support.minitab.com/pt-br/minitab/help-and-how-to/statistical-modeling/anova/how-to/one-way-anova/methods-and-formulas/multiple-comparisons

Delacre, M.; Leys, C.; Mora, Y. L.; Lakens, D. Taking Parametric Assumptions Seriously: Arguments for the Use of Welch’s F-test instead of the Classical F-test in One-Way ANOVA. International Review of Social Psychology.



