Monday, July 28, 2025

Standard Deviations Don’t Add Up. Why Not?

 

I received a kind email from a PhD student at UNICAMP — someone I don't know personally — pointing out what she considered a 'small calculation mistake' in my book Analysis of Variance (p. 47). According to her, the value of the coefficient of variation (CV) calculated in the example was incorrect.

Naturally, I went straight to check.

The example in the book presents an experiment with two treatments (A and B) and five replicates per treatment. The data are simple and were chosen solely to illustrate the ANOVA calculations. Both the dataset and the ANOVA table were designed for this didactic purpose.

Dataset


ANOVA table


However, the reader, who works in quality control, applied the procedures she was used to: she calculated the means and standard deviations of each group, as is common in process analysis. She obtained the following results:

Means and standard deviations

So far, so good. But as she continued reading, she found this sentence in the book: “One may be interested in relating the standard deviation to the mean, to assess the magnitude of dispersion relative to the magnitude of the mean. By definition, the coefficient of variation (CV) is the ratio of the standard deviation to the mean.”

Later in the same chapter, I also wrote: “In analysis of variance, the standard deviation is the square root of the residual mean square.”

Since the student didn't perform an analysis of variance (a step that is not routine in some fields), she didn't have the error mean square (EMS). Instead, she took the mean of the standard deviations and divided it by the mean of the means to compute the CV. That calculation is incorrect.
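To make that shortcut concrete, here is a minimal Python sketch of the reader's calculation, using made-up group summaries rather than the data from the book:

# Hypothetical summaries for two treatments (not the values from the book).
means = [12.0, 18.0]   # group means of A and B
sds = [2.0, 4.0]       # group standard deviations of A and B

# The reader's shortcut: mean of the SDs divided by the mean of the means.
cv_shortcut = (sum(sds) / len(sds)) / (sum(means) / len(means)) * 100
print(f"CV via shortcut: {cv_shortcut:.2f}%")  # 20.00%, but not the ANOVA-based CV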

The arithmetic mean differs from the quadratic mean. For two positive numbers, a and b, we have:

(a + b)/2 ≤ √((a² + b²)/2).

Equality holds only when a = b. Therefore, the average of two standard deviations is smaller than the square root of the average of their variances, unless those variances are equal.
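A quick numerical check, with two arbitrary standard deviations chosen only for illustration, shows the gap:

import math

s1, s2 = 2.0, 4.0                                # two arbitrary standard deviations
arithmetic_mean = (s1 + s2) / 2                  # average of the SDs: 3.0
quadratic_mean = math.sqrt((s1**2 + s2**2) / 2)  # square root of the average variance: about 3.162
assert arithmetic_mean <= quadratic_mean         # equality only when s1 == s2
print(arithmetic_mean, quadratic_mean)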

In experiments with more than one group, as in the example, each group has its own variance. The correct way to calculate the overall standard deviation — and hence the CV — is by taking the square root of the weighted average of the variances.

In the context of ANOVA, the EMS represents the average of the group variances. The formula for the coefficient of variation in this case is:

CV = √EMS / ȳ  (commonly multiplied by 100 and reported as a percentage),

where ȳ is the overall mean of all the data, and EMS is the residual mean square, calculated as:

EMS = ESS / [k(r − 1)],

where ESS is the error sum of squares, k is the number of groups, and r is the number of replicates per group.
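As a sketch of the whole calculation in Python, again with hypothetical observations (k = 2 treatments, r = 5 replicates each, not the book's numbers), the EMS and the CV come straight from the raw data, and for a balanced design the EMS is exactly the plain average of the group variances:

import statistics as st

# Hypothetical data: k = 2 treatments, r = 5 replicates each (not the book's values).
groups = {
    "A": [10.0, 12.0, 11.0, 13.0, 14.0],
    "B": [18.0, 17.0, 20.0, 16.0, 19.0],
}
k = len(groups)
r = len(next(iter(groups.values())))

# Error sum of squares: squared deviations of each observation from its own group mean.
ess = sum(sum((y - st.mean(obs)) ** 2 for y in obs) for obs in groups.values())

# Residual (error) mean square, on k(r - 1) degrees of freedom.
ems = ess / (k * (r - 1))

# For a balanced design, the EMS equals the plain average of the group variances.
assert abs(ems - st.mean(st.variance(obs) for obs in groups.values())) < 1e-9

# CV from the ANOVA: square root of the EMS divided by the overall mean, times 100.
overall_mean = st.mean(y for obs in groups.values() for y in obs)
cv = 100 * (ems ** 0.5) / overall_mean
print(f"EMS = {ems:.3f}, CV = {cv:.2f}%")  # EMS = 2.500, CV = 10.54%

For unbalanced designs, with unequal numbers of replicates rᵢ, the error degrees of freedom become Σ(rᵢ − 1) and the EMS is a weighted average of the group variances with weights rᵢ − 1, which is the weighted average mentioned above.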

This definition provides a consistent and meaningful estimate of the coefficient of variation.

When I wrote the book, I didn’t realize that the traditional definition of CV — “standard deviation divided by the mean” — could be misleading if the source of the standard deviation isn’t clearly explained.

The formula is only correct when dealing with a single sample or group. In experiments with multiple treatments, each with its own mean and variance, the overall standard deviation must come from the ANOVA — not from combining descriptive statistics across groups.

This episode also taught me to write definitions more carefully.

This was the core of the student’s mistake: standard deviations don’t add — variances do.

