I received a kind email from a PhD student at UNICAMP — someone I don't know personally — pointing out what she considered a 'small calculation mistake' in my book Analysis of Variance (p. 47). According to her, the value of the coefficient of variation (CV) calculated in the example was incorrect.
Naturally, I went straight to check.
The example in the book presents an experiment with two treatments (A and B) and five replicates per treatment. The data are simple and were chosen solely to illustrate the ANOVA calculations. Both the dataset and the ANOVA table were designed for this didactic purpose.
Dataset
However, the reader, who works in quality control, applied the procedures she was used to: she calculated the mean and standard deviation of each group, as is common in process analysis. She obtained the following results:
Means and standard deviations
So far, so good. But as she continued reading, she found this sentence in the book: “One may be interested in relating the standard deviation to the mean, to assess the magnitude of dispersion relative to the magnitude of the mean. By definition, the coefficient of variation (CV) is the ratio of the standard deviation to the mean.”
Later in the same chapter, I also wrote: “In analysis of variance, the standard deviation is the square root of the residual mean square.”
Since the student didn’t perform the analysis of variance (which is not common in some fields), she didn’t have the error mean square (EMS), which is the same quantity as the residual mean square. Instead, she took the mean of the two standard deviations and divided it by the mean of the two group means to compute the CV. That calculation is incorrect.
The arithmetic mean differs from the quadratic mean. For two positive numbers, a and b, we have:

$$\frac{a + b}{2} \le \sqrt{\frac{a^2 + b^2}{2}}$$

Equality holds only when a = b. Therefore, the average of two standard deviations is smaller than the square root of the average of their variances, unless those variances are equal.
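A quick numerical check can make the inequality concrete. The sketch below compares the arithmetic mean of two standard deviations with their quadratic mean (the square root of the average of the squares). The values 2.0 and 4.0 are hypothetical and are not taken from the book's example.

```python
import math


def arithmetic_mean(a, b):
    """Plain average of two numbers."""
    return (a + b) / 2


def quadratic_mean(a, b):
    """Square root of the average of the squares."""
    return math.sqrt((a ** 2 + b ** 2) / 2)


# Hypothetical standard deviations of two groups (not the book's data).
s1, s2 = 2.0, 4.0

am = arithmetic_mean(s1, s2)  # average of the standard deviations
qm = quadratic_mean(s1, s2)   # square root of the average variance

print(f"arithmetic mean of SDs: {am:.4f}")
print(f"quadratic mean of SDs:  {qm:.4f}")
print("AM < QM:", am < qm)

# When both values are equal, the two means coincide.
print("equal case:", arithmetic_mean(3.0, 3.0) == quadratic_mean(3.0, 3.0))
```

With these inputs the arithmetic mean is 3 while the quadratic mean is the square root of 10, about 3.16, so averaging the standard deviations understates the spread.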
In experiments with more than one group, as in the example, each group has its own variance. The correct way to calculate the overall standard deviation, and hence the CV, is to take the square root of the weighted average of the variances, with each variance weighted by its degrees of freedom.
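As an illustration of the weighted average of variances, here is a minimal sketch of a pooled standard deviation. It assumes each group variance is weighted by its degrees of freedom (group size minus one), which is the usual pooling rule. The function name `pooled_sd` and the data are hypothetical, and the groups are given unequal sizes on purpose so the weighting matters.

```python
import math
import statistics


def pooled_sd(groups):
    """Square root of the degrees-of-freedom-weighted average of the
    sample variances of the groups."""
    weighted_sum = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    total_df = sum(len(g) - 1 for g in groups)
    return math.sqrt(weighted_sum / total_df)


# Hypothetical data with unequal group sizes (not the book's example).
group_a = [10.0, 12.0, 14.0]              # sample variance 4.0, 2 df
group_b = [20.0, 23.0, 26.0, 29.0, 32.0]  # sample variance 22.5, 4 df

sd_pooled = pooled_sd([group_a, group_b])
sd_simple_average = (statistics.stdev(group_a) + statistics.stdev(group_b)) / 2

print(f"pooled SD:             {sd_pooled:.4f}")
print(f"simple average of SDs: {sd_simple_average:.4f}")
```

The pooled variance here is (2 × 4.0 + 4 × 22.5) / 6, so the larger group contributes more to the result than it would in an unweighted average.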
In the context of ANOVA, the EMS is the average of the group variances (a simple average when every group has the same number of replicates). The formula for the coefficient of variation in this case is:

$$CV = \frac{\sqrt{EMS}}{\bar{y}}$$

where ȳ is the overall mean of all data, and EMS is the residual mean square, calculated as:

$$EMS = \frac{ESS}{k(r - 1)}$$

where ESS is the error sum of squares, k is the number of groups, and r is the number of replicates per group.
This definition provides a consistent and meaningful estimate of the coefficient of variation.
When I wrote the book, I didn’t realize that the traditional definition of CV (“standard deviation divided by the mean”) could be misleading if the source of the standard deviation isn’t clearly explained.
That definition is unambiguous only when dealing with a single sample or group. In experiments with multiple treatments, each with its own mean and variance, the overall standard deviation must come from the ANOVA, not from averaging descriptive statistics across groups.
And this episode also taught me to write definitions more carefully.
This was the core of the student’s mistake: standard deviations don’t add; variances do.