Monday, July 28, 2025

Standard Deviations Don’t Add Up. Why Not?

 

I received a kind email from a PhD student at UNICAMP — someone I don't know personally — pointing out what she considered a 'small calculation mistake' in my book Analysis of Variance (p. 47). According to her, the value of the coefficient of variation (CV) calculated in the example was incorrect.

Naturally, I went straight to check.

The example in the book presents an experiment with two treatments (A and B) and five replicates per treatment. The data are simple and were chosen solely to illustrate the ANOVA calculations. Both the dataset and the ANOVA table were designed for this didactic purpose.

Dataset


ANOVA table


However, the reader, who works in quality control, applied the procedures she was used to: she calculated the means and standard deviations of each group, as is common in process analysis. She obtained the following results:

Means and standard deviations

So far, so good. But as she continued reading, she found this sentence in the book: “One may be interested in relating the standard deviation to the mean, to assess the magnitude of dispersion relative to the magnitude of the mean. By definition, the coefficient of variation (CV) is the ratio of the standard deviation to the mean.”

Later in the same chapter, I also wrote: “In analysis of variance, the standard deviation is the square root of the residual mean square.”

Since the student didn’t perform the analysis of variance (which is not common in some fields), she didn’t have the error mean square (EMS) value. Instead, she took the mean of the standard deviations and divided it by the mean of the means to compute the CV. That calculation is incorrect.

The arithmetic mean differs from the quadratic mean. For two positive numbers, a and b, we have:

$$\frac{a+b}{2} \;\le\; \sqrt{\frac{a^2+b^2}{2}}$$

Equality holds only when a = b. For instance, with a = 3 and b = 4 the left side is 3.5, while the right side is √12.5 ≈ 3.54. Therefore, the average of two standard deviations is smaller than the square root of the average of their variances, unless those variances are equal.

In experiments with more than one group, as in the example, each group has its own variance. The correct way to calculate the overall standard deviation — and hence the CV — is by taking the square root of the weighted average of the variances.

In the context of ANOVA, the EMS represents the average of the group variances. The formula for the coefficient of variation in this case is:

$$CV = \frac{\sqrt{EMS}}{\bar{y}} \times 100\%$$

where ȳ is the overall mean of all data, and EMS is the residual mean square, calculated as:

$$EMS = \frac{ESS}{k(r-1)}$$

where ESS is the error sum of squares, k is the number of groups, and r is the number of replicates per group.

This definition provides a consistent and meaningful estimate of the coefficient of variation.
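To see the difference numerically, here is a minimal Python sketch using made-up values (the book's actual data are not reproduced here). It contrasts the reader's calculation, which averages the standard deviations, with the pooled calculation based on the EMS:

```python
import numpy as np

# Hypothetical data: two treatments, five replicates each
# (illustrative values only, not the book's dataset)
a = np.array([10.0, 12.0, 11.0, 13.0, 14.0])
b = np.array([20.0, 23.0, 21.0, 22.0, 24.0])

grand_mean = np.concatenate([a, b]).mean()

# Reader's approach: average the standard deviations
mean_sd = (a.std(ddof=1) + b.std(ddof=1)) / 2
cv_wrong = 100 * mean_sd / grand_mean

# ANOVA approach: pool the variances first (with equal group
# sizes, EMS is just the mean of the sample variances)
ems = (a.var(ddof=1) + b.var(ddof=1)) / 2
cv_right = 100 * np.sqrt(ems) / grand_mean

print(f"CV from averaged SDs: {cv_wrong:.2f}%")
print(f"CV from pooled EMS:   {cv_right:.2f}%")
```

Unless the two group variances happen to be equal, the first value is always the smaller one, which is exactly the inequality shown above.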

When I wrote the book, I didn’t realize that the traditional definition of CV — “standard deviation divided by the mean” — could be misleading if the source of the standard deviation isn’t clearly explained.

The formula is only correct when dealing with a single sample or group. In experiments with multiple treatments, each with its own mean and variance, the overall standard deviation must come from the ANOVA — not from combining descriptive statistics across groups.

This was the core of the student's mistake: standard deviations don't add; variances do.

This episode also taught me to write definitions more carefully.



Saturday, July 26, 2025

Skewness in Data Distributions: An Introductory Discussion

 


1. What Is Skewness in a Data Distribution?

Skewness describes the asymmetry of a distribution: how its tails stretch relative to the center.

·      If the left tail is longer or more pronounced than the right, the distribution is said to have negative skewness.

·      If the right tail is longer, the distribution has positive skewness.

·      Otherwise, the distribution is symmetric.


Textbooks often include histograms to illustrate long tails and how extreme values can shift the mean upward or downward.


Figure 1

Histograms showing skewness types (negative, symmetric, positive)



Source: Doane, D. P., & Seward, L. E. (2011). Measuring Skewness: A Forgotten Statistic? Journal of Statistics Education, 19(2). DOI: 10.1080/10691898.2011.11889611

2. Visual Tools for Assessing Skewness


Beyond histograms, several exploratory tools can help evaluate skewness:


      ·      Boxplots highlight dispersion and outliers.

      ·     Dotplots reveal distribution shape and sample size.

      ·     Stem-and-leaf plots preserve individual data points while showing distribution shape.


Figure 2

Boxplots and dotplots illustrating skewness and sample size
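For readers who want to reproduce plots like these, a minimal matplotlib sketch (with a made-up right-skewed sample) might look like this:

```python
import matplotlib.pyplot as plt

# Hypothetical right-skewed sample, for illustration only
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(data, bins=6)         # histogram: long right tail
ax1.set_title("Histogram")
ax2.boxplot(data, vert=False)  # boxplot: outlier flagged beyond the whisker
ax2.set_title("Boxplot")
plt.tight_layout()
plt.show()
```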


3. Mean, Median, and Mode: Where Is the Center?


The traditional textbook rule states:


       ·      If the mean > median, the distribution is right-skewed.

       ·      If the mean < median, it is left-skewed.


However, this rule may fail, especially for discrete or multimodal distributions.


Figure 3

Right skewness: mean greater than median

Source: von Hippel, P. T. (2005). Mean, Median, and Skew: Correcting a Textbook Rule. Journal of Statistics Education, 13(2).


Figure 4 

Right skewness even when mean is less than median


Source: von Hippel (2005), ibid.

4. Measuring Skewness: Pearson’s Coefficients


Since Karl Pearson (1895), statisticians have proposed various ways to quantify skewness.


·      Pearson's First Skewness Coefficient (uses the mode):

$$Sk_1 = \frac{\bar{x} - \text{mode}}{s}$$

·      Pearson's Second Skewness Coefficient (uses the median):

$$Sk_2 = \frac{3\,(\bar{x} - \text{median})}{s}$$

Example

Given a dataset with mean = 70.5, median = 80, mode = 85, and standard deviation = 19.33, calculate both coefficients.
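Working these out with the formulas above:

$$Sk_1 = \frac{70.5 - 85}{19.33} \approx -0.75, \qquad Sk_2 = \frac{3\,(70.5 - 80)}{19.33} \approx -1.47$$

Both coefficients are negative, indicating a left-skewed distribution.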


5. Caution with the Mode


Pearson's first coefficient relies on the mode, but the mode is not always reliable.

Example

Set A: 1, 2, 3, 4, 5, 5. The mode (5) does not reflect the center of the distribution.

Set B: 1, 2, 3, 3, 3, 3, 3, 3, 4. The mode (3) clearly represents the central tendency.

Avoid using the mode for skewness if it is based on few values.

6. Interpreting Skewness Coefficients


The coefficient indicates both direction and degree of asymmetry:


          ·      Positive value → right-skewed.

          ·      Negative value → left-skewed.

          ·      Near zero →  symmetric.


The farther from zero, the stronger the skewness.

7. Statistical Moments and the Fisher–Pearson Coefficient


 Skewness can also be described using moments:


·      Second moment (m₂), the variance:

$$m_2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

·      Third moment (m₃), related to skewness:

$$m_3 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3$$

·      Fisher–Pearson coefficient:

$$g_1 = \frac{m_3}{m_2^{3/2}}$$


Note: these formulas treat the data as a complete population rather than a sample, so modern software rarely uses them directly.
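As an illustration, here is a short Python sketch computing these population moments for the same dataset used in the next section's example:

```python
# Population (biased) moments and Fisher–Pearson coefficient
data = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]
n = len(data)
mean = sum(data) / n

m2 = sum((x - mean) ** 2 for x in data) / n   # second moment (variance)
m3 = sum((x - mean) ** 3 for x in data) / n   # third moment

g1 = m3 / m2 ** 1.5
print(round(g1, 4))  # 0.3032, smaller than the adjusted 0.3595 below
```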

8. The Adjusted Fisher–Pearson Coefficient (Used in Excel)


Modern software (e.g., Excel) uses a bias-adjusted version for samples:

$$G_1 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$$

where s is the sample standard deviation.

Example


Data: 3, 4, 5, 2, 3, 4, 5, 6, 4, 7.

Skewness = 0.359543. (The bias adjustment shrinks as the sample size increases, so the adjusted and unadjusted values converge.)
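The value above can be reproduced with a few lines of Python; this sketch implements the adjusted formula directly:

```python
import math

def adjusted_skewness(xs):
    """Adjusted Fisher–Pearson coefficient, the formula used by Excel's SKEW()."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))  # sample SD
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)

print(adjusted_skewness([3, 4, 5, 2, 3, 4, 5, 6, 4, 7]))  # ≈ 0.359543
```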

9. A Final Note on Symmetry Testing


Calculating skewness does not test for general symmetry. Instead, it assumes the data come from a specific symmetric population (usually the normal distribution).



Friday, July 25, 2025

Probability in Reverse: Bayes' Theorem

 

Before presenting Bayes' theorem, it’s helpful to recall the definition of conditional probability to highlight the difference between this concept and the theorem itself.

Definition

The conditional probability of an event B, given that event A has occurred, is the chance of B happening under the condition that A has already occurred. It is denoted by P(B∣A), read as "the probability of B given A":

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$
       Key points:

   🔸 A and B are dependent events.
   🔸 Event A occurs before event B.

🛑 Example


An urn contains five balls that differ only by color: two red and three blue. Two balls are drawn without replacement, one after the other.


Question: What is the probability that the second ball is red, given that the first was blue?

A tree diagram helps visualize the possible outcomes, with all the conditional probabilities shown on its branches.

Once a blue ball has been removed, four balls remain and two of them are red, so

$$P(\text{2nd red} \mid \text{1st blue}) = \frac{2}{4} = \frac{1}{2}$$

The multiplication rule for dependent events then gives the probability of the whole sequence, blue first and red second:

$$P(\text{1st blue} \cap \text{2nd red}) = \frac{3}{5} \times \frac{2}{4} = \frac{3}{10}$$

Answer: The probability that the second ball is red, given the first was blue, is 1/2, or 50%; the probability of drawing blue and then red is 3/10, or 30%.

Now that we understand conditional probability, let's see how Bayes' Theorem allows us to 'reverse' the condition.

BAYES' THEOREM


⚠️ P(B∣A) and P(A∣B) may look similar, but they represent different ideas. Consider the following examples:

1. Let A = “has technical training”; B = “performs good service”.
     🔸 P(B∣A): probability of performing good service given technical training.
     🔸 P(A∣B): probability of having technical training given that good service was performed.

2. Let A = “was a good student in high school”; B = “passed the college entrance exam”.
    🔸 P(B∣A): probability of passing the exam given that the person was a good student.
    🔸 P(A∣B): probability of having been a good student given that the person passed the exam.

These pairs of probabilities often appear in real-life problems. Now let's find a formula to calculate P(A∣B). From:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

we can write:

$$P(A \cap B) = P(A)\,P(B \mid A)$$

Substituting this into $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, we have

Bayes' Theorem:

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)}$$

🔔 Interpretation


Bayes’ Theorem reverses the order of information:

• Conditional probability deals with P(B∣A): probability of B occurring given A occurred.

• Bayes’ Theorem addresses P(A∣B): probability of A occurring given B occurred — that is, the reverse of conditional probability.


🛑 Example – Applying Bayes’ Theorem


Let’s revisit the urn example, but now with a different question:

Question: What is the probability that the first ball drawn was blue, given that the second was red?

From the tree diagram, we see that the second ball being red can happen in two ways:

• Blue then Red (B–R), with probability 3/5 × 2/4 = 3/10
• Red then Red (R–R), with probability 2/5 × 1/4 = 1/10

The event of interest is: first blue, given second red. We apply Bayes' Theorem:

$$P(\text{1st blue} \mid \text{2nd red}) = \frac{3/10}{3/10 + 1/10} = \frac{3/10}{4/10} = \frac{3}{4}$$

Answer: Using Bayes' Theorem, the probability that the first ball was blue, given the second is red, is 3/4, or 75%.
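As a quick check, here is the same calculation in Python with exact fractions:

```python
from fractions import Fraction as F

p_blue_then_red = F(3, 5) * F(2, 4)   # P(B–R) = 3/10
p_red_then_red  = F(2, 5) * F(1, 4)   # P(R–R) = 1/10

# Bayes: P(first blue | second red)
posterior = p_blue_then_red / (p_blue_then_red + p_red_then_red)
print(posterior)  # 3/4
```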

🛑 Example – Breathalyzer Test


In a city, the breathalyzer test is mandatory.
         • 25% of drivers drink before driving.
         • Of those who drink, 99% test positive.
         • Of those who do not drink, 17% also test positive.


Question: If a driver tests positive, what is the chance they actually consumed alcohol?

Let the events be:
          • B: drinks
          • NB: does not drink
          • + : positive test

Compute P(B ∣ +) using the given data. By the law of total probability and Bayes' Theorem:

$$P(B \mid +) = \frac{P(B)\,P(+ \mid B)}{P(B)\,P(+ \mid B) + P(NB)\,P(+ \mid NB)} = \frac{0.25 \times 0.99}{0.25 \times 0.99 + 0.75 \times 0.17} = \frac{0.2475}{0.3750} = 0.66$$

Answer: If a driver tests positive, the chance they actually consumed alcohol is 66%.
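The same arithmetic in a short Python sketch:

```python
# Breathalyzer example: updating P(drinks) after a positive test
p_drink = 0.25              # P(B)
p_pos_given_drink = 0.99    # P(+ | B)
p_pos_given_sober = 0.17    # P(+ | NB)

# Law of total probability: P(+)
p_pos = p_drink * p_pos_given_drink + (1 - p_drink) * p_pos_given_sober

# Bayes' Theorem: P(B | +)
print(round(p_drink * p_pos_given_drink / p_pos, 2))  # 0.66
```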

🛑 Example – Horse Race

Two horses race: White and Black.

• In 12 previous races, White won 5 times and Black won 7.
• In 3 of White's 5 victories, it was raining.
• In 1 of Black's 7 victories, it was also raining.

Question: It is raining now. What is the probability that White will win?

Compute P(White wins ∣ rain). With prior probabilities 5/12 and 7/12, and rain in 3/5 and 1/7 of each horse's victories:

$$P(\text{White} \mid \text{rain}) = \frac{\frac{5}{12} \times \frac{3}{5}}{\frac{5}{12} \times \frac{3}{5} + \frac{7}{12} \times \frac{1}{7}} = \frac{3/12}{3/12 + 1/12} = \frac{3}{4}$$

Answer: The probability that White wins is 3/4, or 75%.


CONCLUSION

• Bayes' Theorem is powerful because it lets us update probabilities with new evidence. Remember: it 'reverses' the condition!