Monday, July 28, 2025

Standard Deviations Don’t Add Up. Why Not?

 

I received a kind email from a PhD student at UNICAMP — someone I don't know personally — pointing out what she considered a 'small calculation mistake' in my book Analysis of Variance (p. 47). According to her, the value of the coefficient of variation (CV) calculated in the example was incorrect.

Naturally, I went straight to check.

The example in the book presents an experiment with two treatments (A and B) and five replicates per treatment. The data are simple and were chosen solely to illustrate the ANOVA calculations. Both the dataset and the ANOVA table were designed for this didactic purpose.

Dataset


ANOVA table


However, the reader, who works in quality control, applied the procedures she was used to: she calculated the means and standard deviations of each group, as is common in process analysis. She obtained the following results:

Means and standard deviations

So far, so good. But as she continued reading, she found this sentence in the book: “One may be interested in relating the standard deviation to the mean, to assess the magnitude of dispersion relative to the magnitude of the mean. By definition, the coefficient of variation (CV) is the ratio of the standard deviation to the mean.”

Later in the same chapter, I also wrote: “In analysis of variance, the standard deviation is the square root of the residual mean square.”

Since the student didn’t perform the analysis of variance (which is not common in some fields), she didn’t have the error mean square (EMS) value. Instead, she took the mean of the standard deviations and divided it by the mean of the means to compute the CV. That calculation is incorrect.

The arithmetic mean differs from the quadratic mean. For two positive numbers, a and b, we have:

$$\frac{a+b}{2} \;\le\; \sqrt{\frac{a^2+b^2}{2}}$$

Equality holds only when a = b. For instance, with a = 3 and b = 4 the left side is 3.5, while the right side is √12.5 ≈ 3.54. Therefore, the average of two standard deviations is smaller than the square root of the average of their variances, unless those variances are equal.

In experiments with more than one group, as in the example, each group has its own variance. The correct way to calculate the overall standard deviation — and hence the CV — is by taking the square root of the weighted average of the variances.

In the context of ANOVA, the EMS represents the average of the group variances. The formula for the coefficient of variation in this case is:

$$CV = \frac{\sqrt{EMS}}{\bar{y}} \times 100\%$$

where ȳ is the overall mean of all data, and EMS is the residual mean square, calculated as:

$$EMS = \frac{ESS}{k(r-1)}$$

where ESS is the error sum of squares, k is the number of groups, and r is the number of replicates per group.

This definition provides a consistent and meaningful estimate of the coefficient of variation.
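To see the difference numerically, here is a minimal Python sketch using made-up values (the book's actual data are not reproduced here). It contrasts the reader's calculation, which averages the standard deviations, with the pooled calculation based on the EMS:

```python
import numpy as np

# Hypothetical data: two treatments, five replicates each
# (illustrative values only, not the book's dataset)
a = np.array([10.0, 12.0, 11.0, 13.0, 14.0])
b = np.array([20.0, 23.0, 21.0, 22.0, 24.0])

grand_mean = np.concatenate([a, b]).mean()

# Reader's approach: average the standard deviations
mean_sd = (a.std(ddof=1) + b.std(ddof=1)) / 2
cv_wrong = 100 * mean_sd / grand_mean

# ANOVA approach: pool the variances first (with equal group
# sizes, EMS is just the mean of the sample variances)
ems = (a.var(ddof=1) + b.var(ddof=1)) / 2
cv_right = 100 * np.sqrt(ems) / grand_mean

print(f"CV from averaged SDs: {cv_wrong:.2f}%")
print(f"CV from pooled EMS:   {cv_right:.2f}%")
```

Unless the two group variances happen to be equal, the first value is always the smaller one, which is exactly the inequality shown above.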

When I wrote the book, I didn’t realize that the traditional definition of CV — “standard deviation divided by the mean” — could be misleading if the source of the standard deviation isn’t clearly explained.

The formula is only correct when dealing with a single sample or group. In experiments with multiple treatments, each with its own mean and variance, the overall standard deviation must come from the ANOVA — not from combining descriptive statistics across groups.

This was the core of the student's mistake: standard deviations don't add; variances do.

This episode also taught me to write definitions more carefully.



Saturday, July 26, 2025

Skewness in Data Distributions: An Introductory Discussion

 


1. What Is Skewness in a Data Distribution?

Skewness describes the asymmetry of a distribution: how its tails stretch relative to the center.

·      If the left tail is longer or more pronounced than the right, the distribution is said to have negative skewness.

·      If the right tail is longer, the distribution has positive skewness.

·      Otherwise, the distribution is symmetric.


Textbooks often include histograms to illustrate long tails and how extreme values can shift the mean upward or downward.


Figure 1

Histograms showing skewness types (negative, symmetric, positive)



Source: Doane, D. P., & Seward, L. E. (2011). Measuring Skewness: A Forgotten Statistic? Journal of Statistics Education, 19(2). DOI: 10.1080/10691898.2011.11889611

2. Visual Tools for Assessing Skewness


Beyond histograms, several exploratory tools can help evaluate skewness:


      ·      Boxplots highlight dispersion and outliers.

      ·     Dotplots reveal distribution shape and sample size.

      ·     Stem-and-leaf plots preserve individual data points while showing distribution shape.


Figure 2

Boxplots and dotplots illustrating skewness and sample size
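For readers who want to reproduce plots like these, a minimal matplotlib sketch (with a made-up right-skewed sample) might look like this:

```python
import matplotlib.pyplot as plt

# Hypothetical right-skewed sample, for illustration only
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(data, bins=6)         # histogram: long right tail
ax1.set_title("Histogram")
ax2.boxplot(data, vert=False)  # boxplot: outlier flagged beyond the whisker
ax2.set_title("Boxplot")
plt.tight_layout()
plt.show()
```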


3. Mean, Median, and Mode: Where Is the Center?


The traditional textbook rule states:


       ·      If the mean > median, the distribution is right-skewed.

       ·      If the mean < median, it is left-skewed.


However, this rule may fail, especially for discrete or multimodal distributions.


Figure 3

Right skewness: mean greater than median

Source: von Hippel, P. T. (2005). Mean, Median, and Skew: Correcting a Textbook Rule. Journal of Statistics Education, 13(2).


Figure 4 

Right skewness even when mean is less than median


Source: von Hippel (2005), ibid.

4. Measuring Skewness: Pearson’s Coefficients


Since Karl Pearson (1895), statisticians have proposed various ways to quantify skewness.


·      Pearson's First Skewness Coefficient (uses the mode):

$$Sk_1 = \frac{\bar{x} - \text{mode}}{s}$$

·      Pearson's Second Skewness Coefficient (uses the median):

$$Sk_2 = \frac{3\,(\bar{x} - \text{median})}{s}$$

Example

Given a dataset with mean = 70.5, median = 80, mode = 85, and standard deviation = 19.33, calculate both coefficients.
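Working these out with the formulas above:

$$Sk_1 = \frac{70.5 - 85}{19.33} \approx -0.75, \qquad Sk_2 = \frac{3\,(70.5 - 80)}{19.33} \approx -1.47$$

Both coefficients are negative, indicating a left-skewed distribution.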


5. Caution with the Mode


Pearson's first coefficient relies on the mode, but the mode is not always reliable.

Example

Set A: 1, 2, 3, 4, 5, 5. The mode (5) does not reflect the center of the distribution.

Set B: 1, 2, 3, 3, 3, 3, 3, 3, 4. The mode (3) clearly represents the central tendency.

Avoid using the mode for skewness if it is based on few values.

6. Interpreting Skewness Coefficients


The coefficient indicates both direction and degree of asymmetry:


          ·      Positive value → right-skewed.

          ·      Negative value → left-skewed.

          ·      Near zero →  symmetric.


The farther from zero, the stronger the skewness.

7. Statistical Moments and the Fisher–Pearson Coefficient


 Skewness can also be described using moments:


·      Second moment (m₂), the variance:

$$m_2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

·      Third moment (m₃), related to skewness:

$$m_3 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3$$

·      Fisher–Pearson coefficient:

$$g_1 = \frac{m_3}{m_2^{3/2}}$$


Note: these formulas treat the data as a complete population rather than a sample, so modern software rarely uses them directly.
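As an illustration, here is a short Python sketch computing these population moments for the same dataset used in the next section's example:

```python
# Population (biased) moments and Fisher–Pearson coefficient
data = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]
n = len(data)
mean = sum(data) / n

m2 = sum((x - mean) ** 2 for x in data) / n   # second moment (variance)
m3 = sum((x - mean) ** 3 for x in data) / n   # third moment

g1 = m3 / m2 ** 1.5
print(round(g1, 4))  # 0.3032, smaller than the adjusted 0.3595 below
```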

8. The Adjusted Fisher–Pearson Coefficient (Used in Excel)


Modern software (e.g., Excel) uses a bias-adjusted version for samples:

$$G_1 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$$

where s is the sample standard deviation.

Example


Data: 3, 4, 5, 2, 3, 4, 5, 6, 4, 7.

Skewness = 0.359543. (The bias adjustment shrinks as the sample size increases, so the adjusted and unadjusted values converge.)
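The value above can be reproduced with a few lines of Python; this sketch implements the adjusted formula directly:

```python
import math

def adjusted_skewness(xs):
    """Adjusted Fisher–Pearson coefficient, the formula used by Excel's SKEW()."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))  # sample SD
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)

print(adjusted_skewness([3, 4, 5, 2, 3, 4, 5, 6, 4, 7]))  # ≈ 0.359543
```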

9. A Final Note on Symmetry Testing


Calculating skewness does not test for general symmetry. Instead, it assumes the data come from a specific symmetric population (usually the normal distribution).



Friday, July 25, 2025

Probability in Reverse: Bayes' Theorem

 

Before presenting Bayes' theorem, it’s helpful to recall the definition of conditional probability to highlight the difference between this concept and the theorem itself.

Definition

The conditional probability of an event B, given that event A has occurred, is the chance of B happening under the condition that A has already occurred. It is denoted by P(B∣A), read as "the probability of B given A":

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$
       Key points:

   🔸 A and B are dependent events.
   🔸 Event A occurs before event B.

🛑 Example


An urn contains five balls that differ only by color: two red and three blue. Two balls are drawn without replacement, one after the other.


Question: What is the probability that the second ball is red, given that the first was blue?

A tree diagram helps visualize the possible outcomes, with all the conditional probabilities shown on its branches.

Once a blue ball has been removed, four balls remain and two of them are red, so

$$P(\text{2nd red} \mid \text{1st blue}) = \frac{2}{4} = \frac{1}{2}$$

The multiplication rule for dependent events then gives the probability of the whole sequence, blue first and red second:

$$P(\text{1st blue} \cap \text{2nd red}) = \frac{3}{5} \times \frac{2}{4} = \frac{3}{10}$$

Answer: The probability that the second ball is red, given the first was blue, is 1/2, or 50%; the probability of drawing blue and then red is 3/10, or 30%.

Now that we understand conditional probability, let's see how Bayes' Theorem allows us to 'reverse' the condition.

BAYES' THEOREM


⚠️ P(B∣A) and P(A∣B) may look similar, but they represent different ideas. Consider the following examples:

1. Let A = “has technical training”; B = “performs good service”.
     🔸 P(B∣A): probability of performing good service given technical training.
     🔸 P(A∣B): probability of having technical training given that good service was performed.

2. Let A = “was a good student in high school”; B = “passed the college entrance exam”.
    🔸 P(B∣A): probability of passing the exam given that the person was a good student.
    🔸 P(A∣B): probability of having been a good student given that the person passed the exam.

These pairs of probabilities often appear in real-life problems. Now let's find a formula to calculate P(A∣B). From:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

we can write:

$$P(A \cap B) = P(A)\,P(B \mid A)$$

Substituting this into $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, we have

Bayes' Theorem:

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)}$$

🔔 Interpretation


Bayes’ Theorem reverses the order of information:

• Conditional probability deals with P(B∣A): probability of B occurring given A occurred.

• Bayes’ Theorem addresses P(A∣B): probability of A occurring given B occurred — that is, the reverse of conditional probability.


🛑 Example – Applying Bayes’ Theorem


Let’s revisit the urn example, but now with a different question:

Question: What is the probability that the first ball drawn was blue, given that the second was red?

From the tree diagram, we see that the second ball being red can happen in two ways:

• Blue then Red (B–R), with probability 3/5 × 2/4 = 3/10
• Red then Red (R–R), with probability 2/5 × 1/4 = 1/10

The event of interest is: first blue, given second red. We apply Bayes' Theorem:

$$P(\text{1st blue} \mid \text{2nd red}) = \frac{3/10}{3/10 + 1/10} = \frac{3/10}{4/10} = \frac{3}{4}$$

Answer: Using Bayes' Theorem, the probability that the first ball was blue, given the second is red, is 3/4, or 75%.
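As a quick check, here is the same calculation in Python with exact fractions:

```python
from fractions import Fraction as F

p_blue_then_red = F(3, 5) * F(2, 4)   # P(B–R) = 3/10
p_red_then_red  = F(2, 5) * F(1, 4)   # P(R–R) = 1/10

# Bayes: P(first blue | second red)
posterior = p_blue_then_red / (p_blue_then_red + p_red_then_red)
print(posterior)  # 3/4
```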

🛑 Example – Breathalyzer Test


In a city, the breathalyzer test is mandatory.
         • 25% of drivers drink before driving.
         • Of those who drink, 99% test positive.
         • Of those who do not drink, 17% also test positive.


Question: If a driver tests positive, what is the chance they actually consumed alcohol?

Let the events be:
          • B: drinks
          • NB: does not drink
          • + : positive test

Compute P(B ∣ +) using the given data. By the law of total probability and Bayes' Theorem:

$$P(B \mid +) = \frac{P(B)\,P(+ \mid B)}{P(B)\,P(+ \mid B) + P(NB)\,P(+ \mid NB)} = \frac{0.25 \times 0.99}{0.25 \times 0.99 + 0.75 \times 0.17} = \frac{0.2475}{0.3750} = 0.66$$

Answer: If a driver tests positive, the chance they actually consumed alcohol is 66%.
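The same arithmetic in a short Python sketch:

```python
# Breathalyzer example: updating P(drinks) after a positive test
p_drink = 0.25              # P(B)
p_pos_given_drink = 0.99    # P(+ | B)
p_pos_given_sober = 0.17    # P(+ | NB)

# Law of total probability: P(+)
p_pos = p_drink * p_pos_given_drink + (1 - p_drink) * p_pos_given_sober

# Bayes' Theorem: P(B | +)
print(round(p_drink * p_pos_given_drink / p_pos, 2))  # 0.66
```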

🛑 Example – Horse Race

Two horses race: White and Black.

• In 12 previous races, White won 5 times and Black won 7.
• In 3 of White's 5 victories, it was raining.
• In 1 of Black's 7 victories, it was also raining.

Question: It is raining now. What is the probability that White will win?

Compute P(White wins ∣ rain). With prior probabilities 5/12 and 7/12, and rain in 3/5 and 1/7 of each horse's victories:

$$P(\text{White} \mid \text{rain}) = \frac{\frac{5}{12} \times \frac{3}{5}}{\frac{5}{12} \times \frac{3}{5} + \frac{7}{12} \times \frac{1}{7}} = \frac{3/12}{3/12 + 1/12} = \frac{3}{4}$$

Answer: The probability that White wins is 3/4, or 75%.


CONCLUSION

• Bayes' Theorem is powerful because it lets us update probabilities with new evidence. Remember: it 'reverses' the condition!