Sunday, August 31, 2025

Why the Mean Doesn’t Tell the Whole Story: Standard Deviation Made Simple


You probably know the mean, which is a measure of central tendency in a dataset. But the mean doesn’t tell the whole story.

For example: a person's average spending over the year reveals nothing about splurges on particular days or a shortage of money at the end of some months.

In science and social sciences, it’s important to know how much the data vary around the mean. The less they vary, the better the mean represents the whole dataset.

When the Mean Misleads

Imagine two situations:

·     Ages of children in a preschool class:
3; 4; 3; 5; 5
The mean is 4, which represents the group well.

·     Ages of students in an adult literacy class:
45; 19; 83; 55; 43
The mean is 49, but here it does not represent the group well, because the ages are very spread out.

So, data can be tightly clustered or widely spread around the mean, and we need a measure to capture that.

First Attempt: Mean of the Deviations

One idea is to compute the mean of the deviations from the mean.
But this doesn’t work: positive and negative deviations cancel out, and the result is always zero.

Example:
Data = 14; 14; 6; 6
Mean = 10
Deviations = +4; +4; –4; –4
Sum = 0
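
To see the cancellation concretely, here is a minimal Python sketch of the same example (plain Python, nothing beyond built-ins):

```python
data = [14, 14, 6, 6]
mean = sum(data) / len(data)           # 10.0
deviations = [x - mean for x in data]  # [4.0, 4.0, -4.0, -4.0]
print(sum(deviations))                 # 0.0 -- positives and negatives cancel
```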

Second Attempt: Absolute Deviations

What if we use absolute values of the deviations?

          Example 1:
14; 14; 6; 6 → mean = 10
Sum of absolute deviations = |4|+|4|+|–4|+|–4| = 16
Mean absolute deviation = 4

          Example 2:
17; 11; 4; 8 → mean = 10
Sum of absolute deviations = |7|+|1|+|–6|+|–2| = 16
Mean absolute deviation = 4

But notice: the second set of numbers is more spread out, yet the two values come out the same. The mean absolute deviation cannot tell these two spreads apart, so this method is not enough.
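
The same check in Python, as a small sketch (mean_abs_dev is just an illustrative helper name):

```python
def mean_abs_dev(data):
    m = sum(data) / len(data)
    return sum(abs(x - m) for x in data) / len(data)

print(mean_abs_dev([14, 14, 6, 6]))   # 4.0
print(mean_abs_dev([17, 11, 4, 8]))   # 4.0 -- same value despite more spread
```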

The Solution: Standard Deviation

The next step is to square each deviation from the mean. That way:

1.       Negative values disappear (all become positive).

2.       Larger deviations carry more weight.

3.       The math becomes easier, allowing smooth algebraic manipulation later.

Then, we compute the mean of these squared deviations and take the square root.
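
In symbols, for data x_1, …, x_N with mean x̄, the quantity just described is the population standard deviation:

\[
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}
\]

(When spread is estimated from a sample rather than a whole population, the sum is usually divided by N − 1 instead of N; the examples below divide by N, matching the population form.)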

          Example 1

14; 14; 6; 6 → mean = 10
Squared deviations = 16; 16; 16; 16
Average = 64/4 = 16
Square root = 4

          Example 2

17; 11; 4; 8 → mean = 10
Squared deviations = 49; 1; 36; 4
Average = 90/4 = 22.5
Square root = 4.74

Now it works! The second set shows greater spread, and the number reflects that.
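
Python's standard library implements exactly this population form as statistics.pstdev, so both results can be verified directly:

```python
import statistics

print(statistics.pstdev([14, 14, 6, 6]))   # 4.0
print(statistics.pstdev([17, 11, 4, 8]))   # 4.743... -- reflects the larger spread
```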

What We Learned

·             The standard deviation is never negative.

·             It is larger when data are more spread out.

·             It gives more importance to values far from the mean.

·             It is a well-established measure, widely used in statistics, social sciences, economics, health, and many other fields.

📌 In short: the standard deviation is the ruler that measures how far data stray from the center. It shows when the mean is reliable — and when it can be misleading.

Friday, August 29, 2025

Tukey (HSD), Student-Newman-Keuls (SNK), and Duncan (MRT) Tests


1. Introduction
An analysis of variance (ANOVA) can indicate whether significant differences exist among group means, but it does not identify which specific groups differ from each other. To determine this, we use post hoc tests—also known as multiple comparison procedures.

Among the most well-known are Tukey’s test, the Student-Newman-Keuls (SNK) test, and the Duncan Multiple Range Test (MRT). While all three compare means after a significant ANOVA, they operate on different statistical principles. This article explores what each test offers and the situations where each may be most appropriate.

2. Approach to Statistical Significance

·        Tukey’s HSD (Honest Significant Difference)
A conservative test that strictly controls the family-wise error rate (FWER), which is the probability of making at least one Type I error (falsely rejecting a true null hypothesis); a formula for the FWER appears at the end of this section. In practice, this means Tukey’s test identifies fewer statistically significant differences, but it offers strong protection against false positives.

·        Student-Newman-Keuls (SNK)
A less conservative test than Tukey’s HSD. It employs a sequential stepwise procedure that offers greater statistical power (the ability to detect a true effect) than Tukey, but at the cost of a higher risk of Type I errors across the set of comparisons. It controls the per-comparison error rate but not the overall family-wise rate.

·        Duncan’s MRT (Multiple Range Test)
The most liberal of the three tests. It is designed to maximize power and therefore tends to identify the largest number of significant differences. However, this comes with a significantly increased risk of false positives (Type I errors). For this reason, its use is often discouraged in formal, confirmatory research contexts.

👉 In terms of statistical power:

·        Tukey may fail to detect subtle but true differences (higher risk of Type II errors).

·        SNK offers a middle ground, often revealing patterns that Tukey misses.

·        Duncan is the most powerful but deliberately increases the Type I error rate to achieve this.
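
To make the FWER concrete: if m comparisons are each tested at level α, and the comparisons were independent (pairwise comparisons among the same groups are only approximately so), the family-wise error rate would be

\[
\mathrm{FWER} = 1 - (1 - \alpha)^{m}
\]

For the 6 pairwise comparisons among 4 groups at α = 0.05, this gives 1 − 0.95^6 ≈ 0.26, which is why running uncorrected pairwise tests is risky and why Tukey’s correction exists.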

3. Key Conceptual Differences

·        Tukey: Uses a single critical value from the studentized range distribution for all pairwise comparisons, rigorously controlling the experiment-wise error rate.

·        SNK: Uses a sequential (stepwise) procedure. It ranks the means and applies different critical values based on the number of steps between means in the ordered list. It does not control the overall experiment-wise error rate.

·        Duncan: Also uses a stepwise procedure but varies the significance level (α) with the number of means spanned, applying a more liberal (larger) α to comparisons between means that are farther apart in the ordered list (see the formula below). This approach requires specialized tables, such as those found in Harter (1960).
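
In the usual textbook statement of Duncan’s test (supplied here for reference; the tables above encode it), the significance level for comparing two means that span p positions in the ordered list is

\[
\alpha_p = 1 - (1 - \alpha)^{\,p-1}
\]

so at α = 0.05, adjacent means (p = 2) are tested at 0.05, while means spanning four positions are tested at 1 − 0.95^3 ≈ 0.14. This growing α is precisely what makes the test liberal.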

👉 In summary:

·        Tukey: Conservative; controls the overall family-wise error.

·        SNK: Moderate; fixed α per comparison, no overall error control.

·        Duncan: Liberal; α varies, maximizes the chance of detecting differences.

4. General Procedure
The detailed computational procedures for each test are beyond the scope of this text (and are typically handled by software).
In brief:

·        Both Tukey and SNK are based on the studentized range statistic (defined after this list).

·        Tukey applies a single critical value to all pairwise differences.

·        SNK and Duncan are stepwise tests. They begin by comparing the largest and smallest means in the ordered set and then proceed to compare subsets of means. Duncan differs by adjusting its significance level based on the range of means being compared.
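
For reference, the studentized range statistic for the largest and smallest of a set of group means, each based on n observations, takes the standard textbook form

\[
q = \frac{\bar{y}_{\max} - \bar{y}_{\min}}{\sqrt{MS_W / n}}
\]

where MS_W is the within-group mean square from the ANOVA. Applied to the extreme pair in the example of Section 6 (MS_W = 7.0, computed from the raw data; n = 5), q = (31.0 − 23.0)/√(7.0/5) ≈ 6.76, which each test then compares against its own critical value.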

5. When to Use Each Test?
The choice among these tests is not purely statistical; it also involves consideration of the consequences of error.

·        Tukey’s HSD (along with other conservative tests like Scheffé’s) should be used when the cost of a false positive is high. Examples include pharmaceutical trials, clinical research, or any setting where acting on a false discovery could be dangerous or very costly.

·        SNK is a good intermediate option. It offers more power than Tukey for detecting true effects while being less prone to false positives than Duncan. It is useful in exploratory research where a balance between risk and discovery is desired.

·        Duncan’s MRT is best suited for highly exploratory analyses where the priority is to avoid missing any potential effect (high power is paramount), and a higher number of false positives is acceptable. Examples include preliminary screening of agricultural varieties or product formulations where follow-up confirmation is planned.

⚠️ Warning: Many statisticians discourage the use of Duncan’s test in formal confirmatory research due to its liberal nature and lack of control over the family-wise error rate.

6. Practical Example (Using Provided Data)

Raw Data:

·        Group A: 25, 26, 20, 23, 21 | Mean = 23.0

·        Group B: 31, 25, 28, 27, 24 | Mean = 27.0

·        Group C: 22, 26, 28, 25, 29 | Mean = 26.0

·        Group D: 33, 29, 31, 34, 28 | Mean = 31.0

The ordered means are: D (31.0), B (27.0), C (26.0), A (23.0).

A one-way ANOVA yielded a significant result (F(3,16) = 7.798, p = 0.002), justifying the use of post-hoc tests to compare groups.
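
The ANOVA, and the Tukey grouping that follows, can be reproduced with standard Python tools; this is a minimal sketch assuming SciPy and statsmodels are installed (SNK and Duncan are not implemented in these libraries and are more commonly run elsewhere, e.g., in R):

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

a = [25, 26, 20, 23, 21]  # mean 23.0
b = [31, 25, 28, 27, 24]  # mean 27.0
c = [22, 26, 28, 25, 29]  # mean 26.0
d = [33, 29, 31, 34, 28]  # mean 31.0

# One-way ANOVA: should reproduce F(3, 16) = 7.798, p = 0.002
f_stat, p_value = stats.f_oneway(a, b, c, d)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")

# Tukey HSD takes one flat array of values plus a parallel array of group labels
values = np.array(a + b + c + d)
labels = np.array(["A"] * 5 + ["B"] * 5 + ["C"] * 5 + ["D"] * 5)
print(pairwise_tukeyhsd(values, labels, alpha=0.05))
```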

The results from the three post-hoc tests are summarized below.

Factor    Mean    Tukey Grouping    SNK Grouping    Duncan Grouping
D         31.0    a                 a               a
B         27.0    a b               b               b
C         26.0    b                 b               b c
A         23.0    b                 c               c

Interpretation of Results:

·        Tukey’s HSD: Suggests that Group D is significantly different from Groups A and C, but not from Group B. It finds no significant difference between Groups A, B, and C amongst themselves. This is the most conservative interpretation.

·        SNK Test: Provides a clearer separation, identifying three distinct tiers: 1) Group D (highest), 2) Groups B and C (middle), and 3) Group A (lowest). It detects that D is significantly higher than B, a distinction Tukey did not make.

·        Duncan’s MRT: Agrees with SNK that Group D stands alone at the top and that Group A is significantly lower than Group B, but it places Group C in an overlapping tier (b c), not significantly different from either B or A. That neither of these separations is supported by Tukey illustrates the stepwise tests’ more liberal behavior.

7. Conclusion
This example clearly demonstrates that the choice of post-hoc test can directly influence the conclusions drawn from an experiment. It is therefore crucial to select the multiple comparison procedure a priori—before data is collected—based on the research context and the balance between tolerance for Type I (false positive) and Type II (false negative) errors. The choice should be guided by statistical philosophy and the consequences of error, not by which test yields the most desirable outcome.