Monday, October 27, 2025

A Practical Rule for Residual Degrees of Freedom in ANOVA

 

Imagine you are conducting an experiment in an area with a fertility gradient. The land is on a slope and is therefore more fertile at the bottom than at the top. You want to compare four treatments, which we will call A, B, C, and D, and you decide to arrange them in five blocks. Each block can accommodate four plots. The experimental design could be the one shown in Figure 1.

            Figure 1: Layout of a randomized complete block design

Table 1 presents the Analysis of Variance (ANOVA) for this experiment.

             Table 1: Analysis of Variance (ANOVA)

This design is appropriate because the variation within each block has been minimized (by grouping similar fertility levels together), and the variation between blocks has been maximized. But what can be said about the number of residual degrees of freedom?

The most frequent criticism of experimental work is that the sample size is too small. It is sometimes also argued that the number of residual degrees of freedom should be greater than 10 or 12. But why?

Remember that you want to compare four treatments. Therefore, the degrees of freedom for treatments are necessarily 3. If you increase the sample size, by how much does the residual degrees of freedom increase? Look at Table 2, which shows the increase in residual degrees of freedom as the sample size—more specifically, the number of blocks—increases.

   Table 2: Residual Degrees of Freedom for 4 Treatments and a Varying Number of Blocks
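Table 2 is easy to reproduce. Here is a minimal sketch (in Python, not part of the original post) of the arithmetic behind it, assuming a randomized complete block design, where the residual degrees of freedom equal (t − 1)(b − 1) for t treatments and b blocks:

    # Degrees of freedom in a randomized complete block design (RCBD)
    t = 4                        # treatments A, B, C, D
    for b in range(3, 11):       # number of blocks
        df_total = t * b - 1
        df_treatments = t - 1    # always 3 for four treatments
        df_blocks = b - 1
        df_residual = df_total - df_treatments - df_blocks   # = (t - 1) * (b - 1)
        print(f"blocks = {b}: residual df = {df_residual}")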

Now, observe Table 3 below. It provides some critical values of F for 3 degrees of freedom in the numerator (because you are comparing four treatments) and various degrees of freedom in the denominator (the residual). Notice that the critical F-values stabilize after the denominator has about 12 degrees of freedom. Therefore, increasing the number of blocks beyond this point does not help much in achieving statistical significance.

           Table 3: Critical F-values at the 5% significance level for 3 numerator df and various denominator df.
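The critical values in Table 3 can be reproduced with a short sketch like the one below (assuming SciPy is available); the values quoted later in the text, such as 3.86 for 9 residual df and 3.49 for 12, come out of the same calculation:

    from scipy.stats import f

    # Critical F at the 5% level, 3 numerator df (four treatments),
    # for various residual (denominator) degrees of freedom.
    for df_residual in (6, 9, 12, 15, 20, 30):
        f_crit = f.ppf(0.95, dfn=3, dfd=df_residual)
        print(f"residual df = {df_residual:2d}: critical F = {f_crit:.2f}")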

This becomes clearer by looking at Figure 2. The critical F-value is the threshold that your calculated F must exceed to declare significance. So, your ability to detect differences among the means of the four treatments improves if you organize five blocks instead of four (the critical F decreases from 3.86 to 3.49). However, it improves much less if you use six blocks instead of five (the critical F only decreases from 3.49 to 3.29).

        Figure 2: A graph plotting the data from Table 3, showing the critical F-value rapidly decreasing and then leveling off as the residual df increases.

This is the origin of the practical rule: aim for at least 12 residual degrees of freedom in the ANOVA. But note well: this is for 4 treatments. In agricultural sciences, it is common to compare 4 or even more treatments. Therefore, this rule is quite reasonable.

Summary

Here is a well-established and practical rule of thumb.

 

·        The power of an ANOVA F-test to detect differences between treatments depends on the critical F-value.

·        This critical F-value drops quickly as the residual degrees of freedom (df) increase from a low number but stabilizes after around 10-12 df.

·        Therefore, beyond a certain point (e.g., 12 residual df), adding more replicates (blocks) provides diminishing returns for the cost and effort involved.

Saturday, October 25, 2025

Beyond the Average: 5 Types of Means

       Introduction

When we talk about the "average," we're usually thinking of just one number. But in statistics, there isn't just one way to find the center of your data—there are several. Each type of mean provides a unique perspective, and choosing the right one can reveal a more accurate story hidden within the numbers.

Here are five essential means, from the everyday arithmetic mean to the robust trimmed mean.

 

1.     Arithmetic mean

 

The arithmetic mean of a set of data is the sum of all the values divided by the number of values in the set. For example, a student obtained grades of 7.0, 3.0, 5.5, 6.5, and 8.0 in mathematics. He passed because his average grade is:


                                              ·        Mean = (7.0 + 3.0 + 5.5 + 6.5 + 8.0)/5 = 6.0 

 

The arithmetic mean of a sample is represented by x̄ (read as x-bar). The sample size is indicated by n. So, the formula for calculating the arithmetic mean of a sample is:

 

·        x̄ = (1/n) Σ xᵢ = (x₁ + x₂ + ... + xₙ)/n
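In code, this is a one-line computation; the sketch below (Python, purely illustrative) repeats the student's example:

    grades = [7.0, 3.0, 5.5, 6.5, 8.0]
    mean = sum(grades) / len(grades)   # arithmetic mean
    print(mean)                        # 6.0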

2. Weighted average

The weighted average is the sum of the products of each value (x) and its respective weight (p), divided by the sum of the weights.

To understand how weighted averages are calculated, imagine that a student took three tests in a certain subject in which the material is cumulative, that is:

  • in the first test, questions were asked about the material taught up to the date of that first test;
  • in the second test, questions were asked about the material taught from the beginning of the course up to the date of that second test;
  • in the third test, questions were asked about the material taught from the beginning of the course up to the end of the course.

It is reasonable that the grade on the first test should have less weight (in other words, count for less in the final grade) than the second; it is also reasonable that the grade on the second test should have less weight than the third. Suppose the following weights were proposed: 1, 2, and 3.

Imagine that the student obtained grades of 4, 7, and 6, which had weights of 1, 2, and 3, respectively. The student's weighted average is


                                           ·        x̄ = (1×4 + 2×7 + 3×6)/(1 + 2 + 3) = 36/6 = 6.0

Notice that to obtain the weighted average of the student's grades, each grade was multiplied by its respective weight, the products were added, the weights were added, and the following formula was applied:

·        Formula: x̄ = Σ(xᵢpᵢ)/Σpᵢ
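A minimal sketch of the same calculation (Python, purely illustrative):

    grades = [4, 7, 6]
    weights = [1, 2, 3]
    # sum of grade x weight products divided by the sum of the weights
    weighted_mean = sum(x * p for x, p in zip(grades, weights)) / sum(weights)
    print(weighted_mean)   # 6.0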

3. Geometric mean

 

The geometric mean is given by the nth root of the product of n data points.

The geometric mean is more laborious to calculate by hand and, perhaps because of this, it is rarely used.

Here is an example. Let's calculate the geometric mean of the following data: 2, 3, 5, and 10.

 

                                       ·        G = ⁴√(2×3×5×10) = ⁴√300 = 4.16

To perform this calculation, use a calculator or apply logarithms. Since

·        log G = (1/n)(log x₁ + log x₂ + ... + log xₙ) = (1/n) Σ log xᵢ

therefore

·        G = antilog [(1/n) Σ log xᵢ]

So, given n values of the variable X, the geometric mean is

·        G = (∏ xᵢ)^(1/n) = ⁿ√(x₁ × x₂ × ... × xₙ)
 

The Greek letter ∏ (capital pi) is used as a mathematical symbol to indicate that all the observed values of X must be multiplied together. In mathematics, this letter is read as product.
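With a computer the calculation is immediate. The sketch below (Python, purely illustrative; math.prod requires Python 3.8 or later) computes the example both directly and via logarithms:

    import math

    data = [2, 3, 5, 10]
    n = len(data)

    # Direct definition: nth root of the product of the values
    g_direct = math.prod(data) ** (1 / n)

    # Via logarithms: G is the antilog of the mean of the logs
    g_logs = math.exp(sum(math.log(x) for x in data) / n)

    print(round(g_direct, 2), round(g_logs, 2))   # 4.16 4.16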


4. Harmonic mean


The harmonic mean of n data points is the inverse of the arithmetic mean of the inverses of these values.

As an example, consider two numbers, 2 and 4. To calculate the harmonic mean, indicated here by H, invert the numbers, determine the arithmetic mean of these inverses, and then invert that arithmetic mean to find the harmonic mean:

·        H = 1 / [(1/2 + 1/4)/2] = 1 / (3/8) = 8/3 ≈ 2.67

To calculate the harmonic mean in general, apply the formula:

·        H = 1 / [(1/n) Σ(1/xᵢ)] = n / Σ(1/xᵢ)
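A short sketch of the same calculation (Python, purely illustrative; statistics.harmonic_mean is in the standard library from Python 3.6):

    import statistics

    data = [2, 4]
    n = len(data)

    h_manual = n / sum(1 / x for x in data)        # 8/3 ≈ 2.67
    h_builtin = statistics.harmonic_mean(data)     # same result

    print(round(h_manual, 2), round(h_builtin, 2))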
5. Trimmed mean

 

The trimmed mean is a way to calculate an average by first removing a small percentage of the highest and lowest values. After these extreme values are removed, the average of the remaining values is calculated in the usual way.

Let’s say, as an example, a figure skating competition produces the following scores:

6.0, 8.1, 8.3, 9.1, 9.9.

 

The mean of the five scores would be:

·        Mean = (6.0 + 8.1 + 8.3 + 9.1 + 9.9)/5 = 41.4/5 = 8.28

To trim the mean by a total of 40%, we remove the lowest 20% and the highest 20% of values, eliminating the scores of 6.0 and 9.9.

Next, we calculate the mean of the remaining scores:

·        Trimmed mean = (8.1 + 8.3 + 9.1)/3 = 25.5/3 = 8.5
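The same calculation can be done with a short sketch (Python, purely illustrative; if SciPy is available, scipy.stats.trim_mean with proportiontocut=0.2 gives the same result):

    scores = sorted([6.0, 8.1, 8.3, 9.1, 9.9])

    cut = 0.2                               # trim 20% from each end (40% in total)
    k = int(len(scores) * cut)              # number of values dropped at each end
    trimmed = scores[k:len(scores) - k]     # [8.1, 8.3, 9.1]

    trimmed_mean = sum(trimmed) / len(trimmed)
    print(round(trimmed_mean, 2))           # 8.5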

Conclusion

Although the arithmetic mean is the most familiar, it is not always the most appropriate measure.
Choosing the right type of mean depends on the nature of the data and the purpose of the analysis.
Understanding the differences among arithmetic, weighted, geometric, harmonic, and trimmed means ensures that statistical summaries accurately reflect what the data reveal.


Thursday, October 23, 2025

The Eternal Question in Qualitative Research: How Many Interviews Are Enough?

 

If you've ever designed a qualitative study, you've faced the inevitable and tricky question: "What should my sample size be?" If you ask a methodologist, they'll likely say, "It depends." If you ask an ethics committee, they'll demand a specific number. And if you are working on a master's thesis, you likely have to navigate your advisor's opinions as well.

As a statistics professor, I tend to encourage large samples. But I know that in qualitative research, especially in studies involving personal interviews, the statistician's voice carries less weight. Even so, I have faced many anxious students trying to get my opinion. In an attempt to shed some light on the discussion, I will outline some points that should help those who find themselves at this crucial moment in their research.

This dilemma is as relevant today as it was a decade ago. While the core principle remains—depth over breadth—our understanding of how to justify the "right" number has become more sophisticated. Let's explore the current answers to this perennial question.

1. The Classic Compass: Pragmatic and Experiential Advice

Some timeless advice still holds immense value for early-career researchers, especially for dissertations.

        ·        Uwe Flick reminds us that design must balance what we want to know with the time needed, access to participants, and, crucially, the budget.

        ·        Rola Ajjawi points to the trade-off between depth and breadth. A deep phenomenological study might need only 6-12 participants, while a study seeking broader perspectives might require more.

       ·        Adler & Adler offered famously practical advice for students: interview about twelve people for a master's thesis. This number is manageable and provides a rich learning experience in interviewing and analysis. For a doctoral thesis, they suggested around 30.

This pragmatic guidance is a safe harbor for many. But the prevailing wind in qualitative research has long been blowing toward a different concept: saturation.

2. The Gold Standard? The Evolution of "Saturation"

For years, saturation has been the go-to justification for sample size. It’s the point where you stop collecting data because new interviews are no longer yielding new insights; you're hearing the same themes repeated.

However, this common definition has faced valid criticism:

        ·        Is it just a "feeling" the researcher has?

        ·        What if the next person would have provided a radically different story?

        ·        Are we stopping at "descriptive" repetition, or have we reached a deeper, "theoretical" understanding?

How the Concept Has Matured:

Research by scholars like O'Reilly & Parker pushed the field to refine its thinking. We now often distinguish between:

       ·        Code Saturation (or Descriptive Saturation): When no new codes or themes are emerging from the data. You've "heard it all."

       ·        Meaning Saturation (or Theoretical Saturation): A deeper level where you fully understand the nuances, relationships, and contours of your themes. You are no longer just collecting new codes; you have developed a robust theoretical model that explains the data.

This distinction is crucial. A study might achieve code saturation after 15 interviews, but may need 5 more to truly flesh out the meaning and relationships between those codes, reaching meaning saturation.

3. The Modern Toolkit: New Ways to Justify Your Number

So, how do you navigate this in practice today, especially when you need to propose a sample size to an ethics committee before you start?

Here are two powerful contemporary approaches:

A) The "Information Power" Model:
This model suggests that the more information power your sample holds, the fewer participants you need. Your sample size is adequate when your data is rich enough to answer your research question.
This depends on:

       ·        Aim of the Study (broad vs. narrow)

       ·        Sample Specificity (highly specific, experienced participants vs. a general group)

       ·        Use of Established Theory (are you building new theory or testing an existing one?)

       ·        Quality of Dialogue (the depth of the interviews)

       ·        Analysis Strategy (a detailed, nuanced analysis requires less data)

B) The "Saturation Model":
This approach, gaining traction in health research, involves specifying a "stopping rule" for data collection.
You can pre-define:

     ·        Base size (e.g., an initial 10 interviews).

     ·        Run length (e.g., 3 additional interviews).

     ·        New information threshold (e.g., less than 10% new information).

You would analyze the base size, then conduct the "run" of new interviews. If the new data falls below your threshold for new information, you stop. If not, you do another run. This provides a transparent, empirical justification for your final sample size.
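To make the stopping rule concrete, here is a minimal sketch (Python, purely illustrative; the interview "codes", base size, run length, and threshold are hypothetical values, not taken from any cited study):

    # Hypothetical stopping rule: analyze a base of 10 interviews, then runs of 3;
    # stop when a run contributes less than 10% new information (new codes).
    def reached_saturation(codes_per_interview, base=10, run=3, threshold=0.10):
        seen = set()
        for codes in codes_per_interview[:base]:           # analyze the base interviews
            seen.update(codes)
        i = base
        while i + run <= len(codes_per_interview):
            run_codes = set()
            for codes in codes_per_interview[i:i + run]:   # next run of interviews
                run_codes.update(codes)
            new_info = len(run_codes - seen) / max(len(seen), 1)
            if new_info < threshold:
                return True, i + run                       # stop: little new information
            seen.update(run_codes)
            i += run
        return False, len(codes_per_interview)             # threshold never met: keep going

Here codes_per_interview would be a list containing, for each transcript in the order analyzed, the set of codes identified in it.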

Conclusion: From a Single Number to a Strategic Plan

So, what is the right sample size for qualitative research today?

The answer is no longer a single magic number. It's a strategic, justified decision.

     1.     Start with a Pragmatic Estimate: Use rules of thumb (like Adler & Adler's) or the Information Power model to propose a plausible range (e.g., 15-25 participants) for your ethics proposal.

     2.     Plan for an Iterative Process: Clearly state that this is an initial estimate and that data collection will continue until saturation is achieved. Specify which type of saturation you are seeking (e.g., code saturation).

     3.     Be Transparent in Reporting: When you publish, don't just say "saturation was reached." Describe how you assessed it. Did you use a saturation grid? Track cumulative themes? This transparency is the new standard of rigor.

The goal remains to tell a compelling, valid, and deep story about your data. The methods for getting there have simply become more transparent, defensible, and nuanced. And I hope you develop a sensitivity for the method and start accumulating experience—even if it's hard-won—to confidently throw yourself into new research.

References


     1.     Flick, U. In Baker, S. E., & Edwards, R. (2012). How many qualitative interviews is enough? http://eprints.ncrm.ac.uk/2273/4/how_many_interviews.

     2.     Ajjawi, R. Sample size in qualitative research. Medical Education Research Network. blogs.cmdn.dundee.ac.uk/meded.../tag/sample-size/

     3.     Adler, P. A., & Adler, P. In Baker, S. E., & Edwards, R. (2012). How many qualitative interviews is enough? http://eprints.ncrm.ac.uk/2273/4/how_many_interviews.

     4.     Mason, M. (2010). Sample size and saturation in PhD studies using qualitative interviews. Forum Qualitative Sozialforschung (FQS), 11(3), Art. 8.

     5.     O'Reilly, M., & Parker, N. (2013). 'Unsatisfactory saturation': A critical exploration of the notion of saturated sample sizes in qualitative research. Qualitative Research, 13(2), 190-197.

     6.     Keen, A. Saturation in qualitative research: Distinguishing between descriptive and theoretical saturation. www.rcn.org.uk/

     7.     See also: How many interviews are needed in qualitative research? www.researchgate.net/.../How_many_interviews_are

Acknowledgements

This post was updated with the assistance of DeepSeek's AI research assistant in synthesizing recent developments in qualitative methodology.