Wednesday, September 10, 2025

The Tukey-Kramer test

Summary

 When ANOVA indicates significant differences between groups, the next step is to identify which groups actually differ from each other. If the groups have unequal sizes (unbalanced samples), the Tukey-Kramer test is a reliable and robust choice. In this post, we explain how to apply this test step-by-step, with a real example and interpretation of the results.

Introduction

When a researcher obtains a significant result in ANOVA (Analysis of Variance) for an experiment involving three or more groups, there is a need to perform post-hoc tests to compare the means and identify which ones are statistically different.

Several tests are available for this purpose. This text covers the Tukey-Kramer test, which is recommended specifically for situations where the groups have unequal sizes. In these cases, it is necessary to adjust the procedure by replacing the common group size (n) with the individual sizes (ni and nj) of each pair being compared.

The Tukey-Kramer test

The Tukey-Kramer test assumes homoscedasticity, or homogeneous variances. Therefore, the mean square error (MSE) obtained from the analysis of variance (ANOVA) table is an estimate of the common variance of the variable.

The minimum significant difference (MSD) between the means of two groups of sizes ni and nj, denoted by dij, is calculated using the following formula:

dij = q(k, df, α) × √[ (MSE / 2) × (1/ni + 1/nj) ]

Where:

·  q(k, df, α) is the critical value of the studentized range distribution;

·  k is the number of groups;

·  df is the degrees of freedom of the residual (error) in the ANOVA;

·  MSE is the mean square error;

·  α is the significance level (e.g., 0.05 for 5%).
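To make the formula concrete, here is a minimal Python sketch of it. It assumes SciPy 1.7 or later, which provides scipy.stats.studentized_range for the critical value q; the function name and the commented example numbers are illustrative, not taken from the original article.

```python
# Minimal sketch of the Tukey-Kramer minimum significant difference.
# Requires SciPy >= 1.7 for scipy.stats.studentized_range.
from scipy.stats import studentized_range

def tukey_kramer_msd(mse, df, k, ni, nj, alpha=0.05):
    """Minimum significant difference d_ij for two groups of sizes ni and nj."""
    q_crit = studentized_range.ppf(1 - alpha, k, df)        # critical value q(k, df, alpha)
    return q_crit * ((mse / 2.0) * (1.0 / ni + 1.0 / nj)) ** 0.5

# Illustrative call (the MSE here is a placeholder, not the value from Table 2):
# tukey_kramer_msd(mse=0.05, df=20, k=4, ni=7, nj=5)
```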

Example

Table 1 presents data from an experiment with four groups (four brands of green tea). The means for each group are shown at the bottom of the table. The aim is to compare these means using the Tukey-Kramer test. First, an ANOVA must be performed, as shown in Table 2.

Next, pairwise comparisons of the group means are conducted. The test was applied using the value of q for a significance level of 5%, with k = 4 groups and df = n - k = 24 - 4 = 20 residual degrees of freedom.

Table 1: Folic acid (a B vitamin) content in green tea leaves randomly selected from four brands (1)


Table 2: Analysis of variance for the data in Table 1
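As a sketch of how Table 2 could be reproduced in software, the snippet below runs a one-way ANOVA with scipy.stats.f_oneway on randomly generated stand-in measurements (the folic-acid values themselves are not reproduced in this post), keeping the unequal group sizes 7, 5, 6 and 6.

```python
# Sketch of the preliminary one-way ANOVA, using placeholder data.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
brand1, brand2, brand3, brand4 = (rng.normal(8.0, 1.0, n) for n in (7, 5, 6, 6))

f_stat, p_value = f_oneway(brand1, brand2, brand3, brand4)
print(f_stat, p_value)    # a p-value below 0.05 motivates the post-hoc comparisons
```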

For example:

To compare the mean of Brand 1 to Brand 2 (α = 5%), calculate d12 = q(4, 20, 0.05) × √[ (MSE / 2) × (1/n1 + 1/n2) ], taking the group sizes from Table 1 and the MSE from Table 2.

To compare the mean of Brand 1 to Brand 3, use the same procedure, but with n₁ = 7 and n₃ = 6.
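Using the tukey_kramer_msd sketch from earlier, and assuming the group sizes 7, 5, 6 and 6 correspond to Brands 1 through 4 in that order, those two comparisons would look like this (the MSE below is a placeholder to be replaced by the value in Table 2):

```python
mse = 0.05                                               # placeholder; use the MSE from Table 2
d_12 = tukey_kramer_msd(mse, df=20, k=4, ni=7, nj=5)     # Brand 1 vs Brand 2
d_13 = tukey_kramer_msd(mse, df=20, k=4, ni=7, nj=6)     # Brand 1 vs Brand 3
```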

The same procedure is repeated for the remaining pairs. Table 3 shows the observed differences between the means, as well as the respective dij values. If the absolute difference between two means is greater than the corresponding dij, the null hypothesis of equality between those means (H₀: μi = μj) is rejected.

Table 3: Mean comparison using the Tukey–Kramer test                                                                                       

Interpretation

For example, the results in Table 3 indicate that Brand 1 has a significantly higher average folic acid content than Brand 4.
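In practice, the full set of comparisons behind Table 3 can be delegated to software. The sketch below uses pairwise_tukeyhsd from statsmodels, which applies the Tukey-Kramer adjustment when group sizes differ; the measurements are random stand-ins (the real Table 1 values are not reproduced here) and the brand-to-size assignment is assumed.

```python
# Sketch: Tukey-Kramer pairwise comparisons via statsmodels, on placeholder data.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
sizes = {"Brand 1": 7, "Brand 2": 5, "Brand 3": 6, "Brand 4": 6}   # assumed assignment
values = np.concatenate([rng.normal(8.0, 1.0, n) for n in sizes.values()])
groups = np.concatenate([[name] * n for name, n in sizes.items()])

result = pairwise_tukeyhsd(endog=values, groups=groups, alpha=0.05)
print(result.summary())    # mean differences, confidence intervals and reject decision per pair
```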

Approximation using the harmonic mean

Calculating all the minimum significant differences for the Tukey-Kramer test is laborious if done by hand, and statistical software automates the process. In the past, when group sizes were approximately equal, a common simplification was to use the standard Tukey HSD formula, replacing n with the harmonic mean (H) of the sample sizes. The formula becomes:

d = q(k, df, α) × √( MSE / H )

This approach is an approximation and may not provide exact control of the significance level, but it can be found in older literature.

Using the data in Table 1, where the group sizes are 7, 5, 6 and 6, the harmonic mean is:

H = 4 / (1/7 + 1/5 + 1/6 + 1/6) ≈ 5.92

Substituting H, together with the MSE, q and df from the ANOVA, yields a single d value for all comparisons. In this example, interpreting the results using this approximation remains consistent with the complete analysis.
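A quick sketch of this approximation in Python, with scipy.stats.hmean computing H and a placeholder MSE standing in for the value from Table 2:

```python
# Sketch of the harmonic-mean approximation: one common d for every pair.
from scipy.stats import hmean, studentized_range

H = hmean([7, 5, 6, 6])                                   # harmonic mean of the group sizes, ~5.92
mse = 0.05                                                # placeholder; use the MSE from Table 2
d = studentized_range.ppf(0.95, 4, 20) * (mse / H) ** 0.5
print(H, d)
```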

Bibliography

1. Chen TS, Lui CK, Smith CH. Journal of the American Dietetic Association 1983;82(6):627-632. Apud Devore JL. Probability and Statistics for Engineering and the Sciences. Brooks Cole, 2015. Online edition.

2. Table of the studentized range - David Lane. http://davidmlane.com/hyperstat/sr_table.html

3. Multiple Comparisons With Unequal Sample Sizes. https://www.uvm.edu/~dhowell/gradstat/.../labs/.../Multcomp.html

4. ANOVA & Tukey-Kramer test. https://www.youtube.com

Thursday, September 04, 2025

The Principle Behind the Data: Understanding Maximum Likelihood Estimation with Simple Examples

    

Have you ever wondered how statisticians can make statements about an entire population by studying only a small sample? The secret lies in methods such as Maximum Likelihood Estimation—a powerful technique that helps us make the “best guess” about unknown parameters.

What is Statistical Inference?

Statistical inference means obtaining information from a sample and, based on that, drawing conclusions about characteristics of the entire population from which the sample was taken. Among several methods that produce good estimators, today we will focus on the maximum likelihood method.

An Intuitive Example

Imagine a box with many blue and orange balls. You don't know which color is more frequent, but you know there are only two possibilities:

1.   Three blue balls for every orange ball → probability of blue: p = ¾

 


2.   One blue ball for every three orange balls → probability of blue: p = ¼

 

Now, you draw 3 balls with replacement and observe how many are blue. How do you decide what the true value of p is? See Table 1. 


Table 1. The Probabilities at Play


Nº of blue balls     p = ¾      p = ¼
0                    1/64       27/64
1                    9/64       27/64
2                    27/64      9/64
3                    27/64      1/64


Figure 1. The Probabilities at Play


 

The strategy is simple: we choose the value of p that makes our observation most likely.

    If 0 or 1 blue ball comes out → we estimate p = ¼

    If 2 or 3 blue balls come out → we estimate p = ¾

You just used the maximum likelihood estimator!
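For readers who like to check this in code, here is a tiny sketch of that decision rule, comparing the two binomial probabilities from Table 1 for each possible outcome (SciPy assumed available):

```python
# For each possible number of blue balls in 3 draws, pick whichever candidate
# value of p makes that outcome more likely.
from scipy.stats import binom

for x in range(4):                                    # 0, 1, 2 or 3 blue balls observed
    likelihood = {p: binom.pmf(x, 3, p) for p in (0.25, 0.75)}
    p_hat = max(likelihood, key=likelihood.get)       # maximum likelihood choice
    print(x, p_hat)                                   # 0 -> 0.25, 1 -> 0.25, 2 -> 0.75, 3 -> 0.75
```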


From the Specific Case to the General


In the real world, we rarely have only two options. In an experiment with “success” or “failure” outcomes, we might have a sample like: S, S, F, F, S, F (3 successes in 6 trials).

The intuitive approach would lead us to calculate:

                                                   p̂ = x / n = 3 / 6 = 1/2

This is not only a reasonable choice—it is the maximum likelihood estimate. For n = 6 trials, the value p = ½ makes the observation of x = 3 successes the most likely of all possibilities. See Table 2.

Table 2: Probabilities associated with the occurrence of x successes in samples of size n = 6

Number of successes (x)    0         1         2         3        4         5         6
p = 1/2                    0.01563   0.09375   0.23438   0.3125   0.23438   0.09375   0.01563


Figure 2: Probabilities associated with the occurrence of x successes in samples of size n = 6
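The same conclusion can be checked numerically by scanning a grid of candidate p values and seeing where the likelihood of x = 3 successes in n = 6 trials peaks; a short sketch (NumPy and SciPy assumed):

```python
# The binomial likelihood of 3 successes in 6 trials is largest at p = 0.5.
import numpy as np
from scipy.stats import binom

p_grid = np.linspace(0.01, 0.99, 99)
likelihood = binom.pmf(3, 6, p_grid)
print(p_grid[np.argmax(likelihood)])    # 0.5, the sample proportion x / n
```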


Why Does This Matter?


The maximum likelihood estimator is:

·  Intuitive: it chooses the parameter that maximizes the chance of observing what we actually observe.

·  Powerful: it can be applied to many statistical models that are much more complex than the simple binomial example.

·  Consistent: with large samples, it tends to converge to the true value of the population parameter.

·  Versatile: it forms the basis for a large number of modern statistical techniques used in data science, machine learning, and scientific research.

Practical Examples of Maximum Likelihood Estimation

Example 1: Screw Factory Quality Control

In a screw factory, quality control is performed by selecting a sample from each batch and checking how many are non-conforming. Consider that in a batch of 500 screws, 38 non-conforming ones were found.

·  What is the maximum likelihood estimator for the proportion of non-conforming screws?

·  What is the estimate obtained for this specific batch?

Solution:
The maximum likelihood estimator (MLE) for a proportion 
p in a binomial distribution is the sample proportion itself, given by the formula p̂ = x / n, where:

·  x is the number of "successes" (in this context, finding a non-conforming screw).

·  n is the sample size.

For this batch:

·  x = 38 (non-conforming screws)

·  n = 500 (total screws in the sample)

The estimate is therefore:
p̂ = 38 / 500 = 0.076 or 7.6%

Conclusion: The maximum likelihood estimate for the proportion of non-conforming screws in the batch is 7.6%.
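As a sanity check, the same estimate can be obtained by numerically maximizing the binomial likelihood, here by minimizing the negative log-likelihood with SciPy; a minimal sketch:

```python
# Numerically confirm that x / n maximizes the likelihood for x = 38, n = 500.
from scipy.optimize import minimize_scalar
from scipy.stats import binom

neg_log_lik = lambda p: -binom.logpmf(38, 500, p)
res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)    # ~0.076, the sample proportion
```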


Example 2: Election Poll

Election polls are an attempt to capture voters' intentions at a specific moment by conducting a limited number of interviews. It is, therefore, an effort to measure the whole from a part. Imagine a polling institute conducted a preliminary election poll for mayor in a specific municipality. There were two candidates, which we will call A and B.

500 voters were interviewed, yielding the following results:

·  220 votes would be for candidate A

·  180 votes would be for candidate B

·  The remaining voters were undecided.

a) What is the maximum likelihood estimate for the proportion of undecided voters in the population?
b) What is the maximum likelihood estimate for the proportion of votes for candidate A?

Solution:
The same principle applies. The MLE for a population proportion p is the sample proportion p̂ = x / n.

a) Proportion of Undecided Voters:

·  Number of undecided voters in sample (x): 500 - 220 - 180 = 100

·  Sample size (n): 500

·  Estimate: p̂_undecided = 100 / 500 = 0.20 or 20%

b) Proportion of Votes for Candidate A:

·  Number of votes for A in sample (x): 220

·  Sample size (n): 500

·  Estimate: p̂_A = 220 / 500 = 0.44 or 44%

Conclusion: Based on this sample, the maximum likelihood estimates are a 20% proportion of undecided voters and a 44% vote proportion for candidate A in the broader population.
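Treating the three answer categories (A, B, undecided) jointly, the same idea extends to the multinomial likelihood, which is maximized at the sample proportions; a brief sketch (SciPy assumed):

```python
# The multinomial likelihood of counts (220, 180, 100) out of 500 is maximized
# at the sample proportions (0.44, 0.36, 0.20).
import numpy as np
from scipy.stats import multinomial

counts = np.array([220, 180, 100])
p_hat = counts / counts.sum()
print(p_hat)
print(multinomial.logpmf(counts, n=500, p=p_hat))              # log-likelihood at the MLE
print(multinomial.logpmf(counts, n=500, p=[1/3, 1/3, 1/3]))    # lower for any other p
```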