Tuesday, November 25, 2025

Analysis of Variance (ANOVA): Assumptions and Data Transformation

 

1.     Introduction

 

The assumptions required for an analysis of variance (ANOVA) are not always perfectly met by real-world data. Researchers who choose this procedure therefore need some assurance that their conclusions remain trustworthy even when the necessary assumptions (normality of residuals and homogeneity of variances) are not fully satisfied.

 

It is known that minor deviations from normality do not seriously compromise the validity of the ANOVA, especially when group sizes are equal or similar. Similarly, minor violations of homogeneity of variances have little practical relevance, except in two critical situations:

         (1) when there is asymmetry in the residuals;

         (2) when there is positive kurtosis in the residuals.

 

When its assumptions are met, the F-test remains the most powerful of the available tests. Otherwise, researchers should consider using non-parametric tests or resorting to data transformation. Transformations are particularly useful for stabilising the variance, but they also generally help to bring the distribution closer to normality.

 

2.     What does it mean to transform data? 

 

Transforming data involves applying a mathematical operation to each observation and conducting statistical analyses using the resulting values. The best-known transformations are listed below.

 

2.1.         Square Root 

 

In general, variables obtained by counting do not have a constant variance or a normal distribution. For count data (e.g. the number of insects or bacterial colonies, or the prevalence of lesions), it is recommended that the square root is applied to each observation before proceeding with ANOVA. This usually results in a more constant variance.

Practical note: If the observed values are small (less than 10) or there are many zeros, it is recommended that, to avoid problems with the square root of zero, the Anscombe transformation be applied:

Y = √(X + 3/8)

or a simplified, older correction that is also effective:

Y = √(X + 0.5)

before conducting the analysis.
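To make this concrete, here is a minimal Python sketch of the workflow (not from the original post; the counts and the use of NumPy/SciPy are assumptions for illustration):

import numpy as np
from scipy import stats

# Hypothetical insect counts for three treatment groups (illustrative only)
group_a = np.array([0, 2, 1, 4, 0, 3])
group_b = np.array([5, 8, 6, 9, 7, 4])
group_c = np.array([12, 15, 9, 14, 11, 13])

def anscombe(x):
    # Anscombe square-root transformation: sqrt(x + 3/8),
    # suitable for counts with small values or many zeros
    return np.sqrt(np.asarray(x, dtype=float) + 3.0 / 8.0)

# Transform each group, then run the usual one-way ANOVA on the new scale
f_stat, p_value = stats.f_oneway(anscombe(group_a), anscombe(group_b), anscombe(group_c))
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")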

2.2.      Logarithm

Many biological variables (such as tree height, body weight and survival time) follow a lognormal distribution. In these cases, taking the logarithm (decimal or natural) of the variable helps stabilize the variance and approximate the distribution to normality. One classic indication that this transformation is needed is when the variance of the groups increases proportionally with the mean.
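As a rough sketch of the diagnostic and the fix (hypothetical survival times; NumPy assumed):

import numpy as np

# Hypothetical survival times (days); the spread grows with the mean,
# the classic sign that a log transformation may help
low_dose = np.array([12.0, 15.0, 10.0, 14.0, 11.0])
high_dose = np.array([40.0, 65.0, 35.0, 80.0, 50.0])

for name, g in [("low dose", low_dose), ("high dose", high_dose)]:
    print(f"{name}: mean = {g.mean():.1f}, sd = {g.std(ddof=1):.1f}")

# Natural logarithm; use np.log10 for decimal logs, or np.log(g + 1) if zeros occur
log_low, log_high = np.log(low_dose), np.log(high_dose)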

2.3.      Arc sine of the square root of the proportion

If the variable is a proportion or percentage (e.g. the percentage of seeds that germinate), ANOVA can only be applied directly if the proportions vary strictly between 0.3 and 0.7. If many values fall outside this range, it is recommended that the following transformation be applied:

                                              Y = arcsin(√p).
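In code this is a one-liner (hypothetical proportions; NumPy assumed):

import numpy as np

# Hypothetical germination proportions, several outside the 0.3-0.7 range
p = np.array([0.05, 0.10, 0.85, 0.92, 0.50, 0.75])

# Arcsine square-root transformation; the result is in radians
y = np.arcsin(np.sqrt(p))
print(y.round(3))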

3.     Final considerations

For those unfamiliar with statistics, transforming data may seem like suspicious 'manipulation'. It is not. It is a legitimate and widely accepted technique that is often necessary when alternatives are unavailable.

Although modern software offers alternative methods, such as Welch's test for one-way analysis of variance, transforming the original variable may be the only feasible and robust approach to satisfy the model assumptions for more complex analysis of variance models, such as split-plot designs or hierarchical models.

Researchers must always be able to justify their chosen transformation and, ideally, use the most common transformation in their field of study.

Important: even if the statistical analysis was performed using transformed data, the descriptive results (means, standard errors, graphs, etc.) must be presented on the original scale of the variable. To achieve this, the transformation must be 'undone' (back-transformed) using the inverse function of the original transformation.
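A minimal sketch of back-transformation for the square-root case (hypothetical counts; NumPy assumed). Note that the back-transformed mean is generally not equal to the arithmetic mean of the raw data:

import numpy as np

# Hypothetical counts analysed on the square-root scale
counts = np.array([4.0, 9.0, 16.0, 25.0])
transformed = np.sqrt(counts)

# Mean on the transformed scale, then back-transformed (squared) for reporting
mean_t = transformed.mean()            # 3.5
mean_original_scale = mean_t ** 2      # 12.25, vs raw arithmetic mean 13.5
print(mean_t, mean_original_scale)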


Monday, November 24, 2025

Linear Regression Through the Origin: When to Force the Intercept to Zero

 

In regression analysis, the most common model includes an intercept term (constant). However, in specific situations, we are forced to make the regression line pass through the origin of the Cartesian plane (point (0,0)). This decision can be motivated by solid theoretical reasons or prior empirical evidence.

Why Use a Model Without an Intercept?

Two classic examples illustrate this need:

1.     Uniform Rectilinear Motion: In Physics, if an object travels along a straight path at constant velocity, the distance covered at the initial moment (time zero) is necessarily zero, so distance is proportional to elapsed time. A model that does not pass through the origin would make no physical sense.

2.     Young's Modulus: In Materials Engineering, Young's modulus, which measures the stiffness of a material, is defined by the slope of the stress-strain curve in the elastic regime. If no stress is applied, there is no strain. Therefore, the line modeling this behavior must pass through the origin. Figure 1 illustrates this relationship in the context of Young's modulus.

                                                              Figure 1


Although useful, adopting a no-intercept model should be done cautiously. It is good practice to compare its performance with the model that includes an intercept. The final choice can be controversial and intrinsically depends on the problem's context.

The Mathematical Model

By forcing the line through the origin, our model simplifies to:

Y = bX + e

Where:

·  X is the independent variable.

·  Y is the dependent variable.

·  b is the parameter (slope) we want to estimate.

·  e is the random error term.


The estimate for the coefficient b is given by the formula:

b = Σ(XY) / Σ(X²)
The fitted regression line will therefore be:

Ŷ = bX
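As a numerical sketch, the slope can be computed directly; the (x, y) arrays below are hypothetical stand-ins for the data in Table 2:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares slope for a line forced through the origin
b = np.sum(x * y) / np.sum(x ** 2)
y_hat = b * x              # fitted values
residuals = y - y_hat      # these need not sum to zero
print(f"b = {b:.4f}, sum of residuals = {residuals.sum():.4f}")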
Evaluating the Model Fit

A crucial difference from the model with an intercept is that the sum of the residuals (Σei) is not necessarily zero. By forcing the line through (0,0), we lose the degree of freedom that "adjusted" the line's height to minimize the residuals.

To assess the quality of the fit, we use analysis of variance (ANOVA). The degrees of freedom are adjusted as follows:

·      Total SS: n degrees of freedom (the total sum of squares is not corrected for the mean).

·      Regression SS: k degrees of freedom (here k = 1, since a single parameter is estimated).

·      Residual SS: n - k degrees of freedom.

Based on these calculations, we build the ANOVA table (Table 1).

Table 1: ANOVA for Regression Through the Origin

Source        df      SS                   MS                F
Regression    k       b·Σ(XY)              SSReg / k         MSReg / MSRes
Residual      n - k   Σ(Y²) - b·Σ(XY)      SSRes / (n - k)
Total         n       Σ(Y²)
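Numerically, the table can be built in a few lines (continuing the hypothetical data above; SciPy assumed for the p-value):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n, k = len(y), 1

b = np.sum(x * y) / np.sum(x ** 2)
ss_total = np.sum(y ** 2)       # uncorrected total SS, n df
ss_reg = b * np.sum(x * y)      # regression SS, k df
ss_res = ss_total - ss_reg      # residual SS, n - k df

ms_reg, ms_res = ss_reg / k, ss_res / (n - k)
f_stat = ms_reg / ms_res
p_value = stats.f.sf(f_stat, k, n - k)
print(f"F({k}, {n - k}) = {f_stat:.2f}, p = {p_value:.4g}")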


Practical Example

Consider the data in Table 2, where we want to fit a model that passes through the origin.

Table 2


With the data from Table 2, we calculate the coefficient b:

Thus, the equation of the regression line is:

It is also important to calculate quality metrics:

·      Standard Deviation (s): s = √MSRes, the residual standard deviation.

·      Coefficient of Determination (R²): R² = SSReg / SSTotal, computed from the uncorrected sums of squares in this model.

·      t-value: t = b / (s / √Σ(X²)), used to test H₀: b = 0.

For our data:


Figure 2 shows the scatter plot with the fitted regression line.

Figure 2

Checking the Result in Software

To validate our manual calculations, we can use statistical software. The Minitab output for this analysis is presented below and should corroborate our results.
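Where Minitab is not available, the fit can also be cross-checked in Python with statsmodels (a sketch, reusing the hypothetical data above; omitting the constant term is what forces the fit through the origin):

import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# No add_constant(): the model has no intercept, so the line passes through (0,0);
# statsmodels then reports an uncentered R-squared
model = sm.OLS(y, x).fit()
print(model.params)      # the slope b
print(model.summary())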







Monday, November 17, 2025

A Practical Guide to Post-Hoc Pairwise Comparisons: Choosing Between Liberal and Conservative Tests

 

Introduction

When comparing k populations using ANOVA, there are m = k(k-1)/2 possible pairwise comparisons between means; with k = 6 groups, for example, there are m = 15. If these comparisons were not pre-planned (also known as unplanned or post-hoc comparisons) and were chosen after the researcher examined the sample means, it is more appropriate to use a test that controls the significance level for the entire experiment, not just for an individual comparison.

Key Definitions

       ·        Comparisonwise Error Rate (CER): The probability of committing a Type I error when comparing a single pair of means from a set of k means.

·        Experimentwise Error Rate (EER) or Familywise Error Rate (FWER): The probability of committing at least one Type I error when performing all m pairwise comparisons from a set of k means.
Two specific types are distinguished:

  o   Complete null (EERC): the experimentwise error rate when all population means are truly equal.

  o   Partial null (EERP): the experimentwise error rate when some means are equal and others are not.

The Trade-Off: Power vs. Protection

Tests that control the experimentwise error rate are conservative—they reject the null hypothesis of equal means less easily, resulting in lower statistical power. Conversely, tests that control only the comparisonwise error rate are liberal, as they find significance more easily and therefore have higher power.

A Spectrum of Tests: From Liberal to Conservative
According to the classic classification by Winer (1962), multiple comparison tests can be ordered from most liberal to most conservative as follows:

       1.     Duncan's Multiple Range Test (MRT)

      2.     Student-Newman-Keuls Test (SNK)

      3.     Fisher's Least Significant Difference (LSD)

      4.     Tukey's Honestly Significant Difference (HSD)

      5.     Scheffé's Test

This means that applying Duncan's test will likely yield more statistically significant differences between means than using Scheffé's test.

Illustrative Example with Fictional Data

Means and standard deviations of blood pressure reduction are shown in Table 1, and the analysis of variance (ANOVA) in Table 2.

Table 1: Blood Pressure Reduction (mmHg) by Treatment Group

Treatment   N   Mean (mmHg)   Standard deviation
A           5   21            5.10
B           5   8             7.07
C           5   10            5.83
D           5   29            5.10
E           5   13            7.07
Control     5   2             5.48


Table 2: Analysis of Variance (ANOVA)

Source       df   SS        MS       F
Treatments   5    2354.17   470.83   13.08
Residual     24   864.00    36.00
Total        29   3218.17
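Since Table 1 reports group sizes, means, and standard deviations, the ANOVA above can be reconstructed from those summaries alone; a sketch in Python (NumPy/SciPy assumed):

import numpy as np
from scipy import stats

# Group summaries taken from Table 1 (equal group sizes)
means = np.array([21.0, 8.0, 10.0, 29.0, 13.0, 2.0])
sds = np.array([5.10, 7.07, 5.83, 5.10, 7.07, 5.48])
n, k = 5, 6
N = n * k

grand_mean = means.mean()                          # valid because n is equal across groups
ss_between = n * np.sum((means - grand_mean) ** 2)
ss_within = np.sum((n - 1) * sds ** 2)             # pooled within-group SS

df_between, df_within = k - 1, N - k
ms_between, ms_within = ss_between / df_between, ss_within / df_within
f_stat = ms_between / ms_within
p_value = stats.f.sf(f_stat, df_between, df_within)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}, p = {p_value:.4g}")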


·        Duncan's MRT and SNK: Both tests use critical values for the difference between means that depend on the number of ordered means spanned by the comparison (p). Comparing the critical ranges in Table 3 shows that Duncan's test is more liberal, declaring significance more easily (its minimum significant differences are smaller than SNK's).

 

Table 3: Critical Ranges for Duncan's and Student-Newman-Keuls (SNK) Tests

           Critical range for number of means in the range (p)
Test       p = 2   p = 3   p = 4   p = 5   p = 6
Duncan's   7.83    8.23    8.48    8.66    8.79
SNK        7.83    9.75    10.47   11.18   11.73

 

·        Fisher's LSD, Tukey's HSD, and Scheffé's Test: A comparison of the critical differences in Table 4 clearly shows the spectrum: Fisher's LSD is the most liberal (smallest critical difference), followed by Tukey's HSD, with Scheffé's test being the most conservative (largest critical difference). A code sketch reproducing these values follows Table 4.

 

Table 4: Critical Differences for Pairwise Comparison Tests

Test           Critical difference
Fisher's LSD   7.83
Tukey's HSD    11.73
Scheffé's      13.74
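The critical values in Tables 3 and 4 can be reproduced (up to rounding and the choice of tables) from the error mean square; a sketch using SciPy. Duncan's test requires its own protection-level tables, which SciPy does not provide:

import numpy as np
from scipy import stats
from scipy.stats import studentized_range

mse, n, k, N = 36.0, 5, 6, 30        # error mean square and design from the ANOVA
df_error = N - k                     # 24
se_diff = np.sqrt(2 * mse / n)       # standard error of a difference between two means

# Fisher's LSD: ordinary t critical value
lsd = stats.t.ppf(0.975, df_error) * se_diff                        # about 7.83

# Tukey's HSD: studentized range with p = k means
hsd = studentized_range.ppf(0.95, k, df_error) * np.sqrt(mse / n)   # about 11.73

# Scheffé: based on the F distribution
scheffe = np.sqrt((k - 1) * stats.f.ppf(0.95, k - 1, df_error)) * se_diff  # about 13.74

# SNK critical ranges for p = 2..6 means in the range
snk = [studentized_range.ppf(0.95, p, df_error) * np.sqrt(mse / n) for p in range(2, 7)]
print(round(lsd, 2), round(hsd, 2), round(scheffe, 2), [round(v, 2) for v in snk])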

Practical Recommendations (Based on SAS/STAT 9.2 Manual)

        1.     Use the unprotected LSD test if you are interested in several individual comparisons and are not concerned with multiple inferences.

2.     For all pairwise comparisons, use Tukey's test (see the code sketch after this list).

        3.     For comparisons with a control group, use Dunnett's test.
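In practice, recommendation 2 is a one-liner with statsmodels; a sketch with hypothetical raw data standing in for the observations behind Table 1:

import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical observations: one long vector of values plus a group label vector
rng = np.random.default_rng(1)
groups = np.repeat(["A", "B", "C", "D", "E", "Control"], 5)
values = np.concatenate([rng.normal(m, 6.0, 5) for m in [21, 8, 10, 29, 13, 2]])

# Tukey's HSD for all pairwise comparisons at a 5% familywise error rate
result = pairwise_tukeyhsd(values, groups, alpha=0.05)
print(result)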

Choosing the Right Test: A Decision Framework

Imagine an experiment with more than two groups analyzed by a one-way ANOVA at a 5% significance level. For unplanned comparisons, the researcher has several options:

         ·        To control the experimentwise error rate at 5%, use Tukey's HSD (for all pairs) or Dunnett's test (vs. a control). The trade-off is a lower comparisonwise error rate.

        ·        For higher power, use Fisher's LSD (unprotected), Duncan's MRT, or SNK. These maintain a ~5% comparisonwise error rate, but the experimentwise error rate will be much higher (depending on the number of treatments).

Context Matters

        ·        Choose a Conservative Test (Tukey, Dunnett, planned LSD) when you need high confidence to reject H₀. This is crucial in fields like pharmacology, where recommending a new drug with unknown side effects requires strong evidence of its superiority over the standard treatment.

      ·        Choose a Liberal Test (unprotected LSD, Duncan) when you need high discriminatory power. This is common in product testing or agronomy, where the primary goal is to detect any potential difference, and a false positive is less consequential than missing a real difference. Alternatively, using a conservative test like Tukey at a 10% significance level also increases power.

Final Considerations

       ·        Scheffé's Test has excellent mathematical properties but is often considered excessively conservative for simple pairwise comparisons.

      ·        Bonferroni Correction is best suited for a small, pre-defined number of comparisons, as it becomes overly conservative with a large number of tests.

      ·        No Single Best Test: All procedures have advantages and disadvantages. While not exact, using a formal method for comparing means prevents conclusions from being entirely subjective. The researcher always has a margin of choice in both the selection of the test and the establishment of the significance level.

A Note on Software

The calculations for this guide were performed using SAS software. Results from other software packages or hand calculations may show slight differences due to rounding. Differences are typically more pronounced for the SNK test, as its critical values are less standardized across different sources.