Monday, November 24, 2025

Linear Regression Through the Origin: When to Force the Intercept to Zero

 

In regression analysis, the most common model includes an intercept term (constant). However, in specific situations, we need to force the regression line through the origin of the Cartesian plane, the point (0,0). This decision can be motivated by sound theoretical reasons or prior empirical evidence.

Why Use a Model Without an Intercept?

Two classic examples illustrate this need:

1.     Uniform Rectilinear Motion: In Physics, if an object moves at constant velocity along a straight path and distance is measured from its starting point, then at the initial moment (time zero) the distance traveled is necessarily zero. A model that does not pass through the origin would make no physical sense.

2.     Young's Modulus: In Materials Engineering, Young's modulus, which measures the stiffness of a material, is defined by the slope of the stress-strain curve in the elastic regime. If no stress is applied, there is no strain. Therefore, the line modeling this behavior must pass through the origin. Figure 1 illustrates this relationship in the context of Young's modulus.

Figure 1


Although useful, adopting a no-intercept model should be done cautiously. It is good practice to compare its performance with the model that includes an intercept. The final choice can be controversial and intrinsically depends on the problem's context.

The Mathematical Model

By forcing the line through the origin, our model simplifies to:

$$Y = bX + e$$

Where:

·      X is the independent variable.

·      Y is the dependent variable.

·      b is the parameter (slope) we want to estimate.

·      e is the random error term.


The estimate for the coefficient b is given by the formula:

$$b = \frac{\sum X_i Y_i}{\sum X_i^2}$$


The fitted regression line will therefore be:

$$\hat{Y} = bX$$
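As a minimal sketch, the slope estimate for a line through the origin (the sum of the X·Y products divided by the sum of the squared X values) can be computed in a few lines of Python. The function name `fit_through_origin` and the data values are hypothetical and serve only as an illustration; they are not the data from this article.

```python
def fit_through_origin(x, y):
    """Least-squares slope b for the model Y = bX + e (no intercept)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))  # sum of X*Y
    sxx = sum(xi * xi for xi in x)              # sum of X^2
    if sxx == 0:
        raise ValueError("all X values are zero; the slope is undefined")
    return sxy / sxx


# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b = fit_through_origin(x, y)
fitted = [b * xi for xi in x]  # points on the fitted line Y_hat = bX

print(f"b = {b:.4f}")
```

Note that the sums are not centered on the means of X and Y, which is what distinguishes this estimate from the usual slope formula with an intercept.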


Evaluating the Model Fit

A crucial difference from the model with an intercept is that the sum of the residuals (Σei) is not necessarily zero. By forcing the line through (0,0), we lose the degree of freedom that "adjusted" the line's height to minimize the residuals.
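The behavior of the residuals can be checked numerically. The sketch below, using hypothetical data, fits both models and compares the sums of residuals. For the no-intercept fit the plain sum Σei is generally nonzero, while the X-weighted sum ΣXiei is still zero, because that is the only normal equation left.

```python
# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

# Model through the origin: b = sum(XY) / sum(X^2)
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
res_origin = [yi - b * xi for xi, yi in zip(x, y)]
sum_e_origin = sum(res_origin)
sum_xe_origin = sum(xi * ei for xi, ei in zip(x, res_origin))

# Model with intercept: the usual centered formulas
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
res_intercept = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sum_e_intercept = sum(res_intercept)

print(f"through origin: sum(e) = {sum_e_origin:.6f}, sum(X*e) = {sum_xe_origin:.2e}")
print(f"with intercept: sum(e) = {sum_e_intercept:.2e}")
```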

To assess the quality of the fit, we use analysis of variance (ANOVA). The degrees of freedom are adjusted as follows:

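·      Total SS: n degrees of freedom (the uncorrected sum ΣYi², since the mean of Y is not subtracted).

·      Regression SS: k degrees of freedom (where k = 1, the number of estimated parameters).

·      Residual SS: n-k degrees of freedom.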
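The decomposition above can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `anova_through_origin` and the data are hypothetical, and the total sum of squares is the uncorrected ΣYi², consistent with the n degrees of freedom listed above.

```python
def anova_through_origin(x, y):
    """ANOVA quantities for the no-intercept model Y = bX + e."""
    n = len(x)
    k = 1  # one estimated parameter: the slope b
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = sxy / sxx

    ss_total = sum(yi * yi for yi in y)  # uncorrected total SS
    ss_res = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_reg = ss_total - ss_res           # equals b * sum(XY)

    ms_reg = ss_reg / k
    ms_res = ss_res / (n - k)
    return {
        "regression": {"df": k, "ss": ss_reg, "ms": ms_reg},
        "residual": {"df": n - k, "ss": ss_res, "ms": ms_res},
        "total": {"df": n, "ss": ss_total},
        "F": ms_reg / ms_res,
    }


# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

table = anova_through_origin(x, y)
for source in ("regression", "residual", "total"):
    row = table[source]
    print(f"{source:<11} df={row['df']}  SS={row['ss']:.4f}")
print(f"F = {table['F']:.2f}")
```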

Based on these calculations, we build the ANOVA table (Table 1).

Table 1


Practical Example

Consider the data in Table 2, where we want to fit a model that passes through the origin.

Table 2


With the data from Table 2, we calculate the coefficient b:

Thus, the equation of the regression line is:

It is also important to calculate quality metrics:

·      Standard Deviation (s)

·      Coefficient of Determination (R²)

·      t-value
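A minimal sketch of these three metrics in Python is given below, using hypothetical data. Two assumptions should be noted: the residual variance uses n-1 degrees of freedom (one estimated parameter), and R² is computed against the uncorrected total sum of squares ΣYi². Software packages differ on how they define R² for no-intercept models, so this value is not directly comparable with the R² of a model that includes an intercept.

```python
import math


def fit_metrics(x, y):
    """Return (b, s, r2, t) for the no-intercept model Y = bX + e."""
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    syy = sum(yi * yi for yi in y)      # uncorrected total SS
    b = sxy / sxx

    ss_res = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ss_res / (n - 1))     # residual standard deviation
    r2 = 1 - ss_res / syy               # assumption: uncorrected R^2
    se_b = s / math.sqrt(sxx)           # standard error of the slope
    t = b / se_b                        # t-value for H0: b = 0
    return b, s, r2, t


# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b, s, r2, t = fit_metrics(x, y)
print(f"b = {b:.4f}, s = {s:.4f}, R^2 = {r2:.6f}, t = {t:.2f}")
```

With a single predictor, the square of this t-value equals the F statistic from the ANOVA table, which gives a quick consistency check.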

For our data:


Figure 2 shows the scatter plot with the fitted regression line.

Figure 2

Checking the Result in Software

To validate our manual calculations, we can use statistical software. The Minitab output for this analysis is presented below and should corroborate our results.






