In regression analysis, the most common model includes an intercept term
(constant). In specific situations, however, we need to force the
regression line through the origin of the Cartesian plane, the point (0,0).
This decision can be motivated by solid theoretical reasons or prior empirical
evidence.
Why Use a Model Without an Intercept?
Two classic examples illustrate this need:
1. Uniform Rectilinear Motion: In Physics, if an object moves at constant
velocity along a straight path and distance is measured from its starting
point, then at the initial moment (time zero) the distance traveled is
necessarily zero. A model that does not pass through the origin would make
no physical sense.
2. Young's Modulus: In Materials
Engineering, Young's modulus, which measures the stiffness of a material, is
defined by the slope of the stress-strain curve in the elastic regime. If no
stress is applied, there is no strain. Therefore, the line modeling this
behavior must pass through the origin. Figure 1 illustrates this relationship
in the context of Young's modulus.
Figure 1
Although useful, adopting a no-intercept model should be done cautiously. It
is good practice to compare its performance with the model that includes an
intercept. The final choice can be controversial and depends intrinsically on
the problem's context.
The Mathematical Model
By forcing the line through the origin, our model simplifies to:
$$Y = bX + e$$
Where:
· X is the independent variable.
· Y is the dependent variable.
· b is the parameter (slope) we want to estimate.
· e is the random error term.
The least-squares estimate for the coefficient b is given by the formula:
$$b = \frac{\sum X_i Y_i}{\sum X_i^2}$$
The fitted regression line will therefore be:
$$\hat{Y} = bX$$
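As a minimal sketch of this estimate, the slope of a line through the origin is the sum of the products X·Y divided by the sum of the squared X values. The data below are hypothetical and chosen only for illustration (they are not the article's Table 2), and the function name `slope_through_origin` is likewise an assumption of this example.

```python
# Hypothetical data (not the article's Table 2), roughly following Y = 2X.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def slope_through_origin(x, y):
    """Least-squares slope for Y = bX + e: b = sum(X*Y) / sum(X^2)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("all X values are zero; the slope is undefined")
    return sxy / sxx


b = slope_through_origin(x, y)

# Fitted values lie on the line Y-hat = b * X, which passes through (0, 0).
fitted = [b * xi for xi in x]

print(f"b = {b:.4f}")  # prints: b = 2.0036
```

Note that no means are subtracted anywhere: unlike the usual slope formula for a model with an intercept, the sums here use the raw X and Y values.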
Evaluating the Model Fit
A crucial difference from the model with an intercept is that the sum of the
residuals (Σei) is not necessarily zero. By forcing the line through (0,0),
we lose the degree of freedom that "adjusted" the line's height to minimize
the residuals.
To assess the quality of the fit, we use analysis of variance (ANOVA). The
degrees of freedom are adjusted as follows:
· Total SS: n degrees of freedom.
· Regression SS: k degrees of freedom (where k = 1, the number of estimated parameters).
· Residual SS: n - k degrees of freedom.
Based on these calculations, we build
the ANOVA table (Table 1).
Table 1
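The points of this section can be checked numerically. The sketch below uses hypothetical data (not the article's Table 2) and assumes the usual convention for regression through the origin: because the total has n degrees of freedom, the sums of squares are taken as uncorrected, that is, without subtracting the mean of Y. The table layout printed at the end is an illustration, not a reproduction of Table 1.

```python
# Hypothetical data for illustration only (not the article's Table 2).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
k = 1  # one estimated parameter: the slope b

# Least-squares slope for the no-intercept model.
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)
b = sxy / sxx

# Residuals; without an intercept their sum is generally not zero.
residuals = [yi - b * xi for xi, yi in zip(x, y)]
residual_sum = sum(residuals)

# Uncorrected sums of squares (assumption: no subtraction of the mean of Y).
ss_total = sum(yi * yi for yi in y)          # n degrees of freedom
ss_regression = b * sxy                      # k degrees of freedom
ss_residual = sum(e * e for e in residuals)  # n - k degrees of freedom

ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k)
f_value = ms_regression / ms_residual

print(f"sum of residuals = {residual_sum:.4f}")
print(f"{'Source':<12}{'DF':>4}{'SS':>12}{'MS':>12}{'F':>12}")
print(f"{'Regression':<12}{k:>4}{ss_regression:>12.4f}{ms_regression:>12.4f}{f_value:>12.2f}")
print(f"{'Residual':<12}{n - k:>4}{ss_residual:>12.4f}{ms_residual:>12.4f}")
print(f"{'Total':<12}{n:>4}{ss_total:>12.4f}")
```

With these definitions the regression and residual sums of squares add up to the total, and the degrees of freedom add up as k + (n - k) = n.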
Practical Example
Consider the data in Table 2, where we
want to fit a model that passes through the origin.
Table 2
With the data from Table 2, we calculate the coefficient b:
Thus, the equation of the regression line is:
It is also important to calculate quality metrics:
· Standard Deviation (s)
· Coefficient of Determination (R²)
· t-value
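A minimal sketch of how these three metrics might be computed for a line through the origin is shown here. The definitions are the common textbook ones and are an assumption of this example: s is the square root of the residual mean square with n - 1 degrees of freedom, R² is based on the uncorrected total sum of squares, and the t-value is the slope divided by its standard error s / sqrt(ΣX²). The data are hypothetical and are not the article's Table 2.

```python
import math

# Hypothetical data for illustration only (not the article's Table 2).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

# Slope of the line through the origin.
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)
b = sxy / sxx

# Uncorrected sums of squares (assumed convention for this model).
ss_total = sum(yi * yi for yi in y)
ss_residual = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))

# Standard deviation of the residuals, with n - 1 degrees of freedom.
s = math.sqrt(ss_residual / (n - 1))

# Coefficient of determination relative to the uncorrected total SS.
r_squared = 1.0 - ss_residual / ss_total

# t-value for testing b = 0: slope divided by its standard error.
se_b = s / math.sqrt(sxx)
t_value = b / se_b

print(f"s = {s:.4f}, R^2 = {r_squared:.4f}, t = {t_value:.2f}")
```

An R² computed this way is not directly comparable with the R² of a model that includes an intercept, because the two are measured against different totals. Comparing the residual standard deviations of the two models is usually the safer choice.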
For our data:
Figure 2 shows the scatter plot with the fitted regression line.
Figure 2
Checking the Result in Software
To validate our manual calculations, we
can use statistical software. The Minitab output for this analysis is presented
below and should corroborate our results.
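The same kind of cross-check can be sketched in Python as an alternative to Minitab. This example is not the article's software output: it uses hypothetical data and relies on `statistics.linear_regression` from the standard library, whose `proportional=True` option fits a line through the origin and is only available from Python 3.11 onward.

```python
import statistics
import sys

# Hypothetical data for illustration only (not the article's Table 2).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Manual estimate: b = sum(X*Y) / sum(X^2).
b_manual = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

if sys.version_info >= (3, 11):
    # proportional=True constrains the fitted line to pass through (0, 0).
    result = statistics.linear_regression(x, y, proportional=True)
    b_library = result.slope
    print(f"manual b = {b_manual:.6f}, library b = {b_library:.6f}")
else:
    # Older interpreters lack the proportional option; skip the comparison.
    b_library = None
    print(f"manual b = {b_manual:.6f} (library check needs Python 3.11+)")
```

If the two slopes agree to within rounding, the manual calculation is confirmed. The standard deviation and t-value would still need to be compared against a full statistical package.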