Have you ever wondered how statisticians can make statements about an entire population by studying only a small sample? The secret lies in methods such as Maximum Likelihood Estimation, a powerful technique that helps us make the "best guess" about unknown parameters.
What is Statistical Inference?
Statistical inference means obtaining information from a sample and, based on that, drawing conclusions about characteristics of the entire population from which the sample was taken. Among the several methods that produce good estimators, today we will focus on the maximum likelihood method.
An Intuitive Example
Imagine a box with many blue and orange balls. You don't know which color is more frequent, but you know there are only two possibilities:

1. Three blue balls for every orange ball → probability of blue: p = ¾
2. One blue ball for every three orange balls → probability of blue: p = ¼
Now, you draw 3 balls with replacement and observe how many are blue. How do you decide what the true value of p is? See Table 1.
Table 1. The Probabilities at Play

| Nº of blue balls | p = ¾ | p = ¼ |
|---|---|---|
| 0 | 1/64 | 27/64 |
| 1 | 9/64 | 27/64 |
| 2 | 27/64 | 9/64 |
| 3 | 27/64 | 1/64 |
The strategy is simple: we choose the value of p that makes our observation most likely.

• If 0 or 1 blue balls come out → we estimate p = ¼
• If 2 or 3 blue balls come out → we estimate p = ¾
You just used the maximum likelihood estimator!
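This decision rule is easy to check numerically. The sketch below (plain Python; the helper name `binom_pmf` is just illustrative) computes the binomial probabilities from Table 1 and, for each possible count of blue balls, picks whichever candidate value of p makes that count most likely:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of k successes in n independent draws with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The two candidate values of p from the box example.
candidates = [3/4, 1/4]

# For each possible count of blue balls in 3 draws, pick the value of p
# that makes that count most likely (the maximum likelihood choice).
for blue in range(4):
    estimate = max(candidates, key=lambda p: binom_pmf(blue, 3, p))
    print(f"{blue} blue balls -> estimate p = {estimate}")
```

Running it reproduces the rule above: counts 0 and 1 point to p = ¼, counts 2 and 3 to p = ¾.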
From the Specific Case to the General
In the real world, we rarely have only two options. In an experiment with "success" or "failure" outcomes, we might have a sample like: S, S, F, F, S, F (3 successes in 6 trials).
The intuitive approach would lead us to calculate:

p̂ = x / n = 3 / 6 = 1/2

This is not only a reasonable choice: it is the maximum likelihood estimate. For n = 6 trials, the value p = ½ makes the observation of x = 3 successes the most likely of all possibilities. See Table 2.
Table 2: Probabilities associated with the occurrence of x successes in samples of size n = 6

| p value | x = 0 | x = 1 | x = 2 | x = 3 | x = 4 | x = 5 | x = 6 |
|---|---|---|---|---|---|---|---|
| p = 1/2 | 0.01563 | 0.09375 | 0.23438 | 0.3125 | 0.23438 | 0.09375 | 0.01563 |
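The claim that p = ½ maximizes the chance of seeing x = 3 successes in n = 6 trials can also be verified by sweeping p over a fine grid and evaluating the binomial likelihood at each point. A minimal sketch (function and variable names are illustrative):

```python
from math import comb

def likelihood(p, x=3, n=6):
    # Binomial likelihood of observing x successes in n trials.
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Sweep p over a fine grid and find where the likelihood peaks.
grid = [i / 1000 for i in range(1001)]
best_p = max(grid, key=likelihood)
print(best_p)           # 0.5
print(likelihood(0.5))  # 0.3125, matching the x = 3 entry of Table 2
```

The peak lands at p = 0.5 with likelihood 0.3125, in agreement with Table 2.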
Why Does This Matter?
The maximum likelihood estimator is:

• Intuitive: it chooses the parameter that maximizes the chance of observing what we actually observe.
• Powerful: it can be applied to many statistical models that are much more complex than the simple binomial example.
• Consistent: with large samples, it tends to converge to the true value of the population parameter.
• Versatile: it forms the basis for a large number of modern statistical techniques used in data science, machine learning, and scientific research.
Practical Examples of Maximum Likelihood Estimation

Example 1: Screw Factory Quality Control
In a screw factory, quality control is performed by selecting a sample from each batch and checking how many are non-conforming. Consider that in a batch of 500 screws, 38 non-conforming ones were found.

• What is the maximum likelihood estimator for the proportion of non-conforming screws?
• What is the estimate obtained for this specific batch?

Solution:
The maximum likelihood estimator (MLE) for a proportion p in a binomial distribution is the sample proportion itself, given by the formula p̂ = x / n, where:

• x is the number of "successes" (in this context, finding a non-conforming screw).
• n is the sample size.

For this batch:

• x = 38 (non-conforming screws)
• n = 500 (total screws in the sample)

The estimate is therefore:

p̂ = 38 / 500 = 0.076 or 7.6%

Conclusion: The maximum likelihood estimate for the proportion of non-conforming screws in the batch is 7.6%.
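As a sanity check, the closed-form estimate p̂ = x / n can be compared against a direct numerical maximization of the log-likelihood. A minimal sketch in plain Python (the grid search stands in for a proper optimizer):

```python
from math import comb, log

x, n = 38, 500  # non-conforming screws found, sample size

def log_likelihood(p):
    # Log of the binomial likelihood; safer than the raw product for large n.
    return log(comb(n, x)) + x * log(p) + (n - x) * log(1 - p)

# Closed-form MLE for a binomial proportion.
p_hat = x / n
print(p_hat)  # 0.076

# Numerical check: the log-likelihood peaks at x / n.
grid = [i / 1000 for i in range(1, 1000)]  # avoid p = 0 and p = 1
best_p = max(grid, key=log_likelihood)
print(best_p)  # 0.076
```

Both routes agree on 0.076, i.e. 7.6%.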
Example 2: Election Poll
Election polls are an attempt to capture voters' intentions at a specific moment by conducting a limited number of interviews. It is, therefore, an effort to measure the whole from a part. Imagine a polling institute conducted a preliminary election poll for mayor in a specific municipality. There were two candidates, which we will call A and B.

500 voters were interviewed, yielding the following results:

• 220 votes would be for candidate A
• 180 votes would be for candidate B
• The remaining voters were undecided.

a) What is the maximum likelihood estimate for the proportion of undecided voters in the population?
b) What is the maximum likelihood estimate for the proportion of votes for candidate A?
Solution:
The same principle applies. The MLE for a population proportion p is the sample proportion p̂ = x / n.

a) Proportion of Undecided Voters:

• Number of undecided voters in sample (x): 500 - 220 - 180 = 100
• Sample size (n): 500
• Estimate: p̂_undecided = 100 / 500 = 0.20 or 20%

b) Proportion of Votes for Candidate A:

• Number of votes for A in sample (x): 220
• Sample size (n): 500
• Estimate: p̂_A = 220 / 500 = 0.44 or 44%

Conclusion: Based on this sample, the maximum likelihood estimates are a 20% proportion of undecided voters and a 44% vote proportion for candidate A in the broader population.
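The whole calculation reduces to applying the sample-proportion rule to each group. A minimal sketch using the poll counts above:

```python
n = 500  # voters interviewed
counts = {"A": 220, "B": 180}
counts["undecided"] = n - sum(counts.values())  # 500 - 220 - 180 = 100

# The MLE for each population proportion is the sample proportion x / n.
estimates = {group: x / n for group, x in counts.items()}
print(estimates)  # {'A': 0.44, 'B': 0.36, 'undecided': 0.2}
```

This reproduces the answers above: 44% for candidate A and 20% undecided (and, as a bonus, 36% for candidate B).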