1. What Is Skewness in a Data Distribution?
· If the left tail is longer or more pronounced than the right, the distribution is said to have negative skewness.
· If the right tail is longer, the distribution has positive skewness.
· Otherwise, the distribution is symmetric.
Textbooks often include histograms to illustrate long tails and extreme values.
Figure 1
Histograms showing skewness types (negative, symmetric, positive)
Source: David P. Doane & Lori E. Seward (2011). Measuring Skewness: A Forgotten Statistic? Journal of Statistics Education, 19(2). DOI: 10.1080/10691898.2011.11889611
2. Visual Tools for Assessing Skewness
Beyond histograms, several exploratory tools can help evaluate skewness:
· Boxplots highlight dispersion and outliers.
· Dotplots reveal distribution shape and sample size.
· Stem-and-leaf plots preserve individual data points while showing distribution shape.
Figure 2
Boxplots and dotplots illustrating skewness and sample size
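As a rough illustration of the stem-and-leaf idea, the sketch below builds a text plot with the Python standard library. The function name `stem_and_leaf` and the exam scores are hypothetical, and the sketch assumes non-negative integers with the tens digit as the stem.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers by stem (tens) and list their leaves (units)."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    return dict(rows)


# Hypothetical exam scores, invented for illustration only.
scores = [52, 61, 67, 73, 75, 78, 81, 82, 84, 85, 88, 90, 91]
plot = stem_and_leaf(scores)

for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Each printed row keeps the individual values readable, while the row lengths trace the shape of the distribution: here the rows thin out toward the low scores.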
3. Mean, Median, and Mode: Where Is the Center?
The traditional textbook rule states:
· If the mean > median, the distribution is right-skewed.
· If the mean < median, it is left-skewed.
However, this rule can fail, especially for discrete or multimodal distributions.
Figure 3
Right skewness: mean greater than median
Source: von Hippel, P.T. (2005). Mean, Median, and Skew: Correcting a Textbook Rule. Journal of Statistics Education, 13(2).
Figure 4
Right skewness even when mean is less than median
Source: von Hippel (2005), ibid.
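A small numerical check can make the failure of the rule concrete. The sketch below uses a hypothetical discrete sample (not taken from von Hippel's paper) and uses the sign of the third central moment as the indicator of skew direction.

```python
from statistics import mean, median


def third_central_moment(values):
    """Average cubed deviation from the mean; its sign gives the skew direction."""
    m = mean(values)
    return sum((x - m) ** 3 for x in values) / len(values)


# Hypothetical discrete sample: mostly 1s and 2s plus one large value.
data = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 5]

data_mean = mean(data)
data_median = median(data)
m3 = third_central_moment(data)

print(f"mean = {data_mean:.3f}, median = {data_median}, m3 = {m3:.3f}")
```

In this sample the mean falls below the median, which the textbook rule would read as left skew, yet the third central moment is positive because of the single large value in the right tail.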
4. Measuring Skewness: Pearson’s Coefficients
Since Karl Pearson (1895), statisticians have proposed various ways to quantify skewness.
· Pearson’s First Skewness Coefficient (uses mode): $Sk_1 = \dfrac{\bar{x} - \text{mode}}{s}$
· Pearson’s Second Skewness Coefficient (uses median): $Sk_2 = \dfrac{3(\bar{x} - \text{median})}{s}$
Example
Given a dataset with
mean = 70.5
median = 80
mode = 85
standard deviation = 19.33
calculate both coefficients.
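The example can be worked through in a few lines of Python. This is a minimal sketch assuming the usual definitions: the first coefficient is (mean − mode) divided by the standard deviation, and the second is three times (mean − median) divided by the standard deviation. The function names are illustrative.

```python
def pearson_first(mean, mode, sd):
    """Pearson's first skewness coefficient: (mean - mode) / sd."""
    return (mean - mode) / sd


def pearson_second(mean, median, sd):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


# Summary statistics from the example above.
sk1 = pearson_first(mean=70.5, mode=85, sd=19.33)
sk2 = pearson_second(mean=70.5, median=80, sd=19.33)

print(f"Sk1 = {sk1:.3f}")  # Sk1 = -0.750
print(f"Sk2 = {sk2:.3f}")  # Sk2 = -1.474
```

Both coefficients are negative, which agrees with the ordering mean < median < mode typical of a left-skewed distribution.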
5. Caution with the Mode
Pearson's first coefficient relies on the mode, but the mode is not always reliable.
Example
Set A: 1, 2, 3, 4, 5, 5
The mode (5) does not reflect the center of the distribution.
Set B: 1, 2, 3, 3, 3, 3, 3, 3, 4.
The mode (3) clearly represents central tendency.
Avoid using the mode to measure skewness when it rests on only a few repeated values.
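One informal way to see how much weight the mode carries is to check what share of the observations it accounts for. The helper `mode_share` below is a hypothetical name, and the share is only a rough diagnostic, not a formal criterion.

```python
from collections import Counter


def mode_share(values):
    """Return the most frequent value and the fraction of observations equal to it."""
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


set_a = [1, 2, 3, 4, 5, 5]
set_b = [1, 2, 3, 3, 3, 3, 3, 3, 4]

mode_a, share_a = mode_share(set_a)
mode_b, share_b = mode_share(set_b)

print(f"Set A: mode = {mode_a}, share = {share_a:.2f}")  # Set A: mode = 5, share = 0.33
print(f"Set B: mode = {mode_b}, share = {share_b:.2f}")  # Set B: mode = 3, share = 0.67
```

In Set A the mode is decided by a single repeated value at the edge of the data, whereas in Set B two thirds of the observations sit at the mode.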
6. Interpreting Skewness Coefficients
The coefficient indicates both direction and degree of asymmetry:
· Positive value → right-skewed.
· Negative value → left-skewed.
· Near zero → symmetric.
The farther from zero, the stronger the skewness.
7. Statistical Moments and the Fisher–Pearson Coefficient
Skewness can also be described using moments:
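· Second moment (m2) = variance: $m_2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$
· Third moment (m3) = related to skewness: $m_3 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3$
· Fisher–Pearson coefficient formula: $g_1 = \dfrac{m_3}{m_2^{3/2}}$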
Note: These formulas treat the data as a complete population rather than a sample, so packages such as Excel report an adjusted version instead.
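The moment-based calculation can be sketched as follows, assuming the usual definitions: each central moment is the average of the deviations from the mean raised to the power k (dividing by n), and the Fisher–Pearson coefficient is m3 divided by m2 raised to the power 3/2. The function names and the ten data values are illustrative.

```python
def central_moment(values, k):
    """k-th central moment with an n denominator (population form)."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** k for x in values) / n


def fisher_pearson(values):
    """Unadjusted Fisher-Pearson skewness: m3 / m2 ** 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


data = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]

m2 = central_moment(data, 2)
m3 = central_moment(data, 3)
g1 = fisher_pearson(data)

print(f"m2 = {m2:.3f}, m3 = {m3:.3f}, g1 = {g1:.4f}")  # m2 = 2.010, m3 = 0.864, g1 = 0.3032
```

The positive third moment gives a positive coefficient, indicating mild right skew for these values.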
8. The Adjusted Fisher–Pearson Coefficient (Used in Excel)
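Modern software (e.g., Excel) uses a bias-adjusted version for samples: $G_1 = \dfrac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\dfrac{x_i - \bar{x}}{s}\right)^3$, where $s$ is the sample standard deviation.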
Example
Data: 3, 4, 5, 2, 3, 4, 5, 6, 4, 7.
Skewness = 0.359543. The size of the adjustment decreases as the sample size increases.
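The figure above can be reproduced with the Python standard library. This sketch assumes the adjusted coefficient has the form documented for Excel's SKEW function, n / ((n − 1)(n − 2)) times the sum of cubed standardized deviations, with s the sample standard deviation. The function names are illustrative, and at least three observations are required.

```python
from math import sqrt
from statistics import mean, stdev


def adjusted_skewness(values):
    """Adjusted Fisher-Pearson skewness (the form documented for Excel's SKEW)."""
    n = len(values)
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


def adjustment_factor(n):
    """Ratio of the adjusted to the unadjusted coefficient for sample size n."""
    return sqrt(n * (n - 1)) / (n - 2)


data = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]
skew = adjusted_skewness(data)

print(f"adjusted skewness = {skew:.6f}")  # adjusted skewness = 0.359543
print(f"factor for n = 10:  {adjustment_factor(10):.4f}")
print(f"factor for n = 100: {adjustment_factor(100):.4f}")
```

The factor is about 1.19 for ten observations and about 1.02 for one hundred, so the adjusted and unadjusted coefficients converge as the sample grows.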
9. A Final Note on Symmetry Testing
Calculating a skewness coefficient does not test for symmetry in general. Tests based on skewness instead assume the data come from a specific symmetric population (usually the normal distribution).