Saturday, July 26, 2025

Skewness in Data Distributions: An Introductory Discussion

 


1. What Is Skewness in a Data Distribution?


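Skewness describes how asymmetric a distribution is around its center, judged by the relative length of its two tails:

        ·      If the left tail is longer or more pronounced than the right, the distribution is said to have negative skewness.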

        ·      If the right tail is longer, the distribution has positive skewness.

        ·      Otherwise, the distribution is symmetric.


Textbooks often include histograms to illustrate long tails and how extreme values can shift the mean upward or downward.
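The pull of extreme values on the mean can be checked numerically. The sketch below uses a small hypothetical dataset (the numbers are invented for illustration) and adds one extreme value at either end. The median stays put while the mean moves toward the long tail.

```python
from statistics import mean, median

# Hypothetical, roughly symmetric data (values invented for illustration)
base = [4, 5, 5, 6, 6, 7, 7, 8]
with_high = base + [40]    # one extreme high value -> longer right tail
with_low = base + [-30]    # one extreme low value  -> longer left tail

for label, data in [("base", base), ("with_high", with_high), ("with_low", with_low)]:
    # The mean follows the extreme value; the median barely reacts
    print(f"{label:<10} mean={mean(data):6.2f}  median={median(data):4.1f}")
```

In this toy sample a single extreme value is enough to move the mean well away from the median.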


Figure 1

Histograms showing skewness types (negative, symmetric, positive)



Source: Doane, D. P., & Seward, L. E. (2011). Measuring Skewness: A Forgotten Statistic? Journal of Statistics Education, 19(2). DOI: 10.1080/10691898.2011.11889611

2. Visual Tools for Assessing Skewness


Beyond histograms, several exploratory tools can help evaluate skewness:


      ·      Boxplots highlight dispersion and outliers.

      ·     Dotplots reveal distribution shape and sample size.

      ·     Stem-and-leaf plots preserve individual data points while showing distribution shape.
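Of the tools listed above, the stem-and-leaf plot is the easiest to build without a plotting library. The following is a minimal sketch, assuming non-negative integer data with the tens digit as the stem and the units digit as the leaf. The function name `stem_and_leaf` and the scores are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integers (stem = tens, leaf = units)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    rows = []
    # Include empty stems so that gaps in the data stay visible
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}")
    return rows

# Hypothetical scores with a longer right tail
scores = [12, 15, 21, 23, 24, 28, 31, 33, 35, 36, 38, 39, 41, 42, 44, 57, 68]
for row in stem_and_leaf(scores):
    print(row)
```

Read sideways, the rows form a histogram, yet every original value can still be recovered from its stem and leaf.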


Figure 2

Boxplots and dotplots illustrating skewness and sample size


3. Mean, Median, and Mode: Where Is the Center?


The traditional textbook rule states:


       ·      If the mean > median, the distribution is right-skewed.

       ·      If the mean < median, it is left-skewed.


However, this rule can fail, especially for discrete or multimodal distributions.


Figure 3

Right skewness: mean greater than median

Source: von Hippel, P.T. (2005). Mean, Median, and Skew: Correcting a Textbook Rule. Journal of Statistics Education, 13(2).


Figure 4 

Right skewness even when mean is less than median


Source: von Hippel (2005), ibid.
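A small discrete sample is enough to reproduce the failure of the textbook rule. The sketch below uses hypothetical data (not taken from von Hippel's article) and a moment-based skewness helper, `moment_skewness`, a name introduced here for illustration. The mean falls below the median even though the long tail is on the right.

```python
from statistics import mean, median

def moment_skewness(data):
    """Third central moment divided by the second central moment to the power 3/2."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical discrete data: many 0s and 1s, plus one large value
data = [0] * 4 + [1] * 5 + [4]

print("mean    :", mean(data))
print("median  :", median(data))
print("skewness:", round(moment_skewness(data), 3))
```

Here the skewness is positive (right-skewed) while the mean is smaller than the median, so comparing mean and median alone would point in the wrong direction.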

4. Measuring Skewness: Pearson’s Coefficients


Since Karl Pearson (1895), statisticians have proposed various ways to quantify skewness.


            ·      Pearson’s First Skewness Coefficient (uses the mode): $Sk_1 = (\bar{x} - \text{mode}) / s$

            ·      Pearson’s Second Skewness Coefficient (uses the median): $Sk_2 = 3(\bar{x} - \text{median}) / s$

Example

 

Given a dataset with mean = 70.5, median = 80, mode = 85, and standard deviation = 19.33, calculate both coefficients.
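One way to work through the example is sketched below, using the standard definitions: the first coefficient is (mean − mode) / standard deviation, and the second is 3 × (mean − median) / standard deviation. The function names are hypothetical.

```python
def pearson_first(mean, mode, sd):
    """Pearson's first (mode) skewness coefficient."""
    return (mean - mode) / sd

def pearson_second(mean, median, sd):
    """Pearson's second (median) skewness coefficient."""
    return 3 * (mean - median) / sd

# Summary statistics from the example
mean, median, mode, sd = 70.5, 80, 85, 19.33

sk1 = pearson_first(mean, mode, sd)
sk2 = pearson_second(mean, median, sd)
print(f"first coefficient : {sk1:.3f}")   # -0.750
print(f"second coefficient: {sk2:.3f}")   # -1.474
```

Both values are negative, which agrees with the mean lying below the median and the mode: the distribution is left-skewed.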


5. Caution with the Mode


Pearson's first coefficient relies on the mode, but the mode is not always a reliable indicator of the center.


Example


Set A: 1, 2, 3, 4, 5, 5.

The mode (5) does not reflect the center of the distribution.


Set B: 1, 2, 3, 3, 3, 3, 3, 3, 4.

The mode (3) clearly represents central tendency.


Avoid using the mode to measure skewness when it is based on only a few values.
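A quick check of the two sets makes the contrast concrete. The sketch below reports the mode, the share of observations that sit at the mode, and the mean and median for comparison. The helper name `mode_summary` is hypothetical, and no particular cutoff for "few values" is implied.

```python
from collections import Counter
from statistics import mean, median

def mode_summary(data):
    """Return the mode, the fraction of observations at the mode, the mean, and the median."""
    value, count = Counter(data).most_common(1)[0]
    return {
        "mode": value,
        "share": count / len(data),
        "mean": mean(data),
        "median": median(data),
    }

set_a = [1, 2, 3, 4, 5, 5]
set_b = [1, 2, 3, 3, 3, 3, 3, 3, 4]

for name, data in [("Set A", set_a), ("Set B", set_b)]:
    s = mode_summary(data)
    print(f"{name}: mode={s['mode']} share={s['share']:.2f} "
          f"mean={s['mean']:.2f} median={s['median']}")
```

In Set A only two of six values sit at the mode, and it lies at the edge of the data. In Set B two thirds of the values sit at the mode, and it coincides with the median.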

6. Interpreting Skewness Coefficients


The coefficient indicates both direction and degree of asymmetry:


          ·      Positive value → right-skewed.

          ·      Negative value → left-skewed.

          ·      Near zero → symmetric.


The farther from zero, the stronger the skewness.

7. Statistical Moments and the Fisher–Pearson Coefficient


 Skewness can also be described using moments:


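        ·     Second moment (m2), the variance: $m_2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$

       ·      Third moment (m3), which determines the sign of the skewness: $m_3 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3$

       ·       Fisher–Pearson coefficient formula: $g_1 = m_3 / m_2^{3/2}$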


Note: These formulas describe population parameters, not sample statistics, so modern software rarely uses them as they stand.
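The moment-based calculation can be sketched in a few lines. The code assumes the usual definitions with a 1/n divisor: m2 and m3 are the second and third central moments, and the coefficient is m3 divided by m2 raised to the power 3/2. The function names are hypothetical, and the sample is the ten-value dataset used in the Excel example.

```python
def central_moment(data, k):
    """k-th central moment with a 1/n divisor (population-style)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

def fisher_pearson(data):
    """Moment coefficient of skewness, m3 / m2**1.5 (undefined if all values are equal)."""
    m2 = central_moment(data, 2)
    m3 = central_moment(data, 3)
    return m3 / m2 ** 1.5

sample = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]
print("m2 =", round(central_moment(sample, 2), 4))
print("m3 =", round(central_moment(sample, 3), 4))
print("g1 =", round(fisher_pearson(sample), 4))
```

For this sample the unadjusted coefficient comes out near 0.30, noticeably smaller than the bias-adjusted value that Excel reports for the same data.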

8. The Adjusted Fisher–Pearson Coefficient (Used in Excel)


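Modern software (e.g., Excel's SKEW function) uses a bias-adjusted version for samples: $G_1 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$, where $s$ is the sample standard deviation.
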

Example


Data: 3, 4, 5, 2, 3, 4, 5, 6, 4, 7.

Skewness = 0.359543 (the effect of the bias adjustment decreases as the sample size increases).
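The result can be reproduced outside a spreadsheet. The sketch below assumes the adjusted formula n / ((n − 1)(n − 2)) × Σ((x − mean) / s)³, with s the sample standard deviation (n − 1 divisor). The function name `adjusted_skewness` is hypothetical.

```python
from math import sqrt

def adjusted_skewness(data):
    """Bias-adjusted Fisher–Pearson skewness; needs at least three observations."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(data) / n
    # Sample standard deviation (n - 1 divisor)
    s = sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

sample = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]
print(round(adjusted_skewness(sample), 6))  # 0.359543
```

The factor n / ((n − 1)(n − 2)) is undefined for fewer than three values, which is why the sketch rejects such samples.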

9. A Final Note on Symmetry Testing


Calculating skewness does not test for symmetry in general. Instead, it assumes the data come from a specific symmetric population (usually the normal distribution).


