
# Central tendency and variability

Filed Under (Data Analysis) by Molly on 27-05-2010

The question about interpretation of data arose today.  I was with colleagues and the discussion focused on measures of central tendency and dispersion or variability.  These are important terms and concepts that form the foundation for any data analysis.  It is important to know how they are used.

MEASURES OF CENTRAL TENDENCY

There are five measures of central tendency, that is, numbers that reflect the tendency of data to cluster around the center of the group. Two of these (the geometric mean and the harmonic mean) won't be discussed here because they are not typically used in the work Extension does. The three I'm talking about are:

• Mean, symbolized by x̄ (read x bar)
• Median, symbolized by Md (read M subscript d)
• Mode, symbolized by Mo (read M subscript o)

The mean is the sum of all the numbers divided by how many numbers there are.
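Here is a minimal sketch in Python with a small set of hypothetical scores. Python is my choice here; the post itself mentions SPSS and SAS. The example shows the arithmetic:

```python
# Hypothetical scores, for illustration only
scores = [2, 3, 3, 5, 7]

# Mean: the sum of all the numbers divided by how many numbers there are
mean = sum(scores) / len(scores)
print(mean)  # 4.0  (20 / 5)
```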

The median is the middle number of a sorted list; some folks describe it as the 50% point.
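The following Python sketch also uses hypothetical scores. It relies on the standard-library `statistics.median` function. When there is an even number of scores, this function averages the two middle values, which is one common convention:

```python
import statistics

odd_scores = [7, 2, 5, 3, 3]       # hypothetical data, unsorted
even_scores = [7, 2, 5, 3, 3, 9]   # one more hypothetical score added

# Median: sort the list and take the middle value
print(sorted(odd_scores))             # [2, 3, 3, 5, 7]
print(statistics.median(odd_scores))  # 3

# With an even count there is no single middle number,
# so the two middle values (3 and 5) are averaged
print(sorted(even_scores))             # [2, 3, 3, 5, 7, 9]
print(statistics.median(even_scores))  # 4.0
```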

The mode is the most popular number, the number “voted” most frequently.
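Here is a sketch with hypothetical responses. `statistics.mode` returns the most common value. `statistics.multimode` (Python 3.8 or later) lists every value tied for most common, because a set of data can have more than one mode:

```python
from collections import Counter
import statistics

votes = [2, 3, 3, 5, 7, 3, 5]  # hypothetical responses

# Mode: the value that occurs most often
print(statistics.mode(votes))        # 3
# Tally of every value, most frequent first
print(Counter(votes).most_common())  # [(3, 3), (5, 2), (2, 1), (7, 1)]

# Data can have more than one mode; multimode lists them all
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]
```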

Sometimes all of these measures fall on the same number.
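Take this hypothetical symmetric set of scores with a single peak in the middle. The mean, median, and mode all land on 3, as a short Python sketch shows:

```python
import statistics

# Hypothetical, symmetric data with a single peak in the middle
same = [1, 2, 3, 3, 3, 4, 5]

mean = statistics.mean(same)
median = statistics.median(same)
mode = statistics.mode(same)
print(mean, median, mode)  # all three equal 3
```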

Sometimes all of these measures fall on different numbers.
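By contrast, consider this hypothetical right-skewed set of scores. One large value pulls the mean above the median, and the mode sits lower still:

```python
import statistics

# Hypothetical, right-skewed data: the single large score (11) pulls the mean up
skewed = [1, 2, 2, 3, 4, 5, 11]

mean = statistics.mean(skewed)      # 28 / 7 = 4
median = statistics.median(skewed)  # middle of the sorted list = 3
mode = statistics.mode(skewed)      # most frequent value = 2
print(mean, median, mode)
```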

MEASURES OF VARIABILITY

There are four measures of variability, three of which I want to mention today. The fourth, known as the mean (average) deviation, is seldom used in Extension work. The three I'll cover are:

• Range, symbolized by R
• Variance, symbolized by V (also written s² for a sample and σ² for a population)
• Standard deviation, symbolized by s or SD (for a sample) and σ, the lowercase Greek letter sigma (for a population)

The range is the difference between the largest and the smallest number in the sample.

In this example, the blue distribution (Distribution A) has a larger range than the red distribution (Distribution B).
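Here is a minimal Python sketch of the range calculation. The two lists are hypothetical stand-ins for Distribution A and Distribution B, not the data from the original figure. I named the helper `data_range` myself so that it does not shadow Python's built-in `range`:

```python
# Hypothetical stand-ins for Distribution A and Distribution B
dist_a = [1, 4, 6, 9, 15]
dist_b = [5, 6, 7, 8, 9]

def data_range(values):
    """Range: the largest value minus the smallest value."""
    return max(values) - min(values)

print(data_range(dist_a))  # 14  (15 - 1)
print(data_range(dist_b))  # 4   (9 - 5)
```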

Variance is more technical. It is the sum of the squared deviations (differences from the mean) divided by the number of scores minus 1 (n − 1). Dividing by n − 1 instead of n removes the bias that would otherwise make a sample variance underestimate the population variance. The result is a slightly larger, more conservative estimate, and being more conservative reduces possible error.
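In standard notation, the sample variance described above can be written as a formula. Here x_i are the individual scores, x̄ is the sample mean, and n is the number of scores. For comparison, the population version uses the population mean μ and divides by N, the number of scores in the population:

$$
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
\qquad\qquad
\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}
$$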

There is a mathematical formula for computing the variance.  Fortunately, a computer software program like SPSS or SAS will do it for you.
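To illustrate what such software does, here is a sketch using Python's standard-library `statistics` module. This is my own substitution; SPSS and SAS have their own procedures but compute the same quantity. The data are hypothetical. The step-by-step version follows the definition above, `statistics.variance` divides by n − 1, and `statistics.pvariance` divides by n:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data; the mean is 5

# Step by step: squared deviations from the mean, summed, divided by n - 1
mean = sum(scores) / len(scores)
sum_of_squares = sum((x - mean) ** 2 for x in scores)
sample_variance = sum_of_squares / (len(scores) - 1)

print(sum_of_squares)                # 32.0
print(sample_variance)               # about 4.571 (32 / 7)
print(statistics.variance(scores))   # same value, sample formula (n - 1)
print(statistics.pvariance(scores))  # 4, population formula (divides by n)
```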

The standard deviation is the square root of the variance. It gives us an indication of “…how much each score in a set of scores, on average, varies from the mean” (Salkind, 2004, p. 41). Again, there is a mathematical formula, and a software package will compute it for you. Most people are familiar with the mean and standard deviation of IQ scores: most IQ tests are scaled to a mean of 100 and a standard deviation of 15.
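The short Python sketch below uses hypothetical data. It confirms that the standard deviation is the square root of the variance. `statistics.stdev` uses the n − 1 sample formula, and `statistics.pstdev` uses the population formula:

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

# Sample standard deviation: square root of the (n - 1) variance
sd = math.sqrt(statistics.variance(scores))
print(round(sd, 2))              # 2.14
print(statistics.stdev(scores))  # same value as sd (up to floating-point rounding)

# Population standard deviation: square root of the variance that divides by n
print(statistics.pstdev(scores))  # 2.0
```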

Convention has it that lowercase Greek letters are used for population parameters and Roman letters for the corresponding sample estimates. So for a population you would see σ (lowercase sigma) for the standard deviation and μ (lowercase mu) for the mean. For a sample you would see s (or SD) for the standard deviation and x̄ (x bar) for the mean.

These statistics relate to the measurement scale you have chosen to use. Permissible statistics are:

• Nominal scale: frequency and mode
• Ordinal scale: median and percentiles
• Interval scale: mean, variance, standard deviation, and Pearson correlation
• Ratio scale: the geometric mean

Each scale also permits the statistics listed for the scales before it. Responses to a Likert-type scale are ordered categories (ordinal), so think seriously before reporting a mean for your Likert-type scale. What exactly does that tell you?
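Here is a sketch of ordinal-friendly summaries for a single Likert-type item. It uses hypothetical responses coded 1 (strongly disagree) to 5 (strongly agree) and reports frequencies, the mode, and the median instead of a mean:

```python
import statistics
from collections import Counter

# Hypothetical responses to one Likert-type item:
# 1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree, 5 = strongly agree
responses = [5, 4, 4, 2, 5, 3, 4, 1, 4, 5]

# Frequency of each response category, in category order
print(sorted(Counter(responses).items()))  # [(1, 1), (2, 1), (3, 1), (4, 4), (5, 3)]
# Most frequent category
print(statistics.mode(responses))          # 4
# Lower of the two middle values, so the result is always an actual category
print(statistics.median_low(responses))    # 4
```

I chose `statistics.median_low` deliberately. With an even number of responses, `statistics.median` averages the two middle values and could return something like 3.5, which is not one of the response options.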
