Evaluation is an Everyday Activity: May 2010

A question about the interpretation of data arose today. I was with colleagues, and the discussion focused on measures of central tendency and measures of dispersion, or variability. These terms and concepts form the foundation for any data analysis, so it is important to know how they are used.

MEASURES OF CENTRAL TENDENCY

There are five measures of central tendency, that is, measures that reflect the tendency of data to cluster around the center of a group. Two of these (the geometric mean and the harmonic mean) won’t be discussed here, as they are not typically used in the work Extension does. The three I’m talking about are:

Mean, symbolized by X̄ (read “bar X”). NOTE: X̄ refers to the arithmetic mean of a SAMPLE (see last week’s blog entry).

Median, symbolized by Md (read “M subscript d”).

Mode, symbolized by Mo (read “M subscript o”).

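The mean is the sum of all the numbers divided by how many numbers there are. For example, the mean of 2, 4, 6, and 8 is 20 ÷ 4 = 5.

The median is the middle number of a sorted list (some folks use the 50% point). For example, 7, 1, 5, 3, 9 sorts to 1, 3, 5, 7, 9, so the median is 5.

The mode is the most popular number, the number “voted” most frequently. For example, in 2, 3, 3, 5, 3, 7 the mode is 3.

Sometimes all of these measures fall on the same number: in 1, 2, 2, 3, 3, 3, 4, 4, 5, the mean, median, and mode are all 3.

Sometimes they fall on different numbers: in 1, 2, 2, 3, 4, 7, 9, the mean is 4, the median is 3, and the mode is 2.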
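The three calculations can also be sketched in a few lines of Python. This is a minimal illustration, not how SPSS or SAS computes them. The function names are hypothetical, and the two lists are made-up scores, chosen so that the three measures coincide in one set and differ in the other. The median function also handles an even number of scores by averaging the two middle values. The mode function returns a list because a data set can have more than one mode.

```python
from collections import Counter


def mean(scores):
    """Arithmetic mean: the sum of the scores divided by how many there are."""
    return sum(scores) / len(scores)


def median(scores):
    """Middle value of the sorted scores; with an even count, average the two middle values."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(scores):
    """Most frequent value(s), returned as a sorted list because there may be ties."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


# Hypothetical scores, made up for illustration only.
same_center = [1, 2, 2, 3, 3, 3, 4, 4, 5]
different_centers = [1, 2, 2, 3, 4, 7, 9]

print(mean(same_center), median(same_center), modes(same_center))                    # 3.0 3 [3]
print(mean(different_centers), median(different_centers), modes(different_centers))  # 4.0 3 [2]
```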

MEASURES OF VARIABILITY

There are four measures of variability, three of which I want to mention today. The fourth, known as the mean (average) deviation, is seldom used in Extension work. The three are:

Range, symbolized by R.

Variance, symbolized by V (also commonly written s² for a sample and σ² for a population).

Standard deviation, symbolized by s or SD for a sample, and by σ, the lowercase Greek letter sigma, for a population.

The range is the difference between the largest and the smallest number in the sample. In the example comparing two distributions, the blue distribution (Distribution A) has a larger range than the red distribution (Distribution B).
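The range is a one-line calculation. This sketch uses two made-up lists as stand-ins for Distributions A and B. The numbers are hypothetical and are not read from the original figure. The function is called `value_range` so that it does not shadow Python’s built-in `range`.

```python
def value_range(scores):
    """Range: the largest score minus the smallest score."""
    return max(scores) - min(scores)


# Hypothetical stand-ins for the two distributions; not the figure's actual values.
distribution_a = [2, 5, 9, 12, 20]   # more spread out
distribution_b = [8, 9, 10, 11, 12]  # tightly clustered

print(value_range(distribution_a))  # 18
print(value_range(distribution_b))  # 4
```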

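Variance is more technical. It is the sum of the squared deviations about the mean (each score’s difference from the mean, squared), divided by the number of scores minus 1. Dividing by n − 1 instead of n removes the bias that would otherwise make a sample’s variance underestimate the population’s variance. The result is a slightly larger, more conservative estimate, and being more conservative reduces possible error.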

There is a mathematical formula for computing the variance. Fortunately, a computer software program like SPSS or SAS will do it for you.
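For reference, these are the standard textbook formulas. The first line is for a sample and the second for a population. X_i is an individual score, n and N are the number of scores, X̄ is the sample mean, and μ is the population mean. The sample variance (V in the list above) is written s² here, which is the more common notation.

```latex
s^2 = \frac{\sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2}{n - 1}, \qquad s = \sqrt{s^2}

\sigma^2 = \frac{\sum_{i=1}^{N} \left( X_i - \mu \right)^2}{N}, \qquad \sigma = \sqrt{\sigma^2}
```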

The standard deviation is the square root of the variance. It gives us an indication of “…how much each score in a set of scores, on average, varies from the mean” (Salkind, 2004, p. 41). Again, there is a mathematical formula, and a software package will compute it. Most people are familiar with the mean and standard deviation of IQ scores: mean = 100 and SD = 15.
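Software does this computing for you, but a short Python sketch shows what it is doing. The result is checked against Python’s standard `statistics` module. Its `statistics.variance` and `statistics.stdev` divide by n − 1, and `statistics.pvariance` divides by N. The scores are hypothetical, and the helper function names are illustrative.

```python
import math
import statistics


def sample_variance(scores):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / (n - 1)


def sample_sd(scores):
    """Standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(scores))


# Hypothetical scores, made up for illustration (their mean is 5).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(sample_variance(scores))  # 32 / 7, about 4.57
print(sample_sd(scores))        # about 2.14

# The standard library uses the same n - 1 definition for samples ...
print(statistics.variance(scores), statistics.stdev(scores))
# ... while the population version divides by N and gives a smaller value (32 / 8 = 4).
print(statistics.pvariance(scores))
```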

Convention has it that lowercase Greek letters are used for parameters of populations and Roman letters for the corresponding estimates from samples. So you would see σ (lowercase sigma) for the standard deviation and μ (lowercase mu) for the mean of a population, and s (or SD) and X̄ for a sample.

These statistics relate to the measurement scale you have chosen to use. Permissible statistics for a nominal scale are frequency and mode; for an ordinal scale, median and percentiles; for an interval scale, mean, variance, standard deviation, and Pearson correlation; and for a ratio scale, the geometric mean. Each scale also permits the statistics of the less precise scales before it. So think carefully before reporting a mean for your Likert-type scale. What exactly does that tell you?
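The rule can be written as a small lookup. This sketch assumes that each scale inherits the statistics of every less precise scale, which follows from the later point that a more precise scale contains all the qualities of less precise ones. The names `SCALES`, `NEW_AT_LEVEL`, and `permissible_statistics` are hypothetical. The statistic lists come straight from the paragraph above.

```python
# Scales ordered from least to most precise, each paired with the statistics
# that become permissible at that level (as listed in the text).
SCALES = ["nominal", "ordinal", "interval", "ratio"]
NEW_AT_LEVEL = {
    "nominal": ["frequency", "mode"],
    "ordinal": ["median", "percentiles"],
    "interval": ["mean", "variance", "standard deviation", "Pearson correlation"],
    "ratio": ["geometric mean"],
}


def permissible_statistics(scale):
    """Statistics allowed for `scale`: its own plus those of every less precise scale."""
    level = SCALES.index(scale)  # raises ValueError for an unknown scale name
    allowed = []
    for lower in SCALES[: level + 1]:
        allowed.extend(NEW_AT_LEVEL[lower])
    return allowed


print(permissible_statistics("ordinal"))            # ['frequency', 'mode', 'median', 'percentiles']
print("mean" in permissible_statistics("ordinal"))  # False
```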

Having addressed the question about which measurement scale was used (“Statistics, not the dragon you think”), I want to talk about how many groups are being included in the evaluation and how those groups are determined.

The first part of that question is easy: there will be one, two, or more than two groups. Most of what Extension does results in one group, often an intact group. An intact group is called a population and consists of all the participants in the program. The number of program participants can be very large or very small.

The Tree School program is an example that has resulted in a very large number of participants (hundreds). It has been in existence for about 20 years, and contacting all of these participants would be inefficient. On the other hand, the 4-H science teacher training program involved a small number of participants (about 75) and has been in existence for 5 years. Contacting all of its participants would be efficient.

With a large population, choosing part of the group is the best approach. The part chosen is called a sample. Identifying that part starts with the contact list of participants. This contact list is called the sampling frame, and it is the basis for determining the sample.

Identifying who will be included in the evaluation is called a sampling plan or a sampling approach. There are two types of sampling approaches: probability sampling and nonprobability sampling. In probability sampling, every member of the population has a known, nonzero chance of being selected, which is what allows the sample to represent the population from which it is drawn. Nonprobability sampling methods select participants on some basis other than chance, such as characteristics of the population.

Including all participants works well for a population of fewer than 100 participants. If there are more than 100 participants, choosing a subset of the sampling frame will be more efficient and effective. There are several ways to select a sample and reduce the population to a manageable number of participants. Probability sampling approaches include:

simple random sampling
stratified random sampling
systematic sampling
cluster sampling
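The four probability approaches can be sketched with Python’s standard `random` module. Everything here is hypothetical: the 300-person sampling frame, the region labels used as strata, the 20-person workshop sessions used as clusters, and the function names. Systematic sampling as written assumes that the order of the contact list is unrelated to what is being measured.

```python
import random


def simple_random_sample(frame, n, rng):
    """Every participant has the same chance of being chosen."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Choose a random start, then take every k-th participant on the list."""
    k = len(frame) // n       # sampling interval
    start = rng.randrange(k)  # random starting point within the first interval
    return frame[start::k][:n]


def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Split the frame into strata, then draw a simple random sample from each one."""
    strata = {}
    for person in frame:
        strata.setdefault(stratum_of(person), []).append(person)
    return {name: rng.sample(members, min(n_per_stratum, len(members)))
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole groups (clusters) and include everyone in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]


# Hypothetical sampling frame: 300 participant IDs, each tagged with a region.
frame = [(f"P{i:03d}", ["North", "Central", "South"][i % 3]) for i in range(300)]
# Hypothetical clusters: 15 workshop sessions of 20 participants each.
sessions = {f"session_{j:02d}": frame[j * 20:(j + 1) * 20] for j in range(15)}

rng = random.Random(2010)  # fixed seed so the same draw can be repeated

srs = simple_random_sample(frame, 30, rng)
sys_pick = systematic_sample(frame, 30, rng)
by_region = stratified_sample(frame, lambda person: person[1], 10, rng)
clustered = cluster_sample(sessions, 3, rng)

print(len(srs), len(sys_pick), len(clustered))  # 30 30 60
print({region: len(people) for region, people in by_region.items()})
```

Seeding the generator makes the draw reproducible. That helps when someone later asks exactly how the sample was chosen.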

Nonprobability sampling approaches include:

convenience sampling
snowball sampling
quota sampling
focus groups

More on these sampling approaches later.

I had a conversation today about how to measure whether I am making a difference in what I do. Although the conversation was about working with differences, I am conscious that the work I do, and the work of working with differences, transcends most disciplines and positions. How does it relate to evaluation?

Perspective and voice.

These are two sides of the same coin. Individuals come to evaluation with a history or perspective. Individuals voice their view in the development of evaluation plans. If individuals are not invited and/or do not come to the table for the discussion, a voice is missing.

As this conversation went on, the message was that voice and perspective are more important in evaluations that employ a qualitative approach than in those that employ a quantitative approach. Yes, and no.

Certainly, words have perspective and provide a vehicle for voice, and words are the basis for qualitative methods. So this is the “Yes”. Is this still an issue when the target audience is homogeneous? Is it still an issue when the evaluator is “different” from the target audience on some criteria? Or, as one mental health worker once stated, only an addict can provide effective therapy to another addict. Is that really the case? Or do voice and perspective always overlay an evaluation?

Let’s look at quantitative methods. Some would argue that numbers aren’t affected by perspective and voice. I will argue that the basis for these numbers is words. If words are turned into numbers, are voice and perspective still an issue? This is the “Yes and no”.

I am reminded of the story of a brook and a Native American child. The standardized test asked which of the following is similar to a brook. The possible responses were (for the sake of this conversation) river, meadow, lake, and inlet. The Native American child, growing up in the desert Southwest, had never heard the word “brook” and consequently got the item wrong. This was one of many questions where perspective affected the response. The wrong answers were totaled and subtracted from the possible total, and a score (a number) resulted. That individual number was grouped with other individual numbers and compared to numbers from another group using a statistical test (for the sake of conversation, a t-test). Is the resulting test of significance valid? I would say not. So this is the “No”. Here, voice and perspective have been obfuscated.

The statistically significant difference between those groups is clear according to the computation; clear, that is, until one looks at the words behind the numbers. It is in the words behind the numbers that perspective and voice affect the outcomes.

Statistics is not the dragon you think it is.

For many people, the field of statistics is a dragon in disguise, and, as with dragons, most people shy away from it.

I have found Neil Salkind’s book “Statistics for People Who (Think They) Hate Statistics” to be a good reference for understanding the basics of statistics. The 4th edition is due out in September 2010. This book isn’t intimidating: it is easy to understand, it isn’t heavy on math or formulas, and it has a lot of tips. I’m using it for this column, and I keep it on my desk along with Dillman.

Faculty who come to me with questions about analyzing their data typically want to know how to determine statistical significance. But before I can talk to faculty about statistical significance, there are a few questions that need to be answered.

What type of measurement scale have you used?
How many groups do you have on which you have data?
How many variables do you have for those groups?
Are you examining relationships or differences?
What question(s) do you want to answer?

Most people immediately jump to which test to use. Don’t go there. Start with the measurement scale you have, then answer the other questions.

So let’s talk about scales of measurement. Not all data are created equal. Some data are easier to analyze than other data, and the scale of measurement makes that difference.

There are four scales of measurement, and most data fall into one of the four. Data are either categorical (even if they have been converted to numbers) or numerical (originally numbers). The four scales are:

nominal
ordinal
interval
ratio

Scales of measurement are rules determining the particular levels at which outcomes are measured. When you decide how a question will be answered, you are deciding on the scale of measurement; you are agreeing to the particular set of characteristics that come with that scale.

Nominal scales name something. For example, gender is male, female, or unknown/not stated; ethnicity is one of several names of groups. When you gather demographic data, such as gender, ethnicity, or race, you are employing a nominal scale. The data that result from nominal scales are categorical data, that is, data from categories that are mutually exclusive. The respondent is either male or female, not both.
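For nominal data, the permissible summaries are frequencies and the mode, and Python’s `collections.Counter` gives both. The responses below are hypothetical. Note that `most_common(1)` returns a single category even when two categories tie, so check the full counts before reporting a mode.

```python
from collections import Counter

# Hypothetical answers to a nominal demographic question.
responses = ["female", "male", "female", "not stated", "female", "male"]

counts = Counter(responses)  # frequency of each category
print(counts)

# Mode: the most frequent category (ties are not flagged by most_common).
mode, frequency = counts.most_common(1)[0]
print(mode, frequency)  # female 3
```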

An ordinal scale orders something; it puts the thing being measured in order, high to low or low to high. Salkind gives the example of ranking candidates for a job. Extension professionals (and many, if not most, survey professionals) use ordinal scales in surveys (strongly agree to strongly disagree; don’t like to like a lot). We do not know how much difference there is between “don’t like” and “like a lot.” The data that result from ordinal scales are categorical data.
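For ordinal data such as a Likert-type item, the median is a safer summary than the mean. In this hypothetical sketch, five response categories are coded 1 through 5. The category labels, including the neutral midpoint, are assumptions. `statistics.median_low` always returns an observed code, so its result maps back to a real category. The mean of the codes can land between categories, and it quietly assumes that the categories are equally spaced.

```python
import statistics

# Hypothetical 5-point agreement scale, coded in order (labels are assumptions).
LABELS = {1: "strongly disagree", 2: "disagree", 3: "neutral",
          4: "agree", 5: "strongly agree"}
responses = [1, 2, 4, 4, 5, 5, 5, 2]  # made-up answers

# median_low returns one of the observed codes, so it maps back to a label.
middle = statistics.median_low(responses)
print(middle, LABELS[middle])  # 4 agree

# The mean of the codes falls between two categories.
print(statistics.mean(responses))  # 3.5
```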

An interval scale is based on a continuum with equally spaced intervals along it. Think of a thermometer or a test score. We know that the intervals along the scale are equal to one another. The data that result from interval scales are numerical data.

A ratio scale is a scale with an absolute zero, a point where the characteristic of interest is absent, like zero weight, zero light, or no molecular movement. This rarely happens in social or behavioral science, the work that most Extension professionals do. The data that result from ratio scales are numerical data.
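The thermometer example shows the difference between interval and ratio scales. This sketch uses the standard unit-conversion formulas. A temperature difference carries over consistently when units change, but a ratio does not, because the zero points of Celsius and Fahrenheit are arbitrary. Kelvin, whose zero is the point of no molecular movement, is a ratio scale. The function names are hypothetical.

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit (an interval scale with an arbitrary zero)."""
    return celsius * 9 / 5 + 32


def c_to_k(celsius):
    """Convert Celsius to Kelvin, a ratio scale whose zero means no molecular movement."""
    return celsius + 273.15


warm, cool = 20.0, 10.0

# Intervals are meaningful: a 10-degree Celsius gap is always an 18-degree Fahrenheit gap.
print(warm - cool, c_to_f(warm) - c_to_f(cool))  # 10.0 18.0

# Ratios are not: "twice as warm" in Celsius is not twice as warm in Fahrenheit.
print(warm / cool, c_to_f(warm) / c_to_f(cool))

# On the Kelvin scale the ratio describes the actual amount of the characteristic.
print(c_to_k(warm) / c_to_k(cool))
```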

Why do we care?

Scales are ordered from the least precise (nominal) to the most precise (ratio).

The scale used determines the detail provided by the data collected; more precision, more information.
A more precise scale contains all the qualities of the less precise scales (interval has the qualities of ordinal and nominal).

Using an inappropriate scale will invalidate your data and provide you with spurious outcomes which yield spurious impacts.
