June « 2010 « Evaluation is an Everyday Activity

Last week, I talked about formative and summative evaluations. Those two roles can help you prioritize which evaluations you do when. I was then reminded of another way of viewing evaluation that might also be useful for setting priorities.

When I first started in this work, I realized that I could view evaluation in three parts–process, progress, product. Each part could be conducted or the total approach could be used. This approach provides insights to different aspects of a program. It can also provide a picture of the whole program. Deciding on which part to focus is another way to prioritize an evaluation.

Process evaluation captures the HOW of a program. Process evaluation has been defined as the evaluation that assesses the delivery of the program (Scheirer, 1994). Process evaluation identifies what the program is and if it is delivered as intended both to the “right audience” and in the “right amount”. The following questions (according to Scheirer) can guide a process evaluation:

Why is the program expected to produce its results?
For what types of people may it be effective?
In what circumstances may it be effective?
What are the day-to-day aspects of program delivery?

Progress evaluation captures the FIDELITY of a program–that is, did the program do what the planners said would be done in the time allotted? Progress evaluation has been very useful when I have grant activities and need to be accountable for the timeline.

Product evaluation captures a measure of the program’s products or OUTCOMES. Sometimes outputs are also captured and this is fine. Just keep in mind that although outputs may be (and often are) necessary, they are not sufficient for demonstrating the impact of the program. A product evaluation is often summative. However, it can also be formative, especially if the program planners want to gather information to improve the program rather than to determine the ultimate effectiveness of the program.

This framework may be useful in helping Extension professionals decide what to evaluate and when. It may help determine which program needs a process, progress, or product evaluation. Trying to evaluate all of your programs at once often defeats being purposeful in your evaluation efforts and often leads to results that are confusing, invalid, and/or useless. It makes sense to choose carefully what evaluation to do when–that is, prioritize.

A question was raised in a meeting this week about evaluation priorities and how to determine them. This reminded me that perhaps a discussion of formative and summative was needed as knowing about these roles of evaluation will help you answer your questions about priorities.

Michael Scriven coined the terms formative and summative evaluation in the late 1960s. Applying these terms to the role evaluation plays in a program has been and continues to be a useful distinction for investigators. Simply put, formative evaluation provides information for program improvement. Summative evaluation provides information to assist decision makers in making judgments about a program, typically for adoption, continuation, or expansion. Both are important.

When Extension professionals evaluate a program at the end of a training or other program, typically, they are gathering information for program improvement. The data gathered after a program are for use by the program designers to help improve it. Sometimes, Extension professionals gather outcome data at the end of a training or other program. Here, information is gathered to help determine the effectiveness of the program. These data are typically short term outcome data, and although they are impact data of a sort, they do not reflect the long term effectiveness of a program. These data gathered to determine outcomes are summative. In many cases, formative and summative data are gathered at the same time.

Summative data are also gathered to reflect the intermediate and long term outcomes. As Ellen Taylor-Powell points out when she talks about logic models, impacts are the social, economic, civic, and/or environmental consequences of a program and tend to be longer term. I find calling these outcomes condition changes helps me keep in mind that they are the consequences or impacts of a program and are gathered using a summative form of evaluation.

So how do you know which to use when? Ask yourself the following questions:

What is the purpose of the evaluation? Do you want to know if the program works or if the participants were satisfied?
What are the circumstances surrounding the program? Is the program in its early development or late development? Are the politics surrounding the program challenging?
What resources are available for the evaluation? Do you have a lot of time or only a few weeks? Do you have access to people to help you or are you on your own?
What accountability is required? Do you have to report about the effectiveness of a program or do you just have to offer it?
What knowledge generation is expected or desired? Do you need to generate scholarship or support for promotion and tenure?

Think of the answers to these questions as a decision tree that will help you prioritize your evaluation. Those answers will help you decide if you are going to conduct a formative evaluation, a summative evaluation, or include components of both in your evaluation.

Last week, I talked briefly about what test to use to analyze your data. Most of the evaluation work conducted by Extension professionals results in one group, often an intact group or population. Today I want to talk about what you can do with those data.

One of the first things you can do is to run frequencies and percentages on these data. In fact, I recommend you compute them as the first analyses you run. Most software programs (SPSS, SAS, Excel, etc.) will do this for you. When you run frequencies in SPSS, the computer returns an output that looks something like the first image:

When you compute frequencies in SAS, the resulting output looks like the second image:

Both images report frequencies, percentages of those frequencies, and cumulative percentage (that is, it adds the percents of frequency A to the percent of frequency B, etc. until 100% is reached).
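If you do not have SPSS or SAS at hand, the same three columns can be computed in a few lines of code. The following is a minimal sketch in Python, not the output of either package; the function name `frequency_table` and the sample answers are made up for illustration.

```python
from collections import Counter


def frequency_table(responses):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    counts = Counter(responses)
    total = len(responses)
    rows = []
    running_count = 0
    for value in sorted(counts):
        running_count += counts[value]
        percent = round(100 * counts[value] / total, 1)
        # Cumulative percent is based on the running count, so it ends at 100.
        cumulative_percent = round(100 * running_count / total, 1)
        rows.append((value, counts[value], percent, cumulative_percent))
    return rows


# Hypothetical answers to one questionnaire item (1 = disagree ... 4 = agree).
answers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
table = frequency_table(answers)

print(f"{'Value':>5} {'Freq':>5} {'Pct':>7} {'Cum pct':>8}")
for value, count, percent, cumulative_percent in table:
    print(f"{value:>5} {count:>5} {percent:>7.1f} {cumulative_percent:>8.1f}")
```

The last row of the cumulative column should always read 100; if it does not, some responses were dropped or miscoded.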

To compute frequencies in Excel, read here. Excel has a number of COUNT functions depending on what you want to know.

Once you have computed frequencies and percentages, most people want to know if change occurred. Although there are other analyses which can be performed (reliability, validity, correlation, prediction), all of these require that you know what type of data you have:

nominal–whether people’s answers named something (e.g., gender; marital status);
ordinal–whether people ordered their responses on how strongly they agreed (e.g., agree or disagree);
interval–the scores on a standardized scale (e.g., temperature or nutrition test).

If you have nominal data and you want to compute change, you need to know how many times participants are answering the questionnaire and how many categories you have in your questions (e.g., pre/post; yes/no). If you are giving the questionnaire twice and there are two categories for some of your questions, you can compute the McNemar change test. The McNemar change test is a non-parametric test (meaning that it does not assume the data follow a particular distribution, such as the normal distribution) that is applied to a 2×2 contingency table. It tests for changes in responses using the chi-square distribution and is useful for detecting changes in responses in “before-and-after” designs. A 2×2 contingency table has two columns and two rows. The frequencies from the nominal data are in the cells where the rows and columns cross; the totals for rows and columns are in the margins (or the last row and the far right column). SPSS computes the following statistics when a cross tabs test is run–Pearson’s Chi Square, Continuity Correction, Likelihood Ratio, Fisher’s Exact Test, and Linear by Linear Association. A McNemar test can be specified.
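To make the McNemar change test concrete, here is a minimal sketch in Python that uses only the standard library. It assumes the common continuity-corrected form of the statistic, which depends only on the two cells of the 2×2 table where people changed their answer. The function name `mcnemar_test` and the counts are hypothetical. With very few changers, an exact (binomial) version of the test is usually preferred.

```python
import math


def mcnemar_test(b, c):
    """McNemar chi-square (with continuity correction) and its p-value.

    b = number of people who answered yes before and no after
    c = number of people who answered no before and yes after
    People who gave the same answer both times do not enter the statistic.
    """
    if b + c == 0:
        return 0.0, 1.0  # nobody changed, so there is no evidence of change
    chi_square = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical pre/post table:
#              post yes   post no
#   pre yes       20          5
#   pre no        15         10
chi_square, p_value = mcnemar_test(b=5, c=15)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.3f}")
```

In this made-up table, 15 people moved from no to yes and only 5 moved the other way, so the test is asking whether that imbalance is larger than chance alone would produce.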

This will be very brief.

The answer to the question, “What test do I use?” is, “It all depends.”

If you have one group you can do the following:

check the reliability.
check the validity.
look at relationships between variables.
predict something from other variables.
look at change across time.
look at scores on one variable measured under different conditions (within group difference).

If you have two groups you can do the following:

compare the two groups on one variable (between group difference).
look at change across time between the two groups.
compare two groups on one variable under different conditions (within group difference).

If you have more than two groups, it gets more complicated and I’ll talk about that another day. Most Extension work doesn’t have more than two groups.

So you can see, it all depends. More later.

There are four ways distributions can be compared. Two were mentioned last week–central tendency and dispersion. Central tendency talks about the average value. Dispersion reflects the distribution’s variability.

The other two are skewness and kurtosis.

Skew (or skewness) is a measure of lack of symmetry. Skew occurs when one end (or tail) of the distribution sticks out farther than the other. Like this:

In this picture, the top image is that of positive skew and the bottom picture is negative skew.

Skew can happen when data are clustered at one end of a distribution, as in a test that is too easy or too hard. When the mean is a larger number (i.e., greater) than the median, the distribution is positively skewed. When the median is a larger number (i.e., greater) than the mean, the distribution is negatively skewed.
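The mean-versus-median comparison is a quick rule of thumb that you can check against a computed skewness value. The sketch below, in Python, assumes the usual moment-based definition of skewness (the average cubed standardized score); the function name `skewness` and both sets of scores are invented for illustration.

```python
import statistics


def skewness(values):
    """Moment-based skewness: the mean of the cubed standardized scores."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return sum(((x - mean) / sd) ** 3 for x in values) / len(values)


# Hypothetical scores on a test that was too easy: most scores are high,
# with one low score pulling the tail to the left.
easy_scores = [40, 70, 80, 85, 90, 90, 95, 95, 100]

# Hypothetical scores on a test that was too hard: most scores are low,
# with one high score pulling the tail to the right.
hard_scores = [0, 5, 5, 10, 10, 15, 20, 30, 60]

for label, scores in [("easy test", easy_scores), ("hard test", hard_scores)]:
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    print(f"{label}: mean = {mean:.1f}, median = {median}, skew = {skewness(scores):.2f}")
```

For the easy test the mean falls below the median and the skewness is negative; for the hard test the mean is above the median and the skewness is positive.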

The other characteristic of a distribution is kurtosis.

Kurtosis refers to the overall shape of the distribution relative to its peak. Distributions can be relatively flat, or platykurtic, or they can be relatively peaked, or leptokurtic. This drawing provides a mnemonic to remember those terms:

A normal distribution, the bell curve, is called mesokurtic.
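Kurtosis can also be put into a number. A minimal sketch in Python follows, assuming the common "excess kurtosis" convention in which a normal distribution scores about zero, flatter (platykurtic) distributions score below zero, and more peaked (leptokurtic) distributions score above zero. The function name `excess_kurtosis` and the two data sets are hypothetical.

```python
import statistics


def excess_kurtosis(values):
    """Mean of the fourth power of the standardized scores, minus 3."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    fourth_moment = sum(((x - mean) / sd) ** 4 for x in values) / len(values)
    return fourth_moment - 3


# Hypothetical flat data: every score from 1 to 10 occurs once.
flat_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Hypothetical peaked data: almost everyone scores 5, with two extreme scores.
peaked_scores = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]

print(f"flat:   {excess_kurtosis(flat_scores):.2f}")
print(f"peaked: {excess_kurtosis(peaked_scores):.2f}")
```

Some software reports kurtosis without subtracting 3, in which case a normal distribution scores about 3 rather than 0, so check which convention your program uses before interpreting the sign.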

These terms are used to describe a distribution in a report or presentation. When all four characteristics of a distribution are described (central tendency, dispersion, skew, and kurtosis), the reader has a clear picture of the data. From that point, frequencies and percentages can be reported. Then statistical tests can be performed and reported as well.
