I’ve been writing for almost a year now, 50-some columns. This week, before the Thanksgiving holiday, I want to share the evaluation resources I’ve found useful and for which I am thankful. There are probably others I’m not familiar with, but these are the ones I’m grateful for.


My colleagues Ellen Taylor-Powell at UWEX (University of Wisconsin Extension Service) and Nancy Ellen Kiernan at Penn State Extension Service both have resources that are very useful, easily accessed, and clearly written. Ellen’s can be found at the Quick Tips site and Nancy Ellen’s at her Tipsheets index. Both Ellen and Nancy Ellen have other links that may be useful as well. Access their sites through the links above.

Last week, I mentioned the American Evaluation Association. One of the important structures in AEA is the Topical Interest Groups (TIGs). Extension has a TIG called Extension Education Evaluation, which helps organize Extension professionals who are interested or involved in evaluation. The AEA website has a wealth of information about the evaluation profession, access to the AEA elibrary, and links to AEA on Facebook, Twitter, and LinkedIn. You do NOT have to be a member to subscribe to the blog AEA365, which, as the name suggests, is posted daily by different evaluators. Susan Kistler, AEA’s executive director, posts every Saturday. The November 20 post talks about the elibrary; check it out.

Many states and regions have local AEA affiliates. For example, OPEN (Oregon Program Evaluators Network) serves southern Washington and Oregon. It has an all-volunteer staff who live mostly in Portland and Vancouver, WA. The AEA site lists over 20 affiliates across the country, many with their own websites, which have information about connecting with local evaluators.

In addition to these valuable resources, National eXtension (say e-eXtension) has developed a community of practice devoted to evaluation. Mike Lambur, eXtension Evaluation and Research Leader, can be reached at mike.lambur@extension.org. According to the website, National eXtension “…is an interactive learning environment delivering the best, most researched knowledge from the smartest land-grant university minds across America. eXtension connects knowledge consumers with knowledge providers—experts like you who know their subject matter inside out.”

Happy Thanksgiving.  Be safe.

Recently, I attended the American Evaluation Association (AEA) annual conference in San Antonio, TX. Although this is a stock photo, the weather (until Sunday) was just as it looks here. The Alamo was crowded: curious adults, tired children, friendly dogs, and so on. I learned that San Antonio is the only place in the US with five Spanish missions within 10 miles of each other. Starting with the Alamo (formally named San Antonio de Valero) and heading south out of San Antonio, visitors will find Missions Concepcion, San Juan, San Jose, and Espada, all of which will eventually be on the Mission River Walk (as opposed to the Museum River Walk). The missions (except the Alamo) are National Historic Sites. For those of you who have the National Park Service Passport, site stamps are available.

AEA is the professional home for evaluators. It has approximately 6,000 members, and about 2,500 of them attended this year’s conference, Evaluation 2010. This year’s president, Leslie Cooksy, identified “Evaluation Quality” as the theme for the conference. Leslie says in her welcome letter, “Evaluation quality is an umbrella theme, with room underneath for all kinds of ideas–quality from the perspective of different evaluation approaches, the role of certification in quality assurance, metaevaluation and the standards used to judge quality…” Between the plenary sessions, the concurrent sessions, and networking with long-time colleagues, I heard many different perspectives on quality.

In the closing plenary, Hallie Preskill, 2007 AEA president, was asked to comment on the themes she heard throughout the conference. She used mind mapping (a systems tool) to organize the value of AEA quickly and, I think, effectively. She listed seven main themes:

  1. Truth
  2. Perspectives
  3. Context
  4. Design and methods
  5. Representation
  6. Intersections
  7. Relationships

Although she lists context as a separate theme, I wonder whether evaluation quality is really contextual first, with these other things following.

Hallie listed subthemes under each of these topics:

  1. What is (truth)?  Whose (truth)?  How much data is enough?
  2. Whose (perspectives)?  Cultural (perspectives).
  3. Cultural (context). Location (context).  Systems (context).
  4. Multiple and mixed (methods).  Multiple case studies.  Stories.  Credible.
  5. Diverse (representation).  Stakeholder (representation).
  6. Linking (intersections).  Interdisciplinary (intersections).
  7. (Relationships) help make meaning.  (Relationships) facilitate quality.   (Relationships) support use.  (Relationships) keep evaluation alive.

Being a member of AEA is all this and more. Membership is affordable ($80.00 regular; $60.00 for joint membership with the Canadian Evaluation Society; $30.00 for full-time students), and the benefits are worth that and more. The conference brings together evaluators from all over. AEA is quality.

While I was discussing evaluation in general earlier this week, my colleague asked me how data from a post/pre evaluation form are analyzed. I pondered this for a nanosecond and said, “Change scores”: compute the difference between each person’s post rating and pre rating, and subject that change to a statistical test. “What test?” my colleague asked.

So today’s post is about which test to use, and why.

First, remember that post/pre data are related responses: both answers come from the same person. SPSS uses the labels “paired samples” (for the parametric test) and “2-related samples” (for the non-parametric test) for this kind of data.

Parametric tests (like the t-test) assume that the data come from a normal (bell-shaped) distribution, one described by known parameters (i.e., the mean and standard deviation).
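To make the parametric route concrete, here is a minimal Python sketch of a paired t-test on change scores, using only the standard library. The pre/post ratings are made-up numbers for 12 people, and the helper names (`paired_t_test`, `t_two_sided_p`, `t_pdf`) are hypothetical. The p-value comes from numerically integrating the t density with Simpson's rule, so treat it as a close approximation; in practice SPSS's paired-samples test or `scipy.stats.ttest_rel` would do the same job.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma keeps the normalizing constant finite even for large df
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * (area under the density from 0 to |t|)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # Simpson's rule; steps must be even
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3
    return min(1.0, max(0.0, 1.0 - 2.0 * central))


def paired_t_test(pre, post):
    """Paired-samples t-test on change scores (post - pre) from the same people."""
    if len(pre) != len(post):
        raise ValueError("pre and post must come from the same respondents")
    diffs = [b - a for a, b in zip(pre, post)]  # one change score per person
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    if sd == 0:
        raise ValueError("every change score is identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    df = n - 1
    return t, df, t_two_sided_p(t, df)


pre = [2, 3, 3, 2, 4, 3, 2, 3, 3, 2, 4, 3]   # made-up ratings, 12 people
post = [4, 4, 3, 3, 5, 4, 3, 4, 4, 3, 4, 4]
t, df, p = paired_t_test(pre, post)
print(f"t = {t:.2f}, df = {df}, p = {p:.5f}")  # t = 6.17, df = 11
```

The test works on one number per person (the change), which is exactly why the pre and post lists must line up respondent by respondent.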

Non-parametric tests (like the Wilcoxon or the McNemar test) do not make assumptions about the population distribution. Rank-based tests such as the Wilcoxon order the data from low to high and then analyze the ranks; the McNemar test, used for yes/no items, compares how many people changed in each direction. These tests are sometimes called distribution-free tests because they do not require the population’s parameters to be known. Most of the time, Extension professionals work with populations whose parameters are not known.
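A comparable sketch of the Wilcoxon signed-rank test shows the ranking idea: drop people whose answer did not change, rank the absolute changes from low to high (tied values share an average rank), and sum the ranks of the gains. This version uses the large-sample normal approximation with a tie correction, which is my assumption; software such as `scipy.stats.wilcoxon` may use an exact distribution for small samples, so its p-values can differ. The McNemar test is not shown, and the function names and data are hypothetical.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from low (1) to high, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def wilcoxon_signed_rank(pre, post):
    """Wilcoxon signed-rank test via the normal approximation (tie-corrected)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must come from the same respondents")
    diffs = [b - a for a, b in zip(pre, post) if b != a]  # drop zero changes
    n = len(diffs)
    if n == 0:
        raise ValueError("nobody changed; there is nothing to rank")
    abs_diffs = [abs(d) for d in diffs]
    ranks = average_ranks(abs_diffs)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)  # ranks of the gains
    mean_w = n * (n + 1) / 4
    # tie correction: subtract sum(c^3 - c) / 48 over groups of tied |changes|
    ties = sum(c ** 3 - c for c in (abs_diffs.count(v) for v in set(abs_diffs)))
    var_w = n * (n + 1) * (2 * n + 1) / 24 - ties / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, z, p


pre = [2, 3, 3, 2, 4, 3, 2, 3, 3, 2, 4, 3]   # made-up ratings, 12 people
post = [4, 4, 3, 3, 5, 4, 3, 4, 4, 3, 4, 4]
w_plus, z, p = wilcoxon_signed_rank(pre, post)
print(f"W+ = {w_plus}, z = {z:.2f}, p = {p:.4f}")
```

On these made-up ratings, 10 of the 12 people changed, W+ is 55, and z is about 3.05. Rating scales produce many ties, which is why the tie correction is included.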

If you KNOW (reasonably) that the population’s distribution approximates a normal bell curve, choose a parametric test. For post/pre data, that would be a paired t-test, because the responses are related.

You need to use a non-parametric test if any of the following conditions apply:

  • the response is a rank or a score and the distribution is not normal;
  • some values are “out of range” (for example, someone answers 11 on a scale of 1 to 10);
  • the data are measurements (like post/pre) and you are sure the distribution is NOT normal;
  • you don’t have data from a previous sample with which to compare the current sample; or
  • you have a small sample (statistical tests for normality don’t work well with small samples).

The last criterion is the one to remember.
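The checklist can be read as a small decision rule for one post/pre question. This is a hypothetical helper (`choose_paired_test`) encoding my reading of the list; the 24-pair cutoff is the rule of thumb quoted in the next paragraph, and the order in which the checks are applied is an assumption, not something the source prescribes.

```python
def choose_paired_test(n_pairs, known_normal=False, out_of_range=False,
                       large_n=24):
    """Hypothetical helper: suggest a test for one post/pre question."""
    if out_of_range:
        # answers outside the scale (an 11 on a 1-10 scale) point away from the t-test
        return "Wilcoxon signed-rank"
    if known_normal:
        # the population is reasonably known to be bell-shaped
        return "paired t-test"
    if n_pairs >= large_n:
        # with enough pairs, the t-test tolerates non-normal data
        return "paired t-test"
    # small sample and normality unknown: the criterion to remember
    return "Wilcoxon signed-rank"


print(choose_paired_test(12))                     # Wilcoxon signed-rank
print(choose_paired_test(12, known_normal=True))  # paired t-test
```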

If you have a large sample, it doesn’t matter much whether the distribution is normal, because the parametric test is robust to departures from normality. The only caveat is determining what a “large sample” is. One source I read says, “Unless the population distribution is really weird, you are probably safe choosing a parametric test when there are at least two dozen data points in each group.” That means at least 24 data points in each group. If the post/pre evaluation has six questions and each question is answered by 12 people both post and pre, each question has only 12 data points per group: 12 post and 12 pre. You can’t multiply the number of questions (6) by the number of people (12) and by post and pre (2) and claim 144 data points. Each question is a separate set of data points. My statistics professor always insisted on a sample size of 30 to have enough power to detect a difference if a difference exists.
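The counting point is easy to get wrong once responses sit in a spreadsheet, so here is a sketch that counts pairs per question rather than per survey. The data layout (a dict mapping each question to a list of (pre, post) pairs, one per person) and the name `sample_size_check` are hypothetical; the 24 and 30 thresholds are the two rules of thumb from the paragraph above.

```python
def sample_size_check(responses, rule_of_thumb=24, preferred=30):
    """Count post/pre pairs separately for each question."""
    report = {}
    for question, pairs in responses.items():
        n = len(pairs)  # one pair = one person's pre and post answer
        report[question] = {
            "pairs": n,
            "meets_rule_of_thumb": n >= rule_of_thumb,
            "meets_preferred": n >= preferred,
        }
    return report


# Six questions, each answered pre and post by the same 12 people (made-up data).
survey = {f"Q{i}": [(3, 4)] * 12 for i in range(1, 7)}
report = sample_size_check(survey)
print(report["Q1"])  # {'pairs': 12, 'meets_rule_of_thumb': False, 'meets_preferred': False}
# Lumping everything together gives 6 * 12 * 2 = 144 numbers,
# but that is NOT the sample size for any single question.
print(len(survey) * 12 * 2)  # 144
```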

If you have a large sample and use a non-parametric test, the test is only slightly less powerful than a parametric test would be. To see the difference, analyze one post/pre question with both a t-test and a Wilcoxon test. It won’t be much.

If you have a small sample and use a parametric test when the distribution is NOT normal, the probability value may be inaccurate. Again, run both tests to see the difference, and use the test with the more conservative probability value, that is, the larger one (0.001 is more conservative than 0.0001).

If you have a small sample and use a non-parametric test when the distribution is normal, the probability value may be too high because the non-parametric test lacks the power to detect a difference. Again, run both tests to see the difference, and choose the test that is more conservative.
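The advice to run both tests and keep the more conservative result can be written as a tiny helper. In the usual statistical sense, the more conservative p-value is the larger one, because it makes you less likely to claim a change that isn't there. The name `conservative_result` is hypothetical, and the 0.05 significance level is my assumption; the post does not name one.

```python
def conservative_result(p_values, alpha=0.05):
    """Return (test, p, significant) for the test with the largest p-value."""
    test = max(p_values, key=p_values.get)  # larger p = more conservative
    p = p_values[test]
    return test, p, p < alpha


print(conservative_result({"paired t-test": 0.0001, "Wilcoxon signed-rank": 0.001}))
# ('Wilcoxon signed-rank', 0.001, True)
```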

My experience is that using a non-parametric test for much of the analysis of data from Extension-based projects provides a more realistic picture.

Next week I’ll be attending the American Evaluation Association Annual Meeting in San Antonio, TX. I’ll be posting again when I return on November 15.