I’ve been writing for almost a year, some 50 columns.  This week, before the Thanksgiving holiday, I want to share evaluation resources I’ve found useful and for which I am thankful.  There are probably others with which I am not familiar, but these are the ones that come to mind.


My colleagues Ellen Taylor-Powell, at UWEX (University of Wisconsin Extension Service), and Nancy Ellen Kiernan, at Penn State Extension Service, both have resources that are very useful, easily accessed, and clearly written.  Ellen’s can be found at the Quick Tips site and Nancy Ellen’s at her Tipsheets index.  Both have other links that may be useful as well.  Access their sites through the links above.

Last week, I mentioned the American Evaluation Association (AEA).  Among the important structures in AEA are the Topical Interest Groups (or TIGs).  Extension has a TIG, Extension Education Evaluation, which helps organize Extension professionals who are interested or involved in evaluation.  There is a wealth of information on the AEA web site: information about the evaluation profession, access to the AEA elibrary, and links to AEA on Facebook, Twitter, and LinkedIn.  You do NOT have to be a member to subscribe to the blog, AEA365, which, as the name suggests, is posted daily by a different evaluator.  Susan Kistler, AEA’s executive director, posts every Saturday.  The November 20 post talks about the elibrary; check it out.

Many states and regions have local AEA affiliates.  For example, OPEN, the Oregon Program Evaluators Network, serves southern Washington and Oregon.  It has an all-volunteer staff, most of whom live in Portland and Vancouver, WA.  The AEA site lists over 20 affiliates across the country, many with their own websites, which have information about connecting with local evaluators.

In addition to these valuable resources, National eXtension (say e-eXtension) has developed a community of practice devoted to evaluation, and Mike Lambur, eXtension Evaluation and Research Leader, can be reached at mike.lambur@extension.org. According to the web site, National eXtension “…is an interactive learning environment delivering the best, most researched knowledge from the smartest land-grant university minds across America. eXtension connects knowledge consumers with knowledge providers—experts like you who know their subject matter inside out.”

Happy Thanksgiving.  Be safe.

Recently, I attended the American Evaluation Association (AEA) annual conference in San Antonio, TX. Although the photo is a stock photo, the weather (until Sunday) was much as it appears.  The Alamo was crowded: curious adults, tired children, friendly dogs, etc.  What I learned was that San Antonio is the only site in the US with five Spanish missions within 10 miles of each other.  Starting with the Alamo (its formal name is San Antonio de Valero) and going south out of San Antonio, the visitor will experience the Missions Concepcion, San Juan, San Jose, and Espada, all of which will, at some point in the future, be on the Mission River Walk (as opposed to the Museum River Walk).  The missions (except the Alamo) are National Historic Sites.  For those of you who have the National Park Service Passport, site stamps are available.

AEA is the professional home for evaluators.  The AEA has approximately 6,000 members, and about 2,500 of them attended the conference, called Evaluation 2010.  This year’s president, Leslie Cooksy, identified “Evaluation Quality” as the theme for the conference.  Leslie says in her welcome letter, “Evaluation quality is an umbrella theme, with room underneath for all kinds of ideas–quality from the perspective of different evaluation approaches, the role of certification in quality assurance, metaevaluation and the standards used to judge quality…”  Listening to the plenary sessions, attending the concurrent sessions, and networking with long-time colleagues, I got to hear many different perspectives on quality.

In the closing plenary, Hallie Preskill, 2007 AEA president, was asked to comment on the themes she heard throughout the conference.  She used mind mapping (a systems tool) to quickly and (I think) effectively organize the value of AEA.  She listed seven main themes:

  1. Truth
  2. Perspectives
  3. Context
  4. Design and methods
  5. Representation
  6. Intersections
  7. Relationships

Although she lists context as a separate theme, I wonder if evaluation quality is really contextual first, with these other themes following.

Hallie listed sub-themes under each of these topics:

  1. What is (truth)?  Whose (truth)?  How much data is enough?
  2. Whose (perspectives)?  Cultural (perspectives).
  3. Cultural (context). Location (context).  Systems (context).
  4. Multiple and mixed (methods).  Multiple case studies.  Stories.  Credible.
  5. Diverse (representation).  Stakeholder (representation).
  6. Linking (intersections).  Interdisciplinary (intersections).
  7. (Relationships) help make meaning.  (Relationships) facilitate quality.   (Relationships) support use.  (Relationships) keep evaluation alive.

Being a member of AEA is all this and more.  Membership is affordable ($80.00, regular; $60.00 for joint membership with the Canadian Evaluation Society; and $30.00 for full-time students), and the benefits are worth that and more.  The conference brings together evaluators from all over.  AEA is quality.

While I was discussing evaluation in general earlier this week, the colleague with whom I was conversing asked me how data from a post/pre evaluation form are analyzed.  I pondered this for a nanosecond and said change scores: one would compute the difference between the post ranking and the pre ranking and subject that change to some statistical test.  “What test?” my colleague asked.

So, today’s post is on what test, and why.

First, you need to remember that post/pre data are related responses.  SPSS uses the labels “paired samples” and “2-related samples”; those labels refer to a parametric test and a non-parametric test, respectively, for responses from the same person (two related responses).

Parametric tests (like the t-test) are based on the assumption that the data are collected from a normal distribution (i.e., a bell-shaped distribution), a distribution described by known parameters (i.e., the mean and standard deviation).

Non-parametric tests (like the Wilcoxon or the McNemar test) do not make assumptions about the population distribution.  Instead, these tests rank the data from low to high and then analyze the ranks.  Sometimes these tests are known as distribution-free tests because the parameters of the population are not known.  Extension professionals work with populations where parameters are not known most of the time.

If you KNOW (reasonably) that the population’s distribution approximates a normal bell curve, choose a parametric test; in the case of post/pre data, that would be a paired t-test, because the responses are related.
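To make the change-score idea concrete, here is a minimal sketch in Python using scipy; the numbers are invented for illustration, and SPSS or any other statistics package follows the same logic with its “paired samples” procedure.

```python
# A sketch of the change-score approach with a paired t-test (hypothetical data).
from scipy import stats

# Post and pre self-ratings from the same 12 participants on one question (1-10 scale).
post = [7, 8, 6, 9, 7, 8, 5, 9, 6, 8, 7, 9]
pre  = [4, 6, 5, 7, 5, 6, 4, 8, 5, 6, 5, 7]

# Change scores: post minus pre for each respondent.
changes = [p - q for p, q in zip(post, pre)]
print("mean change:", sum(changes) / len(changes))

# Paired (related-samples) t-test; equivalent to a one-sample t-test on the change scores.
t_stat, p_value = stats.ttest_rel(post, pre)
print("t =", round(t_stat, 2), " p =", round(p_value, 4))
```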

You need to use a non-parametric test if any of the following conditions are met:

  • the response is a rank or a score and the distribution is not normal;
  • some values are “out of range” (e.g., someone answers 11 on a scale of 1–10);
  • the data are measurements (like a post/pre) and you are sure the distribution is NOT normal;
  • you don’t have data from a previous sample with which to compare the current sample; or
  • you have a small sample size (statistical tests to test for normality don’t work with small samples).

The last criterion is the one to remember.

If you have a large sample, it doesn’t matter much whether the distribution is normal because the parametric test is robust enough to tolerate departures from normality.  The only caveat is determining what a “large sample” is.  One source I read says, “Unless the population distribution is really weird, you are probably safe choosing a parametric test when there are at least two dozen data points in each group.”  That means at least 24 data points in each group.  If the post/pre evaluation has six questions and each question is answered by 12 people both post and pre, each question has only 12 data points in each group: 12 post and 12 pre.  You can’t lump the questions (6) and multiply by the number of people (12) and by post and pre (2); each question is viewed as a separate set of data points.  My statistics professor always insisted on a sample size of 30 to have enough power to detect a difference if a difference exists.

If you have a large sample and use a non-parametric test, the test is slightly less powerful than a parametric test used with a large sample.  To see the difference, use both a t-test and a Wilcoxon test to analyze one question of the post/pre and compare the results; the difference won’t be much.
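If you want to try that comparison yourself, here is one way to run both tests on a single post/pre question, again a sketch in Python with scipy and made-up numbers:

```python
# Compare a parametric and a non-parametric test on the same related responses (hypothetical data).
from scipy import stats

post = [7, 8, 6, 9, 7, 8, 5, 9, 6, 8, 7, 9, 8, 6, 7, 9, 8, 7, 6, 8, 9, 7, 8, 6]
pre  = [4, 6, 5, 7, 5, 6, 4, 8, 5, 6, 5, 7, 6, 5, 5, 8, 6, 5, 4, 6, 7, 5, 6, 5]

# Parametric: paired t-test on the related responses.
t_stat, t_p = stats.ttest_rel(post, pre)

# Non-parametric: Wilcoxon signed-rank test on the same pairs.
w_stat, w_p = stats.wilcoxon(post, pre)

# With a reasonably large sample, the two p-values are usually close.
print("paired t-test p =", t_p)
print("Wilcoxon signed-rank p =", w_p)
```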

If you have a small sample and you use a parametric test with a distribution that is NOT normal, the probability value may be inaccurate.  Again, run both tests to see the difference.  You want to use the test with the more conservative probability value (the larger one; 0.001 is more conservative than 0.0001).

If you have a small sample and you use a non-parametric test with a normal distribution, the probability value may be too high because the non-parametric test lacks power to detect a difference.  Again, run both tests to see the difference and choose the test that is more conservative.

My experience is that using a non-parametric test for much of the analyses done with data from Extension-based projects provides a more realistic analysis.

Next week I’ll be attending the American Evaluation Association annual meeting in San Antonio, TX. I’ll be posting when I return on November 15.

I’ve been reminded recently about Kirkpatrick’s evaluation model.

Donald L. Kirkpatrick (1959) developed a four-level model used primarily for evaluating training.  This model is still used extensively in the training field and is espoused by ASTD, the American Society for Training and Development.

It also occurred to me that Extension conducts a lot of training, from pesticide handling to logic model use, and that Kirkpatrick’s model isn’t talked about much in Extension; at least I don’t use it as a reference.  And that may not be a good thing, given that Extension professionals are conducting training a lot of the time.

Kirkpatrick’s four levels are these:

  1. Reaction:  To what degree participants react favorably to the training
  2. Learning:  To what degree participants acquire the intended knowledge, skills, and attitudes based on their participation in the learning event
  3. Application:  To what degree participants apply what they learned during training when they are back on the job
  4. Impact:  To what degree targeted outcomes occur as a result of the learning event(s) and subsequent reinforcement

Sometimes it is important to know what affective reaction our participants are having during and at the end of the training.  I would call this formative evaluation, and formative evaluation is often used for program improvement.  Reactions are a way that participants can tell the Extension professional how things are going (i.e., what their reaction is) through a continuous feedback mechanism.  Extension professionals can use this feedback to change the program, revise their approach, adjust the pace, etc.  The feedback mechanism doesn’t have to be constant, which is often the interpretation of “continuous”; soliciting feedback at natural breaks, using a show of hands, is often enough for on-the-spot adjustments.  It is a form of formative evaluation because it is an “in-process” evaluation.

Kirkpatrick’s level one (reaction) doesn’t provide a measure of outcomes or impacts.  I might call it a “happiness” or satisfaction evaluation; it tells me only what the participants’ reaction is.  Outcome evaluation, which determines a measure of effectiveness, happens at a later level and is another approach to evaluation, one I would call summative, although Michael Patton might call it developmental in a training situation where the outcome is always moving, changing, developing.

Kirkpatrick, D. L. (1959) Evaluating Training Programs, 2nd ed., Berrett-Koehler, San Francisco.

Kirkpatrick, D. L. (comp.) (1998) Another Look at Evaluating Training Programs, ASTD, Alexandria, USA.

For more information about the Kirkpatrick model, see their site, Kirkpatrick Partners.

After experiencing summer in St. Petersburg, FL, and then peak color in Bar Harbor and Acadia National Park, ME, I am once again reminded of how awesome these United States truly are.  Oregon holds its own special brand of beauty, and it is nice to be back home.  Evaluation was everywhere on this trip.

A recent AEA365 post talks about systems thinking and evaluating educational programs.  Bells went off for me because Extension DOES educational programs and does them within existing systems.  Often, Extension professionals neglect the systems aspect of their programming and attempt to implement a program in isolation.  In today’s complex world, isolation isn’t possible.  David Bella, an emeritus professor at OSU, uses the term “complex messy systems”; I think that clearly characterizes what Extension faces in developing programs.  The AEA365 post has some valuable points for Extension professionals to remember (see the link for more details):

1.  Build relationships with experts from across disciplines.

2.  Ensure participation from stakeholders across the entire evaluated entity.

3.  Create rules of order to guide the actions of the evaluation team.

These are points for Extension professionals to keep in mind as they develop their programs; by keeping them in mind and using them, Extension professionals can strengthen their programs.  More and more, Extension programs are multi-site as well as multi-discipline.  Ask yourself:  What part of the program is missing because of a failure to consult across disciplines?  What part of the program won’t be recognized because of a failure to include as many stakeholders as possible in helping design the evaluation?  Who will know better what makes an effective program than the individuals in the target audience?  Helping everyone know what the expectations are helps systems work, change, and grow.

It is also important to consider the many contextual factors.  When working in community-based programs, Extension professionals need to develop partnerships, and those partnerships need to work in agreement.  This is another example of how Extension work, and the evaluation of that work, occurs within an existing system.

Last Wednesday, I had the privilege of attending the OPEN (Oregon Program Evaluators Network) annual meeting.

Michael Quinn Patton, the keynote speaker, talked about developmental evaluation and utilization-focused evaluation.  Utilization-focused evaluation makes sense: use by intended users.  Developmental evaluation, on the other hand, needs some discussion.

The way Michael tells the story (he teaches a lot through story) is this:

“I had a standard 5-year contract with a community leadership program that specified 2 1/2 years of formative evaluation for program improvement, to be followed by 2 1/2 years of summative evaluation that would lead to an overall decision about whether the program was effective.”  After 2 1/2 years, Michael called for the summative evaluation to begin.  The director was adamant: “We can’t stand still for 2 years.  Let’s keep doing formative evaluation.  We want to keep improving the program… (I) never (want to do a summative evaluation) if it means standardizing the program.  We want to keep developing and changing.”  He looked at Michael sternly, challengingly.  “Formative evaluation!  Summative evaluation! Is that all you evaluators have to offer?” Michael hemmed and hawed and said, “I suppose we could do…ummm…we could do…ummm…well, we might do, you know…we could try developmental evaluation!” Not knowing what that was, the director asked, “What’s that?”  Michael responded, “It’s where you, ummm, keep developing.”  Developmental evaluation was born.

Until now, the evaluation field offered two global approaches to evaluation: formative, for program improvement, and summative, to make an overall judgment of merit and worth.  Now developmental evaluation (DE) offers another approach, one that is relevant to social innovators looking to bring about major social change.  It takes into consideration systems theory, complexity concepts, uncertainty principles, nonlinearity, and emergence.  DE acknowledges that resistance and pushback are likely when change happens.  Developmental evaluation recognizes that change brings turbulence and offers an approach that “adapts to the realities of complex nonlinear dynamics rather than trying to impose order and certainty on a disorderly and uncertain world” (Patton, 2011).  Social innovators recognize that outcomes will emerge as the program moves forward and that predefining outcomes limits the vision.

Michael has used the art of Mark M. Rogers to illustrate the point.  The cartoon shows two early humans, one holding what I would call a wheel, albeit a primitive one, saying, “No go.  The evaluation committee said it doesn’t meet utility specs.  They want something linear, stable, controllable, and targeted to reach a pre-set destination.  They couldn’t see any use for this (the wheel).”

For Extension professionals who are delivering programs designed to lead to a specific change, DE may not be useful.  For those Extension professionals who envision something different, DE may be the answer.  I think DE is worth a look.

Look for my next post after October 14; I’ll be out of the office until then.

Patton, M. Q. (2011) Developmental Evaluation. New York: Guilford Press.

Ryan asks a good question: “Are youth serving programs required to have an IRB for applications, beginning and end-of-year surveys, and program evaluations?”  His question leads me to today’s topic.

The IRB is concerned with “research on human subjects.”  So you ask, when is evaluation a form of research?

It all depends.

Although evaluation methods have evolved from  social science research, there are important distinctions between the two.

Fitzpatrick, Sanders, and Worthen list five differences between the two and it is in those differences that one must consider IRB assurances.

These five differences are:

  1. purpose,
  2. who sets the agenda,
  3. generalizability of results,
  4. criteria, and
  5. preparation.

Although these criteria differ for evaluation and research, there are times when evaluation and research overlap.  If the evaluation study adds to knowledge in a discipline, or if research informs our judgments about a program, then the distinctions are blurred, a broader view of the inquiry is needed, and IRB approval may be required.

The IRB considers children a vulnerable population.  Vulnerable populations require IRB protection, so evaluations with vulnerable populations may need IRB assurances.  IF you have a program that involves children AND you plan to use the program activities as the basis of an effectiveness evaluation (as opposed to program improvement) AND you plan to use that evaluation as scholarship, you will need IRB approval.

Ryan also asks, “What does publish mean?”  That question takes us to what scholarship is.  One definition is that scholarship is creative work that is validated by peers and communicated.  Published means communicated to peers in a peer-reviewed journal or at a professional meeting, not, for example, in a press release.

How do you decide if your evaluation needs IRB review?  How do you decide if your evaluation is research or not?  Start with the purpose of your inquiry.  Do you want to add knowledge to the field?  Do you want to see if what you are doing is applicable in other settings?  Do you want others to know what you’ve done and why?  Then you want to communicate this.  In academics, that means publishing it in a peer-reviewed journal or presenting it at a professional meeting.  And to do that, using the information provided by your participants, who are human subjects, you will need IRB assurance that they are protected.

Every IRB is different; check with your institution.  Most work done by Extension professionals falls under the category of “exempt from full board review,” which is the shortest and least restrictive review.  Work involving vulnerable populations, audio and/or video taping, or sensitive questions is typically categorized as expedited, a more stringent review than the “exempt” category, which takes a little longer.  IF you are working with vulnerable populations and asking for sensitive information, doing an invasive procedure, or involving participants in something that could be viewed as coercive, then the inquiry will probably need full board review (which has the longest turnaround time).

September 25 – October 2 is Banned Books Week.

Many familiar books have been or are banned, and the American Library Association has once again published a list of banned or challenged books.  The September issue of the AARP Bulletin listed 50 banned books, and the Merriam-Webster Dictionary was banned in a California elementary school in January 2010.

Yes, you say, so what?  How does that relate to program evaluation?

Remember that the root of the word “evaluation” is value.  Someplace in the United States, some group used some criteria to “value” (or not) a book: to lodge a protest, successfully (or not), to remove a book from a library, school, or other source.  Establishing criteria means that evaluation was taking place.  In this case, those criteria included being “too political,” having “too much sex,” being “irreligious,” being “socially offensive,” or some other standard.  Someone, someplace, somewhere decided that the freedom to think for yourself, the freedom to read, the importance of the First Amendment, and the importance of free and open access to information are not important parts of our rights, and they used evaluation to make that decision.

I don’t agree with censorship; I do agree with the right a person has to express her or his opinion, as guaranteed by the First Amendment.  Yet in expressing an opinion, especially an evaluative opinion, an individual has a responsibility to express that opinion without hurting other people or property: to evaluate responsibly.

To aid evaluators in evaluating responsibly, the American Evaluation Association has developed a set of five guiding principles for evaluators.  Even though you may not consider yourself a professional evaluator, considering these principles when conducting your evaluations is important and responsible.  The Guiding Principles are:

A. Systematic Inquiry: Evaluators conduct systematic, data-based inquiries;

B. Competence: Evaluators provide competent performance to stakeholders;

C. Integrity/Honesty: Evaluators display honesty and integrity in their own behavior, and attempt to ensure the honesty and integrity of the entire evaluation process;

D.  Respect for People:  Evaluators respect the security, dignity, and self-worth of respondents, program participants, clients, and other evaluation stakeholders; and

E. Responsibilities for General and Public Welfare: Evaluators articulate and take into account the diversity of general and public interests and values that may be related to the evaluation.

I think free and open access to information is covered by principles D and E.  You may or may not agree with the people who used evaluation to challenge a book.  Yet, as someone who conducts evaluation, you have a responsibility to consider these principles, making sure that your evaluations respect people and attend to general and public welfare (in addition to employing systematic inquiry, competence, and integrity/honesty).  Now, go read a good (banned) book!

A faculty member asked me how one determines impact from qualitative data.  And in my mailbox today was a publication from Sage Publications inviting me to “explore these new and best selling qualitative methods titles from Sage.”

Many Extension professionals are leery of gathering data using qualitative methods.  “There is just too much data to make sense of it” is one complaint I often hear.  Yes, one characteristic of qualitative data is the rich detail that usually results.  (Of course, if you are only asking closed-ended questions that result in yes/no answers, the richness is missing.)  Other complaints include “What do I do with the data?” “How do I draw conclusions?” and “How do I report the findings?”  As a result, many Extension professionals default to what is familiar: a survey.  Surveys, as we have discussed previously, are easy to code, easy to report (frequencies and percentages), and difficult to write well.

The Sage brochure provides resources to answer some of these questions.

Michael Patton’s 3rd edition of Qualitative Research and Evaluation Methods “…contains hundreds of examples and stories illuminating all aspects of qualitative inquiry…it offers strategies for enhancing quality and credibility of qualitative findings…and providing detailed analytical guidelines.”  Michael is the keynote speaker for the Oregon Program Evaluators Network (OPEN) fall conference, where he will be talking about his new book, Developmental Evaluation. If you are in Portland, I encourage you to attend.  (For more information, see http://www.oregoneval.org/program/.)

Another reference I just purchased is Bernard and Ryan’s volume, Analyzing Qualitative Data, a systematic approach to making sense out of words. It, too, is available from Sage.

What does all this have to do with analyzing a conversation?  A conversation is qualitative data; it is made up of words.  Knowing what to do with those words will provide evaluation data that are powerful.  My director is forever saying the story is what legislators want to hear, and stories are qualitative data.

One of the most common forms of conversation that Extension professionals use is the focus group: a guided, structured, and focused conversation.  It can yield a wealth of information if the questions are well crafted, if those questions have been pilot tested, and if the data are analyzed in a meaningful way.  There are numerous ways to analyze qualitative data (cultural domain analysis, KWIC analysis, discourse analysis, narrative analysis, grounded theory, content analysis, schema analysis, analytic induction and qualitative comparative analysis, and ethnographic decision models), all of which are discussed in the above-mentioned reference.  Deciding which will work best with the gathered qualitative data is a decision only the principal investigator can make; comfort and experience will enter into that decision.  Keep in mind that qualitative data can be reduced to numbers; numbers cannot be exploded to recapture the words from which they came.
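As one small illustration of reducing qualitative data to numbers, a simple keyword-based content analysis can tally how often predefined codes (themes) appear across focus group comments.  The sketch below is hypothetical and written in Python; the comments, codes, and keywords are invented, and a real analysis would rest on careful codebook development and human coding rather than keyword matching alone.

```python
# Hypothetical sketch: tallying predefined codes across focus group comments.
comments = [
    "The hands-on demonstration helped me understand pesticide labels.",
    "I wish there had been more time for questions.",
    "The demonstration was great, but the handouts were confusing.",
    "More time with the instructor would have helped.",
]

# Codes (themes) and the keywords that signal them, drawn from a first read of the transcript.
codes = {
    "hands-on learning": ["hands-on", "demonstration"],
    "time and pacing": ["time", "pace"],
    "materials": ["handout", "label"],
}

# Count how many comments mention each code at least once.
counts = {code: 0 for code in codes}
for comment in comments:
    text = comment.lower()
    for code, keywords in codes.items():
        if any(keyword in text for keyword in keywords):
            counts[code] += 1

for code, n in counts.items():
    print(f"{code}: mentioned in {n} of {len(comments)} comments")
```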

One response I got for last week’s query was about on-line survey services.  Are they reliable?  Are they economical?  What are the design limitations?  What are the question format limitations?

Yes.  Depends.  Some.  Not many.

Let me take the easy question first:  Are they economical?

Depends.  Weigh the cost of postage for a paper survey (both out and back) against the time it takes to enter questions into the system, and the cost of the service against the length of the survey.  These are the things to consider.

Because most people have access to email today, using an on-line survey service is often the easiest and most economical way to distribute an evaluation survey.  Most institutional review boards view an on-line survey like a mail survey and typically grant a waiver of documentation of informed consent: the consent document is the entry screen, and an agree-to-participate question is often included on that screen.

Are they valid and reliable?

Yes, but…the old adage “garbage in, garbage out” applies here.  Like a paper survey, an internet survey is only as good as the survey questions.  Don Dillman, in his third edition of “Internet, Mail, and Mixed-Mode Surveys” (co-authored with Jolene D. Smyth and Leah Melani Christian), talks about question development.  Since he wrote the book (literally), I use this resource a lot!

What are the design limitations?

Some limitations apply, and each online survey service is different.  The most common service is Survey Monkey (www.surveymonkey.com).  The introduction to Survey Monkey says, “Create and publish online surveys in minutes, and view results graphically and in real time.”  The basic account with Survey Monkey is free, with limitations on the number of questions (10), the number of question formats (15), and the number of responses (100).  You can upgrade to the Pro or Unlimited plan for a subscription fee ($19.95/month or $200/year, respectively).  There are other services as well; a search on “survey services” returns many options, such as Zoomerang or InstantSurvey.

What are the question format limitations?

Not many: both open-ended and closed-ended questions can be asked.  Survey Monkey has 15 different formats from which to choose (see below).  There may be others, but this list covers most formats.

  • Multiple Choice (Only one Answer)
  • Multiple Choice (Multiple Answers)
  • Matrix of Choices (Only one Answer per Row)
  • Matrix of Choices (Multiple Answers per Row)
  • Matrix of Drop-down Menus
  • Rating Scale
  • Single Textbox
  • Multiple Textboxes
  • Comment/Essay Box
  • Numerical Textboxes
  • Demographic Information (US)
  • Demographic Information (International)
  • Date and/or Time
  • Image
  • Descriptive Text

Oregon State University has an in-house service sponsored by the College of Business (BSG, Business Survey Groups).  OSU also has an institutional account with Student Voice, an on-line service designed initially for learning assessment, which I have found useful for evaluations.  Check your institution for the options available.  For your next evaluation that involves a survey, think electronically.