At a loss for what to write, I once again went to one of my favorite books, Michael Scriven’s Evaluation Thesaurus. This time when I opened the volume randomly, I came upon the entry for meta-evaluation. This is a worthy topic, one that isn’t addressed often. So this week, I’ll talk about meta-evaluation and quote Scriven as I do.

First, what is meta-evaluation? It is the evaluation of evaluations (and “indirectly, the evaluation of evaluators”). Scriven suggests applying an evaluation-specific checklist or a Key Evaluation Checklist (KEC) (p. 228). Although this approach can be used to evaluate one’s own work, the results are typically unreliable, which implies that (if you can afford it) you should use an independent evaluator to conduct a meta-evaluation of your evaluations.

Scriven then goes on to make the following key points:

  • Meta-evaluation is the professional imperative of evaluation;
  • Meta-evaluation can be done formatively or summatively or both; and
  • Use the KEC to generate a new evaluation OR apply the checklist to the original evaluation as a product.

He lists the parts of a KEC involved in a meta-evaluation; this process includes 13 steps (pp. 230-231).

He gives the following reference:

Stufflebeam, D. (1981). Meta-evaluation: Concepts, standards, and uses. In R. Berk (Ed.), Educational evaluation methodology: The state of the art. Baltimore, MD: Johns Hopkins.


Last week I spoke about thinking like an evaluator by identifying the evaluative questions that you face daily.  They are endless…Yet, doing this is hard, like any new behavior.  Remember when you first learned to ride a bicycle?  You had to practice before you got your balance.  You had to practice a lot.  The same is true for identifying the evaluative questions you face daily.

So you practice, maybe.  You try to think evaluatively.  Something happens along the way; or perhaps you don’t even get to thinking about those evaluative questions.  That something that interferes with thinking or doing is resistance.  Resistance is a Freudian concept that means that you directly or indirectly refuse to change your behavior.   You don’t look for evaluative questions.  You don’t articulate the criteria for value.  Resistance usually occurs with anxiety about a new and strange situation.  A lot of folks are anxious about evaluation–they personalize the process.  And unless it is personnel evaluation, it is never about you.  It is all about the program and the participants in that program.

What is interesting (to me at least) is that there is resistance at many different levels–the evaluator, the participant, the stakeholder  (which may include the other two levels as well).  Resistance may be active or passive.  Resistance may be overt or covert.  I’ve often viewed resistance as a 2×2 diagram.   The rows are active or passive; the columns are overt or covert.  So combining labels, resistance can be active overt, active covert, passive overt, passive covert.  Now I know this is an artificial and  socially constructed idea and may be totally erroneous.  This approach helps me to make sense out of what I see when I go to meetings to help a content team develop their program and try to introduce (or not) evaluation in the process.  I imagine you have seen examples of these types of resistance–maybe you’ve even demonstrated them.  If so, then you are in good company–most people have demonstrated all of these types of resistance.

I bring up the topic of resistance now for two reasons.

1) Because I’ve just started a 17-month-long evaluation capacity building program with 38 participants.  Some of those participants were there because they were told to be there, and let me know their feelings about participating–what kind of resistance could they demonstrate?  Some of those participants are there because they are curious and want to know–what kind of resistance could that be?  Some of the participants just sat there–what kind of resistance could that be?  Some of the participants did anything else while sitting in the program–what kind of resistance could that be? and

2) Because I will be delivering a paper on resistance and evaluation at the annual American Evaluation Association meeting in November.  This is helping me organize my thoughts.

I would welcome your thoughts on this complex topic.

I was talking with a colleague about evaluation capacity building (see last week’s post) and the question was raised about thinking like an evaluator.  Got me thinking about the socialization of professions and what has to happen to build a critical mass of like minded people.

Certainly, preparatory programs in academia conducted by experts (people who have worked in the field a long time, or at least longer than you) start the process.  Professional development helps–you know, attending meetings where evaluators meet (like the upcoming AEA conference, U. S. regional affiliates [there are many and they have conferences and meetings, too], and international organizations [increasing in number–which also host conferences and professional development sessions]–let me know if you want to know more about these opportunities).  Reading new and timely literature on evaluation provides insights into the language.  AND looking at the evaluative questions in everyday activities helps.  Questions such as:  What criteria?  What standards?  Which values?  What worth?  Which decisions?

The socialization of evaluators happens because people who are interested in being evaluators look for the evaluation questions in everything they do.  Sometimes, looking for the evaluative question is easy and second nature–like choosing a can of corn at the grocery store; sometimes it is hard and demands collaboration–like deciding on the effectiveness of an educational program.

My recommendation is to start with easy things–corn, chocolate chip cookies, wine, tomatoes; then move to harder things with more variables–what to wear when and where, or whether to include one group or another.  The choices you make will all depend upon what criteria are set, what standards have been agreed upon, and what value you place on the outcome or what decision you make.

The socialization process is like a puzzle, something that takes a while to complete, something that is different for everyone, yet ultimately the same.  The socialization is not unlike evaluation…pieces fitting together–criteria, standards, values, decisions.  Asking the evaluative questions  is an ongoing fluid process…it will become second nature with practice.

Hopefully, the technical difficulties with images are no longer a problem and I will be able to post the answers to the history quiz and the post I had hoped to post last week.  So, as promised, here are the answers to the quiz I posted the week of July 5.  The keyed responses are in BOLD.

1.  Michael Quinn Patton, author of Utilization-Focused Evaluation, the new book Developmental Evaluation, and the classic Qualitative Evaluation and Research Methods.

2.   Michael Scriven is best known for his concept of formative and summative evaluation. He has also advocated that evaluation is a transdiscipline.  He is the author of the Evaluation Thesaurus.

3. Hallie Preskill is the co-author (with Darlene Russ-Eft) of Evaluation Capacity Building.

4. Robert E. Stake has advanced work in case study and is the author of the books Multiple Case Study and The Art of Case Study Research.

5. David M. Fetterman is best known for his advocacy of empowerment evaluation and the book of that name, Foundations of Empowerment Evaluation.

6. Daniel Stufflebeam developed the CIPP (context input process product) model, which is discussed in the book Evaluation Models.

7. James W. Altschuld is the go-to person for needs assessment.  He is the editor of the Needs Assessment Kit (or everything you wanted to know about needs assessment and didn’t know where to find the answer).  He is also the co-author with Belle Ruth Witkin of two needs assessment books.

8. Jennifer C. Greene is the current President of the American Evaluation Association and the author of a book on mixed methods.

9. Ernest R. House is a leader in the work of evaluation policy and is the author of an evaluation novel, Regression to the Mean.

10. Lee J. Cronbach is a pioneer in education evaluation and the reform of that practice.  He co-authored with several associates the book Toward Reform of Program Evaluation.

11.  Ellen Taylor-Powell is the former Evaluation Specialist at University of Wisconsin Extension Service and is credited with developing the logic model later adopted by the USDA for use by the Extension Service.  To go to the UWEX site, click on the words “logic model”.

12. Yvonna Lincoln, with her husband Egon Guba (see below), co-authored the book Naturalistic Inquiry. She is currently the co-editor (with Norman K. Denzin) of the Handbook of Qualitative Research.

13.   Egon Guba, with his wife Yvonna Lincoln, is the co-author of Fourth Generation Evaluation.

14. Blaine Worthen has championed certification for evaluators.  He, with Jody L. Fitzpatrick and James R. Sanders, co-authored Program Evaluation: Alternative Approaches and Practical Guidelines.

15.  Thomas A. Schwandt, a philosopher at heart who started as an auditor, has written extensively on evaluation ethics. He is also the co-author (with Edward S. Halpern) of Linking Auditing and Metaevaluation.

16.   Peter H. Rossi, a pioneer in evaluation research, co-authored Evaluation: A Systematic Approach with Howard E. Freeman and Mark W. Lipsey.

17. W. James Popham is a leader in educational evaluation and authored the volume Educational Evaluation.

18. Jason Millman was a pioneer of teacher evaluation and author of the Handbook of Teacher Evaluation.

19.  William R. Shadish co-edited (with Laura C. Leviton and Thomas Cook) Foundations of Program Evaluation: Theories of Practice. His work in theories of evaluation practice earned him the Paul F. Lazarsfeld Award for Evaluation Theory from the American Evaluation Association in 1994.

20.   Laura C. Leviton, co-editor (with Will Shadish and Tom Cook–see above) of Foundations of Program Evaluation: Theories of Practice, has pioneered work in participatory evaluation.



Although I’ve only listed 20 leaders, movers and shakers, in the evaluation field, there are others who also deserve mention:  John Owen, Deb Rog, Mark Lipsey, Mel Mark, Jonathan Morell, Midge Smith, Lois-Ellin Datta, Patricia Rogers, Sue Funnell, Jean King, Laurie Stevahn, John McLaughlin, Michael Morris, Nick Smith, Don Dillman, Karen Kirkhart, among others.

If you want to meet the movers and shakers, I suggest you attend the American Evaluation Association annual meeting.  In 2011, it will be held in Anaheim CA, November 2 – 5; professional development sessions are being offered October 31, November 1 and 2, and also November 6.  More conference information can be found here.



Those of you who read this blog know a little about evaluation.  Perhaps you’d like to know more?  Perhaps not…

I think it would be valuable to know who was instrumental in developing the profession to the point it is today; hence, a little history.  This will be fun for those of you who don’t like history.  It will be a matching game.  Some of these folks have been mentioned in previous posts.  I’ll post the keyed responses next week.

Directions:  Match the name with the evaluation contribution.  I’ve included photos so you can put a face with a name and a contribution.

[Photos numbered 1 through 20 appeared here.]



A.  Michael Scriven                1.  Empowerment Evaluation

B.  Michael Quinn Patton     2.  Mixed Methods

C.  Blaine Worthen                 3.  Naturalistic Inquiry

D.  David Fetterman              4.  CIPP

E.  Thomas Schwandt            5. Formative/Summative

F.  Jennifer Greene                  6. Needs Assessment

G.  James W. Altschuld          7.  Developmental Evaluation

H.  Ernie House                          8.  Case study

I.   Yvonna Lincoln                    9.  Fourth Generation Evaluation

J.  Egon Guba                            10. Evaluation Capacity Building

K.  Lee J. Cronbach                   11.  Evaluation Research

L.  W. James Popham               12.  Teacher Evaluation

M.  Peter H. Rossi                       13.  Logic Models

N.  Hallie Preskill                       14.  Educational Evaluation

O.  Ellen Taylor-Powell            15.  Foundations of Program Evaluation

P.  Robert Stake                           16. Toward Reform of Program Evaluation

Q.  Dan Stufflebeam                  17. Participatory Evaluation

R.  Jason Millman                      18. Evaluation and Policy

S.  Will Shadish                           19. Evaluation and epistemology

T.  Laura Leviton                        20. Evaluation Certification


There are others, more recent, who have made contributions.  These represent the folks who did seminal work that built the profession.  The list also includes some more recent thinkers.  Have fun.

We recently held Professional Development Days for the Division of Outreach and Engagement.  This is an annual opportunity for faculty and staff in the Division to build capacity in a variety of topics.  The question this training posed was evaluative:

How do we provide meaningful feedback?

Evaluating a conference or a multi-day, multi-session training is no easy task.  Gathering meaningful data is a challenge.  What can you do?  Before you hold the conference (I’m using the word conference to mean any multi-day, multi-session training), decide on the following:

  • Are you going to evaluate the conference?
  • What is the focus of the evaluation?
  • How are you going to use the results?

The answer to the first question is easy:  YES.  If the conference is an annual event (or a regular event), you will want to have participants’ feedback on their experience, so, yes, you will evaluate the conference. Look at Penn State Tip Sheet 16 for some suggestions.  (If this is a one-time event, you may not; though as an evaluator, I wouldn’t recommend ignoring evaluation.)

The second question is more critical.  I’ve mentioned in previous blogs the need to prioritize your evaluation.  Evaluating a conference can be all consuming and result in useless data UNLESS the evaluation is FOCUSED.  Sit down with the planners and ask them what they expect to happen as a result of the conference.  Ask them if there is one particular aspect of the conference that is new this year.  Ask them if feedback in previous years has given them any ideas about what is important to evaluate this year.

This year, the planners wanted to provide specific feedback to the instructors.  The instructors had asked for feedback in previous years.  This is problematic if planning evaluative activities for individual sessions is not done before the conference.  Nancy Ellen Kiernan, a colleague at Penn State, suggests a qualitative approach called a Listening Post.  This approach will elicit feedback from participants at the time of the conference.  This method involves volunteers who attended the sessions and may take more persons than a survey.  To use the Listening Post, you must plan ahead of time to gather these data.  Otherwise, you will need to do a survey after the conference is over and this raises other problems.

The third question is also very important.  If the results are just given to the supervisor, the likelihood of them being used by individuals for session improvement or by organizers for overall change is slim.  Making the data usable for instructors means summarizing the data in a meaningful way, often visually.  There are several ways to visually present survey data including graphs, tables, or charts.  More on that another time.  Words often get lost, especially if words dominate the report.

There is a lot of information in the training and development literature that might also be helpful.  Kirkpatrick has done a lot of work in this area.  I’ve mentioned his work in previous blogs.

There is no one best way to gather feedback from conference participants.  My advice:  KISS–keep it simple and straightforward.

You’ve developed your program.  You think you’ve met a need.  You conduct an evaluation.  Lo and behold!  Some of your respondents give you such negative feedback you wonder what program they attended.  Could it really have been your program?

This is the phenomenon I call “all of the people all of the time”, which occurs regularly in evaluating training programs.  And it has to do with use–what you do with the results of this evaluation.  And you can’t do it–please all of the people all of the time, that is.  There will always be some sour grapes.  In fact, you will probably have more negative comments than positive comments.  People who are upset want you to know; people who are happy are just happy.

Now, I’m sure you are really confused.  Good.  At least I’ve got your attention and maybe you’ll read to the end of today’s post.

You have seen this scenario:  You ask the participants for formative data so that you can begin planning the next event or program.  You ask about the venue, the time of year, the length of the conference, the concurrent offerings, the plenary speakers.  Although some of these data are satisfaction data (the first level, called Reaction, in Don Kirkpatrick’s training model and the Reaction category in Claude Bennett’s TOPs Hierarchy [see diagram]), they are an important part of formative evaluation and an important part of program planning.  You are using the evaluation report.  That is important.  You are not asking if the participants learned something.  You are not asking if they intend to change their behavior.  You are not asking about what conditions have changed.  You only want to know about their experience in the program.

What do you do with the sour grapes?  You could make vinegar, only that won’t be very useful and use is what you are after.  Instead, sort the data into those topics over which you have some control and those topics over which you have no control.  For example–you have control over who is invited to be a plenary speaker, if there will be a plenary speaker, how many concurrent sessions, who will teach those concurrent sessions;  you have no control over the air handling at the venue, the chairs at the venue, and probably, the temperature of the venue.
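If your comments are already in a spreadsheet or a list, that sorting step can be sketched in a few lines of Python.  This is only a rough illustration: the topic labels, the CONTROLLABLE set, and the sort_feedback function are all my own hypothetical names, and which topics count as "controllable" is an assumption each planning team would settle for itself.

```python
# Hypothetical topic labels. Which topics count as controllable is an
# assumption that each planning team would set for itself.
CONTROLLABLE = {"plenary speaker", "concurrent sessions", "instructors"}


def sort_feedback(comments):
    """Split (topic, comment) pairs into topics planners can and cannot change."""
    can_change, cannot_change = [], []
    for topic, text in comments:
        if topic in CONTROLLABLE:
            can_change.append((topic, text))
        else:
            cannot_change.append((topic, text))
    return can_change, cannot_change


# Made-up example comments, each already tagged with a topic by a reader.
comments = [
    ("plenary speaker", "The plenary speaker was terrible."),
    ("air handling", "The room was stuffy."),
    ("concurrent sessions", "Nothing was offered for classified staff."),
    ("chairs", "The chairs were uncomfortable."),
]

can_change, cannot_change = sort_feedback(comments)
print("Act on:", [topic for topic, _ in can_change])
print("Acknowledge:", [topic for topic, _ in cannot_change])
```

The sketch assumes someone has already read each comment and tagged it with a topic; that reading is still the real work, and the code only keeps the two piles separate.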

You can CHANGE those topics over which you have control.  Comments say the plenary speaker was terrible.  Do not invite that person to speak again.  Feedback says that the concurrent sessions didn’t provide options for classified staff, only faculty.  Decide the focus of your program and be explicit in the program promotional materials–advertise it explicitly to your target audience.  You get complaints about the venue–perhaps there is another venue; perhaps not.

You can also let your audience know what you decided based on your feedback.  One organization for which I volunteered sent out a white paper with all the concerns and how the organization was addressing them–or not.  It helped the grumblers see that the organization takes their feedback seriously.

And if none of this works…ask yourself: Is it a case of all of the people all of the time?

Although I have been learning about and doing evaluation for a long time, this week I’ve been searching for a topic to talk about.  A student recently asked me about the politics of evaluation–there is a lot that can be said on that topic, which I will save for another day.  Another student asked me about when to do an impact study and how to bound that study.  Certainly a good topic, too, though one that can wait for another post.  Something I read in another blog got me thinking about today’s post.  So, today I want to talk about gathering demographics.

Last week, in my TIMELY TOPIC post, I mentioned the AEA Guiding Principles. Those Principles along with the Program Evaluation Standards make significant contributions in assisting evaluators in making ethical decisions.  Evaluators make ethical decisions with every evaluation.  They are guided by these professional standards of conduct.  There are five Guiding Principles and five Evaluation Standards.  And although these are not prescriptive, they go a long way toward ensuring ethical evaluations.  That is a long introduction to gathering demographics.

The guiding principle, Integrity/Honesty, states that “Evaluators display honesty and integrity in their own behavior, and attempt to ensure the honesty and integrity of the entire evaluation process.”  When we look at the entire evaluation process, as evaluators, we must strive constantly to maintain both personal and professional integrity in our decision making.  One decision we must make involves deciding what we need/want to know about our respondents.  As I’ve mentioned before, knowing what your sample looks like is important to reviewers, readers, and other stakeholders.  Yet, if we gather these data in a manner that is intrusive, are we being ethical?

Joe Heimlich, in a recent AEA365 post, says that asking demographic questions “…all carry with them ethical questions about use, need, confidentiality…”  He goes on to say that there are “…two major conditions shaping the decision to include – or to omit intentionally – questions on sexual or gender identity…”:

  1. When such data would further our understanding of the effect or the impact of a program, treatment, or event.
  2. When asking for such data would benefit the individual and/or their engagement in the evaluation process.

The first point relates to gender role issues–for example, are gay men more like or more different from other gender categories?  And what gender categories did you include in your survey?  The second point relates to allowing an individual’s voice to be heard clearly and completely and having the categories on our forms reflect their full participation in the evaluation.  For example, does marital status ask for domestic partnerships as well as traditional categories, and are all those traditional categories necessary to hear your participants?

The next time you develop a questionnaire that includes demographic questions, take a second look at the wording–in an ethical manner.

I’ve been reminded recently about Kirkpatrick’s evaluation model.

Donald L. Kirkpatrick (1959) developed a four-level model used primarily for evaluating training.  This model is still used extensively in the training field and is espoused by ASTD, the American Society for Training and Development.

It also occurred to me that Extension conducts a lot of training from pesticide handling to logic model use and that Kirkpatrick’s model is one that isn’t talked about a lot in Extension–at least I don’t use it as a reference.  And that may not be a good thing, given that Extension professionals are conducting training a lot of the time.

Kirkpatrick’s four levels are these:

  1. Reaction:  To what degree participants react favorably to the training
  2. Learning:  To what degree participants acquire intended knowledge, skills, and attitudes based on their participation in the learning event
  3. Application:  To what degree participants apply what they learned during training on the job
  4. Impact:  To what degree targeted outcomes occur, as a result of the learning event(s) and subsequent reinforcement

Sometimes it is important to know what affective reaction our participants are having during and at the end of the training.  I would call this a formative evaluation, and formative evaluation is often used for program improvement.  Reactions are a way that participants can tell the Extension professional how things are going–i.e., what their reaction is–using a continuous feedback mechanism.  Extension professionals can use this to change the program, revise their approach, adjust the pace, etc.  The feedback mechanism doesn’t have to be constant–which is often the interpretation of “continuous”.  Soliciting feedback at natural breaks, using a show of hands, is often enough for on-the-spot adjustments.  It is a form of formative evaluation as it is an “in-process” evaluation.  Kirkpatrick’s level one (reaction) doesn’t provide a measure of outcomes or impacts.  I might call it a “happiness” evaluation or a satisfaction evaluation–it tells me only what the participants’ reaction is.  Outcome evaluation–to determine a measure of effectiveness–happens in a later level and is another approach to evaluation which I would call summative–although Michael Patton might call it developmental in a training situation where the outcome is always moving, changing, developing.

Kirkpatrick, D. L. (1959) Evaluating Training Programs, 2nd ed., Berrett Koehler, San Francisco.

Kirkpatrick, D. L. (comp.) (1998) Another Look at Evaluating Training Programs, ASTD, Alexandria, USA.

For more information about the Kirkpatrick model, see their site, Kirkpatrick Partners.

Last Wednesday, I had the privilege to attend the OPEN (Oregon Program Evaluators Network) annual meeting.

Michael Quinn Patton, the keynote speaker, talked about developmental evaluation and utilization focused evaluation.  Utilization Focused Evaluation makes sense–use by intended users.

Developmental Evaluation, on the other hand, needs some discussion.

The way Michael tells the story (he teaches a lot through story) is this:

“I had a standard 5-year contract with a community leadership program that specified 2 1/2 years of formative evaluation for program improvement to be followed by 2 1/2 years of summative evaluation that would lead to an overall decision about whether the program was effective.”  After 2 1/2 years, Michael called for the summative evaluation to begin.  The director was adamant: “We can’t stand still for 2 years.  Let’s keep doing formative evaluation.  We want to keep improving the program… (I) Never (want to do a summative evaluation)…if it means standardizing the program.  We want to keep developing and changing.”  He looked at Michael sternly, challengingly.  “Formative evaluation!  Summative evaluation! Is that all you evaluators have to offer?” Michael hemmed and hawed and said, “I suppose we could do…ummm…we could do…ummm…well, we might do, you know…we could try developmental evaluation!” Not knowing what that was, the director asked, “What’s that?”  Michael responded, “It’s where you, ummm, keep developing.”  Developmental evaluation was born.

The evaluation field offered, until now, two global approaches to evaluation: formative for program improvement and summative to make an overall judgment of merit and worth.  Now, developmental evaluation (DE) offers another approach, one which is relevant to social innovators looking to bring about major social change.  It takes into consideration systems theory, complexity concepts, uncertainty principles, nonlinearity, and emergence.  DE acknowledges that resistance and push back are likely when change happens.  Developmental evaluation recognizes that change brings turbulence and suggests an approach that “adapts to the realities of complex nonlinear dynamics rather than trying to impose order and certainty on a disorderly and uncertain world” (Patton, 2011).  Social innovators recognize that outcomes will emerge as the program moves forward and that predefining outcomes limits the vision.

Michael has used the art of Mark M. Rogers to illustrate the point.  The cartoon has two early humans, one with what I would call a wheel, albeit primitive, who is saying, “No go.  The evaluation committee said it doesn’t meet utility specs.  They want something linear, stable, controllable, and targeted to reach a pre-set destination.  They couldn’t see any use for this (the wheel).”

For Extension professionals who are delivering programs designed to lead to a specific change, DE may not be useful.  For those Extension professionals who envision something different, DE may be the answer.  I think DE is worth a look.

Look for my next post after October 14; I’ll be out of the office until then.

Patton, M. Q. (2011) Developmental Evaluation. NY: Guilford Press.