I’ve long suspected I wasn’t alone in recognizing that the term impact is used inappropriately in most evaluations.

Terry Smutylo sings a song about impact during an outcome mapping seminar he conducted.  Terry Smutylo is the Director of Evaluation at the International Development Research Centre, Ottawa, Canada.  He ought to know a few things about evaluation terminology.  He has two versions of this song, Impact Blues, on YouTube; his comments speak to this issue.  Check it out.

 

Just a gentle reminder to use your words carefully.  Make sure everyone knows what you mean and that everyone at the table agrees with the meaning you use.

 

This week the post is  short.  Terry says it best.

Next week I’ll be at the American Evaluation Association annual meeting in Anaheim, CA, so no post.  No Disneyland visit either…sigh

 

 

A colleague asks for advice on handling evaluation stories, so that they don’t get brushed aside as mere anecdotes.  She goes on to say of the AEA365 blog she read, “I read the steps to take (hot tips), but don’t know enough about evaluation, perhaps, to understand how to apply them.”  Her question raises an interesting topic.  Much of what Extension does can be captured in stories (i.e., qualitative data) rather than in numbers (i.e., quantitative data).  Dick Krueger, former Professor and Evaluation Leader (read specialist) at the University of Minnesota, has done a lot of work in the area of using stories as evaluation.  Today’s post summarizes his work.

 

At the outset, Dick asks the following question:  What is the value of stories?  He provides these three answers:

  1. Stories make information easier to remember
  2. Stories make information more believable
  3. Stories can tap into emotions.

There are all types of stories.  The type we are interested in for evaluation purposes is the organizational story.  Organizational stories can do the following things for an organization:

  1. Depict culture
  2. Promote core values
  3. Transmit and reinforce the culture
  4. Provide instruction to employees
  5. Motivate, inspire, and encourage

He suggests six common types of organizational stories:

  1. Hero stories  (someone in the organization who has done something beyond the normal range of achievement)
  2. Success stories (highlight organizational successes)
  3. Lessons learned stories (what major mistakes and triumphs teach the organization)
  4. “How it works around here” stories (highlight core organizational values reflected in actual practice)
  5. “Sacred bundle” stories (a collection of stories that together depict the culture of an organization; core philosophies)
  6. Training and orientation stories (assist new employees in understanding how the organization works)

To use stories as evaluation, the evaluator needs to consider how stories might be used.  That is, do they depict how people experience the program?  Do they help in understanding program outcomes?  Do they offer insights into program processes?

You (as evaluator) need to think about how the story fits into the evaluation design (think logic model; program planning).  Ask yourself these questions:  Should you use stories alone?  Should you use stories that lead into other forms of inquiry?  Should you use stories that augment/illustrate results from other forms of inquiry?

You need to establish criteria for stories.  Rigor can be applied to stories even though the data are narrative.  Criteria include the following:  Is the story authentic–is it truthful?  Is the story verifiable–is there a trail of evidence back to the source of the story?  Is there a need to consider confidentiality?  What was the original intent–the purpose behind the original telling?  And finally, what does the story represent–other people or locations?

You will need a plan for capturing the stories.  Ask yourself these questions:  Do you need help capturing the stories?  What strategy will you use for collecting the stories?  How will you ensure documentation and record keeping?  (Sequence the questions; write them down; note the type–set up, conversational, etc.)  You will also need a plan for analyzing and reporting the stories, as you, the evaluator, are responsible for finding meaning.

 

I was talking with a colleague about evaluation capacity building (see last week’s post) and the question was raised about thinking like an evaluator.  Got me thinking about the socialization of professions and what has to happen to build a critical mass of like-minded people.

Certainly, preparatory programs in academia, conducted by experts–people who have worked in the field a long time, or at least longer than you–start the process.  Professional development helps–you know, attending meetings where evaluators meet (like the upcoming AEA conference, U.S. regional affiliates [there are many and they have conferences and meetings, too], and international organizations [increasing in number, and they also host conferences and professional development sessions]; let me know if you want to know more about these opportunities).  Reading new and timely literature on evaluation provides insights into the language.  AND looking at the evaluative questions in everyday activities.  Questions such as:  What criteria?  What standards?  Which values?  What worth?  Which decisions?

The socialization of evaluators happens because people who are interested in being evaluators look for the evaluation questions in everything they do.  Sometimes, looking for the evaluative question is easy and second nature–like choosing a can of corn at the grocery store; sometimes it is hard and demands collaboration–like deciding on the effectiveness of an educational program.

My recommendation is to start with easy things–corn, chocolate chip cookies, wine, tomatoes; then move to harder things with more variables–what to wear when and where, or whether to include one group or another.  The choices you make will all depend upon what criteria are set, what standards have been agreed upon, and what value you place on the outcome or what decision you make.

The socialization process is like a puzzle, something that takes a while to complete, something that is different for everyone, yet ultimately the same.  The socialization is not unlike evaluation…pieces fitting together–criteria, standards, values, decisions.  Asking the evaluative questions  is an ongoing fluid process…it will become second nature with practice.

A colleague asked me what I considered an output in a statewide program we were discussing.  This is a really good example of assumptions and how they can blindside an individual–in this case me.  Once I (figuratively) picked myself up, I proceeded to explain how this terminology applied to the program under discussion.  Once the meeting concluded, I realized that perhaps a bit of a refresher was in order.  Even the most seasoned evaluators can benefit from a reminder every so often.

 

So OK–inputs, outputs, outcomes.

As I’ve mentioned before, Ellen Taylor-Powell, former UWEX Evaluation specialist, has a marvelous tutorial on logic modeling.  I recommend you go there for your own refresher.  What I offer you here is a (very) brief overview of these terms.

Logic models, whether linear or circular, are composed of various focus points.  Those focus points include (in addition to those mentioned in the title of this post) the situation, assumptions, and external factors.  Simply put, the situation is what is going on–the priorities, the needs, the problems that led to the program you are conducting–that is, program with a small p (we can talk about sub and supra models later).

Inputs are those resources you need to conduct the program. Typically, they are lumped into personnel, time, money, venue, equipment.  Personnel covers staff, volunteers, partners, any stakeholder.  Time is not just your time–also the time needed for implementation, evaluation, analysis, and reporting.  Money (speaks for itself).  Venue is where the program will be held.  Equipment is what stuff you will need–technology, materials, gear, etc.

Outputs are often classified into two parts–first, participants (or target audience) and second, activities that are conducted.  Typically (although not always), those activities are counted and are called bean counts.  In the example that started this post, we would be counting the number of students who graduated high school; the number of students who matriculated to college (either 2 or 4 year); the number of students who transferred from 2 year to 4 year colleges; the number of students who completed college in 2 or 4 years; etc.  This bean count could also be the number of classes offered; the number of brochures distributed; the number of participants in the class; the number of (fill in the blank).  Outputs are necessary but not sufficient to determine if a program is being effective.  The field of evaluation started with determining bean counts and satisfaction.

Outcomes can be categorized as short term, medium/intermediate term, or long term.  Long term outcomes are often called impacts.  (There are those in the field who would classify impacts as something separate from an outcome–a discussion for another day.)  Whatever you choose to call the effects of your program, be consistent–don’t use the terms interchangeably; it confuses the reader.  What you are looking for as an outcome is change–in learning; in behavior; in conditions.  This change is measured in the target audience–individuals, groups, communities, etc.
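If it helps to see the pieces side by side, here is a minimal sketch of a logic model as a small Python data structure.  The field names, the situation statement, and the counts are all made up for illustration; this is not Ellen Taylor-Powell’s format or any standard one.  The point is only that outputs are counts while outcomes describe change.

```python
from dataclasses import dataclass, field


@dataclass
class LogicModel:
    """A bare-bones logic model; field names are illustrative, not a standard."""
    situation: str
    inputs: list[str] = field(default_factory=list)
    outputs: dict[str, int] = field(default_factory=dict)   # bean counts
    outcomes: dict[str, str] = field(default_factory=dict)  # change, keyed by term


model = LogicModel(
    situation="Too few students go on to finish college",  # hypothetical situation
    inputs=["personnel", "time", "money", "venue", "equipment"],
    outputs={"classes offered": 12, "participants": 240},   # made-up counts
    outcomes={
        "short term": "change in learning",
        "medium term": "change in behavior",
        "long term": "change in conditions",
    },
)

# Outputs are things you count; outcomes are changes in the target audience.
for name, count in model.outputs.items():
    print(f"output: {count} {name}")
for term, change in model.outcomes.items():
    print(f"outcome ({term}): {change}")
```

Keeping outputs and outcomes in separate fields is one way to stay consistent about which is which when you write up the program.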

I’ll talk about assumptions and external factors another day.  Have a wonderful holiday weekend…the last vestiges of summer–think tomatoes, corn-on-the-cob , state fair, and  a tall cool drink.

 

I started this post the third week in July.  Technical difficulties prevented me from completing the post.  Hopefully, those difficulties are now in the past.

A colleague asked me what can we do when we can’t measure actual behavior change in our evaluations.  Most evaluations can capture knowledge change (short term outcomes); some evaluations can capture behavior change (intermediate or medium term outcomes); very few can capture condition change (long term outcomes, often called impacts–though not by me).  I thought about that.  Intention to change behavior can be measured.  Confidence (self-efficacy) to change behavior can be measured.  For me, all evaluations need to address those two points.

Paul Mazmanian, Associate Dean for Continuing Professional Development and Evaluation Studies at Virginia Commonwealth University, has studied changing practice patterns for several years.  One study, conducted in 1998, reported that “…physicians in both study and control groups were significantly more likely to change (47% vs. 7%, p < .001) if they indicated intent to change immediately following the lecture” (Academic Medicine. 1998; 73:882-886).  Mazmanian and his co-authors say in their conclusions that “successful change in practice may depend less on clinical and barriers information than on other factors that influence physicians’ performance.  To further develop the commitment-to-change strategy in measuring effects of planned change, it is important to isolate and learn the powers of individual components of the strategy as well as their collective influence on physicians’ clinical behavior.”

 

What are the implications for Extension and other complex organizations?  It makes sense to extrapolate from the continuing medical education literature.  Physicians are adults; most of Extension’s audience are adults.  If behavior change is highly predictable from intention to change stated “immediately following the lecture” (i.e., continuing education program), then stated intention to change solicited from participants in Extension programs immediately following the program delivery would signal an increased likelihood of behavior change.  One of the outcomes Extension wants to see is change in behavior (medium term outcomes).  Measuring those behavior changes directly (through observation, or some other method) is often outside the resources available.  Measuring those intended behavior changes is within the scope of Extension resources.  Using a time frame (such as 6 months) helps bound the anticipated behavior change.  In addition, intention to change can be coupled with confidence to implement the behavior change to provide the evaluator with information about the effect of the program.  The desired effect is high confidence to change and willingness to implement the change within the specified time frame.  If Extension professionals find that result, then it would be safe to say that the program is successful.
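To make the coupling of intention and confidence concrete, here is a minimal sketch of how end-of-program responses might be tallied.  Everything in it is an assumption for illustration: the 1 to 5 confidence scale, the cut-off of 4 for “high confidence,” the field names, and the four made-up responses.  It does not come from Mazmanian’s studies.

```python
from dataclasses import dataclass


@dataclass
class Response:
    intends_to_change: bool  # stated intent to change within the time frame (e.g., 6 months)
    confidence: int          # self-rated confidence to change, 1 (low) to 5 (high)


def summarize(responses, high_confidence=4):
    """Percent stating intent, and percent stating intent with high confidence."""
    n = len(responses)
    if n == 0:
        raise ValueError("no responses to summarize")
    intent = sum(r.intends_to_change for r in responses)
    both = sum(
        r.intends_to_change and r.confidence >= high_confidence for r in responses
    )
    return {
        "n": n,
        "pct_intent": 100 * intent / n,
        "pct_intent_and_confident": 100 * both / n,
    }


# Made-up responses collected immediately after a program.
responses = [
    Response(True, 5),
    Response(True, 4),
    Response(True, 2),
    Response(False, 3),
]
summary = summarize(responses)
print(summary)
```

With these four made-up responses, three of four state intent and two of four state intent with high confidence.  Reporting both percentages keeps the two measures distinct instead of folding them into a single score.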

REFERENCES

1.  Mazmanian, P. E., Daffron, S. R., Johnson, R. E., Davis, D. A., & Kantrowitz, M. P.  (1998).  Information about barriers to planned change:  A randomized controlled trial involving continuing medical education lectures and commitment to change.  Academic Medicine, 73 (8), 882-886.

2.  Mazmanian, P. E. & Mazmanian, P. M.  (1999).  Commitment to change: Theoretical foundations, methods, and outcomes.  The Journal of Continuing Education in the Health Professions, 19, 200 – 207.

3.  Mazmanian, P. E., Johnson, R. E., Zhang, A., Boothby, J., & Yeatts, E. J. (2001).  Effects of a signature on rates of change: A randomized controlled trial involving continuing medical education and the commitment-to-change model.  Academic Medicine, 76 (6), 642-646.

 

Historically, April 15 is tax day (although in 2011, it is April 18)–the day taxes are due to the revenue departments.

State legislatures are dealing with budgets and Congress is trying to balance a  Federal budget.

Everywhere one looks, money is the issue–this is especially true in these recession-ridden times.  How does all this relate to evaluation, you ask?  This is the topic for today’s blog.  How does money figure into evaluation?

Let’s start with the simple and move to the complex.  Everything costs–and although I’m talking about money, time, personnel, and resources  (like paper, staples, electricity, etc.)  must also be taken into consideration.

When we talk about evaluation, four terms typically come to mind:  efficacy, effectiveness, efficiency, and fidelity.

Efficiency is the term that addresses money or costs.  Was the program efficient in its use of resources?  That is the question efficiency asks.

At least three approaches are used to answer that question:

  1. Cost  or cost analysis;
  2. Cost effectiveness analysis; and
  3. Cost-benefit analysis.

Simply then:

  1. Cost analysis is the number of dollars it takes to deliver the program, including salary of the individual(s) planning the program.
  2. Cost effectiveness analysis is a computation of the target outcomes in an appropriate unit in ratio to the costs.
  3. Cost-benefit analysis is also a ratio: the costs of the program to the benefits (outcomes) of the program, measured in the same units, usually money.

How are these computed?

  1. Cost can be measured by how much the consumer is willing to pay.  Costs can be the value of each resource that is consumed in the implementation of the program.  Or cost analysis can be “measuring costs so they can be related to procedures and outcomes” (Yates, 1996, p. 25).   So you list the money spent to implement the program, including salaries, and that is a cost analysis.  Simple.
  2. Cost effectiveness analysis says that there is some metric in which the outcomes are measured (number of times hands are washed during the day, for example) and that is put in ratio to the total costs of the program.  So movement from washing hands only once a day (a bare minimum) to washing hands at least six times a day would have the costs of the program (including salaries) divided by the changed number of times hands are washed a day (i.e., 5).  The resulting value is the cost-effectiveness ratio.  Complex.
  3. Cost-benefit analysis puts the outcomes in the same metric as the costs–in this case dollars.  The costs (in dollars) of the program (including salaries) are put in ratio to the outcomes (usually benefits) measured in dollars.  The challenge here is assigning a dollar amount to the outcomes.  How much is frequent hand washing worth?  It is often measured in days saved from communicable/chronic/acute illnesses.  Computations of health days (reduction in days affected by chronic illness) are often difficult to value in dollars.  There is a whole body of literature in health economics for this topic, if you’re interested.  Complicated and complex.
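The three computations above can be sketched in a few lines of Python.  The dollar figures and the budget categories below are made up for illustration, and so is the dollar value placed on the benefit (which, as noted, is the hard part in practice).  The hand-washing change from 1 to 6 times a day is the example from the list.

```python
def cost_analysis(line_items):
    """Total program cost: every resource consumed, salaries included."""
    return sum(line_items.values())


def cost_effectiveness_ratio(total_cost, outcome_before, outcome_after):
    """Program cost per unit of change in the outcome metric."""
    change = outcome_after - outcome_before
    if change == 0:
        raise ValueError("no change in the outcome; the ratio is undefined")
    return total_cost / change


def cost_benefit_ratio(total_cost, benefit_in_dollars):
    """Program cost in ratio to benefits, both measured in dollars."""
    if benefit_in_dollars == 0:
        raise ValueError("no benefit; the ratio is undefined")
    return total_cost / benefit_in_dollars


# Hypothetical budget for a hand-washing program.
costs = {"salaries": 4000.0, "materials": 600.0, "venue": 400.0}

total = cost_analysis(costs)
# Hand washing moves from 1 to 6 times a day, a change of 5.
ce = cost_effectiveness_ratio(total, outcome_before=1, outcome_after=6)
# Assume (for illustration only) the benefit has been valued at $12,500.
cb = cost_benefit_ratio(total, benefit_in_dollars=12500.0)

print(f"cost: ${total:,.2f}")
print(f"cost-effectiveness: ${ce:,.2f} per additional daily hand wash")
print(f"cost-benefit ratio: {cb:.2f}")
```

With these made-up numbers the program costs $5,000, which works out to $1,000 per additional daily hand wash and 40 cents of cost for every dollar of benefit.  The ratios are written cost over outcome to match the wording in the list; some analysts report the inverse (benefit per dollar spent), so say which one you are using.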

Yates, B. T. (1996).  Analyzing costs, procedures, processes, and outcomes in human services.  Thousand Oaks, CA: Sage.

There has been a lot of buzz recently about the usefulness of the Kirkpatrick model.

I’ve been talking about it (in two previous posts) and so have others.  This model has been around a long time and has continued to be useful in the training field.  Extension does a lot of training.  Does that mean this model should be used exclusively when training is the focus?  I don’t think so.  Does this model have merits?  I think so.  Could it be improved upon?  That depends on the objective of your program and your evaluation, so probably.

If you want to know about whether your participants react favorably to the training, then this model is probably useful.

If you want to know about the change in knowledge, skills, and attitudes, then this model may be useful.  You would need to be careful because knowledge is a slippery concept to measure.

If you want to know about the change in behavior, probably not. Kirkpatrick on the website says that application of learning is what is measured in the behavioral stage.  How do you observe behavior change at a training?  Observation is the obvious answer here and one does not necessarily observe behavior change at a training.  Intention to change is not mentioned in this level.

If you want to know what difference you made in the social, economic, and/or environmental conditions in which your participants live, work, and practice, then the Kirkpatrick model won’t take you there.  The 4th level (which is where evaluation starts for this model, according to Kirkpatrick) says:  To what degree targeted outcomes occur as a result of the training event and subsequent reinforcement. I do not see this as condition change or what I call impact.

A faculty member asked me for specific help in assessing impact.  First, one needs to define what is meant by impact.  I use the word to mean change in social, environmental, and/or economic conditions over the long run.  This means changes in social institutions like family, school, employment (social conditions).  It means changes in the environment, which may be clean water or clean air OR it may mean removing the snack food vending machine from the school (environmental conditions).  It means changes in some economic indicator, up or down, like return on investment, change in employment status, or increased revenue (economic conditions).  This doesn’t necessarily mean targeted outcomes of the training event.

I hope that any training event will move participants to a different place in their thinking and acting that will manifest in the LONG RUN in changes in one of the three conditions mentioned above.  To get there, one needs to be specific in what one is asking the participants.  Intention to change doesn’t necessarily get to impact.  You could anticipate impact if participants follow through with their intention.  The only way to know that for sure  is to observe it.  We approximate that by asking good questions.

What questions are you asking about condition change to get at impacts of your training and educational programs?

Next week:  TIMELY TOPIC.  Any suggestions?

Although I have been learning about and doing evaluation for a long time, this week I’ve been searching for a topic to talk about.  A student recently asked me about the politics of evaluation–there is a lot that can be said on that topic, which I will save for another day.  Another student asked me about when to do an impact study and how to bound that study.  Certainly a good topic, too, though one that can wait for another post.  Something I read in another blog got me thinking about today’s post.  So, today I want to talk about gathering demographics.

Last week, in my TIMELY TOPIC post, I mentioned the AEA Guiding Principles.  Those Principles, along with the Program Evaluation Standards, make significant contributions in assisting evaluators in making ethical decisions.  Evaluators make ethical decisions with every evaluation.  They are guided by these professional standards of conduct.  There are five Guiding Principles and five Evaluation Standards.  And although these are not prescriptive, they go a long way toward ensuring ethical evaluations.  That is a long introduction to gathering demographics.

The guiding principle, Integrity/Honesty, states that “Evaluators display honesty and integrity in their own behavior, and attempt to ensure the honesty and integrity of the entire evaluation process.”  When we look at the entire evaluation process, as evaluators, we must strive constantly to maintain both personal and professional integrity in our decision making.  One decision we must make involves deciding what we need/want to know about our respondents.  As I’ve mentioned before, knowing what your sample looks like is important to reviewers, readers, and other stakeholders.  Yet, if we gather these data in a manner that is intrusive, are we being ethical?

Joe Heimlich, in a recent AEA365 post, says that asking demographic questions “…all carry with them ethical questions about use, need, confidentiality…”  He goes on to say that there are “…two major conditions shaping the decision to include – or to omit intentionally – questions on sexual or gender identity…”:

  1. When such data would further our understanding of the effect or the impact of a program, treatment, or event.
  2. When asking for such data would benefit the individual and/or their engagement in the evaluation process.

The first point relates to gender role issues–for example are gay men more like or more different from other gender categories?  And what gender categories did you include in your survey?  The second point relates to allowing an individual’s voice to be heard clearly and completely and have categories on our forms reflect their full participation in the evaluation.  For example, does marital status ask for domestic partnerships as well as traditional categories and are all those traditional categories necessary to hear your participants?

The next time you develop a questionnaire that includes demographic questions, take a second look at the wording–in an ethical manner.

Hello, readers.  This week I’m doing something different with this blog.  This week, and the third week in each month from now on, I’ll be posting a column called Timely Topic.  This will be a post on a topic that someone (that means you, reader) has suggested.  A topic that has been buzzing around in conversations.  A topic that has relevance to evaluation.  This all came about because a colleague from another land grant institution is concerned about the dearth of evaluation skills among Extension colleagues.  (Although this comment makes me wonder to whom this colleague is talking, that question is content for another post, another day.)  So thinking about how to get core evaluation information out to more folks, I decided to devote one post a month to TIMELY TOPICS.  Today’s post is about “THINKING CAREFULLY”.

Recently, I’ve been asked to review a statistics textbook for my department.  This particular book uses a program that is available on everyone’s computer.  The text has some important points to make and today’s post reflects one of those points.  The point is thinking carefully about using statistics.

As an evaluator–if only the evaluator of your own programs–you must think critically about the “…context of the data, the source of the data, the method used in data collection, the conclusions reached, and the practical implications” (Triola, 2010, p. 18).  The author posits that to understand general methods of using sample data, make inferences about populations, understand sampling and surveys, grasp important measures of key characteristics of data, and use valid statistical methods, one must recognize the misuse of statistics.

I’m sure all of you have heard the quote, “Figures don’t lie; liars figure,” which is attributed to Mark Twain.  I’ve always heard the quote as “Statistics lie and liars use statistics.”  Statistics CAN lie.  Liars CAN use statistics.  That is where thinking carefully comes in–to determine if the statistical conclusions being presented are seriously flawed.

As evaluators, we have a responsibility (according to the AEA guiding principles) to conduct systematic, data-based inquiry; provide competent performance; display honesty and integrity…of the entire evaluation process; respect the security, dignity, and self-worth of all respondents; and consider the diversity of the general and public interests and values.  This demands that we think carefully about the reporting of data.  Triola cautions, “Do not use voluntary response sample data for making conclusions about a population.”  How often have you used data from individuals who decide themselves (self-selected) whether to participate in your survey or not?  THINK CAREFULLY about your sample.  These data cannot be generalized to all people like your respondents because of the bias that is introduced by self-selection.

Other examples of misuse of statistics include

  • using correlation for concluding causation;
  • reporting data that involve a sponsor’s product;
  • identifying respondents inappropriately;
  • reporting data that are affected by a desired response bias;
  • using small samples to draw conclusions for large groups;
  • implying that being precise is being accurate; and
  • reporting misleading or unclear percentages. (This cartoon was drawn by Ben Shabad.)

When reporting statistics gathered from your evaluation, THINK CAREFULLY.

Three weeks ago, I promised you a series of posts on related topics–Program planning, Evaluation implementation, monitoring and delivering, and Evaluation utilization.  This is the third one–using the findings of evaluation.

Michael Patton’s book  is my reference.

I’ll try to condense the 400+ page book down to 500+ words for today’s post.  Fortunately, I have the Reader’s Digest version as well (look for Chapter 23 [Utilization-Focused Evaluation] in the following citation: Stufflebeam, D. L., Madaus, G. F., & Kellaghan, T. (2000). Evaluation Models: Viewpoints on educational and human services evaluation, 2nd ed. Boston, MA: Kluwer Academic Publishers).  Patton’s chapter is a good summary–still it is 14 pages.

To start, it is important to understand exactly how the word “evaluation” is used in the context of utilization.  In the Stufflebeam, Madaus, & Kellaghan publication cited above, Patton (2000, p. 426) describes evaluation as, “the systematic collection of information about the activities, characteristics, and outcomes of programs to make judgments about the program, improve program effectiveness and/or inform decisions about future programming.  Utilization-focused evaluation (as opposed to program evaluation in general) is evaluation done for and with specific intended primary users for specific, intended uses (emphasis added). ”

There are four different types of use–instrumental, conceptual, persuasive, and process.  The interests of potential stakeholders cannot be served well unless the stakeholder(s) whose interests are being served are made explicit.

To understand the types of use,  I will quote from a document titled, “Non-formal Educator Use of Evaluation Findings: Factors of Influence” by Sarah Baughman.

“Instrumental use occurs when decision makers use the findings to change or modify the program in some way (Fleischer & Christie, 2009; McCormick, 1997; Shulha & Cousins, 1997). The information gathered is used in a direct, concrete way or applied to a specific decision (McCormick, 1997).

Conceptual use occurs when the evaluation findings help the program staff or key stakeholders understand the program in a new way (Fleischer & Christie, 2009).

Persuasive use has also been called political use and is not always viewed as a positive type of use (McCormick, 1997). Examples of negative persuasive use include using evaluation results to justify or legitimize a decision that is already made or to prove to stakeholders or other administrative decision makers that the organization values accountability (Fleischer & Christie, 2009). It is sometimes considered a political use of findings with no intention to take the actual findings or the evaluation process seriously (Patton, 2008). Recently persuasive use has not been viewed as negatively as it once was.

Process use is the cognitive, behavioral, program, and organizational changes resulting, either directly or indirectly, from engagement in the evaluation process and learning to think evaluatively” (Patton, 2008, p. 109). Process use results not from the evaluation findings but from the evaluation activities or process.”

Before beginning the evaluation, the question, “Who is the primary intended user of the evaluation?” must not only be asked; it also must be answered.  What stakeholders need to be at the table?  Those are the people who have a stake in the evaluation findings, and those stakeholders may be different for each evaluation.  They are probably the primary intended users who will determine the evaluation’s use.

Citations mentioned in the Baughman quotation include:

  • Fleischer, D. N. & Christie, C. A. (2009). Evaluation use: Results from a survey of U.S. American Evaluation Association members. American Journal of Evaluation, 30(2), 158-175.
  • McCormick, E. R. (1997). Factors influencing the use of evaluation results. Dissertation Abstracts International: Section A: The Humanities and Social Sciences, 58, 4187 (UMI 9815051).
  • Shulha, L. M. & Cousins, J. B. (1997). Evaluation use: Theory, research and practice since 1986. Evaluation Practice, 18, 195-208.
  • Patton, M. Q. (2008). Utilization Focused Evaluation (4th ed.). Thousand Oaks: Sage Publications.