A few weeks ago I mentioned that a colleague of mine shared with me some insights she had about survey development.  She had an Aha! Moment.  We had a conversation about that Aha! Moment and videotaped the conversation.  To see the video, click here.


In thinking about what Linda learned, I realized that Aha! Moments could be a continuing series…so watch for more.

Let me know what you think.  Feedback is always welcome.

Oh, I want to remind you about an excellent resource for surveys:  Dillman’s current book, Internet, mail, and mixed-mode surveys:  The tailored design method.  It is a Wiley publication by Don A. Dillman, Jolene D. Smyth, and Leah Melani Christian.  It needs to be on your desk if you do any kind of survey work.

I started this post back in April.  I had an idea that needed to be remembered…it had to do with the unit of analysis, a question which often occurs in evaluation.  To increase sample size and, therefore, power, evaluators often choose to run analyses on the larger number when the aggregate, i.e., the smaller number, is probably the “true” unit of analysis.  Let me give you an example.

A program is randomly assigned to fifth grade classrooms in three different schools.  School A has three classrooms; school B has two classrooms; and school C has one classroom.  All together, there are approximately 180 students, six classrooms, three schools.  What is the appropriate unit of analysis?  Many people use students, because of the sample size issue.  Some people will use classroom because each got a different treatment.  Occasionally, some evaluators will use schools because that is the unit of randomization.  This issue elicits much discussion.  Some folks say that because students are in the school, they are really the unit of analysis because they are embedded in the randomization unit.  Some folks say that students are the best unit of analysis because there are more of them.  That certainly is the convention.  What you need to do is decide what the unit is and be able to defend that choice.  Even though I would lose power, I think I would go with the unit of randomization.  Which leads me to my next point:  truth.
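
To make the trade-off concrete, here is a minimal Python sketch of how the sample size shrinks as the unit of analysis moves from students to classrooms to schools.  The scores and the aggregate function are hypothetical, made up for illustration (two students per classroom rather than roughly thirty); only the layout of three schools and six classrooms comes from the example above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (school, classroom, student_score).
# The scores are invented; only the school/classroom layout follows the post.
records = [
    ("A", "A1", 72), ("A", "A1", 80),
    ("A", "A2", 65), ("A", "A2", 75),
    ("A", "A3", 90), ("A", "A3", 70),
    ("B", "B1", 60), ("B", "B1", 68),
    ("B", "B2", 85), ("B", "B2", 75),
    ("C", "C1", 78), ("C", "C1", 82),
]


def aggregate(records, level):
    """Return {unit: mean score} at the chosen unit of analysis.

    level is "student", "classroom", or "school".
    """
    if level == "student":
        # Each student is its own unit, so nothing is averaged.
        return {i: score for i, (_, _, score) in enumerate(records)}
    key_index = {"school": 0, "classroom": 1}[level]
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec[2])
    return {unit: mean(scores) for unit, scores in groups.items()}


for level in ("student", "classroom", "school"):
    units = aggregate(records, level)
    print(f"{level}: n = {len(units)}")
```

The data are the same in every case; what changes is how many independent units you claim to have, and that is the choice you have to defend.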

At the end of the first paragraph, I use the word “true” in quotation marks.  The Kirkpatricks in their most recent blog opened with a quote from the US CIA headquarters in Langley, Virginia, “And ye shall know the truth, and the truth shall make you free”.  (We won’t talk about the fiction in the official discourse, today…)  (Don Kirkpatrick developed the four levels of evaluation specifically in the training and development field.)  Jim Kirkpatrick, Don’s son, posits that, “Applied to training evaluation, this statement means that the focus should be on discovering and uncovering the truth along the four levels path.”  I will argue that the truth is how you (the principal investigator, program director, etc.) see the answer to the question.  Is that truth with an upper case “T” or is that truth with a lower case “t”?  What do you want it to mean?

Like history (history is what is written, usually by the winners, not what happened), truth becomes what do you want the answer to mean.  Jim Kirkpatrick offers an addendum (also from the CIA), that of “actionable intelligence”.  He goes on to say that, “Asking the right questions will provide data that gives (sic) us information we need (intelligent) upon which we can make good decisions (actionable).”  I agree that asking the right question is important–probably the foundation on which an evaluation is based.  Making “good decisions”  is in the eyes of the beholder–what do you want it to mean.

An important question that evaluators ask is, “What difference is this program making?”  Followed quickly with, “How do you know?”

Recently, I happened on a blog called {grow} and the author, Mark Schaefer,  had a post called, “Did this blog make a difference?”  Since this is a question as an evaluator I am always asking, I jumped on the page.  Mr. Schaefer is in marketing and as a marketing expert he says the following, “You’re in marketing for one reason: Grow. Grow your company, reputation, customers, impact, profits. Grow yourself. This is a community that will help. It will stretch your mind, connect you to fascinating people, and provide some fun along the way.”  So I wondered how relevant this blog would be to me and other evaluators whether they blogged or not.

Mr. Schaefer is taking stock of his blog–a good thing to do for a blog that has been posted for a while.  So although he lists four innovations, he asks the reader to “…be the judge if it made a difference in your life, your outlook, and your business.”  The four innovations are

  1. Paid contributing columnists.  He actually paid the folks who contributed to his blog; not something those of us in Extension can do.
  2. {growtoons}. Cartoons designed specifically for the blog that “…adds an element of fun and unique social media commentary.”  Hmmm…
  3. New perspectives. He showcased fresh deserving voices; some that he agreed with and some that he did not.  A possibility.
  4. Video. He did many video blogs and that gave him the opportunity to “…shine the light on some incredible people…”  He interviews folks and posts the short video.  Yet another possibility.

His approach seems really different to what I do.  Maybe it is the content; maybe it is the cohort; maybe it is something else.  Maybe there is something to be learned from what he does.  Maybe this blog is making a difference.  Only I don’t know.  So, I take a cue from Mr. Schaefer and ask you to judge if it has made a difference in what you do, then let me know.  I’ve embedded a link to a quick survey that will NOT link to you nor in any way identify you.  I will only be using the findings for program improvement.  Please let me know.  Click here to link to the survey.


Oh, and I won’t be posting next week–spring break and I’ll be gone.


Last weekend, I was in Florida visiting my daughter at Eckerd College.  The College was offering an Environmental Film Festival and I had the good fortune to see Green Fire, a film about Aldo Leopold and the land ethic.  I had seen it at OSU and was impressed because it was not all doom and gloom; rather it celebrated Aldo Leopold as one of the three leading and early conservationists (the other two are John Muir and Henry David Thoreau).  Dr. Curt Meine, who narrates the film and is a conservation biologist, was leading the discussion again; I had heard him at OSU.  Because I arrived at the showing early, I was able to chat with him about the film and its effects.  I asked him how he knew he was being effective.  His response was to tell me about the new memberships in the Foundation, the number of showings, and the size of the audience seeing the film.  Appropriate responses for my question.  What I really wanted to know was how he knew he was making a difference.  That is a different question; one which talks about change.  Change is what programs like Green Fire are all about.  It is what Aldo Leopold was all about (read Sand County Almanac to understand Leopold’s position).


Change is what evaluation is all about.  But did I ask the right question?  How could I have phrased it differently to get at what change had occurred in the viewers of the film?  Did new memberships in the Foundation demonstrate change?  Knowing what question to ask is important for program planners as well as evaluators.  There are often multiple levels of questions that could be asked: individual, programmatic, organizational, regional, national, global.  Are they all equally important?  Do they provide a means for gathering pertinent data?  How are you going to use these data once you’ve gathered them?  How carefully do you think about the questions you ask when you craft your logic model?  When you draft a survey?  When you construct questions for focus groups?  Asking the right question will yield relevant answers.  It will show you what difference you’ve made in the lives of your target audience.


Oh, and if you haven’t seen the film, Green Fire, or read the book, Sand County Almanac, I highly recommend them.


I started this post the third week in July.  Technical difficulties prevented me from completing the post.  Hopefully, those difficulties are now in the past.

A colleague asked me what can we do when we can’t measure actual behavior change in our evaluations.  Most evaluations can capture knowledge change (short term outcomes); some evaluations can capture behavior change (intermediate or medium term outcomes); very few can capture condition change (long term outcomes, often called impacts–though not by me).  I thought about that.  Intention to change behavior can be measured.  Confidence (self-efficacy) to change behavior can be measured.  For me, all evaluations need to address those two points.

Paul Mazmanian, Associate Dean for Continuing Professional Development and Evaluation Studies at Virginia Commonwealth University, has studied changing practice patterns for several years.  One study, conducted in 1998, reported that “…physicians in both study and control groups were significantly more likely to change (47% vs. 7% p< .001) if they indicated intent to change immediately following the lecture” (Academic Medicine. 1998; 73:882-886).   Mazmanian and his co-authors say in their conclusions that “successful change in practice may depend less on clinical and barriers information than on other factors that influence physicians’ performance.  To further develop the commitment-to-change strategy in measuring effects of planned change, it is important to isolate and learn the powers of individual components of the strategy as well as their collective influence on physicians’ clinical behavior.”


What are the implications for Extension and other complex organizations?  It makes sense to extrapolate from this information in the continuing medical education literature.  Physicians are adults; most of Extension’s audience are adults.  If behavior change is highly predictable from intention to change stated “immediately following the lecture” (i.e., the continuing education program), then intention to change solicited from participants in Extension programs immediately following program delivery should likewise signal an increased likelihood of behavior change.  One of the outcomes Extension wants to see is change in behavior (medium term outcomes).  Measuring those behavior changes directly (through observation, or some other method) is often outside the resources available.  Measuring those intended behavior changes is within the scope of Extension resources.  Using a time frame (such as 6 months) helps bound the anticipated behavior change.  In addition, intention to change can be coupled with confidence to implement the behavior change to provide the evaluator with information about the effect of the program.  The desired effect is high confidence to change and willingness to implement the change within the specified time frame.  If Extension professionals find that result, then it would be safe to say that the program is successful.
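
As a rough illustration of coupling intention with confidence, here is a minimal Python sketch that tallies end-of-program responses.  Everything in it is an assumption made for the example: the Response fields, the 1-to-5 confidence rating, and the cutoff of 4 for “high” confidence are hypothetical, not taken from Mazmanian’s studies.

```python
from dataclasses import dataclass


@dataclass
class Response:
    intends_to_change: bool  # stated intent to change within the time frame (e.g., 6 months)
    confidence: int          # self-rated confidence to change, 1 (low) to 5 (high)


def summarize(responses, high_confidence=4):
    """Count participants who intend to change, are confident, or both."""
    n = len(responses)
    if n == 0:
        raise ValueError("no responses to summarize")
    intend = sum(r.intends_to_change for r in responses)
    confident = sum(r.confidence >= high_confidence for r in responses)
    both = sum(
        r.intends_to_change and r.confidence >= high_confidence
        for r in responses
    )
    return {
        "n": n,
        "intend": intend,
        "confident": confident,
        "both": both,
        "share_both": both / n,
    }


# Made-up responses gathered immediately after a program.
responses = [
    Response(True, 5),
    Response(True, 4),
    Response(True, 2),
    Response(False, 5),
    Response(False, 1),
]
print(summarize(responses))
```

The number to watch is the share of participants who report both intent and high confidence; where you set the cutoff for “high” is a judgment you would need to defend.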


1.  Mazmanian, P. E., Daffron, S. R., Johnson, R. E., Davis, D. A., & Kantrowitz, M. P.  (1998).  Information about barriers to planned change:  A randomized controlled trial involving continuing medical education lectures and commitment to change.  Academic Medicine, 73 (8), 882-886.

2.  Mazmanian, P. E. & Mazmanian, P. M.  (1999).  Commitment to change: Theoretical foundations, methods, and outcomes.  The Journal of Continuing Education in the Health Professions, 19, 200 – 207.

3.  Mazmanian, P. E., Johnson, R. E., Zhang, A., Boothby, J., & Yeatts, E. J. (2001).  Effects of a signature on rates of change: A randomized controlled trial involving continuing medical education and the commitment-to-change model.  Academic Medicine, 76 (6), 642-646.


We recently held Professional Development Days for the Division of Outreach and Engagement.  This is an annual opportunity for faculty and staff in the Division to build capacity in a variety of topics.  The question this training posed was evaluative:

How do we provide meaningful feedback?

Evaluating a conference or a multi-day, multi-session training is no easy task.  Gathering meaningful data is a challenge.  What can you do?  Before you hold the conference (I’m using the word conference to mean any multi-day, multi-session training), decide on the following:

  • Are you going to evaluate the conference?
  • What is the focus of the evaluation?
  • How are you going to use the results?

The answer to the first question is easy:  YES.  If the conference is an annual event (or a regular event), you will want to have participants’ feedback on their experience, so, yes, you will evaluate the conference.  Look at Penn State Tip Sheet 16 for some suggestions.  (If this is a one time event, you may not; though as an evaluator, I wouldn’t recommend ignoring evaluation.)

The second question is more critical.  I’ve mentioned in previous blogs the need to prioritize your evaluation.  Evaluating a conference can be all consuming and result in useless data UNLESS the evaluation is FOCUSED.  Sit down with the planners and ask them what they expect to happen as a result of the conference.  Ask them if there is one particular aspect of the conference that is new this year.  Ask them if feedback in previous years has given them any ideas about what is important to evaluate this year.

This year, the planners wanted to provide specific feedback to the instructors.  The instructors had asked for feedback in previous years.  This is problematic if planning evaluative activities for individual sessions is not done before the conference.  Nancy Ellen Kiernan, a colleague at Penn State, suggests a qualitative approach called a Listening Post.  This approach will elicit feedback from participants at the time of the conference.  This method involves volunteers who attended the sessions and may take more persons than a survey.  To use the Listening Post, you must plan ahead of time to gather these data.  Otherwise, you will need to do a survey after the conference is over and this raises other problems.

The third question is also very important.  If the results are just given to the supervisor, the likelihood of them being used by individuals for session improvement or by organizers for overall change is slim.  Making the data usable for instructors means summarizing the data in a meaningful way, often visually.  There are several ways to visually present survey data including graphs, tables, or charts.  More on that another time.  Words often get lost, especially if words dominate the report.

There is a lot of information in the training and development literature that might also be helpful.  The Kirkpatricks have done a lot of work in this area.  I’ve mentioned their work in previous blogs.

There is no one best way to gather feedback from conference participants.  My advice:  KISS–keep it simple and straightforward.

…that there is a difference between a Likert item and a Likert scale?**

Did you know that a Likert item was developed by Rensis Likert, a psychometrician and an educator? 

And that the item was developed to have the individual respond to the level of agreement or disagreement with a specific phenomenon?

And did you know that most of the studies on Likert items use five or seven points on the item? (Although sometimes a four- or six-point item is used and that is called a forced-choice approach, because you really want an opinion, not a middle ground, also called a neutral ground.)

And that the choices in an odd-numbered set usually include some variation on the following theme, “Strongly disagree”, “Disagree”, “Neither agree nor disagree”, “Agree”, “Strongly agree”?

And if you did, why do you still write scales, and call them Likert, asking for information using a scale that goes from “Not at all” to “A little extent” to “Some extent” to “Great extent”?  Those responses are not even remotely equidistant from each other (that is, they do not have equal intervals between the response options), and equal spacing is a key property of a Likert item.

And why aren’t you using a visual analog scale to get at the degree of whatever phenomenon is being measured instead of an item for which the points on the scale are NOT equidistant? (For more information on a visual analog scale see a brief description here or Dillman’s book.)

I sure hope Rensis Likert isn’t rolling over in his grave (he died in 1981 at the age of 78).

Extension professionals use survey as the primary method for data gathering.  The choice of survey is a defensible one.  However, the format of the survey, the question content, and the question construction must also be defensible.  Even though psychometric properties (including internal consistency, validity, and other statistics) may have been computed, if the basic underlying assumptions are violated, no psychometric properties will compensate for a poorly designed instrument, an instrument that is not defensible.

All Extension professionals who choose to use survey to evaluate their target audiences need to have scale development as a personal competency.  So take it upon yourself to learn about guidelines for scale development (yes, there are books written on the subject!).


**A Likert scale is the SUM of responses on several Likert items.  A Likert item is just one 4-, 5-, 6-, or 7-point single statement asking for an opinion.
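
To illustrate the footnote, here is a minimal Python sketch of scoring a Likert scale as the sum of its items.  The function name, the 1-to-5 coding (1 = “Strongly disagree”, 5 = “Strongly agree”), and the sample responses are assumptions for the example; it does not handle reverse-worded items, which would need recoding before summing.

```python
def likert_scale_score(item_responses, points=5):
    """Sum one respondent's answers across several Likert items.

    Each answer is coded 1..points.  The single answers are Likert items;
    their sum is the Likert scale score.
    """
    for response in item_responses:
        if not 1 <= response <= points:
            raise ValueError(f"response {response} is outside 1..{points}")
    return sum(item_responses)


# One hypothetical respondent answering four 5-point items.
answers = [4, 5, 3, 4]
print(likert_scale_score(answers))  # 16
```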

Reference:  DeVellis, R. F. (1991).  Scale development:  Theory and applications. Newbury Park: Sage Publications. Note:  there is a newer edition.

Dillman, D. A., Smyth, J. D., & Christian, L. M. (2009).  Internet, mail, and mixed-mode surveys:  The tailored design method. (3rd ed.). Hoboken, NJ: John Wiley & Sons, Inc.

A part of my position is to build evaluation capacity.  This has many facets–individual, team, institutional.

One way I’ve always seen to build capacity is knowing where to find the answer to the how-to questions.  Those how-to questions apply to program planning, evaluation design, evaluation implementation, data gathering, data analysis, report writing, and dissemination.  Today I want to give you resources to build your tool box.  These resources build capacity only if you use them.


1.  Contact your evaluation specialist.

2.  Listen to stakeholders–that means including them in the planning.

3.  Read.

If you don’t know what to read to give you information about a particular part of your evaluation, see resource Number 1 above.  For those of you who do not have the luxury of an evaluation specialist, I’m providing some reading resources below (some of which I’ve mentioned in previous blogs).

1.  For program planning (aka program development):  Ellen Taylor-Powell’s web site at the University of Wisconsin Extension.  Her web site is rich with information about program planning, program development, and logic models.

2.  For evaluation design and implementation:  Jody Fitzpatrick’s book.

Citation:  Fitzpatrick, J. L., Sanders, J. R., & Worthen, B. R. (2004). Program evaluation: Alternative approaches and practical guidelines.  (3rd ed.).  Boston: Pearson Education, Inc.

3.  For evaluation methods:  the resource depends on the method you want to use for data gathering.  These resources don’t cover the discussion of evaluation design, though.

  • For needs assessment, the books by Altschuld and Witkin (there are two).

(Yes, needs assessment is an evaluation activity).

Citation:  Witkin, B. R. & Altschuld, J. W. (1995).  Planning and conducting needs assessments: A practical guide. Thousand Oaks, CA:  Sage Publications.

Citation:  Altschuld, J. W. & Witkin B. R. (2000).  From needs assessment to action: Transforming needs into solution strategies. Thousand Oaks, CA:  Sage Publications, Inc.

  • For survey design:     Don Dillman’s book.

Citation:  Dillman, D. A., Smyth, J. D., & Christian, L. M. (2009).  Internet, mail, and mixed-mode surveys:  The tailored design method.  (3rd ed.).  Hoboken, NJ: John Wiley & Sons, Inc.

  • For focus groups:  Dick Krueger’s book.

Citation:  Krueger, R. A. & Casey, M. A. (2000).  Focus groups:  A practical guide for applied research. (3rd ed.).  Thousand Oaks, CA: Sage Publications, Inc.

  • For case study:  Robert Yin’s classic OR

Bob Brinkerhoff’s book. 

Citation:  Yin, R. K. (2009). Case study research: Design and methods. (4th ed.). Thousand Oaks, CA: Sage, Inc.

Citation:  Brinkerhoff, R. O. (2003).  The success case method:  Find out quickly what’s working and what’s not. San Francisco:  Berrett-Koehler Publishers, Inc.

  • For multiple case studies:  Bob Stake’s book.

Citation:  Stake, R. E. (2006).  Multiple case study analysis. New York: The Guilford Press.

Since this post is about capacity building, a resource for evaluation capacity building:

Hallie Preskill and Darlene Russ-Eft’s book.

Citation:  Preskill, H. & Russ-Eft, D. (2005).  Building evaluation capacity: 72 activities for teaching and training. Thousand Oaks, CA: Sage Publications.

I’ll cover reading resources for data analysis, report writing, and dissemination another time.

Although I have been learning about and doing evaluation for a long time, this week I’ve been searching for a topic to talk about.  A student recently asked me about the politics of evaluation–there is a lot that can be said on that topic, which I will save for another day.  Another student asked me about when to do an impact study and how to bound that study.  Certainly a good topic, too, though one that can wait for another post.  Something I read in another blog got me thinking about today’s post.  So, today I want to talk about gathering demographics.

Last week, in my TIMELY TOPIC post, I mentioned the AEA Guiding Principles.  Those Principles, along with the Program Evaluation Standards, make significant contributions in assisting evaluators in making ethical decisions.  Evaluators make ethical decisions with every evaluation.  They are guided by these professional standards of conduct.  There are five Guiding Principles and five Evaluation Standards.  And although these are not prescriptive, they go a long way to ensuring ethical evaluations.  That is a long introduction into gathering demographics.

The guiding principle, Integrity/Honesty, states that “Evaluators display honesty and integrity in their own behavior, and attempt to ensure the honesty and integrity of the entire evaluation process.”  When we look at the entire evaluation process, as evaluators, we must strive constantly to maintain both personal and professional integrity in our decision making.  One decision we must make involves deciding what we need/want to know about our respondents.  As I’ve mentioned before, knowing what your sample looks like is important to reviewers, readers, and other stakeholders.  Yet, if we gather these data in a manner that is intrusive, are we being ethical?

Joe Heimlich, in a recent AEA365 post, says that asking demographic questions “…all carry with them ethical questions about use, need, confidentiality…”  He goes on to say that there are “…two major conditions shaping the decision to include – or to omit intentionally – questions on sexual or gender identity…”:

  1. When such data would further our understanding of the effect or the impact of a program, treatment, or event.
  2. When asking for such data would benefit the individual and/or their engagement in the evaluation process.

The first point relates to gender role issues–for example are gay men more like or more different from other gender categories?  And what gender categories did you include in your survey?  The second point relates to allowing an individual’s voice to be heard clearly and completely and have categories on our forms reflect their full participation in the evaluation.  For example, does marital status ask for domestic partnerships as well as traditional categories and are all those traditional categories necessary to hear your participants?

The next time you develop a questionnaire that includes demographic questions, take a second look at the wording–in an ethical manner.

Sure, you want to know the outcomes resulting from your program.  Sure, you want to know if your program is effective.  Perhaps, you will even attempt to answer the question, “So What?” when your program is effective on some previously identified outcome.  All that is important.

My topic today is something that is often overlooked when developing an evaluation:  the participant and program characteristics.

Do you know what your participants look like?

Do you know what your program looks like?

Knowing these characteristics may seem unimportant at the outset of the implementation.  As you get to the end, questions will arise–How many females?  How many Asians?  How many over 60?

Demographers typically ask demographic questions as part of the data collection.

Those questions often include the following categories:

  • Gender
  • Age
  • Race/ethnicity
  • Marital status
  • Household income
  • Educational level

Some of those may not be relevant to your program and you may want to include other general characteristic questions instead.  For example, in a long term evaluation of a forestry program where the target audience was individuals with wood lots, asking how many acres were owned was important and marital status did not seem relevant.

Sometimes asking some questions may seem intrusive–for example, household income or age.  In all demographic cases, giving the participant an option to not respond is appropriate.  When these data are reported, report the number of participants who chose not to respond.
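
Here is a minimal Python sketch of that reporting step: tabulating one demographic question so that the participants who chose not to respond are counted and reported rather than silently dropped.  The answer categories, the label, and the use of None to mark a skipped question are all assumptions for the example.

```python
from collections import Counter

NO_RESPONSE = "Chose not to respond"


def tabulate(answers):
    """Count each answer; None marks a participant who chose not to respond."""
    counts = Counter()
    for answer in answers:
        counts[NO_RESPONSE if answer is None else answer] += 1
    return counts


# Hypothetical answers to an age question from six participants.
age_answers = ["60 or over", None, "Under 60", "60 or over", None, "Under 60"]

counts = tabulate(age_answers)
for category, n in counts.items():
    print(f"{category}: {n} of {len(age_answers)}")
```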

When characterizing your program, it is sometimes important to know characteristics of the geographic area where the program is being implemented:  rural, suburban, urban, or something else.  This is especially true when the program is a multisite program.  Locale introduces an unanticipated variable that is often not recognized or remembered.

Any variation in the implementation should be documented as well:  the number of contact hours, for example, or the number of training modules.  The type of intervention is important too:  was the program delivered as a group intervention or individually?  The time of the year that the program is implemented may also be important to document.  The time of the year may inadvertently introduce a history bias into the study; what is happening in September is different than what is happening in December.

Documenting these characteristics  and then defining them when reporting the findings helps to understand the circumstances surrounding the program implementation.  If the target audience is large, documenting these characteristics can provide comparison groups–did males do something differently than females?  Did participants over 50 do something different than participants 49 or under?
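
A comparison like the age one can be sketched in a few lines of Python.  The participant data and the outcome scores below are invented, and the function treats 50 itself as belonging to the older group, which is an assumption you would need to settle for your own program.  This only describes the two groups; it is not a significance test.

```python
from statistics import mean

# Hypothetical (age, outcome score) pairs for six participants.
participants = [(34, 3), (52, 5), (49, 4), (50, 2), (67, 5), (41, 2)]


def compare_by_age(participants, cutoff=50):
    """Split participants at the age cutoff and describe each group."""
    younger = [outcome for age, outcome in participants if age < cutoff]
    older = [outcome for age, outcome in participants if age >= cutoff]

    def describe(scores):
        # A group can be empty, in which case there is no mean to report.
        return {"n": len(scores), "mean": mean(scores) if scores else None}

    return {"under_cutoff": describe(younger), "cutoff_and_over": describe(older)}


print(compare_by_age(participants))
```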

Keep in mind when collecting participant and program characteristic data, that these data help you and the audience to whom you disseminate the findings understand your outcomes and the effect of your program.