A colleague asked me yesterday about authenticating anecdotes: you know, those wonderful stories you gather about how what you’ve done has made a difference in someone’s life.

I volunteer on a non-profit board (two, actually), and the board members are always telling stories about how “X has happened” and how “Y was wonderful.”  Yet my evaluator self asks, “How do you know?”  This becomes a concern for organizations that do not have evaluation as part of their mission statement.  Even though many boards hold the Executive Director accountable, few make evaluation explicit.

Dick Krueger, who has written extensively about focus groups, also writes about and studies the use of stories in evaluation, and much of what I will share with y’all today is from his work.

First, what is a story?  Creswell (2007, 2nd ed.) defines story as “…aspects that surface during an interview in which the participant describes a situation, usually with a beginning, a middle, and an end, so that the researcher can capture a complete idea and integrate it, intact, into the qualitative narrative.”  Krueger elaborates on that definition by saying that a story “…deals with an experience of an event, program, etc. that has a point or a purpose.”  Story differs from case study: a case study is a story that tries to understand a system, not an individual event or experience, while a story deals with an experience that has a point.  Stories provide examples of core philosophies and of significant events.

There are several purposes for stories that can be considered evaluative.  These include depicting the culture, promoting core values, transmitting and reinforcing current culture, providing instruction (another way to transmit culture), and motivating, inspiring, or encouraging people.  Stories can be of the following types:  hero stories, success stories, lesson-learned stories, core value stories, cultural stories, and teaching stories.

So why tell a story?  Stories make information easier to remember, more believable, and tap into emotion.  For stories to be credible (provide authentication), an evaluator needs to establish criteria for stories.  Krueger suggests five different criteria:

  • Authentic–is it truthful?  Is there truth in the story?  (Remember “truth” depends on how you look at something.)
  • Verifiable–is there a trail of evidence back to the source?  Can you find this story again?
  • Confidential–is there a need to keep the story confidential?
  • Original intent–what is the basis for the story?  What motivated telling the story? and
  • Representation–what does the story represent?  other people?  other locations?  other programs?

Once you have established criteria for the stories collected, you will need some way to capture them.  So develop a plan.  Stories need to be willingly shared, not coerced; documented and recorded; and collected in a positive situation.  Collecting stories is a situation where the protections for human participants in research must be considered.  Are the stories collected confidentially?  Does telling the stories result in little or no risk?  Are stories told voluntarily?

Once the stories have been collected, analyzing and reporting those stories is the final step.  Without this, all the previous work  was for naught.  This final step authenticates the story.  Creswell provides easily accessible guidance for analysis.

My oldest daughter graduated from high school Monday.  Now she is facing the reality of life after high school: the emotional letdown, the lack of structure, the loss of focus.  I remember what it was like to commence…another word for beginning.  I think I was depressed for days.  The question becomes evaluative when one thinks of planning, which is what she has to do now.  In planning, she needs to think:  What excites me?  What are my passions?  How will I accomplish the what?  How will I connect again to the what?  How will I know I’m successful?

Ellen Taylor-Powell, former Distinguished Evaluation Specialist at the University of Wisconsin Extension, talks about planning in a publication on the UWEX professional development website.  (There are many other useful publications on this site…I urge you to check them out.)  This publication has four sections:  focusing the evaluation, collecting the information, using the information, and managing the evaluation.  I want to talk more about focusing the evaluation, because that is key when beginning, whether it is the next step in your life, the next program you want to implement, or the next report you want to write.

This section of the publication asks you to identify what you are going to evaluate, the purpose of the evaluation, who will use the evaluation and how, what questions you want to answer, what information you need to answer those questions, what your timeline is, and, finally, what resources you will need.  I see this as puzzle assembly, one where you do not necessarily have a picture to guide you.  Not unlike a newly commenced graduate finding a focus, it is putting together a puzzle: you won’t know what the picture is, or where you are going, until you focus and develop a plan.  For me, that means putting the puzzle together.  It means finding the what and the so what.  It is always the first place to commence.

One of the opportunities I have as a faculty member at OSU is to mentor students.  I get to do this in a variety of ways–sitting on committees, providing independent studies, reviewing preliminary proposals, listening…I find it very exciting to see the change and growth in students’ thinking and insights when I work with them.  I get some of my best ideas from them.  Like today’s post…

I just reviewed several chapters of student dissertation proposals.  These students had put a lot of thought and passion into their research questions.  To them, the inquiry was important; it could be the impetus for change.  Yet the quality of the writing often detracted from the quality of the question, the importance of the inquiry, and the opportunity to make a difference.

How does this relate to evaluation?  For evaluations to make a difference, the findings must be used.  That means more than writing the report and giving it to the funder, the principal investigator, the program leader, or other stakeholders.  Too many reports have gathered dust on someone’s shelf because they were never used.  To be used, a report must be written so that it can be understood.  The report needs to be written for a naive audience, as though the reader knows nothing about the topic.

When I taught technical writing, I used the mnemonic of the 5Cs.  My experience is that if these concepts (all starting with the letter C) were employed, the report/paper/manuscript could be understood by any reader.

The report needs to be written:

  • Clearly
  • Coherently
  • Concisely
  • Correctly
  • Consistently

Clearly means not using jargon; using simple words; explaining technical words.

Coherently means having the sections of the report hang together; not having any (what I call) quantum leaps.

Concisely means using few words; avoiding long, meandering paragraphs; and avoiding the overuse of prepositions (among other things).

Correctly means making sure that grammar and syntax are correct; checking subject/verb agreement; and remembering that the word “data” is a plural word and takes a plural verb and plural articles.

Consistently means using the same word to describe the parts of your research; participants are participants all through the report, not subjects on page 5, respondents on page 11, and students on page 22.

This little mnemonic has helped many students write better papers; I know it can help many evaluators write better reports.

This is no easy task.  Writing is hard work; using the 5Cs makes it easier.

A colleague recently asked, “How many people need to be sampled (interviewed) to make the study valid?”

Interesting question.  Takes me to the sampling books.  And the interview books.

To answer that question, I need to know whether you are gathering qualitative or quantitative data; whether you are conducting two measurements on one group (like a pretest/posttest or a post-then-pre); whether you are conducting one measurement on two groups; or whether any number of other conditions apply that affect sampling (like site, population size, phenomenon studied, etc.).

So here is the easy answer.

If you are conducting two observations on one group, you will need a minimum of 30 participants with complete responses.

If you are conducting one observation on two groups, you will need at least 30 participants in each group with complete responses.

If you are conducting focus groups, you will need 10–12 participants in each group, and you will need to conduct groups until you reach saturation, that is, until the responses are being repeated and you are not getting any new information (some folks say reaching saturation takes 3–4 groups).

If you are conducting exploratory qualitative research, you will need…it all depends on your research question.

If you are conducting confirmatory qualitative research, you will need…it all depends on your research question.

If you are conducting individual interviews, you will need…and here my easy answer fails me…so let me tell you some other information that may be helpful.

Dillman has a little chart on page 57 (Figure 3.1) that lists the sample size you will need for various population sizes and three margins of error.  For example, if your population is 100 (a good size) and you want a margin of error of 10% (that means the results will be accurate within plus or minus 10%, 95% of the time), you will need 49 participants with complete data sets if you think that, for a yes/no question, participants will be split 50 yes/50 no (the most conservative assumption that can be made).  You will need only 38 participants with complete data sets if you think the responses will be unevenly split (the usual case).

This chart assumes random selection.  It takes into consideration variation in the sample (the greater the variation, the larger the sample size needed).  The most conservative column assumes maximum heterogeneity (a 50/50 split) on the proportion being estimated in the population from which the sample is drawn.
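
If you do not have the chart handy, the arithmetic behind tables like Dillman’s is the standard sample-size formula for a proportion with a finite population correction.  Here is a minimal sketch in Python (my own illustration, not Dillman’s; the function name and the rounding are mine) that lands close to the numbers above:

```python
import math

def completed_sample_size(population, margin_of_error, p=0.5, z=1.96):
    """Completed-sample size needed to estimate a proportion at roughly 95%
    confidence (z = 1.96), using the standard finite population correction."""
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2  # infinite-population sample size
    return round(n0 / (1 + (n0 - 1) / population))      # corrected for a finite population

# Population of 100, plus-or-minus 10% margin of error:
print(completed_sample_size(100, 0.10, p=0.5))  # about 49 -- conservative 50/50 split
print(completed_sample_size(100, 0.10, p=0.8))  # about 38 -- uneven (80/20) split
```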

Marshall and Rossman say very clearly, “One cannot study the universe…”  So you need to make selections: sites, samples of times, places, people, and things to study.  The “how many” depends…sometimes it is one person; sometimes it is one organization; sometimes it is more.  They cite one to four respondents for case studies and mixed-methods studies; 10 groups as the average number of focus groups; 16–24 months as the average for observational fieldwork; and one set of interviews that involved 92 participants.  It all depends on the research purpose: an unknown culture could be studied with a single in-depth case study, and a study of mothers’ receptivity to breast-feeding could have a huge sample or a small one, where a small sample could provide thick description while a large sample would enhance transferability.  Credibility and trustworthiness of the findings must be considered.

The best answer, then, is…it all depends.

I was putting together a reading list for an evaluation capacity building program I’ll be leading come September and was reminded about process evaluation.  Nancy Ellen Kiernan has a one-page handout on the topic.  It is a good place to start.  Like everything in evaluation, there is so much more to say.  Let’s see what I can say in 440 words or less.

When I first started doing evaluation (back when we beat on hollow logs), I developed a simple approach (call it a model) so I could talk to stakeholders about what I did and what they wanted done.  I called it the P3 model–Process, Progress, Product.  This is a simple approach that answers the following evaluative questions:

  • How did I do what I did? (Process)
  • Did I do what I did in a timely manner? (Progress)
  • Did I get the outcome I wanted? (Product)

It is the “how” question I’m going to talk about today.

Scriven, in the 4th edition of the Evaluation Thesaurus, says that a process evaluation “focuses entirely on the variables between input and output”; it may also include input variables.  Knowing this helps you know what the evaluative question is for the input and output parts of a logic model (remember, there are evaluative questions/activities for each part of a logic model).

When evaluating a program, process evaluation is not sufficient; it may be necessary, yet still not sufficient.  An outcome evaluation must accompany a process evaluation.  Evaluating the process components of a program involves looking at internal and external communications (think memos, emails, letters, reports, etc.); interface with stakeholders (think meeting minutes); the formative evaluation system of a program (think participant satisfaction); and infrastructure effectiveness (think administrative patterns, implementation steps, corporate responsiveness, instructor availability, etc.).

Scriven provides these examples that suggest the need for program improvement: “…program’s receptionists are rude to most of a random selection of callers; the telephonists are incompetent; the senior staff is unhelpful to evaluators called in by the program to improve it; workers are ignorant about the reasons for procedures that are intrusive to their work patterns; or the quality control system lacks the power to call a halt to the process when it discerns an emergency.”  Other examples demonstrate program success: administrators are transparent about organizational structure, program implementation is inclusive, or participants are encouraged to provide ongoing feedback to program managers.  We could then say that a process evaluation assesses the development and actual implementation of a program to determine whether the program was implemented as planned and whether the expected output was actually produced.

Gathering data regarding the program as actually implemented assists program planners in identifying what worked and what did not. Some of the components included in a process evaluation are descriptions of program environment, program design, and program implementation plan.  Data on any changes to the program or program operations and on any intervening events that may have affected the program should also be included.

Quite likely, these data will be qualitative in nature and will need to be coded using one of the many qualitative data analysis methods.
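
Even a lightweight start helps.  Once comments or interview excerpts have been tagged with codes, a simple tally shows which process themes dominate.  A minimal sketch, with entirely hypothetical codes:

```python
from collections import Counter

# Hypothetical codes assigned to open-ended comments about program implementation
coded_comments = [
    "communication", "staffing", "communication", "scheduling",
    "communication", "staffing", "materials", "scheduling",
]

# Tally the codes so the most frequent process themes surface first
for theme, count in Counter(coded_comments).most_common():
    print(f"{theme}: {count}")
```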

Hi everybody–it is time for another TIMELY TOPIC.  This week’s topic is about using pretest/posttest evaluation or a post-then-pre evaluation.

There are many considerations for using these designs.  You have to look at the end result and decide what is most appropriate for your program.  Some of the key considerations include:

  • the length of your program;
  • the information you want to measure;
  • the factors influencing participants’ responses; and
  • available resources.

Before explaining the above four factors, let me urge you to read on this topic.  There are a couple of resources (yes, print…) I want to pass your way.

  1. Campbell, D. T., & Stanley, J. C. (1963).  Experimental and quasi-experimental designs for research.  Boston, MA:  Houghton Mifflin Company.  (The classic book on research and evaluation designs.)
  2. Rockwell, S. K., & Kohn, H. (1989).  Post-then-pre evaluation.  Journal of Extension [On-line], 27(2).  Available at: http://www.joe.org/joe/1989summer/a5.htm  (A seminal JoE paper explaining the post-then-pre test.)
  3. Nimon, K., Zigarmi, D., & Allen, J. (2011).  Measures of program effectiveness based on retrospective pretest data:  Are all created equal?  American Journal of Evaluation, 32, 8–28.  (A 2011 publication with an extensive bibliography.)

Let’s talk about considerations.

Length of program.

For a pre/post test, you want a program that is long: more than a day.  Otherwise you risk introducing a desired-response bias and the threats to internal validity that Campbell and Stanley identify, specifically the threats called history, maturation, testing, and instrumentation, plus a possible regression-to-the-mean threat, though that is only a possible source of concern.  These threats to internal validity assume no randomization and a one-group design, typical of Extension programs and other educational programs.  Post-then-pre works well for short programs, a day or less, and tends to control for response shift and desired-response bias.  There may still be threats to internal validity.

Information you want to measure.

If you want to know a participant’s specific knowledge, a post-then-pre cannot provide you with that information, because participants cannot un-know what they have just learned and so cannot accurately report what they knew before.  The traditional pre/post can focus on specific knowledge, e.g., which food is the highest in Vitamin C in a list that includes apricot, tomato, strawberry, and cantaloupe.  (Answer:  strawberry.)  If you want agreement/disagreement with general knowledge (e.g., “I know what the key components of strategic planning are”), the post-then-pre works well.  Confidence, behaviors, skills, and attitudes can all be easily measured with a post-then-pre.
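
Whichever design you choose, the analysis of the two sets of ratings is usually a paired comparison of each participant’s scores.  Here is a minimal sketch, assuming hypothetical knowledge scores and using scipy; it is an illustration, not a prescribed analysis:

```python
from scipy import stats

# Hypothetical scores for the same eight participants (pre/post or post-then-pre)
pre  = [52, 60, 45, 70, 58, 63, 49, 55]
post = [61, 72, 50, 74, 66, 70, 58, 60]

# Average change per participant
mean_change = sum(b - a for a, b in zip(pre, post)) / len(pre)

# Paired t-test: is the mean change different from zero?
result = stats.ttest_rel(post, pre)
print(f"mean change = {mean_change:.1f}, t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```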

Factors influencing participants’ responses.

I mentioned threats to internal validity above.  These factors all influence participants’ responses.  If there is a long time between the pretest and the posttest, participants can be affected by history (a tornado prevents attendance at the program); maturation (especially true with programs for children–they grow up); testing (having taken the pretest, posttest scores will be better); and instrumentation (the person administering the posttest administers it differently than the pretest was administered).  Participants’ desire to please the program leader/evaluator, called desired-response bias, also affects their responses.

Available resources.

Extension programs (as well as many other educational programs) are affected by the availability of resources (time, money, personnel, venue, etc.).  If you only have a certain amount of time, or a certain number of people who can administer the evaluation, or a set amount of money, you will need to consider which approach to evaluation you will use.

The idea is to get usable, meaningful data that accurately reflect the work that went into the program.

We recently held Professional Development Days for the Division of Outreach and Engagement.  This is an annual opportunity for faculty and staff in the Division to build capacity in a variety of topics.  The question this training posed was evaluative:

How do we provide meaningful feedback?

Evaluating a conference or a multi-day, multi-session training is no easy task.  Gathering meaningful data is a challenge.  What can you do?  Before you hold the conference (I’m using the word conference to mean any multi-day, multi-session training), decide on the following:

  • Are you going to evaluate the conference?
  • What is the focus of the evaluation?
  • How are you going to use the results?

The answer to the first question is easy:  YES.  If the conference is an annual event (or a regular event), you will want participants’ feedback on their experience, so, yes, you will evaluate the conference.  Look at Penn State Tip Sheet 16 for some suggestions.  (If this is a one-time event, you may not; though as an evaluator, I wouldn’t recommend ignoring evaluation.)

The second question is more critical.  I’ve mentioned in previous blogs the need to prioritize your evaluation.  Evaluating a conference can be all-consuming and result in useless data UNLESS the evaluation is FOCUSED.  Sit down with the planners and ask them what they expect to happen as a result of the conference.  Ask them if there is one particular aspect of the conference that is new this year.  Ask them if feedback from previous years has given them any ideas about what is important to evaluate this year.

This year, the planners wanted to provide specific feedback to the instructors, who had asked for feedback in previous years.  This is problematic if evaluative activities for individual sessions are not planned before the conference.  Nancy Ellen Kiernan, a colleague at Penn State, suggests a qualitative approach called a Listening Post.  This approach elicits feedback from participants at the time of the conference.  It involves volunteers who attended the sessions and may take more people than a survey.  To use the Listening Post, you must plan ahead of time to gather these data.  Otherwise, you will need to do a survey after the conference is over, and that raises other problems.

The third question is also very important.  If the results are just given to the supervisor, the likelihood of their being used by individuals for session improvement or by organizers for overall change is slim.  Making the data usable for instructors means summarizing the data in a meaningful way, often visually.  There are several ways to visually present survey data, including graphs, tables, and charts.  More on that another time.  Words often get lost, especially if words dominate the report.
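
As one option (my own sketch, with made-up session names and ratings), a simple horizontal bar chart often communicates session feedback faster than a page of numbers:

```python
import matplotlib.pyplot as plt

# Hypothetical mean satisfaction ratings (1-5) for four conference sessions
sessions = ["Grant writing", "Logic models", "Survey design", "Facilitation"]
mean_rating = [4.2, 3.6, 4.8, 3.9]

fig, ax = plt.subplots(figsize=(6, 3))
ax.barh(sessions, mean_rating)                      # one bar per session
ax.set_xlim(1, 5)                                   # anchor the axis to the rating scale
ax.set_xlabel("Mean rating (1 = poor, 5 = excellent)")
ax.set_title("Session feedback at a glance")
fig.tight_layout()
plt.show()
```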

There is a lot of information in the training and development literature that might also be helpful.  Kirkpatrick has done a lot of work in this area; I’ve mentioned that work in previous blogs.

There is no one best way to gather feedback from conference participants.  My advice:  KISS–keep it simple and straightforward.

I’ve talked about how each phase of a logic model has evaluative activities.  I’ve probably even alluded to the fact that needs assessment is the evaluative activity for the phase called situation, at the far left of the logic model.

What I haven’t done is talk about the why, what, and how of needs assessment (NA).  I also haven’t talked about the utilization of the findings of a needs assessment, which is what makes the needs assessment meaningful.

OK.  So why is a NA conducted?  And what is a NA?

Jim Altschuld is my go-to person when it comes to questions about needs assessment.  He recently edited a series of books on the topic.

Although Jim is my go-to person, Belle Ruth Witkin (a colleague, friend, and collaborator of Jim Altschuld) says in the preface to their co-authored volume (Witkin and Altschuld, 1995) that the most effective way to decide how to divide the (often scarce) resources among the demands (read: programs) is to conduct a needs assessment when the planning for the use of those resources begins.

Book 1 of the kit provides an overview.  In that volume, Jim defines what a needs assessment is: “Needs assessment is the process of identifying needs, prioritizing them, making needs-based decisions, allocating resources, and implementing actions in organizations to resolve problems underlying important needs” (p. 20).  Altschuld states that there are many models for assessing needs and provides citations for those models.  I think the most important aspect of this first volume is the presentation of the phased model developed by Belle Ruth Witkin in 1984 and revised by Altschuld and Witkin in their 1995 and 2000 volumes.  Those phases are preassessment, assessment, and postassessment.  They divide those three phases into three levels (primary, secondary, and tertiary), each level targeting a different group of stakeholders.  This volume also discusses the why and the how.  Subsequent volumes go into more detail: volume 2 discusses phase I (getting started); volume 3 discusses phase II (collecting data); volume 4 discusses analysis and priorities; and volume 5 discusses phase III (taking action).

Laurie Stevahn and Jean A. King are the authors of volume 5.  In chapter 3, they discuss strategies for the action plan, using facilitation procedures that promote positive relationships, develop shared understanding, prioritize decisions, and assess progress.  They warn of interpersonal conflict and caution against roadblocks that impede change efforts.  They also promote developing evaluation activities at the onset of the NA, because that helps ensure the use of the findings.

Needs assessment is a political experience.  Someone (or several someones) will feel disenfranchised, lose resources, or have programs ended.  These activities create hard feelings and resentments.  These considerations need to be identified and discussed at the beginning of the process.  It is like the elephant and the blind people: everyone has an image of what the creature is, and there may or may not be consensus, yet for the NA to be successful, consensus is important.  Without it, the data will sit on someone’s shelf or in someone’s computer.  Not useful.

Did you know that there is a difference between a Likert item and a Likert scale?**

Did you know that a Likert item was developed by Rensis Likert, a psychometrician and an educator? 

And that the item was developed to have the individual respond to the level of agreement or disagreement with a specific phenomenon?

And did you know that most studies using Likert items use five or seven points on the item?  (Although sometimes a four- or six-point scale is used, and that is called a forced-choice approach, because you really want an opinion, not a middle ground, also called a neutral ground.)

And that the choices in an odd-numbered item usually include some variation on the following theme: “Strongly disagree,” “Disagree,” “Neither agree nor disagree,” “Agree,” “Strongly agree”?

And if you did, why do you still write scales, call them Likert, and ask for information using a scale that goes from “Not at all” to “A little extent” to “Some extent” to “Great extent”?  Those responses are not even remotely equidistant from each other (that is, they do not have equal intervals between response options), which is a key property of a Likert item.

And why aren’t you using a visual analog scale to get at the degree of whatever phenomenon is being measured, instead of an item for which the points on the scale are NOT equidistant?  (For more information on a visual analog scale, see a brief description here or Dillman’s book.)

I sure hope Rensis Likert isn’t rolling over in his grave (he died in 1981 at the age of 78).

Extension professionals use survey as the primary method for data gathering.  The choice of survey is a defensible one.  However, the format of the survey, the question content, and the question construction must also be defensible.  Even though psychometric properties (including internal consistency, validity, and other statistics) may have been computed, if the basic underlying assumptions are violated, no psychometric properties will compensate for a poorly designed instrument, an instrument that is not defensible.

All Extension professionals who choose to use surveys with their target audiences need to have scale development as a personal competency.  So take it upon yourself to learn about guidelines for scale development (yes, there are books written on the subject!).
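
To make the item/scale distinction concrete (see the footnote below), here is a minimal sketch with made-up responses: each Likert item is a single 1–5 rating, the Likert scale is the sum across the items, and internal consistency can be checked with Cronbach’s alpha.

```python
import numpy as np

# Hypothetical responses: 6 respondents x 4 Likert items
# (1 = Strongly disagree ... 5 = Strongly agree)
items = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
])

scale_scores = items.sum(axis=1)   # the Likert SCALE: sum of the items for each respondent

# Cronbach's alpha: a common check of internal consistency for a summed scale
k = items.shape[1]
alpha = (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum() / scale_scores.var(ddof=1))

print(scale_scores)      # [17  9 19 11 17  6]
print(round(alpha, 2))
```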

<><><><><>

**A Likert scale is the SUM of responses on several Likert items.  A Likert item is just one 4-, 5-, 6-, or 7-point single statement asking for an opinion.

Reference:  DeVellis, R. F. (1991).  Scale development:  Theory and applications.  Newbury Park, CA:  Sage Publications.  (Note:  there is a newer edition.)

Dillman, D. A., Smyth, J. D., & Christian, L. M. (2009).  Internet, mail, and mixed-mode surveys:  The tailored design method (3rd ed.).  Hoboken, NJ:  John Wiley & Sons.

Hi everyone–it is the third week in April and time for a TIMELY TOPIC!  (I was out of town last week.)

Recently, I was asked: Why should I plan my evaluation strategy in the program planning stage? Isn’t it good enough to just ask participants if they are satisfied with the program?

Good question.  This is the usual scenario:  You have something to say to your community.  The topic has research support and is timely.  You think it would make a really good new program (or a revision of a current program).  So you plan the program. 

Do you plan the evaluation at the same time? The keyed response is YES.  The usual response is something like, “Are you kidding?”  No, not kidding.  When you plan your program is the time to plan your evaluation.

Unfortunately, my experience is that many (most) faculty, when planning or revising a program, fail to think about evaluating that program at the planning stage.  Yet it is at the planning stage that you can clearly and effectively identify what you think will happen and what will indicate that your program has made a difference.  Remember, the evaluative question isn’t, “Did the participants like the program?”  The evaluative question is, “What difference did my program make in the lives of the participants, and, if possible, in the economic, environmental, and social conditions in which they live?”  That is the question you need to ask yourself when you plan your program.  It also happens to be the evaluative question for the long-term outcomes in a logic model.

If you ask this question before you implement your program, you may find that you cannot gather data to answer it.  That allows you to look at what change (or changes) you can measure.  Can you measure changes in behavior?  This answers the question, “What difference did this program make in the way the participants act in the context presented in the program?”  Or perhaps, “What change occurred in what the participants know about the program topic?”  These are the evaluative questions for the short- and intermediate-term outcomes in a logic model.  (As an aside, there are evaluative questions that can be asked at every stage of a logic model.)

By thinking about and planning for evaluation at the PROGRAM PLANNING STAGE, you avoid an evaluation that gives you data that cannot be used to support your program.  A program you can defend with good evaluation data is a program that has staying power.  You also avoid having to retrofit your evaluation to your program.  Retrofits, though often possible, may miss important data that could only be gathered by thinking of your outcomes ahead of the implementation.

Years ago (back when we beat on hollow logs), evaluations typically asked questions that measured participant satisfaction.  You probably still want to know if participants are satisfied with your program.  Satisfaction questionnaires may be necessary, but they are no longer sufficient.  They do not answer the evaluative question, “What difference did this program make?”