Filed Under (Methodology) by Molly on 31-05-2011

A colleague recently asked, “How many people need to be sampled (interviewed) to make the study valid?”

Interesting question.  Takes me to the sampling books.  And the interview books.

To answer that question, I need to know if you are gathering qualitative or quantitative data; if you are conducting two measurements on one group (like a pretest/posttest or a post-then-pre); if you are conducting one measurement on two groups; or any number of other conditions that affect sampling (like site, population size, phenomenon studied, etc.).

So here is the easy answer.

If you are conducting two observations on one group, you will need a minimum of 30 participants with complete responses.

If you are conducting one observation on two groups, you will need at least 30 participants in each group with complete responses.

If you are conducting focus groups, you will need 10 to 12 participants in each group, and you will need to conduct groups until you reach saturation, that is, until the responses are being repeated and you are not getting any new information (some folks say reaching saturation takes 3 to 4 groups).
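The saturation rule can be written down as a simple stopping check. This is only a sketch: the function name, the theme labels, and the idea of requiring a set number of "quiet" groups in a row are assumptions for illustration, not a published standard.

```python
def groups_until_saturation(groups, quiet_groups_needed=1):
    """Return how many focus groups were run when saturation was reached.

    groups: list of sets of theme labels, one set per focus group, in order.
    quiet_groups_needed: assumed stopping rule; how many consecutive groups
    must add no new themes before we call it saturation.
    Returns None if saturation was never reached.
    """
    seen = set()
    quiet = 0
    for count, themes in enumerate(groups, start=1):
        new = set(themes) - seen          # themes not heard in earlier groups
        seen |= new
        quiet = 0 if new else quiet + 1   # reset whenever something new appears
        if quiet >= quiet_groups_needed:
            return count
    return None


# Made-up theme codes from four hypothetical focus groups.
sessions = [
    {"cost", "time", "trust"},
    {"time", "access"},
    {"trust", "cost"},
    {"access"},
]
print(groups_until_saturation(sessions))
```

With these made-up sessions, the third group adds nothing new, so the check stops at three groups. Asking for two quiet groups in a row (`quiet_groups_needed=2`) would stop at four.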

If you are conducting exploratory qualitative research, you will need…it all depends on your research question.

If you are conducting confirmatory qualitative research, you will need…it all depends on your research question.

If you are conducting individual interviews, you will need…and here my easy answer fails me…so let me tell you some other information that may be helpful.

Dillman has a little chart on page 57 (Figure 3.1) that lists the sample size you will need for various population sizes and three sizes of margin of error.  For example, if your population is 100 (a good size) and you want a margin of error of 10% (that means that the results will be accurate within + or – 10%, 95% of the time), you will need 49 participants with complete data sets if you think that for a yes/no question, participants will be split 50 yes/50 no (the most conservative assumption that can be made).  You will need only 38 participants with complete data sets if you think that the responses will be unevenly split, say 80/20 (the usual case).
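Charts like this one can be reproduced with a standard finite-population formula for a proportion. Here is a minimal sketch, assuming that formula and 95% confidence (z = 1.96). The function name and the rounding to the nearest whole person are assumptions for illustration, not something taken from Dillman's page.

```python
def completed_sample_size(population, margin, p=0.5, z=1.96):
    """Completed responses needed to estimate a yes/no proportion.

    population: size of the population the sample is drawn from
    margin:     margin of error as a fraction (0.10 means + or - 10%)
    p:          expected share answering "yes" (0.5 is the most conservative)
    z:          z-score for the confidence level (1.96 for 95%)
    """
    variation = p * (1 - p)
    numerator = population * variation
    denominator = (population - 1) * (margin / z) ** 2 + variation
    # Rounded to the nearest whole person (an assumption of this sketch).
    return round(numerator / denominator)


print(completed_sample_size(100, 0.10))          # 50/50 split
print(completed_sample_size(100, 0.10, p=0.8))   # 80/20 split
```

For a population of 100, this returns 49 (50/50 split) and 38 (80/20 split) at a margin of + or – 10%, and 80 and 71 at + or – 5%.  Rounding up instead of to the nearest whole number would be the more cautious choice.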

This chart assumes random selection.  It takes into consideration variation in the sample (the greater the variation, the larger the sample size needed).  The 50/50 split assumes maximum heterogeneity on a proportion in the population from which the sample is drawn.

Marshall and Rossman say very clearly, “One cannot study the universe…”  So you need to select sites and sample the times, places, people, and things to study.  The how many depends…sometimes it is one person; sometimes it is one organization; sometimes it is more.  They say one to four respondents for case studies and mixed-methods studies; 10 groups was the average number of focus groups; 16 – 24 months was the average for observational fieldwork; and one set of interviews they cite involved 92 participants.  It all depends on the research purpose–an unknown culture could be studied with a single in-depth case study; a study of mothers’ receptivity to breast-feeding could use a small sample, which could provide thick description, or a large sample, which would enhance transferability.  Credibility and trustworthiness of the findings must be considered.

The best answer, then, is…it all depends.

Filed Under (Methodology, program evaluation) by Molly on 25-05-2011

I was putting together a reading list for an evaluation capacity building program I’ll be leading come September and was reminded about process evaluation.  Nancy Ellen Kiernan has a one-page handout on the topic.  It is a good place to start.  Like everything in evaluation, there is so much more to say.  Let’s see what I can say in 440 words or fewer.

When I first started doing evaluation (back when we beat on hollow logs), I developed a simple approach (call it a model) so I could talk to stakeholders about what I did and what they wanted done.  I called it the P3 model–Process, Progress, Product.  This is a simple approach that answers the following evaluative questions:

  • How did I do what I did? (Process)
  • Did I do what I did in a timely manner? (Progress)
  • Did I get the outcome I wanted? (Product)

It is the “how” question I’m going to talk about today.

Scriven, in the 4th ed of the Evaluation Thesaurus, says that a process evaluation “focuses entirely on the variables between input and output”.  It may include input variables.  Knowing this helps you know what the evaluative question is for the input and output parts of a logic model (remember there are evaluative questions/activities for each part of a logic model).

When considering evaluating a program, process evaluation alone is not sufficient; it may be necessary and still not be sufficient.  An outcome evaluation must accompany a process evaluation.  Evaluating process components of a program involves looking at internal and external communications (think memos, emails, letters, reports, etc.); interface with stakeholders (think meeting minutes); the formative evaluation system of a program (think participant satisfaction); and infrastructure effectiveness (think administrative patterns, implementation steps, corporate responsiveness, instructor availability, etc.).

Scriven provides these examples that suggest the need for program improvement: “…program’s receptionists are rude to most of a random selection of callers; the telephonists are incompetent; the senior staff is unhelpful to evaluators called in by the program to improve it; workers are ignorant about the reasons for procedures that are intrusive to their work patterns; or the quality control system lacks the power to call a halt to the process when it discerns an emergency.”  Other examples, which demonstrate program success, are that administrators are transparent about organizational structure; program implementation is inclusive; or participants are encouraged to provide ongoing feedback to program managers.  We could then say that a process evaluation assesses the development and actual implementation of a program to determine whether the program was implemented as planned and whether expected output was actually produced.

Gathering data regarding the program as actually implemented assists program planners in identifying what worked and what did not. Some of the components included in a process evaluation are descriptions of program environment, program design, and program implementation plan.  Data on any changes to the program or program operations and on any intervening events that may have affected the program should also be included.

Quite likely, these data will be qualitative in nature and will need to be coded using one of the many qualitative data analysis methods.


Hi everybody–it is time for another TIMELY TOPIC.  This week’s topic is about using pretest/posttest evaluation or a post-then-pre evaluation.

There are many considerations for using these designs.  You have to look at the end result and decide what is most appropriate for your program.  Some of the key considerations include:

  • the length of your program;
  • the information you want to measure;
  • the factors influencing participants’ responses; and
  • available resources.

Before explaining the above four factors, let me urge you to read on this topic.  There are a couple of resources (yes, print…) I want to pass your way.

  1. Campbell, D. T. & Stanley, J. C. (1963).  Experimental and quasi-experimental designs for research.  Houghton Mifflin Company:  Boston, MA.  (The classic book on research and evaluation designs.)
  2. Rockwell, S. K., & Kohn, H. (1989). Post-then-pre evaluation. Journal of Extension [On-line]. 27(2). Available at: (A seminal JoE paper explaining post-then-pre test.)
  3. Nimon, K., Zigarmi, D., & Allen, J. (2011).  Measures of program effectiveness based on retrospective pretest data:  Are all created equal? American Journal of Evaluation, 32, 8 – 28.  (A 2011 publication with an extensive bibliography.)

Let’s talk about considerations.

Length of program.

For pre/post test, you want a program that is long.  More than a day.  Otherwise you risk introducing a desired response bias and the threats to internal validity that Campbell and Stanley identify, specifically the threats called history, maturation, testing, and instrumentation.  Regression to the mean is also a possible threat, though that is only a possible source of concern.  These threats to internal validity assume no randomization and a one group design, typical for Extension programs and other educational programs.  Post-then-pre works well for short programs, a day or less, and tends to control for response shift and desired response bias.  There may still be threats to internal validity.

Information you want to measure.

If you want to know a participant’s specific knowledge, a post-then-pre cannot provide you with that information because you cannot test something you cannot unknow.  The traditional pre/post can focus on specific knowledge, e.g., what food is the highest in Vitamin C in a list that includes apricot, tomato, strawberry, and cantaloupe.  (Answer:  strawberry.)  If you want agreement/disagreement with general knowledge (e.g., I know what the key components of strategic planning are), the post-then-pre works well.  Confidence, behaviors, skills, and attitudes can all be easily measured with a post-then-pre.
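In a post-then-pre, each participant rates the same statement twice on one form at the end of the program: where they were "then" (before the program) and where they are "now."  A minimal sketch of summarizing those paired ratings as an average change, assuming a hypothetical 1-to-5 agreement scale and made-up ratings:

```python
from statistics import mean


def mean_change(responses):
    """Average change from the retrospective "then" rating to the "now" rating.

    responses: list of (then, now) pairs, one pair per participant,
    both collected on the same form at the end of the program.
    """
    changes = [now - then for then, now in responses]
    return mean(changes)


# Made-up ratings for "I know what the key components of strategic planning are"
# on an assumed scale of 1 (strongly disagree) to 5 (strongly agree).
ratings = [(2, 4), (3, 4), (1, 3), (3, 3)]
print(mean_change(ratings))
```

Only participants with both ratings belong in the list; a form with one rating missing has no change score to compute.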

Factors influencing participants’ responses.

I mentioned threats to internal validity above.  These factors all influence participants’ responses.  If there is a long time between the pretest and the posttest, participants can be affected by history (a tornado prevents attendance at the program); maturation (especially true with programs with children–they grow up); testing (having taken the pretest, the posttest scores will be better); and instrumentation (the person administering the posttest administers it differently than the pretest was administered).  Participants’ desire to please the program leader/evaluator, called desired response bias, also affects participants’ responses.

Available resources.

Extension programs (as well as many other educational programs) are affected by the availability of resources (time, money, personnel, venue, etc.).  If you only have a certain amount of time, or a certain number of people who can administer the evaluation, or a set amount of money, you will need to consider which approach to evaluation you will use.

The idea is to get usable, meaningful data that accurately reflects the work that went into the program.

Filed Under (Methodology, program evaluation) by Molly on 12-05-2011

We recently held Professional Development Days for the Division of Outreach and Engagement.  This is an annual opportunity for faculty and staff in the Division to build capacity in a variety of topics.  The question this training posed was evaluative:

How do we provide meaningful feedback?

Evaluating a conference or a multi-day, multi-session training is no easy task.  Gathering meaningful data is a challenge.  What can you do?  Before you hold the conference (I’m using the word conference to mean any multi-day, multi-session training), decide on the following:

  • Are you going to evaluate the conference?
  • What is the focus of the evaluation?
  • How are you going to use the results?

The answer to the first question is easy:  YES.  If the conference is an annual event (or a regular event), you will want to have participants’ feedback on their experience, so, yes, you will evaluate the conference. Look at Penn State Tip Sheet 16 for some suggestions.  (If this is a one-time event, you may not; though as an evaluator, I wouldn’t recommend ignoring evaluation.)

The second question is more critical.  I’ve mentioned in previous blogs the need to prioritize your evaluation.  Evaluating a conference can be all-consuming and result in useless data UNLESS the evaluation is FOCUSED.  Sit down with the planners and ask them what they expect to happen as a result of the conference.  Ask them if there is one particular aspect of the conference that is new this year.  Ask them if feedback in previous years has given them any ideas about what is important to evaluate this year.

This year, the planners wanted to provide specific feedback to the instructors.  The instructors had asked for feedback in previous years.  This is problematic if planning evaluative activities for individual sessions is not done before the conference.  Nancy Ellen Kiernan, a colleague at Penn State, suggests a qualitative approach called a Listening Post.  This approach will elicit feedback from participants at the time of the conference.  This method involves volunteers who attended the sessions and may take more persons than a survey.  To use the Listening Post, you must plan ahead of time to gather these data.  Otherwise, you will need to do a survey after the conference is over and this raises other problems.

The third question is also very important.  If the results are just given to the supervisor, the likelihood of them being used by individuals for session improvement or by organizers for overall change is slim.  Making the data usable for instructors means summarizing the data in a meaningful way, often visually.  There are several ways to visually present survey data including graphs, tables, or charts.  More on that another time.  Words often get lost, especially if words dominate the report.

There is a lot of information in the training and development literature that might also be helpful.  Kirkpatrick has done a lot of work in this area.  I’ve mentioned that work in previous blogs.

There is no one best way to gather feedback from conference participants.  My advice:  KISS–keep it simple and straightforward.

Filed Under (Methodology, program evaluation) by Molly on 06-05-2011

I’ve talked about how each phase of a logic model has evaluative activities.  I’ve probably even alluded to the fact that needs assessment is the evaluative activity for that phase called situation (see the turquoise area on the left end of the image below).

What I haven’t done is talk about the why, what, and how of needs assessment (NA).  I also haven’t talked about the utilization of the findings of a needs assessment–what makes meaning of the needs assessment.

OK.  So why is an NA conducted?  And what is an NA?

Jim Altschuld is my go-to person when it comes to questions about needs assessment.  He recently edited a series of books on the topic.

Although Jim is my go-to person, Belle Ruth Witkin (a colleague, friend, and collaborator of Jim Altschuld) says in the preface to the co-authored volume (Witkin and Altschuld, 1995–see below),  that the most effective way to decide the best way to divide the (often scarce) resources among the demands (read programs) is to conduct a needs assessment when the planning for the use of those resources begins.

Book 1 of the kit provides an overview.  In that volume, Jim defines what a needs assessment is: “Needs assessment is the process of identifying needs, prioritizing them, making needs-based decisions, allocating resources, and implementing actions in organizations to resolve problems underlying important needs” (pg. 20).  Altschuld states that there are many models for assessing needs and provides citations for those models.  I think the most important aspect of this first volume is the presentation of the phased model developed by Belle Ruth Witkin in 1984 and revised by Altschuld and Witkin in their 1995 and 2000 volumes.  Those phases are preassessment, assessment, and postassessment.  They divide those three phases into three levels, primary, secondary, and tertiary, each level targeting a different group of stakeholders.  This volume also discusses the why and the how.  Subsequent volumes go into more detail–volume 2 discusses phase I (getting started); volume 3 discusses phase II (collecting data); volume 4 discusses analysis and priorities; and volume 5 discusses phase III (taking action).

Laurie Stevahn and Jean A. King are the authors of volume 5. In chapter 3, they discuss strategies for the action plan using facilitation procedures that promote positive relationships, develop shared understanding, prioritize decisions, and assess progress.  They warn of interpersonal conflict and caution against roadblocks that impede change efforts.  They also promote the development of evaluation activities at the onset of the NA because that helps ensure the use of the findings.

Needs assessment is a political experience.  Someone (or ones) will feel disenfranchised, lose resources, or have programs ended.  These activities create hard feelings and resentments.  These considerations need to be identified and discussed at the beginning of the process.  It is like the elephant and the blind people–everyone has an image of what the creature is; there may or may not be consensus, yet for the NA to be successful, consensus is important.  Without it, the data will sit on someone’s shelf or in someone’s computer.  Not useful.