I have been writing this blog since December 2009.  That seems like forever from my perspective.

I write.  I post.  I wait.  Nothing.

Oh, occasionally, I receive an email (which is wonderful and welcome) and early on I received a few comments (that was great).


Recently, nothing.

I know it is summer–and in Extension that is the time for fairs and camps, which means everyone is busy.

Yet, I know that learning happens all the time and you have some amazing experiences that can teach.  So, my good readers: What evaluation question have you had this week?  Any question related to evaluating what you are doing is welcome.  Let me hear from you.  You can email me (molly.engle@oregonstate.edu) or you can post a comment (see comment link below).

Bias causes problems for the evaluator.

Scriven says that the evaluative use of “bias” means the “…same as ‘prejudice’,” with its antonyms being objectivity, fairness, and impartiality.  Bias causes systematic errors, the kind humans are prone to, often due to the tendency to prejudge issues because of previous experience or perspective.

Why is bias a problem for evaluators and evaluations?

  • It leads to invalidity.
  • It results in lack of reliability.
  • It reduces credibility.
  • It leads to spurious outcomes.

What types of bias are there that can affect evaluations?

  • Shared bias
  • Design bias
  • Selectivity bias
  • Item bias

I’m sure there are others.  Knowing how these affect an evaluation is what I want to talk about.

Shared bias: Agreement among or between experts may be due to a common error; this is often seen as a conflict of interest and also shows up in personal relationships.  For example, an external expert is asked to provide content validation for a nutrition education program that was developed by the evaluator’s sister-in-law.  The likelihood that the two share the same opinion of the program is high.

Design bias: Designing an evaluation to favor (or disfavor) a certain target group in order to support the program being evaluated.  For example, selecting a sample of students enrolled in school on a day when absenteeism is high will result in a design bias against lower socio-economic students, because absenteeism is usually higher among lower socio-economic groups.
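To make the design bias example concrete, here is a minimal simulation sketch (the school size, SES share, and absence rates are numbers I made up purely for illustration):

```python
# A minimal simulation (made-up numbers) of the design bias described above:
# sampling students on a high-absenteeism day under-represents lower-SES students.
import random

random.seed(1)

# Hypothetical school: 40% of students are lower-SES, and on the sampling day their
# assumed absence rate (30%) is higher than everyone else's (10%).
students = [{"low_ses": random.random() < 0.40} for _ in range(1000)]
for s in students:
    absence_rate = 0.30 if s["low_ses"] else 0.10
    s["present"] = random.random() > absence_rate

# Draw the evaluation sample only from students present that day.
sample = random.sample([s for s in students if s["present"]], 100)

share_in_school = sum(s["low_ses"] for s in students) / len(students)
share_in_sample = sum(s["low_ses"] for s in sample) / len(sample)
print(f"lower-SES share of the school: {share_in_school:.0%}")
print(f"lower-SES share of the sample: {share_in_sample:.0%}")  # typically lower
```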

Selectivity bias: When a sample is inadvertently connected to desired outcomes, the evaluation will be affected.  It is similar to the design bias mentioned above.

Item bias: The construction of an individual item on an evaluation scale which adversely affects some subset of the target audience.  For example, a Southwest Native American child who has never seen a brook is presented with an item asking the child to identify a synonym for brook out of a list of words.  This raises the question about the objectivity of the scale as a whole.

Other types of bias that evaluators will experience include desired response bias and response shift bias.

Desired response bias occurs when the participant provides an answer that s/he thinks the evaluator wants to hear.  The responses the evaluator solicits are slanted towards what the participant thinks the evaluator wants to know.  It is often found with general positive bias–that is, the tendency to report positive findings when the program doesn’t merit those findings.  General positive bias  is often seen with grade inflation–an average student is awarded a B when the student has actually earned a C grade.

Response shift bias occurs when the participant changes his/her frame of reference from the beginning of the program to the end of the program and then reports an experience that is less than the experience perceived at the beginning of the program.  This results in lower apparent program effectiveness.
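To see what response shift bias can do to the numbers, here is a minimal sketch (the 1–5 self-ratings are hypothetical, not from any actual program) of how a recalibrated frame of reference can make a worthwhile program look flat or even harmful on a simple pre/post comparison:

```python
# A minimal numeric sketch of response shift bias (hypothetical 1-5 self-ratings).
# At pretest, participants overestimate themselves because they do not yet know what
# they do not know; by posttest their frame of reference has shifted downward.

pre_self_ratings = [4, 4, 5, 4, 4]    # inflated: naive frame of reference
post_self_ratings = [3, 4, 4, 3, 4]   # recalibrated after the program

def mean(values):
    return sum(values) / len(values)

apparent_gain = mean(post_self_ratings) - mean(pre_self_ratings)
print(f"apparent pre-to-post gain: {apparent_gain:+.2f}")  # negative, even if real learning occurred
```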

A good friend of mine asked me today if I knew of any attributes (which I interpreted to be criteria) of qualitative data (NOT qualitative research).  My friend likened the quest for attributes for qualitative data to the psychometric properties of a measurement instrument–validity and reliability–that could be applied to the data derived from those instruments.

Good question.  How does this relate to program evaluation, you may ask.  That question takes us to an understanding of paradigm.

A paradigm (according to Scriven in the Evaluation Thesaurus) is a general concept or model for a discipline that may be influential in shaping the development of that discipline.  Paradigms do not (again according to Scriven) define truth; rather, they define prima facie truth (i.e., truth on first appearance), which is not the same as truth.  Scriven goes on to say, “…eventually, paradigms are rejected as too far from reality and they are always governed by that possibility [i.e., that they will be rejected]” (page 253).

So why is it important to understand paradigms?  Because they frame the inquiry.  And evaluators are asking questions; that is, they are inquiring.

How inquiry is framed is based on the components of paradigm:

  • ontology–what is the nature of reality?
  • epistemology–what is the relationship between the known and the knower?
  • methodology–what is done to gain knowledge of reality, i.e., the world?

These beliefs shape how the evaluator sees the world and then guide the evaluator in the use of data, whether those data are derived from records, observations, or interviews (i.e., qualitative data) or from measurements, scales, and instruments (i.e., quantitative data).  Each paradigm guides the questions asked and the interpretations brought to the answers to those questions.  That is why paradigm matters to evaluation.

Denzin and Lincoln (2005), in the 3rd edition of their Handbook of Qualitative Research, list what they call interpretive paradigms.  They are described in Chapters 8 – 14 of that volume.  The paradigms are:

  1. Positivist/post positivist
  2. Constructivist
  3. Feminist
  4. Ethnic
  5. Marxist
  6. Cultural studies
  7. Queer theory

They indicate that each of these paradigms has criteria, a form of theory, and a specific type of narration or report.  If paradigms have criteria, then it makes sense to me that the data derived in the inquiry framed by those paradigms would have criteria.  Certainly, the psychometric properties of validity and reliability (stemming from the positivist paradigm) relate to data, usually quantitative.  It would make sense to me that the parallel, though different, concepts in the constructivist paradigm, trustworthiness and credibility, would apply to data derived from that paradigm–often qualitative data.

If that is the case, then evaluators need to be at least knowledgeable about paradigms.

In 1963, Campbell and Stanley (in their classic book, Experimental and Quasi-Experimental Designs for Research) discussed the retrospective pretest.  This is the method whereby the participant’s attitudes, knowledge, skills, behaviors, etc., existing prior to and after the program are assessed together AFTER the program.  It is a novel approach to capturing what participants knew, felt, and did before they experienced the program.

Does it work?  Yes…and no (according to the folks in the know).

Campbell and Stanley mention the use of the retrospective pretest to measure the attitudes towards Blacks (they use the term Negro) of soldiers assigned to racially mixed vs. all-white combat infantry units (1947) and to measure housing project occupants’ attitudes toward living in integrated vs. segregated housing units when there was a housing shortage (1951).  Both tests showed no difference between the two groups in remembering prior attitudes towards the idea of interest.  Campbell and Stanley argue that with only posttest measures, any difference found may have been attributable to selection bias.  They caution readers to “…be careful to note that the probable direction of memory bias is to distort the past…into agreement with (the) present…or has come to believe to be socially desirable…”

This brings up several biases that the Extension professional needs to be concerned with in planning and conducting an evaluation: selection bias, desired response bias, and response shift bias.  All of these can have serious implications for the evaluation.

Those are technical words for several limitations that can affect any evaluation.  Selection bias is the preference to put some participants into one group rather than the other.  Campbell and Stanley call this bias a threat to validity.  Desired response bias occurs when participants try to answer the way they think the evaluator wants them to answer.  Response shift bias happens when participants’ frame of reference or understanding changes during the program, often due to misunderstanding or preconceived ideas.

So these are the potential problems.  Are there any advantages or strengths to using the retrospective pretest?  There are at least two.  First, there is only one administration, at the end of the program.  This is advantageous when the program is short and when participants do not like to fill out forms (that is, it minimizes paper burden).  Second, it avoids response-shift bias by not introducing information that may not be understood prior to the program.

Theodore Lamb (2005) tested the two methods, concluded that the two approaches appeared similar, and recommended the retrospective pretest when conducting a pretest/posttest is difficult or impossible.  He cautions, however, that supplementing the data from the retrospective pretest with other data is necessary to demonstrate the effectiveness of the program.
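For readers who want to see how the analysis of the two administrations might look side by side, here is a sketch with made-up scores (it assumes the scipy package is installed; the paired t-test shown is just one common way to analyze either design, not necessarily how Lamb analyzed his data):

```python
# A sketch comparing a traditional pretest/posttest with a retrospective pretest/posttest.
# All scores are hypothetical; the point is the analysis pattern, not the numbers.
from scipy.stats import ttest_rel

post = [7, 8, 6, 9, 7, 8, 8, 7]        # scores collected at the end of the program
pre = [6, 7, 6, 8, 7, 7, 8, 7]         # traditional pretest, collected before the program
retro_pre = [4, 5, 4, 6, 5, 5, 6, 5]   # "where were you before the program?" asked at the end

traditional = ttest_rel(post, pre)          # paired t-test on the traditional design
retrospective = ttest_rel(post, retro_pre)  # paired t-test on the retrospective design

print(f"traditional pre/post:   t = {traditional.statistic:.2f}, p = {traditional.pvalue:.3f}")
print(f"retrospective pre/post: t = {retrospective.statistic:.2f}, p = {retrospective.pvalue:.3f}")
# Either way, as Lamb cautions, supplement these numbers with other data.
```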

There is a vast array of information about this evaluation method.  If you would like to know more, let me know.

Last week, I talked about formative and summative evaluations.  Formative and summative evaluation roles can help you prioritize what evaluations you do when.  I was then reminded of another approach to viewing evaluation, one that also relates to prioritizing evaluations and might be useful.

When I first started in this work, I realized that I could view evaluation in three parts–process, progress, product.  Each part could be conducted separately, or the total approach could be used.  This approach provides insights into different aspects of a program.  It can also provide a picture of the whole program.  Deciding which part to focus on is another way to prioritize an evaluation.

Process evaluation captures the HOW of a program.  Process evaluation has been defined as the evaluation that assesses the delivery of the program (Scheirer, 1994).  Process evaluation identifies what the program is and whether it is delivered as intended, both to the “right audience” and in the “right amount”.  The following questions (according to Scheirer) can guide a process evaluation:

  1. Why is the program expected to produce its results?
  2. For what types of people may it be effective?
  3. In what circumstances may it be effective?
  4. What are the day-to-day aspects of program delivery?

Progress evaluation captures the FIDELITY of a program–that is, did the program do what the planners said would be done in the time allotted? Progress evaluation has been very useful when I have grant activities and need to be accountable for the time-line.

Product evaluation captures a measure of the program’s products or OUTCOMES.  Sometimes outputs are also captured and this is fine.  Just keep in mind that outputs may be (and often are) necessary; they are not sufficient for demonstrating the impact of the program.  A product evaluation is often summative.  However, it can also be formative, especially if the program planners want to gather information to improve the program rather than to determine the ultimate effectiveness of the program.

This framework may be useful in helping Extension professionals decide what to evaluate and when.  It may help determine which program needs a process, progress, or product evaluation.  Trying to evaluate your whole program all at once often defeats being purposeful in your evaluation efforts and often leads to results that are confusing, invalid, and/or useless.  It makes sense to choose carefully what evaluation to do when–that is, to prioritize.

I had a conversation today about how to measure whether I was making a difference in what I do.  Although the conversation was about working with differences, I am conscious that the work I do and the work of working with differences transcend most disciplines and positions.  How does it relate to evaluation?

Perspective and voice.

These are two sides of the same coin.  Individuals come to evaluation with a history or perspective.  Individuals voice their view in the development of evaluation plans.  If individuals are not invited and/or do not come to the table for the discussion, a voice is missing.

This conversation went on–the message was that voice and perspective are more important in evaluations that employ a qualitative approach than in those that employ a quantitative approach.  Yes…and no.

Certainly, words have perspective and provide a vehicle for voice.  And words are the basis for qualitative methods.  So this is the “Yes”.  Is this still an issue when the target audience is homogeneous?  Is it still an issue when the evaluator is “different” on some criterion from the target audience?  Or, as one mental health worker once stated, only an addict can provide effective therapy to another addict.  Is that really the case?  Or do voice and perspective always overlay an evaluation?

Let’s look at quantitative methods.  Some would argue that numbers aren’t affected by perspective and voice.  I will argue that the basis for these numbers is words.  If words are turned into numbers, are voice and perspective still an issue?  This is the “Yes and no”.

I am reminded of the story of a brook and a Native American child.  The standardized test asked which of the following is similar to a brook.  The possible responses were (for the sake of this conversation) river, meadow, lake, inlet.  The Native American child, growing up in the desert Southwest, had never heard of the word “brook” and consequently got the item wrong.  This was one of many questions where perspective affected the response.  Wrong answers were totaled, that total was subtracted from the possible total, and a score (a number) resulted.  That individual number was grouped with other individual numbers and compared to numbers from another group using a statistical test (for the sake of conversation, a t-test).  Is the resulting test of significance valid?  I would say not.  So this is the “No”.  Here, voice and perspective have been obfuscated.

The statistical significance between those groups is clear according to the computation; clear, that is, until one looks at the words behind the numbers.  It is in the words behind the numbers that perspective and voice affect the outcomes.
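Here is a minimal sketch of that story in code (the item responses are made up, and it assumes the scipy package is installed): a single culturally biased item is enough to tip a small group comparison into “statistical significance,” and dropping that item makes the difference disappear.

```python
# A sketch (hypothetical item responses) of how one biased item can produce a
# "significant" group difference once words are turned into numbers.
from scipy.stats import ttest_ind

# 1 = correct, 0 = wrong on a 10-item vocabulary scale.  Item 0 is the "brook" item.
# By construction, the other nine items are answered identically across groups, so any
# difference in total scores comes entirely from that one item.
other_items = [
    [1, 1, 0, 1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 1, 1, 1],
    [1, 1, 0, 1, 1, 0, 1, 1, 0],
]
group_a = [[1] + row for row in other_items]  # children who grew up around brooks
group_b = [[0] + row for row in other_items]  # children who have never seen a brook

with_item = ttest_ind([sum(r) for r in group_a], [sum(r) for r in group_b])
without_item = ttest_ind([sum(r[1:]) for r in group_a], [sum(r[1:]) for r in group_b])

print(f"all 10 items:        t = {with_item.statistic:.2f}, p = {with_item.pvalue:.3f}")
print(f"biased item dropped: t = {without_item.statistic:.2f}, p = {without_item.pvalue:.3f}")
```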

The question was raised recently: From whom am I not hearing?

Hearing from key stakeholders is important.  Having as many perspectives as possible, as time and money will allow, enhances the evaluation.

How often do you only target the recipients of the program in your evaluation, needs assessment, or focus groups?

If the only voices heard in planning the evaluation are those of the program team, what information will you miss?  What valuable information is not being communicated?

I was the evaluator on a recovery program for cocaine-abusing moms and their children.  The PI was a true academic and had all sorts of standardized measures to use to determine whether the program was successful.  The PI had not thought to ask individuals like the recipients of the program what they thought.  When we brought members of the program’s target audience to the table and asked them, after explaining the proposed program, “How will you know that the program has worked; has been successful?”, their answers did not include the standardized measures proposed by the PI.  The evaluation was revised to include their comments and suggestions.  Fortunately, this happened early in the planning stages, before implementation, and we were able to capture important information.

Ask yourself, “How can I seek those voices that will capture the key perspectives of this evaluation?”  Then figure out a way to include those stakeholders in the evaluation planning.  Participatory evaluation at its best.

Having spent the last week reviewing two manuscripts for a journal editor, I realized that writing is an evaluative activity.

How so?

The criterion for good writing is meeting the 5 Cs: Clarity, Coherence, Conciseness, Correctness, and Consistency.

Evaluators write–they write survey questions, summaries of findings, reports, and journal manuscripts. If they do not employ the 5 Cs to communicate to a naive audience what is important, then the value (remember the root of evaluation is value) of their writing is lost, often never to be reclaimed.

In a former life, I taught scientific/professional writing to medical students, residents, junior professors, and other graduate students. I found many sources that were useful and valuable to me. The conclusion to which I came is that taking a scientific/professional (or non-fiction) writing course is an essential tool to have as an evaluator. So I set about collecting useful (and, yes, valuable) resources. I offer them here.

Probably the single resource that every evaluator needs to have on hand is Strunk and White’s slim volume called “The Elements of Style”. It is in the 4th edition–I still use the 3rd. Recently, a 50th anniversary edition was published that is a fancy version of the 4th edition.  Amazon has the 50th anniversary edition as well as the 4th edition–the 3rd ed is out of print.

You also need the style guide (APA, MLA, Biomedical Editors, Chicago) that is used by the journal to which you are submitting your manuscript. Choose one. Stick with it. I have the 6th edition of the APA guide on my desk. It is online as well.

Access to a dictionary and a thesaurus (now conveniently available online and through computer software) is essential. I prefer the hard copy Webster’s (I love the feel of books), yet would recommend the online version of the Oxford English Dictionary.

There are a number of helpful writing books (in no particular order or preference):

  • Turabian, K. L. (2007). A manual for writers of research papers, theses, and dissertations. Chicago: The University of Chicago Press.
  • Thyer, B. A. (1994). Successful publishing in scholarly journals. Thousand Oaks, CA: Sage.
  • Berger, A. A. (1993). Improving writing skills. Thousand Oaks, CA: Sage.
  • Silvia, P. J. (2007). How to write a lot. Washington DC: American Psychological Association.
  • Zeiger, M. (1999). Essentials of writing biomedical research papers. NY: McGraw-Hill.

I will share Will Safire’s 17 lighthearted looks at grammar and good usage another day.


I was asked about the need for an evaluation plan to be reviewed by the institutional review board (IRB) office.  In pausing to answer, the atrocities that have occurred and are occurring throughout the world registered once again with me…the Inquisition, the Crusades, Cortes, Auschwitz, the Nuremberg trials, Sudan, to name only a few.  Now, although I know there is little or no evaluation in most of these situations, humans were abused under the guise of finding the truth.  (I won’t capitalize truth, although some would argue that Truth was the impetus for these acts.)

So what responsibility DO evaluators have for protecting individuals who participate in the inquiry we call evaluation?  The American Evaluation Association has developed and endorsed for all evaluators a set of Guiding Principles.  There are five principles–Systematic Inquiry, Competence, Integrity/Honesty, Respect for People, and Responsibilities for the General and Public Welfare.  An evaluator must perform the systematic inquiry competently and with integrity, respecting the individuals participating and recognizing the diversity of public interests and values.  This isn’t a mandated code; there are no evaluation police; an evaluator will not be sanctioned if these principles are not followed (the evaluator may not get repeat work, though).  These guiding principles were established to “guide” the evaluator to do the best work possible within the limitations of the job.

The IRB is there to protect the participant first and foremost, then the investigator and the institution.  So although there is not a direct congruence with the IRB principles of voluntary participation, confidentiality, and minimal risk, to me, evaluators following the guiding principles will be able to assure participants that they will be respected and that the inquiry will be conducted with integrity.  No easy task…and a lot of work.

I think evaluators have a responsibility, embedded in the guiding principles, to assure individuals participating in evaluations that their participation is voluntary, that the information they provide will remain confidential, and that what is expected of them involves minimal risk.  Securing IRB approval will assure participants that this is so.

Although these are two different monitoring systems (one federal, one professional), I think it is important to meet both sets of expectations.

It is Wednesday and the sun is shining.

The thought for today is evaluation use.

A colleague of mine tells me there are four types of evaluation use:

  1. Conceptual use
  2. Instrumental use
  3. Persuasive use
  4. Process use

How are evaluation results used? Seems to me that using evaluation results is the goal–otherwise, is what you’ve done really evaluation? HOW do YOU use your evaluation results?

Is scholarship a use?

Is reporting a use?

Does use always mean taking action?

Does use always mean “so what”?

Jane Davidson (Davidson Consulting, http://davidsonconsulting.co.nz, Aotearoa New Zealand) says there are three questions that any evaluator needs to ask:

  1. What’s so?
  2. So what?
  3. Now what?

Seems to me that you need the “now what” question to have evaluation use. What do you think? Post a comment.