Bias causes problems for the evaluator.

Scriven says that the evaluative use of “bias” means the “…same as ‘prejudice’,” with its antonyms being objectivity, fairness, and impartiality.  Bias causes systematic errors that are likely to affect humans and that often stem from the tendency to prejudge issues because of previous experience or perspective.

Why is bias a problem for evaluators and evaluations?

  • It leads to invalidity.
  • It results in lack of reliability.
  • It reduces credibility.
  • It leads to spurious outcomes.

What types of bias are there that can affect evaluations?

  • Shared bias
  • Design bias
  • Selectivity bias
  • Item bias

I’m sure there are others.  Knowing how these affect an evaluation is what I want to talk about.

Shared bias: Agreement among or between experts may be due to a common error; it is often seen as a conflict of interest and also appears in personal relationships.  For example, an external expert is asked to provide content validation for a nutrition education program that was developed by the evaluator’s sister-in-law; the likelihood that the two share the same opinion of the program is high.

Design bias: Designing an evaluation to favor (or disfavor) a certain target group in order to support the program being evaluated.  For example, drawing a sample of students enrolled in school on a day when absenteeism is high biases the design against students of lower socioeconomic status, because absenteeism is usually higher among lower-income groups.

Selectivity bias: When the sample is inadvertently chosen in a way that is connected to the desired outcomes, the results of the evaluation will be affected.  This is similar to the design bias mentioned above.

Item bias: The construction of an individual item on an evaluation scale in a way that adversely affects some subset of the target audience.  For example, a Southwest Native American child who has never seen a brook is asked to identify a synonym for “brook” from a list of words.  Bias in even one item raises questions about the objectivity of the scale as a whole.

Other types of bias that evaluators will encounter include desired response bias and response shift bias.

Desired response bias occurs when the participant provides the answer that he or she thinks the evaluator wants to hear; the responses the evaluator solicits are slanted toward what the participant believes the evaluator wants to know.  It often accompanies general positive bias, that is, the tendency to report positive findings when the program doesn’t merit them.  General positive bias is often seen in grade inflation: an average student is awarded a B when the student has actually earned a C.

Response shift bias occurs when the participant’s frame of reference changes between the beginning and the end of the program, so that the experience reported at the end is rated lower than it would have been at the beginning.  The result is that the program appears less effective than it actually was.
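To make this concrete, here is a minimal sketch with invented ratings (none of these numbers come from a real program); it simply shows how a shifted frame of reference can make self-reported gain look smaller than the gain an objective measure would show.

```python
# A hypothetical illustration of response shift bias on a 1-7 self-rating scale.
# All numbers are invented for demonstration.

# Before the program, participants don't yet know what they don't know,
# so they over-rate themselves on the pretest.
pretest_self_rating = [5, 6, 5, 6, 5]

# The program shifts their frame of reference; the posttest ratings are
# made against that new, better-informed standard.
posttest_self_rating = [5, 6, 6, 5, 6]

# Suppose an objective skills check (not a self-report) showed real growth.
objective_pre = [2, 3, 2, 3, 2]
objective_post = [5, 6, 5, 5, 6]

def mean(values):
    return sum(values) / len(values)

print(f"Self-reported gain: {mean(posttest_self_rating) - mean(pretest_self_rating):.1f}")
print(f"Objective gain:     {mean(objective_post) - mean(objective_pre):.1f}")
# The self-report comparison understates the change because the pre and post
# ratings were made from different frames of reference.
```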

A good friend of mine asked me today if I knew of any attributes (which I interpreted to be criteria) of qualitative data (NOT qualitative research).  My friend likened the quest for attributes for qualitative data to the psychometric properties of a measurement instrument–validity and reliability–that could be applied to the data derived from those instruments.

Good question.  How does this relate to program evaluation, you may ask?  That question takes us to an understanding of paradigm.

A paradigm (according to Scriven in Evaluation Thesaurus) is a general concept or model for a discipline that may be influential in shaping the development of that discipline.  Paradigms do not (again according to Scriven) define truth; rather, they define prima facie truth (i.e., truth on first appearance), which is not the same as truth.  Scriven goes on to say, “…eventually, paradigms are rejected as too far from reality and they are always governed by that possibility [i.e., that they will be rejected]” (page 253).

So why is it important to understand paradigms?  They frame the inquiry.  And evaluators are asking questions; that is, they are inquiring.

How inquiry is framed is based on the components of paradigm:

  • ontology–what is the nature of reality?
  • epistemology–what is the relationship between the known and the knower?
  • methodology–what is done to gain knowledge of reality, i.e., the world?

These beliefs shape how the evaluator sees the world and then guide the evaluator in the use of data, whether those data are derived from records, observations, and interviews (i.e., qualitative data) or from measurements, scales, and instruments (i.e., quantitative data).  Each paradigm guides the questions asked and the interpretations brought to the answers to those questions.  That is its importance to evaluation.

Denzin and Lincoln (2005), in the 3rd edition of the Handbook of Qualitative Research, list what they call interpretive paradigms, described in Chapters 8-14 of that volume.  The paradigms are:

  1. Positivist/post positivist
  2. Constructivist
  3. Feminist
  4. Ethnic
  5. Marxist
  6. Cultural studies
  7. Queer theory

They indicate that each of these paradigms has criteria, a form of theory, and a specific type of narration or report.  If paradigms have criteria, then it makes sense to me that the data derived from the inquiry framed by those paradigms would also have criteria.  Certainly, the psychometric properties of validity and reliability (stemming from the positivist paradigm) relate to data, usually quantitative.  It would make sense to me that the parallel, though different, concepts in the constructivist paradigm, trustworthiness and credibility, would apply to data derived from that paradigm, which are often qualitative.

If that is the case, then evaluators need to be at least knowledgeable about paradigms.

In 1963, Campbell and Stanley, in their classic book Experimental and Quasi-Experimental Designs for Research, discussed the retrospective pretest.  This is the method whereby the participant’s attitudes, knowledge, skills, behaviors, etc., existing prior to and after the program are assessed together AFTER the program.  It is a novel approach to capturing what participants knew, felt, and did before they experienced the program.

Does it work?  Yes…and no (according to the folks in the know).

Campbell and Stanley mention the use of the retrospective pretest to measure the attitudes toward Blacks (they use the term Negro) of soldiers assigned to racially mixed versus all-white combat infantry units (1947), and to measure housing project occupants’ attitudes toward living in integrated versus segregated housing units during a housing shortage (1951).  Both studies showed no difference between the two groups in remembering prior attitudes toward the idea of interest.  Campbell and Stanley argue that with only posttest measures, any difference found might have been attributable to selection bias.  They caution readers to “…be careful to note that the probable direction of memory bias is to distort the past…into agreement with (the) present…or has come to believe to be socially desirable…”

This brings up several biases that the Extension professional needs to be concerned with in planning and conducting an evaluation: selection bias, desired response bias, and response shift bias, all of which can have serious implications for the evaluation.

Those are technical words for several limitations that can affect any evaluation.  Selection bias is the preference to put some participants into one group rather than the other; Campbell and Stanley call this bias a threat to validity.  Desired response bias occurs when participants try to answer the way they think the evaluator wants them to answer.  Response shift bias happens when participants’ frame of reference or understanding changes during the program, often due to misunderstanding or preconceived ideas.

So these are the potential problems.  Are there any advantages or strengths to using the retrospective pretest?  There are at least two.  First, there is only one administration, at the end of the program; this is advantageous when the program is short and when participants do not like to fill out forms (that is, it minimizes paper burden).  Second, it avoids response shift bias by not asking participants, before the program, to rate something they may not yet understand.

Theodore Lamb (2005) tested the two methods, concluded that the two approaches appeared similar, and recommended the retrospective pretest when conducting a traditional pretest/posttest is difficult or impossible.  He cautions, however, that the data from the retrospective pretest need to be supplemented with other data to demonstrate the effectiveness of the program.
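For readers who like to see the mechanics, here is a minimal sketch of how retrospective pretest data might be tabulated and compared.  The ratings are invented for illustration, the paired t-test is only one of several reasonable analyses, and the sketch assumes scipy is available.

```python
# Hypothetical retrospective pretest data: at the END of the program each
# participant answers the same item twice, once for "now" and once for
# "before the program."  The ratings below are invented for illustration.
from scipy import stats

retrospective_pre = [2, 3, 2, 4, 3, 2, 3, 2]   # "where I was before the program"
post = [4, 5, 4, 5, 4, 3, 5, 4]                # "where I am now"

# Paired differences, because both ratings come from the same person
# at the same administration.
differences = [p - r for p, r in zip(post, retrospective_pre)]
mean_change = sum(differences) / len(differences)
print(f"Mean self-reported change: {mean_change:.2f}")

# One common analysis is a paired t-test on the two sets of ratings.
result = stats.ttest_rel(post, retrospective_pre)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")

# As Lamb suggests, this self-report evidence is best supplemented with
# other data (records, observations) before claiming program effectiveness.
```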

There is a vast array of information about this evaluation method.  If you would like to know more, let me know.

I was reading another evaluation blog (the American Evaluation Association’s blog, AEA365) that talked about database design.  I was reminded that over the years, almost every Extension professional with whom I have worked has asked me the following question: “What do I do with my data now that I have all my surveys back?”

As Leigh Wang points out in her AEA365 comments, “Most training programs and publication venues focus on the research design, data collection, and data analysis phases, but largely leave the database design phase out of the research cycle.”  The questions that this statement raises are:

  1. How do/did you learn what to do with data once you have them?
  2. How do/did you decide to organize it?
  3. What software do/did you use?
  4. How important is it to make the data accessible to colleagues in the same field?

I want to know the answers to those questions.  I have some ideas.  Before I talk about what I do, I want to know what you do.  Email me, or comment on this blog.