Last week I suggested a few evaluation-related resolutions…one I didn’t mention which is easily accomplished is reading and/or contributing to AEA365.  AEA365 is a daily evaluation blog sponsored by the American Evaluation Association.  AEA’s Newsletter says: “The aea365 Tip-a-Day Alerts are dedicated to highlighting Hot Tips, Cool Tricks, Rad Resources, and Lessons Learned by and for evaluators (see the aea365 site here). Begun on January 1, 2010, we’re kicking off our second year and hoping to expand the diversity of voices, perspectives, and content shared during the coming year. We’re seeking colleagues to write one-time contributions of 250-400 words from their own experience. No online writing experience is necessary – you simply review examples on the aea365 Tip-a-Day Alerts site, craft your entry according to the contributions guidelines, and send it to Michelle Baron, our blog coordinator. She’ll do a final edit and upload. If you have questions, or want to learn more, please review the site and then contact Michelle at (updated December 2011)”

AEA365 is a valuable site.  I commend it to you.

Now the topic for today: Data sources–the why and the why not (or advantages and disadvantages for the source of information).

Ellen Taylor-Powell, Evaluation Specialist at UWEX, has a handout that identifies sources of evaluation data.  These sources are existing information, people, and pictorial records and observations. Each source has advantages and disadvantages.

The source for the information below is the United Way publication, Measuring Program Outcomes (p. 86).

1.  Existing information such as Program Records are

  • Available
  • Accessible
  • Known sources and methods of data collection

Program records can also

  • Be corrupted by the data collection methods
  • Have missing data
  • Omit post-intervention impact data

2. Another form of existing information is Other Agency Records

  • Offer a different perspective
  • May contain impact data

Other agency records may also

  • Be corrupted by the data collection methods
  • Have missing data
  • Be unavailable as a data source
  • Have inconsistent time frames
  • Have case identification difficulties

3.  People are often the main data source and include Individuals and the General Public and

  • Have unique perspective on experience
  • Are an original source of data
  • General public can provide information when individuals are not accessible
  • Can serve geographic areas or specific population segments

Individuals and the general public may also

  • Introduce a self-report bias
  • Not be accessible
  • Have limited overall experience

4.  Observations and pictorial records include Trained Observers and Mechanical Measurements

  • Can provide information on behavioral skills and practices
  • Supplement self reports
  • Can be easily quantified and standardized

These sources of data also

  • Are only relevant to physical observation
  • Need data collectors who must be reliably trained
  • Often result in inconsistent data with multiple observers
  • Are affected by the accuracy of testing devices
  • Have limited applicability to outcome measurement

One response I got for last week’s query was about on-line survey services.  Are they reliable?  Are they economical?  What are the design limitations?  What are the question format limitations?

Yes.  Depends.  Some.  Not many.

Let me take the easy question first:  Are they economical?

Depends.  Consider the cost of postage for a paper survey (both out and back) vs. the time it takes to enter questions into the system, and the cost of the system vs. the length of the survey.

Because most people have access to email today, using an on-line survey service is often the easiest and most economical way to distribute an evaluation survey.  Most institutional review boards view an on-line survey like a mail survey and typically grant a waiver of documentation of informed consent.  The consenting document is the entry screen, and an agree-to-participate question is often included on that screen.

Are they valid and reliable?

Yes, but…The old adage “Garbage in, garbage out” applies here.  Like a paper survey, an internet survey is only as good as the survey questions.  Don Dillman, in the third edition of “Internet, mail, and mixed-mode surveys” (co-authored with Jolene D. Smyth and Leah Melani Christian), talks about question development.  Since he wrote the book (literally), I use this resource a lot!

What are the design limitations?

Some limitations apply…Each online survey service is different.  The most common service is Survey Monkey.  The introduction to Survey Monkey says, “Create and publish online surveys in minutes, and view results graphically and in real time.”  The basic account with Survey Monkey is free.  It has limitations (number of questions [10]; number of question formats [15]; number of responses [100]).  You can upgrade to the Pro or Unlimited plan for a subscription fee ($19.95/month or $200/year, respectively).  There are others.  A search using “survey services” returns many options, such as Zoomerang or InstantSurvey.

What are the question format limitations?

Not many–both open-ended and closed-ended questions can be asked.  Survey Monkey has 15 different formats from which to choose (see below).  There may be others; this list covers most formats.

  • Multiple Choice (Only one Answer)
  • Multiple Choice (Multiple Answers)
  • Matrix of Choices (Only one Answer per Row)
  • Matrix of Choices (Multiple Answers per Row)
  • Matrix of Drop-down Menus
  • Rating Scale
  • Single Textbox
  • Multiple Textboxes
  • Comment/Essay Box
  • Numerical Textboxes
  • Demographic Information (US)
  • Demographic Information (International)
  • Date and/or Time
  • Image
  • Descriptive Text

Oregon State University has an in-house service sponsored by the College of Business (BSG–Business Survey Groups).  OSU also has an institutional account with Student Voice, an on-line service designed initially for learning assessment which I have found useful for evaluations.  Check your institution for options available.  For your next evaluation that involves a survey, think electronically.

Rensis Likert was a sociologist at the University of Michigan.  He is credited with developing the Likert scale.

Before I say a few words about the scale and subsequently the item (two different entities), I want to clarify how to say his name:

Likert (he died in 1981) pronounced his name lick-urt (short i), as in to lick something.  Most people mispronounce it.  I hope he is resting easy…

Likert scales and Likert items are two different things.

A Likert scale is a multi-item instrument composed of items asking opinions (attitudes) on an agreement–disagreement continuum.  The several items have response levels arranged horizontally.  The response levels are anchored with sequential integers as well as with words that assume equal intervals.  These words–strongly disagree, somewhat disagree, neither agree nor disagree, somewhat agree, strongly agree–are symmetrical around a neutral middle point.  Likert always measured attitude by agreement or disagreement; today the methodology is applied to other domains.

A Likert item is one of many that has response levels arranged horizontally, anchored with consecutive integers that are more or less evenly spaced, bivalent and symmetrical about a neutral middle.  If it doesn’t have these characteristics, it is not a Likert item–some authors would say that without these characteristics, the item is not even a Likert-type item.  For example, an item asking how often you do a certain behavior with a scale of “never,” “sometimes,” “average,” “often,” and “very often” would not be a Likert item.  Some writers would consider it a Likert-type item.  If the middle point “average” is omitted, it would still be considered a Likert-type item.

Referring to ANY ordered-category item as Likert-type is a misconception.  Unless it has response levels arranged horizontally, anchored with consecutive integers, anchored with words that connote even spacing, and is bivalent, the item is only an ordered-category item–or sometimes a visual analog scale or a semantic differential scale.  More on visual analog scales and semantic differential scales at another time.
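These characteristics can be sketched in code.  A minimal illustration in Python, using a hypothetical five-point agreement item (the labels and the 1–5 coding are mine for illustration, not taken from any particular instrument):

```python
# A Likert item: response levels anchored with consecutive integers,
# symmetric (bivalent) around a neutral middle point.
likert_levels = {
    "strongly disagree": 1,
    "somewhat disagree": 2,
    "neither agree nor disagree": 3,  # the neutral middle
    "somewhat agree": 4,
    "strongly agree": 5,
}

codes = sorted(likert_levels.values())
middle = codes[len(codes) // 2]

# The anchors are consecutive integers...
assert codes == list(range(codes[0], codes[-1] + 1))
# ...and symmetric about the neutral middle.
assert all((middle - lo) == (hi - middle)
           for lo, hi in zip(codes, reversed(codes)))
```

An item whose anchors fail either assertion (for example, the “never…very often” frequency scale above) would not qualify under this definition.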

In 1963, Campbell and Stanley (in their classic book, Experimental and Quasi-Experimental Designs for Research) discussed the retrospective pretest.  This is the method whereby the participant’s attitude, knowledge, skills, behaviors, etc., existing prior to and after the program are assessed together AFTER the program–a novel approach to capturing what participants knew, felt, and did before they experienced the program.

Does it work?  Yes…and no (according to the folks in the know).

Campbell and Stanley mention the use of the retrospective pretest in measuring the attitudes towards Blacks (they use the term Negro) of soldiers assigned to racially mixed vs. all-white combat infantry units (1947), and in measuring housing project occupants’ attitudes toward integrated vs. segregated housing units during a housing shortage (1951).  Both tests showed no difference between the two groups in remembering prior attitudes toward the idea of interest.  Campbell and Stanley argue that with only posttest measures, any difference found may have been attributable to selection bias.  They caution readers to “…be careful to note that the probable direction of memory bias is to distort the past…into agreement with (the) present…or has come to believe to be socially desirable…”

This brings up several biases that the Extension professional needs to be concerned with in planning and conducting an evaluation: selection bias, desired response bias, and response shift bias.  All of which can have serious implications for the evaluation.

Those are technical words for several limitations which can affect any evaluation.  Selection bias is the preference to put some participants into one group rather than the other; Campbell and Stanley call this bias a threat to validity.  Desired response bias occurs when participants try to answer the way they think the evaluator wants them to answer.  Response shift bias happens when participants’ frame of reference or understanding changes during the program, often due to misunderstanding or preconceived ideas.

So these are the potential problems.  Are there any advantages or strengths to using the retrospective pretest?  There are at least two.  First, there is only one administration, at the end of the program.  This is advantageous when the program is short and when participants do not like to fill out forms (that is, it minimizes paper burden).  Second, it avoids response-shift bias by not introducing information that may not be understood prior to the program.

Theodore Lamb (2005) tested the two methods and concluded that the two approaches appeared similar and recommended the retrospective pretest if conducting a pretest/posttest is  difficult or impossible.  He cautions, however, that supplementing the data from the retrospective pretest with other data is necessary to demonstrate the effectiveness of the program.

There is a vast array of information about this evaluation method.  If you would like to know more, let me know.

I was reading another evaluation blog (the American Evaluation Association’s blog AEA365) which talked about database design.  I was reminded that over the years, almost every Extension professional with whom I have worked has asked me the following question: “What do I do with my data now that I have all my surveys back?”

As Leigh Wang points out in her AEA365 comments, “Most training programs and publication venues focus on the research design, data collection, and data analysis phases, but largely leave the database design phase out of the research cycle.”  The questions that this statement raises are:

  1. How do/did you learn what to do with data once you have it?
  2. How do/did you decide to organize it?
  3. What software do/did you use?
  4. How important is it to make the data accessible to colleagues in the same field?

I want to know the answers to those questions.  I have some ideas.  Before I talk about what I do, I want to know what you do.  Email me, or comment on this blog.

The question about interpretation of data arose today.  I was with colleagues and the discussion focused on measures of central tendency and dispersion or variability.  These are important terms and concepts that form the foundation for any data analysis.  It is important to know how they are used.


There are five measures of central tendency–numbers that reflect the tendency of data to cluster around the center of the group.  Two of these (the geometric mean and the harmonic mean) won’t be discussed here, as they are not typically used in the work Extension does.  The three I’m talking about are:

  • Mean  Symbolized by M or x̄ (read “x-bar”)
  • Median  Symbolized by Md (read M subscript d)
  • Mode  Symbolized by Mo (read M subscript o)

The mean is the sum of all the numbers divided by the total number of numbers.

The median is the middle number of a sorted list (some folks use the 50% point).

The mode is the most popular number, the number “voted” most frequently.

Sometimes all of these measures fall on the same number; sometimes they fall on different numbers.
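To make the three measures concrete, here is a small numeric sketch using Python’s standard library (the data values are invented for illustration):

```python
import statistics

scores = [2, 3, 3, 4, 5, 5, 5, 7]  # invented sample data

mean = statistics.mean(scores)      # sum of the numbers / count: 34 / 8 = 4.25
median = statistics.median(scores)  # middle of the sorted list: (4 + 5) / 2 = 4.5
mode = statistics.mode(scores)      # most frequent ("most popular") value: 5

print(mean, median, mode)  # 4.25 4.5 5
```

Notice the three measures land on different numbers here; in a perfectly symmetrical distribution they would coincide.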


There are four measures of variability, three of which I want to mention today.  The fourth, known as the mean (average) deviation, is seldom used in Extension work.  The three are:

  • Range Symbolized by R
  • Variance Symbolized by V
  • Standard deviation Symbolized by s or SD (for sample) and σ, the lower case Greek letter sigma (for standard deviation of a population).

The range is the difference between the largest and the smallest number in the sample.

In this example, the blue distribution (Distribution A) has a larger range than the red distribution (Distribution B).

Variance is more technical.  It is the sum of the squared deviations from the mean divided by the number of observations minus 1.  Dividing by n − 1 removes bias from the calculation, giving a more conservative estimate and reducing possible error.

There is a mathematical formula for computing the variance.  Fortunately, a computer software program like SPSS or SAS will do it for you.

The standard deviation is the square root of the variance.  It gives us an indication of “…how much each score in a set of scores, on average, varies from the mean” (Salkind, 2004, p. 41).  Again, there is a mathematical formula that is computed by a software package.  Most people are familiar with the mean and standard deviation of IQ scores: a mean of 100 and a standard deviation of about 15.
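A similar sketch for the three measures of variability, again with invented data; note that Python’s `statistics.variance` uses the n − 1 divisor that removes bias from the estimate:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # invented sample; its mean is 5

r = max(scores) - min(scores)      # range: largest minus smallest = 9 - 2 = 7
var = statistics.variance(scores)  # sample variance: sum of squared
                                   #   deviations from the mean / (n - 1)
sd = statistics.stdev(scores)      # standard deviation: square root of variance

print(r, var, sd)
```

SPSS and SAS report the same quantities; the point of the sketch is only to show what those packages are computing.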

Convention has it that lower-case Greek letters are used for parameters of populations and Roman letters for the corresponding estimates from samples.  So you would see σ (lower-case sigma) for the standard deviation and μ (lower-case mu) for the mean of a population, and s (or sd) for the standard deviation and x̄ for the mean of a sample.

These statistics relate to the measurement scale you have chosen to use.  Permissible statistics for a nominal scale are frequency and mode; for an ordinal scale, median and percentiles; for an interval scale, mean, variance, standard deviation, and Pearson correlation; and for a ratio scale, the geometric mean.  So think seriously before reporting a mean for your Likert-type scale.  What exactly does that tell you?
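As one way to act on that caution, here is a minimal sketch of reporting what an ordinal Likert-type item permits (median and frequencies) rather than a mean; the responses are invented for illustration:

```python
from collections import Counter
import statistics

# Invented responses to a single five-point Likert-type item, coded 1-5
responses = [1, 2, 2, 3, 4, 4, 4, 5, 5]

# Permissible for ordinal data: the median and a frequency count (the mode)
print("median:", statistics.median(responses))         # median: 4
print("frequencies:", sorted(Counter(responses).items()))
```

The frequency table often tells the real story (how many agreed vs. disagreed) far better than a single averaged number would.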

How many time have you been interviewed?

How many times have you conducted an interview?

Did you notice any similarities?  Probably.

My friend and colleague Ellen Taylor-Powell has defined interviews as a method for collecting information by talking with and listening to people–a conversation, if you will.  These conversations traditionally happen over the phone or face to face–with social media, they could also happen via chat, IM, or some other technology-based approach.  A resource I have found useful is the Evaluation Cookbook.

Interviews can be structured (not unlike a survey with discrete responses) or unstructured (not unlike a conversation).  You might also hear of interviews consisting of closed-ended questions and open-ended questions.

Perhaps the most common place for interviews is in the hiring process (seen in personnel evaluation).

Another place for the use of interviews is in the performance review process (seen in performance evaluation).

Unless the evaluator is conducting personnel or performance evaluations, the most common place for interviews to occur is when survey methodology is employed.

Dillman (I’ve mentioned him in previous posts) has sections in his second (pg. 140 – 148) and third (pg. 311 – 314) editions that talk about the use of interviews in survey construction.  He makes a point in his third edition that I think is important for evaluators to remember: the issue of social desirability bias (pg. 313).  Social desirability bias is the possibility that the respondent will answer with what s/he thinks the person asking the questions wants/hopes to hear.  Dillman goes on to say, “Because of the interaction with another person, interview surveys are more likely to produce socially desirable answers for sensitive questions, particularly for questions about potentially embarrassing behavior…”

Expect social desirability response bias with interviewing (and expect differences in social desirability when part of the interview is self-report and part is face-to-face).  Social desirability responses could (and probably will) occur even when questions do not appear particularly sensitive to the interviewer; the respondent may have a different cultural perspective which increases sensitivity.  That same cultural difference could also manifest as increased agreement with interview questions, often called acquiescence.

Interviews take time, cost more, and often yield a lot of data which may be difficult to analyze.  Sometimes, as with a pilot program, interviews are worth it.  Interviews can be used for formative and summative evaluations.  Consider whether interviews are the best source of evaluation data for the program in question.

What do you really want to know? What would be interesting to know? What can you forget about?

When you sit down to write survey questions, keep these questions in mind.

  • What do you really want to know?

You are doing an evaluation of the impact of your program. This particular project is peripherally related to two other projects you do. You think, “I could capture all projects with just a few more questions.” You are tempted to lump them all together. DON’T.

Keep your survey focused. Keep your questions specific. Keep it simple.

  • What would be interesting to know?

There are many times when I’ve heard investigators and evaluators say something like, “That would be really interesting, to see if abc or qrs happens.”  Do you really need to know this?  Probably not.  Interesting is not a compelling reason to include a question. So–DON’T ASK.

I always ask the principal investigator, “Is this information necessary or just nice to know?  Do you want to report that finding?  Will the answer to that question REALLY add to the measure of impact you are so arduously trying to capture?”  If the answer is probably not, DON’T ASK.

Keep your survey focused. Keep your questions specific. Keep it simple.

  • What can you forget about?

Do you really want to know the marital status of your participants?  Or whether possible participants are volunteers in some other program, school teachers, and students all at the same time?  My guess is that this will not affect the overall outcome of the project, nor its impact.  If not, FORGET IT!

Keep your survey focused. Keep your questions specific. Keep it simple.

The question was raised about writing survey questions.


My short answer is Don Dillman’s book, Internet, Mail, and Mixed-Mode Surveys: The Tailored Design Method is your best source. It is available from the publisher John Wiley. Or from Amazon. Chapter 4 in the book, “The basics of crafting good questions”, helps to focus your thinking. Dillman (and his co-authors Jolene D. Smyth and Leah Melani Christian) make it clear that how the questions are constructed raises several methodological issues, and not attending to those issues can affect how the question performs.

One consideration (among several) that Dillman et al. suggest be kept in mind every time is:

  • The order of the questions (Chapter 6 has a section on ordering questions).

I am only touching briefly on the order of questions, using Dillman et al.’s guidelines. There are 22 guidelines in Chapter 6, “From Questions to a Questionnaire”; of those 22, five refer to ordering the questions. They are:

  1. “Group related questions that cover similar topics, and begin with questions likely to be salient to nearly all respondents” (pg. 157).  Doing this closely approximates a conversation, the goal in questionnaire development.
  2. “Choose the first question carefully” (pg. 158). The first question is the one which will “hook” respondents into answering the survey.
  3. “Place sensitive or potentially objectionable questions near the end of the questionnaire” (pg. 159). This placement increases the likelihood that respondents will be engaged in the questionnaire and will, therefore, answer sensitive questions.
  4. “Ask questions about events in the order the events occurred” (pg. 159). Ordering the questions from most distant to most recent occurrence, or from least important to most important activity, presents a logical flow to the respondent.
  5. “Avoid unintended question order effects” (pg. 160). Keep in mind that questions do not stand alone, that respondents may use previous questions as a foundation for the following questions. This can create an answer bias.

When constructing surveys, remember to always have other people read your questions–especially people similar to and different from your target audience.

More on survey question development later.

Hi–Today is Wednesday, not Tuesday…I’m still learning about this technology. Thanks for staying with me.

Today I want to talk about a checklist called The Fantastic Five. (Thank you, Amy Germuth, for bringing this to my attention.)  This checklist presents five questions against which to judge any and all survey questions you have written.

The five questions are:

1. Can the question be consistently understood?

2. Does the question communicate what constitutes a good answer?

3. Do all respondents have access to the information needed to answer the question?

4. Is the question one which all respondents will be willing to answer?

5. Can the question be consistently communicated to respondents?

“Sure,” you say, “…all my survey questions do that.”  Do they really?

Let me explain.

1. When you ask about an experience, think about the focus of the question. Is it specific for what you want? I used the question, “When were you first elected to office?” and got all manner of answers. I wanted the year elected.  I got months (7 months ago), years (both “2 years ago” and “in 2006”), and words (at the last election). A more specific question would have been “In what year were you first elected to office?”

2. When I asked the election question above, I did not communicate what constitutes a good answer. I wanted a specific year so that I could calculate how long the respondent had been an elected official. Fortunately, I found this out in a pilot test, so the final question gave me answers I wanted.

3. If you are looking for information that is located on supplemental documents (for example, tax forms), let the respondent know that these documents will be needed to answer your survey. Respondents will guess without having the supporting documentation ready, reducing the reliability of your data.

4. Often, we must ask questions that are of a sensitive nature, which could be seen as a violation of privacy. Using Amy’s example, “Have you ever been tested for HIV?” involves a sensitive subject.  Many people will not want to answer that question because of the implications.  Asking instead, “Have you donated blood in the last 15 years?” gets you the same information without violating privacy, since the Red Cross began testing donated blood for HIV in 1985.

5. This point is especially important with mixed-mode surveys (paper and interview, for example). Again, being as specific as possible is the key. When asking a closed-ended question, make sure that the options included cover what you want to know. Also, make sure that the method of administration doesn’t affect the answer you want.

This post was based on a presentation by Amy Germuth, President of EvalWorks, LLC at a Coffee Break webinar sponsored by the American Evaluation Association. The full presentation can be found at:

The best source I’ve found for survey development is Don Dillman’s 3rd edition of “Internet, mail, and mixed-mode surveys: The tailored design method,” published in 2009 and available from the publisher (Wiley).