A faculty member asked me how one determines impact from qualitative data. And in my mailbox today was a publication from Sage Publishers inviting me to “explore these new and best selling qualitative methods titles from Sage.”

Many Extension professionals are leery of gathering data using qualitative methods. “There is just too much data to make sense of it,” is one complaint I often hear. Yes, one characteristic of qualitative data is the rich detail that usually results. (Of course, if you are only asking closed-ended questions resulting in Yes/No answers, the richness is missing.) Other complaints include “What do I do with the data?” “How do I draw conclusions?” “How do I report the findings?” As a result, many Extension professionals default to what is familiar–a survey. Surveys, as we have discussed previously, are easy to code, easy to report (frequencies and percentages), and difficult to write well.

The Sage brochure provides resources to answer some of these questions.

Michael Patton’s 3rd edition of Qualitative Research and Evaluation Methods “…contains hundreds of examples and stories illuminating all aspects of qualitative inquiry…it offers strategies for enhancing quality and credibility of qualitative findings…and providing detailed analytical guidelines.” Michael is the keynote speaker for the Oregon Program Evaluator Network (OPEN) fall conference, where he will be talking about his new book, Developmental Evaluation. If you are in Portland, I encourage you to attend. For more information, see:

http://www.oregoneval.org/program/

Another reference I just purchased is Bernard and Ryan’s volume, Analyzing Qualitative Data. This book presents a systematic approach to making sense of words. It, too, is available from Sage.

What does all this have to do with analyzing a conversation? A conversation is qualitative data. It is made up of words. Knowing what to do with those words will provide powerful evaluation data. My director is forever saying the story is what legislators want to hear. Stories are qualitative data.

One of the most common forms of conversation that Extension professionals use is the focus group. It is a guided, structured, and focused conversation. It can yield a wealth of information if the questions are well crafted, if those questions have been pilot tested, and if the data are analyzed in a meaningful way. There are numerous ways to analyze qualitative data (cultural domain analysis, KWIC analysis, discourse analysis, narrative analysis, grounded theory, content analysis, schema analysis, analytic induction and qualitative comparative analysis, and ethnographic decision models), all of which are discussed in the above-mentioned reference. Deciding which will work best with the gathered qualitative data is a decision only the principal investigator can make. Comfort and experience will enter into that decision. Keep in mind qualitative data can be reduced to numbers; numbers cannot be exploded to capture the words from which they came.
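To make one of those approaches concrete, here is a minimal sketch of a KWIC (key word in context) analysis in Python. It is only an illustration, not a full analysis tool: the transcript text and the keyword "workshop" are hypothetical placeholders. The function simply pulls out every occurrence of a word along with a few words of surrounding context, so the analyst can read how participants actually used it.

```python
# Minimal KWIC (key word in context) sketch; transcript and keyword are hypothetical.

def kwic(text, keyword, window=4):
    """Return each occurrence of keyword with `window` words of context on each side."""
    words = text.split()
    hits = []
    for i, word in enumerate(words):
        if word.lower().strip('.,!?;:"') == keyword.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            hits.append(f"...{left} [{word}] {right}...")
    return hits

transcript = (
    "The workshop changed how I manage my woodlot. "
    "Before the workshop I never thinned trees. "
    "Now I plan each season around what I learned at the workshop."
)

for line in kwic(transcript, "workshop"):
    print(line)
```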

One response I got for last week’s query was about on-line survey services.  Are they reliable?  Are they economical?  What are the design limitations?  What are the question format limitations?

Yes.  Depends.  Some.  Not many.

Let me take the easy question first:  Are they economical?

Depends. Cost of postage for a paper survey (both out and back) vs. the time it takes to enter questions into the system. Cost of the system vs. length of the survey. These are things to consider.

Because most people have access to email today, using an on-line survey service is often the easiest and most economical way to distribute an evaluation survey. Most institutional review boards view an on-line survey like a mail survey and typically grant a waiver of documentation of informed consent. The consenting document is the entry screen, and often an agree-to-participate question is included on that screen.

Are they valid and reliable?

Yes, but…the old adage “Garbage in, garbage out” applies here. Like a paper survey, an internet survey is only as good as the survey questions. Don Dillman, in the third edition of “Internet, Mail, and Mixed-Mode Surveys” (co-authored with Jolene D. Smyth and Leah Melani Christian), talks about question development. Since he wrote the book (literally), I use this resource a lot!

What are the design limitations?

Some limitations apply…Each online survey service is different. The most common service is Survey Monkey (www.surveymonkey.com). The introduction to Survey Monkey says, “Create and publish online surveys in minutes, and view results graphically and in real time.” The basic account with Survey Monkey is free. It has limitations (number of questions [10], limited number of question formats [15], and number of responses [100]). You can upgrade to the Pro or Unlimited plan for a subscription fee ($19.95/month or $200/year, respectively). There are others. A search using “survey services” returns many options, such as Zoomerang or InstantSurvey.

What are the question format limitations?

Not many–both open-ended and closed-ended questions can be asked. Survey Monkey has 15 different formats from which to choose (see below). There may be others; this list covers most formats.

  • Multiple Choice (Only one Answer)
  • Multiple Choice (Multiple Answers)
  • Matrix of Choices (Only one Answer per Row)
  • Matrix of Choices (Multiple Answers per Row)
  • Matrix of Drop-down Menus
  • Rating Scale
  • Single Textbox
  • Multiple Textboxes
  • Comment/Essay Box
  • Numerical Textboxes
  • Demographic Information (US)
  • Demographic Information (International)
  • Date and/or Time
  • Image
  • Descriptive Text

Oregon State University has an in-house service sponsored by the College of Business (BSG–Business Survey Groups). OSU also has an institutional account with Student Voice, an on-line service designed initially for learning assessment, which I have found useful for evaluations. Check your institution for the options available. For your next evaluation that involves a survey, think electronically.

A good friend of mine asked me today if I knew of any attributes (which I interpreted to be criteria) of qualitative data (NOT qualitative research).  My friend likened the quest for attributes for qualitative data to the psychometric properties of a measurement instrument–validity and reliability–that could be applied to the data derived from those instruments.

Good question. How does this relate to program evaluation, you may ask? That question takes us to an understanding of paradigm.

A paradigm (according to Scriven in Evaluation Thesaurus) is a general concept or model for a discipline that may be influential in shaping the development of that discipline. Paradigms do not (again according to Scriven) define truth; rather, they define prima facie truth (i.e., truth on first appearance), which is not the same as truth. Scriven goes on to say, “…eventually, paradigms are rejected as too far from reality and they are always governed by that possibility [i.e., that they will be rejected]” (page 253).

So why is it important to understand paradigms? They frame the inquiry. And evaluators are asking questions; that is, they are inquiring.

How inquiry is framed is based on the components of paradigm:

  • ontology–what is the nature of reality?
  • epistemology–what is the relationship between the known and the knower?
  • methodology–what is done to gain knowledge of reality, i.e., the world?

These beliefs shape how the evaluator sees the world and then guide the evaluator in the use of data, whether those data are derived from records, observations, or interviews (i.e., qualitative data) or from measurements, scales, or instruments (i.e., quantitative data). Each paradigm guides the questions asked and the interpretations brought to the answers to those questions. This is their importance to evaluation.

Denzin and Lincoln (2005), in the 3rd edition of the Handbook of Qualitative Research, list what they call interpretive paradigms. They are described in Chapters 8 – 14 of that volume. The paradigms are:

  1. Positivist/post positivist
  2. Constructivist
  3. Feminist
  4. Ethnic
  5. Marxist
  6. Cultural studies
  7. Queer theory

They indicate that each of these paradigms has criteria, a form of theory, and a specific type of narration or report. If paradigms have criteria, then it makes sense to me that the data derived from the inquiry framed by those paradigms would have criteria. Certainly, the psychometric properties of validity and reliability (stemming from the positivist paradigm) relate to data, usually quantitative. It would make sense to me that the parallel, though different, concepts in the constructivist paradigm, trustworthiness and credibility, would apply to data derived from that paradigm–often qualitative.

If that is the case, then evaluators need to be at least knowledgeable about paradigms.

Having addressed the question about which measurement scale was used (“Statistics, not the dragon you think”), I want to talk about how many groups are being included in the evaluation and how those groups are determined.

The first part of that question is easy–there will be either one, two, or more than two groups. Most of what Extension does results in one group, often an intact group. An intact group is called a population and consists of all the participants in the program. The number of program participants can be very large or very small.

The Tree School program is an example that has resulted in a very large number of participants (hundreds). It is a program that has been in existence for about 20 years. Contacting all of these participants would be inefficient. On the other hand, the 4-H science teacher training program involved a small number of participants (about 75) and has been in existence for 5 years. Contacting all participants would be efficient.

With a large population, choosing a part of the bigger group is the best approach.  The part chosen is called a sample and is only a part of a population.  Identifying a part of the population starts with the contact list of participants.  The contact list is called the sampling frame.  It is the basis for determining the sample.

Identifying who will be included in the evaluation is called a sampling plan or a sampling approach. There are two types of sampling approaches–probability sampling and nonprobability sampling. Probability sampling methods are those which assure that the sample represents the population from which it is drawn. Nonprobability sampling methods are those which are based on characteristics of the population. Including all participants works well for a population with fewer than 100 participants. If there are over 100 participants, choosing a subset of the sampling frame will be more efficient and effective. There are several ways to select a sample and reduce the population to a manageable number of participants. Probability sampling approaches include:

  • simple random sampling
  • stratified random sampling
  • systematic sampling
  • cluster sampling

Nonprobability sampling approaches include:

  • convenience sampling
  • snowball sampling
  • quota sampling
  • focus groups

More on these sampling approaches later.
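Before then, here is a minimal sketch (in Python) of what the first two probability approaches look like in practice. The sampling frame of 100 participants and the county strata are hypothetical; the point is only that a simple random sample draws from the whole frame, while a stratified random sample draws separately within each stratum so every stratum is represented.

```python
import random

# Hypothetical sampling frame: (participant ID, county) pairs.
frame = [(f"P{i:03d}", "Benton" if i % 3 else "Lane") for i in range(1, 101)]

# Simple random sample: every member of the frame has an equal chance of selection.
simple_sample = random.sample(frame, k=20)

# Stratified random sample: group the frame by stratum (here, county),
# then sample within each stratum in proportion to its size.
strata = {}
for person, county in frame:
    strata.setdefault(county, []).append((person, county))

stratified_sample = []
for county, members in strata.items():
    k = round(20 * len(members) / len(frame))   # proportional allocation
    stratified_sample.extend(random.sample(members, k))

print(len(simple_sample), "in the simple random sample")
print(len(stratified_sample), "in the stratified random sample")
```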

I had a conversation today about how to measure whether I am making a difference in what I do. Although the conversation was referring to working with differences, I am conscious that the work I do and the work of working with differences transcend most disciplines and positions. How does it relate to evaluation?

Perspective and voice.

These are two sides of the same coin. Individuals come to evaluation with a history or perspective. Individuals voice their view in the development of evaluation plans. If individuals are not invited and/or do not come to the table for the discussion, a voice is missing.

This conversation went on–the message was that voice and perspective are more important in evaluations which employ a qualitative approach rather than a quantitative approach. Yes—and no.

Certainly, words have perspective and provide a vehicle for voice. And words are the basis for qualitative methods. So this is the “Yes.” Is this still an issue when the target audience is homogeneous? Is it still an issue when the evaluator is “different” on some criterion from the target audience? Or, as one mental health worker once stated, only an addict can provide effective therapy to another addict. Is that really the case? Or do voice and perspective always overlay an evaluation?

Let’s look at quantitative methods. Some would argue that numbers aren’t affected by perspective and voice. I will argue that the basis for these numbers is words. If words are turned into numbers, are voice and perspective still an issue? This is the “Yes and no.”
I am reminded of the story of a brook and a Native American child. The standardized test asked which of the following is similar to a brook. The possible responses were (for the sake of this conversation) river, meadow, lake, inlet. The Native American child, growing up in the desert Southwest, had never heard the word “brook” and consequently got the item wrong. This was one of many questions where perspective affected the response. Wrong answers were totaled, that total was subtracted from the possible total, and a score (a number) resulted. That individual number was grouped with other individual numbers and compared to numbers from another group using a statistical test, (for the sake of conversation) a t-test. Is the resulting statistic of significance valid? I would say not. So this is the “No.” Here the voice and perspective have been obfuscated.

The statistical significance between those groups is clear according to the computation; clear, that is, until one looks at the words behind the numbers. It is in the words behind the numbers that perspective and voice affect the outcomes.
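For readers curious about the computation itself, here is a minimal sketch in Python using scipy.stats; the two lists of scores are hypothetical. It illustrates the point above: the arithmetic will happily return a t statistic and a p-value whether or not the numbers fairly represent the words, and the perspectives, behind them.

```python
from scipy import stats

# Hypothetical standardized-test scores for two groups of children.
group_a = [78, 85, 90, 72, 88, 95, 81, 79]
group_b = [65, 70, 74, 68, 72, 77, 63, 71]

# Independent-samples t-test: are the group means statistically different?
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# The test says nothing about whether the items behind these scores
# meant the same thing to every child who answered them.
```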

Statistics is not the dragon you think it is.

For many people, the field of statistics is a dragon in disguise, and, like dragons, it is something most people shy away from.

I have found Neil Salkind’s book “Statistics for People Who (Think They) Hate Statistics” to be a good reference for understanding the basics of statistics. The 4th edition is due out in September 2010. This book isn’t intimidating; it is easy to understand; it isn’t heavy on the math or formulas; and it has a lot of tips. I’m using it for this column. I keep it on my desk along with Dillman.

Faculty who come to me with questions about analyzing their data typically want to know how to determine statistical significance.  But before I can talk to faculty about statistical significance, there are a few questions that need to be answered.

  • What type of measurement scale have you used?
  • How many groups do you have on which you have data?
  • How many variables do you have for those groups?
  • Are you examining relationships or differences?
  • What question(s) do you want to answer?

Most people immediately jump to what test to use. Don’t go there. Start with the measurement scale you have. Then answer the other questions.

So let’s talk about scales of measurement. Not all data are created equal. Some data are easier to analyze than other data. Scale of measurement makes that difference.

There are four scales of measurement and most data fall into one of these four. They are either categorical (even if they have been converted to numbers) or numerical (originally numbers).  They are:

  • nominal
  • ordinal
  • interval
  • ratio

Scales of measurement are rules determining the particular levels at which outcomes are measured. When you decide on an answer to a question, you are deciding on the scale of measurement and agreeing to the particular set of characteristics of that measurement.

Nominal scales name something. For example, gender is either male, female, or unknown/not stated; ethnicity is one of several names of groups. When you gather demographic data, such as gender, ethnicity, or race, you are employing a nominal scale. The data that result from nominal scales are categorical data–that is, data resulting from categories that are mutually exclusive of each other. The respondent is either male or female, not both.

An ordinal scale orders something; it puts the thing being measured in order–high to low, low to high. Salkind gives the example of ranking candidates for a job. Extension professionals (and many/most survey professionals) use ordinal scales in surveys (strongly agree to strongly disagree; don’t like to like a lot). We do not know how much difference there is between “don’t like” and “like a lot.” The data that result from ordinal scales are categorical data.

An interval scale is based on a continuum of equally spaced intervals. Think of a thermometer or a test score. We know that the intervals along the scale are equal to one another. The data that result from interval scales are numerical data.

A ratio scale is a scale with an absolute zero, a situation where the characteristic of interest is absent–like zero light or no molecular movement. This rarely happens in social or behavioral science, the work that most Extension professionals do. The data that result from ratio scales are numerical data.

Why do we care?

  • Scales are ordered from the least precise (nominal)  to the most precise (ratio).
  • The scale used determines the detail provided by the data collected; more precision, more information.
  • A more precise scale contains all the qualities of the less precise scales (interval has the qualities of ordinal and nominal).

Using an inappropriate scale will invalidate your data and provide you with spurious outcomes which yield spurious impacts.
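One practical consequence of the scale is which summary statistics make sense for the data it produces. Here is a minimal sketch in Python; the survey responses are hypothetical. The mode is about all you can report for nominal data, the median (and mode) suit ordinal data, and the mean requires at least an interval scale.

```python
from statistics import mean, median, mode

# Hypothetical responses from a short workshop survey.
gender = ["female", "male", "female", "female", "unknown"]            # nominal
agreement = [5, 4, 4, 3, 5]  # 1 = strongly disagree ... 5 = strongly agree (ordinal)
pretest = [62.0, 71.5, 80.0, 68.5, 74.0]                              # interval/ratio

print("Most common gender (mode):", mode(gender))     # nominal  -> mode
print("Median agreement rating:", median(agreement))  # ordinal  -> median, mode
print("Mean pretest score:", mean(pretest))           # interval -> mean as well
```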

A colleague of mine, trying to explain observation to a student, said, “Count the number of legs you see on the playground and divide by two. You have observed the number of students on the playground.” That is certainly one way to look at the topic.

I’d like to be a bit more precise than that, though. Observation is collecting information through the use of the senses–seeing, hearing, tasting, smelling, feeling. To gather observations, the evaluator must have a clearly specified protocol–a step-by-step approach to what data are to be collected and how. The evaluator typically gets the first exposure to collecting information by observation at a very young age–learning to talk (hearing), learning to feed oneself (feeling); I’m sure you can think of other examples. When the evaluator starts school and studies science, and the teacher asks the student to “OBSERVE” a phenomenon and record what is seen, the evaluator is exposed to another approach to the method of observation.

As the process becomes more sophisticated, all manner of instruments may assist the evaluator–thermometers, chronometers, GIS, etc. And for that process to be replicated (for validity), the steps become more and more precise.

Does that mean that looking at the playground, counting the legs, and dividing by two has no place? Those who decry data manipulation would agree that this form of observation yields information of questionable usefulness. Those who approach observation as an unstructured activity would disagree and say that exploratory observation could result in an emerging premise.

You will see observation as the basis for ethnographic inquiry.  David Fetterman has a small volume (Ethnography: Step by step) published by Sage that explains how ethnography is used in field work.  Take simple ethnography a step up and one can read about meta-ethnography by George W. Noblit and R. Dwight Hare. I think my anthropology friends would say that observation is a tool used extensively by anthropologists. It is a tool that can be used by evaluators as well.


How many times have you been interviewed?

How many times have you conducted an interview?

Did you notice any similarities?  Probably.

My friend and colleague, Ellen Taylor-Powell, has defined interviews as a method for collecting information by talking with and listening to people–a conversation, if you will. These conversations traditionally happen over the phone or face to face; with social media, they could also happen via chat, IM, or some other technology-based approach. A resource I have found useful is the Evaluation Cookbook.

Interviews can be structured (not unlike a survey with discrete responses) or unstructured (not unlike a conversation). You might also hear interviews described as consisting of closed-ended questions and open-ended questions.

Perhaps the most common place for interviews is in the hiring process (seen in personnel evaluation).

Another place for the use of interviews is in the performance review process (seen in performance evaluation).

Unless the evaluator is conducting personnel/performance evaluations, the most common place for interviews to occur is when survey methodology is employed.

Dillman (I’ve mentioned him in previous posts) has sections in his second (pg. 140 – 148) and third (pg. 311-314) editions that talk about the use of interviews in survey construction.  He makes a point in his third edition that I think is important for evaluators to remember and that is the issue of social desirability bias (pg. 313).  Social desirability bias is the possibility that the respondent would answer with what s/he thinks the person asking the questions would want/hope to hear.  Dillman  goes on to say, “Because of the interaction with another person, interview surveys are more likely to produce socially desirable answers for sensitive questions, particularly for questions about potentially embarrassing behavior…”

Expect social desirability response bias with interviewing (and expect differences in social desirability when part of the interview is self-report and part is face-to-face).  Social desirability responses could (and probably will) occur when questions do not appear particularly sensitive to the interviewer; the respondent may have a different cultural perspective which increases sensitivity.  That same cultural difference could also manifest in increased agreement with interview questions often called acquiescence.

Interviews take time, cost more, and often yield a lot of data that may be difficult to analyze. Sometimes, as with a pilot program, interviews are worth it. Interviews can be used for formative and summative evaluations. Consider whether interviews are the best source of evaluation data for the program in question.

I have six references on case study in my library. Robert K. Yin wrote two seminal books on case studies, one in 1993 (now in a 2nd edition; 1993 was the 1st edition) and the other in 1989 (now in the 4th edition; 1989 was the 1st edition). I have the 1994 edition (the 2nd edition of the 1989 book), and in it Yin says that “case studies are increasingly commonplace in evaluation research…are the preferred strategy when ‘how’ and ‘why’ questions are being posed, when the investigator has little control over events, and when the focus is on a contemporary phenomenon within some real-life context.”

So what exactly is a case study?

A case study is typically an in-depth study of one or more individuals, institutions, communities, programs, or populations. Whatever the “case,” it is clearly bounded, and what is studied is what is happening and important within those boundaries. Case studies use multiple sources of information to build the case. For a more detailed review, see Wikipedia.

There are three types of case studies:

  • Explanatory
  • Exploratory
  • Descriptive

Over the years, case method has become more sophisticated.

Brinkerhoff has developed a method, the Success Case Method, as an evaluation approach that is “easier, faster, and cheaper than competing approaches, and produces compelling evidence decision-makers can actually use.” As an evaluation approach, this method is quick and inexpensive and, most of all, produces useful results.

Robert E. Stake has taken case study beyond one to many with his recent book, Multiple Case Study Analysis.  It looks at cross-case analysis and can be used when broadly occurring phenomena need to be explored, such as leadership or management.

I’ve mentioned four of the six books; if you want to know the others, let me know.

Extension has consistently used the survey as a method for collecting information.

A survey collects information through structured questionnaires, resulting in quantitative data. Don Dillman wrote the book, Internet, Mail and Mixed-Mode Surveys: The Tailored Design Method. Although mail and individual interviews were once the norm, internet survey software has changed that.

Other methods are often more expedient, less costly, and less resource intensive than a survey. When you need to collect information, consider some of these other ways:

  • Case study
  • Interviews
  • Observation
  • Group Assessment
  • Expert or peer review
  • Portfolio reviews
  • Testimonials
  • Tests
  • Photographs, slides, videos
  • Diaries, journals
  • Logs
  • Document analysis
  • Simulations
  • Stories
  • Unobtrusive measures

I’ll talk about these in later posts and provide resources for each.

When deciding what information collection method (or methods) to use, remember there are three primary sources of evaluation information. Those sources often dictate the methods of information collection. The three sources are:

  1. Existing information
  2. People
  3. Pictorial records and observation

When using existing information, developing a systematic approach to LOOKING at the information source is what is important.

When gathering information from people, ASKING them is the approach to use–and how that asking is structured.

When using pictorial records and observations, determine what you are looking for before you collect information.