I’m about to start a large-scale project, one that will be primarily qualitative (it may end up being a mixed methods study; time will tell); I’m in the planning stages with the PI now. I’ve done qualitative studies before; how could I not, with all the time I’ve been an evaluator? My go-to book for qualitative data analysis has always been Miles and Huberman (although my volume is black). This is their second edition, published in 1994. I loved that book for a variety of reasons: 1) it provided me with a road map to process qualitative data; 2) it offered the reader an appendix for choosing a qualitative software program (now out of date); and 3) it was systematic and detailed in its description of display. I was very saddened to learn that both the authors had died and there would not be a third edition. Imagine my delight when I got the Sage flier for a third edition! Of course I ordered it. I also discovered that Saldana (the new third author on the third edition) has written another book on qualitative data that he cites a lot in this third edition (The Coding Manual for Qualitative Researchers), and I ordered that volume as well.

Saldana, in the third edition, talks a lot about data display, one of the three factors that qualitative researchers must keep in mind. The other two are data condensation and conclusion drawing/verification. In their review, Sage Publications says, “The Third Edition’s presentation of the fundamentals of research design and data management is followed by five distinct methods of analysis: exploring, describing, ordering, explaining, and predicting.” These five chapters are the heart of the book (in my thinking); that is not to say that the rest of the book doesn’t have gems as well. It does. The chapter on “Writing About Qualitative Research” and the appendix are two. The appendix (this time) is “An Annotated Bibliography of Qualitative Research Resources”, which lists at least 32 different classifications of references that would be helpful to all manner of qualitative researchers. Because it is annotated, the bibliography provides a one-sentence summary of the substance of each book. A find, to be sure. Check out the third edition.

I will be attending a professional development session with Mr. Saldana next week.  It will be a treat to meet him and hear what he has to say about qualitative data.  I’m taking the two books with me…I’ll write more on this topic when I return.  (I won’t be posting next week).

 

 

 

You implement a program. You think it is effective; that it makes a difference; that it has merit and worth. You develop a survey to determine the merit and worth of the program. You send the survey out to the target audience, which is an intact population; that is, all of the participants are in the target audience for the survey. You get a response rate of less than 40%. What does that mean? Can you use the results to say that the participants saw merit in the program? Do the results indicate that the program has value, that it made a difference, if only 40% let you know what they thought?
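One way to see what a 40% response rate can and cannot tell you is to work out the worst-case bounds. The sketch below is only an illustration: the counts (200 participants, 80 respondents, 72 of them favorable) are made up, and the function name `merit_bounds` is hypothetical. It assumes nothing about the non-responders, so it lets them range from all unfavorable to all favorable.

```python
def merit_bounds(population, respondents, favorable):
    """Return (response_rate, low, high) for the share of the whole
    population that saw merit, making no assumption about non-responders."""
    if population <= 0 or not 0 <= favorable <= respondents <= population:
        raise ValueError("need 0 <= favorable <= respondents <= population, population > 0")
    non_responders = population - respondents
    response_rate = respondents / population
    low = favorable / population                      # every non-responder unfavorable
    high = (favorable + non_responders) / population  # every non-responder favorable
    return response_rate, low, high


# Hypothetical numbers, for illustration only.
rate, low, high = merit_bounds(population=200, respondents=80, favorable=72)
print(f"response rate {rate:.0%}; true share between {low:.0%} and {high:.0%}")
```

With these made-up numbers, 90% of the respondents were favorable, yet the share across the intact population could be anywhere from 36% to 96%. That gap is the non-response error that reminders are meant to shrink.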

I went looking for some insights on non-responses and non-responders. Of course, I turned to Dillman (my go-to book for surveys). His bottom line: “…sending reminders is an integral part of minimizing non-response error” (pg. 360).

Dillman (of course) has a few words of advice. For example, on page 360, he says, “Actively seek means of using follow-up reminders in order to reduce non-response error.” How do you not burden the target audience with reminders, which are “…the most powerful way of improving response rate…” (Dillman, pg. 360)? When reminders are sent, they need to be carefully worded and relate to the survey being sent. Reminders stress the importance of the survey and the need for responding.

Dillman also says (on page 361) to “…provide all selected respondents with similar amounts and types of encouragement to respond.” Since most of the time incentives are not an option for you, the program person, you have to encourage the participants in other ways. So we are back to reminders again.

To explore the topic of non-response further, there is a book (Groves, Robert M., Don A. Dillman, John Eltinge, and Roderick J. A. Little (eds.). 2002. Survey Nonresponse. Wiley-Interscience: New York) that deals with the topic. I don’t have it on my shelf, so I can’t speak to it. I found it while I was looking for information on this topic.

I also went online to EVALTALK and found this comment, which is relevant to evaluators attempting to determine if the program made a difference: “Ideally you want your non-response percents to be small and relatively even-handed across items. If the number of nonresponds is large enough, it does raise questions as to what is going for that particular item, for example, ambiguous wording or a controversial topic. Or, sometimes a respondent would rather not answer a question than respond negatively to it. What you do with such data depends on issues specific to your individual study.” This comment was from Kathy Race of Race & Associates, Ltd., September 9, 2003.

A bottom line I would draw from all this is: respond. If it was important to you to participate in the program, then it is important for you to provide feedback to the program implementation team/person.

 

 


 

This Thursday, the U.S. celebrates THE national holiday. I am reminded of all that comprises that holiday. No, not barbeque and parades; fireworks and leisure. Rather, all the work that has gone on to assure that we as citizens CAN celebrate this Independence Day. The founding fathers (and yes, they were old [or not so old] white men) took great risks to stand up for what they believed. They did what I advocate: they determined (through a variety of methods) the merit/worth/value of the program, and took a stand. To me, it is a great example of evaluation as an everyday activity. We now live under that banner of the freedoms for which they stood.

Oh, we may not agree with everything that has come down the pike over the years; some of us are quite vocal about the loss of freedoms because of events that have happened through no real fault of our own. We just happened to be citizens of the U.S. Could we have gotten to this place where we have the freedoms, obligations, responsibilities, and limitations without folks leading us? I doubt it. Anarchy is rarely, if ever, fruitful. Because we believe in leaders (even if we don’t agree with who is leading), we have to recognize that as citizens we are interdependent; we can’t do it alone (little red hen notwithstanding). Yes, the U.S. is known for the strength that is fostered in the individual (independence). Yet, if we really look at what a day looks like, we depend on so many others for all that we do, see, hear, smell, feel, taste. We need to take a moment and thank our farmer, our leaders, our children (if we have them, as they will be tomorrow’s leaders), our parents (if we are so lucky to still have parents), and our neighbors for being part of our lives. For fostering the interdependence that makes the U.S. unique. Evaluation is an everyday activity; when was the last time you recognized that you can’t do anything alone?

Happy Fourth of July–enjoy your blueberry pie!

The question of the week is:

What statistical test do I use when I have pre/post reflective questions?

First, what is a reflective question?

Ask says: “A reflective question is a question that requires an individual to think about their knowledge or information, before giving a response. A reflective question is mostly used to gain knowledge about an individual’s personal life.”

I assume (and we have talked about assumptions before) that these items were scaled to some hierarchy, like a lot to a little, and a number assigned to each. Since the questions are pre/post, they are “matched” and can be compared using a comparison test of dependence, like a t-test or a Wilcoxon. However, if the questions are truly nominal (i.e., “know” and “not know”) and in response to some prompt and DO NOT have a keyed response (like specific knowledge questions), then even though the same person answered the pre questions and the post questions, there really isn’t established dependence.
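For scaled, matched pre/post items, the dependent (paired) t statistic can be sketched with nothing more than the Python standard library. The ratings below are hypothetical, and `paired_t` is an illustrative name, not a library function. The sketch returns only the statistic and its degrees of freedom; turning t into a probability would take a t table or a statistics package (for example `scipy.stats.ttest_rel`, or `scipy.stats.wilcoxon` for the Wilcoxon alternative).

```python
import math
import statistics


def paired_t(pre, post):
    """Return (t, df) for a dependent-samples t-test on matched pre/post scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("pre and post must be matched lists with at least two pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical ratings from eight participants on a 1-5 scale.
pre = [2, 3, 3, 2, 4, 3, 2, 3]
post = [4, 4, 3, 3, 5, 4, 3, 4]
t, df = paired_t(pre, post)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

The test works on the difference each person shows between pre and post, which is why the pairs have to be matched by respondent.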

If the data are nominal, then using a chi-square test would be the best approach because it will tell you if there is a difference between what was expected and what was actually observed (responded). On a pre/post reflective question, one would expect that the respondents would “know” some information before the intervention, say 50-50, and that after the intervention the distribution would shift to, say, 80 “know” to 20 “not know”. A chi-square test would give you the probability that the distribution on the post occurred by chance. SPSS will run this test; find it under the non-parametric tests.
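The 50-50 versus 80-20 example can be worked by hand or in a few lines of Python. This is a minimal sketch of a goodness-of-fit chi-square for two categories, assuming 100 respondents so the percentages become counts; the function name is hypothetical. It relies on one fact: with one degree of freedom, the upper-tail probability of chi-square equals `erfc(sqrt(chi2 / 2))`, so no statistics package is needed.

```python
import math


def chi_square_two_categories(observed, expected):
    """Goodness-of-fit chi-square for exactly two categories (1 degree of freedom)."""
    if len(observed) != 2 or len(expected) != 2:
        raise ValueError("this sketch handles exactly two categories")
    if min(expected) <= 0:
        raise ValueError("expected counts must be positive")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# The example from the text, assuming 100 respondents:
# expected 50 "know" / 50 "not know", observed 80 / 20 on the post.
chi2, p = chi_square_two_categories(observed=[80, 20], expected=[50, 50])
print(f"chi-square = {chi2:.1f}, p = {p:.2g}")
```

Here each cell contributes (30 squared) / 50 = 18, so the statistic is 36 on one degree of freedom, far beyond what chance alone would be expected to produce.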

Miscellaneous thought 1.

Yesterday, I had a conversation with a longtime friend of mine. When we stopped and calculated (which we don’t do very often), we realized that we have known each other since 1981. We met at the first AEA (only it wasn’t AEA then) conference in Austin, TX. I was a graduate student; my friend was a practicing professional/academic. Although we were initially talking about other things evaluation, I asked my friend to look at an evaluation form I was developing. I truly believe that having other eyes (a pilot, if you will) view the document helps. It certainly did in this case. I feel really good about the form. In the course of the conversation, my friend advocated strongly for odd-numbered scales. My friend had good reasons, specifically:

1) It tends to force more comparisons on the respondents; and

2) If you haven’t given me a neutral point, I tend to mess up the scale on purpose because you are limiting my ability to tell you what I am thinking.

I, of course, had an opposing view (rule number 8: question authority). I said, “My personal preference is an even number scale to avoid a mid-point. This is important because I want to know if the framework (of the program in question) I provided worked well with the group, and a mid-point would provide the respondent with a neutral point of view, not a working or not working opinion. An even number (in my case four points) can be divided into working and not working halves. When I’m offered a middle point, I tend to circle that because folks really don’t want to know what I’m thinking. By giving me an opt out/neutral/neither for or against option they are not asking my opinion or view point.”

Recently, I came across an aea365 post on just this topic. Although this specific post was talking about Likert scales, it applies to all scaling that uses a range of numbers (as my friend pointed out). The authors sum up their views with this comment, “There isn’t a simple rule regarding when to use odd or even, ultimately that decision should be informed by (a) your survey topic, (b) what you know about your respondents, (c) how you plan to administer the survey, and (d) your purpose. Take time to consider these four elements coupled with the advantages and disadvantages of odd/even, and you will likely reach a decision that works best for you.” (Certainly, knowing my friend like I do, I would be suspicious of responses that my friend submitted.) Although they list advantages and disadvantages for odd and even responses, I think there are other advantages and disadvantages that they did not mention, yet are summed up in their concluding sentence.

Miscellaneous thought 2.

I’m reading the new edition of Qualitative Data Analysis (QDA). This has always been my go-to book for QDA and I was very sad when I learned that both of the original authors had died. The new author, Johnny Saldana (who is also the author of The Coding Manual for Qualitative Researchers), talks (in the third person plural, active voice) about being a pragmatic realist. That is an interesting concept. They (because the new author includes the previous authors in his statement) say “that social phenomena exist not only in the mind but also in the world–and that some reasonably stable relationships can be found among the idiosyncratic messiness of life.” Although I had never used those exact words before, I agree. It is nice to know the label that applies to my world view. Life is full of idiosyncratic messiness; probably why I think systems thinking is so important. I’m reading this volume because I’ve been asked to write the review of one of my favorite books. We will see if I can get through it between now and July 1, when the draft of the review is due. Probably ought to pair it with Saldana’s other book; that won’t happen between now and July 1.

I have a few thoughts about causation, which I will get to in a bit…first, though, I want to give my answers to the post last week.

I had listed the following and wondered if you thought they were a design, a method, or an approach. (I had also asked which of the 5Cs was being addressed–clarity or consistency.)  Here is what I think about the other question.

Case study is a method used when gathering qualitative data, that is, words as opposed to numbers.  Bob Stake, Robert Brinkerhoff, Robert Yin, and others have written extensively on this method.

Pretest-Posttest Control Group is (according to Campbell and Stanley, 1963) an example of a true experimental design if a control group is used (pg. 8 and 13). NOTE: if only one group is used (according to Campbell and Stanley, 1963), pretest-posttest is considered a pre-experimental design (pg. 7 and 8); still, it is a design.

Ethnography is a method used when gathering qualitative data, often used in evaluation by those with training in anthropology. David Fetterman is one such person who has written on this topic.

Interpretive is an adjective used to describe the approach one uses in an inquiry (whether that inquiry is as an evaluator or a researcher) and can be traced back to the sociologists Max Weber and Wilhelm Dilthey in the latter part of the 19th century.

Naturalistic is an adjective used to describe an approach with a diversity of constructions and is a function of “…what the investigator does…” (Lincoln and Guba, 1985, pg. 8).

Random Control Trials (RCT) are the “gold standard” of clinical trials, now being touted as the be-all and end-all of experimental design; proponents advocate the use of RCTs in all inquiry because they provide the investigator with evidence that X (not Y) caused Z.

Quasi-Experimental is a term used by Campbell and Stanley (1963) to denote a design where random assignment cannot be accomplished for ethical or practical reasons; random assignment is often contrasted with random selection for survey purposes.

Qualitative is an adjective to describe an approach (as in qualitative inquiry), a type of data (as in qualitative data), or methods (as in qualitative methods). I think of qualitative as an approach which includes many methods.

Focus Group is a method of gathering qualitative data through the use of specific, structured interviews in the form of questions; it is also an adjective for defining the type of interviews or the type of study being conducted (Krueger & Casey, 2009, pg. 2).

Needs Assessment is a method for determining priorities for the allocation of resources and actions to reduce the gap between the existing and the desired.

I’m sure there are other answers to the terms listed above; these are mine. I’ve gotten one response (from Simon Hearn at BetterEvaluation). If I get others, I’ll aggregate them and share them with you. (Simon can check his answers against this post.)

Now causation, and I pose another question: If evaluation (remember the root word here is value) is determining if a program (intervention, policy, product, etc.) made a difference, and determining the merit or worth (i.e., value) of that program (intervention, policy, product, etc.), how certain are you that your program (intervention, policy, product, etc.) caused the outcome? Chris Lysy and Jane Davidson have developed several cartoons that address this topic. They are worth the time to read.

When I teach scientific writing (and all evaluators need to be able to communicate clearly verbally and in writing), I focus on the 5Cs: Clarity, Coherence, Conciseness, Consistency, and Correctness. I’ve written about the 5Cs in a previous blog post, so I won’t belabor them here. Suffice it to say that when I read a document that violates one (or more) of these 5Cs, I have to wonder.

Recently, I was reading a document where the author used design (first), then method, then approach. In reading the context, I think (not being able to clarify) that the author was referring to the same thing, a method, and used these different words in an effort to make the reading more entertaining, when all it did was cause obfuscation, violating Clarity, one of the 5Cs.

So I’ll ask you, reader. Are these different? What makes them different? Should they have been used interchangeably in the document? I went to my favorite thesaurus of evaluation terms (Scriven, published by Sage) to see what he had to say, if anything. Only “design” was listed and the definition said, “…process of stipulating the investigatory procedures to be followed in doing a certain evaluation…” OK, investigatory procedure.

So, I’m going to list several terms used commonly in evaluation and research.  Think about what each is–design, method, approach.  I’ll provide my answers next week.  Let me know what you think each of the following is:

Case Study

Pretest-Posttest Control Group

Ethnography

Interpretive

Naturalistic

Random Control Trials (RCT)

Quasi-Experimental

Qualitative

Focus Group

Needs Assessment

 

 

 

I was reminded recently about the 1992 AEA meeting in Seattle, WA. That seems like so long ago. The hot topic of that meeting was whether qualitative data or quantitative data were best. At the time I was a nascent evaluator, having been in the field less than 10 years, and absorbed debates like this as a dry sponge does water. It was interesting; stimulating; exciting. It felt cutting edge.

Now, 20+ years later, I wonder what all the hype was about. Now, there can be rigor in whatever data are collected, regardless of type (numbers or words); language has been developed to look at that rigor. (Rigor can also escape the investigator regardless of the data collected; another post, another day.) Words are important for telling stories (and there is a wealth of information on how story can be rigorous) and numbers are important for counting (and numbers have a long history of use; thanks, Don Campbell). Using both (that is, mixed methods) makes really good sense when conducting an evaluation in community environments, work that I’ve done for most of my career (community-based work).

I was reading another evaluation blog (ACET) and found the following bit of information that I thought I’d share as it is relevant to looking at data.  This particular post (July, 2012) was a reflection of the author. (I quote from that blog).

  • Utilizing both quantitative and qualitative data. Many of ACET’s evaluations utilize both quantitative (e.g., numerical survey items) and qualitative (e.g., open-ended survey items or interviews) data to measure outcomes. Using both types of data helps triangulate evaluation findings. I learned that when close-ended survey findings are intertwined with open-ended responses, a clearer picture of program effectiveness occurs. Using both types of data also helps to further explain the findings. For example, if 80% of group A “Strongly agreed” to question 1, their open-ended responses to question 2 may explain why they “Strongly agreed” to question 1.

Triangulation was a new (to me at least) concept in 1981 when a whole chapter was devoted to the topic in a volume dedicated to Donald Campbell, titled Scientific Inquiry and the Social Sciences. I have no doubt that this concept was not new; Crano, the author of this chapter titled “Triangulation and Cross-Cultural Research”, has three and one half pages of references listed that support the premise put forth in the chapter. That premise is, mainly, that using data from multiple different sources may increase the understanding of the phenomena under investigation. That is what triangulation is all about: looking at a question from multiple points of view, bringing together the words and the numbers, and then offering a defensible explanation.

I’m afraid that many beginning evaluators forget that words can support numbers and numbers can support words.

Recently, I was privileged to see the recommendations of  William (Bill) Tierney on the top education blogs.  (Tierney is the Co-director of the Pullias Center for Higher Education at the University of Southern California.)  He (among others) writes the blog, 21st scholar.  The blogs are actually the recommendation of his research assistant Daniel Almeida.  These are the recommendations:

  1. Free Technology for Teachers

  2. MindShift

  3. Joanne Jacobs

  4. Teaching Tolerance

  5. Brian McCall’s Economics of Education Blog

What criteria were used?  What criteria would you use?  Some criteria that come to mind are interest, readability, length, frequency.  But I’m assuming that they would be your criteria (and you know what assuming does…)

If I’ve learned anything in my years as an evaluator, it is to make assumptions explicit. Everyone comes to the table with built-in biases (called cognitive biases). I call them personal and situational biases (I did my dissertation on those biases). So by making your assumptions explicit (and thereby avoiding personal and situational biases), you are building a rubric, because a rubric is developed from criteria for a particular product, program, policy, etc.

How would you build your rubric? Many rubrics are in chart format, that is, columns and rows with the criteria detailed in the boxes where they cross. That isn’t cast in stone. People view the world in different ways (linear, circular, webbed, and there may be others), so I would set yours up in the format that works best for you. The only thing to keep in mind is to be specific.

Now, perhaps you are wondering how this relates to evaluation in the way I’ve been using evaluation. Keep in mind evaluation is an everyday activity. And every day, all day, you perform evaluations. Rubrics formalize the evaluations you conduct by making the criteria explicit. Sometimes you internalize them; sometimes you write them down. If you need to remember what you did the last time you were in a similar situation, I would suggest you write them down. No, you won’t end up with lots of little sticky notes posted all over. Use your computer. Create a file. Develop criteria that are important to you. Typically, the criteria are in a table format; an x by x form. If you are assigning numbers, you might want to have the rows be the numbers (for example, 1-10) and the columns be words that describe those numbers (for example, 1 boring; 10 stimulating and engaging). Rubrics are used in reviewing manuscripts and student papers, and in assigning grades to activities as well as programs. Your format might look like a generic grid of that kind.

Or it might not. In what other configurations have you seen rubrics? How would you develop your rubric? Or would you? Perhaps you prefer a bunch of sticky notes. Let me know.

Ever wonder where the 0.05 probability level number was derived?  Ever wonder if that is the best number?  How many of you were taught in your introduction to statistics course that 0.05 is the probability level necessary for rejecting the null hypothesis of no difference?  This confidence may be spurious.  As Paul Bakker indicates in the AEA 365 blog post for March 28, “Before you analyze your data, discuss with your clients and the relevant decision makers the level of confidence they need to make a decision.”  Do they really need to be 95% confident?  Or would 90% confidence be sufficient?  What about 75% or even 55%?

Think about it for a minute. If you were a brain surgeon, you wouldn’t want anything less than 99.99% confidence; if you were looking at level of risk for a stock market investment, 55% would probably make you a lot of money. The academic community has held to and used the probability level of 0.05 for years (the computation of the p value dating back to the 1770s). (Quoting Wikipedia, “In the 1770s Laplace considered the statistics of almost half a million births. The statistics showed an excess of boys compared to girls. He concluded by calculation of a p-value that the excess was a real, but unexplained, effect.”) Fisher first proposed the 0.05 level in 1925 and established a one in 20 limit for statistical significance when considering a two-tailed test. Sometimes the academic community makes the probability level even more restrictive by using 0.01 or 0.001 to demonstrate that the findings are significant. Scientific journals expect 95% confidence, that is, a probability level of 0.05 or smaller.

Although I have held to these levels, especially when I publish a manuscript, I have often wondered if this level makes sense. If I am only curious about a difference, do I need 0.05? Or could I use 0.10 or 0.15 or even 0.20? I have often asked students if they are conducting confirmatory or exploratory research. I think confirmatory research expects a more stringent probability level. I think exploratory research requires a less stringent probability level. The 0.05 seems so arbitrary.
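To see how much the chosen level matters, here is a small sketch that holds one result up against several probability levels. The p value of 0.08 is hypothetical, `decisions` is an illustrative name, and the rule used (reject the null hypothesis when p is less than or equal to the level) is the common textbook convention rather than anything specific to this post.

```python
def decisions(p_value, alphas=(0.01, 0.05, 0.10, 0.20)):
    """Map each significance level to whether the null hypothesis would be rejected."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    return {alpha: p_value <= alpha for alpha in alphas}


# A hypothetical result with p = 0.08.
for alpha, reject in decisions(0.08).items():
    verdict = "reject" if reject else "do not reject"
    print(f"alpha = {alpha:.2f}: {verdict} the null hypothesis")
```

The same data "find a difference" at 0.10 and 0.20 but not at 0.05 or 0.01, which is why the level is worth settling with clients before the analysis rather than after.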

Then there is the grounded theory approach which doesn’t use a probability level.  It generates theory from categories which are generated from concepts which are identified from data, usually qualitative in nature.  It uses language like fit, relevance, workability, and modifiability.  It does not report statistically significant probabilities as it doesn’t use inferential statistics.  Instead, it uses a series of probability statements about the relationships between concepts.

So what do we do?  What do you do?  Let me know.