Variables.

We all know about independent variables and dependent variables.  You probably even learned about moderator variables, control variables, and intervening variables.  But have you heard of confounding variables?  These are variables over which you have no (or very little) control.  They present as a positive or negative correlation with both the dependent and the independent variable.  This spurious relationship plays havoc with analyses, program outcomes, and logic models.  You see them often in social programs.

Ever encounter one? (Let me know.)  Need an example?  Here is one a colleague provided.  A program was developed to assist children removed from their biological mothers (even though the courts typically favor mothers) to improve the children’s choices and chances of success.  The program included training of key stakeholders (judges, social services, potential foster parents).  The confounding variable that wasn’t taken into account was the sudden appearance of the biological father.  Judges assumed he was no longer present (and most of the time he wasn’t); social services established fostering without considering the presence of the biological father; potential foster parents were not alerted in their training to the possibility.  Needless to say, the program failed.  When biological fathers appeared (as often happened), the program had no control over the effect they had.  Fathers had not been included in the program’s equation.

Reviews.

Recently, I was asked to review a grant proposal; the award would amount to several hundred thousand dollars (in today’s economy, no small change).  The PI’s passion came through in the proposal’s text.  However, the PI and the PI’s colleagues did some major lumping in the text that confounded the proposed outcomes.  I didn’t see how what was being proposed would result in what was said to happen.  This is an evaluative task.  I was charged with evaluating the proposal on technical merit, possibility of impact, and achievability.  The proposal was lofty and meant well.  The likelihood that it would accomplish what it proposed was unclear, despite the PI’s passion.  When reviewing a proposal, it is important to think big picture as well as small picture.  Most proposals will not be sustainable after the end of funding.  Will the proposed project really be able to make an impact (and I’m not talking here about world peace)?

Conversations.

I attended a meeting recently that focused on various aspects of diversity.  (Among the confounds here is what one means by diversity: is it only the intersection of gender and race/ethnicity, or something bigger?)  One of the presenters talked about how, just by entering into the conversation, the participants would be changed.  I wondered: how can that change be measured?  How would you know that a change took place?  Any ideas?  Let me know.

Focus groups.

A colleague asked whether a focus group could be conducted via email.  I had never heard of such a thing (virtual, yes; email, no).  Dick Krueger and Mary Ann Casey talk only about electronic reporting in the 4th edition of their Focus Group book.  If I go to Wikipedia (keep in mind it is a wiki…), there is a discussion of online focus groups, but nothing about email focus groups.  So I ask you, readers: is it a focus group if it is conducted by email?

I had a topic all ready to write about; then I got sick.  I’m sitting here typing, trying to remember what that topic was, to no avail.  That topic went the way of much of my recent memory; another day, perhaps.

I do remember the conversation with my daughter about correlation.  She had a correlation of .3 something with a probability of 0.011 and didn’t understand what that meant.  We had a long discussion of causation and attribution and correlation.
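If it helps to see it concretely, here is a minimal sketch in Python (simulated data, not hers; the sample size and seed are arbitrary) of what a correlation of about .3 with a small p-value does and does not tell you:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(size=90)                        # hypothetical scores
y = 0.3 * x + rng.normal(scale=0.95, size=90)  # built to correlate weakly with x

r, p = stats.pearsonr(x, y)
print(f"r = {r:.2f}, p = {p:.3f}")
print(f"shared variance r^2 = {r*r:.2f}")  # r of ~.3 -> roughly 9% shared variance
# A small p-value says the association is unlikely to be chance alone;
# it says nothing about x causing y, or about practical importance.
```

The p-value speaks only to chance; the r² of roughly 9% is what speaks to practical significance, which brings me to the next conversation.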

We had another long conversation about practical vs. statistical significance, something her statistics professor isn’t teaching.  She isn’t learning about data management in her statistics class either.  Having dealt with both qualitative and quantitative data for a long time, I have come to realize that data management needs to be understood long before you memorize the formulas for the various statistical tests you wish to perform.  What if the flood happens?

So today I’m telling you about data management as I understand it, because the flood did actually happen and, fortunately, I didn’t lose my data.  I had a data dictionary.

Data dictionary.  The first step in data management is a data dictionary.  There are other names for this, which escape me right now; know that a hard copy of how and what you have coded is critical.  Yes, make a backup copy on your hard drive, but also keep a hard copy, because the flood might happen.  (It is raining right now, and it is Oregon in November.)

Take a hard copy of your survey, evaluation form, or qualitative data coding sheet and mark on it what every code notation you used means.  I’d show you an example of what I do, only my examples are at the office and I am home sick without my files.  No, I don’t use cards anymore for my data (I did once; most of you won’t remember that time…), but I do make a hard copy with clear notations.  I find myself doing that with other things as well, to make sure I code the response the same way.  That is what a data dictionary allows you to do: check yourself.
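If your data live in software as well as on paper, the dictionary itself can be a structured file you print for the file drawer.  A minimal sketch in Python (the variables and codes are hypothetical):

```python
# A minimal data dictionary sketch -- hypothetical survey variables and codes.
data_dictionary = {
    "q1_satisfaction": {
        "label": "Overall satisfaction with the program",
        "codes": {1: "very dissatisfied", 2: "dissatisfied", 3: "neutral",
                  4: "satisfied", 5: "very satisfied"},
        "missing": {9: "no response"},
    },
    "attended": {
        "label": "Attended all program sessions",
        "codes": {0: "no", 1: "yes"},
        "missing": {9: "no response"},
    },
}

# Print it so you can file a hard copy -- because the flood might happen.
for var, info in data_dictionary.items():
    print(f"{var}: {info['label']}")
    for code, meaning in {**info["codes"], **info["missing"]}.items():
        print(f"    {code} = {meaning}")
```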

Then I run a frequencies and percentages analysis.  I use SPSS (because that is what I learned first).  I look for outliers, variables that are miscoded, and system-generated missing data that isn’t missing.  I look for any anomaly in the data, any human error (i.e., my error).  Then I fix it.  Then I run my analyses.
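For those working outside SPSS, a rough equivalent of that screening pass in Python with pandas might look like this (the file name is hypothetical; substitute your own data set):

```python
import pandas as pd

df = pd.read_csv("survey.csv")  # hypothetical file name

for col in df.columns:
    print(f"\n--- {col} ---")
    counts = df[col].value_counts(dropna=False)  # frequencies, incl. missing
    pcts = df[col].value_counts(dropna=False, normalize=True).mul(100).round(1)
    print(pd.DataFrame({"n": counts, "%": pcts}))
    # Out-of-range codes, stray text in numeric columns, and "missing"
    # values that aren't really missing all show up in this listing.
```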

There are probably more steps than I’ve covered today.  These are the first steps that absolutely must be done BEFORE you do any analyses.  Then you have a good chance of keeping your data safe.

There has been quite a bit written about data visualization, a topic important to evaluators who want their findings used.  Michael Patton talks about evaluation use in the 4th edition of Utilization-Focused Evaluation.  He doesn’t, however, list data visualization in the index; he may talk about it somewhere, but it isn’t obvious.

The current issue of New Directions for Evaluation is devoted to data visualization, and it is the first part (implying, I hope, at least a part 2).  Tarek Azzam and Stephanie Evergreen are the guest editors.  This volume (the first on this topic in 15 years) sets the stage (chapter 1) and talks about quantitative and qualitative data visualization.  The last chapter talks about the tools available to the evaluator, and they are many and various.  I cannot do them justice in this space; read about them in the NDE volume.  (If you are an AEA member, the volume is available online.)

freshspectrum, a blog by Chris Lysy, talks about INTERACTIVE data visualization with illustrations.

Stephanie Evergreen, the co-guest editor of the above NDE volume, also blogs; in her October 2 post, she talks about “Design for Federal Proposals (aka Design in a Black & White Environment)”.  More on data visualization.

The data visualizer that made the largest impact on me was Hans Rosling in his TED talks.  Certainly the software he uses makes the images engaging.  If he didn’t understand his data the way he does, he wouldn’t be able to do what he does.

Data visualization is everywhere.  There will be multiple sessions at the AEA conference next week.  If you can, check them out–get there early as they will fill quickly.

Before you know it, Evaluation ’13 will be here and thousands of evaluators will converge on Washington, DC, the venue for this year’s AEA annual meeting.

The Local Arrangements Working Group (LAWG) is blogging this week in AEA365.  (You might want to check out all the posts this week.)  There are A LOT of links in these posts (including to related past posts) that are worth checking.  For those who have not been to AEA before, or for those who have recently embraced evaluation, their posts are a wealth of information.

What I want to focus on today is the role of the Local Arrangements Working Group.  The Washington Evaluators group is working in tandem with AEA to organize the local part of the conference.  These folks live locally and know the area.  Often they include graduate students as well as seasoned evaluators.  (David Bernstein and Valerie Caracelli are the co-chairs of this year’s LAWG.)  The committee holds a deep well of local knowledge.  (Scroll down to “Please Check Back for Periodic Updates” to see the large committee–it really does take a village!)  They serve only for the current year and are truly local.  Next year in Denver, there will be a whole new LAWG.

Some things the committee does include identifying (and evaluating) local restaurants, suggesting things to do in DC, and explaining how to get around DC.  Although these links provide valuable information, there are those of us (me…) who are still technopeasants and do not travel with a smart phone, tablet, computer, or other electronic connectivity and would like hard copy of pertinent information.  (I want to pay attention to real people in real time; I acknowledge that I am probably an artifact, certainly a technology immigrant–see my previous post about civility.)

Restaurants change more quickly than I can keep track of, although I’m sure some still exist from when I was last in DC for business.  I’m sure that today most restaurants provide vegetarian, vegan, and gluten-free options (it is, after all, the current trend).  That is very different from the last DC AEA meeting, in 2002.  I did a quick search for vegetarian restaurants using the search options available at the LAWG/Washington Evaluators’ site; there were several.  I also went to look at reviews, and I wonder about the single very bad review: was it just an off night or a true reflection?

There are so many things to do in DC…please take a day–the newer monuments are amazing–see them.

Getting around DC: use the Metro.  It gets you to most places; it is inexpensive; it is SAFE!  It has been expanded to reach beyond the DC boundaries.  If nothing else, ride the Metro–you will be able to see a lot of DC.  You can get from Reagan Washington National Airport to the conference venue (yes, you will have to walk 4 blocks, and there may be some problem with a receipt–put the fare plus $0.05 on the Metro card and turn in the card).

The LAWG has done a wonderful job providing information to evaluators…check out their site.  See you in DC.

The question of the week is:

What statistical test do I use when I have pre/post reflective questions?

First, what is a reflective question?

Ask says: “A reflective question is a question that requires an individual to think about their knowledge or information, before giving a response. A reflective question is mostly used to gain knowledge about an individual’s personal life.”

I assume (and we have talked about assumptions before) that these items were scaled to some hierarchy, like a lot to a little, with a number assigned to each point.  Since the questions are pre/post, they are “matched” and can be compared using a test for dependent (paired) samples, like a paired t-test or a Wilcoxon signed-rank test.  However, if the questions are truly nominal (i.e., “know” and “not know”) and in response to some prompt, and DO NOT have a keyed response (like specific knowledge questions), then even though the same person answered the pre questions and the post questions, there really isn’t established dependence.

If the data are nominal, then a chi-square test would be the best approach, because it will tell you if there is a difference between what was expected and what was actually observed (the responses).  On a pre/post reflective question, one would expect that the respondents would “know” some information before the intervention, say 50-50, and after the intervention that split would shift to, say, 80 “know” to 20 “not know”.  A chi-square test gives you the probability that the distribution on the post occurred by chance.  SPSS will run this test; find it under the non-parametric tests.
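For readers who work outside SPSS, here is a minimal sketch of both situations in Python with scipy (all numbers are hypothetical):

```python
from scipy import stats

# Matched, scaled pre/post ratings for the same 10 people (hypothetical 1-5 scale).
pre  = [2, 3, 2, 4, 3, 2, 3, 1, 2, 3]
post = [4, 4, 3, 5, 4, 3, 4, 3, 3, 4]
stat, p = stats.wilcoxon(pre, post)  # paired, non-parametric comparison
print(f"Wilcoxon signed-rank: p = {p:.3f}")

# Nominal "know"/"not know" counts: 50-50 before, 80-20 after.
table = [[50, 50],   # pre:  know, not know
         [80, 20]]   # post: know, not know
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"Chi-square: chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
```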

I have a few thoughts about causation, which I will get to in a bit…first, though, I want to give my answers to the post last week.

I had listed the following terms and wondered whether you thought each was a design, a method, or an approach.  (I had also asked which of the 5Cs was being addressed–clarity or consistency.)  Here is what I think about the design/method/approach question.

Case study is a method used when gathering qualitative data, that is, words as opposed to numbers.  Bob Stake, Robert Brinkerhoff, Robert Yin, and others have written extensively on this method.

Pretest-posttest control group is (according to Campbell and Stanley, 1963) an example of a true experimental design when a control group is used (pp. 8 and 13).  NOTE: if only one group is used (again according to Campbell and Stanley, 1963), pretest-posttest is considered a pre-experimental design (pp. 7 and 8); still, it is a design.

Ethnography is a method used when gathering qualitative data often used in evaluation by those with training in anthropology.  David Fetterman is one such person who has written on this topic.

Interpretive is an adjective used to describe the approach one uses in an inquiry (whether as an evaluator or a researcher) and can be traced back to the sociologists Max Weber and Wilhelm Dilthey in the latter part of the 19th century.

Naturalistic is an adjective used to describe an approach with a diversity of constructions; what makes an inquiry naturalistic is a function of “…what the investigator does…” (Lincoln and Guba, 1985, p. 8).

Randomized controlled trials (RCTs) are the “gold standard” of clinical trials, now being touted as the be-all and end-all of experimental design; their proponents advocate the use of RCTs in all inquiry, as they provide the investigator with evidence that X (not Y) caused Z.

Quasi-experimental is a term used by Campbell and Stanley (1963) to denote a design where random assignment cannot, for ethical or practical reasons, be accomplished; random assignment is often contrasted with random selection for survey purposes.

Qualitative is an adjective used to describe an approach (as in qualitative inquiry), a type of data (as in qualitative data), or methods (as in qualitative methods).  I think of qualitative as an approach which includes many methods.

Focus group is a method of gathering qualitative data through specific, structured interviews in the form of questions; it is also an adjective defining the type of interview or the type of study being conducted (Krueger & Casey, 2009, p. 2).

Needs assessment is a method for determining priorities for the allocation of resources and actions to reduce the gap between the existing and the desired.

I’m sure there are other answers to the terms listed above; these are mine.  I’ve gotten one response (from Simon Hearn at BetterEvaluation).  If I get others, I’ll aggregate them and share them with you.  (Simon can check his answers against this post.)

Now causation, and I pose another question: if evaluation (remember the root word here is value) is determining whether a program (intervention, policy, product, etc.) made a difference, and determining the merit or worth (i.e., value) of that program, how certain are you that your program caused the outcome?  Chris Lysy and Jane Davidson have developed several cartoons that address this topic.  They are worth the time to read.

I was reminded recently about the 1992 AEA meeting in Seattle, WA.  That seems like so long ago.  The hot topic of that meeting was whether qualitative or quantitative data were best.  At the time, I was a nascent evaluator, having been in the field less than 10 years, and I absorbed debates like this as a dry sponge does water.  It was interesting, stimulating, exciting.  It felt cutting edge.

Now, 20+ years later, I wonder what all the hype was about.  There can be rigor in whatever data are collected, regardless of type (numbers or words); language has been developed to look at that rigor.  (Rigor can also escape the investigator regardless of the data collected; another post, another day.)  Words are important for telling stories (and there is a wealth of information on how story can be rigorous), and numbers are important for counting (and numbers have a long history of use–thanks, Don Campbell).  Using both (that is, mixed methods) makes really good sense when conducting an evaluation in community settings, work that I’ve done for most of my career.

I was reading another evaluation blog (ACET) and found the following bit of information that I thought I’d share, as it is relevant to looking at data.  This particular post (July 2012) was a reflection by the author.  (I quote from that blog.)

  • Utilizing both quantitative and qualitative data. Many of ACET’s evaluations utilize both quantitative (e.g., numerical survey items) and qualitative (e.g., open-ended survey items or interviews) data to measure outcomes. Using both types of data helps triangulate evaluation findings. I learned that when close-ended survey findings are intertwined with open-ended responses, a clearer picture of program effectiveness occurs. Using both types of data also helps to further explain the findings. For example, if 80% of group A “Strongly agreed” to question 1, their open-ended responses to question 2 may explain why they “Strongly agreed” to question 1.

Triangulation was a new (to me, at least) concept in 1981, when a whole chapter was devoted to the topic in a volume dedicated to Donald Campbell, titled Scientific Inquiry and the Social Sciences.  I have no doubt that the concept was not new; Crano, the author of the chapter, titled “Triangulation and Cross-Cultural Research”, lists three and one-half pages of references that support the premise put forth in the chapter: mainly, that using data from multiple different sources may increase the understanding of the phenomena under investigation.  That is what triangulation is all about–looking at a question from multiple points of view, bringing together the words and the numbers, and then offering a defensible explanation.

I’m afraid that many beginning evaluators forget that words can support numbers and numbers can support words.

Ever wonder where the 0.05 probability level came from?  Ever wonder if it is the best number?  How many of you were taught in your introductory statistics course that 0.05 is the probability level necessary for rejecting the null hypothesis of no difference?  This confidence may be spurious.  As Paul Bakker indicates in the AEA365 blog post for March 28, “Before you analyze your data, discuss with your clients and the relevant decision makers the level of confidence they need to make a decision.”  Do they really need to be 95% confident?  Or would 90% confidence be sufficient?  What about 75%, or even 55%?

Think about it for a minute.  If you were a brain surgeon, you wouldn’t want anything less than 99.99% confidence; if you were looking at the level of risk for a stock market investment, 55% would probably make you a lot of money.  The academic community has held to and used the probability level of 0.05 for years (the computation of the p-value dates back to the 1770s).  (Quoting Wikipedia: “In the 1770s Laplace considered the statistics of almost half a million births. The statistics showed an excess of boys compared to girls. He concluded by calculation of a p-value that the excess was a real, but unexplained, effect.”)  Fisher first proposed the 0.05 level in 1925, establishing a one-in-20 limit for statistical significance when considering a two-tailed test.  Sometimes the academic community makes the probability level even more restrictive, using 0.01 or 0.001 to demonstrate that the findings are significant.  Scientific journals expect 95% confidence, or a probability level of at most 0.05.

Although I have held to these levels, especially when I publish a manuscript, I have often wondered if this level makes sense.  If I am only curious about a difference, do I need 0.05?  Or could I use 0.10, 0.15, or even 0.20?  I have often asked students whether they are conducting confirmatory or exploratory research.  Confirmatory research expects a more stringent probability level; exploratory research can tolerate a less stringent one.  The 0.05 seems so arbitrary.
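To make the arbitrariness concrete, here is a tiny sketch (the p-value is hypothetical): the same result passes or fails purely as a function of the alpha chosen beforehand.

```python
p_value = 0.08  # hypothetical result from an exploratory analysis

for alpha, context in [(0.01, "confirmatory, high stakes"),
                       (0.05, "journal convention"),
                       (0.10, "exploratory"),
                       (0.20, "curiosity / screening")]:
    verdict = "significant" if p_value < alpha else "not significant"
    print(f"alpha = {alpha:.2f} ({context}): {verdict}")
```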

Then there is the grounded theory approach, which doesn’t use a probability level.  It generates theory from categories, which are generated from concepts, which are identified from data, usually qualitative in nature.  It uses language like fit, relevance, workability, and modifiability.  It does not report statistically significant probabilities, as it doesn’t use inferential statistics; instead, it uses a series of probability statements about the relationships between concepts.

So what do we do?  What do you do?  Let me know.

What do I know that they don’t know?
What do they know that I don’t know?
What do all of us need to know that few of us know?

These three questions have buzzed around my head for a while in various forms.

When I attend a conference, I wonder.

When I conduct a program, I wonder, again.

When I explore something new, I am reminded that perhaps someone else has been here and wonder, yet again.

Thinking about these questions, I had these ideas:

  • I see the first statement relating to capacity building;
  • The second statement relating to engagement; and
  • The third statement (relating to statements one and two) relating to cultural competence.

After all, don’t both of these statements (capacity building and engagement) relate to a “foreign country” and a different culture?

How does all this relate to evaluation?  Read on…

Premise:  Evaluation is an everyday activity.  You evaluate every day, all the time; you call it making decisions.  Every time you make a decision, you are building capacity in your ability to evaluate.  Sure, some of those decisions may need to be revised.  Sure, some of those decisions may just yield “negative” results.  Even so, you are building capacity.  AND you share that knowledge–with your children (if you have them), with your friends, with your colleagues, with the random shopper in the (grocery) store.  That is building capacity.  Building capacity can be systematic, organized, sequential.  Sometimes formal, scheduled, deliberate.  It is sharing “what do I know that they don’t know” (in the hope that they too will know it and use it).

Premise:  Everyone knows something.  In knowing something, evaluation happens, because people made decisions about what is important and what is not.  To really engage (not just do outreach, which is much of what Extension does), one needs to “do as” the group being engaged.  To do anything else (“doing to” or “doing with”) is simply outreach, and little or no knowledge is exchanged.  That doesn’t mean knowledge isn’t distributed; Extension has been doing that for years.  It just means the assumption (and you know what assumptions do) is that only the expert can distribute knowledge.  Who is to say that the group (target audience, participants) isn’t expert in at least part of what is being communicated?  It probably is.  It is the idea that they know something that I don’t know (and I would benefit from knowing).

Premise:  Everything and everyone is connected.  Being prepared is the best way to learn something.  Being prepared by understanding culture (I’m not talking only about the intersection of race and gender; I’m talking about all the stereotypes you carry with you all the time) reinforces connections.  Learning about other cultures (something everyone can do) helps dispel stereotypes and mitigate stereotype threat.  And that is an evaluative task.  Think about it.  I think it captures the “What do all of us need to know that few of us know?” question.

The topic of survey development seems to be popping up everywhere–AEA365, Kirkpatrick Partners, the eXtension Evaluation Community of Practice, among others.  Because survey development is so important to Extension faculty, I’m providing links and summaries.


AEA365 says:

“… it is critical that you pre-test it with a small sample first.”  Real-time testing helps eliminate confusion, improves clarity, and assures that you are asking a question that will give you an answer to what you want to know.  This is especially important today, when many surveys are electronic.

It is also important to “Train your data collection staff…Data collection staff are the front line in the research process.”  Since they are the people who will be collecting the data, they need to understand the protocols, the rationales, and the purposes of the survey.

Kirkpatrick Partners say:

“Survey questions are frequently impossible to answer accurately because they actually ask more than one question.”  This is the biggest problem in constructing survey questions.  They provide some examples of asking more than one question.

Michael W. Duttweiler, Assistant Director for Program Development and Accountability at Cornell Cooperative Extension stresses the four phases of survey construction:

  1. Developing a Precise Evaluation Purpose Statement and Evaluation Questions
  2. Identifying and Refining Survey Questions
  3. Applying Golden Rules for Instrument Design
  4. Testing, Monitoring and Revising

He then indicates that the next three blog posts will cover points 2, 3, and 4.

Probably my favorite recent post on surveys was one that Jane Davidson did back in August 2012 on survey response scales.  Her “boxers or briefs” example captures so many issues related to survey development.

Writing survey questions that give you usable data answering your questions about your program is a challenge, but it is not impossible.  Dillman wrote the book on surveys; it should be on your desk.

Here is the Dillman citation:
Dillman, D. A., Smyth, J. D., & Christian, L. M. (2009).  Internet, mail, and mixed-mode surveys: The tailored design method.  Hoboken, NJ: John Wiley & Sons, Inc.