A colleague asks, “What is the appropriate statistical test when comparing the means of two groups?”

 

I’m assuming (yes, I know what assuming does) that parametric tests are appropriate for what the colleague is doing. Parametric tests (e.g., t-test, ANOVA) are appropriate when the parameters of the population are known. If that is the case (and non-parametric tests are not being considered), I need to clarify the assumptions underlying the use of parametric tests, which are more stringent than those of non-parametric tests. Those assumptions are the following:

The sample is

  1. randomized (either by assignment or selection).
  2. drawn from a population which has specified parameters.
  3. normally distributed.
  4. characterized by equality of variance across the groups being compared.

If those assumptions are met, the partial answer is, “It all depends.” (I know you have heard that before today.)
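Before getting to the questions, here is a quick, hedged way to screen the normality and equal-variance assumptions in software. This is a minimal sketch in Python with scipy, using made-up scores; the variable names and the 0.05 cutoff are illustrative assumptions on my part, not a prescription.

```python
# Minimal sketch: checking normality and equality of variance
# before choosing a parametric test. Group data are illustrative.
from scipy import stats

group_a = [72, 85, 78, 90, 66, 81, 74, 88]   # e.g., scores from one group
group_b = [75, 89, 80, 94, 70, 86, 79, 91]   # e.g., scores from a second group

# Shapiro-Wilk tests the null hypothesis that a sample came from a
# normally distributed population.
for name, grp in [("A", group_a), ("B", group_b)]:
    w, p = stats.shapiro(grp)
    print(f"Group {name}: Shapiro-Wilk p = {p:.3f}")

# Levene's test checks the null hypothesis of equal variances.
levene_stat, levene_p = stats.levene(group_a, group_b)
print(f"Levene's test p = {levene_p:.3f}")

# If either p-value falls below your chosen alpha (often 0.05),
# that assumption is suspect and a non-parametric alternative
# may be the safer choice.
```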

I will ask the following questions:

  1. Do you know the parameters (measures of central tendency and variability) for the data?
  2. Are they dependent or independent samples?
  3. Are they intact populations?

Once I know the answers to these questions I can suggest a test.

My current favorite statistics book, Statistics for People Who (Think They) Hate Statistics by Neil J. Salkind (4th ed.), has a flow chart that helps you by asking whether you are looking at differences between the sample and the population, at relationships among variables, or at differences between one or more groups. The flow chart ends with the name of a statistical test. The caveat is that you are working with a sample from a larger population that meets the above-stated assumptions.

How you answer the questions above determines what test you can use. If you do not know the parameters, you will NOT use a parametric test. If you are using an intact population (and many Extension professionals use intact populations), you will NOT use inferential statistics, as you will not be inferring to anything bigger than what you have at hand. If you have two groups and the groups are related (like a pre-post test or a post-pre test), you will use a parametric or non-parametric test for dependent samples. If you have two groups and they are unrelated (like boys and girls), you will use a parametric or non-parametric test for independent samples. If you have more than two groups, you will use yet another test.
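To make that branching concrete, here is a small sketch in Python (scipy) showing which test goes with which situation. The data are invented, and in practice you would run only the parametric test or its non-parametric counterpart, depending on whether the assumptions above hold.

```python
# Illustrative sketch of the decision described above (data are made up).
from scipy import stats

pre  = [10, 12, 9, 14, 11, 13]    # related groups: same people, before and after
post = [13, 15, 11, 17, 12, 16]

boys  = [22, 25, 19, 28, 24]      # unrelated groups
girls = [26, 29, 21, 31, 27]

site_a, site_b, site_c = [5, 7, 6], [8, 9, 7], [11, 10, 12]   # more than two groups

# Two related groups: paired t-test (parametric) or Wilcoxon signed-rank (non-parametric).
print(stats.ttest_rel(pre, post))
print(stats.wilcoxon(pre, post))

# Two unrelated groups: independent t-test or Mann-Whitney U.
print(stats.ttest_ind(boys, girls))
print(stats.mannwhitneyu(boys, girls))

# More than two groups: one-way ANOVA or Kruskal-Wallis.
print(stats.f_oneway(site_a, site_b, site_c))
print(stats.kruskal(site_a, site_b, site_c))
```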

Extension professionals are rigorous in their content material; they need to be just as rigorous in their analysis of the data collected about that content. Understanding what analyses to use when is a good skill to have.

 

 

 

A colleague asked an interesting question, one that I am often asked as an evaluation specialist:  “without a control group is it possible to show that the intervention had anything to do with a skill increase?”  The answer to the question “Do I need a control group to do this evaluation?” is, “It all depends.”

It depends on what question you are asking. Are you testing a hypothesis–a question posed in a null form of no difference? Or answering an evaluative question–what difference was made? The methodology you use depends on what question you are asking. If you want to know how effective or efficient a program (aka intervention) is, you can determine that without a control group. Campbell and Stanley, in their now well-read 1963 volume Experimental and Quasi-Experimental Designs for Research, discuss quasi-experimental designs that do not use a control group. Yes, there are threats to internal validity; yes, there are stronger designs; yes, the controls are not as rigorous as in a double-blind, cross-over design (considered the gold standard by some groups). We are talking here about evaluation, people, NOT research. We are not asking questions of efficacy (research); rather, we want to know what difference is being made; we want to know the answer to “so what.” Remember, the root of evaluation is value, not cause.

This is certainly a quandary–how to determine cause for the desired outcome.  John Mayne has recognized this quandary and has approached the question of attributing the outcome to the intervention in his use of contribution analysis.  In community-based work, like what Extension does, attributing cause is difficult at best.  Why–because there are factors which Extension cannot control and identifying a control group may not be ethical, appropriate, or feasible.  Use something else that is ethical, appropriate, and feasible (see Campbell and Stanley).

Using a logic model to guide your work helps to defend your premise of “If I have these resources, then I can do these activities with these participants; if I do these activities with these participants, then I expect (because the literature says so–the research has already been done) that the participants will learn these things, do these things, change these conditions.” The likelihood of achieving world peace with your intervention is low at best; the likelihood of changing something (learning, practices, conditions) if you have a defensible model (road map) is high. Does that mean your program caused that change? Probably not. Can you take credit for the change? Most definitely.

Last weekend, I was in Florida visiting my daughter at Eckerd College. The College was offering an Environmental Film Festival, and I had the good fortune to see Green Fire, a film about Aldo Leopold and the land ethic. I had seen it at OSU and was impressed because it was not all doom and gloom; rather, it celebrated Aldo Leopold as one of the three leading early conservationists (the other two being John Muir and Henry David Thoreau). Dr. Curt Meine, who narrates the film and is a conservation biologist, was leading the discussion again; I had heard him at OSU. Arriving early at the showing, I was able to chat with him about the film and its effects. I asked him how he knew he was being effective. His response was to tell me about the new memberships in the Foundation, the number of showings, and the size of the audience seeing the film. Appropriate responses to my question. What I really wanted to know was how he knew he was making a difference. That is a different question, one which talks about change. Change is what programs like Green Fire are all about. It is what Aldo Leopold was all about (read A Sand County Almanac to understand Leopold’s position).

 

Change is what evaluation is all about. But did I ask the right question? How could I have phrased it differently to get at what change had occurred in the viewers of the film? Did new memberships in the Foundation demonstrate change? Knowing what question to ask is important for program planners as well as evaluators. There are often multiple levels of questions that could be asked–individual, programmatic, organizational, regional, national, global. Are they all equally important? Do they provide a means for gathering pertinent data? How are you going to use these data once you’ve gathered them? How carefully do you think about the questions you ask when you craft your logic model? When you draft a survey? When you construct questions for focus groups? Asking the right question will yield relevant answers. It will show you what difference you’ve made in the lives of your target audience.

 

Oh, and if you haven’t seen the film Green Fire or read the book A Sand County Almanac–I highly recommend them.

I regularly follow Harold Jarche’s blog.

Much of what he writes would not fall under the general topic of evaluation. Yet his post for February 18 does. That post is titled Why is learning and the sharing of information so important?

I see that as intimately related to evaluation, especially given Michael Quinn Patton’s focus on use. The way I see it, something can’t be used effectively unless one learns about it. Oh, I know you can use just about anything for anything–and I am reminded of the adage that when you have a hammer, everything looks like a nail, even if it isn’t.

That is not the kind of use I’m talking about.

I’m talking about rational, logical, systematic use based on thoughtful inquiry, critical thinking, and problem solving. I’m talking about making a difference because you have learned something new and valuable (remember the root of evaluation?). In his post, Jarche cites the Governor General of Canada, David Johnston, and Johnston’s article recently published in The Globe and Mail, a Toronto newspaper. What Johnston says makes sense. Evaluators in this context are diplomats, making learning accessible and sharing knowledge.

Sharing knowledge is what statistics is all about. If you think the field of statistics is boring, I urge you to check out the video called The Joy of Stats, presented by Swedish scholar Hans Rosling. I think you will have a whole new appreciation of statistics and the knowledge that can be conveyed. If you find Hans Rosling compelling (or even if you don’t), I urge you to check out his TED Talk. It is an eye-opener.

I think he makes a compelling argument about learning and sharing information.  About making a difference.  That is what evaluation is all about.

 

 

The GAO (Government Accountability Office) has a long and respected history of evaluation. Many luminaries at AEA (American Evaluation Association) have spent/are spending their professional careers at GAO. The GAO has just published (January 2012) its handbook on evaluation, called Designing Evaluations (2012 Revision). Nancy Kingsbury, a longtime AEA luminary, wrote the preface. For those of us who receive Federal money in any form (grants, contracts, Extension), this will be a worthwhile read. Fortunately, it is a relatively short read (the text is 61 pages, plus another 7 pages of appended material). This manuscript explains the “official” federal view of evaluation. It is always good to know what is expected. I highly recommend this read. The worst it could be is good bedtime reading…zzzzzz-zz-z.

I have a quandary.  Perhaps you have a solution.

I am the evaluator on a program where the funding agency wants clear, measurable, and specific outcomes.  (OK, you say) The funding agency program people were asked to answer the question, “What do you expect to happen as a result of the program?”

These folks responded with a programmatic equivalent of “world peace.” I virtually rolled my eyes. IMHO there was no way that this program would end in world peace. Not even no hunger (a necessary precursor to world peace). After I suggested that perhaps that goal was unattainable given the resources and activities intended, they came out of the fantasy world in which they were living and said, realistically, “We don’t know, exactly.” I probed further. The sites (several) were all different; the implementation processes (also several) were all different; the resources were all different (depending on site); and the list goes on. Oh, and the program was to be rolled out soon in another site without an evaluation of the previous sites. BUT THEY WANTED CLEAR, MEASURABLE, AND SPECIFIC OUTCOMES.

What would you do in this situation?  (I know what I proposed–there was lukewarm response.  I have an idea what would work–although the approach was not mainstream evaluation and these were mainstream folks.)  So I turn to you, Readers.  What would you do?  Let me know.  PLEASE.

 

Oh, and Happy Groundhog Day. I understand there will be six more weeks of winter (there was serious frost this morning in Corvallis, OR).

 

 

 

Recently, I’ve been dealing with several different logic models, which all use the box format. You know, the one that Ellen Taylor-Powell advocated in her UWEX tutorial. We are all familiar with this approach. And we all know that this approach helps conceptualize a program, identify program theory, and identify possible outcomes (maybe even world peace). Yet there is much more that can be done with logic models that isn’t in the tutorial. The tutorial starts us off with this diagram.

Inputs are what is invested; outputs are what is done; and outcomes are what results. And we assume (you KNOW what assumptions do, right?) that all the inputs lead to all the outputs, which lead to all the outcomes, because that is what the arrows show. NOT. One of the best approaches to logic modeling that I’ve seen and learned in the last few years is to make the inputs specific to the outputs and the outputs specific to the outcomes. It IS possible that volunteers are NOT the input you need to have the outcome you desire (change in social conditions); or they may be. OR volunteers will lead to an entirely different outcome–for example, only a change in knowledge, not in conditions. Connecting the resources specifically helps clarify for program people what is expected, what will be done, and with what resources.

Connecting those points with individual arrows and feedback loops (if appropriate) makes sense.

Jonny Morell suggests that these relationships may be 1:1, 1:many, many:1, or many:many, and/or may be classified by precedence (which he describes as A before B, A & B simultaneously, and agnostic with respect to procedure). If these relationships exist, and I believe they do, then just filling boxes isn’t a good idea. If you want to check out his PowerPoint presentation at the AEA site, you will have to join AEA, because the presentation is in the non-public eLibrary available only to members. However, I was able to copy and include the slide to which I refer (with permission).
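One way to see why specific connections matter is to write the logic model down as explicit links rather than three all-purpose boxes. The sketch below is my own illustration in Python, not Morell’s or Taylor-Powell’s notation; the inputs, outputs, and outcomes are placeholders.

```python
# Sketch: a logic model as explicit links instead of box-to-box arrows.
# Entries are illustrative placeholders, not a real program.
input_to_outputs = {
    # input -> the specific outputs it supports (1:1, 1:many, many:1 all possible)
    "volunteers":  ["workshops delivered"],
    "grant funds": ["workshops delivered", "curriculum written"],
    "staff time":  ["curriculum written"],
}

output_to_outcomes = {
    "workshops delivered": ["change in knowledge"],
    "curriculum written":  ["change in knowledge", "change in practice"],
}

def outcomes_reached(input_name):
    """Trace which outcomes a given input can plausibly reach."""
    outcomes = set()
    for output in input_to_outputs.get(input_name, []):
        outcomes.update(output_to_outcomes.get(output, []))
    return outcomes

print(outcomes_reached("volunteers"))   # {'change in knowledge'} -- not a change in conditions
```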



As you can see, it all depends.  Depends on the resources, the planned outputs, the desired outcomes.  Relationships are key.

And you thought logic models were simple.

 

For my new year’s post, I mentioned that AEA is running a series of blog posts in aea365 written by evaluators who blog. Susan Kistler has compiled a schedule of who will be blogging in aea365 and when. This link will take you to the full series and will be updated as new posts come online: http://aea365.org/blog/?s=bloggers+series&submit=Go. The result of Susan’s request is that evaluators who blog will post to aea365 one week a month, starting the last week in December. January posts will run January 22-27; February posts will run February 12-17; March, the 18th-23rd; April will run the 22nd-25th.

I’ve mentioned aea365 before. I’ll mention it again. You can subscribe either by email or RSS feed. The posts are archived. They are not specific to any one aspect of evaluation. Sometimes they are interesting and helpful; sometimes not. The variety is rich; the effort tremendous; and the resources useful. Check it out.

A colleague made a point last week that I want to bring to your attention. The comment made it clear that when planning a program, it is important to think about how you will determine what difference the program is making at the beginning of the program, not at the end.

Over the last two years, I’ve alluded to the fact that retrofitting evaluation, while possible, is not ideal. Granted, sometimes programs are already in place and it is important to report the difference the program made, so evaluation needs to be retrofitted. Sometimes programs have been in place a long time and need to show long-term outcomes (even if they are called impacts). In cases like that, yes, evaluation needs to be retrofitted. What this colleague was talking about was a NEW program, one that has never been presented before.

There are lots of ways to get the answer to the question, “What difference is this program making?”  We are not going to talk about methods today, though.  We are going to talk about programs and how programs relate to evaluation.

When I start to talk about evaluation with a faculty member, I ask them what they expect to happen. If they understand the program theory, they can describe what outcome is expected. This is when I pull out the model below.

This model shows the logical linkage between what is expected (outcomes) and what was done with whom (outputs) and with what resources (inputs), if you follow the arrow right to left. If, however, you follow the arrow left to right, you see what resources you need to conduct what activities with whom to expect what outcomes. Each box (inputs, outputs, outcomes) has an evaluative activity that accompanies it. In the situation (the need that gives rise to the program), a needs assessment is the evaluative activity: here you are determining what needs to change between what is and what should be. With the inputs (resources), you can do a variety of activities; specifically, you can determine whether you had enough. You can also do a cost analysis (there are several kinds). You can also do a process evaluation. With outputs, you can determine whether you did what you said you would do, in the time you said you would do it, and with the target audience; I have always called this a progress evaluation. With outcomes, you actually determine what difference the program made in the lives of the target audience; for teaching purposes, I have called this a product evaluation. Here, you want to know whether what they know is different, whether what they do is different, and whether the conditions in which they work, live, and play are different. You do that by thinking first about what the program will do.
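As a memory aid, the pairing of logic-model components with evaluative activities described above can be laid out as a simple lookup in code. The labels below are my own shorthand for the ideas in this post, nothing more.

```python
# Sketch: which evaluative activity goes with which part of the logic model,
# following the pairing described above (labels are illustrative shorthand).
evaluation_by_stage = {
    "situation": "needs assessment (gap between what is and what should be)",
    "inputs":    "resource adequacy review, cost analysis, process evaluation",
    "outputs":   "progress evaluation (did you do what you said, on time, with the target audience?)",
    "outcomes":  "product evaluation (what difference did the program make?)",
}

for stage, activity in evaluation_by_stage.items():
    print(f"{stage:>9}: {activity}")
```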

 

Now this is all very well and good–if you have some idea about what the specific and  measurable outcomes are.  Sometimes you won’t know this because the program has never been done before in quite the way you are doing it OR because the program is developing as you provide it.  (I’m sure there is a third reason–there always is–only I can’t think of one as I type.)

This is why planning evaluation when you are planning the program is important.

 

Starting this week, aea365 is posting a series of posts authored by evaluators who blog.  Check it out!

 

There will be a lot of different approaches starting with Susan Kistler, Executive Director of the American Evaluation Association, who blogs every Saturday for aea365.  She has been doing this for almost two years.

 

So even though I’m not blogging on a topic this week (see last week’s post), I wanted to share this with you. What a good way to start a new year–new resources for evaluators.