Quantitative data analysis is what happens to data that start as numbers. (Qualitative data can be reduced to numbers, but here I’m talking about data collected as numbers.) Recently, a library colleague sent me an article relevant to something evaluators often do–analyze numbers.

This article refers specifically to three metrics that are often overlooked by Extension faculty: margin of error (MoE), confidence level (CL), and cross-tabulation analysis. These are three tools that will help you in your work. The article also does a nice job of listing eight recommended best practices, which I’ve appended here with only some of the explanatory text.

## Complete List of Best Practices for Analyzing Multiple Choice Surveys

1. Inferential statistical tests. To be more certain of the conclusions drawn from survey data, use inferential statistical tests.

2. Confidence Level (CL). Choose your desired confidence level (typically 90%, 95%, or 99%) based upon the purpose of your survey and how confident you need to be of the results. Once chosen, don’t change it unless the purpose of your survey changes. Because the chosen confidence level is part of the formula that determines the margin of error, it’s also important to document the CL in your report or article where you document the margin of error (MoE).

3. Estimate your ideal sample size before you survey. Before you conduct your survey, use a sample size calculator specifically designed for surveys to determine how many responses you will need to meet your desired confidence level with your hypothetical (ideal) margin of error (usually 5%).

4. Determine your actual margin of error after you survey. Use a margin of error calculator specifically designed for surveys (such as the Raosoft online calculator).

6. Apply the chi-square test to your crosstab tables to see if there are relationships among the variables that are not likely to have occurred by chance.

7. Reading and reporting chi-square tests of cross-tab tables.

• Use the .05 threshold for your chi-square p-value results in cross-tab table analysis.
• If the chi-square p-value is larger than the threshold value, no relationship between the variables is detected. If the p-value is smaller than the threshold value, there is a statistically valid relationship present, but you need to look more closely to determine what that relationship is. Chi-square tests do not indicate the strength or the cause of the relationship.
• Always report the p-value somewhere close to the conclusion it supports (in parentheses after the conclusion statement, or in a footnote, or in the caption of the table or graph).
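The arithmetic behind practices 2 through 7 is simple enough to sketch in a few lines. What follows is a minimal illustration, not a replacement for a dedicated survey calculator: the normal-approximation formulas (with finite population correction) mirror what online calculators such as Raosoft use, and every count in it is invented.

```python
import math

# z-scores for common two-tailed confidence levels (practice 2)
Z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}

def sample_size(population, confidence=0.95, moe=0.05, p=0.5):
    """Practice 3: ideal sample size for a finite population."""
    z = Z[confidence]
    n0 = z**2 * p * (1 - p) / moe**2                    # infinite-population estimate
    return math.ceil(n0 / (1 + (n0 - 1) / population))  # finite population correction

def margin_of_error(population, responses, confidence=0.95, p=0.5):
    """Practice 4: actual margin of error once the responses are in."""
    z = Z[confidence]
    fpc = math.sqrt((population - responses) / (population - 1))
    return z * math.sqrt(p * (1 - p) / responses) * fpc

def chi_square_2x2(table):
    """Practices 6-7: chi-square statistic for a 2x2 cross-tab (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c)**2 / ((a + b) * (c + d) * (a + c) * (b + d))

# hypothetical survey: a population of 500 people
needed = sample_size(500)               # responses needed at 95% CL, 5% MoE
actual_moe = margin_of_error(500, 218)  # MoE actually achieved with 218 responses

# hypothetical cross-tab: rows are respondent groups, columns are answer choices
stat = chi_square_2x2([[30, 10], [15, 25]])
related = stat > 3.841  # critical value for p = .05 with 1 degree of freedom
```

Note that `related` being true only tells you the relationship is unlikely to be chance; as practice 7 says, you still have to look at the table to see what that relationship is.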

Hightower, C., & Kelly, S. (2012, Spring). Infer more, describe less: More powerful survey conclusions through easy inferential tests. Issues in Science and Technology Librarianship. DOI: 10.5062/F45H7D64. Available at http://www.istl.org/12-spring/article1.html

A colleague asks, “What is the appropriate statistical test when comparing the means of two groups?”

I’m assuming (yes, I know what assuming does) that parametric tests are appropriate for what the colleague is doing. Parametric tests (e.g., the t-test, ANOVA) are appropriate when the parameters of the population are known. If that is the case (and non-parametric tests are not being considered), I need to clarify the assumptions underlying parametric tests, which are more stringent than those of nonparametric tests. Those assumptions are the following:

The sample is

1. randomized (either by assignment or selection).
2. drawn from a population which has specified parameters.
3. normally distributed.
4. characterized by equality of variance in each variable.

If those assumptions are met, the pat answer is, “It all depends.” (I know you have heard that before today.)

I will ask the following questions:

1. Do you know the parameters (measures of central tendency and variability) for the data?
2. Are they dependent or independent samples?
3. Are they intact populations?

Once I know the answers to these questions I can suggest a test.

My current favorite statistics book, Statistics for People Who (Think They) Hate Statistics by Neil J. Salkind (4th ed.), has a flow chart that helps by asking whether you are looking at differences between the sample and the population, or at relationships or differences between one or more groups. The flow chart ends with the name of a statistical test. The caveat is that you are working with a sample from a larger population that meets the above-stated assumptions.

How you answer the questions above determines what test you can use. If you do not know the parameters, you will NOT use a parametric test. If you are working with an intact population (and many Extension professionals use intact populations), you will NOT use inferential statistics, as you will not be inferring to anything larger than what you have at hand. If you have two groups and the groups are related (like a pre-post test or a post-pre test), you will use a parametric or non-parametric test for dependent samples. If you have two groups and they are unrelated (like boys and girls), you will use a parametric or non-parametric test for independent samples. If you have more than two groups, you will use a different test yet.
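As a concrete illustration of that dependent/independent distinction, here is what the two t statistics look like computed by hand with only the standard library. The scores are invented; in practice you would let SPSS, SAS, or Excel do this and report the resulting p-value.

```python
import math
from statistics import mean, stdev

def paired_t(pre, post):
    """t statistic for dependent (paired) samples, e.g., a pre/post test."""
    diffs = [b - a for a, b in zip(pre, post)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def independent_t(group1, group2):
    """t statistic for two independent samples (pooled variance; assumes equal variances)."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = stdev(group1) ** 2, stdev(group2) ** 2
    pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / (pooled * math.sqrt(1 / n1 + 1 / n2))

# invented scores: the same four people before and after a workshop (dependent)
t_dep = paired_t([1, 2, 3, 4], [2, 4, 3, 6])

# invented scores: two unrelated groups (independent)
t_ind = independent_t([1, 2, 3], [4, 5, 6])
```

The point of the sketch is the fork in the road, not the arithmetic: related groups get the paired statistic, unrelated groups get the pooled one.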

Extension professionals are rigorous in their content material; they need to be just as rigorous in their analysis of the data collected about that content. Knowing what analyses to use when is a good skill to have.

I came across this quote from Viktor Frankl today (thanks to a colleague):

“…everything can be taken from a man (sic) but one thing: the last of the human freedoms – to choose one’s attitude in any given set of circumstances, to choose one’s own way.” Viktor Frankl (Man’s Search for Meaning – p.104)

I realized that,  especially at this time of year, attitude is everything–good, bad, indifferent–the choice is always yours.

How we choose to approach anything depends upon our previous experiences–what I call personal and situational bias. Sadler* has three classifications for these biases: value inertias (unwanted distorting influences that reflect background experience), ethical compromises (actions for which one is personally culpable), and cognitive limitations (not knowing, for whatever reason).

When we approach an evaluation, our attitude leads the way.  If we are reluctant, if we are resistant, if we are excited, if we are uncertain, all these approaches reflect where we’ve been, what we’ve seen, what we have learned, what we have done (or not).  We can make a choice how to proceed.

The American Evaluation Association (AEA) has long had a history of supporting difference. That value is embedded in the guiding principles. The two principles which address supporting differences are

• Respect for People:  Evaluators respect the security, dignity, and self-worth of respondents, program participants, clients, and other evaluation stakeholders.
• Responsibilities for General and Public Welfare: Evaluators articulate and take into account the diversity of general and public interests and values that may be related to the evaluation.

AEA also has developed a Cultural Competence statement.  In it, AEA affirms that “A culturally competent evaluator is prepared to engage with diverse segments of communities to include cultural and contextual dimensions important to the evaluation. Culturally competent evaluators respect the cultures represented in the evaluation.”

Both of these documents provide a foundation for the work we do as evaluators, and both relate to our personal and situational biases. Considering them as we make our choice about attitude will help minimize the biases we bring to our evaluation work. The evaluative question from all this: When have your personal and situational biases interfered with your work in evaluation?

Attitude is always there–and it can change.  It is your choice.

Sadler, D. R. (1981). Intuitive data processing as a potential source of bias in naturalistic evaluations.  Education Evaluation and Policy Analysis, 3, 25-31.

I am reading the book Eaarth, by Bill McKibben (a NY Times review is here). He writes about making a difference in the world on which we live. He provides numerous examples, all of which have happened in the 21st century, none of them positive or encouraging. He makes the point that the place in which we live today is not, and never will be again, like the place in which we lived when most of us were born. He talks about not saving the Earth for our grandchildren but rather how our parents needed to have done things to save the Earth for them–that it is too late for the grandchildren. Although this book is very discouraging, it got me thinking.

Isn’t making a difference what we as Extension professionals strive to do?

Don’t we, like McKibben, need criteria to determine what that difference could be and what it would look like?

And if we have those criteria well established, won’t we be able to make a difference, hopefully a positive one (think hand washing here)? Won’t that difference be worth the effort we have put into the attempt? Especially if we thoughtfully plan how to determine what that difference is?

We might not be able to recover (according to McKibben, we won’t) the Earth the way it was when most of us were born; I think we can still make a difference–a positive difference–in the lives of the people with whom we work.  That is an evaluative opportunity.

I was talking with a colleague about evaluation capacity building (see last week’s post) and the question was raised about thinking like an evaluator. That got me thinking about the socialization of professions and what has to happen to build a critical mass of like-minded people.

Certainly, preparatory programs in academia, conducted by experts–people who have worked in the field a long time, or at least longer than you–start the process. Professional development helps: attending meetings where evaluators meet (like the upcoming AEA conference, the U.S. regional affiliates [there are many, and they have conferences and meetings, too], and international organizations [increasing in number, which also host conferences and professional development sessions]–let me know if you want to know more about these opportunities). Reading new and timely literature on evaluation provides insights into the language. AND looking for the evaluative questions in everyday activities. Questions such as: What criteria? What standards? Which values? What worth? Which decisions?

The socialization of evaluators happens because people who are interested in being evaluators look for the evaluation questions in everything they do.  Sometimes, looking for the evaluative question is easy and second nature–like choosing a can of corn at the grocery store; sometimes it is hard and demands collaboration–like deciding on the effectiveness of an educational program.

My recommendation is to start with easy things–corn, chocolate chip cookies, wine, tomatoes; then move to harder things with more variables–what to wear when and where, or whether to include one group or another. The choices you make will all depend upon what criteria are set, what standards have been agreed upon, and what value you place on the outcome or what decision you make.
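The can-of-corn choice can even be written down mechanically. This toy sketch (the criteria, weights, and scores are all invented for illustration) shows how the choice falls out once criteria and standards are agreed upon:

```python
# Toy "can of corn" decision; criteria, weights, and scores are invented.
criteria = {"price": 0.5, "sodium": 0.3, "organic": 0.2}  # weights: what you value

cans = {
    "Brand A": {"price": 0.9, "sodium": 0.4, "organic": 0.0},
    "Brand B": {"price": 0.6, "sodium": 0.8, "organic": 1.0},
}

def score(can):
    """Weighted score against the agreed-upon criteria (higher is better)."""
    return sum(weight * can[name] for name, weight in criteria.items())

best = max(cans, key=lambda label: score(cans[label]))
```

Change the weights (the standards you value) and a different can wins, which is exactly the point: the evaluative questions come before the choice.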

The socialization process is like a puzzle, something that takes a while to complete, something that is different for everyone, yet ultimately the same.  The socialization is not unlike evaluation…pieces fitting together–criteria, standards, values, decisions.  Asking the evaluative questions  is an ongoing fluid process…it will become second nature with practice.

Hopefully, the technical difficulties with images are no longer a problem, and I will be able to post the answers to the history quiz and the post I had hoped to publish last week. So, as promised, here are the answers to the quiz I posted the week of July 5. The keyed responses are in BOLD.

1. Michael Quinn Patton, author of Utilization-Focused Evaluation, the new book Developmental Evaluation, and the classic Qualitative Evaluation and Research Methods.

2. Michael Scriven is best known for his concept of formative and summative evaluation. He has also advocated that evaluation is a transdiscipline. He is the author of the Evaluation Thesaurus.

3. Hallie Preskill is the co-author (with Darlene Russ-Eft) of Evaluation Capacity Building

4. Robert E. Stake has advanced work in case study and is the author of the books Multiple Case Study Analysis and The Art of Case Study Research.

5. David M. Fetterman is best known for his advocacy of empowerment evaluation and the book of that name, Foundations of Empowerment Evaluation .

6. Daniel Stufflebeam developed the CIPP (context input process product) model which is discussed in the book Evaluation Models .

7. James W. Altschuld is the go-to person for needs assessment. He is the editor of the Needs Assessment Kit (or everything you wanted to know about needs assessment and didn’t know where to find the answer). He is also the co-author, with Belle Ruth Witkin, of two needs assessment books.

8. Jennifer C. Greene, the current President of the American Evaluation Association and the author of a book on mixed methods.

9. Ernest R. House is a leader in the work of evaluation policy and is the author of an evaluation novel, Regression to the Mean.

10. Lee J. Cronbach is a pioneer in education evaluation and the reform of that practice.  He co-authored with several associates the book, Toward Reform of Program Evaluation .

11. Ellen Taylor-Powell, the former Evaluation Specialist at University of Wisconsin Extension, is credited with developing the logic model later adopted by the USDA for use by the Extension Service. To go to the UWEX site, click on the words “logic model”.

12. Yvonna Lincoln, with her husband Egon Guba (see below), co-authored the book Naturalistic Inquiry. She is currently the co-editor (with Norman K. Denzin) of the Handbook of Qualitative Research.

13. Egon Guba, with his wife Yvonna Lincoln, is the co-author of Fourth Generation Evaluation.

14. Blaine Worthen has championed certification for evaluators. He, with Jody L. Fitzpatrick and James R. Sanders, co-authored Program Evaluation: Alternative Approaches and Practical Guidelines.

15.  Thomas A. Schwandt, a philosopher at heart who started as an auditor, has written extensively on evaluation ethics. He is also the co-author (with Edward S. Halpern) of Linking Auditing and Metaevaluation.

16. Peter H. Rossi, co-author with Howard E. Freeman and Mark W. Lipsey of Evaluation: A Systematic Approach, is a pioneer in evaluation research.

17. W. James Popham, a leader in educational evaluation, authored the volume Educational Evaluation.

18. Jason Millman was a pioneer of teacher evaluation and the author of the Handbook of Teacher Evaluation.

19. William R. Shadish co-edited (with Laura C. Leviton and Thomas Cook) Foundations of Program Evaluation: Theories of Practice. His work in theories of evaluation practice earned him the American Evaluation Association’s Paul F. Lazarsfeld Award for Evaluation Theory in 1994.

20. Laura C. Leviton (co-editor, with Will Shadish and Tom Cook–see above–of Foundations of Program Evaluation: Theories of Practice) has pioneered work in participatory evaluation.

Although I’ve listed only 20 leaders–movers and shakers–in the evaluation field, there are others who also deserve mention: John Owen, Deb Rog, Mark Lipsey, Mel Mark, Jonathan Morell, Midge Smith, Lois-Ellin Datta, Patricia Rogers, Sue Funnell, Jean King, Laurie Stevahn, John McLaughlin, Michael Morris, Nick Smith, Don Dillman, and Karen Kirkhart, among others.

If you want to meet the movers and shakers, I suggest you attend the American Evaluation Association annual meeting.  In 2011, it will be held in Anaheim CA, November 2 – 5; professional development sessions are being offered October 31, November 1 and 2, and also November 6.  More conference information can be found here.

We recently held Professional Development Days for the Division of Outreach and Engagement.  This is an annual opportunity for faculty and staff in the Division to build capacity in a variety of topics.  The question this training posed was evaluative:

How do we provide meaningful feedback?

Evaluating a conference or a multi-day, multi-session training is no easy task.  Gathering meaningful data is a challenge.  What can you do?  Before you hold the conference (I’m using the word conference to mean any multi-day, multi-session training), decide on the following:

• Are you going to evaluate the conference?
• What is the focus of the evaluation?
• How are you going to use the results?

The answer to the first question is easy: YES. If the conference is an annual (or otherwise regular) event, you will want participants’ feedback on their experience, so, yes, you will evaluate the conference. Look at Penn State Tip Sheet 16 for some suggestions. (If this is a one-time event, you may not; though as an evaluator, I wouldn’t recommend ignoring evaluation.)

The second question is more critical.  I’ve mentioned in previous blogs the need to prioritize your evaluation.  Evaluating a conference can be all consuming and result in useless data UNLESS the evaluation is FOCUSED.  Sit down with the planners and ask them what they expect to happen as a result of the conference.  Ask them if there is one particular aspect of the conference that is new this year.  Ask them if feedback in previous years has given them any ideas about what is important to evaluate this year.

This year, the planners wanted to provide specific feedback to the instructors, who had asked for it in previous years. This is problematic if evaluative activities for individual sessions are not planned before the conference. Nancy Ellen Kiernan, a colleague at Penn State, suggests a qualitative approach called a Listening Post, which elicits feedback from participants at the conference itself. This method involves volunteers who attended the sessions and may take more people than a survey. To use a Listening Post, you must plan ahead of time to gather these data. Otherwise, you will need to do a survey after the conference is over, and that raises other problems.

The third question is also very important. If the results are just handed to the supervisor, the likelihood of their being used by individuals for session improvement or by organizers for overall change is slim. Making the data usable for instructors means summarizing the data in a meaningful way, often visually. There are several ways to present survey data visually, including graphs, tables, and charts. More on that another time. Words often get lost, especially if words dominate the report.
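As a sketch of what “summarizing in a meaningful way” can look like, here is a minimal text-based summary an instructor could scan at a glance. The ratings are invented, and a bar chart built in a spreadsheet would serve the same purpose:

```python
from collections import Counter

def summarize(ratings):
    """Turn a list of 1-5 session ratings into a mean and a scannable text bar chart."""
    counts = Counter(ratings)
    avg = sum(ratings) / len(ratings)
    lines = [f"{score}: {'#' * counts.get(score, 0)} ({counts.get(score, 0)})"
             for score in range(5, 0, -1)]  # highest rating first
    return avg, "\n".join(lines)

# invented ratings for one concurrent session
avg, chart = summarize([5, 4, 4, 3, 5, 2, 4])
```

A row of `#` marks per rating level conveys the shape of the responses faster than a paragraph describing them, which is the whole argument for visual summaries.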

There is a lot of information in the training and development literature that might also be helpful. The Kirkpatricks have done a lot of work in this area; I’ve mentioned their work in previous blogs.

There is no one best way to gather feedback from conference participants.  My advice:  KISS–keep it simple and straightforward.

Last week, I spoke about how-to questions and applying them to program planning, evaluation design, evaluation implementation, data gathering, data analysis, report writing, and dissemination. I only covered the first four of those topics. This week, I’ll give you my favorite resources for data analysis.

This list is more difficult to assemble. This is typically where the knowledge links break down and interest is lost. The thinking goes something like this: I’ve conducted my program, I’ve implemented the evaluation–now what do I do? I know my program is a good program, so why do I need to do anything else?

YOU  need to understand your findings.  YOU need to be able to look at the data and be able to rigorously defend your program to stakeholders.  Stakeholders need to get the story of your success in short clear messages.  And YOU need to be able to use the findings in ways that will benefit your program in the long run.

Remember the list from last week?  The RESOURCES for EVALUATION list?  The one that says:

2.  Listen to stakeholders–that means including them in the planning.

Good. This list still applies, especially the “read” part. Here are the readings for data analysis.

First, it is important to know that there are two kinds of data–qualitative (words) and quantitative (numbers). (As an aside, many folks think descriptive words become quantitative data once they are assigned numbers for coding purposes–they are still words, so treat them like words, not numbers.)

• Qualitative data analysis. When I needed to learn what to do with qualitative data, I was given Miles and Huberman’s book. (Sadly, both authors are deceased, so there will not be a revision of their 2nd edition, although the book is still available.)

Citation: Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.). Thousand Oaks, CA: Sage Publications.

Fortunately, there are newer options, which may be as good.  I will confess, I haven’t read them cover to cover at this point (although they are on my to-be-read pile).

Citation:  Saldana, J.  (2009). The coding manual for qualitative researchers. Los Angeles, CA: Sage.

Bernard, H. R. & Ryan, G. W. (2010).  Analyzing qualitative data. Los Angeles, CA: Sage.

If you don’t feel like tackling one of these resources, Ellen Taylor-Powell has written a short piece  (12 pages in PDF format) on qualitative data analysis.

There are software programs for qualitative data analysis that may be helpful (Ethnograph, NUD*IST, and others). Most people I know prefer to code manually; even if you use a software program, you will need to do a lot of coding manually first.

• Quantitative data analysis. Quantitative data analysis is just as complicated as qualitative data analysis.  There are numerous statistical books which explain what analyses need to be conducted.  My current favorite is a book by Neil Salkind.

Citation: Salkind, N. J. (2004). Statistics for people who (think they) hate statistics (2nd ed.). Thousand Oaks, CA: Sage Publications.

NOTE: there is a 4th edition with a 2011 copyright available, and he also has a version of this text that features Excel 2007. I like Chapter 20 (The Ten Commandments of Data Collection) a lot. He doesn’t talk about the methodology; he talks about logistics, and considering the logistics of data collection is really important.

Also, you need to become familiar with a quantitative data analysis software program–like SPSS, SAS, or even Excel. One copy goes a long way–you can share the cost and share the program, as long as only one person is using it at a time. Excel comes with Microsoft Office. Each of these programs has tutorials to help you.

Historically, April 15 is tax day (although in 2011 it is April 18)–the day taxes are due to the revenue departments.

State legislatures are dealing with budgets and Congress is trying to balance a  Federal budget.

Everywhere one looks, money is the issue–this is especially true in these recession-ridden times. How does all this relate to evaluation, you ask? That is the topic for today’s blog: how money figures into evaluation.

Let’s start with the simple and move to the complex. Everything costs–and although I’m talking about money here, time, personnel, and resources (like paper, staples, electricity, etc.) must also be taken into consideration.

When we talk about evaluation, four terms typically come to mind:  efficacy, effectiveness, efficiency, and fidelity.

Efficiency is the term that addresses money or costs. Was the program efficient in its use of resources? That is the question asked when addressing efficiency.

To answer that question, there are (at least) three approaches:

1. Cost analysis;
2. Cost-effectiveness analysis; and
3. Cost-benefit analysis.

Simply then:

1. Cost analysis is the number of dollars it takes to deliver the program, including the salary of the individual(s) planning the program.
2. Cost-effectiveness analysis is a ratio of the target outcomes, measured in an appropriate unit, to the costs.
3. Cost-benefit analysis is also a ratio of the costs of the program to its benefits, with both measured in the same units, usually money.

How are these computed?

1. Cost can be measured by how much the consumer is willing to pay. Costs can be the value of each resource consumed in the implementation of the program. Or cost analysis can be “measuring costs so they can be related to procedures and outcomes” (Yates, 1996, p. 25). So you list the money spent to implement the program, including salaries, and that is a cost analysis. Simple.
2. Cost-effectiveness analysis says that there is some metric in which the outcomes are measured (the number of times hands are washed during the day, for example), and that metric is put in ratio to the total costs of the program. So movement from washing hands only once a day (a bare minimum) to washing hands at least six times a day would have the costs of the program (including salaries) divided by the change in the number of daily hand washings (i.e., 5). The resulting value is the cost-effectiveness ratio. Complex.
3. Cost-benefit analysis puts the outcomes in the same metric as the costs–in this case, dollars. The costs (in dollars) of the program (including salaries) are put in ratio to the outcomes (usually benefits) measured in dollars. The challenge here is assigning a dollar amount to the outcomes: how much is frequent hand washing worth? It is often measured in days saved from communicable, chronic, or acute illnesses. The computation of healthy days (the reduction in days affected by illness) is often difficult to value in dollars. There is a whole body of literature in health economics on this topic, if you’re interested. Complicated and complex.
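Put as arithmetic, with entirely hypothetical numbers for the hand-washing example:

```python
# Hypothetical hand-washing program; all figures are invented for illustration.
program_cost = 12_000.00    # 1. cost analysis: dollars spent, salaries included

# 2. cost-effectiveness: the outcome stays in its natural unit
gain_in_washes_per_day = 5  # movement from 1 wash/day to 6
cost_per_added_wash = program_cost / gain_in_washes_per_day

# 3. cost-benefit: the outcome converted to dollars (the hard part)
benefit_dollars = 30_000.00  # e.g., estimated value of illness days avoided
benefit_cost_ratio = benefit_dollars / program_cost
```

Everything interesting hides in `benefit_dollars`: assigning that number is exactly where the health economics literature comes in.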

You’ve developed your program. You think you’ve met a need. You conduct an evaluation. Lo and behold! Some of your respondents give you such negative feedback you wonder what program they attended. Could it really have been your program?

This is the phenomenon I call “all of the people all of the time,” which occurs regularly in evaluating training programs. And it has to do with use–what you do with the results of this evaluation. And you can’t do it–please all of the people all of the time, that is. There will always be some sour grapes. In fact, you will probably have more negative comments than positive ones. People who are upset want you to know; people who are happy are just happy.

Now, I’m sure you are really confused.  Good.  At least I’ve got your attention and maybe you’ll read to the end of today’s post.

You have seen this scenario: You ask the participants for formative data so that you can begin planning the next event or program. You ask about the venue, the time of year, the length of the conference, the concurrent offerings, the plenary speakers. Although some of these data are satisfaction data (the first level, called Reaction, in Don Kirkpatrick’s training model, and the Reaction category in Claude Bennett’s TOP hierarchy), they are an important part of formative evaluation and an important part of program planning. You are using the evaluation report. That is important. You are not asking if the participants learned something. You are not asking if they intend to change their behavior. You are not asking about what conditions have changed. You only want to know about their experience in the program.

What do you do with the sour grapes? You could make vinegar, only that won’t be very useful, and use is what you are after. Instead, sort the data into the topics over which you have some control and those over which you have none. For example, you have control over who is invited to be a plenary speaker, whether there will be a plenary speaker, how many concurrent sessions there are, and who will teach those sessions; you have no control over the air handling at the venue, the chairs at the venue, and, probably, the temperature of the venue.
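The sorting itself is trivial once each comment is tagged with a topic; the real work is the control/no-control judgment. A minimal sketch, with invented topics and comments:

```python
# Sort feedback by whether the planners can act on it.
# The topic tags and comments are invented examples.
CONTROLLABLE = {"speaker", "sessions", "schedule"}  # topics the planners can change

comments = [
    ("speaker",  "Plenary speaker was terrible"),
    ("venue",    "Room was too cold"),
    ("sessions", "Nothing offered for classified staff"),
]

actionable = [text for topic, text in comments if topic in CONTROLLABLE]
parked = [text for topic, text in comments if topic not in CONTROLLABLE]
```

The `actionable` pile drives next year’s changes; the `parked` pile is what you acknowledge (perhaps in a white paper to participants) without promising to fix.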

You can CHANGE the topics over which you have control. Comments say the plenary speaker was terrible? Do not invite that person to speak again. Feedback says the concurrent sessions didn’t provide options for classified staff, only faculty? Decide the focus of your program and be explicit in the promotional materials–advertise it explicitly to your target audience. You get complaints about the venue? Perhaps there is another venue; perhaps not.

You can also let your audience know what you decided based on your feedback.  One organization for which I volunteered sent out a white paper with all the concerns and how the organization was addressing them–or not.  It helped the grumblers see that the organization takes their feedback seriously.

And if none of this works…ask yourself: Is it a case of all of the people all of the time?