Ever wonder where the 0.05 probability level came from? Ever wonder if it is the best number? How many of you were taught in your introductory statistics course that 0.05 is the probability level necessary for rejecting the null hypothesis of no difference? That confidence may be spurious. As Paul Bakker indicates in the AEA 365 blog post for March 28, "Before you analyze your data, discuss with your clients and the relevant decision makers the level of confidence they need to make a decision." Do they really need to be 95% confident? Or would 90% confidence be sufficient? What about 75% or even 55%?
Think about it for a minute. If you were a brain surgeon, you wouldn't want anything less than 99.99% confidence; if you were looking at the level of risk for a stock market investment, 55% would probably make you a lot of money. The academic community has held to and used the probability level of 0.05 for years (the computation of the p-value dates back to the 1770s). (Quoting Wikipedia, "In the 1770s Laplace considered the statistics of almost half a million births. The statistics showed an excess of boys compared to girls. He concluded by calculation of a p-value that the excess was a real, but unexplained, effect.") Fisher first proposed the 0.05 level in 1925, establishing a one-in-20 limit for statistical significance when considering a two-tailed test. Sometimes the academic community makes the probability level even more restrictive, using 0.01 or 0.001 to demonstrate that the findings are significant. Scientific journals typically expect 95% confidence, that is, a probability level of 0.05 or smaller.
Although I have held to these levels, especially when I publish a manuscript, I have often wondered if they make sense. If I am only curious about a difference, do I need 0.05? Or could I use 0.10 or 0.15 or even 0.20? I have often asked students whether they are conducting confirmatory or exploratory research. Confirmatory research, I think, expects a more stringent probability level; exploratory research can tolerate a less stringent one. The 0.05 seems so arbitrary.
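To make the arbitrariness concrete, here is a small Python sketch. The scores are made up purely for illustration, and the normal approximation via `statistics.NormalDist` stands in for a proper t-test just to keep the example dependency-free; the point is only to show how the same data flip between "significant" and "not significant" as the chosen probability level changes.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical scores from two program groups (illustrative data only).
group_a = [72, 75, 78, 80, 82, 84, 85, 88, 90, 91]
group_b = [70, 71, 74, 76, 77, 79, 80, 81, 83, 85]

def two_tailed_p(a, b):
    """Approximate two-tailed p-value for a difference in means,
    using a normal approximation in place of the t distribution."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = two_tailed_p(group_a, group_b)
for alpha in (0.01, 0.05, 0.10, 0.20):
    verdict = "reject H0" if p < alpha else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: p = {p:.3f} -> {verdict}")
```

With these particular numbers the difference is "real" to an exploratory researcher working at 0.10 but invisible to a journal reviewer holding the 0.05 line, which is exactly the question: whose threshold should govern the decision?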
Then there is the grounded theory approach, which doesn't use a probability level at all. It generates theory from categories, which are generated from concepts, which are identified from data, usually qualitative in nature. It uses language like fit, relevance, workability, and modifiability. It does not report statistically significant probabilities because it doesn't use inferential statistics. Instead, it uses a series of probability statements about the relationships between concepts.
So what do we do? What do you do? Let me know.
We are four months into 2013 and I keep asking the question “Is this blog making a difference?” I’ve asked for an analytic report to give me some answers. I’ve asked you readers for your stories.
Let’s hear it for SEOs and how they pick up that title–I credit that for the number of comments I’ve gotten. I AM surprised at the number of comments I have received since January (hundreds, literally). Most say things like, “Of course it is making a difference.” Some compliment me on my writing style. Some are in a foreign language I cannot read (I am illiterate when it comes to Cyrillic, Arabic, Greek, Chinese, and other non-English alphabets). Some are marketing–wanting pingbacks to their recently started blogs for some product. Some have commented specifically on the content (sample size and confidence intervals); some have commented on the time of year (vernal equinox). Occasionally, I get a comment like the one below, and I keep writing.
The questions of all questions… Do I make a difference? I like how you write and let me answer your question. Personally I was supposed to be dead ages ago because someone tried to kill me for the h… of it … Since then (I barely survived) I have asked myself the same question several times and every single time I answer with YES. Why? Because I noticed that whatever you do, there is always someone using what you say or do to improve their own life. So, I can answer the question for you: Do you make a difference? Yes, you do, because there will always be someone who uses your writings to do something positive with it. So, I hope I just made your day! And needless to say, keep the blog posts coming!
Enough update. New topic: I just got a copy of the third edition of Miles and Huberman (my go-to reference for qualitative data analysis). Wait, you say–Miles and Huberman are dead–yes, they are. Johnny Saldana (there needs to be a ~ above the “n” in his name, only I don’t know how to do that with this keyboard) was approached by Sage to be the third author and revise and update the book. A good thing, I think. Miles and Huberman’s second edition was published in 1994. That is almost 20 years. I’m eager to see if it will hold as a classic given that there are many other books on qualitative coding in press currently. (The spring research flyer from Guilford lists several on qualitative inquiry and analysis from some established authors.)
I also recently sat in on a research presentation by a candidate for a tenure-track position here at OSU who talked about how the analysis of qualitative data was accomplished. Took me back to when I was learning–index cards and sticky notes. Yes, there are marvelous software programs out there (NVivo, Ethnograph, NUD*IST); still, I will support the argument that the best way to learn about your qualitative data is to immerse yourself in it with color-coded index cards and sticky notes. Then you can use the software to check your results. Keep in mind, though, that you are the PI and you will bring many biases to the analysis of your data.
Harold Jarche shared in his blog a comment by a participant in one of his presentations. The comment is:
Knowledge is evolving faster than can be codified in formal systems and is depreciating in value over time.
This is really important for those of us who love the printed word (me) and teach (me and you). A statement like this tells us that we are out of date the moment we open our mouths; the institutions on which we depended for information (schools, libraries, even churches) are now passé.
The exponential growth of knowledge is much like that of population. I think this graphic image of population (by Waldir) is pretty telling (click on the image to read the fine print). The evaluative point that this brings home to me is the delay in making information available.
When you say, “Look it up,” do you (like me) think web, not press, books, library, hard copy? Do you (like me) wonder how and where this information originated when it is so cutting edge? Do you (like me) wonder how to keep up, or even if you can? Books take over a year to come to fruition (I think a two-year frame is more representative). Journal manuscripts take 6 to 9 months on a quick journal turnaround. Blogs are faster, and they express opinion; could they be a source of information?
I’ve decided to go to an advanced qualitative data seminar this summer as part of my professional development because I’m using more and more qualitative data (I still use quantitative data, too). It is supposed to be cutting edge. The book on which the seminar is based won’t be published until next month (April). How much information has been developed since that book went to press? How much information will be shared at the seminar? Or will that seminar be old news (and like old news, be ready for fish)? The explosion of information, like the explosion of population, may be a good thing; or not. The question is what is being done with that knowledge? How is it being used? Or is it? Is the knowledge explosion an excuse for people to be information illiterate? To become focused (read: narrow) in their field? What are you doing with what I would call miscellaneous information that is gathered unsystematically? What are you doing with information now–how are you using it for professional development–or are you?
These three questions have buzzed around my head for a while in various formats.
When I attend a conference, I wonder.
When I conduct a program, I wonder, again.
When I explore something new, I am reminded that perhaps someone else has been here and wonder, yet again.
After all, don’t both of these statements (capacity building and engagement) relate to a “foreign country” and a different culture?
How does all this relate to evaluation? Read on…
Premise: Evaluation is an everyday activity. You evaluate every day, all the time; you call it making decisions. Every time you make a decision, you are building capacity in your ability to evaluate. Sure, some of those decisions may need to be revised. Sure, some of those decisions may just yield “negative” results. Even so, you are building capacity. AND you share that knowledge–with your children (if you have them), with your friends, with your colleagues, with the random shopper in the (grocery) store. That is building capacity. Building capacity can be systematic, organized, sequential. Sometimes formal, scheduled, deliberate. It is sharing “What do I know that they don’t know?” (in the hope that they too will know it and use it).
Premise: Everyone knows something. In knowing something, evaluation happens–because people have made decisions about what is important and what is not. To really engage (not just outreach, which much of Extension does), one needs to “do as” the group being engaged. To do anything else (“doing to” or “doing with”) is simply outreach, and little or no knowledge is exchanged. That doesn’t mean knowledge isn’t distributed; Extension has been doing that for years. It just means the assumption (and you know what assumptions do) is that only the expert can distribute knowledge. Who is to say that the group (target audience, participants) isn’t expert in at least part of what is being communicated? Probably is. It is the idea that … they know something that I don’t know (and I would benefit from knowing).
Premise: Everything, and everyone, is connected. Being prepared is the best way to learn something. Being prepared by understanding culture (I’m not talking only about the intersection of race and gender; I’m talking about all the stereotypes you carry with you all the time) reinforces connections. Learning about other cultures (something everyone can do) helps dispel stereotypes and mitigate stereotype threats. And that is an evaluative task. Think about it. I think it captures the “What do all of us need to know that few of us know?” question.
The topic of survey development seems to be popping up everywhere–AEA365, Kirkpatrick Partners, eXtension Evaluation Community of Practice, among others. Because survey development is so important to Extension faculty, I’m providing links and summaries.
“… it is critical that you pre-test it with a small sample first.” Real-time testing helps eliminate confusion, improves clarity, and ensures that you are asking a question that will give you an answer to what you want to know. This is especially important today, when many surveys are electronic.
It is also important to “Train your data collection staff…Data collection staff are the front line in the research process.” Since they are the people who will be collecting the data, they need to understand the protocols, the rationales, and the purposes of the survey.
Kirkpatrick Partners say:
“Survey questions are frequently impossible to answer accurately because they actually ask more than one question.” This is the biggest problem in constructing survey questions. They provide some examples of asking more than one question.
Michael W. Duttweiler, Assistant Director for Program Development and Accountability at Cornell Cooperative Extension stresses the four phases of survey construction:
He then indicates that the next three blog posts will cover point 2, 3, and 4.
Probably my favorite recent post on surveys was one Jane Davidson wrote back in August 2012 about survey response scales. Her “boxers or briefs” example captures so many issues related to survey development.
Writing survey questions that give you usable data answering your questions about your program is a challenge; it is not impossible. Dillman wrote the book on surveys; it should be on your desk.
Here is the Dillman citation:
Dillman, D. A., Smyth, J. D., & Christian, L. M. (2009). Internet, mail, and mixed-mode surveys: The tailored design method. Hoboken, NJ: John Wiley & Sons, Inc.
Evaluation costs: A few weeks ago, I posted a summary about evaluation costs. A recent AEA LinkedIn discussion was on the same topic (see this link). If you have not linked to other evaluators, there are other groups besides AEA that have LinkedIn groups. You might want to join one that is relevant.
New topic: The video on surveys posted last week generated a flurry of comments (though not on this blog). I think it is probably appropriate to revisit the topic of surveys. As I decided to revisit this topic, an AEA 365 post from the Wilder Research group talked about data coding related to longitudinal data.
Now, many surveys, especially Extension surveys, focus on cross-sectional data, not longitudinal data. They may, however, involve a large number of participants, and the hot tips that are provided apply to coding surveys. Whether the surveys Extension professionals develop involve 30, 300, or 3000 participants, these tips are important, especially if the participants are divided into groups on some variable. Although the hot tips in the Wilder post talk about coding, not surveys specifically, they are relevant to surveys and I’m repeating them here. (I’ve also adapted the original tips to Extension use.)
Over the rest of the year, I’ll be revisiting surveys on a regular basis. Surveys are often used by Extension. Developing a survey that provides you with information you want, can use, and that makes sense is a useful goal.
New topic: I’m thinking of varying the format of the blog or offering alternative formats with evaluation information. I’m curious as to what would help you do your work better. Below are a few options. Let me know what you’d like.
A colleague asks, “What is the appropriate statistical test when comparing the means of two groups?”
I’m assuming (yes, I know what assuming does) that parametric tests are appropriate for what the colleague is doing. Parametric tests (e.g., t-test, ANOVA) are appropriate when the parameters of the population are known. If that is the case (and non-parametric tests are not being considered), I need to clarify the assumptions underlying the use of parametric tests, which are more stringent than those for nonparametric tests. Those assumptions are the following:
The sample is randomly drawn from the population; the variable is approximately normally distributed in the population; the variances of the groups being compared are roughly equal; and the data are measured at the interval or ratio level.
If those assumptions are met, the pat answer is, “It all depends.” (I know you have heard that before today.)
I will ask the following questions: How many groups are you comparing? Are the groups related (like the same people measured twice) or unrelated? Do you know the population parameters, or are you working with an intact population?
Once I know the answers to these questions I can suggest a test.
My current favorite statistics book, Statistics for People Who (Think They) Hate Statistics, by Neil J. Salkind (4th ed.), has a flow chart that helps you by asking whether you are looking at differences between the sample and the population, or at relationships or differences between one or more groups. The flow chart ends with the name of a statistical test. The caveat is that you are working with a sample from a larger population that meets the above-stated assumptions.
How you answer the questions above also determines what test you can use. If you do not know the parameters, you will NOT use a parametric test. If you are using an intact population (and many Extension professionals use intact populations), you will NOT use inferential statistics, as you will not be inferring to anything bigger than what you have at hand. If you have two groups and the groups are related (like a pre-post test or a post-pre test), you will use a parametric or non-parametric test for dependent samples. If you have two groups and they are unrelated (like boys and girls), you will use a parametric or non-parametric test for independent samples. If you have more than two groups, you will use a different test yet.
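The decision rules above can be sketched as a small lookup. The function and its mapping are my own simplified summary, using standard textbook test names; it is a sketch of the logic, not a substitute for checking the assumptions first.

```python
def choose_test(n_groups, related, parametric):
    """Suggest a significance test from three answers:
    how many groups, related or unrelated, parametric or not.
    A simplified sketch of the usual textbook pairings."""
    if n_groups == 2:
        if parametric:
            return "paired t-test" if related else "independent-samples t-test"
        return "Wilcoxon signed-rank test" if related else "Mann-Whitney U test"
    # More than two groups
    if parametric:
        return "repeated-measures ANOVA" if related else "one-way ANOVA"
    return "Friedman test" if related else "Kruskal-Wallis test"

# A pre/post design on the same people, parametric assumptions met:
print(choose_test(2, related=True, parametric=True))
# Two unrelated groups, population parameters unknown:
print(choose_test(2, related=False, parametric=False))
```

So a pre-post (or post-pre) design on the same participants points to a paired t-test, while two unrelated groups with unknown parameters point to the Mann-Whitney U test.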
Extension professionals are rigorous with their content material; they need to be just as rigorous in their analysis of the data collected about that content. Understanding what analyses to use when is a good skill to have.
I came across this quote from Viktor Frankl today (thanks to a colleague)
“…everything can be taken from a man (sic) but one thing: the last of the human freedoms – to choose one’s attitude in any given set of circumstances, to choose one’s own way.” Viktor Frankl (Man’s Search for Meaning – p.104)
I realized that, especially at this time of year, attitude is everything–good, bad, indifferent–the choice is always yours.
How we choose to approach anything depends upon our previous experiences–what I call personal and situational bias. Sadler* has three classifications for these biases. He calls them value inertias (unwanted distorting influences which reflect background experience), ethical compromises (actions for which one is personally culpable), and cognitive limitations (not knowing, for whatever reason).
When we approach an evaluation, our attitude leads the way. If we are reluctant, if we are resistant, if we are excited, if we are uncertain, all these approaches reflect where we’ve been, what we’ve seen, what we have learned, what we have done (or not). We can make a choice how to proceed.
The American Evaluation Association (AEA) has long had a history of supporting difference. That value is embedded in the guiding principles. The two principles which address supporting differences are
AEA also has developed a Cultural Competence statement. In it, AEA affirms that “A culturally competent evaluator is prepared to engage with diverse segments of communities to include cultural and contextual dimensions important to the evaluation. Culturally competent evaluators respect the cultures represented in the evaluation.”
Both of these documents provide a foundation for the work we do as evaluators as well as relating to our personal and situational bias. Considering them as we enter into the choice we make about attitude will help minimize the biases we bring to our evaluation work. The evaluative question from all this: When have your personal and situational biases interfered with your work in evaluation?
Attitude is always there–and it can change. It is your choice.
Sadler, D. R. (1981). Intuitive data processing as a potential source of bias in naturalistic evaluations. Educational Evaluation and Policy Analysis, 3(4), 25-31.
I am reading the book, Eaarth, by Bill McKibben (a NY Times review is here). He writes about making a difference in the world on which we live. He provides numerous examples that have all happened in the 21st century, none of them positive or encouraging. He makes the point that the place in which we live today is not, and never will be again, like the place in which we lived when most of us were born. He talks about not saving the Earth for our grandchildren but rather how our parents needed to have done things to save the earth for them–that it is too late for the grandchildren. Although this book is very discouraging, it got me thinking.
Isn’t making a difference what we as Extension professionals strive to do?
Don’t we, like McKibben, need criteria to determine what that difference could be and what it would look like?
And if we have those criteria well established, won’t we be able to make a difference, hopefully a positive one (think hand washing here)? And, like this graphic, won’t that difference be worth the effort we have put into the attempt? Especially if we thoughtfully plan how to determine what that difference is?
We might not be able to recover (according to McKibben, we won’t) the Earth the way it was when most of us were born; I think we can still make a difference–a positive difference–in the lives of the people with whom we work. That is an evaluative opportunity.
A colleague asks for advice on handling evaluation stories, so that they don’t get brushed aside as mere anecdotes. She goes on to say of the AEA365 blog post she read, “I read the steps to take (hot tips), but don’t know enough about evaluation, perhaps, to understand how to apply them.” Her question raises an interesting topic. Much of what Extension does can be captured in stories (i.e., qualitative data) rather than in numbers (i.e., quantitative data). Dick Krueger, former Professor and Evaluation Leader (read: specialist) at the University of Minnesota, has done a lot of work in the area of using stories as evaluation. Today’s post summarizes his work.
There are all types of stories. The type we are interested in for evaluation purposes are organizational stories. Organizational stories can do the following things for an organization:
He suggests six common types of organizational stories:
To use stories as evaluation, the evaluator needs to consider how stories might be used. That is, do they depict how people experience the program? Do they help the audience understand program outcomes? Do they offer insights into program processes?
You (as evaluator) need to think about how the story fits into the evaluation design (think logic model; program planning). Ask yourself these questions: Should you use stories alone? Should you use stories that lead into other forms of inquiry? Should you use stories that augment/illustrate results from other forms of inquiry?
You need to establish criteria for stories. Rigor can be applied to story even though the data are narrative. Criteria include the following: Is the story authentic–is it truthful? Is the story verifiable–is there a trail of evidence back to the source of the story? Is there a need to consider confidentiality? What was the original intent–purpose behind the original telling? And finally, what does the story represent–other people or locations?
You will need a plan for capturing the stories. Ask yourself these questions: Do you need help capturing the stories? What strategy will you use for collecting them? How will you ensure documentation and record keeping? (Sequence the questions; write them down; note the type–set-up, conversational, etc.) You will also need a plan for analyzing and reporting the stories, as you, the evaluator, are responsible for finding meaning.