Ever wonder where the 0.05 probability level came from? Ever wonder whether it is the best number? How many of you were taught in your introductory statistics course that 0.05 is the probability level necessary for rejecting the null hypothesis of no difference? This confidence may be misplaced. As Paul Bakker indicates in the AEA 365 blog post for March 28, “Before you analyze your data, discuss with your clients and the relevant decision makers the level of confidence they need to make a decision.” Do they really need to be 95% confident? Or would 90% confidence be sufficient? What about 75% or even 55%?
Think about it for a minute. If you were a brain surgeon, you wouldn’t want anything less than 99.99% confidence; if you were assessing the risk of a stock market investment, 55% would probably make you a lot of money. The academic community has held to the probability level of 0.05 for years (the computation of the p-value dates back to the 1770s). Quoting Wikipedia: “In the 1770s Laplace considered the statistics of almost half a million births. The statistics showed an excess of boys compared to girls. He concluded by calculation of a p-value that the excess was a real, but unexplained, effect.” Fisher first proposed the 0.05 level in 1925, establishing a one-in-20 limit for statistical significance when considering a two-tailed test. Sometimes the academic community makes the probability level even more restrictive, using 0.01 or 0.001 to demonstrate that the findings are significant. Scientific journals typically expect 95% confidence, that is, a probability level of 0.05 or smaller.
Although I have held to these levels, especially when I publish a manuscript, I have often wondered if this level makes sense. If I am only curious about a difference, do I need 0.05? Or could I use 0.10 or 0.15 or even 0.20? I have often asked students whether they are conducting confirmatory or exploratory research. Confirmatory research expects a more stringent probability level; exploratory research can tolerate a less stringent one. The 0.05 seems so arbitrary.
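To see how arbitrary the cutoff is in practice, here is a minimal sketch in Python using scipy (my addition, with made-up scores, not data from any real program). The same result is “significant” or not depending entirely on the alpha level you committed to before the analysis:

```python
# A t-test on hypothetical program and comparison scores; the verdict
# flips depending only on the alpha level chosen in advance.
from scipy import stats

program = [78, 82, 75, 90, 85, 77, 88, 80]
comparison = [72, 79, 74, 81, 76, 73, 82, 75]

t_stat, p_value = stats.ttest_ind(program, comparison)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# Confirmatory work might demand 0.01; exploratory work might accept 0.20
for alpha in (0.01, 0.05, 0.10, 0.20):
    verdict = "reject" if p_value < alpha else "fail to reject"
    print(f"alpha = {alpha:.2f}: {verdict} the null hypothesis")
```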
Then there is the grounded theory approach, which doesn’t use a probability level at all. It generates theory from categories, which are built from concepts identified in the data, usually qualitative in nature. It uses language like fit, relevance, workability, and modifiability. It does not report statistically significant probabilities because it doesn’t use inferential statistics; instead, it offers a series of probability statements about the relationships between concepts.
So what do we do? What do you do? Let me know.
Today’s post is longer than usual. I think it is important because it captures an aspect of data analysis and evaluation use that many of us skip right over: how to present findings using the tools that are available. Let me know if this works for you.
Ann Emery blogs at Emery Evaluation. She challenged readers a couple of weeks ago to reproduce a bubble chart in either Excel or R. This week she posted the answer. She has given me permission to share that information with you. You can look at the complete post at Dataviz Copycat Challenge: The Answers.
I’ve also copied it here in a shortened format:
“Here’s my how-to guide. At the bottom of this blog post, you can download an Excel file that contains each of the submissions. We each used a slightly different approach, so I encourage you to study the file and see how we manipulated Excel in different ways.
Here’s that chart from page 7 of the State of Evaluation 2012 report. We want to see whether we can re-create the chart in the lower right corner. The visualization uses circles, which means we’re going to create a bubble chart in Excel.
To fool Excel into making circles, we need to create a bubble chart in Excel. Click here for a Microsoft Office tutorial. According to the tutorial, “A bubble chart is a variation of a scatter chart in which the data points are replaced with bubbles. A bubble chart can be used instead of a scatter chart if your data has three data series.”
We’re not creating a true scatter plot or bubble chart because we’re not showing correlations between any variables. Instead, we’re just using the foundation of the bubble chart design – the circles. But, we still need to envision our chart on an x-y axis in order to make the circles.
It helps to sketch this part by hand. I printed page 7 of the report and drew my x and y axes right on top of the chart. For example, 79% of large nonprofit organizations reported that they compile statistics. This bubble would get an x-value of 3 and a y-value of 5.
I didn’t use sequential numbering on my axes. In other words, you’ll notice that my y-axis has values of 1, 3, and 5 instead of 1, 2, and 3. I found that the formatting looked better when I had a little more space between my bubbles.
Open a new Excel file and start typing in your values. For example, we know that 79% of large nonprofit organizations reported that they compile statistics. This bubble has an x-value of 3, a y-value of 5, and a bubble size of 79%.
Go slowly. Check your work. If you make a typo in this step, your chart will get all wonky.
Highlight the three columns on the right – the x column, the y column, and the frequency column. Don’t highlight the headers themselves (x, y, and bubble size). Click on the “Insert” tab at the top of the screen. Click on “Other Charts” and select a “Bubble Chart.”
First, add the basic data labels. Right-click on one of the bubbles. A drop-down menu will appear. Select “Add Data Labels.” You’ll get something that looks like this:
Second, adjust the data labels. Right-click on one of the data labels (not on the bubble). A drop-down menu will appear. Select “Format Data Labels.” A pop-up screen will appear. You need to adjust two things. Under “Label Contains,” select “Bubble Size.” (The default setting on my computer is “Y Value.”) Next, under “Label Position,” select “Center.” (The default setting on my computer is “Right.”)
Your basic bubble chart is finished! Now, you just need to fiddle with the formatting. This is easier said than done, and probably takes the longest out of all the steps.
Here’s how I formatted my bubble chart:
For more details about formatting charts, check out these tutorials.
Click here to download the Excel file that I used to create this bubble chart. Please explore the chart by right-clicking to see how the various components were made. You’ll notice a lot of text boxes on top of each other!”
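Ann’s walkthrough uses Excel, and her original challenge invited Excel or R. For readers who would rather script it, here is a rough equivalent in Python with matplotlib (my substitution, not part of her post). The percentages below are placeholders, except for the 79% figure for large organizations compiling statistics mentioned above; swap in the real values from the report.

```python
# Re-creating the bubble-chart trick: categories laid out on an x-y grid,
# with bubble area driven by the percentage, just like Excel's bubble size.
import matplotlib.pyplot as plt

practices = ["Compile statistics", "Feedback forms", "Interviews"]
org_sizes = ["Small", "Medium", "Large"]

# rows: org size (y), cols: practice (x); values are percentages
percentages = [
    [55, 48, 30],   # Small  (placeholder values)
    [68, 52, 35],   # Medium (placeholder values)
    [79, 60, 40],   # Large  (79% is from the post; the rest are placeholders)
]

fig, ax = plt.subplots(figsize=(6, 4))
for yi, row in enumerate(percentages):
    for xi, pct in enumerate(row):
        ax.scatter(xi, yi, s=pct * 30, alpha=0.6)          # bubble area ~ percentage
        ax.annotate(f"{pct}%", (xi, yi), ha="center", va="center")

ax.set_xticks(range(len(practices)))
ax.set_xticklabels(practices)
ax.set_yticks(range(len(org_sizes)))
ax.set_yticklabels(org_sizes)
ax.set_xlim(-0.5, 2.5)
ax.set_ylim(-0.5, 2.5)
ax.set_frame_on(False)
plt.tight_layout()
plt.show()
```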
“Creativity is not an escape from disciplined thinking. It is an escape with disciplined thinking.” – Jerry Hirschberg – via @BarbaraOrmsby
The above quote was in the September 7 post of Harold Jarche’s blog. I think it has relevance to the work we do as evaluators. Certainly, there is a creative part to evaluation; certainly there is a disciplined thinking part to evaluation. Remembering that is sometimes a challenge.
So where in the process do we see creativity and where do we see disciplined thinking?
When evaluators construct a logic model, you see creativity; you also see disciplined thinking.
When evaluators develop an implementation plan, you see creativity; you also see disciplined thinking.
When evaluators develop a methodology and a method, you see creativity; you also see disciplined thinking.
When evaluators present the findings for use, you see creativity; you also see disciplined thinking.
So the next time you say “give me a survey for this program,” think: is a survey the best approach to determining whether this program is effective? Will it really answer my questions?
Creativity and disciplined thinking are companions in evaluation.
A colleague asks, “What is the appropriate statistical analysis test when comparing means of two groups?”
I’m assuming (yes, I know what assuming does) that parametric tests are appropriate for what the colleague is doing. Parametric tests (e.g., t-test, ANOVA) are appropriate when the parameters of the population are known. If that is the case (and non-parametric tests are not being considered), I need to clarify the assumptions underlying the use of parametric tests, which are more stringent than those of non-parametric tests. Those assumptions are the following:
The sample is randomly selected from the population; the data are measured at the interval or ratio level; the scores are approximately normally distributed; and the groups have roughly equal variances.
If those assumptions are met, the pat answer is, “It all depends.” (I know you have heard that before today.)
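You can check two of those assumptions directly before committing to a parametric test. Here is a minimal sketch in Python using scipy (my addition, with hypothetical scores):

```python
# Shapiro-Wilk checks normality; Levene's test checks equality of variances.
from scipy import stats

group_a = [23, 25, 28, 22, 26, 27, 24, 25]
group_b = [30, 29, 33, 31, 28, 32, 30, 29]

# A small p-value here suggests the scores are NOT normally distributed
print("normality of A, p =", stats.shapiro(group_a).pvalue)
print("normality of B, p =", stats.shapiro(group_b).pvalue)

# A small p-value here suggests the group variances are NOT equal
print("equal variances, p =", stats.levene(group_a, group_b).pvalue)
```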
I will ask the following questions: How many groups are you comparing? Are the groups related (the same people measured twice, as in a pre/post design) or unrelated (like boys and girls)? Do you know the parameters of the population, or are you working with an intact population?
Once I know the answers to these questions I can suggest a test.
My current favorite statistics book, Statistics for People Who (Think They) Hate Statistics by Neil J. Salkind (4th ed.), has a flow chart that helps you by asking whether you are examining differences between a sample and a population, relationships among variables, or differences between groups. The flow chart ends with the name of a statistical test. The caveat is that you are working with a sample from a larger population that meets the above-stated assumptions.
How you answer the questions above also determines what test you can use. If you do not know the parameters, you will NOT use a parametric test. If you are using an intact population (and many Extension professionals use intact populations), you will NOT use inferential statistics, as you will not be inferring to anything bigger than what you have at hand. If you have two groups and the groups are related (like a pre-post test or a post-pre test), you will use a parametric or non-parametric test for dependent samples. If you have two groups and they are unrelated (like boys and girls), you will use a parametric or non-parametric test for independent samples. If you have more than two groups, you will use a different test yet.
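To make that decision logic concrete, here is a minimal sketch in Python using scipy (my addition, with made-up scores; the parametric/non-parametric pairings mirror the paragraph above):

```python
from scipy import stats

# Two RELATED groups (same people, measured pre and post): dependent-samples tests
pre  = [10, 12, 9, 14, 11, 13]
post = [13, 15, 11, 16, 12, 15]
print("paired t-test, p =", stats.ttest_rel(pre, post).pvalue)        # parametric
print("Wilcoxon signed-rank, p =", stats.wilcoxon(pre, post).pvalue)  # non-parametric

# Two UNRELATED groups (like boys and girls): independent-samples tests
boys  = [14, 16, 13, 15, 17, 14]
girls = [15, 18, 16, 17, 19, 16]
print("independent t-test, p =", stats.ttest_ind(boys, girls).pvalue)  # parametric
print("Mann-Whitney U, p =", stats.mannwhitneyu(boys, girls).pvalue)   # non-parametric

# More than two groups: a different test yet
third = [12, 13, 11, 14, 12, 13]
print("one-way ANOVA, p =", stats.f_oneway(boys, girls, third).pvalue)  # parametric
print("Kruskal-Wallis, p =", stats.kruskal(boys, girls, third).pvalue)  # non-parametric
```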
Extension professionals are rigorous in their content material; they need to be just as rigorous in their analysis of the data they collect about that content. Understanding what analyses to use when is a good skill to have.
Last week, I spoke about how-to questions and applying them to program planning, evaluation design, evaluation implementation, data gathering, data analysis, report writing, and dissemination. I only covered the first four of those topics. This week, I’ll give you my favorite resources for data analysis.
This list is more difficult to assemble. This is typically where the knowledge links break down and interest is lost. The thinking goes something like this. I’ve conducted my program, I’ve implemented the evaluation, now what do I do? I know my program is a good program so why do I need to do anything else?
YOU need to understand your findings. YOU need to be able to look at the data and be able to rigorously defend your program to stakeholders. Stakeholders need to get the story of your success in short clear messages. And YOU need to be able to use the findings in ways that will benefit your program in the long run.
Remember the list from last week? The RESOURCES for EVALUATION list? The one that says:
1. Contact your evaluation specialist.
2. Listen to stakeholders--that means including them in the planning.
3. Read--there is a wealth of material on data analysis.
Good. This list still applies, especially the read part. Here are the readings for data analysis.
First, it is important to know that there are two kinds of data--qualitative (words) and quantitative (numbers). (As an aside, many folks treat descriptive words as quantitative data once they have been coded--they are still words even if you give them numbers for coding purposes, so treat them like words, not numbers.)
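A tiny illustration of that point, sketched in Python with made-up codes (my addition): count coded categories; don’t average the codes.

```python
from collections import Counter

# Open-ended responses coded 1 = "barrier: time", 2 = "barrier: cost",
# 3 = "barrier: transportation" -- the numbers are labels, nothing more
codes = [1, 2, 1, 3, 1, 2, 2, 1, 3, 1]

print(Counter(codes))           # frequencies are meaningful: {1: 5, 2: 3, 3: 2}
print(sum(codes) / len(codes))  # 1.7 -- a meaningless "average barrier"
```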
Citation: Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.). Thousand Oaks, CA: Sage Publications.
Fortunately, there are newer options, which may be as good. I will confess, I haven’t read them cover to cover at this point (although they are on my to-be-read pile).
If you don’t feel like tackling one of these resources, Ellen Taylor-Powell has written a short piece (12 pages in PDF format) on qualitative data analysis.
There are software programs for qualitative data analysis that may be helpful (Ethnograph, NUD*IST, others). Most people I know prefer to code manually; even if you use a software program, you will need to do a lot of coding manually first.
Citation: Salkind, N. J. (2004). Statistics for people who (think they) hate statistics (2nd ed.). Thousand Oaks, CA: Sage Publications.
NOTE: There is a 4th edition with a 2011 copyright available. He also has a version of this text that features Excel 2007. I like Chapter 20 (The Ten Commandments of Data Collection) a lot. He doesn’t talk about methodology; he talks about logistics. Considering the logistics of data collection is really important.
Also, you need to become familiar with a quantitative data analysis software program–like SPSS, SAS, or even Excel. One copy goes a long way–you can share the cost and share the program–as long as only one person is using it at a time. Excel is a program that comes with Microsoft Office. Each of these has tutorials to help you.
Although I have been learning about and doing evaluation for a long time, this week I’ve been searching for a topic to talk about. A student recently asked me about the politics of evaluation–there is a lot that can be said on that topic, which I will save for another day. Another student asked me about when to do an impact study and how to bound that study. Certainly a good topic, too, though one that can wait for another post. Something I read in another blog got me thinking about today’s post. So, today I want to talk about gathering demographics.
Last week, I mentioned the AEA Guiding Principles in my TIMELY TOPIC post. Those Principles, along with the Program Evaluation Standards, make significant contributions in assisting evaluators in making ethical decisions. Evaluators make ethical decisions with every evaluation. They are guided by these professional standards of conduct. There are five Guiding Principles and five Evaluation Standards. And although these are not prescriptive, they go a long way toward ensuring ethical evaluations. That is a long introduction into gathering demographics.
The guiding principle, Integrity/Honesty states that “Evaluators display honesty and integrity in their own behavior, and attempt to ensure the honesty and integrity of the entire evaluation process.” When we look at the entire evaluation process, as evaluators, we must strive constantly to maintain both personal and professional integrity in our decision making. One decision we must make involves deciding what we need/want to know about our respondents. As I’ve mentioned before, knowing what your sample looks like is important to reviewers, readers, and other stakeholders. Yet, if we gather these data in a manner that is intrusive, are we being ethical?
Joe Heimlich, in a recent AEA365 post, says that asking demographic questions “…all carry with them ethical questions about use, need, confidentiality…” He goes on to say that there are “…two major conditions shaping the decision to include – or to omit intentionally – questions on sexual or gender identity…”:
The first point relates to gender-role issues--for example, are gay men more like or unlike the other gender categories, and which gender categories did you include in your survey? The second point relates to allowing an individual’s voice to be heard clearly and completely, with the categories on our forms reflecting their full participation in the evaluation. For example, does the marital-status question offer domestic partnership as well as the traditional categories, and are all of those traditional categories necessary to hear your participants?
The next time you develop a questionnaire that includes demographic questions, take a second look at the wording–in an ethical manner.
I was reading another evaluation blog (the American Evaluation Association’s blog, AEA365) which talked about database design. I was reminded that over the years, almost every Extension professional with whom I have worked has asked me the following question: “What do I do with my data now that I have all my surveys back?”
As Leigh Wang points out in her AEA365 comments, “Most training programs and publication venues focus on the research design, data collection, and data analysis phases, but largely leave the database design phase out of the research cycle.” The questions that this statement raises are: