While I was discussing evaluation in general earlier this week, the colleague I was talking with asked me how data from a post/pre evaluation form are analyzed. I pondered this for a nanosecond and said change scores: one would compute the difference between the post rating and the pre rating and subject that change to some statistical test. “What test?” my colleague asked.
So, today’s post is about which test to use, and why.
First, remember that post/pre data are related responses: both responses come from the same person. SPSS uses the labels “paired samples” and “2-related samples” for this kind of data; the first label goes with a parametric test and the second with a non-parametric test.
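Because the responses are related, the change score is computed person by person. Here is a minimal Python sketch of that step, assuming each person’s pre and post ratings are stored in two lists in the same respondent order (the function name `change_scores` and the ratings are hypothetical, for illustration only):

```python
def change_scores(pre, post):
    """Return the post-minus-pre change for each respondent.

    Assumes pre[i] and post[i] come from the same person (related responses).
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have one entry per respondent")
    return [after - before for before, after in zip(pre, post)]


# Hypothetical ratings on one question (1 = low, 5 = high) for five respondents.
pre = [2, 3, 3, 1, 4]
post = [4, 4, 3, 3, 5]
print(change_scores(pre, post))  # [2, 1, 0, 2, 1]
```

The length check matters: if one person skipped the post form, the lists no longer line up and every later change score would pair the wrong responses.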
Parametric tests (like the t-test) assume that the data come from a normal (bell-shaped) distribution, a distribution described by known parameters (i.e., the mean and standard deviation). For the paired t-test, it is the post-minus-pre differences that should be approximately normal.
Non-parametric tests (like the Wilcoxon signed-rank test or the McNemar test) do not assume a particular population distribution. Instead, tests such as the Wilcoxon rank the data from low to high and then analyze the ranks; the McNemar test, used for yes/no answers, looks only at the people whose answer changed. These tests are sometimes called distribution-free tests because they do not depend on knowing the population’s parameters. Most of the time, Extension professionals work with populations whose parameters are not known.
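To show what “rank the data and analyze the ranks” means for post/pre data, here is a minimal Python sketch of the Wilcoxon signed-rank test, plus the McNemar statistic for yes/no questions. It is a simplified illustration, not SPSS’s procedure: the Wilcoxon p-value uses the large-sample normal approximation with no tie or continuity correction, and McNemar uses the uncorrected chi-square, so the numbers can differ from what SPSS reports, especially with small samples. The function names and data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from low to high; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(pre, post):
    """Wilcoxon signed-rank test via the normal approximation (no corrections).

    Zero changes are dropped. Returns (W_plus, W_minus, z, two_sided_p).
    """
    diffs = [after - before for before, after in zip(pre, post) if after != before]
    n = len(diffs)
    if n == 0:
        raise ValueError("all changes are zero")
    ranks = average_ranks([abs(d) for d in diffs])  # rank the sizes of the changes
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return w_plus, w_minus, z, p


def mcnemar(b, c):
    """Uncorrected McNemar chi-square for paired yes/no answers.

    b = changed from "no" (pre) to "yes" (post); c = changed from "yes" to "no".
    Only these discordant pairs enter the statistic; nothing is ranked.
    """
    if b + c == 0:
        raise ValueError("no one changed their answer")
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical ratings for eight respondents on one question.
pre = [2, 3, 3, 1, 4, 2, 3, 2]
post = [4, 4, 3, 3, 5, 3, 2, 4]
w_plus, w_minus, z, p = wilcoxon_signed_rank(pre, post)
print(w_plus, w_minus)  # 25.5 2.5

# Hypothetical yes/no question: 10 people changed no -> yes, 2 changed yes -> no.
chi2, p_mcnemar = mcnemar(10, 2)
print(round(chi2, 2))  # 5.33
```

In the example, one respondent did not change and is dropped; six of the remaining seven improved, so nearly all of the rank total lands in W+.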
If you KNOW (reasonably well) that the population’s distribution approximates a normal bell curve, choose a parametric test. For post/pre data, that would be a paired t-test, because the responses are related.
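For comparison, here is a minimal sketch of the paired t-test in Python, using only the standard library. It computes t from the post-minus-pre differences (t = mean difference / (SD of the differences / √n), with n - 1 degrees of freedom) and gets the two-sided p-value by numerically integrating the t density with Simpson’s rule. The integration is an assumption made here to avoid extra dependencies; in practice SPSS’s paired-samples t-test or SciPy’s `scipy.stats.ttest_rel` does this for you. The function names and data are hypothetical.

```python
import math
import statistics


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t, via Simpson's rule on the density."""
    # Log of the normalizing constant of the t density with df degrees of freedom.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    area = density(0.0) + density(upper)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * density(k * h)
    area *= h / 3  # area under the density from 0 to |t|
    return max(0.0, 1 - 2 * area)


def paired_t_test(pre, post):
    """Paired t-test on post-minus-pre differences. Returns (t, df, p)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have one entry per respondent")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    df = n - 1
    return t, df, t_two_sided_p(t, df)


# Hypothetical ratings for eight respondents on one question.
pre = [2, 3, 3, 1, 4, 2, 3, 2]
post = [4, 4, 3, 3, 5, 3, 2, 4]
t, df, p = paired_t_test(pre, post)
print(round(t, 2), df)  # 2.65 7
```

Note that the test works on the eight change scores, not on 16 separate ratings; that is what “paired” buys you.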
The last criterion, sample size, is the one to remember.
If you have a large sample, it doesn’t matter much whether the distribution is normal, because the parametric test is robust to departures from normality. The only caveat is deciding what counts as a “large sample.” One source I read says, “Unless the population distribution is really weird, you are probably safe choosing a parametric test when there are at least two dozen data points in each group.” That means at least 24 data points in each group. If a post/pre evaluation has six questions and each question is answered by 12 people both post and pre, each question has only 12 data points in each group (12 post, 12 pre). You can’t pool the questions and multiply 6 questions by 12 people by 2 time points (post and pre) to get 144; each question is a separate set of data points. My statistics professor always insisted on a sample size of 30 to have enough power to detect a difference if one exists.
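The counting rule above can be sketched in Python. This assumes responses are stored per question as (pre, post) pairs, with `None` for a skipped answer; the data structure, the function names, and the thresholds (24 from the source quoted above, or 30 per the professor’s rule) are illustrative assumptions, not an SPSS feature.

```python
def complete_pairs(pairs):
    """Keep only respondents who answered the question both pre and post."""
    return [(pre, post) for pre, post in pairs if pre is not None and post is not None]


def sample_size_report(responses, minimum=24):
    """Map each question to (usable pairs, whether that count meets `minimum`).

    Each question is a separate analysis, so pairs are counted per question,
    never pooled across questions.
    """
    report = {}
    for question, pairs in responses.items():
        n = len(complete_pairs(pairs))
        report[question] = (n, n >= minimum)
    return report


# Hypothetical form: 6 questions, 12 people answering each question pre and post.
responses = {f"Q{q}": [(3, 4)] * 12 for q in range(1, 7)}

# The tempting but wrong pooled count: 6 questions x 12 people x 2 time points.
pooled = sum(2 * len(pairs) for pairs in responses.values())
print(pooled)  # 144

for question, (n, ok) in sample_size_report(responses).items():
    print(question, n, "enough" if ok else "too small")  # each line: 12 too small
```

Dropping incomplete pairs first is deliberate: a person who answered only the pre question contributes nothing to a paired analysis.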
If you have a large sample and use a non-parametric test, the test is slightly less powerful than a parametric test would be. To see how much that matters, run both a paired t-test and a Wilcoxon signed-rank test on one post/pre question and compare the results. The difference won’t be much.
If you have a small sample and use a parametric test on data that are NOT normally distributed, the probability value may be inaccurate. Again, run both tests to see the difference, and use the test with the more conservative probability value, that is, the larger p-value (0.001 is more conservative than 0.0001).
If you have a small sample and use a non-parametric test on data that are normally distributed, the probability value may be too high because the non-parametric test lacks the power to detect a difference. Again, run both tests to see the difference, and choose the test that is more conservative.
In my experience, using a non-parametric test for many of the analyses done with data from Extension-based projects provides a more realistic analysis.