I had a topic all ready to write about; then I got sick. I’m sitting here typing this, trying to remember what that topic was, to no avail. That topic went the way of much of my recent memory; another day, perhaps.
I do remember the conversation with my daughter about correlation. She had a correlation of .3 something with a p-value of 0.011 and didn’t understand what that meant. We had a long discussion of causation and attribution and correlation.
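To unpack what she was looking at: a correlation of .3 means the two variables share only about 9% of their variance (r-squared = .09), and the p-value says more about the sample size than about the strength of the relationship. Here is a minimal sketch in Python, if that happens to be your tool (only the r = .3 comes from her output; the sample sizes are invented for illustration):

```python
from math import sqrt
from scipy import stats

r = 0.3  # the correlation from her output
print(f"Shared variance: r-squared = {r**2:.2f} (about 9%)")

# The p-value for the very same r = .3 depends mostly on sample size
# (two-tailed test of r against zero). These n values are invented.
for n in (20, 50, 70, 200):
    t = r * sqrt((n - 2) / (1 - r**2))
    p = 2 * stats.t.sf(t, df=n - 2)
    print(f"n = {n:3d}: p = {p:.3f}")
```

Notice that the same modest correlation slides under the usual .05 line once the sample is big enough; the relationship itself never gets any stronger.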
We had another long conversation about practical vs. statistical significance, something her statistics professor isn’t teaching. She isn’t learning about data management in her statistics class either. Having dealt with both qualitative and quantitative data for a long time, I have come to realize that data management needs to be understood long before you memorize the formulas for the various statistical tests you wish to perform. What if the flood happens?
So today I’m telling you about data management as I understand it, because the flood did actually happen and, fortunately, I didn’t lose my data. I had a data dictionary.
Data dictionary. The first step in data management is a data dictionary. There are other names for this, which escape me right now…just know that a hard copy of how and what you have coded is critical. Yes, make a backup copy on your hard drive…but also keep a hard copy, because the flood might happen. (It is raining right now, and it is Oregon in November.)
Take a hard copy of your survey, evaluation form, or qualitative data coding sheet and mark on it what every code notation you used means. I’d show you an example of what I do, only my examples are at the office and I am home sick without my files. So, I’ll show you a clip art instead… No, I don’t use cards any more for my data (I did once…most of you won’t remember that time…), but I do make a hard copy with clear notations. I find myself doing the same with other things, to make sure I code the responses the same way. That is what a data dictionary allows you to do: check yourself.
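If you keep your data in Python rather than on cards or in a statistical package, the same idea translates directly into code. A minimal sketch, assuming an invented two-item workshop survey (none of these names or codes come from my actual files):

```python
# A minimal data dictionary: one entry per variable, spelling out what
# every code notation means. All variable names and codes are invented.
data_dictionary = {
    "q1_satisfaction": {
        "label": "Overall satisfaction with the workshop",
        "codes": {1: "Very dissatisfied", 2: "Dissatisfied", 3: "Neutral",
                  4: "Satisfied", 5: "Very satisfied"},
        "missing": [9],  # 9 = respondent skipped the item
    },
    "q2_attended_before": {
        "label": "Attended a previous workshop",
        "codes": {0: "No", 1: "Yes"},
        "missing": [9],
    },
}

# Print a version you can put on paper -- the hard copy is the point.
for name, entry in data_dictionary.items():
    print(f"{name}: {entry['label']}")
    for code, meaning in entry["codes"].items():
        print(f"  {code} = {meaning}")
    print(f"  missing codes: {entry['missing']}")
```

Print it, and you have the hard copy that survives the flood.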
Then I run a frequencies and percentages analysis. I use SPSS (because that is what I learned first). I look for outliers, variables that are miscoded, and system-generated missing data that isn’t actually missing. I look for any anomaly in the data, any human error (i.e., my error). Then I fix it. Then I run my analyses.
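I do this in SPSS, but the same first pass is easy to write in other tools. A rough Python/pandas equivalent (the file name, column names, and valid codes are assumptions, carried over from the invented dictionary above):

```python
import pandas as pd

# Hypothetical data file; columns match the invented dictionary above.
df = pd.read_csv("workshop_survey.csv")

# Frequencies and percentages, missing values included.
for column in ["q1_satisfaction", "q2_attended_before"]:
    counts = df[column].value_counts(dropna=False).sort_index()
    percents = (counts / len(df) * 100).round(1)
    print(f"\n{column}")
    print(pd.DataFrame({"n": counts, "%": percents}))

# Flag codes that are not in the data dictionary -- likely miscodes.
valid = {1, 2, 3, 4, 5, 9}
bad = df[df["q1_satisfaction"].notna() & ~df["q1_satisfaction"].isin(valid)]
print(f"\nRows with out-of-range q1_satisfaction codes: {len(bad)}")
```

Anything in those tables that the data dictionary cannot explain is an error to chase down before the real analysis starts.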
There are probably more steps than I’ve covered today. These are the first steps that absolutely must be done BEFORE you do any analyses. Then you have a good chance of keeping your data safe.