I have been coding my qualitative interview data all in one big fell swoop, trying to get everything done for the graduation deadline. It feels almost like a class project that I’ve put off, as usual, longer than I should have. In having a conversation with another grad student, about timelines, and how I’ve been sitting on this data since oh, November or so (at least a good chunk of it), we speculated about why we don’t tackle it in smaller chunks. One reason for me, I’m sure, is just general fear of failure or whatever drives my general procrastinating and perfectionist tendencies (remember, the best dissertation is a DONE dissertation – we’re not here to save the world with this one project).

However, another reason occurs to me as well; I collected all the data myself and I wonder if I was too close to it in the process of collecting it? I certainly had to prioritize finishing collecting it, considering the struggles I had to get subjects to participate, and delays with IRB, etc. But I wonder if it’s actually been better to leave it all for a while and come back to it. I guess if I had really done the interview coding before the eye-tracking, I might have shaped the eye-tracking interviews a bit differently, but I think the main adjustments I made based on the interviews were sufficient without coding (i.e. I recognized how much the experts were just seeing that the images were all the same and I couldn’t come up with difficult enough tasks for them, really). The other reason to have coded the interviews first would have been to separate my interviewees into high- and low-performing, if the data proved to be that way, so that I could invite sub-groups for the eye-tracking. But I ended up, again due to recruitment issues, just getting whoever I could from my interview population to come back. And now, I’m not really sure there’s any high- or low-performers among the novices anyway – they each seem to have their strengths and weaknesses at this task.

Other fun with coding: I have a mix of basically closed-ended questions that I am scoring with a rubric for correctness, and then open-ended “how do you know” semi-clinical interview questions. Since I eventually repeated some of these questions for the various versions of the scaffolded images, my subjects started to conflate their answers and parsing these things apart is truly a pleasure (NOT). And, I’m up to some 120 codes, and keeping those all in mind as I go is just nuts. Of course, I have just done the first pass, and as I created codes as I went through, I have to turn around and re-code for those particular ones on the ones I coded before I created them, but I still am stressing as to whether I’m finding everything in every transcript, especially the sort of obscure codes. I have one that I’ve dubbed “Santa” because two of my subjects referred to knowing the poles of Earth are cold because they learned that Santa lives at the North Pole where it’s cold. So I’m now wondering if there were any other evidences of non-science reasoning that I missed. I don’t think this is a huge problem; I am fairly confident my coding is thorough, but I’m also at that stage of crisis where I’m not sure any of this is good enough as I draw closer to my defense!

Other fun facts: I also find myself agonizing over what to call codes, when the description is more important. And it’s also a very humbling look at how badly I (feel like I) conducted the interviews. For one thing, I asked all the wrong questions, as it turns out – what I expected people would struggle with, they didn’t really, and I didn’t have good questions ready to probe for what they did struggle with. Sigh. I guess that’s for the next experiment.

The good stuff: I do have a lot of good data about people’s expectations of the images and the topics, especially when there are misunderstandings. This will be important as we design new products for outreach, both the images themselves and the supporting info that must go alongside. I also sorta thought I knew a lot about this data going into the coding, but number of new codes with each subject is surprising, and gratifying that maybe I did get some information out of this task after all. Finally, I’m learning that this is an exercise in throwing stuff out, too – I was overly ambitious in my proposal about all the questions I could answer, and I collected a lot more data than I can use at the moment. So, as is a typical part of the research process, I have to choose what fits the story I need to tell to get the dissertation (or paper, or presentation) done for the moment, and leave the rest aside for now. That’s what all those papers post-dissertation are for, I guess!

What are your adventures with/fears about coding or data analysis? (besides putting it off to the last minute, which I don’t recommend).