I have been coding my qualitative interview data all in one fell swoop, trying to get everything done for the graduation deadline. It feels almost like a class project that I’ve put off, as usual, longer than I should have. In a conversation with another grad student about timelines, and about how I’ve been sitting on this data (at least a good chunk of it) since oh, November or so, we speculated about why we don’t tackle it in smaller chunks. One reason for me, I’m sure, is just the general fear of failure, or whatever it is that drives my procrastinating and perfectionist tendencies (remember, the best dissertation is a DONE dissertation – we’re not here to save the world with this one project).

However, another reason occurs to me as well: I collected all the data myself, and I wonder if I was too close to it while collecting it. I certainly had to prioritize finishing the collection, considering the struggles I had getting subjects to participate, the delays with IRB, etc. But I wonder if it’s actually been better to leave it all for a while and come back to it. I guess if I had really done the interview coding before the eye-tracking, I might have shaped the eye-tracking interviews a bit differently, but I think the main adjustments I made based on the interviews were sufficient without coding (i.e., I recognized how much the experts were just seeing that the images were all the same, and I really couldn’t come up with difficult enough tasks for them). The other reason to have coded the interviews first would have been to separate my interviewees into high- and low-performing groups, if the data proved to be that way, so that I could invite sub-groups for the eye-tracking. But I ended up, again due to recruitment issues, just getting whoever I could from my interview population to come back. And now, I’m not really sure there are any high or low performers among the novices anyway – they each seem to have their strengths and weaknesses at this task.

Other fun with coding: I have a mix of basically closed-ended questions that I am scoring with a rubric for correctness, and open-ended “how do you know” semi-clinical interview questions. Since I eventually repeated some of these questions for the various versions of the scaffolded images, my subjects started to conflate their answers, and parsing these things apart is truly a pleasure (NOT). And I’m up to some 120 codes, and keeping those all in mind as I go is just nuts. Of course, I have only done the first pass, and since I created codes as I went, I have to go back and re-code the earlier transcripts for any codes I created after I first went through them. Still, I am stressing as to whether I’m finding everything in every transcript, especially the more obscure codes. I have one that I’ve dubbed “Santa” because two of my subjects said they know the poles of Earth are cold because they learned that Santa lives at the North Pole, where it’s cold. So I’m now wondering if there were any other instances of non-science reasoning that I missed. I don’t think this is a huge problem; I am fairly confident my coding is thorough, but I’m also at that stage of crisis where I’m not sure any of this is good enough as I draw closer to my defense!
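For anyone facing the same second pass, here is a minimal sketch of one way to keep track of it, assuming you note which transcript each code first appeared in. The transcript IDs and every code name other than “Santa” are made up for illustration, and `recode_backlog` is a hypothetical helper, not part of any qualitative analysis package.

```python
def recode_backlog(transcript_order, code_created_in):
    """List, for each transcript, the codes created after its first pass.

    transcript_order: transcript IDs in the order they were first coded.
    code_created_in: maps each code name to the transcript where it was created.
    """
    position = {t: i for i, t in enumerate(transcript_order)}
    backlog = {t: [] for t in transcript_order}
    for code, created_in in code_created_in.items():
        for transcript in transcript_order:
            # A transcript coded before the code existed needs another look.
            if position[transcript] < position[created_in]:
                backlog[transcript].append(code)
    return backlog


# Hypothetical example data: three transcripts, three codes.
order = ["P01", "P02", "P03"]
created = {
    "Santa": "P02",
    "color-as-temperature": "P01",
    "legend-use": "P03",
}
backlog = recode_backlog(order, created)
for transcript, codes in backlog.items():
    print(transcript, codes)
```

With a list like this, the second pass becomes a checklist per transcript rather than 120 codes held in your head at once.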

Other fun facts: I also find myself agonizing over what to call codes, when the description is more important. And it’s also a very humbling look at how badly I (feel like I) conducted the interviews. For one thing, I asked all the wrong questions, as it turns out – what I expected people would struggle with, they didn’t really, and I didn’t have good questions ready to probe for what they did struggle with. Sigh. I guess that’s for the next experiment.

The good stuff: I do have a lot of good data about people’s expectations of the images and the topics, especially when there are misunderstandings. This will be important as we design new products for outreach, both the images themselves and the supporting info that must go alongside. I also sorta thought I knew a lot about this data going into the coding, but the number of new codes with each subject is surprising, and it’s gratifying to think that maybe I did get some information out of this task after all. Finally, I’m learning that this is an exercise in throwing stuff out, too – I was overly ambitious in my proposal about all the questions I could answer, and I collected a lot more data than I can use at the moment. So, as is typical in the research process, I have to choose what fits the story I need to tell to get the dissertation (or paper, or presentation) done for the moment, and leave the rest aside for now. That’s what all those papers post-dissertation are for, I guess!

What are your adventures with/fears about coding or data analysis? (besides putting it off to the last minute, which I don’t recommend).

I have just about nailed down a defense date. That means I have about two months to wrap all this up (or warp it, as I originally typed) into a coherent, cohesive narrative worthy of a doctoral degree. It’s amazing to me to think it might actually be done one of these days.

Of course, in research, there’s always more you can analyze about your data, so in reality, I have to make some choices about what goes in the dissertation and what has to remain for later analysis. For example, I “threw in” some plain world images into the eye-tracking as potential controls just to see how people might look at a world map without any data on it. Not that there really is such a thing; technically any image has some sort of data on it, as it is always representing something, even this one:

[Image: plain world map, continents in darker grey than the ocean]

Here, the continents are darker grey than the ocean, so it’s a representation of the Earth’s current land and ocean distinctions.

I also included two “blue marble” images that are essentially images of Earth as if seen from space, without clouds and all in daylight simultaneously: one with the typical northern hemisphere “north-up” orientation, the other “south-up,” as the world is often portrayed in Australia, for example. However, I probably don’t have time to analyze all of that right now, at least not if I want to complete the dissertation on schedule. The best dissertation is a done dissertation, not one that is perfect, or answers every single question! If it did, what would the rest of my career be for?

So a big part of the research process is making tradeoffs: collecting enough data that you can anticipate and examine any problems you might encounter, but not so much that you lose sight of your original, specific research questions and get mired in analysis forever. Thinking about what does and doesn’t fit in the particular framework I’ve laid out for analysis is part of this, too. That means making smart choices about how to sufficiently answer your questions with the data you have and address major potential problems, while letting go and letting some questions remain unanswered. At least for the moment. That’s a major task in front of me right now, with both my interview data and my eye-tracking data. At least I’ve finished collecting data for the dissertation. I think.

Let the countdown to defense begin …

How much progress have I made on my thesis in the last month? Since I last posted about my thesis, I have completed the majority of my interviews. Out of the 30 I need, I have all but four completed, and three of the remaining four are scheduled. Out of about 20 eye-tracking sessions, I have completed all but about 7, with probably 3 of those remaining scheduled. I also presented some preliminary findings around the eye-tracking at the Geological Society of America conference in a digital poster session. Whew!

It’s a little strange to have set a desired number of interviews at the beginning and feel like I have to fulfill that and only that number, rather than soliciting from a wide population and getting as many as I could past a minimum. Now, if I were to get a flood of applicants for the “last” novice interview spot, I might want to risk overscheduling to compensate for no-shows (which, as you know, have plagued me). On the other hand, I risk having to cancel if I get an “extra” subject scheduled, which I suppose is not a big deal, but for some reason I would feel weird canceling on a volunteer – would it put them off from volunteering for research in the future?

Next up is processing all the recordings, backing them up, and then getting them transcribed. I’ll need to create a rubric to score the informational answers as something along the lines of 100% correct, partially correct, or not at all correct. Then it will be coding, finding patterns in the data and categorizing those patterns, and asking someone to serve as a fellow coder to verify my codebook and coding once I’ve made a pass through all of the interviews. Then I’ll have to decide if the same coding will apply equally to the questions I asked during the eye-tracking portion, since I didn’t dig as deeply to root out understanding as I did in the clinical interviews, but I still asked them to justify their answers with “how do you know” questions.
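As a rough sketch of what the rubric and the fellow-coder check could look like in practice: the point values below (1, 0.5, 0) are an assumption, not a rubric I have settled on, and Cohen’s kappa is just one common way to summarize agreement between two coders, swapped in here for illustration. The ratings are invented example data.

```python
from collections import Counter

# Assumed point values for the three rubric levels.
RUBRIC = {"correct": 1.0, "partial": 0.5, "incorrect": 0.0}


def rubric_total(ratings):
    """Sum the rubric points for one subject's rated answers."""
    return sum(RUBRIC[r] for r in ratings)


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders on the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Agreement expected by chance, from each coder's label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented ratings for four answers from one subject.
mine = ["correct", "partial", "incorrect", "correct"]
fellow = ["correct", "partial", "correct", "correct"]

total = rubric_total(mine)
kappa = cohens_kappa(mine, fellow)
print(f"rubric total: {total}, kappa: {kappa:.2f}")
```

The same agreement check would work on the open-ended codes if each transcript segment gets a single code per coder; segments with multiple codes would need a different approach.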

We’ll see how far I get this month.