Lining up these ducks is definitely not sexy research work, yet it’s often very important, as Laura mentioned last week. Ideally, this would have been done from the beginning. It was, to some extent, but mostly for new material as it came in, rather than also taking stock of what we already had.
The same will probably be true of the data that the lab collects, but since we haven’t really started collecting any new data, we can hopefully be proactive about how we integrate the old and the new. We’re building on previous work, such as video analysis of the touch tanks, where a corpus of data already exists. That means we may already have a data management structure in place, or we may not; either way, we need to find out.
I recently attended a great workshop from the OSU Library on data management plans. For this grant, we promised not only to share our experiences with this project as we go, but also to share our data to the extent possible, especially with our visiting scholars. Funders are also requiring plans for storing and managing data, and there is a larger push for much federally funded data to be shared in open-access databases. Even without these prods, managing data saves time in the long run by keeping information organized and easy to access. One of the tools the library recommends is DMP Tool, which offers templates from many different funders for creating these plans at the grant application stage.
Once you’ve been awarded the grant, planning file names and creating metadata up front can ensure things are easy to find, easy to share, and easy to manage. That’s where we are. As we plan exactly which research questions to pursue first, we’ll figure out exactly what types of data we’ll be collecting. As we work with Media Macros to set up analytic programs to parse the video data, we will be able to designate how we want our file names constructed: for example, to indicate the date and time, the type of data (such as video, transcript, or demographics), and the source (such as a particular camera). Having all of this in the file name will make the data consistent and sortable. We also have to build a consistent folder structure and decide on locations for storing data, whether that’s Dropbox, external hard drives, or some other sharing platform, both for works in progress and for ultimate archival.
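To make the file-naming idea concrete, here is a minimal sketch in Python of one way such a convention could work. Everything specific in it is an assumption for illustration: the field order, the underscore separator, the timestamp layout, the function names `build_filename` and `parse_filename`, and the source label `cam01` are all hypothetical, not the scheme we have settled on.

```python
from datetime import datetime

# Hypothetical vocabulary of data types; the three entries are the
# examples mentioned in the text, not a finalized list.
DATA_TYPES = {"video", "transcript", "demographics"}


def build_filename(when: datetime, data_type: str, source: str, ext: str) -> str:
    """Build a name of the form <timestamp>_<type>_<source>.<ext>.

    The layout is an illustrative assumption. Putting the timestamp first,
    in year-month-day order, means alphabetical sorting is also
    chronological sorting.
    """
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type!r}")
    if "_" in source:
        # The underscore is the field separator, so it cannot appear in a field.
        raise ValueError("source must not contain '_'")
    stamp = when.strftime("%Y%m%dT%H%M%S")
    return f"{stamp}_{data_type}_{source}.{ext.lstrip('.')}"


def parse_filename(name: str) -> dict:
    """Recover the fields from a name produced by build_filename."""
    stem, _, ext = name.rpartition(".")
    stamp, data_type, source = stem.split("_")
    return {
        "when": datetime.strptime(stamp, "%Y%m%dT%H%M%S"),
        "data_type": data_type,
        "source": source,
        "ext": ext,
    }
```

For example, `build_filename(datetime(2013, 4, 15, 14, 30), "video", "cam01", "mp4")` gives `20130415T143000_video_cam01.mp4`, and `parse_filename` turns that string back into its parts. Generating names through one function, rather than typing them by hand, is what keeps them consistent across everyone on the project.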
The naming scheme goes beyond that, too, to the secondary data that we analyze, such as file names in transcript analysis software, Excel spreadsheets, and even presentations and writing. One more important type of data is the metadata, which is data about the data. For us, that could mean: a) descriptions of how and when the data is collected, part of which is captured in the file name but which can be expanded upon, b) descriptions of the data collection equipment, such as the type of camera (3301 or 1034 or other models), frame rate, and resolution, and c) descriptions of the larger events that may have been going on at the science center on any given day, such as Shark Day, Whale Watch Week, or Home School Days.
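One common way to keep this kind of metadata attached to the data is a small "sidecar" file stored next to each recording. The sketch below shows what that might look like in Python. The class name `RecordingMetadata`, the field names, the choice of JSON, and every sample value other than the camera model and event names mentioned above are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class RecordingMetadata:
    """Hypothetical metadata record for one data file."""

    data_file: str          # name of the file this record describes
    collection_notes: str   # (a) how and when the data was collected
    camera_model: str       # (b) equipment, e.g. "3301" or "1034"
    frame_rate_fps: float   # (b) frames per second
    resolution: str         # (b) e.g. "1280x720"
    center_events: List[str] = field(default_factory=list)  # (c) events that day


def to_sidecar_json(meta: RecordingMetadata) -> str:
    """Serialize a record to JSON text, with keys sorted for stable diffs."""
    return json.dumps(asdict(meta), indent=2, sort_keys=True)


def from_sidecar_json(text: str) -> RecordingMetadata:
    """Rebuild a record from JSON text produced by to_sidecar_json."""
    return RecordingMetadata(**json.loads(text))
```

A record could then be written to a file with the same base name as the recording plus a `.json` extension (again, a hypothetical convention), so that the description travels with the data whenever it is copied or shared. Leaving `center_events` empty by default covers the ordinary days when nothing special was going on.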
Of course, with a number of people working on the project now, students rotating on and off the project, and visiting scholars joining, we have to make sure that this structure is communicated and, most importantly, used! I have to admit I was guilty of exactly this lapse during my dissertation: having a plan at the beginning but not necessarily sticking to it. We’ll have to revisit the plan regularly and make sure things are still working for everyone, but hopefully having a plan at the outset will prevent a lot of work down the road untangling a disorganized set of raw data, working data, metadata, and final products.