Big Data: Big possibilities with bigger challenges

By Alexa Kownacki, Ph.D. Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab

Did you know that Excel has a maximum number of rows? I do. During Winter Term for my GIS project, I was using Excel to merge oceanographic data, from a publicly-available data source website, and Excel continuously quit. Naturally, I assumed I had caused some sort of computer error. [As an aside, I’ve concluded that most problems related to technology are human error-based.] Therefore, I tried reformatting the data, restarting my computer, the program, etc. Nothing. Then, thanks to the magic of Google, I discovered that Excel allows no more than 1,048,576 rows by 16,384 columns. ONLY 1.05 million rows?! The oceanography data was more than 3 million rows—and that’s with me eliminating data points. This is what happens when we’re dealing with big data.
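For readers who hit the same wall, here is a minimal sketch of how one might check a file's size before opening it in Excel. It streams the file one record at a time, so the 3 million rows never have to fit in memory. The column names and sample values are made up for illustration; only the row limit comes from the Excel documentation cited above.

```python
import csv
import io

# Row limit quoted above from the Excel specifications page.
EXCEL_MAX_ROWS = 1_048_576


def count_rows(lines):
    """Count CSV records by streaming them, one at a time."""
    return sum(1 for _ in csv.reader(lines))


def fits_in_excel(n_rows, limit=EXCEL_MAX_ROWS):
    """Return True if a sheet with n_rows rows stays within the limit."""
    return n_rows <= limit


# Tiny stand-in for a real download (hypothetical columns and values).
sample = io.StringIO(
    "time,latitude,longitude,temperature\n"
    "2010-01-01,32.5,-117.3,15.2\n"
    "2010-01-02,32.5,-117.3,15.4\n"
)
n_rows = count_rows(sample)
print(n_rows, "rows; fits in Excel:", fits_in_excel(n_rows))
```

For a real file, the same function can be given an open file handle, for example `with open(path, newline="") as f: count_rows(f)`, where `path` is wherever the download was saved.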

According to the Merriam-Webster dictionary, big data is an accumulation of data that is too large and complex for processing by traditional database management tools (www.merriam-webster.com). However, there are articles, like this one from Forbes, that discuss the ongoing debate over how to define “big data”. According to the article, there are 12 major definitions, so I’ll let you decide what qualifies as “big data”. Either way, I think that when Excel reaches its maximum row capacity, I’m working with big data.

Collecting oceanography data aboard the R/V Shimada. Photo source: Alexa K.

Here’s the thing: the oceanography data that I referred to was just a snippet of my data. Technically, it’s not even MY data; it’s data I accessed from NOAA’s ERDDAP website that had been consistently observed for the time frame of my dolphin data points. You may recall my blog about maps and geospatial analysis that highlights some of the reasons these variables, such as temperature and salinity, are important. However, what I didn’t previously mention was that I spent weeks working on editing this NOAA data. My project on common bottlenose dolphins overlays environmental variables to better understand dolphin population health off of California. These variables should have similar spatiotemporal attributes as the dolphin data I’m working with, which has a time series beginning in the 1980s. Without taking out a calculator, I still know that equates to a lot of data. Great data: data that will let me answer interesting, pertinent questions. But, big data nonetheless.

This is a screenshot of what the oceanography data looked like when I downloaded it to Excel. This format repeats for nearly 3 million rows.

Excel Screen Shot. Image source: Alexa K.

I showed this Excel spreadsheet to my GIS professor, and his response was something akin to “holy smokes”, with a few more expletives and a look of horror. It was not the sheer number of rows that shocked him; it was the data format. Nowadays, nearly everyone works with big data. It’s par for the course. However, the way data are formatted is the major split between what I’ll call “easy” data and “hard” data. The oceanography data could have been “easy” data. It could have had many variables listed in columns. Instead, this data alternated between rows with variable headings and columns with variable headings, for millions of cells. And, as described earlier, this is only one example of big data and its challenges.
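To make the “easy” versus “hard” distinction concrete, here is a rough sketch of the kind of reshaping involved. The layout below is hypothetical, not the actual NOAA export: it assumes each block starts with a heading row naming one variable, followed by rows of time and value. The sketch rebuilds one record per timestamp, with each variable acting as a column.

```python
import csv
import io
from collections import defaultdict


def tidy(lines):
    """Turn variable-by-variable blocks into one record per timestamp.

    Assumed (hypothetical) layout: a row starting with "variable" opens
    a new block; every following row is "time,value" for that variable.
    """
    records = defaultdict(dict)
    current = None
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        if row[0] == "variable":
            current = row[1]  # heading row: remember the variable name
        elif current is not None:
            time, value = row
            records[time][current] = float(value)
    return dict(records)


# Made-up values in the assumed "hard" layout.
raw = io.StringIO(
    "variable,temperature\n"
    "2010-01-01,15.2\n"
    "2010-01-02,15.4\n"
    "variable,salinity\n"
    "2010-01-01,33.4\n"
    "2010-01-02,33.5\n"
)
table = tidy(raw)
for time, values in sorted(table.items()):
    print(time, values)
```

Because the function reads row by row, the same approach would scale to millions of rows, although the real file would need its own parsing rules.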

Data does not always come in a form with text and numbers; sometimes it appears as media such as photographs, videos, and audio files. Big data just got a whole lot bigger. While I was working as a scientist at NOAA’s Southwest Fisheries Science Center, one project brought in over 80 terabytes of raw data per year. The project centered on the Eastern North Pacific gray whale population, and, more specifically, its migration. Scientists have observed the gray whale migration annually since 1994 from Piedras Blancas Light Station for the northbound migration, and 2 out of every 5 years from Granite Canyon Field Station (GCFS) for the southbound migration. One of my roles was to ground-truth software that would help transition from humans as observers to computers as observers. One avenue we assessed was how well a computer “counted” whales compared to people. For this question, three infrared cameras at the GCFS recorded during the same time span that human observers were counting the migratory whales. Next, scientists, such as myself, would transfer those video files, upwards of 80 TB, from the hard drives to Synology boxes and to a different facility miles away. Synology boxes store arrays of hard drives that can be accessed remotely. To review, that makes three locations, each with 80 TB of the same raw data. Once the data were saved in triplicate, I could run a computer program to detect whales. In summary, three months of recorded infrared video files require upwards of 240 TB before processing. This is big data.

Scientists on an observation shift at Granite Canyon Field Station in Northern California. Photo source: Alexa K.
Alexa and another NOAA scientist watching for gray whales at Piedras Blancas Light Station. Photo source: Alexa K.

In the GEMM Laboratory, we have so many sources of data that I did not bother trying to count. I’m entering my second year of the Ph.D. program and I already have a hard drive of data that I’ve backed up in three different locations. It’s no longer a matter of “if” you work with big data; it’s “how”. How will you format the data? How will you store the data? How will you maintain back-ups of the data? How will you share this data with collaborators/funders/the public?

The wonderful aspect of big data is in the name: big and data. The scientific community can answer more in-depth, challenging questions because of access to data, and more of it. Data is often the limiting factor in what researchers can do, because a larger sample size allows more questions to be asked and gives greater confidence in the results. That, and funding of course. It’s the reason why when you see GEMM Lab members in the field, we’re not only using drones to capture aerial images of whales, we’re taking fecal, biopsy, and phytoplankton samples. We’re recording the location, temperature, water conditions, wind conditions, cloud cover, date/time, water depth, and so much more, because all of this data will help us and other scientists answer critical questions. Thus, to my fellow scientists, I feel your pain and I applaud you, because I too know that the challenges that come with big data are worth it. And, to the non-scientists out there, hopefully this gives you some insight as to why we scientists ask for external hard drives as gifts.

Leila launching the drone to collect aerial images of gray whales to measure body condition. Photo source: Alexa K.
Using the theodolite to collect tracking data on the Pacific Coast Feeding Group in Port Orford, OR. Photo source: Alexa K.

References:

https://support.office.com/en-us/article/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3

https://www.merriam-webster.com/dictionary/big%20data

Cloudy with a chance of blue whales

By Dawn Barlow, PhD student, Department of Fisheries & Wildlife, Geospatial Ecology of Marine Megafauna Lab

As a PhD student studying the ecology of blue whales in New Zealand, my time is occupied by questions such as: When and where are the blue whales? Can we predict where they will be based on environmental conditions? How does their distribution overlap with human activity such as oil and gas exploration?

Leigh and I have just returned from New Zealand, where I gave an oral presentation at the Society for Conservation Biology Oceania Congress entitled “Cloudy with a chance of whales: Forecasting blue whale presence to mitigate industrial impacts based on tiered, bottom-up models”. While the findings I presented are preliminary, an exciting ecological story is emerging, and one with clear management implications.

The South Taranaki Bight (STB) region of New Zealand is an important area for a population of blue whales which are unique to New Zealand. A wind-driven upwelling system brings cold, productive waters into the bight [1], which sustains high densities of krill [2], blue whale prey. The region is also frequented by busy shipping traffic, oil and gas drilling and extraction platforms as well as seismic survey effort for subsurface oil and gas reserves, and is the site of a recently-permitted seabed mine for iron sands (Fig. 1). However, a lack of knowledge on blue whale distribution and habitat use patterns has impeded effective management of these potential anthropogenic threats.

Figure 1. A blue whale surfaces in front of a floating production storage and offloading vessel servicing the oil rigs in the South Taranaki Bight. Photo by D. Barlow.

Three surveys were conducted in the STB region in the summer months of 2014, 2016, and 2017. During that time, we not only looked for blue whales, we also collected oceanographic data and hydroacoustic backscatter data to map and measure aspects of the krill in the region. These data streams will help us understand the functional, ecological relationships between the environment (oceanography), prey (krill), and predators (blue whales) in the ecosystem (Fig. 2). But in practice these data are costly and time-consuming to collect, while other data sources such as satellite imagery are readily accessible to managers at a variety of spatial and temporal scales. Therefore, another one of my aims is to link the data we collected in the field to satellite imagery, so that managers can have a practical tool to predict when and where the blue whales are most likely to be found in the region.

Figure 2. Data streams collected during surveys of the South Taranaki Bight Region in 2014, 2016, and 2017. 

So what did I find? Here are the highlights from my preliminary analyses:

  • The majority of the patterns in blue whale distribution can be explained by the density, depth, and thickness of the krill patches.
  • Patterns in the krill are driven by oceanography.
  • Those same oceanographic parameters that drive the krill can be used to explain blue whale distribution.
  • There are tight relationships between the important oceanographic variables and satellite images of sea surface temperature.
  • Blue whale distribution can, to some degree, be explained using just satellite imagery.

We were able to identify a sea surface temperature range in the satellite imagery of approximately 18°C where the likelihood of finding a blue whale is the highest. Is this because blue whales really like 18° water? Well, more likely this relationship exists because the satellite imagery is reflective of the oceanography, and the oceanography drives patterns in the krill distribution, and the krill drives the distribution of blue whales (Fig. 3). We were able to make each of these functional linkages through our series of models, which is quite exciting.

Figure 3. The tiered modeling approach we took to investigate the ecological relationships between blue whales, krill, oceanography, and satellite imagery. Because of the ecological linkages we made, we are able to say that any relationship between whale distribution and satellite imagery most likely reflects a relationship between the blue whales and their prey. 

That’s all well and good, but we were interested in testing these relationships to see if our identified habitat associations hold up even when we do not have field data (oceanographic, krill, and whale data). This past austral summer, we did not have a field season to collect data, but there was a large seismic airgun survey of the STB region. Seismic survey vessels are required to have trained marine mammal observers on board, and we were given access to the blue whale sightings data they recorded during the survey. In December, when the water was right around the preferred temperature identified by our models (18°C), the observers made 52 blue whale sightings (Fig. 4). In January and February, the waters warmed and only two sightings were made in each month. This is not only reassuring because it supports our model results, it also implies that there is the potential to balance industrial use of the area with protection of blue whale habitat, based on our understanding of the ecology. In January and February, very few blue whales were likely disturbed by the industrial activity in the STB, as conditions were not favorable for foraging at the location of the seismic survey. In contrast, the blue whales that were in the STB region in December may have experienced physiological consequences of sustained exposure to airgun noise since the conditions were favorable for foraging in the STB. In other words, the whales may have tolerated the noise exposure to gain access to good food, but this could have significant biological repercussions such as increased stress [3].

Figure 4. Monthly sea surface temperature (MODIS Aqua) overlaid with blue whale sightings from marine mammal observers aboard seismic survey vessel R/V Amazon Warrior. Black rectangles represent areas of seismic survey effort. Blue whale sighting location data were provided by RPS Energy Pty Ltd & Schlumberger, and Todd Energy.

In the first two weeks of July, we presented these latest findings to managers at the New Zealand Department of Conservation, the Minister of Conservation, the CEO and Policy Advisor of a major oil and gas conglomerate, NGOs, advocacy groups, and scientific colleagues. It was valuable to gather feedback from many different stakeholders, and satisfying to see such a clear interest in, and management application of, our work.

Dr. Leigh Torres and Dawn Barlow in front of Parliament in Wellington, New Zealand, following the presentation of their recent findings.

What’s next? We’re back in Oregon, and diving back into analysis. We intend to take the modeling work a step further to make the models predictive—for example, can we forecast where the blue whales will be based on the temperature, productivity, and winds two weeks prior? I am excited to see where these next steps lead!

References:

  1. Shirtcliffe TGL, Moore MI, Cole AG, Viner AB, Baldwin R, Chapman B. 1990 Dynamics of the Cape Farewell upwelling plume, New Zealand. New Zeal. J. Mar. Freshw. Res. 24, 555–568. (doi:10.1080/00288330.1990.9516446)
  2. Bradford-Grieve JM, Murdoch RC, Chapman BE. 1993 Composition of macrozooplankton assemblages associated with the formation and decay of pulses within an upwelling plume in greater Cook Strait, New Zealand. New Zeal. J. Mar. Freshw. Res. 27, 1–22. (doi:10.1080/00288330.1993.9516541)
  3. Rolland RM, Parks SE, Hunt KE, Castellote M, Corkeron PJ, Nowacek DP, Wasser SK, Kraus SD. 2012 Evidence that ship noise increases stress in right whales. Proc. Biol. Sci. 279, 2363–8. (doi:10.1098/rspb.2011.2429)

Searching for seabirds on the Garden Island

By Erin Pickett, M.Sc. (GEMM Lab member 2014-2016)

Field Assistant, Kaua’i Endangered Seabird Recovery Project

I heaved my body up with both arms, swung one leg up and attempted to muster any remaining energy I had into standing on the ridgeline of the valley that I had just crawled out of. Soaked from the rain, face covered with bits of dirt and with ferns sticking out of my hair I probably resembled a creature crawling out of a swamp. I smiled at this thought knowing that my dramatic emergence from the swamp might have been captured on a nearby motion-sensing trail camera.

I surveyed my surroundings to gain my bearings. I was searching for seabird burrows in a densely vegetated valley called Upper Limahuli Preserve in the mountains of Kaua’i, Hawaii. I was looking for the nests of the endangered Hawaiian Petrel (or ‘Ua’u in Hawaiian) and the threatened Newell’s Shearwater (A’o), Hawaii’s only two endemic (found nowhere else in the world) procellariid species. I registered the trail, the nearby fence line and the two valleys on either side of the ridge I was standing on. If a drone had photographed me from above, the scene of lush green mountains, waterfalls and rugged cliffs would not only have looked like the views from the helicopter arrival scene in the movie Jurassic Park; it was in fact the same Nā Pali coastline.

Northeastern facing view from the trail at Upper Limahuli Preserve looking toward the author’s hometown of Kīlauea and the site of the Nihokū predator-fence at Kīlauea National Wildlife Refuge

When I finished my graduate program at Oregon State University in 2017, I began working for a project called the Kaua’i Endangered Seabird Recovery Project (KESRP). Our work at KESRP focuses on monitoring Kauai’s populations of breeding a’o and ‘ua’u, mitigating on-land threats through recovery activities and conducting research (e.g. habitat modeling & at-sea tracking) to learn more about the two species.

An estimated 90% of the Newell’s Shearwater population breeds on the island of Kaua’i, as does a large portion of the Hawaiian Petrel population. Both populations have declined rapidly on Kaua’i over the past two decades: radar surveys found a 78% decrease in Hawaiian Petrels and a 94% decrease in overall numbers of Newell’s Shearwaters (Raine et al., 2017). Light pollution, collision with electrical power lines, and invasive vertebrate predators represent primary threats to both the a’o and ‘ua’u while on land during the breeding season. As with all seabirds that nest on islands, the a’o and ‘ua’u are easy prey for invasive species such as feral cats and black rats; thus, there is a large effort within our study area to alleviate the threat of these predators.

A ‘ua’u adult incubating an egg at Upper Limahuli Preserve, 2018

The purpose of my burrow search effort on this day was to find suitable candidate burrows for a translocation project that KESRP has undertaken since 2015. This fall, we will attempt to relocate via helicopter up to 20 a’o and ‘ua’u chicks from the mountains of Kaua’i, where they are vulnerable to invasive predators, to a predator-proof fenced area located within nearby Kīlauea National Wildlife Refuge. The ultimate aim of our translocation project, a critical component of the Nihokū Ecosystem Restoration Project, is to establish successful breeding colonies of a’o and ‘ua’u within the protected boundaries of a fence that is impermeable to rats, cats, and pigs.

On Kaua’i, the imperiled a’o and ‘ua’u nest on verdant cliffs amid native Hawaiian uluhe ferns and ‘ohi‘a lehua trees. Both species raise their chicks in burrows that can only be located by humans after an extensive search effort that involves scanning the densely vegetated forest floor for tiny feathers and guano trails, and following the musty scent of seabirds until an underground tunnel is found, sometimes with a bird nestled inside.

The author with an a’o chick that was relocated to the Nihokū Ecosystem Restoration Site in 2017

My afternoon of burrow searching had been strenuous, and being day three it had already been a long week in the field so I sighed and started heading in the direction that would lead me back to our field camp. Though, after a few steps I caught the musty smell of seabird in the air and immediately stopped walking. Like an animal, I followed my nose and turned my head over my right shoulder and sniffed the air. I climbed over the fence that separated the trail I was hiking on from the 3,000 foot drop into the valley below, carefully positioned my feet on the fragile cliff side and lifted a large tuft of grass to find a freshly dug hole that smelled unmistakably like a seabird.

A triumphant selfie by the author after finding a particularly difficult to locate a’o burrow

Either a prospecting Hawaiian Petrel or Newell’s Shearwater had broken ground on this new burrow the night before. The birds had been busy digging into the cliff side while I had been conducting an auditory survey a few hundred meters away. The auditory survey had begun at sunset, and over the course of the next two hours I listened for and recorded the locations of seabirds transiting overhead, heading from the sea to the mountains and calling from their burrows nearby. Ideally, this auditory survey would help me pinpoint locations of ‘ground callers’ whose raucous calls would lead me to their burrows the next day.

Finding a burrow is not often as easy as pinpointing the location of a ground caller, catching a whiff of seabird near that location and immediately locating a hole in the ground. Yet finding a burrow that is ‘reachable’ and reasonably close to a helicopter landing zone is even more difficult. And this task is one of our objectives throughout the field season this year.

If you’re interested in keeping up with our progress you can follow KESRP on Facebook: https://www.facebook.com/kauaiseabirdproject/

Reference(s):

Raine, A. F., Holmes, N. D., Travers, M., Cooper, B. A., & Day, R. H. (2017). Declining population trends of Hawaiian Petrel and Newell’s Shearwater on the island of Kaua‘i, Hawaii, USA. The Condor, 119(3), 405-415.

Collaboration – it’s where it’s at.

By Dominique Kone, Masters Student in Marine Resource Management

As I finish my first year of graduate school, I’ve been reflecting on what has helped me develop as a young scientist over the past year. Some of these lessons are somewhat expected: making time for myself outside of academia, reading the literature, and effectively managing my time. Yet, I’ve also learned that working with my peers, other scientists, and experts outside my scientific field can be extremely rewarding.

For my thesis, I will be looking at the potential to reintroduce sea otters to the Oregon coast by identifying suitable habitat and investigating their potential ecological impacts. During this first year, I’ve spent much time getting to know various stakeholder groups, their experiences with this issue, and any advice they may have to inform my work. Through these interactions, I’ve benefitted in ways that would not have been possible if I tried tackling this project on my own.

Source: Seapoint Center for Collaborative Leadership.

When I first started my graduate studies, I was eager to jump head first into my research. However, as someone who had never lived in Oregon before, I didn’t yet have a full grasp of the complexities and context behind my project and was completely unfamiliar with the history of sea otters in Oregon. By engaging with managers, scientists, and advocates, I quickly realized that there was a wealth of knowledge that wasn’t covered in the literature: information from people who were involved in the initial reintroduction; theories behind the cause of the first failed reintroduction; and most importantly, the various political, social, and cultural implications of a potential reintroduction. This information, which I would have missed had I relied solely on the literature, was crucial in developing and honing my research questions.

As my first year in graduate school progressed, I also quickly realized that most people familiar with this issue also had strong opinions and views about how I should conduct my study, whether and how managers should bring sea otters back, and if such an effort will succeed. This input was incredibly helpful in getting to know the issue, and also fostered my development as a scientist as I had to quickly improve my listening and critical-thinking skills to consider my research from different perspectives. One of the benefits of collaboration – particularly with experts outside the marine ecology or sea otter community – is that everyone looks at an issue in a different way. Through my graduate program, I’ve worked with students and faculty in the earth, oceanic, and atmospheric sciences, who have challenged me to consider other sources of data, other analyses, or different ways of placing my research within various contexts.

Most graduate students when they first start graduate school. Source: Know Your Meme.

One of the major advantages of being a graduate student is that most researchers – including professors, faculty, managers, and fellow graduate students – are more than happy to analyze and discuss my research approach. I’ve obtained advice on statistical analyses, availability and access to data, as well as contacts to other experts. As a graduate student, it’s important for me to consult with more-experienced researchers who can not only explain complex theories or concepts, but who can also validate the appropriateness of my research design and methods. Collaborating with senior researchers is a great way to become established and recognized within the scientific community. Because of this project, I’ve started to become adopted into the marine mammal and sea otter research communities, which is obviously beneficial for my thesis work, but also allows me to start building strong relationships for a career in marine conservation.

Source: Oregon State University.

Looking ahead to my second year of graduate school, I’m eager to make a big push toward completing my thesis, writing manuscripts for journal submission, and communicating my research to various audiences. Throughout this process, it’s still important for me to continue to reach out and collaborate with others within and outside my field as they may help me reach my personal goals. In my opinion, this is exactly what graduate students should be doing. While graduate students may have the ability and some experience to work independently, we are still students, and we are here to learn from and make lasting connections with other researchers and fellow graduate students through these collaborations.

If there’s any advice I would give to an incoming graduate student, it’s this: Collaborate, and collaborate often. Don’t be afraid to work with others because you never know whether you’ll come away with a new perspective, learn something new, come across new research or professional opportunities, or even help others with their research.