Big Data: Big possibilities with bigger challenges

By Alexa Kownacki, Ph.D. Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab

Did you know that Excel has a maximum number of rows? I do. During Winter Term for my GIS project, I was using Excel to merge oceanographic data, from a publicly-available data source website, and Excel continuously quit. Naturally, I assumed I had caused some sort of computer error. [As an aside, I’ve concluded that most problems related to technology are human error-based.] Therefore, I tried reformatting the data, restarting my computer, the program, etc. Nothing. Then, thanks to the magic of Google, I discovered that Excel allows no more than 1,048,576 rows by 16,384 columns. ONLY 1.05 million rows?! The oceanography data was more than 3 million rows—and that’s with me eliminating data points. This is what happens when we’re dealing with big data.
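For anyone who hits the same wall, one way around the row limit is to stream the file instead of opening it in a spreadsheet. The sketch below is a minimal Python illustration using only the standard library. The column names and the tiny in-memory sample are assumptions for demonstration; they are not the actual ERDDAP download.

```python
import csv
import io

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit quoted above


def count_data_rows(handle):
    """Count rows after the header without loading the whole file into memory."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row, if there is one
    return sum(1 for _ in reader)


def fits_in_excel(n_data_rows):
    """True if a header row plus n_data_rows fits on a single worksheet."""
    return n_data_rows + 1 <= EXCEL_MAX_ROWS


# Small in-memory stand-in for a large download (hypothetical columns).
sample = io.StringIO(
    "time,latitude,longitude,sst\n"
    "2010-01-01,32.5,-117.3,15.2\n"
    "2010-01-02,32.5,-117.3,15.4\n"
)
n_rows = count_data_rows(sample)
print(n_rows, fits_in_excel(n_rows))
print(fits_in_excel(3_000_000))  # a 3-million-row file will not fit
```

With a real file, the same function could be given an open file handle (for example `open(path, newline="")`), so the row count is known before anyone tries to double-click the file.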

According to the Merriam-Webster dictionary, big data is an accumulation of data that is too large and complex for processing by traditional database management tools (www.merriam-webster.com). However, there are articles, like this one from Forbes, that discuss the ongoing debate over how to define “big data”. According to the article, there are 12 major definitions; so, I’ll let you decide what qualifies as “big data”. Either way, I think that when Excel reaches its maximum row capacity, I’m working with big data.

Collecting oceanography data aboard the R/V Shimada. Photo source: Alexa K.

Here’s the thing: the oceanography data that I referred to was just a snippet of my data. Technically, it’s not even MY data; it’s data I accessed from NOAA’s ERDDAP website that had been consistently observed for the time frame of my dolphin data points. You may recall my blog about maps and geospatial analysis that highlights some of the reasons these variables, such as temperature and salinity, are important. However, what I didn’t previously mention was that I spent weeks working on editing this NOAA data. My project on common bottlenose dolphins overlays environmental variables to better understand dolphin population health off of California. These variables should have similar spatiotemporal attributes as the dolphin data I’m working with, which has a time series beginning in the 1980s. Without taking out a calculator, I still know that equates to a lot of data. Great data: data that will let me answer interesting, pertinent questions. But, big data nonetheless.

This is a screenshot of what the oceanography data looked like when I downloaded it to Excel. This format repeats for nearly 3 million rows.

Excel Screen Shot. Image source: Alexa K.

I showed this Excel spreadsheet to my GIS professor, and his response was something akin to “holy smokes”, with a few more expletives and a look of horror. It was not the sheer number of rows that shocked him; it was the data format. Nowadays, nearly everyone works with big data. It’s par for the course. However, the way data are formatted is the major split between what I’ll call “easy” data and “hard” data. The oceanography data could have been “easy” data. It could have had many variables listed in columns. Instead, these data alternated between rows with variable headings and columns with variable headings, for millions of cells. And, as described earlier, this is only one example of big data and its challenges.

Data does not always come in a form with text and numbers; sometimes it appears as media such as photographs, videos, and audio files. Big data just got a whole lot bigger. While I was working as a scientist at NOAA’s Southwest Fisheries Science Center, one project brought in over 80 terabytes of raw data per year. The project centered on the eastern North Pacific gray whale population, and, more specifically, its migration. Scientists have observed the gray whale migration annually since 1994 from Piedras Blancas Light Station for the northbound migration, and 2 out of every 5 years from Granite Canyon Field Station (GCFS) for the southbound migration. One of my roles was to ground-truth software that would help transition from humans as observers to computers as observers. One avenue we assessed was how well a computer “counted” whales compared to people. For this question, three infrared cameras at the GCFS recorded during the same time span that human observers were counting the migratory whales. Next, scientists, such as myself, would transfer those video files, upwards of 80 TB, from the hard drives to Synology boxes and to a different facility miles away. Synology boxes store arrays of hard drives that can be accessed remotely. To review: three locations, each with 80 TB of the same raw data. Once the data were saved in triplicate, I could run a computer program to detect whales. In summary, three months of recorded infrared video files require upwards of 240 TB before processing. This is big data.
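The storage arithmetic is simple, but it is worth writing down because it scales quickly. Here is a minimal sketch in Python; the constant and function names are made up for illustration, and the numbers are the ones quoted above.

```python
RAW_TB_PER_SEASON = 80  # raw infrared video from one field season, per the text
COPIES = 3              # field hard drives, Synology boxes, and the off-site facility


def total_storage_tb(raw_tb, copies):
    """Storage needed to hold every copy of the raw data, in terabytes."""
    return raw_tb * copies


print(total_storage_tb(RAW_TB_PER_SEASON, COPIES))  # 240
```

Any processed output from the whale-detection software would add to this total, which is why the 240 TB figure is a floor rather than a ceiling.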

Scientists on an observation shift at Granite Canyon Field Station in Northern California. Photo source: Alexa K.
Alexa and another NOAA scientist watching for gray whales at Piedras Blancas Light Station. Photo source: Alexa K.

In the GEMM Laboratory, we have so many sources of data that I did not bother trying to count. I’m entering my second year of the Ph.D. program and I already have a hard drive of data that I’ve backed up in three different locations. It’s no longer a matter of “if” you work with big data, it’s “how”. How will you format the data? How will you store the data? How will you maintain back-ups of the data? How will you share this data with collaborators/funders/the public?

The wonderful aspect to big data is in the name: big and data. The scientific community can answer more, in-depth, challenging questions because of access to data and more of it. Data is often the limiting factor in what researchers can do because increased sample size allows more questions to be asked and greater confidence in results. That, and funding of course. It’s the reason why when you see GEMM Lab members in the field, we’re not only using drones to capture aerial images of whales, we’re taking fecal, biopsy, and phytoplankton samples. We’re recording the location, temperature, water conditions, wind conditions, cloud cover, date/time, water depth, and so much more. Because all of this data will help us and help other scientists answer critical questions. Thus, to my fellow scientists, I feel your pain and I applaud you, because I too know that the challenges that come with big data are worth it. And, to the non-scientists out there, hopefully this gives you some insight as to why we scientists ask for external hard drives as gifts.

Leila launching the drone to collect aerial images of gray whales to measure body condition. Photo source: Alexa K.
Using the theodolite to collect tracking data on the Pacific Coast Feeding Group in Port Orford, OR. Photo source: Alexa K.

References:

https://support.office.com/en-us/article/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3

https://www.merriam-webster.com/dictionary/big%20data

The Recipe for a “Perfect” Marine Mammal and Seabird Cruise

By Alexa Kownacki, Ph.D. Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab

Science—and fieldwork in particular—is known for its failures. There are websites, blogs, and Twitter pages dedicated to them. This is why, when things go according to plan, I rejoice. When they go even better than expected, I practically tear up from amazement. There is no perfect recipe for a great marine mammal and seabird research cruise, but I would suggest that one would look like this:

 A Great Marine Mammal and Seabird Research Cruise Recipe:

  • A heavy pour of fantastic weather
    • Light on the wind and seas
    • Light on the glare
  • Equal parts amazing crew and good communication
  • A splash of positivity
  • A dash of luck
  • A pinch of delicious food
  • Heaps of marine mammal and seabird sightings
  • Heat to approximately 55-80 degrees F and transit for 10 days along transects at 10-12 knots
The end of another beautiful day at sea on the R/V Shimada. Image source: Alexa K.

The Northern California Current Ecosystem (NCCE) is a highly productive area that is home to a wide variety of cetacean species. Many cetaceans are indicator species of ecosystem health as they consume large quantities of prey from different levels in trophic webs and inhabit diverse areas, from deep-diving beaked whales to gray whales traveling thousands of miles along the eastern North Pacific Ocean. Dedicated cetacean surveys are a predominant survey method in large bodies of water, but they can be extremely costly. One alternative is to use other research vessels as research platforms; effort then becomes transect-based and opportunistic, with less flexibility to deviate from predetermined transects. This decreases expenses, creates collaborative research opportunities, and reduces interference with animal behavior, as the animals are never pursued. Observing animals from large, motorized research vessels (>100 ft) at a steady, significant speed (>10 knots) provides a baseline for future, joint research efforts. The NCCE is regularly surveyed by government agencies and institutions on transects that have been repeated nearly every season for decades. This historical data provides critical context for environmental and oceanographic dynamics that impact large ecosystems with commercial and recreational implications.

My research cruise took place aboard the 208.5-foot R/V Bell M. Shimada in the first two weeks of May. The cruise was designated for monitoring the NCCE with the additional position of a marine mammal observer. The established guidelines did not allow for deviation from the predetermined transects. Therefore, mammals were surveyed along preset transects. The ship left port in San Francisco, CA and traveled as far north as Cape Meares, OR. The transects ranged from one nautical mile to two hundred miles offshore. Observations occurred during “on effort”, which was defined as when the ship was in transit and moving at a speed above 8 knots, dependent upon sea state and visibility. All observations took place on the flybridge during conducive weather conditions and in the bridge (one deck below the flybridge) when excessive precipitation was present. The starboard forward quarter was surveyed: zero to ninety degrees relative to the ship’s direction (with the bow at zero degrees). Both the naked eye and 7×50 binoculars were used, with binoculars in use at least 30 percent of the time. To decrease observer fatigue, which could result in fewer detected sightings, the observer (me) rotated on a cycle of 40 minutes “on effort” and 20 minutes “off effort” during long transits (>90 minutes).
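The rotation rule above can be written out as a small schedule builder. This is only a sketch: it assumes every long transit starts with an “on effort” block and that the final block is simply cut short when the transit ends, neither of which is specified in the protocol described here. The function name is hypothetical.

```python
def effort_schedule(transit_minutes, on=40, off=20, long_transit=90):
    """Return (start_min, end_min, status) blocks for one transit.

    Transits of long_transit minutes or less are surveyed continuously;
    longer ones alternate on/off blocks, starting "on" (an assumption).
    """
    if transit_minutes <= long_transit:
        return [(0, transit_minutes, "on")]
    blocks = []
    start = 0
    status = "on"
    while start < transit_minutes:
        length = on if status == "on" else off
        end = min(start + length, transit_minutes)  # truncate the last block
        blocks.append((start, end, status))
        start = end
        status = "off" if status == "on" else "on"
    return blocks


for block in effort_schedule(120):
    print(block)
```

For a two-hour transit this yields two 40-minute observation blocks separated by a 20-minute break, with a second break at the end.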

Alexa on-effort using binoculars to estimate the distance and bearing of a marine mammal sighted off the starboard bow. Image source: Alexa K.

Data was collected using a modified version of the SEEbird Wincruz computer program on a ruggedized laptop with a GPS unit attached. At the beginning of each day and upon changes in conditions, the ship’s heading, weather conditions, visibility, cloud cover, swell height, swell direction, and Beaufort sea state (BSS) were recorded. Once the BSS or visibility was worse than a “5” (1 is “perfect” and 5 is “very poor”), observations ceased until there was improvement in weather. When a marine mammal was sighted, the latitude and longitude were recorded with the exact time stamp. Then, I noted how the animal was sighted (either with binoculars or naked eye) and what action was originally noticed (blow, splash, bird, etc.). The bearing and distance were noted using binoculars. The animal’s behavior was assigned to one of three generalized categories: traveling, feeding, or milling. A sighting was defined as any single marine mammal or group of animals. Therefore, a single sighting would have the species and the best, high, and low estimates for group size.
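A recorded ship position, bearing, and distance can later be turned into an estimated animal position for mapping. The sketch below shows one common way to do that, the great-circle destination formula on a spherical Earth. It is an assumed downstream step, not something the Wincruz software is described as doing here, and the function name, the Earth radius constant, and the example coordinates are all illustrative.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles (spherical approximation)


def sighting_position(ship_lat, ship_lon, ship_heading_deg,
                      relative_bearing_deg, distance_nm):
    """Estimate an animal's latitude and longitude in decimal degrees.

    relative_bearing_deg is measured clockwise from the bow (0 degrees),
    so the surveyed starboard forward quarter spans 0 to 90.
    """
    true_bearing = math.radians((ship_heading_deg + relative_bearing_deg) % 360)
    lat1 = math.radians(ship_lat)
    lon1 = math.radians(ship_lon)
    arc = distance_nm / EARTH_RADIUS_NM  # angular distance in radians

    lat2 = math.asin(math.sin(lat1) * math.cos(arc)
                     + math.cos(lat1) * math.sin(arc) * math.cos(true_bearing))
    lon2 = lon1 + math.atan2(
        math.sin(true_bearing) * math.sin(arc) * math.cos(lat1),
        math.cos(arc) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)


# Hypothetical sighting: ship heading due north off the Oregon coast,
# animal 45 degrees off the starboard bow at 1.5 nautical miles.
lat, lon = sighting_position(44.6, -124.5, 0.0, 45.0, 1.5)
print(round(lat, 4), round(lon, 4))
```

At the short distances typical of visual sightings, a flat-Earth approximation would give nearly the same answer; the spherical version is used here only because it stays valid over longer ranges.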

By my definitions, I had the research cruise of my dreams. There were moments when I imagined people joining this trip as a vacation. I *almost* felt guilty. Then, I remembered that after watching water for almost 14 hours (thanks to the amazing weather conditions), I worked on data and reports and class work until midnight. That’s the part that no one talks about: the data. Fieldwork is about collecting data. It’s both what I live for and what makes me nervous. The amount of time, effort, and money that is poured into fieldwork is enormous. The acquisition of the data is not as simple as it seems. When I briefly described my position on this research cruise to friends, they interpreted it to be something akin to whale-watching. To some extent, this is true. But largely, it’s grueling hours that leave you fatigued. The differences between fieldwork and what I’ll refer to as “everything else” AKA data analysis, proposal writing, manuscript writing, literature reviewing, lab work, and classwork, are the unbroken smile, the vaguely tanned skin, the hours of laughter, the sea spray, and the magical moments that reassure me that I’ve chosen the correct career path.

Alexa photographing a gray whale at sunset near Newport, OR. Image source: Alexa K.

This cruise was the second leg of the Northern California Current Ecosystem (NCCE) survey, and I was the sole Marine Mammal and Seabird Observer, a coveted position. Every morning, I would wake up at 0530hrs, grab some breakfast, and climb to the highest deck: the fly-bridge. Akin to being on the top of the world, the fly-bridge has the best views for the widest span. From 0600hrs to 2000hrs I sat, stood, or danced in a one-meter by one-meter corner of the fly-bridge and surveyed. This visual is why people think I’m whale watching. In reality, I am constantly busy. Nonetheless, I had weather and seas that scientists dream about, and for 10 days! To contrast my luck, you can read Florence’s blog about her cruise. On these same transects, in February, Florence experienced 20-foot seas and heavy rain with very few marine mammal sightings; of those, the only cetaceans she observed were gray whales close to shore. That starkly contrasts with my 10 cetacean species, upwards of 45 sightings, and 20-minute hammock power naps on the fly-bridge under the warm sun.

Pacific white-sided dolphins traveling nearby. Image source: Alexa K.

Marine mammal sightings from this cruise included 10 cetacean species (Pacific white-sided dolphin, Dall’s porpoise, unidentified beaked whale, Cuvier’s beaked whale, gray whale, minke whale, fin whale, northern right whale dolphin, blue whale, humpback whale, and transient killer whale) and one pinniped species: northern fur seal. What better way to illustrate these sightings than with a map? We are a geospatial lab after all.

Cetacean Sightings on the NCCE Cruise in May 2018. Image source: Alexa K.

This map is the result of data collection. However, it does not capture everything that was observed: sea state, weather, ocean conditions, bathymetry, nutrient levels, etc. There are many variables that can be added to maps–like this one (thanks to my GIS classes I can start adding layers!)–that can provide a better understanding of the ecosystem, predator-prey dynamics, animal behavior, and population health.

The catch from a bottom trawl at a station with some fish and a lot of pyrosomes (pink tube-like creatures). Image source: Alexa K.

Being a Ph.D. student can be physically and mentally demanding. So, when I was offered the opportunity to hone my data collection skills, I leapt for it. I’m happiest in the field: the wind in my face, the sunshine on my back, surrounded by cetaceans, and filled with the knowledge that I’m following my passion—and that this data is contributing to the greater scientific community.

Humpback whale photographed traveling southbound. Image source: Alexa K.

Managing Oceans: the inner-workings of marine policy

By Alexa Kownacki, Ph.D. Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab

When we hear “marine policy” we broadly lump it together with environmental policy. However, marine ecosystems differ greatly from their terrestrial counterparts. We wouldn’t manage a forest like an ocean, nor would we manage an ocean like a forest. Why not? The answer to this question is complex and involves everything from ecology to politics.

Oceans do not have borders; they are fluid and dynamic. Interestingly, by defining marine ecosystems we are applying some kind of borders. But water (and all its natural and unnatural contents) flows between these ‘ecosystems’. Marine ecosystems are home to a variety of anthropogenic activities such as transportation and recreation, in addition to an abundance of species that represent the three major domains of biology: Archaea, Bacteria, and Eukarya. Humans are the only creatures who “recognize” the borders that policymakers and policy actors have instilled. A migrating gray whale does not have a passport stamped as it travels from its breeding grounds in Mexican waters to its feeding grounds in the Gulf of Alaska. In contrast, a large cargo ship, or even a small sailing vessel, that crosses those boundaries is subjected to a series of immigration checkpoints. Combining these human and non-human facets makes marine policy complex and variable.

The eastern Pacific gray whale migration route includes waters off of Mexico, Canada, and the United States. Source: https://www.learner.org/jnorth/tm/gwhale/annual/map.html

Environmental policy of any kind can be challenging. Marine environmental policy adds many more convoluted layers in terms of unknowns; marine ecosystems are understudied relative to terrestrial ecosystems and therefore have less research conducted on how to best manage them. Additionally, there are more hands in the cookie jar, so to speak; more governments and more stakeholders with more opinions (Leslie and McLeod 2007). So, with fewer examples of successful ecosystem-based management in coastal and marine environments and more institutions with varied goals, marine ecosystems become challenging to manage and monitor.

A visual representation of what can happen when there are many groups with different goals: no one can easily get what they want. Image Source: The Brew Monks

With this in mind, it is understandable that there is no official manual on policy development. There is, however, a broadly standardized process for developing, implementing, and evaluating environmental policies: 1) recognize a problem, 2) propose a solution, 3) choose a solution, 4) put the solution into effect, and 5) monitor the results (Zacharias 2014, pp. 16-21). For a policy to be deemed successful, specific criteria must be met, which means that a common policy is necessary for implementation and enforcement. Within the United States, there are multiple governing bodies that protect the ocean, including the National Oceanic and Atmospheric Administration (NOAA), Environmental Protection Agency (EPA), Fish and Wildlife Service (USFWS), and the Department of Defense (DoD), all of which have different mission statements, budgets, and proposals. To create effective environmental policies, collaboration between various groups is imperative. Nevertheless, bringing these groups together, even those within the same nation, requires time, money, and flexibility.

This is not to say that environmental policy for terrestrial systems is simple, but there are fewer moving parts to manage. For example, a forest in the United States would likely not be an international jurisdiction case because the borders are permanent lines and national management does not overlap. However, at a state level, jurisdiction may overlap with potentially conflicting agendas. A critical difference in management strategies is preservation versus conservation. Preservation focuses on protecting nature from use and discourages altering the environment. Conservation centers on wise-use practices that allow for proper human use of environments, such as resource use for economic groups. One environmental group may believe in preservation, while one government agency may believe in conservation, creating friction over how the land should be used: timber harvest, public use, private purchasing, etc.

Linear representation of preservation versus conservation versus exploitation. Image Source: Raoof Mostafazadeh

Furthermore, a terrestrial forest has distinct edges with measurable and observable qualities; it possesses intrinsic and extrinsic values that are broadly recognized because humans have been utilizing them for centuries. Intrinsic values are things that people can monetize, such as commercial fisheries or timber harvests whereas extrinsic values are things that are challenging to put an actual price on in terms of biological diversity, such as the enjoyment of nature or the role of species in pest management; extrinsic values generally have a high level of human subjectivity because the context of that “resource” in question varies upon circumstances (White 2013). Humans are more likely to align positively with conservation policies if there are extrinsic benefits to them; therefore, anthropocentric values associated with the resources are protected (Rode et al. 2015). Hence, when creating marine policy, monetary values are often placed on the resources, but marine environments are less well-studied due to lack of accessibility and funding, making any valuation very challenging.

The differences between direct (intrinsic) versus indirect (extrinsic) values to biodiversity that factor into environmental policy. Image Source: Conservationscienceblog.wordpress.com

Assigning a cost or benefit to environmental services is subjective (Dearborn and Kark 2010). What is the benefit to a child seeing an endangered killer whale for the first time? One could argue priceless. In order for conservation measures to be implemented, values, both intrinsic and extrinsic, are assigned to the goods and services that the marine environment provides, such as seafood and the ocean’s function as a carbon sink. Based on the four main criteria used to evaluate policy, the true issue becomes assessing the merit and worth. There is an often-overlooked flaw with policy models: they assume rational behavior (Zacharias 2014, p. 126). Policy involves relationships and opinions, not only the scientific facts that inform them; this is true in terrestrial and marine environments. People have their own agendas that influence not only the policies themselves but also the speed at which they are proposed and implemented.

Tourists aboard a whale-watching vessel off of the San Juan Islands, enjoying orca in the wild. Image Source: Seattle Orca Whale Watching

One example of how marine policy evolves is through groups, such as the International Whaling Commission, that gather to discuss such policies while representing many different stakeholders. Some cultures value the whale for food, others for its contributions to the surrounding ecosystems, such as supporting healthy seafood populations. Valuing one over the other goes beyond a monetary value and delves deeper into cultures, politics, economics, and ethics. Subjectivity is the name of the game in environmental policy, and, in marine environmental policy, there are so many factors unaccounted for that decision-making is incredibly challenging.

Efficacy in terms of the public policy for marine systems presents a challenge because policy happens slowly, as does research. There is no equation that fits all problems because the variables are different and dynamic; they change based on the situation and can be unpredictable. When comparing institutional versus impact effectiveness, they both are hard to measure without concrete goals (Leslie and McLeod 2007). Marine ecosystems are open environments which add an additional hurdle: setting measurable and achievable goals. Terrestrial environments contain resources that more people utilize, more frequently, and therefore have more set goals. Without a problem and potential solution there is no policy. Terrestrial systems have problems that humans recognize. Marine systems have problems that are not as visible to people on a daily basis. Therefore, terrestrial systems have more solutions presented to mitigate problems and more policies enacted.

As marine scientists, we don’t always immediately consider how marine policy impacts our research. In the case of my project, marine policy is something I constantly have to consider. Common bottlenose dolphins are protected under the Marine Mammal Protection Act (MMPA) and inhabit coastal waters of both the United States and Mexico, including some Marine Protected Areas (MPAs). In addition, some funding for the project comes from NOAA and the DoD. Even on the surface level, it is clear that policy is something we must consider as marine scientists, whether we want to or not. We may do our best to inform policymakers with results and education based on our research, but marine policy requires value-based judgements based on politics, economics, and human subjectivity, all of which are challenging to harmonize into a succinct problem with a clear solution.

Two common bottlenose dolphins (coastal ecotype) traveling along the Santa Barbara, CA shoreline. Image Source: Alexa Kownacki

References:

Dearborn, D. C. and Kark, S. 2010. Motivations for Conserving Urban Biodiversity. Conservation Biology, 24: 432-440. doi:10.1111/j.1523-1739.2009.01328.x

Leslie, H. M. and McLeod, K. L. 2007. Confronting the challenges of implementing marine ecosystem-based management. Frontiers in Ecology and the Environment, 5: 540-548. doi:10.1890/060093

Munguia, P. and Ojanguren, A. F. 2015. Bridging the gap in marine and terrestrial studies. Ecosphere, 6(2): 25. http://dx.doi.org/10.1890/ES14-00231.1

Rode, J., Gomez-Baggethun, E. and Krause, M. 2015. Motivation crowding by economic payments in conservation policy: a review of the empirical evidence. Ecological Economics, 117: 270-282.

White, P. S. 2013. Derivation of the Extrinsic Values of Biological Diversity from Its Intrinsic Value and of Both from the First Principles of Evolution. Conservation Biology, 27: 1279-1285. doi:10.1111/cobi.12125

Zacharias, M. 2014. Marine Policy. London: Routledge.