Zooming in: A closer look at bottlenose dolphin distribution patterns off of San Diego, CA

By: Alexa Kownacki, Ph.D. Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab

Data analysis is often about paring data down into manageable subsets. My project, which spans 34 years and six study sites along the California coast, requires significant data wrangling before full analysis. As part of a data analysis trial, I first refined my dataset to only the San Diego survey location. I chose this dataset for its standardization and large sample size; the bulk of my sightings, over 4,000 of the 6,136, are from the San Diego survey site, where the transect methods were highly standardized. In the next step, I selected explanatory variable datasets that covered the sighting data at similar spatial and temporal resolutions. This small endeavor in analyzing my data was the first big leap into understanding what questions are feasible in terms of variable selection and analysis methods. I developed four major hypotheses for this San Diego site.

The study species: common bottlenose dolphin (Tursiops truncatus) seen along the California coastline in 2015. Image source: Alexa Kownacki.

Hypotheses:

H1: I predict that bottlenose dolphin sightings along the San Diego transect throughout the years 1981-2015 exhibit clustered distribution patterns as a result of both the patchy distribution of the species’ preferred habitats and the social nature of bottlenose dolphins.

H2: I predict that there are higher densities of bottlenose dolphins at higher latitudes across 1981-2015 due to prey distributions shifting northward and fewer human activities in the northerly sections of the transect.

H3: I predict that during warm (positive) El Niño Southern Oscillation (ENSO) months, the dolphin sightings in San Diego are distributed more northerly, predominantly following prey aggregations that have historically shifted northward into cooler waters and, secondarily, in response to increasing sea surface temperatures.

H4: I predict that along the San Diego coastline, bottlenose dolphin sightings are clustered within two kilometers of the six major lagoons, with no specific preference for any lagoon, because the murky, nutrient-rich waters in the estuarine environments are ideal for prey protection and known for their higher densities of schooling fishes.

Data Description:

The common bottlenose dolphin (Tursiops truncatus) sighting data spans 1981-2015 with a few gap years. Sightings cover all months, but not in all years sampled. The same transect in San Diego was surveyed in a small, rigid-hulled inflatable boat with approximately a two-kilometer observation area (one kilometer surveyed 90 degrees to starboard and port of the bow).

I wanted to see if there were changes in dolphin distribution by latitude and, if so, whether those changes had a relationship to ENSO cycles and/or distances to lagoons. For ENSO data, I used the NOAA database that provides positive, neutral, and negative indices (1, 0, and -1, respectively) by each month of each year. I matched these ENSO data to my month-date information of dolphin sighting data. Distance from each lagoon was calculated for each sighting.
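The matching step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the ENSO lookup values, the lagoon names and latitudes, and the field names are all hypothetical placeholders, and distance is taken as the absolute difference in latitude (in decimal degrees), which is how distances are reported later in this post.

```python
# Hypothetical monthly ENSO lookup keyed by (year, month); values are -1, 0, or 1.
enso_index = {(1997, 11): 1, (1999, 1): -1, (2013, 6): 0}

# Hypothetical lagoon latitudes in decimal degrees (illustrative values only).
lagoon_lats = {"lagoon_a": 33.20, "lagoon_b": 32.97}


def annotate_sighting(sighting, enso_index, lagoon_lats):
    """Attach the monthly ENSO index and the nearest-lagoon distance to a sighting."""
    key = (sighting["year"], sighting["month"])
    # Pick the lagoon with the smallest absolute latitude difference.
    nearest_id, nearest_dist = min(
        ((lagoon_id, abs(sighting["lat"] - lat)) for lagoon_id, lat in lagoon_lats.items()),
        key=lambda pair: pair[1],
    )
    return {
        **sighting,
        "enso": enso_index.get(key),  # None when no index exists for that month
        "nearest_lagoon": nearest_id,
        "lagoon_dist_deg": nearest_dist,
    }


example = annotate_sighting({"year": 1997, "month": 11, "lat": 33.00}, enso_index, lagoon_lats)
```

Returning `None` for months without an index keeps gap years visible instead of silently dropping sightings.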

Figure 1. Map representing the San Diego transect, represented with a light blue line inside of a one-kilometer buffered “sighting zone” in pale yellow. The dark pink shapes are dolphin sightings from 1981-2015, although some are stacked on each other and cannot be differentiated. The lagoons, ranging in size, are color-coded. The transect line runs from the breakwaters of Mission Bay, CA to Oceanside Harbor, CA.

Results: 

H1: True, dolphins are clustered and do not have a uniform distribution across this area. Spatial analysis indicated less than a 1% likelihood that this clustered pattern could be the result of random chance (Fig. 2, z-score = -127.16, p-value < 0.0001). It is well known that schooling fishes have a patchy distribution, which could influence the clustered distribution of their dolphin predators. In addition, bottlenose dolphins are highly social and, although pods change in composition of individuals, the dolphins usually transit, feed, and socialize in small groups.

Figure 2. Summary from the Average Nearest Neighbor calculation in ArcMap 10.6 displaying that bottlenose dolphin sightings in San Diego are highly clustered. When the z-score, which corresponds to different colors on the graphic above, is strongly negative (< -2.58), in this case dark blue, it indicates clustering. Because the p-value is very small, in this case, much less than 0.01, these results of clustering are strongly significant.
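For readers curious about what the Average Nearest Neighbor statistic measures, here is a minimal Python sketch. It is not the ArcMap workflow behind Figure 2. It assumes points in projected units (for example, meters) and a known study-area size, and it uses the standard formulation of the statistic as I understand it: the observed mean nearest-neighbor distance is compared with the distance expected under a random pattern. The function name and inputs are illustrative.

```python
import math


def average_nearest_neighbor(points, area):
    """Return (ratio, z_score) for the Average Nearest Neighbor statistic.

    points: list of (x, y) tuples in projected units (e.g. meters)
    area:   study-area size in the same units, squared
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Observed mean distance from each point to its nearest neighbor.
    # Brute force, O(n^2): fine for a sketch, slow for many thousands of points.
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n
    # Expected mean distance and standard error under a random pattern.
    expected = 0.5 / math.sqrt(n / area)
    std_err = 0.26136 / math.sqrt(n * n / area)
    return observed / expected, (observed - expected) / std_err
```

A ratio below 1 with a strongly negative z-score points to clustering; a ratio above 1 with a positive z-score points to dispersion.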

H2: False, dolphins do not occur at higher densities in the higher latitudes of the San Diego study site. The sightings are more clumped towards the lower latitudes overall (p < 2e-16), possibly due to habitat preference. The sightings are closer to beaches with higher human densities and human-related activities near Mission Bay, CA. It should be noted that just north of the San Diego transect is the Camp Pendleton Marine Base, which conducts frequent military exercises and could deter animals.

Figure 3. Histogram comparing the latitudes with the frequency of dolphin sightings in San Diego, CA. The x-axis represents the latitudinal difference from the most northern part of the transect to each dolphin sighting. Therefore, a small difference would translate to a sighting being in the northern transect areas whereas large differences would translate to sightings being more southerly. This could be read from left to right as most northern to most southern. The y-axis represents the frequency of which those differences are seen, that is, the number of sightings with that amount of latitudinal difference, or essentially location on the transect line. Therefore, you can see there is a peak in the number of sightings towards the southern part of the transect line.

H3: False, during warm (positive) El Niño Southern Oscillation (ENSO) months, the dolphin sightings in San Diego were more southerly. In colder (negative) ENSO months, the dolphins were more northerly. The difference in sighting latitude among ENSO index categories was significant (p < 0.005). Post-hoc analysis indicates that the north-south distribution of dolphin sightings differed among all three ENSO states.

Figure 4. Boxplot visualizing the distribution of dolphin sighting latitudinal differences by ENSO index, with -1, 0, and 1 representing cold, neutral, and warm years, respectively.

H4: True, dolphins are clustered around particular lagoons. Figure 5 illustrates how dolphin sightings nearest to Lagoon 6 (the San Dieguito Lagoon) are always within 0.03 decimal degrees. Because of how these data are formatted, decimal degrees is the easiest way to measure change in distance (in this case, the difference in latitude). In comparison, dolphins at Lagoon 5 (Los Penasquitos Lagoon) are distributed across distances, with the most sightings further from the lagoon.
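Because the hypothesis was framed in kilometers while the distances are reported in decimal degrees, a rough conversion helps with interpretation. The sketch below assumes about 111 km per degree of latitude, a common approximation that varies slightly with latitude. It applies only to north-south (latitude) differences, since the length of a degree of longitude shrinks away from the equator. The constant and function name are illustrative.

```python
KM_PER_DEGREE_LAT = 111.0  # approximate length of one degree of latitude


def lat_degrees_to_km(delta_deg):
    """Convert a latitude difference in decimal degrees to approximate kilometers."""
    return abs(delta_deg) * KM_PER_DEGREE_LAT


print(f"{lat_degrees_to_km(0.03):.2f} km")  # prints "3.33 km"
```

Under that assumption, 0.03 decimal degrees of latitude corresponds to roughly 3.3 km.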

Figure 5. Bar plot displaying the different distances from dolphin sighting location to the nearest lagoon in San Diego in decimal degrees. Note: Lagoon 4 is south of the study site and therefore was never the nearest lagoon.

I found a significant difference in distance to nearest lagoon among ENSO index categories (p < 2.55e-9): distance to nearest lagoon differed significantly between neutral and negative index values and between neutral and positive index values. Therefore, I hypothesize that in neutral ENSO months compared to positive and negative ENSO months, prey distributions are changing. This is one possible hypothesis for the significant difference in lagoon preference based on the monthly ENSO index. Using a violin plot (Fig. 6), it appears that Lagoon 5, Los Penasquitos Lagoon, has the widest variation of sighting distances in all ENSO index conditions. In neutral years, Lagoon 0, the Buena Vista Lagoon, has multiple sightings, whereas in positive and negative years it had either no sightings or a single sighting. The Buena Vista Lagoon is the most northerly lagoon, which may indicate that in neutral ENSO months, dolphin pods are more northerly in their distribution.

Figure 6. Violin plot illustrating the distance from lagoons of dolphin sightings under different ENSO conditions. There are three major groups based on ENSO index: “-1” representing cold years, “0” representing neutral years, and “1” representing warm years. On the x-axis are lagoon IDs and on the y-axis is the distance to the nearest lagoon in decimal degrees. The wider the shapes, the more sightings, therefore Lagoon 6 has many sightings within a very small distance compared to Lagoon 5 where sightings are widely dispersed at greater distances.

 

Bottlenose dolphins foraging in a small group along the California coast in 2015. Image source: Alexa Kownacki.

Takeaways to science and management: 

Bottlenose dolphins have a clustered distribution which seems to be related to ENSO monthly indices, and likely, their social structures. From these data, neutral ENSO months appear to have something different happening compared to positive and negative months, that is impacting the sighting distributions of bottlenose dolphins off the San Diego coastline. More research needs to be conducted to determine what is different about neutral months and how this may impact this dolphin population. On a finer scale, the six lagoons in San Diego appear to have a spatial relationship with dolphin sightings. These lagoons may provide critical habitat for bottlenose dolphins and/or for their preferred prey either by protecting the animals or by providing nutrients. Different lagoons may have different spans of impact, that is, some lagoons may have wider outflows that create larger nutrient plumes.

Other than the Marine Mammal Protection Act and small protected zones, there are no safeguards in place for these dolphins, whose population hovers around 500 individuals. Therefore, specific coastal areas surrounding lagoons that are more vulnerable to habitat loss, habitat degradation, and/or are more frequented by dolphins, may want greater protection added at a local, state, or federal level. For example, the Batiquitos and San Dieguito Lagoons already contain some Marine Conservation Areas with No-Take Zones within their reach. The city of San Diego and the state of California need better ways to assess the coastlines in their jurisdictions and how protecting the marine, estuarine, and terrestrial environments near and encompassing the coastlines impacts the greater ecosystem.

This dive into my data was an excellent lesson in spatial scaling with regard to paring down my data to a single study site and in matching my existing datasets to other data that could help test my hypotheses. Originally, I underestimated the robustness of my data. At first, I hesitated when considering reducing the dolphin sighting data to only include San Diego because I was concerned that I would not be able to do the statistical analyses. However, these concerns were unfounded. My results are strongly significant and provide great insight into my questions about my data. Now, I can further apply these preliminary results and explore both finer and broader scale resolutions, such as using the more precise ENSO index values and finding ways to compare offshore bottlenose dolphin sighting distributions.

Data Wrangling to Assess Data Availability: A Data Detective at Work

By Alexa Kownacki, Ph.D. Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab

Data wrangling, in my own loose definition, is the necessary combination of both data selection and data collection. Wrangling your data requires accessing then assessing your data. Data collection is just what it sounds like: gathering all data points necessary for your project. Data selection is the process of cleaning and trimming data for final analyses; it is a whole new bag of worms that requires decision-making and critical thinking. During this process of data wrangling, I discovered there are two major avenues to obtain data: 1) you collect it, which frequently requires an exorbitant amount of time in the field, in the lab, and/or behind a computer, or 2) other people have already collected it, and through collaboration you put it to good use (often a different use than its initial intent). The latter approach may result in the collection of so much data that you must decide which data should be included to answer your questions. This process of data wrangling is the hurdle I am facing at this moment. I feel like I am a data detective.

Data wrangling illustrated by members of the R-programming community. (Image source: R-bloggers.com)

My project focuses on assessing the health conditions of the two ecotypes of bottlenose dolphins in the waters from Ensenada, Baja California, Mexico to San Francisco, California, USA from 1981 to 2015. During the government shutdown, much of my data was inaccessible, seeing as it was in the possession of my collaborators at federal agencies. However, now that the shutdown is over, my data is flowing in, and my questions are piling up. I can now begin to look at where these animals have been sighted over the past decades, which ecotypes have higher contaminant levels in their blubber, which animals have higher stress levels and if these are related to geospatial location, where animals are more susceptible to human disturbance, if sex plays a role in stress or contaminant load levels, which environmental variables influence stress levels and contaminant levels, and more!

Alexa, alongside collaborators, photographing transiting bottlenose dolphins along the coastline near Santa Barbara, CA in 2015 as part of the data collection process. (Image source: Nick Kellar).

Over the last two weeks, I was emailed three separate Excel spreadsheets representing three datasets that contain partially overlapping data. If Microsoft Access is foreign to you, I would compare this dilemma to a very confusing exam question of “matching the word with the definition”, except with the words being in different languages from the definitions. If you have used Microsoft Access databases, you probably know the system of querying and matching data in different databases. Well, imagine trying to do this with Excel spreadsheets because the databases are not linked. Now you can see why I need to take a data management course and start using platforms other than Excel to manage my data.

A visual interpretation of trying to combine datasets being like matching the English definition to the Spanish translation. (Image source: Enchanted Learning)

In the first dataset, there are 6,136 sightings of common bottlenose dolphins (Tursiops truncatus) documented in my study area. Some years have no sightings, some years have fewer than 100 sightings, and other years have over 500 sightings. In another dataset, a genetics database that can provide the sex of the animal, there are 398 bottlenose dolphin biopsy samples collected between 1992 and 2016. The final dataset contains records of 774 bottlenose dolphin biopsy samples collected between 1993 and 2018 that could be tested for hormone and/or contaminant levels. Some of these samples have identification numbers that can be matched to the other dataset. Within these cross-reference matches there are conflicting data in terms of the amount of tissue remaining for analyses. Sorting these conflicts out will involve more digging on my end and additional communication with collaborators: data wrangling at its best. Circling back to what I mentioned at the beginning of this post, this data was collected by other people over decades and the collection methods were not standardized for my project. I benefit from years of data collection by other scientists and I am grateful for all of their hard work. However, now my hard work begins.
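The cross-referencing described above can be sketched in Python without any database software. This is a hedged illustration only: the record layout and the field name "tissue_mg" are hypothetical, and the real spreadsheets would first need to be read into dictionaries keyed by sample ID.

```python
def cross_reference(genetics, biopsies, field="tissue_mg"):
    """Match records by sample ID and flag IDs whose values for `field` disagree.

    Both arguments are dicts mapping a sample ID to a record dict.
    The default field name "tissue_mg" is a hypothetical placeholder.
    """
    matched = {}
    conflicts = []
    # Only IDs present in both datasets can be cross-referenced.
    for sample_id in genetics.keys() & biopsies.keys():
        g, b = genetics[sample_id], biopsies[sample_id]
        # Keep both source records side by side so nothing is overwritten.
        matched[sample_id] = {"genetics": g, "biopsy": b}
        if g.get(field) != b.get(field):
            conflicts.append(sample_id)
    return matched, sorted(conflicts)


genetics = {"A1": {"sex": "F", "tissue_mg": 40}, "A2": {"sex": "M", "tissue_mg": 10}}
biopsies = {"A1": {"tissue_mg": 40}, "A2": {"tissue_mg": 0}, "A3": {"tissue_mg": 25}}
matched, conflicts = cross_reference(genetics, biopsies)
```

Keeping both records rather than merging them means a conflict is never resolved silently; the flagged IDs are the ones to raise with collaborators.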

The cutest part of data wrangling: finding adorable images of bottlenose dolphins, photographed during a coastal survey. (Image source: Alexa Kownacki).

There is also a large amount of data that I downloaded from federally-maintained websites. For example, dolphin sighting data from research cruises are available for public access from the OBIS (Ocean Biogeographic Information System) Sea Map website. It boasts 5,927,551 records from 1,096 data sets containing information on 711 species with the help of 410 collaborators. This website is incredible: it allows you to search through different data criteria and then download the data in a variety of formats, and it contains an interactive map of the data. You can explore this at your leisure, but I want to point out the sheer amount of data. In my case, the OBIS Sea Map website is only one major platform, and it contains many sources of data that have already been collected, not specifically for me or my project, but that I will use. As a follow-up to using data collected by other scientists, it is critical to give credit where credit is due. One of the benefits of using this website is that it provides information about how to properly credit the collaborators when downloading data. See below for an example:

Example citation for a dataset (Dataset ID: 1201):

Lockhart, G.G., DiGiovanni Jr., R.A., DePerte, A.M. 2014. Virginia and Maryland Sea Turtle Research and Conservation Initiative Aerial Survey Sightings, May 2011 through July 2013. Downloaded from OBIS-SEAMAP (http://seamap.env.duke.edu/dataset/1201) on xxxx-xx-xx.

Citation for OBIS-SEAMAP:

Halpin, P.N., A.J. Read, E. Fujioka, B.D. Best, B. Donnelly, L.J. Hazen, C. Kot, K. Urian, E. LaBrecque, A. Dimatteo, J. Cleary, C. Good, L.B. Crowder, and K.D. Hyrenbach. 2009. OBIS-SEAMAP: The world data center for marine mammal, sea bird, and sea turtle distributions. Oceanography 22(2):104-115

Another federally-maintained data source that boasts more data than I can quantify is the well-known ERDDAP website. After a few Google searches, I finally discovered that the acronym stands for Environmental Research Division’s Data Access Program. Essentially, this is the holy grail of environmental data for marine scientists. I have downloaded so much data from this website that Excel cannot open the CSV files. Here is yet another reason why young scientists, like myself, need to transition out of using Excel and into data management systems that are developed to handle large-scale datasets. My downloads include everything from daily sea surface temperatures collected on every one-degree line of latitude and longitude from 1981-2015 over my entire study site to Ekman transport levels taken every six hours on every longitudinal degree line over my study area. I will add some environmental variables in species distribution models to see which account for the largest amount of variability in my data. The next step in data selection begins with statistics. It is important to determine whether environmental factors are highly correlated before modeling the data. Learn more about fitting cetacean data to models here.
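A first pass at screening for highly correlated environmental variables might look like the Python sketch below. It is an illustration under stated assumptions, not a method from this project: it uses Pearson's correlation, the 0.7 cutoff is an arbitrary placeholder rather than a recommended value, and the variable names and numbers are made up.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(variables, threshold=0.7):
    """Return (name_a, name_b, r) for pairs whose |r| meets the threshold.

    The default threshold of 0.7 is an assumption chosen for illustration.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(variables.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Made-up example values, aligned by observation.
variables = {
    "sst": [14, 15, 16, 17, 18],
    "sst_anomaly": [-2, -1, 0, 1, 2],
    "ekman": [3, -1, 4, -1, 5],
}
print(correlated_pairs(variables))  # [('sst', 'sst_anomaly', 1.0)]
```

Pairs flagged this way are candidates for dropping or combining before model fitting, since they carry largely redundant information.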

The ERDDAP website combined all of the average Sea Surface Temperatures collected daily from 1981-2018 over my study site into a graphical display of monthly composites. (Image Source: ERDDAP)

As you can imagine, this amount of data from many sources and collaborators is equal parts daunting and exhilarating. Before I even begin the process of determining the spatial and temporal spread of dolphin sightings data, I have to identify which data points have sex identified from either hormone levels or genetics, which data points have contaminant levels already quantified, which samples still have tissue available for additional testing, and so on. Once I have cleaned up the datasets, I will import the data into the R programming environment. Then I can visualize my data in plots, charts, and graphs; this will help me identify outliers and potential challenges with my data, and, hopefully, start to see answers to my focal questions. Only then can I dive into the deep and exciting waters of species distribution modeling and more advanced statistical analyses. This is data wrangling and I am the data detective.

What people may think a ‘data detective’ looks like, when, in reality, it is a person sitting at a computer. (Image source: Elder Research)

Like the well-known phrase, “With great power comes great responsibility”, I believe that with great data, comes great responsibility, because data is power. It is up to me as the scientist to decide which data is most powerful at answering my questions.

Data is information. Information is knowledge. Knowledge is power. (Image source: thedatachick.com)

 

Science (or the lack thereof) in the Midst of a Government Shutdown

By Alexa Kownacki, Ph.D. Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab

In what is the longest government shutdown in the history of the United States, many people are impacted. Speaking from a scientist’s point of view, I acknowledge the scientific community is one of many groups that are being majorly obstructed. Here at the GEMM Laboratory, all of us are feeling the frustrations of the federal government grinding to a halt in different ways. Although our research spans great distances (from Dawn’s work on New Zealand blue whales that utilizes environmental data managed by our federal government, to new projects that cannot get federal permit approvals to start data collection, to many of Leigh’s projects on the Oregon coast of the USA that are funded by and conducted in collaboration with federal agencies), we all recognize that our science is affected by the shutdown. My research on common bottlenose dolphins is no exception: my academic funding is through the US Department of Defense; my collaborators are NOAA employees who contribute NOAA data; I use publicly-available, government-maintained data for additional variables; and I am part of a federally-funded public university. Ironically, my previous blog post about the intersection of science and politics seems to have become even more relevant in the past few weeks.

Many graduate students like me are feeling the crunch as federal agencies close their doors and operations. Most people have seen the headlines that allude to such funding-related issues. However, it’s important to understand what the funding in question is actually doing. Whether we see it or not, the daily operations of the United States federal government help science progress on a multitude of levels.

Federal research in the United States is critical. Most governmental branches support research; the most well-known agencies for doing so are the National Science Foundation (NSF), the US Department of Agriculture (USDA), the National Oceanic and Atmospheric Administration (NOAA), and the National Aeronautics and Space Administration (NASA). There are 137 executive agencies in the USA (cei.org). On a finer scale, NSF alone receives approximately 40,000 scientific proposals each year (nsf.gov).

If I play a word association game and I am given the word “science”, my response would be “data”. Data, even absence data, informs science. The largest aggregate of metadata with open resources lives in the centralized website, data.gov, which is maintained by the federal government; it is no longer accessible and instead directs you to a message.

Here are a few more examples of science that has stopped in its tracks from lesser-known research entities operated by the federal government:

Currently, the National Weather Service (NWS) is unable to maintain or improve its advanced weather models. Therefore, forecasters, as well as those of us who include weather or climate aspects in our research, have less and less information on which to base their predictions. Prior to the shutdown, scientists were changing the data format of the Global Forecast System (GFS), the most advanced mathematical, computer-based weather modeling prediction system in the USA. Unfortunately, the GFS currently does not recognize much of the input data it is receiving. A model is only as good as its input data (as I am sure Dawn can tell you), and currently that means the GFS is very limited. Many NWS models are upgraded January-June to prepare for storm season later in the year. Therefore, there are long-term ramifications for the lack of weather research advancement in terms of global health and safety. (https://www.washingtonpost.com/weather/2019/01/07/national-weather-service-is-open-your-forecast-is-worse-because-shutdown/?noredirect=on&utm_term=.5d4c4c3c1f59)

An example of one output from the GFS model. (Source: weather.gov)

The Food and Drug Administration (FDA), a federal agency of the Department of Health and Human Services that is responsible for food safety, has reduced inspections. Because domestic meat and poultry are at the highest risk of contamination, their inspections continue, but by staff who are going without pay, according to the agency’s commissioner, Dr. Scott Gottlieb. Produce, dry foods, and other lower-risk consumables are being minimally inspected, if at all. Active research projects investigating food-borne illness that receive federal funding are at a standstill. Is your stomach doing flips yet? (https://www.nytimes.com/2019/01/09/health/shutdown-fda-food-inspections.html?rref=collection%2Ftimestopic%2FFood%20and%20Drug%20Administration&action=click&contentCollection=timestopics&region=stream&module=stream_unit&version=latest&contentPlacement=2&pgtype=collection)

An FDA field inspector examines imported gingko nuts–a process that is likely not happening during the shutdown. (Source: FDA.gov)

The National Park Service (NPS) recently made headlines with the post-shutdown acts of vandalism in the iconic Joshua Tree National Park. What you might not know is that the shutdown has also stopped a 40-year study that monitors how streams are recovering from acid rain. Scientists are barred from entering the park and conducting sampling efforts in remote streams of Shenandoah National Park, Virginia. (http://www.sciencemag.org/news/2019/01/us-government-shutdown-starts-take-bite-out-science)

A map of the sampling sites that have been monitored since the 1980s for the Shenandoah Watershed Study and Virginia Trout Stream Sensitivity Study that cannot be accessed because of the shutdown. (Source: swas.evsc.virginia.edu)

NASA’s Stratospheric Observatory for Infrared Astronomy (SOFIA), better known as the “flying telescope”, has halted operations, and it will require over a week to bring back online once funding is restored. SOFIA usually soars into the stratosphere as a tool to study the solar system and collect data that ground-based telescopes cannot. (http://theconversation.com/science-gets-shut-down-right-along-with-the-federal-government-109690)

NASA’s Stratospheric Observatory for Infrared Astronomy (SOFIA) flies over the snowy Sierra Nevada mountains while the telescope gathers information. (Source: NASA/ Jim Ross).

It is important to remember that science happens outside of laboratories and field sites; it happens at meetings and conferences where great minds collaborate, brainstorm, and discover the best solutions to challenging questions. The shutdown has stopped most federal travel. The annual American Meteorological Society meeting and American Astronomical Society meeting were two of the scientific conferences in the USA that attract federal employees and took place during the shutdown. Conferences like these are crucial opportunities with lasting impacts on science. Think of all the impressive science that could have been sparked at those meetings. Instead, many sessions were cancelled, and most major agencies had zero representation (https://spacenews.com/ams-2019-overview/). Topics like lidar data applications, which are used in geospatial research such as some of the GEMM Laboratory’s projects, could not be discussed. The cascade effects of the shutdown prove that science is interconnected and without advancement, everyone’s research suffers.

It should be noted that early-career scientists are thought to be the most negatively impacted by this shutdown because of financial instability and job insecurity. The shutdown also casts a dark cloud on their futures in science: it is largely unknown whether they can support themselves, their families, and their research (https://eos.org/articles/federal-government-shutdown-stings-scientists-and-science). Graduate students, young professors, and new professionals are all feeling the pressure. Our lives are based on our research. When the funds that cover our basic research requirements and human needs do not come through as promised, we naturally become stressed.

An adult and a juvenile common bottlenose dolphin forage along the San Diego coastline in November 2018. (Source: Alexa Kownacki)

So, yes, funding, or the lack thereof, is hurting many of us. Federally-funded individuals are selling possessions to pay for rent, research projects are at a standstill, and people are at greater health and safety risks. But, also, science, with the hope for bettering the world and answering questions and using higher thinking, is going backwards. Every day without progress puts us two days behind. At first glance, you may not think that my research on bottlenose dolphins is imperative to you or that the implications of the shutdown on this project are important. But consider this: my study aims to quantify contaminants in common bottlenose dolphins that live in either nearshore or offshore waters. Furthermore, I study the short-term and long-term impacts of contaminants and other health markers on dolphin hormone levels. The nearshore common bottlenose dolphin stocks inhabit the highly-populated coastlines that many of us utilize for fishing and recreation. Dolphins are mammals that respond to stress and environmental hazards in similar ways to humans. So, those blubber hormone levels and contamination results might be more connected to your health and livelihood than they appear at first glance. The fact that I cannot download data from ERDDAP, reach my collaborators, or even access my data (that starts in the early 1980s) does impact you. Nearly everyone’s research is connected to each other’s at some level, and that, in turn, has lasting impacts on all people, scientists or not. As the shutdown persists, I continue to question how to work through these research hurdles. If anything, it has been a learning experience that I hope will end soon for many reasons, one being: for science.

Big Data: Big possibilities with bigger challenges

By Alexa Kownacki, Ph.D. Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab

Did you know that Excel has a maximum number of rows? I do. During Winter Term for my GIS project, I was using Excel to merge oceanographic data, from a publicly-available data source website, and Excel continuously quit. Naturally, I assumed I had caused some sort of computer error. [As an aside, I’ve concluded that most problems related to technology are human error-based.] Therefore, I tried reformatting the data, restarting my computer, the program, etc. Nothing. Then, thanks to the magic of Google, I discovered that Excel allows no more than 1,048,576 rows by 16,384 columns. ONLY 1.05 million rows?! The oceanography data was more than 3 million rows—and that’s with me eliminating data points. This is what happens when we’re dealing with big data.
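One way to avoid this surprise is to count rows before opening a file in Excel. The Python sketch below streams a CSV one row at a time, so it never loads the whole file into memory. The function name and its `max_rows` parameter are my own illustrative choices; the limit itself is the 1,048,576-row figure quoted above.

```python
import csv

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit quoted above


def exceeds_row_limit(csv_path, max_rows=EXCEL_MAX_ROWS):
    """Return True if the CSV at csv_path has more than max_rows rows.

    Rows are streamed, so this also works on files far larger than memory.
    """
    with open(csv_path, newline="") as handle:
        for row_number, _ in enumerate(csv.reader(handle), start=1):
            if row_number > max_rows:
                return True  # stop early; no need to read the rest
    return False
```

Calling `exceeds_row_limit("my_download.csv")` (a hypothetical file name) returns `True` as soon as the limit is passed, which for a 3-million-row file means reading only about a third of it.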

According to the Merriam-Webster dictionary, big data is an accumulation of data that is too large and complex for processing by traditional database management tools (www.merriam-webster.com). However, there are articles, like this one from Forbes, that discuss the ongoing debate of how to define “big data”. According to the article, there are 12 major definitions; so, I’ll let you decide what you qualify as “big data”. Either way, I think that when Excel reaches its maximum row capacity, I’m working with big data.

Collecting oceanography data aboard the R/V Shimada. Photo source: Alexa K.

Here’s the thing: the oceanography data that I referred to was just a snippet of my data. Technically, it’s not even MY data; it’s data I accessed from NOAA’s ERDDAP website that had been consistently observed for the time frame of my dolphin data points. You may recall my blog about maps and geospatial analysis that highlights some of the reasons these variables, such as temperature and salinity, are important. However, what I didn’t previously mention was that I spent weeks working on editing this NOAA data. My project on common bottlenose dolphins overlays environmental variables to better understand dolphin population health off of California. These variables should have similar spatiotemporal attributes as the dolphin data I’m working with, which has a time series beginning in the 1980s. Without taking out a calculator, I still know that equates to a lot of data. Great data: data that will let me answer interesting, pertinent questions. But, big data nonetheless.

This is a screenshot of what the oceanography data looked like when I downloaded it to Excel. This format repeats for nearly 3 million rows.

Excel Screen Shot. Image source: Alexa K.

I showed this Excel spreadsheet to my GIS professor, and his response was something akin to “holy smokes”, with a few more expletives and a look of horror. It was not the sheer number of rows that shocked him; it was the data format. Nowadays, nearly everyone works with big data. It’s par for the course. However, the way data are formatted is the major split between what I’ll call “easy” data and “hard” data. The oceanography data could have been “easy” data. It could have had many variables listed in columns. Instead, this data alternated between rows with variable headings and columns with variable headings, for millions of cells. And, as described earlier, this is only one example of big data and its challenges.
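To make the “easy” versus “hard” distinction concrete, here is a rough sketch of turning one kind of “hard” layout into an “easy” one. The layout below is hypothetical, not the actual format of my download: I am assuming blocks in which a heading row (starting with `time`) is followed by value rows, and then another heading row introduces a different variable. The function name and column names are also invented for illustration.

```python
import csv
import io


def tidy(lines, key="time"):
    """Merge repeated heading/value blocks into one record per key value."""
    records = {}   # key value -> {column name: value}
    header = None
    for row in csv.reader(lines):
        if not row:
            continue                      # ignore blank lines
        if row[0] == key:                 # a heading row starts a new block
            header = row
            continue
        if header is None:
            raise ValueError("data row found before any heading row")
        record = records.setdefault(row[0], {key: row[0]})
        record.update(zip(header[1:], row[1:]))
    return list(records.values())


# Hypothetical "hard" layout: heading rows interleaved with data rows.
hard = io.StringIO(
    "time,sst\n"
    "1981-09-01,19.4\n"
    "1981-09-02,19.6\n"
    "time,salinity\n"
    "1981-09-01,33.5\n"
    "1981-09-02,33.6\n"
)
for record in tidy(hard):
    print(record)
```

The result is one record per date with every variable in its own column, which is the shape that spreadsheets, GIS software, and statistics packages tend to expect.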

Data does not always come in a form with text and numbers; sometimes it appears as media such as photographs, videos, and audio files. Big data just got a whole lot bigger. While I was working as a scientist at NOAA’s Southwest Fisheries Science Center, one project brought in over 80 terabytes of raw data per year. The project centered on the eastern North Pacific gray whale population, and, more specifically, its migration. Scientists have observed the gray whale migration annually since 1994 from Piedras Blancas Light Station for the northbound migration, and 2 out of every 5 years from Granite Canyon Field Station (GCFS) for the southbound migration. One of my roles was to ground-truth software that would help transition from humans as observers to computers as observers. One avenue we assessed was how well a computer “counted” whales compared to people. For this question, three infrared cameras at the GCFS recorded during the same time span that human observers were counting the migratory whales. Next, scientists, such as myself, would transfer those video files, upwards of 80 TB, from the hard drives to Synology boxes and to a different facility miles away. Synology boxes store arrays of hard drives that can be accessed remotely. To review: three locations, each with 80 TB of the same raw data. Once the data was saved in triplicate, I could run a computer program to detect whales. In summary, three months of recorded infrared video files requires upwards of 240 TB before processing. This is big data.
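The storage arithmetic above is simple, but it is worth writing down because it scales quickly. A small sketch, using only the figures from this post (the function name is my own):

```python
RAW_TB = 80   # upwards of 80 TB of raw infrared video
COPIES = 3    # original hard drives, Synology boxes, and the second facility


def total_storage_tb(raw_tb, copies):
    """Total storage needed when every raw file is kept in several places."""
    return raw_tb * copies


print(total_storage_tb(RAW_TB, COPIES))  # 240
```

Every additional backup location adds another full copy of the raw data, before any processed output is saved.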

Scientists on an observation shift at Granite Canyon Field Station in Northern California. Photo source: Alexa K.
Alexa and another NOAA scientist watching for gray whales at Piedras Blancas Light Station. Photo source: Alexa K.

In the GEMM Laboratory, we have so many sources of data that I did not bother trying to count. I’m entering my second year of the Ph.D. program and I already have a hard drive of data that I’ve backed up in three different locations. It’s no longer a matter of “if” you work with big data; it’s “how”. How will you format the data? How will you store the data? How will you maintain back-ups of the data? How will you share this data with collaborators/funders/the public?

The wonderful aspect of big data is in the name: big and data. The scientific community can answer more in-depth, challenging questions because of access to data and more of it. Data is often the limiting factor in what researchers can do because increased sample size allows more questions to be asked and greater confidence in results. That, and funding of course. It’s the reason why when you see GEMM Lab members in the field, we’re not only using drones to capture aerial images of whales, we’re taking fecal, biopsy, and phytoplankton samples. We’re recording the location, temperature, water conditions, wind conditions, cloud cover, date/time, water depth, and so much more. Because all of this data will help us and help other scientists answer critical questions. Thus, to my fellow scientists, I feel your pain and I applaud you, because I too know that the challenges that come with big data are worth it. And, to the non-scientists out there, hopefully this gives you some insight as to why we scientists ask for external hard drives as gifts.

Leila launching the drone to collect aerial images of gray whales to measure body condition. Photo source: Alexa K.
Using the theodolite to collect tracking data on the Pacific Coast Feeding Group in Port Orford, OR. Photo source: Alexa K.

References:

https://support.office.com/en-us/article/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3

https://www.merriam-webster.com/dictionary/big%20data

The Land of Maps and Charts: Geospatial Ecology

By Alexa Kownacki, Ph.D. Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab

I love maps. I love charts. As a random bit of trivia, there is a difference between a map and a chart. A map is a visual representation of land that may include details like topography, whereas a chart refers to nautical information such as water depth, shoreline, tides, and obstructions.

Map of San Diego, CA, USA. (Source: San Diego Metropolitan Transit System)
Chart of San Diego, CA, USA. (Source: NOAA)

I have an intense affinity for visually displaying information. When I was a child, my dad traveled constantly, from Barrow, Alaska to Istanbul, Turkey. Immediately upon his return, I would grab our standing globe from the dining room and our stack of atlases from the coffee table. I would sit at the kitchen table, enthralled by the stories of his travels. Yet, a story was only great when I could picture it for myself. (I should remind you, this was the early 1990s; Google Maps wasn’t a thing.) Our kitchen table transformed into a scene from Master and Commander—except, instead of nautical charts and compasses, we had an atlas the size of an overgrown toddler and salt and pepper shakers to pinpoint locations. I now had the world at my fingertips. My dad would show me the paths he took from our home to his various destinations and tell me about the topography, the demographics, the population, the terrain type—all attribute features that could be included in modern-day geographic information systems (GIS).

Uncle Brian showing Alexa where they were on a map of Maui, Hawaii, USA. (Photo: Susan K. circa 1995)

As I got older, the kitchen table slowly began to resemble what I imagine the set from Master and Commander actually looked like; nautical charts, tide tables, and wind predictions were piled high and the salt and pepper shakers were replaced with pencil marks indicating potential routes for us to travel via sailboat. The two of us were in our element, surrounded by visual and graphical representations of geographic and spatial information: maps. To put my map attraction in even more context, this is a scientist who grew up playing “Take-Off”, a board game that was “designed to teach geography” and involved flying your fleet of planes across a Mercator projection-style mapboard. Now, it’s no wonder that I’m a graduate student in a lab that focuses on the geospatial aspects of ecology.

A precocious 3-year-old Alexa, sitting with the airplane pilot asking him a long list of travel-related questions (and taking his captain’s hat). Photo: Susan K.

So why and how did geospatial ecology become a field—and a predominant one at that? It wasn’t that one day a lightbulb went off and a statistician decided to draw out the results. It was a progression, built upon for thousands of years. There are maps dating back to 2300 B.C. on Babylonian clay tablets (The British Museum), and yet, some of the maps we make today require highly sophisticated technology. Geospatial analysis is dynamic. It’s evolving. Today I’m using ArcGIS software to interpolate massive amounts of publicly available sea surface temperature satellite data from 1981-2015, which I will overlay with a layer of bottlenose dolphin sightings during the same time period for comparison. Tomorrow, there might be a new version of software that allows me to animate these data. Heck, it might already exist and I’m not aware of it. This growth is the beauty of this field. Geospatial ecology is made for us cartophiles (map-lovers) who study the interdependency of biological systems where location and distance between things matter.

Alexa’s grandmother showing Alexa (a very young cartographer) how to color in the lines. Source: Susan K. circa 1994

In a broader context, geospatial ecology communicates our science to all of you. If I posted a bunch of statistical outputs in text or even table form, your eyes might glaze over…and so might mine. But, if I displayed that same underlying data and results on a beautiful map with color-coded symbology, a legend, a compass rose, and a scale bar, you might have this great “ah-ha!” moment. That is my goal. That is what geospatial ecology is to me. It’s a way to SHOW my science, rather than TELL it.

Would you like to see this over and over again…?

A VERY small glimpse into the enormous amount of data that went into this map. This screenshot gave me one point of temperature data for a single location for a single day…Source: Alexa K.

Or see this once…?

Map made in ArcGIS of Coastal common bottlenose dolphin sightings between 1981-1989 with a layer of average sea surface temperatures interpolated across those same years. A picture really is worth a thousand words…or at least a thousand data points…Source: Alexa K.

For many, maps are visually easy to interpret, allowing quick message communication. Yet, there are many different learning styles. From my personal story, I think it’s relatively obvious that I’m, at least partially, a visual learner. When I was in primary school, I would read the directions thoroughly, but only truly absorb the material once the teacher showed me an example. Set up an experiment? Sure, I’ll read the lab report, but I’m going to refer to the diagrams of the set-up constantly. To this day, I always ask for an example. Teach me a new game? Let’s play the first round and then I’ll pick it up. It’s how I learned to sail. My dad described every part of the sailboat in detail and all I heard was words. Then, my dad showed me how to sail, and it came naturally. It’s only as an adult that I know what “that blue line thingy” is called. Geospatial ecology is how I SEE my research. It makes sense to me. And, hopefully, it makes sense to some of you!

Alexa’s dad teaching her how to sail. (Source: Susan K. circa 2000)
Alexa’s first solo sailboat race in Coronado, San Diego, CA. Notice: Alexa’s dad pushing the bow off the dock and the look on Alexa’s face. (Source: Susan K. circa 2000)
Alexa mapping data using ArcGIS in the Oregon State University Library. (Source: Alexa K circa a few minutes prior to posting).

I strongly believe a meaningful career allows you to highlight your passions and personal strengths. For me, that means photography, all things nautical, the great outdoors, wildlife conservation, and maps/charts. If I converted that into an equation, I think this is a likely result:

Photography + Nautical + Outdoors + Wildlife Conservation + Maps/Charts = Geospatial Ecology of Marine Megafauna

Or, better yet:

? + ⚓ + ? + ? + ? =  GEMM Lab

This lab was my solution all along. As part of my research on common bottlenose dolphins, I work on a small inflatable boat off the coast of California (nautical ✅, outdoors ✅), photograph their dorsal fins (photography ✅), and communicate my data using informative maps that will hopefully bring positive change to the marine environment (maps/charts ✅, wildlife conservation ✅). Geospatial ecology allows me to participate in research that I deeply enjoy and that, hopefully, will make the world a little bit of a better place. Oh, and make maps.

Alexa in the field, putting all those years of sailing and chart-reading to use! (Source: Leila L.)

 

What REALLY is a Wildlife Biologist?

By Alexa Kownacki, Ph.D. Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab

The first lecture slide. Source: Lecture1_Population Dynamics_Lou Botsford

This was the very first lecture slide in my population dynamics course at UC Davis. Population dynamics was infamous in our department for being an ultimate rite of passage due to its notoriously challenging curriculum. So, when Professor Lou Botsford pointed to his slide, all 120 of us Wildlife, Fish, and Conservation Biology majors didn’t know how to react. Finally, he announced, “This [pointing to the slide] is all of you”. The class laughed. Lou smirked. Lou knew.

Lou knew that there is more truth to this meme than words could express. I can’t tell you how many times friends and acquaintances have asked me if I was going to be a park ranger. Incredibly, not all—or even most—wildlife biologists are park rangers. I’m sure that at one point, my parents had hoped I’d be holding a tiger cub as part of a conservation project—that has never happened. Society may think that all wildlife biologists want to walk in the footsteps of the famous Steve Irwin and say things like “Crikey!”—but I can’t remember the last time I uttered that exclamation with the exception of doing a Steve Irwin impression. Hollywood may think we hug trees—and, don’t get me wrong, I love a good tie-dyed shirt—but most of us believe in the principles of conservation and wise-use A.K.A. we know that some trees must be cut down to support our needs. Helicoptering into a remote location to dart and take samples from wild bear populations…HA. Good one. I tell myself this is what I do sometimes, and then the chopper crashes and I wake up from my dream. But, actually, a scientist staring at a computer with stacks of papers spread across every surface is me and almost every wildlife biologist that I know.

The “dry lab” on the R/V Nathaniel B. Palmer en route to Antarctica. This room full of technology is where the majority of the science takes place. Drake Passage, International Waters in August 2015. Source: Alexa Kownacki

There is an illusion that wildlife biologists are constantly in the field doing all the cool, science-y, outdoors-y things while being followed by a National Geographic photojournalist. Well, let me break it to you, we’re not. Yes, we do have some incredible opportunities. For example, I happen to know that one lab member (eh-hem, Todd) has gotten up close and personal with wild polar bear cubs in the Arctic, and that all of us have taken part in some work that is worthy of a cover image on NatGeo. We love that stuff. For many of us, it’s in those few, memorable moments when we are out in the field, wearing pants that we haven’t washed in days, and we finally see our study species AND gather the necessary data, that the stars align. Those are the shining lights in a dark sea of papers, grant-writing, teaching, data management, data analysis, and coding. I’m not saying that we don’t find our desk work enjoyable; we jump for joy when our R script finally runs and we do a little dance when our paper is accepted and we definitely shed a tear of relief when funding comes through (or maybe that’s just me).

A picturesque moment of being a wildlife biologist: Alexa and her coworker, Jim, surveying migrating gray whales. Piedras Blancas Light Station, San Simeon, CA in May 2017. Source: Alexa Kownacki.

What I’m trying to get at is that we accepted our fates as the “scientists in front of computers surrounded by papers” long ago and we embrace it. It’s been almost five years since I was a senior in undergrad and saw this meme for the first time. Five years ago, I wanted to be that scientist surrounded by papers, because I knew that’s where the difference is made. Most people have heard the quote by Mahatma Gandhi, “Be the change that you wish to see in the world.” In my mind, it is that scientist combing through relevant, peer-reviewed scientific papers while writing a compelling and well-researched article who has the potential to make positive changes. For me, that scientist at the desk is being the change that he/she wishes to see in the world.

Scientists aboard the R/V Nathaniel B. Palmer using the time in between net tows to draft papers and analyze data…note the facial expressions. Antarctic Peninsula in August 2015. Source: Alexa Kownacki.

One of my favorite people to colloquially reference in the wildlife biology field is Milton Love, a research biologist at the University of California Santa Barbara, because he tells it like it is. On his oh-so-true-it-hurts website, he has a page titled, “So You Want To Be A Marine Biologist?” that highlights what he refers to as, “Three really, really bad reasons to want to be a marine biologist” and “Two really, really good reasons to want to be a marine biologist”. I HIGHLY suggest you read them verbatim on his site, whether you think you want to be a marine biologist or not, because they’re downright hilarious. However, I will paraphrase if you just can’t be bothered to open up a new tab and go down a laugh-filled wormhole.

Really, Really Bad Reasons to Want to be a Marine Biologist:

  1. To talk to dolphins. Hint: They don’t want to talk to you…and you probably like your face.
  2. You like Jacques Cousteau. Hint: I like cheese…doesn’t mean I want to be cheese.
  3. The money. Hint: Lack thereof.

Really, Really Good Reasons to Want to be a Marine Biologist:

  1. Work attire/attitude. Hint: Dress for the job you want finally translates to board shorts and tank tops.
  2. You like it. *BINGO*

Alexa with colleagues showing that the “cool” part of the job is working the zooplankton net tows. This DOES have required attire: steel-toed boots, hard hat, and float coat. R/V Nathaniel B. Palmer, Antarctic Peninsula in August 2015. Source: Alexa Kownacki.

In summary, as wildlife or marine biologists we’ve taken a vow of poverty, and in doing so, we’ve committed ourselves to fulfilling lives with incredible experiences and being the change we wish to see in the world. To those of you who want to pursue a career in wildlife or marine biology—even after reading this—then do it. And to those who don’t, hopefully you have a better understanding of why wearing jeans is our version of “business formal”.

A fieldwork version of a lab meeting with Leigh Torres, Tom Calvanese (Field Station Manager), Florence Sullivan, and Leila Lemos. Port Orford, OR in August 2017. Source: Alexa Kownacki.