Clara and I have just returned from ten fruitful days at sea aboard NOAA Ship Bell M. Shimada as part of the Northern California Current (NCC) ecosystem survey. We surveyed between Crescent City, California and La Push, Washington, collecting data on oceanography, phytoplankton, zooplankton, and marine mammals (Fig. 1). This year represents the third year I have participated in these NCC cruises, which I have come to cherish. I have become increasingly confident in my marine mammal observation and species identification skills, and I have become more accepting of the things out of my control – the weather, the sea state, the many sightings of “unidentified whale species”. Careful planning and preparation are critical, and yet out at sea we are ultimately at the whim of the powerful Pacific Ocean. Another aspect of the NCC cruises that I treasure is the time spent with members of the science team from other disciplines. The chatter about water column features, musings about plankton species composition, and discussions about what drives marine mammal distribution present lively learning opportunities throughout the cruise. Our concurrent data collection efforts and ongoing conversations allow us to piece together a comprehensive picture of this dynamic NCC ecosystem, and foster a collaborative research environment.
Every time I head to sea, I am reminded of the patchy distribution of resources in the vast and dynamic marine environment. On this recent cruise we documented a stark contrast between expansive stretches of warm, blue, stratified, and seemingly empty ocean and areas that were plankton-rich and supported multi-species feeding frenzies that had marine mammal observers like me scrambling to keep track of everything. This year, we were greeted by dozens of blue and humpback whales in the productive waters off Newport, Oregon. Off Crescent City, California, the water was very warm, the plankton community was dominated by gelatinous species like pyrosomes, salps, and other jellies, and the marine mammals were virtually absent except for a few groups of common dolphins. To the north, the plume of water flowing from the Columbia River created a front between water masses, where we found ourselves in the midst of Pacific white-sided dolphins, northern right whale dolphins, and humpback whales. These observations highlight the strength of ecosystem-scale and multi-disciplinary data collection efforts such as the NCC surveys. By drawing together information on physical oceanography, primary productivity, zooplankton community composition and abundance, and marine predator distribution, we can gain a nearly comprehensive picture of the dynamics within the NCC over a broad spatial scale.
This year, the marine mammals delivered and kept us observers busy. We lucked out with good survey conditions and observed many different species throughout the NCC (Table 1, Fig. 2).
Table 1. Summary of all marine mammal sightings from the NCC September 2020 cruise.
This year’s NCC cruise was unique. We went to sea as a global pandemic, wildfires, and political tensions continued to strain this country and our communities. This cruise was the first NOAA Fisheries cruise to set sail since the start of the pandemic. Our team of scientists and the ship’s crew went to great lengths to make it possible, including a seven-day shelter-in-place period and COVID-19 tests prior to cruise departure. As a result of these extra challenges and preparations, I think we were all especially grateful to be on the water, collecting data. At-sea fieldwork is always challenging, but morale was up, spirits were high, and laughs were frequent despite smiles being concealed by our masks. I am grateful for the opportunity to participate in this ongoing, valuable data collection effort, and to be part of this team. Thanks to all who made it such a memorable cruise.
Clara Bird, Masters Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab
The GEMM Lab gray whale team is in the midst of preparing for our fifth field season studying the Pacific Coast Foraging Group (PCFG): whales that forage off the coast of Newport, OR, USA each summer. On any given good-weather day from June to October, our team is out on the water in a small zodiac looking for gray whales (Figure 1). When we find a gray whale, we try to collect photo ID data, fecal samples, drone data, and behavioral data. We use the drone data to study both the whale’s body condition and its behavior. In a previous blog, I described ethograms and how I would like to use the behavior data from drone videos to classify behaviors, with the ultimate goal of understanding how gray whale behavior varies across space, time, and individuals. However, this explanation of studying whale behavior is actually a bit incomplete. Before we start fieldwork, we first need to decide how to collect that data.
As observers, we are far from omnipresent, and there is no way to know what the animals are doing all of the time. In any environment, scientists have to decide when and where to observe their animals and what behaviors they are interested in recording. In many studies, behavior is recorded live by an observer. In those studies, other limitations need to be taken into account, such as human error and observer fatigue. Collecting behavioral data is particularly challenging in the marine environment. Cetaceans spend most of their lives out of sight of humans, their time at the surface is brief, and when they appear together in large groups it can be very difficult to keep track of who is doing what when. Imagine being in a boat trying to keep track of what three different whales are doing without a pre-determined method: the task could quickly become overwhelming and biased. This is why we need a methodology for collecting and classifying behavior. We cannot study behavior without acknowledging these limitations and the potential biases that come with the methods we choose. Different data collection methods are better suited to address different questions.
Drones give us the ability to record cetacean behavior non-invasively, from a perspective that allows greater observation (Figure 2, Torres et al. 2018), and to keep that footage for later review, which is a significant improvement. However, as we prepare to collect more behavior data, we need to study the methods and understand the benefits and disadvantages of each approach so that we capture the information we need with as little bias as possible. Altmann (1974) provides a thorough overview of behavioral sampling methods.
Ad libitum behavioral sampling has no structure: we find a group of whales and simply write down everything they are doing. This method is a good first step; however, it comes with bias. Without structure, we cannot be sure that there was an equal probability of detecting each kind of behavior; this problem is called detectability bias. This type of bias is an issue if we are trying to answer questions about how often a behavior occurs, or what percent of time is spent in each behavior state. It is especially concerning for cetaceans because many of their behaviors differ in how detectable they are. An extreme example would be the detectability of breaching versus a behavior that takes place under the surface. A breaching whale is easier to spot and more exciting, which could lead to results suggesting that whales breach more often, relative to underwater behaviors, than they actually do. While it’s impossible to eliminate detectability bias, other sampling methods employ decision rules to try to reduce its effect. Many decision rules revolve around time, such as setting a minimum or maximum observation time interval. Other time rules involve recording the behavior state at set intervals of time (e.g., every 5 minutes). Setting observation boundaries helps standardize the methods and the data being collected.
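To see why detectability bias matters, suppose each behavior occupies a true share p_i of the whales’ time and each occurrence is noticed with probability d_i. The share of ad libitum records that behavior i receives is then roughly p_i × d_i / Σ_j (p_j × d_j). The short Python sketch below works through that arithmetic; the shares and detection probabilities are made-up numbers for illustration, not survey data.

```python
# Hypothetical numbers for illustration only; not survey data.
true_share = {"breach": 0.02, "surface travel": 0.38, "subsurface foraging": 0.60}
detect_prob = {"breach": 0.95, "surface travel": 0.70, "subsurface foraging": 0.10}


def apparent_shares(true_share, detect_prob):
    """Share of ad libitum records each behavior receives when each occurrence
    is noticed with its own detection probability."""
    weighted = {b: true_share[b] * detect_prob[b] for b in true_share}
    total = sum(weighted.values())
    return {b: w / total for b, w in weighted.items()}


recorded = apparent_shares(true_share, detect_prob)
for behavior, share in recorded.items():
    print(f"{behavior:<20} true {true_share[behavior]:.2f}  recorded {share:.2f}")
```

With these assumed numbers, breaching makes up about 5.5% of the records despite occupying 2% of the time, while subsurface foraging drops from 60% to about 17%.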
In a structured sampling plan, the first big decision is whether we need to know the duration of behaviors. Point events do not include duration data but can be used to study the frequencies of behaviors. For example, if my research question was “Do whales perform ‘headstands’ in a specific habitat type?”, then I would need point events of headstanding behavior. But if I wanted to ask, “Do whales spend more time headstanding in a specific habitat type than in other habitat types?”, I would need headstanding to be a state event. State events are events with associated duration information and can be used for activity budgets. Activity budgets show how much time an animal spends in each behavior state. Some sampling methods focus on collecting only point events. However, to get the most complete understanding of behavior, I think it’s important to collect both. Focal animal follows are another method of collecting more detailed data and are commonly used in cetacean studies.
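A minimal sketch of the difference between the two event types, assuming a hypothetical hand-made event log for one whale (times in seconds from the start of a video): point events can only be counted, while state events, which have a start and an end, yield an activity budget. The behavior names and times are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical event log for one whale; times in seconds from the start of a video.
point_events = [(12.0, "headstand"), (95.5, "headstand"), (140.2, "fluke up")]
state_events = [  # (start, end, behavior state)
    (0.0, 60.0, "foraging"),
    (60.0, 90.0, "traveling"),
    (90.0, 180.0, "foraging"),
]


def point_frequencies(events):
    """Count how often each point behavior occurred (no duration information)."""
    return Counter(behavior for _, behavior in events)


def activity_budget(states):
    """Proportion of total observed time spent in each behavior state."""
    totals = defaultdict(float)
    for start, end, behavior in states:
        if end < start:
            raise ValueError(f"state ends before it starts: {(start, end, behavior)}")
        totals[behavior] += end - start
    observed = sum(totals.values())
    return {b: t / observed for b, t in totals.items()}


print(point_frequencies(point_events))  # Counter({'headstand': 2, 'fluke up': 1})
print(activity_budget(state_events))    # foraging ~0.83, traveling ~0.17
```

The same headstanding observations could feed either function; which one applies depends on whether the research question needs durations.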
The explanation of a focal follow method is in the name. We focus on one individual, follow it, and record all of its behaviors. When employing this method, decisions are made about how an individual is chosen and how long it is followed. In some cases, the behavior of this animal is used as a proxy for the behavior of an entire group. I essentially use the focal follow method in my research. While I review drone footage to record behavioral data instead of recording behaviors live in the field, I focus on one individual at a time as I go through the videos. To do this, I use software called BORIS (Friard and Gamba 2016) to mark the time of each behavior per individual (Figure 3). If there are three individuals in a video, I’ll review the footage three times, recording behaviors for each individual in turn.
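When behaviors are logged per individual, state events are usually stored as separate start and stop marks. The sketch below pairs them into intervals for each subject. It assumes rows shaped like a tabular event export with a time, subject, behavior, and a status of START, STOP, or POINT; BORIS can export event tables along these lines, but the exact layout here is an assumption, so check it against your own exports. The rows themselves are hypothetical.

```python
# Hypothetical rows shaped like a per-event export: (time_s, subject, behavior, status)
rows = [
    (0.0, "whale_A", "forage", "START"),
    (4.2, "whale_B", "travel", "START"),
    (10.5, "whale_A", "headstand", "POINT"),
    (30.0, "whale_A", "forage", "STOP"),
    (30.0, "whale_A", "travel", "START"),
    (45.0, "whale_B", "travel", "STOP"),
    (60.0, "whale_A", "travel", "STOP"),
]


def pair_state_events(rows):
    """Turn START/STOP rows into (subject, behavior, start, end) intervals.

    Each (subject, behavior) pair is tracked separately, so several whales can be
    coded in the same video. POINT rows have no duration and are skipped.
    """
    open_states = {}  # (subject, behavior) -> start time of the open state
    intervals = []
    # sorted() is stable, so rows sharing a timestamp keep their original order
    for time_s, subject, behavior, status in sorted(rows, key=lambda r: r[0]):
        key = (subject, behavior)
        if status == "START":
            if key in open_states:
                raise ValueError(f"{key} started twice without a STOP")
            open_states[key] = time_s
        elif status == "STOP":
            if key not in open_states:
                raise ValueError(f"{key} stopped without a START")
            intervals.append((subject, behavior, open_states.pop(key), time_s))
    if open_states:
        raise ValueError(f"states never stopped: {sorted(open_states)}")
    return intervals


for subject, behavior, start, end in pair_state_events(rows):
    print(f"{subject}: {behavior} for {end - start:.1f} s")
```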
While the drone footage brings the advantages of time to review and a better view of the whale, we are constrained by the duration of a flight. Focal follows would ideally last longer than the ~15 minutes of battery life per drone flight. Our previously collected footage gives us snapshots of behavior, and this makes it challenging to compare and analyze durations of behaviors. Therefore, I am excited that we are going to try conducting drone focal follows this summer by swapping out drones when power runs low to achieve longer periods of video coverage of whale behavior. I’ll be able to use these data to move from snapshots to analyzing longer clips and better understanding the behavioral ecology of gray whales. As exciting as this opportunity is, it also presents the challenge of method development. So, I now need to develop decision rules and data collection methods to answer the questions that I have been eagerly asking.
Friard, Olivier, and Marco Gamba. 2016. “BORIS: A Free, Versatile Open-Source Event-Logging Software for Video/Audio Coding and Live Observations.” Methods in Ecology and Evolution 7 (11): 1325–30. https://doi.org/10.1111/2041-210X.12584.
Torres, Leigh G., Sharon L. Nieukirk, Leila Lemos, and Todd E. Chandler. 2018. “Drone up! Quantifying Whale Behavior from a New Perspective Improves Observational Capacity.” Frontiers in Marine Science 5 (SEP). https://doi.org/10.3389/fmars.2018.00319.
Clara Bird, Masters Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab
The GEMM lab recently completed its fourth field season studying gray whales along the Oregon coast. The 2019 field season was an especially exciting one: we collected rare footage of several interesting gray whale behaviors, including GoPro footage of a gray whale feeding on the seafloor, drone footage of a gray whale breaching, and drone footage of surface feeding (check out our recently released highlight video here). For my master’s thesis, I’ll use the drone footage to analyze gray whale behavior and how it varies across space, time, and individual. But before I ask how behavior is related to other variables, I need to understand how best to classify the behaviors.
How do we collect data on behavior?
One of the most important tools in behavioral ecology is an ‘ethogram’. An ethogram is a list of defined behaviors that the researcher expects to see based on prior knowledge. It is important because it provides a standardized list of behaviors so the data can be properly analyzed. For example, without an ethogram, someone observing human behavior could say that their subject was walking on one occasion, but then say strolling on a different occasion when they actually meant walking. It is important to pre-determine how behaviors will be recorded so that data classification is consistent throughout the study. Table 1 provides a sample from the ethogram I use to analyze gray whale behavior. The specificity of the behaviors depends on how the data is collected.
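In code, an ethogram can act as a controlled vocabulary: any label not defined in it gets flagged for re-coding instead of silently becoming a new behavior (the whale version of “walking” versus “strolling”). The entries and definitions below are illustrative placeholders, not the ethogram in Table 1.

```python
# Illustrative ethogram entries only; not the lab's Table 1 definitions.
ETHOGRAM = {
    "headstand": "Body vertical with head down and flukes at or above the surface",
    "breach": "Most of the body leaves the water",
    "surface feeding": "Feeding at the surface with the mouth open",
}


def check_labels(records, ethogram=ETHOGRAM):
    """Split records into those whose behavior is defined in the ethogram and
    those that need re-coding before analysis."""
    valid, needs_review = [], []
    for record in records:
        target = valid if record["behavior"] in ethogram else needs_review
        target.append(record)
    return valid, needs_review


records = [
    {"time_s": 5, "behavior": "headstand"},
    {"time_s": 20, "behavior": "head stand"},  # variant spelling, not in the ethogram
    {"time_s": 41, "behavior": "breach"},
]
valid, needs_review = check_labels(records)
print([r["time_s"] for r in needs_review])  # [20]
```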
In marine mammal ecology, it is challenging to define specific behaviors because from the traditional viewpoint of a boat, we can only see what the individuals are doing at the surface. The most common method of collecting behavioral data is called a ‘focal follow’. In a focal follow, an individual or group is followed for a set period of time and its behavioral state is recorded at set intervals. For example, a researcher might decide to follow an animal for an hour and record its behavioral state at each minute (Mann 1999). Some studies also record the whale’s location at each time point. When we use drones our methods are a little different; we collect behavioral data in the form of continuous 15-minute videos of the whale. While we collect data for a shorter amount of time than a typical focal follow, we can analyze the whole video and record what the whale was doing at each second, with the added benefit of being able to review the video to ensure accuracy. Additionally, from the drone’s perspective, we can see what the whales are doing below the surface, which can dramatically improve our ability to identify and describe behaviors (Torres et al. 2018).
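Because the drone gives a per-second record, interval-based data like a one-minute focal follow can be derived from it afterward and the two compared. A minimal sketch with a hypothetical 15-minute record (not real footage): it keeps the state at every 60th second and compares the resulting proportions with the continuous ones.

```python
from collections import Counter


def instantaneous_samples(per_second_states, interval_s=60):
    """Keep the state at second 0, interval_s, 2*interval_s, ... of a
    continuous per-second record (fixed-interval instantaneous sampling)."""
    return per_second_states[::interval_s]


def proportions(states):
    """Fraction of entries in each state."""
    counts = Counter(states)
    return {state: n / len(states) for state, n in counts.items()}


# Hypothetical 15-minute (900 s) record: forage, then travel, then forage again
video = ["forage"] * 400 + ["travel"] * 200 + ["forage"] * 300
continuous = proportions(video)
sampled = proportions(instantaneous_samples(video, 60))
print(continuous)  # travel is 200/900 of the record
print(sampled)     # 3 of the 15 one-minute samples fall in the travel bout
```

Short bouts that fall between sample points are invisible to interval sampling, which is one reason the continuous drone record is valuable.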
In our ethogram, the behaviors are already categorized into primary states. Primary states are the broadest behavioral states, and in my study, they are foraging, traveling, socializing, and resting. We categorize the specific behaviors we observe in the drone videos into these categories because they are associated with the function of a behavior. While our categorization is based on prior knowledge and critical evaluation, this process can still be somewhat subjective. Quantitative methods provide an objective interpretation of the behaviors that can confirm our broad categorization and provide insight into relationships between categories. These methods include path characterization, cluster analysis, and sequence analysis.
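Rolling specific behaviors up into primary states amounts to a lookup table. The mapping below is hypothetical, written only to show the mechanics; labels missing from the mapping raise an error instead of being dropped silently.

```python
# Hypothetical behavior-to-primary-state mapping, for illustration only.
PRIMARY_STATE = {
    "headstand": "foraging",
    "surface feeding": "foraging",
    "travel": "traveling",
    "logging": "resting",
}


def to_primary_states(behaviors, mapping=PRIMARY_STATE):
    """Replace each specific behavior with its primary state.

    Labels missing from the mapping raise a KeyError instead of being dropped.
    """
    unknown = sorted(set(behaviors) - set(mapping))
    if unknown:
        raise KeyError(f"behaviors missing from the mapping: {unknown}")
    return [mapping[b] for b in behaviors]


print(to_primary_states(["headstand", "travel", "surface feeding"]))
# ['foraging', 'traveling', 'foraging']
```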
Path characterization classifies behaviors using characteristics of their track lines; this method is similar to the RST method that fellow GEMM lab graduate student Lisa Hildebrand described in a recent blog. Mayo and Marx (1990) analyzed the paths of surface-foraging North Atlantic right whales and were able to classify the paths into primary states; they found that the paths of traveling whales were more linear than the more convoluted paths of foraging or socializing whales (Fig 1). I plan to analyze the drone GPS track line as a proxy for the whale’s track line to help distinguish between traveling and foraging in cases where the 15-minute snapshot does not provide enough context.
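One simple way to quantify how linear a track is, is a straightness index: the net displacement between the first and last fix divided by the total distance traveled along the path. The sketch below computes it from (latitude, longitude) fixes using the haversine great-circle distance. It is an illustrative metric, not necessarily the measure used by Mayo and Marx (1990) or the RST method, and the coordinates are hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (spherical Earth)."""
    r = 6_371_000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def straightness(track):
    """Net displacement / path length: 1.0 for a straight line, near 0 for a
    convoluted path. Returns NaN if the track never moves."""
    path = sum(haversine_m(*a, *b) for a, b in zip(track, track[1:]))
    if path == 0:
        return float("nan")
    return haversine_m(*track[0], *track[-1]) / path


# Hypothetical tracks off the Oregon coast
straight_track = [(44.60, -124.10), (44.61, -124.10), (44.62, -124.10)]
out_and_back = [(44.60, -124.10), (44.61, -124.10), (44.60, -124.10)]
print(straightness(straight_track))  # close to 1: travel-like
print(straightness(out_and_back))    # 0.0: returns to where it started
```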
Cluster analysis looks for natural groupings in behavior. For example, Hastie et al. (2004) used cluster analysis to find that there were four natural groupings of bottlenose dolphin surface behaviors (Fig. 2). I am considering using this method to see if there are natural groupings of behaviors within the foraging primary state that might relate to different prey types or habitat. This process is analogous to breaking human foraging down into sub-categories like fishing or farming by looking for different foraging behaviors that typically occur together.
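As a rough illustration of cluster analysis, the sketch below groups observation bouts by which behaviors co-occur, using the Jaccard distance between behavior sets and average-linkage agglomerative clustering written with the Python standard library. The bouts and behavior labels are hypothetical, and this is a generic technique, not the specific procedure of Hastie et al. (2004).

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - (shared behaviors / all behaviors) for two sets of behaviors."""
    union = a | b
    return 0.0 if not union else 1 - len(a & b) / len(union)


def average_linkage(bouts, n_clusters):
    """Agglomerative clustering: start with one cluster per bout and repeatedly
    merge the pair of clusters with the smallest mean pairwise Jaccard distance."""
    clusters = [[i] for i in range(len(bouts))]

    def cluster_distance(c1, c2):
        total = sum(jaccard_distance(bouts[i], bouts[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]))
        clusters[i].extend(clusters[j])  # i < j, so deleting j leaves i in place
        del clusters[j]
    return clusters


# Hypothetical bouts: the set of behaviors seen together in each bout
bouts = [
    {"headstand", "sediment plume"},
    {"headstand", "sediment plume", "side swim"},
    {"surface feeding", "open mouth"},
    {"surface feeding", "open mouth", "side swim"},
]
print(average_linkage(bouts, n_clusters=2))  # [[0, 1], [2, 3]]
```

Here the two groupings loosely mirror the idea of foraging sub-categories; choosing the number of clusters is itself a decision that needs justification with real data.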
Lastly, sequence analysis also looks for groupings of behaviors but, unlike cluster analysis, it also uses the order in which behaviors occur. Slooten (1994) used this method to classify Hector’s dolphin surface behaviors and found that there were five classes of behaviors and certain behaviors connected the different categories (Fig. 3). This method is interesting because if there are certain behaviors that are consistently in the same order then that indicates that the order of events is important. What function does a specific sequence of behaviors provide that the behaviors out of that order do not?
Think about harvesting fruits and vegetables from a garden: the order in which things are done matters, and you might use different methods to harvest different kinds of produce. Without knowing what food was being harvested, these methods could detect that there were different harvesting methods for different fruits or veggies. By then studying when and where the different methods were used and by whom, we could gain insight into the different functions and patterns associated with the different behaviors. We might be able to detect that some methods were always used in certain habitat types or that different methods were consistently used at different times of the year.
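A common starting point for sequence analysis is a first-order transition table: for each behavior, how often is it followed by each other behavior? The sketch below estimates those probabilities from hypothetical behavior sequences; it is not a reconstruction of Slooten’s (1994) analysis.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """First-order transition probabilities P(next | current), pooled over
    all observed behavior sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }


# Hypothetical sequences of behaviors, in the order they were observed
sequences = [
    ["dive", "headstand", "surface", "dive", "headstand", "surface"],
    ["dive", "headstand", "surface", "travel"],
]
probs = transition_probabilities(sequences)
print(probs["dive"])     # {'headstand': 1.0}
print(probs["surface"])  # {'dive': 0.5, 'travel': 0.5}
```

A transition that is far more likely than chance (here, dive is always followed by headstand) is the kind of consistent ordering that hints at a functional sequence.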
Behavior classification methods such as those described here provide a more refined and detailed analysis of categories that can then be used to identify patterns of gray whale behavior. While our ultimate goal is to understand how gray whales will be affected by a changing environment, a comprehensive understanding of their current behavior serves as a baseline for that future study.
Burnett, J. D., Lemos, L., Barlow, D., Wing, M. G., Chandler, T., & Torres, L. G. (2019). Estimating morphometric attributes of baleen whales with photogrammetry from small UASs: A case study with blue and gray whales. Marine Mammal Science, 35(1),
Darling, J. D., Keogh, K. E., & Steeves, T. E. (1998). Gray whale (Eschrichtius robustus) habitat utilization and prey species off Vancouver Island, B.C. Marine Mammal Science, 14(4), 692–720.
Hastie, G. D., Wilson, B., Wilson, L. J., Parsons, K. M., & Thompson, P. M. (2004). Functional mechanisms underlying cetacean distribution patterns: Hotspots for bottlenose dolphins are linked to foraging. Marine Biology, 144(2), 397–403. https://doi.org/10.1007/s00227-003-1195-4
Mann, J. (1999). Behavioral sampling methods for cetaceans: A review and critique. Marine Mammal Science, 15(1), 102–122.
Mayo, C. A., & Marx, M. K. (1990). Surface foraging behaviour of the North Atlantic right whale, Eubalaena glacialis, and associated zooplankton characteristics. Canadian Journal of Zoology, 68(10),
Slooten, E. (1994). Behavior of Hector’s Dolphin: Classifying Behavior by Sequence Analysis. Journal of Mammalogy, 75(4), 956–964.
Torres, L. G., Nieukirk, S. L., Lemos, L., & Chandler, T. E. (2018). Drone up! Quantifying whale behavior from a new perspective improves observational capacity. Frontiers in Marine Science, 5(SEP).
By Alexa Kownacki, Ph.D. Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab
Data wrangling, in my own loose definition, is the necessary combination of both data selection and data collection. Wrangling your data requires accessing, then assessing, your data. Data collection is just what it sounds like: gathering all data points necessary for your project. Data selection is the process of cleaning and trimming data for final analyses; it is a whole new can of worms that requires decision-making and critical thinking. During this process of data wrangling, I discovered there are two major avenues to obtain data: 1) you collect it, which frequently requires an exorbitant amount of time in the field, in the lab, and/or behind a computer, or 2) other people have already collected it, and through collaboration you put it to good use (often a different use than its initial intent). The latter approach may result in the collection of so much data that you must decide which data should be included to test your hypotheses. This process of data wrangling is the hurdle I am facing at this moment. I feel like I am a data detective.
My project focuses on assessing the health of the two ecotypes of bottlenose dolphins in the waters from Ensenada, Baja California, Mexico, to San Francisco, California, USA, between 1981 and 2015. During the government shutdown, much of my data was inaccessible because it was in the possession of my collaborators at federal agencies. However, now that the shutdown is over, my data is flowing in, and my questions are piling up. I can now begin to look at where these animals have been sighted over the past decades, which ecotypes have higher contaminant levels in their blubber, which animals have higher stress levels and whether these are related to geospatial location, where animals are more susceptible to human disturbance, whether sex plays a role in stress or contaminant load levels, which environmental variables influence stress and contaminant levels, and more!
Over the last two weeks, I was emailed three separate Excel spreadsheets representing three datasets that contain partially overlapping data. If Microsoft Access is foreign to you, I would compare this dilemma to a very confusing “match the word with the definition” exam question, except that the words are in a different language from the definitions. If you have used Microsoft Access databases, you probably know the system of querying and matching data in different databases. Well, imagine trying to do this with Excel spreadsheets, which are not linked the way databases are. Now you can see why I need to take a data management course and start using platforms other than Excel to manage my data.
In the first dataset, there are 6,136 sightings of common bottlenose dolphins (Tursiops truncatus) documented in my study area. Some years have no sightings, some years have fewer than 100 sightings, and other years have over 500 sightings. Another dataset, a genetics database, contains 398 bottlenose dolphin biopsy samples collected between 1992 and 2016 that can provide the sex of the animal. The final dataset contains records of 774 bottlenose dolphin biopsy samples collected between 1993 and 2018 that could be tested for hormone and/or contaminant levels. Some of these samples have identification numbers that can be matched to the other dataset. Within these cross-referenced matches, there are conflicting data on the amount of tissue remaining for analyses. Sorting these conflicts out will involve more digging on my end and additional communication with collaborators: data wrangling at its best. Circling back to what I mentioned at the beginning of this post, this data was collected by other people over decades, and the collection methods were not standardized for my project. I benefit from years of data collection by other scientists, and I am grateful for all of their hard work. However, now my hard work begins.
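Matching records across spreadsheets is essentially a join on a shared key. A minimal sketch, assuming each dataset has already been read into dictionaries (for example with Python’s csv.DictReader) and that they share a sample ID column; the IDs, column names, and values below are hypothetical. It links sex from a genetics table to biopsy samples and lists IDs whose remaining tissue amounts disagree between two datasets.

```python
# Hypothetical records; real sample IDs and columns will differ.
genetics = [
    {"sample_id": "Tt-001", "sex": "F"},
    {"sample_id": "Tt-002", "sex": "M"},
]
tissue_a = [
    {"sample_id": "Tt-001", "tissue_g": 0.8},
    {"sample_id": "Tt-003", "tissue_g": 1.2},
]
tissue_b = [
    {"sample_id": "Tt-001", "tissue_g": 0.5},
    {"sample_id": "Tt-003", "tissue_g": 1.2},
]


def index_by_id(records, key="sample_id"):
    """Map each sample ID to its record, refusing duplicate IDs."""
    index = {}
    for record in records:
        if record[key] in index:
            raise ValueError(f"duplicate ID {record[key]!r}")
        index[record[key]] = record
    return index


def tissue_conflicts(a, b, field="tissue_g"):
    """Sample IDs present in both datasets whose tissue amounts disagree."""
    ia, ib = index_by_id(a), index_by_id(b)
    return sorted(sid for sid in ia.keys() & ib.keys() if ia[sid][field] != ib[sid][field])


sex_by_id = {sid: r["sex"] for sid, r in index_by_id(genetics).items()}
matched = [sid for sid in index_by_id(tissue_a) if sid in sex_by_id]
print(tissue_conflicts(tissue_a, tissue_b))  # ['Tt-001']
print(matched)                               # ['Tt-001']
```

Flagging conflicts rather than silently picking one value keeps the follow-up with collaborators explicit.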
There is also a large amount of data that I downloaded from federally maintained websites. For example, dolphin sighting data from research cruises are publicly available from the OBIS (Ocean Biogeographic Information System) Sea Map website. It boasts 5,927,551 records from 1,096 data sets containing information on 711 species with the help of 410 collaborators. This website is incredible: it allows you to search by different data criteria, download the data in a variety of formats, and explore an interactive map of the data. You can explore it at your leisure, but I want to point out the sheer amount of data. In my case, the OBIS Sea Map website is only one major platform, and it contains many sources of data that were already collected, not specifically for me or my project, but that I will use. When using data collected by other scientists, it is critical to give credit where credit is due. One benefit of this website is that it explains how to properly credit the collaborators when you download data. See below for an example:
Example citation for a dataset (Dataset ID: 1201):
Lockhart, G.G., DiGiovanni Jr., R.A., DePerte, A.M. 2014. Virginia and Maryland Sea Turtle Research and Conservation Initiative Aerial Survey Sightings, May 2011 through July 2013. Downloaded from OBIS-SEAMAP (http://seamap.env.duke.edu/dataset/1201) on xxxx-xx-xx.
Another federally maintained data source that boasts more data than I can quantify is the well-known ERDDAP website. After a few Google searches, I finally discovered that the acronym stands for Environmental Research Division’s Data Access Program. Essentially, this is the holy grail of environmental data for marine scientists. I have downloaded so much data from this website that Excel cannot open the csv files. Here is yet another reason why young scientists, like myself, need to transition out of using Excel and into data management systems that are developed to handle large-scale datasets. The downloads range from daily sea surface temperatures at every one-degree line of latitude and longitude over my entire study site from 1981-2015 to Ekman transport levels taken every six hours on every longitudinal degree line over my study area. I will add some of these environmental variables to species distribution models to see which account for the largest amount of variability in my data. The next step in data selection begins with statistics: before modeling, it is important to check whether any environmental factors are highly correlated with one another. Learn more about fitting cetacean data to models here.
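Screening for highly correlated covariates can be as simple as computing pairwise Pearson correlations and flagging pairs above a cutoff. The |r| ≥ 0.7 threshold below is a commonly used rule of thumb, not a fixed rule, and the covariate values are made up for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(variables, threshold=0.7):
    """Pairs of covariates whose |r| meets or exceeds `threshold`."""
    flagged = []
    for (name1, x), (name2, y) in combinations(variables.items(), 2):
        r = pearson(x, y)
        if abs(r) >= threshold:
            flagged.append((name1, name2, round(r, 3)))
    return flagged


# Hypothetical covariate values extracted at the same five sighting locations
env = {
    "sst": [14.1, 15.0, 16.2, 17.5, 18.0],
    "sst_anomaly": [-0.9, 0.0, 1.2, 2.5, 3.0],
    "ekman": [0.3, -0.1, 0.5, 0.2, -0.4],
}
print(correlated_pairs(env))  # only sst and sst_anomaly are flagged
```

When a pair is flagged, one common choice is to keep only the variable with the clearer ecological interpretation.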
As you can imagine, this amount of data from many sources and collaborators is equal parts daunting and exhilarating. Before I even begin the process of determining the spatial and temporal spread of dolphin sightings data, I have to identify which data points have sex identified from either hormone levels or genetics, which data points have contaminant levels already quantified, which samples still have tissue available for additional testing, and so on. Once I have cleaned up the datasets, I will import the data into R. Then I can visualize my data in plots, charts, and graphs; this will help me identify outliers and potential challenges with my data and, hopefully, start to reveal answers to my focal questions. Only then can I dive into the deep and exciting waters of species distribution modeling and more advanced statistical analyses. This is data wrangling, and I am the data detective.
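Plots are the main tool for spotting outliers, but a quick numeric check can point to values worth a closer look. The post plans to work in R; this sketch uses Python (3.8 or later, for statistics.quantiles) and Tukey’s fences, which flag values more than 1.5 × IQR beyond the quartiles, on hypothetical contaminant concentrations. Flagged values are candidates for checking against the original records, not automatic deletions.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical blubber contaminant concentrations (arbitrary units)
concentrations = [12, 15, 14, 13, 16, 15, 14, 95]
print(iqr_outliers(concentrations))  # [95]
```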
Like the well-known phrase, “With great power comes great responsibility,” I believe that with great data comes great responsibility, because data is power. It is up to me as the scientist to decide which data is most powerful at answering my questions.
By Alexa Kownacki, Ph.D. Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab
Did you know that Excel has a maximum number of rows? I do. During Winter Term, for my GIS project, I was using Excel to merge oceanographic data from a publicly available website, and Excel continuously quit. Naturally, I assumed I had caused some sort of computer error. [As an aside, I’ve concluded that most problems related to technology are human error-based.] Therefore, I tried reformatting the data, restarting my computer, the program, etc. Nothing. Then, thanks to the magic of Google, I discovered that Excel allows no more than 1,048,576 rows by 16,384 columns. ONLY 1.05 million rows?! The oceanography data was more than 3 million rows, and that’s with me eliminating data points. This is what happens when we’re dealing with big data.
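Files that exceed Excel’s limit can still be inspected by streaming them line by line. A minimal Python sketch that counts the data rows in a CSV without loading it into memory and checks whether it would fit on one worksheet; the small demo file and its columns are hypothetical stand-ins for a large download.

```python
import csv
import os
import tempfile

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit mentioned above


def count_csv_rows(path):
    """Count data rows in a CSV by streaming it, without loading it into memory."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if any
        return sum(1 for _ in reader)


def fits_in_excel(path):
    # +1 because the header row also occupies a worksheet row
    return count_csv_rows(path) + 1 <= EXCEL_MAX_ROWS


# Demo with a tiny temporary file standing in for a large download
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["time", "latitude", "longitude", "sst"])
    writer.writerows([["1981-09-01", 32.0, -117.0, 18.2]] * 3)
demo_rows = count_csv_rows(tmp.name)
demo_fits = fits_in_excel(tmp.name)
os.remove(tmp.name)
print(demo_rows, demo_fits)  # 3 True
```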
According to the Merriam-Webster dictionary, big data is an accumulation of data that is too large and complex for processing by traditional database management tools (www.merriam-webster.com). However, there are articles, like this one from Forbes, that discuss the ongoing debate over how to define “big data”. According to the article, there are 12 major definitions, so I’ll let you decide what qualifies as “big data”. Either way, I think that when Excel reaches its maximum row capacity, I’m working with big data.
Here’s the thing: the oceanography data that I referred to was just a snippet of my data. Technically, it’s not even MY data; it’s data I accessed from NOAA’s ERDDAP website that had been consistently observed over the time frame of my dolphin data points. You may recall my blog about maps and geospatial analysis that highlights some of the reasons these variables, such as temperature and salinity, are important. However, what I didn’t previously mention was that I spent weeks editing this NOAA data. My project on common bottlenose dolphins overlays environmental variables to better understand dolphin population health off of California. These variables should have spatiotemporal attributes similar to those of the dolphin data I’m working with, which has a time series beginning in the 1980s. Without taking out a calculator, I still know that equates to a lot of data. Great data: data that will let me answer interesting, pertinent questions. But big data nonetheless.
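To put “a lot of data” into numbers: in a long format with one row per grid point per day, the row count is days × latitude lines × longitude lines × variables. The sketch assumes a hypothetical grid of 7 × 7 one-degree lines and a daily series from 1981 through 2015; the actual grid for this study area may differ.

```python
from datetime import date

EXCEL_MAX_ROWS = 1_048_576


def long_format_rows(first_day, last_day, n_lat, n_lon, n_variables=1):
    """Rows needed when each daily value at each grid point is its own row."""
    n_days = (last_day - first_day).days + 1  # inclusive of both end dates
    return n_days * n_lat * n_lon * n_variables


# Hypothetical grid: 7 one-degree latitude lines x 7 longitude lines
start, end = date(1981, 1, 1), date(2015, 12, 31)
rows_sst = long_format_rows(start, end, 7, 7)
rows_five = long_format_rows(start, end, 7, 7, n_variables=5)
print(rows_sst)                     # 626367 (12,783 days x 49 grid points)
print(rows_sst > EXCEL_MAX_ROWS)    # False: one variable still fits...
print(rows_five > EXCEL_MAX_ROWS)   # True: ...five variables do not
```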
This is a screenshot of what the oceanography data looked like when I downloaded it to Excel. This format repeats for nearly 3 million rows.
I showed this Excel spreadsheet to my GIS professor, and his response was something akin to “holy smokes”, with a few more expletives and a look of horror. It was not the sheer number of rows that shocked him; it was the data format. Nowadays, nearly everyone works with big data. It’s par for the course. However, the way data are formatted is the major split between what I’ll call “easy” data and “hard” data. The oceanography data could have been “easy” data. It could have had many variables listed in columns. Instead, this data alternated between rows with variable headings and columns with variable headings, for millions of cells. And, as described earlier, this is only one example of big data and its challenges.
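Turning “hard” data into “easy” data usually means reshaping it so that each row is one observation and each column one variable. The screenshot’s exact layout is not reproduced here, so the sketch below assumes a simplified, hypothetical version: a one-cell row naming a variable, followed by rows of time, latitude, longitude, and value, repeating for each variable.

```python
import csv
import io

# Hypothetical "hard" layout: a one-cell row names the variable, followed by
# rows of time, latitude, longitude, value; blocks repeat for each variable.
raw = """sst
1981-09-01,32.0,-117.0,18.2
1981-09-01,33.0,-117.0,17.6
salinity
1981-09-01,32.0,-117.0,33.5
1981-09-01,33.0,-117.0,33.4
"""


def to_tidy(lines):
    """Collapse variable blocks into one row per (time, lat, lon) with one
    column per variable."""
    table = {}
    variable = None
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        if len(row) == 1:  # header row: start a new variable block
            variable = row[0]
            continue
        if variable is None:
            raise ValueError(f"data row before any variable header: {row}")
        time, lat, lon, value = row
        key = (time, float(lat), float(lon))
        table.setdefault(key, {})[variable] = float(value)
    return [{"time": t, "lat": la, "lon": lo, **vals} for (t, la, lo), vals in table.items()]


tidy = to_tidy(io.StringIO(raw))
for record in tidy:
    print(record)
```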
Data does not always come in the form of text and numbers; sometimes it appears as media such as photographs, videos, and audio files. Big data just got a whole lot bigger. While I was working as a scientist at NOAA’s Southwest Fisheries Science Center, one project brought in over 80 terabytes of raw data per year. The project centered on the eastern North Pacific gray whale population and, more specifically, its migration. Scientists have observed the gray whale migration annually since 1994 from Piedras Blancas Light Station for the northbound migration, and in 2 out of every 5 years from Granite Canyon Field Station (GCFS) for the southbound migration. One of my roles was to ground-truth software that would help transition from humans as observers to computers as observers. One avenue we assessed was how well a computer “counted” whales compared to people. For this question, three infrared cameras at the GCFS recorded during the same time span that human observers were counting the migrating whales. Next, scientists such as myself would transfer those video files, upwards of 80 TB, from the hard drives to Synology boxes and to a different facility miles away. Synology boxes store arrays of hard drives that can be accessed remotely. To review: three locations, each holding the same 80 TB of raw data. Once the data was saved in triplicate, I could run a computer program to detect whales. In summary, three months of recorded infrared video requires upwards of 240 TB of storage before processing. This is big data.
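Ground-truthing an automated detector against human observers can start with simple summaries over matched watch periods. A minimal sketch, assuming hypothetical per-period whale counts from both sources: it reports the ratio of total counts and the mean absolute per-period difference. A full evaluation would also match individual detections in time, which is beyond this sketch.

```python
def compare_counts(human, computer):
    """Compare per-period whale counts from human observers and an automated
    detector. Returns (ratio of total counts, mean absolute per-period difference)."""
    if len(human) != len(computer) or not human:
        raise ValueError("need the same, non-zero number of watch periods")
    ratio = sum(computer) / sum(human)
    mad = sum(abs(h - c) for h, c in zip(human, computer)) / len(human)
    return ratio, mad


# Hypothetical counts for five matched watch periods
human = [12, 8, 15, 4, 10]
computer = [11, 9, 13, 4, 8]
ratio, mad = compare_counts(human, computer)
print(f"computer/human total: {ratio:.2f}, mean |difference| per period: {mad:.1f}")
```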
In the GEMM Laboratory, we have so many sources of data that I did not bother trying to count. I’m entering my second year of the Ph.D. program, and I already have a hard drive of data that I’ve backed up to three different locations. It’s no longer a matter of “if” you work with big data; it’s “how”. How will you format the data? How will you store the data? How will you maintain back-ups of the data? How will you share this data with collaborators/funders/the public?
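Keeping copies in several locations only helps if the copies are identical. One common way to check is to compare cryptographic checksums; the sketch below hashes a file in each backup location with SHA-256 from Python’s hashlib and reports copies that are missing or differ from the first one. The folder and file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in 1 MiB chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_copies(relative_name, backup_roots):
    """Backup roots whose copy of `relative_name` is missing or differs from
    the copy in the first root, which is treated as the reference."""
    roots = [Path(r) for r in backup_roots]
    reference = sha256_of(roots[0] / relative_name)
    bad = []
    for root in roots[1:]:
        copy = root / relative_name
        if not copy.exists() or sha256_of(copy) != reference:
            bad.append(str(root))
    return bad


# Demo: three hypothetical backup folders, one holding a silently altered copy
with tempfile.TemporaryDirectory() as tmp:
    roots = [Path(tmp) / name for name in ("laptop", "lab_drive", "cloud")]
    for root in roots:
        root.mkdir()
        (root / "survey.csv").write_text("id,lat,lon\n1,44.6,-124.1\n")
    (roots[2] / "survey.csv").write_text("id,lat,lon\n1,44.6,-124.0\n")
    demo_bad = mismatched_copies("survey.csv", roots)
print(demo_bad)  # path ending in "cloud"
```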
The wonderful aspect of big data is in the name: big and data. The scientific community can answer more in-depth, challenging questions because it has access to more data. Data is often the limiting factor in what researchers can do, because a larger sample size allows more questions to be asked and gives greater confidence in results. That, and funding of course. It’s the reason why, when you see GEMM Lab members in the field, we’re not only using drones to capture aerial images of whales but also taking fecal, biopsy, and phytoplankton samples. We’re recording the location, temperature, water conditions, wind conditions, cloud cover, date/time, water depth, and so much more, because all of this data will help us and other scientists answer critical questions. Thus, to my fellow scientists: I feel your pain and I applaud you, because I too know that the challenges that come with big data are worth it. And, to the non-scientists out there, hopefully this gives you some insight into why we scientists ask for external hard drives as gifts.