Finding the right fit: a journey into cetacean distribution models

Solène Derville, Entropie Lab, French National Institute for Sustainable Development (IRD – UMR Entropie), Nouméa, New Caledonia

 Ph.D. student under the co-supervision of Dr. Leigh Torres

Species Distribution Models (SDM), also referred to as ecological niche models, may be defined as “a model that relates species distribution data (occurrence or abundance at known locations) with information on the environmental and/or spatial characteristics of those locations” (Elith & Leathwick, 2009). In the last couple of decades, SDMs have become an indispensable part of the ecologists’ and conservationists’ toolbox. What scientist has not dreamed of being able to summarize a species’ environmental requirements and predict where and when it will occur, all in one tiny statistical model? It sounds like magic… but the short acronym “SDM” is the pretty front window of an intricate and gigantic research field that may extend way beyond the skills of a typical ecologist (even more so for a graduate student like myself).

As part of my PhD thesis about the spatial ecology of humpback whales in New Caledonia, South Pacific, I was planning on producing a model to predict their distribution in the region and help spatial planning within the Natural Park of the Coral Sea. An innocent and seemingly perfectly feasible plan for a second-year PhD student. To conduct this task, I had at my disposal more than 1,000 sightings recorded during dedicated surveys at sea conducted over 14 years. These numbers seem quite sufficient, considering the rarity of cetaceans and the technical challenges of studying them at sea. And there was more! The NGO Opération Cétacés also recorded over 600 sightings reported by the general public in the same time period and deployed more than 40 satellite tracking tags to follow individual whale movements. In a field where it is so hard to acquire data, it felt like I had to use it all, though I was not sure how to combine all these types of data, with their respective biases, scales and assumptions.

One important thing about SDM to remember: it is like the cracker section in a US grocery shop, there is sooooo much choice! As I reviewed the possibilities and tested various modeling approaches on my data, I realized that this study might be a good opportunity to contribute to the SDM field by conducting a comparison of various algorithms using cetacean occurrence data from multiple sources. The results of this work were just published in Diversity and Distributions:

Derville S, Torres LG, Iovan C, Garrigue C. (2018) Finding the right fit: Comparative cetacean distribution models using multiple data sources and statistical approaches. Divers Distrib. 2018;00:1–17. https://doi.org/10.1111/ddi.12782

There are simply too many! Anonymous grocery shops, Corvallis, OR
Credit: Dawn Barlow

If you are a new-comer to the SDM world, and specifically its application to the marine environment, I hope you find this interesting. If you are a seasoned SDM user, I would be very grateful to read your thoughts in the comment section! Feel free to disagree!

So what is the take-home message from this work?

  • There is no such thing as a “best model”; it all depends on what you want your model to be good at (the descriptive vs predictive dichotomy), and what criteria you use to define the quality of your models.

The predictive vs descriptive goal of the model: This is a tricky choice to make, yet it should be clearly identified upfront. Most times, I feel like we want our models to be decently good at both tasks… It is a risky approach to blindly follow the predictions of a complex model without questioning the meaning of the ecological relationships it fitted. On the other hand, conservation applications of models often require the production of predicted maps of species’ probability of presence or habitat suitability.

The criteria for model selection: How could we imagine that the complexity of animal behavior could be summarized in a single metric, such as the famous Akaike Information Criterion (AIC) or the Area Under the ROC Curve (AUC)? My study, and that of others (e.g. Elith & Graham, 2009), emphasize the importance of looking at multiple aspects of model outputs: raw performance through various evaluation metrics (e.g. AUCdiff; Warren & Seifert, 2010), contribution of the variables to the model, shape of the fitted relationships through Partial Dependence Plots (PDP; Friedman, 2001), and maps of predicted habitat suitability and associated error. Spread all these lines of evidence in front of you, summarize all the metrics, add a touch of critical ecological thinking to decide on the best approach for your modeling question, and Abracadabra! You end up a bit lost in a pile of folders… But at least you assessed the quality of your work from every angle!
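To make two of these metrics concrete, here is a minimal Python sketch (not the R code from the paper's supplementary material). It computes AUC as the probability that a randomly chosen presence gets a higher predicted suitability than a randomly chosen absence or background point, and AUCdiff as training AUC minus test AUC, which is my reading of the Warren & Seifert (2010) metric. All scores are invented for illustration.

```python
def auc(scores_presence, scores_absence):
    """Probability that a random presence outscores a random absence (ties count 0.5)."""
    wins = 0.0
    for p in scores_presence:
        for a in scores_absence:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(scores_presence) * len(scores_absence))


def auc_diff(train_presence, train_absence, test_presence, test_absence):
    """Training AUC minus test AUC; a large gap hints at overfitting."""
    return auc(train_presence, train_absence) - auc(test_presence, test_absence)


# Hypothetical predicted suitabilities, invented for illustration only.
train_pres = [0.9, 0.8, 0.7, 0.6]
train_abs = [0.5, 0.4, 0.3, 0.2]
test_pres = [0.9, 0.6, 0.4, 0.3]
test_abs = [0.5, 0.5, 0.2, 0.1]

train_auc = auc(train_pres, train_abs)
test_auc = auc(test_pres, test_abs)
gap = auc_diff(train_pres, train_abs, test_pres, test_abs)
print(train_auc, test_auc, gap)
```

A model that separates the training data perfectly but ranks held-out points less well shows up as a positive gap, which is one more line of evidence to put on the table next to the variable contributions and the maps.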

  • Cetacean SDMs often serve a conservation goal. Hence, their capacity to predict in areas or at times that were not recorded in the data (which are often scarce) is paramount. This extrapolation performance may be restricted when the model relationships are overfitted, which is when you make your model fit the data so closely that you are unknowingly modeling noise rather than a real trend. Using cross-validation is a good method to guard against overfitting (for a thorough review: Roberts et al., 2017). Also, my study underlines that certain algorithms inherently have a tendency to overfit. We found that Generalized Additive Models and MAXENT provided a valuable complexity trade-off to promote the best predictive performance, while minimizing overfitting. In the case of GAMs, I would like to point out the excellent documentation that exists on their use (Wood, 2017), and specifically their application to cetacean spatial ecology (Mannocci, Roberts, Miller, & Halpin, 2017; Miller, Burt, Rexstad, & Thomas, 2013; Redfern et al., 2017).
  • Citizen science is a promising tool to describe cetacean habitat. Indeed, we found that models of habitat suitability based on citizen science largely converged with those based on our research surveys. The main issue encountered when modeling this type of data is the absence of “effort”. Basically, we know where people observed whales, but we do not know where they haven’t… or at least not with the accuracy obtained from research survey data. However, with some information about our citizen scientists and a little deduction, there is actually a lot you can infer about opportunistic data. For instance, in New Caledonia most of the sightings were reported by professional whale-watching operators or by the general public during fishing/diving/boating day trips. Hence, citizen scientists rarely stray far from harbors and spend most of their time in the sheltered waters of the New Caledonian lagoon. This reasoning provides the sort of information that we integrated in our modeling approach to account for spatial sampling bias of citizen science data and improve the model’s predictive performance.
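As an illustration of how this kind of deduction can enter a presence-background model, here is a hedged Python sketch. It is not the procedure used in the paper: the exponential decay of observer effort with distance to the nearest harbor, the 20 km scale, the grid and the harbor location are all assumptions made up for this example. The idea is simply to draw background points with the same spatial bias as the observers, so the model compares sightings with places people were likely to visit.

```python
import math
import random


def distance_km(p, q):
    """Straight-line distance between two (x, y) points given in km."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def bias_weight(cell, harbors, scale_km=20.0):
    """Assumed relative effort: decays exponentially with distance to the nearest harbor."""
    nearest = min(distance_km(cell, h) for h in harbors)
    return math.exp(-nearest / scale_km)


def sample_background(cells, harbors, n, seed=0):
    """Draw n background cells (with replacement), favoring cells near harbors."""
    rng = random.Random(seed)
    weights = [bias_weight(c, harbors) for c in cells]
    return rng.choices(cells, weights=weights, k=n)


# Hypothetical 100 km x 100 km grid of 10 km cells, with one harbor at the origin.
cells = [(x, y) for x in range(0, 100, 10) for y in range(0, 100, 10)]
harbors = [(0.0, 0.0)]
background = sample_background(cells, harbors, n=500)

mean_dist = sum(distance_km(c, harbors[0]) for c in background) / len(background)
grid_mean = sum(distance_km(c, harbors[0]) for c in cells) / len(cells)
print(f"mean distance to harbor: background {mean_dist:.1f} km, whole grid {grid_mean:.1f} km")
```

The sampled background sits much closer to the harbor than the grid as a whole, mimicking day-trip boats that rarely stray far from port.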

Many more technical aspects of SDM are brushed over in this paper (for detailed and annotated R codes of the modeling approaches, see supplementary information of our paper). There are a few that are not central to the paper, but that I think are worth sharing:

  • Collinearity of predictors: Have you ever found that the significance of your predictors completely changed every time you removed a variable? I have progressively come to discover how unstable a model can be because of predictor collinearity (and the uneasy feeling that comes with it …). My new motto is to ALWAYS check cross-correlation between my predictors, and do it THOROUGHLY. A few aspects that may make a big difference in the estimation of collinearity patterns are to: (1) calculate Pearson vs Spearman coefficients, (2) check correlations between the values recorded at the presence points vs over the whole study area, and (3) assess the correlations between raw environmental variables vs between transformed variables (log-transformed, etc). Though selecting variables with Pearson coefficients < 0.7 is usually a good rule (Dormann et al., 2013), I would worry about anything above 0.5, or at least keep it in mind during model interpretation.
  • Cross-validation: If removing 10% of my dataset greatly impacts the model results, I feel like cross-validation is critical. The concept is based on a simple question: if I had sampled a given population/phenomenon/system slightly differently, would I have come to the same conclusion? Cross-validation comes in many different forms, but the basic concept is to run the same model several times (number of times may depend on the size of your data set, hierarchical structure of your data, computation power of your computer, etc.) over different chunks of your data. Model performance metrics (e.g., AUC) and outputs (e.g., partial dependence plots) are then summarized over the many runs, using mean/median and standard deviation/quantiles. It is up to you how to pick these chunks, but before doing this at random I highly recommend reading Roberts et al. (2017).
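The collinearity check from the first bullet can be sketched in a few lines of plain Python. This is only an illustration with invented predictor values (the names depth, dist_to_coast and sst are hypothetical), and it flags a pair when either the Pearson or the Spearman coefficient reaches the threshold in absolute value.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Average ranks (1-based), so tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def flag_collinear(predictors, threshold=0.7):
    """Return (name_a, name_b, pearson, spearman) for pairs reaching the threshold."""
    flagged = []
    for a, b in combinations(sorted(predictors), 2):
        r = pearson(predictors[a], predictors[b])
        rho = spearman(predictors[a], predictors[b])
        if max(abs(r), abs(rho)) >= threshold:
            flagged.append((a, b, round(r, 3), round(rho, 3)))
    return flagged


# Hypothetical predictor values at eight locations (invented numbers).
predictors = {
    "depth": [10, 20, 40, 80, 160, 320, 640, 1280],
    "dist_to_coast": [1, 2, 3, 4, 5, 6, 7, 8],
    "sst": [26.1, 25.7, 26.4, 25.9, 26.3, 25.8, 26.2, 26.0],
}
flagged = flag_collinear(predictors)
print(flagged)
```

In this toy data set depth increases monotonically but not linearly with distance to the coast, so the two coefficients disagree on how strong the relationship is. Running the same function on values at presence points only, on the whole study area, and on log-transformed variables covers points (2) and (3) of the bullet, and calling it with threshold=0.5 applies the stricter rule.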

The evil of the R2: I am probably not the first student to feel like what I learned in my statistics classes at school is in practice, at best, not very useful, and at worst, dangerously misleading. Of course, I do understand that we must start somewhere, and that learning the basics of inferential statistics is a necessary step to, one day, be able to answer your own research questions. Yet, I feel like I carried the “weight of the R2” for far too long before actually realizing that this metric of model performance (R2 among others) is simply not enough to trust my results. You might think that your model is robust because among the 1000 alternative models you tested, it is the one with the “best” performance (deviance explained, AIC, you name it), but the model with the best R2 will not always be the most ecologically meaningful one, or the most practical for spatial management perspectives. Overfitting is like a sword of Damocles hanging over you every time you create a statistical model. Altogether, I sometimes trust my supervisor’s expertise and my own judgment more than an R2.
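A small simulation shows why an in-sample R2 deserves suspicion, and how the cross-validation described above exposes the problem. This is a sketch under stated assumptions, not an SDM: the data are simulated (a sine trend plus noise), the "model" is a k-nearest-neighbor average, and the folds are drawn at random, which Roberts et al. (2017) warn against for spatially structured data.

```python
import math
import random
import statistics


def knn_predict(train, x, k):
    """Mean response of the k training points closest to x (train holds (x, y) pairs)."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def r_squared(data, train, k):
    """R2 of a k-nearest-neighbor fit built on `train` and evaluated on `data`."""
    mean_y = sum(y for _, y in data) / len(data)
    ss_res = sum((y - knn_predict(train, x, k)) ** 2 for x, y in data)
    ss_tot = sum((y - mean_y) ** 2 for _, y in data)
    return 1.0 - ss_res / ss_tot


def cross_validate(data, k_neighbors, n_folds=5, seed=1):
    """Return one (training R2, held-out R2) pair per random fold."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    results = []
    for i, test in enumerate(folds):
        train = [pt for j, fold in enumerate(folds) if j != i for pt in fold]
        results.append((r_squared(train, train, k_neighbors),
                        r_squared(test, train, k_neighbors)))
    return results


# Simulated data (invented): a smooth trend plus Gaussian noise.
rng = random.Random(42)
data = []
for i in range(100):
    x = i / 10
    data.append((x, math.sin(x) + rng.gauss(0, 0.5)))

results = {k: cross_validate(data, k_neighbors=k) for k in (1, 15)}
for k, scores in results.items():
    train_r2 = [s[0] for s in scores]
    test_r2 = [s[1] for s in scores]
    print(f"k={k:2d}  train R2 = {statistics.mean(train_r2):.2f}  "
          f"held-out R2 = {statistics.mean(test_r2):.2f} "
          f"(sd {statistics.stdev(test_r2):.2f})")
```

With k = 1 every training point is its own nearest neighbor, so the training R2 is exactly 1 by construction, the "best" score any model could get, while the held-out R2 is necessarily lower. Summarizing the held-out scores with their mean and standard deviation across folds is the kind of summary described in the cross-validation bullet.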

Source: internet

A few good websites/presentations that have helped me through my SDM journey:

General website about spatial analysis (including SDM): http://rspatial.org/index.html

Cool presentation by Adam Smith about SDM:

http://www.earthskysea.org/!ecology/sdmShortCourseKState2012/sdmShortCourse_kState.pdf

Handling spatial data in R: http://www.maths.lancs.ac.uk/~rowlings/Teaching/UseR2012/introductionTalk.html

“The magical world of mgcv”, a great presentation by Noam Ross: https://www.youtube.com/watch?v=q4_t8jXcQgc

 

Literature cited

Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., … Lautenbach, S. (2013). Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 027–046. https://doi.org/10.1111/j.1600-0587.2012.07348.x

Elith, J., & Graham, C. H. (2009). Do they? How do they? WHY do they differ? On finding reasons for differing performances of species distribution models. Ecography, 32(1), 66–77. https://doi.org/10.1111/j.1600-0587.2008.05505.x

Elith, J., & Leathwick, J. R. (2009). Species Distribution Models: Ecological Explanation and Prediction Across Space and Time. Annual Review of Ecology, Evolution, and Systematics, 40(1), 677–697. https://doi.org/10.1146/annurev.ecolsys.110308.120159

Friedman, J. H. (2001). Greedy Function Approximation: A gradient boosting machine. The Annals of Statistics, 29(5), 1189–1232. Retrieved from http://www.jstor.org/stable/2699986

Mannocci, L., Roberts, J. J., Miller, D. L., & Halpin, P. N. (2017). Extrapolating cetacean densities to quantitatively assess human impacts on populations in the high seas. Conservation Biology, 31(3), 601–614. https://doi.org/10.1111/cobi.12856

Miller, D. L., Burt, M. L., Rexstad, E. A., & Thomas, L. (2013). Spatial models for distance sampling data: Recent developments and future directions. Methods in Ecology and Evolution, 4(11), 1001–1010. https://doi.org/10.1111/2041-210X.12105

Redfern, J. V., Moore, T. J., Fiedler, P. C., de Vos, A., Brownell, R. L., Forney, K. A., … Ballance, L. T. (2017). Predicting cetacean distributions in data-poor marine ecosystems. Diversity and Distributions, 23(4), 394–408. https://doi.org/10.1111/ddi.12537

Roberts, D. R., Bahn, V., Ciuti, S., Boyce, M. S., Elith, J., Guillera-Arroita, G., … Dormann, C. F. (2017). Cross-validation strategies for data with temporal, spatial, hierarchical or phylogenetic structure. Ecography, 0, 1–17. https://doi.org/10.1111/ecog.02881

Warren, D. L., & Seifert, S. N. (2010). Ecological niche modeling in Maxent: the importance of model complexity and the performance of model selection criteria. Ecological Applications, 21(2), 335–342. https://doi.org/10.1890/10-1171.1

Wood, S. N. (2017). Generalized additive models: An introduction with R (2nd ed.). CRC Press.

The Recipe for a “Perfect” Marine Mammal and Seabird Cruise

By Alexa Kownacki, Ph.D. Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab

Science—and fieldwork in particular—is known for its failures. There are websites, blogs, and Twitter pages dedicated to them. This is why, when things go according to plan, I rejoice. When they go even better than expected, I practically tear up from amazement. There is no perfect recipe for a great marine mammal and seabird research cruise, but I would suggest that one would look like this:

 A Great Marine Mammal and Seabird Research Cruise Recipe:

  • A heavy pour of fantastic weather
    • Light on the wind and seas
    • Light on the glare
  • Equal parts amazing crew and good communication
  • A splash of positivity
  • A dash of luck
  • A pinch of delicious food
  • Heaps of marine mammal and seabird sightings
  • Heat to approximately 55-80 degrees F and transit for 10 days along transects at 10-12 knots
The end of another beautiful day at sea on the R/V Shimada. Image source: Alexa K.

The Northern California Current Ecosystem (NCCE) is a highly productive area that is home to a wide variety of cetacean species. Many cetaceans are indicator species of ecosystem health as they consume large quantities of prey from different levels in trophic webs and inhabit diverse areas, from deep-diving beaked whales to gray whales traveling thousands of miles along the eastern North Pacific Ocean. Because dedicated cetacean surveys cover large bodies of water, they can be extremely costly. One alternative is to use other research vessels as research platforms; effort then becomes transect-based and opportunistic, with less flexibility to deviate from predetermined transects. This decreases expenses, creates collaborative research opportunities, and reduces interference in animal behavior as the animals are never pursued. Observing animals from large, motorized research vessels (>100 ft) at a steady, significant speed (>10 knots) provides a baseline for future, joint research efforts. The NCCE is regularly surveyed by government agencies and institutions on transects that have been repeated nearly every season for decades. These historical data provide critical context for environmental and oceanographic dynamics that impact large ecosystems with commercial and recreational implications.

My research cruise took place aboard the 208.5-foot R/V Bell M. Shimada in the first two weeks of May. The cruise was designated for monitoring the NCCE with the additional position of a marine mammal observer. The established guidelines did not allow for deviation from the predetermined transects. Therefore, mammals were surveyed along preset transects. The ship left port in San Francisco, CA and traveled as far north as Cape Meares, OR. The transects ranged from one nautical mile to two hundred miles offshore. Observations occurred while “on effort”, defined as when the ship was in transit and moving at a speed above 8 knots, dependent upon sea state and visibility. All observations took place on the flybridge during conducive weather conditions and in the bridge (one deck below the flybridge) when excessive precipitation was present. The starboard forward quarter, zero to ninety degrees relative to the ship’s direction (with the bow at zero degrees), was surveyed. Both naked eye and 7×50 binoculars were used, with binoculars in use at least 30 percent of the time. To decrease observer fatigue, which could result in fewer detected sightings, the observer (me) rotated on a cycle of 40 minutes “on effort” and 20 minutes “off effort” during long transits (>90 minutes).

Alexa on-effort using binoculars to estimate the distance and bearing of a marine mammal sighted off the starboard bow. Image source: Alexa K.

Data were collected using a modified version of the SEEbird Wincruz computer program on a ruggedized laptop with a GPS unit attached. At the beginning of each day and upon changes in conditions, the ship’s heading, weather conditions, visibility, cloud cover, swell height, swell direction, and Beaufort sea state (BSS) were recorded. Once the BSS or visibility was worse than a “5” (1 is “perfect” and 5 is “very poor”) observations ceased until there was improvement in weather. When a marine mammal was sighted, the latitude and longitude were recorded with the exact time stamp. Then, I noted how the animal was sighted (with binoculars or naked eye) and what cue was originally noticed (blow, splash, bird, etc.). The bearing and distance were noted using binoculars. The animal was assigned one of three generalized behavior categories: traveling, feeding, or milling. A sighting was defined as any single marine mammal or group of animals. Therefore, a single sighting would have the species and the best, high, and low estimates for group size.
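For readers wondering how a bearing and a distance turn into a point on a map, here is a minimal Python sketch of one common approach. It is not necessarily what the survey software does: it assumes a spherical Earth, a ship heading given in degrees from true north, a bearing measured clockwise from the bow, and a distance already converted to kilometers. The function name and the example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def sighting_position(ship_lat, ship_lon, ship_heading_deg, relative_bearing_deg, distance_km):
    """Estimate the animal's latitude and longitude (degrees) from the ship's GPS fix.

    relative_bearing_deg is measured clockwise from the bow (bow = 0 degrees).
    """
    true_bearing = math.radians((ship_heading_deg + relative_bearing_deg) % 360)
    lat1 = math.radians(ship_lat)
    lon1 = math.radians(ship_lon)
    d = distance_km / EARTH_RADIUS_KM  # angular distance in radians

    lat2 = math.asin(math.sin(lat1) * math.cos(d)
                     + math.cos(lat1) * math.sin(d) * math.cos(true_bearing))
    lon2 = lon1 + math.atan2(math.sin(true_bearing) * math.sin(d) * math.cos(lat1),
                             math.cos(d) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)


# Hypothetical fix: ship heading due north, animal 2 km away at 45 degrees off the bow.
lat, lon = sighting_position(44.6, -124.5, ship_heading_deg=0.0,
                             relative_bearing_deg=45.0, distance_km=2.0)
print(lat, lon)
```

At the short ranges involved in visual surveys, the error from treating the Earth as a sphere is negligible compared with the uncertainty in an estimated distance.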

By my definitions, I had the research cruise of my dreams. There were moments when I imagined people joining this trip as a vacation. I *almost* felt guilty. Then I remembered that after watching water for almost 14 hours (thanks to the amazing weather conditions), I worked on data and reports and class work until midnight. That’s the part that no one talks about: the data. Fieldwork is about collecting data. It’s both what I live for and what makes me nervous. The amount of time, effort, and money that is poured into fieldwork is enormous. The acquisition of the data is not as simple as it seems. When I briefly described my position on this research cruise to friends, they interpreted it to be something akin to whale-watching. To some extent, this is true. But largely, it’s grueling hours that leave you fatigued. The differences between fieldwork and what I’ll refer to as “everything else” AKA data analysis, proposal writing, manuscript writing, literature reviewing, lab work, and classwork, are the unbroken smile, the vaguely tanned skin, the hours of laughter, the sea spray, and the magical moments that reassure me that I’ve chosen the correct career path.

Alexa photographing a gray whale at sunset near Newport, OR. Image source: Alexa K.

This cruise was the second leg of the Northern California Current Ecosystem (NCCE) survey, and I was the sole Marine Mammal and Seabird Observer, a coveted position. Every morning, I would wake up at 0530hrs, grab some breakfast, and climb to the highest deck: the fly-bridge. Akin to being on the top of the world, the fly-bridge has the best views for the widest span. From 0600hrs to 2000hrs I sat, stood, or danced in a one-meter by one-meter corner of the fly-bridge and surveyed. This visual is why people think I’m whale watching. In reality, I am constantly busy. Nonetheless, I had weather and seas that scientists dream about, and for 10 days! To contrast my luck, you can read Florence’s blog about her cruise. On these same transects, in February, Florence experienced 20-foot seas, heavy rain, and very few marine mammal sightings; of those, the only cetaceans she observed were gray whales close to shore. That starkly contrasts with my 10 cetacean species, upwards of 45 sightings, and my 20-minute hammock power naps on the fly-bridge under the warm sun.

Pacific white-sided dolphins traveling nearby. Image source: Alexa K.

Marine mammal sightings from this cruise included 10 cetacean species (Pacific white-sided dolphin, Dall’s porpoise, unidentified beaked whale, Cuvier’s beaked whale, gray whale, minke whale, fin whale, northern right whale dolphin, blue whale, humpback whale, and transient killer whale) and one pinniped species: northern fur seal. What better way to illustrate these sightings than with a map? We are a geospatial lab after all.

Cetacean Sightings on the NCCE Cruise in May 2018. Image source: Alexa K.

This map is the result of data collection. However, it does not capture everything that was observed: sea state, weather, ocean conditions, bathymetry, nutrient levels, etc. There are many variables that can be added to maps–like this one (thanks to my GIS classes I can start adding layers!)–that can provide a better understanding of the ecosystem, predator-prey dynamics, animal behavior, and population health.

The catch from a bottom trawl at a station with some fish and a lot of pyrosomes (pink tube-like creatures). Image source: Alexa K.

Being a Ph.D. student can be physically and mentally demanding. So, when I was offered the opportunity to hone my data collection skills, I leapt for it. I’m happiest in the field: the wind in my face, the sunshine on my back, surrounded by cetaceans, and filled with the knowledge that I’m following my passion—and that this data is contributing to the greater scientific community.

Humpback whale photographed traveling southbound. Image source: Alexa K.

Wildlife of the Western Antarctic Peninsula

Erin Pickett, MS Student, Fisheries and Wildlife Department, OSU

This time last week, I was on a research vessel crossing the Drake Passage. The Drake extends from the tip of the Western Antarctic Peninsula to South America’s Cape Horn, and was part of the route I was taking home from Antarctica. Over the past three months I have been working on a long-term ecological research (LTER) project based out of Palmer Station, a U.S. based research facility located on Anvers Island.

Image: http://www.tetonat.com/2009/11/06/bon-voyage-off-to-antarctica-with-iceaxe-expeditions/

While in Antarctica, I was working on the cetacean component of the Palmer LTER project, which I’ve described in previous blog posts. In lieu of writing more about what it is like to work and live on the Antarctic Peninsula, I thought I’d share some photos with you. Working on the water everyday while searching for whales provided me with many opportunities to photograph the local wildlife. I hope you’ll enjoy a few of my favorite shots.

An update from the Antarctic Peninsula

By: Erin Pickett

Yesterday someone said to me, “I don’t know if it was sunrise or sunset, but it was beautiful”. So it goes on the R/V Lawrence M. Gould (LMG), the surrounding scenery is incredible but the work schedule on this research ship makes it difficult to remember what time of day it is.

Here on the Antarctic Peninsula, the sun never really sets and our daily schedules are dependent on things like the diel vertical migration of krill, the current wind speed and the amount of sea ice in between us and our study species, the humpback whale. For these reasons, we sometimes find ourselves starting our workday at odd hours, like 11:45 pm (or 4:00 am). As a reminder, I am currently working on a research vessel on a project called the Palmer long-term ecological research (LTER) project. You can read my first blog post about that here. We are about one week into our journey and so far, so good!

Our journey began in Punta Arenas, Chile, where we spent two days loading our research supplies onto the LMG and getting outfitted with cold weather gear. From Punta Arenas we headed south through the Strait of Magellan and then across the Drake Passage. Along the way we spotted a variety of cetaceans including minke, fin, sei and humpback whales, and Commerson’s and Peale’s dolphins. I spent as much of our time in transit as I could looking for seabirds, the most numerous being white-chinned and cape petrels, southern giant petrels, and black-browed albatrosses. Spotting either a royal or a wandering albatross was always exciting. An eleven-foot wingspan allows these albatrosses to glide effortlessly above the water and this makes for a beautiful sight!

We have spent the last four days transiting between various sampling stations around Palmer Deep, which is an underwater canyon just south of our home base at Palmer Station. When conditions allowed, we loaded up our tagging and biopsy gear into a small boat and went to look for humpback whales. We’ve been incredibly successful with the limited amount of time we’ve had on the water and this morning we finished deploying our sixth tag.

We brought a few different types of satellite tags with us to deploy on humpback whales. One type is an implantable satellite tag that transmits location data over a long period of time. These data allow us to gain a better understanding of the large-scale movement and distribution patterns of these animals. The other tag we deploy is a suction cup tag, so called because four small suction cups attach the tag to the whale. These suction cup tags are multi-sensor tags that measure location as well as fine scale underwater movement (e.g. pitch, roll, and heading). They are also equipped with forward and backward facing cameras and most importantly, radio transmitters! This allows us to recover the tags once they fall off the animal and float to the surface (after about 24 hours). The data we get from these tags will allow us to quantify fine-scale foraging behavior in terms of underwater maneuverability, prey type and the frequency, depth and time of day that feeding occurs.

When we deployed each of these tags we also obtained a biopsy sample and fluke photos. Fluke photos and biopsy samples allow us to distinguish between individual animals, and the biopsy samples will also be used to study the demographics of this population through genetic analysis.

Now that we’ve deployed all of our satellite tags and have recovered the suction cup tag just in the nick of time (!), we are starting our first major transect line toward the continental shelf. We will be continuing south along these grid lines for the next week.

My lab mate Logan Pallin and I will be continuing to write about our trip over the next couple of months on another blog we created especially for this project. You can find it here: blogs.oregonstate.edu/LTERcetaceans

I’ll leave you with a few of my favorite photos of the trip so far!