Behind the scenes of modeling

By Olivia Hamilton, PhD Candidate, Institute of Marine Science, University of Auckland

I am going to take you behind the scenes of modeling. No, I do not mean the kind of modeling where six-foot tall glamazons such as Cindy Crawford get paid exorbitant amounts of money to dress up in fabulous outfits, strike a pose, and attend A-list parties. I am talking about statistical modeling. This usually involves wearing sweatpants, sitting at your computer for extended periods of time, and occasionally turning to a block of chocolate for comfort.

Species distribution models (SDMs), also known as habitat models, are a powerful tool for informing conservation and management of animal populations. They essentially enable us to identify important areas of habitat by describing the relationship between the spatial distribution pattern of a species and the attributes of its physical environment. It is logistically difficult to observe top marine predators such as whales, dolphins, sharks, and seabirds. This difficulty is because a) they move, and b) we only get to observe them during the small portion of their lives that they spend near or at the surface of the water. Environmental variables such as water depth and slope do not necessarily influence the habitat use patterns of top predators directly, but we can use them in our models as proxies for more important ecological determinants of habitat use that are more difficult to collect data for, such as the distribution of their prey.

Some SDMs take this a step further by enabling us to make predictions about a species’ distribution in areas or time periods that we did not survey. This predictive capacity can provide us with a more holistic understanding of how animals use their range, and the ability to anticipate distribution patterns under variable conditions (think climate change 100 years from now).

The idea of understanding how sharks, dolphins, whales, and seabirds are using the Hauraki Gulf in New Zealand is an extremely exciting prospect for a nosy biologist like me. I have always had a fascination with mega-fauna, and more specifically with large predators. To me, uncovering the reasons that drive their habitat use patterns is the equivalent of finding a pearl in an oyster. However, that’s just me being selfish. The best thing about creating predictive habitat models for mega-fauna in the Gulf is that we will gain a better understanding of how to manage and protect them. The SDMs that I am using are called Boosted Regression Trees (BRT). They are a relatively new kid on the habitat modeling block, but are recognized as a powerful tool for making habitat predictions. Dream result.

My Master’s thesis focused on abundance estimates and social structure analyses; everything I have learned about habitat modeling while in the GEMM Lab at Oregon State University was from scratch. One of the biggest lessons that I learned was how much behind-the-scenes preparation is needed before you can even get to the actual modeling point. The length of the preparation stage is proportional to the size of the dataset. Needless to say, the years’ worth of multi-species aerial survey data that I have collected has kept me quite busy.

The first step was to create pseudo-absences.

Pseudo-what, you say?

When we are out on the water, or in the plane, and we see animals of interest, we record their geographic location. As a result, our presence sightings are represented as points in space. However, in order to identify areas of preferred habitat we also need to describe the range of environmental conditions that are available to the population. To do this, we need to obtain environmental data from where animals were not seen, otherwise known as absence data. As I mentioned earlier, observing marine animals is difficult, which makes it hard to obtain confirmed absence data. Luckily, some savvy scientists came up with the idea of creating pseudo-absences. The basic idea is to use the area in which sightings were not made to generate randomly placed absence points.

As simple as that?

Of course not.

When generating pseudo-absences, we want to make sure that they are placed in areas that reflect true absences. Poor environmental conditions affect our ability to detect animals, especially when travelling along at 160 km/h at 500 ft in a small plane. After making some exploratory plots of the various environmental conditions relative to sighting frequencies, we identified which conditions hindered our ability to see animals (Fig. 1). Stretches of the track that we flew in poor conditions were then removed before generating the pseudo-absences.








Fig. 1. Example of exploratory plots looking at the relationship between detection rates and the amount of glare coverage within our viewing area. Very few detections of common dolphins were made when the glare coverage exceeded 60%, whereas detection rates for gannets were acceptable up to 80% glare coverage. Any stretches of a particular survey that exceeded these values were excluded before pseudo-absences were generated.
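For readers who like to see the logic spelled out, here is a minimal Python sketch of the glare filter described above. It is not the code used for the surveys; the segment records and the `glare_pct` field are made up for illustration, and only the 60% and 80% limits come from the exploratory plots.

```python
# Glare limits (percent of viewing area) read off the exploratory plots.
GLARE_LIMIT = {"common dolphin": 60, "gannet": 80}

def usable_segments(segments, species):
    """Keep track segments whose glare coverage does not exceed the species limit."""
    limit = GLARE_LIMIT[species]
    return [seg for seg in segments if seg["glare_pct"] <= limit]

# Made-up track segments, for illustration only.
segments = [
    {"id": 1, "glare_pct": 10},
    {"id": 2, "glare_pct": 65},
    {"id": 3, "glare_pct": 85},
]

dolphin_ids = [s["id"] for s in usable_segments(segments, "common dolphin")]
gannet_ids = [s["id"] for s in usable_segments(segments, "gannet")]
print(dolphin_ids)  # [1]
print(gannet_ids)   # [1, 2]
```

Because the limit is looked up per species, the same stretch of track can be kept for gannets and dropped for common dolphins.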

The next step was to decide where to place the pseudo-absences along the track-lines. To do this, we used all sightings data for each species to create density plots (Fig. 2), and then distributed our pseudo-absences in inverse proportion to sighting density (Fig. 3). That way, we were distributing a higher number of absences in areas of known lower density, and therefore obtaining a representative sample of environmental variables in areas that reflected true absences.

Fig. 2: Density plot of all common dolphin sightings over 22 aerial surveys in the Hauraki Gulf. Red represents the highest density and blue represents the lowest density.


Fig. 3:  Aerial track-lines flown in the Hauraki Gulf, New Zealand on 19 March 2014. Triangle symbols represent pseudo-absences and black circles represent presence sightings for that day.
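The inverse-density idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the track is pretended to be cut into named cells, the densities are invented, and the weight `1 / (1 + density)` is one simple choice of "inverse" weighting, not necessarily the one used in the actual analysis.

```python
import random
from collections import Counter

def place_pseudo_absences(cell_density, n_points, seed=0):
    """Sample track cells with probability inversely related to sighting density."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    cells = list(cell_density)
    # Assumed weighting: low-density cells get high weight; the +1 avoids
    # dividing by zero in cells with no sightings at all.
    weights = [1.0 / (1.0 + cell_density[c]) for c in cells]
    draws = rng.choices(cells, weights=weights, k=n_points)
    return Counter(draws)

# Made-up sighting densities for three hypothetical cells along a track-line.
cell_density = {"cell_A": 0, "cell_B": 4, "cell_C": 19}
counts = place_pseudo_absences(cell_density, n_points=1000)
for cell in cell_density:
    print(cell, counts[cell])
```

In this toy setup most pseudo-absences land in the cell with no sightings and very few in the high-density cell, which is the behavior we are after.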

Next what?

Step two involved creating environmental layers that would be included as predictor variables in our models. Instead of chucking any old variable in there, we needed to decide what physical or biological features of the environment would be ecologically relevant for explaining the different species distributions. For example, one of the variables we are using is tidal height/flow. Tidal movement pushes around potential food for marine animals and therefore influences how they use their space. Some other environmental variables included in our models were proximity to potential prey patches (zooplankton and fish), sea surface temperature, and the type of substrate (sand, mud, gravel).

Finally, we are ready for the main event. Ladies and gentlemen, I introduce to you preliminary results for one of my study species, the Nationally Endangered Bryde’s whale (Fig. 4). These plots show us the relative influence of each of the environmental variables on the distribution of Bryde’s whales in the Hauraki Gulf. The percentage value associated with each of the plots tells us how much influence each variable had in the model. We can see that the time of the year (month), the distribution of food (zooplankton and fish), and the difference in water temperature over the year have the most influence on the distribution of Bryde’s whales. This makes complete ecological sense. Prey distribution is one of the main ecological drivers of the distribution of predators both in time and space. Temperature is one of the main drivers for the distribution of prey species. As the water temperature changes throughout the year within the Gulf, so does the availability of the Bryde’s whales’ prey items. In turn, this influences how much time they spend in the Gulf. When prey is around, the Bryde’s whales are never far away. Eating is a very important part of the day for these 90,000 lb whales; therefore it pays to stay close to their food supply.

Fig. 4: Relative influence of environmental predictors on the distribution of Bryde’s whales within the Hauraki Gulf, New Zealand.
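If you are wondering where "relative influence" percentages come from, the toy Python sketch below shows the general idea. It is not the code behind Fig. 4, and it simplifies heavily: it boosts one-split trees (stumps) on a made-up numeric response with squared error, whereas a real BRT for presence/absence data would use deeper trees and a different loss. The variable names and data are invented. Each time a variable is chosen for a split, the reduction in squared error is credited to it, and the totals are rescaled to sum to 100%.

```python
import random

def best_stump(X, residuals):
    """Find the single split (feature, threshold) that most reduces squared error."""
    mean_all = sum(residuals) / len(residuals)
    base_sse = sum((r - mean_all) ** 2 for r in residuals)
    best = None  # (gain, feature index, threshold, left mean, right mean)
    for j in range(len(X[0])):
        # Dropping the largest value guarantees both sides of the split are non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            gain = base_sse - sse
            if best is None or gain > best[0]:
                best = (gain, j, t, lm, rm)
    return best

def relative_influence(X, y, names, n_trees=20, learning_rate=0.1):
    """Boost stumps on the residuals and credit each split's gain to its variable."""
    pred = [sum(y) / len(y)] * len(y)
    gains = [0.0] * len(names)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        gain, j, t, lm, rm = best_stump(X, residuals)
        gains[j] += gain
        pred = [p + learning_rate * (lm if row[j] <= t else rm)
                for row, p in zip(X, pred)]
    total = sum(gains)
    return {name: 100 * g / total for name, g in zip(names, gains)}

# Made-up data: the response depends strongly on "prey", weakly on
# "temperature", and not at all on "noise".
rng = random.Random(1)
names = ["prey", "temperature", "noise"]
X = [[rng.random() for _ in names] for _ in range(30)]
y = [3 * prey + temp for prey, temp, _ in X]

influence = relative_influence(X, y, names)
for name, pct in sorted(influence.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.1f}%")
```

In this toy example the variable that drives the response most strongly ends up with the largest share, which is how the percentages in a plot like Fig. 4 should be read.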

The show is not over yet, folks. While the code is all running smoothly, there is still a bit of fine-tuning to do. I am currently working on this, re-running these models over and over, trying to iron out the creases. At the moment, I am creating SDMs for four of my study species: Bryde’s whales, common dolphins, bronze whaler sharks, and gannets. Once we are satisfied with how things are running, I will start stage two of the modeling process: the prediction maps.

Next year, we will conduct several more aerial surveys in the Hauraki Gulf with the aim of validating our habitat models.

How is that for a cliffhanger?

Stay tuned to gain an insight into the habitat use of mega-fauna in the Hauraki Gulf, New Zealand.

Exciting news for the GEMM Lab: SMM conference and a Twitter feed!

By Amanda Holdman (M.S. Student)

At the end of the week, the GEMM Lab will be piling into our fuel-efficient Subarus and heading south to San Francisco! The 21st Biennial Conference on the Biology of Marine Mammals, hosted by the Society for Marine Mammalogy, kicks off this weekend and the GEMM Lab is all prepped and ready!

Workshops start on Saturday prior to the conference, and I will be attending the Harbor Porpoise Workshop, where I get to collaborate with several other researchers from around the world who study my favorite cryptic species. After morning introductions, we will have a series of talks, a lunch break, and then head to the Golden Gate Bridge to see the recently returned San Francisco harbor porpoises. Sounds fun, right?! But that’s just day one. A whole week of scientific fun is to be had! So let’s begin with the Society’s mission:


‘To promote the global advancement of marine mammal science and contribute to its relevance and impact in education, conservation and management’ 

And the GEMM Lab is all set to do just that! The conference will bring together approximately 2200 top marine mammal scientists and managers to investigate the theme of Marine Mammal Conservation in a Changing World. All GEMM Lab members will be presenting at this year’s conference, accompanied by other researchers from the Marine Mammal Institute, for a total of 34 researchers representing Oregon State University!

Here is our Lab line-up:

Our leader, Leigh, will be starting us off strong with a speed talk on “Moving from documentation to protection of a blue whale foraging ground in an industrial area of New Zealand”.

Tuesday morning I will be presenting a poster on the “Spatio-temporal patterns and ecological drivers of harbor porpoises off of the central Oregon coast”.

Solène follows directly after me on Tuesday to give an oral presentation on the “Environmental correlates of nearshore habitat distribution by the critically endangered Maui dolphin”.

Florence helps us reconvene Thursday morning with a poster presentation on her work, “Assessment of vessel response to foraging gray whales along the Oregon coast to promote sustainable ecotourism”.

And finally, Courtney, the most recent Master of Science and the first graduate of the GEMM Lab, will give an oral presentation to round us out on “Citizen Science: Benefits and limitations for marine mammal research and education”.

However, while I am full of excitement and anticipation for the conference, I regret to report that you will not be seeing a blog post from us next week. That’s because the GEMM Lab recently created a Twitter feed and we will be “live tweeting” our conference experience with all of you! You can follow along with the conference by searching #Marman15 and follow our lab at @GemmLabOSU.

Twitter is a great way to communicate our research, exchange ideas and network, and can be a great resource for scientific inspiration.

If you are new to Twitter, like the GEMM Lab, or are considering pursuing graduate school, take some time to explore the scientific world of tweeting and following. I did, and as it turns out there are tons of resources created by grad students to help other grad students.

For example:

Tweets by the Thesis Whisperer team (@thesiswhisperer) offer advice and useful tips on writing and other grad-related topics. If you are having problems with statistics, there are specialist hashtags such as #rstats for R users, or you could follow @Rbloggers and @statsforbios, to name a few.

As always, thanks for following along. Make sure to find us on Twitter so you can keep up with the GEMM Lab’s scientific endeavors.



Successfully a Master, or at Least a Bit More Enlightened

By Courtney Hann (M.S. Marine Resource Management)

A week ago, I successfully defended my Master of Science thesis on “Citizen Science Research: A Focus on Historical Whaling Data and a Current Citizen Science Project, Whale mAPP”, which included a 60-minute presentation to my committee, colleagues, friends, and family. Although I was a bit nervous at the start, my two weeks of revisions and practice prepared me to enjoy the experience once it started, and to be thankful for all of the guidance and knowledge I have gained while at Oregon State University and with the Geospatial Ecology of Marine Megafauna Lab.


My thesis focused on the value of collaboration and creativity in developing new methods for gathering and analyzing marine mammal data, and was driven by one overall question:

How do we study marine mammals over vast spatial and temporal scales without breaking the bank, while still being scientifically rigorous?

This is important because marine mammal data collected over large spatial and temporal scales is relatively rare, and requires extensive collaboration and funding (Calambokidis et al. 2008; Dahlheim et al. 2009). A majority of marine mammal research is conducted over limited time frames (weeks to months) and on local spatial scales, requiring the data to be extrapolated out in order to understand regional patterns (Baker et al. 1985; Rosa et al. 2012). As a result, ecological modeling and other analyses are limited by geographic and temporal scale (Hamazaki 2002; Redfern et al. 2006).

I presented two potential approaches to the use of citizen science data to cost-effectively study marine mammal distributions across vast spatial and temporal scales. The first method is described below:

(1) Use the oldest form of large cetacean citizen science data, historical whaling records, to analyze species trends across extensive spatial and temporal scales. Amazingly, these 200-year-old records provide some of the most informative data for highlighting regional and global marine mammal distributions and abundance estimates (Gregr and Trites 2001; Torres et al. 2013). This information is vital for adapting management strategies as populations recover, change their distribution due to climate changes, or undergo various interactions with humans (net entanglements, ship strikes, competition for commercially important fish and invertebrate species, etc.).

Replicating such datasets today is not fiscally feasible with traditional research methods, but distribution data is still vital for understanding how populations have changed over time and how they are responding to large-scale climate and anthropogenic changes. Modern day citizen science research may be the solution to collecting such baseline data. Therefore, the following second method was evaluated:

(2) Data collected by 39 volunteers using the marine mammal citizen science app, Whale mAPP, over the summer of 2014 was examined to interpret the various spatial, user, and species biases present in the dataset. In addition, the educational benefits, user motivations, and suggestions for revisions to the citizen science project were investigated with two user surveys. Results were used to revise Whale mAPP and highlight both the potential and limitations of citizen science data collected with Whale mAPP.

While I believe in the power of citizen science research for expanding our knowledge of large-scale marine mammal distributions, it is important to continue to interpret the biases in the dataset and truly examine how we can use the results for research. Although collecting an abundance of data may be fun and exciting, careful examination of the methods and analysis techniques is vital if we hope to one day use the data to inform management and conservation decisions. I hope that my research contributes not only to this knowledge, but also to opening our eyes to the value of embracing a new method of data collection. Such a method relies on collaboration across various disciplines including biologists, managers, educators, app developers, volunteers, and statisticians. Maybe someday a current citizen science project, such as Whale mAPP, will provide a dataset as vast, abundant, and valuable as historical whaling records. Even the possibility of accomplishing such a goal is worth fighting for.


Literature Cited

Baker, C.S., L.M. Herman, A. Perry, et al. 1985. Population characteristics and migration of summer and late-season humpback whales (Megaptera novaeangliae) in Southeastern Alaska. Marine Mammal Science 1:304–323.

Calambokidis, J., E.A. Falcone, T.J. Quinn, et al. 2008. SPLASH: Structure of Populations, Levels of Abundance and Status of Humpback Whales in the North Pacific. Final Report for Contract AB133F-03-RP-00078 prepared by Cascadia Research for U.S. Department of Commerce.

Dahlheim, M.E., P.A. White and J.M. Waite. 2009. Cetaceans of Southeast Alaska: distribution and seasonal occurrence. Journal of Biogeography 36:410–426.

Gregr, E.J. and A.W. Trites. 2001. Predictions of critical habitat for five whale species in the waters of coastal British Columbia. Canadian Journal of Fisheries and Aquatic Sciences 58:1265–1285.

Hamazaki, T. 2002. Spatiotemporal prediction models of cetacean habitats in the mid-western North Atlantic Ocean (from Cape Hatteras, North Carolina, USA to Nova Scotia, Canada). Marine Mammal Science 18:920–939.

Redfern, J.V., M.C. Ferguson, E.A. Becker, et al. 2006. Techniques for cetacean-habitat modeling. Marine Ecology Progress Series 310:271–295.

Rosa, L.D., J.K. Ford and A.W. Trites. 2012. Distribution and relative abundance of humpback whales in relation to environmental variables in coastal British Columbia and adjacent waters. Continental Shelf Research 36:89–104.

Torres, L.G., T.D. Smith, P. Sutton, A. MacDiarmid, J. Bannister and T. Miyashita. 2013. From exploitation to conservation: habitat models using whaling data predict distribution patterns and threat exposure of an endangered whale. Diversity and Distributions 19:1138–1152.