Behind the scenes of modeling

By Olivia Hamilton, PhD Candidate, Institute of Marine Science, University of Auckland

I am going to take you behind the scenes of modeling. No, I do not mean the kind of modeling where six-foot-tall glamazons such as Cindy Crawford get paid exorbitant amounts of money to dress up in fabulous outfits, strike a pose, and attend A-list parties. I am talking about statistical modeling. This usually involves wearing sweatpants, sitting at your computer for extended periods of time, and occasionally turning to a block of chocolate for comfort.

Species distribution models (SDMs), also known as habitat models, are a powerful tool for informing the conservation and management of animal populations. They essentially enable us to identify important areas of habitat by describing the relationship between the spatial distribution of a species and the attributes of its physical environment. It is logistically difficult to observe top marine predators such as whales, dolphins, sharks, and seabirds. This difficulty is because a) they move, and b) we only get to observe them during the small portion of their lives that they spend near or at the surface of the water. Environmental variables such as water depth and slope do not necessarily influence the habitat use patterns of top predators directly, but we can use them in our models as proxies for more important ecological determinants of habitat use that are harder to collect data on, such as the distribution of their prey.

Some SDMs take this a step further by enabling us to make predictions about a species’ distribution in areas or time periods that we did not survey. This predictive capacity can provide us with a more holistic understanding of how animals use their range, and the ability to anticipate distribution patterns under variable conditions (think climate change 100 years from now).

The idea of understanding how sharks, dolphins, whales, and seabirds are using the Hauraki Gulf in New Zealand is an extremely exciting prospect for a nosy biologist like me. I have always had a fascination with mega-fauna, and more specifically with large predators. To me, uncovering the reasons that drive their habitat use patterns is the equivalent of finding a pearl in an oyster. However, that’s just me being selfish. The best thing about creating predictive habitat models for mega-fauna in the Gulf is that we will gain a better understanding of how to manage and protect them. The SDMs that I am using are called Boosted Regression Trees (BRT). They are a relatively new kid on the habitat modeling block, but are recognized as a powerful tool for making habitat predictions. Dream result.

My Master’s thesis had a focus on abundance estimates and social structure analyses; everything I know about habitat modeling I learned from scratch while in the GEMM Lab at Oregon State University. One of the biggest lessons that I learned was how much behind-the-scenes preparation is needed before you can even get to the actual modeling point. The length of the preparation stage is proportional to the size of the dataset. Needless to say, the years’ worth of multi-species aerial survey data that I have collected has kept me quite busy.

The first step was to create pseudo-absences.

Pseudo-what you say?

When we are out on the water, or in the plane, and we see animals of interest, we record their geographic location. As a result, our presence sightings are represented as points in space. However, in order to identify areas of preferred habitat we need to also describe the range of environmental conditions that are available to the population. To do this, we also need to obtain environmental data from where animals were not seen, otherwise known as absence data. As I mentioned earlier, observing marine animals is difficult. This makes it difficult to obtain confirmed absence data. Luckily, some savvy scientists came up with the idea of creating pseudo-absences. The idea is to basically use the area in which sightings were not made to generate randomly placed absence points.

As simple as that?

Of course not.

When generating pseudo-absences, we want to make sure that they are placed in areas that reflect true absences. Poor environmental conditions affect our ability to detect animals, especially when travelling along at 160 km/h at 500 ft in a small plane. After making some exploratory plots of the various environmental conditions relative to sighting frequencies, we identified which conditions hindered our ability to see animals (Fig. 1 & 2). Stretches of the track that we flew in poor conditions were then removed before generating the pseudo-absences.

Fig. 1. Example of exploratory plots looking at the relationship between detection rates and the amount of glare coverage within our viewing area. The plot for common dolphins shows that very few detections were made when the glare coverage exceeded 60%, and the plot for gannets shows that detection rates were acceptable up to 80% glare coverage. Any stretches of a particular survey that exceeded these values were excluded before pseudo-absences were generated.
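For the code-curious, here is a minimal sketch of that filtering step in Python. It is not the script used for this project: the segment records, the `glare_pct` field, and the function name are all made up for illustration, and only the 60% and 80% cut-offs come from the plots described above.

```python
# Glare cut-offs (percent of viewing area) taken from the exploratory plots.
# The dictionary keys and the segment layout below are hypothetical.
GLARE_LIMIT = {"common_dolphin": 60, "gannet": 80}


def usable_segments(segments, species):
    """Keep only track segments flown with glare at or below the species' limit."""
    limit = GLARE_LIMIT[species]
    return [seg for seg in segments if seg["glare_pct"] <= limit]


# Made-up track segments, for illustration only.
segments = [
    {"id": 1, "glare_pct": 10},
    {"id": 2, "glare_pct": 65},
    {"id": 3, "glare_pct": 85},
]

dolphin_ids = [s["id"] for s in usable_segments(segments, "common_dolphin")]
gannet_ids = [s["id"] for s in usable_segments(segments, "gannet")]
print(dolphin_ids, gannet_ids)
```

Keeping the cut-offs per species means a stretch of track can be dropped for one species while still contributing pseudo-absences for another.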

The next step was to decide where to place the pseudo-absences along the track-lines. To do this, we used all sightings data for each species to create density plots (Fig. 2), and then distributed our pseudo-absences in inverse proportion to sighting density (Fig. 3). That way, we were distributing a higher number of absences in areas of known lower density, and therefore obtaining a representative sample of environmental variables in areas that reflected true absences.

Fig. 2: Density plot of all common dolphin sightings over 22 aerial surveys in the Hauraki Gulf. Red represents the highest density and blue represents the lowest density.

Fig. 3: Aerial track-lines flown in the Hauraki Gulf, New Zealand on 19 March 2014. Triangle symbols represent pseudo-absences and black circles represent presence sightings for that day.
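To make the "inverse proportion" idea concrete, here is a rough Python sketch. It assumes the track has already been cut into cells, each with a sighting density value. The cell names, the density values, and the `offset` that stops empty cells from getting an infinite weight are all assumptions for illustration, not details of the actual workflow.

```python
import random
from collections import Counter


def place_pseudo_absences(cells, n_points, offset=1.0, seed=0):
    """Draw n_points cells, favouring cells with low sighting density.

    cells: list of (cell_id, density) pairs with density >= 0.
    The weight 1 / (density + offset) is one simple reading of
    "inverse proportion"; offset is an assumed tuning value.
    """
    rng = random.Random(seed)
    ids = [cell_id for cell_id, _ in cells]
    weights = [1.0 / (density + offset) for _, density in cells]
    # Sampling with replacement: one cell may receive several pseudo-absences.
    return rng.choices(ids, weights=weights, k=n_points)


# Made-up densities for three cells along a track-line.
cells = [("low", 0.0), ("medium", 1.0), ("high", 9.0)]
picks = place_pseudo_absences(cells, n_points=1000)
counts = Counter(picks)
print(counts)
```

With these made-up numbers, the low-density cell should collect the most pseudo-absences and the high-density cell the fewest.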

Next what?

Step two involved creating environmental layers that would be included as predictor variables in our models. Instead of chucking any old variable in there, we needed to decide what physical or biological features of the environment would be ecologically relevant for explaining the distributions of the different species. For example, one of the variables we are using is tidal height/flow. Tidal movement pushes around potential food for marine animals and therefore influences how they use their space. Other environmental variables included in our models were proximity to potential prey patches (zooplankton and fish), sea surface temperature, and the type of substrate (sand, mud, gravel).

Finally, we are ready for the main event. Ladies and gentlemen, I introduce to you preliminary results for one of my study species, the Nationally Endangered Bryde’s whale (Fig. 4). These plots show us the relative influence of each of the environmental variables on the distribution of Bryde’s whales in the Hauraki Gulf. The percentage value associated with each of the plots tells us how much influence each variable had in the model. We can see that the time of the year (month), the distribution of food (zooplankton and fish), and the difference in water temperature over the year have the most influence on the distribution of Bryde’s whales. This makes complete ecological sense. Prey distribution is one of the main ecological drivers of the distribution of predators both in time and space. Temperature is one of the main drivers for the distribution of prey species. As the water temperature changes throughout the year within the Gulf, so does the availability of the Bryde’s whales’ prey items. In turn, this influences how much time they spend in the Gulf. When prey is around, the Bryde’s whales are never far away. Eating is a very important part of the day for these 90,000 lb whales; therefore it pays to stay close to their food supply.

Fig. 4: Relative influence of environmental predictors on the distribution of Bryde’s whales within the Hauraki Gulf, New Zealand.
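If you are wondering where those percentages come from, the toy Python sketch below may help. It is not the model behind Fig. 4. It boosts the simplest possible trees (a single split each) on made-up data, and it adds up how much each predictor's splits reduce the squared error, which is the usual idea behind "relative influence". The predictor names, the data, and the settings are all assumptions.

```python
import random


def best_stump(X, residuals):
    """Find the single split that most reduces squared error on the residuals."""
    n = len(X)
    total = sum(residuals)
    best = None  # (gain, feature, threshold, left_mean, right_mean)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            # Reduction in squared error achieved by splitting here.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best


def relative_influence(X, y, n_trees=50, learning_rate=0.1):
    """Boost one-split trees and return each predictor's share of the gain (%)."""
    n = len(y)
    pred = [sum(y) / n] * n
    gains = [0.0] * len(X[0])
    for _ in range(n_trees):
        residuals = [y[i] - pred[i] for i in range(n)]
        stump = best_stump(X, residuals)
        if stump is None:
            break
        gain, j, threshold, left_mean, right_mean = stump
        gains[j] += gain
        for i in range(n):
            step = left_mean if X[i][j] <= threshold else right_mean
            pred[i] += learning_rate * step
    total = sum(gains)
    return [100 * g / total if total > 0 else 0.0 for g in gains]


# Made-up data: presence depends on the first predictor, the second is noise.
rng = random.Random(1)
names = ["prey_index", "random_noise"]  # hypothetical predictor names
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [1.0 if row[0] + rng.gauss(0, 0.1) > 0.5 else 0.0 for row in X]

influence = relative_influence(X, y)
for name, value in zip(names, influence):
    print(f"{name}: {value:.1f}%")
```

The percentages always sum to 100, so they tell us how the predictors rank against each other within one model, not how good the model itself is.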

The show is not over yet, folks. While the code is all running smoothly, there is still a bit of fine-tuning to do. I am currently working on this, re-running these models over and over, trying to iron out the creases. At the moment, I am creating SDMs for four of my study species: Bryde’s whales, common dolphins, bronze whaler sharks, and gannets. Once we are satisfied with how things are running, I will start stage two of the modeling process: the prediction maps.

Next year, we will conduct several more aerial surveys in the Hauraki Gulf with the aim of validating our habitat models.

How is that for a cliffhanger?

Stay tuned to gain an insight into the habitat use of mega-fauna in the Hauraki Gulf, New Zealand.

2 thoughts on “Behind the scenes of modeling”

Amao says:

January 12, 2016 at 5:01 pm

Hi,

I would like to know what software or R package you used for the density plot in Fig. 2 above.

1. Florence says:
  
  January 14, 2016 at 12:12 am
  
  The density plots were created in ArcGIS using the kernel density tool. You must have a Spatial Analyst license in order to use it.
