# Species distribution modeling: Part statistics, part philosophy, and there is no “right answer”

By Dawn Barlow, PhD student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab

Just like that, I have wrapped up year 1 of my PhD in Wildlife Science. For my PhD, I am investigating the ecology and distribution of blue whales in New Zealand across multiple spatial and temporal scales. In a region where blue whales overlap with industrial activity, managers have considerable interest in reliably forecasting when and where blue whales are most likely to be in the area. In a series of five chapters that draw on multiple data sources (dedicated boat surveys, oceanographic data, acoustic recordings, remotely sensed environmental data, opportunistic blue whale sightings information), I will attempt to describe, quantify, and predict where blue whales are found in relation to their environment. Each chapter will evaluate the distribution of blue whales relative to the environment at different scales in space (ranging from 4 km to 25 km resolution) and time (ranging from daily to seasonal resolution). One overarching method I am using throughout my PhD is species distribution modeling. Having just completed my research review with my doctoral committee last week, I’ll share this aspect of my research proposal that I’ve particularly enjoyed reading, writing, and thinking about.

Species distribution models (SDMs), which are sometimes referred to as habitat models or ecological niche models, are mathematical algorithms that combine observations of a species with the environmental conditions at the locations where it was observed, to gain ecological insight and predict the spatial distribution of the species (Elith and Leathwick, 2009; Redfern et al., 2006). Any model is just one description of what is occurring in the natural world. Just as there are many ways to describe something with words and many languages to do so, there are many options for modeling frameworks and approaches, with stark and nuanced differences. My labmate and friend Solene Derville has equated the number of choices one has for SDMs to the cracker section in an American grocery store. When navigating all of these choices and considerations, it is important to remember that no model will ever be completely correct. A model is our best attempt at describing a complex natural system, and as analysts we need to do the best that we can with the data available to address the ecological questions at hand. As it turns out, the dividing line between quantitative analysis and philosophy is thin at times. What may seem at first like a purely objective, statistical endeavor requires careful consideration and fundamental decision-making on the part of the analyst.

Ecosystems are multifaceted, complex, and hierarchical. They are composed of multiple physical and biological components, which operate at multiple scales across space and time. As Dr. Simon Levin stated in his 1989 MacArthur Award lecture on the topic of scale in ecology:

“A good model does not attempt to reproduce every detail of the biological system; the system itself suffices for that purpose as the most detailed model of itself. Rather, the objective of a model should be to ask how much detail can be ignored without producing results that contradict specific sets of observations, on particular scales of interest” (Levin, 1992).

The question of scale is central to ecology. As many biology students learn in their first introductory classes, parsimony is “The principle that the most acceptable explanation of an occurrence, phenomenon, or event is the simplest, involving the fewest entities, assumptions, or changes” (Oxford Dictionary). In other words, the best explanation is the simplest one. One challenge in ecological modeling, including SDMs, is to select spatial and temporal scales that are as coarse as possible, which gives the most parsimonious (the most straightforward) model, while still being fine enough to capture relevant patterns. Another critical consideration is the scale of the question you are interested in answering. The scale of the analysis must match the scale at which you want to make inferences about the ecology of a species.
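To make the scale trade-off concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the temperature values are invented, and the `coarsen` helper is hypothetical rather than part of my actual analysis. It averages a transect of 4 km cells into 24 km cells and shows how a narrow cold-water feature is diluted at the coarser resolution.

```python
def coarsen(values, factor):
    """Average consecutive blocks of `factor` cells into one coarser cell."""
    if len(values) % factor != 0:
        raise ValueError("transect length must be a multiple of the factor")
    return [
        sum(values[i:i + factor]) / factor
        for i in range(0, len(values), factor)
    ]

# Hypothetical transect of sea surface temperature (deg C): 24 cells of 4 km.
fine = [18.0] * 24
fine[10] = fine[11] = 14.0  # invented cold plume, two cells (8 km) wide

# Aggregate six 4 km cells into each 24 km cell.
coarse = coarsen(fine, 6)

# Temperature contrast (warmest minus coldest cell) at each resolution.
fine_contrast = max(fine) - min(fine)
coarse_contrast = max(coarse) - min(coarse)

print("4 km cells: ", fine_contrast, "deg C contrast")
print("24 km cells:", round(coarse_contrast, 2), "deg C contrast")
```

In this toy transect the plume is still present in the coarse data, but its signal is much weaker. Whether that loss matters depends on whether a feature of that size is relevant to the ecological question being asked.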

Similarly, the issue of complexity is central to distribution modeling. Overly simple models may not be able to adequately describe the relationship between species occurrence and the environment. In contrast, highly complex models may have very high explanatory power, but risk ascribing an ecological pattern to noise in the data (Merow et al., 2014); in other words, they risk finding patterns that aren’t real. Furthermore, highly complex models tend to have poorer predictive capacity than simpler models (Merow et al., 2014). There is a trade-off between descriptive and predictive power in SDMs (Derville et al., 2018). Therefore, a key component in the SDM process is establishing the end goal of the model with respect to the region of interest, scale, explanatory power, predictive capacity, and in many cases management need.
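The trade-off between explanatory and predictive power can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated, the single covariate and the 0.8/0.2 presence probabilities are invented, and a k-nearest-neighbour classifier stands in for an SDM purely because its complexity is easy to dial up (k = 1) or down (k = 25). It is not one of the modeling approaches discussed in the papers cited above.

```python
import random

def simulate(n, rng):
    """Simulate n surveys: one scaled covariate and a noisy presence label."""
    data = []
    for _ in range(n):
        x = rng.random()  # hypothetical environmental covariate in [0, 1)
        p_presence = 0.8 if x > 0.5 else 0.2  # invented "true" relationship
        data.append((x, 1 if rng.random() < p_presence else 0))
    return data

def knn_predict(train, x, k):
    """Predict presence (1) or absence (0) by majority vote of k neighbours."""
    nearest = sorted(train, key=lambda obs: abs(obs[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train, data, k):
    """Fraction of observations in `data` that the model classifies correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in data)
    return hits / len(data)

rng = random.Random(42)
train = simulate(400, rng)      # data used to build the model
held_out = simulate(2000, rng)  # independent data used to test predictions

# k = 1 memorises every training point (very complex model).
complex_train = accuracy(train, train, k=1)
complex_test = accuracy(train, held_out, k=1)

# k = 25 averages over many neighbours (much smoother, simpler model).
simple_train = accuracy(train, train, k=25)
simple_test = accuracy(train, held_out, k=25)

print("complex model: train", complex_train, "held-out", complex_test)
print("simple model:  train", simple_train, "held-out", simple_test)
```

The complex model describes the training data perfectly because it has memorised the noise, but that does not carry over to the held-out data, where the smoother model would be expected to predict better. This is why evaluating a model on data it has not seen is a common check on complexity.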

Finally, any model is ultimately limited by the data available and the scale at which it was collected (Elith and Leathwick, 2009; Guillera-Arroita et al., 2015; Redfern et al., 2006). Prior knowledge of what environmental features are important to the species of interest is often limited at the time of the data collection effort, and data collection is constrained by when it is logistically feasible to sample. For example, we collect detailed oceanographic data during the summer months when it is practical to get out on the water, satellite imagery of sea surface temperature might be unavailable during times of cloud cover, and people are more likely to report blue whale sightings in areas where there is more human activity. Therefore, useful SDMs that address both ecological and management needs typically balance the scale of analysis and model complexity with the limitations of the data.

Managers and politicians within the New Zealand government are interested in a tool to predict when and where blue whales are most likely to be, based on sound ecological analysis. This is one of the end goals of my PhD, but in the meantime, I am grappling with the appropriate scales of analysis, and attempting to balance questions of model complexity, explanatory power, and predictive capacity. There is no single, correct answer, so my process is part quantitative analysis and part philosophy, all with the goal of increased ecological understanding and conservation of a species.

References:

Derville, S., Torres, L. G., Iovan, C., and Garrigue, C. (2018). Finding the right fit: Comparative cetacean distribution models using multiple data sources and statistical approaches. Divers. Distrib. 24, 1657–1673. doi:10.1111/ddi.12782.

Elith, J., and Leathwick, J. R. (2009). Species Distribution Models: Ecological Explanation and Prediction Across Space and Time. Annu. Rev. Ecol. Evol. Syst. 40, 677–697. doi:10.1146/annurev.ecolsys.110308.120159.

Guillera-Arroita, G., Lahoz-Monfort, J. J., Elith, J., Gordon, A., Kujala, H., Lentini, P. E., et al. (2015). Is my species distribution model fit for purpose? Matching data and models to applications. Glob. Ecol. Biogeogr. 24, 276–292. doi:10.1111/geb.12268.

Levin, S. A. (1992). The problem of pattern and scale. Ecology 73, 1943–1967.

Merow, C., Smith, M. J., Edwards, T. C., Guisan, A., McMahon, S. M., Normand, S., et al. (2014). What do we gain from simplicity versus complexity in species distribution models? Ecography (Cop.). 37, 1267–1281. doi:10.1111/ecog.00845.

Redfern, J. V., Ferguson, M. C., Becker, E. A., Hyrenbach, K. D., Good, C., Barlow, J., et al. (2006). Techniques for cetacean-habitat modeling. Mar. Ecol. Prog. Ser. 310, 271–295. doi:10.3354/meps310271.