# GEOG 566

Advanced spatial statistics and GIScience

June 11, 2018

### Assessing the importance of 3 predictors of fire occurrence and the random effect of site with a GLMM model

Filed under: Exercise/Tutorial 3 2018 @ 4:11 pm

Question

The question I asked was whether fire occurrence from 1700-1918, a binary (0 – no fire, 1 = fire) response, was related to three explanatory variables:

Climate – Annual Palmer Drought Severity Index (PDSI) a measure of moisture stress reconstructed from tree rings (range -6.466 – 7.234; severe drought to cool and wet). In exercise 2 I found using superposed epoch analysis that large and extensive fires occurred during years of extreme drought, and that previous year PDSI was not related to fire occurrence.

Interval – The amount of years since the last fire event (range 1-46). This variable is a surrogate for fuel, a primary constraint on fire spread. The longer we go without fire the more fuel accumulates potentially increasing the probability of fire spread.

Note – Calculating Time since fire for each year at each plot was quite tricky. I eventually found a solution with the dplyr package.  This code sums the amount of rows (years) between events (FIRE_PLU)) within each plot and adds the sum of years since fire to each year (row) in a new column. This almost does what I want, but starts at 0 on each year/row when a fire occurred.  The interval in the year of fire occurrence should be the years since the last fire (e.g. 20 not 0) so I shifted fire occurrence forward in time one year (FIRE_PLUS).

binary2<-binary1 %>% group_by(SP_ID, idx = cumsum(FIRE_PLUS == 1L)) %>% mutate(counter = row_number ()) %>%  ungroup %>% select(-idx)

Lodgepole – The proportion of area occupied by lodgepole pine within a 2 km buffer surrounding each sample plot.  Lodgepole pine stands have lower productivity and relatively sparse understory fuels.  Fire occurrence may be less frequent in areas where lodgepole pine forest limits fire spread.

My data looks like this:

 SP_ID YEAR FIRE FIRE_PLUS PDSI INTERVAL PICO SD200 1700 0 0 2.935 13 40 SD200 1701 0 0 2.028 14 40 SD200 1702 0 0 4.049 15 40 SD200 1703 0 0 -5.034 16 40 SD200 1704 0 0 2.306 17 40 SD200 1705 1 0 0.773 18 40 SD200 1706 0 1 -2.451 1 40

SP_ID is the sample plot identifier (n =52), YEAR is the year of observation, FIRE is the binary record of whether a fire occurred, FIRE_PLUS is fire occurrence shifted so I could calculate interval since fire for each year and each plot, INTERVAL is the year since fire occurrence, and PICO is the proportion of area around each plot occupied by lodgepole.

I’m also interested in the relative importance in the model of each explanatory variable, and whether the response varies by sample plot.

Approach

I used the LME4 package in R to construct generalized linear mixed models (GLMM; Bolker et al. 2009) to complete my analysis. The GLMM approach is useful for a binary response where data is nested within itself.  In this case my response is nested within sample plots and a mixed model allows me to pool information about fire occurrence across sites.

With a binary response sample size is limited to the amount of 1 (fire) occurrences of the response variable. In my case I have ~200 binary observations at each plot (years), but fire occurs in only 8-28 years depending on the sample plot. Therefore sample sizes would be too small for 3 predictor variables. The GLMM approach allowed me to pool information across plot, and determine whether sample plot, the random effect in the model, created a different response to the fixed effects (climate, interval, lodgepole).

I followed the following steps

1. Checked for independence of the explanatory variables
2. Specified the model with sample plot as the random effect, climate, interval, and lodgepole as the fixed effects, and fire as the response. This is a GLMM model fir by maximum likelihood with a logit link.

fire.glmm1 <- glmer(FIRE ~ PDSI+INTERVAL+PICO + (1|SP_ID), family = “binomial”, data = fire.data, control=glmerControl(optCtrl=list(maxfun=2e5)))

1. Checked the model fit and assumption of residual independence. To do this I used the DHARMa package and the following vignette that describes how residual interpretation for GLMMs is problematic and provides a solution.

https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html

If the model is specified correctly then the observed data should look like it was created from the fitted model. Hence, for a correctly specified model, all values of the cumulative distribution should appear with equal probability. That means we expect the distribution of the residuals to be flat, regardless of the model structure (Poisson, binomial, random effects and so on).

Overall residuals for the full model

simulation_residuals

Residuals vs. Variables – Residuals for each fixed effect

PDSI_RESIDUALS

Temporal autocorrelation of residuals

Temporal_Autocorrelation

ACF plots of Residuals

ACF_RESIDUALS

4. I used R2 for GLMM (Nakagawa and Schielzeth 2012) to indicate the relative importance of the fixed effects and to determine the importance of sample point (random effect). The marginal R squared values are those associated with your fixed effects, the conditional ones are those of your fixed effects plus the random effects. To determine the relative importance of the fixed effects I dropped 1 fixed effect at a time and compared changes in R2 among models.

5. To interpret the influence of each fixed effect I exponentiatied the coefficients and graphed the probability functions.

Results

All three fixed effects were significant in the GLMM model, but PICO was barely significant (p value = 0.0458). For every 1 unit increase in PDSI the probability of fire occurring decreased by ~25%, for each year without fire (Interval) the probability of fire increased by 7%, and for each percent increase in lodgepole forest around a sample point the probability of fire decreased by 1%.  The difference in marginal and conditional R2 for the full model was small. The random intercept of site explained little additional variance. This suggests there’s not a lot more to be explained at the sample plot level.

R2m       R2c 0.2072182 0.2329679

Comparing conditional R2 among models while dropping fixed effects indicated that climate a top-down driver consistent across sample points accounted for most of the R2 (0.15 for PDSI, 0.5 for Interval, and 0.1 for Lodgepole).

These results suggested that Bottom-up drivers such as local topography, composition, or microclimate seem to have little influence on fire occurrence. I was surprised to see that climate had a stronger influence on fire occurrence within a year, but this makes sense as I think more about time since fire and fuel accumulation.  If fuel recovers quickly (e.g. 2-5 years) in this landscape then it won’t be limiting for most years.  If fuel is generally not a limiting constraint on fire then fire occurrence shouldn’t change dramatically with time since fire.  Essentially 5 years without fire and 20 years without fire have the same influence on whether a fire occurs and fire occurrence is simply more likely over a longer period of years. Thus climate is a more influential predictor of fire occurrence.

Overall most of the variance was not explained.  If climate and fuels are generally not limiting then fire occurrence in a year would occur as a result of human and lightning ignitions.  Data for these predictors is not available.

Critique

Overall I found GLMM modeling to be quite challenging and based on the amount of R packages and recent publications on the topic it seems like an evolving statistical method. I had trouble understanding the implications of the Random effect sample point accounting for little of the R2.  I interpret this to mean that there’s not a lot more to be explained at the sample point level that has not been included.

Interpreting coefficients from GLMM model with a logit link is also a challenging and there are few examples available online.

References

Bolker, B.M.,Brooks,M.E.,Clark,C.J.,Geange,S.W.,Poulsen,J.R.,Stevens,M.H.H. & White, J.S.S. (2009) Generalized linear mixed models: a practical guide for ecology and evolution. Trends in Ecology & Evolution, 24,127–135.

Florian Hartig (2018). DHARMa: Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models. R package version 0.2.0. http://florianhartig.github.io/DHARMa/

Nakagawa S. and Schielzeth H (2012) A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution Vol 4(2):133-142.