GEOG 566

         Advanced spatial statistics and GIScience

June 11, 2018

Analysis of flowering phenology as a step towards understanding phenological synchrony

Filed under: 2017,Final Project @ 5:55 pm

BACKGROUND: While the cinnabar moth (Tyria jacobaeae) was intentionally introduced to North America as a biological control insect, its release had unintended consequences as it established on a non-target native perennial herb, Senecio triangularis (Diehl and McEvoy, 1988). Populations of triangularis colonized by the cinnabar moth can experience heavy foliar herbivory (up to 100%) without showing reductions in plant size or seed set (Rodman, 2017). Herbivory of flowers is less common given a mismatch between the flowering time of S. triangularis and peak feeding stages for the moth, but when it does occur, floral/seed herbivory may have direct impacts on population dynamics. Previous work has shown that that larvae experiencing phenological synchrony with S. triangularis flowers decreased seed set by 95% (Rodman, 2017); and that S. triangularis is seed-limited, so that reduction of seed set decreases seedling recruitment to the next generation (Lunde, unpublished data).

Phenological synchrony is defined in this study as the overlap of “virulent” larval stages (late stages: L4 and L5) and “vulnerable” flower stages (early stages: primordia (P), buds (B), young flowers (YF) & flowers (F)). Synchrony varies by site and year, and the environmental drivers causing this variation are unknown. Knowing which populations of S. triangularis would be most likely to experience seed herbivory by cinnabar larvae could help managers track and respond to cinnabar moth presence and the risk to S. triangularis on a site-by-site basis. Because phenological synchrony was a rare occurrence for the five sites represented in 2017 data, most of the analyses described here focus on characterizing the timing of flowering of S. triangularis, while later work will focus on the timing of the L4 and L5 stage “virulent” cinnabar larvae.


RESEARCH QUESTION: How does the timing of “vulnerable” flower stages differ by site? Can I describe and quantify between site-differences in the timing of these stages? And how well do differences in candidate environmental factors account for differences in the occurrence of vulnerable flowers?


DATASET: This project used data collected from July to September, 2017, at five sites in a region of the Willamette National Forest near Oakridge, Oregon. A sixth site, Bristow, was lost due to fire, but is occasionally present in data analyses. Phenology scores (for cinnabar larvae and Senecio triangularis) and environmental variables were taken at each site during surveys at regular 10-day intervals. Some data were taken at the individual plant level, while some variables (such as snowmelt date) could not be measured in a meaningful manner below the site level. Data details:

  1. Population locations: a polygon layer outlining the 5 meadow sites surveyed over the 2017 season.
  2. Phenology: counts of vulnerable and invulnerable flower heads (capitula), and counts of cinnabar larvae in peak feeding stages (4th & 5th instars), collected from random subsets of tagged plants during each survey. Binary response variables created from phenology scores included plant vulnerability (1=vulnerable, 0=invulnerable), larval virulence (1=virulent larvae present, 0=absent), and phenological synchrony (1=co-occurrence of virulent larvae & vulnerable plants, 0 = one or none occurred).
  3. The following environmental variables are included in the phenology dataset: ambient temperature at 36 inches from the ground; soil temperature and soil moisture at 6 inch depth. Soil moisture and temperature readings were taken at each individual plant.
  4. Growing degree days for each site and survey day were modeled using a single triangle method with developmental thresholds set as above 5˚ and below 37˚ C. Growing degree days are defined as accumulated daily heat gain above and below developmental thresholds, here set to 5˚ and below 37˚ C. Developmental thresholds were estimated from previous studies on phenology of alpine flowers (Kudo and Suzuki, 1999); whereas I recognize that 10˚ C may be more commonly used as the minimum temperature for development of insects.


  1. HYPOTHESES: Alpine and subalpine perennial plant species have been shown to vary widely in terms of which environmental factors drive flowering phenology (Dunne et al., 2003). I expect to see differences in flowering phenology between sites. I expected differences between sites to correlate with differences in the abiotic environment: growing degree days, soil temperature, and soil moisture. And in order to test this, I also needed to explore how these three environmental variables were correlated to one another, so that I could understand issues underlying the analysis of these independent explanatory variables.

Because insects are ectotherms whose development is largely constrained by their ability to gain heat externally, the phenology of insects is often predicted on growing degree days (Johnson et al., 2007); and I would also expect to see variation between sites in the timing of occurrence for virulent larvae, perhaps driven more strongly by degree days, and less strongly by soil moisture and temperature. But I was not able to address that hypothesis during the analysis presented here.



  • Lognormal Regression of Flowering Phenology Data – the first approach I used to understand the patterns of flowering phenology present in my 2017 data, I used the fitdistrplus package in r to compare and fit phenomenological curves to counts of vulnerable (F-stage) capitula on given degree days. This required combining degree days of observation date and capitula counts into a single response variable to create a frequency distribution that a density functions could be fit to. I used splitstackexchange and methods described in Emerson and Murtaugh (2012) to create an observation of time (in growing degree days) for each event (F stage capitulum). This method led to the fitting of lognormal curves with the nls() function and parameter estimates given by the fitdist() function within The “goodness of fit” for the lognormal curve fit to all observations was assessed for data from each individual site as an attempt to describe variation in the timing of F-stage flowering by site. Visual comparison of lognormal curves fit to each site gave further insight into left-right shifts of these curves.


  • Assessing correlation in soil temperature and moisture data –in order to examine patterns in variation of soil temperature and moisture, particularly with respect to time, I converted dates and times to POSIXct objects using the lubridate After examining soil temperature against time of day of survey, I used the lm() function to fit linear regressions to this data series, and used the mutate function within the dplyr package to create a new variable of “adjusted soil temperature” normalized for time-of-day (Noon). I fit a linear regression to soil temperature against growing degree days for all sites, and repeated this for each individual site to see if time-of-season had significantly different impacts on soil temperature per site. I also looked at the relationship between soil moisture and time-of-day as well as time-of-season, but found less distinct trends and did not normalize for time-of-day.


  • Logistic Regression of probability of plant “vulnerability” against growing degree day, soil temperature, and soil moistureby simplifying phenology scores per capitulum to a binary response of “vulnerable” or “invulnerable” for an entire plant, I was able to analyze how the probability p of a plant being vulnerable varied with growing degree day, soil temperature, or soil moisture using logistic regressions (Hosmer et al., 2013). I used the glm() function in r to fit logistic regressions (family=binomial) to binary data for growing degree days, adjusted soil temperature (see part 2), and soil moisture. I used the predict() function to test how these fitted models performed in terms of classifying plants as vulnerable or invulnerable with a threshold value at 0.50 (datacamp 2018). I compared predicted response curves for the five sites visually, and made individual pairwise comparisons between two sites by using the coefficient estimates for site contrast parameters (Faustini 2000).


RESULTS: This project was an exploration of the data I had gathered in 2017, and produced a series of comparative figures showing variation in flowering phenology, measured environmental factors, and the relationship between the two. Early in analysis, a frequency distribution of stages counted per site per date showed that phenological synchrony (co-occurrence of F stage or prior with L4 and L5 larvae) was quite rare in the data (Figure 1).

Flowering phenology (as proportion of capitula counted in stage F) was best characterized by lognormal curves using scaled growing degree days (in hundreds). A curve was fit to data from all sites, and then both root mean squared error (RMSE) and mean absolute error (MAE) were calculated for data from each individual site relative to this all-site curve (summarized in Figure 2).

Figure 2. Lognormal curve fit by nls shown against F stage observations by site. Inset shows root mean squared error (RMSE) and mean absolute error (MAE) calculated overall and by site.

Figure 3. Lognormal curves of proportion of capitula found in F stage against growing degree days (hundreds).

This method showed some variation in goodness of fit but didn’t lend insight into the direction or magnitude of “shift” in timing between sites. A visual comparison of individual curves fit to each site showed slight shifts in the peak of each curve by site (Figure 3). This could be partially caused by the fact that a single survey represented different degree days between sites. However, Buck and Waterdog, which were modeled as having the same degree days, had different peaks in this figure, showing some ability to distinguish timing by site using this method.

Analysis of soil temperature found that data had a lurking effect of time-of-day of survey, which was characterized with a linear regression (Figure 4).

Normalizing soil temperature by time-of-day to a common time (Noon) allowed seasonal trends in soil temperature to emerge (Figure 5), where originally the impact of time-of-day obscured these trends (Figure 6).

Figure 5. Trends of adjusted soil temperature across the season (measured in hundreds of growing degree days).

Figure 6. Trends of non-adjusted soil temperature across the season (measured in hundreds of growing degree days)

Soil moisture was not found to be impacted by time of day, but was found to have low variability between most sites except one (Figure 7).

Logistic regression of a per-plant measure of “vulnerability” (as a binary response, 1 = any vulnerable capitula present and 0 = no vulnerable capitula present) against growing degree day showed a clear effect of growing degree day on the probability of plant vulnerability, with increasing degree days leading to a decrease in vulnerability with a clear threshold-like response (Figure 8).

Figure 8. Logistic regression of plant vulnerability (vulnerable = 1) against growing degree days (hundreds) with fitted curve shows a strong switch in probability of vulnerability from high to low with increasing growing degree days.

A similar regression against soil temperature (adjusted for time-of-day) showed that increasing soil temperature also decreased the probability of plant vulnerability, but with a more gradual response rather than a threshold-type response (Figure 9).

Figure 9. Logistic regression of plant vulnerability (vulnerable = 1) against adjusted soil temperature (°F) with fitted curve shows increasing adjusted soil temperature accounts for some decrease in probability of vulnerability.

Per site logistic regression of plant vulnerability against soil temperature showed that models (shown as predicted response curves) fit better for some sites than others, and slopes appeared to differ but not inflection points (Figure 10). This difference could be further resolved by fitting logistic regressions to two sites and analyzing the coefficient of the interaction term (which represents difference in slope of the second site) (Faustini 2000). The poorness of fit for Blair data in this case could be due to one missing survey date of soil temperature measurements.

Figure 10. Logistic regression of probability of plant vulnerability against adjusted soil temperature (degrees F) for all sites (grey dashed curve) and individual sites.

Soil moisture had an opposite relationship with plant vulnerability, with increasing soil moisture associated with increasing probability of plant vulnerability, using the same methods described above (Figure 11). Comparisons of predicted response curves by site for soil moisture showed a variety of slopes depending on the site, with all curves fitting reasonably well (Figure 12).

Figure 11. Logistic regression of probability of plant vulnerability against soil moisture (percent) shows an increase in probability of vulnerability with increased soil moisture.

Figure 12. Logistic regression of probability of plant vulnerability against soil moisture (percent) for all sites (grey dashed curve) and individual sites.


SIGNIFICANCE: Previous work showed that the loss of seeds due to cinnabar larval herbivory could significantly impact population dynamics for Senecio triangularis. The analyses presented here showed the variation in plant phenology, environmental factors, and the relationship between the two for the overall data set and for each site, individually. The biggest finding of this project was that phenological synchrony was too rare in this data set to model directly, as I had originally set out to do. Focusing instead on flowering phenology allowed development of methods and a greater understanding of the data captured in 2017.

For each variable examined, differences could be seen between sites with each site showing distinct behaviors. Further analysis of a full logistic regression model including all terms of interest could help understand the relative contributions of each explanatory variable to the change in probability of plant vulnerability.

For certain variables, it seems I may not have captured a wide enough range of variation in focal explanatory variables to draw clear connections between differences in phenology and these explanatory factors. For instance, soil moisture was largely similar between four sites and then quite distinct for a fifth site, Buck Mountain. Future work will focus on capturing greater variation in the abiotic environment, and hopefully also capturing greater variation in plant phenology.


LEARNING OUTCOMES: Through this project, my understanding and flexibility in R coding has increased by leaps and bounds. This includes restructuring data frames using dplyr pipelines (wide to long data, and collapsing nested data to compare at larger spatial/temporal scales), subsetting at will, and plotting data from every angle (mostly with ggplot). I discovered many useful packages (fitdistrplus, lubridate, splitstackexchange) and delved back into some previously encountered but less-well understood analyses (logistic regressions). I used ArcMap a few times to extract GPS points and other meadow statistics, but will hope to weave together the current analyses with explanatory variables not yet fully explored after the conclusion of this course.


STATISTICAL LEARNING: In terms of statistics used in this project, I learned about fitting phenomenological curves to data and analyzing goodness of fit using RMSE and MAE. I also learned about de-trending data to remove lurking effects such as the time-of-day effect in my soil temperature data. This will be very useful when dealing with other environmental factors that vary with time-of-day, and for which I do not have continuous data. I learned about fitting and analyzing logistic regression models to binary data using visual comparisons as well as numerical comparisons and interpretations for changes in slope and intercept. I was also able to understand and use methods developed by Emerson and Murtaugh (2012) for measuring “time of stage” for phenological data.



Datacamp. (2018). Logistic Regression in R Tutorial.

Diehl, J.W., and McEvoy, P.B. (1988). Impact of the Cinnabar Moth (Tyria jacobaeae) on Senecio triangularis, a Non-target Native Plant in Oregon. In Proceeding VII International Symposium on Biological Control of Weeds, (Rome, Italy), p.

Dunne, J.A., Harte, J., and Taylor, K.J. (2003). Subalpine Meadow Flowering Phenology Responses to Climate Change: Integrating Experimental and Gradient Methods. Ecol. Monogr. 73, 69–86.

Faustini, J.M. (2000). Stream channel response to peak flows in a fifth-order mountain watershed. PhD Thesis.

Hosmer Jr, D.W., Lemeshow, S., and Sturdivant, R.X. (2013). Applied logistic regression (John Wiley & Sons).

Kudo, G., and Suzuki, S. (1999). Flowering phenology of alpine plant communities along a gradient of snowmelt timing.

Murtaugh, P.A., Emerson, S.C., Mcevoy, P.B., and Higgs, K.M. (2012). The Statistical Analysis of Insect Phenology. Environ. Entomol. 41, 355–361.

Rodman, M. (2017). Non-target Effects of Biological Control: Ecological Risk of Tyria jacobaeae to Senecio triangularis in Western Oregon. Oregon State University.

Print Friendly, PDF & Email


  1.   jonesju — June 17, 2018 @ 3:33 pm    

    Katarina, a couple of additional thoughts. The strength of the logistic model is that it tells you when the transition from 0 to 1 occurs, and how rapidly it occurs. The shapes of the logistic regressions at some of the sites are quite flat, indicating little or no response, while those at other sites are steep, indicating a rapid response. Also, some sites respond at lower (higher) values. Moreover, some sites which seem unresponsive to soil temperature are quite responsive to soil moisture, and vice versa. It would be quite valuable to interpret these differences in biological terms.

  2.   jonesju — June 15, 2018 @ 6:59 am    

    Very good work. Next steps: how are soil moisture, soil temp, and gdd correlated? How can you combine these in a full model? It would be helpful to have more reflection on the biological, ecophysiological reasons for increasing “vulnerability” with soil moisture, temperature, and/or radiation.

  3.   swanssam — June 15, 2018 @ 6:26 am    

    Hey Kat, really nice work. You’re an R data-visualization whiz. The results of your logistic regression are really interesting – do you think that the weaker relationship between adjusted soil temperature and vulnerability is because growing degree day and soil temperature are likely very correlated? I’d guess growing degree days has a more causal relationship than adjusted soil temperature – would you agree? Sorry I missed your presentation on Monday, but good luck in your future work!

  4.   leatherl — June 12, 2018 @ 11:58 am    

    Kat, this is a super interesting way to visualize the distinct relationships between phenology and local environmental variables at each site! I’m curious– what are the input data in your first logistic regressions of vulnerability vs. growing degree days, etc. (Figure 8)? Is this one site, or multiple? And the logistic regressions clearly show distinct relationships at each of the sites, but I’m curious how you use those functions to interpret what the differences mean? And the logistic regression seems to leave a lot of variation unexplained! How do you interpret the predictive ability of the logistic regression?

    •   lundea — June 12, 2018 @ 5:08 pm    

      Hey Lila, good questions — Figure 8 shows vulnerability against growing degree days for all surveyed plants and sites. This shows the expected relationship that all flowering capitula switch from vulnerable to invulnerable with advancement of the season. For the regressions that I ran on separate sites, this was fairly exploratory… A mechanistic model should include all independent variables we think are influencing the response; and my variables violate assumptions of independence. Figure 8 confirms that flowering phenology is strongly explained by time-in-season or growing degree days; and sites which have soil moisture and temperature changing more strongly throughout the season will show up in these by-site regressions as explaining more of the variation in the probability of vulnerability. So, a site with moist soils across the season will have flowers switching from vulnerable to invulnerable regardless of how moist the soils remain, because season has advanced. If a site has soils starting moist and becoming fairly dry with the season, drier soil will correlate more strongly with invulnerable plants. This partly goes along with our understanding of phenological responses as being triggered by a variety of pathways (so, can be in response to several stimuli). Meanwhile, I am hoping to account for more complexity in future models, but could find no way to visualize logistic regressions with multiple explanatory variables.

RSS feed for comments on this post. TrackBack URI

Leave a comment

You must be logged in to post a comment.

© 2018 GEOG 566   Powered by WordPress MU    Hosted by