GEOG 566

         Advanced spatial statistics and GIScience

Archive for Exercise/Tutorial 2 2018

May 30, 2018

Trends in 2017 Phenology Study Soil Temperature and Moisture Data

Filed under: 2018,Exercise/Tutorial 2 2018 @ 9:58 am


For this exercise, I looked into environmental variables that I had measured in conjunction with plant and insect phenology data during the 2017 growing season. In 2017, explanatory variables that I tracked included: soil temperature & soil moisture at a depth of 8cm, measured per individual plant during each survey; ambient temperature at 30cm and 100cm from the ground, measured continuously at each site; sun exposure per site, which was estimated using a tool called the solar pathfinder; and snow disappearance date per site, which will be modeled – data not yet acquired.

Focusing on soil temperature and moisture, an initial task was to examine how much variation was captured for each variable at each site. If the five sampled sites do not show adequate variation in terms of soil moisture and temperature, then it would not be a useful exercise to attempt to use these variables as explanatory variables in models of plant and insect phenology. The questions answered in this exercise were: what is the range of values observed for soil temperature and moisture across 2017 surveys? How do these values vary across the season, and between sites? Because my data exploration quickly uncovered a lurking effect of time of day on soil temperature, I found that I needed to characterize and adjust for this effect before I could address these questions, so an intermediate question became: can I remove or lessen the impact of time-of-day on my measurements of soil temperature?  A description of those methods follows.


For ease of manipulation, I converted character date & time columns to datetime (POSIXct) objects using the lubridate package. I used the lm function along with ggplot2 to fit and plot a linear regression of soil temperature as a function of time of day. I used mutate to create an “adjusted soil temperature” variable, using the linear regression to estimate what each individual soil temperature value would have been had they all been measured at noon. I plotted the adjusted soil temperature against growing degree days to look at seasonal trends, both for all data combined and for each site individually. I used RColorBrewer to directly manipulate color palettes for objects in ggplot2.


While trying to assess whether these data showed a linear trend, I found myself bumping against the limitations of treating time as a discrete factor, so I converted survey dates & times to datetime (POSIXct) objects using the lubridate package. From the existing character columns for time and date, I created a new “datetime” variable by parsing with mdy_hm (e.g. 2017-07-13 13:00:00). I also created a “time” variable that expressed time of day in the same format, fixing a dummy date (2018-05-28) so that these POSIXct objects would vary only by the time. These two steps put the data in a much more flexible form, with time expressed as a continuous and consistently spaced variable. I eventually converted time of day to decimal time (minutes since midnight) for ease of use in linear formulas.
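A base-R sketch of these conversions (the post used lubridate's mdy_hm; the column names and values here are hypothetical):

```r
# Hypothetical character date & time columns as recorded in the field
df <- data.frame(date = c("7/13/2017", "7/14/2017"),
                 time = c("13:00", "9:30"),
                 stringsAsFactors = FALSE)

# Combine date & time into a single POSIXct "datetime" column
df$datetime <- as.POSIXct(paste(df$date, df$time),
                          format = "%m/%d/%Y %H:%M", tz = "UTC")

# Time-of-day on a fixed dummy date, so values differ only by clock time
df$timeofday <- as.POSIXct(paste("2018-05-28", df$time),
                           format = "%Y-%m-%d %H:%M", tz = "UTC")

# Decimal time: minutes since midnight (720 = noon), handy in linear models
df$dectime <- as.integer(format(df$datetime, "%H")) * 60 +
              as.integer(format(df$datetime, "%M"))
```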

An initial exploration of the soil temperature data found that soil temperature values were affected by the time of day at which the measurements were taken, with soils reading warmer the later in the day they were measured (Figure 1). This pattern interfered with my ability to examine and interpret the patterns I was actually interested in, such as soil warming across the season and variation between sites, so I sought to “detrend” the data for this time-of-day effect. I used ggplot to plot soil temperature versus time of day (which ranged from 9am to 7pm) and used the lm function to fit a linear model to the data (Figure 2). The linear model appeared to fit well, with no systematic deviations, which was confirmed by examining residuals (Figure 3).

Figure 1. A plot of soil temperature observations (across all sites and surveys) against time of day (decimal time; 720 = noon) shows a positive trend.

Figure 2. A linear regression of soil temperature data against measurement time of day (blue) shows a shift of more than 5ºF in average soil temperature when comparing measurements taken early in the day to those taken late in the day.

Figure 3. Residual plots from the linear regression of soil temperature against time of day (decimal time) show no distinct patterns or deviations in the residuals.

Because this appeared to be a relatively good fit, I used this linear regression to create adjusted soil temperature values for all sites & survey dates. In essence, this meant manually de-trending the data. I extracted the slope estimate from the lm object, and used this slope of the overall linear trend to estimate what each soil temperature reading would have been had every point been taken at noon (720 in decimal time). The new “adjusted soil temp” variable was created using the mutate function. I confirmed that I had successfully remapped soil temperature observations to noon by passing parallel lines between the observed points and the new, adjusted soil temperature measurements (Figure 4). A plot of adjusted soil temperatures against time of day (decimal time) further confirmed that I had successfully removed the trend present in the original data (Figure 5).
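The adjustment itself is a one-line shift along the fitted slope; a sketch with synthetic data (the post did this with dplyr::mutate):

```r
# Synthetic soil temperatures that warm through the day
set.seed(1)
dectime  <- runif(50, 540, 1140)                     # survey times, 9am-7pm
soiltemp <- 50 + 0.01 * dectime + rnorm(50, sd = 1)  # warmer later in the day

fit   <- lm(soiltemp ~ dectime)
slope <- coef(fit)[["dectime"]]

# Remap each observation to what it would have read at noon (720 decimal time)
adjstemp <- soiltemp - slope * (dectime - 720)

# The adjusted values should show no remaining time-of-day trend
resid_slope <- coef(lm(adjstemp ~ dectime))[["dectime"]]
```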

Figure 4. Parallel lines confirm that adjusted soil temperature values (series: adjstemp) represent what each measured soil temperature value would be at noon, if the daily trend is adequately described by the linear regression.

Figure 5. Plot of adjusted soil temperature values against time of day (decimal time) confirms that these adjusted values no longer exhibit a positive linear trend.


Having removed the time-of-day trend from my soil temperature data, I returned my attention to the questions of interest: do soil temperatures exhibit trends across the season, and do these trends differ by site? Plotting adjusted soil temperature against growing degree days (in hundreds) showed a positive relationship between soil temperature and growing degree days, a trend that held both within and between sites. This trend also appeared to be reasonably described by a linear regression, so I fit both an overall linear model (grey) and per-site linear models (Figure 6).
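The overall-plus-per-site fits can be sketched with split() and lapply(); the site names and data below are synthetic:

```r
# Synthetic adjusted soil temperatures for two hypothetical sites
set.seed(2)
gdd <- seq(1, 15, length.out = 30)          # growing degree days (hundreds)
dat <- data.frame(
  site     = rep(c("SiteA", "SiteB"), each = 30),
  gdd      = rep(gdd, 2),
  adjstemp = 48 + rep(c(0.8, 1.0), each = 30) * rep(gdd, 2) + rnorm(60, sd = 0.5)
)

overall  <- lm(adjstemp ~ gdd, data = dat)   # the grey line in Figure 6
per_site <- lapply(split(dat, dat$site),     # one regression per site
                   function(d) lm(adjstemp ~ gdd, data = d))

site_slopes <- sapply(per_site, function(m) coef(m)[["gdd"]])
```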

Figure 6. Adjusted soil temperature observations plotted against growing degree days (in hundreds) show a positive trend, with warmer soil temperatures measured later in the season. Linear regressions for all data (in grey), and linear regressions for each individual site (see legend), show similar behaviors across most sites.

Comparing these to linear regressions that would have resulted from non-adjusted soil temperatures shows that de-trending the data for time-of-day effects allowed seasonal trends to emerge, so I conclude that the de-trending of soil temperatures for time-of-day trend was relatively successful (Figure 7). Some limitations are discussed below in the “Critiques” section.

Figure 7. Linear regression of non-adjusted soil temperature observations show overall seasonal trends would be obscured without the removal of the time-of-day trend lurking in the original data.

Linear regressions for each site had similar slopes and soil temperature ranges, with the exception of Bristow Prairie, which had unbalanced sampling since the site was made inaccessible by a fire after just two survey dates. This may imply that sites used in this study did not represent enough variation in soil temperature to adequately test the hypothesis that soil temperature is a driver of flowering phenology for Senecio triangularis (my focal plant species). However, I measured soil temperature at each individual plant, and the data show within-site variation in soil temperature and phenology, so I will continue to explore and test whether there is any correlation between soil temperature and flowering phenology. The daily trends in soil temperature shown through this analysis lead me to believe that my measurements on a given day or at a given point were strongly impacted by sun exposure and daily weather.

A quick glance at the soil moisture data found that measurements (in percent soil moisture) were not impacted by the time of day at which they were taken, but remained relatively stable. However, the sites surveyed did not capture much variation in soil moisture, with the exception of Buck Mountain, which was the wettest site throughout the season (Figure 8).

Figure 8. Plotting soil moisture (percent) by unique plant ID shows similar range and variation for five sites, with distinctly wetter soils at one site (Buck Mountain).


This procedure assumed that the linear regression for data from all sites and dates was a reasonable approximation of how soil temperatures warmed within a day at each site and date. At the individual site level, comparing adjusted to non-adjusted soil temperatures (when plotted against survey days) shows that de-trending for time-of-day removed some but not all potential influence of time-of-day on soil temperature readings (Figures 9a & 9b). The impact of time-of-day on soil temperature may be directly linked to the timing and duration of sun exposure at each site. Figures 9a and 9b show that the latest time point (17:00) had cooler soil temperatures than mid-afternoon; this may be directly caused by the east-facing site losing sun exposure in the evening. These data also show that it is difficult to disentangle effects of time-of-day from effects of season, since the later-season measurements at this site both happened to be taken early in the day.

Figure 9a. Non-adjusted soil temperature observations at Buck Mountain (values are jittered to show overlapping observations) plotted against growing degree days (hundreds), with color denoting time of day that measurements were taken. The data show elevated soil temperatures for measurements taken in the middle of the day (blue, aqua). This is a site-level trend that is not necessarily accounted for by the overall linear trend used to adjust soil temperature measurements.

Figure 9b. Adjusted soil temperature observations at Buck Mountain plotted against growing degree days (hundreds), with color denoting time of day that measurements were taken.

In terms of R packages used, I was surprised that there was not an easier way to plot linear regressions in ggplot. I ended up using geom_abline() within the ggplot call, specifying the intercept and slope, which I extracted from lm objects. Another option was geom_smooth(method = 'lm'), but this did not allow extrapolation of the lines beyond the provided data range, and it automatically added confidence-interval ribbons, which were too visually busy in this context.
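For reference, the extraction step looks like this; the data are synthetic, and the ggplot call is shown as a comment so the snippet stands alone:

```r
# Fit a regression and pull out the coefficients for geom_abline()
set.seed(42)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20, sd = 0.1)
fit <- lm(y ~ x)

b0 <- coef(fit)[["(Intercept)"]]
b1 <- coef(fit)[["x"]]

# ggplot(data.frame(x, y), aes(x, y)) + geom_point() +
#   geom_abline(intercept = b0, slope = b1)  # extrapolates beyond the data range
```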

Because it was not possible for me to measure soil temperature at the same time of day at each site, this post hoc analysis allowed me to correct for the impact of time of day. It did not, however, account for the impact of weather at the time of collection (such as cloudy skies) and other factors. A better solution would be to take continuous data for variables such as this which experience daily fluctuations; but even with that data, decisions would need to be made about how to characterize the daily fluctuations and derive a single measure to compare across sites and surveys, and relate to response variables (flowering & insect phenology) measured at discrete time-points.

Exploring the Distribution of Stream Habitat Characteristics and Observed Beaver Dam Locations in the West Fork Cow Creek Watershed

Filed under: 2018,Exercise/Tutorial 2 2018 @ 9:56 am

Question that I asked:

How do stream reaches with observed beaver dam sites in West Fork Cow Creek relate to stream gradient, active channel width and valley bottom width?

In this exercise I developed a few charts to consider the distribution of stream habitat in terms of stream gradient, active channel width and valley bottom width in West Fork Cow Creek relative to observed beaver dam sites. The intent of these charts is to explain how I arrived at the criteria that were ultimately used in my first exercise to generate metrics of habitat ‘patches’.

Tools used:

I used a couple of graphing tools in R to relate the distribution of stream habitat to observed dam sites: base plot, ggplot, and easyGgplot2. I also used the SQL and Calculate functions in ArcMap.

Steps to generate charts:

This was a simple process overall. The first step was to generate dummy variables for my stream reaches to indicate whether beaver dams had been observed or not. Because the number of dams observed was relatively low, I was able to select all stream reaches by hand using the selection tool, then use the Calculate function to add a 1 to all selected rows in a new field titled Dams Observed and a 0 to all those where dams were not observed. Once that was complete I exported the stream network shapefile and read it into R.
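For reference, the same dummy-variable coding could be reproduced in R once the attribute table is loaded; the reach IDs below are hypothetical:

```r
# Hypothetical reach attribute table from the exported shapefile
reaches <- data.frame(reach_id = 1:10)

# Reaches where dams were observed (hand-selected in ArcMap in the post)
dam_reaches <- c(2, 5, 7)

# 1 = dam observed, 0 = no dam observed
reaches$DamsObserved <- as.integer(reaches$reach_id %in% dam_reaches)
```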

Generating charts was simply a matter of iterating code I scavenged from StackExchange and other sources online.
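A minimal sketch of that kind of chart in base R, with synthetic gradient data (the post's figures were built with ggplot/easyGgplot2; normalizing each group as a density makes the small dam-reach sample comparable to the full network):

```r
# Synthetic stream gradients for all reaches vs. reaches with observed dams
set.seed(3)
all_grad <- rgamma(500, shape = 2,   scale = 1.1)  # all reach gradients (degrees)
dam_grad <- rgamma(25,  shape = 1.2, scale = 0.8)  # gradients at dam reaches

d_all <- density(all_grad)
d_dam <- density(dam_grad)

# All reaches in black, dam reaches superimposed in red (as in Figures 2-3)
plot(d_all, main = "Stream gradient", xlab = "Gradient (degrees)")
lines(d_dam, col = "red")
```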



Stream gradients are fairly well distributed throughout the basin (Figure 1), ranging from 0 to over 8 degrees but averaging just over 2 degrees. Figure 2 shows this same distribution of stream reaches with observed dam sites superimposed in red. Figure 3 shows a similar plot, but with the reaches with observed dam sites normalized as a density curve. Overall, dam sites occurred on the reaches with the lowest gradients.

Figure 1

Figure 2

Figure 3







Active Channel Width:

Active channel widths (ACW) are heavily skewed in the basin, with most at 5 m or less (Figure 4). Figure 5 overlays the reaches with dams, but these are difficult to see because of the scaling. The differences are much more apparent in Figure 6, which depicts density curves for all stream reaches and for reaches with observed dams. Overall, dams occurred on streams with ACW between 3 and 6 meters.

Figure 4

Figure 5

Figure 6







Valley Bottom Width:

Valley bottom width (VBW) ranges from 10 m to over 400 m, but most reaches (>50%) occur between 25 m and 100 m (Figure 7). Dam sites, however, were observed between 25 m and just over 150 m (Figures 8, 9).


Figure 7

Figure 8

Figure 9







Variable comparison via scatter plots:

Figures 10, 11 and 12 show the distribution of all streams (grey circles) and those with observed dams (red triangles) across these habitat metrics.

Figure 10

Figure 11

Figure 12







Critique of the Method:

Coding in a dummy variable through the SQL and Calculate functions of ArcMap, while simple, was very helpful and will be even more so as I expand the analysis to the rest of the Umpqua Basin.

The R code for the overlapping histograms was helpful, but because the difference in scale between all stream reaches and the reaches with observed dams was large, the histograms were difficult to compare. The density plots provided a reasonable improvement for comparing the distributions of the two reach types.

May 29, 2018

Exploring relationships among environmental variables and time for hikers in Grand Teton National Park

Filed under: 2018,Exercise/Tutorial 2 2018 @ 9:56 am

Question that you asked:

Brief context: Throughout this class we have been exploring how human behavior and movement are influenced by environmental factors: are there underlying features in the surrounding environment that relate to the patterns of a recreationist’s movement? For example, are there environmental conditions that tend to result in shorter step lengths and acute turning angles for the hiker (spending more time in an area, stopping, sightseeing) versus longer step lengths and obtuse turning angles (in transit, hiking)? The dataset I am using contains hundreds of high-resolution GPS tracks of day hikers at a popular lake in Grand Teton National Park. Overall, I hypothesize a positive relationship between the distance the recreationist is from the water feature and the length of their step and turning angle; i.e., as the distance between the water and the recreationist increases, their step length will increase and their turning angle will become more obtuse. Another way to frame it: the closer to water the recreationist is, the more likely they are to stop and view the lake; the further from water, the more likely they are to speed up.

Specific Questions for this Exercise: Before testing this hypothesis, I needed to ensure other environmental variables wouldn’t confound this relationship I’m interested in exploring. Therefore, my questions for this exercise are:

1.) What is the relationship between elevation and distance to water?

2.) What is the relationship between elevation and vegetation type?

3.) What is the relationship between vegetation type and distance to water?

Then, within the actual movement path of the recreationist:

4.) What is the relationship among the recreationists’ movement path, time, vegetation, and distance to water?

I chose these variables because a recreationist may have shorter step lengths when they are closer to water not because of the water feature (my hypothesis), but rather because of the change in elevation, or the type of vegetation (conifer woodland vs. meadow, etc.). Thus, in an effort to prevent any spurious relationships in my results, I am using this exercise as an opportunity to examine the relationships between these independent variables.

Tool/Approach used:

I worked in ArcMap and R to address these questions. ArcMap allowed the functionality for me to delineate the study area, randomly select data points for analysis, and extract the values of my three independent variables to build an appropriate attribute table for subsequent analysis in R.

Using R I created simple scatter plots,  box plots, and spatially explicit plots of the data. These results allowed me to visually analyze the relationships between the variables while also providing me with a richer understanding of the environmental features of the study area and the recreationists’ movement path.

Description of steps used to complete the analysis:

(1) Define study area -> Per Julia’s suggestion, I selected 200 random points within the study area for analysis. Using the buffer tool, I created a 100m buffer around the trail network in Grand Teton National Park (GRTE) (Figure 1). Because my dataset contained only day users, I used the digitize tool in the editor to delineate a study area specific to the recreationists in my sample (Figure 2). I then used the clip tool to clip out the buffer within the study area that I defined.

Figure 1. Applying a buffer around all of the trails in GRTE. Too much!

Figure 2. Using the clip tool to limit and define study area for analysis.

(2) Generate random points for analysis -> I used the create random points tool to create 200 random points within the defined buffer area (see figure 3). Note: I intentionally removed any data point that was created in the lake.

Figure 3. Random points created within defined study area.

(3) Build attribute table -> I used various spatial joining tools to assign each randomly selected point the following attribute data: vegetation type, elevation, and distance (m) to the water feature. For vegetation type, I did a simple spatial join with the existing vegetation layer that I had previously imported into GIS. For elevation, I used the extract values to points tool. To calculate distance to water, I used the join and relate tools in ArcMap, which calculate the nearest distance between each random point and the polygon I had previously created to denote the lake boundary. I also extracted the XY coordinates of each randomly selected point using the Features tools in ArcToolbox.
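The nearest-distance step can also be sketched outside ArcMap. A crude base-R analogue with hypothetical coordinates (ArcMap's near/join tools measure to the true polygon edge; this simplification measures to the boundary vertices):

```r
# Hypothetical lake boundary vertices and two random points
lake <- cbind(x = c(0, 10, 10, 0), y = c(0, 0, 5, 5))
pts  <- cbind(x = c(12, 5),        y = c(0, 10))

# For each point, minimum Euclidean distance to any boundary vertex
dist_to_water <- apply(pts, 1, function(p) {
  min(sqrt((lake[, "x"] - p["x"])^2 + (lake[, "y"] - p["y"])^2))
})
```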

(4) Use R for visual analysis -> I exported the attribute table I built in GIS and imported it into R. Within R I used the ggplot package to plot the values of each variable of interest.

(5) After examining the environmental features around the study area, I was interested specifically in the environmental features experienced by the five hikers in my sub-sample. Thus, back in ArcGIS I used the same sub-sample of five tracks that I have been working with throughout the course to extract vegetation data, elevation data, and the distance of the track from the water shoreline (see Figure 4).

Figure 4. The tracks of the five recreationists.

Description of results obtained

Among the 200 randomly selected points, I learned about the environmental features defining the area.

Figure 5. Elevation and vegetation type of 200 randomly selected points in study area.



Figure 6. Distance to water and vegetation type of 200 randomly selected points in study area.


Figure 7. Elevation and distance to water of 200 randomly selected points in study area.

In general, coniferous woodland dominates the study area, particularly areas close to the shoreline. Additionally, the area does not have much variation in elevation. Of the 200 randomly selected data points, the range in elevation is 400 meters.

For the next step, I was curious to explore how the hikers’ actual movement path related to these environmental variables, particularly vegetation. I was curious to see if people spent more time close to shore because of the water feature (my hypothesis), or perhaps because of the surrounding vegetation? Or are people spending less time near the shoreline because of these environmental features?

The figures below represent the environmental features extracted from the movement paths of a sub-sample of five hikers:

Figure 8. Distance to water and vegetation type of the movement path of five recreationists.


Figure 9. Elevation and vegetation type of the movement path of five recreationists.


Figure 10. Elevation and distance to water of the movement path of five recreationists.

The next set of figures includes time as a new variable of interest. These figures provide me with an initial visual to better understand the relationship between the length of time people spend in the area and the vegetation type they are hiking in.

Figure 11. Track 1 – the relationship between time spent in area and distance from water. Colors indicate the vegetation type.


 Figure 12. Track 1 – the relationship between the latitude coordinates of the recreationist’s movement and time spent in area. Colors indicate the vegetation type.


Figure 13. Track 1 – The path of the recreationist. Colors indicate the vegetation type.

As you may notice, the predominant vegetation type is conifer woodland. I wonder if the presence of conifer influences the amount of time they spend near water. In other words, does the conifer woodland influence a person to keep hiking until they reach a more open area (meadow, shrubland etc.)?  Examining the relationship between presence of conifer and time spent near water is important because my initial hypothesis is that people will spend more time near water, and less time further away from water. Perhaps the presence of conifer is playing a role in this relationship. In exercise 3 I will examine if people are spending significantly less time in conifer despite being near water.

To see output for all tracks:







Critique of the method:

Overall, the tools available in ArcMap and R provided the functionality for me to explore and examine the relationships among the movement path of the recreationist and environmental factors. I was able to successfully extract the attributes I needed from ArcGIS and then use R to manipulate, calculate, and visualize the data. It was helpful showing Julia and Laura the initial figures, which guided me into exploring the relationships among conifer presence, distance to water, and the time the recreationist spent in the area. So, if anything, my critique is the learning curve I went through to wrap my head around these new concepts! Overall, this was both a fun and enlightening exercise.

May 23, 2018

Examining time-series of downscaled alpine air temperature and snowfall across different climate classes in Alaska in comparison to major climate indices.

Filed under: 2017,Exercise/Tutorial 2 2018 @ 11:52 am

Question asked

Do large scale patterns of climate, e.g. the Pacific Decadal Oscillation, explain long-term variability in alpine snowfall and air temperature in Alaska? Is their effect spatially variable across the state?


Data / Tool / Approach used

To answer this question, I used monthly and seasonally aggregated data from daily model output for 6 mountainous domains in Alaska where Dall sheep are present. The domains are situated within different climate divisions of Alaska (Bieniek et al., 2012) and are as follows:

Name                 | Climate Division    | Abbreviation
Brooks Range         | Northeast Interior  | BRKS
Gates of the Arctic  | North Slope         | GAAR
Denali               | Southeast Interior  | DENA
Lake Clark           | Bristol Bay         | LACL
Wrangell St Elias    | Southeast Interior  | WRST
Yukon Charley        | Southeast Interior  | YUCH

These domains are connected to summer surveys of sheep populations that I will use to examine relationships between summer recruitment and different metrics characterizing the seasonal snowpack in Exercise 3.

Fig. 2: BRKS SnowModel domain showing the base DEM (black/white), vegetation layer (mixed colours), sheep survey area (pale white), MERRA2 forcing data, and the snow course and SNOTEL stations used for assimilation.

I first ran autocorrelations on the monthly and seasonal snow data to detect whether there were any significant instances of repeating patterns that might describe a larger-scale process at play. Following this, I cross-correlated the monthly and seasonal snow data against indices of larger-scale climate patterns that might affect Alaska. These indices were downloaded in monthly format and include:

Name                                   | Abbreviation
Pacific Decadal Oscillation            | PDO
Arctic Oscillation                     | AO
East Pacific/North Pacific Oscillation | EP_NP
North Pacific Pattern                  | NP
West Pacific Index                     | WP
Pacific North American Index           | PNA
North Atlantic Oscillation             | NAO


Steps followed to complete analysis

SnowModel – data preparation

The first step in these analyses was to aggregate daily SnowModel data into monthly and seasonal values. Matlab was used to do this, and the snow/climate metrics I used included: snow depth (snod), snow density (rosnow), snowfall (spre), percent forageable area (snow under 30 cm depth and 330 kg m-3 density; pc_area), and air temperature (tair). The mean of each metric was taken, except for snowfall, where the cumulative sum was used. The seasons are as follows: autumn – September, October and November; winter – December, January and February; spring – March, April and May; summer – June, July and August.
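The same aggregation can be sketched in R (the post used Matlab); synthetic daily data, with a mean for snow depth and a cumulative sum for snowfall:

```r
# One year of synthetic daily model output (Sep 2015 - Aug 2016)
set.seed(5)
days <- seq(as.Date("2015-09-01"), as.Date("2016-08-31"), by = "day")
met  <- data.frame(date = days,
                   snod = runif(length(days), 0, 1),   # snow depth
                   spre = runif(length(days), 0, 5))   # snowfall

# Map month number to season (Dec-Feb winter, Mar-May spring, etc.)
season <- function(m) c("win","win","spr","spr","spr","sum","sum","sum",
                        "aut","aut","aut","win")[m]
met$season <- season(as.integer(format(met$date, "%m")))

snod_mean <- tapply(met$snod, met$season, mean)  # mean snow depth per season
spre_sum  <- tapply(met$spre, met$season, sum)   # cumulative snowfall per season
```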

These data were then visually inspected for trends or patterns via heatmaps and time series graphs.

Fig. 3: heat map of mean monthly snow depth for WRST, 1980–2016.

Climate Indices – data preparation

The climate indices are already in monthly form, so the only additional processing, done in R, was to aggregate them to seasonal values by taking their mean.

Autocorrelation and Cross-correlation

The acf function in R was used to perform autocorrelations and cross-correlations on the variables, after further data wrangling using functions from the tidyverse package.

Autocorrelations were conducted for each month and season from 1980 to 2016 with a maximum lag of 30 years. Cross-correlations were conducted between monthly air temperature and snowfall and each climate index from 1980 to 2016 with a maximum lag of 12 months.
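A minimal sketch of the acf/ccf calls, with synthetic annual series standing in for winter snowfall and a climate index:

```r
# Synthetic 37-year series (1980-2016): an AR(1) "snowfall" series and a
# noisy copy standing in for a correlated climate index
set.seed(4)
snowfall <- as.numeric(arima.sim(list(ar = 0.4), n = 37))
index    <- snowfall + rnorm(37, sd = 0.5)

a <- acf(snowfall, lag.max = 30, plot = FALSE)         # autocorrelation, 30-yr max lag
x <- ccf(snowfall, index, lag.max = 12, plot = FALSE)  # cross-correlation, 12 lags

lag0_auto  <- a$acf[1]            # lag 0 is always 1
lag0_cross <- x$acf[x$lag == 0]   # strong positive correlation at lag 0
```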


Brief description of results obtained


  • Brooks (BRKS_monthly_figure, BRKS_seasonal_figure): positive autocorrelation with winter snowfall at a 12-year lag.
  • Denali (DENA_monthly_figure, DENA_seasonal_figure): positive autocorrelation with summer snow depth at a 1-year lag.
  • Gates of the Arctic (GAAR_monthly_figure, GAAR_seasonal_figure): negative autocorrelation with spring snowfall at a 7-year lag.
  • Lake Clark (LACL_monthly_figure, LACL_seasonal_figure): negative autocorrelation with autumn snowfall at a 1-year lag.
  • Wrangell St Elias (WRST_monthly_figure, WRST_seasonal_figure): no significant autocorrelations.
  • Yukon Charley (YUCH_monthly_figure, YUCH_seasonal_figure): no significant autocorrelations.



  • Brooks (Northeast Interior; BRKS_ccf_fig): Snowfall – strong positive correlation with AO at 0-month lag. Air T – PDO dominant in the seasonal cycle with a maximum lag of 2 months.
  • Denali (Southeast Interior; DENA_ccf_fig): Snowfall – positive correlation with AO, negative correlation with WP and PNA at 0-month lag. Air T – PDO dominant as per BRKS; PNA positive & significant at lag 0.
  • Gates of the Arctic (North Slope; GAAR_ccf_fig): Snowfall – positive correlation with AO at lag 0. Air T – PDO dominant as per BRKS; NAO negative & significant at lag 0.
  • Lake Clark (Bristol Bay; LACL_ccf_fig): Snowfall – strong positive correlation with WP and NP at lag 0. Air T – PDO dominant as per BRKS; AO negative & significant at lag 0.
  • Wrangell St Elias (Southeast Interior; WRST_ccf_fig): Snowfall – PNA negative & significant at lag 0. Air T – PDO as per BRKS; NAO negative & significant at lag 0.
  • Yukon Charley (Southeast Interior; YUCH_ccf_fig): Snowfall – PNA negative & significant at lag 0. Air T – PDO as per BRKS; NAO negative & significant at lag 0.

Critique of method

The method employed above has been useful in showing that the different ranges of Dall sheep in Alaska are subject to different larger-scale climate patterns. In the case of Denali, it also shows that these relationships might not be uniform within Bieniek et al.'s (2012) climate divisions. An issue with the approach is that it does not include snow metrics that relate directly to the conditions that affect sheep. Snowfall is likely important, but it isn't necessarily a reliable guide to how much snow is found on the ground, as in these environments wind redistribution and sublimation are key processes governing snowpack evolution. Carrying on this work, it would be interesting to cross-correlate other metrics such as change in snow depth or total sublimation. An issue arises here when using cross-correlation, as snowpack evolution is strongly autocorrelated throughout the season, so alternative approaches need to be explored. Likewise, identifying the importance of the cumulative effect of persistent patterns in the climate indices on snowpacks at different stages of the snow season is not possible in this analysis.

May 21, 2018

Exploring landcover class and elevation as potential explanatory variables for movement

Filed under: Exercise/Tutorial 2 2018 @ 9:30 pm

Question Asked

For Exercise 2, I sought to explore the potential relationship between two independent, terrestrial variables of the Glacier Bay National Park and Preserve Wilderness landscape, each of which may have explanatory power for the dependent variables of step length and turning angle generated through use of the moveHMM tool in Exercise 1. Of the publicly available geospatial data layers from the National Park Service (NPS) data clearinghouse, two stood out as having the potential to influence kayaker movements: 1) a land cover polygon dataset, and 2) a digital elevation model raster dataset. These two variables were selected for the following potential impacts on kayaker behavior. The items listed below do not represent a comprehensive list, but rather examples of the types of impacts the landscape could have on the movement of recreationists (in this case kayakers) through a landscape.

  • The type of land cover near the shore may cause kayakers to move more slowly (decreased step length) and engage in searching behaviors (greater turning angles). This could be caused by the desire to look for wildlife along the coastline while paddling, the desire to paddle slowly by a glacier that reaches the shoreline, and/or the need to look for a camping or stopping location on shore. These searching/stopping behaviors would result in decreased step lengths and increased turning angle movements. These behaviors are expected of the overnight kayakers in this study because wildlife and glacier viewing is a recreation activity of interest in Glacier Bay and, due to the multi-day nature of the trip, kayakers will necessarily need to come ashore at some time during their trip.
  • The elevation of the terrestrial surroundings also has the potential to cause kayakers to move faster (larger step lengths) and engage in behavior more characteristic of migration/continuous movement (smaller turning angles). This could be caused by the desire to paddle quickly past areas with higher elevation in which the kayaker cannot go ashore due to the lack of access or cannot see wildlife, glaciers, or other landscape level features due to obstructed views caused by large differences between kayaker elevation (at sea level) and terrestrial elevation.

Prior to engaging in analysis to understand the potential impact of land cover type and elevation on step length and turning movement (Exercise 3 Spoiler Alert!), I wanted to explore potential relationships between elevation and the landcover data. Understanding the pattern and extent of any relationship between elevation and landcover class will be helpful in understanding the nature of the relationship between landscape level factors (landcover type and elevation) and movement behavior (step length and turning angle) in subsequent analyses. Therefore, this exercise highlights exploratory analyses.

Additionally, I sought to resolve issues of mismatched spatial resolution between point, polygon, and raster data types to make associations between the point data and the polygon and raster data types. While I did not have a formal question for this part of Exercise 2, practically speaking, I wanted to find a mechanism for assigning values of spatially co-occurring rasters and polygons to individual points.

Tool/Approach Used

I worked primarily in ArcMap 10.3 to explore and link the independent variables to point data, and subsequently used R to generate box and whisker plots to look at the relationships between landcover type and elevation. In ArcMap, I used a data-driven approach to identify the “area of landscape influence” for kayakers, clipped my input landscape datasets to the identified area of influence, generated a random sample of points across the landscape, and used spatial joining techniques for raster and vector data to link landscape level data to the generated random sample. I then generated a figure for the random sample of points in R, providing a visualization of the relationship between elevation (Y axis) and landcover class (X axis).

Description of Steps Used to Complete the Analysis

  • Data Access: Data were accessed from the publicly available National Park Service data clearinghouse, available at the following link. Downloaded data were added into an existing base map (created for Exercise 1) in ArcMap 10.3 that contained the analysis area (Glacier Bay National Park Wilderness) and the aggregated point shapefile of visitor tracks.

  • Defining the “Area of Landscape Influence”: To define the “area of landscape influence” on kayaker travel patterns, the point shapefile was converted to a line shapefile using the Points to Line (Data Management) tool in ArcMap. The Buffer (Analysis) tool was used to create a 10-kilometer buffer around the input line shapefile, thereby creating a polygon that represented 10 kilometers in the line of sight of recreating kayakers. The selection of 10 kilometers was made after some initial testing of the sensitivity of the buffer size on the landscape. I wanted the buffer to be sufficiently large to capture the landscape a kayaker would be able to see while also being smaller than the overall Wilderness area extent. The generated buffer was then further reduced using the Clip (Analysis) tool to include only terrestrial land area in the Wilderness — this step removed all water within the park and land area outside of the park, creating a land-only analysis area. The step of converting the raw point data to a single line shapefile was necessary to provide a continuous input into the Buffer tool, so that the Buffer output would be a single polygon for analysis rather than individual analysis polygons around each point. The images below show the area of influence (and therefore analysis) relative to the overall landscape for landcover and elevation.


  • Generating Random Points for Analysis: To explore potential relationships between landcover and elevation outside of the visual exploration available through looking at the displayed layers in ArcMap, 200 points were randomly generated in the analysis area to represent random sample locations where landcover and elevation data could be extracted and explored in plots in R. The Create Random Points (Data Management) tool was used to generate a shapefile containing 200 randomly distributed points within the analysis area.

  • Spatially Join Layers to Points: The landcover vector data and the elevation raster data were spatially joined to the randomly generated points using the Joins and Relates option to join the point and polygon (landcover) data together and the Extract Values to Points (Spatial Analyst) tool to join the point and raster data together. Essentially, both tools complete the same task but each is designed for the input data type, either raster or vector data, that is being used for the join. In both instances, the land cover and elevation attribute information for the polygon or raster cell in which the point fell was joined to the point attribute table.
  • Export for use in R: The attribute table for the random sample of 200 points shapefile was exported from ArcMap as an Excel file using the Table to Excel (Conversion) tool.
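The plotting step in R can then be sketched as follows. The data frame here is synthetic, standing in for the exported attribute table, and the column names are illustrative rather than the actual exported field names:

```r
# Illustrative sketch: boxplot of elevation by landcover class.
# "landcover_class" and "elevation_m" are hypothetical column names standing
# in for the fields of the exported attribute table.
set.seed(1)
pts <- data.frame(
  landcover_class = factor(sample(c("Forest", "Shrub", "Rock", "Snow/Ice"),
                                  200, replace = TRUE)),
  elevation_m     = runif(200, 0, 1200)
)

# boxplot() groups elevation by the landcover factor on the x axis
boxplot(elevation_m ~ landcover_class, data = pts,
        xlab = "Landcover class", ylab = "Elevation (m)")
```

In practice the `pts` data frame would come from reading the exported Excel/CSV table rather than being simulated.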

Description of Results Obtained

I produced two plots to explore the potential relationship between the landcover class for a given point and the elevation of the digital elevation model raster cell at that point. The first plot treated each landcover class independently, with 29 different categories of landscape features including various types of vegetation, water features (snow, ice, ponds, etc.), and rock and bare ground. The plot below is what resulted, which was difficult to read and interpret due to the number of categories and the limited observations per category for anything outside of rock/bare ground (Category 46) and snow and ice (Category 99).

Because this initial output had too many categories with too few observations per category to really make any meaningful conclusions about the relationship between the variables, a simplified boxplot was created including five vegetation categories, one category for rock/bare ground, and one category for ice, snow, or other terrestrial water features. The categories are as follows:

  1. Sitka Spruce, Hemlock, or Mixed Sitka Spruce Forest/Woodland
  2. Black Cottonwood Forest/Woodland
  3. Tall Alder Shrub
  4. Low or Dwarf Shrub
  5. Willow or Mixed Willow Shrub
  6. Rock/Bare Ground
  7. Snow, Ice, or Other Terrestrial Water Feature

The simplified box plot shows that there does appear to be a relationship between landcover class and elevation, with landcover classes comprised of vegetation tending to be located at lower elevations and landcover classes comprised of rock/bare ground and snow, ice, or other terrestrial water features tending to be located at higher elevations. The median elevation values for Sitka Spruce, Black Cottonwood, and Willow land cover classes were comparable and tended to be right around 100 meters with little variation while the other two vegetation classes (Tall Alder Shrub and Low or Dwarf Shrub) had median elevation values near 300 and 500 meters respectively, with a more even spread of the distribution of points in the upper and lower quartiles around the median. Finally, the rock/bare ground and snow/ice categories have median values of 600 meters and about 1000 meters respectively, with the greatest amount of spread around the median values. The results demonstrate expected relationships, with vegetation land cover types being constrained to specific elevations while rock/bare ground and snow/ice occur less discriminately across the landscape, but overall at higher elevations than vegetative cover.

Critique of Method

Overall, the progression of tools in ArcMap was straightforward, and each tool performed the data formatting or joining function as anticipated. Stringing together the correct tools in ArcMap took some time, but I was impressed that the workflow I wanted to accomplish could be performed in ArcMap using standalone tools. Specifically, prior to this exercise, I had never used the Create Random Points or Extract Values to Points tools and was initially skeptical that the functionality existed in ArcMap. I was pleasantly surprised!

One limitation I encountered is the format of some of the other landscape-level data files that I had planned to use in this analysis. Originally, my Exercise 2 analysis was going to encompass potential independent variables from both land and sea. I had planned to perform the analysis presented for the terrestrial data on bathymetry, bathymetry slope, and bathymetry aspect data downloaded from the data clearinghouse. Unfortunately, the bathymetry layers are only available for download as .lpk files, which are layer packages that are not stand-alone layers, but rather rely on underlying data from ESRI or other sources. Upon download, these three raster files do not have accessible attribute tables. I think the source of this is that the pixel type in the raster files is set to “floating point” rather than integer; the integer pixel type is required to manipulate the raster files using the workflow presented above. Due to time constraints, I abandoned working with the bathymetry data for Exercise 2, but I am still interested in it and will need to figure out a workaround for this limitation.

Finally, the resolution of the landcover class data was both a blessing and a curse. The landcover class dataset is incredibly detailed, with primary and secondary classes identified and vegetation classes broken out into separate land cover types depending on canopy structure (open or closed) and the homogeneity/heterogeneity of the dominant plant community in the landcover polygon. This level of detail was a bit over my head, and therefore I needed to reduce the number of categories to make sense of the relationships present. The data layer did not come with detailed metadata; it would have been helpful to have more information on how the data were collected and how categories were defined and identified, in order to make more informed judgments on how land cover categories could be condensed.

How does cross-correlation of GPP and LUE with soil water content differ among years and photosynthetic functional types?

Filed under: 2018,Exercise/Tutorial 2 2018 @ 11:57 am

1. Question asked:

How is soil water content related to gross primary production (GPP) and the light use efficiency of photosynthesis (LUE) of a natural grassland? How do these relationships differ between C3- and C4-dominated grasslands?

2. Tool / approach used:

To answer this question, I calculated the cross-correlation of GPP and LUE with soil water content (SWC). I then compared the maximum cross-correlation between sites, as well as the lag at which the maximum cross-correlation occurs.

My data are derived from eddy covariance flux tower locations in natural grasslands in Eastern Kansas. Soil water content, like other eddy covariance flux data, is recorded at 30-minute intervals. Here, I have aggregated flux and environmental data to daily sums and averages.
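The daily aggregation step can be sketched in base R. The data frame and column names below are illustrative stand-ins for the real half-hourly flux records:

```r
# Illustrative sketch of aggregating 30-minute records to daily sums.
# "timestamp" and "gpp" are hypothetical column names for the flux data.
flux <- data.frame(
  timestamp = seq(as.POSIXct("2012-06-01 00:00", tz = "UTC"),
                  by = "30 min", length.out = 96),  # two days, half-hourly
  gpp = rep(1, 96)
)
flux$date <- as.Date(flux$timestamp, tz = "UTC")

# one row per day; sum the 48 half-hourly values in each day
daily <- aggregate(gpp ~ date, data = flux, FUN = sum)
daily  # two rows; gpp = 48 for each day
```

Daily means (for environmental variables such as SWC) would use `FUN = mean` in the same call.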

3. Steps followed to complete analysis

Data prep and selection

– Because SWC data are most consistently available at both sites between 2009 and 2014, I limited my analysis to those years, and further restricted it to 2010-2013 due to data availability.

– I also initially examined the water use efficiency (WUE) of production, but WUE, calculated as GPP / evapotranspiration (ET), has distinct dynamics from GPP or LUE and does not appear to be strongly cross-correlated with soil water content, so I have omitted it from this analysis.

First, I visually inspected the annual timecourse and cross-correlation of GPP and LUE with soil water content (SWC). Soil water content is recorded as a decimal (0-1) representing a percent. Here, SWC is multiplied by 10 so that it can be visualized on a similar scale to GPP and LUE. These plots display daily values (light shading) and monthly averages of daily GPP, LUE, and soil water content.

We can clearly see distinct patterns at each site, in each year, and for each index.

For GPP, the annual patterns appear relatively similar between KFS and Konza. During 2012, the drought year, Konza appears to have a more significant reduction in GPP than does KFS.

There appear to be double peaks of GPP in 2012 at the KFS site, and in 2013 at both KFS and Konza.

For both GPP and LUE, we can see a peak in SWC early in the year that is followed later by a peak in GPP or LUE. This pattern is most evident for LUE at Konza, the C4 site.


I used the ccf() function to calculate the cross-correlation between GPP and SWC, and between LUE and SWC. First, I calculated the cross-correlation of GPP and LUE with soil water content at each site separately, using all years from 2010-2013.
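A minimal sketch of the ccf() step, using synthetic series in which GPP is constructed to lag SWC by 100 days (the real analysis uses the daily flux-tower data, not simulated values):

```r
# Synthetic stand-ins: build an autocorrelated series, then make "gpp" equal
# to "swc" shifted 100 days later (plus noise).
set.seed(42)
base <- as.numeric(arima.sim(list(ar = 0.8), n = 465))
swc  <- base[101:465]                       # 365 "days" of soil water content
gpp  <- base[1:365] + rnorm(365, sd = 0.1)  # gpp[t] tracks swc[t - 100]

cc <- ccf(gpp, swc, lag.max = 150, plot = FALSE)

cc$lag[which.max(cc$acf)]  # peak near +100: GPP follows SWC by ~100 days
```

With `plot = TRUE` (the default), ccf() also draws the cross-correlogram with the dashed confidence bounds shown in the figures.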

However, I’m most interested in comparing patterns of cross-correlation among years, particularly with regard to the timing of peak cross-correlation in 2012, a drought year. Does the drought year have a pattern of cross-correlation between GPP, LUE, and soil water content distinct from that of a non-drought year?

I then used the ccf() function to calculate cross-correlation of GPP and LUE with soil water content at each site, for each year from 2010-2013 in order to assess interannual variability in cross-correlation between GPP and LUE with soil water content. In order to visualize trends both between sites and between years, I separately plotted each site, by year– as well as each year, by site.

Lastly, to better visualize the shift in timing and magnitude of cross-correlation at each site, I extracted the lag and value of the maximum and minimum cross-correlation and plotted these values for each year.
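The extraction step can be sketched with a small helper function; `peak_ccf()` is a hypothetical name introduced here for illustration, not part of the original analysis:

```r
# Hypothetical helper: return the lag and value of the maximum and minimum
# cross-correlation between two series.
peak_ccf <- function(x, y, lag.max = 150) {
  cc <- ccf(x, y, lag.max = lag.max, plot = FALSE)
  data.frame(
    lag_max = cc$lag[which.max(cc$acf)], r_max = max(cc$acf),
    lag_min = cc$lag[which.min(cc$acf)], r_min = min(cc$acf)
  )
}

# Applied per year, assuming a data frame with year, gpp, and swc columns:
# do.call(rbind, lapply(split(df, df$year), function(d) peak_ccf(d$gpp, d$swc)))
res <- peak_ccf(sin(1:200), cos(1:200), lag.max = 20)  # toy example
res
```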

4. Results Obtained

Comparing sites, for all years

The dashed lines represent confidence intervals for a significant cross-correlation at each site. The vertical line at Lag days = 100 represents a visually-identified peak in cross-correlation, and is provided for reference when comparing cross-correlation among indices.

We see somewhat distinct patterns between sites in the cross-correlation of GPP and LUE with soil water content. Both sites have a peak correlation of GPP and LUE with soil water content at around 100 days. This indicates that GPP and LUE tend to be well-predicted by the soil water content of about 100 days earlier– a peak in GPP or LUE follows a peak in SWC by about 100 days. If we look back at our time series of these indices, we can visually detect this pattern.

The minimum in cross-correlation at about -25 days indicates that GPP tends to be low about 25 days before SWC reaches its peak. This appears to reflect the seasonality of plant production relative to precipitation: SWC tends to peak in the spring, before plant production ramps up in earnest, so 25 days before that peak there is likely little production occurring.

For GPP: both sites have similar timing, but distinct magnitude of cross-correlation of GPP with SWC. Konza, the C4 site, is less strongly cross-correlated with soil water content, suggesting less dependence on soil water content than the C3 site.

In contrast, for LUE: both sites have similar magnitude, but more distinct timing in the cross-correlation of LUE with SWC across years. We see distinct minima as well.

Assessing variation between years, for each site

At both sites, we see interannual variation in the timing and magnitude of the cross-correlation of GPP and LUE with soil water content. GPP exhibits more consistent patterns of timing and magnitude of cross-correlation between years than does LUE. Something strange is clearly happening at Konza in 2013, which may be due to bad data, or a breakdown of the relationship between GPP and SWC. This is unexpected because 2013 was not a known drought year.

At both sites, LUE follows similar patterns of interannual variation to GPP– which makes sense, because LUE is derived from GPP. But the patterns of cross-correlation of LUE are perhaps more distinct than those of GPP between C3 and C4 sites.

Assessing variation between sites, for each year

Next, I wanted to examine how patterns of cross-correlation differed between sites, in each year.
This figure shows the cross-correlation of GPP with soil water content for each year from 2010-2013, plotted separately. It shows that both sites tend to have interannual variation in the timing and magnitude of their cross-correlation of GPP with soil water content.

For GPP, when looking at these patterns at each site, between years, we see clear interannual variation in the timing and magnitude of cross-correlation between sites. However, excluding 2013, both sites exhibit relatively similar patterns in each year in the timing and magnitude of the cross-correlation of GPP and SWC. They are particularly well-aligned in 2011. This indicates that despite the distinct resource-use efficiencies at each site, they are responding to similar environmental cues in the timing of soil water content.

For LUE, the sites exhibit more distinct patterns in the timing and magnitude of cross-correlation with soil water content. There is similarly evident interannual variation at each site.

When compared between sites in each year, it is further evident that while the timing of the cross-correlation of LUE is similar between sites, the magnitude differs. Konza, the C4 site, has higher cross-correlation of LUE with soil water content in both 2010 and 2012. We know that C4 grasses are more strongly controlled by seasonal precipitation than C3 grasses, so this stronger cross-correlation supports this ecophysiological distinction.

Assessing changes in the timing of maxima and minima, over each year

I have cropped the view window of these plots to exclude the 2013 values for GPP. The 2013 values of maximum cross-correlation for GPP with SWC occur at -100 days, suggesting a switch in the dependence of GPP on SWC, such that GPP leads soil water content, rather than the reverse. However, the uneven time series in 2013 means that I’m not confident in these values, so I have cut off the graph where they occur.

For GPP, the timing of maximum cross-correlation changed, but followed similar patterns for KFS and Konza.

For LUE, the changes in timing of maximum cross-correlation between LUE and SWC are slightly more distinct between sites, but still follow similar patterns.

This suggests that the patterns of cross-correlation of GPP and SWC follow similar patterns, despite the distinct photosynthetic functional types. Though we think of C4 grasses as being more water-use efficient, and also more strongly controlled by seasonal precipitation than C3 grasses, in this area the patterns of production with water appear to be similar. The same is true for LUE.

In contrast, the timing of the minimum cross-correlation value exhibits more distinct patterns among sites. This represents the timing of the annual minimum GPP value, relative to soil water content. It’s possible that this is more closely related to production values from the previous year, which are not part of this analysis: the data are grouped by site and by year, so previous-year values are not considered.

5. Critique of the method

Particularly when comparing patterns of cross-correlation between years, it is useful to extract the maximum value and date, to visualize the patterns across years. Reducing the dataset to one point is valuable for interpreting the metric that I care about most in cross-correlation– when the maximum occurs.

However, this last graphic in particular sacrifices complexity for simplicity. In particular, it does not capture whether the shape of the relationship changes, i.e., if the cross-correlation function were to peak for a longer or shorter duration. Further, the use of a “maximum” function is somewhat simplistic in terms of identifying a peak. A more robust way to extract this information would be to fit a curve to the data and extract the peak from the curve. A curve-fitting function might better capture the dynamics of the cross-correlation, rather than a single maximum that could be an artifact.

We also see that the efficacy of this method is highly dependent on the quality of the data. In 2013, when GPP at Konza has multiple peaks and substantial variation, the cross-correlation function returns results that are not easily interpretable. Similarly, I am unable to use this method in years that do not have data of consistent quality available.

This method could be further improved by developing a method of considering previous-year values when looking at annual patterns of cross-correlation– perhaps by appending previous year values to an annual dataset, or finding a different way to create annual groups that still contain previous-year data to assess long-term lag correlations.

May 20, 2018

Subtidal bathymetry slope characteristics at increasing spatial scales around long-term monitoring sites

Filed under: 2018,Exercise/Tutorial 2 2018 @ 11:13 am


 I derived statistics from nearshore subtidal bathymetry (my hypothesized explanatory variable), including slope angle: (1) minimum, (2) maximum, (3) mean, and (4) standard deviation. These four statistics were calculated at increasing spatial scales around the long-term subtidal monitoring sites at San Nicolas Island, using buffers around the 10x2m2 transects at: 2m, 5m, 10m, 25m, and 50m. My overarching objective was to quantify the relative levels of physical substrate heterogeneity, with an eye towards correlating these statistics with metrics from the time series for exercise #3, allowing me to relate variation in bathymetry to the variable frequencies of community shift exhibited over the decades in the subtidal at San Nicolas Is.


 I first plotted GPS coordinates of the long-term sites onto the side-scan sonar bathymetry. These GPS points mark the 0m and 50m start and end of the mainline, off of which our 10x2m2 transects run. Some slight observer error in GPS point accuracy is expected due to current, surge, kelp, etc., and if present, is likely under 5m. Given the spatial scale of this inquiry is at the site level (i.e., out to 50m), and not at fine scale (i.e., 1m), this minor deviation is not expected to significantly hamper results.



Fig. 1: Two low-relief sites (upper two), a high-relief site (bottom left), and a low/moderate-relief site with surrounding high-relief structure, i.e., high heterogeneity (bottom right).

I plotted the 50m mainline with shapefiles, and likewise plotted the 10x2m2 transects of interest (using the distance measures in ArcGIS to ensure my transects were plotted at the proper distance down the mainline). I created buffers around the transects at 2m, 5m, 10m, 25m, 50m.



Fig. 2: 2m, 5m, 10m, 25m, and 50m buffers for the four sites plotted in Fig. 1.

I then used the ‘Mask Grab’ tool to extract values of the bathymetry underlying each buffered layer for all sites, and recorded the summary statistics: slope angle minimum, maximum, mean, and standard deviation. These values were entered into Excel, and subsequently plotted in R.
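For reference, the same four summary statistics could equally be computed in R once slope values have been extracted for each buffer distance; the values below are synthetic stand-ins for the extracted bathymetry cells:

```r
# Sketch of the four summary statistics per buffer distance, assuming slope
# values have already been extracted into a list (synthetic values here).
set.seed(7)
slope_by_buffer <- list(
  "2m"  = runif(20,  0, 10),
  "5m"  = runif(60,  0, 25),
  "10m" = runif(150, 0, 40)
)

slope_stats <- t(sapply(slope_by_buffer, function(v)
  c(min = min(v), max = max(v), mean = mean(v), sd = sd(v))))
slope_stats  # one row per buffer distance, one column per statistic
```

Computing the statistics this way (rather than transcribing them into Excel) would also make it easy to add the finer buffer spacing discussed in the critique below.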



 Minimum slope angle: unsurprisingly, at least one very low-relief 2m cell was present at all sites (as I’d expect from working at the sites), so the minimum provides little useful information, other than conveying the sensitivity of the side-scan bathymetry (i.e., values less than one were common).

Maximum slope angle: this statistic appears to have successfully captured broad differences between sites that are qualitatively low-relief (i.e., fairly flat, or low angle), and high-relief (i.e., large structure, and higher angles). In particular, the maximum at Daytona is low at the spatial scale of the actual monitoring site (2m – 10m), but then rapidly increases as the surrounding reef structure is encountered by the larger buffer layers. This captures the high level of substrate heterogeneity found at Daytona, a ‘low-relief’ site for the actual monitoring transects, but unique in the high level of surrounding high-relief structure. In contrast, the other two high-relief sites (East Dutch and West Dutch) exhibit more homogeneous high relief structure.

Mean slope angle: the mean, max, and standard deviation all captured the lack of heterogeneity at the low-relief sites. At West Dutch the mean drops off, reflecting the relatively small reef; once the buffer increases in scale (e.g., 25m, 50m), a majority of the substrate is sand.

Standard deviation: these values tended not to shift much across scales, though large differences were observed among sites.

Critique of the method

 I found this to be a straightforward method, though in hindsight, I plan to modify these methods slightly for subsequent analyses. For example, my larger spatial scales have large gaps (e.g., 10m, 25m, 50m), and to gain better insight into the actual scale at which heterogeneity shifts (or not), I need finer spatial resolution.

Secondly, instead of lumping these buffered layers together (i.e., include the previous 10m in the buffer of the 25m), I should have constructed ‘donut’ layers. This would provide a more explicit representation of how the increased spatial scale behaves in comparison to the previous scales. I will use this approach going forward, and with greater spatial resolution, as mentioned above.

PDF links for full graphs / individual sites:






May 15, 2018

Determining Fire-Climate Relationships with Superposed Epoch Analysis

Filed under: Exercise/Tutorial 2 2018 @ 12:14 pm


My overall goal is to build a model that predicts the probability of fire at a point in each year from 1700-1918 based on climate, fuel (time since fire), local environment (annual temperature and precipitation, elevation, forest type, etc), and surrounding environment.

Prior to constructing this model I need to understand how annual climate is related to fire occurrence and whether that relationship changes with fire size. This exercise should also indicate whether fuel abundance and connectivity required for fire spread is related to antecedent climate.

Questions asked

  1. How are fire events related to climate during the fire year and antecedent years?
  2. Does this relationship vary among small, medium, and large fires?

Tool and Approach

Superposed Epoch Analysis (SEA) is a non-parametric statistical tool that uses Monte Carlo simulations to identify relationships between a time series and key events, including the time period preceding and following each key event. Because statistical significance is determined from null distributions generated by a Monte Carlo randomization procedure, SEA does not require the assumptions of parametric testing. The user must specify the key, the time series, and the analysis window.

The key is the event of interest. It could be a rapid increase in a kelp population, a volcanic eruption, an increase in a pollutant, or, in my case, fire years. I considered all fire years in which fire was recorded at >2 contiguous points, but broke these fires into 3 groups based on fire size.

  • Small <10,000 ha (39 fire years)
  • Large 10,000-40,000 ha (33 fire years)
  • Extensive >40,000 ha (7 fire years – 1741, 1783, 1795, 1800, 1822, 1829, 1918)

The time series may be a climate index, temperature record, growth record, etc. I used the Palmer Drought Severity Index (PDSI; Palmer 1965) to represent drought. PDSI is a measurement of dryness based on recent precipitation and temperature, but is only available for the instrumental record, ~1900 to present. Cook et al. (2004) used tree rings to provide a grid network of reconstructed PDSI for North America. I downloaded reconstructed data from grid point 45 (120.0W, 42.5N), which is just to the east of my study landscape.

The window is the temporal period surrounding the key event that SEA examines. The window I specified tested whether mean annual departures for PDSI were statistically significant in the year of fire, for the five years before, and four years after the fire.

SEA can be conducted using the dplR package in R (example code is given at the end of this post).

How SEA works

SEA creates a matrix where rows correspond to key events (fire years) and columns correspond to the time series for the specified window (Table 1). For the 7 extensive fire years the matrix would look like this. The plotted time series window is the superposed epoch (Figure 1; Prager and Hoenig 1989).

SEA calculates a mean for each column in the time series window, called the composite. A Monte Carlo randomization procedure is then used to determine the probability that each composite value occurred by chance alone. This procedure randomizes the values in the matrix (usually 10,000 times, but user-specified) and calculates the mean for each time step in the specified window from the shuffled data. The means from the shuffled data are used to generate a null distribution for each column that can be used to statistically test the significance of the real data at the 95th and 99th percentile confidence levels.
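The randomization procedure described above can be sketched in base R. This is a simplified illustration, not the dplR implementation; `sea_simple()` is a hypothetical helper written for this post:

```r
# Simplified SEA sketch: composite means for a lag window around key events,
# with a Monte Carlo null distribution from randomly drawn pseudo-key years.
sea_simple <- function(series, years, keys, lags = -5:4, nrand = 1000) {
  # matrix: rows = key events, columns = series values at each lag
  event_mat <- sapply(lags, function(l) series[match(keys + l, years)])
  composite <- colMeans(event_mat, na.rm = TRUE)

  # null distribution: composites computed from random pseudo-key years
  null <- replicate(nrand, {
    rkeys <- sample(years[6:(length(years) - 4)], length(keys))
    colMeans(sapply(lags, function(l) series[match(rkeys + l, years)]),
             na.rm = TRUE)
  })
  list(lags = lags, composite = composite,
       lower95 = apply(null, 1, quantile, probs = 0.025),
       upper95 = apply(null, 1, quantile, probs = 0.975))
}

# e.g., for the extensive fire years, with pdsi$YEAR and pdsi$RECON columns:
# sea_simple(pdsi$RECON, pdsi$YEAR,
#            keys = c(1741, 1783, 1795, 1800, 1822, 1829, 1918))
```

A composite falling outside the 95% null envelope at a given lag would be judged significant at that lag.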


Time Series (PDSI) for the key (fire) years; columns are lags relative to the fire year (lag 0):

Fire Year    -5    -4    -3    -2    -1     0    +1    +2    +3    +4
1741       -1.5   0.3   2.2  -3.1   1.4  -3.1   2.2   1.1  -0.9   3.1
1783       -2.2   0.3   0.7   0.5  -2.9  -5.5   2.1   2.1  -0.6   0.7
1795        2.8   1.7   2.2  -2.7  -5.2  -3.9  -0.7   0.0  -1.8   3.7
1800       -3.9  -0.7   0.0  -1.8   3.7  -5.6   3.4   2.6   3.7  -1.1
1822       -1.7   1.0   1.0   0.0   0.0  -4.3  -0.6  -2.3   4.3   2.3
1829       -2.3   4.3   2.3  -0.9  -0.6  -6.5   1.4  -1.7   4.0  -1.5
1918        2.0   1.3  -0.7   2.7  -2.1  -4.2  -0.9  -1.7   3.7  -2.3
Composite  -1.0   1.2   1.1  -0.7  -0.8  -4.7   1.0   0.0   1.8   0.7

Table 1.

Figure 1. Superposed Epoch of Extensive Fire Years

What to watch out for

Outliers – SEA can be vulnerable to leveraging by an anomalously large or small value in the time series, especially when there are few key events. Data can be relativized or transformed to minimize the influence of outliers.

Autocorrelation – If the time series is autocorrelated at the same lags as the SEA window, spurious results may be produced. The acf() function in R can be used to test for autocorrelation, by linear regression of each climate reconstruction year against consecutive years and with autocorrelation function plots (Figure 2) lagged at the same window as the SEA analysis (Johnston et al. 2017). PDSI exhibited no autocorrelation (P ≥ 0.12).
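The autocorrelation check can be sketched with acf(); the series here is white noise standing in for the reconstructed PDSI values:

```r
# Sketch of the autocorrelation check. White noise stands in for the
# reconstructed PDSI series used in the actual analysis.
set.seed(99)
pdsi_recon <- rnorm(300)

a  <- acf(pdsi_recon, lag.max = 10, plot = FALSE)
ci <- qnorm(0.975) / sqrt(length(pdsi_recon))  # approximate 95% bound

which(abs(a$acf[-1]) > ci)  # lags whose autocorrelation exceeds the bound
```

With `plot = TRUE` (the default), acf() draws the correlogram with dashed bounds, as in Figure 2.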


Figure 2. ACF plot


Results and their significance

Not every warm, dry year produced a large or extensive fire, but all extensive fire years occurred in severe drought years (mean PDSI = -4.7; p < 0.01), and most large fires occurred in droughts (mean PDSI = -1.29; p < 0.01). PDSI values were not lower than expected in the fire year for small fires. Antecedent years were neither drought nor pluvial climate events for small, large, or extensive fires. Years with no fire were significantly wetter than years with fire of any size (PDSI = 0.5; p < 0.01).

Figure 3. SEA results for years with no fire, a small fire, a large fire, and an extensive fire

These results demonstrate that climate in the years preceding a fire is not related to fuel abundance and connectivity. The cool, wet years that might have been necessary to produce the fine fuels that carry surface fires did not occur before fire years; only a hot, dry fire year was associated with large and extensive fire spread. This suggests that fine fuel production, with respect to fire spread, is not limited by climate in this system. In contrast, the two years preceding extensive fires in the southwestern United States are significantly cooler and wetter than average; there, fuel production sufficient to carry fire is related to antecedent climate.

For my investigation, this result means that fuel recovery following fire is not moderated by climate after the fire, and that time since fire, regardless of climate in the years following fire, may be a strong predictor of fire occurrence.


The dplR and burnr packages both include a routine to conduct SEA. Both are designed to work with ring-width chronologies and fire history chronologies (.fhx format). The original software for analyzing tree-ring and fire history chronologies used file formats designed to save disk space, which are neither time efficient nor user friendly. Avoid the burnr package if you can: it works from the FHX format, which you are unlikely to have unless you are a dendropyrochronologist. dplR lets you input your own data from a spreadsheet more easily, but requires the climate data as a single-column data frame whose row names are years. The following code creates such an object from a time series (PDSI in this case).

###example code

pdsi <- read.csv("PDSI45.csv") # read in climate data

names(pdsi) # see column names


# create climate file: select columns of interest (YEAR and RECON PDSI in this case), subset years, and delete NA rows

climate <- pdsi[, c("YEAR", "RECON")]

cols <- c("year", "clim_index")

colnames(climate) <- cols

climate <- climate[climate$year >= 1700, ]

climate <- climate[complete.cases(climate), ] # drop rows containing NA


# Make the row names the years and delete the year column, leaving a data frame with one column.

rownames(climate) <- climate$year

climate$year <- NULL
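With the climate object in that shape, the SEA itself can be run. The sketch below uses toy data in place of the real PDSI file (the year range is made up; the fire years are taken from Table 1) and assumes dplR's sea(x, key, lag, resample) interface:

```r
# Toy climate object in the format built above: one column, years as row names.
set.seed(9)
years   <- 1700:1950
climate <- data.frame(clim_index = rnorm(length(years)), row.names = years)

# Key event years (extensive fire years from Table 1)
fire_years <- c(1741, 1783, 1795, 1800, 1822, 1829, 1918)

# Run the SEA if dplR is installed; sea() returns the lagged departures
# with bootstrap confidence limits.
if (requireNamespace("dplR", quietly = TRUE)) {
  res <- dplR::sea(climate, fire_years, lag = 5, resample = 1000)
  print(res)
}
```

On real data, the significant departure should appear at lag 0, as in Figure 1.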




Benefits of SEA approach

  • Does not require random sampling, normality, homogeneity of variance, or independence
  • Can be used on autocorrelated time series (climate data, recruitment data, growth data)



Cook, E. R., C. A. Woodhouse, C. M. Eakin, D. M. Meko, and D. W. Stahle (2004), Long‐term aridity changes in the western United States, Science, 306(5698), 1015–1018, doi:10.1126/science.1102586.

Johnston JD, Bailey JD, Dunn CJ, Lindsay A (2017) Historical fire-climate relationships in contrasting Interior Pacific Northwest forest types. Fire Ecology 13(2).

Haurwitz MW and Brier GW (1981) A critique of the superposed epoch analysis method: its application to solar-weather relations. Monthly Weather Review 109:2074-2079

McKenzie D, Hessl A, Peterson D et al. (2004) Fire and Climatic Variability in the Inland Pacific Northwest: Integrating Science and Management. Final report to the Joint Fire Science Program on Project #01-1-6-01.

Prager MH, Hoenig JM (1989) Superposed Epoch Analysis: A Randomization Test of Environmental Effects on Recruitment with Application to Chub Mackerel. Transactions of the American Fisheries Society 118(6):608-618. DOI: 10.1577/1548-8659(1989)118<0608:SEAART>2.3.CO;2

Swetnam TW and Betancourt JL (1990) Fire-Southern Oscillation Relations in the Southwestern United States. Science Vol. 249:1017-1020.

Palmer WC (1965) Meteorological Drought. Research Paper No. 45, U.S. Department of Commerce Weather Bureau, 58 pp.

May 10, 2018

Relationship between satellite-derived and nearshore ocean temperatures

Filed under: Exercise/Tutorial 2 2018 @ 1:41 pm

My goal for this tutorial was to look at the relationship between satellite-derived sea surface temperatures (SST) and measured intertidal temperature. I want to use temperature as one of my potential explanatory variables for kelp population size, but neither of these data sets matches my temporal and spatial scales. The satellite SST record goes back to 1982, which covers the entirety of my kelp population series, but it is only available at a 0.5-degree latitude/longitude scale. Conversely, the intertidal measurements taken by my lab were made only a few miles from most of the relevant kelp beds, so they match my spatial scale better. However, these nearshore measurements are only available for a fraction of the kelp population time series.


Ideally, I would be able to relate the intertidal temperatures to the larger scale satellite SST so that I could infer nearshore temperatures from satellite SST and therefore utilize the extended historical extent of these satellite SST data. To begin, I plotted the two sets of temperature data for 2007-2012 by month (see Figure 1). The satellite and intertidal temperatures match closely in the winter and spring months, but diverge in the summer. Intertidal temperatures were consistently lower in the summer months than the satellite data.

Figure 1

I then looked at the autocorrelation and cross correlation of these two data sets. Satellite temperature was highly positively autocorrelated at 12-month intervals and negatively autocorrelated at 6-month intervals, all the way out to 4-5 years. Intertidal temperature showed a similar pattern, but the autocorrelation dropped off much more steeply, falling below the significance level after about 6-12 months. Cross correlation (see Figure 2) was fairly high at a time lag of 0, around 50%. However, the cross correlation was distributed asymmetrically around lag 0, with significant correlation at -1, -2, and -3 months on the negative axis but only at 1 month on the positive axis. This suggests that the intertidal temperature cycle lags that of the satellite temperature by a month or two. Referencing Figure 1, this appears to be because satellite temperatures begin their summer increase a month or two before intertidal temperatures do.


Figure 2
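The cross-correlation above can be computed with base R's ccf function. The monthly series below are simulated stand-ins for the satellite and intertidal records, with the intertidal series deliberately built to lag by about two months:

```r
# Cross-correlation between two monthly temperature series (simulated stand-ins).
set.seed(7)
months     <- 1:72                                   # six years of monthly data
satellite  <- 10 + 4 * sin(2 * pi * months / 12) + rnorm(72, sd = 0.5)
intertidal <- 10 + 3 * sin(2 * pi * (months - 2) / 12) + rnorm(72, sd = 0.8)

# ccf(x, y) estimates cor(x[t + k], y[t]); a peak at negative k means
# the second series lags the first.
cc <- ccf(satellite, intertidal, lag.max = 12, plot = FALSE)
best_lag <- cc$lag[which.max(cc$acf)]                # lag of peak correlation
```

Because the intertidal stand-in lags the satellite stand-in, the peak correlation falls at a negative lag, matching the asymmetry seen in Figure 2.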

This summer time lag may be due to upwelling, which affects nearshore areas in Oregon more than offshore areas. Upwelling is probably responsible for the lower intertidal summer temperatures. However, we also see distinct drops in nearshore temperature that occur around the same time as influxes of terrestrial rain and snowmelt (e.g. summer 2011). This suggests that the terrestrial water cycle could be influencing nearshore ocean temperature, and that early summer lags in nearshore temperature could therefore also be caused by snowmelt runoff.


One of the most encouraging results for me is that the difference between satellite and intertidal temperatures follows a fairly consistent pattern (see Figures 3 and 4). On a monthly basis, the difference is fairly small in the winter months, begins to increase in spring, and hits a maximum in August. The difference between satellite and intertidal temperatures is fairly consistent for each month (although the variation increases in the winter months). This suggests that I could take the average difference for each month and use it to adjust the historical satellite SST to reflect roughly what temperatures were occurring in the nearshore.

Figure 3


Figure 4
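That monthly-offset adjustment can be sketched in base R. The merged data frame, its column names, and the imposed summer offset below are all hypothetical stand-ins for the real records:

```r
# Adjust satellite SST toward nearshore values using mean monthly offsets.
# 'temps' stands in for five years of merged monthly satellite/intertidal means.
set.seed(3)
temps <- data.frame(
  month     = rep(1:12, times = 5),
  satellite = rep(10 + 4 * sin(2 * pi * (1:12) / 12), times = 5) + rnorm(60, sd = 0.3)
)
# Intertidal runs cooler, especially June-September (imposed for the example)
temps$intertidal <- temps$satellite -
  ifelse(temps$month %in% 6:9, 2, 0.2) + rnorm(60, sd = 0.3)

# Mean satellite-minus-intertidal difference for each calendar month
offset <- tapply(temps$satellite - temps$intertidal, temps$month, mean)

# Estimate nearshore temperature from satellite SST (e.g. for pre-2007 years)
temps$nearshore_est <- temps$satellite - offset[as.character(temps$month)]
```

The same offset vector, computed from the overlapping 2007-2012 period, could then be subtracted from the full 1982-onward satellite record.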

Altogether, these results provide a partial answer to my question of whether I can use satellite sea surface temperatures to accurately estimate nearshore temperature on a monthly basis. I did find a relationship between satellite and intertidal values that was fairly consistent on both an annual and a monthly basis. I also now know I would need to account for the lag, around April and May, in summer temperature increases in the intertidal relative to the satellite record. One thing this has illustrated, however, is that the nearshore temperature data are more variable than the satellite temperatures, potentially because of the influence of temporally variable nearshore processes such as upwelling and terrestrial runoff. While I can use satellite SST to infer average nearshore temperatures, it will not give me good insight into the variability of nearshore temperature, which could be important to kelp population regulation. If I want to incorporate this variability, I will probably need to look further into the effects of upwelling and terrestrial water cycles on nearshore temperature.

Multivariate analysis of the location of behavior changes

Filed under: Exercise/Tutorial 2 2018 @ 1:25 pm


Because the spatial distribution of behavior changes doesn't change with boom angle, the results of Exercise 1 imply that hydraulic conditions do not drive fish behavior in this experiment (Figure 1). However, because analogous research and common sense imply that hydraulic thresholds do affect fish behavior, a statistical analysis is necessary to determine whether environmental and/or internal factors affected the location of the behavior changes we observed. Five hydraulic variables – water speed (m/s), turbulent kinetic energy (or TKE, m2/s2), TKE gradient (m2/s2/m), velocity gradient (m/s/m, or s-1), and acceleration (m/s2) – were drawn from the location of every behavior change in Exercise 1. The locations of behavior changes were then compared with channel hydraulics, boom angle, and a fish's visual fitness (as measured by an optomotor assay) using three methods: 1) multivariate regression analysis, 2) principal component analysis (PCA), and 3) partial least squares regression. With this analysis, we hope to answer the question: do any of the 5 hydraulic variables, the geometry of the channel, or the visual fitness of a fish correlate well with the location of its behavior change?

Figure 1. The results of Exercise 1 indicate that despite differences in hydraulics created by varying boom angle, the spatial distribution of behavior changes is not observed to change between boom angles. Contour lines show two-dimensional 95% confidence intervals.

Method and steps for analysis:

Three methods of regression analysis were used to answer the question above. Although previous work was conducted in Python, R is better suited for complex statistical analyses. First, a multivariate regression examined the correlation between the locations of behavior change and the independent variables. A multivariate regression enables the analysis of more than one dependent variable – in this case, the X- and Y-coordinates of a behavior change. This is not to be confused with multiple linear regression, which only analyzes the correlation of predictor variables with one dependent variable (i.e. just X or just Y). Interestingly, angle, water speed, and velocity gradient show significant, positive correlations with location (where positive locations are downstream and against the left channel wall; Figure 2). Acceleration and TKE gradient, on the other hand, show significant negative correlations with location of behavior change. TKE (red box) shows no significant correlation, nor does visual fitness (blue box). Clearly, the high correlation of hydraulics warrants further analyses to better understand which hydraulic variable, if any, truly influences behavior changes.

Figure 2. The results of a multivariate regression analysis of hydraulic, geometric, and visual variables on the locations of behavior change.
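In R, a multivariate regression of this kind is fit by binding the two response columns with cbind on the left side of the formula. The simulated data frame, variable names, and effect sizes below are hypothetical stand-ins for the behavior-change data:

```r
# Multivariate regression: two responses (X, Y) regressed on several predictors.
set.seed(11)
n <- 120
dat <- data.frame(
  angle = sample(c(20, 30, 40), n, replace = TRUE),  # boom angle (degrees)
  speed = runif(n, 0.2, 1.0),                        # water speed (m/s)
  tke   = runif(n, 0, 0.05)                          # turbulent kinetic energy
)
dat$X <- 0.05 * dat$angle + 2 * dat$speed + rnorm(n, sd = 0.3)  # built to depend on angle, speed
dat$Y <- -1.5 * dat$speed + rnorm(n, sd = 0.3)                  # built to depend on speed only

fit <- lm(cbind(X, Y) ~ angle + speed + tke, data = dat)
summary(fit)   # one coefficient table per response variable
```

This differs from running lm twice only in bookkeeping, but it allows joint tests across responses and keeps both coordinate models in one object.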

Principal component analysis is one method of identifying important variables (in the form of components) within a larger set of variables, especially when they are highly correlated. A principal component is a linear combination of input variables that explains variation in the original data. By quantifying the amount of variation explained by each principal component, influential variables can be identified. In the case of X (the downstream position of a behavior change), the first and second principal components account for over 85% of the variation observed (Figure 3). Within Principal Components 1 and 2, boom angle, water speed, and velocity gradient influence the dependent variable, X, to the greatest degree, as indicated by the length of the arrows in Figure 4. Again, visual fitness, TKE, and now TKE gradient lack influence on X. Although promising, the results of PCA are limited by the analysis of only one dependent variable. A partial least squares regression allows a similar investigation into the principal components of behavior-change locations in both the X and Y dimensions.

Figure 3. Proportion of variation explained by principal components as a result of PCA. The first two PCs account for over 85% of variation in the dependent variable X.

 Figure 4. The contribution (as indicated by length of arrows) of independent variables to Principal Components 1 and 2. Variables near the bottom, top, and left of the graph have more influence than those near the apex of the arrows.
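The PCA step can be reproduced with base R's prcomp. The predictors below are simulated stand-ins, with the velocity gradient deliberately built to correlate with water speed; scaling is applied because the variables sit on very different units:

```r
# PCA of standardized predictors: how much variation do the first two PCs explain?
set.seed(5)
n <- 120
pred <- data.frame(
  angle = sample(c(20, 30, 40), n, replace = TRUE),
  speed = runif(n, 0.2, 1.0),
  tke   = runif(n, 0, 0.05)
)
pred$vel_grad <- 2 * pred$speed + rnorm(n, sd = 0.1)  # correlated with speed on purpose

pca <- prcomp(pred, scale. = TRUE)                    # scale. = TRUE standardizes units
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cumsum(var_explained)[2]   # proportion of variation captured by PC1 + PC2
# biplot(pca) draws variable arrows analogous to Figure 4
```

Because two predictors are nearly collinear, the first two components absorb most of the variation, which is the situation the post describes.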

Partial least squares regression, or PLS regression, combines principal components of PCA and linear regression of multivariate regression. It benefits our data because multiple dependent variables may be analyzed for principal components of many correlated predictor variables. However, the results of PLS perhaps realize our original fear – that no consistent hydraulic variable emerges as a strong predictor of the location of behavior change in our experiment (Figure 5). Instead, TKE gradient now dominates Axis 1 (analogous to Principal Component 1), while velocity gradient and angle largely influence Axis 2. A PLS regression within boom angle failed to identify consistent hydraulic or visual variables that dominate the analysis’s axes.




Figure 5. Partial least squares regression of independent hydraulic, visual, and geometric variables on X and Y, the locations of behavior change. TKE gradient now dominates Axis 1, while velocity gradient and boom angle influence Axis 2 the most.


No hydraulic or visual variable measured in this experiment consistently predicted the locations of behavior changes observed in this experiment. If anything, boom angle most consistently dominates the axes, principal components, and correlations of PLS, PCA, and multivariate regressions. This implies channel geometry, independent of the hydraulics it created, had the largest influence on fish behaviors as we observed them. Taken at face value, this result seems illogical. However, it may indicate that a bias existed in the behavior changes we observed.


Critique of methods:

Multivariate regression analysis easily identifies correlations of independent variables with more than one response variable. However, highly correlated independent variables require PCA or PLS to explain variation among many variables. Although powerful, these analyses are more difficult to interpret (in the case of PCA and PLS) and unable to investigate more than one response variable (in the case of PCA).
