GEOG 566

         Advanced spatial statistics and GIScience

Archive for Tutorial 2

May 26, 2017

A space for time substitution on an impacted landscape

Filed under: Tutorial 2 @ 1:49 pm


Has the total daily energy flow through a functional group at a given elevation changed over the eighty-year interval between 1931 and 2011?


I am hoping to better understand the forces structuring small mammal communities, specifically rodents, across space and through time. To that end, I am using trap effort-standardized abundance data to evaluate the response of species and functional groups to changes in anthropogenic land use practices and climate warming. These data are spatially explicit, and although their spatial coordinates reflect the patterns of trapping, their elevational distribution captures patterns that reflect biological responses to a changing environment. Increasing elevation is inversely correlated with summer high temperatures, collinear with winter low temperatures, and positively correlated with increasing precipitation. There is also a trend of decreasing human impact from land use with increasing elevation, and in the Toiyabe mountain range, human land use for cattle grazing and agriculture has also decreased over time. However, climate warming in the Great Basin has continued over the same time span and has shifted climatic bands upslope.

Species and functional groups respond to these changes over time in dynamic ways that may be recapitulated in their distribution along elevation/climate gradients. This is referred to as the space-for-time hypothesis, which generally states that a spatial gradient capturing the temporal gradient of interest may be used to stand in for time, with the expectation that species distributions across space reflect their responses to the variable of interest through time.

I am interested in the functional groups captured by diet preference, habitat affinity, and body size. Diet preference classes include herbivores, granivores, and omnivores; habitat affinity includes xeric adapted, mesic adapted, and generalist; and body size includes four size bins: 0-30 g (size class 1), 40-80 g (size class 2), 100-200 g (size class 3), and 250-600 g (size class 4). Any given species may be any combination of these three functional classes, although not all combinations are actually represented. Thus, patterns in the distribution of these functional groups also reflect tradeoffs that species face, and help to highlight species that may be at risk when functional categorizations become decoupled from environmental conditions. Changes in the elevational distribution of functional groups over time therefore provide key insights into species' responses. Total daily energy flow can also be estimated for these groups, because energy flow through a group is a function of both body size and abundance; this helps us understand the interactions between abundance, body size, diet, and habitat affinity.

Thus, the question becomes: has the total daily energy flow through a functional group at a given elevation changed over the eighty years between the historic and modern periods; and, based on the cross-correlations between body size, diet, and habitat affinity, are there species or groups that appear especially at risk of local extirpation?

Analytical approach and brief description of steps followed to complete the analysis:

To perform this analysis, I used R to trap effort-standardize abundance data for rodent species in the Toiyabe mountain range for both the historical and modern data sets. The function in R is an iterative process that resamples all trap sites within the historic and modern data sets down to the trap effort of the site with the lowest number of trap nights. This process is automated in R and generates an average species occurrence from 1,000 iterations. I then used Excel to calculate the daily energy flow through each population based on body mass estimates and an equation that employs the principles of allometric scaling, Ei = a·Ni·Mi^b, where Ei is energy flow (kJ/day), a and b are allometric parameters (b is the scaling exponent, equal to 0.75), Ni is the abundance estimate (in this case the trap effort-standardized abundance), and Mi is the estimated body mass (Terry and Rowe 2015).
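The Excel step above can be sketched in a few lines of Python. The exponent b = 0.75 follows the text; the normalization constant a, the abundances, and the body masses below are illustrative placeholders, not the values used in the actual analysis.

```python
# Sketch of the per-population daily energy flow calculation, Ei = a * Ni * Mi^b.
# a and the species rows are made-up placeholders; b = 0.75 is the allometric
# scaling exponent cited in the text.

def energy_flow(n_standardized, mass_g, a=5.69, b=0.75):
    """Daily energy flow (kJ/day): trap-effort-standardized abundance
    times body mass raised to the scaling exponent, scaled by a."""
    return a * n_standardized * mass_g ** b

# Hypothetical functional-group roll-up: sum Ei over species in a group.
species = [
    {"group": "granivore", "n": 12.4, "mass": 22.0},   # size class 1
    {"group": "granivore", "n": 3.1,  "mass": 55.0},   # size class 2
    {"group": "herbivore", "n": 6.8,  "mass": 150.0},  # size class 3
]

group_totals = {}
for sp in species:
    e = energy_flow(sp["n"], sp["mass"])
    group_totals[sp["group"]] = group_totals.get(sp["group"], 0.0) + e
```

Summing per-species flows by group like this is what lets the same table drive the diet, habitat, and size-class plots at once.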

All trap localities were paired for the historic and modern time periods, and trap site elevations were binned into three elevation ranges: below 5,500 ft, 5,500-6,500 ft, and above 6,500 ft. Energy flow for each functional group was calculated and plotted for these elevation bins.

Brief description of results you obtained:

Total energy flow per day through the modern rodent community in the Toiyabe mountain range is lower than in the historic time period (Figure 1). This is consistent with the finding by Terry and Rowe (2015) that total energy flow through the rodent community in the Great Basin decreases sharply at the Holocene-Anthropocene boundary.

Using climate-elevation correlations, we can substitute elevation for time in the Holocene to model the warming climate from the early to late Holocene in the Great Basin. Temperatures increase with decreasing elevation, and precipitation decreases with decreasing elevation. Historically, energy flow through xeric adapted species decreases with increasing elevation, while the opposite is true for mesic adapted species. Notably, energy flow through the historic mesic adapted species increases linearly with elevation, while in the modern period energy flow through this group roughly plateaus at the middle elevation (Figure 2). While both modern and historical patterns are generally consistent with Holocene trends for these functional groups, neither captures the distinctive decline in energy flow through the xeric adapted group during the modern period. This suggests that space-for-time comparisons may be valid for pre-human-impact studies, but that they may not capture abrupt shifts in species responses during the Anthropocene.

Small-bodied, omnivorous habitat generalists demonstrate the greatest energy flow at the middle elevation in both the historical and modern time periods; however, total energy flow through these groups is markedly lower in the modern than in the historic period at these elevations. While generalists show lower energy flow at the middle and high elevations in the modern period, their low-elevation values remain unchanged (Figure 2). This is consistent with trends for omnivorous habitat generalists through the Holocene and into the Anthropocene (Terry and Rowe 2015).

Finally, herbivores demonstrate a nearly four-fold decrease in energy flow at the lowest elevations over the eighty-year period between the historic and modern time periods (Figure 2). This is the only result that seems to strongly reflect the abrupt shift from the Holocene to the Anthropocene. It is a particularly surprising result, as human land use was greater historically in this mountain range than it is today, and ground cover by herbaceous plants and grasses has increased over this period. This may suggest that increased temperatures at lower elevations have pushed herbivorous species upslope, indicating that these species may be more sensitive to climate. Conversely, energy flow through granivores at low elevations increased from the historic to the modern period, also consistent with Terry and Rowe (2015). This, however, may be expected, as granivorous rodents in this region also tend to be xeric adapted, while herbivorous rodents tend to be mesic adapted.

While inference from this study is limited in scope due to the sample size of one mountain range, it does begin to make the case that space-for-time comparisons should not be assumed to hold as we move into a future where landscapes and climate are increasingly affected by human activity. Assuming they do, we may fail to capture the subtle ways in which certain ecological patterns are becoming decoupled, and thus we must be wary of drawing spurious conclusions based on old assumptions.

Figure 1. Total energy flow (kJ/day) for each elevational bin in the historic and modern time periods.

Figure 2. From left to right; Energy flow (kJ/day) through functional groups in the historic and modern time periods along the elevation gradient. From top to bottom, functional classifications: size class, diet preference, and habitat affinity.


Critique of the method – what was useful, what was not?

Given the nature of the data, this method of analysis was useful for me. I learned something about the data that was not obvious before the analysis, or before comparison with the research at the greater temporal scale captured by Terry and Rowe (2015). While my inferences are limited, increasing the number of mountain ranges on which I perform this analysis will enable a number of more informative statistical analyses, testing the spatial gradient at both the elevational and latitudinal scales, as the three mountain ranges for which similar data exist occur in a north-to-south array.


Terry, R.C. and R.J. Rowe. 2015. Energy flow and functional compensation in Great Basin small mammals under natural and anthropogenic environmental change. PNAS, 112(31):9656-9661.

May 24, 2017

Tutorial 2: Identifying clustering with a geographically weighted regression

Question You Asked

How much spatial clustering is present in the regression model of vegetation response to canopy cover? I am interested in determining if a single equation can predict the way that the two variables interact within east-side ponderosa pine forests, or if multiple equations are necessary.

Name of Tool or Approach You Used

To answer this question, I used the Geographically Weighted Regression tool in ArcMap.

Brief description of steps you followed to complete the analysis

In the GWR tool, I used vegetation cover as the dependent variable, and canopy cover as the explanatory variable. Because my points are somewhat clustered across the landscape, I used an adaptive kernel with an AICc bandwidth method.

Brief description of results you obtained

The map of coefficients did reveal clusters of high coefficients and low coefficients across the landscape. However, judging by the GWR table below, this clustering may not be statistically meaningful. One anomaly of this assessment was the negative adjusted R2 value. Adjusted R2 penalizes the model for its (effective) number of parameters, so it can fall below zero when the model explains less variance than that penalty accounts for; this signals a very weak fit rather than a missing constant term.

Map of clustered plots

Neighbors 46
ResidualSquares 14750.95231
EffectiveNumber 3.80525057
Sigma 18.69738296
AICc 405.3014401
R2 0.048524493
R2Adjusted -0.01473284
Dependent Field 0 Total_cvr
Explanatory Field 1 Canopy_Cov

Table of original GWR results

To remedy the negative adjusted R2, I tried adding another explanatory variable (elevation). This appeared to help the model, reducing the residual sum of squares and bringing the adjusted R2 back above 0.

Neighbors 46
ResidualSquares 13904.07263
EffectiveNumber 5.22764912
Sigma 18.46665082
AICc 405.8795404
R2 0.103150476
R2Adjusted 0.01015694
Dependent Field 0 Total_cvr
Explanatory Field 1 Canopy_Cov
Explanatory Field 2 Elevation

Table of remedied GWR
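The adjusted R2 values in both tables can actually be reproduced by hand, which helps demystify the negative value: GWR replaces the usual parameter count with the "effective number" of parameters when applying the standard adjustment formula. A short check (the formula here is the standard adjusted-R2 form; treating the reported EffectiveNumber as the parameter count is my assumption, verified against the tables):

```python
# Reproducing ArcMap's adjusted R2 from the two GWR tables above:
# adj_R2 = 1 - (1 - R2) * (n - 1) / (n - p_eff),
# where p_eff is the reported EffectiveNumber and n the neighbor count (46).

def gwr_adjusted_r2(r2, n, p_eff):
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p_eff)

# First model (canopy cover only): plain R2 is positive, but the effective-
# parameter penalty outweighs it, giving the reported -0.01473284.
adj1 = gwr_adjusted_r2(0.048524493, 46, 3.80525057)

# Second model (canopy cover + elevation): gives the reported 0.01015694.
adj2 = gwr_adjusted_r2(0.103150476, 46, 5.22764912)
```

So the sign flip between the two runs is entirely explained by the small gain in R2 outpacing the extra effective parameters.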

Critique of the method – what was useful, what was not?

This method was useful in that I could process the data in ArcMap, which is where I was already analyzing my points. It was also very helpful to visualize the coefficients in the map below. However, I am still a little unsure why the coefficient map shows such strong clustering while the output table does not show any significance.

Automating Feature creation of variables for explaining patterns of Socio-demographic and spatial variables of survey responses regarding flood-safety in neighborhoods

Filed under: Tutorial 2 @ 12:04 pm
  1. Question asked

As reported, I am handling more than 100 variables. In addition, my data is constantly being updated. So, I needed a tool to create independent features for analyzing each of my variables. The question is: how can I automate feature creation for the socio-demographic and spatial variables of survey responses regarding flood safety in neighborhoods?

  2. Name of the tool or approach that was used.

I used ArcPy with Python for the automation, and ArcMap for output visualization. ArcPy is a site package that builds on (and is a successor to) the successful arcgisscripting module.

  3. A brief description of steps you followed to complete the analysis.

First, we have our input feature as shown in Figure 1. Notice the attribute table containing all our variables.

Next, we take advantage of the Model Builder in ArcMap in order to visualize our model using the tool “Feature Class to Feature Class”, as shown in Figure 2.


In addition, we export the resulting python code. The exported code is presented in Figure 3.


This python code gives us an idea of how the “Feature Class to Feature Class” tool is working. So we proceed to create our own function file as depicted in Figure 4.

Within our function file and, based on the previously exported function, we build our own function. Our function is presented in Figure   5.

Once we have our function within our function file, we can now execute our function creating a script file. The content of our created script file is shown in Figure 6.
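Since the function and script files themselves appear only as figures, the sketch below shows the general shape such a function might take. This is NOT the author's actual code: the paths, field names, and field-mapping details are hypothetical, and it only runs inside an ArcGIS Python environment where arcpy is available.

```python
# Hypothetical sketch of a function generalized from a ModelBuilder export of
# the "Feature Class to Feature Class" tool (the real code is in Figures 4-6).
# Requires an ArcGIS install; arcpy is not a standalone package.
import arcpy

def extract_variable(in_features, out_workspace, field_name):
    """Create a new feature class keeping only one survey variable."""
    # Keep only the geometry plus the requested field via a field mapping.
    field_mappings = arcpy.FieldMappings()
    fmap = arcpy.FieldMap()
    fmap.addInputField(in_features, field_name)
    field_mappings.addFieldMap(fmap)
    return arcpy.FeatureClassToFeatureClass_conversion(
        in_features, out_workspace, field_name, field_mapping=field_mappings)

# Automating over many variables then becomes a simple loop, e.g.:
# for field in arcpy.ListFields("survey_responses.shp"):
#     if field.type not in ("OID", "Geometry"):
#         extract_variable("survey_responses.shp", "C:/out.gdb", field.name)
```

The loop at the bottom is the payoff of the automation: one feature class per variable without touching ArcMap.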

  4. A brief description of results you obtained.

The result of running our script file is shown in Figure 7. Here, we first read the variables of our input feature and, in the last row, the output field of the new feature is listed.

Finally, the newly created feature is shown in Figure 8, along with its corresponding attribute table. Note that the created feature contains only the selected field.

  5. Critique of the method – what was useful, what was not?

  • Model Builder:

This method is useful for building and visualizing your model. It also has the capability to export a python script of the model.

  • ArcPy:

ArcPy is a very useful package that builds on the successful arcgisscripting module. It provides the cornerstone for a useful and productive way to perform geographic data analysis, data conversion, data management, and map automation with Python (ArcGIS 2017). Many variables can thus be analyzed without needing to open ArcMap, saving an enormous amount of time.


ArcGIS (2017). What is ArcPy.

Tutorial 2:

Filed under: 2017,Tutorial 2 @ 11:59 am

Research Question

Building upon my question from exercise 2, my research question for exercise 3 asks how the covariance between SCF and EVI within a wildfire perimeter changes over time following the wildfire.

Approach/tools used

For this exercise, I utilized a variety of software programs. While the overall goal of variogram creation required using R, ArcMap and Excel were used significantly in order to process the data into a usable format. As a result, much hopping between programs was required, which was undoubtedly quite tedious/frustrating at times but ultimately proved to be a useful learning experience.

Analysis procedure

To create a cross-variogram in R using the gstat package, the process is much smoother with data in table format. As a result, one of the main challenges of creating a cross-variogram was converting my raster data into usable tabular data. To accomplish this, I first converted the SCF and EVI rasters into point shapefiles, then used the “Add XY Coordinates” tool in ArcMap to add two fields to the point shapefiles containing the X- and Y-coordinate information. Next, the attribute table from each point shapefile was converted to a CSV table. The final step before stepping back into R was to combine all EVI and SCF data for each year into a single CSV table to ease the script writing in R.

After creating the combined EVI-SCF table, I noticed that a few of the columns had different numbers of records. It turns out that I used a slightly different extraction method for a few SCF layers – a product of conducting this analysis across multiple days and not taking diligent notes on the layers used. After correcting this issue, I read the cleaned table into R and created cross-variograms using the code snippet shown below.
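The R snippet itself is shown as an image, so as an illustration of the quantity gstat estimates, here is a minimal pure-Python version of the empirical cross-semivariance on toy data. This is a conceptual sketch only (no anisotropy, cutoffs, or the other care gstat takes), not the analysis code.

```python
# Minimal illustration of what a cross-variogram measures: the average
# cross-semivariance of two co-located variables (think SCF and EVI) as a
# function of lag distance. Toy data; the real analysis used gstat in R.
import math
from collections import defaultdict

def cross_variogram(points, lag_width):
    """points: list of (x, y, u, v). Returns {lag_bin_center: gamma_uv}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            x1, y1, u1, v1 = points[i]
            x2, y2, u2, v2 = points[j]
            h = math.hypot(x2 - x1, y2 - y1)
            b = int(h // lag_width)
            # cross-semivariance contribution: (delta u)(delta v) / 2
            sums[b] += 0.5 * (u1 - u2) * (v1 - v2)
            counts[b] += 1
    return {(b + 0.5) * lag_width: sums[b] / counts[b] for b in sums}

# Toy transect where u and v increase together -> positive cross-semivariance.
pts = [(x, 0.0, float(x), 2.0 * x) for x in range(5)]
gamma = cross_variogram(pts, lag_width=1.5)
```

The sign behavior discussed below falls straight out of this formula: pairs where the two variables change in the same direction contribute positively, opposite directions negatively.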


My hypothesis for the years following the wildfire was that covariance would be strongest in the first year and decrease overall in each year afterwards. However, the plots below tell a different story. There are a lot of things going on in the time-series of cross-variograms, but a couple of evident trends stick out to me. First, although each year shows two clear bumps in the cross-variogram, the lag distance of those bumps shifts significantly from year to year. My guess would be that the initial shift from pre-fire to the first year post-fire was primarily due to the variation in burn severity across the landscape. The shifts in ensuing years may be due to varying rates of vegetation recovery or perhaps due to changes in snowpack distribution caused by the wildfire.

Another fascinating pattern is the change in sign of the covariance. Before the wildfire, the covariance is mostly negative except for a few thousand meters of lag distances. This suggests that between most pixels within the burn area, SCF and EVI vary negatively – if snowpack increases between two points, vegetation generally decreases between those two points. Following the wildfire, however, the covariance becomes more and more positive overall. By year 3 following the wildfire, at all distances the covariances are positive, but still showing a similar bumpiness as in the pre-burn cross-variogram. This outcome was a bit baffling, as I would expect the relationship to be strongest (most positive) in the first year post-fire, with a gradual retreat to the pre-fire state.

Another pattern of significance is the consistent cross-variogram valley that occurs at about 5000 meters in each year, even though the peaks on either side shift from year to year. This can be interpreted as the between-patch distance, and is perhaps the more important result of this method – that from year to year, even before the wildfire, the landscape shows similar distances between patches.

Figure 1: Pre-fire (2005) cross variogram between SCF and EVI

                     Figure 2: First year post-fire (2007) cross variogram between SCF and EVI

Figure 3: Second year post-fire (2008) cross variogram between SCF and EVI

Figure 4: Third year post-fire (2009) cross variogram between SCF and EVI

Method critique

By creating cross-variograms for each year following a wildfire and the pre-fire year, I was able to discern the patchiness of the EVI and SCF data as well as the shifts in the snowpack-vegetation relationship. The persistence of the bumpiness across years strengthens the evidence of two patch sizes, but the shift in bump locations results in a cloudier picture of exact patch size.

One disadvantage to this approach was the inability to determine the strength of the SCF-EVI relationship within a given year’s cross-variogram. After researching the interpretation of covariance, it became clear that covariance values can only tell you the sign of a relationship, but cannot lend any insight into the magnitude of the relationship because the data is not standardized. However, because I generated covariograms across multiple years, I was able to glean information regarding the relative shifts in covariance, which is ultimately the type of information I’m most interested in producing.

For the final project, I plan on generating additional covariograms for the next couple of years in order to observe how the covariance continues to change. I’m interested to see at what point the covariance plot starts to look like the pre-fire condition.

Tutorial 2: Nitrate Data in Relationship with Red Alder cover and Land Use

Filed under: 2017,Tutorial 2 @ 11:43 am


  1. Research Question 

Do the nitrate values that I collected in the field have a relationship with the percent of Red Alder cover in each watershed?

                Sub-question: Do the nitrate values have a relationship with land use activities in each watershed?

  2. Name of the tool or approach that you used.

For this analysis I utilized ArcGIS and Microsoft Excel.

  3. Brief description of steps you followed to complete the analysis.

First I utilized a package in ArcGIS called NetMap to delineate the 10 watersheds. This package uses GPS points to draw watershed boundaries based on elevations and flow paths.

Next I uploaded the 2011 red alder percent cover shapefile from the Oregon Spatial Library Database, and then clipped it to the watersheds.

By joining the attributes of these two shapefiles, I was able to obtain the total area of each watershed and the total area of red alder cover within it. By dividing these two areas, I obtained percent alder cover and compared it to the nitrate concentrations at each site.

Table 1 Nitrate Concentrations and Percent Alder Cover

Watershed Area (m2) Alder (m2) Percent Alder Cover Average Nitrate (ppm)
Cape Creek 4353700 3537.74 0.081% 1.88
Crooked Creek 31760400 3226.39 0.010% 1.35
Cummins Creek 21388800 222.89 0.001% 1.44
Gwynn Creek 2865000 5371.57 0.187% 1.61
Little Cummins Creek 3056600 361.35 0.012% 1.72
Yew Creek 9786600 969.92 0.010% 1.01
Kiser Creek 100500 1017.97 1.013% 0.23
Rock Creek 38375500 2359.26 0.006% 0.19
Middle Fork Creek 3294000 339.23 0.010% 0.17
Griffith Creek 4434800 2021.86 0.046% 0.13
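The join-and-divide arithmetic behind Table 1 is a one-liner, and the table's rows can be reproduced as a sanity check (values below are the Cape Creek and Kiser Creek rows from the table above):

```python
# Percent cover as computed for Table 1: alder area divided by watershed
# area, expressed as a percentage.

def percent_cover(patch_area_m2, watershed_area_m2):
    return 100.0 * patch_area_m2 / watershed_area_m2

cape_creek = percent_cover(3537.74, 4353700)   # ~0.081 %
kiser_creek = percent_cover(1017.97, 100500)   # ~1.013 % (the possible outlier)
```

Note how the Kiser Creek denominator (100,500 m2) is tiny compared to the other watersheds, which is exactly why its percent cover stands apart.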


Once this information was obtained, I utilized a scatterplot to identify a pattern.

Figure 1 Relationship between percent alder cover and nitrate concentration


Though the trendline indicates a negative correlation, the data point at 1 percent may be an outlier. The watershed represented by this point is small, which could inflate the apparent amount of alder cover. With this in mind, no distinctive pattern was observed between alder cover and nitrate concentration in the ten watersheds. This isn’t entirely surprising, as the nitrate concentrations were relatively low to begin with.


Land Use Analysis

Looking further, I utilized the 2011 NLCD data to evaluate land use in the 10 watersheds. With none of the usual suspects (agriculture, urban, etc.) particularly dominant in the watersheds, it was difficult to link increased nitrate concentrations to land use at the field sites. Most of the land cover was evergreen and mixed forest types, which are natural for the areas sampled.


Figure 2 Coastal Watersheds Landuse


Figure 3 Coast Range Watersheds Landuse

  4. Brief description of results you obtained.

Overall, nitrate concentrations over the five months in the ten watersheds were relatively low (~0.1-1 ppm), and the results were not particularly surprising. There was no clear pattern linking red alder cover and nitrate concentrations in these watersheds (Table 1). Additionally, there were no surprising land use features in the watersheds, and thus no apparent link to nitrate concentrations in this analysis. Most of the land in these watersheds is occupied by mixed and evergreen forests, neither of which is particularly linked to elevated nitrate concentrations. This analysis was consistent with my expectations: starting with low nitrate concentrations, and suspecting most of the land use would be forest cover, I predicted there would likely not be a strong link between land use and nitrate concentration. However, I expected slightly more red alder cover in the coastal watersheds and was surprised how little that cover related to nitrate concentration.


  5. Critique of the method – what was useful, what was not?

This method was very useful; I was able to delineate the watersheds, clip layers to identify potential sources of nitrate, and use Excel to further analyze relationships. I also used the join feature to gather information from two attribute tables to calculate the percentage of alder cover. I will use the skills learned and practiced in this exercise in the future to identify statistical relationships between spatially correlated (or not) data.


Spatial autocorrelation in NDVI for wheat fields and some characteristics

Filed under: 2017,Tutorial 2 @ 11:06 am

Tutorial 2 is based on tutorial 1 with a few alterations: 1) the variogram is standardized by dividing the semivariance by the standard deviation of that field, 2) only the wheat fields are used for the analysis so that the results are more easily compared, and 3) a time component is added; for 5 time steps the correlation between range and sill is determined and its development over time is analyzed (there was no biomass data for these dates).


Part 1 and 2

Standardizing of variogram


where var is the Formal Class RasterVariogram. The variance column in this class can be accessed by @variogram$gamma. This column is replaced by the standardized variance, by dividing the existing column by the standard deviation. The standard deviation is calculated with the cellStats function: the input raster is ras, ‘sd’ is the statistical option for standard deviation, na.rm=TRUE removes NA values, and lastly, asSample=TRUE means that the denominator for the standard deviation is n-1.
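Since the R code itself is shown as an image, the standardization step can be sketched in Python (toy numbers; the real analysis used cellStats in R, whose asSample=TRUE behavior corresponds to the sample standard deviation below):

```python
# Sketch of the variogram standardization described above: divide each
# semivariance (gamma) value by the field's standard deviation, computed
# with an n-1 denominator (sample sd, like cellStats with asSample=TRUE).
import statistics

def standardize_gamma(gamma_values, raster_cells):
    """gamma_values: semivariances from the variogram;
    raster_cells: the field's cell values (NA already removed)."""
    sd = statistics.stdev(raster_cells)   # sample sd, n-1 denominator
    return [g / sd for g in gamma_values]

cells = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # toy raster cell values
std_gamma = standardize_gamma([1.0, 2.5, 4.0], cells)
```

Dividing every gamma value by a single per-field constant rescales the variogram without changing its shape, which is what makes sills comparable across fields.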

Doing the same analysis but now just focusing on one crop does not improve the results. Pearson’s correlation coefficients are low and insignificant with a correlation coefficient for Range vs Biomass of -0.31 (p-value is 0.3461) and a correlation coefficient for Sill vs Biomass of 0.35 (p-value is 0.2922). For the color coded scatter plot there is no pattern visible for the relationship between the sill, range, and biomass.

Part 3

For part 3, the range and sill values for all wheat fields at 5 different dates were determined, along with the correlation between range and sill for each date. This Pearson correlation coefficient was then analyzed over time. The correlation appears to increase over time and is significant at the 5% level (p = 0.047).
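The per-date statistic here is a plain Pearson correlation between the fields' ranges and sills; a minimal implementation, with invented numbers standing in for the wheat-field values:

```python
# Pearson correlation coefficient between range and sill across fields
# for one date; repeated for each of the 5 dates to build the time series.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

ranges = [120.0, 150.0, 135.0, 180.0]   # hypothetical per-field ranges (m)
sills = [0.8, 1.1, 0.9, 1.4]            # hypothetical per-field sills
r = pearson_r(ranges, sills)
```

Collecting one r per date and examining the sequence is what reveals the upward trend reported above.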

Manipulating Layer Properties in ArcGIS ArcMap 10.4.1

Filed under: Tutorial 2 @ 1:09 am

Reva Gillman

Tutorial 2                                Manipulating Layer Properties in ArcGIS ArcMap 10.4.1

  1. Question that I asked:

What are the geographic distributions of some of the ‘riskier’ species that arrived on Japanese Tsunami Marine Debris (JTMD)? What do their native and non-native ranges look like, and how do they compare with each other?

First of all, I had to choose 4 species with clear invasion history out of the 104 JTMD species to focus on. Out of the 31 with invasion history, I chose Asterias amurensis (seastar) and Hemigrapsus sanguineus (Japanese shore crab), since they are known to be an issue in other regions. I then chose two species with large non-native spread: Crassostrea gigas (Japanese oyster), an economically important aquaculture species, and Teredo navalis, the shipworm with an almost global distribution that has burrowed into wooden ships all over the globe for hundreds of years.

  2. Name of the tool or approach that I used:

I manipulated Layer Properties in ArcGIS ArcMap 10.4.1 to change the symbology of the polygons in the shapefile, making them appear in different colors according to which realms were documented as native or non-native regions for each species.

  3. Brief description of steps I followed to complete the analysis:

First I made an Excel spreadsheet of the species data by going through each ‘risky’ species’ distribution, converting regions to realms, and typing that up in the spreadsheet. Then I did a join on Realm_Code, which matched up with the attribute table from the shapefile. However, I ran into issues mapping these data, as it would only map the region that was native, or non-native, without the other regions represented at all, so I had to figure out another method.

My second attempt was to directly select regions of the shapefile and change the symbology by going into Selection and clicking “Use With This Symbol.” This might eventually have worked, but it was very hard to figure out which regions to select for each species, and there was no intuitive way to save the selections as a new layer.

Finally, I found out how to go into Layer Properties, and manipulate the shape file from there:

a. First, right-click on the layer you want to manipulate, and select Properties from the bottom of the options.

b. Go into Symbology

c. Underneath the Categories options on the left, select unique values

d. From the value field, select the field that you are interested in, in my case that was Realm_Code

e. Select the Add All Values button at the bottom, which will list all values of the selected field on the screen.

f. From there, you can select a field, and remove it, or add values as needed, which I did for each species

g. By double clicking on each field, you can select the appropriate color that you want to represent each category, you can change the outline of the symbol, the fill color, and texture. I did this to make the color of invasive realms one color, and native realms another color, and regions where the species was both native and non-native another color/texture.

h. Then, you can copy and paste the layer to manipulate again for the next species, so you don’t have to start from scratch.


4. Brief description of results you obtained.

Below are the 4 maps of each species’ distribution, showing the extent of their ranges.


Teredo navalis geographic distribution






Hemigrapsus sanguineus geographic distribution

Crassostrea gigas geographic distribution


For clearer viewing of the above graphs, please download: Tutorial2

5. Critique of the method – what was useful, what was not?

This method can be tedious if you want to look at many different layers of maps. For me, it worked out because I chose to look at four different layers of species geographic distributions, but if this method were less tedious I might have looked at more. By going through and manually selecting each polygon, or realm, that I wanted to highlight, I had a lot of control over what the final map looked like, and I liked the level of detail you can achieve with symbology: polygon outlines, fill color, and fill texture. The final maps are straightforward, and it is easy to save each layer, then copy and paste the next layer so you don’t have to start from scratch each time.

May 23, 2017

Tutorial 2: Extracting raster values to points in Arc for regression analysis in R

Filed under: Tutorial 2 @ 10:38 pm

Research Question

As I have described in my previous blog posts, I am investigating the spatial relationship between blue whales and oceanographic variables in the South Taranaki Bight region of New Zealand. In this tutorial, I expand on the approach I took for tutorial 2 to quantify whether blue whale group size can be predicted by remotely-sensed sea surface temperature and chlorophyll-a concentration values at each sighting location. To accomplish this I used ArcMap to align different streams of spatial data and extract the relevant information, and then used R to perform statistical analyses. Specifically, I asked the following question:

“How well do remotely sensed chlorophyll-a (chl-a) concentration and sea surface temperature (SST) predict blue whale group size?”


I downloaded raster layers for chl-a concentration and SST, and used the “extract values to points” tool in Arc to obtain chl-a and SST values for each blue whale sighting location. I then exported the attribute table from Arc and read it into R. In R, I performed both a general linear model (GLM) and a generalized additive model (GAM) to investigate how well chl-a and SST predict blue whale group size.


I downloaded both the chl-a and SST raster files from the NASA Moderate Resolution Imaging Spectroradiometer (MODIS Aqua) website. The raster cell values are averaged over a 4 km² spatial area and a one-month time period; for my purposes I selected the month of February 2017, as that is when the study took place.

I then used the Extract Values to Points tool, which is located in the Extraction toolset within Arc’s Spatial Analyst toolbox, to extract values from both the chl-a concentration and SST rasters for each of the blue whale sighting data points. The resulting point layer contained the location of each sighting, the number of whales sighted (group size), the chl-a concentration, and the SST for each location. All of this information is contained in the attribute table for the layer file (Fig. 1), so I exported the attribute table as a .csv file that could be read in programs other than Arc.

Figure 1. Attribute table for blue whale sightings layer, which contains blue whale group size, chl-a concentration, and SST for each sighting location.

In R, I read in the data frame containing the information from the attribute table. In an exploration of the data, I plotted chl-a against SST to get a sense of whether the two are correlated. Additionally, I plotted group size against chl-a and SST separately to visualize any relationship. Subsequently, I ran both a GLM and a GAM with group size as the response variable and chl-a concentration and SST as predictor variables. While a GLM fits a linear regression model, a GAM is more flexible and allows for non-linearity in the regression model. Since I am in the exploratory phase of my data analysis, I tried both approaches. The script I used to do this is copied below:
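(The script itself was lost in the blog export; the following is a minimal reconstruction of the analysis described, with assumed column names — `group_size`, `chla`, `sst` — and stand-in data in place of the exported attribute table.)

```r
# Minimal sketch of the GLM/GAM comparison described above.
# With the real data, the data.frame() below would be replaced by, e.g.:
# sightings <- read.csv("sightings_attribute_table.csv")
library(mgcv)  # provides gam()

set.seed(1)
sightings <- data.frame(              # stand-in data for illustration only
  group_size = rpois(40, 2) + 1,
  chla       = runif(40, 0.1, 1.5),
  sst        = runif(40, 16, 20)
)

# Exploratory plots: chl-a vs. SST, and group size vs. each predictor
plot(sightings$sst, sightings$chla)
plot(sightings$chla, sightings$group_size)

# GLM: linear effects of chl-a and SST on group size
glm_fit <- glm(group_size ~ chla + sst, data = sightings)
summary(glm_fit)

# GAM: smooth terms allow non-linear effects
gam_fit <- gam(group_size ~ s(chla) + s(sst), data = sightings)
summary(gam_fit)   # reports p-values and % deviance explained
```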


There appeared to be some relationship between SST and chl-a concentration (Fig. 2), which is what I expected considering that colder waters are generally more productive. There was no clear visual relationship between group size and chl-a or SST (Fig. 3, Fig. 4).

Figure 2. A scatter plot of chlorophyll-a concentration vs. sea surface temperature at each blue whale sighting location.

Figure 3. A scatter plot of chlorophyll-a concentration vs. blue whale group size at each blue whale sighting location.

Figure 4. A scatter plot of sea surface temperature vs. blue whale group size at each blue whale sighting location.

The result of the GLM was that neither chl-a concentration (p=0.73) nor SST (p=0.67) had a significant effect on blue whale group size. The GAM result was very similar: again no significant effect of either chl-a concentration (p=0.73) or SST (p=0.67), and the predictor variables explained only 4.59% of the deviance in the response variable.

Critique of the Method

Overall, the method used here was useful, and it was good practice to step out of Arc and code in R. I found no significant results, and I think this may be partially due to the way that my data are currently organized. In this exercise, I extracted chl-a concentration and SST values only to the points where blue whales were sighted, which means I was working with presence-only data. I think that it might be more informative to extract values for other points along the ship’s track where we did not see whales, so that I can include values for chl-a and SST for locations where the group size is zero as well.

Additionally, something that warrants further consideration is that the spatial relationship between blue whales and these environmental variables of interest is likely not linear, and therefore might not be captured by these statistical methods. Moving forward, I will first try to include absence data in both the GLM and GAM, and then I will explore what other modeling approaches might better explain the spatial distribution of blue whales in relation to their environment.

Tutorial 2: Using kernel density probability surfaces to assess species interactions

Filed under: Tutorial 2 @ 4:48 pm

Overview: question clarification and approach

Continuing my exploration of changes in cougar predation patterns in time periods with and without wolves, I wanted to expand my evaluation of spatial repulsion and/or shifts to examine how cougar kill site distributions relate to wolf activity. I identified several elements in my data sets, besides the presence of wolves, which could produce a shift in kill density or distribution, including: 1) catch-per-unit-effort discrepancies (i.e. larger sample sizes of cougars (and kill sites) in one data set), or 2) time effects from seasonal distribution shifts (i.e. prey migration patterns). I accounted for catch-per-unit-effort issues in tutorial 1, but needed to account for seasonal variation in prey distribution as part of my analysis of wolf influence on cougar kill distribution. Density features continued to be a good tool to explore my presence-only data (kill events). Studies of sympatric wolves and cougars have shown differences in the attributes of sites where each carnivore makes kills (slope, elevation, physiographic features), but I was interested in how cougars might be positioning themselves (e.g. where they are hunting and making kills) on the landscape relative to centers of wolf activity. Therefore, the next step in understanding my spatial problem was to determine if there were differences in when and where cougars were killing prey by asking:

Does wolf activity account for the distribution differences between pre- and post-wolf cougar kill sites?

For the purposes of this analysis the data would be “location, implicit” with the variable of interest (cougar kill sites) having a covariate (proximity or distance to wolf activity) measurable at each sampled location and the causal mechanism inferred from co-variation. Central areas of wolf activity could be identified from wolf kill density, estimated on a seasonal basis, and related to the distribution of cougar kill sites. A probability surface could be created from wolf kill density and used to calculate the proportion of pre- and post-wolf cougar kill sites within probability features as a proxy for the distribution of wolf activity. This relationship could then be compared as a joint proportion or as an expected/observed relationship across time periods. Alternatively, the isopleth polygon feature could be used to calculate a “distance-to-edge” feature relating each cougar kill to potentially increasing levels of wolf contact. This relationship could be evaluated between time periods and across seasons through ANOVA or regression.

My approach to this question was to relate the proportion of both pre- and post-wolf cougar kill sites (points) to wolf activity using the probability contours (polygon features) of wolf kill site distribution (density raster) as a proxy for activity. I used several tools in ArcGIS, GME, and R to carry out this analysis. Geospatial Modeling Environment (GME) is a standalone application that makes use of ArcGIS shapefiles and R software to carry out spatial and quantitative analyses. It was created to take advantage of R computing and replaces the ‘Animal Movement’ plugin many movement ecologists made use of in previous versions of ArcGIS. GME allows a lot of flexibility in tasks related to dealing with large data sets and common analyses performed on animal location data (KDE, MCP, movement paths). Because it can operate as a GUI or a command-driven interface using R, it is user friendly and allows for iterative processes, quick manipulations of data, and combinations of these processes not easily duplicated in Arc.



Prior to evaluating influences from wolf activity, it was necessary to address potential effects from seasonal shifts in prey distribution or density related to ungulate migration patterns. Therefore, the first step for this analysis was to subset kill data into winter (1 Nov – 30 Apr) and summer (1 May – 31 Oct) seasons. I used the ‘select by attribute’ feature in ArcGIS to subset each kill data set (wolf, cougar (pre-wolf), and cougar (post-wolf)) into season-specific shapefiles.
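The same seasonal split can also be scripted outside of Arc; a small sketch in R, with a hypothetical kill data frame and `date` field:

```r
# Hypothetical kill records; in practice these come from the kill shapefiles
kills <- data.frame(
  date = as.Date(c("2010-01-15", "2010-06-10", "2010-11-20", "2011-03-05")),
  x = c(1, 2, 3, 4),
  y = c(4, 3, 2, 1)
)

# Winter = 1 Nov - 30 Apr; summer = 1 May - 31 Oct
m <- as.integer(format(kills$date, "%m"))
winter <- kills[m >= 11 | m <= 4, ]
summer <- kills[m >= 5 & m <= 10, ]
```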


Figure 1. ArcGIS ‘select by attribute’ call from attribute table to subset data into season.

Similar to the procedure for tutorial 1, I used GME to create density and isopleth probability contours for wolf kill sites and the ‘countpntsinpolys’ command to add columns with the number of cougar kills (pre/post) to each seasonal wolf kill isopleth polygon. Finally, I used ArcMap to visualize the data and make a figure showcasing the process.
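For readers without GME, the density and isopleth steps can be approximated in base R. The sketch below uses `MASS::kde2d` with synthetic kill locations (all coordinates illustrative), converting the density surface to cumulative-probability levels and tallying cougar kills per contour class:

```r
library(MASS)  # kde2d() stands in for GME's 'kde' call here

set.seed(42)
wolf   <- data.frame(x = rnorm(300), y = rnorm(300))           # wolf kill sites
cougar <- data.frame(x = rnorm(200, 0.5), y = rnorm(200, 0.5)) # cougar kills

dens <- kde2d(wolf$x, wolf$y, n = 100)

# Turn the density surface into cumulative-probability levels so that,
# e.g., cells with prob <= 0.25 enclose 25% of the estimated kill density
dz  <- dens$z / sum(dens$z)
ord <- order(dz, decreasing = TRUE)
prob <- dz
prob[ord] <- cumsum(dz[ord])

# Probability level of the wolf surface at each cougar kill location;
# kills beyond the grid extent are treated as 'Out' (no wolf activity)
inside <- cougar$x >= min(dens$x) & cougar$x <= max(dens$x) &
          cougar$y >= min(dens$y) & cougar$y <= max(dens$y)
ix <- findInterval(cougar$x, dens$x)
iy <- findInterval(cougar$y, dens$y)
p  <- rep(NA_real_, nrow(cougar))
p[inside] <- prob[cbind(ix[inside], iy[inside])]

# Cougar kills per wolf-activity contour class, plus the share outside
table(cut(p[inside], breaks = c(0, 0.25, 0.5, 0.75, 0.95, 0.99, 1)))
mean(!inside)
```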


Figure 2. GME ‘kde’ call in the command builder GUI.



Visual examination of season-specific cougar kills relative to wolf kill activity centers added further evidence toward a spatial shift in the distribution of cougar kills between time periods with and without wolves (Figure 3). The magnitude of the repulsive effect appeared stronger in winter than summer, evident in the uniformly decreased proportions of cougar kills per probability contour of wolf activity, and increased proportion of kills outside any wolf activity between time periods with and without wolves (Table 1). The relationship appeared opposite in summer, with higher proportions of cougar kills observed in the post-wolf period than expected in all but core (25% probability surface) and outside (no wolf activity) areas (Table 2).


Figure 3. Visual comparison of seasonal wolf activity based on kill site density probability contours and the distribution of cougar kill sites across time periods with and without wolves.

Table 1. Winter wolf kill site probability contour attribute table. Isopleths represent the 25th, 50th, 75th, 95th, and 99th quartiles. ‘Out’ refers to areas outside the probability contour surface. Number of kills (No. Kills) is the number of cougar kill sites, and % Kills is the cumulative proportion of all cougar kills within each quartile class.


Table 2. Summer wolf kill site probability contour attribute table. Isopleths represent the 25th, 50th, 75th, 95th, and 99th quartiles. ‘Out’ refers to areas outside the probability contour surface. Number of kills (No. Kills) is the number of cougar kill sites, and % Kills is the cumulative proportion of all cougar kills within each quartile class.


Method Critique

Accounting for seasonal shifts in kill density improved visual interpretation of spatial patterns and reduced a factor that might mask influences of wolf activity. However, the observed shift in winter could still be due to shifts in prey populations as a behaviorally mediated prey response to wolves (i.e. the shift is related to prey responding to wolves, and cougars indirectly shifting distribution to follow prey). The shift could also be related to differences in prey species-specific distributions, as cougars prey predominantly on deer, but show seasonal selection of neonate elk calves in the summer, and wolves prey predominately on elk (i.e. the abundance of newborn calves in the summer creates spatial tolerance between wolves and cougars). The comparison of pre- to post-wolf cougar kill sites relative to the probability of wolf kills may be misleading, since the relationship is based on pre-wolf cougar kill data overlapping a density feature that didn’t exist (i.e. no wolves on the landscape at the time those kills were made). However, the comparison does provide some insight, as the pre-wolf cougar kill distribution still represents a ‘prior’ distribution of cougar kills, and overlapping wolf activity demonstrates the proportion of kills we could expect if there were no competition (spatial avoidance/repulsion) between the two species. Using the probability contours as a ‘prior’ distribution of kill sites offered a more robust and meaningful measure to quantify the shift in cougar kill distribution. The method was useful and provided evidence toward the presence of a season-specific shift in where cougars are killing prey (i.e. the answer to my ‘question asked’ is yes, in winter), but the regression analysis discussed above may provide additional support.


May 22, 2017

Tutorial 2: Site Suitability

Filed under: Tutorial 2 @ 2:43 pm


For exercise 3, Dr. Jones allowed me to work on a big component of one of my thesis chapters that I needed to complete: determining site suitability for a field experiment. This wasn’t related to stats but required a lot of work in ArcMap. My questions are:

  • Are forest meadows reptile biodiversity hotspots?
  • Does aspect matter for reptiles in forests?

Forest meadows could be hotspots due to increased solar radiation, temperatures, heterogeneous conditions, increased productivity, and rocky soils. These questions are important because if forest meadows are reptile habitat, there could be strong management implications: managers would need to retain/restore meadows, retain connectivity among meadows, and identify ‘climate-smart’ meadows if aspect is related to diversity.

I performed site suitability/selection for this study based on the following criteria:

  • Need to identify 4 sites, each with 4 treatments (a north and a south aspect, in open meadow and in closed forest)
  • road access (< 300 m from road)
  • sites at similar elevations
  • similar forest types
  • similar proximity to non-forest
  • same proximity to water
  • areas of similar size
  • created by natural processes, not a clear-cut


I used many tools to determine site suitability for the field experiment:

Clip, Aspect, Extract by Attribute, Raster to Point, Near, Extract Values to Points, Reclassify, Select by Attribute


Map 1. Habitat types in Deschutes NF.

I started with a habitat type layer (Map 1) and DEM layer obtained from the Forest Service of Deschutes National Forest.

First, I needed a map showing aspect. To do this, I used the Aspect tool in the Spatial Analyst toolbox which used my DEM layer to create an aspect map of the area as shown in Map 2.

Once I had my aspect layer, I needed to identify areas within the habitat layer that were meadows. I used the Extract by Attribute tool to select only habitat classified as ‘meadow’. Once the habitat type layer showed only meadows, I used the Raster to Point tool to generate a point in each meadow where I could store future information, as seen in Map 3.

From there, I added the roads layer, also obtained from the Forest Service. Roads were needed because my sites could only be < 300 meters from the nearest road, so I used the Near tool to generate the distance from each point (meadow) to the nearest road. I did the same with my raster layer of waterbodies to make sure sites were in similar proximity to water, since that could influence reptile diversity.

Once the points (meadows) had information regarding proximity to roads and water, I then used the Extract Values to Points tool to obtain elevation data for each point within a meadow, because they have to be at similar elevations.

At this point, I had most of the information I needed regarding similarity between possible meadows. Next, I reclassified my aspect layer to show only north and south aspects (as seen in Map 4), because those were the only aspects I needed within each meadow. From there, I opened the attribute table for my meadow points and used the Select by Attribute option to show only sites that were < 300 m from a road and < 1800 m in elevation (see Map 5). Once I had narrowed down the possible sites, I manually checked every remaining meadow to see whether north and south aspects were present within the open meadow and within the bordering forest.
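The Select by Attribute step amounts to a simple filter; sketched here in R with hypothetical column names and values:

```r
# Hypothetical meadow points with road distance (m) and elevation (m)
meadows <- data.frame(id        = 1:5,
                      dist_road = c(120, 450, 80, 250, 600),
                      elev      = c(1500, 1700, 1900, 1650, 1400))

# Keep only meadows < 300 m from a road and < 1800 m elevation
candidates <- subset(meadows, dist_road < 300 & elev < 1800)
candidates$id   # meadows passing both criteria
```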

Results: All of this resulted in 4 suitable sites that I will use for one of my research projects, determining reptile diversity in meadows. My 4 suitable sites can be seen in Map 6.

Critique: The way I did this worked, but I feel like there may be an easier way. This method took me a few weeks and a lot of frustration working with many types of files. I had never done site suitability analysis before, and it proved to be harder than it seemed.

Map 2. Aspect created with ArcMap.




Are meadows significant biodiversity hotspots for reptiles compared to forests?

Tools/Methods: For practice I made artificial data of the kind I would potentially obtain from this field experiment. I practiced in RStudio, because logistic regression isn’t possible in ArcMap, and most of my data will likely require logistic regression.

Critique: For the data, I coded the meadows as 1’s and the forest plots as 0’s, so my Y variable was binary while my X variable, species diversity, was continuous. This method is useful for logistic regression: it is pretty straightforward and gives you results.

Results: My artificial data set had a very small sample size, so it makes sense that the p-value was not significant. Overall, the results matched what I expected from looking at the artificial data I made.

y = f(x)

> glm(type~diversity)

Call:  glm(formula = type ~ diversity)


Coefficients:

(Intercept)    diversity

0.006173     0.098765

Degrees of Freedom: 7 Total (i.e. Null);  6 Residual

Null Deviance:     2

Residual Deviance: 0.4198     AIC: 5.123
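One caveat on the call above: `glm()` defaults to the Gaussian family, so as written it fits an ordinary linear regression rather than a true logistic regression (the null deviance of 2 is just the total sum of squares of the eight 0/1 responses). A binomial fit would look like the sketch below, with made-up data in place of the field data:

```r
# Made-up data: 1 = meadow, 0 = forest, plus a continuous diversity index
type      <- c(1, 1, 1, 1, 0, 0, 0, 0)
diversity <- c(5, 6, 3, 7, 2, 4, 1, 3)

# family = binomial makes this a true logistic regression
fit <- glm(type ~ diversity, family = binomial)
summary(fit)
```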

Map 3. Raster to Points

Map 4. Reclassified aspects. North and south only.

Map 5. Select by attributes. Meadows <300m from road and <1800m elevation.

Map 6. Results. Meadow sites being used for study.

Tutorial 2: Examining Hydrograph Relationships

Filed under: 2017,Tutorial 2 @ 2:12 pm


How does upstream-downstream connectivity, drainage area, and position within the catchment affect the relationship between different stage gage sites?


I chose to look at the relationships between the hydrographs for four sites that have upstream-downstream connectivity but have different upstream drainage areas and are located on different tributaries. I used ArcGIS to obtain the drainage areas and Excel to analyze the hydrographs. The four sites are as follows: Site 26 is the site with the largest drainage area, on Mainstem Mill Creek. The “Mainstem” site is also located on the mainstem, with a slightly smaller drainage area. Site 36 and South Fork are located on different tributaries, Gunn Creek and South Fork Mill Creek respectively, but have approximately the same drainage area. This drainage area is significantly smaller than either Site 26 or Mainstem (Figure 1).


First I used NetMap’s Watershed Delineation tool to create upstream drainage area polygons for each of the four sites I was interested in, using the attribute table to obtain the total area of each polygon (Table 1).

Then I graphed the difference in stage heights over time between Site 26, the largest site, and each of the other three sites to examine patterns in the hydrographs, creating three “stage difference” graphs. I removed any periods of missing data (especially a two-week period in February at the Mainstem site). I also graphed the raw hydrograph data for comparison (Figure 2).

Finally I combined the three stage difference graphs into one plot for easy comparison.
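The stage-difference series itself is a simple subtraction; a sketch in R with synthetic hourly stage records standing in for the gage data:

```r
# Synthetic hourly stage records (m) for two gages over 30 days
set.seed(1)
t      <- 1:(24 * 30)
site26 <- 1.0 + 0.50 * sin(t / 50) + rnorm(length(t), sd = 0.02)
ms     <- 0.9 + 0.45 * sin(t / 50) + rnorm(length(t), sd = 0.02)

# "Stage difference" series: largest-drainage site minus another site
diff_26_ms <- site26 - ms
plot(t, diff_26_ms, type = "l",
     xlab = "time (h)", ylab = "stage difference (m)")
```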


First I compared the hydrograph patterns of MS and SF since they are on the same tributary. Both graphs showed little difference with Site 26 during baseflow conditions. The 26 v SF graph had high positive peaks during floods which indicates that there is a large source of water going through Site 26 that is not being accounted for in the SF hydrograph. The 26 v MS graph was relatively flat which makes sense since Site 26 and MS have more similar drainage area sizes and are expected to behave more similarly since the amount of water going through them is more similar and they are closer together geographically. However, the MS site still had some high peaks during the two highest floods which indicates other sources of water (Figure 3).

Then I compared the reactions of MS and 36, since they are a similar distance from 26 but have different drainage areas. The graph looked very similar to the MS vs. SF graph, which suggests that drainage area is one of the driving factors in the shape of the hydrograph (Figure 3).

Finally I compared 36 and SF since they have similar drainage areas but are on different tributaries that both drain into Site 26. The two graphs were very similar in shape which further enforced the fact that drainage area was a driving factor. The two tributaries seem to be functioning relatively similarly despite differences in geographic location. 36 was closer to 26 during baseflow conditions which makes sense since they are closer together in terms of streamwise distance and therefore the floodwave changes less between the two points (Figure 3).


When examining the patterns in these graphs, it is obvious that there is a portion of the hydrograph that is still missing. Most of the variability in peak flows is accounted for in the Mainstem hydrograph, but there are still minor peaks during the big floods. Little of the variability is accounted for in the Site 36 and South Fork hydrographs. Looking at the map, I hypothesize that North Fork Mill Creek, which is not instrumented, represents the differences in the hydrographs that are currently not being accounted for. It would be interesting to install a LevelLogger on this tributary in the future to see if it completes the picture.

This was an interesting analysis that allowed me to further explore my data and how my study basin reacts to precipitation events. It was useful in illustrating the influences of drainage area and stream connectivity on hydrographs in different parts of the basin, and pinpointing where there is still potentially missing data that could give a more complete picture of my watershed. It is still a pretty qualitative analysis in that I could not derive any real statistics from what is happening with my hydrographs, but it paints a clear picture of what’s going on and inspires further questions.

Tutorial 2: Automated Autocorrelation of Tsunami Inundation Time Series at Various Points of Interest

Filed under: Tutorial 2 @ 12:11 pm


Tsunamis inundate shorelines in a chaotic manner, resulting in varying degrees of flooding from location to location. The spatial variability of tsunami hazards has important implications for the tsunami-resistant design of structures and for evacuation plans, because different safety margins may be required to achieve consistent reliability levels. Currently, tsunami mitigation strategies are primarily informed by prediction models due to the lack of nearshore field data from tsunami disasters. Thus, to investigate the spatial variability of tsunami hazards, we analyze the relationship between different test locations, or points of interest (POIs), in our domain. The question is:

“How much does tsunami inundation vary spatially?”

Essentially: if one observes a certain flooding depth at their location, could they assume that a similar flood level is occurring elsewhere? How does this change with time? And how do certain coastal features affect this?


To answer this question, I examined the autocorrelation of the time series of inundation at various locations as well as the difference in inundation levels between these locations. In this case, 12 locations were selected and thus the process to analyze all of these locations was automated. All of this was implemented in R.


As before, the tsunami inundation data is stored in a large NetCDF file. Thus we use the same procedure as mentioned in Tutorial 1 to read the NetCDF file into R. This process requires the “ncdf4” package to be installed.

For this analysis, we begin by selecting the points of interest along the coast near Port Hueneme, California. These points were arbitrarily selected but were also ensured to be somewhat well distributed along the coastline and were within the domain of the dataset. Figure 1 shows the locations of the points of interest (POIs) along the coast of Port Hueneme, CA. A total of 12 points were selected and were all meant to represent some building or port location. Their names and geographical coordinates were stored in a .csv file as shown in Figure 2.


Figure 1 – POIs where time series of inundation were extracted along coast near Port Hueneme, CA

Figure 2 – CSV file storing locations and their geographic coordinates

The .csv file was read into R using the “read.csv” function, and the table was subsequently converted to a data frame, as shown by the code in Figure 3. Next, the “which.min” function from the “stats” package was used in a for loop to find the grid indices nearest each POI. A loop was then used to extract the different time series and plot them accordingly. Figure 4 shows the time series of inundation for each of the locations stored in the CSV file. Note that the first 3 locations are in the ocean and the rest are on land.

The time series data were stored in a separate data frame, and then the “acf” function from the “stats” package in R was run in a loop to obtain the autocorrelation values. This was followed by another loop, with a nested loop, to take the difference in inundation values between each pair of locations; the autocorrelation function was applied to these differences as well. The code for these procedures is shown in Figure 3.
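Because Figure 3 shows the code only as a screenshot, here is a self-contained sketch of the same steps, with a small synthetic grid standing in for the NetCDF data and hypothetical POI coordinates:

```r
set.seed(1)
# Synthetic stand-in for the NetCDF grid: lon x lat x time
lon <- seq(-119.30, -119.10, length.out = 50)
lat <- seq(34.10, 34.20, length.out = 40)
eta <- array(rnorm(50 * 40 * 200), dim = c(50, 40, 200))

# POIs as they would come from the .csv file (coordinates illustrative)
pois <- data.frame(name = c("Port Hueneme", "Navsea 1388"),
                   lon  = c(-119.21, -119.20),
                   lat  = c(34.15, 34.16),
                   stringsAsFactors = FALSE)

# Nearest grid index for each POI via which.min, then extract its series
ts_list <- list()
for (i in seq_len(nrow(pois))) {
  ix <- which.min(abs(lon - pois$lon[i]))
  iy <- which.min(abs(lat - pois$lat[i]))
  ts_list[[pois$name[i]]] <- eta[ix, iy, ]
}

# Autocorrelation of one series, and of a pairwise difference series
a1 <- acf(ts_list[["Port Hueneme"]], plot = FALSE)
a2 <- acf(ts_list[["Port Hueneme"]] - ts_list[["Navsea 1388"]], plot = FALSE)
```

With the real data, the synthetic `lon`, `lat`, and `eta` objects would instead come from `ncdf4::ncvar_get` calls on the opened NetCDF file.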

Figure 3 – R code used to read in .csv file and procedures to perform autocorrelation analysis

Figure 4 – Time series of inundation for each location shown in Figure 1


Figure 5 shows the autocorrelation plot for each of the inundation time series shown in Figure 4. These plots show that there are indeed variations between the different locations. The autocorrelation plots also appear to vary more as the POIs get further away from one another.

Figure 5 – Autocorrelation plots of the inundation time series from Figure 4

Figures 6-8 show some examples of the autocorrelation plots of differences in inundation level between two locations. Figure 6 shows the autocorrelation of the difference in inundation level between Port Hueneme and each of the other locations, with a placeholder image in place of the difference of Port Hueneme with itself (which doesn’t exist). Figure 7 shows the same plot for Del Monte Produce, and Figure 8 for the naval base Navsea 1388.

There are 9 more of these plots that were created by this process, but in the interest of not inundating this report, only 3 are shown. Each of these figures shows rather interesting differences for the different locations.

Figure 6 – Autocorrelation plots between difference in inundation levels for all locations relative to Port Hueneme

Figure 7 – Autocorrelation plots between difference in inundation levels for all locations relative to Del Monte Fresh Produce

Figure 8 – Autocorrelation plots between difference in inundation levels for all locations relative to Navsea 1388

Critique of Method

The method provided a very convenient way of viewing the different locations and their associated autocorrelations with the touch of a button. POIs can be easily added or removed by modifying the CSV file, and the source code adjusts accordingly without modification. This feature has already been tested and shown to be quite robust for handling any number of locations with coordinates within the data domain, so the procedure is efficient and effective. Overall, I was very pleased with the operational capability of this method.

Observing the autocorrelations for each of the locations, and for the inundation differences between them, was quite interesting. From a glance, I could immediately tell that there were differences between the POIs. I think this information will indeed be quite useful, though interpreting these types of autocorrelation plots will require some additional thought on my part.

May 21, 2017

Geographically Weighted Regression Analysis of Tropical Forest

Filed under: 2017,Tutorial 2 @ 10:10 pm
  1. Question that you asked
  • Is crown width linearly related to height in space in the study area?
  2. Name of the tool or approach that you used.
  • I used the Geographically Weighted Regression (GWR) tool in ArcMap to assess the linear relationship between crown width and height across the study area. Since the data have not been georeferenced, the Height is the total of the tree height and the elevation (ground height).
  • GWR constructs a separate equation for every feature in the dataset incorporating the dependent and explanatory variables of features falling within the bandwidth of each target feature. The shape and extent of the bandwidth are dependent on user input for the Kernel type, Bandwidth method, Distance, and Number of neighbors’ parameters with one restriction: when the number of neighboring features would exceed 1000, only the closest 1000 are incorporated into each local equation. GWR should be applied to datasets with several hundred features for best results. It is not an appropriate method for small datasets.
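The ArcMap tool hides the mechanics, but the core GWR idea can be illustrated in a few lines of base R: fit a weighted least-squares regression at each feature, with Gaussian kernel weights that decay with distance under a fixed bandwidth (all data below are synthetic):

```r
set.seed(1)
n <- 200
x <- runif(n); y <- runif(n)                     # tree coordinates
crown  <- runif(n, 2, 10)                        # crown width
height <- 20 + (x - 0.5) * 5 * crown + rnorm(n)  # local slope varies in space

bw <- 0.2                                        # fixed kernel bandwidth
local_coef <- numeric(n)
for (i in 1:n) {
  d <- sqrt((x - x[i])^2 + (y - y[i])^2)         # distance to target feature
  w <- exp(-(d / bw)^2)                          # Gaussian kernel weights
  fit <- lm(height ~ crown, weights = w)
  local_coef[i] <- coef(fit)[["crown"]]          # local crown-width slope
}
range(local_coef)  # slopes vary across the study area
```

Mapping `local_coef` back onto the feature coordinates is the analogue of the coefficient surface symbolized later in this post.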


  3. Brief description of steps you followed to complete the analysis.
  • The first step is to load the dataset into ArcMap. As a reminder, this is the same dataset I used in exercise 2, generated from UAV visual images. Agisoft PhotoScan was used to make a 3D point cloud from the UAV images, and the point cloud was then used in the Fusion software to derive forest structure measurements, such as number of trees, height, and canopy width. The result of this processing is a table of 804 detected trees with their height and crown width. Unfortunately, because the images were not yet georeferenced, the height in this tutorial is the sum of tree height and elevation.
  • The second step is to open the Geographically Weighted Regression (GWR) tool, which will appear as in Figure 1 below. We then choose the input features, dependent variable, and explanatory variable. More than one explanatory variable can be chosen, but in this exercise I chose only one. My dependent variable is Height (ground height + tree height), and my explanatory variable is crown width.

Figure 1. Geographically Weighed Regression Tool.

  • Next, we can choose the kernel type, bandwidth method, and number of neighbors. By default, the kernel type is “Fixed”, the bandwidth method is AICc, and the number of neighbors is 30. I decided to use the defaults because I think they are appropriate for my dataset; in particular, it is important to select Fixed for the kernel type whenever distance is a component of the analysis, as it is for mine.
  • To finish the analysis, just click “OK”. The whole result can be seen in the attribute table (see Figure 2). ArcMap shows the residual standard deviation (RSD) for each feature in a new layer, with colors indicating the range of the RSD (see Figure 3).

Figure 2. Attribute table of Geographically Weighed Regression result.


Figure 3. Distribution of Residual Standard Deviation in study area.


  • However, it is often more interesting and useful to see the coefficient of the explanatory variable instead of the residual standard deviation (RSD). To do this, we need to make some changes in the layer properties: right-click on the layer and choose “Properties”. Under “Symbology”, we can change the field value from the residual standard deviation to the coefficient of the explanatory variable (see Figure 4). We can also change the number of classes and the colors; in my case, I chose two classes to distinguish features that are positively and negatively related. The adjusted layer can be seen in Figure 5.

Figure 4. Layer properties:Symbology.



Figure 5. Distribution of coefficient of the explanatory variable.


  4. Brief description of results you obtained.
  • The results can be seen in Figures 2, 3, 4, and 5. The attribute table shown in Figure 2 contains all of the values related to the regression analysis: observed and predicted values, the coefficient of the explanatory variable, the intercept, standard error, residual standard deviation, etc. Overall, crown width and height are positively related, meaning the height increases with every one-unit increase in crown width; in other words, the bigger the crown, the taller the tree.
  • However, if we examine the result for each individual feature (tree), some trees have a positive linear relationship between Crown width and Height (ground height + tree height), while others have a negative one. The distribution of trees with positive and negative relationships can be seen in Figure 5. Red points indicate trees with a negative linear relationship, meaning trees with bigger crowns tend to be shorter. Blue points indicate trees with a positive linear relationship, meaning trees with bigger crowns tend to be taller. White points indicate trees whose coefficients, whether positive or negative, are close to 0.
  • These differences in the sign of the linear relationship may arise from several factors, such as different tree species, different elevations, or error introduced by the Fusion software analysis. Tropical forests contain hundreds of tree species with different characteristics: some have big crowns and great height, while others have big crowns but low height. Elevation can have a significant effect because the Height used in this case is the sum of ground height (elevation) and tree height. Trees with a positive linear relationship might be distributed in areas of higher elevation (hills, mountains, or peaks), while trees with a negative linear relationship might be distributed in areas of lower elevation (drainages or valleys).
  • The R-squared is quite low, with most features (trees) having an R-squared below 0.2. To improve the regression analysis, I think I need to georeference the data: because Height is the sum of ground height (elevation) and tree height, the elevation component can distort the regression model (intercept, coefficient, etc.) between Crown width and Height.
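The elevation problem above can be illustrated with a small synthetic sketch (hypothetical numbers, not the tutorial's LiDAR data): when the response variable is ground elevation plus tree height, the elevation term acts as noise uncorrelated with crown width and drags the R-squared down.

```python
import numpy as np

# Synthetic illustration: crown width predicts tree height well,
# but elevation (unrelated to crown width) swamps the signal when
# the response is total height = elevation + tree height.
rng = np.random.default_rng(1)
crown = rng.uniform(2, 12, size=500)                  # crown width (m)
tree_h = 2.5 * crown + rng.normal(0, 2, size=500)     # tree height (m)
elevation = rng.uniform(0, 300, size=500)             # ground height (m)

def r_squared(x, y):
    slope, intercept = np.polyfit(x, y, 1)            # ordinary least squares
    resid = y - (slope * x + intercept)
    return 1 - resid.var() / y.var()

r2_total = r_squared(crown, elevation + tree_h)   # response = total height
r2_tree = r_squared(crown, tree_h)                # response = tree height only
```

With these hypothetical numbers, the fit against tree height alone recovers a high R-squared while the fit against total height is near zero, which is consistent with the low R-squared observed when the raw (elevation-contaminated) Height field is used.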
  2. Critique of the method – what was useful, what was not?
  • The method is really useful for generating a regression model for every feature (in my case, more than 800 trees). Because the output is a new layer that displays values such as the coefficient, standard error, and residual standard deviation across the study area, it helps in understanding how trees with different values are distributed in space.
  • However, the dependent and explanatory variables must be numeric fields containing a variety of values. In addition, linear regression methods such as GWR are not appropriate for predicting binary outcomes (e.g., when all values of the dependent variable are either 1 or 0). The regression model will also be misspecified if it is missing a key explanatory variable.

Tutorial 2: Calculate Burn Severity Patch Sizes with SDMtools in R

Filed under: Tutorial 2 @ 4:29 pm


Patch dynamics are often examined under the lens of single disturbance events, but contemporary disturbance regimes demonstrate that large-scale disturbances can overlap in time and space.  The spatial configuration of patches from an initial disturbance may influence the spatial configuration of patches from a sequential disturbance.  The broader research question is whether there is a relationship between patches from an initial disturbance (beetle outbreak) and a sequential disturbance (wildfire). Thus far, I have conducted analyses to determine whether there is any inherent patchiness in the burn severity raster and the beetle-outbreak raster using semi-variograms and cross-variograms; both indicated some level of patchiness. Here I go back and characterize the patches in the burn severity map to understand the distribution of patch size across the burn mosaic for the Entiako fire, which burned in 2012 in central interior British Columbia. I asked the following question:

What is the variability in patch size for burn severity classes?


The burn severity raster generated for Tutorial 1 was used for this exercise. Initially, I planned to conduct the analysis in FRAGSTATS, a standalone software package that assesses landscape spatial patterns, but the raster proved unreadable in the commonly used GeoTIFF format.  I ended up conducting the analysis in R version 3.3.2 (R Core Team 2016) with the SDMtools package (VanDerWal 2015).   The SDMtools package is based on the FRAGSTATS software package of McGarigal et al. (2012) and calculates many of the same metrics. The following spatial equation was considered as a guide:

In order to calculate patch metrics, I converted the continuous raster into a classified raster.  The classification was based on break-point values defined by Reilly et al. (2017).  The classes were low, moderate, and high burn severity, with unburned areas included in the low class.  Using the SDMtools package, I calculated patch metrics for the burn severity classes across the landscape and for each individual burn severity class (Figure 1). The following PDF includes step-by-step instructions with R code for generating the patch metrics:



Figure 1: Burn Severity Classes for Patch Metric calculations.
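The reclassification step described above (and the floating-point-to-integer conversion it implies) can be sketched in Python. The break values below are placeholders, not the actual thresholds from Reilly et al. (2017), and the severity values are a hypothetical toy raster:

```python
import numpy as np

# Hypothetical continuous severity raster (floating point, as delivered).
severity = np.array([[ 50.0, 120.0, 300.0],
                     [180.0, 400.0, 610.0],
                     [ 90.0, 220.0, 500.0]])

# Placeholder break points between low/moderate and moderate/high severity.
breaks = [200.0, 450.0]

# np.digitize returns 0 below the first break, 1 between, 2 above;
# adding 1 gives integer classes: 1 = low (incl. unburned), 2 = moderate, 3 = high.
classes = np.digitize(severity, breaks) + 1
```

Writing `classes` back out as an integer raster is exactly the kind of conversion that makes the layer readable by patch-metric tools that reject floating-point input.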


In the SDMtools package, I ran two analyses, one on the whole landscape and one on each burn severity class. In the first analysis, I found that high burn severity was the smallest component of this fire event, accounting for only about 15 percent of the landscape. This was interesting because lodgepole pine is often characterized by ‘stand replacing’ high-severity fire; however, moderate burn severity, which covered about 45 percent of the landscape, can also represent ‘stand replacing’ fire in lodgepole pine dominated landscapes.  Table 1 also includes the number of patches and total area in hectares for each burn class.


Table 1: Descriptive statistics for burn severity classes at the landscape scale.

Burn Severity Class Number of Patches Total Area in Hectares Proportion of Landscape
Low 382 2767.14 0.394
Moderate 448 3179.79 0.452
High 422 1082.70 0.154


The second analysis examined the patch metrics for each burn severity class. Table 2 includes patch area metrics by burn severity class.  While the output includes many other variables, I wanted to look at the variability in patch size across burn classes. The high burn severity maximum and mean patch sizes are the most interesting part of Table 2 because they are quite different from those of the low and moderate classes.  Further statistical analysis is needed to determine whether the numbers are significantly different.


Table 2: Patch area metrics for burn severity classes (patch sizes in hectares)

Burn Class Number of Patches Minimum Patch Size Maximum Patch Size Mean Patch Size Median Patch Size SD
Low 382 0.09 1449.45 7.24 0.18 77.7
Moderate 448 0.09 1461.69 7.10 0.18 75.3
High 422 0.09 445.95 2.57 0.18 22.5
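The per-class patch-size statistics in Table 2 can be reproduced in outline with scipy.ndimage as a rough Python analogue of the SDMtools patch statistics (the 0.09 ha cell area assumes 30 m cells, consistent with the 0.09 ha minimum patch size in the table; the raster below is a toy example):

```python
import numpy as np
from scipy import ndimage

def patch_sizes(class_raster, cls, cell_area_ha=0.09):
    """Patch sizes (ha) for one class, using the 8-cell neighborhood rule."""
    mask = class_raster == cls
    eight = np.ones((3, 3), dtype=int)          # diagonals count as connected
    labels, n_patches = ndimage.label(mask, structure=eight)
    counts = np.bincount(labels.ravel())[1:]    # cells per patch (drop background)
    return counts * cell_area_ha

# Toy classified raster (1 = low, 2 = moderate, 3 = high).
r = np.array([[1, 1, 2],
              [3, 3, 2],
              [1, 3, 2]])

sizes = patch_sizes(r, 1)   # low-severity patches
stats = {"n": len(sizes), "min": sizes.min(),
         "max": sizes.max(), "mean": sizes.mean()}
```

Running this for each class value and summarizing `sizes` (min, max, mean, median, SD) yields the same kind of table as Table 2.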


Critique of the method – what was useful, what was not?

This analysis attempt was much more complicated than anticipated. The complications mostly came from the raster format.  The raster was formatted as floating point, which was compatible with neither the standalone FRAGSTATS software nor the Patch Grid extension in ArcGIS. The Patch Grid extension for ArcGIS is based on FRAGSTATS and generates similar patch metrics (Rempel et al. 2012). I did not realize the issue was related to the floating-point raster until after I figured out how to run the SDMtools package in R (VanDerWal 2015), which generates outputs similar to FRAGSTATS and is based on the work of McGarigal et al. (2012). I troubleshot the raster issue, which led me to convert the raster from floating point to integer in R.  Additionally, a major challenge was figuring out how patches are defined in the SDMtools package. FRAGSTATS and Patch Grid both offer an option to define the cell neighborhood rule as 4 or 8; this option was unclear to me in the SDMtools package. Based on a brief comparison of outputs from FRAGSTATS and SDMtools, I believe the patch identification tool in SDMtools classifies patches based on the 8-cell neighborhood rule. Lastly, I think that calculating patch statistics in R is less cumbersome and allows for better documentation of methods.
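The 4-cell versus 8-cell neighborhood question can be checked directly: two cells that touch only at a corner form one patch under the 8-cell rule but two patches under the 4-cell rule. A minimal Python check with scipy.ndimage (whose default, like some patch tools, is the 4-cell rule):

```python
import numpy as np
from scipy import ndimage

# Two cells touching only at a corner.
mask = np.array([[1, 0],
                 [0, 1]])

_, n4 = ndimage.label(mask)                                   # default: 4-cell rule
_, n8 = ndimage.label(mask, structure=np.ones((3, 3), int))   # 8-cell rule

print(n4, n8)   # prints: 2 1
```

Running a small corner-touching test raster like this through SDMtools and comparing the patch count against both results is a quick way to confirm which neighborhood rule a package uses.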



McGarigal, K., A. Cushman, and E. Ene. 2012. FRAGSTATS v4: Spatial pattern analysis program for categorical and continuous maps. University of Massachusetts, Amherst, Massachusetts.

Reilly, M. J., C. J. Dunn, G. W. Meigs, T. A. Spies, R. E. Kennedy, J. D. Bailey, and K. Briggs. 2017. Contemporary patterns of fire extent and severity in forests of the Pacific Northwest, USA (1985-2010). Ecosphere 8:e01695.

Rempel, R. S., D. Kaukinen, and A. P. Carr. 2012. Patch Analyst and Patch Grid. Ontario Ministry of Natural Resources. Center for Northern Forest Ecosystem Research, Thunder Bay, Ontario.

R Core Team. 2016. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

VanDerWal, J. 2015. Species Distribution Modelling Tools: Tools for processing data associated with species distribution modeling exercises.


April 5, 2017

Tutorial 2

Filed under: 2017,Tutorial 2 @ 8:15 pm

This is the blog page for Spring 2017 Tutorial 2 posts
