GEOG 566

         Advanced spatial statistics and GIScience

Archive for Tutorial 2

June 2, 2017

Processing Multispectral Imagery from an Unmanned Aerial Vehicle For Phenotyping Seedlings in Common Garden Boxes: Part 2

Filed under: 2017,Tutorial 2 @ 1:52 pm


Consolidating Spectral Variables – PCA

Before any comparisons could be made between spatial and spectral variables in the study, I needed to look a bit closer at each set of variables. Retaining a large number of spectral variables in my models would make them much more complicated, therefore it would be advantageous to reduce that number. Based on a correlation matrix constructed using the R function ‘corrplot’ (Figure 14), there are several strong correlations between variables. In fact, the red edge band seems to have strong positive correlations with the other 4 Micasense Rededge bands. This clued me in that principal components analysis (PCA) could be used to condense them.

Figure 14: Correlation matrix between mean crown response for 5 spectral bands from Micasense Rededge sensor (blue, green, red, near ir, red edge) as well as two vegetative spectral indicies (NDVI and TGI).

Though I knew I wanted to do PCA to reduce the number of spatial variables in my dataset and had some evidence that it could work, it was necessary to decide how many components should be retained. By plotting the Eigen values in a Scree plot using R package ‘doParalell’ (Figure 15), it became evident that retaining three principal components would be appropriate.

Figure 15: Scree plot for 7 spectral variables. Results indicate that 3 principal components should be retained for principal components analysis based on the shape of the curve and the Eigen value = 1 line.

The R package ‘Psych’ was used to carry out PCA for 3 components on the 7 spectral variables. Results (Figure 16) indicate that the model could account for 94% of the total variance in the data. The principal components RC1, RC2, and RC3 are representative mostly of green, infrared, and blue, respectively. In short, the spectral dataset can be consolidated from 7 variables to 3 without losing much precision.

Figure 16: Output for 3 factor principal components analysis to summarize 7 spectral variables. Results show that retaining these three components would explain 94% of the variation from the original model.

Choosing Spatial Variables – Hot Spot Analysis and GWR

Though several spatial variables had been added to the data, it was unclear what relationship those variables had with the spectral data. For preliminary investigation, an ordinary least squares (OLS) model was built in ArcMap with NDVI as the dependent variable and all 4 spatial variables as explanatory variables. The results of the variable distributions and relationships plot from the OLS output (Figure 17) suggest that box number is the spatial variable most likely to have a significant effect on spectral data. Both box and plot center seem to be slightly skewed in terms of their residual plots, and nearest neighbor appears to be normally distributed.

To further inform this analysis, hot spot analysis (Getis- Ord Gi*) was carried out for the NDVI variable (Figure 18). Results demonstrate clear spatial trends. The most obvious disparity lies between the boxes. There also some evidence that position within the boxes has an effect, as the healthiest seedlings tend to appear in the center of boxes, where the least healthy seedlings tend to appear around the edges.

Figure 17: Variable and residual distributions for spatial variables in the study. Note that nearest neighbor (nn), box center, and plot center appear to have randomly distributed residuals, where box # (box_) does not.

Figure 18: Hot Spot analysis of seedlings based on NDVI shows spatial irregularities between and within boxes. This figure serves as a baseline for regressions that follow and suggests that both box # and distance to center of box may be significant covariates.

Using geographically weighted regression (GWR) in ArcMap, I was able to look at how NDVI variability changed with the inclusion of each of my spatial variables as effects. Because the box # seemed to have the greatest effect, I started by modeling it alone as a predictor of NDVI. The result (Figure 19) shows a large visual improvement in the randomness of variability of the data when compared to the hot spot analysis.  Adding box center to the model (Figure 20) only had a marginal effect on the result, though some outliers at the east and west extremes of the site were corrected.

Figure 19: GWR of NDVI by box. Reduction in number of severe outliers in box centers suggests that box effect is likely significant.

Figure 20: Adding distance to box center to model had only a small effect. The distribution of variability in the regressed model seems to be much more uniform and, presumably, closer to the corrected NDVI values without the spatial effects of box number and box center.

Though there is still much work to be done, this workflow sets the foundation for processing multispectral imagery that has or may have spatial biases and proposes ways to evaluate and work though them. Many applications exist for this type of work, but in the case of my study, the hope is to use spatially corrected NDVI values to classify seedlings based on their drought response, which could end up looking very similar to Figure 21.

Figure 21: Potential application for spatially corrected data: cluster and outlier analysis to identify individuals and groups of individuals with unexpected NDVI values compared to predictions based on spatial variables. Seedlings in high clusters (pink) or high outliers (red) could be strong candidates for genetic drought resistance.

May 26, 2017

A space for time substitution on an impacted landscape

Filed under: Tutorial 2 @ 1:49 pm


Has the total daily energy flow through a functional group at a given elevation changed over the eighty-year interval between 1931 and 2011?


I am hoping to better understand the forces structuring small mammal communities, specifically rodents, across space and through time. In this regard, I am using trap effort-standardized abundance data to evaluate the response of species and functional groups to changes anthropogenic land use practices and climate warming. These data are spatially explicit, and although their spatial coordinates reflect the patterns of trapping, their elevational distribution captures patterns that are reflective of biological responses to a changing environment. Increasing elevation is inversely correlated with summer high temperatures and collinear with winter low temperatures, and positively correlated with increasing precipitation. There is also a trend of decreasing human impact due to land use with increasing elevation, and in the Toiyabe mountain range, human land use for cattle grazing and agriculture has also decreased over time. However, climate warming in the Great Basin has continued over the same time span and has shifted climatic bands upslope.

Species and functional groups respond to these change overtime in dynamic ways which may be recapitulated in their distribution along elevation/ climate gradients. This is referred to as the space for time hypothesis, and generally states that space, which captures the temporal gradient of interest, may be used to stand in for time; with the expectation that species distributions across space reflect their response to the variable of interest through time.

I am interested in the functional groups captured by diet preference, habitat affinity, and body size. Diet preference classes include herbivores, granivores, and omnivores, habitat affinity includes xeric adapted, mesic adapted, and generalist, and body size includes four size bins, 0-30g (size class 1), 40-80g (size class 2), 100-200g (size class 3), and 250-600g (size class 4). Any given species may be any combination of these three functional classes, although not all combinations are actually represented. Thus, patterns in the distribution of these functional groups also reflect tradeoffs that species face, and help to highlight species that may be at risk when functional categorizations become decoupled from environmental conditions. Thus, changes in the elevational distribution of functional groups over time provides key insights to species responses. However, total daily energy flow can also be estimated for these groups, because energy flow through a group is a function of both body size and abundance. This helps us to understand the interactions between abundance, body size, diet, and habitat affinity.

Thus, the question becomes, has the total daily energy flow through a functional group at a given elevation changed over the eighty years between the modern and historic; and based on the cross correlations between body size, diet, and habitat affinity, are their species or groups that appear especially at risk of local extirpation?

Analytical approach and brief description of steps followed to complete the analysis:

To perform this analysis, I used Cran-R to trap effort standardize abundance data for rodent species in the Toiyabe mountain range for both historical and modern data sets. The function in R is an iterative process which resamples all trap sites within the historic and modern data sets that have a trap effort greater than the site with the lowest number of trap nights. This process is automated using R, and generates an average species occurrence from 1,000 iterations. I then used excel to calculate the daily energy flow through each population based on body mass estimates and an equation which employs the principals of allometric scaling, Ei = aNiMib. Where Ei is energy flow (kJ/day), a and b are allometric parameters, specifically, b is the scaling parameter and equal to 0.75. N is the abundance estimate, in this case the trap effort standardized abundance, and M is the estimated body mass (Terry and Rowe 2015).

All trap localities were paired for the historic and modern time-periods, and traps site elevations were binned into three elevation ranges, bellow 5500ft, 5500-6500ft, and above 6500ft. Energy flow for each functional group was calculated and plotted based on the above stated elevation bins.

Brief description of results you obtained:

Total energy flow per day though the modern rodent community in the Toiyabe mountain range is lower than in the historic time period (Figure 1). This finding is consistent with findings by Terry and Rowe, 2015, that total energy through the rodent community in the Great Basin decreases sharply at the Holocene-Anthropocene boundary.

Using climate-elevation correlations we can substitute elevation for time in the Holocene to model the warming climate from the early to late Holocene in the Great Basin. Temperatures increase with decreasing elevation, and precipitation decreases with decreasing elevation. Historically, energy flow through xeric adapted species decreases with increasing elevation, while the opposite is true for mesic adapted species. Notably, energy flow through the historic mesic adapted species increases in a linear fashion with elevation, while in the modern energy flow through this group roughly plateaus at the middle elevation (Figure 2). While both modern and historical patterns are generally consistent with Holocene trends for these functional, neither captures the distinctive decline in energy flow through the xeric adapted group during the modern. This suggests that space for time comparisons may be valid for pre-human impact studies, but that they may not capture abrupt shifts in species responses during the Anthropocene.

Small bodied, omnivorous, habitat generalists demonstrate greatest energy flow at middle elevation in both the historical and modern time-periods, however, total energy flow through these groups are markedly decreased in the modern compared to the historic time periods at these elevations. While generalist demonstrate lower energy flow at the middle and high elevations for the modern compared to the historic, their low elevation values remain unchanged (Figure 2). This is consistent with omnivorous, habitat generalist through the Holocene and into the Anthropocene (Terry and Rowe 2015).

            Finally, herbivores demonstrate a nearly four-fold decrease in energy flow at the lowest elevations over the eighty-year period between the historic and modern time-periods (Figure 2). This is the only result that seems to strongly reflect the abrupt shift from the Holocene to the Anthropocene. This is a particularly surprising result as human land use was greater historically for this mountain range than it is today, and ground cover by herbaceous plants and grasses has increased over this time-period. This may suggest that increased temperatures at lower elevations have pushed herbivorous species upslope, indicating that these species may be more sensitive to climate. Conversely, energy flow thought granivores at low elevations increased from historic to the modern, also consistent with Terry and Rowe, 2015. This however, may be expected, as granivorous rodents in this region also tend to be xeric adapted, while herbivorous rodents tend to be mesic adapted.

            While inference based on this study is limited in scope due to the sample size of one (one mountain range), it does begin to make the case that space for time comparisons should not be assumed to hold as we move into a future where landscapes and climate are increasingly affected by human activity. By doing so, we may fail to capture the subtle ways in which certain ecological patterns are becoming decoupled, and thus, must be wary of drawing spurious conclusions based on old assumptions.

Figure 1. Total energy flow (kJ/day) for each elevational bin in the historic and modern time periods.

Figure 2. From left to right; Energy flow (kJ/day) through functional groups in the historic and modern time periods along the elevation gradient. From top to bottom, functional classifications: size class, diet preference, and habitat affinity.


Critique of the method – what was useful, what was not?

Given the nature of the data, this method of analysis was useful for me. I learned something about the data that was not obvious before my analysis, or comparison with the research at the greater temporal scale captured in the study by Terry and Rowe 2015.  While my inferences are limited, increasing the number of mountain ranges on which I perform this analysis will enable me to perform a number of more informative statistical analyses, and test the spatial gradient at both the elevational scale, and the latitudinal scale, as the three mountain ranges, for which similar data exists occur in a north to south array.


Terry, R.C. and R.J. Rowe. 2015. Energy flow and functional compensation in Great Basin small mammals under natural and anthropogenic environmental change. PNAS, 112(31):9656-9661.

May 24, 2017

Tutorial 2: Identifying clustering with a geographically weighted regression

Question You Asked

How much spatial clustering is present in the regression model of vegetation response to canopy cover? I am interested in determining if a single equation can predict the way that the two variables interact within east-side ponderosa pine forests, or if multiple equations are necessary.

Name of Tool or Approach You Used

To answer this question, I used the Geographically Weighted Regression tool in ArcMap.

Brief description of steps you followed to complete the analysis

In the GWR tool, I used vegetation cover as the dependent variable, and canopy cover as the explanatory variable. Because my points are somewhat clustered across the landscape, I used an adaptive kernel with an AICc bandwidth method.

Brief description of results you obtained

The map of coefficients did reveal certain clusters of high coefficients and low coefficients across the landscape. However, judging by the GWR table below, this clustering may not be statistically significant. One anomaly of this assessment was the negative adjusted R2 value. A negative R2 means that the equation did not include a constant term.

Map of clustered plots

Neighbors 46
ResidualSquares 14750.95231
EffectiveNumber 3.80525057
Sigma 18.69738296
AICc 405.3014401
R2 0.048524493
R2Adjusted -0.01473284
Dependent Field 0 Total_cvr
Explanatory Field 1 Canopy_Cov

Table of original GWR results

To remedy the negative adjusted R2, I tried adding in another explanatory variable (elevation). This appeared to help the model, reducing the residual squares and bringing the adjusted R2 value back above 0.

Neighbors 46
ResidualSquares 13904.07263
EffectiveNumber 5.22764912
Sigma 18.46665082
AICc 405.8795404
R2 0.103150476
R2Adjusted 0.01015694
Dependent Field 0 Total_cvr
Explanatory Field 1 Canopy_Cov
Explanatory Field 2 Elevation

Table of remedied GWR

Critique of the method – what was useful, what was not?

This method was useful in that I could process the data in ArcMap, which is where I was already analyzing my points. It was also very helpful to visualize the coefficients in the map below. However, I am still a little unsure why the coefficient map shows such strong clustering while the output table does not show any significance.

Automating Feature creation of variables for explaining patterns of Socio-demographic and spatial variables of survey responses regarding flood-safety in neighborhoods

Filed under: Tutorial 2 @ 12:04 pm
  1. Question asked

As reported, I am handling more than 100 variables. In addition, my data is constantly being updated. So, I needed to be able to have a tool in order to create independent features for analyzing each of my variables. The question is: How can I automate Feature creation of variables for explaining patterns of Socio-demographic and spatial variables of survey responses regarding flood-safety in neighborhoods?

  1. Name of the tool or approach that was used.

I used ArcPy with Python for the automation and, ArcMap for output visualization. ArcPy is a site package that builds on (and is a successor to) the successful arcgisscripting module.

  1. A brief description of steps you followed to complete the analysis.

First, we have our input feature as shown in Figure 1. Notice the attribute table containing all our variables.

Next, we take advantage of the Model Builder in ArcMap in order to visualize our model using the tool “Feature Class to Feature Class”, as shown in Figure 2.


In addition, we export the resulting python code. The exported code is presented in Figure 3.


This python code gives us an idea of how the “Feature Class to Feature Class” tool is working. So we proceed to create our own function file as depicted in Figure 4.

Within our function file and, based on the previously exported function, we build our own function. Our function is presented in Figure   5.

Once we have our function within our function file, we can now execute our function creating a script file. The content of our created script file is shown in Figure 6.

  1. A brief description of results you obtained.

 The result of running our script file is shown in Figure 7. Here, we first read the variables of our input feature and, at the las row, the output field of the new feature is listed.

Finally, the newly created feature is seen in Figure 8. Here we can also see it corresponding attribute table. It is noticed that the created feature contains only the selected field.

  1. Critique of the method – what was useful, what was not?

  • Model Builder:

This method is useful for building and visualizing your model. And also has the capability to export a python script of your model.

  • ArcPy:

ArcPy is a very useful package that builds on the successful ArcGIS scripting module. It allows creating the cornerstone for a useful and productive way to perform geographic data analysis, data conversion, data management, and map automation with Python. So, many variables can be analyzed without the need to access to ArcMap, saving an enormous amount of time.


ArcGIS (2017). What is ArcPy.

Tutorial 2:

Filed under: 2017,Tutorial 2 @ 11:59 am

Research Question

Building upon my question from exercise 2, my research question for exercise 3 seeks to answer how the covariance between SCF and EVI within a wildfire perimeter change over time following the wildfire.

Approach/tools used

For this exercise, I utilized a variety of software programs. While the overall goal of variogram creation required using R, ArcMap and Excel were used significantly in order to process the data into a usable format. As a result, much hopping between programs was required, which was undoubtedly quite tedious/frustrating at times but ultimately proved to be a useful learning experience.

Analysis procedure

To create a cross variogram in R using the gstat package, the process is much smoother using data in table format. As a result, one of the main challenges of creating a cross-variogram was converting my raster data into usable tabular data. To accomplish this, I first converted the SCF and EVI rasters into point shapefiles, then used the “Add XY Coordinates” tool in ArcMap to add two fields to the point shapefiles containing the X- and Y-coordinate information. Next, the attribute table from each point, shapefile was converted to a CSV table. The final step before stepping back into R was to combine all EVI and SCF data for each year into a single CSV table to ease the script writing in R.

After creating the combined EVI-SCF table, I noticed that a few of the columns had different numbers of records. It turns out that I used a slightly different extraction method for a few SCF layers – a product of conducting this analysis across multiple days and not taking diligent notes on the layers used. After correcting this issue, I read the cleaned table into R and created cross-variograms using the code snippet shown below.


My hypothesis for the years following the wildfire was to see the strongest covariance in the first year, then decreasing overall covariance in each year afterwards. However, the plots below tell a different story. There are a lot of things going on in the time-series of cross-variograms, but a couple of evident trends stick out to me. First, although each year shows two clear bumps in the cross-variogram, the lag distance of those bumps shift significantly from year to year. My guess would be that the initial shift from pre-fire to the first year post-fire was primarily due to the variation in burn severity across the landscape. The shifts in ensuing years may be due to varying rates of vegetation recovery or perhaps due to changes in snowpack distribution caused by the wildfire.

Another fascinating pattern is the change in sign of the covariance. Before the wildfire, the covariance is mostly negative except for a few thousand meters of lag distances. This suggests that between most pixels within the burn area, SCF and EVI vary negatively – if snowpack increases between two points, vegetation generally decreases between those two points. Following the wildfire, however, the covariance becomes more and more positive overall. By year 3 following the wildfire, at all distances the covariances are positive, but still showing a similar bumpiness as in the pre-burn cross-variogram. This outcome was a bit baffling, as I would expect the relationship to be strongest (most positive) in the first year post-fire, with a gradual retreat to the pre-fire state.

Another pattern of significance is the consistent cross-variogram valley that occurs at about 5000 meters in each year, even though the peaks on either side shift from year to year. This can be interpreted as the between-patch distance, and is perhaps the more important result of this method – that from year to year, even before the wildfire, the landscape shows similar distances between patches.

Figure 1: Pre-fire (2005) cross variogram between SCF and EVI

                     Figure 2: First year post-fire (2007) cross variogram between SCF and EVI

Figure 3: Second year post-fire (2008) cross variogram between SCF and EVI

Figure 4: Third year post-fire (2009) cross variogram between SCF and EVI

Method critique

By creating cross-variograms for each year following a wildfire and the pre-fire year, I was able to discern the patchiness of the EVI and SCF data as well as the shifts in the snowpack-vegetation relationship. The persistence of the bumpiness across years strengthens the evidence of two patch sizes, but the shift in bump locations results in a cloudier picture of exact patch size.

One disadvantage to this approach was the inability to determine the strength of the SCF-EVI relationship within a given year’s cross-variogram. After researching the interpretation of covariance, it became clear that covariance values can only tell you the sign of a relationship, but cannot lend any insight into the magnitude of the relationship because the data is not standardized. However, because I generated covariograms across multiple years, I was able to glean information regarding the relative shifts in covariance, which is ultimately the type of information I’m most interested in producing.

For the final project, I plan on generating additional covariogram for the next couple of year in order to observe how the covariance continues to change. I’m interested to see at what point the covariance plot starts to look like the pre-fire condition.

Tutorial 2: Nitrate Data in Relationship with Red Alder cover and Land Use

Filed under: 2017,Tutorial 2 @ 11:43 am


  1. Research Question 

Do the nitrate values that I collected in the field have a relationship with the percent of Red Alder cover in each watershed?

                Sub question: Do the nitrate values have a relationship with land use activities in each watershed tree?

  1. Name of the tool or approach that you used.

For this analysis I utilized ARC GIS and Microsoft Excel

  1. Brief description of steps you followed to complete the analysis.

First I utilized a package in Arc GIS called NetMap to delineate the 10 watersheds. This package uses GPS points to draw watershed boundaries based on elevations and flow paths.

Next I uploaded the 2011 red alder percent cover shapefile from the Oregon Spatial Library Database, and then clipped it to the watersheds.

By joining the attributes of these two shapefiles, I was able to obtain the total area for the watershed and the total area of Red Alder cover in the watershed.  Then I divided these two areas I was able to obtain percent cover for Alder and compare that to the nitrate concentrations at each site.

Table 1 Nitrate Concentrations and Percent Alder Cover

Watershed Area (m2) Alder (m2) Percent Alder Cover Average Nitrate
Cape Creek 4353700 3537.74 0.081% 1.88
Crooked Creek 31760400 3226.39 0.010% 1.35
Cummins Creek 21388800 222.89 0.001% 1.44
Gwynn Creek 2865000 5371.57 0.187% 1.61
Little Cummins Creek 3056600 361.35 0.012% 1.72
Yew Creek 9786600 969.92 0.010% 1.01
Kiser Creek 100500 1017.97 1.013% 0.23
Rock Creek 38375500 2359.26 0.006% 0.19
Middle Fork Creek 3294000 339.23 0.010% 0.17
Griffith Creek 4434800 2021.86 0.046% 0.13


Once this information was obtained, I utilized a scatterplot to identify a pattern.

Figure 1 Relationship between percent alder cover and nitrate concentration


Though the trendline indicates a negative correlation the data point at 1 percent may be an outlier. The watershed indicated by this point is small, which could inflate the true amount of alder cover in the region. With this in mind no distinctive pattern was observed for a relationship between alder cover and nitrate concentration in the ten watersheds. This information isn’t entirely surprising as the nitrate concentrations were relatively low to begin with.


Landuse Analysis 

Looking further, I utilized the 2011 NLCD data to evaluate land use in the 10 watersheds. With no usual suspects (agriculture, urban etc.) being particularly dominating in the watersheds it was difficult to link increased nitrate concentrations and land use at the field sites.  Most of the land cover was predominantly evergreen and mixed forest types which are natural for the areas sampled.


Figure 2 Coastal Watersheds Landuse


Figure 3 Coast Range Watersheds Landuse

  1. Brief description of results you obtained.

Overall nitrate concentrations over the five months in the ten watersheds are relatively low (~0.1-1ppm), and the results obtained were not particularly surprising.  There was no clear pattern linking red alder cover and nitrate concentrations in these watersheds (Table 1). Additionally, there were no surprise land use features in the watersheds, and thus no apparent link to nitrate concentrations in this analysis. Most of the land use in these watersheds are occupied by mixed and evergreen forests, neither are particularly linked to elevated nitrate concentrations. I would say this analysis was consistent with my expectations, starting with low nitrate concentrations, and suspecting most of the land use would be forest cover I predicted there likely not to be a strong link between land use and nitrate concentration. However, I expected slightly more red alder cover in the coastal watersheds and was surprised how little that cover related to the concentration of nitrate.


  1. Critique of the method – what was useful, what was not?

This method was very useful, I was able to delineate the watersheds, clip layers to identify potential sources of nitrate and utilize excel to further the analyze relationships. I used the join feature to gather information from two attibute tables to calculate the percentage of alder cover as well. I will use the skills learned and practiced in this exercise in the future to identify statistical relationships between spatially correlated (or not) data.


Spatial autocorrelation in NDVI for wheat fields and some characteristics

Filed under: 2017,Tutorial 2 @ 11:06 am

Tutorial 2 is based on tutorial 1 with a few alterations. 1) the variogram is standardized by dividing the semivariance by the standard deviation of that field, 2) only the wheat fields are used for the analysis so that the results are more easily compared, and 3) a time component is added; for 5 time steps the correlation between range and sill is determined and the development over time is analyzed (there was no biomass data for these dates).


Part 1 and 2

Standardizing of variogram


where var is the Formal Class RasterVariogram. The variance column in this class can be accessed by @variogram$gamma. This column is replaced by the standardized variance, by dividing the existing column by the standard deviation. The standard deviation is calculated with the cellStats function. The input raster is ras, and ‘sd’ is the statistical option for standard deviation. na.rm=TRUE is a logical to remove NA values, and lastly, the asSample=TRUE means that the denominator for the standard deviation is n-1.

Doing the same analysis but now just focusing on one crop does not improve the results. Pearson’s correlation coefficients are low and insignificant with a correlation coefficient for Range vs Biomass of -0.31 (p-value is 0.3461) and a correlation coefficient for Sill vs Biomass of 0.35 (p-value is 0.2922). For the color coded scatter plot there is no pattern visible for the relationship between the sill, range, and biomass.

Part 3

For part 3 the range and sill values for all wheat fields at 5 different dates is determined. The correlation between range and sill for every date is determined. This Pearson correlation coefficient is then analyzed over time. It seems that there is an increase in correlation over time. There seems to be a high correlation with a 5%-significant p-value of 0.047.

Manipulating Layer Properties in ArcGIS ArcMap 10.4.1

Filed under: Tutorial 2 @ 1:09 am

Reva Gillman

Tutorial 2                                Manipulating Layer Properties in ArcGIS ArcMap 10.4.1

  1. Question that I asked:

What are the geographic distributions of some of the ‘riskier’ species that arrived on Japanese Tsunami Marine Debris (JTMD)? What do their native and non-native ranges look like, and how do they compare with each other?

First of all, I had to choose 4 species with clear invasion history out of the 104 JTMD species, to focus on. Out of the 31 with invasion history, I chose Asterias amurensis (seastar) and Hemigrapsus sanguineus (Japanese shore crab) since they are known to be an issue in other regions. I then chose two species with large non-native spread: Crassostrea gigas (Japanese oyster), an economically important aquaculture species, and Teredo navalis, the shipworm with an almost global distribution that has burrowed in wooden ships all over the globe for 100’s of years.

  1. Name of the tool or approach that I used:

Manipulating Layer Properties in ArcGIS ArcMap 10.4.1 in order to manipulate the polygons in the shape file, to make them appear different colors according to which realms were documented as native or non-native regions for each species.

  1. Brief description of steps I followed to complete the analysis:

First I made an excel spreadsheet of the species data, by going through each ‘risky’ species distribution, converting regions to realms, and typing that up in the spreadsheet. Then, I did a join with Realm_Code which matched up with the attribute table from the shape file. However, I ran into issues with mapping this data, as it would only map the region that was native, or non-native, without the other regions represented at all, so I had to figure out another method.

My second attempt was to directly select regions of the shape file, and change the symbology by going into selection, and clicking on use with this symbol. This may have eventually worked, but was very hard to figure out which regions to select for each species, and there was not an intuitive way to save the selections as a new layer.

Finally, I found out how to go into Layer Properties, and manipulate the shape file from there:

a. First you left-click on the layer you want to manipulate, and select Properties from the bottom of the options.

b. Go into Symbology

c. Underneath the Categories options on the left, select unique values

d. From the value field, select the field that you are interested in, in my case that was Realm_Code

e. Select the Add all values button on the bottom, which will put all of the selected field options in the screen.

f. From there, you can select a field, and remove it, or add values as needed, which I did for each species

g. By double clicking on each field, you can select the appropriate color that you want to represent each category, you can change the outline of the symbol, the fill color, and texture. I did this to make the color of invasive realms one color, and native realms another color, and regions where the species was both native and non-native another color/texture.

h. Then, you can copy and paste the layer, to manipulate again for the species, so you don’t have to start from scratch.


4. Brief description of results you obtained.

Below, there are the 4 maps of each species distribution, to see the extent of their ranges.


Teredo navalis geographic distribution






Hemigrapsus sanguineus geographic distribution

Crassostrea gigas  Geographic Distribution


For clearer viewing of the above graphs, please download: Tutorial2

5. Critique of the method – what was useful, what was not?

This method can be tedious if you want to look at many different layers of maps. For me, it worked out because I chose to look at four different layers of species geographic distributions, but if this method was less tedious I might have looked at more. By going through and manually selecting each polygon, or realm, that I wanted to highlight, I had a lot of control over what the final map looked like, and I liked the level of detail you can find with symbology, outlines of polygons, fill color, and texture of fill. The final maps are straightforward, and easy to save each layer, copy, and paste the next layer so you don’t have to start from scratch each time.

May 23, 2017

Tutorial 2: Extracting raster values to points in Arc for regression analysis in R

Filed under: Tutorial 2 @ 10:38 pm

Research Question

As I have described in my previous blog posts, I am investigating the spatial relationship between blue whales and oceanographic variables in the South Taranaki Bight region of New Zealand. In this tutorial, I expand on the approach I took for tutorial 2 to quantify whether blue whale group size can be predicted by remotely-sensed sea surface temperature and chlorophyll-a concentration values at each sighting location. To accomplish this I used ArcMap to align different streams of spatial data and extract the relevant information, and then used R to perform statistical analyses. Specifically, I asked the following question:

“How well do remotely sensed chlorophyll-a (chl-a) concentration and sea surface temperature (SST) predict blue whale group size? “


I downloaded raster layers for chl-a concentration and SST, and used the the “extract values to points” tool in Arc to obtain chl-a and SST values for each blue whale sighting location. I then exported the attribute table from Arc and read it into R. In R, I performed both a general linear model (GLM) and a generalized additive model (GAM) to investigate how well chl-a and SST predict blue whale group size.


I downloaded both the chl-a and SST raster files from the NASA Moderate Resolution Imaging Spetrometer (MODIS aqua) website ( The raster cell values are averaged over a 4 km2 spatial area and a one-month time period, and for my purposes I selected the month of February 2017 as that is when the study took place.

I then used the extract values to points tool, which is located in the extraction toolset within Arc’s spatial analyst toolbox, to extract values from both the chl-a concentration and SST rasters for each of the blue whale sighting data points. The resulting contained the location of each sighting, and the number of whales sighted (group size), the chl-a concentration, and the SST for each location. All of this information is contained in the attribute table for the layer file (Fig. 1), and so I exported the attribute table as a .csv file so that it could be read in other programs besides Arc.

Figure 1. Attribute table for blue whale sightings layer, which contains blue whale group size, chl-a concentration, and SST for each sighting location.

In R, I read the in the data frame containing the information from the attribute table. In an exploration of the data, I plotted chl-a against SST to get a sense of whether there is spatial autocorrelation between the two. Additionally, I plotted group size against chl-a and SST separately to visualize any relationship.  Subsequently, I ran both a GLM and a GAM with group size as the response variable and chl-a concentration and SST as predictor variables. While a GLM fits a linear regression model, a GAM is more flexible and allows for non-linearity in the regression model. Since I am in the exploratory phase of my data analysis, I tried both approaches. The script I used to do this is copied below:


There appeared to be some relationship between SST and chl-a concentration (Fig. 2), which is what I expected considering that colder waters are generally more productive. There was no clear visual relationship between group size and chl-a or SST (Fig. 3, Fig. 4).

Figure 2. A scatter plot of chlorophyll-a concentration vs. sea surface temperature at each blue whale sighting location.

Figure 3. A scatter plot of chlorophyll-a concentration vs. blue whale group size at each blue whale sighting location.

Figure 4. A scatter plot of sea surface temperature vs. blue whale group size at each blue whale sighting location.

The result of the GLM was that there was no significant effect of either chl-a concentration (p=0.73) or SST (p=0.67) on blue whale group size. The result of the GAM was very similar to the result of the GLM in that it also showed no significant effect of either chl-a concentration (p=0.73) or SST (p=0.67) on blue whale group size, and the predictor variables could only explain 4.59% of the deviance in the response variable.

Critique of the Method

Overall, the method used here was useful, and it was good practice to step out of Arc and code in R. I found no significant results, and I think this may be partially due to the way that my data are currently organized. In this exercise, I extracted chl-a concentration and SST values only to the points where blue whales were sighted, which means I was working with presence-only data. I think that it might be more informative to extract values for other points along the ship’s track where we did not see whales, so that I can include values for chl-a and SST for locations where the group size is zero as well.

Additionally, something that warrants further consideration is the fact that the spatial relationship between blue whales and these environmental variables of interest likely is not linear, and so therefore might not be captured in these statistical methods. Moving forward, I will first try to include absence data in both the GLM and GAM, and then I will explore what other modeling approaches I might take to explain the spatial distribution of blue whales in relation to their environment.

Tutorial 2: Using kernel density probability surfaces to assess species interactions

Filed under: Tutorial 2 @ 4:48 pm

Overview: question clarification and approach

Continuing my exploration of changes in cougar predation patterns in time periods with and without wolves, I wanted to expand my evaluation of spatial repulsion and/or shifts to examine how cougar kill site distributions related to wolf activity. I identified several elements in my data sets, besides the presence of wolves, which could produce a shift in kill density or distribution including: 1) catch-per-unit effort discrepancies (i.e. larger sample sizes of cougars (and kill sites) in one data set), or 2) time effects from seasonal distribution shifts (i.e. prey migration patterns). I accounted for catch-per-unit effort issues in tutorial 1, but need to account for seasonal variation in prey distribution as part of my analysis of wolf influence on cougar kill distribution. Density features continued to be a good tool to explore my presence only data (kill events). Studies of sympatric wolves and cougar have shown differences in the attributes of sites where each carnivore makes kills (slope, elevation, physiographic features), but I was interested in how cougar might be positioning themselves (e.g. where they are hunting and making kills) on the landscape relative to centers of wolf activity. Therefore, the next step in understanding my spatial problem was to determine if there were differences in when and where cougar were killing prey by asking:

Does wolf activity account for the distribution differences between pre- and post-wolf cougar kill sites?

For the purposes of this analysis the data would be “location, implicit” with the variable of interest (cougar kill sites) having a covariate (proximity or distance to wolf activity) measurable at each sampled location and the causal mechanism inferred from co-variation. Central areas of wolf activity could be identified from wolf kill density, estimated on a seasonal basis, and related to the distribution of cougar kill sites. A probability surface could be created from wolf kill density and used to calculate the proportion of pre- and post-wolf cougar kill sites within probability features as a proxy for the distribution of wolf activity. This relationship could then be compared as a joint proportion or as an expected/observed relationship across time periods. Alternatively, the isopleth polygon feature could be used to calculate a “distance-to-edge” feature relating each cougar kill to potentially increasing levels of wolf contact. This relationship could be evaluated between time periods and across seasons through ANOVA or regression.

My approach to this question was to relate the proportion of both pre- and post-wolf cougar kill sites (points) to wolf activity using the probability contours (polygon feature) of wolf kill site distribution (density raster) as a proxy for activity. I used several tools in ArcGIS, GME, and R to carry out this analysis. Geospatial Modeling Environment (GME) is a standalone application that makes use of ArcGIS shape files and R software to carry out spatial and quantitative analyses. It was created to take advantage of R computing and replaces the ‘Animal Movement’ plugin many movement ecologists made use of in previous versions of ArcGIS. GME allows a lot of flexibility in tasks related to dealing with large data sets and common analyses performed on animal location data (KDE, MCP, movement paths). Because it can operate as a GUI or command driven operator using R, it is user friendly and allows for iterative processes, quick manipulations of data, and combinations of these process not easily duplicated in Arc.



Prior to evaluating influences from wolf activity, it was necessary to address potential effects from seasonal shifts in prey distribution or density related to ungulate migration patterns. Therefore, the first step for this analysis was to subset kill data into winter (1 Nov – 30 Apr) and summer (1 May – 31 Oct) seasons. I used the ‘select by attribute’ feature in ArcGIS to subset each kill data set (wolf, cougar (pre-wolf), and cougar (post-wolf)) into season-specific shapefiles.


Figure 1. ArcGIS ‘select by attribute’ call from attribute table to subset data into season.

Similar to the procedure for tutorial 1, I used GME to create density and isopleth probability contours for wolf kill sites and the ‘countpntsinpolys’ command to add columns with the number of cougar kills (pre/post) to each seasonal wolf kill isopleth polygon. Finally, I used ArcMap to visualize the data and make a figure showcasing the process.


Figure 2. GME ‘kde’ call in the command builder GUI.



Visual examination of season-specific cougar kills relative to wolf kill activity centers added further evidence toward a spatial shift in the distribution of cougar kills between time periods with and without wolves (Figure 3). The magnitude of the repulsive effect appeared stronger in winter than summer, evident in the uniformly decreased proportions of cougar kills per probability contour of wolf activity, and increased proportion of kills outside any wolf activity between time periods with and without wolves (Table 1). The relationship appeared opposite in summer, with higher proportions of cougar kills observed in the post-wolf period than expected in all but core (25% probability surface) and outside (no wolf activity) areas (Table 2).


Figure 3. Visual comparison of seasonal wolf activity based on kill site density probability contours and the distribution of cougar kill sites across time periods with and without wolves.

Table 1. Winter wolf kill site probability contour attribute table. Isopleths represent the 25th, 50th, 75th, 95th, and 99th quartiles. ‘Out’ refers to areas outside the probability contour surface. Number of kills (No. Kills) is the number of cougar kill sites, and % Kills is the cumulative proportion of all cougar kills within each quartile class.


Table 2. Summer wolf kill site probability contour attribute table. Isopleths represent the 25th, 50th, 75th, 95th, and 99th quartiles. ‘Out’ refers to areas outside the probability contour surface. Number of kills (No. Kills) is the number of cougar kill sites, and % Kills is the cumulative proportion of all cougar kills within each quartile class.


Method Critique

Accounting for seasonal shifts in kill density improved visual interpretation of spatial patterns and reduced a factor that might mask influences of wolf activity. However, the observed shift in winter could still be due to shifts in prey populations as a behaviorally mediated prey response to wolves (i.e. the shift is related to prey responding to wolves, and cougar indirectly shifting distribution to follow prey). The shift could also be related to differences in prey species-specific distributions, as cougar prey predominantly on deer, but show seasonal selection of neonate elk calves in the summer, and wolves prey predominately on elk (i.e. the abundance of newborn calves in the summer creates spatial tolerance between wolves and cougar). The comparison of pre- to post-wolf cougar kill sites relative to the probability of wolf kills may be misleading, since the relationship is based on pre-wolf cougar kill data overlapping a density feature that didn’t exist (i.e. no wolves on the landscape at the time those kills were made). However, the comparison does provide some insight as the pre-wolf cougar kill distribution still represents a ‘prior’ distribution of cougar kills, and overlapping wolf activity demonstrates the proportion of kills we could expect if there were no competition (spatial avoidance/repulsion) between the two species. Using the probability contours as a ‘prior’ distribution of kill sites offered a more robust and meaningful measure to quantify the shift in cougar kill distribution. The method was useful and provide evidence toward the presence of a season-specific shift in where cougars are killing prey (i.e. the answer to my ‘question asked’ is yes, in winter), but the regression analysis discussed above may provide additional support.


Tutorial 2

Filed under: 2017,Tutorial 2 @ 3:50 pm

For exercise 3, I finally left the world of Arc and took my data to R.


Test the radar backscatter between different-facing aspects and determine if aspect influences radar signal.

Tool used:

I used R to run the statistical analysis, but still needed to extract the data using Arc.

What I did:

I extracted the data from ArcGIS by creating 1000 random points througout my scene and running the “extract data to points” tool to pull out the backscatter and aspect at each point. Then I exported the table of point data to Excel to clean up any missing values that might throw off R. There were 4 values in total that were missing either the backscatter or aspect value. Then I saved this table as a .csv and brought it into R.

Once in R, the data needed a little bit of cleaning up. Since the default aspects from Arc range from -1 (flat) and 0 (north) to 360 (still north, all the way around the compass), using just the numeric value to classify the points would not work well. I used the following lines of code to change the aspects to factors:

NEWDATA$ASPECT <- cut(DATA$ASPECT,breaks=c(-1,0,22.5,67.5,112.5,157.5,202.5,247.5,292.5,337.5,360),labels = c(“FLAT”,”N”,”NE”,”E”,”SE”,”S”,”SW”,”W”,”NW”,”No”),include.lowest = TRUE)

levels(NEWDATA$ASPECT) <-c(“FLAT”,”N”,”NE”,”E”,”SE”,”S”,”SW”,”W”,”NW”,”N”)

The second line combines the two “north” aspects into one (i.e. 0-22.5 degrees and 337-360 degrees).


The results were promising. I expected to find no difference between aspects, and the scatterplots showed that, at least upon visual inspection, the aspects showed no difference between each other.

A boxplot also showed little difference in the average Gamma0 values (the backscatter).

Next step was to run a statistical analysis. Since the data were not normally distributed, I had to do something other than an ANOVA, so after some internet research I went with a Kruskal-Wallis test. This tests for difference in population distributions (i.e. if they are identical or not) when the distributions are not normal. The code is simple enough:

kruskal.test(GAMMA~ASPECT, data=DATA)

I also conducted an ANOVA on the log-transformed Gamma0 data, which appeared more normal than before. None of the results were significant from any test, meaning that the aspect does not have a statistically significant effect on the backscatter, which is the result I was hoping for.

Critique of the Method:

This method was useful for my dataset. I had a rather large dataset to work with in Arc, and didn’t find the statistical tools that I needed in Arc either. I am comfortable in R, and prefer it for larger datasets since it doesn’t have the bells and whistles that Arc does.

Tutorial 2: Investigate Spatial Autocorrelation (Mantel Test)

Filed under: 2017,Tutorial 2 @ 1:30 pm

For this Tutorial I revisited a principle that I learned about in the Spatio-Temporal Statistics (Variation) statistics course that I took with Dr. Jones last fall. Autocorrelation is an important component to address within ecological work, as well as geographic work. Plots that are closer to each other may have a greater probability of capturing the same species during fieldwork and it is important to know if my data are influenced by distance and spatial relationships. Examining auto-correlation within my data may also reveal information about spatial patterns related to patch size, or gradients at my field site (Salmon River Estuary).

Research Question:

My research question fits the framework “how is yi related to xi + h? I am investigating spatial autocorrelation for my field survey plots and species assemblages represented by those plots. I want to know if I can predict or expect the presence of a plant species in one plot, based on its proximity (distance) to another plot. The presence of spatial autocorrelation in my plots would imply some sort of dependency or relatedness between plots, i.e. the closer two plots are located to each other, the more likely they are to overlap in terms of species present. Spatial autocorrelation can provide context for interpreting scale of patterns and causal variables related to spatial vegetation patterns (likely elevation, salinity, land use history).

My Approach (Tools for Analysis):

PC-Ord was very helpful in structuring my data for the Mantel test. I was able to run the test several times with different combinations of data to detect patterns of auto-correlation. Again, I formatted my data in Excel and used the Excel sheets to produce matrices for the Mantel text. Each matrix pair had species data, and UTM coordinate data.

Steps to Complete Analysis:

I organized my data in excel, started a new project in PC-Ord, assigned species and location matrices, and ran a Mantel test (Group > Mantel test). I used Sorensen (ranked) distance measures for my percent cover data, and Euclidean distance for my coordinate data. I set the test to complete 999 randomized runs for a Monte Carlo comparison to test significance. I also used my species data to conduct an Indicator Analysis, to identify species that are more common, faithful, or exclusive to particular marsh that I surveyed. I directed PC-Ord to conduct an ISA, and grouped by data by tidal marsh surveyed. The ISA produced a number for each species surveyed that indicated the percentage of plots that species was found in, and whether or not it was significant for that species. From the indicator analysis, I found that certain upland marsh species are indicative of remnant sites, certain species are indicative of salty conditions, and other species are indicative of restored sites.


examples of matrix inputs from Excel to PC-Ord. The data were structured as matricies with sample units (plots) x species cover, or sample units x UTM coordinates.

Results and Outcomes:

I conducted a Mantel test on all of my data and found that none of my plots were spatially autocorrelated (Mantel’s r statistic Transect: r = 0.037797, p = 0.182182; MW plot: r = 0.027994, p = 0.164164, accept null hypothesis of no relationship). This is a little surprising, but may be indicative of noise within my dataset, and variation of species at this scale. It is possible that autocorrelation is not detectable at this scale, or perhaps I need a larger dataset with less proportional noise to sort out the autocorrelation signal. I was however able to detect spatial autocorrelation at the 1,000 square meter scale for the Modified Whittaker plots (r = 0.638224, p = 0.00010), suggesting that there may be more fine scale patichness, variation, or nestedness among plant species at each of the SRE tidal marshes. Salt may also be a confounding factor that drives spatial diversity of vegetation, in addition to dike removal, as salt is a limiting factor for some salt marsh plants; not all species are equally tolerant of it.

For my ISA test, I found that (1) Triglochin maritima, Sarcocornia perennis, and Distichlis spicata were significant indicators of salty, restored conditions, (2) Dechampsia caespitosa, Juncus arcticus var. littoralis, Potentilla pacifica, Glaux maritima, and Hordeum brachyantherum were significant indicators of upland, high elevation conditions, and (3) Carex lyngbyei was a significant indicator of restored conditions. I broke my dataset up and conducted a mantel test for each of the groups, using only plots that recorded the presence of at least one species in each of the groups. I did not find any significant autocorrelation either with any of the strong indicator species (that were all found in a high proportion of plots surveyed). I am curious if my plots were located closer to each other, and/or I had surveyed more plots over a larger area, spatial autocorrelation patterns would begin to emerge.

an example of PC-Ord output for a Mantel test. The R statistic implies the amount of correlation between species cover and distance, the p value implies the significance of the correlation/similarity in comparison to chance (computed by Monte Carlo randomizations).

I also produced a map in Arc, to visulalize the patterns of diversity I saw in my data. I coded each of my plot locations (for Transects and Modified Whittaker plots) by the dominant plant strategy type: Carex lyngbyei dominant (A plot that had at least 50% or more cover of Carex lyngbyei ), salt dominant (at least 50% or more cover of one or more salt tolerant species), or mid-high marsh dominant (at least 50% 0r more cover of a mid-high marsh species that is not Carex lyngbyei). I think the map helps to visualize spatial patterns of diversity well, on a broader scale, by grouping plots into different assemblies or types/guilds based on their content.

Critique of the method – what was useful, what was not?:

This method was manageable to set up and execute, however I feel that it was difficult to negotiate useful outcomes with my data. It is highly plausible that my data do not capture these patterns, however, it was somewhat difficult for PC-Ord to work with my data on this problem. I have many zeroes in my dataset, as not all species are represented in every single plot (I have a grid of plot number X species, with cells filled in for percent cover ranging from 0 – 100). I was not able to complete mantel tests with certain assemblage combinations, as PC-Ord could not compute a randomization outcome with the number of zeroes present (random shuffling of the dataset produced too many identical outputs because I have too many zeros). Ultimately, I think I would approach this problem again with a larger dataset over a greater extent of my study site.


May 22, 2017

Tutorial 2: Site Suitability

Filed under: Tutorial 2 @ 2:43 pm


For exercise 3, Dr. Jones allowed me to work on a big component of one of my thesis chapters that I needed done which was to determine site suitability for a field experiment. This wasn’t related to stats but required a lot of work in ArcMap. My questions are:

  • Are forest meadows reptile biodiversity hotspots?
  • Does aspect matter for reptiles in forests?

Forest meadows could be hotspots due to increased solar radiation, temperatures, heterogeneous conditions, increased productivity, and rocky soils. These questions are important because it could result in strong management implications if forest meadows are reptile habitat. Then managers would need to retain/restore meadows, retain connectivity among meadows, and identify ‘climate-smart’ meadows if aspect is related to diversity.

I performed site suitability/selection for this study based on the following criteria:

  • Need to identify 4 sites:
    • each site would have 4 treatments
      • (a North and South, open meadow and closed forest)
    • road access (< 300m from road)
    • sites at similar elevations
      • similar forest types
      • similar proximity to non-forest
      • each site has the same proximity to water
      • area would be of similar size
      • created by natural processes, not a clear-cut.


I used many tools to determine site suitability for the field experiment:

Clip, Aspect, Extract by Attribute, Raster to Point, Near, Extract Values to Points, Reclassify, Select by Attribute


Map 1. Habitat types in Deschutes NF.

I started with a habitat type layer (Map 1) and DEM layer obtained from the Forest Service of Deschutes National Forest.

First, I needed a map showing aspect. To do this, I used the Aspect tool in the Spatial Analyst toolbox which used my DEM layer to create an aspect map of the area as shown in Map 2.

Once I had my aspect layer, I needed to somehow identify areas within the habitat layer that were meadows. I used the Extract by Attribute tool to select only habitat that was considered ‘meadow’ since this was what I needed. Once I had the habitat type layer to only show meadows I then used the tool Raster to Point to generate points within meadows to allow me to store future information as seen in Map 3.

From there, I inserted the roads layer, also obtained from the NFS. Roads were needed because my sites could only be <300 meters from the nearest road. Therefore, I used the Near tool to generate the distance from points (meadows) to nearest road. I also did this with my raster layer of waterbodies to make sure sites were in similar proximity to water, as that could be influencing reptile diversity.

Once the points (meadows) had information regarding proximity to roads and water, I then used the Extract Values to Points tool to obtain elevation data for each point within a meadow, because they have to be at similar elevations.

At this point, I had most of my information I needed regarding similarity between possible meadows. Next, I reclassified my aspect layer to only show north and south aspects (as seen in Map 4) because those were the only aspects I needed to make sure were present within each meadow. From there, I just opened the Attribute Table for my meadow points and used the Select by Attribute option to tell ArcMap to only show sites that were <300 m from road, and <1800 m elevation (see Map 5).  Once I narrowed down the possible sites, I manually looked at every possible meadow to see if north and south aspect was present within the open meadow and also within the bordering forest.

Results: All of this resulted in 4 suitable sites that I will use for one of my research projects, determining reptile diversity in meadows. My 4 suitable sites can be seen in Map 6.

Critique: The way I did this worked, but I feel like there may be an easier way. This method took me a few weeks, and a lot of frustrations working with many types of files. I had never done site suitability before, and this proved to be harder than it seemed.

Map 2. Aspect created with ArcMap.




Are meadows significant biodiversity hotspots for reptiles compared to forests?

Tools/Methods: For practice I made artificial data that I would potentially obtain with this field experiment. I practiced using it in R-studio, because doing logistic regression in ArcMap isn’t possible, and most of my data will most likely have to be used with logistic regression.

Critique: For the data, I formatted the meadows as 1’s and the forest plots as 0’s. Therefore, my Y variable was binary while my X variable was species diversity, which was continuous. This method is useful for doing logistic regression and is pretty straight-forward and gives you results.

Results: My artificial data was a very small sample size so it makes sense that the p-value was not significant. Overall, the results matched what I suspected from looking at the artificial data I made.

y = f(x)

> glm(type~diversity)

Call:  glm(formula = type ~ diversity)


(Intercept)    diversity

0.006173     0.098765

Degrees of Freedom: 7 Total (i.e. Null);  6 Residual

Null Deviance:     2

Residual Deviance: 0.4198     AIC: 5.123

Map 3. Raster to Points

Map 4. Reclassified aspects. North and south only.

Map 5. Select by attributes. Meadows <300m from road and <1800m elevation.

Map 6. Results. Meadow sites being used for study.

Tutorial 2: Examining Hydrograph Relationships

Filed under: 2017,Tutorial 2 @ 2:12 pm


How does upstream-downstream connectivity, drainage area, and position within the catchment affect the relationship between different stage gage sites?


I chose to look at the relationships between the hydrographs for four sites that have upstream-downstream connectivity but have different upstream drainage areas and are located on different tributaries. I used ArcGIS to obtain the drainage areas and Excel to analyze the hydrographs. The four sites are as follows: Site 26 is the site with the largest drainage area, on Mainstem Mill Creek. The “Mainstem” site is also located on the mainstem, with a slightly smaller drainage area. Site 36 and South Fork are located on different tributaries, Gunn Creek and South Fork Mill Creek respectively, but have approximately the same drainage area. This drainage area is significantly smaller than either Site 26 or Mainstem (Figure 1).


First I used NetMap’s Watershed Delineation tool to create upstream drainage area polygons for each of the four sites I was interested in, using the attribute table to obtain the total area of each polygon (Table 1).

Then I graphed the difference in stage heights over time between Site 26, the largest site, and each of the other three sites to examine patterns in the hydrographs, creating three “stage difference” graphs. I removed any periods of missing data (especially a two-week period in February at the Mainstem site). I also graphed the raw hydrograph data for comparison (Figure 2).

Finally I combined the three stage difference graphs into one plot for easy comparison.


First I compared the hydrograph patterns of Mainstem (MS) and South Fork (SF) since they are on the same tributary. Both graphs showed little difference with Site 26 during baseflow conditions. The 26 v MS graph was relatively flat which makes sense since Site 26 and MS have more similar drainage area sizes and are expected to behave more similarly since the amount of water going through them is more similar and they are closer together geographically. However, the MS site still had some high peaks during the two highest floods which indicates other sources of water that are influencing the hydrograph. The 26 v SF graph had high positive peaks during floods which indicates that there is a large source of water going through Site 26 that is not being accounted for in the SF hydrograph, probably from the Mainstem site (Figure 3).

Then I compared the reactions of Mainstem and Site 36 since they were a similar distance from Site 26 but with different drainage areas. The 26 v. 36 graph looked very similar to the 26 vs. SF graph which suggests that drainage area has a greater influence on the similarity of hydrographs than distance (Figure 3).

Finally I compared Site 36 and South Fork since they have similar drainage areas but are on different tributaries that both drain into Site 26. The two graphs were very similar in shape which further enforced the fact that drainage area was a primary driving factor. The two tributaries seem to be functioning relatively similarly despite differences in geographic location. The Site 36 hydrograph was more similar to Site 26 during baseflow conditions which makes sense since they are closer together in terms of streamwise distance and therefore the floodwave changes less between the two points (Figure 3).


When examining the patterns in these graphs, it is obvious that there is a portion of the hydrograph that is still missing. Most of the variability in peak flows is accounted for in the Mainstem hydrograph, but there are still minor peaks during the big floods. Little of the variability is accounted for in the Site 36 and South Fork hydrographs. Looking at the map, I hypothesize that North Fork Mill Creek, which is not instrumented, represents the differences in the hydrographs that are currently not being accounted for. It would be interesting to install a LevelLogger on this tributary in the future to see if it completes the picture.

This was an interesting analysis that allowed me to further explore my data and how my study basin reacts to precipitation events. It was useful in illustrating the influences of drainage area and stream connectivity on hydrographs in different parts of the basin, and pinpointing where there is still potentially missing data that could give a more complete picture of my watershed. It is still a pretty qualitative analysis in that I could not derive any real statistics from what is happening with my hydrographs, but it paints a clear picture of what’s going on and inspires further questions.

Tutorial 2: Automated Autocorrelation of Tsunami Inundation Time Series at Various Points of Interest

Filed under: Tutorial 2 @ 12:11 pm


Tsunami inundate shorelines in a chaotic manner, resulting in varying degrees of flooding from location to location. The spatial variability of tsunami hazards has important implications for the tsunami-resistant design of structures and evacuation plans because different safety margins may be required to achieve consistent reliability levels. Currently, tsunami mitigation strategies are primarily informed prediction models due to the lack of nearshore field data from tsunami disasters. Thus, to investigate the spatial variability of tsunami hazards, we analyze the relationship between different test locations or points of interest (POIs) in our domain. The question is:

“How much does tsunami inundation vary spatially?”

Essentially saying if one observes a certain flooding depth at their location, could they assume that a similar flood level is occurring elsewhere? And how does this change with time? And how to certain coastal features affect this?


To answer this question, I examined the autocorrelation of the time series of inundation at various locations as well as the difference in inundation levels between these locations. In this case, 12 locations were selected and thus the process to analyze all of these locations was automated. All of this was implemented in R.


As before, the tsunami inundation data is stored in a large NetCDF file. Thus we use the same procedure as mentioned in Tutorial 1 to read the NetCDF file into R. This process requires the “ncdf4” package to be installed.

For this analysis, we begin by selecting the points of interest along the coast near Port Hueneme, California. These points were arbitrarily selected but were also ensured to be somewhat well distributed along the coastline and were within the domain of the dataset. Figure 1 shows the locations of the points of interest (POIs) along the coast of Port Hueneme, CA. A total of 12 points were selected and were all meant to represent some building or port location. Their names and geographical coordinates were stored in a .csv file as shown in Figure 2.


Figure 1 – POIs where time series of inundation were extracted along coast near Port Hueneme, CA

Figure 2 – CSV file storing locations and their geographic coordinates

The .csv file was read into R by using the “read.csv” function and this table was subsequently converted to a data frame as shown by the code in Figure 3. Next, the “which.min” function from the “stats” package was used in a for loop to find the indexes. Next a loop is used to extract all the different time series and plot them accordingly. Figure 4 shows the time series of inundation for each of the locations stored in the CSV file. Note that the first 3 locations are in the ocean and the rest are land locations.

The time series data was stored in a separate data frame and then the “acf” function from the “stats” package in R was run in loop form to obtain the autocorrelation values. THis was followed by using another loop with a nested loop to take the difference in inundation value between each of the locations and the autocorrelation function was applied to these as well. The code for these procedures is shown in Figure 3.

Figure 3 – R code used to read in .csv file and procedures to perform autocorrelation analysis

Figure 4 – Time series of inundation for each location shown in Figure 1


Figure 5 shows the autocorrelation plot for each of the inundation time series shown in Figure 4. These plots show that there are indeed variations between the different locations. The autocorrelation plots also appear to vary more as the POIs get further away from one another.

Figure 5 – Autocorrelation plots of the inundation time series from Figure 4

Figure 6-8 show some examples of the autocorrelation plots of differences in inundation level between two locations. Figure 6 shows the autocorrelation of the difference in inundation level between Port Hueneme and all of the other locations. A placeholder image is shown in place of the difference between itself (which doesn’t exist). Figure 7 shows the same plot but for Del Monte Produce. And Figure 8 shows this for the naval base Navsea 1388.

There are 9 more of these plots that were created by this process, but in the interest of not inundating this report, only 3 are shown. Each of these figures shows rather interesting differences for the different locations.

Figure 6 – Autocorrelation plots between difference in inundation levels for all locations relative to Port Hueneme

Figure 7 – Autocorrelation plots between difference in inundation levels for all locations relative to Del Monte Fresh Produce

Figure 8 – Autocorrelation plots between difference in inundation levels for all locations relative to Navsea 1388

Critique of Method

The method provided a very convenient way of viewing the different locations and their associated autocorrelations with the touch of a button. In the end, POIs can be easily added or removed by modifying the CSV file and the source code should be able to adjust accordingly without any modification to the code. This feature has already been tested and shown to be quite robust for handling any number of locations with coordinates within the data domain. This shows that this procedure is efficient and effective. Overall, I was very pleased with the operational capability of this method.

Observing the autocorrelations for each of the  locations and for the inundation differences between the regions was quite interesting. From a glance, I could immediately tell that there were differences between the POIs. I think this information will indeed be quite useful, though my experience with interpreting these types of autocorrelation plots will require some additional thought.

May 21, 2017

Geographically Weighted Regression Analysis of Tropical Forest

Filed under: 2017,Tutorial 2 @ 10:10 pm
  1. Question that you asked
  • Is crown width linearly related to height in space in the study area?
  1. Name of the tool or approach that you used.
  • I use Geographically Weighted Regression (GWR) tool in ArcMap to assess the linear relationship between crown width and height in space in the study area. Since the data has not been georeferenced, the Height is the sum of the tree height and the elevation (ground height).
  • GWR constructs a separate equation for every feature in the dataset incorporating the dependent and explanatory variables of features falling within the bandwidth of each target feature. The shape and extent of the bandwidth is dependent on user input for the Kernel type, Bandwidth method, Distance, and Number of neighbors’ parameters with one restriction: when the number of neighboring features would exceed 1000, only the closest 1000 are incorporated into each local equation. GWR should be applied to datasets with several hundred features for best results. It is not an appropriate method for small datasets.


  1. Brief description of steps you followed to complete the analysis.
  • The first step is load the dataset into the ArcMap. As reminder, this dataset is the same dataset I used in exercise 2. The dataset is generated from UAV visual images. The UAV visual images were processed through some software. Agisoft photoscan was used to make a 3D point cloud based on the UAV visual images. Then, the 3D point cloud was used in Fusion software to derive forest structure measurements, such as number of trees, height, and canopy width. The result of this data processing is a data set in a table consisting of 804 detected trees with their height and crown width. Unfortunately, because the images were not georeferenced yet, the height in this tutorial is the sum of tree height and elevation.
  • The second step is open the Geographically Weighted Regression (GWR) tool. It will appear like the figure 1 below. Then, we need to choose the Input features, dependent variable, and explanatory variable. We can choose more than one explanatory variable, but in this exercise I only choose one. My dependent variable is Height (Ground Height + Tree Height), and my explanatory variable is crown width.

Figure 1. Geographically Weighed Regression Tool.

  • The next step, we can choose the kernel type, bandwidth method, and number of neighbors. By default, the kernel type is “Fixed”; the Bandwidth method is AICc; Number of neighbors is 30. I decided to use the default format because I think it is already appropriate for my dataset. As it is for my dataset, it is especially important to select Fixed for Kernel type whenever distance is a component of the analysis.
  • To finish the analysis just click “Ok”. We can see the whole result by looking at the attribute table (see figure 2). ArcMap will show the Residual Standard Deviation (RSD) for each feature. It is shown in a new layer with different colors which show the level or range of the RSD (see figure 3).

Figure 2. Attribute table of Geographically Weighed Regression result.


Figure 3. Distribution of Residual Standard Deviation in study area.


  • However, it is more interesting and useful to see the Coefficient of the explanatory variable instead of Residual Standard Deviation (RSD). Therefore, we need to make some changes in the layer properties. To open the layer properties, right click on the layer and choose “properties”. In the layer properties, particularly in the “Symbology”, we can change the field value from Residual Standard Deviation (RSD) into the Coefficient of the explanatory variable (see figure 4). We also can change the number of classes and color. In my case, I choose three classes to distinguish features (trees) that have positive linear relationship, negative linear relationship, and trees that have coefficient (linear relationship) close to zero. The layer after adjustment can be seen in the figure 5.

Figure 4. Layer properties:Symbology.



Figure 5. Distribution of coefficient of the explanatory variable.


  1. Brief description of results you obtained.
  • The result can be seen in Figure 2, 3, 4, and 5. The attribute table shown in the figure 2 consists of all the value related to regression analysis. There are observed and predicted value, coefficient of the explanatory variable, intercept, standard error, residual standard deviation, etc. From the result, in general, Crown width and Height is positively related. Which means there is increase in the Height for every one unit increase of Crown width. In other words, the bigger the crown the higher the tree will be.
  • However, if we see the result for each individual feature (tree), some of the trees have positive linear relationship between Crown width and Height (Ground height + Tree height), and some other trees have negative linear relationship. The distribution of trees that have positive and negative linear relationship can be seen in Figure 5. The red points indicate trees that have negative linear relationship between Crown width and Height, which means trees with big crown will have lower height. On the other hand, blue points indicate trees that have positive linear relationship, which means trees with big crown will have higher height. While the white points indicate trees that have either positive or negative linear relationship, but their coefficients are close to 0.
  • Differences in linear relationship (positive and negative linear relationship) in this case might be happened due to some factors, such as different trees species, different elevation, or error factor from the Fusion software analysis. Tropical forest has hundreds different trees species that have different characteristic. Some of the trees have big crown and high trunk, and some other have big crown and short trunk. In addition, different elevation can give significant effect because the Height data used in this case is the total of ground height (elevation) and tree height. Trees with positive linear relationship might be distributed in the area with higher elevation (hill, mount, or peak). On the other hand, trees with negative linear relationship might be distributed in the area with lower elevation (watershed or valley). Trees with coefficient close to zero might be occurred because of the data that has not been georeferenced and the algorithm used in Geographically Weighted Regression analysis that included data from the neighbor features (trees). That can affect the value of coefficient in linear relationship.
  • In general, the R-squared is quite low, with most of the features (trees) have R-squared lower than 0.2. To improve the regression analysis, I think I need to georeference the data. The Height which is the sum of the ground height (elevation) and tree height can affect the regression model (intercept, coefficient, etc) between Crown width and Height. I also can add additional explanatory variable like “tree species” to increase the accuracy of the linear model.
  1. Critique of the method – what was useful, what was not?
  • The method was really useful to generate regression model for all feature (trees). It helps to understand the distribution of the trees with different coefficient or other values, such as standard error and residual standard deviation because there is a new layer as an output that can show those values in space in the study area.
  • However, dependent and explanatory variables should be numeric fields containing a variety of values. Linear regression methods, like GWR, are not appropriate for predicting binary outcomes (e.g., all of the values for the dependent variable are either 1 or 0). In addition, the regression model will be misspecified if it is missing a key explanatory variable, for example in my case is elevation and tree species.




Tutorial 2: Calculate Disturbance Severity Patch Sizes with SDMtools in R

Filed under: Tutorial 2 @ 4:29 pm


Often patch dynamics are examined under the lens of single disturbance events, but contemporary disturbance regimes are demonstrating that large-scale disturbances can overlap in time and space.  The spatial configuration of patches from an initial disturbance may influences the spatial configuration of patches of a sequential disturbance.  The broader research question is focused on determining if there is a relationship between patches from an initial disturbance of beetle outbreak and a sequential disturbance of wildfire. Thus far, I have conducted analysis to determine if there is any inherent patchiness for the burn severity raster and the beetle-outbreak raster.  Both have indicated some level of patchiness. I wanted to go back and characterize the patches for each disturbance to generate summary statistics about patches for each disturbance.  The Entiako burn perimeter delineated our landscape for both disturbance events. The Entiako fire burned in 2012 in central interior British Columbia. I asked the following questions:

What is the variability in patch size for burn severity classes and beetle severity class?


The burn severity and beetle severity raster generated from Landsat imagery using dNBR and NDMI pixel indexing approaches respectively. Initially, I planned to conduct the analysis in FRAGSTATS that assesses landscape spatial patterns.  The raster seemed unreadable in the commonly used GEOTIFF format.  I ended up conducting the analysis in R version 3.3.2 (Team 2016) with the SDMtools Package (VanDerWal 2015).   The SDMtools package is based on the work by McGarigal et al. (2012) FRAGSTATS software package and calculates many of the same metrics. The following spatial equation was considered as a guide yi = f(yi-h,i+h).

In order to calculate patch metrics, I converted the continuous raster into a classified raster.  The burn severity classification was based on break points in the dNBR values defined by Reilly et al. (2017), and the beetle severity classification was based on cross-validation with orthorectified aerial photographs.  Burn classes included a low (plus unburned), moderate, and high, and beetle class included none, low-moderate, and moderate-high. In the SDMtools package, I calculate patch metrics for each disturbance severity class, burn severity class, Figure 1 and beetle severity class Figure 2.  The following PDF includes step-by-step instructions with r-code for generating patch metrics:



Figure 1: Burn Severity Classes for Patch Metric calculations.


Figure 2: Beetle Severity Classes for Patch Metric Calculations



In the SDMtools package, I ran two analyses, one on the whole landscape and one on each burn severity class. For the first analysis, I found that high burn severity was the smallest component for this fire event only accounting for about 15 percent of the landscape (see Table 1). This was interesting, because lodgepole pine is often characterized by ‘stand replacing’ high severity fire, however, moderate burn severity can also account for ‘stand replacing’ fire in lodgepole pine dominated landscapes, which was about 45 percent of the landscape (see Table 1). The landscape was dominated by low-moderate and moderate-high beetle severity with only a small portion unaffected by beetle outbreak (see Table 2).


Table 1: Burn severity class summary for patches, area, and proportion of landscape

Burn Severity Class Number of Patches Total Area in Hectares Proportion of Landscape
Low 382 2767.14 0.394
Moderate 448 3179.79 0.452
High 422 1082.70 0.154


Table 2: Beetle severity class summary for patches, area, and proportion of landscape

Beetle Severity Class Number of Patches Total Area in Hectares Proportion of Landscape
None 985 780.93 0.111
Low-Moderate 656 2968.56 0.423
Moderate-High 709 3273.30 0.466



The second analysis examined the patch metrics for each disturbance severity class. Table 3 and 4 includes patch area metrics by burn severity and beetle severity respectively. While the output includes many other variables, I wanted to look at the variability in patch size for disturbance type and class. The high burn severity maximum and mean patch size are the most interesting part of Table 3, because they are quite different from the low and moderate burn severity.  Further statistical analysis is needed to determine if they numbers are statistically different.  The summary of patch size metrics for both disturbance types indicate that there are many small patches for all class types.


Table 3: Burn severity classes patch size summary in hectares

Burn Class Number of patches Minimum Patch size Maximum Patch size Mean Patch Size Median Patch size SD
Low 382 0.09 1449.45 7.24 0.18 77.7
Moderate 448 0.09 1461.69 7.10 0.18 75.3
High 422 0.09 445.95 2.57 0.18 22.5


Table 4: Beetle severity classes patch size summary in hectares

Burn Class Number of patches Minimum Patch size Maximum Patch size Mean Patch Size Median Patch size SD
None 985 0.09 77.58 0.79 0.18 4.04
Low-Moderate 656 0.09 2380.86 4.53 0.18 93.07
Moderate-High 709 0.09 1461.60 4.62 0.18 57.68

Critique of the method – what was useful, what was not?

This analysis attempt was much more complicated than anticipate. The complications mostly came from the raster format.  The raster was formatted as floating point, which was not compatible with the stand alone Fragstats Software nor the patch grid extension in Arc GIS. The patch grid extension for Arc GIS is based on Fragstats and generates similar patch metrics (Rempel et al. 2012). I did not realize the issue was related to the  floating point raster until after I figured out how to run the SDMtools package in R (VanDerWal 2015). I did trouble shoot and figure out the raster issue, which lead to converting the raster from floating point to integer in R.  Additionally, a major challenge was figuring out how the patches are defined in SDMtools package. Fragstats and Patch Grid both offer an option to define the cell neighborhood rule as 4 or 8.  This option of defining the neighborhood cell rule was unclear to me in the SDMtools package. Based on a brief comparison of data outputs from Fragstats and SDMtools, I believe that patch identification tool in SDMtools classifies patches base on the 8 cell neighborhood rule. I think that calculating patch statistics in R is less cumbersome and allows for better documentation of methods. While the results of this analysis are interesting to consider, they do not address the overlap in disturbance events, which could be quite interesting.




McGarigal, K., A. Cushman, and E. Ene. 2012. FRAGSTATS v4: Spatial patterns analysis program for categorical and continuous maps. Univeristy of Massachusetts, Amherst, Massachusetts.

Reilly, M. J., C. J. Dunn, G. W. Meigs, T. A. Spies, R. E. Kennedy, J. D. Bailey, and K. Briggs. 2017. Contemporary patterns of fire extent and severity in forests of the Pacific Northwest, USA (1985-2010). Ecosphere 8:e01695.

Rempel, R. S., D. Kaukinen, and A. P. Carr. 2012. Patch Analyst and Patch Grid. Ontario Ministry of Natural Resources. Center for Northern Forest Ecosystem Research, Thunder Bay, Ontario.

Team, R. C. 2016. R: A language and environment for statisitical computing. R Foundation for Statistical Computing, Vienna, Austria.

VanDerWal, J. 2015. Species Distribution Modelling Tools: Tools for processing data associated with species distribution modeling exercises.


April 5, 2017

Tutorial 2

Filed under: 2017,Tutorial 2 @ 8:15 pm

This is the blog page for Spring 2017 Tutorial 2 posts

© 2017 GEOG 566   Powered by WordPress MU    Hosted by