GEOG 566

         Advanced spatial statistics and GIScience

Archive for 2017

May 26, 2017

A space for time substitution on an impacted landscape

Filed under: Tutorial 2 @ 1:49 pm


Has the total daily energy flow through a functional group at a given elevation changed over the eighty-year interval between 1931 and 2011?


I am hoping to better understand the forces structuring small mammal communities, specifically rodents, across space and through time. To this end, I am using trap effort-standardized abundance data to evaluate the response of species and functional groups to changes in anthropogenic land use practices and climate warming. These data are spatially explicit, and although their spatial coordinates reflect the patterns of trapping, their elevational distribution captures patterns that reflect biological responses to a changing environment. Elevation is inversely correlated with summer high temperatures, collinear with winter low temperatures, and positively correlated with precipitation. Human land use impacts also tend to decrease with increasing elevation, and in the Toiyabe mountain range, land use for cattle grazing and agriculture has decreased over time. Climate warming in the Great Basin, however, has continued over the same span and has shifted climatic bands upslope.

Species and functional groups respond to these changes over time in dynamic ways that may be recapitulated in their distribution along elevation/climate gradients. This is referred to as the space-for-time hypothesis, which states that space capturing the temporal gradient of interest may stand in for time, with the expectation that species distributions across space reflect their response to the variable of interest through time.

I am interested in the functional groups captured by diet preference, habitat affinity, and body size. Diet preference classes include herbivores, granivores, and omnivores; habitat affinity includes xeric adapted, mesic adapted, and generalist; and body size includes four bins: 0-30 g (size class 1), 40-80 g (size class 2), 100-200 g (size class 3), and 250-600 g (size class 4). Any given species may be any combination of these three functional classes, although not all combinations are actually represented. Patterns in the distribution of these functional groups therefore also reflect tradeoffs that species face, and help to highlight species that may be at risk when functional categorizations become decoupled from environmental conditions. Thus, changes in the elevational distribution of functional groups over time provide key insights into species responses. Total daily energy flow can also be estimated for these groups, because energy flow through a group is a function of both body size and abundance; this helps us understand the interactions between abundance, body size, diet, and habitat affinity.

Thus, the question becomes: has the total daily energy flow through a functional group at a given elevation changed over the eighty years between the historic and modern periods, and based on the cross correlations between body size, diet, and habitat affinity, are there species or groups that appear especially at risk of local extirpation?

Analytical approach and brief description of steps followed to complete the analysis:

To perform this analysis, I used R to trap effort-standardize abundance data for rodent species in the Toiyabe mountain range for both the historical and modern data sets. The function in R is an iterative process that resamples all trap sites within the historic and modern data sets that have a trap effort greater than the site with the lowest number of trap nights. This process is automated in R and generates an average species occurrence from 1,000 iterations. I then used Excel to calculate the daily energy flow through each population based on body mass estimates and an equation employing the principles of allometric scaling, E_i = aN_iM_i^b, where E_i is energy flow (kJ/day), a and b are allometric parameters (b is the scaling exponent, equal to 0.75), N_i is the abundance estimate (here, the trap effort-standardized abundance), and M_i is the estimated body mass (Terry and Rowe 2015).
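The two computations above can be sketched in Python (the original analysis used R and Excel). The species names, counts, efforts, and the allometric constant a below are made-up placeholders, not the study data:

```python
import random

# Illustrative capture records for one site: {species: raw count}.
# Names and numbers are invented examples, not the Toiyabe data.
captures = {"Peromyscus": 30, "Dipodomys": 12, "Neotoma": 4}
trap_nights = 400          # effort at this site
min_effort = 200           # lowest effort across all sites

def standardize(captures, trap_nights, min_effort, iters=1000, seed=1):
    """Resample the site's captures down to the minimum trap effort and
    average species occurrence over many iterations (rough sketch of the
    iterative trap-effort standardization described above)."""
    rng = random.Random(seed)
    # One pool entry per individual capture, tagged by species.
    pool = [sp for sp, n in captures.items() for _ in range(n)]
    totals = {sp: 0 for sp in captures}
    k = round(len(pool) * min_effort / trap_nights)  # captures at reduced effort
    for _ in range(iters):
        for sp in rng.sample(pool, k):
            totals[sp] += 1
    return {sp: t / iters for sp, t in totals.items()}

def energy_flow(n, mass_g, a=5.69, b=0.75):
    """E_i = a * N_i * M_i^b (kJ/day); a here is an illustrative constant,
    not the value used in the study."""
    return a * n * mass_g ** b

std = standardize(captures, trap_nights, min_effort)
```

Energy flow per population is then just `energy_flow(std[sp], mass[sp])` for each species, summed within a functional group.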

All trap localities were paired for the historic and modern time-periods, and trap site elevations were binned into three elevation ranges: below 5500 ft, 5500-6500 ft, and above 6500 ft. Energy flow for each functional group was calculated and plotted for these elevation bins.

Brief description of results you obtained:

Total energy flow per day through the modern rodent community in the Toiyabe mountain range is lower than in the historic time period (Figure 1). This is consistent with the finding of Terry and Rowe (2015) that total energy flow through the rodent community in the Great Basin decreases sharply at the Holocene-Anthropocene boundary.

Using climate-elevation correlations, we can substitute elevation for time to model the warming climate from the early to late Holocene in the Great Basin. Temperatures increase with decreasing elevation, and precipitation decreases with decreasing elevation. Historically, energy flow through xeric adapted species decreases with increasing elevation, while the opposite is true for mesic adapted species. Notably, energy flow through the historic mesic adapted species increases linearly with elevation, while in the modern period energy flow through this group roughly plateaus at the middle elevation (Figure 2). While both modern and historical patterns are generally consistent with Holocene trends for these functional groups, neither captures the distinctive decline in energy flow through the xeric adapted group in the modern period. This suggests that space-for-time comparisons may be valid for pre-human-impact studies, but may not capture abrupt shifts in species responses during the Anthropocene.

Small bodied, omnivorous habitat generalists demonstrate the greatest energy flow at middle elevations in both the historical and modern time-periods; however, total energy flow through these groups is markedly lower in the modern than in the historic time period at these elevations. While generalists show lower energy flow at the middle and high elevations in the modern period, their low elevation values remain unchanged (Figure 2). This is consistent with trends for omnivorous habitat generalists through the Holocene and into the Anthropocene (Terry and Rowe 2015).

Finally, herbivores demonstrate a nearly four-fold decrease in energy flow at the lowest elevations over the eighty-year period between the historic and modern time-periods (Figure 2). This is the only result that seems to strongly reflect the abrupt shift from the Holocene to the Anthropocene. It is particularly surprising because human land use was greater historically for this mountain range than it is today, and ground cover by herbaceous plants and grasses has increased over this time-period. This may suggest that increased temperatures at lower elevations have pushed herbivorous species upslope, indicating that these species may be more sensitive to climate. Conversely, energy flow through granivores at low elevations increased from the historic to the modern period, also consistent with Terry and Rowe (2015). This, however, may be expected, as granivorous rodents in this region also tend to be xeric adapted, while herbivorous rodents tend to be mesic adapted.

While inference based on this study is limited in scope due to the sample size of one (one mountain range), it does begin to make the case that space-for-time comparisons should not be assumed to hold as we move into a future where landscapes and climate are increasingly affected by human activity. By assuming they do, we may fail to capture the subtle ways in which certain ecological patterns are becoming decoupled, and we must therefore be wary of drawing spurious conclusions based on old assumptions.

Figure 1. Total energy flow (kJ/day) for each elevational bin in the historic and modern time periods.

Figure 2. From left to right: energy flow (kJ/day) through functional groups in the historic and modern time periods along the elevation gradient. From top to bottom, functional classifications: size class, diet preference, and habitat affinity.


Critique of the method – what was useful, what was not?

Given the nature of the data, this method of analysis was useful for me. I learned something about the data that was not obvious before my analysis, or before comparison with the research at the greater temporal scale captured by Terry and Rowe (2015). While my inferences are limited, increasing the number of mountain ranges on which I perform this analysis will enable a number of more informative statistical analyses, and will allow me to test the spatial gradient at both the elevational and latitudinal scales, as the three mountain ranges for which similar data exist occur in a north-to-south array.


Terry, R.C. and R.J. Rowe. 2015. Energy flow and functional compensation in Great Basin small mammals under natural and anthropogenic environmental change. PNAS, 112(31):9656-9661.

May 24, 2017

Tutorial 2: Identifying clustering with a geographically weighted regression

Question You Asked

How much spatial clustering is present in the regression model of vegetation response to canopy cover? I am interested in determining if a single equation can predict the way that the two variables interact within east-side ponderosa pine forests, or if multiple equations are necessary.

Name of Tool or Approach You Used

To answer this question, I used the Geographically Weighted Regression tool in ArcMap.

Brief description of steps you followed to complete the analysis

In the GWR tool, I used vegetation cover as the dependent variable, and canopy cover as the explanatory variable. Because my points are somewhat clustered across the landscape, I used an adaptive kernel with an AICc bandwidth method.

Brief description of results you obtained

The map of coefficients did reveal clusters of high coefficients and low coefficients across the landscape. However, judging by the GWR table below, this clustering may not be statistically significant. One anomaly of this assessment was the negative adjusted R2 value. A negative adjusted R2 indicates that the model explains less variance than would be expected by chance once penalized for its effective number of parameters; in other words, the fit is very poor.

Map of clustered plots

Neighbors 46
ResidualSquares 14750.95231
EffectiveNumber 3.80525057
Sigma 18.69738296
AICc 405.3014401
R2 0.048524493
R2Adjusted -0.01473284
Dependent Field 0 Total_cvr
Explanatory Field 1 Canopy_Cov

Table of original GWR results

To remedy the negative adjusted R2, I tried adding in another explanatory variable (elevation). This appeared to help the model, reducing the residual squares and bringing the adjusted R2 value back above 0.

Neighbors 46
ResidualSquares 13904.07263
EffectiveNumber 5.22764912
Sigma 18.46665082
AICc 405.8795404
R2 0.103150476
R2Adjusted 0.01015694
Dependent Field 0 Total_cvr
Explanatory Field 1 Canopy_Cov
Explanatory Field 2 Elevation

Table of remedied GWR
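The negative adjusted R2 follows directly from the numbers in the tables: with 46 points and roughly 3.8 effective parameters, a raw R2 of 0.049 falls below zero once penalized. A quick check, assuming the GWR adjustment uses the effective number of parameters:

```python
def adj_r2(r2, n, enp):
    # Adjusted R^2 penalized by the effective number of parameters (enp),
    # as reported in the GWR output tables above.
    return 1 - (1 - r2) * (n - 1) / (n - enp)

before = adj_r2(0.048524493, 46, 3.80525057)   # ~ -0.0147, matching table 1
after  = adj_r2(0.103150476, 46, 5.22764912)   # ~  0.0102, matching table 2
```

Both values reproduce the tables to four decimal places, so the negative value is the penalty at work, not a missing constant term.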

Critique of the method – what was useful, what was not?

This method was useful in that I could process the data in ArcMap, which is where I was already analyzing my points. It was also very helpful to visualize the coefficients in the map below. However, I am still a little unsure why the coefficient map shows such strong clustering while the output table does not show any significance.

Automating feature creation for socio-demographic and spatial variables of survey responses regarding flood safety in neighborhoods

Filed under: Tutorial 2 @ 12:04 pm
  1. Question asked

As reported, I am handling more than 100 variables. In addition, my data is constantly being updated, so I needed a tool to create independent features for analyzing each of my variables. The question is: how can I automate feature creation for the socio-demographic and spatial variables of survey responses regarding flood safety in neighborhoods?

  2. Name of the tool or approach that was used.

I used ArcPy with Python for the automation and, ArcMap for output visualization. ArcPy is a site package that builds on (and is a successor to) the successful arcgisscripting module.

  3. A brief description of steps you followed to complete the analysis.

First, we have our input feature as shown in Figure 1. Notice the attribute table containing all our variables.

Next, we take advantage of the Model Builder in ArcMap in order to visualize our model using the tool “Feature Class to Feature Class”, as shown in Figure 2.


In addition, we export the resulting python code. The exported code is presented in Figure 3.


This python code gives us an idea of how the “Feature Class to Feature Class” tool is working. So we proceed to create our own function file as depicted in Figure 4.

Within our function file and, based on the previously exported function, we build our own function. Our function is presented in Figure   5.

Once we have our function within our function file, we can now execute our function creating a script file. The content of our created script file is shown in Figure 6.
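As a rough sketch of what such a function file might contain (the paths and field names here are hypothetical, and the field-mapping construction is simplified relative to the exported Model Builder code):

```python
# Hedged sketch of automating "Feature Class to Feature Class" per variable.
# IN_FC and OUT_WS are hypothetical paths, not the original project's.
IN_FC = r"C:\data\survey_responses.shp"   # input feature class
OUT_WS = r"C:\data\per_variable"          # output workspace

def out_name(field):
    # One output feature class per survey variable.
    return "survey_{}".format(field.lower())

def export_per_variable(fields, in_fc=IN_FC, out_ws=OUT_WS):
    import arcpy  # only available inside an ArcGIS Python environment
    for f in fields:
        # Keep only the current field via a field mapping, so each output
        # feature contains just that variable.
        fms = arcpy.FieldMappings()
        fm = arcpy.FieldMap()
        fm.addInputField(in_fc, f)
        fms.addFieldMap(fm)
        arcpy.FeatureClassToFeatureClass_conversion(
            in_fc, out_ws, out_name(f), field_mapping=fms)
```

Calling `export_per_variable(["Flood_Risk", "Income"])` would then create one feature class per listed variable.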

  4. A brief description of results you obtained.

The result of running our script file is shown in Figure 7. Here, we first read the variables of our input feature and, in the last row, the output field of the new feature is listed.

Finally, the newly created feature is shown in Figure 8. Here we can also see its corresponding attribute table. Note that the created feature contains only the selected field.

  5. Critique of the method – what was useful, what was not?

  • Model Builder:

This method is useful for building and visualizing your model, and it can also export a python script of the model.

  • ArcPy:

ArcPy is a very useful package that builds on the successful ArcGIS scripting module. It provides a productive way to perform geographic data analysis, data conversion, data management, and map automation with Python. Many variables can thus be analyzed without needing to open ArcMap, saving an enormous amount of time.


ArcGIS (2017). What is ArcPy.

Tutorial 2:

Filed under: 2017,Tutorial 2 @ 11:59 am

Research Question

Building upon my question from exercise 2, my research question for exercise 3 asks how the covariance between SCF and EVI within a wildfire perimeter changes over time following the wildfire.

Approach/tools used

For this exercise, I utilized a variety of software programs. While the overall goal of variogram creation required using R, ArcMap and Excel were used significantly in order to process the data into a usable format. As a result, much hopping between programs was required, which was undoubtedly quite tedious/frustrating at times but ultimately proved to be a useful learning experience.

Analysis procedure

To create a cross-variogram in R using the gstat package, the process is much smoother with data in table format. As a result, one of the main challenges of creating a cross-variogram was converting my raster data into usable tabular data. To accomplish this, I first converted the SCF and EVI rasters into point shapefiles, then used the “Add XY Coordinates” tool in ArcMap to add two fields to the point shapefiles containing the X- and Y-coordinate information. Next, the attribute table from each point shapefile was converted to a CSV table. The final step before stepping back into R was to combine all EVI and SCF data for each year into a single CSV table to ease the script writing in R.

After creating the combined EVI-SCF table, I noticed that a few of the columns had different numbers of records. It turns out that I used a slightly different extraction method for a few SCF layers – a product of conducting this analysis across multiple days and not taking diligent notes on the layers used. After correcting this issue, I read the cleaned table into R and created cross-variograms using the code snippet shown below.
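The gstat snippet itself was attached as an image; the quantity it estimates, the empirical cross-variogram, can be sketched in plain Python (with toy coordinates and values rather than the SCF/EVI tables):

```python
import math

def cross_variogram(pts, z1, z2, bin_width, n_bins):
    """Empirical cross-semivariance:
    gamma_12(h) = 1/(2*N(h)) * sum[(z1_i - z1_j) * (z2_i - z2_j)]
    over point pairs (i, j) whose separation distance falls in bin h."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            d = math.dist(pts[i], pts[j])
            b = int(d // bin_width)
            if b < n_bins:
                sums[b] += (z1[i] - z1[j]) * (z2[i] - z2[j])
                counts[b] += 1
    # Average each bin; empty bins have no estimate.
    return [s / (2 * c) if c else float("nan") for s, c in zip(sums, counts)]
```

A positive value in a bin means the two variables change in the same direction between points separated by that lag distance; a negative value means they change in opposite directions, which is how the sign changes in the plots below are read.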


My hypothesis for the years following the wildfire was to see the strongest covariance in the first year, then decreasing overall covariance in each year afterwards. However, the plots below tell a different story. There are a lot of things going on in the time-series of cross-variograms, but a couple of evident trends stick out to me. First, although each year shows two clear bumps in the cross-variogram, the lag distance of those bumps shift significantly from year to year. My guess would be that the initial shift from pre-fire to the first year post-fire was primarily due to the variation in burn severity across the landscape. The shifts in ensuing years may be due to varying rates of vegetation recovery or perhaps due to changes in snowpack distribution caused by the wildfire.

Another fascinating pattern is the change in sign of the covariance. Before the wildfire, the covariance is mostly negative except for a few thousand meters of lag distances. This suggests that between most pixels within the burn area, SCF and EVI vary negatively – if snowpack increases between two points, vegetation generally decreases between those two points. Following the wildfire, however, the covariance becomes more and more positive overall. By year 3 following the wildfire, at all distances the covariances are positive, but still showing a similar bumpiness as in the pre-burn cross-variogram. This outcome was a bit baffling, as I would expect the relationship to be strongest (most positive) in the first year post-fire, with a gradual retreat to the pre-fire state.

Another pattern of significance is the consistent cross-variogram valley that occurs at about 5000 meters in each year, even though the peaks on either side shift from year to year. This can be interpreted as the between-patch distance, and is perhaps the more important result of this method – that from year to year, even before the wildfire, the landscape shows similar distances between patches.

Figure 1: Pre-fire (2005) cross variogram between SCF and EVI

                     Figure 2: First year post-fire (2007) cross variogram between SCF and EVI

Figure 3: Second year post-fire (2008) cross variogram between SCF and EVI

Figure 4: Third year post-fire (2009) cross variogram between SCF and EVI

Method critique

By creating cross-variograms for each year following a wildfire and the pre-fire year, I was able to discern the patchiness of the EVI and SCF data as well as the shifts in the snowpack-vegetation relationship. The persistence of the bumpiness across years strengthens the evidence of two patch sizes, but the shift in bump locations results in a cloudier picture of exact patch size.

One disadvantage to this approach was the inability to determine the strength of the SCF-EVI relationship within a given year’s cross-variogram. After researching the interpretation of covariance, it became clear that covariance values can only tell you the sign of a relationship, but cannot lend any insight into the magnitude of the relationship because the data is not standardized. However, because I generated covariograms across multiple years, I was able to glean information regarding the relative shifts in covariance, which is ultimately the type of information I’m most interested in producing.

For the final project, I plan on generating additional cross-variograms for the next couple of years in order to observe how the covariance continues to change. I’m interested to see at what point the covariance plot starts to look like the pre-fire condition.

Tutorial 2: Nitrate Data in Relationship with Red Alder cover and Land Use

Filed under: 2017,Tutorial 2 @ 11:43 am


  1. Research Question 

Do the nitrate values that I collected in the field have a relationship with the percent of Red Alder cover in each watershed?

Sub question: Do the nitrate values have a relationship with land use activities in each watershed?

  2. Name of the tool or approach that you used.

For this analysis I utilized ArcGIS and Microsoft Excel.

  3. Brief description of steps you followed to complete the analysis.

First, I utilized a package in ArcGIS called NetMap to delineate the 10 watersheds. This package uses GPS points to draw watershed boundaries based on elevations and flow paths.

Next I uploaded the 2011 red alder percent cover shapefile from the Oregon Spatial Library Database, and then clipped it to the watersheds.

By joining the attributes of these two shapefiles, I was able to obtain the total area of each watershed and the total area of red alder cover within it. By dividing these two areas, I obtained percent alder cover and compared that to the nitrate concentrations at each site.
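The percent-cover arithmetic can be checked directly from the joined areas; the values below are taken from the table:

```python
# Percent alder cover from the joined watershed and alder areas (m^2).
def percent_cover(alder_m2, watershed_m2):
    return 100 * alder_m2 / watershed_m2

cape = percent_cover(3537.74, 4353700)    # ~ 0.081 %
kiser = percent_cover(1017.97, 100500)    # ~ 1.013 %
```

The Kiser Creek value illustrates the outlier discussed below: a very small watershed area inflates the percentage even though the absolute alder area is unremarkable.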

Table 1 Nitrate Concentrations and Percent Alder Cover

Watershed             Area (m2)   Alder (m2)  Percent Alder Cover  Average Nitrate
Cape Creek            4353700     3537.74     0.081%               1.88
Crooked Creek         31760400    3226.39     0.010%               1.35
Cummins Creek         21388800    222.89      0.001%               1.44
Gwynn Creek           2865000     5371.57     0.187%               1.61
Little Cummins Creek  3056600     361.35      0.012%               1.72
Yew Creek             9786600     969.92      0.010%               1.01
Kiser Creek           100500      1017.97     1.013%               0.23
Rock Creek            38375500    2359.26     0.006%               0.19
Middle Fork Creek     3294000     339.23      0.010%               0.17
Griffith Creek        4434800     2021.86     0.046%               0.13


Once this information was obtained, I utilized a scatterplot to identify a pattern.

Figure 1 Relationship between percent alder cover and nitrate concentration


Though the trendline indicates a negative correlation, the data point at 1 percent may be an outlier. The watershed indicated by this point is small, which could inflate the apparent amount of alder cover in the region. With this in mind, no distinctive pattern was observed relating alder cover and nitrate concentration in the ten watersheds. This is not entirely surprising, as the nitrate concentrations were relatively low to begin with.


Landuse Analysis 

Looking further, I utilized the 2011 NLCD data to evaluate land use in the 10 watersheds. With none of the usual suspects (agriculture, urban, etc.) particularly dominant in the watersheds, it was difficult to link increased nitrate concentrations to land use at the field sites. Most of the land cover was evergreen and mixed forest types, which are natural for the areas sampled.


Figure 2 Coastal Watersheds Landuse


Figure 3 Coast Range Watersheds Landuse

  4. Brief description of results you obtained.

Overall, nitrate concentrations over the five months in the ten watersheds were relatively low (~0.1-1 ppm), and the results obtained were not particularly surprising. There was no clear pattern linking red alder cover and nitrate concentrations in these watersheds (Table 1). Additionally, there were no surprise land use features in the watersheds, and thus no apparent link to nitrate concentrations in this analysis. Most of the land in these watersheds is occupied by mixed and evergreen forests, neither of which is particularly linked to elevated nitrate concentrations. This analysis was consistent with my expectations: starting with low nitrate concentrations, and suspecting most of the land use would be forest cover, I predicted there would likely not be a strong link between land use and nitrate concentration. However, I expected slightly more red alder cover in the coastal watersheds and was surprised by how little that cover related to the concentration of nitrate.


  5. Critique of the method – what was useful, what was not?

This method was very useful: I was able to delineate the watersheds, clip layers to identify potential sources of nitrate, and use Excel to further analyze the relationships. I also used the join feature to gather information from two attribute tables to calculate the percentage of alder cover. I will use the skills learned and practiced in this exercise in the future to identify statistical relationships between spatially correlated (or not) data.


Spatial autocorrelation in NDVI for wheat fields and some characteristics

Filed under: 2017,Tutorial 2 @ 11:06 am

Tutorial 2 is based on tutorial 1 with a few alterations: 1) the variogram is standardized by dividing the semivariance by the standard deviation of that field, 2) only the wheat fields are used in the analysis so that the results are more easily compared, and 3) a time component is added; for 5 time steps the correlation between range and sill is determined and its development over time is analyzed (there was no biomass data for these dates).


Part 1 and 2

Standardizing of variogram


In the code above (shared as an image), var is the Formal Class RasterVariogram. The semivariance column in this class can be accessed as var@variogram$gamma. This column is replaced by the standardized semivariance, obtained by dividing the existing column by the standard deviation. The standard deviation is calculated with the cellStats function: the input raster is ras, ‘sd’ is the statistical option for standard deviation, na.rm=TRUE removes NA values, and asSample=TRUE means that the denominator for the standard deviation is n-1.
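The same standardization can be sketched outside R (in Python here, as a plain stand-in for cellStats and the gamma column):

```python
import math

def sample_sd(values):
    # Standard deviation with n-1 denominator (asSample=TRUE in cellStats);
    # None entries stand in for NA cells (na.rm=TRUE).
    vals = [v for v in values if v is not None]
    mean = sum(vals) / len(vals)
    return math.sqrt(sum((v - mean) ** 2 for v in vals) / (len(vals) - 1))

def standardize_gamma(gamma, cell_values):
    # Divide each semivariance value by the field's standard deviation,
    # mirroring the replacement of var@variogram$gamma described above.
    sd = sample_sd(cell_values)
    return [g / sd for g in gamma]
```

This makes the variograms of different fields comparable on a common scale, which is the point of alteration 1).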

Doing the same analysis but focusing on just one crop does not improve the results. Pearson’s correlation coefficients are low and insignificant, with a coefficient for range vs. biomass of -0.31 (p = 0.3461) and for sill vs. biomass of 0.35 (p = 0.2922). In the color-coded scatter plot, no pattern is visible in the relationship between sill, range, and biomass.

Part 3

For part 3, the range and sill values for all wheat fields at 5 different dates were determined, along with the correlation between range and sill for each date. This Pearson correlation coefficient was then analyzed over time. The correlation appears to increase over time, and is significant at the 5% level (p = 0.047).
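The per-date range-sill correlation can be sketched with a hand-rolled Pearson coefficient (the wheat-field values themselves are not reproduced here; the arrays below are placeholders):

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# One correlation per date, computed from that date's per-field ranges and
# sills (placeholder values), then examined as a time series:
ranges_t1 = [120.0, 150.0, 90.0, 200.0]
sills_t1 = [0.8, 1.1, 0.6, 1.5]
r_t1 = pearson(ranges_t1, sills_t1)
```

Repeating this for each of the 5 dates yields the correlation series whose upward trend is described above.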

Manipulating Layer Properties in ArcGIS ArcMap 10.4.1

Filed under: Tutorial 2 @ 1:09 am

Reva Gillman


  1. Question that I asked:

What are the geographic distributions of some of the ‘riskier’ species that arrived on Japanese Tsunami Marine Debris (JTMD)? What do their native and non-native ranges look like, and how do they compare with each other?

First of all, I had to choose 4 species with clear invasion history out of the 104 JTMD species to focus on. Out of the 31 with invasion history, I chose Asterias amurensis (seastar) and Hemigrapsus sanguineus (Japanese shore crab), since they are known to be an issue in other regions. I then chose two species with large non-native spread: Crassostrea gigas (Japanese oyster), an economically important aquaculture species, and Teredo navalis, the shipworm with an almost global distribution that has burrowed into wooden ships all over the globe for hundreds of years.

  2. Name of the tool or approach that I used:

I manipulated Layer Properties in ArcGIS ArcMap 10.4.1 to change the polygons in the shapefile so that they appear in different colors according to which realms were documented as native or non-native regions for each species.

  3. Brief description of steps I followed to complete the analysis:

First I made an Excel spreadsheet of the species data by going through each ‘risky’ species distribution, converting regions to realms, and typing that up in the spreadsheet. Then I did a join on Realm_Code, which matched up with the attribute table from the shapefile. However, I ran into issues mapping these data, as it would only map the regions that were native, or non-native, without the other regions represented at all, so I had to figure out another method.

My second attempt was to directly select regions of the shapefile and change the symbology by going into Selection and clicking “use with this symbol.” This may have eventually worked, but it was very hard to figure out which regions to select for each species, and there was not an intuitive way to save the selections as a new layer.

Finally, I found out how to go into Layer Properties, and manipulate the shape file from there:

a. First, right-click on the layer you want to manipulate and select Properties from the bottom of the options.

b. Go into Symbology

c. Underneath the Categories options on the left, select unique values

d. From the Value Field, select the field that you are interested in; in my case that was Realm_Code.

e. Select the Add all values button on the bottom, which will put all of the selected field options in the screen.

f. From there, you can select a field, and remove it, or add values as needed, which I did for each species

g. By double-clicking on each field, you can select the color that you want to represent each category; you can change the outline of the symbol, the fill color, and the texture. I did this to make the invasive realms one color, the native realms another color, and regions where the species is both native and non-native a third color/texture.

h. Then, you can copy and paste the layer to manipulate again for the next species, so you don’t have to start from scratch.


4. Brief description of results you obtained.

Below are the 4 maps of each species’ distribution, showing the extent of their ranges.


Teredo navalis geographic distribution

Hemigrapsus sanguineus geographic distribution

Crassostrea gigas geographic distribution


For clearer viewing of the above graphs, please download: Tutorial2

5. Critique of the method – what was useful, what was not?

This method can be tedious if you want to look at many map layers. For me it worked out, because I chose to look at four layers of species geographic distributions, but if the method were less tedious I might have looked at more. By going through and manually selecting each polygon (realm) that I wanted to highlight, I had a lot of control over what the final map looked like, and I liked the level of detail available in Symbology: outlines of polygons, fill color, and texture of fill. The final maps are straightforward, and it is easy to save each layer, then copy and paste the next layer so you don’t have to start from scratch each time.

May 23, 2017

Tutorial 2: Extracting raster values to points in Arc for regression analysis in R

Filed under: Tutorial 2 @ 10:38 pm

Research Question

As I have described in my previous blog posts, I am investigating the spatial relationship between blue whales and oceanographic variables in the South Taranaki Bight region of New Zealand. In this tutorial, I expand on the approach I took for tutorial 2 to quantify whether blue whale group size can be predicted by remotely-sensed sea surface temperature and chlorophyll-a concentration values at each sighting location. To accomplish this I used ArcMap to align different streams of spatial data and extract the relevant information, and then used R to perform statistical analyses. Specifically, I asked the following question:

“How well do remotely sensed chlorophyll-a (chl-a) concentration and sea surface temperature (SST) predict blue whale group size?”


I downloaded raster layers for chl-a concentration and SST, and used the “Extract Values to Points” tool in Arc to obtain chl-a and SST values for each blue whale sighting location. I then exported the attribute table from Arc and read it into R. In R, I fit both a general linear model (GLM) and a generalized additive model (GAM) to investigate how well chl-a and SST predict blue whale group size.


I downloaded both the chl-a and SST raster files from the NASA Moderate Resolution Imaging Spectroradiometer (MODIS Aqua) website. The raster cell values are averaged over a 4 km spatial resolution and a one-month time period; I selected February 2017, as that is when the study took place.

I then used the extract values to points tool, located in the extraction toolset within Arc’s spatial analyst toolbox, to extract values from both the chl-a concentration and SST rasters for each of the blue whale sighting data points. The resulting point layer contained the location of each sighting, the number of whales sighted (group size), the chl-a concentration, and the SST for each location. All of this information is contained in the attribute table for the layer file (Fig. 1), so I exported the attribute table as a .csv file that could be read by programs other than Arc.

Figure 1. Attribute table for blue whale sightings layer, which contains blue whale group size, chl-a concentration, and SST for each sighting location.

In R, I read in the data frame containing the information from the attribute table. In an exploration of the data, I plotted chl-a against SST to get a sense of whether the two covary. Additionally, I plotted group size against chl-a and SST separately to visualize any relationship. Subsequently, I ran both a GLM and a GAM with group size as the response variable and chl-a concentration and SST as predictor variables. While a GLM fits a linear regression model, a GAM is more flexible and allows for non-linearity in the regression model. Since I am in the exploratory phase of my data analysis, I tried both approaches. The script I used to do this is copied below:
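A minimal R sketch of this workflow (the file name and the column names GroupSize, chla, and SST are assumptions, not the original names):

```r
# Read the attribute table exported from Arc
sightings <- read.csv("sightings_attributes.csv")

# Exploratory plots: predictor vs. predictor, then response vs. each predictor
plot(sightings$SST, sightings$chla)
plot(sightings$chla, sightings$GroupSize)
plot(sightings$SST, sightings$GroupSize)

# General linear model
whale.glm <- glm(GroupSize ~ chla + SST, data = sightings)
summary(whale.glm)

# Generalized additive model with smooth terms for each predictor
library(mgcv)
whale.gam <- gam(GroupSize ~ s(chla) + s(SST), data = sightings)
summary(whale.gam)  # reports the percent deviance explained
```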


There appeared to be some relationship between SST and chl-a concentration (Fig. 2), which is what I expected considering that colder waters are generally more productive. There was no clear visual relationship between group size and chl-a or SST (Fig. 3, Fig. 4).

Figure 2. A scatter plot of chlorophyll-a concentration vs. sea surface temperature at each blue whale sighting location.

Figure 3. A scatter plot of chlorophyll-a concentration vs. blue whale group size at each blue whale sighting location.

Figure 4. A scatter plot of sea surface temperature vs. blue whale group size at each blue whale sighting location.

The result of the GLM was that there was no significant effect of either chl-a concentration (p=0.73) or SST (p=0.67) on blue whale group size. The result of the GAM was very similar to the result of the GLM in that it also showed no significant effect of either chl-a concentration (p=0.73) or SST (p=0.67) on blue whale group size, and the predictor variables could only explain 4.59% of the deviance in the response variable.

Critique of the Method

Overall, the method used here was useful, and it was good practice to step out of Arc and code in R. I found no significant results, and I think this may be partially due to the way that my data are currently organized. In this exercise, I extracted chl-a concentration and SST values only to the points where blue whales were sighted, which means I was working with presence-only data. I think that it might be more informative to extract values for other points along the ship’s track where we did not see whales, so that I can include values for chl-a and SST for locations where the group size is zero as well.

Additionally, something that warrants further consideration is the fact that the spatial relationship between blue whales and these environmental variables of interest likely is not linear, and so therefore might not be captured in these statistical methods. Moving forward, I will first try to include absence data in both the GLM and GAM, and then I will explore what other modeling approaches I might take to explain the spatial distribution of blue whales in relation to their environment.

Tutorial 2: Using kernel density probability surfaces to assess species interactions

Filed under: Tutorial 2 @ 4:48 pm

Overview: question clarification and approach

Continuing my exploration of changes in cougar predation patterns in time periods with and without wolves, I wanted to expand my evaluation of spatial repulsion and/or shifts to examine how cougar kill site distributions related to wolf activity. I identified several elements in my data sets, besides the presence of wolves, which could produce a shift in kill density or distribution including: 1) catch-per-unit effort discrepancies (i.e. larger sample sizes of cougars (and kill sites) in one data set), or 2) time effects from seasonal distribution shifts (i.e. prey migration patterns). I accounted for catch-per-unit effort issues in tutorial 1, but needed to account for seasonal variation in prey distribution as part of my analysis of wolf influence on cougar kill distribution. Density features continued to be a good tool to explore my presence-only data (kill events). Studies of sympatric wolves and cougar have shown differences in the attributes of sites where each carnivore makes kills (slope, elevation, physiographic features), but I was interested in how cougar might be positioning themselves (e.g. where they are hunting and making kills) on the landscape relative to centers of wolf activity. Therefore, the next step in understanding my spatial problem was to determine if there were differences in when and where cougar were killing prey by asking:

Does wolf activity account for the distribution differences between pre- and post-wolf cougar kill sites?

For the purposes of this analysis the data would be “location, implicit” with the variable of interest (cougar kill sites) having a covariate (proximity or distance to wolf activity) measurable at each sampled location and the causal mechanism inferred from co-variation. Central areas of wolf activity could be identified from wolf kill density, estimated on a seasonal basis, and related to the distribution of cougar kill sites. A probability surface could be created from wolf kill density and used to calculate the proportion of pre- and post-wolf cougar kill sites within probability features as a proxy for the distribution of wolf activity. This relationship could then be compared as a joint proportion or as an expected/observed relationship across time periods. Alternatively, the isopleth polygon feature could be used to calculate a “distance-to-edge” feature relating each cougar kill to potentially increasing levels of wolf contact. This relationship could be evaluated between time periods and across seasons through ANOVA or regression.

My approach to this question was to relate the proportion of both pre- and post-wolf cougar kill sites (points) to wolf activity using the probability contours (polygon feature) of wolf kill site distribution (density raster) as a proxy for activity. I used several tools in ArcGIS, GME, and R to carry out this analysis. Geospatial Modelling Environment (GME) is a standalone application that makes use of ArcGIS shapefiles and R software to carry out spatial and quantitative analyses. It was created to take advantage of R computing and replaces the ‘Animal Movement’ plugin many movement ecologists made use of in previous versions of ArcGIS. GME allows a lot of flexibility in tasks related to dealing with large data sets and common analyses performed on animal location data (KDE, MCP, movement paths). Because it can operate as a GUI or command-driven operator using R, it is user friendly and allows for iterative processes, quick manipulations of data, and combinations of these processes not easily duplicated in Arc.



Prior to evaluating influences from wolf activity, it was necessary to address potential effects from seasonal shifts in prey distribution or density related to ungulate migration patterns. Therefore, the first step for this analysis was to subset kill data into winter (1 Nov – 30 Apr) and summer (1 May – 31 Oct) seasons. I used the ‘select by attribute’ feature in ArcGIS to subset each kill data set (wolf, cougar (pre-wolf), and cougar (post-wolf)) into season-specific shapefiles.


Figure 1. ArcGIS ‘select by attribute’ call from attribute table to subset data into season.

Similar to the procedure for tutorial 1, I used GME to create density and isopleth probability contours for wolf kill sites and the ‘countpntsinpolys’ command to add columns with the number of cougar kills (pre/post) to each seasonal wolf kill isopleth polygon. Finally, I used ArcMap to visualize the data and make a figure showcasing the process.


Figure 2. GME ‘kde’ call in the command builder GUI.
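In script form (rather than the command builder GUI shown above), the sequence of GME calls looks roughly like the sketch below. The file names and field name are hypothetical, and the parameter names should be checked against the GME command reference:

```
kde(in="wolf_kills_winter.shp", out="wolf_kde_winter.img",
    bandwidth="SCV", cellsize=30);

isopleth(in="wolf_kde_winter.img", out="wolf_iso_winter.shp",
         quantiles=c(0.25, 0.50, 0.75, 0.95, 0.99));

countpntsinpolys(poly="wolf_iso_winter.shp",
                 pnt="cougar_kills_prewolf_winter.shp",
                 field="NKILLS");
```

The same three calls are repeated for each season and each cougar kill data set (pre- and post-wolf).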



Visual examination of season-specific cougar kills relative to wolf kill activity centers added further evidence toward a spatial shift in the distribution of cougar kills between time periods with and without wolves (Figure 3). The magnitude of the repulsive effect appeared stronger in winter than summer, evident in the uniformly decreased proportions of cougar kills per probability contour of wolf activity, and increased proportion of kills outside any wolf activity between time periods with and without wolves (Table 1). The relationship appeared opposite in summer, with higher proportions of cougar kills observed in the post-wolf period than expected in all but core (25% probability surface) and outside (no wolf activity) areas (Table 2).


Figure 3. Visual comparison of seasonal wolf activity based on kill site density probability contours and the distribution of cougar kill sites across time periods with and without wolves.

Table 1. Winter wolf kill site probability contour attribute table. Isopleths represent the 25th, 50th, 75th, 95th, and 99th percentiles. ‘Out’ refers to areas outside the probability contour surface. Number of kills (No. Kills) is the number of cougar kill sites, and % Kills is the cumulative proportion of all cougar kills within each percentile class.


Table 2. Summer wolf kill site probability contour attribute table. Isopleths represent the 25th, 50th, 75th, 95th, and 99th percentiles. ‘Out’ refers to areas outside the probability contour surface. Number of kills (No. Kills) is the number of cougar kill sites, and % Kills is the cumulative proportion of all cougar kills within each percentile class.


Method Critique

Accounting for seasonal shifts in kill density improved visual interpretation of spatial patterns and reduced a factor that might mask influences of wolf activity. However, the observed shift in winter could still be due to shifts in prey populations as a behaviorally mediated prey response to wolves (i.e. the shift is related to prey responding to wolves, and cougar indirectly shifting distribution to follow prey). The shift could also be related to differences in prey species-specific distributions, as cougar prey predominantly on deer, but show seasonal selection of neonate elk calves in the summer, and wolves prey predominantly on elk (i.e. the abundance of newborn calves in the summer creates spatial tolerance between wolves and cougar). The comparison of pre- to post-wolf cougar kill sites relative to the probability of wolf kills may be misleading, since the relationship is based on pre-wolf cougar kill data overlapping a density feature that didn’t exist (i.e. no wolves on the landscape at the time those kills were made). However, the comparison does provide some insight, as the pre-wolf cougar kill distribution still represents a ‘prior’ distribution of cougar kills, and overlapping wolf activity demonstrates the proportion of kills we could expect if there were no competition (spatial avoidance/repulsion) between the two species. Using the probability contours as a ‘prior’ distribution of kill sites offered a more robust and meaningful measure to quantify the shift in cougar kill distribution. The method was useful and provided evidence toward the presence of a season-specific shift in where cougars are killing prey (i.e. the answer to my ‘question asked’ is yes, in winter), but the regression analysis discussed above may provide additional support.


Tutorial 2

Filed under: 2017 @ 3:50 pm

For exercise 3, I finally left the world of Arc and took my data to R.

Question/Objective: Test whether radar backscatter differs among aspects, i.e. determine if aspect influences the radar signal.

I used R to run the statistical analysis, but still needed to extract the data using Arc.

I extracted the data from ArcGIS by creating 1000 random points throughout my scene and running the “extract values to points” tool to pull out the backscatter and aspect at each point. Then I exported the table of point data to Excel to clean up any missing values that might throw off R. There were 4 points in total that were missing either the backscatter or aspect value. Then I saved this table as a .csv and brought it into R.

Once in R, the data needed a little bit of cleaning up. Since the default aspects from Arc range from -1 (flat) and 0 (north) to 360 (still north, all the way around the compass), using the raw numeric value to classify the points would not work well. I used the following lines of code to convert the aspects to factors:

NEWDATA$ASPECT <- cut(DATA$ASPECT, breaks=c(-1,0,22.5,67.5,112.5,157.5,202.5,247.5,292.5,337.5,360), labels=c("FLAT","N","NE","E","SE","S","SW","W","NW","No"), include.lowest=TRUE)

levels(NEWDATA$ASPECT) <- c("FLAT","N","NE","E","SE","S","SW","W","NW","N")

The second line combines the two “north” aspects into one (i.e. 0-22.5 degrees and 337.5-360 degrees).

The results were promising. I expected to find no difference between aspects, and the scatterplots supported that: at least upon visual inspection, the aspects showed no difference from one another.

A boxplot also showed little difference in the average Gamma0 values (the backscatter).

The next step was to run a statistical analysis. Since the data were not normally distributed, I had to use something other than an ANOVA, so after some internet research I went with a Kruskal-Wallis test. This tests whether population distributions are identical without assuming they are normal. The code is simple enough:

kruskal.test(GAMMA~ASPECT, data=DATA)

I also conducted an ANOVA on the log-transformed Gamma0 data, which appeared more normal than before. None of the results were significant from any test, meaning that the aspect does not have a statistically significant effect on the backscatter, which is the result I was hoping for.
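The log-transform and ANOVA step can be sketched as follows, reusing the column names assumed in the Kruskal-Wallis call above:

```r
# Assumed: DATA$GAMMA holds positive Gamma0 backscatter values and
# DATA$ASPECT the aspect factor created earlier.
DATA$logGAMMA <- log(DATA$GAMMA)

# One-way ANOVA on the log-transformed backscatter
fit <- aov(logGAMMA ~ ASPECT, data = DATA)
summary(fit)  # a large p-value means no detectable aspect effect
```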

This method was useful for my dataset. I had a rather large dataset to work with in Arc, and didn’t find the statistical tools that I needed in Arc either. I am comfortable in R, and prefer it for larger datasets since it doesn’t have the bells and whistles that Arc does.

Tutorial 2: Investigate Spatial Autocorrelation (Mantel Test)

Filed under: 2017 @ 1:30 pm

For this tutorial I revisited a principle that I learned about in the Spatio-Temporal Statistics (Variation) course that I took with Dr. Jones last fall. Autocorrelation is an important component to address within ecological work, as well as geographic work. Plots that are closer to each other may have a greater probability of capturing the same species during fieldwork, and it is important to know if my data are influenced by distance and spatial relationships. Examining autocorrelation within my data may also reveal information about spatial patterns related to patch size, or gradients at my field site (Salmon River Estuary).

Research Question:

My research question fits the framework “how is yi related to yi + h?” I am investigating spatial autocorrelation for my field survey plots and species assemblages represented by those plots. I want to know if I can predict or expect the presence of a plant species in one plot, based on its proximity (distance) to another plot. The presence of spatial autocorrelation in my plots would imply some sort of dependency or relatedness between plots, i.e. the closer two plots are located to each other, the more likely they are to overlap in terms of species present. Spatial autocorrelation can provide context for interpreting the scale of patterns and causal variables related to spatial vegetation patterns (likely elevation, salinity, land use history).

My Approach (Tools for Analysis):

PC-Ord was very helpful in structuring my data for the Mantel test. I was able to run the test several times with different combinations of data to detect patterns of autocorrelation. Again, I formatted my data in Excel and used the Excel sheets to produce matrices for the Mantel test. Each matrix pair had species data and UTM coordinate data.

Steps to Complete Analysis:

I organized my data in Excel, started a new project in PC-Ord, assigned species and location matrices, and ran a Mantel test (Group > Mantel test). I used Sorensen (ranked) distance measures for my percent cover data, and Euclidean distance for my coordinate data. I set the test to complete 999 randomized runs for a Monte Carlo comparison to test significance. I also used my species data to conduct an Indicator Species Analysis (ISA), to identify species that are more common, faithful, or exclusive to a particular marsh that I surveyed. I directed PC-Ord to conduct an ISA, and grouped my data by tidal marsh surveyed. The ISA produced a value for each species surveyed indicating the percentage of plots in which that species was found, and whether or not the association was significant. From the indicator analysis, I found that certain upland marsh species are indicative of remnant sites, certain species are indicative of salty conditions, and other species are indicative of restored sites.
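As a cross-check outside PC-Ord, the same Mantel test can be sketched in R with the vegan package. The object names are assumptions: `spp` is a plot x species percent-cover matrix and `xy` the matching UTM coordinates:

```r
library(vegan)

# Bray-Curtis on cover data is the quantitative analogue of Sorensen distance
spp.dist <- vegdist(spp, method = "bray")
geo.dist <- dist(xy)  # Euclidean distance between plot coordinates

# Mantel test with 999 Monte Carlo permutations
mantel(spp.dist, geo.dist, permutations = 999)
```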


Examples of matrix inputs from Excel to PC-Ord. The data were structured as matrices of sample units (plots) x species cover, or sample units x UTM coordinates.

Results and Outcomes:

I conducted a Mantel test on all of my data and found that none of my plots were spatially autocorrelated (Mantel’s r statistic Transect: r = 0.037797, p = 0.182182; MW plot: r = 0.027994, p = 0.164164, accept null hypothesis of no relationship). This is a little surprising, but may be indicative of noise within my dataset, and variation of species at this scale. It is possible that autocorrelation is not detectable at this scale, or perhaps I need a larger dataset with less proportional noise to sort out the autocorrelation signal.

For my ISA test, I found that (1) Triglochin maritima, Sarcocornia perennis, and Distichlis spicata were significant indicators of salty, restored conditions, (2) Deschampsia caespitosa, Juncus arcticus var. littoralis, Potentilla pacifica, Glaux maritima, and Hordeum brachyantherum were significant indicators of upland, high elevation conditions, and (3) Carex lyngbyei was a significant indicator of restored conditions. I broke my dataset up and conducted a Mantel test for each of the groups, using only plots that recorded the presence of at least one species in each group. I did not find any significant autocorrelation with any of the strong indicator species (all of which were found in a high proportion of the plots surveyed). I am curious whether, if my plots had been located closer together, and/or I had surveyed more plots over a larger area, spatial autocorrelation patterns would begin to emerge.

An example of PC-Ord output for a Mantel test. The r statistic indicates the degree of correlation between species cover and distance; the p-value gives the significance of the correlation relative to chance (computed by Monte Carlo randomizations).

Critique of the method – what was useful, what was not?:

This method was manageable to set up and execute; however, it was difficult to get useful outcomes with my data. It is highly plausible that my data simply do not capture these patterns; it was also difficult for PC-Ord to work with my data on this problem. I have many zeroes in my dataset, as not all species are represented in every plot (I have a grid of plot number x species, with cells filled in for percent cover ranging from 0 to 100). I was not able to complete Mantel tests with certain assemblage combinations, as PC-Ord could not compute a randomization outcome with the number of zeroes present (random shuffling of the dataset produced too many identical outputs because of the zeroes). Ultimately, I would approach this problem again with a larger dataset over a greater extent of my study site.


May 22, 2017

Tutorial 2: Site Suitability

Filed under: Tutorial 2 @ 2:43 pm


For exercise 3, Dr. Jones allowed me to work on a big component of one of my thesis chapters: determining site suitability for a field experiment. This wasn’t related to statistics, but it required a lot of work in ArcMap. My questions are:

  • Are forest meadows reptile biodiversity hotspots?
  • Does aspect matter for reptiles in forests?

Forest meadows could be hotspots due to increased solar radiation, temperatures, heterogeneous conditions, increased productivity, and rocky soils. These questions are important because the answers could have strong management implications if forest meadows are reptile habitat. Managers would then need to retain/restore meadows, retain connectivity among meadows, and, if aspect is related to diversity, identify ‘climate-smart’ meadows.

I performed site suitability/selection for this study based on the following criteria:

  • Need to identify 4 sites:
    • each site would have 4 treatments
      • (a North and South, open meadow and closed forest)
    • road access (< 300m from road)
    • sites at similar elevations
      • similar forest types
      • similar proximity to non-forest
      • each site has the same proximity to water
      • area would be of similar size
      • created by natural processes, not a clear-cut.


I used many tools to determine site suitability for the field experiment:

Clip, Aspect, Extract by Attribute, Raster to Point, Near, Extract Values to Points, Reclassify, Select by Attribute


Map 1. Habitat types in Deschutes NF.

I started with a habitat type layer (Map 1) and DEM layer obtained from the Forest Service of Deschutes National Forest.

First, I needed a map showing aspect. To do this, I used the Aspect tool in the Spatial Analyst toolbox which used my DEM layer to create an aspect map of the area as shown in Map 2.

Once I had my aspect layer, I needed to identify areas within the habitat layer that were meadows. I used the Extract by Attribute tool to select only habitat classified as ‘meadow’. Once the habitat type layer showed only meadows, I used the Raster to Point tool to generate points within meadows to store information for later steps, as seen in Map 3.

From there, I inserted the roads layer, also obtained from the NFS. Roads were needed because my sites could only be <300 meters from the nearest road. Therefore, I used the Near tool to generate the distance from points (meadows) to nearest road. I also did this with my raster layer of waterbodies to make sure sites were in similar proximity to water, as that could be influencing reptile diversity.

Once the points (meadows) had information regarding proximity to roads and water, I then used the Extract Values to Points tool to obtain elevation data for each point within a meadow, because they have to be at similar elevations.

At this point, I had most of the information I needed regarding similarity between possible meadows. Next, I reclassified my aspect layer to show only north and south aspects (as seen in Map 4), because those were the only aspects that needed to be present within each meadow. From there, I opened the attribute table for my meadow points and used the Select by Attribute option to tell ArcMap to show only sites that were <300 m from a road and <1800 m elevation (see Map 5). Once I narrowed down the possible sites, I manually examined every candidate meadow to see if north and south aspects were present within the open meadow and within the bordering forest.

Results: All of this resulted in 4 suitable sites that I will use for one of my research projects, determining reptile diversity in meadows. My 4 suitable sites can be seen in Map 6.

Critique: The way I did this worked, but I feel like there may be an easier way. This method took me a few weeks, and a lot of frustrations working with many types of files. I had never done site suitability before, and this proved to be harder than it seemed.

Map 2. Aspect created with ArcMap.




Are meadows significant biodiversity hotspots for reptiles compared to forests?

Tools/Methods: For practice, I made artificial data resembling what I would obtain from this field experiment. I practiced analyzing it in RStudio, because logistic regression isn’t possible in ArcMap, and most of my data will likely require logistic regression.

Critique: For the data, I coded the meadow plots as 1’s and the forest plots as 0’s. My Y variable was therefore binary, while my X variable, species diversity, was continuous. This setup is well suited to logistic regression and is straightforward to run.

Results: My artificial data was a very small sample size so it makes sense that the p-value was not significant. Overall, the results matched what I suspected from looking at the artificial data I made.

y = f(x)

> glm(type~diversity)

Call:  glm(formula = type ~ diversity)

Coefficients:
(Intercept)    diversity
   0.006173     0.098765

Degrees of Freedom: 7 Total (i.e. Null);  6 Residual
Null Deviance:     2
Residual Deviance: 0.4198     AIC: 5.123
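One caveat: as written, `glm(type~diversity)` uses the default Gaussian family, so it fits an ordinary linear model rather than a true logistic regression. A minimal sketch of the binomial version, using made-up values in place of the original artificial dataset:

```r
# Made-up binary data: 1 = meadow plot, 0 = forest plot,
# with a hypothetical diversity value for each of 8 plots.
type <- c(1, 1, 1, 1, 0, 0, 0, 0)
diversity <- c(8, 5, 9, 6, 3, 7, 2, 4)

# family = binomial specifies logistic regression
fit <- glm(type ~ diversity, family = binomial)
summary(fit)
```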

Map 3. Raster to Points

Map 4. Reclassified aspects. North and south only.

Map 5. Select by attributes. Meadows <300m from road and <1800m elevation.

Map 6. Results. Meadow sites being used for study.

Tutorial 2: Examining Hydrograph Relationships

Filed under: 2017,Tutorial 2 @ 2:12 pm


How does upstream-downstream connectivity, drainage area, and position within the catchment affect the relationship between different stage gage sites?


I chose to look at the relationships between the hydrographs for four sites that have upstream-downstream connectivity but have different upstream drainage areas and are located on different tributaries. I used ArcGIS to obtain the drainage areas and Excel to analyze the hydrographs. The four sites are as follows: Site 26 is the site with the largest drainage area, on Mainstem Mill Creek. The “Mainstem” site is also located on the mainstem, with a slightly smaller drainage area. Site 36 and South Fork are located on different tributaries, Gunn Creek and South Fork Mill Creek respectively, but have approximately the same drainage area. This drainage area is significantly smaller than either Site 26 or Mainstem (Figure 1).


First I used NetMap’s Watershed Delineation tool to create upstream drainage area polygons for each of the four sites I was interested in, using the attribute table to obtain the total area of each polygon (Table 1).

Then I graphed the difference in stage heights over time between Site 26, the largest site, and each of the other three sites to examine patterns in the hydrographs, creating three “stage difference” graphs. I removed any periods of missing data (especially a two-week period in February at the Mainstem site). I also graphed the raw hydrograph data for comparison (Figure 2).

Finally I combined the three stage difference graphs into one plot for easy comparison.
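The stage-difference series can be computed in a spreadsheet or, as a sketch, in R. The file and column names below are assumptions standing in for the logger exports:

```r
# Hypothetical data frame with one stage column per site
stage <- read.csv("mill_creek_stage.csv")

# Difference between the largest site (Site 26) and each other site
stage$diff_MS <- stage$Site26 - stage$Mainstem
stage$diff_SF <- stage$Site26 - stage$SouthFork
stage$diff_36 <- stage$Site26 - stage$Site36

# Plot all three difference series on one set of axes for comparison
plot(stage$diff_MS, type = "l", ylab = "Stage difference")
lines(stage$diff_SF, lty = 2)
lines(stage$diff_36, lty = 3)
```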


First I compared the hydrograph patterns of MS and SF, since they are on the same tributary system. Both graphs showed little difference from Site 26 during baseflow conditions. The 26 v SF graph had high positive peaks during floods, which indicates that there is a large source of water going through Site 26 that is not being accounted for in the SF hydrograph. The 26 v MS graph was relatively flat, which makes sense since Site 26 and MS have more similar drainage area sizes and are expected to behave more similarly: the amount of water going through them is more similar and they are closer together geographically. However, the MS site still had some high peaks during the two highest floods, which indicates other sources of water (Figure 3).

Then I compared the reactions of MS and 36 since they were a similar distance from 26 but with different drainage areas. The graph looked very similar to the MS. vs. SF graph which suggests that drainage area is one of the driving factors in the shape of the hydrograph (Figure 3).

Finally I compared 36 and SF since they have similar drainage areas but are on different tributaries that both drain into Site 26. The two graphs were very similar in shape which further enforced the fact that drainage area was a driving factor. The two tributaries seem to be functioning relatively similarly despite differences in geographic location. 36 was closer to 26 during baseflow conditions which makes sense since they are closer together in terms of streamwise distance and therefore the floodwave changes less between the two points (Figure 3).


When examining the patterns in these graphs, it is obvious that there is a portion of the hydrograph that is still missing. Most of the variability in peak flows is accounted for in the Mainstem hydrograph, but there are still minor peaks during the big floods. Little of the variability is accounted for in the Site 36 and South Fork hydrographs. Looking at the map, I hypothesize that North Fork Mill Creek, which is not instrumented, represents the differences in the hydrographs that are currently not being accounted for. It would be interesting to install a LevelLogger on this tributary in the future to see if it completes the picture.

This was an interesting analysis that allowed me to further explore my data and how my study basin reacts to precipitation events. It was useful in illustrating the influences of drainage area and stream connectivity on hydrographs in different parts of the basin, and pinpointing where there is still potentially missing data that could give a more complete picture of my watershed. It is still a pretty qualitative analysis in that I could not derive any real statistics from what is happening with my hydrographs, but it paints a clear picture of what’s going on and inspires further questions.

Tutorial 2: Automated Autocorrelation of Tsunami Inundation Time Series at Various Points of Interest

Filed under: Tutorial 2 @ 12:11 pm


Tsunamis inundate shorelines in a chaotic manner, resulting in varying degrees of flooding from location to location. The spatial variability of tsunami hazards has important implications for the tsunami-resistant design of structures and evacuation plans, because different safety margins may be required to achieve consistent reliability levels. Currently, tsunami mitigation strategies are primarily informed by prediction models, due to the lack of nearshore field data from tsunami disasters. Thus, to investigate the spatial variability of tsunami hazards, we analyze the relationship between different test locations or points of interest (POIs) in our domain. The question is:

“How much does tsunami inundation vary spatially?”

In other words: if one observes a certain flooding depth at their location, could they assume that a similar flood level is occurring elsewhere? How does this change with time? And how do certain coastal features affect this?


To answer this question, I examined the autocorrelation of the time series of inundation at various locations as well as the difference in inundation levels between these locations. In this case, 12 locations were selected and thus the process to analyze all of these locations was automated. All of this was implemented in R.


As before, the tsunami inundation data are stored in a large NetCDF file, so we use the same procedure as in Tutorial 1 to read the NetCDF file into R. This process requires the “ncdf4” package to be installed.

For this analysis, we begin by selecting the points of interest along the coast near Port Hueneme, California. These points were arbitrarily selected but were also ensured to be somewhat well distributed along the coastline and were within the domain of the dataset. Figure 1 shows the locations of the points of interest (POIs) along the coast of Port Hueneme, CA. A total of 12 points were selected and were all meant to represent some building or port location. Their names and geographical coordinates were stored in a .csv file as shown in Figure 2.


Figure 1 – POIs where time series of inundation were extracted along coast near Port Hueneme, CA

Figure 2 – CSV file storing locations and their geographic coordinates

The .csv file was read into R using the “read.csv” function, and the resulting table was converted to a data frame, as shown by the code in Figure 3. Next, the “which.min” function from the “stats” package was used in a for loop to find the index of the grid point nearest each POI. Another loop then extracts each time series and plots it. Figure 4 shows the time series of inundation for each of the locations stored in the CSV file. Note that the first 3 locations are in the ocean and the rest are on land.
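The nearest-grid-point lookup that “which.min” performs can be sketched outside R as well. Here is a minimal Python version; the grid and POI coordinates below are made up for illustration:

```python
import numpy as np

def nearest_grid_index(grid_lon, grid_lat, poi_lon, poi_lat):
    """Return the index of the grid node closest to a POI,
    mirroring R's which.min applied to a distance vector."""
    # squared Euclidean distance in degrees is enough for ranking
    # nearby points on a small coastal domain
    d2 = (grid_lon - poi_lon) ** 2 + (grid_lat - poi_lat) ** 2
    return int(np.argmin(d2))

# toy grid: 5 longitude/latitude pairs (hypothetical values)
lon = np.array([-119.3, -119.2, -119.1, -119.0, -118.9])
lat = np.array([34.10, 34.12, 34.14, 34.16, 34.18])
idx = nearest_grid_index(lon, lat, -119.11, 34.15)  # -> 2
```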

The time series data were stored in a separate data frame, and the “acf” function from the “stats” package in R was run in a loop to obtain the autocorrelation values. This was followed by another loop, with a nested loop, to take the difference in inundation level between each pair of locations; the autocorrelation function was applied to these differences as well. The code for these procedures is shown in Figure 3.
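The autocorrelation computed for each series, and for the difference between two series, boils down to a covariance normalized by the lag-0 variance. Here is a Python sketch with toy series (the POI names and signals are hypothetical):

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation function: autocovariance at each lag,
    normalized by the lag-0 variance (as R's stats::acf does)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = np.dot(x, x) / n
    return np.array([np.dot(x[:n - k], x[k:]) / n / denom
                     for k in range(max_lag + 1)])

# autocorrelation of each POI series, then of a pairwise difference
series = {"POI_A": np.sin(np.linspace(0, 6, 200)),
          "POI_B": np.cos(np.linspace(0, 6, 200))}
acfs = {name: acf(ts, 20) for name, ts in series.items()}
diff_acf = acf(series["POI_A"] - series["POI_B"], 20)
```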

Figure 3 – R code used to read in .csv file and procedures to perform autocorrelation analysis

Figure 4 – Time series of inundation for each location shown in Figure 1


Figure 5 shows the autocorrelation plot for each of the inundation time series shown in Figure 4. These plots show that there are indeed variations between the different locations, and the autocorrelation plots appear to differ more as the POIs get farther from one another.

Figure 5 – Autocorrelation plots of the inundation time series from Figure 4

Figures 6-8 show some examples of the autocorrelation plots of differences in inundation level between two locations. Figure 6 shows the autocorrelation of the difference in inundation level between Port Hueneme and each of the other locations. A placeholder image is shown where a location would be differenced with itself (which is not meaningful). Figure 7 shows the same plot for Del Monte Produce, and Figure 8 for the naval base Navsea 1388.

There are 9 more of these plots that were created by this process, but in the interest of not inundating this report, only 3 are shown. Each of these figures shows rather interesting differences for the different locations.

Figure 6 – Autocorrelation plots between difference in inundation levels for all locations relative to Port Hueneme

Figure 7 – Autocorrelation plots between difference in inundation levels for all locations relative to Del Monte Fresh Produce

Figure 8 – Autocorrelation plots between difference in inundation levels for all locations relative to Navsea 1388

Critique of Method

The method provided a very convenient way of viewing the different locations and their associated autocorrelations with the touch of a button. POIs can easily be added or removed by modifying the CSV file, and the script adjusts accordingly without any changes to the code itself. This feature has already been tested and shown to be quite robust in handling any number of locations with coordinates within the data domain, which suggests the procedure is both efficient and effective. Overall, I was very pleased with the operational capability of this method.

Observing the autocorrelations for each of the locations, and for the inundation differences between them, was quite interesting. At a glance, I could immediately tell that there were differences between the POIs. I think this information will be quite useful, though interpreting these types of autocorrelation plots will require some additional thought on my part.

May 21, 2017

Geographically Weighted Regression (GWR) in R

Filed under: 2017,handy links @ 10:18 pm

I found an interesting link about Geographically Weighted Regression (GWR) analysis in R. I think it might be useful.

Geographically Weighted Regression Analysis of Tropical Forest

Filed under: 2017,Tutorial 2 @ 10:10 pm
  1. Question that you asked
  • Is crown width linearly related to height in space in the study area?
  2. Name of the tool or approach that you used.
  • I used the Geographically Weighted Regression (GWR) tool in ArcMap to assess the linear relationship between crown width and height across the study area. Since the data have not been georeferenced, the Height is the total of the tree height and the elevation (ground height).
  • GWR constructs a separate equation for every feature in the dataset incorporating the dependent and explanatory variables of features falling within the bandwidth of each target feature. The shape and extent of the bandwidth are dependent on user input for the Kernel type, Bandwidth method, Distance, and Number of neighbors’ parameters with one restriction: when the number of neighboring features would exceed 1000, only the closest 1000 are incorporated into each local equation. GWR should be applied to datasets with several hundred features for best results. It is not an appropriate method for small datasets.
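The per-feature weighted regressions that GWR constructs can be illustrated with a small sketch. This is not ArcMap's implementation (which also offers adaptive kernels, AICc bandwidth selection, and the 1000-neighbor cap described above) — just the core idea, with a fixed Gaussian kernel and synthetic data:

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Fit one weighted least-squares regression per feature, with
    Gaussian weights that decay with distance from the target feature."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])      # intercept + predictor
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-(d / bandwidth) ** 2)      # Gaussian kernel weights
        W = np.diag(w)
        betas[i] = np.linalg.solve(Xd.T @ W @ Xd, Xd.T @ W @ y)
    return betas                               # [intercept, slope] per feature

# synthetic data: height roughly 3x crown width, plus noise
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(50, 2))
crown = rng.uniform(2, 10, size=50)
height = 3.0 * crown + rng.normal(0, 1, size=50)
b = gwr_coefficients(coords, crown, height, bandwidth=30.0)
```

Each row of `b` is a local regression equation, which is why GWR can report a different coefficient (positive or negative) at every feature.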


  3. Brief description of steps you followed to complete the analysis.
  • The first step is to load the dataset into ArcMap. As a reminder, this is the same dataset I used in Exercise 2. It was generated from UAV visual images, which were processed through several software packages: Agisoft PhotoScan was used to build a 3D point cloud from the images, and the point cloud was then used in the Fusion software to derive forest structure measurements, such as the number of trees, height, and canopy width. The result of this processing is a table of 804 detected trees with their height and crown width. Unfortunately, because the images have not been georeferenced yet, the height in this tutorial is the sum of tree height and elevation.
  • The second step is to open the Geographically Weighted Regression (GWR) tool; it will appear as in Figure 1 below. Then we need to choose the input features, dependent variable, and explanatory variable. We can choose more than one explanatory variable, but in this exercise I chose only one: my dependent variable is Height (ground height + tree height), and my explanatory variable is crown width.

Figure 1. Geographically Weighed Regression Tool.

  • In the next step, we can choose the kernel type, bandwidth method, and number of neighbors. By default, the kernel type is “Fixed”, the bandwidth method is AICc, and the number of neighbors is 30. I decided to use the defaults because I think they are appropriate for my dataset; in particular, it is important to select Fixed for the kernel type whenever distance is a component of the analysis, as it is here.
  • To finish the analysis, just click “OK”. We can see the whole result in the attribute table (see Figure 2). ArcMap shows the residual standard deviation (RSD) for each feature in a new layer, with different colors indicating the range of the RSD (see Figure 3).

Figure 2. Attribute table of Geographically Weighed Regression result.


Figure 3. Distribution of Residual Standard Deviation in study area.


  • However, it is more interesting and useful to see the coefficient of the explanatory variable instead of the residual standard deviation (RSD). To do so, we need to change the layer properties: right click on the layer and choose “Properties”. Under “Symbology”, we can change the field value from the RSD to the coefficient of the explanatory variable (see Figure 4). We can also change the number of classes and the colors. In my case, I chose two classes to distinguish features that are positively and negatively related. The adjusted layer can be seen in Figure 5.

Figure 4. Layer properties:Symbology.



Figure 5. Distribution of coefficient of the explanatory variable.


  4. Brief description of results you obtained.
  • The results can be seen in Figures 2, 3, 4, and 5. The attribute table shown in Figure 2 contains all the values related to the regression analysis: observed and predicted values, the coefficient of the explanatory variable, the intercept, standard error, residual standard deviation, etc. In general, crown width and height are positively related, meaning the height increases for every one-unit increase in crown width; in other words, the bigger the crown, the taller the tree.
  • However, if we look at each individual feature (tree), some trees show a positive linear relationship between crown width and height (ground height + tree height), while others show a negative one. The distribution of trees with positive and negative relationships can be seen in Figure 5. Red points indicate trees with a negative linear relationship, meaning trees with big crowns tend to be shorter; blue points indicate trees with a positive relationship, meaning trees with big crowns tend to be taller; and white points indicate trees whose coefficients, positive or negative, are close to 0.
  • These differences in the sign of the relationship may arise from several factors, such as different tree species, different elevations, or errors from the Fusion software analysis. A tropical forest has hundreds of tree species with different characteristics: some trees have big crowns and are tall, while others have big crowns but are short. Elevation can have a significant effect because the height data used here are the total of ground height (elevation) and tree height: trees with a positive relationship might be distributed in areas with higher elevation (hills, ridges, or peaks), while trees with a negative relationship might be distributed in areas with lower elevation (drainages or valleys).
  • The R-squared is quite low, with most features (trees) having an R-squared below 0.2. To improve the regression analysis, I think I need to georeference the data, since a Height that is the sum of ground height (elevation) and tree height can distort the regression model (intercept, coefficient, etc.) between crown width and height.
  5. Critique of the method – what was useful, what was not?
  • The method is really useful for generating a regression model for every feature (in my case, more than 800 trees). It helps in understanding the distribution of trees with different coefficients or other values (standard error, residual standard deviation, etc.), because the output layer can show those values across the study area.
  • However, dependent and explanatory variables should be numeric fields containing a variety of values. In addition, linear regression methods, like GWR, are not appropriate for predicting binary outcomes (e.g., all of the values for the dependent variable are either 1 or 0). The regression model will be misspecified if it is missing a key explanatory variable.

Tutorial 2: Calculate Burn Severity Patch Sizes with SDMtools in R

Filed under: Tutorial 2 @ 4:29 pm


Often patch dynamics are examined under the lens of single disturbance events, but contemporary disturbance regimes are demonstrating that large-scale disturbances can overlap in time and space.  The spatial configuration of patches from an initial disturbance may influence the spatial configuration of patches of a sequential disturbance.  The broader research question is focused on determining if there is a relationship between patches from an initial disturbance of beetle outbreak and a sequential disturbance of wildfire. Thus far, I have conducted analysis to determine if there is any inherent patchiness in the burn severity raster and the beetle-outbreak raster using semi-variograms and cross-variograms.  Both have indicated some level of patchiness. I wanted to go back and characterize the patches in the burn severity map to understand the distribution of patch size across the burn mosaic for the Entiako fire that burned in 2012 in central interior British Columbia. I asked the following question:

What is the variability in patch size for burn severity classes?


The burn severity raster generated for Tutorial 1 was utilized for this exercise. Initially, I planned to conduct the analysis in FRAGSTATS, a standalone software package that assesses landscape spatial patterns, but the raster proved unreadable in the commonly used GeoTIFF format.  I ended up conducting the analysis in R version 3.3.2 (Team 2016) with the SDMtools package (VanDerWal 2015).   The SDMtools package is based on the FRAGSTATS software of McGarigal et al. (2012) and calculates many of the same metrics. The following spatial equation was considered as a guide:

In order to calculate patch metrics, I converted the continuous raster into a classified raster.  The classification was based on breakpoint values defined by Reilly et al. (2017).  Classes included low, moderate, and high burn severity; the low class also included unburned areas.  With the SDMtools package, I calculated patch metrics for the burn severity classes across the landscape and for each individual burn severity class (Figure 1). The following PDF includes step-by-step instructions with R code for generating patch metrics:



Figure 1: Burn Severity Classes for Patch Metric calculations.
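The classification and patch-delineation steps can be sketched as follows. The breakpoints and cell size here are placeholders, not the Reilly et al. (2017) values, and the labeling uses the 8-cell neighborhood rule:

```python
import numpy as np
from scipy import ndimage

# classify a continuous severity raster into 1 = low (incl. unburned),
# 2 = moderate, 3 = high, using placeholder breakpoints
severity = np.array([[0.1, 0.2, 0.9, 0.9],
                     [0.1, 0.5, 0.9, 0.5],
                     [0.5, 0.5, 0.2, 0.2],
                     [0.1, 0.1, 0.2, 0.9]])
breaks = [0.25, 0.75]                       # hypothetical, not Reilly et al.'s
classified = np.digitize(severity, breaks) + 1

# label patches of the high-severity class with the 8-cell rule
eight = np.ones((3, 3), dtype=int)          # neighborhood incl. diagonals
high = classified == 3
labels, n_patches = ndimage.label(high, structure=eight)
patch_cells = ndimage.sum(high, labels, index=np.arange(1, n_patches + 1))
patch_ha = patch_cells * 30 * 30 / 10_000   # e.g. 30 m cells -> hectares
```

From `patch_ha` one can then take the minimum, maximum, mean, median, and standard deviation of patch size per class, as in Tables 1 and 2.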


In the SDMtools package, I ran two analyses: one on the whole landscape and one on each burn severity class. From the first analysis, I found that high burn severity was the smallest component of this fire event, accounting for only about 15 percent of the landscape. This was interesting because lodgepole pine is often characterized by ‘stand replacing’ high-severity fire; however, moderate burn severity, which covered about 45 percent of the landscape, can also account for ‘stand replacing’ fire in lodgepole pine dominated landscapes.  Table 1 also includes the number of patches and total area in hectares for each burn class.


Table 1: Descriptive statistics for burn severity classes at the landscape scale.

Burn Severity Class Number of Patches Total Area in Hectares Proportion of Landscape
Low 382 2767.14 0.394
Moderate 448 3179.79 0.452
High 422 1082.70 0.154


The second analysis examined the patch metrics for each burn severity class. Table 2 includes patch area metrics by burn severity class.  While the output includes many other variables, I wanted to look at the variability in patch size across burn classes. The maximum and mean patch sizes for high burn severity are the most interesting part of Table 2, because they are quite different from those for low and moderate burn severity.  Further statistical analysis is needed to determine whether these numbers are statistically different.


Table 2: Patch Area Metrics for burn severity classes

Burn Class Number of patches Minimum Patch size Maximum Patch size Mean Patch Size Median Patch size SD
Low 382 0.09 1449.45 7.24 0.18 77.7
Moderate 448 0.09 1461.69 7.10 0.18 75.3
High 422 0.09 445.95 2.57 0.18 22.5


Critique of the method – what was useful, what was not?

This analysis attempt was much more complicated than anticipated. The complications mostly came from the raster format: the raster was formatted as floating point, which was not compatible with the standalone FRAGSTATS software or the Patch Grid extension for ArcGIS. The Patch Grid extension is based on FRAGSTATS and generates similar patch metrics (Rempel et al. 2012). I did not realize the issue was the floating point raster until after I figured out how to run the SDMtools package in R (VanDerWal 2015), which generates outputs similar to FRAGSTATS and is based on the work of McGarigal et al. (2012). Once I troubleshot the raster issue, the fix was converting the raster from floating point to integer in R.  Additionally, a major challenge was figuring out how patches are defined in the SDMtools package. FRAGSTATS and Patch Grid both offer an option to define the cell neighborhood rule as 4 or 8, but this option was unclear to me in SDMtools. Based on a brief comparison of outputs from FRAGSTATS and SDMtools, I believe the patch identification tool in SDMtools classifies patches based on the 8-cell neighborhood rule. Lastly, I think that calculating patch statistics in R is less cumbersome and allows for better documentation of methods.
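The difference between the 4- and 8-cell neighborhood rules is easy to demonstrate on a toy raster: two diagonally touching cells form one patch under the 8-cell rule but two patches under the 4-cell rule.

```python
import numpy as np
from scipy import ndimage

# two cells touching only at a corner
mask = np.array([[1, 0],
                 [0, 1]], dtype=int)

four = ndimage.generate_binary_structure(2, 1)   # edge neighbors only
eight = ndimage.generate_binary_structure(2, 2)  # edges + diagonals

_, n4 = ndimage.label(mask, structure=four)      # 2 patches
_, n8 = ndimage.label(mask, structure=eight)     # 1 patch
```

Running a small known raster like this through SDMtools is one way to confirm which rule its patch identification uses.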



McGarigal, K., A. Cushman, and E. Ene. 2012. FRAGSTATS v4: Spatial patterns analysis program for categorical and continuous maps. University of Massachusetts, Amherst, Massachusetts.

Reilly, M. J., C. J. Dunn, G. W. Meigs, T. A. Spies, R. E. Kennedy, J. D. Bailey, and K. Briggs. 2017. Contemporary patterns of fire extent and severity in forests of the Pacific Northwest, USA (1985-2010). Ecosphere 8:e01695.

Rempel, R. S., D. Kaukinen, and A. P. Carr. 2012. Patch Analyst and Patch Grid. Ontario Ministry of Natural Resources. Center for Northern Forest Ecosystem Research, Thunder Bay, Ontario.

Team, R. C. 2016. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

VanDerWal, J. 2015. Species Distribution Modelling Tools: Tools for processing data associated with species distribution modeling exercises.


May 8, 2017

Tutorial 1: Nitrate Concentration Differences: Between Sites and Over Time

Filed under: 2017,Tutorial 1 @ 10:26 am

Research Question: 

Do the differences in nitrate concentrations indicate a pattern between site locations, and do they show a pattern over time?

Tools and Approach: 

I performed the analysis largely in Excel, though ArcGIS mapping software was useful for spatial proximity analysis. I was able to produce a chart and representative scatter plot of the nitrate concentration differences over time between various sites. The sites naturally group into three sets based on spatial location: the Rock Creek basin (the example used in this tutorial), the Mary’s Peak streams, and the Coastal streams.


First, I sorted the concentration data that I had produced for Exercise 1 (Table 1). Table 1 presents total nitrate concentrations in ppm for all sites. Next, I subtracted the concentration data between streams within varying proximity (Table 2). The second table presents the differences in concentration values; a negative value corresponds to a higher nitrate concentration in the stream that was being subtracted. This analysis allowed me to examine the differences in nitrate concentration between points (stream sites) and over time.
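The subtraction step can be expressed compactly as a pairwise-difference array. The site names and concentrations below are made up for illustration; the sign convention matches the tables (a negative value means the subtracted site had the higher nitrate concentration):

```python
import numpy as np

# hypothetical nitrate concentrations (ppm) per site and sampling date
sites = ["Mainstem", "Trib_A", "Trib_B"]
conc = np.array([[0.42, 0.38],    # Mainstem: winter, spring
                 [0.30, 0.25],    # Trib_A
                 [0.28, 0.22]])   # Trib_B

# diff[i, j, t] = concentration at site i minus site j on date t
diff = conc[:, None, :] - conc[None, :, :]
mainstem_minus_tribA_winter = diff[0, 1, 0]   # 0.42 - 0.30 = 0.12
```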


Rock Creek Basin: I found larger differences (indicating higher nitrate concentrations) in the mainstem rivers than in the tributaries. Additionally, I saw higher concentrations of nitrate in the spring samples compared to the winter samples. I expected higher values for nitrate in the mainstem, such as Rock Creek, due to the concentration of water coming from upstream; naturally, the smaller tributaries would exhibit lower concentrations than the larger-order stream that receives them both. I also expected nitrate differences and concentrations to decrease in the springtime: if the inputs of nitrogen to the system remain relatively stable, then the uptake length of nitrate (the most sought-after nutrient in stream nutrient cycling) would shorten and cause the sampled nitrate concentrations to decrease.









Critique of Method:

This method was useful because it made me think about my data set in a different way. Using the differences in concentrations over time, I was able to consider two variables, the difference in concentration (Y) and stream site location, to assess why each site exhibited the data it did. I determined that upstream site locations typically have lower nitrate concentrations than the downstream sites. I was also able to think about how nitrate availability changes as the seasons shift from winter to spring. At first I could not understand why nitrate concentrations would decrease in the winter months, but a combination of lower flows and increased macroinvertebrate activity are potential contributors to a decrease in concentration.


May 7, 2017

Tutorial 1: Visual data analysis

Filed under: Tutorial 1 @ 11:30 pm

Reva Gillman

Geog 566

  1. Research question: What is the geographic distribution of the native regions of the Japanese Tsunami Marine Debris (JTMD) species? Which regions are the most prevalent for JTMD species to be native to, and which are least prevalent?
  2. Approach that I used: visually represent native region frequency of JTMD species within 12 different coastal geographic regions of the world.
  3. I was able to get a shape file of the Marine Ecoregions of the World (MEOW), which are geographic coastal regions that match with my JTMD species data, into ArcMap, and categorize the 12 realms with different colors. After that was complete, I made an excel sheet of the species totals per region, which had a realm column of the same attribute of the shape file. I was then able to do a ‘join’ in ArcMap, to align the species data with the geographic shape file. Then I played around with data intervals for the species sum categorizations for the legend, to arrive with the following map of the JTMD native regions:
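The ArcMap ‘join’ in the step above corresponds to an ordinary table join on the shared realm column. A Python sketch (the totals are hypothetical, except the Southern Ocean count of 1 reported in the results):

```python
# attribute rows of the MEOW shapefile (realm names only, simplified)
realms = ["Temperate Northern Pacific", "Eastern Indo-Pacific", "Southern Ocean"]

# species totals per realm, as in the Excel sheet (values hypothetical
# except the Southern Ocean count of 1)
totals = {"Temperate Northern Pacific": 120, "Southern Ocean": 1}

# left join: keep every realm, attach a total where one exists
joined = [(r, totals.get(r)) for r in realms]
```

Realms with no matching total come through with an empty value, just as unmatched rows do after an ArcMap join.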



Marine Ecoregions of the World (MEOW)

  4. Results: The region JTMD species are most commonly native to is the Temperate Northern Pacific. This seems obvious: JTMD originated from Japan, so we expect most species to be native to that same region of the globe. The next most prevalent native region for JTMD species is the Eastern Indo-Pacific, the region southeast of Japan. After that, however, the prevalent native regions span the globe: the Temperate Northern Atlantic and the Tropical Eastern Pacific. At the other end is the least prevalent region, the Southern Ocean: only one JTMD species is native to this cold southern region.
  5. Critique of the method – what was useful, what was not? This is a largely visual interpretation of the data, without statistical analysis. However, it is very useful to be able to visualize the native regions spatially on a map of the geographic regions, and to see the different species-sum categories in the legend to determine high and low frequencies of occurrence. It is a good start before I further analyze other aspects of the JTMD species’ spatial dimensions.


Spatial autocorrelation in NDVI for agricultural fields vs the average biomass of that field

Filed under: Tutorial 1 @ 4:29 pm


  1. Question that you asked…

Looking at the spatial autocorrelation of NDVI within a field and the average wet biomass of that field, is there a correlation? When the NDVI distribution is patchy, is the biomass lower? Or perhaps higher? Or is there no relation at all?

  2. Name of the tool or approach that you used…

To assess the patchiness of the NDVI I used a variogram, which characterizes spatial autocorrelation. Three parameters of interest can be read from a variogram graph: a) the range, which indicates the lag distance over which autocorrelation persists; b) the sill, which represents the variance of the variable at the range lag; and c) the nugget, which should be zero but can vary slightly due to errors in the data. The variograms were used for two analyses. First, we visually compare multiple variograms from different fields, categorized by biomass. Second, we analytically compare the variograms with scatterplots of i) range versus biomass, ii) sill versus biomass, and iii) range versus sill with a biomass color scheme. For the scatter plots, the correlation coefficients are determined with Pearson’s R and a p-value.
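The empirical variogram behind these plots is just half the mean squared difference between pairs of points, binned by separation distance. A self-contained Python sketch on synthetic, spatially structured data:

```python
import numpy as np

def empirical_variogram(coords, values, bin_edges):
    """Empirical semivariogram: for each distance bin, half the mean
    squared difference over all point pairs falling in that bin."""
    n = len(values)
    i, j = np.triu_indices(n, k=1)                    # all unordered pairs
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq = 0.5 * (values[i] - values[j]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(bin_edges) - 1):
        in_bin = (d >= bin_edges[b]) & (d < bin_edges[b + 1])
        if in_bin.any():
            gamma[b] = sq[in_bin].mean()
    return gamma

rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(300, 2))
values = np.sin(coords[:, 0]) + 0.1 * rng.normal(size=300)  # structured field
gamma = empirical_variogram(coords, values, np.linspace(0, 5, 11))
```

For an autocorrelated field like this one, `gamma` rises with distance before leveling off near the sill; the intercept approximates the nugget.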


  3. Brief description of steps you followed to complete the analysis…

The variograms are created in R with the ‘usdm’ package.

Step1. Create a list of all the geotiff files in a folder

Step2. Create for every file a variogram.

var<- Variogram(raster,size=50)

Make sure to use a capital V in Variogram.

Step3. Plot the variograms with points and lines

plot(var)                              # points

plot(var, type='l')               # lines

Step4. Plot all the variograms on one graph

par(new=TRUE)                 # this line makes sure all graphs are drawn in the same figure


Step5. Include legend with field number and average biomass

Step6. For every variogram, estimate the range and the sill and put them in an Excel sheet.

Step7. Create scatterplots for i) range-biomass, ii) sill-biomass, and iii) range-sill color coded for biomass. Here I switched to MATLAB because I am more familiar with it, but R could do the trick as well.


Step8. Calculate Pearson’s correlation coefficient and p-value.

[R,p]=corrcoef(range,biomass)                   % repeat this for the other scatter plots.
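The MATLAB `corrcoef` call in Step 8 has a direct Python equivalent in `scipy.stats.pearsonr`. The numbers below are made-up stand-ins for the estimated range and biomass values:

```python
import numpy as np
from scipy import stats

# hypothetical per-field variogram ranges and average biomass values
range_vals = np.array([12.0, 15.0, 9.0, 20.0, 14.0, 11.0, 17.0, 13.0])
biomass = np.array([2.1, 2.8, 1.7, 3.5, 2.6, 2.0, 3.1, 2.4])

# Pearson's correlation coefficient and its p-value,
# as in MATLAB's [R, p] = corrcoef(range, biomass)
r, p = stats.pearsonr(range_vals, biomass)
```

The same call would be repeated for the sill-biomass scatter plot.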

  4. Brief description of results you obtained…

In the variogram plot (Figure 1), we can see that there is high inter-field variability in the spatial autocorrelation of NDVI. It is difficult to tell from the graph whether there is a correlation between biomass and variogram type. Also, there is a difference in crop type between the fields, which has a considerable influence on the biomass; for further analysis, a distinction between crop types should be made.



Figure 1: Variograms for the different fields with average biomass


In the scatter plots, too, the relation between biomass and the variogram parameters is not apparent. The Pearson’s correlation coefficients are very low, between 0.049 and 0.158, with significant p-values (at the 5% level).

Figure 2: Scatter plots for Range vs Biomass and Sill vs Biomass

Figure 3: Scatter plot for Range vs Sill with biomass color coded


  5. Critique of the method – what was useful, what was not?…

This method did not show a correlation between variogram parameters and average biomass for the fields; however, it is possible that distinguishing between crop types would improve the results. The biggest gap in this method is the subjective estimation of the range and sill values from the variograms. In the future, a variogram model can be fitted so that the range and sill parameters are generated automatically. However, variogram fitting also requires decisions, such as the fitting type, which could introduce its own subjectivity.

May 6, 2017

Forest Structure Assessment Using Variogram in R

Filed under: 2017,Tutorial 1 @ 8:16 am

Question and tool

The questions that I try to answer in Tutorial 1 are how height and crown width vary across the study area, and whether those variables are spatially autocorrelated. I use variogram analysis in RStudio to answer these questions.

The variogram is defined as the variance of the difference between field values at two locations x and y across realizations of the field (Cressie 1993). The main goal of a variogram analysis is to construct a variogram that best estimates the autocorrelation structure of the underlying stochastic process. Most variograms are defined through several parameters; namely, the nugget effect, sill, and range.

Figure 1. The example of variogram graph


  • The nugget effect represents micro-scale variation or measurement error. It is estimated from the empirical variogram as the value of γ(h) as h approaches 0.
  • The sill represents the variance of the random field.
  • The range is the distance (if any) at which data are no longer autocorrelated.

Data Analysis and Result

The data used in this tutorial were generated from UAV visual images, which were processed through several software packages: Agisoft PhotoScan was used to build a 3D point cloud from the images, and the point cloud was then used in the Fusion software to derive forest structure measurements, such as the number of trees, height, and canopy width. The result of this processing is a table of 804 detected trees with their height and crown width. Unfortunately, because the images have not been georeferenced yet, the height in this tutorial is the sum of tree height and elevation.

Figure 2. Detected trees by Fusion software.

In RStudio, I use the R code below to do the variogram analysis. First, I need to install the “gstat” package and load it using the following R code:

> install.packages("gstat")

> library(gstat)

Next, to load my data into RStudio, I have to set my working directory, which can be done through the “Session” toolbar. The first variogram analysis is for crown width versus space (x and y). The R code used for this first analysis can be seen below. Based on the graph, crown width is generally not autocorrelated, although there are some patches with different nugget, sill, and range values. In the first part of the graph, there is a patch with a range (distance) of about 25, which is relatively larger than the other ranges. I think this can happen due to the different crown widths in this particular study area: there seem to be clusters of trees with big crowns and clusters of trees with smaller crowns.

> Treelist1 <- read.table("D:/Oregon State University/Term/Spring 2017/GEOG 566/Exercises/Exercise 2/treelist1.csv", sep=",", header=T)

> g1 <- gstat(id = "CW", formula = CW~1, locations = ~X+Y, data = Treelist1)   # "CW" here assumes the crown width column is named CW

> plot(variogram(g1))

Figure 3. Variogram graph of crown width

The next variogram analysis is for height and location (x and y). The R code and graph are shown below. From the graph, it can be seen that the semivariance increases with range (distance), which means that height is autocorrelated. I think this happens because the height data are the sum of tree height and elevation; since elevation is included, the topography of the study area influences the variance and the variogram analysis.

> g2 <- gstat(id = "Height", formula = Height~1, locations = ~X+Y, data = Treelist1)

> plot(variogram(g2))

Figure 4. Variogram graph of height.

My next analysis is the cross-variogram. The R code and graph are given below. It appears that the cross-semivariance of height and crown width decreases as distance increases, suggesting a negative relationship. Cross-variogram values can increase or decrease with distance h depending on the correlation between variable A and variable B, whereas semivariogram values are by definition positive. One confusing thing is the negative value of the semivariance; in my understanding, the semivariance should be positive, so maybe there is a mistake in my R code that causes this result.

> g3 <- gstat(g1, id = "Height", formula = Height~1, locations = ~X+Y, data = Treelist1)

> plot(variogram(g3))

Figure 5. Cross variogram graph of crown width and height.

Critique of the method

As explained in the data analysis and results, variogram analysis is useful for assessing how forest structure parameters such as crown width and height vary across the study area, and whether they are autocorrelated. In addition, for me it is faster to do variogram analysis in R using the “gstat” package, especially when the data are in table format. I tried using my raster data for variogram analysis in R, but R crashed for lack of memory, so I prefer “gstat” with tabular data. However, variogram analysis in R is limited in terms of visual presentation: it is quite hard to identify which part of the study area has higher variability (larger patches), since the result is a graph and not a map.

May 5, 2017

Tutorial 1: Use of the Optimized Hot Spot Analysis for explaining patterns of Socio-demographic and spatial variables of survey responses regarding flood-safety in neighborhoods

Filed under: Tutorial 1 @ 11:52 pm
  • 1. Question Asked

    How do socio-demographic and spatial variables explain patterns of survey responses about perceptions and attitudes regarding flood safety in neighborhoods?

    Location of survey responses and variable values (variable: “perception of flooding reaching your home”) are shown in Figure 1.1.

    Figure 1.1. Location of survey responses and variable values (variable: “perception of flooding reaching your home”)

    2. Name of the tool or approach used

    Various tools are used to set up the inputs for the Optimized Hot Spot Analysis. The motivation for optimizing the analysis is that my initial hot spot analysis, without optimization, did not reveal any clear spatial pattern (the results were trivial).

Four tools are used to accomplish our objective:

  • For finding out the “Analysis Field” for optimizing calculations
    • Generate Near Table in ArcMap


  • For missing data:
    • Replace all missing values in numerical variables with the mean using R or/and
    • Replace all missing values in categorical variables with the median using R


  • For the spatial analysis I am using:
    • Optimized Hot Spot Analysis in ArcMap


  • For creating a prediction surface map using results from the Hot Spot Analysis:
    • Kriging in ArcMap

3. A brief description of steps followed to complete the analysis

I followed the next steps:

  • For finding out an “Analysis Field” for optimizing calculations

Find distances from neighborhood lot properties to near streams

  • Use the Near tool in ArcMap. For “Input Features” use the lot property shapefile. For “Near Features” use the streams shapefile.

  • In the output table, identify “IN_FID”, which is the ID of the input feature (the lot property ID).

  • Join the output table to the input features table based on “OBJECTID” of the input feature and “IN_FID” of the output table.

  • Now you have the distances in the “Input Feature” table. Next, export the table and copy variables of interest as one more value of the Survey dataset.
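For readers without ArcMap, the shortest-distance idea behind the Near tool can be sketched in base R. The coordinates below are hypothetical projected values (meters); the real analysis used lot property and stream shapefiles.

```r
# Minimal nearest-distance sketch: for each lot, the distance to the
# closest stream vertex. Coordinates are made up and projected (meters).
lots    <- data.frame(x = c(0, 100), y = c(0, 50))
streams <- data.frame(x = c(10, 200), y = c(0, 0))

near_dist <- sapply(seq_len(nrow(lots)), function(i) {
  min(sqrt((streams$x - lots$x[i])^2 + (streams$y - lots$y[i])^2))
})
near_dist  # shortest distance from each lot to the stream vertices
```

This is only the point-to-vertex version of the calculation; the ArcMap tool measures to the nearest point anywhere along the stream line.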
  • For missing data:

I have used a very simple concept: replace all missing values in numerical variables with the mean, and/or replace all missing values in categorical variables with the median. The task of analyzing 108 variables was coded in R. The coding steps, shown for only a few variables, are as follows:
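A minimal R sketch of this replacement rule (the data frame and column names are hypothetical stand-ins for the real survey variables):

```r
# Replace NAs: mean for numeric columns, median for categorical codes.
# 'survey' and its columns are made-up examples, not the real data set.
survey <- data.frame(
  income  = c(35000, NA, 52000, 41000),  # numeric variable
  concern = c(2, 4, NA, 3)               # categorical variable, coded 1-5
)
survey$income[is.na(survey$income)]   <- mean(survey$income, na.rm = TRUE)
survey$concern[is.na(survey$concern)] <- median(survey$concern, na.rm = TRUE)
```

The same two lines are repeated (or looped) over each of the variables being analyzed.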

  • For the spatial analysis:

Now we are ready to perform the hot spot analysis using the “Optimized Hot Spot Analysis” tool of ArcMap. As you can see in the figure below, there is an “Analysis Field (optional)” field, which in our case is quite useful.

For my problem:

Input Features: categorical survey responses with their corresponding locations.

Analysis field (optional): corresponds to each property’s distance to the water streams. This variable was calculated in 3.1 and is continuous.

  • For creating a prediction surface map:

For this purpose, I have used results from the Hot Spot Analysis in the “Kriging” tool of ArcMap.

For my problem:

Input point features: Feature results from the Hot Spot Analysis.

Z value field: corresponds to the “GiZScore” values (Gi* z-scores) from the Hot Spot Analysis output.

4. Brief description of obtained result

The following results correspond to one variable of the survey responses (the survey contains 104 categorical variables to be analyzed).

Three scenarios of spatial correlation have been analyzed.

Scenario 1: Patterns of residents’ perceptions of flooding reaching their homes correlated with distance to all streams (Willamette River, Marys River, Millrace) near the neighborhood. The results are shown in Figure 4.1.

Figure 4.1. Patterns of residents’ perceptions of flooding reaching their homes correlated with distance to all streams (Willamette River, Marys River, Millrace) near the neighborhood.

The hot spot analysis yields pattern formation correlated with distance to all streams (Willamette River, Marys River, Millrace) near the neighborhood.

Scenario 2: Patterns of residents’ perceptions of flooding reaching their homes correlated with distance to the two major rivers (Willamette River and Marys River) near the neighborhood. The results are shown in Figure 4.2.

Figure 4.2. Patterns of residents’ perceptions of flooding reaching their homes correlated with distance to the two major rivers (Willamette River and Marys River) near the neighborhood.

The hot spot analysis yields pattern formation correlated with distance to the two major rivers (Willamette River and Marys River) near the neighborhood.

Scenario 3: Patterns of residents’ perceptions of flooding reaching their homes correlated with distance to a seasonal stream (Millrace) crossing the neighborhood. The results are shown in Figure 4.3.

Figure 4.3. Patterns of residents’ perceptions of flooding reaching their homes correlated with distance to a seasonal stream (Millrace) crossing the neighborhood.

The hot spot analysis yields pattern formation correlated with distance to a seasonal stream (Millrace) crossing the neighborhood.

5. Critique of the method

  • Generate Near table:

This method is useful for finding the shortest distance between two features. This information was quite useful for my analysis.

  • Replace missing data:

Missing data was one of the issues I encountered in analyzing my data. I first tried to handle missing values with the “VIM” package in R, which uses multiple imputation methodologies, but that method did not work in my case: I kept getting errors about inverting matrices, which can be attributed to the small range of values in the categorical variable vectors. So I used a very simple rule: replace all missing values in numerical variables with the mean, and all missing values in categorical variables with the median (IBM Knowledge Center, n.d.). This helped a lot.

  • Optimized Hot Spot Analysis:

For my problem, I found the “Optimized Hot Spot Analysis” tool more useful than the “Hot Spot Analysis (Getis-Ord Gi*)” tool, because the “Analysis Field” allowed me to find and optimize cluster formation in my data.

  • Kriging:

This ArcMap tool allowed me to map cluster formation based on the hot spot analysis outputs, and it gives a better visualization of spatial patches.


Getis, A., & Ord, J. K. (2010). The Analysis of Spatial Association by Use of Distance Statistics. Geographical Analysis, 24(3), 189–206.

IBM Knowledge Center – Estimation Methods for Replacing Missing Values. (n.d.). Retrieved May 8, 2017, from


Tutorial 1: Calculation, Graphing, and Analysis of Correlation Coefficients in Excel

Filed under: 2017,Tutorial 1 @ 2:27 pm

Question: How well correlated are stage gage data at different locations within a single basin, based on the amount of distance between them?

Approach: To explore this question, I used the Correlation function in Excel to do a pairwise correlation analysis of my six sets of stage gage data, and then correlated the results with the distance between each pair of instruments, measured in ArcGIS.


Calculate correlation coefficients in Excel

  1. Import stage gage arrays into Excel with one column for each site. The measurement arrays won’t match exactly, so make sure that the first value in each array was recorded within 5-10 minutes of the first values of the other arrays.
  2. Create a pairwise correlation table, blanking out one half to avoid redundant calculations. Correlating a site with itself always gives a correlation coefficient of 1, so blank out the diagonal as well.
  3. Calculate correlation coefficient for each pair of sites using CORREL function in Excel: =CORREL(array1,array2) in the corresponding cell (Table 1).

Table 1. Pairwise correlation table filled in with example values.
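The same pairwise table can also be computed in a single call outside Excel; a sketch in R with made-up stage records (`stage` and its columns are hypothetical stand-ins for the imported gage columns):

```r
# Pairwise correlation matrix, equivalent to filling the Excel table.
set.seed(1)
stage <- data.frame(s1 = rnorm(100))
stage$s2 <- stage$s1 + rnorm(100, sd = 0.1)  # a closely tracking gage
stage$s3 <- rnorm(100)                       # an unrelated gage

round(cor(stage), 2)  # upper and lower triangles mirror each other
```

`cor()` on a data frame returns the full symmetric matrix, so only one triangle needs to be read, just as in the blanked-out Excel table.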

Calculate distance between sites in ArcGIS

  1. Import stream layer and site points into ArcGIS (Figure 1).

Figure 1. Map with imported stream layer (blue line) and site coordinates (yellow and green points). Stage gage locations are indicated with red stars.

  1. Site locations will probably not all be exactly on top of stream polyline. Snap site points to stream polyline using Editor Toolbar.
    1. Make sure Snapping is on in the Snapping Toolbar.
    2. Right click on site point layer and choose “Edit Features”, then “Start Editing”.
    3. Use the Edit tool to select the site you want to snap.
    4. Drag the point perpendicular (the shortest distance) from its current location toward the stream until it snaps to it.
    5. Repeat for all other sites that are not already snapped to streamline.
    6. “Save Edits” and “Stop Editing”.
  1. Split stream polyline at intersections with site points for easier measuring.
    1. Start Edit session.
    2. Use Split Tool to click at each intersection of a site point and a stream polyline. This will split the line in two at the place where you click.
    3. “Save Edits” and “Stop Editing”.
  2. Create another pairwise table in Excel to input distance data.
  3. Calculate distance between points.
    1. Start an Edit session.
    2. Highlight all portions of stream between one pair of points using Edit tool. Hold down shift key to select multiple portions at once (Figure 2).

Figure 2. Example selection of stream polyline sections between two stage gage sites. Selected lines are in light blue.

  1. Open stream polyline attribute table and right click on “Length.”
  2. Use Statistics function to calculate total length (Sum) (Figure 3).

Figure 3. Attribute table with selected stream polyline sections and statistics summary.

  1. Input total length value into Excel table.
  2. Repeat step 5 for all pairs of points.
    1. Use “Clear Selection” to clear previously selected points before starting process again.

Make correlation graphs

  1. Make x-column with pairwise distances and y-column with corresponding correlation coefficients. Include a third column with the pair for clarity (Table 2).

Table 2. Distance and correlation table.

  1. Graph x vs y and examine the relationship.

Figure 4. Plot of relationship between streamwise distance and correlation coefficient of all possible pairs of six stage gage sites.

Results: When I started this analysis, I expected to see a strong negative relationship between inter-instrument distance and correlation coefficient. Instead, the plot showed only a slight negative relationship. Upon further reflection, this result makes sense for the basin and is actually what I hoped to see: regardless of location in the basin, the water levels respond relatively uniformly, with only a slight delay between sites that are spaced further apart.

The points representing the Site 9 vs. Site 36 and Site 9 vs. Mainstem correlation coefficients could be considered outliers given their unusually low values. This may be because those sites are located on different tributaries and are two of the most widely spaced pairs.

Utility: This method was useful for illustrating the relationships between different stage gages within my study basin. The instruments are located on different tributaries so the relationships are slightly more complicated than I expected but I think I will still be able to apply what I have learned from this analysis to extrapolate to other parts of the basin where I do not have stage data.

Tutorial 1: Raster auto correlation in R

Filed under: Tutorial 1 @ 1:47 pm

A Quick Primer

My spatial question is how the radar backscatter (measured by the Sentinel-1 satellite) relates to features on the ground. The backscatter is influenced by slope, aspect, forest characteristics (tree height, density, moisture content etc.)

The question I was asking for this first exercise was whether or not my backscatter is spatially autocorrelated. If it is, then I will need to reconsider how to approach my question (spoiler alert: the data are not autocorrelated).

I tried a few approaches, but the one that worked best was using R with the ‘raster’ package (not installed by default, but easy enough to download). However, before I could do that I had to reduce my dataset to something manageable.

The variogram is a useful tool to measure autocorrelation. The process boils down to this:

The value of interest at each pixel (or point) is compared to the values of interest at every other pixel and the similarity between the values is examined spatially. If there is autocorrelation in the data, then pixel values will be more similar to each other in a spatially coherent way (another way of putting it: the values of a given pixel can be predicted by the values of pixels that are nearby). Spatially clumped data are an example of autocorrelated data, whereas uniform or random data typically do not display autocorrelation.

The backscatter data collected by the satellite should not display autocorrelation, unless the underlying features were spatially autocorrelated. Since I was looking at data for a hilly, forested area, it is expected that the radar data should not be autocorrelated.

Getting to the Good Stuff: Methods

First, I had to import my data into ArcMap. This was after pre-processing the radar data into a spatially and radiometrically corrected raster with pixel dimensions of 10 m x 10 m. I attempted to run a variogram on this raster in Arc, but it crashed. So I attempted to run a variogram on just a subset of this raster, and it crashed again. Finally, I developed a variogram within Arc on just a subset of my data. Unfortunately, there were still so many points that the variogram looked more like the shower scene from Psycho.

On to R, where I brought in my original raster to run a variogram. This crashed the whole computer, so I decided a small subset would be better to work with. Using the same subset as before, I brought the GeoTIFF file into R (using the ‘raster’ package), making sure that the data were not corrupted (previously I had tried to export the file from Arc, but it gave each pixel an 8-bit RGB value rather than the float decimal value of backscatter).

Above, the raster as it appeared in R. The extent of the raster was much larger than the raster itself, which I attribute to the export procedures in ArcMap. I made a histogram in R to view the values of the backscatter raster (values were in digital numbers, DN):

Using the ‘usdm’ package in R, creating and viewing a variogram was as simple as:

library(usdm)              # provides Variogram() for raster layers
var2 <- Variogram(Gamma0)  # Gamma0 is the backscatter raster

The resulting plot showed little correlation beyond the immediate lag distance, and a rapid leveling off.

This corresponds to the data being relatively non-correlated, which is precisely what I was hoping for.


ArcMap is wholly unfit for processing raster autocorrelation beyond a certain resolution. I’m not sure what that resolution is, but the data I was working with were too much for Arc to handle. They were even too much for R, which is why I had to use a subset of my data and hope it was a representative sample of the whole. Comparing with other projects in the class, variogram-based autocorrelation in Arc seems to work when you have limited data, but when the data comprise 100,000+ individual points (or pixels) it is far too much for ArcMap. I would recommend that anyone using high-resolution data stay away from the variogram and use a different autocorrelation measure. This is probably because the variogram procedure (comparing every point to every other point) suffers a rapid inflation of required computational time.

May 4, 2017

Tutorial 1: Using a Geographically Weighted Regression to assess the spatial relationship between blue whales and chlorophyll-a

Filed under: 2017,Tutorial 1 @ 4:41 pm

Research Question

The goal of my spatial problem is to assess the relationship between blue whales in the South Taranaki Bight region of New Zealand and the environmental factors which define their habitat and influence their spatial distribution. Chlorophyll-a (chl-a) concentration is an indicator of productivity in the marine environment. Chl-a can be remotely-sensed, and the concentration reflects the abundance of phytoplankton, the tiny photosynthesizing organisms which form the base of the marine food web. Blue whales do not directly feed on phytoplankton. However, krill, which feed on aggregations of phytoplankton, are the main prey type for blue whales. Remotely-sensed chl-a can therefore be used as a proxy for the location of productive waters where I might expect to find blue whales. For this exercise, I asked the following question:

“What is the spatial relationship between blue whale group size and chlorophyll-a concentration during the 2017 survey?”


I downloaded a chl-a concentration raster layer and used the “extract values to points” tool in Arc in order to obtain a chl-a value for each blue whale group size value. I then used the geographically weighted regression (GWR) tool from Arc’s spatial statistics toolbox in order to investigate the spatial relationship between the whales and the concentration of chl-a.


The chl-a concentration layer was downloaded from the NASA Moderate Resolution Imaging Spectrometer (MODIS Aqua) website. MODIS data can be accessed here, and available layers include numerous sources of remotely-sensed data besides chl-a concentration (Figure 1). Chl-a concentration values are averaged over a 4 km² spatial area and a one-month period. I downloaded the raster for February 2017, as our survey spanned three weeks of that month.

Figure 1. NASA Moderate Resolution Imaging Spectrometer (MODIS) data sources, including chlorophyll-a concentration which is used in this tutorial.

I then used the extract values to points tool, which is located in the extraction toolset within Arc’s spatial analyst toolbox, to extract values from the chl-a concentration raster for each of the blue whale sighting data points. This resulted in a layer for blue whale sightings which contained the location of each sighting, the number of whales sighted at each location (group size), and the chl-a concentration for each location (Figure 2).

Figure 2. Blue whale sighting locations and group sizes overlaid with chlorophyll-a concentration.

The geographically weighted regression tool is found within the spatial statistics toolbox in Arc. The dependent variable in my analysis was blue whale group size, and the explanatory variable was chl-a concentration. I used a fixed kernel with a bandwidth of 20 km (Figure 3). Functionally, this means the regression fits a local linear relationship at each location, using (and down-weighting with distance) the points within roughly 20 km, rather than fitting one global line across the entire study area.
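Conceptually, what a fixed-kernel GWR does at one location can be sketched in a few lines of R. Everything below (coordinates, chl-a values, group sizes, the 20 km bandwidth) is made up for illustration; the Arc tool repeats this local weighted fit at every feature.

```r
# One local GWR fit at a target location using a Gaussian kernel.
# Coordinates in km; chl-a and group-size values are hypothetical.
pts <- data.frame(
  x = c(0, 5, 12, 30), y = c(0, 3, 8, 25),
  chla = c(0.5, 0.8, 1.2, 2.0), group = c(1, 2, 3, 1)
)
target <- c(x = 4, y = 2)
bw <- 20  # bandwidth, km

d <- sqrt((pts$x - target["x"])^2 + (pts$y - target["y"])^2)
w <- exp(-(d / bw)^2)                  # Gaussian kernel weights
fit <- lm(group ~ chla, data = pts, weights = w)
coef(fit)                              # local intercept and slope
```

Nearby sightings get weights near 1 while distant ones are down-weighted, which is why GWR coefficients (and residual z-scores) vary across the map.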

Figure 3. The geographically weighted regression tool in Arc’s spatial statistics toolbox.


The results of the GWR are shown graphically in Figure 4. The GWR fits a linear equation to the data, and the values plotted on the map are coded according to their deviation from their predicted values. Many of the points are > 0.5 standard deviations from their predicted values. One point, shown in red in Figure 4, is > 2.5 standard deviations above its predicted value, meaning that blue whale group size at that location was much higher than expected given the chl-a concentration there.

Figure 4. Result of the geographically weighted regression (GWR). Color codes represent the deviation from the expected values for blue whale group size according to the chl-a concentration value at that location.

The attribute table from the GWR output layer shows all of the components of the equation fitted by the GWR, as well as the predicted values for the dependent variable according to the fitted linear equation (Figure 5). Several points lie far from their predicted values, such as the row highlighted in Figure 5.

Figure 5. The attribute table associated with the geographically weighted regression (GWR) output layer. The highlighted row is an example of where the observed value was dramatically different from the predicted value.

The local R² values are < 0.04 everywhere, demonstrating that the data fit the GWR relationship very poorly. My interpretation is that, across the spatial scale of my study area, the relationship between blue whale group size and chl-a concentration is not linear. This is apparent in the raw data in Figure 2: several single whales occur in an area of high chl-a concentration, while some large groups occur in an area of low chl-a concentration.

Critique of the Method

Although the results showed no linear relationship between blue whale group size and chl-a concentration across the spatial extent of my study area, this is a very useful result. The ecological relationship between blue whales and chl-a concentration is not direct: remotely-sensed chl-a concentration indicates locations of productivity, where phytoplankton are present in high abundance. These areas of high phytoplankton are presumed to drive the location of dense aggregations of krill, the prey source for blue whales. However, there is likely both spatial and temporal lag between the phytoplankton and the whales. That a simple linear relationship does not capture how the variables relate in space reflects the intricacy of their ecological relationship. I look forward to exploring this further, perhaps incorporating other environmental factors that contribute to the location of krill patches, such as sea surface temperature.

Tutorial 1: Using Logistic Regression vs Ordinary Least Squares Regression.

Filed under: Tutorial 1 @ 3:04 pm


  • How is chytrid fungus presence/absence related to the distance from road or trail? I chose this question because humans are known to be one of the primary causes of pathogen transmission.
    • This question involves a relationship between presence or absence of chytrid fungus (dependent variable) and distance from road or trail (explanatory).


  • I attempted to use logistic regression and ordinary least squares (OLS) to relate my y variable (chytrid presence/absence) to an x variable (distance from road or trail) using Python script and R. OLS is not the best test for my data — logistic regression is the appropriate choice based on Table 6.1 in the exercise, because my y variable is binary — but I wanted to try it anyway to see whether there was any linear relationship between x and y. The OLS tool also outputs some useful visuals, figures, and plots, which could help someone with continuous data.


  • To run OLS and logistic regression, I had to prep my data. I wanted to answer the question: is the presence of Bd more likely near a road or trail (where humans are transporting the pathogen)? Therefore, I had to get the distance from each sample point to the nearest road. To do this, I first brought in multiple layers of roads and trails within Deschutes National Forest, given to me by an employee, and used the “Merge” tool to combine them. My next step was to find the distance from each sample point to the nearest road or trail in the new merged roads-and-trails layer. I used the “Near” tool, which generated a value representing the distance from each sample point to the nearest road or trail. Once I had that information, I ran the Ordinary Least Squares tool and a logistic regression, using Bd as my dependent variable and distance from road as my explanatory variable. Below is the code I used for both OLS and logistic regression.
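A hedged R sketch of the two regressions (the data below are simulated for illustration; the real analysis used the Near-tool distances and field presence/absence records):

```r
# Presence/absence of Bd vs distance to nearest road/trail (meters).
# Simulated so that presence is more likely near roads.
set.seed(42)
dist_road <- runif(200, 0, 2000)
bd <- rbinom(200, 1, plogis(1.5 - 0.002 * dist_road))

logit_fit <- glm(bd ~ dist_road, family = binomial)  # appropriate for a binary response
ols_fit   <- lm(bd ~ dist_road)                      # violates OLS assumptions

summary(logit_fit)$coefficients["dist_road", ]
```

The logistic model estimates the change in log-odds of Bd presence per meter of distance, which is the quantity of interest here; the OLS fit is shown only for comparison.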


  • I am not certain whether OLS was useful, but I think not, because my data are not continuous, which violates one of the assumptions of the test. Although my data violate OLS assumptions, the OLS results were directionally consistent with the logistic regression, showing a relationship between distance from road and Bd presence. However, while the logistic regression model indicated a significant relationship, OLS did not. Logistic regression works better for my data and was useful because it showed a significant relationship between the two variables. Below are some figures produced by OLS, and above are the statistical results along with the scripts from both OLS and logistic regression.

May 3, 2017

Tutorial 1: Cross-Correlation between tsunami inundation time series extracted from gridded netCDF data

Filed under: Tutorial 1 @ 12:44 pm


Tsunamis inundate shorelines in a chaotic manner, resulting in varying degrees of flooding from location to location. The spatial variability of tsunami hazards has important implications for the tsunami-resistant design of structures and for evacuation plans, because different safety margins may be required to achieve consistent reliability levels. Currently, tsunami mitigation strategies are primarily informed by prediction models, due to the lack of nearshore field data from tsunami disasters. Thus, to investigate the spatial variability of tsunami hazards, we analyze the relationship between different test locations in our domain. The question is:

“How much does tsunami inundation vary spatially?”

Essentially: if one observes a certain flooding depth at one location, can one assume that a similar flood level is occurring elsewhere? How does this change with time? And how do certain coastal features affect it?


To answer this question, I examined the cross correlation between time series of inundation extracted from simulated tsunami inundation data. All of the tools described in this tutorial were implemented in R.


My data were stored in large netCDF (.nc) files, which had to be opened in R using the “ncdf4” package. The netCDF file is read into R with the “nc_open” function, and its contents are printed to the Console window in RStudio with “print”, as shown in Figure 1. At this step we can use the “ncvar_get” function to extract the time and geographic coordinate variables from the “ncin” object, which contains the data from the netCDF file.

Figure 1 – Reading and viewing contents of netCDF file containing tsunami simulation data.

After this, we want to extract the time series of inundation at particular geographic points specified by the user (me). In this case, I am interested in the water level at the jetties of Channel Island Harbor (Figure 2) and Port Hueneme (Figure 3) and how correlated they are with one another. After determining the geographic coordinates of my points of interest, the closest points within the gridded data were identified using the minimization expression “which.min(abs(desired_value - data))” in R, as shown in Figure 4. Using the indices identified by this procedure, the “ncvar_get” function obtains the particular time series. Since the inundation data are very large, we extract only the desired subset, so care is needed with the “start” and “count” inputs to “ncvar_get” (see Figure 4).

Figure 2 – Geographic location of Channel Island Harbor Jetty

Figure 3 – Geographic location of Port Hueneme Jetty

Figure 4 – Code excerpt for extracting time series from user specified geographical points
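The extraction workflow above can be sketched self-contained as follows. The variable names (“lon”, “lat”, “ha”), grid, and jetty coordinates are hypothetical stand-ins (a tiny netCDF file is written first so the sketch runs without the simulation output), but `nc_open`, `ncvar_get`, and `nc_close` are the actual ncdf4 functions used.

```r
library(ncdf4)

# Build a tiny stand-in file so the sketch is runnable end to end.
lon_d <- ncdim_def("lon", "degrees_east", seq(-119.4, -119.0, length.out = 5))
lat_d <- ncdim_def("lat", "degrees_north", seq(34.0, 34.3, length.out = 4))
t_d   <- ncdim_def("time", "seconds", 0:9, unlim = TRUE)
ha_v  <- ncvar_def("ha", "m", list(lon_d, lat_d, t_d), missval = NA, prec = "double")
fn <- tempfile(fileext = ".nc")
nc <- nc_create(fn, ha_v)
ncvar_put(nc, ha_v, array(rnorm(5 * 4 * 10), c(5, 4, 10)))
nc_close(nc)

# The actual extraction pattern used in the analysis:
nc  <- nc_open(fn)
lon <- ncvar_get(nc, "lon")
lat <- ncvar_get(nc, "lat")
i <- which.min(abs(lon - (-119.22)))  # grid index nearest the jetty longitude
j <- which.min(abs(lat - 34.15))      # grid index nearest the jetty latitude
# start/count pull only the single time series at (i, j); -1 means "all times"
ha <- ncvar_get(nc, "ha", start = c(i, j, 1), count = c(1, 1, -1))
nc_close(nc)
```

Keeping `count = c(1, 1, -1)` is what keeps memory use small when the real inundation grids are large.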

After obtaining the time series, we plot them for visualization in Figure 5.

Figure 5 – Time series of free surface variation at extracted jetty locations for Channel Island Harbor and Port Hueneme

Now that we have stored our inundation time series in arrays (ha.CHjetty and ha.PHjetty), we can perform a cross-correlation using the “ccf” function from the “stats” package in R.
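The cross-correlation step itself is one call; a self-contained sketch, using simulated series in which the second record lags the first by five steps (the real inputs would be the two extracted jetty series):

```r
# Cross-correlation of two series where b lags a by 5 time steps.
set.seed(7)
a <- as.numeric(arima.sim(list(ar = 0.8), n = 300))
b <- c(rep(0, 5), a[1:295]) + rnorm(300, sd = 0.1)

xc <- ccf(a, b, lag.max = 50, plot = FALSE)
xc$lag[which.max(xc$acf)]  # lag at which correlation peaks
```

The peak of `xc$acf` and its lag give the delay between the two records, exactly the quantity read off Figure 6.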


The result from this procedure was essentially a list of correlation coefficients for specific time lag values. This information is best visualized as a plot as shown in Figure 6.

Figure 6 – Cross correlation plot between the inundation time series at the Channel Island Harbor and Port Hueneme Jetties.

Figure 6 shows that the inundation records at the two jetties are lagged relative to one another, but only by a short time. Additionally, there are significant differences between the two records: the maximum correlation coefficient is only ~0.6.

Critique of Method

Overall this method was quite effective and very useful for the specific analysis. From a data analysis standpoint, this setup can be effectively automated and can be used to extract the time series at any given set of locations to compare correlations. This methodology was quite simple to implement and produced simple and interpretable results.

Furthermore, this information is useful from the bigger-question standpoint in that we get pertinent information regarding the inundation levels at these different locations. If we treat the tsunami as incoming waves, the lag can be interpreted as how long the wave takes to travel from one jetty to the other. In this case, only minutes separate the largest effects of the wave at one jetty from those at the other, which suggests that mitigation actions should be taken simultaneously at both.

The other aspect to note is that there are significant differences between the inundation levels observed at these two locations. This suggests that measures that take the magnitude of tsunami inundation into account should be determined individually for these two harbors/ports. This preliminary analysis highlights a need for more localized, site-specific tsunami mitigation measures.

May 1, 2017

Tutorial 1: Assessing distribution shifts in kernel density probability surfaces

Filed under: Tutorial 1 @ 1:48 pm

Overview: question & approach

Spatial repulsion and/or shifts in distribution are often examined to identify interspecific competition effects. I therefore wanted to assess the distribution of cougar kill sites, including their spatial clustering and frequency, in pre- and post-wolf time periods, focusing on identifying shifts. A simple solution for my presence-only data (kill events) was to create a density surface. However, several elements of my data sets, besides the presence of wolves, could produce a shift in kill density or distribution and will need to be accounted for, including: 1) catch-per-unit-effort discrepancies (i.e., larger samples of cougars, and therefore kill sites, in one data set), and 2) time effects from seasonal distribution shifts (i.e., prey migration patterns). Before investigating causal mechanisms, a first step is to determine whether there are differences in where cougars are killing prey between study periods by asking:

Is the distribution of post-wolf cougar kill sites different than the distribution of pre-wolf cougar kill sites?

There are two ways to assess this question. For the purposes of this analysis the data would be “location, implicit” with the variable of interest (cougar kill sites post-wolf) having a covariate (cougar kill site density/distribution pre-wolf) measurable at each sampled location and the causal mechanism inferred from co-variation. Density measures for each kill site could be pulled from the density feature raster created, but would be most relatable on a prey species level since elk and mule deer behave differently at a landscape level. Alternatively, an isopleth (or contour line) probability surface created from the density features could be used to calculate the proportion of post-wolf cougar kill sites within specified probability features of the pre-wolf cougar kill site distribution.


My approach to this question was to relate the proportion of post-wolf cougar kill sites (points) to the pre-wolf cougar kill site density (raster) using the probability contours (polygon feature) of kill site density as a measure of distribution. I used several tools in ArcGIS and the Geospatial Modeling Environment (GME) to carry out this analysis. GME is a standalone application that makes use of ArcGIS shape files and R software to carry out spatial and quantitative analyses.


Figure 1. Geospatial Modeling Environment (GME) home page.



I had already created point shape files of cougar kill sites for each time period in ArcCatalog, and used the kde call in GME to calculate kernel density estimates, or utilization distributions, for cougar kill sites in each time period (exercise 1). For that analysis, I used the PLUGIN bandwidth and a 30-m cell size. Multiple bandwidth estimators are available, as well as input options for scaling, weighted fields, data subsetting, and edge inflation.

Figure 2. GME kde call in the command builder GUI.
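For readers without GME, the kernel density step can be approximated in open-source tools. The sketch below uses Python's scipy with invented kill-site coordinates, and Scott's rule rather than GME's PLUGIN bandwidth; all names and values are illustrative, not the study's data:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical kill-site coordinates (metres, projected); stand-ins for the
# ArcGIS point shapefiles used in the original workflow.
rng = np.random.default_rng(0)
kills = rng.normal(loc=[1500, 1500], scale=300, size=(200, 2))

# Kernel density estimate over the point cloud (scipy's Scott bandwidth,
# not GME's PLUGIN estimator).
kde = gaussian_kde(kills.T)

# Evaluate on a 30 m grid to mimic the 30 m raster cell size.
xs = np.arange(0, 3000, 30)
ys = np.arange(0, 3000, 30)
gx, gy = np.meshgrid(xs, ys)
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)

print(density.shape)  # one density value per 30 m cell
```

The result is a density surface analogous to the KDE raster produced by the kde call.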


The next step for this analysis was to standardize the two cougar kill time period KDE raster data sets so that densities relative to catch (in this case, kill events) per sample (cougar) were comparable. This was necessary because the sample of kill sites (and of cougars) in the pre-wolf time period was higher (45-48%) than the post-wolf sample. I used the raster calculator in ArcGIS to divide each period's kill site KDE raster by that period's kills-per-cougar 'effort'.

Figure 3. Density raster data standardization using ArcGIS raster calculator.
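The standardization step is simply a cell-by-cell division of each density raster by a scalar effort value. A minimal numpy sketch, with made-up densities and effort figures rather than the study's values:

```python
import numpy as np

# Toy KDE rasters standing in for the pre- and post-wolf density surfaces.
pre_kde = np.array([[0.9, 0.6], [0.3, 0.0]])
post_kde = np.array([[0.4, 0.5], [0.2, 0.1]])

# Invented catch-per-unit-effort values: kills divided by number of cougars.
pre_effort = 48 / 10    # e.g. 48 kills from 10 cougars pre-wolf
post_effort = 30 / 9    # e.g. 30 kills from 9 cougars post-wolf

# Equivalent of the ArcGIS Raster Calculator step: density / effort.
pre_std = pre_kde / pre_effort
post_std = post_kde / post_effort
```

After this division the two surfaces are on a common kills-per-cougar scale and can be compared directly.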


Figure 4. Raw code for the analysis. GME runs like an R GUI, so code can be written in any text editor and pasted into the command window.


Using the new density raster, I then used the ‘isopleth’ command in GME to create a polygon probability surface. The resulting polygons represent quantiles of interest (programmable) expressed as proportions of the density surface volume. I specified isopleths for the 25th, 50th, 75th, 95th, and 99th percentiles. I also used the ‘addarea’ command in GME to calculate each polygon's area and perimeter and append them to the isopleth shapefiles. Next, I used the ‘countpntsinpolys’ command to add a column with the count of post-wolf cougar kills in each pre-wolf kill isopleth polygon. In ArcGIS, I created another column in the attribute table and used the ‘Field Calculator’ to fill each field with the proportion of total kills. Finally, I used ArcMap to visualize the data and make a figure showcasing the process.
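Since GME's isopleths are defined by the proportion of density-surface volume they enclose, the density cutoff for a given contour can be sketched as below. This is a toy illustration with a random raster, not the GME implementation:

```python
import numpy as np

def isopleth_threshold(density, q):
    """Density cutoff whose super-level set contains fraction q of the
    total density-surface volume (the definition behind GME isopleths)."""
    flat = np.sort(density.ravel())[::-1]          # cells, highest density first
    cum = np.cumsum(flat) / flat.sum()             # cumulative volume fraction
    return flat[np.searchsorted(cum, q)]

# Toy density raster standing in for the standardized pre-wolf KDE.
rng = np.random.default_rng(1)
density = rng.random((50, 50))

t50 = isopleth_threshold(density, 0.50)
inside = density >= t50
# Given the density cell at each post-wolf kill location, the proportion of
# kills inside the 50% contour is the mean of this mask at those cells.
print(inside.mean())
```

Counting how many post-period points fall inside each contour then reduces to a point-in-polygon (or, on the raster, a mask lookup) per kill site.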


Figure 5. GME command text box where you can enter code directly, copy from the command builder, or paste code constructed in another editor. Because GME uses R you can program iterative analysis for loops to batch process large data sets.



Visual examination of the standardized kill density rasters demonstrated that part of the distributional shift observed was related to catch-per-unit effort influences, but a shift between the two highest density areas of kills was still evident from pre- to post-wolf time periods (Figure 6). This suggests other variables, like time effects from seasonal prey distribution changes or the presence of wolves, could also be factors influencing the distribution of kill density.

Figure 6. Comparison of pre- and post-wolf cougar kill site kernel density estimate (KDE) rasters before and after standardization for catch-per-unit effort (kills/cougar), with 25th, 50th, 75th, 95th, and 99th isopleth probability contours. The isopleth contours for the pre-wolf cougar kill site distribution are also overlaid with post-wolf cougar kill point locations to illustrate the subsequent distributional shift.


The observed shift was further evident from the proportional changes in post-wolf cougar kills within the pre-wolf cougar kill probability surface. For example, only 28.5% of all post-wolf cougar kills were within the 50% pre-wolf cougar kill probability contour. Even if I exclude the kills outside the study area boundary the proportion of kills in each probability contour were 5-22% lower than would be expected based on the pre-wolf kill site distribution.


Table 1. Pre-wolf cougar kill site probability contour attribute table. Isopleths represent the 25th, 50th, 75th, 95th, and 99th percentiles. ‘Out’ refers to areas outside the probability contour surface. Number of kills (No. Kills) is the number of post-wolf cougar kill sites, % is the proportion of all kills within each polygon ‘donut’, and % Kills is the cumulative proportion of all post-wolf cougar kills within each percentile class.

Method Critique

Standardization of the two kill density rasters improved visual interpretation of the spatial patterns and accounted for one of the other factors (catch-per-unit effort) that might mask distributional influences related to wolf presence. However, as with the visual examination of the non-standardized density raster, this method allowed only an implicit treatment of space-time concepts and lacked inferential measures of significance to quantify the shifts and formally relate patterns of cougar predation across time. Further, because the isopleth contours represent quantiles expressed as proportions of the density surface volume, they were identical whether created from the standardized or non-standardized density rasters. Using the probability contours as a ‘prior’ distribution of kill sites offered a more robust and meaningful measure to quantify the shift in cougar kill distribution. However, the mechanism behind the shift could still be related to factors other than wolf presence, such as seasonal shifts in prey distribution or density. Both methods (standardization and prior-distribution comparison) were useful and provide evidence of a shift in where cougars are killing prey (i.e., the answer to my ‘question asked’ is yes), but further analysis is necessary to infer causal mechanisms.

April 30, 2017

Tutorial 1: An analysis of the spatial distribution of the North American deer mouse (Peromyscus maniculatus) in the Toiyabe mountain range, NV, USA.

Filed under: 2017,Tutorial 1 @ 10:07 pm

Background & Question

The North American deer mouse (Peromyscus maniculatus) is the most widely distributed rodent species in North America and has the highest proportional abundance across many of the rodent communities in the Great Basin. Understanding its distribution in space is important for understanding its impact on the spatial arrangement of the rest of the rodent community to which it belongs. In this regard, I wanted to evaluate the degree to which this species is clustered in the Toiyabe mountain range in central Nevada. I expect the pattern of P. maniculatus distribution in the Toiyabe range to reflect its distribution behavior in other, similar mountain ranges in the Great Basin, and thus that it exerts similar forces across the landscape. The first step in understanding the relationship between the distribution of P. maniculatus and the other species in small mammal communities was to map the distribution of P. maniculatus. Here I used a descriptive statistical technique of spatial analysis, specifically Hot Spot analysis. Because the data are spatially explicit at the scale of interest, the spatial relationship I was interested in measuring could be expressed using the equation for spatial autocorrelation below (Eq. 1), where the number of P. maniculatus at location i is a function of the number of P. maniculatus at location i±h, and h is some displacement in space.

Eq. 1.               P. maniculatusi = f(P. maniculatusi±h)

I expected that at the scale of interest, the number of individual P. maniculatus captured would be distributed approximately equally across trap sites. Hot spot analysis provides an index of spatial clustering from clustered to random to dispersed. Clustering suggests the spatial process of attraction, while dispersal suggests repulsion between points. These patterns could be explained as the ecological processes of intraspecific commensalism and competition, respectively. Alternatively, if the distribution of points is random it may suggest that the underlying process producing the distribution of points is not captured by the sampling design, or that the distribution of individuals is not spatially dependent. I do not expect the distribution of P. maniculatus to be spatially dependent.
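Hot Spot analysis in ArcGIS is based on the Getis-Ord Gi* statistic, which returns a z-score per location. The simplified re-implementation below uses a fixed distance band; the coordinates, counts, and the 300 m band are all invented, and it is an illustration of the statistic rather than ArcGIS's exact defaults:

```python
import numpy as np

def gi_star(coords, counts, band):
    """Getis-Ord Gi* with binary weights inside a fixed distance band
    (self-neighbor included), returning a z-score per location."""
    n = len(counts)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = (d <= band).astype(float)
    xbar = counts.mean()
    s = np.sqrt((counts ** 2).mean() - xbar ** 2)
    wsum = w.sum(axis=1)
    num = w @ counts - xbar * wsum
    den = s * np.sqrt((n * (w ** 2).sum(axis=1) - wsum ** 2) / (n - 1))
    return num / den   # roughly, |z| > 1.96 marks a hot or cold spot at 95%

# Hypothetical trap-line coordinates (metres) and P. maniculatus counts.
rng = np.random.default_rng(2)
coords = rng.random((30, 2)) * 1000
counts = rng.poisson(5, 30).astype(float)
z = gi_star(coords, counts, band=300)
```

A single large positive z-score among otherwise unremarkable values would correspond to the one campground hot spot described below.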


Raw data tables represent all trap data from 2009 – 2011 for the Toiyabe mountain range small mammal survey. Each individual animal is associated with explicit and implicit spatial data, as well as biological measurements including body size measurements and age-class. Additional information about collection and preservation methods are also associated with individual animals. I collapsed this comprehensive data set by constraining my analysis to one species, Peromyscus maniculatus.  I then collapsed this dataset further by converting it to a count of individual P. maniculatus at each unique location. Explicit location data was recorded as latitude, longitude using the NAD1927 Datum.
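The collapse from individual capture records to a count per unique location can be sketched in pandas; the column names and values here are invented stand-ins for the survey data:

```python
import pandas as pd

# Toy capture records; each row is one trapped animal.
records = pd.DataFrame({
    "species": ["P. maniculatus"] * 4 + ["T. minimus"],
    "lat": [39.1, 39.1, 39.3, 39.3, 39.1],
    "lon": [-117.2, -117.2, -117.4, -117.4, -117.2],
})

# Constrain to one species, then collapse to a count per unique location.
pm = records[records["species"] == "P. maniculatus"]
counts = pm.groupby(["lat", "lon"]).size().reset_index(name="count")
print(counts)
```

The resulting table (one row per location, with a count field) is what gets imported into ArcMap as X,Y data.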

Using ArcMap 10, I projected these points onto a relief base map of North America. To accomplish this, I imported the raw data as a .csv file into ArcMap.

.csv file

After importing the .csv file, set the datum for the projection by right clicking on the .csv file in the table of contents and selecting layer properties and setting the data source.

import .csv

Within data source menu, select the datum that your coordinates were measured in, and indicate the X and Y coordinate you wish to display.

.csv properties

Set datum and XY

Once the datum is set, it is easiest to select a base map to project your points onto. From your toolbar, select the Add Basemap button and choose an appropriate map. Here I used terrain with labels.

Select Basemap

Once you have your points projected onto the base map you will need to export them as a shape file. Right click on the layer you wish to export and choose the selection to make a shape file.

Export shape file

Perform Hot Spot analysis on this shape file by assigning the count data to each point.

Perform hotspot analysis

The output for the Hot Spot analysis is the confidence level that clustering (hot spot) or dispersal (cold spot) is occurring.

Hot spot output 


Hot Spot analysis of P. maniculatus based on trap-line locality provided mixed insights. Specifically, this analysis does not show how P. maniculatus is distributed on the landscape, but rather how the researchers distributed trap lines on the landscape. However, it does demonstrate that across trap lines P. maniculatus is essentially evenly distributed. Only one trap line represents a hot spot for this species, while all other localities were neither significantly cold nor hot; that is, neither clustered nor dispersed. The single hot spot occurred at a point in space where a human food subsidy likely drove increased population density, as it was located within a campground.


Final map


Despite the non-random distribution of point localities, the distribution of P. maniculatus across the points was consistent with the generalist ecology of this species. This suggests that members of the small mammal community at all measured points in space will be influenced by the presence of P. maniculatus. However, the results of this analysis do not suggest that the influence of this generalist species will be uniform across localities, since the habitat and the other rodent species present vary in space regardless of the distribution of P. maniculatus.

While Hot Spot analysis is a useful tool for understanding the distribution of individual occurrences in space, sampling exerts a strong influence over the inference one can make from it. Trap-line data do not provide much inferential power because the distribution of points is determined by the researcher; at best, the hot spot analysis provides the same information a histogram might, with count as the response variable and the latitude or longitude of each trap line as the explanatory variable. The greatest value in performing this analysis on this type of data is that it provides a standardized visual representation of the data, which may help me see how variable the distribution of a species is between trap lines. Moving forward, however, I will be using elevation rather than latitude and longitude as my spatially explicit variable.


April 28, 2017

Tutorial 1: Calculating Global Moran’s Index on Point Data

Filed under: 2017,Tutorial 1 @ 2:19 pm

A quick summary of my project and objectives: I am using a subset of the FIA (Forest Inventory and Analysis program run by the USFS) dataset, which is contained in a .csv file. I want to assess the understory vegetation response to ponderosa pine canopy cover and other environmental factors, and I am using USFS plot data to simulate the process I will be using on my own dataset in the future. The dataset currently looks like this:

Obviously there are multiple variables associated with each plot, but for the sake of this exercise, I will only be assessing canopy cover and total vegetation cover. Because my dataset is not a raster, nor is it regularly gridded, I decided not to attempt a semi-variogram. I was also unable to kick my ArcMap habit for this exercise. I decided to run a Global Moran’s Index assessment on the points in ArcMap.

The first step of this process was to load my .csv into Arc. This can be accomplished by adding the file as X,Y data. However, you can’t do much with X,Y data for our purposes, so I exported the X,Y data as a shapefile. This created an attribute table holding all the same information as the .csv file. I also added a boundary of Deschutes County, which is where my subset of data is located. I used the Project tool to make sure both the points and county boundary were properly oriented on the landscape, using the WGS_1984_UTM_Zone_10N coordinate system.

From here, the process was very simple. I used the search tool to seek out a spatial autocorrelation tool, which brought up the Global Moran’s Index. I selected my points layer as the input, and selected canopy cover and total vegetation as my input fields (in separate trials).
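What the Spatial Autocorrelation tool computes is Global Moran's I. The compact illustration below uses inverse-distance weights and invented plot values (not the FIA data, and not necessarily ArcGIS's exact weighting scheme):

```python
import numpy as np

def morans_i(coords, values):
    """Global Moran's I with inverse-distance weights and zero self-weight;
    an illustrative analogue of ArcGIS's Spatial Autocorrelation tool."""
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = np.where(d > 0, 1.0 / np.where(d > 0, d, 1.0), 0.0)
    z = values - values.mean()
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

# Invented plot coordinates and a made-up attribute (e.g. canopy cover).
rng = np.random.default_rng(3)
coords = rng.random((40, 2)) * 100
canopy = rng.random(40)
print(morans_i(coords, canopy))  # near 0 for spatially random values
```

Values near zero indicate spatial randomness, positive values clustering, and negative values dispersion, matching how the ArcMap report is read.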

The reports showed the levels of clustering for each variable. The reports are shown below.

Canopy cover spatial autocorrelation:

Vegetation percent cover spatial autocorrelation:

I found that the variables had very different levels of clustering. Canopy cover showed essentially no clustering: the Moran's Index value was -0.034, with a p-value of 0.57, meaning the level of clustering was non-significant. Vegetation cover, however, was significantly clustered, with a Moran's Index value of 0.0496 and a p-value of 0.00028.

This method was useful in giving me a different perspective on the data. It is difficult to simply review a spreadsheet and try to make any meaningful interpretation of the numbers. While the tool did not give any sort of visualization, it allowed me to imagine how the vegetation might be distributed across the landscape. The results mean that canopy cover varies across the focus area randomly, so stands with thin or dense canopies can be found just about anywhere. However, stands with dense understory vegetation are probably located in similar regions, as are stands with thin understories. This might imply that other environmental factors have a strong influence on how densely vegetation grows over a wide area.

Tutorial 1: Species Area Curves with Vegetation Community Data

Filed under: 2017,Tutorial 1 @ 1:58 pm

Salmon River Estuary: Map of study site and plot locations

Species-area curves represent the relationship between a specified area of habitat and the number of species present there. They are useful for comparing species richness data from sample units that differ in area, especially nested sampling plots. Empirically, the larger the area, the larger the number of species you are likely to capture, a principle that has held mathematically true in ecology. Species-area curves can address the adequacy of sample size, estimate the overall diversity and spatial heterogeneity within a community data set, and help to conceptualize the relationship between species richness and spatial scale in a sampled environment. Typically a species-area curve takes only species presence into account and shows the average number of species per plot sampled. Species-area curves can also help researchers determine the appropriate spatial grain for a study area, as the spatial grain (the size/dimensions of the smallest sample unit) has a strong impact on the shape of the curve. They are a basic diversity measure helpful for describing the heterogeneity of a community sample. Species accumulation curves are derived by a log transformation of the species-area curve and describe a linear trend of how species richness increases with area sampled. The species accumulation curve is also called the ‘effort’ or ‘discovery’ curve, as the species richness measured is often related to the amount of time and space sampled by a researcher. The goal of this tutorial is to demonstrate how I created species-area and species accumulation curves to investigate spatial patterns of vegetative richness within and between tidal marshes at my study site.

Equation for the species-area relationship: S = cA^z, or equivalently log(S) = log(c) + z·log(A)

Where S = Species Richness, A = area, and c and z are empirically determined constants from plotted data (Rosenzweig 1995).
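Because the log-transformed relationship is a straight line, z can be estimated as the slope of a log-log regression. A short sketch with invented richness values (not the Salmon River data):

```python
import numpy as np

# Nested-plot areas (m^2) and invented cumulative richness values.
area = np.array([1, 10, 100, 1000], dtype=float)
richness = np.array([5, 9, 14, 21], dtype=float)

# log(S) = log(c) + z*log(A): a straight line on log-log axes, so z is the slope.
z, log_c = np.polyfit(np.log10(area), np.log10(richness), 1)
print(round(z, 3))  # slope of the species-area curve
```

This is the same slope that Excel's trendline reports on the semilog/log-log graphs described later in the tutorial.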

The research question I have relevant to this exercise/tutorial is: Do species area relationships differ between restored and remnant tidal marshes at Salmon River Estuary? How do field methods, specifically, nested, rectangular (Modified Whittaker) plots and square, non-nested (Transect) plots capture these species-area relationships differently?

Between the two sampling techniques I applied, I would ultimately expect the Modified Whittaker plots, which cover more area, to capture at least as many salt marsh species as the transect plots. I hypothesize that the reference marsh will have a greater number of species, and a steeper species-area curve, than the restored marshes. I would also expect more recently restored marshes to have less steep curves than marshes restored less recently.

Name of the tool or approach that you used:

PC-ORD is a statistical software counterpart to R that performs ordinations and multivariate analyses on community data sets. Microsoft Excel was also used for graphics. Initially, I used PC-ORD to generate the average number of species per plot over cumulative area sampled, for each plot size in each tidal marsh. I then exported the data to Excel, produced graphs of cumulative species richness over log10 area, and calculated the slope of each line. I kept it simple and used software I am familiar with, to expedite producing results I can interpret and be confident in.

Brief description of steps you followed to complete the analysis:

Step 1: Enter data into an excel spreadsheet and format for PC-ORD

Open PC-ORD. Start a new project (File > New Project). Name the project appropriately and import the Excel sheet as the ‘main matrix’ input. Save the project. Use the summary command to produce a species-area curve output. The output will include the average number of species per plot over the sampled distance. Repeat this for every plot size sampled (1 m² (×2: transect and MW), 10 m², 100 m², 1,000 m²).

Save the PC-ORD output and import it back into Excel. Graph average species richness against area sampled to create the species-area curves. Do this for transect 1 m² plots and MW 1 m² plots in each tidal marsh, using Excel commands (highlight data > Insert > scatter plot). Calculate the log10 of area sampled and log10 of species richness for all nested MW plots (1 m², 10 m², 100 m², 1,000 m²). The trendline is a fitted linear trend that describes log species richness vs. log area.

Brief description of results:

The species curve for transect data appears to rise and plateau quickly, whereas the MW species curve rises more steadily and plateaus later (Figure 1 a-d, Figure 2 a-d). MW plots had higher overall species richness (21) and lower species turnover (1.7) compared to transect data species richness (17) and species turnover (4.5). This is an expected result, as there were fewer larger MW plots that covered more area compared to transect plots. Furthermore, the transect plots exist on environmental gradients related to salinity and elevation, suggesting greater species turnover within sample units, compared to MW plots that are not aligned with gradients (McCune and Grace 2002; Rosenzweig 1995). Average species accumulation for transects is similar (16.26) to average species accumulation for MW plots (17.93), suggesting that both represent adequate sample size and appropriate spatial grain for this study site (Figure 2 a, b, c, d, Table 2). ANOVA and post hoc t tests indicated that all tidal marshes have significantly different species accumulation rates.

Critique of the method – what was useful, what was not?

I would describe this method as useful, fast and simple. It was reasonably accessible and easy to do, without requiring extensive knowledge of statistical programs or software packages. However, I would not have found the method as easy if I had not just completed a course at OSU that instructed me on how to use PC-ORD (taught by Dr. Bruce McCune, who created the software). Needing to use two programs to get my results may be a bit cumbersome, however with an initial quick google search I was not able to find any helpful code on how to produce semilog species area graphs and species area curves with R. It seems like pursuing R would have been more frustrating and the results would not have been as clear to me. I am satisfied with my approach and pleased with my outcome, as it is simple and the results make sense to me.

Example of Results:





Tutorial 1: Using variograms to assess patchiness in raster pixels for a single wildfire event

Filed under: 2017,Tutorial 1 @ 10:32 am

Overview: Question & Tool

Often patch dynamics are examined under the lens of single disturbance events, but contemporary disturbance regimes demonstrate that large-scale disturbances can overlap in time and space. The spatial configuration of patches from an initial disturbance may influence the spatial configuration of patches from a subsequent disturbance. The broader research question is focused on determining whether there is a relationship between patches from an initial disturbance (beetle outbreak) and a subsequent disturbance (wildfire). Before addressing the broader question, I would like to determine whether the post-fire landscape exhibits some inherent patchiness. The research question here asks:

What is the patchiness (variability) of fire severity for a single fire disturbance?

In order to do this I employed a variogram to assess the spatial pattern for a post-fire image.  The post-fire image was generated for the Entiako fire that burned in 2012 in central interior British Columbia.


The burn severity raster was generated following protocols outlined by Key and Benson (2006). I collected pre- and post-fire Landsat (30 m) images for the Entiako fire, which burned in 2012. Both images were from the same month, to match phenology, and had minimal cloud cover. I calculated the differenced Normalized Burn Ratio (dNBR), a pixel-indexing classification. First, NBR is computed for each individual image:

NBR = (NIR – SWIR) / (NIR + SWIR)

where NIR is the near-infrared band and SWIR the shortwave-infrared band (Landsat bands 4 and 7, respectively).
Then the difference is taken between the pre- and post-fire NBR images:

dNBR = NBRprefire – NBRpostfire

The dNBR image is useful for differentiating a gradient of fire severity within a fire perimeter.
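The two-step calculation (per-image NBR, then the difference) reduces to simple band arithmetic. A toy numpy sketch with invented reflectance values, not the actual Landsat scenes:

```python
import numpy as np

def nbr(nir, swir):
    """Normalized Burn Ratio for one image: (NIR - SWIR) / (NIR + SWIR)."""
    return (nir - swir) / (nir + swir)

# Made-up two-pixel reflectance arrays for the pre- and post-fire scenes.
pre_nir, pre_swir = np.array([[0.5, 0.4]]), np.array([[0.1, 0.1]])
post_nir, post_swir = np.array([[0.2, 0.4]]), np.array([[0.3, 0.1]])

dnbr = nbr(pre_nir, pre_swir) - nbr(post_nir, post_swir)
print(dnbr)  # higher values correspond to more severely burned pixels
```

In this toy example the first pixel burned (NBR dropped sharply) while the second was unchanged, illustrating the severity gradient the dNBR image captures.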

A variogram was used to assess the spatial variability between pixel values in the dNBR image. A variogram is a means of evaluating spatial correlation between locations, and the shape of the variogram may indicate the level of patchiness and average patch size. The following spatial equation was considered as a guide:

  • yi = f(yi−h, yi+h)
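The semivariance underlying the variogram is half the mean squared difference between values separated by lag h. The Python sketch below computes an empirical variogram on a random pixel sample; the coordinates, values, and bin widths are invented, and this is not the usdm code used for the actual analysis:

```python
import numpy as np

def empirical_variogram(coords, values, bins):
    """Empirical semivariance gamma(h) = mean((z_i - z_j)^2) / 2 for pairs
    of points grouped into lag-distance bins."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    sq = 0.5 * (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)        # unique pairs only
    lag, gamma = d[iu], sq[iu]
    which = np.digitize(lag, bins)
    return np.array([gamma[which == b].mean() for b in range(1, len(bins))])

# Random sample of pixel centres (metres) and dNBR-like values (invented).
rng = np.random.default_rng(4)
coords = rng.random((200, 2)) * 3000
values = rng.random(200)
gamma = empirical_variogram(coords, values, bins=np.arange(0, 3500, 500))
```

Plotting gamma against the bin midpoints gives the variogram curve; for spatially correlated data it rises with lag and levels off at the sill, and the lag where it levels off (the range) is what indicates patch size.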

The variogram analysis was conducted in R version 3.3.2 (Team 2016) with the USDM Package (Naimi 2015). I used the following code:






The variogram analysis indicated a spherical relationship in the dNBR image, which suggests some level of patchiness (Figure 1). The lag distance at which the variogram reaches its sill (the range) suggests that patches averaged about 1500 m, which seemed a bit high (Figure 1). The dNBR image contained a couple of large bodies of water, which may influence the variogram. I removed the water bodies from the dNBR image and ran the variogram on the water-free image (Figure 1). The water-free analysis shows a shift in the location of the sill and indicates that average patch size might be around 500 m (Figure 1). The variogram without water also seems to have a double sill, which might indicate that there are both smaller and larger patches.

Critique of the method – what was useful, what was not?

The variogram seemed like a useful descriptive statistical tool for assessing the spatial relationship of pixels, as an indicator of overall patchiness and of average patch size. It allowed us to examine this relationship while maintaining the continuous dNBR raster. Conducting the variogram in R was the key to maintaining the continuous raster data; to conduct the analysis in ArcGIS I would have had to convert the continuous raster to point data. This analysis allowed us to consider both implicit and explicit concepts of time. The dNBR image captures the idea of implicit time by measuring the difference between the pre- and post-fire images. The variogram allows us to capture the idea of explicit time by describing the inherent patchiness in the post-fire landscape. It is important to recognize that the variogram analysis takes a sample of the data, and that sample depends on the spatial resolution and extent. If you rerun the variogram analysis the curve is slightly different each time, because the model re-samples the data set. In trying this a few times it did seem that the sill remained at a relatively consistent lag distance.


Figure 1. Variograms for the Entiako dNBR image with water included (a) and excluded (b). The red circles indicate where each variogram reaches its sill; the corresponding lag distance gives a general indication of patch size. With increasing lag distance there is increasing variability. Panels c and d show the dNBR images for the Entiako Fire, which burned in 2012; image c was used for variogram a, and image d for variogram b. Removal of large bodies of water is visible in d (black circle), corresponding to variogram b.



Key, C. H., and N. C. Benson. 2006. Landscape assessment (LA): Sampling and analysis methods. USDA Forest Service General Technical Report RMS-GTR-164-CD:1–55.

Naimi, B. 2015. Package “ usdm .” R Topics Document.

Team, R. C. 2016. R: A language and environment for statisitical computing. R Foundation for Statistical Computing, Vienna, Austria.

April 24, 2017

Japanese tsunami marine debris biota

Filed under: 2017,My Spatial Problem @ 12:49 pm


Research problem: Japanese Tsunami Marine Debris species

Six years ago, the devastating Tohoku earthquake and tsunami struck the coast of Japan. Since then, it has become evident that hundreds of coastal species from Japan have crossed the Pacific Ocean on tsunami debris, including species that have become invasive and are known to cause ecosystem/economic damage elsewhere. As of January 2017, scientists have documented the arrival of >650 debris items, referred to as Japanese Tsunami Marine Debris Biofouling Items (JTMD-BF) 1-651. Debris items include docks, buoys, boats, pallets, and wooden structures. Items were identified as JTMD if they 1) had clear identification, such as a serial or registration number, linked to an object lost during the tsunami of 2011; 2) had clear biological evidence of originating primarily from the Tohoku coast of Japan; or 3) showed a combination of these factors. The BF items are a subset of all the debris lost, as some debris items known to have been lost from Japan remain undocumented; it is possible they ended up in remote locations. A huge effort by taxonomists to identify the species on JTMD has generated a comprehensive species list. Currently, around 300 taxa have been collected on JTMD from North American and Hawai`ian coastlines since 2012. I am interested in looking at the geographic distributions of the species with invasion history compared to those without invasion history.

Dataset: JTMD Database

My work is part of an international effort to evaluate the risks associated with JTMD and associated species.  I contributed to the development of a database of life history, distributional, and environmental attributes of many JTMD species. This database is being used for reference and analysis, and can aid in efforts to assess the risk associated with JTMD species, and in determining which species are of highest concern for establishment in certain areas. The database contains a little over 100 JTMD species, with 26 attributes per species. It has information for native geographic regions and non-native (if applicable) geographic regions. The regions are the Marine Ecoregions of the World. It also has survival temperature and salinity regimes, reproductive information, habitat, depth, trophic level, abundance data, and more. The database does not include temporal information, but these species arrived on debris items in pulses for years after the tsunami event, and that is another interesting aspect of the phenomenon.

Hypotheses: While the JTMD species are largely native to the Northwest Pacific (near Japan), they are also native to regions from all over the world. I hypothesize that most species without invasion history are native to the NW Pacific, and those with invasion history are native to regions all over the world. I expect to see this represented when their native regions are mapped. For the species that have been found outside their native range, I expect to see that some are only just outside native ranges in a few areas, but some with a history of invasion are more globally distributed.

Approaches: I have never used GIS, and would like to play around with ArcGIS to better learn the program. I would like to get hold of the Marine Ecoregions of the World (MEOW) shape file in order to put it into ArcGIS, and overlay the JTMD geographic distributional history on top of those regions. The species data have already been analyzed in trait space using PC-ORD; species with invasion history were compared with those without, and no significant difference in life history traits was found.

Expected outcome: I want to produce maps of the JTMD geographic distributions. If I can work in statistical relationships as well with the data I have, that would also be beneficial to my understanding of the phenomenon.

Significance: Human-mediated transport of marine species across the globe through ballast water and hull fouling has been a concern for some time. JTMD is unique in comparison to these marine vectors in that it can transport large numbers of marine species across ocean basins. Shipping routes are direct and arrive in known locations and at measurable frequencies, whereas JTMD, which is propelled by winds and currents and travels at much slower speeds than ships, can arrive almost anywhere at any time. JTMD is potentially the transport vector with the most random distribution yet described. Due to the slow rates of transport by currents rather than propulsion, the effects of drag and dislodgement on JTMD are reduced in comparison with ship hulls. Furthermore, JTMD transports large numbers of adults, rather than the larval stages that are more common in ballast water. As of January 2017, only one JTMD species, the striped beakfish Oplegnathus fasciatus, has been observed free-living along the west coast of North America (in Oregon and Washington). At this time, we do not know whether any of these JTMD species will become established outside of their current distributional range as a result of the earthquake and tsunami. But by studying the JTMD species' invasion histories, and the geographic distributions of both their native and non-native areas, we can better understand this large dispersal event and inform vector management response for future events.

Experience: I am a beginner in spatial analysis, and have no experience with Arc-Info, ModelBuilder, or GIS programming in Python. I do have a little bit of experience in R, at least from taking ST 511 and 512, but all the coding was handed to us; we just had to copy and paste it into R. Being a beginner, I will have to do a lot of self-exploration, tutorial watching, and asking for help from peers and teachers.


April 20, 2017

Exercise 1 Part 2: Assessing spatial autocorrelation of snowpack using semivariograms

Filed under: Tutorial 1 @ 10:50 am

Research Question

For this exercise, I elected to compare the spatial autocorrelation of snow cover frequency (SCF) before and after the Grays Creek wildfire in central Idaho.

Approach/tools used

After trying out a few different tools, I decided to use ArcGIS to create semivariogram plots for my SCF data within the Grays Creek wildfire perimeter.

Analysis procedure

First, a bit of data processing was required for this exercise. Since conducting spatial autocorrelation analyses in ArcGIS is more straightforward with point data, I converted my SCF data from its original raster format into an array of equally spaced points (see example in figure below). A point was created at the center of each 500 m pixel. This process was completed for both the winter (Oct. 1 – May 1) preceding the wildfire and the winter following the wildfire. Finally, the semivariogram clouds were produced using ArcMap's Geostatistical Analyst toolbox.
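ArcGIS handles the conversion with its Raster to Point tool; for readers without Arc, the same cell-center logic can be sketched in a few lines of Python (the coordinates, cell size, and SCF values below are made up for illustration):

```python
import numpy as np

def raster_to_points(values, x_origin, y_origin, cell_size):
    """Convert a raster (2D array, rows from the top) to center-of-cell points.

    NaN cells are treated as NoData. Returns an (n, 3) array of x, y, value.
    """
    rows, cols = values.shape
    pts = []
    for r in range(rows):
        for c in range(cols):
            v = values[r, c]
            if np.isnan(v):
                continue
            x = x_origin + (c + 0.5) * cell_size   # cell-center easting
            y = y_origin - (r + 0.5) * cell_size   # cell-center northing (top-down rows)
            pts.append((x, y, v))
    return np.array(pts)

# Tiny example: a 2x2 SCF raster with one NoData cell
scf = np.array([[0.8, 0.6], [np.nan, 0.4]])
points = raster_to_points(scf, x_origin=500000.0, y_origin=4900000.0, cell_size=500.0)
```

The same pattern scales to the full 500 m SCF grid; only the origin and array change.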


SCF Pre-fire Semivariogram cloud:

SCF Post-fire Semivariogram cloud:

Overall, the semivariograms display similar distributions between the pre- and post-wildfire data. The semivariogram clouds show the empirical semivariance values for each pair of points plotted against the lag (distance) separating these two points. For this data, higher semivariance values correspond to large differences in SCF, while low values indicate similar SCF values. Although subtle, it does appear as though the post-fire semivariogram has a less defined trend of increasing semivariance with distance. There’s not enough information to say whether this is a result of severely burned wildfire patches, but it would be interesting to examine this potential relationship further.

As expected, data point pairs generally become less similar as the distance between them increases. In both plots, the semivariogram clouds display a steady increasing trend. This trend is likely a product of the snowpack gradient that exists across elevations at a local scale – as elevation increases, so does snowpack. In other words, points that are closer together will be more similar in elevation, and since snowpack is closely linked with elevation at a local scale, nearby points will also have more similar SCF values.
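For anyone who wants to build a semivariogram cloud outside of ArcMap, the underlying computation is just half the squared difference in values for every point pair, plotted against the pair's separation distance. A minimal sketch with invented SCF points:

```python
import numpy as np
from itertools import combinations

def semivariogram_cloud(xy, z):
    """Empirical semivariance 0.5*(z_i - z_j)^2 for every point pair,
    paired with the lag (distance) separating the two points."""
    lags, gammas = [], []
    for i, j in combinations(range(len(z)), 2):
        h = np.hypot(*(xy[i] - xy[j]))          # Euclidean lag distance
        lags.append(h)
        gammas.append(0.5 * (z[i] - z[j]) ** 2)  # empirical semivariance
    return np.array(lags), np.array(gammas)

# Three illustrative points: the nearby pair has similar SCF, the distant point differs
xy = np.array([[0.0, 0.0], [500.0, 0.0], [3000.0, 0.0]])
z = np.array([0.9, 0.85, 0.4])
lags, gammas = semivariogram_cloud(xy, z)
```

Plotting `gammas` against `lags` reproduces the cloud: near pairs sit low, distant pairs high, exactly the elevation-driven trend described above.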


Method critique

The main critique of this procedure was the variable that I chose to analyze. Because snowpack is undoubtedly spatially autocorrelated at close distances (due to the strict elevational relationship), the semivariogram did not reveal any patterns that I was not already aware of.

However, I anticipate the semivariogram to be a useful spatial analysis tool as I move forward with my research. The semivariogram statistics would probably be much more appropriate for either my vegetation index data or burn severity data as these datasets have a less predictable distribution across a landscape.

April 10, 2017

Assessment of Ecohydrologic Dynamics of the Mill Creek Municipal Watershed

Filed under: 2017,My Spatial Problem @ 7:14 pm


The overall objective of my M.S. thesis research is to disentangle the potential effects of water diversions, forest management, and climate change on the hydrology of the Mill Creek watershed, a transboundary catchment straddling the eastern Oregon and Washington border. Since 1989, 8,798 hectares of the headwaters of this catchment have been co-managed by the U.S. Forest Service and the City of Walla Walla “to provide water at a level of quality and quantity” for the City’s municipal supply [U.S. Forest Service, 1990]. Despite the management protections afforded this upper catchment, minimum instream flow recommendations are rarely met at the most upstream monitored gauge for Mill Creek. At the base of the Municipal Watershed, a small impoundment dam has been constructed, where an intake structure generates power and carries water withdrawals through a pipeline 22 km to the municipal water treatment plant in Walla Walla. The municipal intake structure is roughly 7 km upstream of the first stream gauge, USGS 14013000, and although it was first installed in 1913, there is no long-term record of the natural flow regime without municipal alteration.

Downstream of the municipal catchment, water rights have historically been overallocated beyond available stream discharge during periods of high demand. This is particularly a problem during the late summer months, when peak water use by municipal and agricultural consumers coincides with patterns of seasonal low flow. Columbia Basin Bull Trout (Salvelinus confluentus) and Mid-Columbia Basin Steelhead (Oncorhynchus mykiss) are native to the Mill Creek Basin and have been listed as threatened under the Endangered Species Act since 1998 and 1999, respectively [Schaller et al., 2014]. Stakeholders are actively working to implement basin-scale flow planning to protect critical habitat for these species and ensure flow volumes sufficient for migratory passage to spawning and rearing grounds in the headwaters. To protect wildlife, fish, scenic, aesthetic, and other environmental and navigational values, minimum instream flow recommendations for Mill Creek were first established by Washington Administrative Code (WAC 173-532-030) in 1977, and revised in 2008.

An “instream flow” is broadly defined as a target flow rate, typically measured in cubic feet per second (cfs), at a specific gauging point for a defined period of time, often with seasonal variations [Geller, 2003]. Despite the management protections afforded this upper catchment, minimum instream flow recommendations are rarely met at the most upstream monitored gauge in the Mill Creek catchment (USGS 14013000). For my master’s thesis, I am interested in teasing apart the relative roles of anthropogenic development and natural ecohydrological trends that may be influencing stream discharge over time. I am interested in examining patterns in the long-term hydrologic gauging record and the satellite image record, to examine the empirical relationships between vegetation and streamflow discharge during summer months when minimum instream flow recommendations are rarely met.

Primary Spatial Questions to be addressed in GEOG566:

  • How has vegetation in the upslope accumulated area of the USGS Gauge 14013000 changed over the period of satellite record?
  • Is the observed seasonal change in “greenness”, as measured by NDVI, related to change in water discharge observed at the 14013000 stream gauge over the spring and summer season?
  • Does this relationship change with and without the addition of known withdrawals at the City of Walla Walla’s Intake Structure?
  • Secondary questions with a spatial component include: how is the management of the watershed restricting water availability?


To do this we are using a simple conceptual model, based on the water balance equation (Eq. 1), to tease apart relationships among eco-hydrologic variables in the Mill Creek drainage.

Eq 1:            Q = P – ET      →      ΔQ = ΔP – ΔET       →        ΔET = ΔP – ΔQ

Where Q is discharge, P is precipitation, and ET is evapotranspiration. We are omitting groundwater and storage components from the initial analysis, although they may be included in future analyses.

Description of datasets:

Q (discharge) at the 14013000 Kooskooskie Gauge on Mill Creek is available through the long-term USGS National Water Information System gauging record dating back to 1913. In addition, we have withdrawal data from the City of Walla Walla Public Utilities dating to 2000, to approximate an adjusted natural flow regime.

P (precipitation) data has been obtained from gauging stations at the Walla Walla Airport (1949- Present), and Milton Freewater (1928-Present), from the United States Historical Climatology Network, available online from Carbon Dioxide Information Analysis Center.

ET (evapotranspiration) has not been directly measured, but it is an important flux in the water balance equation. Total system ET is a combined representation of the atmospheric flux of intercepted water, transpired water, soil water, and surface water. Often there is a large difference between estimated Potential Evapotranspiration (PET) and Actual Evapotranspiration (AET), which is a biophysical function of available soil moisture, species physiology, and climatic variables. Water-limited forest biomes often exhibit a water balance deficit, where potential summer atmospheric and plant demands exceed available soil moisture [Littell et al., 2010]. This ultimately constrains photosynthesis, leading to senescence or entrance into dormant states where rates of ET decline. Generally, trees have higher rates of transpiration, as they have larger surface areas, are exposed to turbulent air flow, and root into deeper soil profiles, accessing moisture later into the season [Littell et al., 2010]. Based on these known physiological traits, I am interested in deriving the empirical relationship of ET to multispectral vegetation indices as a proxy to estimate relative ET during the months of highest water stress.

The specific index utilized for this project is the Normalized Difference Vegetation Index (NDVI) (Eq. 2), computed from red (R) and near-infrared (NIR) surface reflectance detected by the satellite in question.

Eq 2:     NDVI= (NIR -R) / (NIR + R)

For Landsat 4–7 imagery, this equation uses the following bands:

Eq 3:     NDVI = (Band 4 – Band 3) / (Band 4 + Band 3).
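As a quick sanity check, Eqs. 2–3 are straightforward to compute from band arrays. Here is a NumPy sketch; the reflectance values are invented for illustration, and pixels where the band sum is zero are masked as NoData:

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - R) / (NIR + R), masking pixels where the denominator is zero."""
    nir = nir.astype(float)
    red = red.astype(float)
    denom = nir + red
    # Guard against divide-by-zero: such pixels become NaN (NoData)
    return np.where(denom == 0, np.nan,
                    (nir - red) / np.where(denom == 0, 1, denom))

# Landsat 4-7 style arrays: band 4 = NIR, band 3 = red (values made up)
band4 = np.array([[0.40, 0.35], [0.10, 0.0]])
band3 = np.array([[0.08, 0.05], [0.09, 0.0]])
result = ndvi(band4, band3)
```

Vegetated pixels (high NIR, low red) land near +0.7 here, while the zero-reflectance pixel comes back as NaN rather than a spurious value.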

What does NDVI really tell us, and is it an appropriate proxy for ET? Interpretation of NDVI relies on the tendency of healthy vegetation to absorb red light (for energy and photosynthesis) and reflect near-infrared light. NDVI values range from -1 to 1 and are interpreted as an indicator of “vegetative greenness.” Changes in greenness across a catchment can be linked to eco-physiological factors of different vegetative communities and their phenology in relation to the water cycle. Pairwise dates were selected to best measure the difference in vegetation distribution from the post-snowmelt green-up period to the end of the dry season, when summer water deficit may impact ET rates.

Eq 4: NDVI Spring – NDVI Late Summer = Seasonal ΔNDVI

For the time change between each pairwise image, a corresponding change was also assessed for the available precipitation records and the stream discharge record at the USGS gauge. Cumulative discharge (cfs) was measured from Time 1 to Time 2 and converted to a change in mean daily discharge (by dividing by the number of days in the period). This value was then normalized by dividing measurements by the gauge drainage area of 154.36 sq km.

Cumulative precipitation (mm) from Time 1 to Time 2 was assessed for each available set of pairwise images, and repeated for both the Milton Freewater and Walla Walla precipitation gauges.

ΔQ (mm/day)   –>  ΔP (mm/day)  –>   ΔQ/ΔP (mm/mm)
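The unit conversions behind this chain can be sketched in Python. The 154.36 sq km drainage area comes from the text; the 100 cfs example discharge is made up:

```python
# Convert mean daily discharge in cfs to runoff depth (mm/day) over the gauge
# drainage area, so that Q and P share units for the dQ/dP (mm/mm) ratio.
CFS_TO_M3S = 0.0283168          # cubic feet per second -> cubic meters per second
DRAINAGE_AREA_KM2 = 154.36      # USGS 14013000 drainage area, from the text

def cfs_to_mm_per_day(q_cfs, area_km2=DRAINAGE_AREA_KM2):
    m3_per_day = q_cfs * CFS_TO_M3S * 86400.0      # discharge volume per day
    depth_m = m3_per_day / (area_km2 * 1.0e6)      # spread over the drainage area
    return depth_m * 1000.0                        # meters -> millimeters

def runoff_ratio(delta_q_mm, delta_p_mm):
    """dQ/dP (mm/mm) for one pairwise image interval."""
    return delta_q_mm / delta_p_mm

q_mm = cfs_to_mm_per_day(100.0)   # e.g. a 100 cfs mean daily discharge
```

A mean daily discharge of 100 cfs works out to roughly 1.6 mm/day of runoff depth over this catchment, directly comparable to the gauge precipitation in mm.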

In my initial analysis, we looked at this empirical relationship across the entire drainage area of the 14013000 gauge. There was no clear statistical relationship in this early analysis; however, upon examination of the imagery, there appear to be clear topographic controls and perhaps vegetative patterns in seasonal pairs and across years. For GEOG 566 I am interested in expanding these analyses to address smaller subsections of my catchment.


Intra-annual (seasonal) change: HO: Areas with negative or low seasonal ΔNDVI values will include riparian forest lands and north-facing slopes, which retain or even increase in greenness over the course of the growing season, while areas with high ΔNDVI values will have encountered soil water deficits and have lower ET demand late in the summer season.

Inter-annual Change:  HO: Aforementioned patterns are consistent from one year to the next.

HA: There are expansions, or declines of certain regions of NDVI response, indicating potential shifts in species distribution.


HO: ΔNDVI statistics summarized by vegetation class polygons show clear spatial patterns

HA: ΔNDVI statistics summarized by vegetation class polygons show no clear spatial patterns

HO: Unsupervised classification of ΔNDVI shows similar patterns to the USFS vegetation class polygons

HA: Unsupervised classification of ΔNDVI does not spatially align with USFS vegetation class polygons

Expected outcomes:

I would like to continue refining my maps describing the spatial context for vegetation change, and establish whether there are any valid statistical relationships between vegetation/landscape change and the available hydrologic records.


The discipline of forest hydrology has a long history of studying the effects of vegetation in regulating streamflow through evapotranspiration. In an increasingly tightly regulated water environment, water resource managers are interested in the relative role of water use by trees and shrubs, and whether species distributions are changing at the catchment scale. At a regional scale, species distributions are often shaped by the ecohydrological balance between cool season surplus and summer water deficit [Littell et al., 2010]. The management actions within the Municipal Catchment have preserved much of the natural character of the upper basin’s stream channel and riparian floodplain, thus retaining critical spawning and rearing habitat for several aquatic species of regional concern. The inability to meet instream flow recommendations at the Mill Creek gauge (USGS 14013000), located high in the Mill Creek Basin upstream of many of the largest water rights holders, raises questions about the future ability to manage the withdrawal structure for recommended instream flows under predicted future climate scenarios. This information is of interest to resource managers from the City of Walla Walla Public Utilities, as well as collaborative basin planning organizations such as the Walla Walla Basin Watershed Council.

Your level of preparation:

Entering into this course, I feel adequately prepared to work within the ESRI suite of Arc-GIS software tools. I am concurrently enrolled in the GEOG562: GIS Programming for Geospatial Analyses to learn some basic Python Coding, and am working independently to refine techniques for utilizing open-source geospatial packages developed for use in R and R-Studio. I am hoping that this class will be an additional way I can gain practice with these tools, and become exposed to other relevant methodology implemented by my peers.


Geller, L. D. (2003), A Guide to Instream Flow Setting in Washington State, Washington State Department of Ecology.

Littell, J. S., E. E. Oneil, D. McKenzie, J. A. Hicke, J. A. Lutz, R. A. Norheim, and M. M. Elsner (2010), Forest Ecosystems, Disturbance, and Climatic Change in Washington State, USA., Clim. Change, 102(1), 129–158.

Schaller, H. A. et al. (2014), Walla Walla River Bull Trout Ten Year Retrospective Analysis and Implications for Recovery Planning, US Fish Wildl. Serv. Columbia River Fish. Program Off. Vanc. WA.

U.S. Forest Service (1990), Umatilla Forest Land and Resource Management Plan, United States Department of Agriculture.

U.S. Forest Service (2014), Blue Mountains National Forests Proposed Revised Land Management Plan, U.S. Department of Agriculture.

April 7, 2017

Topographic constraints on climate drive range shifts in Great Basin small mammals

Filed under: My Spatial Problem @ 8:55 pm

A description of the research question that you are exploring.

In 2015, Elsen and Tingley published “Global mountain topography and the fate of montane species under climate change” in the journal Nature Climate Change. They combined two datasets, the global dataset of mountain ranges from Natural Earth’s physical vectors and a high-resolution near-global DEM (SRTM30), to evaluate the assumption that available non-vertical surface area decreases with increasing elevation in mountain ranges. This is an important question in community ecology and conservation, as it applies directly to a known relationship between species diversity and area, established by the field of island biogeography and more widely applied in recent decades. Specifically, the relationship states that as physical space increases, the number of species that space can hold increases. Elsen and Tingley addressed this question across 182 of the world’s largest mountain ranges. However, they did not apply their analytical methods to any of the mountains of the Basin and Range province, except for the Sierra Nevada range, which makes up its western border, and the Rocky Mountains on its eastern border. The Basin and Range is comprised of north–south trending mountain ranges that have been the focus of several studies addressing questions relevant to island biogeography. Additionally, modern (2009–2016) and historical (1929–1931) small mammal surveys have established small mammal community composition for a number of these mountain ranges. In order to better understand the dynamics underlying the distribution of small mammal species along elevation gradients in these ranges, I would like to perform analyses similar to Elsen and Tingley’s on three ranges for which historical and modern small mammal communities have been studied: the Toiyabe, Ruby, and Snake ranges. Specifically, these analyses would address the question: to which hypsographic classification are species range shifts subject?
I hope to better contextualize the range dynamics of species in response to climate and vegetation changes. Furthermore, understanding how available area changes with elevation in these ranges will enable better predictions about a species’ ability to remain within its thermal envelope by moving upslope.


A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

Predominantly, I plan to use the SRTM30 DEM in combination with Landsat data to evaluate area as a function of elevation. I am not sure which dataset will be best for the vegetation analysis. I have spatially explicit data on the distribution of small mammal species along elevation. However, the small mammal species data are not continuous along the elevation gradient, or within habitat/elevation bands.

The following description of the SRTM30 dataset has been copied from


SRTM30_PLUS V1.0 November 11, 2004

INTRODUCTION: This data consists of 33 files of global topography in the same format as the SRTM30 products distributed by the USGS EROS data center. The grid resolution is 30 seconds which is roughly one kilometer. Land data are based on the 1-km averages of topography derived from the USGS SRTM30 gridded DEM data product created with data from the NASA Shuttle Radar Topography Mission. GTOPO30 data are used for high latitudes where SRTM data are not available. Ocean data are based on the Smith and Sandwell global 2-minute grid between latitudes +/- 72 degrees. Higher resolution grids have been added from the LDEO Ridge Multibeam Synthesis Project and the NGDC Coastal Multibeam Data. Arctic bathymetry is from the International Bathymetric Chart of the Oceans (IBCAO) [Jakobsson et al., 2003]. All data are derived from public domain sources and these data are also in the public domain. The pixel-registered data are stored in 33 files with names corresponding to the upper left corner of the array shown below.

The USGS SRTM30 data and documentation is available at


Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

Elsen and Tingley produced four categories of hypsographic classification based on the relationship between area and elevation for mountain ranges. These categories are diamond, hourglass, inverse pyramid, and pyramid, describing mountains whose area increases then decreases with elevation, decreases then increases, increases, or decreases, respectively. They found that approximately 68% of the 182 ranges they analyzed were not categorized by the pyramid shape. Importantly, the pyramid category describes “the dominant assumption in ecology and conservation that area decreases monotonically with elevation from a mountain range’s base” (Elsen and Tingley 2015). Despite this finding, the Basin and Range province is the result of 40 million years of geologic and geomorphic processes, including continental rifting, erosion, mantle uplift, Pleistocene pluvial lake bank erosion, glaciation, and Holocene warming and drying. Of these forces, I expect the listric normal faulting that characterizes the Basin and Range to have resulted in pyramid-shaped mountains. I also expect that ranges will be spatially auto-correlated with respect to the skew of east- versus west-facing slopes.


Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data. 

I would like to learn how to overlay mountain range polygons atop a hi-resolution digital elevation model. I will need to learn how to use the R-package “Raster” to make the hypsographic curves described above and by Elsen and Tingley. I would also like to use Arc to make maps.
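Although I plan to use R's raster package, the hypsographic-curve computation itself is essentially a histogram of DEM elevations weighted by cell area. A Python/NumPy sketch with a toy "pyramid" DEM (elevations and cell area invented):

```python
import numpy as np

def hypsographic_curve(dem, cell_area_km2, bin_width=100.0):
    """Planimetric area per elevation bin from a DEM array (NaN = outside range).

    Returns the bin edges (m) and the area (km^2) falling in each elevation band.
    """
    z = dem[np.isfinite(dem)]
    lo = np.floor(z.min() / bin_width) * bin_width
    hi = np.ceil(z.max() / bin_width) * bin_width
    edges = np.arange(lo, hi + bin_width, bin_width)
    counts, edges = np.histogram(z, bins=edges)
    return edges, counts * cell_area_km2   # cell count -> area per band

# Toy "pyramid" mountain: area shrinks with elevation
dem = np.array([[1500., 1500., 1500.],
                [1500., 2300., 1500.],
                [1500., 1500., 1500.]])
edges, area = hypsographic_curve(dem, cell_area_km2=1.0, bin_width=500.0)
```

Plotting `area` against the band midpoints gives the curve whose shape (pyramid, diamond, hourglass, inverse pyramid) Elsen and Tingley classify; clipping the DEM with a range polygon first restricts the curve to one mountain range.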


Expected outcome:

I would like to elucidate the area-elevation relationship to which small mammal species, and indeed all species, are subject in the mountains of the Great Basin. Ideally, I would also like to evaluate available area by habitat type, based on the distribution of vegetation type and percent cover along elevation. In this regard, I would like to produce maps of habitat types and graphs that illustrate the percent cover of a habitat type as a percent of the total available space on the mountain. It would also be useful to show which microhabitat features are associated with the climate/weather measurement stations in place on these mountain ranges. In addition, I would be excited if I could make predictions about the constraints on a species’ ability to move upslope based on available area; this may be represented as a plot of species richness against a combined elevation-area score.



The Great Basin has a unique geological and physiographic history in North America and has experienced an accelerated pace of human impact throughout the Anthropocene. The combination of human land-use practices and climate change has significantly altered small mammal community composition, pushing it outside of its range of natural variability. Understanding the physical constraints on a species’ ability to persist on the landscape is critical to predicting how shifting community dynamics are constrained by the physical environment.


Your level of preparation:

I feel comfortable using R and exploring new packages and I am typically able to find the support I need when I am trying to understand or apply the tools available in a new R package.  I have limited experience with Arc-Info and/or Arc-GIS, however I do have knowledge of a number of the basic rules to follow when using these tools. I have extremely limited experience with Python.

Socio-Demographic and People’s Intentions Relationship in a Neighborhood System Adapting to Floods

Filed under: 2017,My Spatial Problem @ 1:58 pm

Research question.

For this problem I want to answer:

How do socio-demographic and spatial variables explain patterns of survey responses about attitudes regarding flood safety in neighborhoods?

Description of the dataset.

The dataset for the study has the following characteristics:

  • Data is obtained from Household voluntary survey
  • Convenience sampling: households located within the 100-year and 500-year zones of FEMA’s flood hazard map, all surveyed at once
  • Each participant answers socio-demographic and predefined intentions’ questionnaire
  • A printed coded survey questionnaire was mailed out to residents. The code is used to identify resident addresses.
  • 103 variables have been collected
  • Most of the variables are categorical

      An example of a typical variable to be analyzed:

  • Variable: Suppose your current home was to flood, how confident are you in the following possible conditions? – I will be able to evacuate my home before flooding begins (This variable is expressed in 5 categories of discrete values without hierarchy):
  • Categories:
    • Very confident
    • Confident
    • Neutral
    • Somewhat confident
    • Not confident at all

The spatial data consist of a map identifying land properties within the boundaries of FEMA’s 100-year and 500-year flood hazard map, as shown in Figure 1. The survey was mailed out to randomly selected properties within this map’s boundaries.

Figure 1. Map for South Corvallis affected properties according to FEMA’s 100 year and 500 year flood categories.


Hypothesis: Attitudes regarding flood safety in neighborhoods are clustered according to socio-demographic and spatial factors.


Principal Component Analysis (PCA) and Factor Analysis (FA) are applied to analyze the collected data.

Figure 2. Principal component analysis.
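For readers curious about the mechanics, a bare-bones PCA can be run on standardized numeric responses with NumPy alone. This is only a sketch: ordinal Likert responses usually call for more careful treatment (e.g., polychoric correlations), and the toy responses below are invented:

```python
import numpy as np

def pca(data, n_components=2):
    """PCA via eigendecomposition of the covariance of standardized responses
    (a rough stand-in for what PCA software like PC-ORD computes)."""
    X = np.asarray(data, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each question
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = X @ eigvecs[:, :n_components]     # respondent scores on the PCs
    explained = eigvals / eigvals.sum()        # proportion of variance per PC
    return scores, explained

# Toy Likert-coded responses (1-5) for three questions, six respondents
responses = [[1, 2, 5], [2, 2, 4], [1, 1, 5], [4, 5, 1], [5, 4, 2], [4, 5, 1]]
scores, explained = pca(responses)
```

The `scores` are what would then be mapped and fed to the clustering statistics; `explained` tells you how much of the response variation each component captures.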

For Geographic pattern and cluster analysis I will test:

  • Average Nearest Neighbor
  • Spatial Autocorrelation (Moran’s I)
  • High/Low Clustering (Getis-Ord General G)
  • Cluster and Outlier Analysis (Anselin Local Moran’s I)
  • Hot Spot Analysis (Getis-Ord Gi*)
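Most of these statistics are available in ArcGIS's Spatial Statistics toolbox, but global Moran's I is worth understanding by hand. A minimal sketch for one numeric variable (e.g., a confidence item coded 1-5) with a binary contiguity weights matrix; all values are invented:

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I with an (n x n) spatial weights matrix (w_ii = 0).

    I near +1: similar values cluster; near 0: spatial randomness;
    negative: dissimilar values are neighbors.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = len(x)
    z = x - x.mean()                              # deviations from the mean
    num = n * np.sum(w * np.outer(z, z))          # cross-products of neighbors
    den = w.sum() * np.sum(z ** 2)
    return num / den

# Four houses on a line; neighbors share an edge (binary contiguity weights)
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
confidence = [1, 1, 5, 5]   # e.g. "not confident"=1 ... "very confident"=5
I = morans_i(confidence, w)
```

Here the low-confidence pair sits next to the high-confidence pair, so I comes out positive, the clustered pattern the hypothesis predicts. The local variants (Anselin's Local Moran's I, Getis-Ord Gi*) decompose the same cross-products house by house.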

Expected outcome:

I would like to find statistical relationships among the categorical variables collected from the survey, according to their spatial locations, that define pattern formation, and also to produce maps of these relationships within the 100-year and 500-year zones of the FEMA map in Figure 1.


This research will contribute to policy and decision making for neighborhood adaptation to climate change. Identifying patterns in attitudes regarding flood safety in neighborhoods is important for planning adaptation to different flooding scenarios in order to minimize personal risks.

Level of preparation:

a) Arc-Info: Intermediate

b) Model Builder and/or GIS programming in Python: Intermediate

c) R, or other relevant  spatial analysis software: Beginner-Intermediate

Spreading of Red Blotch disease in vineyards

Filed under: 2017,My Spatial Problem @ 1:37 pm


Spatial patterns of disease spread in agriculture are important to understand: for example, what causes infection and how further spread can be avoided. In this project, we focus on the Red Blotch virus in vineyards located in Oregon. Red Blotch affects the sugar content of the grapes, which changes the taste of the produced wine, and symptoms include red blotches on the leaves. Using remote sensing, we may be able to develop an early warning system and gain a better spatial understanding of how this disease spreads. Currently, the virus is spotted when the red blotches appear, which is generally too late to remove the plant and avoid spreading. Another method is a PCR (Polymerase Chain Reaction) test, which can detect infection in the leaves from the plant’s DNA or RNA. Both of these methods are inefficient in terms of time, labor, and money. Using an Unmanned Aerial Vehicle (UAV), better known as a drone, enables us to get a bird’s-eye perspective on the vineyards with the flexibility of controlling the spatial resolution (flying higher or lower) and temporal resolution (flying whenever you want). This is the big advantage of UAVs over satellite-based imagery.

The first goal in this research is to develop the early warning system, but closely related, and maybe even dependent on that, is the second goal of understanding the spatial distribution of the disease spread.

The research question for this class is: Is there spatial correlation in the spread of the Red Blotch disease in vineyards at study sites in Oregon?


Methods and Materials

We will be using UAVs to acquire the aerial imagery. Two different sensors will be used, a multispectral and a hyperspectral camera. These differ in the number of bands, 5 and 270 respectively. The hyperspectral bandwidths range between 400-1000 nm. The advantage of hyperspectral over multispectral imaging is the precision of the bands, which gives us more detailed spectral information on the vineyards. Hyperspectral imaging is used for its ability to detect chlorophyll fluorescence, which is highly correlated with plant health.

For multispectral imagery we use the MicaSense RedEdge camera, which has a spatial resolution of 8 cm/pixel at an elevation of 400 ft. Depending on the elevation, we can increase or decrease this resolution.

The hyperspectral sensor is the Nano-Hyperspec (Headwall Photonics). Depending on the above-ground level, the spatial resolution is about 5 cm.
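Because ground sample distance (GSD) scales roughly linearly with flying height, the resolution trade-off can be estimated before a flight. A sketch using the nominal figure from the text (8 cm/pixel at 400 ft, about 122 m) rather than exact sensor specs:

```python
def gsd_cm(altitude_m, ref_gsd_cm=8.0, ref_altitude_m=122.0):
    """Ground sample distance scales linearly with flying height.

    Reference values follow the text: ~8 cm/pixel at 400 ft (~122 m) for the
    MicaSense RedEdge; treat these as nominal figures, not exact sensor specs.
    """
    return ref_gsd_cm * altitude_m / ref_altitude_m

half_height = gsd_cm(61.0)   # flying at half the reference altitude
```

Flying at half the reference altitude halves the GSD to about 4 cm/pixel, at the cost of smaller ground coverage per image and more flight lines per vineyard.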

Throughout the growing season, starting in May until October, we will be flying every month or twice a month over a couple of vineyards. The exact location is still to be determined.



I expect that there is a high spatial correlation in the spread of the disease. The virus is transmitted by grafting, but a vector may also play a role, as vines appear to infect their neighbors.

For my main research goal, I expect that we will be able to develop an early detection method as some symptoms of the disease are preceded by physiological changes in the plants, which we will hopefully be able to detect.



In this class, I would like to learn more about the spatial correlation and how to deal with that. In previous research, I have dealt with comparisons between mean values from spatial data, assuming independence in measurements. However, when you are analyzing a problem that is spatially correlated you need to take that into account.


Expected outcome

For this class I will produce a method to determine which vineyards are affected and which are at risk of being affected. Most likely, this will be in the form of a map. For this risk map, we need to know the statistical relationship between infected and uninfected plants, and how large the probability of infection is.



Red Blotch can significantly decrease the sugar content of the grapes, which can change the taste of the wine. At the moment, farmers remove vines that are infected. The earlier we know which vines are infected, the earlier they can be removed and new plants can be planted.


Level of Preparation

  1. Arc-info advanced
  2. Modelbuilder and/or GIS programming in Python novice, but taking GIS-science III, GIS programming in Python
  3. R advanced beginner, some experience from STATS 511 and 512
  4. Matlab advanced

Stream Geomorphic Change Detection and Analysis in an Oregon Coast Range Basin

Filed under: My Spatial Problem @ 1:07 pm


My master’s thesis involves quantifying geomorphic change in streams after the addition of a large wood jam for salmon habitat restoration purposes. Large wood has been shown to have many positive effects on salmon habitat, including creating pools and backwater areas that are essential for juvenile salmon survival. This past summer, I surveyed seven stream reaches within a single basin in the Oregon Coast range to capture the instream and floodplain topography before the addition of a large wood restoration structure. I plan to survey these sites again next summer to capture changes in topography caused by the interaction of wood, water, and sediment.

The overall goal of my thesis is to examine how upstream drainage area and bankfull width affect the amount of geomorphic change after large wood is added to a stream reach. Since I only have the first half of my data gathered at this point, I will utilize cross-sectional data from adjacent LW sites gathered by a previous master's student for my analysis, developing the methods that I will eventually apply to my own data once I gather the second round of survey data next summer.

The questions I am hoping to answer are:

  • What are the most effective methods for making an accurate raster interpolation from survey points?
  • What is the best way to quantify geomorphic change?
  • How does stream size affect the amount of geomorphic change caused by the addition of a LW jam?


To conduct my analysis, I will use topographic data that I gathered using a Total Station over two separate summers, 2015 and 2016. The 2015 survey represents the stream topography before the LW was added, while the 2016 represents the topography after the LW was added and has interacted with the stream bed over one winter high flow season.

These topographic data consist of XYZ values that make up stream cross sections. The cross sections were spaced one half bankfull width apart. Points on the cross section were taken at the vertical and horizontal inflection points. The points were collected at a relatively fine scale, with an average of one point collected per 0.25-1 meter. Points were collected at a higher density in areas with more variation in elevation, usually the stream channel, while points were spaced farther apart on the surrounding floodplain, where the ground is usually flatter with less change in elevation. These values were exported as a .csv to ArcGIS after the survey was completed.

The survey points were not georeferenced. Instead, a control network of known survey points was established to ensure repeatability for revisit surveys. The first benchmark point was designated as the datum and assigned an arbitrary coordinate of 10000 m, 10000 m, 1000 m. Then other benchmarks were surveyed in using the high precision Total Station setting.

Possible sources of error in the data can be linked to field data collection methods. Examples include: the Total Station not being set up exactly level and aligned over the benchmark points, the benchmark point moving slightly, or the survey rod not being held exactly perpendicular to the ground surface on the point that is being marked. All of these variables can introduce small errors into the resulting survey points. Much care was taken to eliminate as much uncertainty in the data as possible, but it is certain that there were at least a few centimeters of error incorporated throughout the survey.

Figure 1. Points gathered at the mainstem Mill Creek site in 2015 and 2016.


My research focuses on geomorphic change at two levels: the reach level and the basin level. There have been many studies that examine geomorphic change induced by large wood jams at the reach level. Based on these previous studies, I would expect the channel to become more heterogeneous in terms of elevation and width after the addition of a LW jam. I would also expect the stream to get wider due to more inundation of the floodplain during winter high flows. Pools will form and deepen upstream of the jam.

At the basin level, I would expect LW jam reaches in streams of intermediate size to experience the most geomorphic change. This is based on the balance between two driving factors: percent contact between the LW and the stream channel (greatest at small sites) and size and duration of overbank flows (greatest at large sites). Since the data gathered by the previous master's student were not intended to compare differences in geomorphic change regulated by scale, it will be difficult for me to draw conclusions from this dataset, but I can still attempt to compare the amount of change between the three stream reaches.


Last term in GIS II, I experimented with different interpolation methods in ArcGIS to determine which method would produce the most accurate raster, based on comparison of interpolated raster elevation values and known survey point elevations. I also incorporated TIN editing processes to create more accurate DEMs. I will then use these rasters to compare the stream geomorphology before and after the addition of a LW jam.
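The accuracy comparison described above reduces to computing a root-mean-square error between interpolated raster values and withheld (check-point) survey elevations. A minimal Python sketch, with invented check-point values standing in for the real survey and raster data:

```python
import math

def rmse(predicted, observed):
    """Root-mean-square error between interpolated and surveyed elevations."""
    resid = [p - o for p, o in zip(predicted, observed)]
    return math.sqrt(sum(r * r for r in resid) / len(resid))

# Hypothetical check points: surveyed elevations (m) versus values sampled
# from two candidate rasters (e.g. an IDW and a TIN-based interpolation)
surveyed = [101.20, 101.45, 100.98, 101.10]
idw_vals = [101.25, 101.38, 101.05, 101.02]
tin_vals = [101.21, 101.44, 100.99, 101.12]
print(rmse(idw_vals, surveyed), rmse(tin_vals, surveyed))
```

Whichever method yields the lower RMSE at the check points would be preferred for building the final DEM.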

In this class I want to continue to refine my interpolation methods and incorporate geomorphic change detection analyses and statistics, including accounting for interpolation error and, ideally, field data collection error as well. I have recently been reading papers that present new ideas for minimizing interpolation error, including removing the trend in the data before interpolating and then adding the trend back in afterwards, so I am interested in exploring this method.
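The detrend-then-interpolate idea can be sketched as: fit a planar trend to the survey points by least squares, interpolate only the residuals, then add the trend surface back at the grid nodes. The NumPy sketch below uses invented survey points with a downstream slope; any interpolator could be dropped in at step 2.

```python
import numpy as np

# Hypothetical survey points: x, y, z (elevations with a downstream slope)
pts = np.array([
    [0.0, 0.0, 101.5], [5.0, 0.0, 101.1], [10.0, 0.0, 100.6],
    [0.0, 4.0, 101.6], [5.0, 4.0, 101.0], [10.0, 4.0, 100.5],
])
x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]

# 1. Fit a planar trend z ~ a*x + b*y + c by least squares
A = np.column_stack([x, y, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(A, z, rcond=None)

# 2. Interpolate the detrended residuals (with any interpolator you like)
resid = z - A @ coef

# 3. After interpolating resid onto the grid, add the trend back:
#    z_grid = resid_grid + a*x_grid + b*y_grid + c
def trend_at(xq, yq):
    return coef[0] * xq + coef[1] * yq + coef[2]

print(coef, resid)  # coef[0] is the downstream (x) slope
```

Removing the trend first keeps the large-scale valley slope from dominating the interpolation, so the interpolator only has to model the local residual topography.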

Once I am satisfied with my DEMs, I will use ArcGIS to subtract the “before” and “after” DEMs to create a DEM of difference (DoD). Then I will use either ArcGIS or R to quantify the amount of geomorphic change (percent or raw) that occurred.
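The DoD step itself is a cell-by-cell subtraction, usually combined with a minimum level of detection (LoD) so that differences smaller than the propagated survey and interpolation error are treated as no change. A small NumPy sketch with hypothetical 3 × 3 DEM tiles (the elevations, cell size, and LoD value are all invented for illustration):

```python
import numpy as np

def dem_of_difference(after, before, min_lod):
    """DoD = after - before, with changes below the minimum level of
    detection (propagated survey + interpolation error) set to zero."""
    dod = after - before
    dod[np.abs(dod) < min_lod] = 0.0
    return dod

# Hypothetical 3x3 DEMs (m); min LoD of 0.05 m from an error analysis
before = np.array([[100.0, 100.2, 100.1],
                   [ 99.8,  99.5,  99.9],
                   [100.0, 100.1, 100.0]])
after  = np.array([[100.0, 100.1, 100.1],
                   [ 99.6,  99.2,  99.9],
                   [100.3, 100.1, 100.0]])
dod = dem_of_difference(after, before, min_lod=0.05)

cell_area = 0.25  # m^2 per cell, hypothetical raster resolution
print("net change (m^3):", dod.sum() * cell_area)
print("area of detectable change (%):",
      100 * np.count_nonzero(dod) / dod.size)
```

Summing positive and negative cells separately would give deposition and erosion volumes rather than just the net change.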


  • Figures quantifying geomorphic change between the two surveys in terms of stream elevation and width
  • Statistical relationships between amount of geomorphic change before and after the addition of a LW jam
  • Statistical relationships between amount of geomorphic change at sites of different size (based on upstream drainage area and bankfull width)
  • Error analyses to determine how much of the change is “real”, incorporating field data collection and interpolation error


Many engineered LW studies have been conducted that examine LW jam effects on stream geomorphic change and juvenile salmonid populations. There have been many reach-scale analyses and a few larger-scale studies that synthesize data from many watersheds but there has been little quantification of the geomorphic and biological responses to engineered large wood jams on the watershed scale within an individual basin. This kind of research can help refine stream restoration efforts and optimize resource allocation.


ArcGIS: Intermediate level (I spent last term working with it. I can perform most basic manipulations and figure out what I don’t know how to do from the help menus and the internet.)

Python: Beginner level (I have used it in two classes to perform basic calculations and plot data/results.)

R: Very limited experience (I used it a few times in my undergraduate Statistics class but that was 7 years ago so I will probably be very rusty.)

Spatial Relationships of Tsunami Prediction Data for Data Assimilation Use

Filed under: My Spatial Problem @ 1:06 pm

My Spatial Problem Blog Post

  1. A description of the research question that you are exploring.

The primary objective of my work in this course is to obtain spatial correlation structures in simulated tsunami prediction data. These correlation structures are needed to implement a data assimilation procedure which will be used for risk and uncertainty analysis. Data assimilation is essentially a process by which observations are incorporated into the model state of a numerical model [1]. To incorporate observations, specific information about the correlation structures in these data needs to be determined. In particular, the probability distributions and covariance matrices of the datasets will have to be determined or calculated.

The “research questions” I have are therefore: “What are the autocorrelation structures for the simulated tsunami inundation data?”, “How does the predicted tsunami inundation relate to the topography?”, “How can I automate the process of determining these correlation structures to incorporate into data assimilation processes?”

2. A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

I will be using nearshore tsunami simulation data that predicts sea surface elevation and fluid velocities for a stretch of coast in southern California near Port Hueneme. There are 27 possible data sets, each with a different initial tsunami source. These simulation data are provided by the Method of Splitting Tsunami (MOST) model, which is the standard model used at the NOAA Center for Tsunami Research (NCTR) [2]. Inundation data for various simulated tsunami events over a duration of 216000 sec (= 3600 min) and over a geographical area from (-119.2469462, 34.1384259) to (-119.1949874, 34.2000000) will be used. There are 3600 time steps for each data set on a 562 by 665 grid. The spacing between longitude and latitude points is uneven, but the data can be projected onto an even grid. A time series of sea surface elevation and fluid velocities is available for each geographical point (see Figure 1).

Figure 1 – Simulated tsunami inundation data using MOST v4 at Port Hueneme, CA for a simulated Cascadia source. Image is taken from the Performance Based Tsunami Engineering (PBTE) data explorer.

In addition to the tsunami data, I also have the topographical/bathymetric data for the region (which does not change in time) with the same spatial resolution as the gridded data. This data is presented in Figure 2.

Figure 2 – Topographic/bathymetric data for Port Hueneme, CA in geographic coordinates (left) and UTM projected coordinates (right).
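As a sketch of how the covariance matrices mentioned above might be computed, the snippet below treats the 27 source scenarios as an ensemble: one field of values per scenario, stacked so that each grid point becomes a variable. The grid here is shrunk to 20 points and filled with random stand-in values purely for illustration; the real inputs would be fields (e.g. maximum sea surface elevation) extracted from the MOST simulations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real output: 27 source scenarios, each
# giving one field of values on a small grid (20 points here instead of
# the full 562 x 665 grid).
n_sources, n_points = 27, 20
ensemble = rng.normal(1.0, 0.3, (n_sources, n_points))  # e.g. max elevation (m)

# Sample covariance matrix across the ensemble: one row/column per grid point
cov = np.cov(ensemble, rowvar=False)   # shape (n_points, n_points)

# The diagonal holds the per-point variances; the off-diagonal terms are
# the spatial covariances needed by the data-assimilation update.
print(cov.shape, cov.diagonal()[:3])
```

On the full grid the matrix would be enormous (373,730 × 373,730), so in practice the covariances are usually localized or computed for a subregion of interest.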

3. Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

The patterns in the data should reflect the physical characteristics of the local topography. I expect to see correlation patterns that reflect wave refraction, wave diffraction, shadowing effects, regions of dissipation, and wave focusing. Overall, any patterns observed from the analysis of the data should make physical sense given the coastal geometry of Port Hueneme. I also expect the correlation structures to facilitate the data assimilation process for analyzing uncertainties, but that is outside the scope of this course.

4. Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.

Without much experience with geospatial data analysis, I am generally pretty open minded to techniques that help to determine correlation structures of my data. Some minimal research into geospatial data analysis techniques suggests that tools such as variogram analysis/modelling and/or spatial autocorrelation techniques could be useful for what I want to get out of the data. Since I am unfamiliar with all of these techniques, I intend to learn about any of the techniques I choose to implement for analysis.
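Of the techniques mentioned, the empirical variogram is straightforward to compute by hand: for every pair of points, bin half the squared difference of values by separation distance. A pure-Python sketch on invented 1-D data (a smooth function standing in for sea surface elevation along a transect):

```python
import math

def empirical_variogram(coords, values, bin_width, n_bins):
    """Semivariance 0.5 * mean((z_i - z_j)^2) binned by separation distance."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            b = int(d // bin_width)
            if b < n_bins:
                sums[b] += 0.5 * (values[i] - values[j]) ** 2
                counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Hypothetical 1-D transect of smoothly varying elevations:
coords = [(float(i), 0.0) for i in range(20)]
values = [math.sin(i / 4.0) for i in range(20)]
gamma = empirical_variogram(coords, values, bin_width=2.0, n_bins=5)
print(gamma)  # semivariance rises with lag for autocorrelated data
```

A variogram model (spherical, exponential, Gaussian, etc.) fitted to these binned semivariances would then supply the range and sill parameters that summarize the correlation structure.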

5. Expected outcome: what do you want to produce — maps? statistical relationships? other?

I expect to be able to produce statistical relationships, namely correlation structures, possibly variograms, and covariance matrices. Additionally, I hope to be able to visualize them in some form so as to confirm their validity and for quick comparison from one tsunami source to another.

Given the amount of data at my disposal, it would be nice to develop an automated process for producing a geospatial analysis (relationships, maps, etc.) at any given time step of my data. This would be greatly beneficial to my work and could be incorporated into the data assimilation procedure in the future.

6. Significance. How is your spatial problem important to science? to resource managers?

Tsunami inundation data have been very difficult to measure in the field throughout history due to the inherent dangers of having people or instrumentation in the inundation zone. This fact, combined with the relative infrequency of tsunami events, forces managers and disaster planners to rely on prediction data from tsunami models [3]. The problem, however, is that these models are deterministic and do not give a good idea of the uncertainty of their predictions. This can be problematic when relying on these predictions to perform risk analyses.

To give managers better tools to plan for tsunami disasters, this spatial problem seeks to aid the methodology that provides uncertainty estimates for these predictions. Assessment of uncertainties for these model predictions can allow for more comprehensive and effective risk management. At the scale of the problem in this course, the spatial analysis may simply highlight specific coastal features that may be particularly prone to dangerous inundation conditions during a tsunami disaster. Additionally, the spatial analysis could highlight tendencies of the tsunami prediction model that may or may not be desirable.

From a scientific perspective, this spatial problem is a stepping stone for an application of data assimilation that has yet to be performed. Additionally, spatial analysis of the model output data could provide valuable insight to how different features of a coastline might relate to one another in a tsunami inundation situation.

7. Your level of preparation: how much experience do you have with (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, or (d) other relevant spatial analysis software

As of the start of this course I have no experience with Arc, GIS, R, or Python. I have sufficient experience in MATLAB and FORTRAN. I'm hoping to learn some combination of Arc, R, and Python and use those tools to perform the spatial analysis for my problem.


[1] Data assimilation. European Centre for Medium-Range Weather Forecasts (ECMWF). Retrieved from

[2] Numerical modeling of tidal wave runup, VV Titov, CE Synolakis (1999). Journal of Waterway, Port, Coastal, and Ocean Engineering 124 (4), 157-171,

[3] Wegscheider, S., Post, J., Zosseder, K., Mück, M., Strunz, G., Riedlinger, T., Muhari, A., and Anwar, H. Z.: Generating tsunami risk knowledge at community level as a base for planning and implementation of risk reduction strategies, Nat. Hazards Earth Syst. Sci., 11, 249-258, doi:10.5194/nhess-11-249-2011, 2011.

Spatial Relationships of Vegetation in Restored and Remnant Salt Marshes, Salmon River Estuary, Oregon

Filed under: My Spatial Problem @ 12:56 pm

My Spatial Problem Blog Post


  1. A description of the research question that you are exploring.

Salmon River is one of the smallest estuaries on the northern Oregon coast (254 acres), with the largest proportion of tidal marsh (68%) for any Oregon estuary. It borders Tillamook and Lincoln counties, and is designated as an Important Bird Area by The National Audubon Society.  Conservation Research at Salmon River Estuary has been a focus of government, non-profit, and educational institutions since the 1970’s due to concern over salmonid habitat and the impacts of sea level rise on the coast. Salmon River consists of public and protected wetlands that have been restored and protected since the U.S. Forest Service removed dikes from three sites in 1978, 1987, and 1996. Tidal flow to the ocean is currently unobstructed on sites that were previously used as pastureland and/or diked. One wetland on the estuary was never diked and is used as a reference marsh for field research, to determine functional equivalency of restored marshes. Salmon River Estuary has been a site for place-based restoration and studies of community recovery over the last 35 years (Flitcroft et al. 2016).

Factors influencing the richness and environmental integrity of Salmon River are associated with the physiognomic and taxonomic features of the plant community. This study focuses on the spatio-temporal distribution patterns of Salmon River vegetation to explore how remnant and restored marshes differ in terms of biodiversity and species composition. I expect that different durations of tidal exclusion through dike establishment will reveal differences between sites in the context of plant species composition and distribution. The biological research questions are as follows: What are the differences between remnant and restored marshes in the context of ecological gradients (tideland to upland)? How can those differences be explored with vegetation and soil data? How do these differences reflect land use history in this ecosystem? The methodological questions of interest for this research include: How do field methods, specifically nested, rectangular and non-nested, square (transect) plots capture spatial variation of vegetation differently? Do these field methods (transect versus nested plot sampling) differ in capturing species diversity? Ultimately, I would like to propose answers to who lives with whom, and why, for this specific estuary.


  1. A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

I have collected species data (ocular estimations of percent coverage) from 1 m2 plots on transects from four tidal marshes: Mitchell Marsh (dike removed 1978), Y Marsh (dike removed in 1987), Salmon Creek Marsh (dike removed 1996), and one remnant marsh adjacent to Y marsh as a control (never diked). I also collected soil samples at each sampled transect plot, and tested them for salinity, conductivity, and bulk density, as well as nitrogen content. Transect plots were square shaped plots, 50 m apart in increasing distance from the tide.  My objective for data analysis is to describe the spatial patterns of the vegetation communities in tidal marshes of Salmon River Estuary after dike removal.

Stohlgren plots, also known as modified Whittaker (MW) plots, were established at each marsh site to collect data on species abundance for comparison with transect data. MW plots were implemented to test for patterns of diversity at multiple scales beyond what the square-meter transect plots may capture within the same site. The restored and remnant sites have three MW plots each. The MW plots were placed at a random distance, 50 m along and 20 m offset from the sampled transects at each marsh, for a stratified random sample design. At each MW plot, percent cover and presence/absence of species were estimated (with the aid of 1-meter-square grids). Samples of all species identified were collected, pressed, and are being examined to confirm identification. Elevation data were obtained from LiDAR surveys in 2015 and used to pinpoint elevation for all plots.

The datasets for my project include spreadsheets that describe percent vegetation cover, elevation, and soil characteristics per transect plot. MW plots were not sampled for soil and thus only have percent vegetation cover and elevation. The spatial grain of my study is one square meter, represented by the size of my smallest sampling frame for plots. With the nested sample plots and my stratified random sampling techniques, I have multiple spatial extents for this project. One extent could be considered the length of a transect (which vary by tidal marsh), the area of a MW plot (1,000 m2), or could arguably be extended to the entire Estuary. There are some interesting temporal aspects to my study as well; three of the four tidal marshes have experienced successive dike removal. These marshes have been surveyed for vegetation cover post dike removal and every 5 years subsequently. Incorporating these historical data will add dimension to my spatial and temporal analysis of variation at my study site.

Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

I expect that plots from the reference marsh (Transect C, adjacent to Y Marsh) will be more diverse and heterogeneous than tidal marshes that have been diked. I predict sites that have experienced dike removal more recently will have different species composition and be less diverse compared to reference sites. I also predict that time since dike removal will increase similarities between reference and restored sites. I anticipate that sites which have experienced recent dike removal have soils with higher bulk density, higher nitrogen content, and higher salinity and conductivity readings, compared to remnant marsh plots. These differences are likely due to successive dike removal and differences between tidal marsh sites.

  1. Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.

I would like to explore more statistical analyses with additional data related to my site. I have accessed historical data from this site related to vegetation cover on established transect plots. I want to compare the vegetation coverage I surveyed in July 2016, to coverage of decades prior with ordinations (Non-metric Multidimensional Scaling), MRPP (Multi-Response Permutation Procedures), and ISA (Indicator Species Analysis). Most of this will be achievable with PC-Ord, though there are likely opportunities with Python/ArcInfo and R as well. I also want to explore vegetation cover data in species area relationships, with species accumulation curves. There are many software options with which I can do this as well.
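One of the simpler analyses mentioned above, a species accumulation curve, can be sketched directly: add plots in random order many times and record the mean cumulative species count at each step. The plot-by-species presence sets below are invented placeholders (the four-letter codes are hypothetical), not the actual Salmon River data.

```python
import random

def accumulation_curve(plot_species, n_perm=200, seed=1):
    """Mean number of species accumulated as plots are added in random order."""
    rng = random.Random(seed)
    n = len(plot_species)
    totals = [0.0] * n
    for _ in range(n_perm):
        order = plot_species[:]
        rng.shuffle(order)
        seen = set()
        for k, plot in enumerate(order):
            seen |= plot        # union in this plot's species
            totals[k] += len(seen)
    return [t / n_perm for t in totals]

# Hypothetical plot-by-species presence data from one marsh:
plots = [{"CALY", "DISP", "JUBA"}, {"DISP", "SAVI"},
         {"CALY", "DISP"}, {"JUBA", "TRMA", "DISP"}, {"SAVI", "CALY"}]
curve = accumulation_curve(plots)
print(curve)  # rises toward the total richness (5 species here)
```

Comparing the shapes of these curves between the transect and MW sampling designs would show which method accumulates species faster per unit of sampling effort; packages such as vegan in R (specaccum) implement the same idea with confidence intervals.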

  1. Expected outcome: what do you want to produce — maps? statistical relationships? other?

I plan to focus primarily on producing and examining statistical relationships with my research. My project is a highly-localized case study, with a temporal component. With statistical software, I will be able to produce graphics that will aid in visual interpretation of patterns as well, in conjunction with statistics. From that point I would like to connect my statistical analyses with biological interpretations to ground my work in context and understand how the Estuary is changing over time, post dike removal.

  1. How is your spatial problem important to science? to resource managers?

The spatial problem I have chosen to investigate is of importance to scientists, as it provides further insight into how Pacific Northwest Coastal Estuaries recover from land use and disturbance, a phenomenon that has not been thoroughly studied yet. Estuaries have historically served as habitat and resources for keystone salmonid fish species, invertebrates, migratory birds, waterfowl, and mammals (such as beavers), particularly in the Pacific Northwest (Klemas 2013). Restoring these habitats is critical for protecting wildlife, managing wetland resources and eco-services, and maintaining our shorelines, especially as we face sea level rise from impending climate change. Threats to environmental stability in the case of wetlands can also harm their cultural value. Wetlands have inherent eco-beauty and are among the many natural systems associated with outdoor recreation. If wetlands are disturbed via pollution, compaction, or compositional change in vegetation, little is understood about the certainty of recovery in the context of reference conditions.

Salmon River Estuary is especially noteworthy due to its unique management history. Salmon River is exceptional compared to other Oregon coastal wetlands, as it was federally protected before any industries could establish influence. Arguably, Salmon River has avoided most disturbance from development because of its relatively small size; there have been no instances of dredging or jetty construction for the purposes of navigation. Previous use as pastureland, with dikes established in the 1960s and removed from 1978 onwards, encapsulates the majority of known human influence on the marsh. Beginning in 1978, periodic vegetation surveys paired with long-term ecological research at Salmon River have created intimate knowledge of the estuary and promoted ecological sustainability.

There has been a dramatic shift with regards to wetland protection and how our government and the public views them. Over the last few decades, policies promoting wetland conversion and development have been exchanged for protection and regulation initiatives. Wetland management goals today are largely focused on restoration to compensate for loss and damage, which has forged new industries tasked with wetland recovery and monitoring. However, some mitigation project datasets and sites are too small to collect useful data or make a meaningful impact on a large environmental scale. It is necessary to amass a variety of high quality data on larger wetland areas over longer periods of time to address how natural recovery processes may be employed for wetland conservation. Salmon River is an excellent long term case study for examining the prospect of rehabilitation for ecosystem functionality and reference conditions (Frenkel and Moran 1991; Frenkel 1995; Flitcroft et al. 2016).

  1. Your level of preparation: how much experience do you have with (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, or (d) other relevant spatial analysis software

I have previous experience with Arc that I have developed and expanded upon at Oregon State while pursuing my MS in Geography. I have taken Dr. Robert Kennedy’s Python Programming for GIS course, which introduced me to Python and provided additional experience with Modelbuilder. I have limited experience with R (I have only used it a handful of times). Ultimately, I have analyzed most if not all of my data thus far in PC-ORD, under the guidance of Dr. Bruce McCune (the software creator), which enables a variety of ordination and multivariate analysis options.

Literature Cited

Adamus, P.R., J. Larsen, and R. Scranton. 2005. Wetland Profiles of Oregon's Coastal Watersheds and Estuaries. Part 3 of a Hydrogeomorphic Guidebook. Report to Coos Watershed Association, US Environmental Protection Agency, and Oregon Department of State Lands, Salem.

Borde, AM, Thome, RM, Rumrill, S, and Miller, LM. 2003. Geospatial habitat change analysis in Pacific Northwest coastal estuaries. Estuaries 26: 1104-1116.

Flitcroft, RL, Bottom, DL, Haberman, KL, Bierley, KF, Jones, KK, Simenstad, CA, Gray, A, Ellingson, KS, Baumgartner, E, Cornwell, TJ, and Campbell, LA. 2016. Expect the unexpected: place-based protections can lead to unforeseen benefits. Aquatic Conservation: Marine and Freshwater Ecosystems 26: 39-59.

Mather, P.M. 1976. Computational Methods of Multivariate Analysis in Physical Geography. J. Wiley & Sons, London. 532 pp.

McCune, B. and J.B. Grace. 2002. Analysis of Ecological Communities. MjM Software, Gleneden Beach, Oregon, USA.

National Soil Survey Center, Natural Resources Conservation Service, U.S. Department of Agriculture. 2009. Soil Survey Field and Laboratory Methods Manual. Soil Survey Investigations Report No. 51.

Stohlgren, TJ, Falkner, MB, and Schell, LD. 1995. A modified-Whittaker nested vegetation sampling method. Vegetatio 117: 113-121.

Weilhoefer, C, Nelson, WG, Clinton, P, and Beugli, DM. 2013. Environmental determinants of emergent macrophyte vegetation in Pacific Northwest estuarine tidal wetlands. Estuaries and Coasts 36: 377-389.


Habitat use of blue whales in New Zealand’s industrial South Taranaki Bight region

Filed under: My Spatial Problem @ 12:44 pm

Research objectives

My research focuses on the ecology of blue whales (Balaenoptera musculus brevicauda) in New Zealand’s South Taranaki Bight region (STB). Despite the recent documentation of a foraging ground in the STB (Torres et al. 2015), blue whale distribution remains poorly understood in the southern hemisphere. The STB is New Zealand’s most industrially active marine region, and the site of active oil and gas extraction and exploration, busy shipping traffic, and proposed seabed mining (Torres 2013). This potential space-use conflict between endangered whales and industry warrants further investigation into the spatial and temporal extent of blue whale habitat in the region. My goals are to investigate the relationship between blue whale presence and their environment, and subsequently to examine how their space-use overlaps with industry presence. Specifically, I intend to:

  • Quantify the relationship between sea-surface temperature (SST), chlorophyll-a (chl-a), krill density, and blue whale presence
  • Investigate the spatial overlap between blue whale presence and oil and gas extraction platforms, the Trans-Tasman Resources Ltd. proposed seabed mining site, and shipping traffic

Map of New Zealand with the South Taranaki Region indicated by the white box.


A blue whale surfaces in front of an oil rig in the South Taranaki Bight. Photo by Kristin Hodge.


I will be working with data collected during vessel-based surveys in February of 2014, 2016, and 2017 in the STB. Blue whale sighting location and group size were recorded by observers during the surveys. To record the oceanographic conditions throughout the water column, profiles of water column depth, temperature, and salinity were recorded using a Sea-Bird microCAT (SBE 911plus) Conductivity, Temperature and Depth (CTD) sensor approximately every hour during the survey and at every blue whale sighting. Krill density and patch size will be quantified from hydroacoustic backscatter data collected with a Simrad EK60 echosounder (Simrad ES120-7DD splitbeam transducer, 120kHz transceiver, 250 W, 1.024 ms pulse length, 0.5 s ping rate). The echosounder data have not yet been processed; however, I hope to do so this term so that the prey data can be included in these analyses.

In addition to the in situ data collected during our surveys, I plan to incorporate satellite imagery of SST and chl-a concentration for the region. I will use satellite data generated from NASA Moderate Resolution Imaging Spectrometer (MODIS).

I have been provided with the locations of the oil and gas drilling platforms and the proposed site for the iron sands seabed mine. I will use ship automatic identification system (AIS) data for the shipping traffic layer.


  • Blue whale presence will show a positive relationship with chl-a concentration and krill density
  • Blue whale presence will show a negative relationship with SST
  • There will be apparent inter-annual differences in blue whale sighting distribution, reflecting the strong El Nino conditions seen in 2016
  • Blue whale presence will overlap spatially with industrial activities


I plan to use ArcMap to visualize blue whale presence and the layers of oceanographic data I have described previously (in situ SST and prey density, remote-sensed SST and chlorophyll-a). I will then use R to compute either a generalized linear model (GLM) or generalized additive model (GAM) to evaluate the association between blue whales and these oceanographic variables, with blue whale presence as the response variable.
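As a sketch of the GLM option, the snippet below fits a logistic regression (a binomial GLM) by iteratively reweighted least squares, with presence as the response and SST as a single predictor. The sightings data are simulated under an assumed negative SST effect purely for illustration; the real model would add chl-a and krill density as covariates, and in R, glm() or mgcv::gam() would handle the fitting directly.

```python
import numpy as np

def fit_logistic_glm(X, y, n_iter=25):
    """Logistic-regression GLM fit by iteratively reweighted least squares.
    X: predictor matrix (with intercept column); y: 0/1 presence."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))            # predicted presence probability
        w = mu * (1.0 - mu)                        # IRLS weights
        z = eta + (y - mu) / np.maximum(w, 1e-8)   # working response
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, X.T @ (w * z))
    return beta

# Simulated sightings: presence more likely at cooler SST (assumed effect)
rng = np.random.default_rng(42)
sst = rng.uniform(14, 20, 200)
p_true = 1.0 / (1.0 + np.exp(-(20.0 - 1.2 * sst)))
presence = (rng.random(200) < p_true).astype(float)

X = np.column_stack([np.ones_like(sst), sst])
beta = fit_logistic_glm(X, presence)
print(beta)  # negative SST coefficient -> presence declines with warmer water
```

A GAM would replace the linear SST term with a smooth function, which is often more realistic for thermal responses that peak at intermediate temperatures.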

For the overlap between blue whale presence and industry, I intend to use ArcMap to visualize this spatial overlap by creating buffers around the stationary platforms and the proposed mining site and examining how often blue whale sightings took place within those buffers.
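The ArcMap buffer analysis can be prototyped (or cross-checked) with a simple great-circle calculation: a sighting falls "within the buffer" if its haversine distance to any platform is below the buffer radius. All coordinates and the 10 km radius below are hypothetical, not the actual platform locations.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def sightings_in_buffer(sightings, platforms, radius_km):
    """Count whale sightings falling within radius_km of any platform."""
    return sum(
        any(haversine_km(slat, slon, plat, plon) <= radius_km
            for plat, plon in platforms)
        for slat, slon in sightings)

# Hypothetical coordinates (lat, lon):
platforms = [(-39.95, 174.20), (-39.70, 173.80)]
sightings = [(-39.96, 174.22), (-40.40, 174.90), (-39.71, 173.82)]
print(sightings_in_buffer(sightings, platforms, radius_km=10.0))  # 2
```

Dividing this count by the total number of sightings gives the proportion of whale observations that overlap the industrial footprint for a given buffer size.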


I intend to produce map figures that will show oceanographic measurements interpolated over our study area, overlaid with our blue whale sighting locations for each year of study. I hope to be able to report the model results that quantify the impact of our measured oceanographic conditions on blue whale presence.

My priority for this course is the habitat analysis and modeling. The industry overlap will be more of a visual examination at this stage, and more quantitative analyses of impacts will take place subsequently once a foundational understanding of blue whale habitat use in the region has been established.


Despite their enormous size and once-large global population, relatively little is known about blue whale distribution and habitat use due to their elusive nature and relative inaccessibility for study. Blue whales have extremely high metabolic demands in addition to employing an energetically expensive lunge-feeding foraging strategy (Croll et al. 1998, Goldbogen et al. 2011, Hazen et al. 2015). The ability to consistently locate dense patches of prey is therefore critical to blue whale survival, making the documentation of foraging grounds such as the STB region important. The STB region presents a unique opportunity for studying this species in a location where they seem to be consistently found in high abundance and relatively close to shore. But beyond their relative accessibility, the understanding of blue whale habitat use on a foraging ground will contribute to the body of knowledge on these endangered and little-studied whales.

Under the New Zealand threat classification system, blue whales are currently listed as ‘Migrant’ in New Zealand waters. However, our preliminary analyses point toward the possibility of a resident population of blue whales in New Zealand. Observations of foraging, breeding, and nursing behaviors suggest that the STB likely supports multiple critical life history functions. The strong industrial presence in the region and the ongoing push for industry expansion make an understanding of the spatial and temporal extent of blue whale habitat use critical for management decisions.

A blue whale mom and calf surface in the South Taranaki Bight. Photo by Dawn Barlow.


I have some experience with Arc and R from coursework and my own preliminary analyses, and I expect to grow more confident with both through time and practice. I have used MATLAB somewhat, but mostly as a platform for running specialized software specific to my field. I have no experience with Python.


Croll DA, Tershy BR, Hewitt RP, Demer DA, Fiedler PC, Smith SE, Armstrong W, Popp JM, Kiekhefer T, Lopez VR, Urban J, Gendron D (1998) An integrated approach to the foraging ecology of marine birds and mammals. Deep-Sea Research Part II 45:1353+

Goldbogen JA, Calambokidis J, Oleson E, Potvin J, Pyenson ND, Schorr G, Shadwick RE (2011) Mechanics, hydrodynamics and energetics of blue whale lunge feeding: efficiency dependence on krill density. J Exp Biol 214:131–146

Hazen EL, Friedlaender AS, Goldbogen JA (2015) Blue whales (Balaenoptera musculus) optimize foraging efficiency by balancing oxygen use and energy gain as a function of prey density. Sci Adv 1:e1500469–e1500469

Torres LG (2013) Evidence for an unrecognised blue whale foraging ground in New Zealand. New Zeal J Mar Freshw Res 47:235–248

Torres LG, Gill PC, Graham B, Steel D, Hamner RM, Baker S, Constantine R, Escobar-Flores P, Sutton P, Bury S, Bott N, Pinkerton M (2015) Population, habitat and prey characteristics of blue whales foraging in the South Taranaki Bight, New Zealand.

Nitrate Abundance in 10 Oregon Watersheds as linked to percentage of Red Alder Forest Cover

Filed under: My Spatial Problem @ 11:31 am

Research Description

I am exploring aquatic chemistry in 10 streams in the Oregon Coast Range, running anion analyses on samples collected monthly over a one-year period. The streams fall into three categories: three are located east of Mary’s Peak in the Rock Creek watershed, three near the top of Mary’s Peak (highest in elevation), and four on the Oregon coast within 200 meters of the Pacific Ocean. I would like to compare anions among the three categories and determine whether anion abundance differs significantly among the three locations and over time (6 months of data collected so far).

Dataset Analysis

The dataset I will be analyzing includes temporal anion data (6 months so far; the project is still in progress) and GPS points for all 10 site locations. The anions analyzed include chloride, sulfate, fluoride, and nitrate. Nitrate may be the most interesting in terms of ecological impact (nitrogen is commonly a limiting nutrient in aquatic ecosystems), but chloride has also been abundant in the data collected to date.


I hypothesize that, due to the influence of nitrogen-fixing red alder in the forested watersheds of the Oregon Coast Range, the Oregon coast and Rock Creek streams will have higher measurable nitrate than the streams sampled near the top of the watershed (near Mary’s Peak). I also predict that nitrate will be higher in the fall (November and December) than in the spring (March and April) due to the “flushing” of nutrients from the forest floor after summer low flows.

Analysis Approach

I would like to perform statistical analyses of nitrate in these streams using R and to map the differences using ArcGIS. Ideally I would delineate the 10 watersheds and then compare the percentage of red alder cover in each watershed against the nitrate concentrations measured in its stream.
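As a sketch of the planned comparison (Python rather than R, with invented monthly nitrate values), a one-way ANOVA F statistic across the three site categories could be computed as:

```python
import numpy as np

def one_way_anova_F(groups):
    """F statistic for a one-way ANOVA across site categories:
    between-group mean square / within-group mean square."""
    all_vals = np.concatenate([np.asarray(g, float) for g in groups])
    grand = all_vals.mean()
    k, n = len(groups), len(all_vals)
    ss_between = sum(len(g) * (np.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical monthly nitrate (mg/L) for the three site categories:
rock_creek = [1.2, 1.4, 1.1, 1.3, 1.5, 1.2]
marys_peak = [0.3, 0.4, 0.2, 0.3, 0.5, 0.4]
coast      = [1.8, 2.0, 1.7, 1.9, 2.1, 1.6]
F = one_way_anova_F([rock_creek, marys_peak, coast])
```

In practice this would be `aov(nitrate ~ category)` in R (or `scipy.stats.f_oneway` in Python), with a repeated-measures or mixed-model extension to handle the monthly time component.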

Expected Outcome

I would like to produce a map showing nitrate export at all 10 study sites, along with statistical evidence for differences among the categorical zones. The map will include the delineated watersheds (one watershed is “nested”, so the output will likely show 9), ranked by red alder abundance and nitrate concentration. I would also like to explore a statistical comparison of the three categorical locations and their nitrate concentrations.


Baseline anion data for Oregon Coast Range streams have not previously been collected over a full year. In the face of a changing climate, it is important to document and characterize stream chemistry for future forest managers. Increases in nitrate in these streams can affect local flora and fauna as well as increase nutrient loading to the ocean. This loading can degrade macroinvertebrate and salmon habitat, altering the ecosystem dynamics and services that currently exist.

Level of preparation

I have basic knowledge of R, but I think it would be the most useful tool for statistical analysis of these data. I once used ArcInfo daily, but that was many years ago. I would like to map the anion data across the 10 sites but may struggle at first with linking the data to Arc. I have no experience with Python or other relevant spatial analysis software.

Patch dynamics for short-interval disturbances of beetle outbreak and wildfire

Filed under: My Spatial Problem @ 11:05 am

Research Question

What controls high-severity patch size during wildfire when landscapes contain large portions of forest killed by a prior beetle outbreak?


Fire is a complex landscape process that is controlled by the heterogeneity of vegetation/fuels, topography, and weather.  The landscapes created by fire may dictate forest structure and function for decades. In central interior British Columbia, lodgepole pine (Pinus contorta var. latifolia) forests have been the epicenter of a recent regional outbreak of mountain pine beetle (Dendroctonous ponderosae), which has resulted in widespread tree mortality and potentially increased forest vulnerability to wildfire. While lodgepole pine landscapes are accustomed to large-scale singular disturbances of beetle outbreaks and wildfire, beetle outbreaks followed by wildfires are less familiar. Warm winters and an abundance of mature lodgepole pine have facilitated the unprecedented scale of beetle activity in the region. The resulting tree mortality alters vegetation/fuel structure, fire behavior, and subsequent fire severity. It is unclear if the resulting landscape structure from beetle outbreak influences the subsequent patchwork from fire.  A number of regional studies have examined the controls of post-fire patch size under a single disturbance of wildfire (Collins and Stephens 2010, Harvey et al. 2016, Reilly et al. 2017).  An important knowledge gap I have identified is the role of underlying disturbance structure on patch dynamics during wildfire events.

In this course, my primary objective is to determine the controls on high-severity patch size under conditions of consecutive landscape-scale disturbances of beetle outbreak and fire. To do this, there are a number of intermediary questions to ask and answer:

  1. What are the patch sizes following beetle outbreaks?
  2. What are the patch sizes following wildfire?
  3. Which patches burned at high severity?
  4. What are the topographic roughness characteristics associated with each high severity patch?
  5. What is the radiative output (proxy for burning conditions) for each high-severity patch during day of burn?
  6. What is the dominant vegetation type and moisture level associated with each high severity patch prior to disturbance?
  7. What is the relationship between each explanatory variable (topography, burning conditions, vegetation, and beetle outbreak patch size) and the response variable, high-severity patch size?


Figure 1. Post-fire photo of a fire that burned through beetle-killed forest in 2014 with various patches of fire severity.

 Description of datasets

  1. The spatial extent of all layers is defined by the fire perimeters of three fires that burned in 2012, 2013, and 2014.
  2. Tree mortality from the bark beetle outbreak has been identified by calculating a Normalized Difference Vegetation Index (NDVI) layer derived from Landsat 30m resolution from one year prior to each fire event.
  3. Fire severity patches will serve as the response variable. Fire severity patches were generated by calculating the Relative differenced Normalized Burn Ratio (RdNBR) from 30 m Landsat imagery (Miller and Thode 2007): an NBR image is generated for both the pre- and post-fire scenes, dNBR is their difference, and RdNBR divides dNBR by the square root of the absolute pre-fire NBR to minimize the bias of pre-fire conditions. This relativization process allows comparison across fire events with minimal biasing from differences in pre-fire conditions.
  4. The pre-disturbance landscape structure including vegetation and moisture types will be based on a polygon shape file with the biogeoclimatic zones for the region. Biogeoclimatic zones are a specific classification system for British Columbia, Canada. Zones are based on climax vegetation while accounting for climatic and edaphic conditions (Meidinger and Pojar 1991).
  5. The gently rolling landscape of the central interior offers little topographic relief. While topography may not be a strong control over patch size, we will account for topographic variability with a layer for topographic roughness. Topographic roughness was calculated from a 3-by-3 moving window using a 25 m Digital Elevation Model (DEM).
  6. Weather is considered a strong control on fire severity and patch size. Day-of-burn weather conditions are based on the thermal band hotspot data from MODIS. The hotspot data are a point shapefile that includes a recording of radiative energy at specific point locations.
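The RdNBR calculation referenced in item 3 can be sketched as follows (Python, with NBR scaled by 1000 following the Landsat convention used by Miller and Thode 2007; the band reflectance values are illustrative only):

```python
import numpy as np

def nbr(nir, swir2):
    """Normalized Burn Ratio from NIR and SWIR2 reflectance, scaled by 1000."""
    return 1000.0 * (nir - swir2) / (nir + swir2)

def rdnbr(pre_nbr, post_nbr):
    """Relative differenced NBR (Miller and Thode 2007): dNBR divided by
    the square root of the absolute pre-fire NBR (in thousands)."""
    dnbr = pre_nbr - post_nbr
    return dnbr / np.sqrt(np.abs(pre_nbr / 1000.0))

pre = nbr(0.45, 0.15)    # healthy pre-fire pixel -> NBR = 500
post = nbr(0.20, 0.30)   # burned post-fire pixel -> NBR = -200
print(round(rdnbr(pre, post), 1))  # → 989.9
```

The same arithmetic applies per pixel when `nir` and `swir2` are full raster arrays, which is how the severity layers would actually be produced.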


My ecological hypothesis is that burning conditions (i.e., weather on the day of burn) exert strong control over patch size. Fire-conducive weather (dry, windy conditions) drives large patch development by facilitating crown fire in lodgepole pine dominated landscapes. Beetle outbreaks change the landscape structure by killing trees, which decreases the continuity of canopy fuels. The lack of canopy fuels may constrain fire to the surface and thus limit patch size. However, fire weather may still override the changes in fuel arrangement and continuity created by beetle outbreak, with dry, windy conditions producing large patches regardless of the underlying patch characteristics from the outbreak.


  1. Calculate patch variables for tree mortality from the beetle outbreak using Fragstats (McGarigal et al. 2012). Characterize patch size, complexity, and configuration.
  2. Calculate patch variables for high-severity patches using Fragstats (McGarigal et al. 2012). Characterize patch size, complexity, and configuration.
  3. Overlay layers and extract explanatory variables (topographic roughness, radiative energy, beetle outbreak patch size) that correspond to individual patches in ArcMap.
  4. Conduct multiple regression analysis to determine which variables influence high severity patch size using R statistical software.
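The patch-size calculations in steps 1 and 2 can be approximated outside Fragstats with a connected-component pass over a binary severity raster; the sketch below (Python, with an invented toy map) is a minimal stand-in for the Fragstats patch-area metric:

```python
import numpy as np

def patch_sizes(binary_map):
    """Patch sizes (pixel counts) for a binary raster using 4-neighbor
    connectivity, largest first -- a minimal stand-in for the patch-area
    metric Fragstats would report."""
    grid = np.asarray(binary_map, bool)
    seen = np.zeros_like(grid, bool)
    rows, cols = grid.shape
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] and not seen[r, c]:
                stack, size = [(r, c)], 0          # flood fill one patch
                seen[r, c] = True
                while stack:
                    i, j = stack.pop()
                    size += 1
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < rows and 0 <= nj < cols \
                                and grid[ni, nj] and not seen[ni, nj]:
                            seen[ni, nj] = True
                            stack.append((ni, nj))
                sizes.append(size)
    return sorted(sizes, reverse=True)

# Toy high-severity map: one 3-pixel patch and one isolated pixel
severity = [[1, 1, 0, 0],
            [0, 1, 0, 1],
            [0, 0, 0, 0]]
print(patch_sizes(severity))  # → [3, 1]
```

Multiplying each pixel count by the cell area (900 m² for 30 m Landsat pixels) converts these counts to patch areas.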

Expected Outcomes

I would like to produce maps showing characteristics of the landscape. I would also like to have analysis showing if there is a relationship between any of these variables and patch size.


Overlapping disturbances of beetle-outbreak and wildfire will influence landscape structure and function for decades in central interior British Columbia, Canada.  It is important to characterize patch dynamics to generate ecological context for overlapping disturbances. This research can provide insight on patterns and controls for fires that were allowed to burn unhindered by suppression, which is important for understanding contemporary disturbance regimes and providing guidance for management decisions.

Level of Preparation

  • Arc: fairly proficient
  • Model builder and Python: have used both, but probably a bit rusty; novice
  • R: growing knowledge; able to conduct statistical analysis and process Landsat imagery


Collins, B. M., and S. L. Stephens. 2010. Stand-replacing patches within a “mixed severity” fire regime: Quantitative characterization using recent fires in a long-established natural fire area. Landscape Ecology 25:927–939.

Harvey, B. J., D. C. Donato, and M. G. Turner. 2016. Drivers and trends in landscape patterns of stand-replacing fire in forests of the US Northern Rocky Mountains (1984–2010). Landscape Ecology 31:2367–2383.

McGarigal, K., A. Cushman, and E. Ene. 2012. FRAGSTATS v4: Spatial pattern analysis program for categorical and continuous maps. University of Massachusetts, Amherst, Massachusetts.

Meidinger, D. V., and J. Pojar. 1991. Ecosystems of British Columbia. B.C. Ministry of Forests, Victoria, B.C.

Miller, J. D., and A. E. Thode. 2007. Quantifying burn severity in a heterogeneous landscape with a relative version of the delta Normalized Burn Ratio (dNBR). Remote Sensing of Environment 109:66–80.

Reilly, M. J., C. J. Dunn, G. W. Meigs, T. A. Spies, R. E. Kennedy, J. D. Bailey, and K. Briggs. 2017. Contemporary patterns of fire extent and severity in forests of the Pacific Northwest, USA (1985-2010). Ecosphere 8:e01695.

Integrating Mobile and UAV-based Lidar Point Clouds to Estimate Forest Inventory

Filed under: 2017,My Spatial Problem @ 8:49 am

1. The research question I’m exploring is: “What relationship do spectral reflectance signatures (across 5 multispectral bands) of southwestern white pine (P. strobiformis) seedlings in common garden boxes have with seedling position within the boxes or within the plots?”

2. My raw data consist of 500 photos of common garden boxes in northeastern Arizona taken from a UAV. I used Agisoft Photoscan to compile the images into a 5-layer orthomosaic (Image 1), where each layer represents a discrete band from the multispectral sensor (Table 1). One of the 20 boxes was blurred in the reconstruction and omitted from subsequent processing, leaving 19.


Image 1: Orthomosaic image of common garden boxes containing southwestern white pine (P. strobiformis) seedlings in Kaibab National Forest, Arizona, USA.

Table 1: Micasense Rededge multispectral sensor band designations and spectral spans.

Processed mosaic images are georeferenced using RTK GPS coordinates for targets arranged around the perimeter of the AOI. Once georeferencing is complete, the stacked orthomosaic is exported from Photoscan as a TIFF file, which can be viewed and manipulated further in ArcMap or R. Spectral indices that reflect plant physiology, such as NDVI and TGI, can easily be added using raster algebra (Image 2).
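As an illustration of that raster algebra step (Python with toy reflectance arrays; the actual workflow would operate on the exported TIFF bands):

```python
import numpy as np

def ndvi(nir, red):
    """NDVI via raster algebra on band arrays: (NIR - Red) / (NIR + Red).
    Values near 1 indicate dense green vegetation; near 0, bare soil."""
    nir = np.asarray(nir, float)
    red = np.asarray(red, float)
    return (nir - red) / (nir + red)

# Toy 2x2 reflectance rasters (invented values)
nir = [[0.6, 0.5],
       [0.4, 0.2]]
red = [[0.1, 0.1],
       [0.2, 0.2]]
print(ndvi(nir, red))
```

Because the arithmetic is element-wise, the same expression works unchanged on full orthomosaic bands loaded with the `raster` package in R or `rasterio` in Python.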

Image 2: Normalized Differential Vegetation Index (NDVI) layer for one box of seedlings

3. Based on visual analysis of NDVI, there appear to be patterns at both the box and plot levels. My hypothesis is that spatial statistics will reveal correlations between the spectral reflectance signatures of individual plants and each plant’s location within its box and/or within the plot as a whole.

4. After the NDVI and TGI layers are added to the data, the real processing begins. Because plants are tightly spaced in some of the boxes, automatic segmentation of the individual seedlings was not an option. As a result, it was necessary to assign a point at the center of each plant in the AOI (1,697 seedlings from 19 boxes). Each seedling was assigned a ‘boxID’ field value corresponding to its box, and additional layers were created with points for the center of each box and for the center of the whole plot.


Non-vegetation features were masked from the data using an NDVI vegetation mask. All pixels within the boxes were classified into two categories (vegetation and non-vegetation) using the NDVI layer. Pixels classified as vegetation within the boxes were exported as a separate raster and, voila!, the mask was born (Image 3).

Image 3: Vegetation mask from NDVI layer for one box

The next postprocessing step is to create buffer zones around the plant points to which the plant pixels can be clipped. Rather than making the buffers uniform with some arbitrary diameter, I wanted a more repeatable and defensible method. The solution I came up with was to tailor each buffer to the distance to the closest neighboring point: the radius of each buffer is calculated as R = 0.45 × nn, where nn is the distance to the nearest neighbor (Image 4).
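That nearest-neighbor buffer rule can be computed directly from the point coordinates; a Python sketch with toy points (in practice the inputs would be the 1,697 seedling center points):

```python
import numpy as np

def buffer_radii(points, factor=0.45):
    """Per-seedling buffer radius = factor * distance to the nearest
    neighboring point, matching the R = 0.45 * nn rule."""
    pts = np.asarray(points, float)
    # Pairwise distance matrix; the diagonal (self-distance) is masked out
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return factor * d.min(axis=1)

# Toy points on a line: nearest-neighbor distances are 1, 1, and 2
pts = [(0, 0), (1, 0), (3, 0)]
print(buffer_radii(pts))  # → [0.45 0.45 0.9 ]
```

For the full point set, the O(n²) distance matrix would be replaced with a KD-tree query (e.g. `scipy.spatial.cKDTree`), but the radius rule is identical.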

Image 4: Plant buffer zones drawn according to 0.45 times the distance to the nearest neighboring point.

5. Although there will be many opportunities along the processing workflow to generate tables, figures, and images that help tell the story of this study, my main objective is to quantify the effects of box and plot position on mean canopy reflectance across the 5 bands and 2 spectral indices. To simplify this, I plan to run PCA to consolidate the spectral data into a smaller number of principal components, which can then be compared to the spatial variables using multiple linear regression.
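A minimal sketch of that PCA consolidation step (Python with an SVD rather than R's `prcomp`; the spectral values are simulated, not real canopy means):

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project per-seedling spectral variables onto the leading principal
    components (centered data, SVD-based PCA)."""
    X = np.asarray(X, float)
    Xc = X - X.mean(axis=0)                       # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # component scores

# Simulated data: 6 seedlings x 4 spectral variables, where the first
# two variables co-vary strongly (one dominant axis of variation)
rng = np.random.default_rng(1)
base = rng.normal(size=6)
X = np.column_stack([base,
                     2 * base + rng.normal(scale=0.01, size=6),
                     rng.normal(size=6),
                     rng.normal(size=6)])
scores = pca_scores(X, 2)
```

The resulting component scores would then serve as response variables in the multiple linear regression against box and plot position.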

6. If this method works, it would be significant because genetic trials could be evaluated remotely. Common garden experiments are widely used for detecting phenotypes that are resistant to disease, drought, or pests. Traditionally, on-the-ground evaluations of plant health including subjective visual surveys are used to identify candidates for resistance. This method sets up a workflow for objectively and remotely classifying seedlings from common garden boxes and automates much of the segmentation of individual plant canopy pixels.

7. I have a strong foundation in ESRI ArcGIS and Agisoft Photoscan. In the past 9 months, I have become familiar with R for a variety of applications and plan to use it extensively in my analysis. One of the major issues I expect to deal with is the size of the datasets I’m working with. The images and reconstructions can be 5-10 GB in raw form and tend to swell during processing. As a result, I would also like to use this class as an opportunity to think about how to make files more manageable by extracting only the information I’m interested in and using geometric models to simplify highly detailed representations.

Sensitivity of Forest Structure Parameters Estimation to Unmanned Aircraft System (UAS) Data Processing Algorithms

Filed under: 2017,My Spatial Problem @ 12:40 am


The emission of carbon dioxide (CO2) is one of the greatest drivers of climate change. Forests, as the largest terrestrial carbon reservoir, have an important role in reducing CO2 emissions and mitigating climate change. Given that role, it is necessary to manage forests sustainably, which requires forest managers to frequently monitor and measure forest structure parameters such as number of trees, tree locations, tree heights, crown width, biomass, and forest carbon. However, these parameters, including forest carbon, are usually measured through conventional field measurements, which are costly and labor intensive. New technology such as Unmanned Aircraft Systems (UAS) can serve as an alternative means of monitoring and measuring forest structure parameters. Several programs are designed to process UAS visual imagery and measure forest structure parameters from the data, for example Agisoft PhotoScan, Pix4D, Fusion, TreVaw, TrEx, and Watershed.

Research question:

The purpose of my research is to evaluate several programs designed to measure forest structure parameters such as number of trees, tree locations, diameter at breast height (DBH), tree heights, and crown width, which are needed to estimate biomass and forest carbon. I will base my comparisons on manually segmented tree inventory counts, measured using Fusion software, that are randomly sampled from a hierarchical design. The research will answer two questions: (1) How does accuracy differ among the data processing programs for measuring forest structure, biomass, and carbon? (2) Which program and model best measure forest structure, biomass, and carbon? For the purpose of this class, however, I may eliminate some forest structure parameters and focus on number of trees and tree locations.


I hypothesize that all of the software used in this study can measure forest structure parameters and estimate forest biomass and carbon, but with differing accuracy due to the different algorithms each program uses.


This research will use Unmanned Aircraft System (UAS) visual imagery taken with a Sony NEX-5R camera mounted on a fixed-wing UAS. The data were gathered in Siberut National Park in West Sumatra Province, Indonesia. There are 3 flights with 300 images each; the image resolution is 350 x 350 inch. I will discuss with my committee members whether I should use all of the images; I think I will focus on one flight for this class.


This study will use a modeling approach to measure forest structure parameters based on Unmanned Aircraft System (UAS) data. The sampling plots will be selected hierarchically using ArcGIS. The 3D point clouds will be generated using Agisoft PhotoScan and Pix4D. Then, based on the 3D point clouds, forest structure parameters, including number of trees, tree locations, diameter at breast height (DBH), tree heights, and crown width, will be measured using several forest data processing programs (Fusion, TreVaw, TrEx, and Watershed). In addition, I will develop allometric equations and use them to statistically estimate biomass and carbon from the tree heights and DBH or canopy width. The allometric equations will be based on the linear relationship between forest structure parameters (tree heights, DBH, and canopy width), forest biomass, and forest carbon. I will use RStudio for the statistical analysis (RMSE and linear regression models). Again, for the purpose of this class I will focus only on number of trees and tree locations, and I will not use all of the programs; I may focus on ArcGIS, Agisoft, Fusion, and TreVaw.
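The allometric-equation step can be sketched as a log-log least-squares fit (Python rather than RStudio; the coefficients and tree measurements below are invented for illustration, not a site-specific equation):

```python
import numpy as np

def fit_allometric(dbh_cm, height_m, biomass_kg):
    """Fit ln(biomass) = b0 + b1*ln(DBH) + b2*ln(height) by least squares
    and report RMSE of back-transformed predictions -- a generic
    allometric form, not the equation for any particular forest."""
    X = np.column_stack([np.ones(len(dbh_cm)),
                         np.log(dbh_cm),
                         np.log(height_m)])
    y = np.log(biomass_kg)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    pred_kg = np.exp(X @ coef)                       # back to kg scale
    rmse = np.sqrt(np.mean((pred_kg - biomass_kg) ** 2))
    return coef, rmse

# Toy trees generated from known coefficients (b0=-2, b1=2.2, b2=0.8)
dbh = np.array([10.0, 20.0, 30.0, 40.0])             # cm
height = np.array([8.0, 15.0, 20.0, 24.0])           # m
biomass = np.exp(-2 + 2.2 * np.log(dbh) + 0.8 * np.log(height))
coef, rmse = fit_allometric(dbh, height, biomass)
```

With real field data the fit would not be exact, and the RMSE would quantify how well each program's structure estimates propagate into biomass.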

Image 1. The example of hierarchically selected sampling plots using ArcGIS.

Image 2. The example of 3D point cloud generating process in Agisoft PhotoScan.


Expected outcome:

If the hypothesis is true, then each program’s accuracy will differ due to its algorithm, and one program and model will perform best relative to manual segmentation using Fusion. However, the automatic segmentations should not differ greatly from manual segmentation (to be assessed statistically).


This research will assess and propose UAS as a low cost alternative method for measuring forest structure parameters. Forest structure parameters are usually measured by forest managers using conventional methods of field ground measurements. These conventional methods are costly and labor intensive. By using UAS, forest managers will be able to monitor forest structure parameters in a faster and cheaper way. Furthermore, this research will assess and find the best program and model to estimate forest structure parameters through automatic segmentation. This new method will be very useful, especially when it is used in remote areas with limited accessibility, where it is almost impossible to do field ground measurements.

Level of Preparation:

I have some basic experience using ArcGIS, and I need to learn more about the software. I used RStudio in two statistics courses (ST 511 and ST 512), and I am now taking ST 513. I do not have much experience with the other software I will use in this research (Agisoft PhotoScan, Pix4D, Fusion, TreVaw, TrEx, and Watershed), but I will learn how to use them.


April 6, 2017

Evaluating interspecific competition between wolves and cougar in northeast Oregon

Filed under: My Spatial Problem @ 3:36 pm

Background & Research Question(s)

The research I am conducting is focused on understanding and quantifying the competitive interactions between two apex predators, wolves and cougar. Populations of large carnivores have been expanding across portions of their historical range in North America, and sympatric wolves (Canis lupus) and cougars (Puma concolor) share habitat, home ranges, and prey resources (Kunkel et al. 1999, Husseman et al. 2003, Ruth 2004), suggesting these coexisting predators may be subject to the effects of interspecific competition (interference or exploitative). Interspecific competition among carnivores can affect the spatial distribution, demography, and population dynamics of the weaker predator (Lawton and Hassell 1981, Tilman 1986), but demonstrating asymmetric competition effects through quantified measures or experiments has remained difficult for highly mobile terrestrial mammals like large carnivores. Generally, cougar are expected to be the subordinate competitor in wolf-cougar interactions, but the frequency and strength of agonistic interactions can be system specific and will determine the overall influence either predator has on the population dynamics of the community (Creel et al. 2001, Kortello et al. 2007, Atwood et al. 2009). My goal is to assess the influence wolf recolonization in northeast Oregon has had on cougar spatial dynamics. For this course, I plan to focus on two analyses associated with questions about how wolves influence the distribution of sites where cougar acquire food (prey) resources and how wolves influence cougar movement paths. More specifically, I hope to address:

  1. Is niche partitioning of kill sites evident between wolves and cougar in northeast Oregon?
  2. Has wolf presence/recolonization changed where cougar kill prey?
  3. Has wolf presence/recolonization changed cougar movement dynamics?


The data I’ll be using to address the above questions come from location points downloaded from global positioning system (GPS) collars deployed on a sample of wolves and cougar in northeast Oregon (Figure 1). Collars were programmed to obtain 8 locations per day and were downloaded every 2-3 weeks to identify location “clusters” for field investigation as potential kill sites. After identifying “clusters” from the location data for each carnivore, we loaded the cluster coordinates onto handheld GPS units and systematically searched the surrounding areas for prey remains. When prey remains were located, we recorded the spatial coordinates of the kill site and determined the species, age, sex, and physical condition of the prey item that was killed. This process was done before (2009 – 2012) and after (2014 – 2016) wolves recolonized a 1,992 km2 area (Mt. Emily WMU) and produced a data set of kill site locations for cougar (npre-wolf = 1,213; npost-wolf = 541) and wolves (n = 158).


Figure 1. Location of the Mt. Emily Wildlife Management Unit in northeast Oregon and global positioning system (GPS) locations for cougars monitored to determine kill site distribution and spatial dynamics pre- (2009-2012) and post-wolf recolonization (2014-2016).


Based on competition theory and emerging evidence on wolf-cougar interactions in other systems (Alexander et al. 2006, Kortello et al. 2007, Atwood et al. 2009), I expect cougar and wolves in northeastern Oregon to exhibit resource partitioning in foraging niche (resource partitioning hypothesis). Representative evidence for resource partitioning between carnivores would be if cougar kill sites occur on the landscape in areas disparate from wolf kill sites. Further, I expect the presence of wolves to affect cougar movement patterns and spatial distribution, which may be evident through a shift in the distribution and space used by cougar between pre- and post-wolf recolonization periods (niche shift [competitive exclusion] hypothesis). Under this premise, I would also expect cougar to alter their movement and space use relative to pre-wolf recolonization patterns (active avoidance hypothesis).

  1. Is niche partitioning of kill sites evident between wolves and cougar in northeast Oregon?

H1: Resource partitioning hypothesis – cougar kill site distribution on the landscape will occur spatially partitioned from wolf kill sites to avoid interference competition with wolves.

  1. Has wolf presence/recolonization changed where cougar kill prey?

H2: Niche shift (competitive exclusion) hypothesis – cougar may demonstrate altered foraging niche (prey acquisition sites will begin to occur on steeper slopes) and prey species selection (increased numbers of mule deer) between time periods with and without wolf presence.

  1. Has wolf presence/recolonization changed cougar movement dynamics?

H3: Active avoidance hypothesis – cougar will alter their spatial distribution and movement paths to avoid interference competition with wolves.


I will be using ArcMap to visualize data and extract site-specific habitat features for kill sites, but I am also interested in approaches to these questions that I may not have considered. I plan to examine changes in cougar kill site characteristics in a logistic regression framework via latent selection difference (LSD) functions (Czetwertynski 2007, Latham et al. 2013). LSDs allow direct comparison of habitat selection between two groups of interest, quantifying the strength of the difference in selection. The model takes the form,

w(x) = exp(β1x1 + β2x2 + … + βixi)

where w(x) represents the relative probability of wolves (coded as 1) occurring on the landscape compared to cougar (coded as 0) thus using predator species as the dependent variable. The selection coefficient βi is represented for each predictor variable (xi) from a vector of covariates (x) and is interpreted as the relative difference in selection between wolves and cougar, not the selection or use of a given habitat (Czetwertynski 2007). I will be using the lme4 package (Bates et al. 2011) in program R to estimate coefficients contrasting the differences between cougar and wolves, and between cougars pre- and post-wolf recolonization as it relates to kill site habitat features. For the first LSD analysis, known cougar and wolf kill sites will be regressed to evaluate the relative difference in selection for features explicit to these sites (Table 1). For the second LSD analysis, pre- (coded as 1) and post-wolf (coded as 0) cougar kill sites will be regressed to evaluate the relative selection differences in cougar kill sites as it relates to wolf presence.
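Once coefficients are estimated, the LSD score w(x) from the equation above can be evaluated directly; the sketch below (Python, with invented coefficients for two hypothetical covariates, slope and canopy cover) shows how exponentiating the linear predictor yields a relative selection ratio between two kill-site settings:

```python
import numpy as np

def lsd_relative_selection(beta, x):
    """Latent selection difference score w(x) = exp(b1*x1 + ... + bi*xi):
    the relative likelihood that a kill site with covariates x belongs to
    the group coded 1 (wolves) versus the group coded 0 (cougar)."""
    return np.exp(np.dot(beta, x))

# Made-up coefficients for [slope (deg), canopy cover (%)] -- for
# illustration only, not fitted values:
beta = np.array([-0.08, 0.02])
flat_open   = np.array([5.0, 20.0])    # hypothetical flat, open site
steep_dense = np.array([30.0, 70.0])   # hypothetical steep, dense site
ratio = (lsd_relative_selection(beta, flat_open)
         / lsd_relative_selection(beta, steep_dense))
print(round(ratio, 3))  # → 2.718
```

In the planned analysis the coefficients would come from the `lme4` logistic fit, and each exp(βi) would be read as the multiplicative selection difference per unit of covariate xi.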

I also plan to evaluate cougar movement (step lengths and turn angles) for differences between the pre- and post-wolf time periods (question 3) using Geospatial Modelling Environment (GME version 7.2.0, Beyer 2012) and program R. I will also use R to summarize these data (dplyr, MASS, and car packages) and to test for differences using ANOVA and/or non-parametric tests.
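Step lengths and turning angles can be derived from ordered GPS fixes with basic vector arithmetic; a minimal Python sketch (assuming projected coordinates, with a toy square-walk track rather than real collar data):

```python
import numpy as np

def steps_and_turns(track):
    """Step lengths and turning angles from an ordered GPS track in
    projected coordinates. The turn angle at each interior fix is the
    signed change in heading between consecutive steps, wrapped to
    [-pi, pi)."""
    xy = np.asarray(track, float)
    vec = np.diff(xy, axis=0)                        # step vectors
    lengths = np.linalg.norm(vec, axis=1)            # step lengths
    headings = np.arctan2(vec[:, 1], vec[:, 0])      # step headings
    turns = np.diff(headings)
    turns = (turns + np.pi) % (2 * np.pi) - np.pi    # wrap to [-pi, pi)
    return lengths, turns

# Toy track: three unit steps with two 90-degree left turns
track = [(0, 0), (1, 0), (1, 1), (0, 1)]
lengths, turns = steps_and_turns(track)
```

The pre- vs post-wolf comparison would then test these length and angle distributions (e.g., ANOVA on log step length, a circular or non-parametric test on turn angles).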

Table 1. Landscape structure characteristics of interest for use in regression and latent selection difference (LSD) function modeling in relation to wolf-cougar prey kill site occurrence in northeast Oregon.

Expected Outcome

I plan to produce visuals (graphs/tables/maps) that depict the statistical relationships between:

  • Wolf and cougar kill sites
  • Cougar kill sites pre- and post-wolf recolonization
  • Cougar movement pre- and post-wolf recolonization


Predation is recognized as a major factor influencing population dynamics, and the direct and indirect effects of predators can shape community-level dynamics (Terborgh and Estes 2010, Estes et al. 2011). Predation effects on prey populations are tied to the complexities of intraguild dynamics, as the predation risk for shared prey can vary with the nature of predator-predator interactions as well as with the behavioral responses of prey to predators (Atwood et al. 2009). In addition to addressing key ecological questions regarding predator-predator interactions, the results from this research will provide information on the effect of wolves on cougar populations, and on potential effects of this expanded predator system on elk and mule deer populations. Knowledge gained from this study will be critical to effective management and conservation of cougars in Oregon and could be useful to other parts of western North America facing similar changes in community dynamics as wolves continue to expand their range.


  • ArcInfo: intermediate/advanced, comfortable with all ESRI products and finding independent sources where I do have knowledge gaps
  • Modelbuilder and Python: novice, have used both but limited knowledge of their strengths and limitations
  • R: intermediate, comfortable manipulating data, using base statistical packages for descriptive stats and more advanced regression modeling
  • Other: GME (intermediate), QGIS (novice)


Atwood, T.C., E.M. Gese, and K.E. Kunkel. 2009. Spatial partitioning of predation risk in a multiple predator-multiple prey system. Journal of Wildlife Management 73:876-884.

Beyer, H. 2012. Geospatial Modelling Environment (software version 7.2.0).

Bates, D., M. Maechler, and B. Bolker. 2011. lme4: linear mixed-effects models using S4 classes. Version 0.999375-42. Available online:

Estes, J.A., J. Terborgh, J.S. Brashares, M.E. Power, J. Berger, W.J. Bond, S.R. Carpenter, T.E. Essington, R.D. Holt, J.B.C. Jackson, R.J. Marquis, L. Oksanen, T. Oksanen, R.T. Paine, E.K. Pikitch, W.J. Ripple, S.A. Sandin, M. Scheffer, T.W. Schoener, J.B. Shurin, A.R.E. Sinclair, M.E. Soule, R. Virtanen, and D.A. Wardle. 2011. Trophic downgrading of planet Earth. Science 333:301-306.

Creel, S., G. Sprong and N. Creel. 2001. Interspecific competition and the population biology of extinction-prone carnivores. Pages 35-60 in J.J. Gittleman, S.M. Funk, D. Macdonald, and R.K. Wayne (eds). Carnivore Conservation. Cambridge University Press, Cambridge, UK.

Czetwertynski, S.M. 2007. Effects of hunting on the demographics, movement, and habitat selection of American black bears (Ursus americanus). Ph.D. thesis, Department of Renewable Resources, University of Alberta, Edmonton, Canada.

Husseman, J.S., D.L. Murray, G. Power, C.M. Mack, C.R. Wegner, and H. Quigley. 2003. Assessing differential prey selection patterns between two sympatric carnivores. Oikos 101:591-601.

Kortello, A.D., T.E. Hurd, and D.L. Murray. 2007. Interactions between cougar (Puma concolor) and gray wolves (Canis lupus) in Banff National Park, Alberta. Ecoscience 14:214-222.

Kunkel, K.E. and D.H. Pletscher. 1999. Species-specific population dynamics of cervids in a multipredator ecosystem. Journal of Wildlife Management 63:1082-1093.

Latham, A.D.M., M.C. Latham, M.S. Boyce, and S. Boutin. 2013. Spatial relationships of sympatric wolves (Canis lupus) and coyotes (C. latrans) with woodland caribou (Rangifer tarandus caribou) during calving season in a human-modified boreal landscape. Wildlife Research 40:250-260.

Lawton, J.H. and M.P. Hassell. 1981. Asymmetrical competition in insects. Nature 289:793-795.

Ruth, T.K. 2004. Patterns of resource use among cougars and wolves in northwestern Montana and southeastern British Columbia. Ph.D. dissertation, University of Idaho, Moscow, ID, USA.

Terborgh, J. and J.A. Estes. 2010. Trophic cascades: predators, prey, and the changing dynamics of nature. Washington, DC: Island Press.

Tilman, D. 1986. Resources, competition and the dynamics of plant communities. Pages 51-74 in M.J. Crawley ed. Plant Ecology. Oxford: Blackwell Scientific Publications.

Amphibian Disease in the Cascades Range, OR

Filed under: My Spatial Problem @ 1:06 pm

Carson Lillard
My Spatial Problem


Ranavirus (Rv) and chytrid fungus (Bd) are both deadly pathogens that infect amphibian hosts all over the world. Amphibian die-offs associated with both pathogens have occurred throughout North America, but little is known about ranavirus in Oregon because most research has concentrated on Bd. I plan to sample for ranavirus in the Cascades Range, OR. Therefore, one of my objectives is to detect the presence or absence of ranavirus in Oregon, starting in the Cascades Range, via eDNA. In preparation, I would like to create an occurrence probability map for ranavirus to identify sites throughout Oregon that may contain the virus and could be sampled. However, I am not sure the needed information is available, given the lack of research on the life history of the virus. After the data are collected, I would like to assess the influence of spatial patterns related to geography, land-use type, pathogen movement, and climate on the distribution of the two diseases, ranavirus and chytrid fungus.

  • Is ranavirus present in Oregon and at what scale?
  • Can ranavirus presence be predicted using statistical methods and life history data?
  • What kind of spatial patterns are associated with ranavirus and chytrid fungus presence, movement, and spread in the Cascades Range?


Spatial patterns related to geography, land use type, pathogen movement, and climate will affect the presence of ranavirus and chytrid fungus throughout the Cascades Range, OR.

Ranavirus will be more prevalent in areas with high cattle and/or visitor use.

Ranavirus and chytrid will not be as prevalent in geographically isolated regions.

Bd and ranavirus will be more prevalent at sites where Pseudacris regilla (Pacific chorus frog) occurs, due to that species' higher mobility.

Ranavirus and chytrid presence and load will vary spatially due to climate, temperature, and elevation.

With a warming climate, pathogens will spread to high elevation ecosystems (if they aren’t already there) and amphibian species will be more at risk of infection in those ecosystems.


I would like to learn how to map or predict the occurrence of pathogens, and to understand all the moving parts needed to do so. I would also like to learn the best way to represent and interpret spatial data related to pathogens, since they are very complex and context-dependent. I am mainly interested in using ArcGIS programs, or in learning about other software more tailored to my questions.

Expected Outcome
I would like to ultimately learn statistical methods for quantifying relationships between pathogen occurrence and the spatial patterns mentioned above. If enough information is available, I would like to produce a map predicting ranavirus occurrence probability in Oregon, or globally, such as the map below showing predicted global Bd occurrence from Xie et al. 2016.
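Once a presence/absence model has been fitted, producing such a probability map amounts to applying the inverse-logit to each raster cell. A minimal NumPy sketch, where the coefficients and the tiny covariate rasters are hypothetical placeholders:

```python
import numpy as np

# Hypothetical covariate rasters (e.g., mean summer temperature, elevation),
# standardized; real layers would come from climate and terrain data extracts.
temp = np.linspace(-2, 2, 12).reshape(3, 4)
elev = np.linspace(2, -2, 12).reshape(3, 4)

# Hypothetical coefficients from a fitted presence/absence logistic model.
b0, b_temp, b_elev = -0.5, 1.2, -0.8

# Apply the inverse-logit cell by cell to get an occurrence probability surface.
logit = b0 + b_temp * temp + b_elev * elev
prob = 1 / (1 + np.exp(-logit))

print(prob.round(2))  # values between 0 and 1; higher = more likely occurrence
```

The hard part, as the post notes, is obtaining enough life-history and occurrence data to fit credible coefficients in the first place; the projection step itself is mechanical.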


As previously stated, ranavirus and chytrid fungus are both deadly pathogens that infect amphibians all over the world. Amphibian die-offs associated with both pathogens have occurred throughout North America, but little is known about ranavirus in Oregon because most research has concentrated on Bd. Ranavirus die-offs are thought to have occurred in Oregon, implying that the virus is present, but no samples have been taken. With baseline data on whether ranavirus is present, managers and researchers can make more informed decisions. Interpreting any spatial patterns would also be helpful in learning more about the life history and eco-pathology of both ranavirus and chytrid fungus.


I am intermediate to proficient with ArcMap from taking GEOG 560 and 561. I am currently in GEOG 562 learning Python, and I have some novice experience with RStudio from a stats course.

An investigation of post-wildfire forest greening dependence on snowpack.

Filed under: 2017,My Spatial Problem @ 10:14 am

Research Question

My spatial question involves the relationship between winter snowpack and forest greening the following summer after a severe wildfire. Specifically, I would like to conduct a multivariate regression to see how important snowpack is to revegetation following wildfires in the Columbia River Basin (CRB), with other variables including summer precipitation, soil texture, and elevation. Additionally, I will look at how the snowpack-forest greening relationship persists on an annual basis in the years following a wildfire.


To estimate snowpack, I’ll be using a new snow cover frequency (SCF) product, which uses satellite imagery reflectance data from the Moderate Resolution Imaging Spectroradiometer (MODIS) to measure the percentage of snow-covered days over a user-defined span of time. MODIS imagery is acquired daily and is available from the year 2000 to the present. The spatial resolution of MODIS imagery is 500 meters. This project will consider October 1 to May 1 to be the relevant timeframe for each year considered. So in the example below, the highest pixel value for SCF (0.58) corresponds to the interpretation that 58% of the valid days from October 1 to May 1 were snow-covered.
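The per-pixel SCF calculation reduces to counting snow-covered days over valid days; a minimal sketch with a tiny hypothetical stack of daily snow maps (real inputs would be daily MODIS snow-cover layers clipped to October 1 – May 1):

```python
import numpy as np

# Hypothetical daily binary snow maps (days x rows x cols):
# 1 = snow-covered, 0 = snow-free, NaN = invalid (e.g., cloud-obscured) day.
days = np.array([
    [[1.0, 0.0], [np.nan, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [1.0, np.nan]],
    [[1.0, 1.0], [1.0, 0.0]],
])

# SCF = snow-covered days / valid days, computed per pixel.
scf = np.nansum(days, axis=0) / np.sum(~np.isnan(days), axis=0)
print(scf)
```

Excluding invalid days from the denominator is what makes a value like 0.58 interpretable as "58% of the valid days were snow-covered."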

MODIS reflectance data will also be used to estimate forest greening. The Enhanced Vegetation Index (EVI) is calculated from red, near-infrared, and blue wavelength bands to estimate canopy greenness, a quality that depends on leaf area, chlorophyll content, and canopy structure. EVI images are readily available as 16-day composites at a resolution of 500 m. For each summer season (June 1 – September 30), the EVI composites will be condensed to a single per-pixel maximum summer EVI image, which will be used in the spatio-temporal analysis.
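The seasonal compositing step is simply a per-pixel maximum across the summer's 16-day layers; a small sketch with placeholder EVI values:

```python
import numpy as np

# Hypothetical stack of summer 16-day EVI composites (composites x rows x cols);
# real inputs would be the MODIS EVI layers for June 1 - September 30.
evi_stack = np.array([
    [[0.30, 0.42], [0.55, 0.20]],
    [[0.35, 0.40], [0.50, 0.25]],
    [[0.28, 0.45], [0.60, 0.22]],
])

# Per-pixel maximum across the season gives the peak-greenness image.
max_evi = evi_stack.max(axis=0)
print(max_evi)  # one peak value per pixel
```

Using the seasonal maximum reduces sensitivity to cloud contamination in any single composite and captures peak canopy greenness.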

Wildfire burn perimeter and severity data will be obtained through the Monitoring Trends in Burn Severity (MTBS) project. Analysis will be constrained to wildfires within the CRB occurring over forested land cover with large areas of high burn severity. These data extend back to 1984, but only fires after 2000 will be considered due to the limited availability of MODIS imagery.

Other potential data sources include 800m resolution summer precipitation data from the PRISM climate group, 30m resolution elevation data from NASA’s Shuttle Radar Topography Mission, and vector soil texture data from the State Soil Geographic dataset.


The primary relationship driving this study is that between winter snowpack and the following summer’s revegetation following a severe wildfire. The first hypothesis of this project is that snowpack is the most important predictor variable for post-wildfire revegetation. Additionally, I expect that snowpack (estimated using SCF) is more strongly correlated with forest greening (estimated using EVI) directly following a severe wildfire, and that the strength of this correlation decreases with time following the wildfire.


Through this course, I hope to explore methods for conducting a multivariate regression across space. All of my data except the soil texture data are in raster format, so I will rasterize the soil data to make the regression analysis more approachable. If there is time, I would also like to analyze the snowpack-greening relationship for a non-burned plot of land as a control example. I expect to do most of my mapping in ArcMap, and the regression analysis potentially in R.
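The planned regression might look like the following sketch: ordinary least squares on synthetic, standardized per-pixel samples, where the predictor names and coefficient magnitudes are placeholders, not results:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-pixel samples within one burn perimeter for one fire-year.
# Predictors: SCF, summer precipitation, elevation (standardized);
# response: maximum summer EVI.
n = 200
X = rng.normal(size=(n, 3))
y = 0.6 * X[:, 0] + 0.2 * X[:, 1] - 0.1 * X[:, 2] + rng.normal(0, 0.1, n)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# R^2 for this fire-year; plotting these values against years since the fire
# would show whether the snowpack-greening relationship weakens over time.
resid = y - A @ coef
r2 = 1 - resid.var() / y.var()
print(round(r2, 3))
```

Repeating this fit for each post-fire year of a given wildfire yields the R2-over-time plots listed under expected outcomes.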

Expected outcomes:

  • Plots showing the R2 values (y-axis) for each wildfire over time (x-axis)
  • Maps of the selected wildfire(s), including their burn perimeter/severity, EVI, and SCF


Snow accumulation has already been shown to influence peak summer forest greenness, especially at moderate elevations (Trujillo et al. 2012). The post-wildfire relationship between snowpack and forest revegetation is critical to understand as current trends of increasing temperatures, more ephemeral snowpack and intensifying wildfire activity are all forecasted to continue. Consequently, an expanding area of western mountain regions will be undergoing disturbance, revegetation and successional growth.

Forest managers and watershed managers may find the analysis of this research useful in preparing for future climate regimes. As wildfires continue to become more prevalent, having a comprehensive understanding of the ecological impacts of such disturbances will be critical for effective management of post-wildfire landscapes.

The ecological implications of this research are multi-faceted, especially regarding the changing climate affecting western U.S. mountains. Forests in the CRB are significant contributors to carbon sequestration, as western forests are responsible for 20-40% of total carbon sequestration in the contiguous U.S. (Schimel et al. 2002; Pacala et al. 2001). Depending on how successfully forests revegetate following wildfires, western forests' role as carbon sinks versus carbon sources may become more uncertain under future climate scenarios.

Levels of experience:

ArcGIS: high

Modelbuilder/Python: low, but learning this term in Geog 562

R: low (limited to STAT 511 usage)

Other: MATLAB (only changing parameter values, not creating from scratch)


Pacala, S.W., G.C. Hurtt, D. Baker, and P. Peylin. 2001. “Consistent Land- and Atmosphere-Based U.S. Carbon Sink Estimates.” Science 292: 2316–20. doi:10.1126/science.1057320.

Schimel, David, Timothy G. F. Kittel, Steven Running, Russell Monson, Andrew Turnipseed, and Dean Anderson. 2002. “Carbon Sequestration Studied in Western U.S. Mountains.” Eos, Transactions American Geophysical Union 83 (40): 445–49. doi:10.1029/2002EO000314.

Trujillo, Ernesto, Noah P. Molotch, Michael L. Goulden, Anne E. Kelly, and Roger C. Bales. 2012. “Elevation-Dependent Influence of Snow Accumulation on Forest Greening.” Nature Geoscience 5 (10): 705–9. doi:10.1038/ngeo1571.


April 5, 2017

Final Project

Filed under: 2017,Final Project @ 8:16 pm

This is the blog page for Spring 2017 Final Project posts


© 2017 GEOG 566   Powered by WordPress MU    Hosted by