GEOG 566

         Advanced spatial statistics and GIScience

June 9, 2017

Final Project Post

Filed under: Final Project @ 9:39 pm

Revised Spatial Problem

During this class I worked on two separate projects.

Project 1: After much exploration of the data and the problems associated with my original spatial problem, I slightly revised my questions. I originally wondered whether I could obtain information about some of the factors that might influence the presence of 3 amphibian diseases (Ranavirus, Bd, and Bsal) and perhaps create a habitat suitability model. To do this, I needed information on the current distributions of the pathogens. I quickly realized that most of this baseline information is missing, which makes it hard to build a predictive model. My new approach is therefore to build the model after I collect data on disease prevalence in my area of interest (Deschutes National Forest). I will collect up to 160 samples there and cross-validate by withholding half of the samples and testing a model fit to the other half against them. This will let me create a region-specific model of predicted occurrence once the samples are collected. After I have information on pathogen distribution, I would also like to answer some further questions using spatial statistics. My collection starts this summer, so I had no real data to work with in this class and instead used artificial data that I created.
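The half-and-half split described above could look something like the sketch below. This is only an illustration under my own assumptions: the sample IDs are hypothetical placeholders (no field data exist yet), and the function name `split_half` and the fixed random seed are choices made for the example, not part of any planned protocol.

```python
import random

def split_half(sample_ids, seed=0):
    """Shuffle the sample IDs and split them into two halves.

    The first half would be used to fit the model and the second
    half withheld to check its predictions.
    """
    ids = list(sample_ids)
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]

# Hypothetical IDs standing in for the planned (up to) 160 field samples.
sample_ids = [f"S{i:03d}" for i in range(1, 161)]
train_ids, holdout_ids = split_half(sample_ids)
print(len(train_ids), len(holdout_ids))
```

Fixing the seed keeps the same samples in the withheld half every time the script is rerun, which makes the validation easier to reproduce.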

Slightly revised questions are:

  • Are ranavirus, Bd, and Bsal present in Deschutes NF, and at what scale?
  • What kind of spatial patterns are associated with pathogen presence, movement, and spread?
    • (Exercise 2; Tutorial 1) I asked a smaller question within this bigger question: “How is Bd presence/absence related to the distance from road or trail?”


  • I hypothesize that there will be a greater presence of pathogens at lower elevations because of a longer seasonal activity period due to warmer temperatures.
  • I hypothesize that the more amphibian species present the more likely a pathogen is to be present because of likely transmission between hosts.
  • I hypothesize that the size of the water body will influence detection rates of pathogens because of dilution of the pathogen in bigger water bodies.
  • (Exercise 2; Tutorial 1) I hypothesize that Bd presence will be higher in areas that see higher visitor use because of higher transmission rates.

Project 2: For Exercise 3, Dr. Jones allowed me to work on a big component of one of my thesis chapters that I needed to finish: determining site suitability for a field experiment. This wasn’t related to statistics but required a lot of work in ArcMap. My spatial questions are:

  • Are forest meadows reptile biodiversity hotspots?
  • Does aspect matter for reptiles in forests/meadows?


My hypotheses are:

  • Meadows are reptile biodiversity hotspots due to warmer temperatures.
  • Aspect matters for reptiles in forests and meadows because south-facing aspects are warmer.


Dataset Description

I worked on 2 different projects, so each used different data.

Project 1: My first project was related to the question: “How is chytrid fungus presence/absence related to the distance from road or trail?” I chose this question because humans are known to be one of the primary causes of pathogen transmission. For this project, the following data was used:

  • Deschutes NF boundary (National Forest Service)
  • Water body layer (standing water; USGS), all of Oregon, clipped to the Deschutes NF boundary
  • Road layer (National Forest Service), all of Oregon, clipped to the Deschutes NF boundary
  • Bd presence/absence points, which I created within water bodies in Deschutes NF

Project 2: The second project was related to two questions and to finding suitable habitat: “Are forest meadows reptile biodiversity hotspots? Does aspect matter for reptiles in forests?” The following data was used:

  • Deschutes NF boundary (National Forest Service)
  • Elevation DEM layer (USGS), all of Oregon, clipped to Deschutes NF
  • Habitat type layer (National Forest Service), Deschutes only
  • Aspect layer generated with ArcMap, all of Oregon, clipped to Deschutes NF
  • Roads layer (National Forest Service), all of Oregon, clipped to Deschutes NF
  • Water layer (standing and flowing; USGS), all of Oregon, clipped to Deschutes NF


Project 1: I used logistic regression to relate my y variable (chytrid presence/absence) to an x variable (distance from the nearest road or trail), using a Python script and R. Based on Table 6.1 in Exercise 2, logistic regression is the most appropriate test because my x variable is continuous (distance to road/trail) and my y variable is binary (present/absent). I wanted to answer the question: is Bd more likely to be present near a road or trail, where humans are transporting the pathogen? To run the regression, I first had to prepare my data by getting the distance from each sample point to the nearest road or trail. I brought in multiple layers of roads and trails within Deschutes National Forest and used the “Merge” tool to combine them into one layer. I then used the “Near” tool, which generated a value representing the distance from each sample point to the nearest feature in the merged roads and trails layer. With that information, I ran the logistic regression with Bd presence as the dependent variable and distance from road or trail as the explanatory variable. Just for practice, I also ran OLS and hotspot analysis.
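To make the Merge and Near steps more concrete, here is a rough sketch of the same idea in plain Python. It is not how ArcMap implements the “Near” tool; it simply computes the straight-line (planar) distance from each point to the closest segment of any line, assuming projected coordinates in meters. All coordinates, layer contents, and names such as `near_distance` and `pond_A` are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Planar distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: a and b are the same location.
        return math.hypot(px - ax, py - ay)
    # Position of the perpendicular foot along the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def near_distance(point, lines):
    """Smallest distance from point to any segment of any polyline."""
    return min(
        point_segment_distance(point, line[i], line[i + 1])
        for line in lines
        for i in range(len(line) - 1)
    )

# Hypothetical layers: each polyline is a list of (x, y) vertices in meters.
roads = [[(0, 0), (1000, 0)]]
trails = [[(500, 200), (500, 800)]]
merged = roads + trails  # plays the role of the merged roads and trails layer

# Hypothetical sample points inside water bodies.
samples = {"pond_A": (250, 300), "pond_B": (900, 40)}
near = {name: near_distance(xy, merged) for name, xy in samples.items()}
print(near)
```

The resulting distances would then serve as the continuous x variable in the regression.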

Project 2: I performed site selection for this study based on the following criteria. I needed to identify 4 sites (the number had to be 4 to match a previous experiment):

  • each site would have 4 treatments
    • (north and south aspects, in both open meadow and closed forest)
  • road access (< 300 m from a road)
  • sites at similar elevations
  • similar forest types
  • similar proximity to non-forest
  • each site has the same proximity to water
  • area would be of similar size
  • created by natural processes, not a clear-cut.

I used many tools to determine site suitability for the field experiment, such as Clip, Aspect, Extract by Attributes, Raster to Point, Near, Extract Values to Points, Reclassify, and Select by Attributes. Overall, my approach was to narrow down the possible meadows in the Deschutes National Forest to those that meet the above criteria. For the entire approach and a walkthrough, see Tutorial 2.
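The narrowing-down logic can be pictured as a simple filter over candidate meadows. The sketch below is only an illustration of that idea, not the ArcMap workflow from Tutorial 2: the candidate records, the elevation band of 1300 to 1500 m, and the names `Meadow`, `aspect_class`, and `is_suitable` are all hypothetical. Only the 300 m road threshold and the natural-origin requirement come from the criteria above. Aspect is assumed to be in compass degrees clockwise from north, with a negative value marking flat ground.

```python
from dataclasses import dataclass

@dataclass
class Meadow:
    name: str
    road_dist_m: float    # distance to the closest road, in meters
    elevation_m: float
    natural_origin: bool  # False for clear-cuts
    aspect_deg: float     # degrees clockwise from north; negative = flat

def aspect_class(aspect_deg):
    """Label an aspect as 'north', 'south', or 'flat'."""
    if aspect_deg < 0:
        return "flat"
    a = aspect_deg % 360
    return "south" if 90 < a < 270 else "north"

def is_suitable(m, max_road_m=300.0, elev_range=(1300.0, 1500.0)):
    """Apply the road, elevation, and natural-origin criteria."""
    low, high = elev_range  # hypothetical band for "similar elevations"
    return (
        m.road_dist_m < max_road_m
        and low <= m.elevation_m <= high
        and m.natural_origin
    )

# Hypothetical candidate meadows.
candidates = [
    Meadow("M1", 120, 1400, True, 180),
    Meadow("M2", 450, 1420, True, 10),    # too far from a road
    Meadow("M3", 250, 1900, True, 200),   # outside the elevation band
    Meadow("M4", 80, 1350, False, 350),   # clear-cut, not natural
    Meadow("M5", 290, 1480, True, 20),
]

suitable = [m for m in candidates if is_suitable(m)]
for m in suitable:
    print(m.name, aspect_class(m.aspect_deg))
```

The remaining criteria (forest type, proximity to non-forest and water, meadow size) could be added as further conditions in the same way.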


Project 1: For this project, my results were statistical relationships. Logistic regression was the best fit for my data and was useful because it showed a significant relationship between the two variables, as seen below.



Project 2: This project resulted in a map. All of my work led to 4 suitable sites that I will use for one of my research projects, determining reptile diversity in meadows. My 4 suitable sites can be seen in Map 1.

For practice, I made artificial data of the kind I would potentially obtain from this field experiment. I practiced using it in RStudio, because doing logistic regression in ArcMap isn’t possible, and most of my data will most likely need to be analyzed with logistic regression.

For the data, I coded the meadows as 1’s and the forest plots as 0’s. Therefore, my Y variable was binary, while my X variable was species diversity, which was continuous. This coding is what logistic regression needs, and it is pretty straightforward to set up and get results.

Results: My artificial data had a very small sample size, so it makes sense that the p-value was not significant. Overall, the results matched what I suspected from looking at the artificial data I made. Future work: it would be interesting to automate this process using Python.

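y = f(x)
> glm(type~diversity)

Call:  glm(formula = type ~ diversity)

Coefficients:
(Intercept)    diversity
   0.006173     0.098765

Degrees of Freedom: 7 Total (i.e. Null);  6 Residual
Null Deviance:     2
Residual Deviance: 0.4198     AIC: 5.123

Note: because this call does not specify family = binomial, glm() fit its default Gaussian (identity link) model, so the output above is from a linear fit rather than a logistic regression.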
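As a step toward automating this in Python, the sketch below shows what a logistic (binomial, logit link) fit of type on diversity computes, using Newton-Raphson on the log-likelihood with only the standard library. It is a minimal illustration, not a replacement for R’s glm: the eight data points are hypothetical (they are not the artificial data used above), and the names `fit_logistic` and `predict` are my own.

```python
import math

def predict(b0, b1, x):
    """Probability that type = 1 (meadow) for a given diversity value."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def fit_logistic(x, y, iterations=25, tol=1e-10):
    """Fit y ~ b0 + b1 * x by Newton-Raphson on the binomial log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = predict(b0, b1, xi)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += yi - p
            g1 += (yi - p) * xi
            # Entries of the (negated) Hessian.
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system for the coefficient update.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: 1 = meadow, 0 = forest plot. The two groups overlap
# in diversity, so the maximum-likelihood estimate is finite.
diversity = [1, 2, 3, 4, 5, 6, 7, 8]
site_type = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(diversity, site_type)
print("intercept:", round(b0, 4), "slope:", round(b1, 4))
```

With a sample this small, the standard errors on the coefficients would be large, which is consistent with a non-significant p-value.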





Project 1: Because my data was artificial, this was more of a learning experience. However, if the data were real, this project would have allowed me to determine whether areas closer to roads had higher disease prevalence. This is important because roads are travel corridors for humans and also for amphibians. Corridors like roads allow for easier transmission between areas that may otherwise have been geographically isolated.

Project 2: My results showed me areas that met all of my requirements. Forest meadows could be hotspots due to increased solar radiation, warmer temperatures, heterogeneous conditions, increased productivity, and rocky soils. These questions are important because the answers could have strong management implications if forest meadows are reptile habitat. Managers would then need to retain or restore meadows, retain connectivity among meadows, and identify ‘climate-smart’ meadows if aspect is related to diversity.


I learned a lot about various statistical methods used in spatial studies and improved my understanding of methods I previously used in Stats 511, such as regression. Not only do I understand the theory better, but I also understand more about how to actually perform these tests in R and ArcMap. I now have a better sense of which statistical tools are available in ArcMap. In addition, I learned how to perform habitat or site suitability selection, which was a big skill I wanted to gain. My skill and ability in ArcMap/GIS definitely improved. I also learned that importing and using my own data in R wasn’t as scary as I thought it was going to be.

I learned a lot about the requirements for running various tests, which was super helpful. I liked having Dr. Jones’s PowerPoint slides, which broke the tests down in a simple way so that you could find the most appropriate test for your data. I learned that hotspot analysis is useful if you are trying to find spatial patterns in things such as disease prevalence, and it would be useful for trying to predict other areas that may have outbreaks. I did not explore spatial autocorrelation much because my data and questions were not exactly suitable for that type of analysis. My understanding of regression and multivariate methods improved. I ran OLS and logistic regression tests, which showed interesting results but had different data requirements (such as binary or continuous variables) that I was not too familiar with beforehand.

On Comments

Comments given to me by fellow classmates and Dr. Jones were extremely helpful. For example, on Tutorial 1/Exercise 2, I mistook one of my variables (distance from road) for binary when it was actually continuous. This changed which tests were appropriate for my dataset, but I used logistic regression, which worked. Comments from classmates also gave great insight into my work. For example, on Tutorial 2/Exercise 3, I performed habitat suitability selection based on various factors. One of my peers pointed out the importance of getting the most up-to-date layers available, because factors can change rapidly, such as habitat type after a wildfire goes through an area. This was something I had not really thought about before.
