GEOG 566

         Advanced spatial statistics and GIScience

June 11, 2017

Final Project: Understory Vegetation Response in Ponderosa Pine Forests

Filed under: 2017,Final Project @ 1:41 pm

Research Question

This quarter, I explored the following question: how does understory vegetation percent cover respond to ponderosa pine canopy cover and other environmental variables across Deschutes County? This question is a stand-in for my thesis research, which takes place in the Willamette Valley. I substituted Deschutes County for the Valley because little suitable data is available within the Valley, whereas the USFS’s Forest Inventory and Analysis (FIA) program provides ponderosa pine plot data for Deschutes County.

For the first exercise of this class, I explored autocorrelation within my two main variables. Next, I asked if the relationship between the two variables was geographically influenced. Finally, I looked for the specific influences of each variable upon understory vegetation cover in the regions defined by the second exercise.


I used a subset of the Forest Service’s massive FIA dataset to replicate what I imagined my own data would look like. The biggest difference between this dataset and my own is its geographic location. Because the FIA plots are gridded, and the Willamette Valley ponderosa pine plantations are so small, very few FIA plots in the Willamette Valley are composed entirely of ponderosa pine. As a substitute, I used plots from Deschutes County, which has a much higher proportion of pure ponderosa forests than the Willamette Valley does. There were 98 plots in Deschutes County with ponderosa overstories, and 46 within that group had sufficient understory vegetation data. I used plot and subplot data to find the associated location, inventory year, canopy cover, and understory percent cover by growth form (bare soil, forb, grass, shrub, and seedling). I included the environmental variables elevation, slope, and aspect, sourced from a DEM layer, to help explain additional variation within understory cover. I also acquired a soil map and included soil codes and drainage types in the final dataset.

Figure One: Ponderosa Pine plot dataset


I hypothesized that understory species associated with ponderosa pine forests are adapted to open canopies with abundant light availability; therefore, the greatest percent cover of all understory growth forms would be present under low-cover canopies. I expected considerable variance in the understory response that could not be attributed to canopy conditions, because environmental factors and previous land use strongly influence understory composition. I therefore included the additional environmental variables in my correlation assessment to help isolate the influence that canopy cover itself exerts over the understory.


Part One

For exercise one, I assessed the autocorrelation within canopy cover and understory vegetation cover. I used the Global Moran’s I tool in ArcMap to measure autocorrelation in these two variables. I had originally wanted to compute a variogram in R for this dataset, but I decided against this method because my data is arranged as a low-density band of points, rather than a raster or a grid of points. It was also difficult to find an efficient piece of code to compute a variogram from a .csv file in R, so I decided to go with the more user-friendly ArcMap.
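A variogram can still be computed directly from a table of point coordinates, without a raster or grid. The following is a minimal sketch in base R, not the method used in this project: it bins the pairwise semivariance 0.5 * (z_i - z_j)^2 by separation distance. The data are synthetic, and the column names (x, y, veg_cover) and the helper empirical_variogram() are hypothetical; coordinates are assumed to be in a projected system measured in meters.

```r
# Empirical semivariogram from point data, base R only.
# Synthetic stand-in for a plot table; all names and values are hypothetical.
set.seed(42)
plots <- data.frame(
  x = runif(46, 0, 50000),        # easting in meters (assumed projected CRS)
  y = runif(46, 0, 80000),        # northing in meters
  veg_cover = runif(46, 0, 100)   # understory percent cover
)

empirical_variogram <- function(x, y, z, n_bins = 8) {
  d <- as.vector(dist(cbind(x, y)))   # pairwise separation distances
  g <- as.vector(dist(z))^2 / 2       # semivariance of each pair
  cutoff <- max(d) / 2                # ignore the longest, sparsely sampled lags
  keep <- d <= cutoff
  breaks <- seq(0, cutoff, length.out = n_bins + 1)
  bin <- cut(d[keep], breaks = breaks, include.lowest = TRUE)
  data.frame(
    dist_mid = (breaks[-length(breaks)] + breaks[-1]) / 2,  # bin midpoints
    n_pairs  = as.vector(table(bin)),                       # pairs per bin
    gamma    = as.vector(tapply(g[keep], bin, mean))        # mean semivariance
  )
}

vg <- empirical_variogram(plots$x, plots$y, plots$veg_cover)
print(vg)
```

With only 46 plots, some distance bins may hold few pairs, so the n_pairs column is worth checking before reading any shape into gamma.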

Figure Two: Deschutes County plot distribution

The steps to accomplish this analysis were very simple. I imported my .csv file with the FIA plots, their associated latitudes and longitudes, and other values into ArcMap as X,Y data. Then I converted the points to a shapefile. I projected this shapefile into the appropriate UTM zone, so that the tool would measure distances between plots in meters rather than degrees. Finally, I ran the Global Moran’s I tool under the Spatial Statistics > Analyzing Patterns toolset. I performed these steps twice: once for the vegetation cover and once for the canopy cover.
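For readers without ArcMap, the statistic itself is short enough to compute by hand. This is a hedged sketch in base R, not a reproduction of the ArcMap tool: it uses row-standardized inverse-distance weights (one common choice, assumed here) and a permutation test in place of the analytical z-score that ArcMap reports. The data are synthetic and the function name morans_i() is hypothetical.

```r
# Global Moran's I with inverse-distance weights, base R only.
# Synthetic stand-in data; names and values are hypothetical.
set.seed(1)
n <- 46
x <- runif(n, 0, 50000)     # easting in meters (assumed projected CRS)
y <- runif(n, 0, 80000)     # northing in meters
canopy <- runif(n, 5, 90)   # canopy cover, percent

morans_i <- function(z, w) {
  zc <- z - mean(z)         # deviations from the mean
  (length(z) / sum(w)) * sum(w * outer(zc, zc)) / sum(zc^2)
}

d <- as.matrix(dist(cbind(x, y)))
w <- 1 / d                  # inverse-distance weights
diag(w) <- 0                # a plot is not its own neighbor
w <- w / rowSums(w)         # row standardization

i_obs <- morans_i(canopy, w)
i_expected <- -1 / (n - 1)  # expected value under no autocorrelation

# Permutation test: shuffle values across locations and recompute I.
n_perm <- 999
i_perm <- replicate(n_perm, morans_i(sample(canopy), w))
p_value <- (sum(abs(i_perm - i_expected) >= abs(i_obs - i_expected)) + 1) /
  (n_perm + 1)

cat("Moran's I:", i_obs, " expected:", i_expected, " p:", p_value, "\n")
```

An observed value close to the expected value, with a large p-value, is what "no autocorrelation" looks like in this framing.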

Part Two

For exercise two, I performed a geographically weighted regression on canopy cover and understory vegetation cover to identify plot clusters based on the relationship between the two variables. Using the same shapefile I created in exercise one, I added the plot points to ArcMap. In ArcMap’s GWR tool, I used vegetation cover as the dependent variable, and canopy cover as the explanatory variable. Because my points are somewhat unevenly distributed across the landscape, I used an adaptive kernel with the AICc bandwidth method. With an adaptive kernel, each local regression draws on a set number of neighboring points rather than a fixed distance, so the search radius widens where plots are sparse and narrows where they are dense; the AICc method chooses that number of neighbors.
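The idea behind an adaptive-kernel GWR can be sketched in a few lines of base R: fit one weighted regression per plot, with weights that fall to zero at the distance of the k-th nearest neighbor. This is an illustration under stated assumptions, not ArcMap's implementation: the bisquare kernel, the fixed k = 20 (ArcMap would choose the neighbor count by AICc), the synthetic data, and the function name gwr_adaptive() are all assumptions of this example.

```r
# Adaptive-kernel GWR sketch, base R only.
# Synthetic stand-in data; names and values are hypothetical.
set.seed(7)
n <- 46
pts <- data.frame(
  x = runif(n, 0, 50000),      # easting in meters (assumed projected CRS)
  y = runif(n, 0, 80000),      # northing in meters
  canopy = runif(n, 5, 90)     # canopy cover, percent
)
# Response whose canopy slope drifts from negative (south) to positive (north).
pts$veg <- 40 + (pts$y / 80000 - 0.5) * pts$canopy + rnorm(n, sd = 5)

gwr_adaptive <- function(data, k) {
  d <- as.matrix(dist(data[, c("x", "y")]))
  coefs <- t(sapply(seq_len(nrow(data)), function(i) {
    bw <- sort(d[i, ])[k]      # local bandwidth: distance to k-th nearest point
    w <- ifelse(d[i, ] < bw, (1 - (d[i, ] / bw)^2)^2, 0)   # bisquare weights
    coef(lm(veg ~ canopy, data = data, weights = w))       # local regression
  }))
  data.frame(intercept = coefs[, 1], canopy_coef = coefs[, 2])
}

local_fit <- gwr_adaptive(pts, k = 20)
summary(local_fit$canopy_coef)   # spread of the local canopy coefficients
```

Mapping canopy_coef by plot location is the analogue of symbolizing the GWR output by coefficient: groups of plots with similar local slopes show up as clusters.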

Part Three

For exercise three, I used R to run three multiple linear regression analyses: two on the clusters identified in exercise two, and one on the entire dataset. To separate the dataset, I selected the clusters based on color intensity in ArcMap. The redder or bluer the points, the more strongly clustered they were. I exported each cluster’s attribute table in ArcMap, then converted the tables to .csv files in Excel. I opened each .csv in RStudio, and used the following code to run an MLR on both clusters and the full dataset:

# Read .csv files.

Full <- read.csv("C:\\Users\\riddella\\Desktop\\R_Workspace\\Full_Points.csv")

Blue <- read.csv("C:\\Users\\riddella\\Desktop\\R_Workspace\\Blue_Points.csv")

Red <- read.csv("C:\\Users\\riddella\\Desktop\\R_Workspace\\Red_Points.csv")

# MLR fits: understory cover against canopy cover and the environmental variables

Full_Fit <- lm(Total_cvr ~ Canopy_Cov + Elevation + Slope + Aspect + Soil_Code + Soil_Drainage, data = Full)

Blue_Fit <- lm(Total_cvr ~ Canopy_Cov + Elevation + Slope + Aspect + Soil_Code + Soil_Drainage, data = Blue)

Red_Fit <- lm(Total_cvr ~ Canopy_Cov + Elevation + Slope + Aspect + Soil_Code + Soil_Drainage, data = Red)

# Read results: coefficient estimates and p-values for each model

summary(Full_Fit)

summary(Blue_Fit)

summary(Red_Fit)





From exercise one, I eventually found that there was no autocorrelation within either of my primary variables (canopy cover, understory vegetation). During my first trial, I did find autocorrelation within the understory vegetation variable. However, I revisited the source of my dataset, and found that the majority of the zeros that my dataset included were actually “no value” placeholders. I removed these entries from my dataset, and performed the analysis again, finding that there was no autocorrelation in the understory vegetation distribution.
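The placeholder problem is easy to reproduce: a zero that means "not measured" sits in the same column as a zero that means "no vegetation". Below is a small sketch in R of one way to separate the two before analysis. It assumes a flag column that records whether understory cover was actually sampled; that column (veg_sampled) and the example values are hypothetical, and the real dataset may mark placeholders differently.

```r
# Separating placeholder zeros from measured zeros.
# All values and the veg_sampled flag are hypothetical.
plots <- data.frame(
  plot_id = 1:8,
  veg_cover = c(0, 35, 0, 12, 60, 0, 0, 48),
  veg_sampled = c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Zeros on unsampled plots are placeholders, so mark them as missing.
plots$veg_cover[!plots$veg_sampled] <- NA

# Keep only plots with a real measurement (a measured zero stays in).
clean <- plots[!is.na(plots$veg_cover), ]
print(clean)
```

Converting placeholders to NA, rather than deleting rows straight away, keeps a record of which plots were dropped and why.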

Figure Three: An example of a Global Moran’s Index output

Exercise two revealed that my dataset is clustered, which was only visible after I switched the symbology from standard deviation to the variable coefficient. The GWR I performed on canopy cover and understory vegetation showed a different relationship between the two variables in two distinct areas. This exercise was the most mentally challenging of the three, because it took me a while to grasp the concept of a geographically weighted regression.

Figure Four: Clustering in GWR output

In exercise three, my results were initially surprising. The cluster MLRs did not reveal any significant influence from the six explanatory variables; all p-values were well above 0.05. However, the full regression did produce low p-values for both the soil type (p = 0.0141) and soil drainage (p = 0.0471) variables, which suggests that, at the county scale, understory vegetation cover is related to soil type and soil drainage across Deschutes County ponderosa pine forests.
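One detail matters when reading p-values for categorical predictors such as soil code. In R, summary() on an lm fit reports one p-value per factor level, while a single p-value for the whole variable comes from a term-level F-test. The sketch below shows both on synthetic data with a reduced formula; it does not reproduce the p-values above, and the values and soil levels are hypothetical. Note also that if soil codes are stored as numbers in the .csv, R treats them as a continuous variable unless they are converted with factor().

```r
# Per-level versus per-term p-values for a categorical predictor.
# Synthetic stand-in data; values and soil levels are hypothetical.
set.seed(3)
n <- 46
d <- data.frame(
  Canopy_Cov = runif(n, 5, 90),
  Elevation = runif(n, 900, 1600),
  Soil_Code = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
soil_effect <- c(A = 0, B = 10, C = 20)
d$Total_cvr <- 30 + unname(soil_effect[as.character(d$Soil_Code)]) +
  rnorm(n, sd = 8)

fit <- lm(Total_cvr ~ Canopy_Cov + Elevation + Soil_Code, data = d)

# One row per coefficient: each non-reference soil level gets its own p-value.
coef_table <- summary(fit)$coefficients
print(coef_table)

# One row per term: a single F-test for Soil_Code as a whole.
term_tests <- drop1(fit, test = "F")
print(term_tests)
soil_p <- term_tests["Soil_Code", "Pr(>F)"]
```

drop1() tests each term after accounting for all the others, which avoids the order dependence of the sequential tests that anova() reports for a single model.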


Although this type of analysis has been performed countless times in many different forest ecosystems, this approach is significant in teaching forest scientists and land managers how plant communities respond to each other and their surroundings. Understory vegetation growth can be influenced by canopy cover, but canopy cover may not always be the limiting variable. In this dataset, canopy cover did not significantly influence understory vegetation. Nor did slope, aspect, or elevation. Vegetation cover is certainly responsive to environmental variables, but the relationships may be too fine to attribute to a single measurement.

In terms of my own learning, practicing statistical analyses on this placeholder dataset helped me cement concepts I had learned on paper, but that I wasn’t sure how to apply to a research scenario. Statistics can be somewhat of an invisible science, because there isn’t always a visual representation of how the data is being processed or presented. Statistics is still a bit of a weak spot in my skillset, but these exercises helped me gain a sturdier foundation in analyzing and interpreting statistical information within a dataset.

Your Learning

The majority of my exercises were completed in ArcMap, which, out of the programs available to us, I was the most familiar with. However, I was able to use new tools, such as Global Moran’s Index and Geographically Weighted Regression. I was also glad that I was able to revisit R for the final exercise. Although it would have been nice to learn a few tasks in Python or Matlab, this class provided me the opportunity to maintain and strengthen the skills that I developed in past classes, which I don’t get the chance to use very often.

What Did You Learn About Statistics

This class was my first introduction to spatial statistics, so I had a bit of a learning curve for each exercise. Certain concepts that seemed simple enough, e.g. two variables changing in similar ways, became lost in the new terms I was learning (geographically weighted regression =/= autocorrelation!), but I think I managed to get all the terms straight in the end.

One concept that was surprising to me was the inherent neutrality of autocorrelation or clustering. Sometimes autocorrelation spells doom for a dataset, but other times it just provides an explanation for a phenomenon. The quantification of autocorrelation might even be the entire purpose of a study. For my project in this class, the discovery of clustering in my dataset was surprising, but not undesirable. It simply revealed a previously invisible trait of the plot data I had downloaded.

How Did You Respond to Comments

Julia’s commentary on my exercises and in person helped keep my exercises moving in a fairly linear direction. Her comments were especially valuable as I was assessing exercise two, which only became clear after I correctly displayed the coefficients.

On tutorial one, I received a comment that suggested including climatic variables in my analysis. I did end up including environmental variables in my final analysis to try to explain some of the clustering in the dataset. The other comment on tutorial one recommended both a GWR and a multiple linear regression, and I ended up using both of these for the last two exercises. On tutorial two, one commenter suggested I explore environmental variables as influences on understory vegetation cover, which I did include in my MLR for exercise three. The final commenter did not leave constructive suggestions.
