GEOG 566

         Advanced spatial statistics and GIScience

Archive for Final Project

June 11, 2018

Exploring historical constraints on fire across the Central Oregon Pumice Plateau

Filed under: 2018, Final Project @ 5:45 pm


My original question was how historical fire occurrence varied spatially on the central Oregon pumice plateau. It morphed into how constraints on fire, including climate, fuel abundance and continuity, and lodgepole pine forest, influenced the occurrence of fire.


My dataset consists of records of fire occurrence for 52 sample points distributed over an 85,000 ha area.

Study Area & Sample Points

At each sample point the annual occurrence of fire during ~1700–1918 was reconstructed from tree rings. Cross sections were removed from dead trees, sanded, and precisely dated, and injuries created by non-lethal, low-severity surface fires were dated to their exact year of occurrence.


Fire scars collected at individual sample sites were composited into a single record of fire occurrence at each sample point.  Between 8 and 28 fires occurred at each sample point (mean 16).  In the composite graph, horizontal lines represent sample points and vertical tick marks represent fire occurrence.



I hypothesized that climate, time since fire, and lodgepole pine act as constraints on fire occurrence across the central Oregon pumice plateau.


Earlier investigations demonstrate that fires historically occurred in dry forests in Oregon during drought years.  However, fire size has not been related to climate.  It could be that fires of different sizes have additional relationships to prior-year climate.  In exercise 2, I checked whether small fires, large fires, and extensive fires were similarly related to climate.

Recent investigations north and northwest of this study area demonstrated lower fire frequency in lodgepole pine forests and that areas of lodgepole pine forest acted as intermittent barriers to fire spread.

Fuel is an obvious limiting factor for fire spread.  After a fire, fuel must recover sufficiently for fire to spread again; thus time since fire may be predictive of fire occurrence.


Ex 1 – Mapping fires from Binary Point Data

Prior to exploring constraints on fire occurrence I needed to produce maps of fire occurrence.  To do this I used ArcGIS to evaluate Thiessen polygons, kriging, and inverse distance weighting (IDW) for mapping historical fires. Once I had made maps of fires, I visually examined spatial and temporal variation in fire occurrence by creating an animation of fire events over time, mapping fire history statistics, and identifying fire occurrence groups with cluster analysis and then mapping the groups.

Ex 2 – Determining how fire size and climate were related

I used superposed epoch analysis (SEA) to determine how annual climate and climate in antecedent years were related to fire occurrence.  By breaking fire events into size classes I was able to see whether relationships with climate varied with fire extent.
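The SEA itself was run in R, but the underlying randomization logic is simple: composite the climate value at each lag relative to the fire years, then compare that composite against a Monte Carlo envelope built from randomly drawn year sets. A minimal Python sketch, where `climate` (a PDSI-like annual series aligned with `years`) and the function name are my own illustrative choices, not the author's code:

```python
import numpy as np

def sea(climate, event_years, years, lags=(-3, -2, -1, 0), n_rand=1000, seed=0):
    """Superposed epoch analysis: composite mean climate at each lag
    relative to event years, with a Monte Carlo null envelope built
    from randomly drawn sets of pseudo-event years."""
    rng = np.random.default_rng(seed)
    idx = {int(y): i for i, y in enumerate(years)}

    def composite(events):
        # mean climate at each lag, skipping years outside the record
        return [np.mean([climate[idx[y + lag]] for y in events if y + lag in idx])
                for lag in lags]

    observed = composite(event_years)
    null = np.array([composite(rng.choice(years[3:], size=len(event_years),
                                          replace=False))
                     for _ in range(n_rand)])
    lo, hi = np.percentile(null, [2.5, 97.5], axis=0)  # 95% null envelope
    return observed, lo, hi
```

An observed composite falling outside the `[lo, hi]` envelope at a given lag is the SEA criterion for a significant climate–fire relationship at that lag.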

Ex 3 – Using a GLMM to understand the influence of climate, time since fire, and lodgepole pine on annual fire occurrence

I used a generalized linear mixed model (GLMM) to determine the influence of fixed effects (climate, time since fire, and lodgepole) on annual fire occurrence across my study area.  Sample point was included as a random effect to test whether the model varied across the study area. I used R² for GLMMs to determine the relative importance of each fixed effect and of the random intercept of sample point.
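For a binomial GLMM with a logit link, the marginal and conditional R² of Nakagawa & Schielzeth can be computed from the variance components alone, because the latent-scale residual variance of the logit link is the fixed constant π²/3. A small Python sketch (the variance values in the test are hypothetical, not the fitted ones):

```python
import math

def glmm_r2(var_fixed, var_random):
    """Marginal and conditional R^2 for a logit-link GLMM
    (Nakagawa & Schielzeth). var_fixed is the variance of the
    fixed-effect predictions; var_random is the random-intercept
    variance; the logit residual variance is pi^2 / 3."""
    var_resid = math.pi ** 2 / 3                      # distribution-specific
    total = var_fixed + var_random + var_resid
    marginal = var_fixed / total                      # fixed effects only
    conditional = (var_fixed + var_random) / total    # fixed + random
    return marginal, conditional
```

When the random-intercept variance is near zero, marginal and conditional R² converge, which is the pattern reported below for the site effect.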

The Results and the Significance

Ex 1

In exercise 1 I learned the tradeoffs among the three different approaches to mapping fires from binary point data. The Thiessen polygon method provided the most efficient, objective, and parsimonious way to map fire perimeters given the distribution of my sample points.

Fire Extent mapped from Thiessen Polygons
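The Thiessen (Voronoi) assignment behind a map like this amounts to nearest-sample-point classification: every location inherits the fire record of its nearest sample point, so a fire's mapped extent is the total area of cells nearest to points that recorded it. A Python sketch of that logic on a raster grid (function and variable names are mine, not the ArcGIS workflow):

```python
import numpy as np
from scipy.spatial import cKDTree

def thiessen_fire_extent(sample_xy, burned, grid_xy, cell_area):
    """Thiessen-polygon fire mapping: assign each grid cell to its
    nearest sample point, then sum the area of cells whose nearest
    point recorded the fire."""
    nearest = cKDTree(sample_xy).query(grid_xy)[1]   # index of nearest point
    return cell_area * np.count_nonzero(burned[nearest])
```

Because the assignment depends only on distance to the sample points, the method is objective and repeatable, which is the parsimony argument made above.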

Ex 2

Superposed epoch analysis demonstrated that extensive fires occurred during years of extreme drought, large fires occurred during years of average drought, and small fires were not related to dry or cool, wet climate years. No relationships with climate in antecedent years were found for fires of any size or for years without fire.


These results suggest that fuel abundance and connectivity were not driven by climate in the years before a fire. Cool, wet years that may have been necessary to produce the fine fuels that carry surface fires did not precede fire years. Only a dry, hot year during the fire year itself was associated with large and extensive fire spread, which suggests that fine fuel production relevant to fire spread is not related to climate. For my investigation this result means that fuel recovery following fire is not moderated by post-fire climate; thus time since fire may be an independent predictor of fire occurrence.

Ex 3

Using the GLMM approach I determined that climate, time since fire (interval), and lodgepole influenced annual fire occurrence.  However, lodgepole was only weakly significant and produced a small change in the annual probability of fire. The conditional R² for the GLMM demonstrated that the random effect of site accounted for very little of the variance explained by the model. Climate, not interval as I had expected, explained most of the variance accounted for by the model.

For every 1-unit increase in PDSI the probability of fire occurring decreased by ~25%; for each year without fire (interval) the probability of fire increased by 7%; and for each percent increase in lodgepole forest around a sample point the probability of fire decreased by 1%.
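Strictly, a logit-link model's coefficients act on the odds scale; when annual fire probabilities are small, the percent change in odds approximates the percent change in probability. The conversion is just 100·(exp(β) − 1). A sketch where the coefficients are back-calculated from the effect sizes reported above, not the fitted values themselves:

```python
import math

def odds_change(beta):
    """Percent change in the odds of fire per one-unit increase in a
    predictor, from a logit-link GLMM coefficient."""
    return 100 * (math.exp(beta) - 1)

# Hypothetical coefficients chosen only to reproduce the reported
# effect sizes (not the actual model estimates):
print(round(odds_change(math.log(0.75))))   # PDSI: -25 (% per unit)
print(round(odds_change(math.log(1.07))))   # interval: 7 (% per year)
```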

Significance of Ex 3 – Because lodgepole was significant but only weakly so, I want to explore other metrics that capture how lodgepole varies with respect to each plot.  It could be that lodgepole nearer the plot is more important, or it could be its spatial pattern that causes variation in fire occurrence.  I was surprised that interval was less influential than climate. I interpret this as a strong indication that fuel is limiting for only a few short years after fire on the central Oregon pumice plateau. The small influence of site in the model suggests that site-specific, bottom-up controls (microclimate, slope, local composition) have little influence on fire frequency and the fire regime.

My next step is to focus on a single explanatory variable and switch to using site as a fixed effect with an interaction with that explanatory variable.  In addition I plan to fit 52 models, one for each site, using a single explanatory variable.  This would switch from a GLMM approach to a GLM approach and allow me to map the coefficient for each site.

Software Learning

In R I learned to use the lme4 package, learned how to perform SEA, learned several new scripts for data wrangling, and learned how to make graphical summaries of fire history in ggplot.

In ArcGIS I learned to create animations of fire events and how to map fires with three different interpolations. I also used kriging to summarize variation in fire metrics in exercise 1.

Statistical Learning

I learned the pros and cons of different techniques to map fires, the ins and outs of SEA, and got an initial understanding of GLMMs. I plan to spend a lot more time working with GLMMs now that I’ve had this introduction and have found a flexible, tiered approach to linear models.


Bolker, B.M., Brooks, M.E., Clark, C.J., Geange, S.W., Poulsen, J.R., Stevens, M.H.H. & White, J.S.S. (2009) Generalized linear mixed models: a practical guide for ecology and evolution. Trends in Ecology & Evolution, 24, 127–135.

Hessl, A., Miller, J., Kernan, J., Keenum, D. & McKenzie, D. (2007) Mapping paleo-fire boundaries from binary point data: comparing interpolation methods. The Professional Geographer, 59(1), 87–104.

Hartig, F. (2018) DHARMa: Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models. R package version 0.2.0.

Prager, M.H. & Hoenig, J.M. (1989) Superposed epoch analysis: a randomization test of environmental effects on recruitment with application to chub mackerel. Transactions of the American Fisheries Society, 118(6), 608–618. DOI: 10.1577/1548-8659(1989)118<0608:SEAART>2.3.CO;2

Nakagawa, S. & Schielzeth, H. (2013) A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4(2), 133–142.

Negative relationship between transect movement through species-space and substrate heterogeneity

Filed under: Final Project,Tutorial 3 2017 @ 4:32 pm


First, to gain a better understanding of the relative heterogeneity in slope angle among sites, I reexamined the cumulative spatial buffers of benthic bathymetry and calculated non-overlapping ‘donuts’ of expanding space around each site. Mirroring tutorial 2, I then recalculated the slope angle minimum, maximum, mean, and standard deviation.

Next, I expanded on multivariate analyses (NMDS) previously completed for my project. Given the observed differences in spatial patterning of the sample units in my ordination (consistently clustered, or clusters spread apart by abrupt and large movement through species-space, i.e., abrupt and lasting community change), my objective was to quantify metrics of ‘transect change through species-space over time.’


My hypotheses for this tutorial (and the final project in general) are that low-relief sites, or sites with low substrate heterogeneity, are associated with dramatic community change over time. Conversely, I hypothesize that high-relief sites, or sites with high substrate heterogeneity, exhibit a single, stable community structure through time.

Data Used

The bathymetry data used are the same as described in Tutorial 2: side-scan-sonar derived (2 m grain), with an extent of approximately 1 km offshore all around SNI.

To calculate transect movement in species-space, I used a community matrix comprised of 13 macroalgal and invertebrate species. Abundances were recorded for all species, and given the preponderance of purple urchins (1000s vs. 10s of other species), I log(x+1) transformed all species. To reduce the dimensionality of my community matrix down to the key axes, I used non-metric multidimensional scaling (NMDS) with the Bray-Curtis distance metric, a proportional dissimilarity measure that tolerates messy, zero-filled community data relatively well. The ordination converged on a two-dimensional solution with stress (departure from monotonicity) = 0.16.
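The ordination here was fit in R; as an illustrative Python sketch of the same pipeline (log(x+1) transform, Bray-Curtis dissimilarity, non-metric MDS), assuming a plots-by-species abundance array — names and parameter choices are mine:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

def nmds_bray(community, seed=0):
    """Non-metric MDS of a (plots x species) community matrix on
    Bray-Curtis dissimilarities, after a log(x+1) transform to damp
    numerically dominant species such as purple urchins."""
    d = squareform(pdist(np.log1p(community), metric="braycurtis"))
    nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
               n_init=10, random_state=seed)
    return nmds.fit_transform(d)   # 2-D ordination coordinates
```

Each row of the returned array is a sample unit's position in species-space; the transect-movement metrics below operate on these coordinates.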


Calculating donut buffers was relatively straightforward and employed the ‘Erase’ tool in ArcGIS. Otherwise, the methods of creating buffers and extracting the desired statistics from rasters within buffers mirrored exercise two.
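The buffer-and-erase logic has a simple raster analogue: cells whose distance from the site falls in (r_inner, r_outer] form the donut, and statistics are taken over just those cells. A Python sketch under that assumption (function and variable names are hypothetical):

```python
import numpy as np

def ring_stats(raster, site_rc, r_inner, r_outer, cell=1.0):
    """Summary statistics of a raster (e.g., slope angle) within a
    non-overlapping 'donut' r_inner < d <= r_outer around a site --
    the raster analogue of buffering and then erasing the inner disc."""
    rows, cols = np.indices(raster.shape)
    d = cell * np.hypot(rows - site_rc[0], cols - site_rc[1])
    ring = raster[(d > r_inner) & (d <= r_outer)]
    return ring.mean(), ring.std(), ring.min(), ring.max()
```

Stepping `r_inner`/`r_outer` outward in 1 m increments would give the finer-grained version of this analysis suggested in the results below.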

To quantify transect movement through species space, I used the adehabitatLT package in R. This package requires evenly spaced time intervals, so minor fudging was required to trick the analysis into calculating paths for the transects (these data are collected biannually, though the exact day is most definitely not consistent, given inclement field conditions and the vagaries of the sea). Once complete, I extracted step lengths and net-squared displacement values, both of which were subsequently viewed (1) as distributions and (2) as a response variable plotted through time. Distances plotted against time provided insight into total community change between any two time points, while net-squared displacement plotted against time provided more insight into ‘lasting’ community shifts (as displacement values are relative to the starting value).
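The two metrics extracted from adehabitatLT reduce to simple geometry on the ordination coordinates: step length is the distance between consecutive surveys, and net-squared displacement is the squared distance from the starting position. A minimal Python sketch of that arithmetic (not the package itself):

```python
import numpy as np

def track_metrics(coords):
    """Step lengths and net-squared displacement for a transect's path
    through 2-D species-space, given one (x, y) row per survey."""
    coords = np.asarray(coords, dtype=float)
    steps = np.hypot(*np.diff(coords, axis=0).T)      # distance between surveys
    nsd = np.sum((coords - coords[0]) ** 2, axis=1)   # squared displacement from start
    return steps, nsd
```

A transect that jumps and stays put shows a large step followed by persistently high NSD; a jump that reverses shows two large steps but NSD returning toward zero, which is exactly the ‘lasting vs. immediately reversed change’ distinction drawn above.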

Net-squared displacement is usually viewed against expected displacement, a null hypothesis generated by randomizing step lengths and turning angles. Comparing net-squared vs. expected displacement indicates whether your animal, or transect, is moving more or less than expected under ‘random movement.’ I chose not to focus on expected displacement, given that I don’t quite see how that null hypothesis is relevant for movement through species space (i.e., what would a null hypothesis be for community change: no change? random change? Neither is fully satisfactory). Instead, I opted to view the relative differences in displacement among sites.

Finally, despite spatial scaling issues (discussed in results) in comparing these transect track metrics against substrate heterogeneity, I plotted the mean and standard deviation of step distances against the mean and standard deviation of slope angle at increasing spatial scales. This is spatially mismatched, as the step distances are recorded at the smallest scale only (the transects), while the slope angle mean and standard deviation are for the spatially increasing scales of donut buffers.


I was a little surprised to find that the non-overlapping donut layers did not actually differ all that much from the cumulative donut values. I suspect that redoing this and incrementally calculating every 1 m addition of buffer space around the site would reveal more nuanced behavior, but it is unclear to me whether this would significantly change the overall pattern. (I believe it will be worthwhile to redo this as described, but in R, with code to automate the process.)

Transect movement distances: frequency histogram (distribution) and distances against time. I expected more bimodality in the distances from some of the sites, but I didn’t account for the fact that even the sites that exhibited dramatic community change (large distances) are still relatively consistent most of the time (many short distances interspersed with a few large distances). Distances against time provided a ‘pure’ representation of community change between any two points in time. However, as these distances do not account for lasting change, they can provide a misleading representation of ‘lasting’ vs. immediately reversed community change (hence the need for net displacement).

Net-squared displacement: distributions and net-squared displacement through time. The distribution of net-squared displacement provides a good visual representation of my understanding of the time series and the ecology of the system. These distributions are at this time mostly visual, and the differences among distributions are qualitative, but I can see how formally fitting a CDF may be useful in the future. Net-squared displacement through time actually provides the best visual representation of community change that I’ve generated thus far. As the displacement values account for the starting position, lasting shifts in community structure are represented, as is high community consistency.

The relationship between the mean and standard deviation of distances traveled in species space (transect movement) and the mean and standard deviation of slope angle was negative. I did not formally fit a regression or a curve, but the relationship is visually clear, and I believe a visual approximation of the relationship suffices for the purposes of this project. It is worth noting that large step distances (dramatic community change) occur at homogeneous low-relief sites (low slope angle), while small step distances (little change in community structure) occur at heterogeneous high-relief sites (high slope angle) with higher variation in slope angle. Thus, while only a cursory comparison, this analysis supports an early hypothesis that heterogeneous substrate is associated with community consistency through time.

Critique of Method, My Learning, and Statistics 

All aspects of these analyses were useful, and I find myself more limited by the spatial constraints of the data, and by the subsequent limitations on testing direct relationships between statistics of community structure and substrate heterogeneity at increasing scales. This, of course, is a product of the data and doesn’t reflect on the usefulness of the methods.

I still want to find a better way of accounting for substrate heterogeneity. I think creating 1 m donut buffers of increasing scale is a potential option, and one that would benefit from R automation. Additionally, the mean and standard deviation may be too high-level a summary; I want to see the distributions of slope angles (frequency histograms).

The adehabitatLT package in R was straightforward to use, though I did have to fudge the ‘time steps’ of my data to create the desired tracks, and my application of the package was very rudimentary (i.e., calculating tracks and extracting step distances and net-squared displacement).

Altogether, this class has been extremely useful as a crash-course into ArcGIS, and additionally has provided glimpses of future directions (e.g., use spatial analyses in R, and one day potentially use ESRI bridge to link ArcGIS and R). As discussed in class, my software usage was either (1) R studio, where I was already quite comfortable, or (2) ArcGIS, where each step performed in the course of the class provided useful experience to me.

Regarding statistics, it was very useful for me to go through the motions to calculate correlation and cross-correlation. I was generally familiar with these analyses from GEOG 556, but actually using them was quite illustrative. Perhaps most useful was being able to see how spatial and temporal problems are tackled, both with my own work, and through seeing how other people tackle problems. I was able to get R package recommendations from the tutorials, along with ideas for analyses and overall analytical approaches.








June 12, 2017

Final Project: “Spatial Relationships of Vegetation for Restored and Reference Salt Marshes (Salmon River Estuary, Oregon)”

Filed under: Final Project @ 2:32 pm

My Spatial Problem Blog Post


  1. A description of the research question that you are exploring.

Salmon River is one of the smallest estuaries on the northern Oregon coast (800 ha), with the largest proportion of tidal marsh (400 ha) of any Oregon estuary. It borders Tillamook and Lincoln counties and is designated an Important Bird Area by the National Audubon Society.  Conservation research at Salmon River Estuary has been a focus of government, non-profit, and educational institutions since the 1970s due to concern over salmonid habitat and the impacts of sea level rise on the coast. Salmon River consists of public and protected wetlands that have been restored and protected since the U.S. Forest Service removed dikes from three sites in 1978, 1987, and 1996. Tidal flow to the ocean is currently unobstructed on sites that were previously used as pastureland and/or diked. One wetland on the estuary was never diked and is used as a reference marsh for field research to determine the functional equivalency of restored marshes. Salmon River Estuary has been a site for place-based restoration and studies of community recovery over the last 40 years (Flitcroft et al. 2016). Vegetation has been monitored at this site since dike removal; however, survey records from past researchers are still being deciphered. If sufficient plot data can be recovered and confirmed, future analyses of vegetation patterns over the last 40 years will be investigated further.

Factors influencing the richness and environmental integrity of Salmon River are associated with the physiognomic and taxonomic features of the plant community. This study focuses on the spatio-temporal distribution patterns of Salmon River vegetation to explore how remnant and restored marshes differ in terms of biodiversity and species composition. I expect that different durations of tidal exclusion through dike establishment will reveal differences between sites in the context of plant species composition and distribution.

  2. A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

I have collected species data (ocular estimates of percent cover) from 1 m² plots on transects in four tidal marshes: Mitchell Marsh (dike removed 1978), Y Marsh (dike removed 1987), Salmon Creek Marsh (dike removed 1996), and one remnant marsh adjacent to Y Marsh as a control (never diked). I also collected soil samples at each sampled transect plot and tested them for salinity, conductivity, bulk density, and nitrogen content. Transect plots were square and spaced 50 m apart at increasing distance from the tide.  My objective for data analysis was to describe the spatial patterns of the vegetation communities in tidal marshes of Salmon River Estuary after dike removal. I surveyed a total of 74 one-square-meter plots on transects.

Stohlgren plots, also known as Modified Whittaker plots (MW; 1,000 m²), were established at each marsh site to collect data on species abundance for comparison with transect data. MW plots were implemented to test for patterns of diversity at multiple scales beyond what the 1 m² transect plots may capture within the same site. The restored and remnant sites have three MW plots each, for a total of 12 MW plots. The MW plots were placed at a random distance 50 m along and 20 m offset from the sampled transects at each marsh, for a stratified random sample design. Each MW plot is a minimum of 50 m from the others, depending on where they were placed along the transect. At each MW plot, percent cover and presence/absence of species were estimated (with the aid of 1 m² grids). Samples of all species identified were collected, pressed, and are being examined to confirm identification. Elevation data were obtained from LiDAR surveys in 2015 and used to pinpoint elevation for all plots.

The data sets for my project include spreadsheets that describe percent vegetation cover, elevation, and soil characteristics per transect plot. MW plots were not sampled for soil and thus only have percent vegetation cover and elevation. The spatial grain of my study is one square meter, represented by the size of my smallest sampling frame for plots. With the nested sample plots and my stratified random sampling techniques, I have multiple spatial extents for this project. One extent could be considered the length of a transect (which vary by tidal marsh), the area of a MW plot (1,000 m2), or could arguably be extended to the entire Estuary. There are some interesting temporal aspects to my study as well; three of the four tidal marshes have experienced successive dike removal. These marshes have been surveyed for vegetation cover post dike removal and every 5 years subsequently. Incorporating these historical data will add dimension to my spatial and temporal analysis of variation at my study site.

3. Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

Are tidal marshes that were restored earlier more similar to the Reference Marsh in terms of environmental conditions (soil, elevation) and species composition than those restored more recently?

I predict that restored marshes will be significantly different from the Reference Marsh. I also predict that time since dike removal will not be a strong indicator of similarity between restored marshes and the reference marsh. I anticipate that sites that have experienced recent dike removal will have higher soil salinity and conductivity compared to remnant marsh plots.

Does species richness captured differ between restored and reference salt marshes?

I predict that the Reference Marsh has higher richness compared to restored marshes. I expect that plots from the reference marsh (Transect C, adjacent to Y Marsh) will be more diverse and heterogeneous than tidal marshes that have been diked. I predict that sites that have experienced dike removal, whether more or less recently, will have different species composition and be less diverse compared to reference sites, and I hypothesize that the reference marsh will be the most diverse, with the highest richness and spatial heterogeneity of species throughout.

Do species-area relationships differ between restored and reference salt marshes?

I predict that the Reference Marsh has a greater number of species over area compared to restored marshes. I also predict that there will be spatial correlation of plant species at a larger scale, with fine-scale patchiness within my site, suggesting that there may be ‘nesting’ of species or smaller pockets of diversity within the marsh, with similarities in species assemblages occurring at larger scales.

Do field methods, specifically nested-rectangular (Modified Whittaker) plots and non-nested-square (Transect) plots capture species richness differently?

I predict that Modified Whittaker plots will capture more species than Transect plots, since MW plots will be able to address species richness at greater scales.

4. Approaches: describe the kinds of analyses you completed using your data.

I performed Mantel tests and ISA (Indicator Species Analysis), and produced species-area curves and a map of my site to compare and contrast differences in species assemblages by site. I used PC-ORD, Excel, and ArcMap to complete these analyses.

5. Results: What did you produce/find?

I conducted a Mantel test on all of my data to determine the scale at which my plot data were spatially autocorrelated (self-related). I found that none of my plots were spatially autocorrelated (Mantel’s r statistic, Transect: r = 0.037797, p = 0.182182; MW plot: r = 0.027994, p = 0.164164; failing to reject the null hypothesis of no relationship). This is a little surprising, but may be indicative of noise within my dataset and of variation among species at this scale. It is possible that autocorrelation is not detectable at this scale, or perhaps I need a larger dataset with less proportional noise to sort out the autocorrelation signal. I was, however, able to detect spatial autocorrelation at the 1,000 m² scale of the Modified Whittaker plots (r = 0.638224, p = 0.00010), suggesting that there may be finer-scale patchiness, variation, or nestedness among plant species at each of the SRE tidal marshes. Salt may also be a confounding factor that drives the spatial diversity of vegetation, in addition to dike removal, as salt is a limiting factor for some salt marsh plants; not all species are equally tolerant of it.
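These Mantel tests were run in PC-ORD, but the test itself is just a correlation between the unrolled upper triangles of two distance matrices (here, community dissimilarity and geographic distance), with significance from permuting the rows/columns of one matrix. A Python sketch of that logic (one-sided test for positive correlation; names are mine):

```python
import numpy as np

def mantel(d1, d2, n_perm=999, seed=0):
    """Mantel test: Pearson correlation between two square, symmetric
    distance matrices, with a permutation-based one-sided p-value."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices_from(d1, k=1)               # unrolled upper triangle
    r_obs = np.corrcoef(d1[iu], d2[iu])[0, 1]
    count = 1                                        # include the observed value
    for _ in range(n_perm):
        perm = rng.permutation(d1.shape[0])          # shuffle one matrix's labels
        r = np.corrcoef(d1[np.ix_(perm, perm)][iu], d2[iu])[0, 1]
        count += r >= r_obs
    return r_obs, count / (n_perm + 1)
```

The permutation null respects the dependence structure of distance matrices, which is why a plain correlation p-value is not valid here.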

For the ISA (Indicator Species Analysis) test, which I completed to determine which species were associated with which tidal marsh environments, I found that (1) Triglochin maritima, Sarcocornia perennis, and Distichlis spicata were significant indicators of salty, restored conditions; (2) Deschampsia caespitosa, Juncus arcticus var. littoralis, Potentilla pacifica, Glaux maritima, and Hordeum brachyantherum were significant indicators of upland, high-elevation conditions; and (3) Carex lyngbyei was a significant indicator of restored conditions. I broke my dataset up and conducted a Mantel test for each of the groups, using only plots that recorded the presence of at least one species in each group. I did not find any significant autocorrelation with any of the strong indicator species (all of which were found in a high proportion of the plots surveyed) either. I am curious whether, if my plots had been located closer together and/or I had surveyed more plots over a larger area, spatial autocorrelation patterns would begin to emerge.

An example of PC-ORD output for a Mantel test: the r statistic quantifies the correlation between species cover and distance, and the p-value gives the significance of that correlation relative to chance (computed by Monte Carlo randomizations).

I also produced a map in ArcMap to visualize the patterns of diversity I saw in my data. I coded each of my plot locations (for Transect and Modified Whittaker plots) by the dominant plant strategy type: Carex lyngbyei dominant (a plot with at least 50% cover of Carex lyngbyei), salt dominant (at least 50% cover of one or more salt-tolerant species), or mid-high marsh dominant (at least 50% cover of a mid-high marsh species other than Carex lyngbyei). I think the map helps to visualize spatial patterns of diversity at a broader scale by grouping plots into different assemblages or types/guilds based on their content.

Map of my site, indicating species assemblages by plot location: the Reference Marsh is dominated by high-elevation species, and the previously diked marshes are all low marsh environments, dominated by C. lyngbyei and salt-tolerant species.

I created species-area curves to examine how species richness increased with an incremental increase in area sampled (from 1 to 10, 100, and 1,000 square meters). A species-area curve represents the exponential relationship between species richness and the scale of area; as the scale of area sampled increases, you are more likely to find new species. A steeper slope for a species-area curve indicates higher richness (number of species) and a shallower curve indicates lower richness. One of the things that can distinguish a species-area curve from a species accumulation curve is the relative ‘nestedness’ of the environment being sampled. As I mentioned earlier, ‘nestedness’ is a measure of the structure and distribution of species across a location. An example of nestedness would be a location with few species overall, but with subsets of locations holding more species: pockets of diversity and heterogeneity. When nestedness is high, the slope of the species-area curve is reduced relative to the species accumulation curve; the opposite occurs when nestedness is low. In this case, the slopes of the species-area curves are lower than the slopes of the species accumulation curves for all sites. This suggests that, in terms of the spatial patterns of diversity at Salmon River, there may be nesting or patchiness within tidal marshes, with fewer species overall, especially at restored sites, and pockets of diversity. This inference is consistent with what I observed on the ground in the field.
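The species-area relationship is conventionally modeled as the power law S = cA^z, so the slope z is estimated by fitting a straight line on log-log axes, with the nested MW scales (1, 10, 100, 1,000 m²) as the areas. A Python sketch, where the richness counts in the test are hypothetical illustrations rather than field data:

```python
import numpy as np

def species_area_slope(areas, richness):
    """Slope z and coefficient c of the species-area power law
    S = c * A**z, fitted as a line in log10-log10 space; a steeper z
    means richness grows faster with sampled area."""
    z, log_c = np.polyfit(np.log10(areas), np.log10(richness), 1)
    return z, 10 ** log_c
```

Comparing fitted z values among marshes is one way to make the "steeper slope = higher richness gain with area" comparison above quantitative.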

As you can see from the species-area graph, the reference marsh has the highest richness of all marshes sampled, though Y Marsh is a close second as plot size increases. Mitchell Marsh and Salmon Creek Marsh are both comparably low, suggesting that as the scale of area increases, species richness does not increase by much. This is also a slight deviation from the species accumulation curves, which suggested Mitchell Marsh may be more diverse than Y Marsh at the 1 m² scale on transect plots. While diversity and richness seem to vary among restored marshes by scale, the Reference Marsh has consistently high richness, and Salmon Creek has consistently low richness.

This graph shows the trend in species accumulation: the average number of species encountered per square-meter plot (for transects) over the cumulative number of plots sampled. These data are the result of a combinatorial routine in PC-ORD that accounts for every possible combination of x plots sampled. You can see from this graph that the square-meter plots on the reference marsh accumulate a significantly higher number of species compared to all other marshes. Salmon Creek has the lowest, and Mitchell and Y Marsh are intermediate.

I also compared species accumulation (effort curves) on the square-meter plots within the MW plots I sampled and found that species accumulation patterns were overall consistent. The only difference here is that Mitchell Marsh was found to have higher species accumulation over area than Y Marsh, which in this case was comparable to Salmon Creek’s low diversity. The difference in Y Marsh diversity could stem from the placement of MW plots or from patch variation; both Salmon Creek and Y Marsh have large patches of C. lyngbyei. Ultimately this shows that there is variability in species richness among restored marshes, but consistently high richness on the reference marsh.

I conducted a multivariate analysis of my field data using non-metric multidimensional scaling (NMS), a technique well suited to ecological data that are not normally distributed. NMS avoids linear assumptions and uses ranked distances to linearize relationships within the dataset. This allows the user to see a wider variety of structures within the ordination and to make a number of insightful observations and conclusions. If any one graphic could summarize my entire thesis, this would probably be it, and I will do my best to highlight the most salient features here. First I would like to point out that between the tidal marshes, which are represented by these amorphous colored convex hulls, there is very little overlap, and the reference marsh, on the left here in red, is the most divergent from any other marsh.

The reference marsh is also closely associated with a number of high marsh species that are indicative of native, or ‘reference,’ communities. The reference marsh sits high on the elevation axis, demonstrating that it has high elevation throughout. YMarsh, the blue convex hull at the bottom, has the longest, widest convex hull, which means that elevation varies throughout the site; this is why its sample units fall on the lower end and toward the middle of the elevation axis. The species coded here and directly associated with the YMarsh site are halophytic, or salt tolerant, suggesting that these plants are found at low elevations in areas with high soil salinity and conductivity, which describes the conditions at YMarsh. Mitchell Marsh and Salmon Creek Marsh overlap in this case, likely because they both have lower soil salinity and conductivity values from freshwater influence, and mostly mid to low elevation ranges, with the exception of a few high-elevation outliers. Both of these sites appear to be associated with introduced species (reed canary grass, PHAR) or pasture grasses like Agrostis stolonifera.

Predominantly, there were many instances of homogenous patches of Carex lyngbyei at both of these sites, and at YMarsh as well. In fact, the only point at which all three restored sites converge is at the end of the ‘restoration’ axis, over Carex lyngbyei. Carex lyngbyei is also divergent from all other species sampled, because it often occupies monotypic swaths of marsh. Under my coding scheme, areas like the reference marsh were coded with a low number, and restored marshes were given codes increasing with the chronological order of dike removals; Carex lyngbyei sits at the end of the axis associated with the highest ‘restoration axis’ values, as it represents a strong pattern within all marshes that have experienced dike removal at any point in time, and is thus indicative of restored ecosystems. Unfortunately, no significant differences were found between sites for any of the other soil characteristics we sampled, but this may be more related to sampling technique and would be interesting to revisit. We only collected soil from a surface depth of 10 cm, so perhaps if our samples had been collected deeper, we would see stronger patterns related to pH, bulk density, and C:N ratios. In summary: the reference tidal marsh vegetation is richer, more diverse, and more complex (heterogeneous) in the number and variety of species (high marsh/low marsh) than restored salt marsh vegetation at Salmon River Estuary, across field methods. Roughly 40 years after dike removal, Carex lyngbyei has persistently dominated restored areas, which marks a significant departure from patterns of species assemblage on the reference marsh.

NMS ordination for Transect plots, examining species distribution over tidal marshes, elevation and soil salinity.

I also conducted an NMS analysis with my Modified Whittaker plots, and the species-environment associations I observed with transect methods were consistent here as well, though we did not collect soil samples for MW plots. There is one small exception: the Mitchell Marsh convex hull overlaps with the YMarsh convex hull, which has more to do with the coincidence of similar species found within MW plots on those sites; YMarsh and Mitchell Marsh both had a large presence of C. lyngbyei. Mitchell Marsh also had instances of salt-tolerant species within the MW plots, suggesting that there is variation within Mitchell not only at different scales but at different extents of sampling. This reinforces the notion that Mitchell Marsh has saltier soils than Salmon Creek Marsh, despite both having freshwater influences. Also, despite differences in shape, all of the restored marshes’ convex hulls converge over Carex lyngbyei, the species most strongly correlated with disturbed and restored conditions.

MW NMS that shows consistent species patterns over elevation and tidal marshes. Soil samples were not collected for MW plots.
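For readers without PC-Ord, the NMS approach used above can be approximated with open-source tools. This is a minimal sketch using scipy and scikit-learn; the plot-by-species cover matrix is hypothetical, and Bray-Curtis dissimilarity is a common (assumed, not stated in the post) choice for community cover data.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

# rows = sample plots, columns = species percent cover (hypothetical values)
cover = np.array([
    [80.0,  5.0,  0.0,  0.0],   # C. lyngbyei-dominated restored plot
    [70.0, 10.0,  5.0,  0.0],
    [ 5.0, 40.0, 30.0, 25.0],   # mixed high-marsh reference plot
    [ 0.0, 35.0, 40.0, 20.0],
])

# pairwise Bray-Curtis dissimilarities between plots
dist = squareform(pdist(cover, metric="braycurtis"))

# metric=False gives non-metric (rank-based) MDS, as in PC-Ord's NMS
nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
           random_state=0)
scores = nmds.fit_transform(dist)
print(scores.shape)  # one 2-D ordination score per plot
```

Plotting the two score columns and drawing convex hulls per marsh reproduces the kind of ordination diagram discussed above.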

6. Significance: How is your spatial problem important to science? to resource managers?

Over 40 years later, the tidal marshes of Salmon River Estuary are still very different, and it is possible they were different to begin with, based on their unique geographies, which influence salt inundation and soil patterns. The Salmon River Estuary salt marshes also appear to have responded to and developed from disturbance differently; each is still following a different restoration pathway 40 years later. Soil salinity, elevation, and inundation patterns (channels) vary geographically among these salt marsh sites in the SRE, and have likely played a role in determining species composition by site. Extensive stands dominated by dense cover of C. lyngbyei represent an alternate stable state for vegetation of SRE salt marshes, and would be an important component of understanding novel community functions as they relate to restoration and future scenarios. Species assemblages vary both by biogeography (soil, elevation, location) and land use history (pasture use, diking, dike removal).

The spatial problem I have chosen to investigate is of importance to scientists, as it provides further insight into how Pacific Northwest coastal estuaries recover from land use and disturbance, a phenomenon that has not yet been thoroughly studied. This work is valuable to land managers and conservationists tasked with coastal wetland mitigation in the PNW, as this case study serves as one of the few examples of long-term estuary research on the Oregon coast. Estuaries have historically served as habitat and resources for keystone salmonid fish species, invertebrates, migratory birds, waterfowl, and mammals (such as beavers), particularly in the Pacific Northwest (Klemas 2013). Restoring these habitats is critical for protecting wildlife, managing wetland resources and eco-services, and maintaining our shorelines, especially as we face sea level rise from impending climate change. Threats to environmental stability in the case of wetlands can also harm their cultural value. Wetlands have inherent beauty and are among the many natural systems associated with outdoor recreation. If wetlands are disturbed via pollution, compaction, or compositional change in vegetation, little is understood about the certainty of recovery in the context of reference conditions.

Factors influencing the richness and environmental integrity of estuaries like Salmon River are associated with the physiognomic and taxonomic features of the plant community. This study focuses on the spatio-temporal distribution patterns of Salmon River vegetation to explore how remnant and restored marshes differ in terms of biodiversity and species composition. Salmon River Estuary is especially noteworthy due to its unique management history. Salmon River is exceptional compared to other Oregon coastal wetlands, as it was federally protected before any industries could establish influence. Arguably, Salmon River has avoided most disturbance from development because of its relatively small size; there have been no instances of dredging or jetty construction for the purposes of navigation. Previous use as pastureland, with dikes established in the 1960s and removed from 1978 onwards, encapsulates the majority of known human influence on the marsh. Beginning in 1978, periodic vegetation surveys, combined with long-term ecological research at Salmon River, have created intimate knowledge of the estuary and promoted ecological sustainability.

There has been a dramatic shift with regards to wetland protection and how our government and the public views them. Over the last few decades, policies promoting wetland conversion and development have been exchanged for protection and regulation initiatives. Wetland management goals today are largely focused on restoration to compensate for loss and damage, which has forged new industries tasked with wetland recovery and monitoring. However, some mitigation project datasets and sites are too small to collect useful data or make a meaningful impact on a large environmental scale. It is necessary to amass a variety of high quality data on larger wetland areas over longer periods of time to address how natural recovery processes may be employed for wetland conservation. Salmon River is an excellent long term case study for examining the prospect of rehabilitation for ecosystem functionality and reference conditions (Frenkel and Moran 1991; Frenkel 1995; Flitcroft et al. 2016).

So to recap, there seems to be a false association between restoration and reference conditions in the case of Salmon River. Though Salmon River Estuary is an example of successful restoration, it does not mirror pre-disturbance ecosystem structure. It therefore seems challenging to manage for pristine environments: pre-disturbance conditions are often not well known (or pristine, for that matter), the impacts of disturbance may persist over long periods of time (like C. lyngbyei), and both intact and disturbed wetlands are changing constantly, so it is impossible to protect them from undue influence. However, we can continue to define and promote functionality in ecosystems, as function may change with structure. As a result of the work I have done, I would recommend adapting our expectations for Salmon River and for the passive, deliberate restoration of estuaries. I would also recommend continuing to restore for function and to monitor structural changes so we can understand and infer novel function in an ecosystem context (Gray et al. 2002).

7. Your learning: what did you learn about software (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) other?

I have previous experience with Arc that I have developed and expanded upon at Oregon State while pursuing my MS in Geography. I was able to do some work in Arc with mapping my data, but I drew on knowledge I had acquired previously. Ultimately, I have analyzed most if not all of my data thus far in PC-Ord, under the guidance of Dr. Bruce McCune (the software’s creator), which enables a variety of ordination and multivariate analysis options. I learned how to conduct Mantel tests, ISA, and other multivariate comparisons within PC-Ord. I did not learn much more about Python or R, beyond what other students were able to demonstrate in their tutorials.

8.What did you learn about statistics, including (a) hotspot, (b) spatial autocorrelation (including correlogram, wavelet, Fourier transform/spectral analysis), (c) regression (OLS, GWR, regression trees, boosted regression trees), and (d) multivariate methods (e.g., PCA)?

I learned that statistics programs like PC-Ord can be preferable for datasets with lower or finer-scale spatial resolution; my data were difficult to use in Arc because their format was not easy to interpolate. As a result, I learned a lot about ordination techniques such as PCA and NMS to tease out patterns in my data and to look at how species assemblage patterns vary by environmental conditions and site treatments (dike removal). However, I found it helpful to visualize my data in a map, even though I was limited to the point locations of my plots, and to categorize them based on spatial patterns from my plot data (percent cover of species). I also learned how to conduct a Mantel test on my data at a variety of spatial scales to look for autocorrelation.
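A Mantel test like the one mentioned above can also be scripted without PC-Ord. Here is a minimal permutation version: it correlates the unique off-diagonal pairs of two distance matrices and builds a p-value by permuting one matrix; the coordinate-derived matrices in the usage example are toys, not the actual plot data.

```python
import numpy as np

def mantel(d1, d2, n_perm=999, seed=0):
    """Mantel r between two symmetric distance matrices + permutation p."""
    rng = np.random.default_rng(seed)
    n = d1.shape[0]
    iu = np.triu_indices(n, k=1)              # each site pair counted once
    r_obs = np.corrcoef(d1[iu], d2[iu])[0, 1]
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(n)                # shuffle rows/cols of one matrix
        r = np.corrcoef(d1[p][:, p][iu], d2[iu])[0, 1]
        if abs(r) >= abs(r_obs):
            count += 1
    return r_obs, (count + 1) / (n_perm + 1)

# usage: geographic distances vs. a linearly related "community" matrix
pts = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
d_geo = np.sqrt(((pts[:, None] - pts[None]) ** 2).sum(-1))
r, p = mantel(d_geo, 2 * d_geo)
print(round(r, 3), p)
```

Running this at several distance bands (subsetting pairs by separation) mirrors the multi-scale autocorrelation check described in the text.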

From learning about other student’s tutorials, I learned about geographically weighted regression (GWR), and how one may examine clustering of particular environmental conditions with the GWR tool in Arc. GWR can show that certain environmental characteristics  (like canopy cover, understory vegetation cover, elevation) show positive or negative correlation in different locations. I also learned about hotspot analysis from student tutorials as well and found that it can also be used to infer spatial relationships between environmental variables and location. Hotspot analysis can be useful for looking at density of populations or biodiversity.

9. How did you respond to comments from your peers and mentors/instructors?

I received useful comments from my peers and the instructor (Dr. Julia Jones) about considering how salt inundation on the tidal marshes I study may have a causal relationship to differences in species assemblages, in addition to restoration treatment (diking and dike removal). It is important to acknowledge confounding factors within my data, as my study is inductive. Dr. Jones also originally suggested that I investigate spatial autocorrelation with Mantel tests, to see how variable species assemblages are within my data. I was able to incorporate feedback that helped me with the formation of my analysis and the interpretation of my results.

Literature Cited

Adamus, P.R., J. Larsen, and R. Scranton. 2005. Wetland Profiles of Oregon’s Coastal Watersheds and Estuaries. Part 3 of a Hydrogeomorphic Guidebook. Report to Coos Watershed Association, US Environmental Protection Agency, and Oregon Department of State Lands, Salem.

Borde, A.M., R.M. Thom, S. Rumrill, and L.M. Miller. 2003. Geospatial habitat change analysis in Pacific Northwest coastal estuaries. Estuaries 26: 1104-1116.

Flitcroft, R.L., D.L. Bottom, K.L. Haberman, K.F. Bierley, K.K. Jones, C.A. Simenstad, A. Gray, K.S. Ellingson, E. Baumgartner, T.J. Cornwell, and L.A. Campbell. 2016. Expect the unexpected: place-based protections can lead to unforeseen benefits. Aquatic Conservation: Marine and Freshwater Ecosystems 26: 39-59.

Mather, P.M. 1976. Computational Methods of Multivariate Analysis in Physical Geography. J. Wiley & Sons, London. 532 pp.

McCune, B. and J.B. Grace. 2002. Analysis of Ecological Communities. MjM Software, Gleneden Beach, Oregon, USA.

National Soil Survey Center, Natural Resources Conservation Service, U.S. Department of Agriculture. 2009. Soil Survey Field and Laboratory Methods Manual. Soil Survey Investigations Report No. 51.

Stohlgren, T.J., M.B. Falkner, and L.D. Schell. 1995. A Modified-Whittaker nested vegetation sampling method. Vegetatio 117: 113-121.

Weilhoefer, C., W.G. Nelson, P. Clinton, and D.M. Beugli. 2013. Environmental determinants of emergent macrophyte vegetation in Pacific Northwest estuarine tidal wetlands. Estuaries and Coasts 36: 377-389.

Nitrate Concentrations in Space and Time and how they relate to Red Alder Cover in different watersheds

Filed under: 2017,Final Project @ 10:38 am

My Spatial Problem Blog Post (Revised)

By: Nicole Feiten

Research Description

I am exploring aquatic chemistry in 10 streams of the Oregon Coast Range, running anion analyses on monthly samples over a one-year period. These streams fall into three categories: three are located east of Mary’s Peak in the Rock Creek watershed, three near the top of Mary’s Peak (highest in elevation), and four on the Oregon coast within 200 meters of the Pacific Ocean. I would like to compare anion concentrations, especially nitrate, between the site locations and determine whether there is any significant difference in anion abundance between locations and over time (6 months of data).

Research Questions

Is there a relationship between nitrate concentration data and site locations?

Do the differences in nitrate concentrations indicate a pattern between site locations, and do they show a pattern over time?

Is there a relationship between nitrate data and red alder tree cover?

Is there a relationship between nitrate concentrations and land use practices?
Dataset Description

The dataset I will be analyzing includes temporal anion data (for 6 months, project still in progress) and GPS points for all 10 site locations. Additionally, I used NLCD land use data sets and Red Alder Cover data sets from the Oregon Spatial Database Library. My spatial problem focused on nitrate concentration data collected at the 10 sites and the relationship between sites spatially and temporally.

The temporal scale for this analysis includes one sample per month for a six-month period.

Spatially, the sites are located some distance apart, listed in the table below in kilometers. The distances reflect Euclidean distance; stream distances were not used, as not all sites were connected within the same stream network.



Distances   Griffith  Middle F.  Kaiser  Yew    Crooked  Cape   Gwynn  L.Cumm  Cumm
Rock        1.6       1.91       6.44    8.76   11.62    58.07  58.78  59.04   58.7
Griffith              1.12       5.23    7.63   10.65    56.62  57.41  57.67   57.27
Middle F.                        6.17    8.61   11.7     57.23  58.09  58.23   57.9
Kaiser                                   2.47   5.61     51.85  52.54  52.79   52.36
Yew                                             3.2      49.89  50.47  50.71   50.24
Crooked                                                  47.65  48.32  48.35   48.09
Cape                                                            1.29   1.66    1.47
Gwynn                                                                  0.399   0.422
L. Cumm                                                                        0.338

Table 1 Distance between sites in kilometers (upper triangle only; read from row site to column site)
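The Euclidean distances in Table 1 can be reproduced outside Arc from projected site coordinates. A minimal sketch, assuming hypothetical UTM coordinates (the real GPS points are not reproduced in this post):

```python
import math

# hypothetical projected coordinates (UTM easting/northing, meters) --
# stand-ins for the actual site GPS points
sites = {
    "Rock": (430500.0, 4920300.0),
    "Griffith": (431900.0, 4921100.0),
    "Cape": (378200.0, 4899800.0),
}

def dist_km(a, b):
    """Euclidean distance between two named sites, in kilometers."""
    (x1, y1), (x2, y2) = sites[a], sites[b]
    return math.hypot(x2 - x1, y2 - y1) / 1000.0

print(round(dist_km("Rock", "Griffith"), 2))
```

Looping over all site pairs and tabulating the upper triangle yields the same kind of distance table produced with the Arc measure tool.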



I hypothesize that nitrate concentrations will be more similar at sites that are closer together and more different the further apart the sites are located, a classic scenario of spatial autocorrelation. I also suspect that stream nitrate concentrations will be higher in lower-elevation streams than in higher-elevation streams. Looking further, in my red alder percent-cover analysis, I suspect sites with higher alder cover will have higher stream nitrate concentrations. Lastly, I will analyze land use in each of the watersheds; I suspect that watersheds with more agriculture and other anthropogenic activity will have higher nitrate concentrations.

Analytical Approaches

Initially I performed the analysis in Microsoft Excel; I was interested in whether sites that are closer together show similar concentrations of nitrate. First I calculated Euclidean distances in ArcGIS using the measure tool. After obtaining those distances, I used the CORREL function in Excel to correlate the known nitrate concentrations at each site (averaged over the 6 months) with the distances between sites.


Table 2 Correlation of nitrate concentrations and site distances

Figure 1 Correlation Analysis showing clustering of sites that are closer together


Next, in another Excel analysis, I sorted the concentration data among streams in the different locations. This allowed me to examine nitrate concentrations between points (stream sites) and analyze their relationships over time.

Figure 4 Concentration of Nitrate over time in the Coastal stream sites

Finally, I used the Oregon Spatial Database Library to clip layers for percent alder cover in each watershed and for land use practices, in order to identify relationships between nitrate concentrations, alder cover, and land use.


Table 1 Nitrate Concentrations and Percent Alder Cover

Watershed              Area (m²)    Alder (m²)  Percent Alder Cover  Average Nitrate (ppm)
Cape Creek             4,353,700    3,537.74    0.081%               1.88
Crooked Creek          31,760,400   3,226.39    0.010%               1.35
Cummins Creek          21,388,800   222.89      0.001%               1.44
Gwynn Creek            2,865,000    5,371.57    0.187%               1.61
Little Cummins Creek   3,056,600    361.35      0.012%               1.72
Yew Creek              9,786,600    969.92      0.010%               1.01
Kaiser Creek           100,500      1,017.97    1.013%               0.23
Rock Creek             38,375,500   2,359.26    0.006%               0.19
Middle Fork Creek      3,294,000    339.23      0.010%               0.17
Griffith Creek         4,434,800    2,021.86    0.046%               0.13

Figure 5 Relationship between percent alder cover and nitrate concentration
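As a sanity check on the attribute-table join behind the percent-cover column, the percentages can be recomputed from the two area columns; two rows from the table above are used here.

```python
# watershed area and alder area (both m^2), taken from the table above
rows = [
    ("Cape Creek", 4_353_700, 3_537.74),
    ("Gwynn Creek", 2_865_000, 5_371.57),
]
for name, watershed_m2, alder_m2 in rows:
    pct = 100 * alder_m2 / watershed_m2     # alder as % of watershed area
    print(f"{name}: {pct:.3f}%")            # matches Percent Alder Cover
```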



In addition to the figures above, I produced distance and correlation tables, as well as scatter plots, to analyze relationships. Below are examples of maps I produced in Arc by clipping land use cover and watershed boundaries for each stream site. Alder coverage was produced by joining the attribute tables for the alder layer and the watershed-area shapefiles. Even though the correlation between alder cover and nitrate concentrations was not strong, further research restricted to the stream riparian area may show different results.

I found larger differences (indicating higher nitrate concentrations) in the mainstem rivers than in the tributaries. Additionally, I saw higher concentrations of nitrate in the spring samples than in the winter samples. I expected higher nitrate values in mainstems such as Rock Creek due to the volume of water arriving from upstream; naturally the smaller tributaries would exhibit lower concentrations than the larger-order stream that catches the two. I had also expected nitrate differences and concentrations to decrease in the spring: if nitrogen inputs to the system remain relatively stable, then the uptake length of nitrate (the nutrient most sought after in stream nutrient cycling) would shorten and cause concentrations to decrease.

Overall, nitrate concentrations over the past year in the ten watersheds are relatively low (~0.1-1 ppm), and the results obtained were not particularly surprising. There was no clear pattern linking red alder cover and nitrate concentrations in these watersheds (Table 1). Additionally, there were no surprise land use features in the watersheds, and thus no real link to nitrate concentrations in this analysis. Most of the land in these watersheds is occupied by mixed and evergreen forests, neither of which is particularly linked to elevated nitrate concentrations. This analysis was consistent with my expectations: starting with low nitrate concentrations, and suspecting that most of the land use would be forest cover, I predicted there would likely not be a strong link between land use and nitrate concentration. However, I expected slightly more red alder cover in the coastal watersheds and was surprised how little that cover related to nitrate concentration.


Figure 6 Coastal Watersheds Landuse

Figure 7 Coast Range Watersheds Landuse


Scientific Significance

Baseline data collection for anions in Oregon Coast Range streams over a full year has not been documented before. In the face of a changing climate, it is important to document and characterize stream chemistry for forest managers of the future. Increases in nitrate in these streams can affect local flora and fauna, as well as increase nutrient loading to the ocean. This loading can affect macroinvertebrate and salmon habitat, altering the ecosystem dynamics and services that currently exist.


What I learned about Software and Statistics

I have basic knowledge of R, and I think it would be the most useful tool for statistical analysis of the data. Arc-Info is a program I at one time used daily, but it has been many years since. I would like to be able to map the anion data between the 10 sites but may struggle in the beginning with linking the data to Arc. I have no experience with Python or other relevant spatial analysis software.


This class taught me more about spatial autocorrelation, especially because my data were grouped in this way. Since I performed the majority of my analysis in Excel, I didn’t expand my knowledge in ARC GIS as much as I had hoped. However, I did learn that if you don’t need sophisticated software to produce your analysis, it is OK to use what you already know. As I move forward in my research I am looking forward to studying multivariate methods and using the software PC-Ord.



Comment Feedback

Tutorial 1 Comments

The most useful comments on Tutorial 1 concerned having a more structured, step-by-step process in my design. I agree that if this tutorial were to help someone complete the same or a similar analysis in the future, more detailed instructions would be useful. I updated my tutorial to include step-by-step instructions in the methodology of my analysis.


Tutorial 2 Comments

In Tutorial 2 I was able to clearly relay the aims and importance of my study. As with most feedback, I was pleased to hear suggestions for additional analyses of clusters of data points that may point me toward understanding more about alder cover and nitrate fixation in streams. Perhaps a buffer layer around the sites, with the cover clipped to just the buffer, would give me a better picture of the linkages between red alder and nitrate concentrations. In the future, I would like to continue to use the alder layers, with a buffer added, and potentially perform analyses involving the concentrations of other anions as well.


Comments from Dr. Jones


The comments I received from Dr. Jones were overall positive and insightful. On exercise one, I received feedback about the potential for nitrate to be transported via dispersal or diffusion, which I was unable to justify given the dataset I was working with. I also found a negative correlation between Rock Creek and its neighboring creeks; I decided that Rock Creek, with its higher volume of water and larger watershed area, would likely have produced a different concentration of nitrate than the other two.


Exercise two comments included thoughts about the equation I chose, which was difficult for me to explain, as I have not used equations to describe data in the past. The distances between the sites were measured using Euclidean distance in Arc. There was also a comment on my explanation of higher concentrations in Rock Creek being due to a higher concentration of water coming from upstream; I realize now this was incorrect, and what I actually meant is that the concentration is contributed to by the larger volume of water in Rock Creek and, additionally, its larger watershed.



The third exercise brought a question about how alder cover may not be associated with nitrate concentrations. After thought, I realized that maybe it is not the percentage of Alder cover in the entire watershed that may affect stream nitrate concentrations, but potentially just the trees in the riparian buffer zone. In future work, I would like to clip buffer zones around the stream and analyze both alder cover and land use cover in just this area and run a correlation with stream nitrate concentrations.





June 11, 2017

My Spatial Problem: Revised

Filed under: Final Project @ 11:48 pm

I am interested in understanding how mammals have responded to changes in their habitat through time and how the distribution of resources on the landscape structures small mammal communities, historically and today. I predominantly utilize stable isotope analysis of carbon and nitrogen to evaluate niche dynamics of rodents along elevational gradients and through time, approximately the last 8,000 years, in the Great Basin of the Intermountain West of North America. In this regard, I want to know: how do rodents in the Great Basin respond to changes in the resource base driven by human land use and climate change?

Original spatial problem.

When I outlined my original spatial problem, I was interested in replicating a geospatial analysis performed by Elsen and Tingley in their study, “Global mountain topography and the fate of montane species under climate change,” published in the journal Nature Climate Change in 2015. In their analysis, Elsen and Tingley calculated the proportional distribution of horizontal surface area along the slope of mountain ranges, with the objective of evaluating the assumption that available area (habitat) decreases with increasing elevation. This is an important assumption to test, as it has direct implications for the way ecologists think about biodiversity in mountains. Specifically, it has generally been assumed that species richness decreases with increasing elevation due to decreasing area, as would be predicted by the species-area relationship. I wanted to perform this analysis on three mountain ranges in the Great Basin that are of particular interest for my dissertation research. However, the datasets Elsen and Tingley used in their analysis were far too large for me to handle on my own, and my knowledge of the requisite skills far too limited to perform the analysis. Consequently, my spatial project began to morph, and it followed a circuitous trajectory for the remainder of the term.

Exercise 1: Hot spot analysis of the North American deer mouse in the Toiyabe mountains.

Upon choosing to ditch the area-by-elevation analysis I thought I would dig into a dataset that I had explored previously with other methods. This dataset contained two matrices of data collected in the Toiyabe mountain range, NV in both 1930 and 2011. The first matrix was a species by site matrix which included the trap-effort standardized abundance of ~20 small mammal species, predominantly rodents; the second matrix contained site characteristic data including elevation, latitude, longitude, mean summer high temperature, mean winter low temperature, and mean monthly precipitation. In the past I had explored this data using principal components analysis and non-metric multidimensional scaling. I wanted to know which factors explained changes in elevational distribution of species between these two time-periods. I found that climate factors described most of the variation. Notably, all climate factors were collinear with elevation, either increasing or decreasing in proportion with elevation.

Given the spatially explicit nature of this dataset, I thought it would be interesting to perform hot spot analysis on the most abundant species in the dataset. In performing hot spot analysis on the distribution of this species, I hoped to gain insight into the influence it might have on the spatial structuring of the rodent community in the Toiyabe mountain range at large. However, what this analysis revealed is that transect data are inadequate for evaluating this type of question. Spatial analysis of this nature requires a distinctly different sampling design; specifically, a grid across some larger portion of the mountain range would have been ideal for the question I was hoping to answer. Instead, I discovered that hot spot analysis of transect data reveals more about the behavior of those doing the trapping than it does about the mammals being trapped.
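Hot spot analysis of this kind is typically the Getis-Ord Gi* statistic. A minimal sketch follows, assuming binary neighbor weights within a fixed distance band; the abundances and coordinates are toy values arranged along a transect, not the actual Toiyabe trap data.

```python
import numpy as np

def gi_star(values, coords, band):
    """Getis-Ord Gi* z-scores with binary weights within `band` distance."""
    n = len(values)
    d = np.sqrt(((coords[:, None] - coords[None]) ** 2).sum(-1))
    w = (d <= band).astype(float)          # Gi*: a site is its own neighbor
    xbar, s = values.mean(), values.std()  # global mean and (population) sd
    z = np.empty(n)
    for i in range(n):
        wi = w[i]
        sw, sw2 = wi.sum(), (wi ** 2).sum()
        num = (wi * values).sum() - xbar * sw
        den = s * np.sqrt((n * sw2 - sw ** 2) / (n - 1))
        z[i] = num / den
    return z

# toy transect: six equally spaced sites, abundance high at one end
coords = np.column_stack([np.arange(6.0), np.zeros(6)])
abund = np.array([9.0, 8.0, 9.0, 1.0, 2.0, 1.0])
z = gi_star(abund, coords, band=1.5)
print(z.round(2))  # positive z = cluster of high values, negative = low
```

The one-dimensional layout here also illustrates the limitation noted above: along a transect, "hot spots" are constrained to the sampled line, so they mostly echo where trapping effort was placed.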

Exercise 2: Energy flow in the Great Basin rodent community and a test of space for time substitutions.

Following this I chose once again to shift gears and leverage the elevational information in the Toiyabe dataset. Again, I borrowed inspiration from recent research, this time more directly related to my own work. In 2015, Terry and Rowe published “Energy flow and functional compensation in Great Basin small mammals under natural and anthropogenic environmental change” in PNAS. They demonstrated that daily energy flow (kJ/day) through the Great Basin rodent community responded differentially to the changing climate from the Holocene through to the modern era. Furthermore, they demonstrated that functional groups within the rodent community also responded differentially through time, and that many of the patterns that had held through the Holocene changed dramatically at the onset of the Anthropocene. I wanted to know whether the dynamics demonstrated by these functional groups through time could be approximated along a spatial gradient that captures a similar change in climate; in this regard, I was testing a space-for-time hypothesis. I found that many of the patterns observed through the Holocene were well approximated by the spatial substitution. However, the analogy between elevation and time did not hold into the Anthropocene. Although my interpretation was predominantly qualitative, I feel comfortable claiming that substituting the rodent community along the elevational gradient of the Toiyabe mountain range for the rodent community of the Great Basin through time approximates patterns from the end-Pleistocene to the end-Holocene, but is decoupled from the dramatic changes of the Anthropocene.

In response to peer feedback I have set out to determine if the patterns observed in the above analysis would hold in a second mountain range. This analysis is in progress, and I hope to add to it, eventually, a third mountain range. I have in hand the data for the Ruby mountains in the Great Basin, and will soon have access to a similar dataset for the Snake range.

Exercise 3: Creating a map of NDMI for the Toiyabe mountain range in the Great Basin.

Despite the interesting findings of the second part of my analysis, I felt as though my work in this class had become tangential to my own interests and the focus of my dissertation. Although it is unfortunate that it took most of the term to come to this realization, it has not been a loss. I have a new target: the normalized difference moisture index (NDMI), which uses Landsat 7 data to estimate vegetation density and soil moisture content. I want to use these data to develop a better understanding of the spatial distribution of vegetation in the Toiyabe and Ruby mountain ranges (the focus of my dissertation). I have several questions in mind: 1) how does NDMI vary with elevation and aspect? 2) what is the NDMI value at each of the trap sites in the Toiyabe and Ruby mountain ranges? 3) are there correlations between trap-line species abundances and NDMI upon which to base predictions about range-wide distributions of species? And 4) what is the degree of patchiness of NDMI in these two mountain ranges, respectively, and what predictions might we make about how this could limit or promote the movement of species laterally or up and down slope?

The above outlined questions would all be exceptionally useful to address, and I hope to do so as I become more comfortable with the tools available in ArcMap and CRAN-R for these types of analyses. In the interest of exploring these data I have calculated NDMI for the Toiyabe mountain range using Landsat 7 data for June 2011, a time during which small mammal trapping at this site occurred. While I have not yet been able to address the questions I outlined above, I have generated a map showing the NDMI at a 30m x 30m resolution. The color scale for this image is continuous from blue, to green, to yellow, to orange, to red, reflecting the range of possible values for each grid cell from -1 (low moisture, low vegetation density) to 1 (high moisture or high vegetation density). What is most apparent in this map is that peaks in the mountain range, especially those with snow remaining, have particularly high NDMI values, while much of the remaining landscape appears uniform with intermediate to low NDMI values. While the general impression given by this map is not surprising given the arid environment, the distribution of NDMI values evaluated at different scales could prove interesting.
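The NDMI itself is simple band arithmetic: (NIR − SWIR1) / (NIR + SWIR1), which for Landsat 7 ETM+ uses band 4 (NIR) and band 5 (SWIR1). A minimal numpy sketch (the map described above was produced with GIS tools, not this code):

```python
import numpy as np

def ndmi(nir, swir1):
    """NDMI = (NIR - SWIR1) / (NIR + SWIR1).

    For Landsat 7 ETM+, NIR is band 4 and SWIR1 is band 5.
    Returns values in [-1, 1]; cells where both bands are zero become NaN.
    """
    nir = np.asarray(nir, dtype=float)
    swir1 = np.asarray(swir1, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (nir - swir1) / (nir + swir1)
```

Applied to two co-registered 30 m reflectance arrays, this returns the per-cell NDMI grid that the map visualizes.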

The most valuable thing I learned through this course was not a software package or a particular statistical method. I learned what type of data is truly necessary to perform spatial analyses, and how to get it. Specifically, I learned that there is a vast wealth of Landsat data available for free and that it can be clipped down to manageable sizes for the analyses I am interested in. While I still feel somewhat uncomfortable finding datasets, downloading them, and navigating the endless options of ArcMap, I have also made a new connection with an incredibly knowledgeable graduate student, Jessica Pierson. I hope that with her help, and that of several other peers far more knowledgeable in these methods than I am, I can become confident in accessing data and performing the necessary analyses in ArcMap.

Reflection on Julia’s guidance and peer feedback

I have come to this point in my spatial project by trial and error and with the encouragement of Julia Jones. Julia’s advice, as I progressed through this course, was to give things a shot, and if they didn’t work that would be fine, we could just try something else. This low pressure, low stakes approach was very helpful. This has been a particularly busy term, especially outside of this class, and outside academia as well. Knowing that this course would be student driven, and what I got out of it would be entirely up to me, felt daunting at first, but I have come to accept that the bulk of my learning happened at the tail end, and I’m excited that the product I am walking away with is new questions to ask and new methods to learn. I did not bend my data just to perform new analyses if I did not think they would be informative, nor did I choose to recycle old analyses. Following Julia’s guidance, I gave a few new ideas a shot, and while some of them didn’t work out, I am excited to take what I did learn beyond the context of this class to inform important aspects of my dissertation.

Feedback from my peers on the first tutorial suggested that a different experimental design (e.g., data collection) would have been more appropriate for hotspot analysis, or that I change my approach to one that could leverage the structure of my spatial data. I responded by shifting from lat-long coordinates to elevation as my spatial parameter. Peer feedback on my second tutorial was largely limited to acknowledgment of my limited sample size (a single mountain range), noting that by performing similar analyses in additional mountain ranges I might be able to perform statistical tests such as linear regression. As stated above, this is underway and will be a product outside the context of this class, as both my graduate major advisor and I are interested in how energy flow through functional groups varies among mountain ranges, and whether the spatial findings will be concordant with the temporal analysis.

Patterns of socio-demographic and spatial variables from survey responses about attitudes regarding flood-safety in neighborhoods

Filed under: Final Project @ 10:56 pm

1. Research question.

For this problem I want to answer:

How do socio-demographic and spatial variables explain patterns of survey responses about attitudes regarding flood safety in neighborhoods?

2 Description of the dataset.

The dataset for the study has the following characteristics:

  • Data were obtained from a voluntary household survey
  • Convenience sampling: households located within the boundaries of FEMA’s 100-year and 500-year flood hazard map, all surveyed at once
  • Each participant answered a socio-demographic and predefined intentions questionnaire
  • A printed, coded survey questionnaire was mailed to residents; the code is used to identify resident addresses
  • 103 variables were collected
  • Most of the variables are categorical

      An example of a typical variable to be analyzed:

  • Variable: Suppose your current home was to flood, how confident are you in the following possible conditions? – I will be able to evacuate my home before flooding begins (This variable is expressed in 5 categories of discrete values without hierarchy):
  • Categories:
    • Very confident
    • Confident
    • Neutral
    • Somewhat confident
    • Not confident at all

The spatial data consist of a map identifying land properties within the boundaries of FEMA’s 100-year and 500-year flood hazard map, as shown in Figure 1. The survey was mailed to randomly selected properties within this map’s boundaries.

Figure 1. Map for South Corvallis affected properties according to FEMA’s 100 year and 500 year flood categories.

3. Hypotheses:

Attitudes regarding flood-safety in neighborhoods are clustered according to socio-demographic and spatial factors.

4. Approaches:

After testing various geospatial approaches, such as hotspot analysis and kriging, I ended up clustering my data using spline interpolation. For clustering, the category values of each variable were taken as the “Z” values for spline interpolation, with each variable interpolated independently.
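As an illustration of this step, thin-plate spline interpolation of ordinal category codes over x/y coordinates can be sketched as follows. The points and codes here are hypothetical, and the actual analysis was done with GIS tools:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Hypothetical survey points: x/y coordinates with ordinal category codes
# (1 = very confident ... 5 = not confident at all) used as "Z" values.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]])
z = np.array([1.0, 5.0, 2.0, 4.0, 3.0])

# Thin-plate spline surface through the ordinal codes, sampled on a grid
spline = RBFInterpolator(pts, z, kernel="thin_plate_spline")
gx, gy = np.meshgrid(np.linspace(0, 1, 25), np.linspace(0, 1, 25))
grid = spline(np.column_stack([gx.ravel(), gy.ravel()])).reshape(gx.shape)
```

With zero smoothing (the default) the surface passes exactly through the survey points, and the interpolated grid can then be mapped or thresholded into clusters.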

A Principal Component Analysis (PCA) and Factor Analysis (FA) were then applied to the collected data to find multivariate clusters.

Figure 2. Principal Component Analysis (PCA).

5. Results:

I found statistical relationships among the categorical variables collected from the survey, according to their spatial locations, that define pattern formation, as well as maps of these relationships within the FEMA 100-year and 500-year map boundaries shown in Figure 1.

Bootstrapped confidence intervals for raw and composite correlations were computed using R. The results of this analysis using the Spearman correlation are shown in Figure 3.
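A percentile-bootstrap confidence interval for a Spearman correlation, of the kind computed here in R, can be sketched as follows (the function and parameter names are illustrative, not those of the R routine used):

```python
import numpy as np
from scipy.stats import spearmanr

def bootstrap_spearman_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the Spearman correlation of two variables."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    stats = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)       # resample observation pairs
        rho, _ = spearmanr(x[idx], y[idx])
        stats[i] = rho
    # nanquantile guards against degenerate (constant) resamples
    lo, hi = np.nanquantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

Running this per variable pair and plotting the intervals reproduces the kind of display shown in Figure 3.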

Figure 3. Bootstrapped confidence intervals for raw and composite correlations for 108 variables

Using spline interpolation, different clusters were found by analyzing each variable independently. Category values were taken as the “Z” values for the spline interpolation of each independent variable. For instance, the “Perception of flooding extent” figure below shows four categories interpolated spatially. The other variables were studied similarly.

Statistical analysis of the collected data was performed using R. The figure “Self-confidence in being flood-ready” below compares people’s answers. Five variables are analyzed with answer categories from 1 to 5, where 1 is very confident and 5 is not confident at all. People are most confident about timely house evacuation and least confident in flood insurance coverage. The other variables were studied similarly.

A preliminary examination of the collected data indicates that 27 variables are suitable for multivariate clustering. A Principal Component Analysis (PCA) and Factor Analysis (FA) were applied to the collected data. The figure “Eigenvalues of principal factors with 27 variables” shows variances greater than 1, indicating that multivariate clustering can be performed. A two-factor analysis indicates loadings for the clustered variables.
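The eigenvalue screen described above (retain factors with variance greater than 1, the Kaiser criterion) amounts to an eigendecomposition of the correlation matrix. An illustrative sketch, not the R routine actually used:

```python
import numpy as np

def pca_eigenvalues(data):
    """Eigenvalues of the correlation matrix of an (n_obs x n_vars) table,
    sorted in descending order. Under the Kaiser criterion, factors with
    eigenvalue > 1 are retained for further analysis."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]
```

For the 27 survey variables, the count of eigenvalues above 1 gives the number of factors worth rotating and inspecting for loadings.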


6. Significance:

This research will contribute to policy and decision making for neighborhood adaptation to climate change. Adaptation policies can be better targeted by identifying residents’ attitude patterns. Identifying patterns of attitudes regarding flood safety in neighborhoods is important for planning adaptation to different flooding scenarios in order to minimize personal risks.

7. Learning about geospatial software:

I have improved my knowledge of ArcGIS, ModelBuilder, and GIS programming in Python, all for geospatial processing. In addition, I have improved my knowledge of R for statistical analysis.

8. Learning about statistics:

I have added Principal Component Analysis (PCA) and Factor Analysis (FA) to my current statistical knowledge.

9. Answer to comments:

Primarily, I have modified my project title. This change was based on the change in my research question. Also, I have tried different methods in order to make my categorical variables suitable for clustering.



Beetle outbreaks and wildfire: overlapping in time and space

Filed under: Final Project @ 10:53 pm


Often landscape-scale disturbances are singular events, but contemporary disturbance regimes demonstrate that large-scale disturbances can overlap in time and space.  The spatial configuration of an initial disturbance may influence the process and subsequent spatial configuration of a secondary disturbance, such as a beetle outbreak followed by fire.  The cumulative effects of sequential disturbances could influence forest structure and successional trajectories through the next century for landscapes in central interior British Columbia affected by beetle outbreaks and wildfire.

Figure 1: Post-fire photo of a fire that burned through beetle-killed forest in 2014 with various patches of fire severity.

Research Question

How is the landscape variability in fire disturbance related to the landscape variability in antecedent beetle outbreak disturbance?

Data Set

Landsat imagery was used for the analysis. Landsat data is 30-meter resolution multispectral data.  A single Landsat scene contained the wildfire event of interest (Row/Path 50/23).  Pre- and post-fire images were acquired with minimal cloud coverage and anniversary dates to match phenology.  These images were used to generate a burn severity map through the differenced Normalized Burn Ratio (dNBR) (Key and Benson 2006). The pre-fire NBR image was used to characterize the beetle outbreak on the pre-fire landscape. Pixel indexing formula:


dNBR = NBRprefire – NBRpostfire
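NBR itself is (NIR − SWIR2) / (NIR + SWIR2), which for Landsat 7 ETM+ uses band 4 (NIR) and band 7 (SWIR2). The two pixel-indexing steps can be sketched in numpy (an illustration, not the processing chain actually used):

```python
import numpy as np

def nbr(nir, swir2):
    """NBR = (NIR - SWIR2) / (NIR + SWIR2); for Landsat 7 ETM+ these are
    bands 4 and 7. Cells where both bands are zero become NaN."""
    nir = np.asarray(nir, dtype=float)
    swir2 = np.asarray(swir2, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (nir - swir2) / (nir + swir2)

def dnbr(nbr_prefire, nbr_postfire):
    """dNBR = NBR_prefire - NBR_postfire; larger values indicate more
    severe burning."""
    return np.asarray(nbr_prefire) - np.asarray(nbr_postfire)
```

Applied to co-registered pre- and post-fire reflectance arrays, these two functions yield the burn severity raster analyzed below.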


My ecological null hypothesis is that burning conditions (i.e., weather on the day of burning) exert strong control over patch size.  Fire-conducive weather (dry and windy conditions) controls the disturbance variability across the landscape by facilitating crown fire in lodgepole pine dominated landscapes.  The ecological alternative hypothesis is that beetle outbreaks change the landscape structure by killing trees, which results in decreased continuity of canopy fuels. The lack of canopy fuels may constrain fire spread to the surface and may influence the disturbance variability of fire across the landscape.



A semivariogram was used to assess the spatial variability between pixel values in the dNBR image. A semivariogram is a means of evaluating spatial correlation between locations, and the shape of the semivariogram may indicate the level of patchiness and average patch size. The semivariogram analysis was conducted in R version 3.3.2 (Team 2016) with the USDM Package (Naimi 2015).
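For intuition, the empirical semivariance that such a package estimates, gamma(h) = mean of 0.5*(z_i - z_j)^2 over point pairs separated by roughly lag h, can be computed directly. This is a brute-force illustration for modest numbers of points, not the usdm implementation:

```python
import numpy as np

def empirical_semivariogram(coords, values, lags, tol):
    """Empirical semivariance gamma(h): the mean of 0.5*(z_i - z_j)^2 over
    all point pairs whose separation distance falls within tol of lag h."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(values, dtype=float)
    # pairwise distances and half squared differences (fine for modest n)
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
    sq = 0.5 * (z[:, None] - z[None, :]) ** 2
    iu = np.triu_indices(len(z), k=1)   # each pair counted once
    d, sq = d[iu], sq[iu]
    gamma = []
    for h in lags:
        mask = np.abs(d - h) <= tol
        gamma.append(sq[mask].mean() if mask.any() else np.nan)
    return np.array(gamma)
```

Plotting gamma against the lags gives the semivariogram curve; the lag at which it levels off (the sill) is what is read as an indication of patch size below.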


The semivariogram analysis was followed by a cross-variogram to assess the spatial relationship between fire and beetle outbreak patterns. The cross-variogram analysis was conducted with the gstat package (Pebesma and Graeler 2017) in R version 3.3.2 (Team 2016).  It should be noted that some additional data extraction steps were needed to conduct the cross-variogram: the gstat package does not work with raster data, so a csv file was created with the cell value and UTM coordinates for each cell in the raster.
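The raster-to-table step can be sketched as follows; the origin coordinates and cell size in the example are hypothetical, and the real workflow was run in R:

```python
import numpy as np
import csv

def raster_to_table(arr, x_origin, y_origin, cellsize):
    """Flatten a raster into (x, y, value) rows using cell-center
    coordinates; x_origin/y_origin are the top-left corner of the grid."""
    rows, cols = arr.shape
    cc, rr = np.meshgrid(np.arange(cols), np.arange(rows))
    x = x_origin + (cc + 0.5) * cellsize
    y = y_origin - (rr + 0.5) * cellsize  # y decreases down the rows
    return np.column_stack([x.ravel(), y.ravel(), arr.ravel()])

# Writing the table to csv for use with gstat in R (hypothetical names):
# table = raster_to_table(dnbr_array, 350000.0, 5900000.0, 30.0)
# with open("dnbr_points.csv", "w", newline="") as f:
#     csv.writer(f).writerows([["x", "y", "dnbr"], *table.tolist()])
```

Each raster cell becomes one row of the point table that gstat can ingest.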

Patch Metrics (Fragstats – Species Distribution Models)

Patch metrics were calculated with the SDMTools package (VanDerWal 2015) in R version 3.3.2 (Team 2016).   The SDMTools package is based on the FRAGSTATS software package by McGarigal et al. (2012) and calculates many of the same patch metrics. In order to calculate patch metrics, I converted the continuous raster into a classified raster.  The burn severity classification was based on break points in the dNBR values defined by Reilly et al. (2017), and the beetle severity classification was based on cross-validation with orthorectified aerial photographs.  Burn classes included low (plus unburned), moderate, and high, and beetle classes included none, low-moderate, and moderate-high. Patch metrics were calculated for each disturbance severity class.
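The reclassification step can be sketched with `np.digitize`; the break points below are placeholders, not the Reilly et al. (2017) thresholds actually used:

```python
import numpy as np

def classify_severity(dnbr, breaks=(0.27, 0.66)):
    """Classify a continuous dNBR raster into low (incl. unburned),
    moderate, and high severity. The break points here are placeholders
    for illustration only. Returns 1 = low, 2 = moderate, 3 = high."""
    return np.digitize(np.asarray(dnbr, dtype=float), bins=breaks) + 1
```

The same pattern, with different break points, produces the three-class beetle severity raster.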

Geographically Weighted Regression

 Geographically Weighted Regression (GWR) was used as a descriptive analysis to understand the relationship between the variability in beetle outbreak and wildfire on the landscape.  The purpose of GWR is to determine areas of non-stationarity, or inconsistencies in the spatial structure over the landscape. I used the continuous raster layers for beetle outbreak and burn severity.  While there are packages in R to conduct GWR, I had difficulty troubleshooting them and ended up conducting the analysis in ArcGIS (ESRI 2011). The raster layers were converted to point layers for the analysis.
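Conceptually, GWR fits a separate distance-weighted regression at every location, so the coefficient is allowed to vary over the landscape. A minimal fixed-bandwidth sketch with a Gaussian kernel, returning only the local slope (the actual analysis used ArcGIS's implementation, which also supports adaptive kernels and bandwidth selection):

```python
import numpy as np

def gwr_coefficients(coords, x, y, bandwidth):
    """At each location, fit a weighted least-squares regression of y on x
    with Gaussian kernel weights that decay with distance; return the
    local slope at every point."""
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])      # intercept + predictor
    slopes = np.empty(len(x))
    for i, c in enumerate(coords):
        d2 = ((coords - c) ** 2).sum(axis=1)       # squared distances
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))   # Gaussian weights
        XtWX = X.T @ (X * w[:, None])
        XtWy = X.T @ (w * y)
        slopes[i] = np.linalg.solve(XtWX, XtWy)[1]  # local slope
    return slopes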

 Spatial Overlay Analysis

 Spatial overlay analysis has been used in previous studies examining disturbances that overlap in time and space (Hart et al. 2015).  It simply overlays maps of each disturbance to determine where the disturbances overlap; in this case, I am looking for overlap between beetle outbreak and wildfire.  Both disturbance maps were classified as binary: burned or unburned, and beetle outbreak or no beetle outbreak. Classification was based on cross-referencing locations with higher resolution orthorectified aerial photographs or higher resolution satellite imagery. I overlaid the two disturbance maps and multiplied them together to end up with four distinct classes: No Beetle – No Burn, No Beetle – Burn, Beetle – No Burn, and Beetle – Burn. Analysis was conducted in R version 3.3.2 (Team 2016).
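One way to combine two binary maps into four unique classes is a weighted sum; this sketch uses that encoding for illustration (the actual R workflow may have coded the layers differently before combining them):

```python
import numpy as np

def overlay_classes(beetle, burn):
    """Combine two binary disturbance rasters into four distinct classes:
    0 = No Beetle - No Burn, 1 = No Beetle - Burn,
    2 = Beetle - No Burn,    3 = Beetle - Burn."""
    return 2 * np.asarray(beetle, dtype=int) + np.asarray(burn, dtype=int)
```

Mapping the four resulting codes reproduces the kind of overlay shown in Figure 6.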



The semivariogram analysis indicated a spherical relationship in the dNBR image, which suggests some level of patchiness (Figure 2).  The lag at which the semivariogram reaches its sill suggests that patches averaged about 1500 m, which seemed a bit high (Figure 2).  The dNBR image contained a couple of large bodies of water, which may influence the semivariogram.  I removed the water bodies from the dNBR image and ran the semivariogram on the water-free image (Figure 2).  The water-free analysis shows a shift in the sill location and indicates that average patch size might be around 500 m (Figure 2).  The semivariogram without water also seems to have a double sill, which might indicate that there are both smaller and larger patches.


Figure 2: Semivariogram for the Entiako dNBR image with water included (a) and excluded (b). The red circles indicate the sill, which provides a general indication of patch size. Variability increases with distance (lag) up to the sill. The dNBR images with water included (c) and excluded (d).


The semivariogram analysis indicated a spherical relationship in both the wildfire pattern and the beetle outbreak pattern, which suggests some level of patchiness (Figure 3).  The lags at which the semivariograms reach their sills suggest that patches averaged about 500-750 m for the wildfire patterns and 250-500 m for the beetle outbreak patterns (red circles in Figure 3).  The cross-variogram in Figure 4c shows an undulating pattern, which indicates auto-correlation at multiple lag distances. Auto-correlation at multiple lag distances suggests that there are similar patches of fire severity that had similar patches of beetle outbreak severity, which may support the idea of the pattern of beetle outbreak influencing the pattern of wildfire.

The cross-variogram illustrates auto-correlation at multiple lag distances with patches of multiple sizes or patches that are regularly spaced. The low cross-variance at lag 300 (the nugget or y-intercept) might indicate that low/high burn severity is associated with areas of low/high beetle outbreak, which seems like it would correspond well to certain areas of the maps. The cross-variogram shows a second dip at a lag distance of about 2000m, which could describe patterns of low/high burn severity that are 2000m from low/high beetle outbreak.  The landscape patterns shown in the maps could also support this relationship.

Figure 3: Semivariogram for the Entiako wildfire (A) and beetle outbreak (B). The red circles indicate the sill, which provides a general indication of patch size. With increasing distance (lag) there is increasing variability. The dNBR image (C) and NBR image (D)

Figure 4: Cross-variogram analysis for beetle outbreak and wildfire overlap. Graph a is a semivariogram for the wildfire raster. Graph b is a variogram for the beetle outbreak raster. Graph c is the cross-variogram analysis that overlays the raster used in graph a with the raster used in graph b.

Patch Metrics (Fragstats with Species Distribution Models)

In the SDMTools package, I ran two analyses, one on the whole landscape and one on each disturbance severity class. For the first analysis, I found that high burn severity was the smallest component of this fire event, accounting for only about 15 percent of the landscape (see Table 1). This was interesting because lodgepole pine is often characterized by ‘stand replacing’ high-severity fire; however, moderate burn severity can also account for ‘stand replacing’ fire in lodgepole pine dominated landscapes, and it covered about 45 percent of the landscape (see Table 1). The landscape was dominated by low-moderate and moderate-high beetle severity, with only a small portion unaffected by beetle outbreak (see Table 2).

Table 1: Burn severity class summary for patches, area, and proportion of landscape

Burn Severity Class Number of Patches Total Area in Hectares Proportion of Landscape
Low 382 2767.14 0.394
Moderate 448 3179.79 0.452
High 422 1082.70 0.154

 Table 2: Beetle severity class summary for patches, area, and proportion of landscape

Beetle Severity Class Number of Patches Total Area in Hectares Proportion of Landscape
None 985 780.93 0.111
Low-Moderate 656 2968.56 0.423
Moderate-High 709 3273.30 0.466

 The second analysis examined the patch metrics for each disturbance severity class. Tables 3 and 4 include patch area metrics by burn severity and beetle severity, respectively. While the output includes many other variables, I wanted to look at the variability in patch size for each disturbance type and class. The high burn severity maximum and mean patch sizes are the most interesting part of Table 3, because they are quite different from the low and moderate burn severity values.  Further statistical analysis is needed to determine whether the numbers are statistically different.  The summary of patch size metrics for both disturbance types indicates that there are many small patches for all class types.

Table 3: Burn severity classes patch size summary

Burn Class Number of patches Minimum Patch size Maximum Patch size Mean Patch Size Median Patch size SD
Low 382 0.09 1449.45 7.24 0.18 77.7
Moderate 448 0.09 1461.69 7.10 0.18 75.3
High 422 0.09 445.95 2.57 0.18 22.5

 Table 4: Beetle severity classes patch size summary

Beetle Class Number of patches Minimum Patch size Maximum Patch size Mean Patch Size Median Patch size SD
None 985 0.09 77.58 0.79 0.18 4.04
Low-Moderate 656 0.09 2380.86 4.53 0.18 93.07
Moderate-High 709 0.09 1461.60 4.62 0.18 57.68

The SDMTools package offers much more than is demonstrated here. The PatchStat function provides the following outputs: number of cells for each patch, number of cells in the core area, number of outer perimeter cell edges of the patch, number of internal cell edges of the patch, the area of each patch within the landscape mosaic, the interior core area of a patch, perimeter of patch, perimeter-area ratio, shape index, fractal dimension index, and the core area index. The ClassStat function offers 28 metrics in its output.  The SDMTools package documentation covers all of these metrics (VanDerWal 2015).
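The core of such patch metrics is connected-component labeling. A sketch of per-class patch areas using scipy rather than SDMTools, assuming 8-neighbour connectivity and a 30 m Landsat cell size:

```python
import numpy as np
from scipy import ndimage

def patch_sizes_ha(class_raster, cls, cellsize_m=30.0):
    """Patch area metrics for one severity class: label 8-connected
    patches of `cls` and return their areas in hectares."""
    mask = np.asarray(class_raster) == cls
    structure = np.ones((3, 3), dtype=int)   # 8-neighbour connectivity
    labels, n_patches = ndimage.label(mask, structure=structure)
    counts = np.bincount(labels.ravel())[1:]  # drop background label 0
    return counts * cellsize_m ** 2 / 10_000.0
```

From the returned areas, the patch count, minimum, maximum, mean, median, and standard deviation reported in Tables 3 and 4 follow directly.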

Geographically Weighted Regression

The GWR analysis is visualized (Figure 5) with the coefficients.  Reds and oranges indicate areas of positive association while blues indicate negative associations.  These associations visually make sense when going back and looking at the beetle outbreak map and burn severity map.  High beetle outbreak would have lower raster values, while high fire severity would have higher raster values. While the GWR seems to indicate some interesting patterns, I explored some of the variability with scatter plots, which is discussed below.

Figure 5: Geographically weighted regression showing the coefficients from the output. Positive values indicate a positive association, while negative values indicate a negative association.

Spatial Overlay Analysis

The spatial overlay analysis is shown in Figure 6.  Most of the landscape is covered by red, which is the combination of beetle outbreak and wildfire. There are a few small patches of live trees unaffected by the beetle outbreak that burned, shown in orange; a number of these live patches seem to have burned at high severity.  There are a number of areas that experienced beetle outbreak but did not burn, shown in green.  Areas with no beetle outbreak and no fire are very few and cover relatively small areas on the map.

Figure 6: Spatial Overlay Analysis using binary disturbance classifications to determine where beetle outbreak and fire overlap in the landscape.

Inspection of GWR with spatial overlay

The final piece of this project was revisiting all the maps and thinking critically about the GWR analysis.  Scatter plots were generated from six selected spots at the extreme ends of the GWR coefficient range in Figures 8 and 10. NBR and dNBR values were extracted and plotted in Figure 10. The purpose of generating scatter plots was to assess the relationship between these two variables in relation to the GWR coefficients. Maps of beetle outbreak, fire disturbance, the GWR coefficients, and the spatial overlay analysis (Figures 7-9) are shown below, with rectangles outlining where values were extracted for the scatter plots in Figure 10.

Plots 3 and 8 showed strong linear relationships (see Figure 10); the other plots did not.  The strong and weak linear relationships are supported by the R-squared values for those areas: Plot 3 ranged from 0.59 to 0.83, Plot 8 from 0.29 to 0.55, and Plot 4 from 0.11 to 0.25.  Across the scatter plots, the values associated with the spatial overlay analysis, beetle outbreak, and fire severity show a wide range of variability.

Fire is a complex process, and this project has examined it primarily through the lens of the pre-fire landscape, which overlooks weather and topography as controls of fire severity. Wildfire is a dynamic process that is influenced by fuels, weather, and topography at both macro- and micro-scales. The focus here has been solely on fuels, which likely does not capture all the landscape variability generated by the wildfire event. Future analyses will account for topography and fire weather.  Additionally, since the focus is on forested landscapes and the interaction of overlapping disturbances within them, future data sets may exclude non-forested areas.

Figure 7: The landscape distribution of Beetle outbreak disturbance for Entiako burn perimeter and areas associated with scatter plots in Figure 10.


Figure 8: The landscape distribution of burn severity for the Entiako fire that burned in 2012 and areas associated with scatter plots in Figure 10.

Figure 8: The landscape distribution of coefficients from the geographically weighted regression analysis and areas associated with scatter plots in Figure 10.


Figure 9: The landscape distribution of spatial overlay analysis with areas associated with scatter plot in Figure 10.


Figure 10: Scatter plots of NBR values for Beetle outbreak on the x-axis and dNBR values for burn severity on the y-axis. Point colors correspond to the spatial overlay analysis.


Overlapping disturbances of beetle outbreak and wildfire will influence landscape structure and function for decades in central interior British Columbia, Canada.  It is important to characterize the distribution of disturbance severity across a landscape for both individual disturbances and the cumulative effects of overlapping disturbances. There is still much to explore within the analyses here, as well as in further analyses that include additional fires and add topography and weather as explanatory variables of wildfire processes.

The spatial overlay analysis is visually the most dramatic, showing much of the landscape dominated by the cumulative effects of overlapping disturbances, along with additional spots of fire mortality. This is important for future forest structure and successional trajectories.  It will be important for scientists and managers to monitor the landscape’s response to the cumulative disturbance effect.  Both disturbances affect forests that offer late-seral habitat used in winter by caribou.  The fire in this project, plus an additional fire, has affected most of the wintering habitat for this caribou herd, which is of great concern for managers.

Wildfire following beetle outbreak may facilitate or impede lodgepole pine regeneration, either by preparing the surface for seedling establishment and releasing seeds, or by killing or consuming seeds during the fire. It will be important for scientists and managers to monitor regeneration on this landscape to understand forest response to overlapping disturbances.  This landscape is managed as wilderness, so it provides insight into natural processes unfolding with minimal human intervention under current climate conditions, which may be useful for predicting how surrounding areas may respond under similar disturbance interactions.

Software learning

Time was invested in learning to perform many of these spatial analyses in R, which I thought would be a useful investment of time and learning.  R offers a number of spatial tools similar to ArcGIS.  I think R has a number of benefits over ArcGIS, in particular the ability to track a workflow that can then be recycled for future analyses.  Additionally, once the code is set up, the process is semi-automated.

I was working with a particularly large fire of tens of thousands of pixels, and I found that R processed the data quickly. I used only one fire event for this project, but have two other fires I want to look at, which I can now easily run through the R scripts I have generated.

During this class, I also learned how to use R-Markdown.  R-Markdown offers an interface for coding and commenting, essentially a scientific notebook. R-Markdown allows for easy documentation of code chunks, and the ability to generate PDFs or HTML files for sharing. This documentation will allow me to more easily go back to the code later and understand what I did or share the code with someone else. It offers the ability to reproduce my work.

I did end up using ArcGIS for visualizing my raster files, for data processing such as clipping, and for Geographically Weighted Regression.  R does offer a couple of packages for conducting geographically weighted regression, but I had difficulty troubleshooting them.

What did you learn about statistics?

I learned ‘oh so many things’. I learned a number of spatial statistical methods throughout this course, including semivariograms, cross-variograms, patch metrics (FRAGSTATS), geographically weighted regression, and spatial overlay analysis. Semivariograms and cross-variograms are useful descriptive analyses for assessing variance as a function of scale and generally describing landscape patterns. Patch metrics (FRAGSTATS) are also a useful tool for generating metrics and describing the distribution of disturbance severity across a landscape. I think there are many more patch metrics to explore in the SDMTools package that may be useful for examining some of the variability in disturbance severity across the landscape, overlapping disturbances, cumulative disturbance effects, and future successional pathways for disturbed forest landscapes. The geographically weighted regression was likely the most challenging analysis to make work and draw conclusions from, and I will probably be thinking about it beyond this course. It was useful for seeing where the overlapping disturbances were positively and negatively correlated, but there was still a great deal of variability in the scatter plots. The spatial overlay analysis was useful for seeing where areas of beetle outbreak (or no beetle outbreak) overlap with burned and unburned areas.  The last piece tied in here was scatter plots of the geographically weighted regression based on the beetle outbreak, fire severity, and spatial overlay, which showed a great deal of variability.

Response to Comments

I received a number of useful comments on my exercises and tutorials, some of which I have been able to address in this final blog post.

  1. Exercise 2: Julia’s Comment – regarding additional interpretation of the cross-variogram was addressed in the cross-variogram section above.
  2. Exercise 3: Julia’s comment regarding SDMTools for future FRAGSTATS users – I tried to list some of the additional metrics provided in the output. The package was relatively easy to use, and I provided commented code with Tutorial 2 that will hopefully be useful to a future user.
  3. Exercise 3: Julia’s comment – “Useful to know about proportions of h,m.l severity. yes mean patch sizes differ by burn severity but not median patch size – what do the full distributions of patch sizes look like for the various severity classes? Note the high st dev for L and M – indicating that likely there are a couple of very large patches affecting the mean (not the median), hence not really a difference in patch sizes. Why not? What would control patch size?” – I have attempted to generate boxplots, but have not been able to make ones that are visually worth sharing. There seem to be lots of small patches and a few very large outliers. This is worth additional exploration, but I am out of time for this class.
  4. Tutorial 1 comments: Peer comments – Create many variograms; NDVI, EVI; look at different band combinations. These comments were all considered. I had generated an NDVI layer early on, but after reading some remote sensing papers on Landsat data and beetle outbreak I opted to use the pre-fire NBR raster. We talked quite a bit about the fact that the semi-variogram takes a sample of the data; I opted not to run multiple semivariograms, since the analysis is a descriptive tool, and the single model run provided a useful general understanding of spatial variance on the landscape. I am still considering the use of additional bands, but chose to hold off for this project/class.
  5. Tutorial 2: Peer comment – Identifying connections between layers and considering other factors. I think the next steps for the project will be to bring in topography and weather layers and run additional spatial statistical analyses that include them. Due to time constraints, this was outside the scope of this project.




ESRI. 2011. ArcGIS Desktop: Release 10. Environmental Systems Research Institute, Redlands, CA.

Hart, S. J., T. Schoennagel, T. T. Veblen, and T. B. Chapman. 2015. Area burned in the western United States is unaffected by recent mountain pine beetle outbreaks. Proceedings of the National Academy of Sciences of the United States of America 112:4375–80.

Key, C. H., and N. C. Benson. 2006. Landscape assessment (LA): Sampling and analysis methods. USDA Forest Service General Technical Report RMRS-GTR-164-CD:1–55.

McGarigal, K., S. A. Cushman, and E. Ene. 2012. FRAGSTATS v4: Spatial pattern analysis program for categorical and continuous maps. University of Massachusetts, Amherst, Massachusetts.

Naimi, B. 2015. Package “usdm.” R package documentation.

Reilly, M. J., C. J. Dunn, G. W. Meigs, T. A. Spies, R. E. Kennedy, J. D. Bailey, and K. Briggs. 2017. Contemporary patterns of fire extent and severity in forests of the Pacific Northwest, USA (1985-2010). Ecosphere 8:e01695.

R Core Team. 2016. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

VanDerWal, J. 2015. SDMTools: Species distribution modelling tools: Tools for processing data associated with species distribution modelling exercises. R package.


Final Project: Patterns in NDVI fields over time and compared to biomass

Filed under: Final Project @ 10:05 pm

1. The research question that you asked.
Spatial patterns in disease spreading in agriculture are important to understand. For example, what causes the infection and how to avoid further spreading. Originally, the idea was to study the Red Blotch disease in vineyards. But due to delays there was no data. With the same goal in mind I looked at NDVI (Normalized Difference Vegetation Index) data for agricultural fields and try to understand the ‘spreading’ of the NDVI to develop methods to analyze spreading of any sort in remote sensed data over time.
In my ‘updated’ spatial problem I looked at;
1) the spatial and temporal autocorrelation of the NDVI parameter, and
2) the correlation between the pattern of NDVI and average biomass in agricultural fields in Manitoba, Canada.
This results in the final research question: Does patchiness in NDVI relate to biomass and does it change over time?
2. A description of the dataset you examined, with spatial and temporal resolution and extent.
I used data from about 30 fields, each around 1 km². Over the growing season from May until September there is RapidEye-derived NDVI at a spatial resolution of 6.5 by 6.5 m; each field has on average 15,000 pixels. The biomass data come from the SMAPVEX12 study and contain the wet biomass averaged per field.

Study Location

3. Hypotheses: predictions of patterns and processes you looked for.
It is expected that patchiness in NDVI is an indicator of low biomass, and therefore low yield. A uniform NDVI implies that the crops are all in similar health, while patchiness could mean that some plants are suffering stress. Over time, the patchiness in NDVI is expected to change: either the patches become more distinct, or the fields become uniform.
4. Approaches: analysis approaches you used.
The analysis can be split into two parts: part one focuses on quantifying the pattern of NDVI with variograms, and part two analyzes the correlation between variogram parameters and biomass, and how the parameters change over time.

Part 1:
To assess the patchiness of the NDVI I used a variogram, which characterizes the spatial autocorrelation. Three parameters of interest can be read from a variogram graph: a) the range, which indicates the lag distance beyond which values are no longer autocorrelated; b) the sill, which represents the variance of the variable at the range; and c) the nugget, which should be zero but can vary slightly due to errors in the data. The variogram is standardized by dividing the semivariance by the standard deviation of that field.
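As a concrete illustration of these quantities, here is a minimal sketch of computing an empirical semivariogram from scattered field values and standardizing it as described above. It is written in Python (the actual analysis used R), and all data and names are synthetic.

```python
import numpy as np

def empirical_variogram(x, y, z, bin_edges):
    """Empirical semivariogram: for each lag bin, the mean of
    0.5 * (z_i - z_j)^2 over point pairs separated by that distance."""
    d = np.sqrt((x[:, None] - x[None, :])**2 + (y[:, None] - y[None, :])**2)
    sq = 0.5 * (z[:, None] - z[None, :])**2
    iu = np.triu_indices(len(z), k=1)          # count each pair once
    d, sq = d[iu], sq[iu]
    return np.array([sq[(d >= lo) & (d < hi)].mean()
                     for lo, hi in zip(bin_edges[:-1], bin_edges[1:])])

# Synthetic "field": a spatial trend plus noise gives autocorrelated values
rng = np.random.default_rng(0)
x, y = rng.uniform(0, 100, 200), rng.uniform(0, 100, 200)
z = x / 50 + rng.normal(0, 0.3, 200)

gamma = empirical_variogram(x, y, z, np.linspace(0, 60, 7))
gamma_std = gamma / z.std()   # the per-field standardization used in the post
```

Because z has spatial structure, gamma increases with lag distance; dividing by the field's standard deviation puts fields with different NDVI variances on a comparable scale.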
Part 2:
We compare the variograms using scatterplots of i) range versus biomass, ii) sill versus biomass, and iii) range versus sill with a biomass color scheme. For the scatter plots, correlation coefficients are determined with Pearson’s r and a p-value. Next, a time component is added: for 5 time steps the correlation between range and sill is determined and its development over time is analyzed (there was no biomass data for these dates).
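A minimal sketch of the correlation step, using hypothetical per-field values (the actual analysis used the real range/sill readings and biomass); the p-value here is computed by permutation rather than the analytic t-test.

```python
import numpy as np

def pearson_with_perm_p(x, y, n_perm=5000, seed=1):
    """Pearson's r plus a two-sided permutation p-value for H0: r = 0."""
    r = np.corrcoef(x, y)[0, 1]
    rng = np.random.default_rng(seed)
    perm = np.array([np.corrcoef(rng.permutation(x), y)[0, 1]
                     for _ in range(n_perm)])
    p = np.mean(np.abs(perm) >= abs(r))        # fraction of shuffles as extreme
    return r, p

# Hypothetical per-field summaries: variogram range (m) and wet biomass
vg_range = np.array([18.0, 25.0, 12.0, 30.0, 22.0, 15.0, 27.0, 20.0, 33.0, 11.0])
biomass  = np.array([2.1, 1.6, 2.4, 1.2, 1.8, 2.2, 1.5, 1.9, 1.1, 2.5])

r, p = pearson_with_perm_p(vg_range, biomass)  # strongly negative in this toy set
```

With only ~10 fields per crop, the p-value matters: a moderate r can easily arise by chance, which is exactly the situation in the wheat results below.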

5. Results: what did you produce — maps? statistical relationships? other?

Part 1:
In the variogram plot (Figure 1), we can see that there is high inter-field variability in the spatial autocorrelation of NDVI. It is difficult to tell from the graph whether there is a correlation between biomass and variogram type. Also, crop type differs between fields, which has a considerable influence on biomass; for further analysis, a distinction between crop types should be made.

Variograms for all fields

Part 2:
Doing the same analysis but focusing on just one crop, wheat, does not improve the results. Pearson’s correlation coefficients are low and insignificant: -0.31 for range vs. biomass (p = 0.3461) and 0.35 for sill vs. biomass (p = 0.2922). In the scatterplot color-coded by biomass, no pattern is visible.

Correlation between range and biomass, and sill and biomass for wheat

scatter plot with color coding for average biomass

For the sill-range correlation over time, there seems to be an increase in correlation over time, reaching a correlation coefficient of 0.88 with a p-value of 0.047, significant at the 5% level. One hypothesis is that this is caused by the development of the shapes of the variograms. At the beginning of the growing season the patterns are ill-defined, which makes the variogram erratic, and the subjective reading of the variogram may emphasize this. During the growing season the patterns become more distinct, and the variograms approach the shape of a theoretical variogram; for a perfect variogram one would expect a perfect correlation of 1 between the sill and the range.

Range-Sill correlation over time

From the two plots showing the change of range and sill over time per field, we cannot draw any conclusions; no trend is visible.

Development of sill over time

Development of range over time

6. Significance. What did you learn from your results? How are these results important to science? to resource managers?
This ‘spatial problem’ has been a useful way to explore methods for quantifying spatial autocorrelation and its correlation with other parameters. The approach could be used, for example, to understand and predict the spread of diseases in agricultural fields.

7. Your learning: what did you learn about software (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) other?
I already had some experience with the software mentioned, so I didn’t improve my skills significantly. The only software I was not comfortable with was R, which I used to calculate and visualize the variograms. The language is not much different from Matlab and Python.

8. What did you learn about statistics, including (a) hotspot, (b) spatial autocorrelation (including correlogram, wavelet, Fourier transform/spectral analysis), (c) regression (OLS, GWR, regression trees, boosted regression trees), and (d) multivariate methods (e.g., PCA)?
This class was an eye-opener for me regarding statistics. I investigated Moran’s I, local Moran’s I, variograms, and Pearson’s correlation coefficient. I had never used a variogram before, and it gave me insight into spatial autocorrelation; it was a way to quantify the patterns in the fields. For my future research, I’m interested in Principal Component Analysis.

9. In addition, your final blog post should include a section in which you describe how you have responded to (a) the comments you received from your fellow students on Tutorial 1 and Tutorial 2 and (b) comments on Canvas from Dr. Jones on your exercises 1, 2, and 3.
During the presentations in groups of three, we discussed the approach and analysis. One thing that came up is the subjectivity of reading the variograms and using those readings for correlations. I considered fitting a theoretical variogram and automatically extracting the sill and range, but this was too complicated and still required subjective input. Another comment was to use more data; however, I calculated the significance of the correlation coefficients, which indicated that there was enough data.
Dr. Jones advised standardizing the variograms so that they can be compared.

Final Project: Assessing Forest Structure Parameters of Tropical Forest Based on Unmanned Aircraft System (UAS) Visual Imagery Data

Filed under: 2017,Final Project @ 7:36 pm
  1. Background:

The emission of carbon dioxide (CO2) is one of the greatest drivers of climate change. Forests, as the largest terrestrial carbon reservoir, have an important role in reducing CO2 emissions and preventing climate change, and it is therefore necessary to manage them sustainably. In sustainable forest management, forest managers must frequently monitor and measure forest structure parameters such as the number of trees, tree locations, tree heights, crown widths, biomass, and forest carbon. However, these parameters are usually measured through conventional field measurements, which are costly and labor intensive. New technology such as Unmanned Aircraft Systems (UAS) can be used as an alternative method to monitor and assess forest structure parameters, and several programs are designed to process UAS visual imagery data and measure forest structure parameters from it, for example Agisoft PhotoScan, Fusion, ArcMap, and RStudio.

  2. Research question:

Considering the statements in the background, I decided to assess forest structure parameters (crown width, height, and number of trees) in a particular part of Siberut National Park based on UAS visual imagery data. The research questions are: (1) How do crown width and height vary in space in the study area, and are they autocorrelated? (2) Are crown width and tree height linearly related in space in the study area?

  3. Dataset:

This project used Unmanned Aircraft System (UAS) visual imagery data taken with a Sony NEX-5R camera mounted on a fixed-wing UAS. The data were gathered in Siberut National Park in West Sumatra Province, Indonesia, as part of my advisor Professor Michael Wing’s project. There are 3 flights with 300 images each; the image resolution is 350 x 350 inch, and the images were taken with 80% sideward overlap and 80% forward overlap. I focused on the UAS visual imagery data from the first flight. The images were processed through several software packages: Agisoft PhotoScan was used to build a 3D point cloud from the UAS visual images, and the point cloud was then used in the Fusion software to derive forest structure measurements such as number of trees, tree positions, tree height, and crown width. One of my labmates did this data processing. The result is a point dataset of 804 detected trees with their heights and crown widths (see Figure 1). Unfortunately, because the images were not yet georeferenced for this process, the height in this project is the sum of tree height and elevation (ground height).

Figure 1. Detected trees by Fusion software.

  4. Hypotheses:

I hypothesize that crown width and height vary in space across the study area: some trees have small crowns and others bigger crowns because of differences in species and age. In this project, the variation in height will be affected mostly by the elevation included in the height values. Furthermore, I hypothesize that crown width is not autocorrelated, while height is autocorrelated, since height is the total of tree height and elevation. I also hypothesize that crown width and height are linearly related; the relationship can be positive or negative due to factors such as differences in elevation and species.


  5. Approaches:

In general, I used a modeling approach to answer the research questions. To answer the first research question, I performed variogram analysis in RStudio; to answer the second, I used Geographically Weighted Regression (GWR) in ArcMap. Descriptions of the analyses and their steps are given below:

  • Variogram Analysis

I used variogram analysis to assess how height and crown width vary in space in the study area, and whether they are autocorrelated. The variogram is defined as the variance of the difference between field values at two locations x and y across realizations of the field (Cressie 1993).

The main goal of a variogram analysis is to construct a variogram that best estimates the autocorrelation structure of the underlying stochastic process. Most variograms are defined through several parameters; namely, the nugget effect, sill, and range.

  • The nugget effect represents micro-scale variation or measurement error. It is estimated from the empirical variogram as the value of γ(h) as h approaches 0.
  • The sill represents the variance of the random field.
  • The range is the distance (if any) at which data are no longer autocorrelated.

Figure 2. The example of variogram graph.

In RStudio, I used the R code below to do the variogram analysis. First, I need to install the "gstat" package and load it:

install.packages("gstat")
library(gstat)

Next, to load my data into RStudio, I have to set my working directory, which can be done through the "Session" toolbar. Then the variogram analysis takes only a few lines of R. The first variogram is for crown width against location (x and y), the second for height against location. (The crown width column name below is assumed; use the column name in your own file.)

Treelist1 <- read.table("D:/Oregon State University/Term/Spring 2017/GEOG 566/Exercises/Exercise 2/treelist1.csv", sep = ",", header = TRUE)

# Variogram 1: crown width vs. location
g1 <- gstat(id = "CrownWidth", formula = CrownWidth ~ 1, locations = ~X+Y, data = Treelist1)
plot(variogram(g1))

# Variogram 2: height vs. location
g2 <- gstat(id = "Height", formula = Height ~ 1, locations = ~X+Y, data = Treelist1)
plot(variogram(g2))


  • Geographically Weighted Regression (GWR) Analysis

I used the Geographically Weighted Regression (GWR) tool in ArcMap to assess the linear relationship between crown width and height in space in the study area. Since the data have not been georeferenced, the height is the sum of the tree height and the elevation (ground height).

GWR constructs a separate equation for every feature in the dataset, incorporating the dependent and explanatory variables of features falling within the bandwidth of each target feature. The shape and extent of the bandwidth depend on user input for the Kernel type, Bandwidth method, Distance, and Number of neighbors parameters, with one restriction: when the number of neighboring features would exceed 1000, only the closest 1000 are incorporated into each local equation. GWR should be applied to datasets with several hundred features for best results; it is not an appropriate method for small datasets.
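The ArcMap tool is a black box, but the underlying idea can be sketched in a few lines. Below is a minimal, hypothetical Python implementation of GWR with a fixed Gaussian kernel: at each tree's location a separate weighted least-squares line (height vs. crown width) is fit, with nearby trees weighted more heavily. This illustrates the technique only; it is not the exact algorithm or AICc bandwidth selection that ArcMap uses, and all data are synthetic.

```python
import numpy as np

def gwr_fixed(coords, x, y, bandwidth):
    """GWR sketch: per-location weighted least squares with a fixed
    Gaussian distance kernel. Returns [intercept, slope] per feature."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])       # design matrix: intercept + x
    coefs = np.empty((n, 2))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth)**2)  # Gaussian kernel weights
        A = X.T @ (X * w[:, None])             # X' W X
        b = X.T @ (w * y)                      # X' W y
        coefs[i] = np.linalg.solve(A, b)       # local regression coefficients
    return coefs

# Hypothetical trees: height = base + slope*crown + elevation trend + noise
rng = np.random.default_rng(2)
n = 100
coords = rng.uniform(0, 1000, (n, 2))
crown = rng.uniform(3, 15, n)
height = 4 + 0.9 * crown + 0.02 * coords[:, 0] + rng.normal(0, 1.0, n)

coefs = gwr_fixed(coords, crown, height, bandwidth=300.0)
```

Mapping coefs[:, 1] by location is the analogue of the coefficient layer produced in ArcMap: where the local slope is positive, bigger crowns go with taller trees.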

The first step is to load the dataset into ArcMap. The dataset is a shapefile (points) of the 804 detected trees with their locations, heights, and crown widths (see Figure 3). As stated before, because the images were not yet georeferenced, the height is the sum of tree height and elevation.

The second step is to open the Geographically Weighted Regression (GWR) tool, which will appear as in Figure 3 below. Then we need to choose the input features, the dependent variable, and the explanatory variable. We can choose more than one explanatory variable, but for this project I chose only one: my dependent variable is height (ground height + tree height), and my explanatory variable is crown width. Both are continuous numerical variables.


Figure 3. Geographically Weighed Regression Tool.

In the next step, we can choose the kernel type, bandwidth method, and number of neighbors. By default, the kernel type is “Fixed”, the bandwidth method is AICc, and the number of neighbors is 30. I used the defaults because they are appropriate for my dataset; in particular, it is important to select “Fixed” for the kernel type whenever distance is a component of the analysis, as it is here. To finish the analysis, just click “OK”.

  6. Results:

The results of the variogram analysis and geographically weighted regression (GWR) are given below as graphs, tables (statistical relationships), and new layers (shapefiles), with descriptions to aid interpretation.

  • The Result Description of Variogram Analysis

Based on the variogram graph of crown width (Figure 3), crown width is in general not autocorrelated, although there are some patches with different nugget, sill, and range. In the first part of the graph there is a patch with a range (distance) of about 25, which is relatively larger than the other ranges. I think this can happen because of differences in crown width across the study area: there seem to be clusters of trees with big crowns and clusters with smaller crowns. Although rare, a 25 m crown width is plausible, because some tree species in tropical forest have crowns even wider than 25 m.

Figure 3. Variogram graph of crown width.

Moreover, based on the variogram graph of height (see Figure 4), the semivariance increases with distance, meaning that height is autocorrelated. I think this happens because the height data are the sum of tree height and elevation: since elevation is included, the topography of the study area influences the variance and the variogram analysis, and it appears the study area lies on a slope. Autocorrelated data are problematic for statistical analyses that assume independence, so before further analysis (e.g., calculating confidence intervals) some adjustment is needed. One option is adjusting the standard errors; this is important because the usual standard error overestimates precision when there is positive autocorrelation.
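To make the standard-error adjustment concrete, here is a minimal sketch for the one-dimensional AR(1) case, where positive autocorrelation shrinks the effective sample size via n_eff = n*(1 - rho)/(1 + rho). Spatial data would need a spatial analogue of this correction; the series here is synthetic and the function name is my own.

```python
import numpy as np

def adjusted_se_of_mean(z):
    """Standard error of the mean corrected for lag-1 autocorrelation,
    using the AR(1) effective sample size n_eff = n*(1 - rho)/(1 + rho)."""
    n = len(z)
    zc = z - z.mean()
    rho = (zc[:-1] @ zc[1:]) / (zc @ zc)       # lag-1 autocorrelation estimate
    n_eff = n * (1 - rho) / (1 + rho)
    return z.std(ddof=1) / np.sqrt(n_eff)

# Synthetic AR(1) series with strong positive autocorrelation
rng = np.random.default_rng(3)
z = np.empty(500)
z[0] = rng.normal()
for t in range(1, 500):
    z[t] = 0.8 * z[t - 1] + rng.normal()

naive_se = z.std(ddof=1) / np.sqrt(len(z))     # assumes independence
adj_se = adjusted_se_of_mean(z)                # larger, reflecting fewer
                                               # effectively independent samples
```

The adjusted standard error exceeds the naive one whenever the estimated autocorrelation is positive, which is exactly the situation the height variogram indicates.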

Figure 4. Variogram graph of height.


  • The Result Description of Geographically Weighted Regression (GWR)

We can see the whole result in the attribute table, which shows the statistical relationship between crown width and height in space across the study area (see Figure 5). By default ArcMap shows the residual standard deviation (RSD) for each feature, displayed in a new layer with colors indicating the range of RSD values (see Figure 6).

The attribute table contains all the values related to the regression analysis: observed and predicted values, the coefficient of the explanatory variable, intercept, standard error, residual standard deviation, etc. Overall, crown width and height have a positive linear relationship: height increases for every one-unit increase in crown width. In other words, the bigger the crown, the taller the tree.

Figure 5. Attribute table of Geographically Weighed Regression result.


Figure 6. Distribution of Residual Standard Deviation in study area.

However, it is more interesting and useful to map the coefficient of the explanatory variable instead of the residual standard deviation (RSD). To do this, right-click on the layer and choose “Properties”; under “Symbology” we can change the field value from the RSD to the coefficient of the explanatory variable (see Figure 7). We can also change the number of classes and the colors. In my case, I chose three classes to distinguish trees with a positive linear relationship, trees with a negative linear relationship, and trees with a coefficient close to zero. The adjusted layer can be seen in Figure 8.

Figure 7. Layer properties:Symbology.


Figure 8. Distribution of coefficient of the explanatory variable.

Looking at individual features (trees), some trees show a positive linear relationship between crown width and height (ground height + tree height) and others a negative one; their distribution is shown in Figure 8. Red points indicate trees with negative relationships, meaning trees with bigger crowns tend to be shorter. Blue points indicate trees with positive relationships, meaning trees with bigger crowns tend to be taller. White points indicate trees whose coefficients, whether positive or negative, are close to 0.

Differences in the sign of the relationship may arise from several factors, such as differences in tree species, differences in elevation, or error in the Fusion software analysis. Tropical forest has hundreds of tree species with different characteristics: some have big crowns and tall trunks, others big crowns and short trunks. In addition, elevation can have a significant effect because the height data used in this project are the total of ground height (elevation) and tree height: trees with positive relationships may be distributed in higher areas (hills, mounts, or peaks), while trees with negative relationships may be in lower areas (watersheds or valleys). Coefficients close to zero may occur because the data have not been georeferenced and because the GWR algorithm includes data from neighboring trees, which affects the coefficient values.

In general, the R-squared is quite low, with most of the features (trees) having R-squared below 0.2. To improve the regression, I need to georeference the data: the height, being the sum of ground height (elevation) and tree height, affects the regression model (intercept, coefficient, etc.) between crown width and height. I could also add explanatory variables such as tree species to increase the accuracy of the linear model.


  7. Significance:

I learned from the results of this project that Unmanned Aircraft System (UAS) visual imagery data, once processed through appropriate software and analysis, can be a low-cost alternative for measuring forest structure parameters. These parameters are usually measured by forest managers using conventional field ground measurements, which are costly and labor intensive. By using UAS visual imagery processed in software such as Agisoft, Fusion, ArcMap, and RStudio, forest managers will be able to monitor forest structure parameters faster and more cheaply.

Forest managers can apply point pattern analysis, variogram analysis, geographically weighted regression (GWR), or other spatial and statistical analyses to assess the structure of a particular forest. For example, this project showed that we can assess the linear relationship between forest structure parameters in space: in my case there is a linear relationship between crown width and tree height, from which the growth of the trees can be predicted. More generally, a forest manager can build a linear regression model between two or more variables with the GWR tool in ArcMap and use it to estimate or predict forest growth faster and more cheaply. This new method will be especially useful in remote areas with limited accessibility, where field ground measurements are almost impossible.


  8. Things that I learned: Software

From this class and this project, I learned some new and useful R packages and code. I learned how to do spatial and statistical analyses I had never done before, such as variogram analysis and geographically weighted regression (GWR). Although I did not include the GWR that I ran in RStudio in this project, I shared the link with all of the code in the GEOG 566 blog post. I also learned that besides tabular data (.csv), raster data can be used as input in RStudio; my experience with raster data did not go well because RStudio ran out of memory, but the tabular data worked fine. I will definitely use RStudio and some of these packages and code in my future research.

Moreover, in ArcMap, I learned how to do point pattern analysis (spatial autocorrelation) and raster clustering (unsupervised classification) in Exercise 1, and geographically weighted regression (GWR) in Exercise 3 and this final project. All of these tools will be very useful for assessing forest structure parameters of tropical forest. Previously I only did regression analysis in RStudio, which does not consider spatial information (feature locations) in its equations; from this project I learned that regression in ArcMap can include that spatial information. Although I have not figured out the whole formula used in this analysis, I am satisfied that I was able to apply it to my dataset.


  9. Things that I learned: Statistics

I learned several new statistical analyses in this class and this project: spatial autocorrelation (Global Moran’s I), variograms, and geographically weighted regression (GWR). First, I computed spatial autocorrelation (Global Moran’s I) in ArcMap. The Global Moran’s I tool measures spatial autocorrelation based on both feature locations and feature values simultaneously: given a set of features and an associated attribute, it evaluates whether the pattern expressed is clustered, dispersed, or random, and calculates the Moran’s I index value along with a z-score and p-value to evaluate the significance of that index.
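The index itself is simple to compute. Here is a minimal Python sketch (ArcMap additionally reports the z-score and p-value from the index's analytical expectation and variance, which are omitted here); the toy transect and its binary adjacency weights are hypothetical.

```python
import numpy as np

def morans_i(values, W):
    """Global Moran's I: I = (n / S0) * (z' W z) / (z' z), where z are the
    centered values, W is a spatial weights matrix with zero diagonal,
    and S0 is the sum of all weights."""
    z = values - values.mean()
    n = len(z)
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

# Toy 1-D transect with a cluster of high values; neighbors = adjacent points
vals = np.array([1.0, 1.2, 1.1, 3.0, 3.2, 3.1, 3.3, 1.0, 0.9, 1.1])
n = len(vals)
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0            # binary contiguity weights

I = morans_i(vals, W)   # positive: similar values cluster together
```

Under the null hypothesis of no autocorrelation the expected value is -1/(n - 1), so a clustered pattern like this one yields a clearly positive index.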

The next statistical analysis I did was the variogram, in RStudio. As mentioned before, the variogram is defined as the variance of the difference between field values at two locations x and y across realizations of the field (Cressie 1993), and the main goal of the analysis is to construct a variogram that best estimates the autocorrelation structure of the underlying stochastic process. Because many statistical analyses assume independence, it is necessary to check the independence of the variables with an analysis like the variogram.

The third statistical analysis I did was geographically weighted regression (GWR), in ArcMap. GWR constructs a separate equation for every feature in the dataset, incorporating the dependent and explanatory variables of features falling within the bandwidth of each target feature. In this project I was able to automatically fit linear regression models for all 804 trees in my dataset, with height as the response variable and crown width as the explanatory variable; GWR requires all variables to be continuous and numerical. One thing that is interesting to me is that GWR uses spatial information (tree locations) in its statistical equations. This is quite new for me, since the regression models I had learned before use a regular dataset and do not take spatial information (e.g., tree location) into account. Hence, GWR will be very useful for analyzing linear relationships between forest structure variables of tropical forest, which come with spatial information attached.


  10. Responses to Comments

Most of the comments I received concerned my dataset, which is not yet georeferenced. I agree, and I wish I could georeference the data; however, doing so requires more data (a DEM of the study area), time, and processing in software I am still learning to use, so I cannot do it at this point. Nevertheless, I am working on it, and once I have georeferenced data I will be able to correctly interpret the results of the spatial statistical analyses (e.g., variogram and geographically weighted regression (GWR)).

I also got a comment from Dr. Jones to elaborate on my explanation of the results, particularly the inference from the height variogram that the study area probably lies on a slope, and the interpretation of the ~25 m range in the crown width variogram; I have tried to revise and expand those explanations in this final project. Moreover, I got a comment asking about the precision of crown width measurements in tropical forest based on UAS imagery data. That question could be answered if I had ground truth data to compare with the UAS measurements, which I do not. Otherwise, to get an idea of the precision of UAS measurements compared to ground measurements, we can read papers that assess this topic; from the papers I have read, UAS measurements appear quite accurate compared to ground and LiDAR measurements.



Final Project: evaluating interspecific competition between wolves and cougar in northeast Oregon

Filed under: Final Project @ 5:48 pm

Background & Research Question

The research I am conducting is focused on understanding and quantifying the competitive interactions between two apex predators, wolves and cougars. Populations of large carnivores have been expanding across portions of their historical range in North America, and sympatric wolves (Canis lupus) and cougars (Puma concolor) share habitat, home ranges, and prey resources (Kunkel et al. 1999, Husseman et al. 2003, Ruth 2004), suggesting these coexisting predators may be subject to the effects of interspecific competition (interference or exploitative). Interspecific competition among carnivores can affect the spatial distribution, demography, and population dynamics of the weaker predator (Lawton and Hassell 1981, Tilman 1986), but demonstrating asymmetric competitive effects through quantified measures or experiments has remained difficult for highly mobile terrestrial mammals like large carnivores. Generally, cougars are expected to be the subordinate competitor in wolf-cougar interactions, but the frequency and strength of agonistic interactions can be system specific and will determine the overall influence either predator has on the population dynamics of the community (Kortello et al. 2007, Atwood et al. 2009, Creel et al. 2001). My goal is to assess the influence wolf recolonization in northeast Oregon has had on cougar spatial dynamics. For this course, I plan to focus on two analyses associated with questions about wolf influence on the distribution of sites where cougars acquire food (prey) resources and the degree to which wolves influence cougar spatial dynamics. More specifically, I hope to address:


How has wolf recolonization and presence influenced the spatial distribution of cougar predation patterns?



The data I used to address my question come from field data collected in cooperation with the Oregon Department of Fish and Wildlife, along with nationally available GIS datasets. Location points from global positioning system (GPS) collars deployed on a sample of wolves and cougars were used to identify potential predation sites. The study area was a 1,992 km2 game management unit (Mt. Emily WMU) in northeast Oregon (Figure 1).


Cougar kill sites – identified through cluster analysis of GPS location data (see Anderson and Lindsey 2003 and Knopff et al. 2010) and verified through field investigation. This process was done before (2009 – 2012) and after (2014 – 2016) wolves recolonized the area and produced a dataset of kill site locations for cougar (npre-wolf = 1,213; npost-wolf = 541).


Wolf kill sites – identified through cluster analysis of GPS location data (see Sand et al. 2008, Knopff et al. 2010, and DeCesare et al. 2012) and verified through field investigation. This process produced a dataset of kill site locations for wolves (n = 158).


Distance to wolf kill sites and activity – Euclidean distance (m) to nearest wolf kill site, or Euclidean distance (m) to the nearest kernel-density-derived isopleth probability contour edge (25%, 50%, 75%, 95%).


Canopy cover – Continuous percent (%) canopy closure, with pixel values ranging from 0 to 100, each value representing the proportion of that 30 m cell covered by tree canopy. Available from the National Land Cover Database (NLCD).


Slope – average change in vertical elevation per unit horizontal distance (i.e., steepness), calculated in degrees from a 30 m digital elevation model (DEM) using Spatial Analyst. DEM datasets are available from the USGS National Elevation Dataset.


Figure 1. Location of the Mt. Emily Wildlife Management Unit in northeast Oregon and global positioning system (GPS) locations for cougars monitored to determine kill site distribution and spatial dynamics pre- (2009-2012) and post-wolf recolonization (2014-2016).


Based on competition theory and emerging evidence on wolf-cougar interactions in other systems (Alexander et al. 2006, Kortello et al. 2007, Atwood et al. 2009), I expect cougar and wolves in northeastern Oregon to exhibit resource partitioning in foraging niche (resource partitioning hypothesis). Representative evidence for resource partitioning between carnivores could be demonstrated if cougar kill sites occur on the landscape in areas disparate from wolf kill sites. I would also expect cougar to alter their movement and space use relative to pre-wolf recolonization patterns (active avoidance hypothesis). Further, I expect the presence of wolves to affect cougar movement patterns and spatial distribution, which may be evident through a shift in the distribution and space used by cougar between pre- and post-wolf recolonization periods (niche shift [competitive exclusion] hypothesis).


Hypothesis: Niche shift (competitive exclusion) – cougar may demonstrate altered foraging niche (kill sites will begin to occur further from wolf activity centers) and overall distributional shifts between time periods with and without wolf presence.


Analysis Approaches & Methods

I used ArcMap to visualize data, evaluate cougar kill site distribution for differences between pre- and post-wolf time periods, and extract site-specific features related to kill sites. As part of my approach to answer my spatial problem, I also made use of and explored other statistical programs outside Arc, like Geospatial Modelling Environment (GME version 7.2.0, Beyer 2012) and program R to summarize data (dplyr, MASS, car packages), model relationships (lme4 package), and explore spatial relationships (adehabitatHR, ks, stam packages). The primary tool and approaches I explored were:


Ripley’s K – this tool determines spatial clustering or dispersion of features (or values associated with features) over a range of distances and allowed me to statistically measure the overall patterns in kill site data (i.e. distinguish departure from expected random distributions) for both wolves and cougar. I used the ‘multi-distance spatial cluster analysis’ tool in ArcGIS to generate observed/expected K-values, a K-function graph, and confidence envelopes for values. Larger than expected K-values indicate the distribution is more clustered than expected based on a random distribution, and lower than expected K-values indicate the distribution is more dispersed than expected.
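The statistic behind the tool is straightforward to sketch. The following is a naive estimate on hypothetical points, without the edge corrections the ArcGIS tool applies; under complete spatial randomness K(d) ≈ πd², so larger observed values indicate clustering.

```python
import numpy as np

def ripleys_k(points, distances, area):
    """Naive Ripley's K (no edge correction):
    K(d) = area * (number of ordered pairs within d) / (n * (n - 1))."""
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))      # pairwise distance matrix
    np.fill_diagonal(d, np.inf)           # exclude self-pairs
    return np.array([area * (d <= r).sum() / (n * (n - 1)) for r in distances])
```

For two tight, widely separated clusters of points, K at small distances vastly exceeds the πd² expected under randomness, which is the pattern reported for the kill site data.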

Kernel density – This tool calculates the magnitude per unit area for point or polyline features and allowed me to visualize the frequency and distribution of kill sites (points). I used the kde tool in the Geospatial Modeling Environment (GME) to calculate kernel density estimates (KDE), or utilization distributions, for wolf kill sites and for cougar kill sites in each time period. I created density estimates using the PLUGIN bandwidth and a 30m resolution cell size. Multiple bandwidth estimators are available as well as input options for scaling, weighted fields, data subset and/or edge inflation. The sample of kill sites (and sample of cougar) in the pre-wolf time period was higher (45-48%) than the post-wolf sample of kill sites (and cougar), therefore it was necessary to standardize the two cougar kill time period KDE raster datasets so the densities relative to catch (in this case kill events) per sample (cougar) were comparable. I used the raster calculator in ArcGIS to do this by dividing each period kill site KDE raster by that periods’ respective kills/per cougar ‘effort’.
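A minimal sketch of the two steps, using hypothetical coordinates: a fixed-bandwidth Gaussian KDE standing in for the GME kde tool (which selects the PLUGIN bandwidth automatically and offers edge inflation), followed by the raster-calculator-style division by kills-per-cougar 'effort'.

```python
import numpy as np

def kde_surface(kills_xy, grid_xy, bandwidth):
    """Fixed-bandwidth Gaussian kernel density of kill locations, evaluated
    at grid points. A simplified stand-in for the GME kde tool."""
    d2 = ((grid_xy[:, None, :] - kills_xy[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / bandwidth ** 2).sum(1) / (2 * np.pi * bandwidth ** 2)

def standardize_by_effort(kde_raster, n_kills, n_cougars):
    """Mimic the raster-calculator step: divide a period's KDE raster by that
    period's kills-per-cougar 'effort' so the two periods are comparable."""
    return kde_raster / (n_kills / n_cougars)
```

The exact scaling depends on the kde tool's output convention, but the standardization step is the same: each period's raster is divided by its own kills-per-cougar ratio before comparison.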

Isopleth probability surfaces – This tool allowed me to relate the proportion of post-wolf cougar kill sites (points) to both the pre-wolf cougar kill site distribution (density raster) and the wolf kill site distribution (density raster) using the probability contours (polygon features) of each density surface. The resulting polygons represent quantiles of interest expressed as the proportion of the density surface volume. I specified isopleths for the 25th, 50th, 75th, 95th, and 99th percentiles; the 25% and 50% isopleths are generally considered 'core' use areas and represent the highest density of the feature of interest, with 'use' decreasing as you move out to the 99% contour. I used GME and the 'isopleth' command to create the polygon probability surfaces, the 'addarea' command to add each polygon's area and perimeter to the isopleth shapefiles, and the 'countpntsinpolys' command to add a column with the count of post-wolf cougar kills in each pre-wolf cougar kill or wolf kill isopleth polygon. I also used 'select by attribute' and 'intersect' in ArcGIS to subset summer and winter kill site datasets to explore density surface overlap and seasonal influences on kill distribution.
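The volume-contour logic behind isopleths can be sketched directly on a density raster: sort cells by density, accumulate until the requested proportion of the total density volume is enclosed, and keep the cells above that threshold. A minimal raster sketch (GME's isopleth command produces polygons rather than raster masks):

```python
import numpy as np

def isopleth_mask(density, level):
    """Boolean mask of the smallest-area region containing `level` (e.g. 0.50)
    of the total density volume."""
    flat = np.sort(density.ravel())[::-1]          # cells, densest first
    cum = np.cumsum(flat) / flat.sum()             # cumulative density volume
    # density threshold at which the cumulative volume first reaches `level`
    thresh = flat[np.searchsorted(cum, level)]
    return density >= thresh
```

A 50% isopleth mask therefore encloses the densest cells that together hold half the kill density volume, and nested masks for 25-99% reproduce the 'core to total' contour structure described above.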

Latent selection differences – this approach uses a logistic regression framework for direct comparison of selection between two groups of interest, contrasting the difference through quantifiable measures of relationship strength (Czetwertynski 2007, Latham et al. 2013). The analysis allows changes in site-specific habitat characteristics to be interpreted as the relative difference in selection between two binary groups, not the selection or use of a given habitat per se. The regression was performed: 1) between wolves (coded as 1) and cougars (coded as 0) to evaluate the relative difference in selection at kill sites, and 2) between cougars pre- (coded as 0) and post-wolf (coded as 1) to evaluate relative selection differences at cougar kill sites between periods with and without wolves. Exponentiated coefficients indicate the per-unit effect of a given habitat or distance variable, calculated as [1 - exp(coefx)] * 100 and interpreted as the relative selection of variable x by group 1, compared with group 0, increasing or decreasing by that percentage.
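A minimal sketch of the regression and effect calculation on simulated data. The actual analysis used R (lme4/car); the plain gradient-ascent fit here is just an illustrative stand-in for an unpenalized logistic regression.

```python
import numpy as np

def fit_logit(x, y, lr=0.5, steps=20000):
    """Unpenalized logistic regression of group membership y (0/1) on a
    covariate x, fit by batch gradient ascent on the log-likelihood."""
    X = np.column_stack([np.ones(len(x)), x])
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta += lr * X.T @ (y - p) / len(y)   # average score (gradient) step
    return beta

def selection_difference_pct(b):
    """Latent selection difference as described in the text:
    [1 - exp(b)] * 100 per one-unit change in the covariate."""
    return (1.0 - np.exp(b)) * 100.0
```

A positive coefficient (group 1 found at higher values of the covariate) gives a negative [1 - exp(b)] * 100, so the sign convention of the reported percentage should be stated alongside the coefficients.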



Ripley’s K

Cougar (pre- and post-wolf) and wolf kill sites showed significant clustering relative to an expected random distribution (Figure 2, Table 1). I explored multiple distances, and the pattern held for all three kill datasets down to less than 10 m (our hand-held GPS units had error ratings of 3-7 m when recording kill site coordinates in the field). While this analysis provided clear evidence that kill sites for both carnivores were spatially clustered, that is not surprising for animals that exhibit strong territoriality and prey on potentially clumped food resources (herds of ungulates). Thus, the results of this analysis did not help answer my overarching question.


Figure 2. Example of the associated L(d) function for pre-wolf cougar kill sites. The red line is the observed spatial point process (SPP), the blue line is the expected SPP, and the gray dashed lines are confidence envelopes calculated from 99 simulated permutations.


Table 1. Sample of observed and expected K-values for pre-wolf cougar kill sites generated from Ripley’s K analysis.


Kernel density

Visual examination of the cougar pre- and post-wolf kill density rasters (Figure 3) showed what appeared to be a shift in the spatial distribution of the highest density of cougar kill sites, from the northeast to the south part of the unit. Several factors besides the presence of wolves could have produced this shift, including: 1) catch-per-unit-effort issues (i.e., larger sample sizes of cougars, and therefore kill sites, in the pre-wolf dataset), and 2) time effects from seasonal distribution shifts (i.e., prey migration patterns). The visual interpretation was useful in identifying aspects of the analysis that needed to be accounted for in investigating my spatial problem. The standardized kill density rasters demonstrated that part of the observed shift was related to catch-per-unit-effort influences, but a shift between the two highest-density areas of kills was still evident from the pre- to post-wolf time period. This suggests other variables, like time effects from seasonal prey distribution changes or wolf presence, could also be influencing the distribution of kill density. Standardizing the two kill density rasters improved visual interpretation of the spatial patterns and accounted for one of the factors (catch-per-unit effort) that might mask distributional shifts related to wolf presence (first two panels of Figure 4). However, this method allowed only an implicit understanding of space-time concepts: I could describe distributional patterns through visual interpretation, but the method lacked any inferential measure of significance to quantify the shifts and formally relate patterns of cougar predation across time.


Figure 3. Comparison of pre- and post-wolf cougar kill site kernel density estimates (KDE) before and after standardization for catch-per-unit effort (kills/cougar). Dark red indicates high densities (frequency) of kill sites, progressing to blue, indicating low (or no) density of kill sites.


Isopleth probability surfaces

Spatial changes in probability – Using the probability contours as a 'prior' distribution of kill sites offered a more robust and meaningful measure to quantify the shift in cougar kill distribution (Figure 4). The shift was evident from the proportional changes in post-wolf cougar kills within the pre-wolf cougar kill probability surface. For example, only 28.5% of all post-wolf cougar kills were within the 50% pre-wolf cougar kill probability contour (Table 2). Even if I exclude the kills outside the study area boundary, the proportions of kills in each probability contour were 5-15% lower than would be expected based on the pre-wolf kill site distribution. However, the mechanism behind the shift could still be related to factors other than wolf presence, like seasonal shifts in prey distribution or density.

Figure 4. Comparison of pre- and post-wolf cougar kill site kernel density estimate (KDE) isopleth probability contours showing the 25th, 50th, 75th, 95th, and 99th percentiles. The isopleth contours for the pre-wolf cougar kill site distribution are overlaid with post-wolf cougar kill point locations to demonstrate the distributional shift.


Table 2. Pre-wolf cougar kill site probability contour attribute table. Isopleths represent the 25th, 50th, 75th, 95th, and 99th percentiles. 'Out' refers to areas outside the probability contour surface. Number of kills (No. Kills) is the number of post-wolf cougar kill sites, % is the proportion of all kills within each polygon 'donut', and % Kills is the cumulative proportion of all post-wolf cougar kills within each percentile class.

Wolf activity & seasonal influences – Evaluation of season-specific cougar kills relative to wolf kill activity centers added further evidence of a spatial shift in the distribution of cougar kills between time periods with and without wolves (Figure 5). The magnitude of the repulsive effect appeared stronger in winter than summer, evident in the uniformly decreased proportions of cougar kills per probability contour of wolf activity, and the increased proportion of kills outside any wolf activity, between time periods with and without wolves (Table 3). The relationship appeared opposite in summer, with higher proportions of cougar kills observed in the post-wolf period than expected in all but core (25%) and outside (no wolf activity) areas (Table 4). Accounting for seasonal shifts in kill density improved visual interpretation of spatial patterns and reduced another factor that might mask influences of wolf activity. However, the observed shift in winter could still be due to shifts in prey populations as a behaviorally mediated prey response to wolves (i.e., the shift is related to prey responding to wolves, with cougars indirectly shifting distribution to follow prey). The shift could also be related to differences in prey species-specific distributions, as cougars prey predominantly on deer but show seasonal selection of neonate elk calves in summer, while wolves prey predominantly on elk (i.e., the shift is related to differences in prey species' use of habitat). Unfortunately, there are no prey data at a resolution that would help further tease apart these interactions relative to shifts in prey distribution or density.


Figure 5. Visual comparison of seasonal wolf activity based on kill site density probability contours and the distribution of cougar kill sites across time periods with and without wolves.


Table 3. Winter wolf kill site probability contour attribute table. Isopleths represent the 25th, 50th, 75th, 95th, and 99th percentiles. 'Out' refers to areas outside the probability contour surface. Number of kills (No. Kills) is the number of cougar kill sites, and % Kills is the cumulative proportion of all cougar kills within each percentile class.


Table 4. Summer wolf kill site probability contour attribute table. Isopleths show the 25th, 50th, 75th, 95th, and 99th percentiles for pre-wolf cougar kill density and represent the 'prior' distribution of cougar kills. 'Out' refers to areas outside the probability contour surface. Number of kills (No. Kills) is the number of cougar kill sites, and % Kills is the cumulative proportion of all cougar kills within each percentile class.

Species kill distribution overlap – Wolf core and total kill distributions consistently overlapped cougar kill distributions more than cougar kill distributions overlapped those of wolves (Table 5). This was not surprising, but it yielded a helpful metric and visual with which to compare distributional overlap (Figure 6).


Figure 6. Seasonal overlap in core (50%) and 95% kill KDE probability contour between wolves (red) and cougar (blue). Dark blue and red lines represent the 50% (core) probability contours while light blue and red lines represent the 95% (total) probability contours for kill distributions.


Table 5. Species-specific proportional overlap between wolf and cougar core (50%) and 95% kill KDE probability contours. Calculated as overlap area (km2) / cougar or wolf 50% or 95% area (km2) * 100 and related to each species core and total kill area usage.


Latent Selection Differences

Cougar pre-post wolf – Relative habitat use at cougar kill sites after wolf recolonization showed little difference from pre-wolf cougar habitat use in the variables evaluated (Table 6). Cougars selected for denser canopy cover (increased by 1.6%) and steeper slopes (increased 2%) at kill sites after wolves recolonized the study area, but showed little difference in selection for higher elevations or greater distances to wolf kill sites (< 1% increase). Conditional density plots for the variables (Figure 7) further demonstrate the ambiguity of the results relative to selection based on the explanatory variables. They also show how the conditional distribution of the dependent variable (pre/post-wolf cougar kill sites) is better determined in the pre-wolf time period, owing to the larger sample of kill sites in that period.


Wolf-cougar – Relative habitat use at wolf kill sites showed some small differences from cougar habitat use at kill sites in the variables evaluated (Table 7). Wolves selected for less dense canopy cover (decreased by 3.6%) and steeper slopes (increased 4.7%) at kill sites compared to cougars, but showed little difference in selection of elevations or distances to wolf kill sites (< 1% increase in elevation and decrease in distance to wolf kill sites). Conditional density plots for the variables (Figure 8) further demonstrate the ambiguity of the results for most of the explanatory variables, although the plot for slope did show a distinct transition around 55° between cougar and wolf kill sites. Similar to the pre- and post-wolf cougar analysis, the plots generally show how the conditional distribution of the dependent variable (wolf or cougar kill sites) is likely better determined for cougars, owing to the larger sample of cougar kill sites.


Table 6. Latent selection differences between cougar kill sites before and after wolf recolonization. Logistic regression coefficient (β) estimates, exponentiated coefficients and associated 95% confidence intervals, and relative selection.


Figure 7. Conditional density plots for cougar pre- and post-wolf kill sites.


Table 7. Latent selection differences between wolf and cougar kill sites. Logistic regression coefficient (β) estimates, exponentiated coefficients and associated 95% confidence intervals, and relative selection.


Figure 8. Conditional density plots for wolf and cougar (post-wolf) kill sites.


The results from the analyses performed for this class begin to improve understanding of the spatial aspects of the relationships between these two carnivores. Predation is recognized as a major factor influencing ungulate population dynamics. Predation effects on prey populations are tied to the complexities of intraguild dynamics, as the predation risk for shared prey can vary with the nature of predator-predator interactions as well as with the behavioral responses of prey to predators (Atwood et al. 2009). In addition to addressing key ecological questions regarding predator-predator interactions, results from this research will provide information on the effect of wolves on cougar populations, and on the potential effects of this expanded predator system on elk and mule deer populations. Knowledge gained from this study will be critical to effective management and conservation of cougars in Oregon and could be useful in other parts of western North America facing similar changes in community dynamics as wolves continue to expand their range.


Learning Outcomes

Software – I gained increased comfort and familiarity with several programs I have used in the past (GME, R, ArcGIS), and began exploring various other packages in R (stam, ks, adehabitatHR), other programs (QGIS) and programming languages (Python), as well as new tools in ArcGIS to compare and contrast results and expand some 'next steps' in my analyses of wolf-cougar interactions.


Spatial statistics – Through exploration of the spatial analyst tool box in ArcGIS I learned a great deal about spatial processes and analysis tools as they relate to different data sources and overarching questions. I found that my event (presence only) data did not lend itself to many of the tools well, which forced me to think about my data structure, question(s) of interest, and how I could (or could not) use certain tools to analyze data and answer my spatial questions about wolf-cougar interactions. It was also extremely valuable to have classmates exploring both similar and dissimilar analyses that I could learn from and apply to my own data explorations.


Peer/Instructor Comments – Both my peers in class and Julia were helpful in guiding the progression of these analyses. Some, like investigating prey mediated spatial responses, I cannot address with my data. Feedback about investigating kernel range overlap sparked the species kill overlap analysis above and a next step will be to conduct a similar analysis comparing kill distribution overlap for cougar between time periods with and without wolves.



Anderson, C. and F. Lindzey. 2003. Estimating cougar predation rates from GPS location clusters. Journal of Wildlife Management 67:307-316.

Atwood et al. 2009. Spatial partitioning of predation risk in a multiple predator-multiple prey system. Journal of Wildlife Management 73:876-884.

Beyer, H. 2012. Geospatial Modelling Environment (software version

Creel et al. 2001. Interspecific competition and the population biology of extinction-prone carnivores. Pages 35-60 in J.J. Gittleman, S.M. Funk, D. Macdonald, and R.K. Wayne (eds). Carnivore Conservation. Cambridge University Press, Cambridge, UK.

Czetwertynski, S.M. 2007. Effects of hunting on the demographics, movement, and habitat selection of American black bears (Ursus americanus). Ph.D. thesis, Department of Renewable Resources, University of Alberta, Edmonton, Canada.

Husseman et al. 2003. Assessing differential prey selection patterns between two sympatric carnivores. Oikos 101:591-601.

Knopff et al. 2010. Cougar kill rate and prey composition in a multi-prey system. Journal of Wildlife Management 74:1435-1447.

Kortello et al. 2007. Interactions between cougar (Puma concolor) and gray wolves (Canis lupus) in Banff National Park, Alberta. Ecoscience 14:214-222.

Kunkel, K.E. and D.H. Pletscher. 1999. Species-specific population dynamics of cervids in a multipredator ecosystem. Journal of Wildlife Management 63:1082-1093.

Latham et al. 2013. Spatial relationships of sympatric wolves (Canis lupus) and coyotes (C. latrans) with woodland caribou (Rangifer tarandus caribou) during calving season in a human-modified boreal landscape. Wildlife Research 40:250-260.

Lawton, J.H. and M.P. Hassell. 1981. Asymmetrical competition in insects. Nature 289:793-795.

Ruth, T.K. 2004. Patterns of resource use among cougars and wolves in northwestern Montana and southeastern British Columbia. Ph.D. dissertation, University of Idaho, Moscow, ID, USA.

Tilman, D. 1986. Resources, competition and the dynamics of plant communities. Pages 51-74 in M.J. Crawley ed. Plant Ecology. Oxford: Blackwell Scientific Publications.


Analyzing Hydrologic Patterns in a Single Basin in the Oregon Coast Range

Filed under: Final Project @ 5:13 pm


My master’s thesis involves quantifying geomorphic change in streams after the addition of a large wood jam for salmon habitat restoration purposes. Large wood has been shown to have many positive effects on salmon habitat, including creating pools and backwater areas that are essential for juvenile salmon survival. This past summer, I surveyed seven stream reaches within a single basin in the Oregon Coast Range to capture the in-stream and floodplain topography before the addition of large wood restoration structures. Three other sites were previously established in 2015, for a total of ten study sites. I plan to survey these sites again next summer to capture changes in topography caused by the interaction of wood, water, and sediment.

A major part of my fieldwork is monitoring water stage height at different locations in the drainage basin to capture the hydrologic patterns that are driving the geomorphic change at each LW site, including the frequency and duration of inundation of these log jams. I formulated three questions to help guide my analysis of this data:


  1. How well correlated are stage gage data at different locations within a single basin, based on the distance between them?
  2. How do upstream-downstream connectivity, drainage area, and position within the catchment affect the relationship between different site hydrographs?
  3. How can I model the physical water surface elevation at different cross sections at my log jam sites?


To conduct my analysis, I used water stage data gathered using Level Loggers at 6 different engineered log jam sites in the Mill Creek watershed. Three Level Loggers were installed in Fall 2015 and three more were installed in Fall 2016. Stage height was recorded at 15-minute intervals at each site. These stage gage instruments are pressure transducers. Therefore, I used a seventh gage that recorded atmospheric pressure to correct the stage height in MATLAB after downloading the data in the field.
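The barometric correction itself is simple hydrostatics. A Python sketch of the compensation step (performed in MATLAB for the actual data), assuming pressures recorded in kPa:

```python
import numpy as np

def compensate_stage(p_water_kpa, p_atm_kpa, rho=1000.0, g=9.81):
    """Convert absolute pressure from a submerged transducer to stage (m)
    by subtracting barometric pressure: h = (P - Patm) / (rho * g)."""
    dp_pa = (np.asarray(p_water_kpa) - np.asarray(p_atm_kpa)) * 1000.0
    return dp_pa / (rho * g)
```

Each kilopascal of excess pressure corresponds to roughly 0.102 m of water, so a shared atmospheric record can correct all six submerged gages at once.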

I also have discharge rating curves for four of the stage gage sites, obtained through repeat discharge measurements at those sites. These curves can be used to calculate the discharge for any given stage height and obtain values that I can incorporate into my model of the physical water surface elevations.
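A rating curve of the usual power-law form Q = a(h − h0)^b can be fit by linear regression in log-log space. A sketch with hypothetical gaugings, assuming the stage of zero flow h0 is known (in practice it is often estimated by trial or optimization):

```python
import numpy as np

def fit_rating_curve(stage, discharge, h0=0.0):
    """Fit Q = a * (h - h0)**b by linear regression of log(Q) on log(h - h0)."""
    logh = np.log(np.asarray(stage) - h0)
    logq = np.log(np.asarray(discharge))
    b, loga = np.polyfit(logh, logq, 1)   # slope = exponent, intercept = log(a)
    return np.exp(loga), b

def rating_discharge(stage, a, b, h0=0.0):
    """Discharge predicted by the fitted rating curve."""
    return a * (np.asarray(stage) - h0) ** b
```

Once fitted, the curve converts every 15-minute stage record into a discharge series for the four gaged sites.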

The approximate location of each instrument was recorded using a handheld GPS unit. The streamwise distance between sites varied from 0.63 to 7.51 kilometers (Table 1), calculated using attribute table statistics in ArcGIS. The upstream drainage area of each stage gage site ranged from 4.02 to 21.76 square kilometers (Table 2), calculated using the NetMap Toolbox in ArcGIS.

Three water surface elevation (WSE) rebar were also installed at each log jam site. The top of each WSE rebar was surveyed in with a Total Station during my summer field season. During three subsequent visits to the field, I noted the distance between the top of the rebar and the water surface, marked off in 2-cm increments. This data could be later correlated with the stage gage data in order to answer my third question.


Question 1

I expected to see a negative relationship between distance between instruments and correlation coefficient.

Question 2

I expected that sites with more similar drainage areas that are closer together geographically would have more similar hydrograph shapes, and that streams that are connected would also be more similar.

Question 3

I wanted to be able to use the step-backwater equation or HEC-RAS to model the physical water surface elevations of my sites, based on the physical observations of the WSE rebar and my stage height and discharge data.


I worked mostly in Excel and ArcGIS, using ArcGIS to derive basic spatial relationships between my stage gage sites and Excel to calculate statistics from these data.

Question 1

For Question 1, I used Excel to import and organize all of my stage gage data. I identified the period for which I had data from all six gages, set up a table for a pairwise correlation analysis of all combinations of the six gages over that range, and calculated the correlation coefficient for each pair of stage records. To measure the streamwise distance between each pair of gages, I used a previously created ArcGIS map containing the GPS coordinates of the six gages and the stream polyline on which they are located: I selected all portions of the stream polyline between each pair of gage sites and used the Statistics tool in the polyline attribute table to sum the selected lengths, which I entered into another pairwise Excel table. Finally, I built an X-column of distances between paired gages and a Y-column of their correlation coefficients, plotted X vs. Y as a scatter plot, and examined the relationship.
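The pairwise-correlation step done in Excel can be sketched as follows, with hypothetical gage names and distances:

```python
import numpy as np

def pairwise_corr_vs_distance(stages, dist):
    """stages: dict of gage name -> time-aligned stage series;
    dist: dict of (gageA, gageB) -> streamwise distance (m).
    Returns arrays of distances and Pearson correlations for each pair."""
    xs, ys = [], []
    for (a, b), d in dist.items():
        r = np.corrcoef(stages[a], stages[b])[0, 1]   # Pearson r for the pair
        xs.append(d)
        ys.append(r)
    return np.array(xs), np.array(ys)
```

Plotting the returned distance array against the correlation array reproduces the scatter plot used to examine the distance-correlation relationship.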

Question 2

For Question 2, I used attribute tables in ArcGIS to determine the upstream drainage area of the four sites of interest. I chose four sites that all have upstream-downstream connectivity but have different upstream drainage areas and are located on different tributaries. I then used Excel to further manipulate the stage gage data from Question 1. I graphed the difference in stage height over time between the site with the largest drainage area and each of the three other sites, creating three “stage difference” graphs, and also graphed the raw hydrographs for comparison.

Question 3

For Question 3, I scaled down my analysis and created a DEM of a single log jam site in ArcGIS. I then interpolated an elevation profile perpendicular to the channel at each of the three water surface elevation rebar and imported the profiles into Excel. Finally, I used an Excel spreadsheet implementing the step-backwater method to calculate and graph the water surface elevations at the three rebar locations.
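The step-backwater (standard step) calculation in the spreadsheet can be sketched for a simplified rectangular channel. The channel geometry, roughness, and flow values below are hypothetical, and the iteration assumes subcritical flow, marching upstream from a known downstream water surface:

```python
import numpy as np

def step_backwater(q, width, n, zbed, dx, ws_start, iters=50):
    """Standard-step water surface profile for a rectangular channel,
    marching upstream from a known downstream water surface elevation.
    q: discharge (m3/s); width: channel width (m); n: Manning roughness;
    zbed: bed elevations ordered downstream -> upstream; dx: spacing (m)."""
    g = 9.81
    ws = [ws_start]
    for i in range(1, len(zbed)):
        h_dn = ws[-1] - zbed[i - 1]                  # downstream depth
        v_dn = q / (width * h_dn)
        r_dn = width * h_dn / (width + 2 * h_dn)     # hydraulic radius
        sf_dn = (n * v_dn) ** 2 / r_dn ** (4 / 3)    # Manning friction slope
        ws_up = ws[-1] + sf_dn * dx                  # initial guess
        for _ in range(iters):                       # fixed-point iteration on
            h_up = ws_up - zbed[i]                   # the energy equation
            v_up = q / (width * h_up)
            r_up = width * h_up / (width + 2 * h_up)
            sf_up = (n * v_up) ** 2 / r_up ** (4 / 3)
            ws_up = (ws[-1] + v_dn ** 2 / (2 * g) - v_up ** 2 / (2 * g)
                     + 0.5 * (sf_dn + sf_up) * dx)
        ws.append(ws_up)
    return np.array(ws)
```

With a measured discharge from the rating curve and surveyed cross-section elevations, this procedure yields the modeled water surface elevation at each rebar cross section; HEC-RAS performs the same computation with irregular cross-section geometry.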


Question 1

Correlation coefficients ranged from 0.86 (Site 9 vs Mainstem) to 0.97 (Site 26 vs South Fork). Distances between sites ranged from 625 m (Site 36 vs Mainstem) to 7512 m (Site 9 vs South Fork). Five out of the six site pairs with the lowest correlation coefficients included the Mainstem site as part of the pair.

The plot of distance vs. correlation coefficient yielded a slight negative relationship with an R2 of 0.0117, indicating essentially no relationship between distance and correlation (Figure 2).


Question 2

All sites of interest have upstream-downstream connectivity with Site 26 (Figure 3). Upstream drainage areas ranged from 4.02 to 21.76 km2 (Table 4) while pairwise distances ranged from 0.63 to 3.76 km (Table 3).

Mainstem (MS) and South Fork (SF) are on the same tributary but have different upstream drainage areas and are different distances from Site 26 (Table 3, Table 4, Figure 3). Both graphs showed little difference from Site 26 during baseflow conditions. The 26 v MS graph was relatively flat, but it still had some high peaks during the two largest floods of the season, which indicates other sources of water influencing the hydrograph (Figure 4). The 26 v SF graph had high positive peaks during floods (Figure 5).

Mainstem and Site 36 are a similar distance from Site 26 but have different drainage areas and are on different tributaries (Table 3, Table 4, Figure 3). The 26 v. 36 graph looked very similar to the 26 v. SF graph (Figure 5).

Site 36 and South Fork have similar drainage areas but are different distances from Site 26 and also are located on different tributaries (Table 3, Table 4, Figure 3). The two graphs were very similar in shape despite differences in geographic location, though the Site 36 hydrograph was more similar to Site 26 during baseflow conditions (Figure 5).

Question 3

Two of the WSE rebar (WSE 2 & WSE 3) are located very close together, at the upstream end of the site. The third rebar, WSE 1, is located at the downstream end of the site (Figure 6). Cross section numbers correspond with WSE rebar numbers.

The November and February WSE rebar observations yielded similar, higher water surface elevations, while the December observation was lower (Table 5).

Stage height ranged from 0.364 to 0.692 m for the three WSE rebar elevation dates, with the lowest stage height in December and the highest in November. Discharge was correlated with stage height, with the lowest flow in December, 0.85 m3/s, and the highest flow in November, 2.16 m3/s (Table 6).

The plots generated from the step-backwater spreadsheet looked relatively similar for Cross Sections 1 and 3, with a lower water surface elevation in December and two very similar, higher water surface elevations in November and February (Figure 7, Figure 8). This corresponds with the patterns I observed in the WSE rebar data and the stage and discharge data (Table 5, Table 6).

Graphing the WSEs at Cross Section 2 yielded different results, with a much larger separation between the November and February WSEs. This might be due to the different topography at Cross Section 2 which does not constrict the water in the same way as it does at Cross Sections 1 & 3. Figure 5 shows the simulated water surface elevation if the water flowed all the way across the cross section into a secondary channel, though in reality it probably only accumulates in the main channel (on the right) because of the obstruction in between the two channels.

Several of the calculated water surface elevation values have large discrepancies (~0.2-0.3 m) which indicate issues with the data (Table 3). I repeated the entire analysis and also tried changing the Manning’s roughness values in the spreadsheet which can have an effect on flow levels, but neither of these methods had any significant impact on the outcome.

Graphing the observed and calculated water surface elevations in comparison with the bed elevation also illustrates inconsistencies with the data. All three observed WSE lines follow the same general shape, increasing gradually and then dropping off. The calculated WSE line for December follows the same pattern, but the lines for November and February have the opposite shape, with an almost flat water surface for most of the reach and a sudden jump up at the last cross section (Figure 6).


The overall purpose of my research is to explore how site-specific features that vary with scale respond to engineered large wood jams within an individual basin. This kind of research can help refine stream restoration efforts and optimize resource allocation. For this class, I focused on understanding the hydrologic patterns in my watershed. The patterns of discharge, floodplain inundation, and large wood inundation throughout the basin can help clarify what is truly driving the geomorphic changes that we measure in the watershed.

Question 1

When addressing Question 1, I learned that regardless of location in the basin, the water levels are responding relatively uniformly, with only a slight delay between sites that are spaced further apart. Instruments are located on different tributaries so the relationships are slightly more complicated than I expected but I think I will still be able to apply what I have learned from this analysis to extrapolate to other parts of the basin where I do not have stage data.

Question 2

For Question 2, I learned that sites with similar drainage area sizes and geographic locations behave more similarly, and that drainage area has a greater influence on the similarity of hydrographs than distance does. I was also able to identify a portion of the hydrograph that is still missing and attribute it to the water flowing through North Fork Mill Creek, which is not instrumented. It would be interesting to install a LevelLogger on this tributary in the future to see if it completes the picture.

Question 3

The results of Question 3 were not very conclusive. My analysis showed that I do not have enough data points to understand what is really going on. Two of the cross sections (2 & 3) are not spaced very far apart, which could also have contributed to the issues with the data. Despite these issues, this analysis gave me insight into what I need to do in the future to ensure the robustness of my dataset, namely installing more, widely spaced water surface elevation rebar at each site to create a more complete water surface elevation profile.

All of my findings reinforce relatively well-known hydrologic concepts. I will be able to provide more interpretation of these patterns after I have gathered my second round of geomorphic data. Then I hope to correlate these hydrologic patterns with the changes we see in local channel morphology around each log jam site. This kind of information will help restoration planners determine the best places in the basin to put log jams.

Learning Outcomes

At the beginning of the term, I assessed my proficiency with ArcGIS, Python, and R as follows:

ArcGIS: Intermediate level (I spent last term working with it. I can perform most basic manipulations and figure out what I don’t know how to do from the help menus and the internet.)

Python: Beginner level (I have used it in two classes to perform basic calculations and plot data/results.)

R: Very limited experience (I used it a few times in my undergraduate Statistics class but that was 7 years ago so I will probably be very rusty.)

I was excited to expand my skills in Python and R, but the class did not end up providing many opportunities to do so. I was encouraged to use software that I knew and therefore ended up conducting my analyses entirely in ArcGIS and Excel. I did not have the type of spatial data suited to large-scale spatial statistics anyway, so my only exposure to methods such as hotspot analysis, spatial autocorrelation, regression, and multivariate methods was through the tutorial presentations, where I learned how other people had used these approaches. It would have been more educational to apply these concepts to my own data, but unfortunately that was not really an option.


Tutorial 1

In Tutorial 1, one of my reviewers suggested that my method for calculating distance between points was not very efficient. It is possible that I could have used Python to automate the process, but the process of teaching myself how to use Python in order to perform this relatively simple if repetitive task would probably have taken more time than just doing it the way I did it. If I had to perform this manipulation thousands of times instead of less than twenty times, it may have been worth it to explore other options. The same is true for my methods for calculating correlation coefficients.

Tutorial 2

In Tutorial 2, I got a lot of questions about other factors besides drainage area and geographic location, such as elevation, that could also be influencing the hydrograph patterns. One reviewer also suggested that I compare sites with similar drainage areas more closely to see if drainage size really is the biggest determining factor in flow variation under different flood and drought scenarios. These are all really great suggestions that I look forward to exploring in the future.

Exercise 1

I did not get any suggestions for how to improve Exercise 1.

Exercise 2

Based on the comments I got on Exercise 2, I converted my units of drainage area from m2 to km2 which are more reasonable and understandable. I also clarified the naming conventions for my sites. I combined what had initially been three separate graphs into one graph to more efficiently illustrate hydrograph patterns. Finally, I added a discussion of the limitations of my approach.

Exercise 3

My WSE rebar labeling was apparently not very clear, so I improved it for my final write-up.


Final Project: Understory Vegetation Response in Ponderosa Pine Forests

Filed under: 2017,Final Project @ 1:41 pm

Research Question

This quarter, I explored the following question: how does understory vegetation percent cover respond to ponderosa pine canopy cover and other environmental variables across Deschutes County? This question is a substitute for my thesis research, which takes place in the Willamette Valley. I substituted Deschutes County for the Valley due to the lack of available data within the Valley, and the availability of ponderosa pine plot data from the USFS’s FIA program in Deschutes.

For the first exercise of this class, I explored autocorrelation within my two main variables. Next, I asked if the relationship between the two variables was geographically influenced. Finally, I looked for the specific influences of each variable upon understory vegetation cover in the regions defined by the second exercise.


I used a subset of the Forest Service’s massive FIA dataset to replicate what I imagined my own data would look like. The biggest difference between this dataset and my own is its geographic location. Because the FIA plots are gridded, and the Willamette Valley ponderosa pine plantations are so small, there were very few FIA plots in the Willamette Valley that are entirely composed of ponderosa pine. As a substitute, I used plots from Deschutes county, which has a much higher proportion of pure ponderosa forests than the Willamette Valley does. There were 98 plots in Deschutes county that had ponderosa overstories, and 46 within that group with sufficient understory vegetation data. I used plot and subplot data to find the associated location, inventory year, canopy cover, and understory percent cover based on growth form (bare soil, forb, grass, shrub, and seedling). I included the environmental variables elevation, slope, and aspect, sourced from a DEM layer, to help explain additional variation within understory cover. I also acquired a soil map and included soil codes and drainage types in the final dataset.

Figure One: Ponderosa Pine plot dataset


I hypothesized that understory species associated with ponderosa pine forests are adapted to open canopies with abundant light availability; therefore, the greatest  percent coverage of all understory growth forms would be present in low-cover canopies. I expected that there would be considerable variance in the understory response that cannot be attributed to canopy conditions, because environmental factors and previous land use will strongly influence the understory composition. However, I included the additional environmental variables in my correlation assessment to help identify the influence that canopy cover does exert over the understory.


Part One

For exercise one, I assessed the autocorrelation within canopy cover and understory vegetation cover, using the Global Moran’s Index tool in ArcMap. I had originally wanted to fit a variogram in R for this dataset, but I decided against that method because my data is arranged as a low-density band of points rather than a raster or a regular grid. It was also difficult to find an efficient piece of code to compute a variogram from a .csv file in R, so I went with the more user-friendly ArcMap.

Figure Two: Deschutes County plot distribution

The steps to accomplish this analysis were very simple. I imported my .csv file with the FIA plots, their associated latitudes and longitudes, and other values into ArcMap as X,Y data. Then I converted the points to a shapefile. I projected this shapefile into the right UTM zone, so that the tool would work properly. Finally, I ran the Global Moran’s I tool under the Spatial Statistics > Analyzing Patterns toolbox. I performed these steps twice: once for the vegetation cover and once for the canopy cover.
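As a sanity check on the ArcMap output, Global Moran’s I can also be computed directly from the formula. A sketch using inverse-distance weights (one of the conceptualizations the ArcMap tool offers); the plot values and coordinates here are made up:

```python
import numpy as np

def global_morans_i(values, coords):
    """Global Moran's I with inverse-distance spatial weights."""
    x = np.asarray(values, float)
    c = np.asarray(coords, float)
    n = x.size
    # pairwise distance matrix, then inverse-distance weights (0 on diagonal)
    d = np.sqrt(((c[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1))
    with np.errstate(divide="ignore"):
        w = np.where(d > 0, 1.0 / d, 0.0)
    z = x - x.mean()
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

# A smooth spatial gradient clusters (I > 0); alternating values disperse (I < 0)
coords = [(i, 0.0) for i in range(10)]
i_gradient = global_morans_i(range(10), coords)
i_alternating = global_morans_i([1, -1] * 5, coords)
```

Running this for canopy cover and again for vegetation cover mirrors the two ArcMap runs described above.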

Part Two

For exercise two, I performed a geographically weighted regression on canopy cover and understory vegetation cover to identify plot clusters based on the relationship between the two variables. Using the same shapefile I created in exercise one, I added the plot points to ArcMap. In ArcMap’s GWR tool, I used vegetation cover as the dependent variable, and canopy cover as the explanatory variable. Because my points are somewhat unevenly distributed across the landscape, I used an adaptive kernel with an AICc bandwidth method, which means variable distances are allowed when assessing neighboring points.
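Conceptually, a GWR fits a separate regression at every location, down-weighting distant observations. A minimal sketch with a fixed Gaussian kernel (simpler than the adaptive/AICc bandwidth used in ArcMap); the canopy, vegetation, and coordinate values are placeholders:

```python
import numpy as np

def gwr(x, y, coords, bandwidth):
    """Local (intercept, slope) at every point via weighted least squares."""
    X = np.column_stack([np.ones_like(x), x])
    c = np.asarray(coords, float)
    betas = []
    for i in range(len(y)):
        d = np.linalg.norm(c - c[i], axis=1)
        w = np.exp(-((d / bandwidth) ** 2))  # Gaussian distance decay
        XtW = X.T * w                        # X^T W with diagonal W
        betas.append(np.linalg.solve(XtW @ X, XtW @ y))
    return np.array(betas)

# With a spatially uniform relationship veg = 10 - 0.5 * canopy, every
# local fit recovers the same coefficients; clustering in real data shows
# up as spatial variation in the local slopes.
rng = np.random.default_rng(0)
canopy = rng.uniform(10, 90, 40)
coords = rng.uniform(0, 100, (40, 2))
veg = 10.0 - 0.5 * canopy
betas = gwr(canopy, veg, coords, bandwidth=30.0)
```

Mapping the local slope (`betas[:, 1]`) by plot location is essentially what the coefficient symbology in the ArcMap output displays.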

Part Three

For exercise three, I used R to run three multiple linear regression analyses: two on the clusters identified in exercise two, and one on the entire dataset. To separate the dataset, I selected the clusters based on color intensity in ArcMap. The redder or bluer the points, the more strongly clustered they were. I exported each cluster’s attribute table in ArcMap, then converted the tables to .csv files in Excel. I opened each .csv in RStudio, and used the following code to run an MLR on both clusters and the full dataset:

# Read .csv files.

Full <- read.csv("C:\\Users\\riddella\\Desktop\\R_Workspace\\Full_Points.csv")

Blue <- read.csv("C:\\Users\\riddella\\Desktop\\R_Workspace\\Blue_Points.csv")

Red <- read.csv("C:\\Users\\riddella\\Desktop\\R_Workspace\\Red_Points.csv")

# MLR fits

Full_Fit <- lm(Total_cvr ~ Canopy_Cov + Elevation + Slope + Aspect + Soil_Code + Soil_Drainage, data = Full)

Blue_Fit <- lm(Total_cvr ~ Canopy_Cov + Elevation + Slope + Aspect + Soil_Code + Soil_Drainage, data = Blue)

Red_Fit <- lm(Total_cvr ~ Canopy_Cov + Elevation + Slope + Aspect + Soil_Code + Soil_Drainage, data = Red)

# Read results

summary(Full_Fit)

summary(Blue_Fit)

summary(Red_Fit)





From exercise one, I eventually found that there was no autocorrelation within either of my primary variables (canopy cover, understory vegetation). During my first trial, I did find autocorrelation within the understory vegetation variable. However, I revisited the source of my dataset, and found that the majority of the zeros that my dataset included were actually “no value” placeholders. I removed these entries from my dataset, and performed the analysis again, finding that there was no autocorrelation in the understory vegetation distribution.

Figure Three: An example of a Global Moran’s Index output

Exercise two revealed that my dataset is clustered, which was only visible after I switched the symbology to variable coefficient, rather than standard deviation. The GWR I performed on canopy cover and understory vegetation showed a different relationship between the two variables in two distinct areas. This exercise was the most mentally challenging of the three, because it took me a while to grasp the concept of a geographically weighted regression.  

Figure Four: Clustering in GWR output

In exercise three, my results were initially surprising. The cluster MLRs did not reveal any significant influences by the six explanatory variables; all p-values were well above 0.05. However, the full regression did result in low p-values for both the soil type (p = 0.0141) and soil drainage (p = 0.0471) variables, which indicates that there is a relationship between soil type and soil drainage and vegetation cover across Deschutes County ponderosa pine forests at a larger scale.


Although this type of analysis has been performed countless times in many different forest ecosystems, this approach is significant in teaching forest scientists and land managers how plant communities respond to each other and their surroundings. Understory vegetation growth can be influenced by canopy cover, but canopy cover may not always be the limiting variable. In this dataset, canopy cover did not significantly influence understory vegetation. Nor did slope, aspect, or elevation. Vegetation cover is certainly responsive to environmental variables, but the relationships may be too fine to attribute to a single measurement.

In terms of my own learning, practicing statistical analyses on this placeholder dataset helped me cement concepts I had learned on paper, but that I wasn’t sure how to apply to a research scenario. Statistics can be somewhat of an invisible science, because there isn’t always a visual representation of how the data is being processed or presented. Statistics is still a bit of a weak spot in my skillset, but these exercises helped me gain a sturdier foundation in analyzing and interpreting statistical information within a dataset.

Your Learning

The majority of my exercises were completed in ArcMap, which, out of the programs available to us, I was the most familiar with. However, I was able to use new tools, such as Global Moran’s Index and Geographically Weighted Regression. I was also glad that I was able to revisit R for the final exercise. Although it would have been nice to learn a few tasks in Python or Matlab, this class provided me the opportunity to maintain and strengthen the skills that I developed in past classes, which I don’t get the chance to use very often.

What Did You Learn About Statistics

This class was my first introduction to spatial statistics, so I had a bit of a learning curve for each exercise. Certain concepts that seemed simple enough, e.g. two variables changing in similar ways, became lost in the new terms I was learning (geographically weighted regression =/= autocorrelation!) but I think I managed to get all the terms straight in the end.

One concept that was surprising to me was the inherent neutrality of autocorrelation or clustering. Sometimes autocorrelation spells doom for a dataset, but other times it simply provides an explanation for a phenomenon. The quantification of autocorrelation might even be the entire purpose of a study. For my project in this class, the discovery of clustering in my dataset was surprising, but not undesirable. It simply revealed a previously invisible trait of the plot data I had downloaded.

How Did You Respond to Comments

Julia’s commentary on my exercises and in person helped keep my exercises moving in a fairly linear direction. Her comments were especially valuable as I was assessing exercise two, which only became clear after I correctly displayed the coefficients.

On tutorial one, I received a comment that suggested including climatic variables in my analysis. I did end up including environmental variables in my final analysis to try to explain some of the clustering in the dataset. The other comment on tutorial one recommended both a GWR and multiple linear regression. I ended up using both of these for the last two exercises. In tutorial two, one commenter suggested I explore environmental variables as influencers in understory vegetation cover, which I did end up including in my MLR for exercise three, and the final commenter did not leave constructive suggestions.

June 9, 2017

Final Project Post

Filed under: Final Project @ 9:39 pm

Revised Spatial Problem

During this class I worked on two separate projects.

Project 1: After much exploration of data and of the problems associated with my original spatial problem, I have slightly revised my questions. I originally wondered if I could obtain information about some of the factors that might influence the presence of three amphibian diseases (Ranavirus, Bd, and Bsal) and perhaps create a habitat suitability model. To do this, I was going to need information on the current distributions of the pathogens. I quickly realized most of this baseline information is missing, which makes it hard to build a predictive model. My new approach is therefore to create a model after I collect data on disease prevalence in my area of interest (Deschutes National Forest). I will collect up to 160 samples from the Deschutes National Forest and perform cross-validation, withholding half of the samples and testing the model fit against the remaining half. By doing this, I can create a region-specific model of predicted occurrence after samples are collected. Once I have information on pathogen distribution, I would also like to answer some further questions using spatial statistics. My collection starts this summer, so I had no real data to work with; instead I used artificial data that I created.

Slightly revised questions are:

  • Are ranavirus, Bd, and Bsal present in Deschutes NF, and at what scale?
  • What kind of spatial patterns are associated with pathogen presence, movement, and spread?
    • (Exercise 2; Tutorial 1; I asked a smaller question within this bigger question- “How is Bd presence/absence related to the distance from road or trail?”)


  • I hypothesize that there will be a greater presence of pathogens at lower elevations because of a longer seasonal activity period due to warmer temperatures.
  • I hypothesize that the more amphibian species present the more likely a pathogen is to be present because of likely transmission between hosts.
  • I hypothesize that the size of the water body will influence detection rates of pathogens because of dilution of the pathogen in bigger water bodies.
  • (Exercise 2; Tutorial 1) I hypothesize that Bd presence will be higher in areas that see higher visitor use because of higher transmission rates.

Project 2: For exercise 3, Dr. Jones allowed me to work on a big component of one of my thesis chapters that I needed to get done which was to determine site suitability for a field experiment. This wasn’t related to stats but required a lot of work in ArcMap. My spatial questions are:

  • Are forest meadows reptile biodiversity hotspots?
  • Does aspect matter for reptiles in forests/meadows?


  • Meadows are reptile biodiversity hotspots due to warmer temperatures.
  • Aspect matters for reptiles in forests and meadows due to southern-related aspects being warmer.


Dataset Description

I worked on 2 different projects, so different data was used for both.

Project 1: My first project was related to the question: “How is chytrid fungus presence/absence related to the distance from road or trail?” I chose this question because humans are known to be one of the primary causes of pathogen transmission. For this project, the following data was used:

  • Deschutes NF boundary (National Forest Service)
  • Water body layer (standing) from USGS- All of Oregon- clipped to Deschutes NF boundary
  • Road layer (National Forest Service)- all of Oregon- clipped to Deschutes NF boundary
  • Bd presence/absence points- created points in Deschutes NF within water bodies

Project 2: The second project I worked on was related to two questions and finding suitable habitat: “Are forest meadows reptile biodiversity hotspots? Does aspect matter for reptiles in forests?” Following data was used:

  • Deschutes NF boundary (National Forest Service)
  • Elevation DEM layer (USGS)- All of Oregon- clipped to Deschutes NF
  • Habitat type layer (National Forest Service- Deschutes only)
  • Aspect layer generated with ArcMap- All of Oregon- clipped to Deschutes NF
  • Roads layer (National Forest Service)- All of Oregon- clipped to Deschutes NF
  • Water layer (standing and flowing) (USGS)- All of Oregon- clipped to Deschutes NF


Project 1: I used logistic regression to relate my y variable (chytrid presence/absence) to an x variable (distance from road or trail) using a Python script and R. Logistic regression is the best choice based on Table 6.1 from exercise 2 because my x variable (distance to road/trail) is continuous and my y variable (present/absent) is binary. To run the logistic regression, I had to prep my data. I wanted to answer the question: is the presence of Bd more likely near a road or trail (where humans are transporting the pathogen)? Therefore, I had to get the distance from each sample point to the nearest road. To do this, I first brought in multiple layers of roads and trails within Deschutes National Forest and used the “Merge” tool to combine them into a single layer. My next step was to find the distance from each sample point to the nearest road or trail in this merged roads-and-trails layer. I used the “Near” tool, which generates a value representing the distance from the sample point to the nearest road or trail. Once I had that information, I ran the logistic regression with Bd presence as my dependent variable and distance from road as my explanatory variable. Just for practice, I also ran OLS and a hotspot analysis.
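The logistic fit itself can be sketched outside ArcMap. A minimal Newton-Raphson version in Python; the presence/absence points and distances below are simulated, not the real Bd data:

```python
import numpy as np

def fit_logistic(x, y, iters=25):
    """Logistic regression of binary y on one predictor x (Newton-Raphson).
    Returns (intercept, slope) on the log-odds scale."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        grad = X.T @ (y - p)                     # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])  # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Simulate the hypothesized pattern: Bd presence more likely close to a
# road or trail, i.e. a negative slope on distance
rng = np.random.default_rng(1)
dist_m = rng.uniform(0, 500, 400)
p_true = 1.0 / (1.0 + np.exp(-(2.0 - 0.01 * dist_m)))
bd = (rng.random(400) < p_true).astype(float)
b0, b1 = fit_logistic(dist_m, bd)
```

A negative fitted slope `b1` corresponds to the hypothesis that Bd presence declines with distance from the nearest road or trail.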

Project 2: I performed site suitability/selection for this study based on the following criteria.
I needed to identify 4 sites (it had to be 4 to match a previous experiment):

  • each site would have 4 treatments
    • (a North and South, open meadow and closed forest)
  • road access (< 300m from road)
  • sites at similar elevations
  • similar forest types
  • similar proximity to non-forest
  • each site has the same proximity to water
  • area would be of similar size
  • created by natural processes, not a clear-cut.

I used many tools to determine site suitability for the field experiment such as Clip, Aspect, Extract by Attribute, Raster to Point, Near, Extract Values to Points, Reclassify, Select by Attribute. Overall my approach was to narrow down possible meadows in the Deschutes National Forest to meet the above criteria. For entire approach and walkthrough see Tutorial 2.
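The narrowing-down logic above amounts to filtering candidate meadows against the criteria. A sketch as a plain Python filter; the candidate records and thresholds are illustrative, not the real GIS layers:

```python
# Hypothetical candidate meadows extracted from the habitat-type layer,
# with attributes gathered via Near / Extract Values to Points
candidates = [
    {"id": 1, "dist_road_m": 150, "elev_m": 1450, "area_ha": 3.2, "origin": "natural"},
    {"id": 2, "dist_road_m": 420, "elev_m": 1500, "area_ha": 2.9, "origin": "natural"},
    {"id": 3, "dist_road_m": 90,  "elev_m": 1480, "area_ha": 3.0, "origin": "clearcut"},
    {"id": 4, "dist_road_m": 220, "elev_m": 1470, "area_ha": 3.1, "origin": "natural"},
]

suitable = [c for c in candidates
            if c["dist_road_m"] < 300           # road access (< 300 m)
            and 1400 <= c["elev_m"] <= 1550     # similar elevations
            and c["origin"] == "natural"]       # natural, not a clear-cut
# [c["id"] for c in suitable] → [1, 4]
```

In practice each criterion corresponds to one of the ArcMap steps (Near for road distance, Extract Values to Points for elevation, Select by Attribute for origin), with the survivors becoming the 4 study sites.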


Project 1: For this project, my results revealed statistical relationships. Logistic regression works best for my data and was useful because it showed a significant relationship between Bd presence/absence and distance to the nearest road or trail, as seen below.



Project 2: This project resulted in a map. All of my work resulted in 4 suitable sites that I will use for one of my research projects- determining reptile diversity in meadows. My 4 suitable sites can be seen in Map 1.

For practice, I made artificial data of the kind I could potentially obtain from this field experiment. I analyzed it in RStudio, because logistic regression isn’t possible in ArcMap and most of my data will likely have to be analyzed with logistic regression.

For the data, I coded the meadow plots as 1’s and the forest plots as 0’s. Therefore, my Y variable was binary while my X variable, species diversity, was continuous. This format is what logistic regression requires, and the analysis itself is straightforward.

Results: My artificial data had a very small sample size, so it makes sense that the p-value was not significant. Overall, the results matched what I expected from looking at the artificial data I made. In future work, it would be interesting to automate this process using Python.

y = f(x)

> glm(type ~ diversity)
Call:  glm(formula = type ~ diversity)

Coefficients:
(Intercept)    diversity
   0.006173     0.098765

Degrees of Freedom: 7 Total (i.e. Null);  6 Residual
Null Deviance:     2
Residual Deviance: 0.4198     AIC: 5.123

Note that glm() defaults to family = gaussian, so the call above actually fits an ordinary linear model; a true logistic regression requires glm(type ~ diversity, family = binomial).





Project 1: Because my data was artificial, this was more of a learning experience. However, if this data was real then this project would have allowed me to determine if areas closer to roads had higher disease prevalence. This is important because of roads being travel corridors for humans, but also amphibians. Corridors like roads allow for easier transmission between areas that may have been geographically isolated in nature.

Project 2: My results showed me areas that had all requirements I needed. Forest meadows could be hotspots due to increased solar radiation, temperatures, heterogeneous conditions, increased productivity, and rocky soils. These questions are important because it could result in strong management implications if forest meadows are reptile habitat. Then managers would need to retain/restore meadows, retain connectivity among meadows, and identify ‘climate-smart’ meadows if aspect is related to diversity.


I learned a lot about various statistical methods used in spatial studies, as well as improved my understanding of methods I previously used in Stats 511, such as regression. Not only do I understand the theory better but I understand more about how to actually perform these tests within R and ArcMap. My understanding of what is available in ArcMap related to statistics was improved upon. In addition, I learned how to perform habitat or site suitability selection which was a big skill I wanted to gain. My skill and ability in ArcMap/GIS definitely improved. I learned that importing and using my own data in R wasn’t as scary as I thought it was going to be.

I learned a lot about the requirements needed to run various tests, which was super helpful. I liked having Dr. Jones’s PowerPoint slides that broke tests down in a simple way so you could find the most appropriate test for your data. I learned that hotspot analysis is useful if you are trying to find patterns in things such as the locations of disease prevalence; it would be useful for trying to predict other areas that may have outbreaks. I did not explore spatial autocorrelation much because my data and questions were not well suited to that type of analysis. My understanding of regression and multivariate methods improved. I ran OLS and logistic regression tests, which showed interesting results but had different requirements, such as binary or continuous data, that I was not too familiar with beforehand.

On Comments

Comments given to me by fellow classmates and Dr. Jones were extremely helpful. For example, on tutorial 1/exercise 2 I confused one of my variables (distance from road) as binary when it actually was continuous. This changed tests that were appropriate for my dataset but I used logistic regression which worked. In addition, comments from classmates were very helpful and gave great insight to my work. For example, on tutorial 2/exercise 3 I performed habitat suitability based on various factors. A comment made by one of my peers expressed the importance of getting the most up-to-date layers available because factors could change rapidly, such as habitat type after a wildfire goes through an area. This gave some good insight I did not really think about before.

Final project: Can remotely-sensed sea surface temperature and chlorophyll-a concentration predict the presence of blue whales in New Zealand’s South Taranaki Bight region?

Filed under: Final Project @ 4:59 pm

Research Question

My research focuses on the ecology of blue whales (Balaenoptera musculus brevicauda) in the South Taranaki Bight region (STB) of New Zealand (Figure 1). Despite the recent documentation of a foraging ground in the STB (Torres et al. 2015), blue whale distribution remains poorly understood in the southern hemisphere. The STB is New Zealand’s most industrially active marine region, and the site of active oil and gas extraction and exploration, busy shipping traffic, and proposed seabed mining (Torres 2013). This potential space-use conflict between endangered whales and industry warrants further investigation into the spatial and temporal extent of blue whale habitat in the region. My goal for this term was to begin to investigate the relationship between blue whales and their environment. Specifically, the question I am asking here is:

Can the number of blue whales present in an area be predicted by remotely-sensed sea surface temperature and chlorophyll-a concentration?

Figure 1. A map of New Zealand, with the South Taranaki Bight region (STB) delineated by the black box.


For these analyses, I used blue whale sighting data from our 2017 survey of the STB. Between 2 and 21 February, we conducted 1,678 km of survey effort. During this time, we recorded 32 blue whale sighting events, comprising 68 individual whales.

In addition to the blue whale sighting data from our survey, I downloaded two rasters containing average sea-surface temperature (SST) and chlorophyll-a (chl-a) concentration for the month of February 2017 from NASA's Moderate Resolution Imaging Spectroradiometer (MODIS Aqua) website. These layers use a 4 km grid cell size.


The STB region is characterized by a wind-driven upwelling system which originates off of Kahurangi Point and curves around Farewell Spit and into the bight (Shirtcliffe et al. 1990). Previous work has found that blue whales associate with cold, well-mixed, productive waters (Croll et al. 1998). Additionally, previous work on cetacean habitat modeling has found that animals tend to associate with oceanic fronts and eddies, where water masses of different temperatures are meeting and mixing and appear to aggregate prey (Becker et al. 2010, Cotte et al. 2011). Therefore, I hypothesize that the number of blue whales present within a given spatial area would show a negative relationship with temperature, and a positive relationship with deviation in temperature and chl-a concentration.


For my final analysis for this course, I was able to build a model which included both blue whale presence and absence data. First, I loaded the two raster layers in ArcMap and overlaid our research vessel’s trackline as well as the blue whale sighting locations (Figure 2).

Figure 2. Remotely-sensed chlorophyll-a concentration, overlaid with the trackline of our research vessel (black lines) and blue whale sighting locations (red circles).

I obtained remotely-sensed SST and chl-a values for every 4 km grid cell that our research vessel transited through, using the “extract by mask” tool in ArcMap to extract only the portion of the rasters for which we have actual observations. Subsequently, I used the “raster to point” tool to create a layer with a single point value at the center of each cell we transited through (Figure 3). I then used the “near” tool to sum the number of blue whales sighted in each grid cell and attach that value to the points containing the oceanographic information.

Figure 3. The results of the “extract by mask” and “raster to point” operations, which were used to extract just the grid cells from the remotely-sensed chlorophyll-a concentration raster that our research vessel transited through. The points represent an extracted value for each raster grid cell.
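For anyone reproducing these steps in R rather than ArcMap, the same mask-and-convert workflow can be sketched with the raster package. The file names here are placeholders, not the actual project files:

```r
# Sketch of the ArcMap "extract by mask" + "raster to point" steps in R,
# using the raster package. File names below are hypothetical placeholders.
library(raster)

# chla  <- raster("chla_feb2017.tif")         # monthly chl-a raster
# track <- shapefile("trackline_buffer.shp")  # buffered ship trackline

cells_along_track <- function(chla, track) {
  # "Extract by mask": keep only cells the vessel transited through
  chla_track <- mask(crop(chla, track), track)
  # "Raster to point": one point at the centre of each remaining cell
  rasterToPoints(chla_track, spatial = TRUE)
}
```

The resulting SpatialPointsDataFrame carries one oceanographic value per transited grid cell, ready for the later join with whale counts.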

Because there may not be a simple linear relationship between blue whale presence and the values for SST and chl-a, as seen in the work I did for tutorials 1 and 2, I wanted to explore what other environmental features could be gleaned from these remotely-sensed raster datasets. To assess the possibility of fronts influencing blue whale distribution through a coarse proxy, I computed the standard deviation of both SST and chl-a for each cell along the ship’s trackline. To do this, I used the “focal statistics” tool within the neighborhood toolbox in ArcMap, which calculates the standard deviation for each cell across the surrounding 5×5-cell area. Since each of my grid cells is 4 km on a side, this means I had a value for the standard deviation across a 20 km × 20 km area. The results of this calculation for SST are shown in Figure 4.

Figure 4. A raster layer with the standard deviation of sea surface temperature over the surrounding 20 km × 20 km neighborhood for each grid cell, overlaid with the locations of blue whale sightings.
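The same neighborhood calculation is available in R through the raster package's focal() function. This is a sketch of the equivalent step, not the original workflow, and the file name is a placeholder:

```r
# Equivalent of ArcMap's Focal Statistics (standard deviation) in R.
# The input file name is a hypothetical placeholder.
library(raster)

# sst <- raster("sst_feb2017.tif")

focal_sd <- function(r) {
  # 5 x 5 moving window of equal weights; sd computed over each neighborhood
  focal(r, w = matrix(1, 5, 5), fun = sd, na.rm = TRUE)
}

# sd_sst <- focal_sd(sst)
```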

I then used the “extract values to points” tool to add sd(SST) and sd(chl-a) to the points from the ship’s trackline. The resulting attribute table contained geographic location, number of whales, SST, chl-a, sd(SST), sd(chl-a) for every grid cell that our ship transited through. I exported this attribute table as a .csv file so that it could be read into R for further statistical analysis.

Finally, I built a generalized additive model (GAM) to assess the effect of these remotely-sensed oceanographic variables on the number of blue whales present in an area, which can be written as:

Number of blue whales ~ Chl-a + SST + sd(Chl-a) + sd(SST)

I used a GAM rather than a generalized linear model (GLM) because my assessments from previous exercises indicated clearly that the relationship is not linear. I chose a Poisson distribution because my response variable is a count containing many zeros and ones, giving a strongly skewed distribution. The R script I used to run the model is copied below:
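A minimal sketch of this model in R, using the mgcv package; the column names (n_whales, sst, chla, sd_sst, sd_chla) are assumptions, not necessarily the names in the original attribute table:

```r
# Sketch of the GAM described above, fit with mgcv.
# Column names are hypothetical placeholders.
library(mgcv)

# dat <- read.csv("trackline_cells.csv")  # attribute table exported from ArcMap

fit_whale_gam <- function(dat) {
  gam(n_whales ~ s(chla) + s(sst) + s(sd_chla) + s(sd_sst),
      family = poisson, data = dat)
}

# summary(fit_whale_gam(dat))  # smooth-term chi-square tests, deviance explained
```

The summary() output reports a chi-square test for each smooth term and the percent deviance explained, which is where the values in Table 1 come from.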


The results of the GAM are shown in Table 1. Both SST and sd(SST) were significant predictors of the number of blue whales present. There was no significant relationship between either chl-a or sd(chl-a) and the number of blue whales present. Of the two temperature variables, sd(SST) was a better predictor than just SST.

Table 1. Output from the generalized additive model. Statistically significant values are denoted by an *.

Parameter DF Chi-square p-value
Overall, only 4.32% of the deviance in the data could be explained by the model.


Overall, the model I built here does not explain much of the pattern that may exist in this dataset. That is not overly surprising to me, as I am working with remotely-sensed data averaged over a period of a month, and I know from personal experience that conditions in this area fluctuate temporally. Moving forward on this project, I will incorporate in situ measurements of SST and fluorescence (a measure of chl-a concentration), as well as prey density as measured by acoustic backscatter strength. The work I did this term was a valuable exercise in working with these concepts, and I believe that my model will improve with the inclusion of finer-scale predictor variables.

I believe that the results presented here do have biological relevance and follow what is currently established in the literature. Changes in temperature may not have a direct effect on blue whale distribution; however, temperature can be used as a proxy for other oceanographic phenomena, such as fronts, which may be of greater direct significance in defining blue whale habitat.

This habitat work is of particular interest as relatively little is known about blue whale distribution and habitat use in the Southern Hemisphere, including in New Zealand. The high concentration of industrial activity in the STB region increases the importance of knowing when and where blue whales are most likely to be present so that management strategies can be established accordingly.

Statistical Approaches Learned

I started the term off by using a hotspot analysis in exercise 1. For exercise 2, I used a geographically weighted regression (GWR). I think that the GWR is a useful and powerful tool, however I don’t think my dataset is big enough for it to be especially useful, at least in the way that I was applying it. For exercise 3, I used a GLM and practiced moving between ArcMap and R. For the final piece of this project I performed further analyses in both ArcMap and R, ultimately settling on the GAM presented here. There were numerous other trials of various tools that did not make it into any exercises or tutorials, but were useful practice nonetheless.

One thing I think I could have spent some more time on this term is investigating spatial autocorrelation between my variables of interest, through approaches such as correlograms or variograms. Perhaps this is something I will spend a bit more time on as I move forward with my analysis.

I can certainly say that my abilities and my confidence in my use of both Arc and R have improved over the course of this term. While I was working with a dataset which is minimal compared to what I hope to include moving forward, I think that I am now prepared with several tools which will be useful when I incorporate more data. Finally, I found it tremendously helpful to discuss approaches with and receive feedback from my classmates!

Response to Comments from Previous Tutorials

In my tutorial 1, I used a geographically weighted regression to examine the relationship between chl-a concentration and blue whale group size at each blue whale sighting location. What I found was that the regression model could not fit a linear relationship between the two. Based on my findings and feedback I received, I used a non-linear model for my final analysis. One piece of feedback I received was a question about whether the behavior of the animals may affect their distribution and habitat use. I think that this is an interesting question, but that since I am just using one year of data for this analysis my sample size for different behaviors may not be large enough to draw meaningful conclusions. However, this is something I may pursue when I include more years of survey data in my analysis.

For my tutorial 2, I used a GLM to evaluate chl-a and SST as predictors of blue whale group size at blue whale sighting locations. It was suggested that I use a Poisson distribution in my regression model, which I did for the final analysis. For tutorial 2, I was not yet able to extract all of the absence values and so I was advised that my results would likely be more meaningful with the inclusion of points without whales. I was able to do this for the final analysis presented here, and the significance and interpretability of my results did improve. Additionally, I got a suggestion to use principal component analysis in order to decide which oceanographic variables to retain in my model. I think that this approach will be very useful moving forward once I have processed additional data and have several more variables to include.


Becker EA, Forney KA, Ferguson MC, Foley DG, Smith RC, Barlow J, Redfern J V. (2010) Comparing California current cetacean-habitat models developed using in situ and remotely sensed sea surface temperature data. Mar Ecol Prog Ser 413:163–183

Cotte C, d’Ovidio F, Chaigneau A, Lévy M, Taupier-Letage I, Mate B, Guinet C (2011) Scale-dependent interactions of Mediterranean whales with marine dynamics. Limnol Oceanogr 56:219–232

Croll DA, Tershy BR, Hewitt RP, Demer DA, Fiedler PC, Smith SE, Armstrong W, Popp JM, Kiekhefer T, Lopez VR, Urban J, Gendron D (1998) An integrated approach to the foraging ecology of marine birds and mammals. Deep-Sea Res II 45:1353–1371

Shirtcliffe TGL, Moore MI, Cole AG, Viner AB, Baldwin R, Chapman B (1990) Dynamics of the Cape Farewell upwelling plume, New Zealand. New Zeal J Mar Freshw Res 24:555–568

Torres LG (2013) Evidence for an unrecognised blue whale foraging ground in New Zealand. New Zeal J Mar Freshw Res 47:235–248

Torres LG, Gill PC, Graham B, Steel D, Hamner RM, Baker S, Constantine R, Escobar-Flores P, Sutton P, Bury S, Bott N, Pinkerton M (2015) Population, habitat and prey characteristics of blue whales foraging in the South Taranaki Bight, New Zealand.


June 5, 2017

My Spatial Problem Revised

Filed under: 2017,Final Project @ 1:03 pm

1) The Question I asked:

Does site aspect play a significant role in radar backscatter after processing?

2) Description of the Dataset

The backscatter dataset comes from the Sentinel-1 satellite constellation, which is equipped with C-band radar and has global coverage. The return time for a given area is 6-12 days, with a ground resolution on the order of 10 m (after pre-processing steps have been completed). I was looking at a single point in time covering much of western Oregon. I used plots of land owned by Starker Forests Inc., which can be seen in Figure 1 below. Although information on forest cover was not provided, using these plots significantly reduced the size of the area of interest (for ease of use in ArcMap) and ensured that, at the least, the areas would be even-aged, monoculture stands of Douglas-fir.

Figure 1: Area of Interest with values extracted at points (green).

3) Hypotheses

The null hypothesis I was working with was that there is no significant difference in average backscatter between aspects. This is assumed to be true once the preprocessing steps have been completed, but my project was to investigate whether that is a safe assumption.

4) Approach

The first analysis I used was to test for autocorrelation. This was a frustrating process in ArcMap due to the clunky nature of that program, however once I exported my data from Arc and brought it into R the process became much smoother. The autocorrelation step was an investigative move to test if the data points I would be using were a representative sample.
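Once the points were exported from Arc, the empirical variogram (Figure 2) can be produced in R with the gstat package. This is a sketch with placeholder column names (x, y, gamma0):

```r
# Empirical semivariogram of backscatter, sketched with gstat.
# Column names (x, y, gamma0) are hypothetical placeholders.
library(sp)
library(gstat)

# pts <- read.csv("backscatter_points.csv")  # export from ArcMap

backscatter_variogram <- function(pts) {
  coordinates(pts) <- ~ x + y      # promote data frame to SpatialPointsDataFrame
  variogram(gamma0 ~ 1, pts)       # empirical semivariogram of gamma-naught
}

# plot(backscatter_variogram(pts))
```

A flat variogram here suggests little spatial structure among the sampled points, which is what makes them usable as an approximately representative sample.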

The next analysis was a comparison of aspect and backscatter values, which involved another grueling battle with Arc which ended with a semi-automated workflow that exported data for later use in R.

The final analysis was to look closer at the (non)relationship between aspect and backscatter using various statistical tools in R (namely the Bartlett test and the Kruskal-Wallis test).
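In base R both tests are one-liners; the column names below (gamma0 for backscatter, aspect for the aspect class) are placeholders for whatever the exported table actually contains:

```r
# Bartlett and Kruskal-Wallis tests of backscatter vs. aspect class.
# Column names are hypothetical placeholders.

# dat <- read.csv("backscatter_by_aspect.csv")  # export from ArcMap

aspect_tests <- function(dat) {
  list(
    # Bartlett: are backscatter variances homogeneous across aspect classes?
    bartlett = bartlett.test(gamma0 ~ aspect, data = dat),
    # Kruskal-Wallis: do backscatter distributions differ by aspect?
    kruskal  = kruskal.test(gamma0 ~ aspect, data = dat)
  )
}
```

High p-values from both tests are what supports the conclusion below that backscatter does not depend on aspect.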

5) Results

I produced a few maps, but the important results came in the form of graphs and statistical values. The most important was the finding, via a Kruskal-Wallis test (Figure 5), that the distribution of backscatter did not differ by aspect. Combined with a Bartlett test of homogeneity of variances, I found that the values of gamma-naught (backscatter) did not vary between the different aspects. These two tests show that there is no evidence that backscatter changes with aspect (all else being equal).

Figure 2: Variogram of the data points used in Exercise 1

Figure 3: Scatterplot of the backscatter values according to aspect (clockwise from top-left: NW, N, NE, E, SE, S, SW, W, center: Flat).

Figure 4: Boxplot showing the means and variation of backscatter (y axis) and aspect (x axis).

Figure 5: Results of Kruskal-Wallis test on both Gamma~Aspect and log(Gamma)~Aspect. Both tests resulted in high p-values (p > 0.6), which indicates the values of gamma-naught (backscatter) do not come from different distributions. If they did, that would mean that aspect plays a role in the distribution of backscatter.

6) Significance

The significance of this project is fairly limited to users of Sentinel-1 data who are using the same pre-processing as me. It is mainly a validation of the radiometric and geometric calibration process (an important step nonetheless).

7) What did I learn about programs?

I solidified my disdain for ArcGIS as a tool for statistical analysis on large datasets. I now see Arc as a gateway software that you can pass data through, pull out what you need based on geographic means (or with model builder), and then export the data to use in R or some other program.

I learned how useful R can be in geostatistical analysis, and was exposed to a few new software packages which I will be using in my thesis work.

8) What did I learn about statistics?

I learned a few new statistical tools such as variograms and the Kruskal-Wallis test. I didn’t learn enough to explain them very well to anyone else, but I did learn enough to know what the results from each mean in relation to my data.

9) Response to critique and comments

The responses and comments I received were very helpful. David and Marja both had thoughtful insight on my tutorial 2, and brought up approaches that I did not think to do. Lody and Audrey were also helpful during the first phases (tutorial 1) since they had similar projects to mine and a similar background so we could help guide each other.

The response from Dr. Jones during class was the most helpful though. I would frequently come to a dead end in my processing or my approach, but Dr. Jones always had an idea to change the direction toward an ultimate goal by taking smaller steps. Her comments on Canvas were not as helpful in giving direction.

April 24, 2017

Japanese tsunami marine debris biota

Filed under: 2017,Final Project,My Spatial Problem 2017 @ 12:49 pm


Research problem: Japanese Tsunami Marine Debris species

Six years ago, the devastating Tohoku earthquake and tsunami struck the coast of Japan. Since then, it has become evident that hundreds of coastal species from Japan have crossed the Pacific Ocean on tsunami debris, including species that have become invasive and are known to cause ecosystem and economic damage elsewhere. As of January 2017, scientists have documented the arrival of >650 debris items, referred to as Japanese Tsunami Marine Debris Biofouling Items (JTMD-BF) 1-651. Debris items include docks, buoys, boats, pallets, and wooden structures. Items were identified as JTMD if they 1) had clear identification, such as a serial or registration number, linked to an object lost during the tsunami of 2011; 2) had clear biological evidence of originating primarily from the Tohoku coast of Japan; or 3) showed a combination of these factors. The BF items are a subset of all the debris lost, as some debris items known to be lost from Japan remain undocumented – it is possible they ended up in remote locations. A huge effort by taxonomists to identify the species on JTMD has generated a comprehensive species list: around 300 taxa have been collected on JTMD from North American and Hawaiian coastlines since 2012.

I am interested in looking at several spatial aspects of the dataset as research questions. First, I want to explore the geographic distributions of the species with invasion history compared to those without invasion history. I also want to map the geographic distributions of a few species of interest (both native and non-native ranges on the same map) to compare them. Lastly, I want to explore the different methods of transport, or vectors, that have been documented for 31 of the JTMD species, to better understand different dispersal mechanisms and patterns.

Dataset: JTMD Database

My work is part of an international effort to evaluate the risks associated with JTMD and associated species.  I contributed to the development of a database of life history, distributional, and environmental attributes of many JTMD species. This database is being used for reference and analysis, and can aid in efforts to assess the risk associated with JTMD species, and in determining which species are of highest concern for establishment in certain areas. The database contains a little over 100 JTMD species, with 26 attributes per species. It has information for native geographic regions and non-native (if applicable) geographic regions, as well as vector information, meaning the methods of transport that species have used to disperse to different regions of the world.  The regions are classified as the Marine Ecoregions of the World. It also has survival temperature and salinity regimes, reproductive information, habitat, depth, trophic level, abundance data, and more. The database does not include temporal information, but these species arrived on debris items in pulses for years after the tsunami event, and that is another interesting aspect of the phenomenon.

Hypotheses: While the JTMD species are largely native to the Northwest Pacific (near Japan), they are also native to regions from all over the world. I hypothesize that most species without invasion history are native to the NW Pacific, and those with invasion history are native to regions all over the world. I expect to see this represented when their native regions are mapped. For the species that have been found outside their native range, I expect to see that some are only just outside native ranges in a few areas, but some with a history of invasion are more globally distributed.

For the individual maps of species of interest, I am not sure how they will compare, but I expect some to be more global, and some to be more localized, and only invasive in one realm.

For the vector types, I expect to see a lot of overlap of geographic regions between different types of vectors, because there is a lot of overlap in documentation. What I mean by overlap is that most species that were introduced to a certain region via ballast water were also introduced via aquaculture/fisheries trade, for example, and that can result in similar looking maps for different vector types.

Results: Please refer to these links for some more detail, and clearer maps on the JTMD species native region distributions: Reva Gillman_geog566exercise1 JTMD species of interest individual range maps: Reva Gillman_Exercise2 , and lastly JTMD Vector distribution: Reva Gillman_exercise3_geog566


JTMD species native regions

The following map shows the total number of JTMD species native to each region, with a legend on the side specifying what range each color represents.

Results: The most prevalent region that JTMD species are native to is the Temperate Northern Pacific. This seems obvious: JTMD originated from Japan, so we expect most species to be native to that same region of the globe. The next most prevalent native region is the Eastern Indo-Pacific, southeast of Japan. After that, however, the prevalent native regions begin to span the globe: the Temperate Northern Atlantic and the Tropical Eastern Pacific. At the other end is the least prevalent region, the Southern Ocean; only one JTMD species is native to this cold southern region.




Looking at Individual Species geographic ranges

What are the geographic distributions of some of the ‘riskier’ species that arrived on Japanese Tsunami Marine Debris (JTMD)? What do their native and non-native ranges look like, and how do they compare with each other?

First of all, I had to choose, out of the 104 JTMD species, 4 species with clear invasion history to focus on. Of the 31 with invasion history, I chose Asterias amurensis (a seastar) and Hemigrapsus sanguineus (the Japanese shore crab), since they are known to be problems in other regions. I then chose two species with large non-native spread: Crassostrea gigas (the Japanese oyster), an economically important aquaculture species, and Teredo navalis, a shipworm with an almost global distribution that has burrowed into wooden ships for hundreds of years.

The approach I used was within ArcMap. I edited the Layer Properties in ArcGIS ArcMap 10.4.1 to recolor the polygons in the shapefile according to which realms were documented as native or non-native regions for each species.

Crassostrea gigas geographic distribution



Teredo navalis geographic distribution

JTMD species vector distribution:

The maps below show the number of species that were introduced to each realm via each vector type. As you can see by looking at the first map for natural dispersal, the most prevalent regions are Temperate Northern Atlantic, and Temperate Northern Pacific. For the second map of moveable structures, the most prevalent regions are Temperate Northern Pacific, and Southern Australia/ New Zealand.

For solid ballast, the most prevalent region for this vector is Southern Australia/ New Zealand. For recreation, the most prevalent regions are Temperate Northern Atlantic, along with Southern Australia/ New Zealand.

For the method of transport of ballast water, the most prevalent regions are Temperate Northern Pacific, Temperate Northern Atlantic, along with Southern Australia/ New Zealand, and Southern Africa.  For aquaculture and fisheries trade, the most prevalent regions are Temperate Northern Pacific, Temperate Northern Atlantic, along with Southern Australia/ New Zealand.

Approaches: I worked within ArcMap throughout. I visually represented the native-region frequency of JTMD species within 12 different coastal geographic regions of the world. For the individual species maps, I edited the Layer Properties in ArcGIS ArcMap 10.4.1 to color the shapefile polygons according to which realms were documented as native or non-native regions for each species. For the last part of my project, I visually represented the vector-distribution frequency of JTMD species within the same 12 regions, with a different map for each vector type.

Significance: Human-mediated transport of marine species across the globe through ballast water and hull fouling has been a concern for some time. JTMD is unique in comparison to these marine vectors in that it can transport large numbers of marine species across ocean basins. Shipping routes are direct and arrive in known locations at measurable frequencies, whereas JTMD, which is propelled by winds and currents and travels at much slower speeds than ships, can arrive almost anywhere at any time. JTMD is potentially the transport vector with the most random distribution yet described. Due to the slow rates of transport by currents rather than propulsion, the effects of drag and dislodgement are reduced on JTMD in comparison with ship hulls. Furthermore, JTMD transports large numbers of adults, rather than the larval stages that are more common in ballast water. As of January 2017, only one JTMD species, the striped beakfish Oplegnathus fasciatus, has been observed free-living along the west coast of North America (in Oregon and Washington). At this time, we do not know if any of these JTMD species will become established outside of their current distributional range as a result of the earthquake and tsunami. But by studying the JTMD species' invasion histories and the geographic distributions of both their native and non-native areas, we can better understand this large dispersal event and inform vector management responses for future events.

The maps that were created during this class are useful for my research, and are really nice ways to visualize the native regions of the JTMD species, the distribution of the JTMD species, and the vector distribution of the JTMD species. I can potentially use these maps for reports and thesis work on JTMD species in the future.

My Learning: I started out the class as a beginner in spatial analysis, and with no experience in GIS,  or Arc-Info. Being a beginner, I had to do a lot of self-exploration, tutorial watching, and asking for help from peers and teachers. I am happy to report that I gained a lot of comfort using ArcMap, and can now get a shape file into ArcMap, get data as another layer over the shape file, and join data, manipulate the layer properties, and create maps with legends in ArcMap. I am happy with these skills, and I feel that I learned a lot about beginner ArcMap usage, and feel more comfortable talking about GIS mapping with others now.

Statistical knowledge: I didn't use statistical analysis in my project: my data cover very large marine realms and didn't lend themselves to formal tests, so I relied on exploratory analysis and visualization instead. However, by talking to others in the class and learning about their methods, I learned some new approaches for spatial analysis, like variograms, that I can potentially use in the future.

Comments: I got very insightful comments from students on my tutorials. One suggestion was to look at the frequency of occurrence of each debris item and map it; I wasn't able to incorporate this because I don't have the necessary data, but it would be a good approach if we did. I also used Julia's comments on my exercises to guide my exploration in later exercises. For example, one of her suggestions was to map the number of species introduced to each region by a given vector, and that is exactly what I ended up doing to visualize the vector distribution of JTMD species.



April 7, 2017

Comparison of near-shore and on-shore predictions of tsunami inundation

Filed under: Final Project,My Spatial Problem 2017 @ 1:06 pm

Research Question

A tsunami is a set of ocean waves caused by any large, abrupt disturbance of the sea surface. As evidenced by the 2004 Indian Ocean and 2011 Japan tsunamis, local tsunamis can destroy coastal communities in a matter of minutes. Tsunamis rank high on the scale of natural disasters: since 1850 alone, they have been responsible for the loss of over 420,000 lives and billions of dollars of damage to coastal structures and habitats [1]. Predicting when and where the next tsunami will strike is currently impossible. Given a tsunami source, however, arrival time and impact can be predicted with modeling and measurement technology. These predictions are vital for coastal communities to make the necessary preparations to mitigate tsunami damage.

My research involves developing a methodology for making fast near-shore tsunami predictions based on arbitrarily defined off-shore conditions. The purpose of this methodology is to provide ensemble predictions of near-shore inundation in order to quantify the uncertainty of tsunami predictions. While novel, this methodology has the limitation of predicting inundation only at near-shore locations, not on-shore locations. Thus the question becomes whether we can apply the near-shore uncertainty estimates to on-shore locations.

For this class project, I focused on comparing different near-shore locations to various on-shore locations. I identified what I defined as “major inundation events” and examined whether each of these events corresponded to a significantly large wave in the near-shore. Thus my research question is:

“Do variations in the near-shore inundation time series correspond to variation for on-shore inundation?”

or more specifically for this case:

“Do predictions for major flooding events on-shore during tsunami inundation correspond to predictions of large waves near-shore?” 

Data Set

I used near-shore and on-shore tsunami simulation data that predicted sea-surface elevation and fluid velocities for a stretch of coast in southern California near Port Hueneme. There were 27 possible data sets, each with a different initial tsunami source; for this study I focused on the scenario where a large tsunami was generated near Alaska. The simulation data are provided by the Method of Splitting Tsunami (MOST) model, which is the standard model used at the NOAA Center for Tsunami Research (NCTR) [2]. The model provides inundation data for various simulated tsunami events over a duration of 10 hours (600 min) and over a geographical area bounded by the coordinates (-119.2469462, 34.1384259) and (-119.1949874, 34.2000000). There are 3600 time steps for each set of data on a 562 by 665 grid. The spacing between the longitudes and latitudes is uneven but can be projected onto an even grid. A time series of sea-surface elevation and fluid velocities is available for each geographical point (see Figure 1).

Figure 1 – Simulated tsunami inundation data using MOST v4 at Port Hueneme, CA for a simulated Cascadia source. Image is taken from the Performance Based Tsunami Engineering (PBTE) data explorer.

In addition to the tsunami data, I also have the topographical/bathymetric data for the region (which does not change in time) with the same spatial resolution as the gridded data. This data is presented in Figure 2.

Figure 2 – Topological/bathymetric data for Port Hueneme, CA in geographic coordinates (left) and UTM projected coordinates (right).


To strengthen my research, it would be quite nice if every major flooding event corresponded to a large near-shore wave such that I could extrapolate uncertainty predictions in the near-shore to the on-shore. Thus I will phrase my hypothesis like so:

“Every predicted large on-shore flooding event corresponds to a predicted large near-shore wave.”

Approaches

Many approaches were used to analyze this data; most of them, unfortunately, failed to produce meaningful results. At first, I attempted to analyze all of the spatial data at a given time step at once, treating the data essentially as raster layers. For the raster layers I attempted to calculate autocorrelation using Moran’s I and cross-correlations between the inundation data and the topographical data. In both cases, the procedures failed to produce meaningful results.

After analyzing raster layers failed, I decided that extracting time series was a more promising approach. I then selected points of interest along the coast near Port Hueneme, California. The points were chosen somewhat arbitrarily, but I ensured that they were reasonably well distributed along the coastline and fell within the domain of the dataset. Figure 3 shows the locations of the points of interest (POIs) along the coast of Port Hueneme, CA. A total of 12 points were selected, each meant to represent a building or port location. Their names and geographical coordinates were stored in a .csv file as shown in Figure 4.

Figure 3 – POIs where time series of inundation were extracted along coast near Port Hueneme, CA

Figure 4 – CSV file storing locations and their geographic coordinates

The .csv file was read into R by using the “read.csv” function and this table was subsequently converted to a data frame using the procedure highlighted in “Tutorial 2: Automated Autocorrelation of Tsunami Inundation Time Series at Various Points of Interest.”
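A rough Python equivalent of that R step (read.csv into a data frame) might look like the sketch below; the file contents shown are hypothetical stand-ins for the real POI table in Figure 4, not its actual names or coordinates.

```python
import csv
import io

# Hypothetical contents of the POI file from Figure 4; the real file
# stores a name plus longitude/latitude for each of the 12 locations
poi_csv = """name,lon,lat
Port Hueneme Jetty,-119.2104,34.1445
Port Hueneme,-119.2073,34.1478
Beachcomber Tavern,-119.2030,34.1502
"""

# Parse the table into a list of records (the R read.csv -> data.frame step)
reader = csv.DictReader(io.StringIO(poi_csv))
pois = [{"name": row["name"],
         "lon": float(row["lon"]),
         "lat": float(row["lat"])} for row in reader]
```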

After the time series of inundation were extracted for each point of interest, autocorrelations and cross-correlations between the different time series were calculated. Unfortunately, these forms of analysis also failed to produce meaningful results. The data mostly showed that autocorrelation was present, but this was not useful for determining similarity or difference in the variations between the different time series. Additionally, differencing between different time series was applied but was found to be inappropriate for the data.
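For reference, the kind of cross-correlation that was attempted can be sketched as follows, using a synthetic pair of series in which the second is a 3 minute delayed copy of the first; with the real inundation series this lag structure turned out not to be informative.

```python
import numpy as np

# Synthetic 10 h (600 min) series standing in for two extracted POI records:
# a single inundation pulse, with the second series delayed by 3 minutes
t = np.arange(600.0)
a = np.exp(-0.5 * ((t - 300.0) / 5.0) ** 2)
b = np.roll(a, 3)

def xcorr(x, y):
    """Normalized cross-correlation at all lags (autocorrelation when x is y)."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    return np.correlate(x, y, mode="full") / len(x)

lags = np.arange(-len(t) + 1, len(t))
r = xcorr(a, b)
best_lag = lags[np.argmax(r)]   # -3: b is a delayed by 3 samples
```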

All of these failed attempts led to a less polished approach. For each time series, the 5th and 95th percentiles were calculated and plotted on the time series plot, as shown in Figure 5.

Figure 5 – Time series of inundation at each location with 5th and 95th percentiles of inundation levels plotted as the blue and red dashed lines respectively.
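The percentile thresholds themselves are straightforward to compute; a minimal sketch, with a synthetic series standing in for one extracted record:

```python
import numpy as np

# Synthetic stand-in for one extracted 10 h inundation series (1 min steps)
rng = np.random.default_rng(0)
eta = rng.normal(0.0, 0.5, size=600)

# The dashed blue/red lines of Figure 5 are the 5th/95th percentiles
lo, hi = np.percentile(eta, [5, 95])

# Time steps whose inundation exceeds the 95th percentile line
exceed = np.flatnonzero(eta > hi)
```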

Here we define a significantly large wave as any wave at the first 3 locations (the near-shore locations) whose positive maximum exceeds the 95th percentile, i.e. the red dashed line. For the 9 on-shore locations, flood events were defined as the inundation peaks in the time series, and significant flood events as those with heights exceeding the 95th percentile. In each case, the timings of the big waves or inundation events were tabulated, and we define two events as related if they occurred within 10 minutes of one another. This 10 minute window was derived from the rough 10 minute periodicity of the waves in the near-shore time series. Additionally, we calculate the distance between each pair of points using a plane approximation, which is adequate since the locations are within 10 km of one another.
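The event-matching rule described above can be sketched as follows; the event times here are hypothetical placeholders, not values from Table 1.

```python
# Hypothetical event times (minutes) for one near-shore and one on-shore series
nearshore = [212, 247, 361, 552]
onshore = [215, 250, 430]

def match_events(near, on, window=10):
    """Pair each on-shore flood event with the closest near-shore wave
    occurring within `window` minutes, or None if no such wave exists."""
    pairs = []
    for t_on in on:
        hits = [t for t in near if abs(t - t_on) <= window]
        best = min(hits, key=lambda t: abs(t - t_on)) if hits else None
        pairs.append((t_on, best))
    return pairs

matches = match_events(nearshore, onshore)
```

With these placeholder times, the first two flood events pair with near-shore waves while the third has no near-shore wave within the 10 minute window.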

Results

Table 1 shows the timings of significant waves for near-shore locations and significant inundation events for on-shore locations. Based on our definitions, every inundation event on land corresponded to some large near-shore wave. The reverse is not true, however: not every large near-shore wave corresponded to a large flooding event. Notice that almost every big flooding event corresponded to large waves at all 3 near-shore locations (Port Hueneme Jetty, Channel Islands Harbor Jetty, and Port Hueneme). The exceptions are the red and light blue flooding events. The red event was observed at the Beachcomber Tavern, Manda’s Deli, and the Naval Construction Battalion Center, but was only considered a significant wave at the Port Hueneme locations, and the light blue event was only considered significant at the center of Port Hueneme and not at the jetty.

Table 1 – Timings (in minutes) for significant waves or inundation events. Cells highlighted with the same color indicate related events.

Using the distances between points calculated and shown in Table 2, we hypothesize that some maximum distance should be used when matching variations from near-shore to on-shore. However, the Port Hueneme Jetty was actually closer to the Beachcomber Tavern than the center of Port Hueneme was. The Channel Islands Harbor Jetty was much further away, and it may be prudent not to compare locations that are more than 1 kilometer apart.

Table 2 – Distances between near-shore locations and on-shore locations, in kilometers.

Significance

Tsunami inundation data have historically been very difficult to measure in the field due to the inherent dangers of having people or instrumentation in the inundation zone. This fact, combined with the relative infrequency of tsunami events, forces managers and disaster planners to rely on prediction data from tsunami models [3]. The problem, however, is that these models are deterministic and give no sense of the uncertainty of their predictions, which is problematic when the predictions are used for risk analyses.


The tool being developed in my other research is limited to predicting near-shore variation in tsunami inundation, not on-shore variation. Preliminary results from this study suggest that most variations in on-shore inundation can be captured simply by looking at large wave events in the near-shore, which extends the usefulness of that methodology beyond the scope of this class. However, one has to be careful when selecting which near-shore points to use to extrapolate variability to on-shore locations, and this area will require further study before uncertainty predictions are actually extrapolated.

Software Learning

Before this class began I had no experience in R or Python. Throughout this class I have had the opportunity to learn a great deal of R and some Python. Overall I think it has been a useful and rewarding experience.

Statistics Learning

I was able to pick up some statistical techniques involving autocorrelation, cross-correlation, and wavelet analysis in a general sense, despite their lack of usefulness for my overall project. The main thing I learned, however, was that working with big data is difficult and that most traditional data analysis techniques appear to fail when confronted with such a large data set. The experience of having to poke around and decompose the data into usable forms was very educational for me, but I still think I have a long way to go before being comfortable with big data analysis.

Response to Commentary

Most of the commentary I received concerned the techniques of autocorrelation, cross-correlation, and wavelet analysis. In the end, however, none of these techniques proved useful for what I was trying to determine, so unfortunately I could not apply the suggestions or comments mentioned.

There was, however, one question in the comments that asked, “Finally, I was wondering if the areas with highest inundation are also the most vulnerable?” This is a good question, because high inundation would usually indicate a dangerous situation, but vulnerability is not so simply defined. Vulnerability can be thought of as risk times exposure, so an area would only be vulnerable if there were people or structures of interest in the area of high inundation. Simply analyzing predictions of tsunami inundation cannot answer these questions or determine vulnerability. Assessing both the natural and human systems would be required to perform a vulnerability analysis, and that is beyond the scope of this study.

References

[1] NOAA. Web. Retrieved 20 March 2017.

[2] Titov, V. V., and Synolakis, C. E.: Numerical modeling of tidal wave runup, Journal of Waterway, Port, Coastal, and Ocean Engineering, 124(4), 157-171, 1999.

[3] Wegscheider, S., Post, J., Zosseder, K., Mück, M., Strunz, G., Riedlinger, T., Muhari, A., and Anwar, H. Z.: Generating tsunami risk knowledge at community level as a base for planning and implementation of risk reduction strategies, Nat. Hazards Earth Syst. Sci., 11, 249-258, doi:10.5194/nhess-11-249-2011, 2011.

April 5, 2017

Final Project

Filed under: 2017,Final Project @ 8:16 pm

This is the blog page for Spring 2017 Final Project posts
