GEOG 566

Advanced spatial statistics and GIScience

Archive for 2017

June 12, 2017

Final Project: “Spatial Relationships of Vegetation for Restored and Reference Salt Marshes (Salmon River Estuary, Oregon)”

Filed under: Final Project @ 2:32 pm

My Spatial Problem Blog Post

Contents:

  1. A description of the research question that you are exploring.

Salmon River is one of the smallest estuaries on the northern Oregon coast (800 ha), with the largest proportion of tidal marsh (400 ha) of any Oregon estuary. It borders Tillamook and Lincoln counties and is designated as an Important Bird Area by the National Audubon Society. Conservation research at Salmon River Estuary has been a focus of government, non-profit, and educational institutions since the 1970s due to concern over salmonid habitat and the impacts of sea level rise on the coast. Salmon River consists of public and protected wetlands that have been restored and protected since the U.S. Forest Service removed dikes from three sites in 1978, 1987, and 1996. Tidal flow to the ocean is currently unobstructed on sites that were previously diked and/or used as pastureland. One wetland on the estuary was never diked and is used as a reference marsh for field research, to determine the functional equivalency of restored marshes. Salmon River Estuary has been a site for place-based restoration and studies of community recovery over the last 40 years (Flitcroft et al. 2016). Vegetation has been monitored at this site since dike removal; however, survey records from past researchers are still being deciphered. If sufficient plot data can be recovered and confirmed, vegetation patterns over the last 40 years will be investigated further.

Factors influencing the richness and environmental integrity of Salmon River are associated with the physiognomic and taxonomic features of the plant community. This study focuses on the spatio-temporal distribution patterns of Salmon River vegetation to explore how remnant and restored marshes differ in terms of biodiversity and species composition. I expect that different durations of tidal exclusion through dike establishment will reveal differences between sites in the context of plant species composition and distribution.

  2. A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

I have collected species data (ocular estimates of percent cover) from 1 m2 plots on transects in four tidal marshes: Mitchell Marsh (dike removed 1978), Y Marsh (dike removed 1987), Salmon Creek Marsh (dike removed 1996), and one remnant marsh adjacent to Y Marsh as a control (never diked). I also collected soil samples at each sampled transect plot and tested them for salinity, conductivity, bulk density, and nitrogen content. Transect plots were square, spaced 50 m apart at increasing distance from the tide. My objective for data analysis was to describe the spatial patterns of the vegetation communities in tidal marshes of Salmon River Estuary after dike removal. I surveyed a total of 74 one-square-meter plots on transects.

Stohlgren plots, also known as modified Whittaker plots (MW; 1,000 square meters), were established at each marsh site to collect data on species abundance for comparison with transect data. MW plots were implemented to test for patterns of diversity at multiple scales beyond what the square meter transect plots may capture within the same site. The restored and remnant sites have three MW plots each, for a total of 12 MW plots. The MW plots were placed at a random 50 m interval along the sampled transects at each marsh and offset 20 m from them, for a stratified random sample design. Each MW plot is a minimum of 50 meters from the others, depending on where they were placed along the transect. At each MW plot, percent cover and presence/absence of species were estimated (with the aid of 1 meter square grids). Samples of all species identified were collected and pressed, and are being examined to confirm identification. Elevation data were obtained from LiDAR surveys in 2015 and used to pinpoint elevation for all plots.

The data sets for my project include spreadsheets that describe percent vegetation cover, elevation, and soil characteristics per transect plot. MW plots were not sampled for soil and thus only have percent vegetation cover and elevation. The spatial grain of my study is one square meter, the size of my smallest sampling frame. With the nested sample plots and my stratified random sampling technique, I have multiple spatial extents for this project: the extent could be considered the length of a transect (which varies by tidal marsh), the area of a MW plot (1,000 m2), or arguably the entire estuary. There are some interesting temporal aspects to my study as well; three of the four tidal marshes experienced dike removal in different years. These marshes have been surveyed for vegetation cover immediately after dike removal and every 5 years subsequently. Incorporating these historical data will add dimension to my spatial and temporal analysis of variation at my study site.

3. Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

Are tidal marshes that were restored earlier more similar to the Reference Marsh, in terms of environmental conditions (soil, elevation) and species composition, than those restored more recently?

I predict that restored marshes will be significantly different from the Reference Marsh. I also predict that time since dike removal will not be a strong indicator of similarity between restored marshes and the reference marsh. I anticipate that sites with more recent dike removal will have soils with higher salinity and conductivity compared to remnant marsh plots.

Does the species richness captured differ between restored and reference salt marshes?

I predict that the Reference Marsh has higher richness than the restored marshes. I expect that plots from the reference marsh (Transect C, adjacent to Y Marsh) will be more diverse and heterogeneous than the tidal marshes that were diked. I predict that sites that experienced dike removal, whether more or less recent, will have different species composition and be less diverse compared to the reference site. In short, I hypothesize that the reference marsh will be the most diverse, with the highest richness and spatial heterogeneity of species throughout, compared to the low marshes that were previously diked.

Do species-area relationships differ between restored and reference salt marshes?

I predict that the Reference Marsh will have a greater number of species over a given area compared to the restored marshes. I also predict that there will be spatial correlation of plant species at a larger scale, with fine-scale patchiness within my site, suggesting that there may be ‘nesting’ of species or smaller pockets of diversity within the marsh, with similarities in species assemblages occurring at a larger scale.

Do field methods, specifically nested-rectangular (Modified Whittaker) plots and non-nested-square (Transect) plots capture species richness differently?

I predict that Modified Whittaker plots will capture more species than Transect plots, since MW plots will be able to address species richness at greater scales.

4. Approaches: describe the kinds of analyses you completed using your data.

I ran Mantel tests and indicator species analysis (ISA), and produced species-area curves and a map of my site to compare and contrast differences in species assemblages by site. I used PC-Ord, Excel, and ArcMap to complete these analyses.

5. Results: What did you produce/find?

I conducted a Mantel test on all of my data to determine the scale at which my plot data were spatially autocorrelated (self-related). I found that none of my plots were spatially autocorrelated at the square meter scale (Mantel's r, transect: r = 0.037797, p = 0.182182; MW plot: r = 0.027994, p = 0.164164; fail to reject the null hypothesis of no relationship). This is a little surprising, but may reflect noise within my dataset and variation of species at this scale. It is possible that autocorrelation is not detectable at this scale, or perhaps I need a larger dataset with proportionally less noise to sort out the autocorrelation signal. I was, however, able to detect spatial autocorrelation at the 1,000 square meter scale for the Modified Whittaker plots (r = 0.638224, p = 0.00010), suggesting that there may be fine-scale patchiness, variation, or nestedness among plant species at each of the SRE tidal marshes. Salt may also be a confounding factor that drives the spatial diversity of vegetation, in addition to dike removal: salt is a limiting factor for some salt marsh plants, and not all species are equally tolerant of it.
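
For readers without PC-Ord, a Mantel test like the ones above can also be run in R with the vegan package. The sketch below is a minimal illustration under assumed inputs (the file names and matrix layout are hypothetical), not my actual PC-Ord workflow.

# Mantel test for spatial autocorrelation of species composition (R, vegan).
# Hypothetical inputs: 'veg' = plots x species percent-cover matrix,
# 'xy' = plot coordinates (easting, northing).
library(vegan)

veg <- read.csv("transect_cover.csv", row.names = 1)   # hypothetical file
xy  <- read.csv("transect_coords.csv", row.names = 1)  # hypothetical file

veg.dist   <- vegdist(veg, method = "bray")   # Bray-Curtis compositional distance
space.dist <- dist(xy)                        # Euclidean geographic distance

# r near 0 with a large p-value means no detectable spatial autocorrelation
mantel(veg.dist, space.dist, method = "pearson", permutations = 999)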

For the ISA (Indicator Species Analysis) test I completed to determine which species were associated with which tidal marsh environments, I found that (1) Triglochin maritima, Sarcocornia perennis, and Distichlis spicata were significant indicators of salty, restored conditions, (2) Deschampsia caespitosa, Juncus arcticus var. littoralis, Potentilla pacifica, Glaux maritima, and Hordeum brachyantherum were significant indicators of upland, high elevation conditions, and (3) Carex lyngbyei was a significant indicator of restored conditions. I broke my dataset up and conducted a Mantel test for each of these groups, using only plots that recorded the presence of at least one species in the group. I did not find any significant autocorrelation with any of the strong indicator species (all of which were found in a high proportion of plots surveyed). I am curious whether, if my plots had been located closer to each other and/or I had surveyed more plots over a larger area, spatial autocorrelation patterns would begin to emerge.
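
An equivalent ISA can be run in R with the indicspecies package; the sketch below assumes the same hypothetical 'veg' matrix as above plus a grouping vector, and is meant only to illustrate the method.

# Indicator species analysis in R. 'marsh' is a hypothetical factor assigning
# each plot to its tidal marsh (Mitchell, Y, Salmon Creek, Reference).
library(indicspecies)

isa <- multipatt(veg, marsh, func = "IndVal.g")  # 999 permutations by default
summary(isa)  # lists species significantly associated with each group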

An example of PC-Ord output for a Mantel test. The r statistic measures the correlation between species cover distance and geographic distance; the p-value gives the significance of the correlation relative to chance (computed by Monte Carlo randomizations).

I also produced a map in ArcMap to visualize the patterns of diversity I saw in my data. I coded each of my plot locations (for transects and Modified Whittaker plots) by the dominant plant strategy type: Carex lyngbyei dominant (a plot with at least 50% cover of Carex lyngbyei), salt dominant (at least 50% cover of one or more salt-tolerant species), or mid-high marsh dominant (at least 50% cover of a mid-high marsh species that is not Carex lyngbyei). I think the map helps to visualize spatial patterns of diversity at a broader scale by grouping plots into different assemblages or types/guilds based on their content.

Map of my site, indicating species assemblages by plot location. The Reference Marsh is dominated by high elevation species, while the previously diked marshes are all low marsh environments dominated by C. lyngbyei and salt-tolerant species.

I created species-area curves to examine how species richness increased with an incremental increase in area sampled (from 1 to 10, 100, and 1,000 square meters). A species-area curve represents the exponential relationship between species richness and the scale of area sampled; as the area increases, you are more likely to encounter new species. A steeper slope indicates higher richness (number of species) and a shallower slope indicates lower richness. One of the things that can distinguish a species-area curve from a species accumulation curve is the relative ‘nestedness’ of the environment being sampled. As I mentioned earlier, nestedness is a measure of the structure and distribution of species across a location; an example would be a location with few species overall, but with subsets of locations holding more species, or pockets of diversity and heterogeneity. When nestedness is high, the slope of the species-area curve is reduced relative to the species accumulation curve; the opposite occurs when nestedness is low. In this case, the slopes of the species-area curves are lower than the slopes of the species accumulation curves for all sites. This suggests that, in terms of the spatial patterns of diversity at Salmon River, there may be nesting or patchiness within tidal marshes, with fewer species overall, especially at restored sites, and pockets of diversity. This inference is consistent with what I observed in the field.

As you can see from this species-area graph, the reference marsh has the highest richness of all marshes sampled, though Y Marsh is a close second as plot size increases. Mitchell Marsh and Salmon Creek Marsh are both comparably low, suggesting that as the area sampled increases, species richness does not increase by much. This is a slight deviation from the species accumulation curves, which suggested Mitchell Marsh may be more diverse than Y Marsh at the 1 meter scale on transect plots. While diversity and richness vary within restored marshes by scale, the Reference Marsh has consistently higher richness, and Salmon Creek has consistently low richness.

This graph shows the species accumulation trend: the average number of species encountered per square meter plot (for transects) over the cumulative number of plots sampled. These data are the result of a PC-Ord routine that accounts for every possible combination of x plots sampled. You can see from this graph that the square meter plots on the reference marsh accumulate a significantly higher number of species compared to all of the restored marshes. Salmon Creek has the lowest, and Mitchell and Y Marsh are intermediate.
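
The same combination-averaged accumulation curve can be reproduced in R: vegan's specaccum() with method = "exact" computes the mean richness over all possible orderings of plots, analogous to the PC-Ord routine described above. A minimal sketch, again assuming the hypothetical 'veg' matrix:

# Species accumulation curve averaged over all plot orderings (vegan).
library(vegan)

acc <- specaccum(veg, method = "exact")
plot(acc, xlab = "Number of 1-m2 plots", ylab = "Cumulative species richness")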

I also compared species accumulation (effort) curves for the square meter plots within the MW plots I sampled and found that species accumulation patterns were broadly consistent. The only difference here is that Mitchell Marsh was found to have higher species accumulation over area than Y Marsh, which in this case was comparable to Salmon Creek's low diversity. The difference in Y Marsh diversity could be due to the placement of MW plots or patch variation; both Salmon Creek and Y Marsh have large patches of C. lyngbyei. Ultimately this shows that there is variability in species richness within restored marshes, but consistently high richness on the reference marsh.

I conducted a multivariate analysis of my field data using non-metric multidimensional scaling (NMS), which is ideal for ecological data that are non-normally distributed. NMS avoids linear assumptions and uses ranked distances to linearize relationships within the dataset, allowing the user to see a wider variety of structures within the ordination and make a number of insightful observations and conclusions. If any one graphic could summarize my entire thesis, this would probably be it, and I will do my best to highlight the most salient features here. First I would like to point out that there is very little overlap among the tidal marshes, which are represented by the colored convex hulls, and that the reference marsh, on the left in red, is the most divergent from any other marsh.
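
For reference, an NMS ordination of this kind can also be fit in R with vegan; the sketch below assumes the hypothetical 'veg', 'env' (elevation and soil variables), and 'marsh' objects from earlier, and illustrates the general approach rather than my PC-Ord settings.

# NMS/NMDS ordination with environmental overlay and per-marsh convex hulls.
library(vegan)

ord <- metaMDS(veg, distance = "bray", k = 2, trymax = 100)
fit <- envfit(ord, env, permutations = 999)   # elevation/salinity vectors

plot(ord, display = "sites")
plot(fit, p.max = 0.05)                       # draw only significant vectors
ordihull(ord, groups = marsh, col = 1:4)      # convex hull for each marsh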

The reference marsh is also closely associated with a number of high marsh species that are indicative of native, or ‘reference,’ communities. The Reference Marsh sits high on the elevation axis, demonstrating that it has high elevation throughout. Y Marsh, the blue convex hull at the bottom, is the longest and widest, which means that elevation varies throughout the site; this is why you see sample units on the lower end and toward the middle of the elevation axis. The species directly associated with Y Marsh are halophytic, or salt tolerant, suggesting that these plants are found at low elevations in areas with high soil salinity and conductivity, which describes the conditions of Y Marsh. Mitchell Marsh and Salmon Creek Marsh overlap, likely because they both have lower soil salinity and conductivity values from freshwater influence and mostly mid to low elevation ranges, with the exception of a few high elevation outliers. Both of these sites seem to be associated with introduced species (reed canary grass, PHAR) or pasture grasses like Agrostis stolonifera.

Predominantly, there were many instances of homogeneous patches of Carex lyngbyei at both of these sites, and at Y Marsh as well. In fact, the only point at which all three restored sites converge is at the end of the ‘restoration’ axis, over Carex lyngbyei. Carex lyngbyei is also divergent from all other species sampled, because it often occupies monotypic swaths of marsh. Based on my coding schematic, areas like the reference marsh were coded with a low number, and restored marshes were given codes increasing with the chronological order of dike removal; Carex lyngbyei sits at the end of the axis associated with the highest restoration values, as it represents a strong pattern within all marshes that have experienced dike removal at any point in time, and is thus indicative of restored ecosystems. Unfortunately, no significant differences were found between sites for any of the other soil characteristics we sampled, but this may be more related to sampling technique and would be interesting to revisit. We only collected soil from a surface depth of 10 cm, so perhaps if our samples had been collected deeper, we would see stronger patterns related to pH, bulk density, and C:N ratios. In summary, the reference tidal marsh vegetation is richer, more diverse, and more complex (heterogeneous) in the number and variety of species (high marsh/low marsh) than restored salt marsh vegetation at Salmon River Estuary, across field methods. Roughly 40 years after dike removal, Carex lyngbyei has persistently dominated restored areas, which marks a significant departure from the patterns of species assemblage on the reference marsh.

NMS ordination for Transect plots, examining species distribution over tidal marshes, elevation and soil salinity.

I also conducted an NMS analysis with my Modified Whittaker plot data, and the patterns of species associations with environmental characteristics observed with the transect method were consistent here as well, although we did not collect soil samples for MW plots. There is one small exception: the Mitchell Marsh convex hull overlaps with the Y Marsh convex hull, which has more to do with the coincidence of similar species found within MW plots on those sites; Y Marsh and Mitchell Marsh both had a large presence of C. lyngbyei. Mitchell Marsh also had instances of salt-tolerant species within the MW plots, suggesting that there is variation within Mitchell not only at different scales but at different extents of sampling. This also reinforces the notion that Mitchell Marsh has saltier soils than Salmon Creek Marsh, despite both having freshwater influences. And despite differences in shape, all of the restored marshes' convex hulls converge over Carex lyngbyei, the species most strongly correlated with disturbed and restored conditions.

MW NMS that shows consistent species patterns over elevation and tidal marshes. Soil samples were not collected for MW plots.

6. Significance: How is your spatial problem important to science? to resource managers?

Over 40 years later, the tidal marshes of Salmon River Estuary are still very different, and it's possible they were different to begin with, based on their unique geographies, which influence salt inundation and soil patterns. Salmon River Estuary salt marshes also appear to have responded to and developed from disturbance differently; each is still following a different restoration pathway 40 years later. Soil salinity, elevation, and inundation patterns (channels) vary by geography among these salt marsh sites in the SRE, and have likely played a role in determining species composition by site. Extensive stands dominated by dense cover of C. lyngbyei represent an alternate stable state for vegetation of SRE salt marshes, and would be an important component of understanding novel community functions as they relate to restoration and future scenarios. Species assemblages vary both by biogeography (soil, elevation, location) and land use history (pasture use, diking, dike removal).

The spatial problem I have chosen to investigate is of importance to scientists, as it provides further insight into how Pacific Northwest coastal estuaries recover from land use and disturbance, a phenomenon that has not been thoroughly studied. This work is valuable to land managers and conservationists tasked with coastal wetland mitigation in the PNW, as this case study serves as one of the few examples of long term estuary research on the Oregon coast. Estuaries have historically served as habitat and resources for keystone salmonid fish species, invertebrates, migratory birds, waterfowl, and mammals (such as beavers), particularly in the Pacific Northwest (Klemas 2013). Restoring these habitats is critical for protecting wildlife, managing wetland resources and eco-services, and maintaining our shorelines, especially as we face sea level rise from impending climate change. Threats to environmental stability can also harm the cultural value of wetlands, which have inherent beauty and are among the many natural systems associated with outdoor recreation. If wetlands are disturbed via pollution, compaction, or compositional change in vegetation, little is understood about the certainty of recovery relative to reference conditions.

Salmon River Estuary is especially noteworthy due to its unique management history. It is exceptional compared to other Oregon coastal wetlands, as it was federally protected before any industries could establish influence. Arguably, Salmon River has avoided most disturbance from development because of its relatively small size; there have been no instances of dredging or jetty construction for the purposes of navigation. Previous use as pastureland, with dikes established in the 1960s and removed from 1978 onwards, encapsulates the majority of known human influence on the marsh. Beginning in 1978, periodic vegetation surveys on site, together with long term ecological research at Salmon River, have created intimate knowledge of the estuary and promoted ecological sustainability.

There has been a dramatic shift in wetland protection and in how our government and the public view wetlands. Over the last few decades, policies promoting wetland conversion and development have been exchanged for protection and regulation initiatives. Wetland management goals today are largely focused on restoration to compensate for loss and damage, which has forged new industries tasked with wetland recovery and monitoring. However, some mitigation project datasets and sites are too small to collect useful data or make a meaningful impact on a large environmental scale. It is necessary to amass a variety of high quality data on larger wetland areas over longer periods of time to address how natural recovery processes may be employed for wetland conservation. Salmon River is an excellent long term case study for examining the prospect of rehabilitation toward ecosystem functionality and reference conditions (Frenkel and Moran 1991; Frenkel 1995; Flitcroft et al. 2016).

To recap, there seems to be a false association between restoration and reference conditions in the case of Salmon River. Though Salmon River Estuary is an example of successful restoration, it does not mirror pre-disturbance ecosystem structure. It is challenging to manage for pristine environments: pre-disturbance conditions are often not well known (or pristine, for that matter), the impacts of disturbance may persist over long periods of time (like C. lyngbyei), and both intact and disturbed wetlands are changing constantly, so it is impossible to protect them from undue influence. However, we can continue to define and promote functionality in ecosystems, as function may change with structure. As a result of the work I have done, I would recommend adapting our expectations for Salmon River and for the passive, deliberate restoration of estuaries. I would also recommend continuing to restore for function and monitor structural changes so we can understand and infer novel function in ecosystem context (Gray et al. 2002).

7. Your learning: what did you learn about software (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) other?

I have previous experience with Arc that I have developed and expanded upon at Oregon State while pursuing my MS in Geography. I was able to do some work in Arc with mapping my data, but I utilized knowledge that I had acquired previously. Ultimately, I have analyzed most if not all of my data thus far in PC-Ord, under the guidance of Dr. Bruce McCune (the software creator), which enables a variety of ordination and multivariate analysis options. I learned how to conduct Mantel tests, ISA, and other multivariate comparisons within PC-Ord.  I did not learn much additionally about Python or R, except what other students were able to complete from their tutorials.

8. What did you learn about statistics, including (a) hotspot, (b) spatial autocorrelation (including correlogram, wavelet, Fourier transform/spectral analysis), (c) regression (OLS, GWR, regression trees, boosted regression trees), and (d) multivariate methods (e.g., PCA)?

I learned that statistics programs like PC-Ord can be preferable for datasets that have a finer spatial resolution; my data were difficult to use in Arc because their format was not easy to interpolate. As a result, I learned a lot about Principal Components Analysis techniques to tease out patterns in my data and look at how species assemblage patterns vary by environmental conditions and site treatments (dike removal). However, I found it helpful to visualize my data in a map, even though I was limited to the point locations of my plots, and to categorize them based on spatial patterns from my plot data (percent cover of species). I also learned how to conduct a Mantel test on my data at a variety of spatial scales to look for autocorrelation.

From other students' tutorials, I learned about geographically weighted regression (GWR), and how one may examine clustering of particular environmental conditions with the GWR tool in Arc. GWR can show that certain environmental characteristics (like canopy cover, understory vegetation cover, and elevation) are positively or negatively correlated in different locations. I also learned about hotspot analysis from student tutorials and found that it can be used to infer spatial relationships between environmental variables and location. Hotspot analysis can be useful for looking at density of populations or biodiversity.

9. How did you respond to comments from your peers and mentors/instructors?

I received useful comments from my peers and the instructor (Dr. Julia Jones) about considering how salt inundation on the tidal marshes I study may have a causal relationship to differences in species assemblages, in addition to restoration treatment (diking and dike removal). It is important to acknowledge confounding factors within my data, as my study is inductive. Dr. Jones also suggested that I investigate spatial autocorrelation with Mantel tests, to see how variable species assemblages are within my data. I was able to incorporate feedback that helped me with the formation of my analysis and interpretation of my results.

Literature Cited

Adamus, P.R., Larsen, J., and Scranton, R. 2005. Wetland Profiles of Oregon's Coastal Watersheds and Estuaries. Part 3 of a Hydrogeomorphic Guidebook. Report to Coos Watershed Association, US Environmental Protection Agency, and Oregon Department of State Lands, Salem.

Borde, A.M., Thome, R.M., Rumrill, S., and Miller, L.M. 2003. Geospatial habitat change analysis in Pacific Northwest coastal estuaries. Estuaries 26: 1104-1116.

Flitcroft, R.L., Bottom, D.L., Haberman, K.L., Bierley, K.F., Jones, K.K., Simenstad, C.A., Gray, A., Ellingson, K.S., Baumgartner, E., Cornwell, T.J., and Campbell, L.A. 2016. Expect the unexpected: place-based protections can lead to unforeseen benefits. Aquatic Conservation: Marine and Freshwater Ecosystems 26: 39-59.

Mather, P.M. 1976. Computational Methods of Multivariate Analysis in Physical Geography. J. Wiley & Sons, London. 532 pp.

McCune, B. and Grace, J.B. 2002. Analysis of Ecological Communities. MjM Software, Gleneden Beach, Oregon, USA (www.pcord.com).

National Soil Survey Center, Natural Resources Conservation Service, U.S. Department of Agriculture. 2009. Soil Survey Field and Laboratory Methods Manual. Soil Survey Investigations Report No. 51.

Stohlgren, T.J., Falkner, M.B., and Schell, L.D. 1995. A Modified-Whittaker nested vegetation sampling method. Vegetatio 117: 113-121.

Weilhoefer, C., Nelson, W.G., Clinton, P., and Beugli, D.M. 2013. Environmental determinants of emergent macrophyte vegetation in Pacific Northwest estuarine tidal wetlands. Estuaries and Coasts 36: 377-389.

Nitrate Concentrations in Space and Time and How They Relate to Red Alder Cover in Different Watersheds

Filed under: 2017,Final Project @ 10:38 am

My Spatial Problem Blog Post (Revised)

By: Nicole Feiten

Research Description

I am exploring aquatic chemistry among 10 streams in the Oregon Coast Range. I will be running anion analysis over a period of one year, sampling monthly. These streams are divided into three categories: three are located east of Mary's Peak in the Rock Creek watershed, three are near the top of Mary's Peak (highest in elevation), and four are sampled on the Oregon coast within 200 meters of the Pacific Ocean. I would like to perform a comparison analysis of anions, especially nitrate, between the site locations and find whether there is any significant difference in anion abundance between the site locations and over time (6 months of data).

Research Questions

Is there a relationship between nitrate concentration data and site locations?

Do the differences in nitrate concentrations indicate a pattern between site locations, and do they show a pattern over time?

Is there a relationship between nitrate data and red alder tree cover?

Is there a relationship between nitrate concentrations and land use practices?

Dataset Description

The dataset I will be analyzing includes temporal anion data (for 6 months, project still in progress) and GPS points for all 10 site locations. Additionally, I used NLCD land use data sets and Red Alder Cover data sets from the Oregon Spatial Database Library. My spatial problem focused on nitrate concentration data collected at the 10 sites and the relationship between sites spatially and temporally.

The temporal scale for this analysis includes one sample per month for a six-month period.

Spatially, the sites are located some distance apart, listed in the table below in kilometers. The distances below reflect Euclidean distance; stream distances were not used, as not all sites are connected by the same stream network.

 

 

Distances   Griffith  Middle F.  Kaiser  Yew    Crooked  Cape   Gwynn  L. Cumm  Cummins
Rock        1.6       1.91       6.44    8.76   11.62    58.07  58.78  59.04    58.7
Griffith              1.12       5.23    7.63   10.65    56.62  57.41  57.67    57.27
Middle F.                        6.17    8.61   11.7     57.23  58.09  58.23    57.9
Kaiser                                   2.47   5.61     51.85  52.54  52.79    52.36
Yew                                             3.2      49.89  50.47  50.71    50.24
Crooked                                                  47.65  48.32  48.35    48.09
Cape                                                            1.29   1.66     1.47
Gwynn                                                                  0.399    0.422
L. Cumm                                                                         0.338

Table 1. Distances between sites, in kilometers

 

Hypothesis

I hypothesize that nitrate concentrations will be similar in sites that are closer together and more different the further apart the sites are located; this is a classic scenario of spatial autocorrelation. I also suspect that stream nitrate concentrations will be higher in lower elevation streams than in higher elevation streams. Looking further, based on my red alder percent cover analysis, I suspect sites with a higher percent cover of alder will have higher concentrations of nitrate in the streams. Lastly, I will analyze land use in each of the watersheds; I suspect that watersheds with higher amounts of agriculture and other anthropogenic activity will have higher concentrations of nitrate.

Analytical Approaches

Initially I performed the analysis in Microsoft Excel; I am interested in seeing whether sites that are closer together show similar concentrations of nitrate. First I calculated Euclidean distances in ArcGIS using the measure tool. After I obtained those distances, I used the CORREL function in Excel to correlate known nitrate concentrations at each site (an average over the 6 months) with the distances between sites.
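
The same comparison can be sketched in R; the snippet below is a rough, hypothetical analogue of this Excel workflow (the file name and column names are assumptions), comparing pairwise site distance against pairwise differences in mean nitrate.

# 'sites' is assumed to hold projected coordinates (x, y, in meters) and a
# 6-month mean nitrate concentration per site.
sites <- read.csv("site_nitrate.csv")        # hypothetical file

d.km  <- dist(sites[, c("x", "y")]) / 1000   # Euclidean distances, km
d.no3 <- dist(sites$mean_no3)                # pairwise nitrate differences

# A positive correlation would suggest nearby sites have similar concentrations
cor(as.vector(d.km), as.vector(d.no3))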

 

Table 2 Correlation of nitrate concentrations and site distances

Figure 1 Correlation Analysis showing clustering of sites that are closer together

 

Next, I performed another analysis in Excel, sorting the concentration data between streams in the different locations. This allowed me to compare the nitrate concentrations between points (stream sites) and analyze their relationships over time.

Figure 4 Concentration of Nitrate over time in the Coastal stream sites

Finally, I used the Oregon Spatial Database to clip layers relating to percent alder cover in the watersheds and land use practices, to identify relationships between nitrate concentrations, alder cover, and land use.

 

Table 3. Nitrate concentrations and percent alder cover

Watershed              Area (m2)    Alder (m2)  Percent Alder Cover  Average Nitrate (ppm)
Cape Creek             4,353,700    3,537.74    0.081%               1.88
Crooked Creek          31,760,400   3,226.39    0.010%               1.35
Cummins Creek          21,388,800   222.89      0.001%               1.44
Gwynn Creek            2,865,000    5,371.57    0.187%               1.61
Little Cummins Creek   3,056,600    361.35      0.012%               1.72
Yew Creek              9,786,600    969.92      0.010%               1.01
Kaiser Creek           100,500      1,017.97    1.013%               0.23
Rock Creek             38,375,500   2,359.26    0.006%               0.19
Middle Fork Creek      3,294,000    339.23      0.010%               0.17
Griffith Creek         4,434,800    2,021.86    0.046%               0.13

Figure 5 Relationship between percent alder cover and nitrate concentration

 

Results


In addition to the figures above, I produced distance and correlation tables, as well as scatter plots, to analyze for relationships. Below are examples of maps I produced in ArcGIS by clipping land use cover and watershed boundaries for each stream site. Alder cover values were produced by joining the attribute tables for the alder layer and the watershed area shapefiles. Even though the correlation between alder cover and nitrate concentrations was not strong, further research may show different results if only the stream riparian area is used.

I found greater differences (indicating higher nitrate concentrations) in the mainstem rivers than in the tributaries. Additionally, I saw higher concentrations of nitrate in the spring samples compared to the winter samples. I expected higher values for nitrate in the mainstem, such as Rock Creek, due to the volume of water coming from upstream; naturally, the smaller tributaries would exhibit lower concentrations than the higher-order stream that receives them. I had also expected nitrate differences and concentrations to decrease in the springtime: if the inputs of nitrogen to the system remain relatively stable, then the uptake length of nitrate (the most sought-after nutrient in stream nutrient cycling) would shorten and cause concentrations to decrease.

Overall, nitrate concentrations over the past year in the ten watersheds are relatively low (~0.1-1 ppm), and the results obtained were not particularly surprising. There was no clear pattern linking red alder cover and nitrate concentrations in these watersheds (Table 3). Additionally, there were no surprising land use features in the watersheds, and thus no real link to nitrate concentrations in this analysis. Most of the land in these watersheds is occupied by mixed and evergreen forests, neither of which is particularly linked to elevated nitrate concentrations. This analysis was consistent with my expectations: starting with low nitrate concentrations, and suspecting most of the land use would be forest cover, I predicted there would likely not be a strong link between land use and nitrate concentration. However, I expected slightly more red alder cover in the coastal watersheds and was surprised at how little that cover related to the concentration of nitrate.

 

Figure 6 Coastal watersheds land use

Figure 7 Coast Range watersheds land use

 

Scientific Significance

Baseline data collection for anions in Oregon Coast Range streams over a full year has not been documented before. In the face of a changing climate, it is important to document and characterize stream chemistry for forest managers in the future. Increases in nitrate in these streams can affect local flora and fauna, as well as increase nutrient loading to the ocean. This loading can affect macroinvertebrate and salmon habitat, which can alter the ecosystem dynamics and services that currently exist.

 

What I learned about Software and Statistics

I have basic knowledge of R, but I think it would be the most useful tool for statistical analysis of the data. ArcInfo is a program I at one time used daily, but it has been many years since. I would like to be able to map the anion data between the 10 sites but may struggle at first linking the data with Arc. I have no experience with Python or other relevant spatial analysis software.

 

This class taught me more about spatial autocorrelation, especially because my data were grouped in this way. Since I performed the majority of my analysis in Excel, I didn’t expand my knowledge in ARC GIS as much as I had hoped. However, I did learn that if you don’t need sophisticated software to produce your analysis, it is OK to use what you already know. As I move forward in my research I am looking forward to studying multivariate methods and using the software PC-Ord.

 

 

Comment Feedback

Tutorial 1 Comments

The most useful comments on Tutorial 1 were about having a more structured, step-by-step process in my design. I agree that if this tutorial were to help someone in the future complete the same or a similar analysis, more detailed instructions would be useful. I updated my tutorial to include step-by-step instructions in the methodology of my analysis.

 

Tutorial 2 Comments

In tutorial two I was able to clearly relay the findings and importance of my study. As with most feedback, I was pleased to hear about additional analyses regarding clusters of data points that may point me in a direction to understand more about alder cover and nitrate fixation in streams. Perhaps a buffer layer around the sites, with the cover clipped to just that buffer, would give me a better picture of the linkages between red alder and nitrate concentrations. In the future, I would like to continue to use the alder layers, with a buffer added, and potentially perform analyses that involve concentrations of other anions as well.

 

Comments from Dr. Jones

 

The comments I received from Dr. Jones were overall positive and insightful. On exercise one, I received feedback on the potential for nitrate to be transported via dispersal or diffusion, which I was unable to justify given the data set I was working with. Also, I found a negative correlation between Rock Creek and its neighboring creeks. I concluded that Rock Creek, with a higher volume of water and larger watershed area, would likely have produced a different concentration of nitrate compared with the other two.

 

Exercise two comments included thoughts about the equation I chose, which was difficult for me to understand, as I have not used equations to describe data in the past. The distances between the sites were measured using Euclidean distance in ArcGIS. The comments also addressed my explanation of higher concentrations in Rock Creek as due to a higher concentration of water coming from upstream; I now realize this was incorrect, and what I actually meant is that the higher concentrations are contributed to by the larger volume of water in Rock Creek and, additionally, its larger watershed.

 

 

The third exercise brought a question about how alder cover may not be associated with nitrate concentrations. After some thought, I realized that maybe it is not the percentage of alder cover in the entire watershed that affects stream nitrate concentrations, but just the trees in the riparian buffer zone. In future work, I would like to clip buffer zones around the streams, analyze both alder cover and land use cover in just this area, and run a correlation with stream nitrate concentrations.

 

 

 

 

June 11, 2017

My Spatial Problem: Revised

Filed under: Final Project @ 11:48 pm

I am interested in understanding how mammals have responded to changes in their habitat through time and how the distribution of resources on the landscape structures small mammal communities, historically and today. I predominantly utilize stable isotope analysis of carbon and nitrogen to evaluate niche dynamics of rodents along elevation gradients and through time, approximately the last 8,000 years, in the Great Basin of the intermountain west of North America. In this regard, I want to know: how do rodents in the Great Basin respond to changes in the resource base driven by human land use and climate change?

Original spatial problem.

When I outlined my original spatial problem, I was interested in replicating a geospatial analysis performed by Elsen and Tingley in their study, “Global mountain topography and the fate of montane species under climate change,” published in Nature Climate Change in 2015. In their analysis, Elsen and Tingley calculated the proportional distribution of horizontal surface area along the slope of mountain ranges, with the objective of evaluating the assumption that available area (habitat) decreases with increasing elevation. This is an important assumption to test, as it has direct implications for the way ecologists think about biodiversity in mountains. Specifically, it has been assumed that species richness generally decreases with increasing elevation due to decreasing area, as would be predicted by the species-area relationship. I wanted to perform this analysis on three mountain ranges in the Great Basin that are of particular interest for my dissertation research. However, the data sets that Elsen and Tingley used were far too large for me to handle on my own, and my knowledge of the requisite skills far too limited to perform the analysis. Consequently, my spatial project began to morph, and it followed a circuitous trajectory for the remainder of the term.

Exercise 1: Hot spot analysis of the North American deer mouse in the Toiyabe mountains.

Upon choosing to ditch the area-by-elevation analysis, I thought I would dig into a dataset that I had explored previously with other methods. This dataset contained two matrices of data collected in the Toiyabe mountain range, NV, in both 1930 and 2011. The first matrix was a species by site matrix which included the trap-effort standardized abundance of ~20 small mammal species, predominantly rodents; the second matrix contained site characteristic data including elevation, latitude, longitude, mean summer high temperature, mean winter low temperature, and mean monthly precipitation. In the past I had explored these data using principal components analysis and non-metric multidimensional scaling. I wanted to know which factors explained changes in the elevational distribution of species between these two time periods. I found that climate factors described most of the variation. Notably, all climate factors were collinear with elevation, either increasing or decreasing in proportion with elevation.

Given the spatially explicit nature of this data set, I thought it would be interesting to perform hotspot analysis on the most abundant species in the data set. In performing hotspot analysis on the distribution of this species, I hoped to gain insight into the influence it might have on the spatial structuring of the rodent community in the Toiyabe mountain range at large. However, what this analysis revealed was that transect data are inadequate for evaluating this type of question. Spatial analysis of this nature requires a distinctly different sampling design; specifically, a grid across some larger portion of the mountain range would have been ideal for the question I was hoping to answer. Instead, I discovered that hotspot analysis on transect data reveals more about the behavior of those doing the trapping than it does about the mammals being trapped.
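
For anyone attempting this with more suitable (e.g., gridded) data, a local G hotspot statistic can be computed in R with the spdep package. The sketch below is purely illustrative; 'pts' and the 2 km neighbor distance are assumptions, not my actual analysis (which was done in ArcMap).

# Local G (Getis-Ord) hotspot statistic for site abundances (spdep).
# Hypothetical input: 'pts' = data frame with site x, y and focal-species abundance.
library(spdep)

coords <- as.matrix(pts[, c("x", "y")])
nb     <- dnearneigh(coords, d1 = 0, d2 = 2000)     # neighbors within 2 km
lw     <- nb2listw(nb, style = "B", zero.policy = TRUE)

gi <- localG(pts$abundance, lw, zero.policy = TRUE)  # z-scores; |z| > 1.96 suggests hot/cold spots
head(gi)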

Exercise 2: Energy flow in the Great Basin rodent community and a test of space for time substitutions.

Following this, I chose once again to shift gears and leverage the elevational information in the Toiyabe data set. Again, I borrowed inspiration from novel research, this time more directly related to my own work. In 2015, Terry and Rowe published “Energy flow and functional compensation in Great Basin small mammals under natural and anthropogenic environmental change” in PNAS. They demonstrated that daily energy flow (kJ/day) through the Great Basin rodent community responded differentially to changing climate through the Holocene and into the modern era. Furthermore, they demonstrated that functional groups within the rodent community responded differentially through time as well, and that many of the patterns that had held through the Holocene changed dramatically at the onset of the Anthropocene. I wanted to know whether the dynamics demonstrated by these functional groups through time could be approximated along a spatial gradient that captures a similar change in climate. In this regard, I was testing a space-for-time hypothesis. I found that many of the patterns observed through the Holocene were well approximated by the spatial substitution. However, the analogy between elevation and time did not hold into the Anthropocene. Although my interpretation of this analysis was predominantly qualitative, I feel comfortable claiming that the space-for-time substitution of the rodent community of the Toiyabe mountain range for the rodent community of the Great Basin approximates patterns from the end-Pleistocene to the end-Holocene, but is decoupled from the dramatic changes in the Anthropocene.

In response to peer feedback I have set out to determine if the patterns observed in the above analysis would hold in a second mountain range. This analysis is in progress, and I hope to add to it, eventually, a third mountain range. I have in hand the data for the Ruby mountains in the Great Basin, and will soon have access to a similar dataset for the Snake range.

Exercise 3: Creating a map of NDMI for the Toiyabe mountain range in the Great Basin.

Despite the interesting findings of the second part of my analysis, I felt as though my work in this class had become tangential to my own interests and the focus of my dissertation. Although it is unfortunate that it took most of the term to come to this realization, it has not been a loss. I have a new target: the normalized difference moisture index (NDMI), which uses Landsat 7 data to estimate vegetation density and soil moisture content. I want to use these data to develop a better understanding of the spatial distribution of vegetation in the Toiyabe and Ruby mountain ranges (the focus of my dissertation). I have several questions in mind: 1) how does NDMI vary with elevation and aspect? 2) what is the NDMI value at each of the trap sites in the Toiyabe and Ruby mountain ranges? 3) are there correlations between trap-line species abundances and NDMI upon which to base predictions about range-wide distributions of species? and 4) what is the degree of patchiness of NDMI in these two mountain ranges, respectively, and what predictions might we make about how this could limit or promote the movement of species laterally or up and down slope?

The questions outlined above would all be exceptionally useful to address, and I hope to do so as I become more comfortable with the tools available in ArcMap and CRAN R for these types of analyses. In the interest of exploring these data, I have calculated NDMI for the Toiyabe mountain range using Landsat 7 data for June 2011, a time during which small mammal trapping at this site occurred. While I have not yet been able to address the questions I outlined above, I have generated a map showing the NDMI at a 30 m x 30 m resolution. The color scale for this image is continuous from blue, to green, to yellow, to orange, to red, and reflects the range of values possible for each grid cell, from -1 (low moisture or low vegetation density) to 1 (high moisture or high vegetation density). What is most apparent in this map is that peaks in the mountain range, especially those with snow remaining, have particularly high NDMI values, and much of the remaining landscape appears uniform with intermediate to low NDMI values. While the general impression given by this map is not surprising, given the arid environment, the distribution of NDMI values evaluated at different scales could prove interesting.
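
As a pointer for anyone reproducing this step, NDMI can be computed from a clipped Landsat 7 scene in a few lines of R. The sketch below uses the terra package and assumes hypothetical file names; for ETM+, band 4 is the NIR band and band 5 is the SWIR1 band, giving NDMI = (NIR - SWIR1) / (NIR + SWIR1).

# Compute NDMI from Landsat 7 ETM+ bands (terra). File names are hypothetical.
library(terra)

nir  <- rast("LE07_toiyabe_2011_B4.TIF")  # near infrared
swir <- rast("LE07_toiyabe_2011_B5.TIF")  # shortwave infrared 1

ndmi <- (nir - swir) / (nir + swir)       # values fall between -1 and 1
plot(ndmi)                                # quick look at the 30 m grid
writeRaster(ndmi, "toiyabe_ndmi_june2011.tif", overwrite = TRUE)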

The most valuable thing I learned through this course was not a software package or a particular statistical method. I learned what type of data is truly necessary to perform spatial analyses, and how to get it. Specifically, I learned that there is a vast wealth of Landsat data available for free and that it can be clipped down to manageable sizes for the analyses I am interested in. While I still feel somewhat uncomfortable finding datasets, downloading them, and navigating the endless options of ArcMap, I have also made a new connection with an incredibly knowledgeable graduate student, Jessica Pierson. I hope that with her help, and that of several other peers more knowledgeable by far than I am in these methods, I can become confident in accessing data and performing the necessary analyses in ArcMap.

Reflection on Julia’s guidance and peer feedback

I have come to this point in my spatial project by trial and error and with the encouragement of Julia Jones. Julia's advice as I progressed through this course was to give things a shot, and if they didn't work, that would be fine; we could just try something else. This low-pressure, low-stakes approach was very helpful. This has been a particularly busy term, especially outside of this class, and outside academia as well. Knowing that this course would be student driven, and that what I got out of it would be entirely up to me, felt daunting at first, but I have come to accept that the bulk of my learning happened at the tail end, and I am excited that the product I am walking away with is new questions to ask and new methods to learn. I did not bend my data just to perform new analyses I did not think would be informative, nor did I choose to recycle old analyses. Following Julia's guidance, I gave a few new ideas a shot, and while some of them didn't work out, I am excited to take what I did learn beyond the context of this class to inform important aspects of my dissertation.

Feedback from my peers on the first tutorial suggested that a different experimental design (e.g., data collection) would have been more appropriate for hotspot analysis, or that I change my approach to one that could leverage the structure of my spatial data. I responded to this by shifting from lat-long coordinates to elevation as my spatial parameter. Peer feedback on my second tutorial was largely limited to acknowledgement of my limited sample size, specifically one mountain range, and noted that by performing similar analyses in additional mountains, I might be able to perform statistical tests, such as linear regression. As stated above, this is underway and will be a product outside the context of this class, as both my graduate major advisor and I have an interest in knowing how energy flow through functional groups varies across mountain ranges, and whether the spatial findings will be concordant with the temporal analysis.

Patterns of socio-demographic and spatial variables from survey responses about attitudes regarding flood-safety in neighborhoods

Filed under: Final Project @ 10:56 pm

1. Research question.

For this problem I want to answer:

How do socio-demographic and spatial variables explain patterns of survey responses about attitudes regarding flood safety in neighborhoods?

2 Description of the dataset.

The dataset for the study has the following characteristics:

  • Data is obtained from Household voluntary survey
  • Convenience sampling. Households located within the 100-year and 500-year zones of FEMA's flood hazard map, all surveyed at once.
  • Each participant answered a socio-demographic and predefined-intentions questionnaire
  • A printed coded survey questionnaire was mailed out to residents. The code is used to identify resident addresses.
  • 103 variables have been collected
  • Most of the variables are categorical

      An example of a typical variable to be analyzed:

  • Variable: Suppose your current home was to flood, how confident are you in the following possible conditions? – I will be able to evacuate my home before flooding begins (This variable is expressed in 5 categories of discrete values without hierarchy):
  • Categories:
    • Very confident
    • Confident
    • Neutral
    • Somewhat confident
    • Not confident at all

The spatial data consist of a map identifying land properties within the boundaries of FEMA's 100-year and 500-year flood hazard map, as shown in Figure 1. The survey was mailed out to randomly selected properties within the map boundaries.

Figure 1. Map of affected properties in South Corvallis according to FEMA's 100-year and 500-year flood categories.

3. Hypotheses:

Attitudes regarding flood-safety in neighborhoods are clustered according to socio-demographic and spatial factors.

4. Approaches:

After testing various geospatial approaches, such as hotspot analysis and kriging, I ended up clustering my data using spline interpolation. For clustering, the category values of each variable were taken as the “Z” values for the spline interpolation of the independent variables.

Later, a Principal Component Analysis (PCA) and Factor Analysis (FA) were applied to the collected data in order to find multivariate clustering.

Figure 2. Principal component (http://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues).

5. Results:

I found statistical relationships between the categorical variables collected from the survey according to their spatial locations, which define pattern formation, and produced maps of these relationships within the 100-year and 500-year FEMA map of Figure 1.

Bootstrapped confidence intervals for raw and composite correlations were computed using R. The result of this analysis using the Spearman correlation is shown in Figure 3.

Figure 3. Bootstrapped confidence intervals for raw and composite correlations for 108 variables

Using spline interpolation, different clusters were found by analyzing each variable independently. Category values were taken as the “Z” values for the spline interpolation of the independent variables. For instance, the “Perception of flooding extent” figure below shows four categories interpolated spatially. The other variables were studied similarly.
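
For context, a spline surface of this kind can also be fit outside ArcGIS; the sketch below is a rough analogue using thin-plate splines from R's fields package, with hypothetical inputs ('xy' = respondent property coordinates, 'z' = the numeric category code for one survey variable).

# Thin-plate spline interpolation of coded survey responses (fields package).
library(fields)

fit <- Tps(xy, z)   # fit a thin-plate spline to the category codes
surface(fit)        # plot the interpolated surface over the study area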

Statistical analysis of the collected data was performed using R. The figure “Self-confidence in being flood-ready” below shows a comparison of people's answers. Five variables are analyzed, with answer categories from 1 to 5, where 1 is very confident and 5 is not confident at all. It is noticeable that people are more confident about timely house evacuation and have low confidence in flood insurance coverage. The other variables were studied similarly.

A preliminary examination of the collected data indicates that 27 variables are suitable for multivariate clustering. Principal Component Analysis (PCA) and Factor Analysis (FA) were applied to these data. The figure "Eigenvalues of principal factors with 27 variables" shows variances greater than 1, which indicates that multivariate clustering can be performed. A two-factor analysis indicates the loadings for the clustered variables.
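A minimal sketch of this PCA/FA step in R, assuming the 27 coded variables sit in a numeric data frame called responses (a hypothetical name); it mirrors the analysis described rather than reproducing the exact script.

# PCA and a two-factor FA on the coded survey variables (a sketch).
pca <- prcomp(responses, scale. = TRUE)
summary(pca)                       # variance explained per component
screeplot(pca, type = "lines")     # eigenvalues, cf. the figure above

fa <- factanal(responses, factors = 2, rotation = "varimax")
print(fa$loadings, cutoff = 0.3)   # loadings for the clustered variables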

 


6. Significance:

This research will contribute to policy and decision making for neighborhood adaptation to climate change. Adaptation policies can be better targeted by identifying residents' attitude patterns. Identifying patterns of attitudes regarding flood-safety in neighborhoods is important for planning adaptation to different flooding scenarios in order to minimize personal risks.

7. Learning about geospatial software:

I have improved my knowledge of ArcGIS, ModelBuilder, and GIS programming in Python, all for geospatial processing. In addition, I have improved my knowledge of R for statistical analysis.

8. Learning about statistics:

I have added Principal Component Analysis (PCA) and Factor Analysis (FA) to my current statistical knowledge.

9. Answer to comments:

Primarily, I have modified my project title. This change was based on the change of my research question. I have also tried different methods in order to make my categorical variables suitable for clustering.


Beetle outbreaks and wildfire: overlapping in time and space

Filed under: Final Project @ 10:53 pm

Overview

Often landscape-scale disturbances are singular events, but contemporary disturbance regimes demonstrate that large-scale disturbances can overlap in time and space.  The spatial configuration of an initial disturbance may influence the process and subsequent spatial configuration of a secondary disturbance, such as a beetle outbreak followed by fire.  The cumulative effects of sequential disturbances could influence forest structure and successional trajectories through the next century for the landscapes in central interior British Columbia affected by beetle outbreaks and wildfire.

Figure 1: Post-fire photo of a fire that burned through beetle-killed forest in 2014 with various patches of fire severity.

Research Question

How is the landscape variability in fire disturbance related to the landscape variability in antecedent beetle outbreak disturbance?

Data Set

Landsat imagery was used for the analysis. Landsat data are 30-meter resolution multispectral data.  A single Landsat scene contained the wildfire event of interest (Path/Row 50/23).  Pre- and post-fire images were acquired with minimal cloud coverage and anniversary dates to match phenology.  These images were used to generate a burn severity map through the differenced Normalized Burn Ratio (dNBR) (Key and Benson 2006). The pre-fire NBR image was used to characterize the beetle outbreak on the pre-fire landscape. Pixel indexing formulas:

NBR = (NIR – SWIR2)/(NIR + SWIR2)

dNBR = NBRprefire – NBRpostfire
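These formulas translate directly into raster algebra in R. The sketch below uses the raster package with hypothetical file names for the pre- and post-fire NIR and SWIR2 bands.

# NBR and dNBR from Landsat bands (a sketch; file names are placeholders).
library(raster)

nbr <- function(nir, swir2) (nir - swir2) / (nir + swir2)

nbr_pre  <- nbr(raster("prefire_nir.tif"),  raster("prefire_swir2.tif"))
nbr_post <- nbr(raster("postfire_nir.tif"), raster("postfire_swir2.tif"))

dnbr <- nbr_pre - nbr_post    # higher dNBR indicates more severe burning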

Hypotheses

My ecological null hypothesis is that burning conditions (i.e., weather during the day of burn) exert strong control over patch size.  Fire-conducive weather (dry and windy conditions) controls the disturbance variability across the landscape by facilitating crown fire in lodgepole pine dominated landscapes.  The ecological alternative hypothesis is that beetle outbreaks change the landscape structure by killing trees, which results in decreased continuity of canopy fuels. The lack of canopy fuels may constrain fire spread to the surface and may influence the disturbance variability of fire across the landscape.

Approaches

Semi-variogram

A semivariogram was used to assess the spatial variability between pixel values in the dNBR image. A semivariogram is a means of evaluating spatial correlation between locations, and the shape of the semivariogram may indicate the level of patchiness and average patch size. The semivariogram analysis was conducted in R version 3.3.2 (R Core Team 2016) with the usdm package (Naimi 2015).
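A minimal sketch of this step, assuming the dNBR raster from above; the lag and cutoff values here are illustrative rather than the ones used in the analysis.

# Raster semivariogram with the usdm package (a sketch).
library(usdm)

v <- Variogram(dnbr, lag = 30, cutoff = 3000)   # 30 m lags out to 3 km
plot(v)   # the lag where the curve levels off suggests average patch size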

Cross-variogram

The semivariogram analysis was followed by a cross-variogram to assess the spatial relationship between fire and beetle outbreak patterns. The cross-variogram analysis was conducted with the gstat package (Pebesma and Graeler 2017) in R version 3.3.2 (R Core Team 2016).  It should be noted that some additional data extraction steps were needed to conduct the cross-variogram: the gstat package does not work directly with raster data, so a csv file was created with the cell value and UTM coordinates for each cell in the raster.
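A minimal sketch of the cross-variogram setup in gstat, assuming the exported csv has been read into a data frame cells with UTM coordinates x and y, the pre-fire NBR (nbr, the beetle signal), and the dNBR (dnbr, burn severity); all names are hypothetical.

# Direct and cross-variograms for fire and beetle outbreak (a sketch).
library(gstat)

g <- gstat(id = "fire", formula = dnbr ~ 1, locations = ~x + y, data = cells)
g <- gstat(g, id = "beetle", formula = nbr ~ 1, locations = ~x + y, data = cells)

cv <- variogram(g)   # computes both direct variograms plus the cross-variogram
plot(cv)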

Patch Metrics (Fragstats – Species Distribution Models)

Patch metrics were calculated with the SDMTools package (VanDerWal 2015) in R version 3.3.2 (R Core Team 2016).   The SDMTools package is based on the FRAGSTATS software package of McGarigal et al. (2012) and calculates many of the same patch metrics. In order to calculate patch metrics, I converted the continuous raster into a classified raster.  The burn severity classification was based on break points in the dNBR values defined by Reilly et al. (2017), and the beetle severity classification was based on cross-validation with orthorectified aerial photographs.  Burn classes included low (plus unburned), moderate, and high, and beetle classes included none, low-moderate, and moderate-high. Patch metrics were calculated for each disturbance severity class.
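A minimal sketch of this workflow with SDMTools, assuming burn_class is the classified severity raster with codes 1 = low, 2 = moderate, and 3 = high (hypothetical names and codes).

# Class- and patch-level metrics in the style of FRAGSTATS (a sketch).
library(SDMTools)
library(raster)

m  <- asc.from.raster(burn_class)       # convert the raster to the package's grid format
cs <- ClassStat(m, cellsize = 30)       # class-level metrics (area, patch counts, ...)
ps <- PatchStat(ConnCompLabel(m == 3), cellsize = 30)   # per-patch metrics, high severity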

Geographically Weighted Regression

Geographically Weighted Regression (GWR) was used as a descriptive analysis to understand the relationship between the variability in beetle outbreak and wildfire on the landscape.  The purpose of GWR is to determine areas of non-stationarity, or inconsistencies in the spatial structure over the landscape. I used the continuous raster layers for beetle outbreak and burn severity.  While there are packages in R to conduct GWR, I ended up conducting the analysis in ArcGIS (ESRI 2011) because I had difficulty troubleshooting the R packages. The raster layers were converted to point layers for the analysis.

 Spatial Overlay Analysis

Spatial overlay analysis has been used in previous studies examining disturbances that overlap in time and space (Hart et al. 2015).  It simply overlays maps of each disturbance and determines where the disturbances overlap; in this case, I am looking for overlap between beetle outbreak and wildfire.  Both disturbance maps were classified as binary: burned or unburned, and beetle outbreak or no beetle outbreak. Classification was based on cross-referencing locations with higher resolution orthorectified aerial photographs or higher resolution satellite imagery. I overlaid the two coded disturbance maps and multiplied them together to end up with four distinct classes: No Beetle – No Burn, No Beetle – Burn, Beetle – No Burn, and Beetle – Burn. The analysis was conducted in R version 3.3.2 (R Core Team 2016).
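One way to code such an overlay in R is sketched below, with hypothetical layer names; weighting one binary map before adding yields the same four classes as multiplying suitably coded maps.

# Binary overlay of the two disturbance maps (a sketch).
library(raster)

combo <- beetle_bin + 2 * burn_bin
# 0 = No Beetle - No Burn, 1 = Beetle - No Burn,
# 2 = No Beetle - Burn,    3 = Beetle - Burn
freq(combo)   # cell counts per overlap class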

Results

Semi-Variogram

The semivariogram analysis indicated a spherical relationship in the dNBR image, which suggests some level of patchiness (Figure 2).  The lag distance at which the semivariogram reaches its sill suggests that patches averaged about 1500 m, which seemed a bit high (Figure 2).  The dNBR image contained a couple of large bodies of water, which may influence the semivariogram.  I removed the water bodies from the dNBR image and ran the semivariogram on the water-free image (Figure 2).  The water-free analysis shows a shift in the sill location and indicates that average patch size might be around 500 m (Figure 2).  The semivariogram without water also seems to have a double sill, which might indicate that there are both smaller and larger patches.

 

Figure 2: Semivariogram for the Entiako dNBR image with water included (a) and excluded (b). The red circles indicate where the sill is reached, which gives a general indication of patch size. Variability increases with distance (lag) up to the sill. The dNBR images are shown in (c) and (d).

Cross-Variogram

The semivariogram analysis indicated a spherical relationship in both the wildfire pattern and the beetle outbreak pattern, which suggests some level of patchiness (Figure 3).  The lag distances at which the semivariograms reach their sills suggest that patches averaged about 500-750 m for the wildfire patterns and 250-500 m for the beetle outbreak patterns (red circles in Figure 3).  The cross-variogram in Figure 4c shows an undulating pattern, which indicates auto-correlation at multiple lag distances. Auto-correlation at multiple lag distances suggests that there are similar patches of fire severity that had similar patches of beetle outbreak severity, which may support the idea of the pattern of beetle outbreak influencing the pattern of wildfire.

The cross-variogram illustrates auto-correlation at multiple lag distances, with patches of multiple sizes or patches that are regularly spaced. The low cross-variance at the shortest lag of about 300 m (effectively the nugget, or y-intercept) might indicate that low/high burn severity is associated with areas of low/high beetle outbreak, which seems to correspond well to certain areas of the maps. The cross-variogram shows a second dip at a lag distance of about 2000 m, which could describe patterns of low/high burn severity that are 2000 m from low/high beetle outbreak.  The landscape patterns shown in the maps could also support this relationship.

Figure 3: Semivariogram for the Entiako wildfire (A) and beetle outbreak (B). The red circles indicate where the sill is reached, which gives a general indication of patch size. Variability increases with distance (lag) up to the sill. The dNBR image (C) and NBR image (D) are also shown.

Figure 4: Cross-variogram analysis for beetle outbreak and wildfire overlap. Graph a is a semivariogram for the wildfire raster. Graph b is a variogram for the beetle outbreak raster. Graph c is the cross-variogram analysis that overlays the raster used in graph a with the raster used in graph b.

Patch Metrics (Fragstats with Species Distribution Models)

In the SDMTools package, I ran two analyses: one on the whole landscape and one on each disturbance severity class. For the first analysis, I found that high burn severity was the smallest component for this fire event, accounting for only about 15 percent of the landscape (Table 1). This was interesting because lodgepole pine is often characterized by 'stand replacing' high severity fire; however, moderate burn severity can also account for 'stand replacing' fire in lodgepole pine dominated landscapes, and it covered about 45 percent of the landscape (Table 1). The landscape was dominated by low-moderate and moderate-high beetle severity, with only a small portion unaffected by beetle outbreak (Table 2).

Table 1: Burn severity class summary for patches, area, and proportion of landscape

Burn Severity Class | Number of Patches | Total Area (ha) | Proportion of Landscape
Low | 382 | 2767.14 | 0.394
Moderate | 448 | 3179.79 | 0.452
High | 422 | 1082.70 | 0.154

 Table 2: Beetle severity class summary for patches, area, and proportion of landscape

Beetle Severity Class | Number of Patches | Total Area (ha) | Proportion of Landscape
None | 985 | 780.93 | 0.111
Low-Moderate | 656 | 2968.56 | 0.423
Moderate-High | 709 | 3273.30 | 0.466

The second analysis examined the patch metrics for each disturbance severity class. Tables 3 and 4 include patch area metrics by burn severity and beetle severity, respectively. While the output includes many other variables, I wanted to look at the variability in patch size by disturbance type and class. The high burn severity maximum and mean patch sizes are the most interesting part of Table 3, because they are quite different from those of the low and moderate burn severity classes.  Further statistical analysis is needed to determine whether the numbers are statistically different.  The summary of patch size metrics for both disturbance types indicates that there are many small patches for all class types.

Table 3: Burn severity classes patch size summary

Burn Class | Number of Patches | Minimum Patch Size (ha) | Maximum Patch Size (ha) | Mean Patch Size (ha) | Median Patch Size (ha) | SD (ha)
Low | 382 | 0.09 | 1449.45 | 7.24 | 0.18 | 77.7
Moderate | 448 | 0.09 | 1461.69 | 7.10 | 0.18 | 75.3
High | 422 | 0.09 | 445.95 | 2.57 | 0.18 | 22.5

 Table 4: Beetle severity classes patch size summary

Beetle Class | Number of Patches | Minimum Patch Size (ha) | Maximum Patch Size (ha) | Mean Patch Size (ha) | Median Patch Size (ha) | SD (ha)
None | 985 | 0.09 | 77.58 | 0.79 | 0.18 | 4.04
Low-Moderate | 656 | 0.09 | 2380.86 | 4.53 | 0.18 | 93.07
Moderate-High | 709 | 0.09 | 1461.60 | 4.62 | 0.18 | 57.68

The SDMTools package offers much more than is demonstrated here. The PatchStat function provides the following outputs: number of cells for each patch, number of cells in the core area, number of outer perimeter cell edges of the patch, number of internal cell edges of the patch, the area of each patch within the landscape mosaic, the interior core area of a patch, perimeter of the patch, perimeter-area ratio, shape index, fractal dimension index, and the core area index. The ClassStat function also offers 28 metrics in its output.  The SDMTools package documentation describes all of these metrics (VanDerWal 2015).

Geographically Weighted Regression

The GWR analysis is visualized with its coefficients (Figure 5).  Reds and oranges indicate areas of positive association, while blues indicate negative associations.  These associations make sense visually when going back to the beetle outbreak and burn severity maps: high beetle outbreak has lower raster values, while high fire severity has higher raster values. While the GWR seems to indicate some interesting patterns, I explored some of the variability with scatter plots, which are discussed below.

Figure 5: Geographically weighted regression showing the coefficients from the output. Positive values indicate a positive association, while negative values indicate a negative association.

Spatial Overlay Analysis

The spatial overlay analysis is shown in Figure 6.  Most of the landscape is covered by red, which is the combination of beetle outbreak and wildfire. There are a few small patches of live trees that were unaffected by the beetle outbreak but burned, shown in orange; a number of these live patches appear to have burned at high severity.  There are a number of areas that experienced beetle outbreak but not fire, shown in green.  Areas with no beetle outbreak and no fire are very few and appear relatively small on the map.

Figure 6: Spatial Overlay Analysis using binary disturbance classifications to determine where beetle outbreak and fire overlap in the landscape.

Inspection of GWR with spatial overlay

The final piece of this project was revisiting all the maps and thinking critically about the GWR analysis.  Scatter plots were generated from six selected spots at the extreme ends of the GWR coefficient range (Figures 9 and 11). NBR and dNBR values were extracted and plotted in Figure 11. The purpose of generating scatter plots was to assess the relationship between these two variables in relation to the GWR coefficients. Maps for beetle outbreak (Figure 7), fire disturbance (Figure 8), GWR (Figure 9), and the spatial overlay analysis (Figure 10) are shown below with rectangles outlining where values were extracted for the scatter plots in Figure 11.

Plots 3 and 8 showed a strong linear relationship (see Figure 11).  The other plots did not show visually strong relationships.  The strong and weak linear relationships are supported by the R-squared values for those areas: Plot 3 ranged from 0.59-0.83, Plot 8 from 0.29-0.55, and Plot 4 from 0.11-0.25.  In the scatter plots, the values associated with the spatial overlay analysis, beetle outbreak, and fire severity show quite a range of variability.

Fire is a complex process, and this project has examined it primarily through the lens of the pre-fire landscape, which overlooks weather and topography as controls of fire severity. Wildfire is a dynamic process that is influenced by fuels, weather, and topography at both macro- and micro-scales. The focus here has been solely on fuels, which likely does not capture all the landscape variability generated by the wildfire event. Future analyses will account for topography and fire weather.  Additionally, since the focus is on forested landscapes and the interaction of overlapping disturbances in forested landscapes, future data sets may exclude non-forested areas.

Figure 7: The landscape distribution of beetle outbreak disturbance for the Entiako burn perimeter and areas associated with the scatter plots in Figure 11.

 

Figure 8: The landscape distribution of burn severity for the Entiako fire that burned in 2012 and areas associated with the scatter plots in Figure 11.

Figure 9: The landscape distribution of coefficients from the geographically weighted regression analysis and areas associated with the scatter plots in Figure 11.

 

Figure 10: The landscape distribution of the spatial overlay analysis with areas associated with the scatter plots in Figure 11.

 

Figure 11: Scatter plots of NBR values for beetle outbreak on the x-axis and dNBR values for burn severity on the y-axis. Point colors correspond to the spatial overlay analysis.

Significance

Overlapping disturbances of beetle outbreak and wildfire will influence landscape structure and function for decades in central interior British Columbia, Canada.  It is important to characterize the distribution of disturbance severity across a landscape for both individual disturbances and the cumulative effects of overlapping disturbances. There is still much to explore within these analyses, and further analyses will include additional fires as well as topography and weather as explanatory variables of wildfire processes.

The spatial overlay analysis is visually the most dramatic, showing much of the landscape dominated by the cumulative effects of overlapping disturbances, plus additional spots of fire mortality. This is important for future forest structure and successional trajectories.  It will be important for scientists and managers to monitor the landscape's response to the cumulative disturbance effect.  Both disturbances affect forests that offer late-seral habitat used in the winter by caribou. The fire in this project, plus an additional fire, has affected most of the wintering habitat for this caribou herd, which is of great concern for managers.

Wildfire following beetle outbreak may facilitate or impede lodgepole pine regeneration, either by preparing the surface for seedling establishment and releasing seeds or by killing or consuming seeds during the fire. It will be important for scientists and managers to monitor regeneration on this landscape to understand forest response to overlapping disturbances.  This landscape is managed as wilderness, so it provides insight into natural processes unfolding with minimal human intervention under current climate conditions, which may be useful for predicting how surrounding areas may respond under similar disturbance interactions.

Software learning

Time was invested in learning to perform many of these spatial analyses in R, which I thought would be a useful investment.  R offers a number of spatial tools similar to ArcGIS.  I think R has several benefits over ArcGIS, in particular the ability to track a workflow that can be recycled for future analyses.  Additionally, once the code is set up, the process is semi-automated.

I was working with a particularly large fire, covering tens of thousands of pixels, and found that R processed the data quickly. I used only one fire event for this project, but I have two other fires I want to look at, which I can now easily run through the R scripts I have generated.

During this class, I also learned how to use R Markdown.  R Markdown offers an interface for coding and commenting, essentially a scientific notebook. It allows for easy documentation of code chunks and the ability to generate PDF or HTML files for sharing. This documentation will let me more easily return to the code later and understand what I did, or share the code with someone else. It makes my work reproducible.

I did end up using ArcGIS for visualizing my raster files, for data processing such as clipping, and for geographically weighted regression.  R does offer a couple of packages for conducting geographically weighted regression, but I had difficulty troubleshooting them.

What did you learn about statistics?

I learned ‘oh so many things’. I learned a number of spatial statistical methods throughout this course including semi-variograms, cross-variograms, patch metrics (FRAGSTATs), geographically weighted regression, and spatial overlay analysis. Semivariograms and cross-variograms are useful descriptive analysis to assess variance as a function of scale and generally describe landscape patterns. Patch metrics (FRAGSTATS) are also a useful tool for generating metrics and describing the distribution of disturbance severity across a landscape. I think there are many more patch metrics to explore in the SDM package that may be useful for examining some of the variability in disturbance severity across the landscape, overlapping disturbances, cumulative disturbance effects, and future successional pathways for disturbed forest landscapes. The geographically weighted regression was likely the most challenging analysis to make work and draw conclusions from. I will probably be thinking about this beyond this course. It was useful for seeing where the overlapping disturbances were positively and negatively correlated, but there was still a great deal of variability in the scatter plots. The spatial overlay analysis was useful to see were areas of beetle outbreak and/or no beetle outbreak overlap with areas burned and/or unburned.  The last piece tied in here was scatter plots of the geographically weighted regression base on the beetle outbreak, fire severity and spatial overlay, which showed a great deal of variability.

Response to Comments

I received a number of useful comments on my exercises and tutorials, some of which I have been able to address in this final blog post.

  1. Exercise 2: Julia's comment regarding additional interpretation of the cross-variogram was addressed in the cross-variogram section above.
  2. Exercise 3: Julia's comment regarding SDMTools for future FRAGSTATS users – I tried to list some of the additional metrics that are provided in the output. The package was relatively easy to use, and I provided commented code with Tutorial 2 that will hopefully be useful to a future user.
  3. Exercise 3: Julia's comment – "Useful to know about proportions of h,m,l severity. yes mean patch sizes differ by burn severity but not median patch size – what do the full distributions of patch sizes look like for the various severity classes? Note the high st dev for L and M – indicating that likely there are a couple of very large patches affecting the mean (not the median), hence not really a difference in patch sizes. Why not? What would control patch size?" – I have attempted to generate boxplots but have not been able to make ones that are visually worth sharing. There seem to be lots of small patches and a few very large outliers. This is worth additional exploration, but I am out of time in terms of doing this for the class.
  4. Tutorial 1: Peer comments – create more variograms (NDVI, EVI) and look at different band combinations. These comments were all considered. I had generated an NDVI layer early on, and after reading some remote sensing papers regarding Landsat data and beetle outbreak, I opted to use the pre-fire NBR raster. We talked quite a bit about the fact that the semivariogram takes a sample of the data; I opted not to run multiple semivariograms, since the analysis is considered a descriptive tool, and the model run provided a general understanding of the spatial variance on the landscape that seemed useful. I am still considering the use of additional bands, but chose to hold off for this project/class.
  5. Tutorial 2: Peer comment – identify connections between layers and consider other factors. I think the next steps for the project will be to bring in topography and weather layers and run additional spatial statistical analyses including them. Due to time constraints, this was outside the purview of this project.

 

 

References

ESRI. 2011. Arc GIS Desktop: Release 10. Environmental Systems Research Institute, Redlands, CA.

Hart, S. J., T. Schoennagel, T. T. Veblen, and T. B. Chapman. 2015. Area burned in the western United States is unaffected by recent mountain pine beetle outbreaks. Proceedings of the National Academy of Sciences of the United States of America 112:4375–80.

Key, C. H., and N. C. Benson. 2006. Landscape assessment (LA): Sampling and analysis methods. USDA Forest Service General Technical Report RMS-GTR-164-CD:1–55.

McGarigal, K., A. Cushman, and E. Ene. 2012. FRAGSTATS v4: Spatial pattern analysis program for categorical and continuous maps. University of Massachusetts, Amherst, Massachusetts.

Naimi, B. 2015. Package "usdm": uncertainty analysis for species distribution models. R package documentation.

Pebesma, E., and B. Graeler. 2017. Package "gstat": spatial and spatio-temporal geostatistical modelling, prediction and simulation. R package documentation.

Reilly, M. J., C. J. Dunn, G. W. Meigs, T. A. Spies, R. E. Kennedy, J. D. Bailey, and K. Briggs. 2017. Contemporary patterns of fire extent and severity in forests of the Pacific Northwest, USA (1985-2010). Ecosphere 8:e01695.

R Core Team. 2016. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

VanDerWal, J. 2015. Species Distribution Modelling Tools: Tools for processing data associated with species distribution modeling exercises.

 

Final Project: Patterns in NDVI fields over time and compared to biomass

Filed under: Final Project @ 10:05 pm

1. The research question that you asked.
Spatial patterns of disease spread in agriculture are important to understand: for example, what causes an infection and how to avoid further spreading. Originally, the idea was to study the Red Blotch disease in vineyards, but due to delays there were no data. With the same goal in mind, I looked at NDVI (Normalized Difference Vegetation Index) data for agricultural fields and tried to understand the 'spreading' of the NDVI, in order to develop methods to analyze spreading of any sort in remotely sensed data over time.
In my ‘updated’ spatial problem I looked at;
1) the spatial and temporal autocorrelation of the NDVI parameter, and
2) the correlation between the pattern of NDVI and average biomass in agricultural fields in Manitoba, Canada.
This results in the final research question: Does patchiness in NDVI relate to biomass and does it change over time?
2. A description of the dataset you examined, with spatial and temporal resolution and extent.
I used data from about 30 fields, each around 1 km2. Over the growing season from May until September, RapidEye-derived NDVI is available at a spatial resolution of 6.5 by 6.5 m. Each field has on average 15,000 pixels. The biomass data are from the SMAPVEX12 study and contain the wet biomass averaged per field.

Study Location

3. Hypotheses: predictions of patterns and processes you looked for.
It is expected that patchiness in NDVI is an indicator of low biomass, and therefore low yield. A uniform NDVI implies that the crops are all in the same health, while patchiness could mean that some plants suffer stress. Over time, it is expected that the patchiness in the NDVI changes: either the patches become more distinct, or the fields become uniform.
4. Approaches: analysis approaches you used.
The analysis can be split into two parts: part one focuses on quantifying the pattern of NDVI with variograms, and part two analyzes the correlation between variogram parameters and biomass and how those parameters change over time.

Part 1:
To assess the patchiness of the NDVI I used a variogram. The variogram shows the characteristics of spatial autocorrelation. There are three parameters of interest that can be read from a variogram graph: a) the range, which gives an indication of the lag distance of the autocorrelation; b) the sill, which represents the variance of the variable at the range lag; and c) the nugget, which should be zero but can vary slightly due to errors in the data. The variogram is standardized by dividing the semivariance by the standard deviation of that field.
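A minimal sketch of this step for one field with the gstat package, using a hypothetical data frame field with pixel coordinates x and y and an ndvi column.

# Standardized empirical variogram of NDVI for a single field (a sketch).
library(gstat)

v <- variogram(ndvi ~ 1, locations = ~x + y, data = field)
v$gamma <- v$gamma / sd(field$ndvi)   # standardization as described above
plot(v)   # read range, sill, and nugget from the plotted curve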
Part 2:
We compare the variograms analytically with scatterplots between i) range and biomass, ii) sill and biomass, and iii) range and sill with a biomass color scheme. For the scatter plots, the correlation coefficients are determined with Pearson's r and a p-value. Next, a time component is added: for 5 time steps the correlation between range and sill is determined and its development over time is analyzed (there was no biomass data for these dates).
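A minimal sketch of these tests in R, assuming the per-field variogram parameters and average biomass have been collected in a data frame (hypothetical names); cor.test returns Pearson's r together with its p-value.

# Correlation between variogram parameters and biomass (a sketch).
cor.test(fields$range, fields$biomass, method = "pearson")
cor.test(fields$sill,  fields$biomass, method = "pearson")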

5. Results: what did you produce — maps? statistical relationships? other?

Part 1:
In the variogram plot (Figure 1), we can see that there is high inter-field variability in the spatial autocorrelation of NDVI. It is difficult to tell from the graph whether there is a correlation between biomass and variogram type. Also, crop type differs between the fields, which has a considerable influence on biomass. For further analysis, a distinction between crop types should be made.

Variograms for all fields

Part 2:
Doing the same analysis but focusing on just one crop, wheat, does not improve the results. Pearson's correlation coefficients are low and insignificant, with a correlation coefficient for range vs. biomass of -0.31 (p-value 0.3461) and for sill vs. biomass of 0.35 (p-value 0.2922). In the color-coded scatterplot for biomass, no pattern is visible.

Correlation between range and biomass, and sill and biomass for wheat

scatter plot with color coding for average biomass

For the sill-range correlation over time, there seems to be an increase in correlation, reaching a high correlation coefficient of 0.88 with a p-value of 0.047 (significant at the 5% level). One hypothesis is that this is caused by the development of the shapes of the variograms: at the beginning of the growing season the patterns are ill-defined, which makes the variograms erratic, and the subjective reading of the variograms may emphasize this. During the growing season the patterns become more distinct, and the variograms approach the shape of a theoretical variogram. For a perfect variogram, one would expect a perfect correlation of 1 between the sill and the range.

Range-Sill correlation over time

From the two plots showing the change of range and sill over time per field, we cannot draw any conclusions; no trend is visible.

Development of sill over time

Development of range over time

6. Significance. What did you learn from your results? How are these results important to science? to resource managers?
This ‘spatial problem’ has been useful to investigate the methods to investigate spatial correlation and correlation to other parameters. This approach could be used to understand and predict for example the spread of diseases in agricultural fields.

7. Your learning: what did you learn about software (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) other?
I already had some experience with the software mentioned, so I didn't improve my skills significantly. The only software I was not comfortable with was R, which I used to calculate and visualize the variograms. The language is not much different from Matlab and Python.

8. What did you learn about statistics, including (a) hotspot, (b) spatial autocorrelation (including correlogram, wavelet, Fourier transform/spectral analysis), (c) regression (OLS, GWR, regression trees, boosted regression trees), and (d) multivariate methods (e.g., PCA)?
This class was an eye-opener for me regarding statistics. I investigated Moran's I, local Moran's I, variograms, and Pearson's correlation coefficient. I had never used a variogram before, and it gave me insight into spatial autocorrelation; it was a way to quantify the patterns in the fields. For my future research, I'm interested in Principal Component Analysis.

9. In addition, your final blog post should include a section in which you describe how you have responded to (a) the comments you received from your fellow students on Tutorial 1 and Tutorial 2 and (b) comments on Canvas from Dr. Jones on your exercises 1, 2, and 3.
During the presentations in groups of three, we discussed the approach and analysis. One thing that came up is the subjectivity of reading the variograms and using those values for correlations. I considered fitting a variogram model and automatically extracting the sill and range, but this was too complicated and still needed my subjective input. Another comment was to use more data; however, I calculated the significance of the correlation coefficients, which indicated that there was enough data.
Dr. Jones advised standardizing the variograms to make them comparable.

Final Project: Assessing Forest Structure Parameters of Tropical Forest Based on Unmanned Aircraft System (UAS) Visual Imagery Data

Filed under: 2017,Final Project @ 7:36 pm
  1. Background:

The emission of carbon dioxide (CO2) is one of the greatest factors causing climate change. Forests, as the largest terrestrial carbon reservoir, have an important role in reducing CO2 emissions and preventing climate change, so it is necessary to manage them sustainably. In sustainable forest management, forest managers must frequently monitor and measure forest structure parameters, such as number of trees, tree locations, tree heights, crown width, biomass, and forest carbon. However, these parameters are usually measured through conventional field measurements, which are costly and labor intensive. New technology, such as Unmanned Aircraft Systems (UAS), can be used as an alternative method to monitor and assess forest structure parameters. There are several programs designed to process UAS visual imagery data and measure forest structure parameters from it, for example Agisoft PhotoScan, Fusion, ArcMap, and RStudio.

  2. Research question:

Considering the statements in the background, I decided to assess forest structure parameters (crown width, height, and number of trees) in a particular part of Siberut National Park based on UAS visual imagery data. The research questions are: (1) How do crown width and height vary in space in the study area, and are they autocorrelated? (2) Are crown width and tree height linearly related in space in the study area?

  3. Dataset:

This project used Unmanned Aircraft System (UAS) visual imagery data taken with a Sony NEX-5R camera mounted on a fixed-wing UAS. The data were gathered in Siberut National Park in West Sumatra Province, Indonesia, as part of my advisor Professor Michael Wing's project. There are 3 flights with 300 images in each flight. The image resolution is 350 x 350 inch. The images were taken with 80% sideward overlap and 80% forward overlap. I focused on the UAS visual imagery data from the first flight. The UAS visual images were processed through several software packages: Agisoft PhotoScan was used to make a 3D point cloud from the images, and the point cloud was then used in Fusion to derive forest structure measurements, such as number of trees, tree positions, tree height, and crown width. One of my labmates did this data processing. The result is a data set (points) consisting of 804 detected trees with their height and crown width (see Figure 1). Unfortunately, because the images were not yet georeferenced for this process, the height in this project is the sum of tree height and elevation (ground height).

Figure 1. Detected trees by Fusion software.

  4. Hypotheses:

I hypothesize that crown width and height vary in space in the study area. Some trees have small crowns and others have bigger crowns because of differences in species and age. In this project, the variation in height will be affected mostly by the elevation, which is included in the height. Furthermore, I hypothesize that crown width is not autocorrelated, while height is autocorrelated, since height is the sum of tree height and elevation. I also hypothesize that crown width and height are linearly related. The linear relationship can be positive or negative due to factors such as differences in elevation and species.

 

  5. Approaches:

In general, I use a modeling approach to answer the research questions. To answer the first research question, I did variogram analysis in RStudio. To answer the second research question, I did Geographically Weighted Regression (GWR) in ArcMap. Descriptions of the analyses and the steps are given below:

  • Variogram Analysis

I use variogram analysis to assess how the height and crown width vary in space in the study area, and whether they are autocorrelated or not. The variogram is defined as the variance of the difference between field values at two locations x and y across realizations of the field (Cressie 1993).

The main goal of a variogram analysis is to construct a variogram that best estimates the autocorrelation structure of the underlying stochastic process. Most variograms are defined through several parameters; namely, the nugget effect, sill, and range.

  • The nugget effect represents micro-scale variation or measurement error. It is estimated from the empirical variogram as the value of γ(h) for h = 0.
  • The sill represents the variance of the random field.
  • The range is the distance (if any) at which data are no longer autocorrelated.

Figure 2. An example of a variogram graph.

In RStudio, I used the R code below to do the variogram analysis. First, I installed the "gstat" package and loaded it:

install.packages("gstat")

library(gstat)

Next, to load my data into RStudio, I set my working directory, which can be done through the "Session" menu. Then only a few lines of code are needed: the first variogram analysis is for crown width against location (X and Y), and the second is for height against location (X and Y).

Treelist1 <- read.table("D:/Oregon State University/Term/Spring 2017/GEOG 566/Exercises/Exercise 2/treelist1.csv", sep = ",", header = TRUE)

g1 <- gstat(id = "Max.crown.width", formula = Max.crown.width ~ 1, locations = ~X + Y, data = Treelist1)

plot(variogram(g1))

g2 <- gstat(id = "Height", formula = Height ~ 1, locations = ~X + Y, data = Treelist1)

plot(variogram(g2))

 

  • Geographically Weighted Regression (GWR) Analysis

I used the Geographically Weighted Regression (GWR) tool in ArcMap to assess the linear relationship between crown width and height in space in the study area. Since the data have not been georeferenced, the height is the sum of the tree height and the elevation (ground height).

GWR constructs a separate equation for every feature in the dataset, incorporating the dependent and explanatory variables of features falling within the bandwidth of each target feature. The shape and extent of the bandwidth depend on user input for the Kernel type, Bandwidth method, Distance, and Number of neighbors parameters, with one restriction: when the number of neighboring features would exceed 1000, only the closest 1000 are incorporated into each local equation. GWR should be applied to datasets with several hundred features for best results; it is not an appropriate method for small datasets.

The first step is to load the dataset into ArcMap. The data set is a shapefile (points) consisting of 804 detected trees with their location, height, and crown width (see Figure 1). As stated before, because the images were not yet georeferenced, the height is the sum of tree height and elevation.

The second step is to open the Geographically Weighted Regression (GWR) tool; it will appear as in Figure 3 below. Then we need to choose the input features, dependent variable, and explanatory variable. We can choose more than one explanatory variable, but for this project I chose only one: my dependent variable is height (ground height + tree height), and my explanatory variable is crown width. Both are continuous numerical variables.

 

Figure 3. Geographically Weighted Regression tool.

In the next step, we can choose the kernel type, bandwidth method, and number of neighbors. By default, the kernel type is "Fixed", the bandwidth method is AICc, and the number of neighbors is 30. I decided to use the defaults because I think they are appropriate for my dataset; in particular, it is important to select Fixed for kernel type whenever distance is a component of the analysis. To finish the analysis, just click "OK".
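For readers who prefer R, a roughly equivalent model can be fit with the spgwr package. This is only an approximation of the ArcMap tool's behavior, reusing the Treelist1 data frame loaded earlier.

# Geographically weighted regression in R (a sketch, not an exact replica of ArcMap's GWR).
library(spgwr)

coords <- cbind(Treelist1$X, Treelist1$Y)
bw  <- gwr.sel(Height ~ Max.crown.width, data = Treelist1, coords = coords)   # bandwidth by cross-validation
fit <- gwr(Height ~ Max.crown.width, data = Treelist1, coords = coords, bandwidth = bw)
fit   # prints a summary of the local coefficient estimates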

  6. Results:

As given below, the results of the variogram analysis and the geographically weighted regression (GWR) analysis are shown as graphs, tables (statistical relationships), and new layers (shapefiles). Descriptions follow:

  • The Result Description of Variogram Analysis

Based on the variogram graph of crown width (Figure 4), crown width is in general not autocorrelated, although there are some patches with different nugget, sill, and range. In the first part of the graph, there is a patch with a range (distance) of about 25 m, which is relatively larger than the others. I think this can happen due to different sizes of crown width in the study area: it seems that there are some trees with big crowns and others with smaller crowns that are clustered. Although quite rare, 25 m as an indication of crown width size is reasonable, because some tree species in tropical forests have crowns even wider than 25 m.

Figure 4. Variogram graph of crown width.

Moreover, based on the variogram graph of height (see Figure 5), the semivariance increases with increasing range (distance), which means that height is autocorrelated. I think this happens because the height data are the sum of tree height and elevation. Since elevation is included in the height data, the topography of the study area influences the variance and the variogram analysis; it seems that the study area is on a slope. As is well known, autocorrelated data are not good for statistical analyses that require an independence assumption. I can make some adjustments before doing further statistical analysis (e.g., calculating confidence intervals), such as adjusting the standard errors. This is really important for autocorrelated data, because the usual standard error will overestimate the precision when there is positive autocorrelation.

Figure 5. Variogram graph of height.

 

  • The Result Description of Geographically Weighted Regression (GWR)

We can see the whole result by looking at the attribute table, which shows the statistical relationship between crown width and height in space in the study area (see Figure 6). By default, ArcMap shows the residual standard deviation (RSD) for each feature in a new layer, with different colors showing the range of the RSD (see Figure 7).

The attribute table contains all the values related to the regression analysis: observed and predicted values, the coefficient of the explanatory variable, intercept, standard error, residual standard deviation, etc. From the result, crown width and height generally have a positive linear relationship, which means there is an increase in height for every one-unit increase in crown width. In other words, the bigger the crown, the taller the tree.

Figure 6. Attribute table of the geographically weighted regression result.

 

Figure 7. Distribution of residual standard deviation in the study area.

However, it is more interesting and useful to see the coefficient of the explanatory variable instead of the residual standard deviation (RSD). Therefore, we need to make some changes in the layer properties. To open the layer properties, right click on the layer and choose "Properties". In the layer properties, under "Symbology", we can change the field value from the residual standard deviation (RSD) to the coefficient of the explanatory variable (see Figure 8). We can also change the number of classes and the colors. In my case, I chose three classes to distinguish features (trees) with a positive linear relationship, a negative linear relationship, and a coefficient (linear relationship) close to zero. The adjusted layer can be seen in Figure 9.

Figure 8. Layer properties: Symbology.

   

Figure 9. Distribution of the coefficient of the explanatory variable.

However, if we look at the result for each individual feature (tree), some trees have a positive linear relationship between crown width and height (ground height + tree height), and others have a negative linear relationship. The distribution of trees with positive and negative linear relationships can be seen in Figure 9. The red points indicate trees with negative linear relationships between crown width and height, meaning trees with big crowns tend to be shorter. Blue points indicate trees with positive linear relationships, meaning trees with big crowns tend to be taller. The white points indicate trees with either positive or negative linear relationships whose coefficients of the explanatory variable are close to 0.

Differences in the sign of the linear relationship might occur due to factors such as different tree species, different elevations, or error from the Fusion software analysis. Tropical forests have hundreds of tree species with different characteristics: some trees have big crowns and tall trunks, others have big crowns and short trunks. In addition, differences in elevation can have a significant effect because the height data used in this project are the sum of ground height (elevation) and tree height. Trees with positive linear relationships might be distributed in areas with higher elevation (hills, mountains, or peaks), while trees with negative linear relationships might be distributed in areas with lower elevation (drainages or valleys). Trees with coefficients close to zero might occur because the data have not been georeferenced and because the algorithm used in geographically weighted regression includes data from neighboring features (trees), which can affect the value of the coefficient in the linear relationship.

In general, the R-squared values are quite low, with most features (trees) having an R-squared lower than 0.2. To improve the regression analysis, I think I need to georeference the data, since the height being the sum of ground height (elevation) and tree height can affect the regression model (intercept, coefficient, etc.) between crown width and height. I could also add explanatory variables such as tree species to increase the accuracy of the linear model.

 

  7. Significance:

I learned from the results of this project that Unmanned Aircraft System (UAS) visual imagery data can be used as a low-cost alternative method for measuring forest structure parameters after being processed through appropriate software and analyses. Forest structure parameters are usually measured by forest managers using conventional field ground measurements, which are costly and labor intensive. By using UAS visual imagery data processed in software such as Agisoft, Fusion, ArcMap, and RStudio, forest managers will be able to monitor forest structure parameters in a faster and cheaper way.

Forest managers can do point pattern analysis, variogram analysis, geographically weighted regression (GWR), or other spatial and statistical analyses to assess the structure of a particular forest. For example, I found in this project that we can assess the linear relationship between forest structure parameters in space. In my case, there is a linear relationship between crown width and tree height, so we can predict the growth of the trees from this relationship. Generally speaking, a forest manager can build a linear regression model between two or more variables using the GWR tool in ArcMap, and then estimate or predict forest growth from that model in a faster and cheaper way. This new method will be especially useful in remote areas with limited accessibility, where it is almost impossible to do field ground measurements.

 

  8. Things that I learned: Software

From this class and this project, I learned some new and useful packages and code in RStudio. I learned how to do spatial and statistical analyses that I had never done before, for example variogram analysis and geographically weighted regression (GWR). Even though I did not include the GWR that I did in RStudio in this project, I shared the link that has all of the code in the GEOG 566 blog post. In addition, I realized that besides table-format data (.csv), I can also use raster data as input in RStudio. Although my experience with raster data did not go well, since RStudio ran out of memory, I did well with my table-format data (.csv). I will definitely use RStudio and some of these packages and code in my future research.

Moreover, in ArcMap, I learned how to do point pattern analysis (spatial autocorrelation) and raster clustering analysis (unsupervised classification) in Exercise 1, and geographically weighted regression (GWR) in Exercise 3 and this final project. All of these tools will be very useful for assessing forest structure parameters of tropical forest. At first, I only did regression analysis in RStudio, which did not consider the spatial information (locations of the features) in its equation. Then I learned from this project that I can do regression analysis in ArcMap that includes the spatial information in the analysis. Although I have not figured out the whole formula used in this analysis, I am satisfied that I was able to do it with my data set.

 

  9. Things that I learned: Statistics

I learned some new statistical analyses in this class and this project: spatial autocorrelation (Global Moran's I), variograms, and geographically weighted regression (GWR). First, I did spatial autocorrelation (Global Moran's I) in ArcMap. The Spatial Autocorrelation (Global Moran's I) tool measures spatial autocorrelation based on both feature locations and feature values simultaneously. Given a set of features and an associated attribute, it evaluates whether the pattern expressed is clustered, dispersed, or random. The tool calculates the Moran's I index value and both a z-score and p-value to evaluate the significance of that index.

The next statistical analysis that I did was the variogram, in RStudio. As mentioned before, the variogram is defined as the variance of the difference between field values at two locations x and y across realizations of the field (Cressie 1993). The main goal of a variogram analysis is to construct a variogram that best estimates the autocorrelation structure of the underlying stochastic process. For some statistical analyses, the independence assumption is very important, so it is necessary to check the independence of the variables with an analysis like the variogram.

The third statistical analysis that I did was geographically weighted regression (GWR), in ArcMap. GWR constructs a separate equation for every feature in the dataset, incorporating the dependent and explanatory variables of features falling within the bandwidth of each target feature. In this project, I was able to automatically make linear regression models for all 804 trees in my data set. For my linear model, I chose height as the response variable and crown width as the explanatory variable; GWR requires all variables to be continuous and numerical. One thing that is interesting to me is that GWR uses the spatial information (tree locations) in its statistical equation. This is quite new for me, since all the regression models I had learned before used a regular data set and did not take spatial information (e.g., tree location) into account. Hence, GWR will be very useful for analyzing linear relationships between forest structure variables of tropical forest, which have spatial information attached.

 

  10. Responses to Comments

Most of the comments that I got concerned my dataset, which is not georeferenced yet. I am thinking the same, and I wish I could georeference the data. However, since doing so requires more data (a DEM of the study area), time, and processing through software that I am still learning to use, I cannot georeference the data at this point. Nevertheless, I am working on it, and I hope to have georeferenced data as soon as possible. Once I do, I will be able to correctly interpret the results of the spatial statistical analyses (e.g., the variogram and geographically weighted regression (GWR)).

I also got a comment from Dr. Jones asking me to elaborate my explanation of the results, particularly the suggestion that the study area is probably located on a slope based on the variogram analysis of height, and the interpretation of the roughly 25 m feature in the variogram analysis of crown width. In response, I have tried to revise and elaborate my explanations in this final project. Moreover, I got a comment asking about the precision of crown width measurements in tropical forest based on UAS imagery data. This question could be answered if I had ground truth data to compare with the UAS measurements, which I do not. Otherwise, to get an idea of the precision of UAS measurements compared to ground measurements, we can read papers that assess this topic; the ones I have read suggest that UAS measurements are quite accurate compared to ground and LiDAR measurements.

 

References

Cressie, N. A. C. 1993. Statistics for spatial data. Wiley, New York.

http://faculty.washington.edu/edford/Variogram.pdf

https://en.wikipedia.org/wiki/Variogram

http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-statistics-toolbox/geographically-weighted-regression.htm

Final Project: evaluating interspecific competition between wolves and cougar in northeast Oregon

Filed under: Final Project @ 5:48 pm

Background & Research Question

The research I am conducting is focused on understanding and quantifying the competitive interactions between two apex predators, wolves and cougars. Populations of large carnivores have been expanding across portions of their historical range in North America, and sympatric wolves (Canis lupus) and cougars (Puma concolor) share habitat, home ranges, and prey resources (Kunkel et al. 1999, Husseman et al. 2003, Ruth 2004), suggesting these coexisting predators may be subject to the effects of interspecific competition (interference or exploitative). Interspecific competition among carnivores can affect the spatial distribution, demography, and population dynamics of the weaker predator (Lawton and Hassell 1981, Tilman 1986), but demonstrating asymmetric competitive effects through quantified measures or experiments has remained difficult for highly mobile terrestrial mammals like large carnivores. Generally, cougars are expected to be the subordinate competitor in wolf-cougar interactions, but the frequency and strength of agonistic interactions can be system specific and will determine the overall influence either predator has on the population dynamics of the community (Creel et al. 2001, Kortello et al. 2007, Atwood et al. 2009). My goal is to assess the influence wolf recolonization in northeast Oregon has had on cougar spatial dynamics. For this course, I focused on two analyses associated with questions about wolf influence on the distribution of sites where cougars acquire food (prey) resources and the degree to which wolves influence cougar spatial dynamics. More specifically, I hope to address:

 

How has wolf recolonization and presence influenced the spatial distribution of cougar predation patterns?

 

Data

The data I used to address my question come from field data collection conducted in cooperation with the Oregon Department of Fish and Wildlife and from nationally available GIS data coverages. Location points from global positioning system (GPS) collars deployed on a sample of wolves and cougars were used to identify potential predation sites, and the study area was a 1,992-km2 game management unit (Mt. Emily WMU) in northeast Oregon (Figure 1).

 

Cougar kill sites – identified through cluster analysis of GPS location data (see Anderson and Lindzey 2003 and Knopff et al. 2010) and verified through field investigation. This process was done before (2009 – 2012) and after (2014 – 2016) wolves recolonized the area and produced a dataset of kill site locations for cougar (npre-wolf = 1,213; npost-wolf = 541).

 

Wolf kill sites – identified through cluster analysis of GPS location data (see Sand et al. 2008, Knopff et al. 2010, and DeCesare et al. 2012) and verified through field investigation. This process produced a dataset of kill site locations for wolves (n = 158).

 

Distance to wolf kill sites and activity – Euclidean distance (m) to the nearest wolf kill site; or Euclidean distance (m) to the nearest kernel-density-derived isopleth probability contour edge (25%, 50%, 75%, 95%).

 

Canopy cover – continuous percent (%) canopy closure, with pixel values ranging from 0 to 100 and each value representing the proportion of that 30-m cell covered by tree canopy. Available from the National Land Cover Database (NLCD, https://www.mrlc.gov/finddata.php).

 

Slope – average change in vertical elevation per unit horizontal distance (i.e., steepness), calculated in degrees from a 30-m DEM using Spatial Analyst (digital elevation model (DEM) datasets available from the USGS National Elevation Dataset, https://nationalmap.gov/elevation.html).

 

Figure 1. Location of the Mt. Emily Wildlife Management Unit in northeast Oregon and global positioning system (GPS) locations for cougars monitored to determine kill site distribution and spatial dynamics pre- (2009-2012) and post-wolf recolonization (2014-2016).

Predictions

Based on competition theory and emerging evidence on wolf-cougar interactions in other systems (Alexander et al. 2006, Kortello et al. 2007, Atwood et al. 2009), I expect cougar and wolves in northeastern Oregon to exhibit resource partitioning in foraging niche (resource partitioning hypothesis). Representative evidence for resource partitioning between carnivores could be demonstrated if cougar kill sites occur on the landscape in areas disparate from wolf kill sites. I would also expect cougar to alter their movement and space use relative to pre-wolf recolonization patterns (active avoidance hypothesis). Further, I expect the presence of wolves to affect cougar movement patterns and spatial distribution, which may be evident through a shift in the distribution and space used by cougar between pre- and post-wolf recolonization periods (niche shift [competitive exclusion] hypothesis).

 

Hypothesis: Niche shift (competitive exclusion) – cougar may demonstrate altered foraging niche (kill sites will begin to occur further from wolf activity centers) and overall distributional shifts between time periods with and without wolf presence.

 

Analysis Approaches & Methods

I used ArcMap to visualize data, evaluate cougar kill site distribution for differences between pre- and post-wolf time periods, and extract site-specific features related to kill sites. As part of my approach to answer my spatial problem, I also made use of and explored other statistical programs outside Arc, like the Geospatial Modelling Environment (GME version 0.7.2.0, Beyer 2012) and program R to summarize data (dplyr, MASS, car packages), model relationships (lme4 package), and explore spatial relationships (adehabitatHR, ks, stam packages). The primary tools and approaches I explored were:

 

Ripley’s K – this tool determines spatial clustering or dispersion of features (or values associated with features) over a range of distances and allowed me to statistically measure the overall patterns in kill site data (i.e. distinguish departure from expected random distributions) for both wolves and cougar. I used the ‘multi-distance spatial cluster analysis’ tool in ArcGIS to generate observed/expected K-values, a K-function graph, and confidence envelopes for values. Larger than expected K-values indicate the distribution is more clustered than expected based on a random distribution, and lower than expected K-values indicate the distribution is more dispersed than expected.
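For reference, the same L-function analysis can be reproduced outside ArcGIS. Below is a minimal sketch in R using the spatstat package; the coordinates and study window are random stand-ins, not the actual kill data or workflow.

library(spatstat)
# Stand-in kill-site coordinates (projected, meters) inside a 10 x 10 km window
x <- runif(200, 0, 10000)
y <- runif(200, 0, 10000)
kills <- ppp(x, y, window = owin(c(0, 10000), c(0, 10000)))
# L-function with envelopes from 99 simulations of complete spatial randomness
env <- envelope(kills, Lest, nsim = 99)
plot(env)  # observed L(d) above the envelope indicates clustering at distance d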

Kernel density – This tool calculates the magnitude per unit area for point or polyline features and allowed me to visualize the frequency and distribution of kill sites (points). I used the kde tool in the Geospatial Modeling Environment (GME) to calculate kernel density estimates (KDE), or utilization distributions, for wolf kill sites and for cougar kill sites in each time period. I created density estimates using the PLUGIN bandwidth and a 30-m resolution cell size. Multiple bandwidth estimators are available, as well as input options for scaling, weighted fields, data subsets, and/or edge inflation. The sample of kill sites (and sample of cougar) in the pre-wolf time period was higher (45-48%) than the post-wolf sample of kill sites (and cougar), so it was necessary to standardize the two cougar kill period KDE rasters so the densities, relative to catch (in this case kill events) per sample (cougar), were comparable. I did this with the raster calculator in ArcGIS, dividing each period's kill site KDE raster by that period's respective kills-per-cougar 'effort'.
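The standardization step is simple raster algebra; a sketch with the raster package in R (file names and cougar sample sizes are placeholders, kill counts are the ones reported above) could look like:

library(raster)
# Hypothetical KDE surfaces exported from GME
kde_pre  <- raster("kde_cougar_kills_prewolf.tif")
kde_post <- raster("kde_cougar_kills_postwolf.tif")
# Kills per cougar ('effort') in each period; cougar counts are placeholders
n_cougar_pre  <- 25
n_cougar_post <- 13
effort_pre  <- 1213 / n_cougar_pre
effort_post <- 541 / n_cougar_post
# Divide each density surface by its period's effort so the two are comparable
kde_pre_std  <- kde_pre / effort_pre
kde_post_std <- kde_post / effort_post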

Isopleth probability surfaces –This tool allowed me to relate the proportion of post-wolf cougar kill sites (points) to both the pre-wolf cougar kill site distribution (density raster) and wolf kill site distribution (density raster) using the probability contours (polygon feature) of each density feature. The resulting polygons represent quantiles of interest (programmable) expressed as the proportion of the density surface volume. I specified isopleths for the 25th, 50th, 75th, 95th, and 99th ; 25th and 50th % isopleths are generally thought of as ‘core’ use areas and represent the highest density of the feature of interest, whereby ‘use’ decreases as you move out to 99% contours. I used GME and the ‘isopleth’ command to create polygon probability surfaces. I also used the ‘addarea’ command to calculate and add information to the isopleth shapefiles for each polygons’ area and perimeter, as well as the ‘countpntsinpolys’ command to add a column with the counts of post-wolf cougar kills in each pre-wolf cougar kill or wolf kill isopleth polygon. I also used ‘select by attribute’ and ‘intersect’ features in ArcGIS to subset out summer and winter kill site datasets to explore density feature overlap and seasonal influences on kill distribution.
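For readers without GME, a rough R analogue of the isopleth-and-count step (not the workflow used here) can be assembled from adehabitatHR and sp; the coordinates below are random stand-ins for real kill locations.

library(adehabitatHR)
library(sp)
# Stand-in kill-site coordinates (projected, meters)
kills_pre  <- SpatialPoints(cbind(runif(100, 0, 1e4), runif(100, 0, 1e4)))
kills_post <- SpatialPoints(cbind(runif(50, 0, 1e4), runif(50, 0, 1e4)))
# Kernel utilization distribution and its probability contours
ud    <- kernelUD(kills_pre, grid = 200)
iso50 <- getverticeshr(ud, percent = 50)   # 'core' contour
iso95 <- getverticeshr(ud, percent = 95)
# Count post-wolf kills falling inside the pre-wolf 50% contour
sum(!is.na(over(kills_post, geometry(iso50))))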

Latent selection differences – this tool uses a logistic regression framework for direct comparison of selection between two groups of interest, contrasting the difference through quantifiable measurements of relationship strengths (Czetwertynski 2007, Latham et al. 2013). This analysis allowed the interpretation of changes in site-specific habitat characteristics as the relative difference in selection between two binary groups, not the selection or use of a given habitat. This regression was performed: 1) between wolves (coded as 1) and cougar (coded as 0) to evaluate the relative difference in selection at kill sites, and 2) between cougar pre- (coded as 0) and post-wolf (coded as 1) to evaluate the relative selection differences in cougar kill sites as they relate to periods with and without wolves. Exponentiated coefficients indicate the per-unit increase/decrease for a given habitat or distance variable, such that the effect is calculated as [1-exp(coefx)] * 100 and interpreted as the percentage by which the relative likelihood of selection of variable x by group 1, compared with group 0, increased or decreased.
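In R, this regression reduces to a binomial GLM. A minimal sketch, assuming a data frame kills with a binary group column (1 = wolf or post-wolf, 0 = cougar or pre-wolf) and extracted covariates (all column names here are hypothetical):

# Hypothetical data frame: one row per kill site with extracted covariates
fit <- glm(group ~ canopy + slope + elev + dist_wolf_kill,
           data = kills, family = binomial)
# Relative difference in selection for, e.g., canopy cover:
# [1 - exp(beta)] * 100, as described above
b <- coef(fit)
(1 - exp(b["canopy"])) * 100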

 

Results

Ripley’s K

Cougar (pre- and post-wolf) and wolf kill sites showed significant clustering relative to an expected random distribution (Figure 2, Table 1). I explored multiple distances, and the pattern held for all three kill datasets down to less than 10 m (our hand-held GPS units had error ratings of 3-7 m when recording kill site coordinates in the field). While this analysis provided clear evidence that kill sites for both carnivores exhibited spatial clustering, that is not surprising for animals that exhibit strong territoriality and prey on potentially clumped food resources (herds of ungulates). Thus, the results of this analysis did not help answer my overarching question.

 

Figure 2. Example of the associated L(d) function for pre-wolf cougar kill sites. The red line is the observed spatial point process (SPP), the blue line is the expected SPP, and the gray dashed lines are confidence envelopes calculated from 99 simulated permutations.

 

Table 1. Sample of observed and expected K-values for pre-wolf cougar kill sites generated from Ripley’s K analysis.

 

Kernel density

Visual examination of the cougar pre- and post-wolf kill density rasters (Figure 3) showed what appeared to be a shift in the spatial distribution of the highest density of cougar kill sites, from the northeast to the south part of the unit. Several factors besides the presence of wolves could have produced the observed shift in kill density, such as: 1) catch-per-unit-effort issues (i.e., larger sample sizes of cougars, and thus kill sites, in the pre-wolf dataset), and 2) time effects from seasonal distribution shifts (i.e., prey migration patterns). The visual interpretation was useful in identifying aspects of the analysis that needed to be accounted for in my investigation of my spatial problem. The standardized kill density rasters demonstrated that part of the observed shift was related to catch-per-unit-effort influences, but a shift between the two highest-density areas of kills was still evident from the pre- to post-wolf time period. This suggests other variables, like time effects from seasonal prey distribution changes or wolf presence, could also be influencing the distribution of kill density. Standardization of the two kill density rasters improved visual interpretation of the spatial patterns and accounted for one of the factors (catch-per-unit effort) that might mask distributional shifts related to wolf presence (first two panels of Figure 4). However, this method allowed only an implicit understanding of space-time concepts: I could describe distributional patterns observed through visual interpretation, but the method lacked any inferential measures of significance to quantify the shifts and formally relate the patterns of cougar predation across time.

 

Figure 3. Comparison of pre- and post-wolf cougar kill site kernel density estimates (KDE) before and after standardization for catch-per-unit effort (kills/cougar). Dark red indicates high densities (frequencies) of kill sites, progressing to blue, which indicates low (or no) density of kill sites.

 

Isopleth probability surfaces

Spatial changes in probability – Using the probability contours as a 'prior' distribution of kill sites offered a more robust and meaningful measure to quantify the shift in cougar kill distribution (Figure 4). The shift was evident from the proportional changes in post-wolf cougar kills within the pre-wolf cougar kill probability surface. For example, only 28.5% of all post-wolf cougar kills fell within the 50% pre-wolf cougar kill probability contour (Table 2). Even excluding the kills outside the study area boundary, the proportion of kills in each probability contour was 5-15% lower than would be expected based on the pre-wolf kill site distribution. However, the mechanism behind the shift could still be related to factors other than wolf presence, like seasonal shifts in prey distribution or density.

Figure 4. Comparison of pre- and post-wolf cougar kill site kernel density estimate (KDE) isopleth probability contours showing the 25th, 50th, 75th, 95th, and 99th quantiles. The isopleth contours for the pre-wolf cougar kill site distribution are overlaid with post-wolf cougar kill point locations to demonstrate the distributional shift.

 

Table 2. Pre-wolf cougar kill site probability contour attribute table. Isopleths represent the 25th, 50th, 75th, 95th, and 99th quantiles. 'Out' refers to areas outside the probability contour surface. Number of kills (No. Kills) is the number of post-wolf cougar kill sites, % is the proportion of all kills within each polygon 'donut', and % Kills is the cumulative proportion of all post-wolf cougar kills within each quantile class.

Wolf activity & seasonal influences – Evaluation of season-specific cougar kills relative to wolf kill activity centers added further evidence toward a spatial shift in the distribution of cougar kills between time periods with and without wolves (Figure 5). The magnitude of the repulsive effect appeared stronger in winter than in summer, evident in the uniformly decreased proportions of cougar kills per probability contour of wolf activity and the increased proportion of kills outside any wolf activity between time periods with and without wolves (Table 3). The relationship appeared opposite in summer, with higher proportions of cougar kills observed in the post-wolf period than expected in all but core (25%) and outside (no wolf activity) areas (Table 4). Accounting for seasonal shifts in kill density improved visual interpretation of spatial patterns and reduced another factor that might mask influences of wolf activity. However, the observed shift in winter could still be due to shifts in prey populations as a behaviorally mediated prey response to wolves (i.e., the shift is related to prey responding to wolves, and cougar indirectly shifting distribution to follow prey). The shift could also be related to differences in prey species-specific distributions, as cougar prey predominantly on deer but show seasonal selection of neonate elk calves in the summer, while wolves prey predominantly on elk (i.e., the shift is related to differences in prey species' use of habitat). Unfortunately, there are no prey data at a resolution that would help further tease apart these interactions relative to shifts in prey distribution or density.

 

Figure 5. Visual comparison of seasonal wolf activity based on kill site density probability contours and the distribution of cougar kill sites across time periods with and without wolves.

 

Table 3. Winter wolf kill site probability contour attribute table. Isopleths represent the 25th, 50th, 75th, 95th, and 99th quantiles. 'Out' refers to areas outside the probability contour surface. Number of kills (No. Kills) is the number of cougar kill sites, and % Kills is the cumulative proportion of all cougar kills within each quantile class.

 

Table 4. Summer wolf kill site probability contour attribute table. Isopleths show the 25th, 50th, 75th, 95th, and 99th quantiles for pre-wolf cougar kill density and represent the 'prior' distribution of cougar kills. 'Out' refers to areas outside the probability contour surface. Number of kills (No. Kills) is the number of cougar kill sites, and % Kills is the cumulative proportion of all cougar kills within each quantile class.

Species kill distribution overlap – Wolves consistently allowed more of their core and total kill distributions to overlap with cougar than cougar allowed their kill range to overlap with wolves (Table 5). This was not surprising, but it yielded a helpful visual and a metric for comparing the distribution overlap evident through visual examination (Figure 6).

 

Figure 6. Seasonal overlap in core (50%) and 95% kill KDE probability contour between wolves (red) and cougar (blue). Dark blue and red lines represent the 50% (core) probability contours while light blue and red lines represent the 95% (total) probability contours for kill distributions.

 

Table 5. Species-specific proportional overlap between wolf and cougar core (50%) and 95% kill KDE probability contours. Calculated as overlap area (km2) / cougar or wolf 50% or 95% area (km2) * 100, and related to each species' core and total kill area usage.

 

Latent Selection Differences

Cougar pre-post wolf – Relative habitat use at cougar kill sites after wolf recolonization showed little difference from pre-wolf cougar habitat use in the variables evaluated (Table 6). Cougar selected denser canopy cover (increased by 1.6%) and steeper slopes (increased 2%) at kill sites after wolves recolonized the study area, but showed little difference in selection for higher elevations or greater distances to wolf kill sites (< 1% increase). Conditional density plots for the variables (Figure 7) further demonstrate the ambiguity of the results relative to selection based on the explanatory variables. They also show how the conditional distribution of the dependent variable (pre/post-wolf cougar kill sites) is better determined in the 'pre-wolf' time period, due to the larger sample of kill sites in that period.

 

Wolf-cougar – Relative habitat use at wolf kill sites showed some small differences from cougar habitat use at kill sites in the variables evaluated (Table 7). Wolves selected less dense canopy cover (decreased by 3.6%) and steeper slopes (increased 4.7%) at kill sites compared to cougar, but showed little difference in selection of elevation or distance to wolf kill sites (< 1% increase in elevation and decrease in distance to wolf kill sites). Conditional density plots for the variables (Figure 8) further demonstrate the ambiguity of the results relative to selection based on most of the explanatory variables. The plot for slope did show a distinct transition around 55° between cougar and wolf kill sites. However, similar to the pre- and post-wolf cougar analysis, the plots generally show that the conditional distribution of the dependent variable (wolf or cougar kill sites) is likely better determined for cougar, due to the larger sample of cougar kill sites.

 

Table 6. Latent selection differences between cougar kill sites before and after wolf recolonization. Logistic regression coefficient (β) estimates, exponentiated coefficients and associated 95% confidence intervals, and relative selection.

 

Figure 7. Conditional density plots for cougar pre- and post-wolf kill sites.

 

Table 7. Latent selection differences between wolf and cougar kill sites. Logistic regression coefficient (β) estimates, exponentiated coefficients and associated 95% confidence intervals, and relative selection.

 

Figure 8. Conditional density plots for wolf and cougar (post-wolf) kill sites.

Significance

The results from the analyses performed for this class begin to improve understanding of the spatial aspects of the relationships between these two carnivores. Predation is recognized as a major factor influencing ungulate population dynamics. Predation effects on prey populations are tied to the complexities of intraguild dynamics, as the predation risk for shared prey can vary relative to the nature of predator-predator interactions as well as the behavioral responses of prey to predators (Atwood et al. 2009). In addition to addressing key ecological questions regarding predator-predator interactions, results from this research will provide information on the effect of wolves on cougar populations, and on potential effects of this expanded predator system on elk and mule deer populations. Knowledge gained from this study will be critical to effective management and conservation of cougar in Oregon and could be useful to other parts of western North America facing similar changes in community dynamics as wolves continue to expand their range.

 

Learning Outcomes

Software – I gained increased comfort and familiarity with several programs I have used in the past (GME, R, ArcGIS), and began exploring various other packages in R (stam, ks, adehabitatHR), other programs (QGIS) and programming languages (Python), as well as new tools in ArcGIS to compare and contrast results and expand some 'next steps' in my analyses of wolf-cougar interactions.

 

Spatial statistics – Through exploration of the spatial analyst tool box in ArcGIS I learned a great deal about spatial processes and analysis tools as they relate to different data sources and overarching questions. I found that my event (presence only) data did not lend itself to many of the tools well, which forced me to think about my data structure, question(s) of interest, and how I could (or could not) use certain tools to analyze data and answer my spatial questions about wolf-cougar interactions. It was also extremely valuable to have classmates exploring both similar and dissimilar analyses that I could learn from and apply to my own data explorations.

 

Peer/Instructor Comments – Both my peers in class and Julia were helpful in guiding the progression of these analyses. Some suggestions, like investigating prey-mediated spatial responses, I cannot address with my data. Feedback about investigating kernel range overlap sparked the species kill overlap analysis above, and a next step will be to conduct a similar analysis comparing kill distribution overlap for cougar between time periods with and without wolves.

 

References

Anderson, C. and F. Lindzey. 2003. Estimating cougar predation rates from GPS location clusters. Journal of Wildlife Management 67:307-316.

Atwood et al. 2009. Spatial partitioning of predation risk in a multiple predator-multiple prey system. Journal of Wildlife Management 73:876-884.

Beyer, H. 2012. Geospatial Modelling Environment (software version 0.7.2.0). http://spatialecology.com/gme

Creel et al. 2001. Interspecific competition and the population biology of extinction-prone carnivores. Pages 35-60 in J.J. Gittleman, S.M. Funk, D. Macdonald, and R.K. Wayne (eds). Carnivore Conservation. Cambridge University Press, Cambridge, UK.

Czetwertynski, S.M. 2007. Effects of hunting on the demographics, movement, and habitat selection of American black bears (Ursus americanus). Ph.D. thesis, Department of Renewable Resources, University of Alberta, Edmonton, Canada.

Husseman et al. 2003. Assessing differential prey selection patterns between two sympatric carnivores. Oikos 101:591-601.

Knopff et al. 2010. Cougar kill rate and prey composition in a multi-prey system. Journal of Wildlife Management 74:1435-1447.

Kortello et al. 2007. Interactions between cougar (Puma concolor) and gray wolves (Canis lupus) in Banff National Park, Alberta. Ecoscience 14:214-222.

Kunkel, K.E. and D.H. Pletscher. 1999. Species-specific population dynamics of cervids in a multipredator ecosystem. Journal of Wildlife Management 63:1082-1093.

Latham et al. 2013. Spatial relationships of sympatric wolves (Canis lupus) and coyotes (C. latrans) with woodland caribou (Rangifer tarandus caribou) during calving season in a human-modified boreal landscape. Wildlife Research 40:250-260.

Lawton, J.H. and M.P. Hassell. 1981. Asymmetrical competition in insects. Nature 289:793-795.

Ruth, T.K. 2004. Patterns of resource use among cougars and wolves in northwestern Montana and southeastern British Columbia. Ph.D. dissertation, University of Idaho, Moscow, ID, USA.

Tilman, D. 1986. Resources, competition and the dynamics of plant communities. Pages 51-74 in M.J. Crawley ed. Plant Ecology. Oxford: Blackwell Scientific Publications.

 

Analyzing Hydrologic Patterns in a Single Basin in the Oregon Coast Range

Filed under: Final Project @ 5:13 pm

Introduction

My master's thesis involves quantifying geomorphic change in streams after the addition of a large wood jam for salmon habitat restoration purposes. Large wood has been shown to have many positive effects on salmon habitat, including creating pools and backwater areas that are essential for juvenile salmon survival. This past summer, I surveyed seven stream reaches within a single basin in the Oregon Coast Range to capture the in-stream and floodplain topography before the addition of large wood restoration structures. Three other sites were previously established in 2015, for a total of ten study sites. I plan to survey these sites again next summer to capture changes in topography caused by the interaction of wood, water, and sediment.

A major part of my fieldwork is monitoring water stage height at different locations in the drainage basin to capture the hydrologic patterns that are driving the geomorphic change at each large wood (LW) site, including the frequency and duration of inundation of these log jams. I formulated three questions to help guide my analysis of these data:

Questions

  1. How well correlated are stage gage data at different locations within a single basin, based on the distance between them?
  2. How do upstream-downstream connectivity, drainage area, and position within the catchment affect the relationship between different site hydrographs?
  3. How can I model the physical water surface elevation at different cross sections at my log jam sites?

Data

To conduct my analysis, I used water stage data gathered using Level Loggers at 6 different engineered log jam sites in the Mill Creek watershed. Three Level Loggers were installed in Fall 2015 and three more were installed in Fall 2016. Stage height was recorded at 15-minute intervals at each site. These stage gage instruments are pressure transducers. Therefore, I used a seventh gage that recorded atmospheric pressure to correct the stage height in MATLAB after downloading the data in the field.
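The compensation itself is a unit conversion from net pressure to head of water. The field workflow here used MATLAB, but the idea, sketched in R and assuming 15-minute pressure series in kPa and fresh water (variable names are placeholders), is roughly:

# Hypothetical 15-minute series: p_total from the submerged transducer,
# p_atm from the barometric logger, both in kPa
net_kpa <- p_total - p_atm
stage_m <- net_kpa * 0.10197  # 1 kPa of water pressure ~ 0.10197 m of head (fresh water)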

I also have discharge rating curves for four of these stage gage sites, obtained through repeat discharge measurements taken at four of the stage gage sites. These can be used to calculate the discharge for any given stage height and obtain values that I can incorporate into my model of the physical water surface elevations.
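Stage-discharge rating curves of this kind are commonly fit as a power law, Q = a(h - h0)^b. A hedged sketch in R with nls(), assuming vectors of measured discharge q (m3/s) and stage h (m) and made-up starting values:

# q and h: vectors of paired discharge and stage measurements (not shown)
rating <- nls(q ~ a * (h - h0)^b,
              start = list(a = 5, b = 1.5, h0 = 0.05))
# Predicted discharge for any stage reading
predict(rating, newdata = data.frame(h = 0.692))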

The approximate location of each instrument was recorded using a handheld GPS unit. The streamwise distance between sites varied from 0.63 to 7.51 kilometers (Table 1), calculated using attribute table statistics in ArcGIS. The upstream drainage area of each stage gage site ranged from 4.02 to 21.76 square kilometers (Table 2), calculated using the NetMap Toolbox in ArcGIS.

Three water surface elevation (WSE) rebar were also installed at each log jam site. The top of each WSE rebar was surveyed in with a Total Station during my summer field season. During three subsequent visits to the field, I noted the distance between the top of the rebar and the water surface, marked off in 2-cm increments. These data could later be correlated with the stage gage data to answer my third question.

Hypotheses

Question 1

I expected to see a negative relationship between distance between instruments and correlation coefficient.

Question 2

I expected that sites with more similar drainage areas and closer geographic locations would have more similar hydrograph shapes. I also expected that sites with upstream-downstream connectivity would be more similar.

Question 3

I wanted to be able to use the step-backwater equation or HEC-RAS to model the physical water surface elevations of my sites, based on the physical observations of the WSE rebar and my stage height and discharge data.

Approaches

I worked mostly in Excel and ArcGIS, using ArcGIS to derive basic spatial relationships between my stage gage sites and Excel to calculate statistics from these data.

Question 1

For Question 1, I used Excel to import and organize all of my stage gage data. I identified the time period where I had data for all six gages, then created a table to set up a pairwise correlation coefficient analysis for all combinations of the six gages over that range. I then calculated the correlation coefficient for each pair of stage gage arrays in the table. To measure the streamwise distance between each pair of stage gages, I used a previously created map in ArcGIS with GPS coordinates for the six stage gages and the stream polyline on which the gages were located. I selected all portions of the stream polyline between each pair of stage gage sites and used the Statistics tool in the stream polyline attribute table to calculate the sum of the selected lengths, which I entered into another pairwise Excel table. Finally, I used Excel to create an X column with the distances between the paired gages and a Y column with the correlation coefficients between the paired gages, plotted X vs. Y as a scatter plot, and examined the relationship. (A compact R version of this computation is sketched below.)
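A minimal R sketch of the same pairwise computation, assuming a data frame stages with one column of stage heights per gage (rows are the shared 15-minute timestamps) and a matching pairwise distance matrix dist_m (both hypothetical):

# Hypothetical inputs: stages (data frame, one column per gage) and
# dist_m (6 x 6 streamwise distance matrix in meters, same gage order)
r <- cor(stages, use = "pairwise.complete.obs")
# Take the lower triangle of each matrix and plot correlation against distance
plot(dist_m[lower.tri(dist_m)], r[lower.tri(r)],
     xlab = "Streamwise distance (m)", ylab = "Correlation coefficient")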

Question 2

For Question 2, I used attribute tables in ArcGIS to determine the upstream drainage areas of the four sites I was interested in. I chose to look at hydrograph relationships at four of my sites that all have upstream-downstream connectivity but have different upstream drainage areas and are located on different tributaries. Then I used Excel to further manipulate the stage gage data that I had started to analyze for Question 1. I graphed the difference in stage heights over time between the site with the largest drainage area and each of the three other sites to examine patterns in the hydrographs, creating three "stage difference" graphs. I also graphed the raw hydrograph data for comparison.

Question 3

For Question 3, I scaled down my analysis and created a DEM of a single log jam site in ArcGIS. Then I interpolated an elevation profile perpendicular to each of the three water surface elevation rebar which I imported into Excel. Then I used an Excel spreadsheet that utilizes the step-backwater method to calculate and graph the water surface elevations at the three rebar locations.
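For reference, the core of the step-backwater (standard step) calculation that such a spreadsheet performs can be sketched in R. The rectangular geometry, roughness, and elevations below are made-up placeholders, not survey values:

g <- 9.81   # gravity (m/s^2)
Q <- 2.16   # discharge (m^3/s)
b <- 4.0    # assumed rectangular channel width (m)
n <- 0.04   # assumed Manning's roughness
L <- 25     # distance between cross sections (m)
z1 <- 100.00; y1 <- 0.60   # downstream bed elevation and known depth (m)
z2 <- 100.05               # upstream bed elevation (m)

# Hydraulics for a rectangular section
area  <- function(y) b * y
perim <- function(y) b + 2 * y
sf    <- function(y) (n * Q / (area(y) * (area(y) / perim(y))^(2/3)))^2  # Manning friction slope

# Energy balance: upstream energy = downstream energy + mean friction loss
E1 <- z1 + y1 + Q^2 / (2 * g * area(y1)^2)
f  <- function(y2) {
  E2 <- z2 + y2 + Q^2 / (2 * g * area(y2)^2)
  E2 - (E1 + L * (sf(y1) + sf(y2)) / 2)
}

# Solve on the subcritical branch (above critical depth)
yc <- (Q^2 / (g * b^2))^(1/3)
y2 <- uniroot(f, c(yc, 3))$root
z2 + y2   # upstream water surface elevation (m)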

Results

Question 1

Correlation coefficients ranged from 0.86 (Site 9 vs Mainstem) to 0.97 (Site 26 vs South Fork). Distances between sites ranged from 625 m (Site 36 vs Mainstem) to 7512 m (Site 9 vs South Fork). Five out of the six site pairs with the lowest correlation coefficients included the Mainstem site as part of the pair.

The plot of distance vs. correlation coefficient yielded a slight negative relationship with an R2 of 0.0117, which indicates a very weak relationship (Figure 2).

 

Question 2

All sites of interest have upstream-downstream connectivity with Site 26 (Figure 3). Upstream drainage areas ranged from 4.02 to 21.76 km2 (Table 4) while pairwise distances ranged from 0.63 to 3.76 km (Table 3).

Mainstem (MS) and South Fork (SF) are on the same tributary but have different upstream drainage areas and are different distances from Site 36 (Table 3, Table 4, Figure 3). Both graphs showed little difference from Site 26 during baseflow conditions. The 26 v MS graph was relatively flat, but it still had some high peaks during the two largest floods of the season, which indicates other sources of water are influencing the hydrograph (Figure 4). The 26 v SF graph had high positive peaks during floods (Figure 5).

Mainstem and Site 36 are a similar distance from Site 26 but have different drainage areas and are on different tributaries (Table 3, Table 4, Figure 3). The 26 v. 36 graph looked very similar to the 26 v. SF graph (Figure 5).

Site 36 and South Fork have similar drainage areas but are different distances from Site 26 and also are located on different tributaries (Table 3, Table 4, Figure 3). The two graphs were very similar in shape despite differences in geographic location, though the Site 36 hydrograph was more similar to Site 26 during baseflow conditions (Figure 5).

Question 3

Two of the WSE rebar (WSE 2 & WSE 3) are located very close together, at the upstream end of the site. The third rebar, WSE 1, is located at the downstream end of the site (Figure 6). Cross section numbers correspond with WSE rebar numbers.

The November and February WSE rebar observation dates resulted in similar, higher water surface elevations with lower water surface elevations in December (Table 5).

Stage height ranged from 0.364 to 0.692 m for the three WSE rebar elevation dates, with the lowest stage height in December and the highest in November. Discharge was correlated with stage height, with the lowest flow in December, 0.85 m3/s, and the highest flow in November, 2.16 m3/s (Table 6).

The plots generated from the step-backwater spreadsheet looked relatively similar for Cross Sections 1 and 3, with a lower water surface elevation in December and two very similar, higher water surface elevations in November and February (Figure 7, Figure 8). This corresponds with the patterns I observed in the WSE rebar data and the stage and discharge data (Table 5, Table 6).

Graphing the WSEs at Cross Section 2 yielded different results, with a much larger separation between the November and February WSEs. This might be due to the different topography at Cross Section 2 which does not constrict the water in the same way as it does at Cross Sections 1 & 3. Figure 5 shows the simulated water surface elevation if the water flowed all the way across the cross section into a secondary channel, though in reality it probably only accumulates in the main channel (on the right) because of the obstruction in between the two channels.

Several of the calculated water surface elevation values have large discrepancies (~0.2-0.3 m) which indicate issues with the data (Table 3). I repeated the entire analysis and also tried changing the Manning’s roughness values in the spreadsheet which can have an effect on flow levels, but neither of these methods had any significant impact on the outcome.

Graphing the observed and calculated water surface elevations in comparison with the bed elevation also illustrates inconsistencies with the data. All three observed WSE lines follow the same general shape, increasing gradually and then dropping off. The calculated WSE line for December follows the same pattern, but the lines for November and February have the opposite shape, with an almost flat water surface for most of the reach and a sudden jump up at the last cross section (Figure 6).

Significance

The overall purpose of my research is to explore the response of site-specific features that vary with scale to engineered large wood jams within an individual basin. This kind of research can help refine stream restoration efforts and optimize resource allocation. For this class, I focused on understanding the hydrologic patterns in my watershed. The patterns of discharge and of floodplain and large wood inundation throughout the basin can help clarify what is truly driving the geomorphic changes that we measure in the watershed.

Question 1

When addressing Question 1, I learned that regardless of location in the basin, the water levels are responding relatively uniformly, with only a slight delay between sites that are spaced further apart. Instruments are located on different tributaries so the relationships are slightly more complicated than I expected but I think I will still be able to apply what I have learned from this analysis to extrapolate to other parts of the basin where I do not have stage data.

Question 2

For Question 2, I learned that sites with similar drainage area sizes and geographic locations behave more similarly, and that drainage area has a greater influence on the similarity of hydrographs than distance does. I was also able to identify a portion of the hydrograph that is still missing and attribute it to the water flowing through North Fork Mill Creek, which is not instrumented. It would be interesting to install a LevelLogger on this tributary in the future to see if it completes the picture.

Question 3

The results of Question 3 were not very conclusive. My analysis showed that I do not have enough data points to understand what is really going on. Two of the cross sections (2 & 3) are not spaced very far apart, which could also have contributed to the issues with the data. Despite these issues, this analysis gave me insight into what I need to do in the future to ensure the robustness of my dataset: namely, more widely spaced water surface elevation rebar at each site, to create a more complete water surface elevation profile.

All of my findings reinforce relatively well-known hydrologic concepts. I will be able to provide more interpretation of these patterns after I have gathered my second round of geomorphic data. Then I hope to be able to correlate these hydrologic patterns with the changes we see in local channel morphology around each log jam site. This kind of information will help restoration planners determine the best place in the basin to put a log jam.

Learning Outcomes

At the beginning of the term, I assessed my proficiency with ArcGIS, Python, and R as follows:

ArcGIS: Intermediate level (I spent last term working with it. I can perform most basic manipulations and figure out what I don’t know how to do from the help menus and the internet.)

Python: Beginner level (I have used it in two classes to perform basic calculations and plot data/results.)

R: Very limited experience (I used it a few times in my undergraduate Statistics class but that was 7 years ago so I will probably be very rusty.)

I was excited to expand my skills in Python and R, but the class did not end up providing me with many opportunities to do so. I was encouraged to use software that I knew, and therefore ended up conducting my analyses entirely in ArcGIS and Excel. I did not have the type of spatial data suited to large-scale spatial statistics anyway, so my only exposure to methods such as hotspot analysis, spatial autocorrelation, regression, and multivariate methods was through the tutorial presentations, where I learned how other people had used these approaches. It would have been more educational to apply these concepts to my own data, but unfortunately that was not really an option.

Feedback

Tutorial 1

In Tutorial 1, one of my reviewers suggested that my method for calculating distance between points was not very efficient. It is possible that I could have used Python to automate the process, but teaching myself Python in order to perform this relatively simple, if repetitive, task would probably have taken more time than just doing it the way I did. If I had to perform this manipulation thousands of times instead of fewer than twenty, it might have been worth exploring other options. The same is true for my methods for calculating correlation coefficients.

Tutorial 2

In Tutorial 2, I got a lot of questions about other factors besides drainage area and geographic location, such as elevation, that could also be influencing the hydrograph patterns. One reviewer also suggested that I compare sites with similar drainage areas more closely to see if drainage size really is the biggest determining factor in flow variation under different flood and drought scenarios. These are all really great suggestions that I look forward to exploring in the future.

Exercise 1

I did not get any suggestions for how to improve Exercise 1.

Exercise 2

Based on the comments I got on Exercise 2, I converted my units of drainage area from m2 to km2 which are more reasonable and understandable. I also clarified the naming conventions for my sites. I combined what had initially been three separate graphs into one graph to more efficiently illustrate hydrograph patterns. Finally, I added a discussion of the limitations of my approach.

Exercise 3

Apparently my WSE rebar labeling was not very clear so I improved that for my final write-up.

 

Final Project: Understory Vegetation Response in Ponderosa Pine Forests

Filed under: 2017,Final Project @ 1:41 pm

Research Question

This quarter, I explored the following question: how does understory vegetation percent cover respond to ponderosa pine canopy cover and other environmental variables across Deschutes County? This question is a substitute for my thesis research, which takes place in the Willamette Valley. I substituted Deschutes County for the Valley due to the lack of available data within the Valley, and the availability of ponderosa pine plot data from the USFS’s FIA program in Deschutes.

For the first exercise of this class, I explored autocorrelation within my two main variables. Next, I asked if the relationship between the two variables was geographically influenced. Finally, I looked for the specific influences of each variable upon understory vegetation cover in the regions defined by the second exercise.

Dataset

I used a subset of the Forest Service's massive FIA dataset to replicate what I imagined my own data would look like. The biggest difference between this dataset and my own is its geographic location. Because the FIA plots are gridded, and the Willamette Valley ponderosa pine plantations are so small, there were very few FIA plots in the Willamette Valley entirely composed of ponderosa pine. As a substitute, I used plots from Deschutes County, which has a much higher proportion of pure ponderosa forests than the Willamette Valley does. There were 98 plots in Deschutes County with ponderosa overstories, and 46 within that group with sufficient understory vegetation data. I used plot and subplot data to find the associated location, inventory year, canopy cover, and understory percent cover by growth form (bare soil, forb, grass, shrub, and seedling). I included the environmental variables elevation, slope, and aspect, sourced from a DEM layer, to help explain additional variation within understory cover. I also acquired a soil map and included soil codes and drainage types in the final dataset.

Figure One: Ponderosa Pine plot dataset

Hypotheses

I hypothesized that understory species associated with ponderosa pine forests are adapted to open canopies with abundant light availability; therefore, the greatest percent coverage of all understory growth forms would be present under low-cover canopies. I expected that there would be considerable variance in the understory response that cannot be attributed to canopy conditions, because environmental factors and previous land use will strongly influence the understory composition. However, I included the additional environmental variables in my correlation assessment to help identify the influence that canopy cover does exert over the understory.

Approaches

Part One

For exercise one, I assessed the autocorrelation within the canopy cover and understory vegetation cover. I used the Global Moran's Index value in ArcMap to measure autocorrelation in these two variables. I had originally wanted to compute a variogram in R for this dataset, but I decided against this method because my data are arranged as a low-density band of points rather than a raster or a grid of points. It was also difficult to find an efficient piece of code to compute a variogram from a .csv file in R (a minimal sketch of one possible approach is shown below), so I went with the more user-friendly ArcMap.
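A sketch of that approach using the gstat package, assuming a hypothetical .csv with projected coordinate columns x and y plus the cover value (file and column names are placeholders):

library(gstat)
library(sp)
# Hypothetical file with projected coordinates and a cover column
plots <- read.csv("Full_Points.csv")
coordinates(plots) <- ~ x + y          # promote to SpatialPointsDataFrame
v <- variogram(Total_cvr ~ 1, data = plots)
plot(v)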

Figure Two: Deschutes County plot distribution

The steps to accomplish this analysis were very simple. I imported my .csv file with the FIA plots, their associated latitudes and longitudes, and other values into ArcMap as X,Y data. Then I converted the points to a shapefile. I projected this shapefile into the right UTM zone, so that the tool would work properly. Finally, I ran the Global Moran’s I tool under the Spatial Statistics > Analyzing Patterns toolbox. I performed these steps twice: once for the vegetation cover and once for the canopy cover.
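An equivalent check can be run in R with the spdep package. A sketch, again assuming projected coordinate columns x and y (the neighbor count is an arbitrary choice, not the ArcMap default):

library(spdep)
plots  <- read.csv("Full_Points.csv")   # hypothetical file
coords <- cbind(plots$x, plots$y)       # projected (UTM) coordinates
nb <- knn2nb(knearneigh(coords, k = 8)) # 8-nearest-neighbor graph
lw <- nb2listw(nb, style = "W")         # row-standardized spatial weights
moran.test(plots$Total_cvr, lw)         # global Moran's I with z-score and p-value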

Part Two

For exercise two, I performed a geographically weighted regression on canopy cover and understory vegetation cover to identify plot clusters based on the relationship between the two variables. Using the same shapefile I created in exercise one, I added the plot points to ArcMap. In ArcMap’s GWR tool, I used vegetation cover as the dependent variable, and canopy cover as the explanatory variable. Because my points are somewhat unevenly distributed across the landscape, I used an adaptive kernel with an AICc bandwidth method, which means variable distances are allowed when assessing neighboring points.
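Outside ArcMap, a comparable adaptive-bandwidth GWR can be sketched with the spgwr package in R (the column names match the MLR code below; the file name and coordinates are illustrative):

library(spgwr)
library(sp)
plots <- read.csv("Full_Points.csv")    # hypothetical file
coordinates(plots) <- ~ x + y
# Adaptive bandwidth (proportion of neighbors) chosen by AIC, mirroring ArcMap
q <- gwr.sel(Total_cvr ~ Canopy_Cov, data = plots, adapt = TRUE, method = "aic")
fit <- gwr(Total_cvr ~ Canopy_Cov, data = plots, adapt = q)
fit   # local coefficients live in fit$SDF for mapping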

Part Three

For exercise three, I used R to run three multiple linear regression analyses: two on the clusters identified in exercise two, and one on the entire dataset. To separate the dataset, I selected the clusters based on color intensity in ArcMap. The redder or bluer the points, the more strongly clustered they were. I exported each cluster’s attribute table in ArcMap, then converted the tables to .csv files in Excel. I opened each .csv in RStudio, and used the following code to run an MLR on both clusters and the full dataset:

# Read .csv files.
Full <- read.csv("C:\\Users\\riddella\\Desktop\\R_Workspace\\Full_Points.csv")
Blue <- read.csv("C:\\Users\\riddella\\Desktop\\R_Workspace\\Blue_Points.csv")
Red <- read.csv("C:\\Users\\riddella\\Desktop\\R_Workspace\\Red_Points.csv")

# MLR fits: one multiple linear regression per dataset
Full_Fit <- lm(Total_cvr ~ Canopy_Cov + Elevation + Slope + Aspect + Soil_Code + Soil_Drainage, data = Full)
Blue_Fit <- lm(Total_cvr ~ Canopy_Cov + Elevation + Slope + Aspect + Soil_Code + Soil_Drainage, data = Blue)
Red_Fit <- lm(Total_cvr ~ Canopy_Cov + Elevation + Slope + Aspect + Soil_Code + Soil_Drainage, data = Red)

# Read results: coefficient estimates and p-values
summary(Full_Fit)
summary(Blue_Fit)
summary(Red_Fit)

Results

From exercise one, I eventually found that there was no autocorrelation within either of my primary variables (canopy cover, understory vegetation). During my first trial, I did find autocorrelation within the understory vegetation variable. However, I revisited the source of my dataset, and found that the majority of the zeros that my dataset included were actually “no value” placeholders. I removed these entries from my dataset, and performed the analysis again, finding that there was no autocorrelation in the understory vegetation distribution.

Figure Three: An example of a Global Moran’s Index output

Exercise two revealed that my dataset is clustered, which only became visible after I switched the symbology to the variable coefficient, rather than the standard deviation. The GWR I performed on canopy cover and understory vegetation showed a different relationship between the two variables in two distinct areas. This exercise was the most mentally challenging of the three, because it took me a while to grasp the concept of a geographically weighted regression.

Figure Four: Clustering in GWR output

In exercise three, my results were initially surprising. The cluster MLRs did not reveal any significant influences by the six explanatory variables; all p-values were well above 0.05. However, the full regression did result in low p-values for both the soil type (p = 0.0141) and soil drainage (p = 0.0471) variables, which indicates a relationship between soil characteristics and vegetation cover across Deschutes County ponderosa pine forests at a larger scale.

Significance

Although this type of analysis has been performed countless times in many different forest ecosystems, this approach is significant in teaching forest scientists and land managers how plant communities respond to each other and their surroundings. Understory vegetation growth can be influenced by canopy cover, but canopy cover may not always be the limiting variable. In this dataset, canopy cover did not significantly influence understory vegetation. Nor did slope, aspect, or elevation. Vegetation cover is certainly responsive to environmental variables, but the relationships may be too fine to attribute to a single measurement.

In terms of my own learning, practicing statistical analyses on this placeholder dataset helped me cement concepts I had learned on paper, but that I wasn’t sure how to apply to a research scenario. Statistics can be somewhat of an invisible science, because there isn’t always a visual representation of how the data is being processed or presented. Statistics is still a bit of a weak spot in my skillset, but these exercises helped me gain a sturdier foundation in analyzing and interpreting statistical information within a dataset.

Your Learning

The majority of my exercises were completed in ArcMap, which, out of the programs available to us, I was the most familiar with. However, I was able to use new tools, such as Global Moran’s Index and Geographically Weighted Regression. I was also glad that I was able to revisit R for the final exercise. Although it would have been nice to learn a few tasks in Python or Matlab, this class provided me the opportunity to maintain and strengthen the skills that I developed in past classes, which I don’t get the chance to use very often.

What Did You Learn About Statistics

This class was my first introduction to spatial statistics, so I had a bit of a learning curve for each exercise. Certain concepts that seemed simple enough, e.g. two variables changing in similar ways, became lost in the new terms I was learning (geographically weighted regression =/= autocorrelation!) but I think I managed to get all the terms straight in the end.

One concept that was surprising to me was the inherent neutrality of autocorrelation or clustering. Sometimes autocorrelation spells doom for a dataset, but other times it simply provides an explanation for a phenomenon. The quantification of autocorrelation might even be the entire purpose of a study. For my project in this class, the discovery of clustering in my dataset was surprising, but not undesirable. It simply revealed a previously invisible trait of the plot data I had downloaded.

How Did You Respond to Comments

Julia’s commentary on my exercises and in person helped keep my exercises moving in a fairly linear direction. Her comments were especially valuable as I was assessing exercise two, which only became clear after I correctly displayed the coefficients.

On tutorial one, I received a comment that suggested including climatic variables in my analysis. I did end up including environmental variables in my final analysis to try to explain some of the clustering in the dataset. The other comment on tutorial one recommended both a GWR and multiple linear regression. I ended up using both of these for the last two exercises. In tutorial two, one commenter suggested I explore environmental variables as influencers in understory vegetation cover, which I did end up including in my MLR for exercise three, and the final commenter did not leave constructive suggestions.

June 9, 2017

Final Project Post

Filed under: Final Project @ 9:39 pm

Revised Spatial Problem

During this class I worked on two separate projects.

Project 1: After much exploration of data and problems associated with my original spatial problem, I have slightly revised my questions. I originally wondered if I could obtain information about some of the factors that might influence the presence of three amphibian diseases (ranavirus, Bd, and Bsal) and perhaps create a habitat suitability model. To do this, I was going to need information on the current distributions of the pathogens. I quickly realized most of this baseline information is missing, which makes it hard to build a predictive model. My new approach is therefore to create a model after I collect data on disease prevalence in my area of interest (Deschutes National Forest). I will collect up to 160 samples from the Deschutes National Forest and cross-validate by withholding half of the samples and testing the model built from the remainder. By doing this, I can create a region-specific model of predicted occurrence after samples are collected. Once I have information on pathogen distribution, I would also like to answer some additional questions using spatial statistics. My collection starts this summer, so I had no real data to work with; instead, I used artificial data I created.

Slightly revised questions are:

  • Are ranavirus, Bd, and Bsal present in Deschutes NF, and at what scale?
  • What kind of spatial patterns are associated with pathogen presence, movement, and spread?
    • (Exercise 2; Tutorial 1; I asked a smaller question within this bigger question- “How is Bd presence/absence related to the distance from road or trail?”)

Hypotheses

  • I hypothesize that there will be a greater presence of pathogens at lower elevations because of a longer seasonal activity period due to warmer temperatures.
  • I hypothesize that the more amphibian species present the more likely a pathogen is to be present because of likely transmission between hosts.
  • I hypothesize that the size of the water body will influence detection rates of pathogens because of dilution of the pathogen in bigger water bodies.
  • (Exercise 2; Tutorial 1) I hypothesize that Bd presence will be higher in areas that see higher visitor use because of higher transmission rates.

Project 2: For exercise 3, Dr. Jones allowed me to work on a big component of one of my thesis chapters that I needed to get done which was to determine site suitability for a field experiment. This wasn’t related to stats but required a lot of work in ArcMap. My spatial questions are:

  • Are forest meadows reptile biodiversity hotspots?
  • Does aspect matter for reptiles in forests/meadows?

Hypotheses

  • Meadows are reptile biodiversity hotspots due to warmer temperatures.
  • Aspect matters for reptiles in forests and meadows due to southern-related aspects being warmer.

 

Dataset Description

I worked on 2 different projects, so different data was used for both.

Project 1: My first project was related to the question: “How is chytrid fungus presence/absence related to the distance from road or trail?” I chose this question because humans are known to be one of the primary causes of pathogen transmission. For this project, the following data was used:

  • Deschutes NF boundary (National Forest Service)
  • Water body layer (standing) from USGS- All of Oregon- clipped to Deschutes NF boundary
  • Road layer (National Forest Service)- all of Oregon- clipped to Deschutes NF boundary
  • Bd presence/absence points- created points in Deschutes NF within water bodies

Project 2: The second project I worked on was related to two questions and finding suitable habitat: “Are forest meadows reptile biodiversity hotspots? Does aspect matter for reptiles in forests?” Following data was used:

  • Deschutes NF boundary (National Forest Service)
  • Elevation DEM layer (USGS)- All of Oregon- clipped to Deschutes NF
  • Habitat type layer (National Forest Service- Deschutes only)
  • Aspect layer generated with ArcMap- All of Oregon- clipped to Deschutes NF
  • Roads layer (National Forest Service)- All of Oregon- clipped to Deschutes NF
  • Water layer (standing and flowing) (USGS)- All of Oregon- clipped to Deschutes NF

Approaches

Project 1: I used logistic regression to relate my y variable (chytrid presence/absence) to an x variable (distance from the nearest road or trail), using a Python script and R. Logistic regression is the best choice based on Table 6.1 in exercise 2, because my x variable is continuous (distance to road/trail) and my y variable is binary (present/absent). To run the logistic regression, I first had to prep my data. I wanted to answer the question: is the presence of Bd more likely near a road or trail (where humans are transporting the pathogen)? Therefore, I had to compute the distance from each sample point to the nearest road or trail. I first brought in multiple layers of roads and trails within Deschutes National Forest and used the "Merge" tool to combine them. Next, I used the "Near" tool to generate, for each sample point, the distance to the nearest feature in the merged roads-and-trails layer. Once I had that information, I ran a logistic regression with Bd as my dependent variable and distance from road as my explanatory variable; a sketch of the model call appears below. I also ran OLS and hotspot analysis, just for practice.
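A minimal sketch of that model call in R, assuming a data frame exported from ArcMap with a 0/1 Bd column and the Near tool's distance field (the file and column names are placeholders):

# Hypothetical export from ArcMap: bd = Bd presence/absence (0/1),
# near_dist = distance (m) to the nearest road or trail from the Near tool
samples <- read.csv("bd_points.csv")
fit <- glm(bd ~ near_dist, data = samples, family = binomial)
summary(fit)
exp(coef(fit)["near_dist"])  # odds multiplier per additional meter from a road/trail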

Project 2: I performed site suitability/selection for this study based on the following criteria:
Need to identify 4 sites (had to be 4 to match previous experiment):

  • each site would have 4 treatments
    • (a North and South, open meadow and closed forest)
  • road access (< 300m from road)
  • sites at similar elevations
  • similar forest types
  • similar proximity to non-forest
  • each site has the same proximity to water
  • area would be of similar size
  • created by natural processes, not a clear-cut.

I used many tools to determine site suitability for the field experiment, such as Clip, Aspect, Extract by Attribute, Raster to Point, Near, Extract Values to Points, Reclassify, and Select by Attribute. Overall, my approach was to narrow down possible meadows in the Deschutes National Forest to meet the above criteria. For the entire approach and walkthrough, see Tutorial 2. A simplified sketch of the final attribute-based filtering step is shown below.
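Once candidate meadows are exported from ArcMap with their attributes, several of the criteria above reduce to a table filter. A hypothetical sketch in R — the file “meadows.csv”, every column name, and the 100 m elevation tolerance are invented for illustration:

meadows <- read.csv("meadows.csv")

candidates <- subset(meadows,
                     near_road_m < 300 &                   # road access criterion
                     abs(elev_m - median(elev_m)) < 100 &  # similar elevations (tolerance assumed)
                     origin == "natural")                  # natural meadow, not a clear-cut
head(candidates)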

Results

Project 1: For this project, my results took the form of statistical relationships. Logistic regression fit my data best and was useful because it showed a significant relationship between Bd presence and distance to the nearest road or trail.

Project 2: This project resulted in a map. All of my work resulted in 4 suitable sites that I will use for one of my research projects- determining reptile diversity in meadows. My 4 suitable sites can be seen in Map 1.

For practice, I made artificial data of the kind I would potentially obtain from this field experiment. I practiced using it in RStudio, because logistic regression isn’t available in ArcMap, and most of my data will likely need to be analyzed with logistic regression.

For the data, I coded the meadow plots as 1’s and the forest plots as 0’s. My Y variable was therefore binary, while my X variable, species diversity, was continuous. This format is straightforward to use in a logistic regression.

Results: my artificial dataset had a very small sample size, so it makes sense that the p-value was not significant. Overall, the results matched what I expected from inspecting the artificial data. For future work, it would be interesting to automate this process using Python.

y = f(x)
> glm(type~diversity)
Call:  glm(formula = type ~ diversity)
Coefficients:
(Intercept)    diversity
0.006173     0.098765
Degrees of Freedom: 7 Total (i.e. Null);  6 Residual
Null Deviance:     2
Residual Deviance: 0.4198     AIC: 5.123
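One caveat about the call above: without a family argument, glm() fits a Gaussian (ordinary linear) model rather than a logistic one. For a true logistic regression on the binary type variable, the binomial family has to be specified, along these lines:

# logit link for the binary response; type and diversity as in the call above
fit <- glm(type ~ diversity, family = binomial)
summary(fit)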


Significance

Project 1: Because my data were artificial, this was more of a learning experience. However, if the data were real, this project would have allowed me to determine whether areas closer to roads have higher disease prevalence. This is important because roads are travel corridors for humans, but also for amphibians. Corridors like roads allow easier transmission between areas that may have been geographically isolated in nature.

Project 2: My results showed me areas that met all of my requirements. Forest meadows could be hotspots due to increased solar radiation, warmer temperatures, heterogeneous conditions, increased productivity, and rocky soils. These questions are important because, if forest meadows are reptile habitat, there are strong management implications: managers would need to retain/restore meadows, retain connectivity among meadows, and identify ‘climate-smart’ meadows if aspect is related to diversity.

Learning

I learned a lot about various statistical methods used in spatial studies, and I improved my understanding of methods I had previously used in Stats 511, such as regression. Not only do I understand the theory better, but I understand more about how to actually perform these tests in R and ArcMap. I also improved my understanding of the statistical tools available in ArcMap. In addition, I learned how to perform habitat or site suitability selection, which was a big skill I wanted to gain. My skill and ability in ArcMap/GIS definitely improved, and I learned that importing and using my own data in R wasn’t as scary as I thought it was going to be.

I learned a lot about the requirements needed to run various tests, which was super helpful. I liked having Dr. Jones’s PowerPoint slides that broke tests down in a simple way so you could find the most appropriate test for your data. I learned that hotspot analysis is useful if you are trying to find patterns, such as the locations of high disease prevalence; it would also be useful for trying to predict other areas that may have outbreaks. I did not explore spatial autocorrelation much, because my data and questions were not well suited to that type of analysis. My understanding of regression and multivariate methods improved. I ran OLS and logistic regression tests, which showed interesting results but had different data requirements (binary versus continuous) that I was not too familiar with beforehand.

On Comments

Comments given to me by fellow classmates and Dr. Jones were extremely helpful. For example, on tutorial 1/exercise 2, I had treated one of my variables (distance from road) as binary when it was actually continuous. This changed which tests were appropriate for my dataset, but logistic regression worked. Comments from classmates also gave great insight into my work. For example, on tutorial 2/exercise 3, I performed habitat suitability selection based on various factors, and one of my peers stressed the importance of using the most up-to-date layers available, because factors such as habitat type can change rapidly after a wildfire goes through an area. That was a good insight I had not really considered before.

Final project: Can remotely-sensed sea surface temperature and chlorophyll-a concentration predict the presence of blue whales in New Zealand’s South Taranaki Bight region?

Filed under: Final Project @ 4:59 pm

Research Question

My research focuses on the ecology of blue whales (Balaenoptera musculus brevicauda) in the South Taranaki Bight region (STB) of New Zealand (Figure 1). Despite the recent documentation of a foraging ground in the STB (Torres et al. 2015), blue whale distribution remains poorly understood in the southern hemisphere. The STB is New Zealand’s most industrially active marine region, and the site of active oil and gas extraction and exploration, busy shipping traffic, and proposed seabed mining (Torres 2013). This potential space-use conflict between endangered whales and industry warrants further investigation into the spatial and temporal extent of blue whale habitat in the region. My goal for this term was to begin to investigate the relationship between blue whales and their environment. Specifically, the question I am asking here is:

Can the number of blue whales present in an area be predicted by remotely-sensed sea surface temperature and chlorophyll-a concentration?

Figure 1. A map of New Zealand, with the South Taranaki Bight region (STB) delineated by the black box.

Dataset

For these analyses, I used blue whale sighting data from our 2017 survey of the STB. Between 2 and 21 February, we conducted 1,678 km of survey effort. During this time, we recorded 32 blue whale sighting events which consisted of 68 individual whales.

In addition to the blue whale sighting data from our survey, I downloaded two rasters containing average sea surface temperature (SST) and chlorophyll-a (chl-a) concentration for the month of February 2017 from NASA’s Moderate Resolution Imaging Spectroradiometer (MODIS Aqua) website (https://modis.gsfc.nasa.gov/data/). These layers use a 4 km grid cell size.

Hypotheses

The STB region is characterized by a wind-driven upwelling system which originates off of Kahurangi Point and curves around Farewell Spit and into the bight (Shirtcliffe et al. 1990). Previous work has found that blue whales associate with cold, well-mixed, productive waters (Croll et al. 1998). Additionally, previous work on cetacean habitat modeling has found that animals tend to associate with oceanic fronts and eddies, where water masses of different temperatures are meeting and mixing and appear to aggregate prey (Becker et al. 2010, Cotte et al. 2011). Therefore, I hypothesize that the number of blue whales present within a given spatial area would show a negative relationship with temperature, and a positive relationship with deviation in temperature and chl-a concentration.

Approaches

For my final analysis for this course, I was able to build a model which included both blue whale presence and absence data. First, I loaded the two raster layers in ArcMap and overlaid our research vessel’s trackline as well as the blue whale sighting locations (Figure 2).

Figure 2. Remotely-sensed chlorophyll-a concentration, overlaid with the trackline of our research vessel (black lines) and blue whale sighting locations (red circles).

I obtained remotely-sensed SST and chl-a values for every 4 km grid cell that our research vessel transited through, using the “extract by mask” tool in ArcMap to extract only the portion of the rasters for which we have actual observations. Subsequently, I used the “raster to point” tool to create a layer with a single point value at the center of each cell we transited through (Figure 3). I then used the “near” tool to sum the number of blue whales sighted in each grid cell and attach that value to the points containing the oceanographic information.

Figure 3. The results of the “extract by mask” and “raster to point” operations, which were used to extract just the grid cells from the remotely-sensed chlorophyll-a concentration raster that our research vessel transited through. The points represent an extracted value for each raster grid cell.

Because there may not be a simple linear relationship between blue whale presence and the values for SST and chl-a, as seen in the work I did for tutorials 1 and 2, I wanted to explore what other environmental features could be gleaned from these remotely-sensed raster datasets. To assess the possibility of fronts influencing blue whale distribution through a coarse proxy, I computed the standard deviation of both SST and chl-a for each cell along the ship’s trackline. To do this, I used the “focal statistics” tool within the neighborhood toolbox in ArcMap, which allowed me to calculate the standard deviation for each cell across the surrounding 5×5 cell area. Since each of my grid cells is 4 km on a side, this means I had a value for the standard deviation across a 20 km × 20 km area. The results of this calculation for SST are shown in Figure 4.

Figure 4. A raster layer with the standard deviation of sea surface temperature over the surrounding 20 km × 20 km neighborhood for each grid cell, overlaid with the locations of blue whale sightings.
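The same neighborhood statistic can be reproduced outside ArcMap; a minimal sketch in R with the raster package, assuming the SST layer has been exported as a GeoTIFF (the file name is hypothetical):

library(raster)

sst <- raster("sst_feb2017.tif")   # hypothetical export of the MODIS SST layer

# standard deviation over a 5 x 5 cell moving window, matching Focal Statistics
sst_sd <- focal(sst, w = matrix(1, 5, 5), fun = sd, na.rm = TRUE)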

I then used the “extract values to points” tool to add sd(SST) and sd(chl-a) to the points from the ship’s trackline. The resulting attribute table contained the geographic location, number of whales, SST, chl-a, sd(SST), and sd(chl-a) for every grid cell that our ship transited through. I exported this attribute table as a .csv file so that it could be read into R for further statistical analysis.

Finally, I built a generalized additive model (GAM) to assess the effect of these remotely-sensed oceanographic variables on the number of blue whales present in an area, which can be written as:

Number of blue whales ~ Chl-a + SST + sd(Chl-a) + sd(SST)

I used a GAM rather than a generalized linear model (GLM) because my assessments from previous exercises indicated clearly that the relationship is not linear. I chose a Poisson distribution because my response variable contained many values of zero and one, causing a skew in the distribution.
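A minimal sketch of the model-fitting script in R, using the mgcv package; the file name and the column names (n_whales, chla, sst, sd_chla, sd_sst) are illustrative stand-ins for the fields in the exported attribute table:

library(mgcv)

whales <- read.csv("trackline_cells.csv")   # attribute table exported from ArcMap

# Poisson GAM with a smooth term for each oceanographic predictor
gam_fit <- gam(n_whales ~ s(chla) + s(sst) + s(sd_chla) + s(sd_sst),
               family = poisson, data = whales)
summary(gam_fit)   # per-term chi-square and p-values, plus deviance explained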

Results

The results of the GAM are shown in Table 1. Both SST and sd(SST) were significant predictors of the number of blue whales present. There was no significant relationship between either chl-a or sd(chl-a) and the number of blue whales present. Of the two temperature variables, sd(SST) was a better predictor than just SST.

Table 1. Output from the generalized additive model. Statistically significant values are denoted by an *.

Parameter    DF   Chi-square   p-value
Chl-a        1    0.336        0.5624
SST          1    4.078        0.0434*
sd(Chl-a)    1    0.696        0.4040
sd(SST)      1    6.182        0.0129*

Overall, only 4.32% of the deviance in the data could be explained by the model.

Significance

Overall, the model I built here does not explain much of the pattern that may exist in this dataset. That is not overly surprising to me, as I am working with remotely-sensed data averaged over a period of a month, and I know from personal experience that conditions in this area fluctuate temporally. Moving forward on this project, I will incorporate in situ measurements of SST and fluorescence (a proxy for chl-a concentration), as well as prey density as measured by acoustic backscatter strength. The work I did this term was a valuable exercise in working with these concepts, and I believe that my model will improve with the inclusion of finer-scale predictor variables.

I believe that the results presented here do have biological relevance and follow what is currently established in the literature. Changes in temperature may not have a direct effect on blue whale distribution; however, temperature can be used as a proxy for other oceanographic phenomena, such as fronts, which may be of greater direct significance in defining blue whale habitat.

This habitat work is of particular interest as relatively little is known about blue whale distribution and habitat use in the Southern Hemisphere, including in New Zealand. The high concentration of industrial activity in the STB region increases the importance of knowing when and where blue whales are most likely to be present so that management strategies can be established accordingly.

Statistical Approaches Learned

I started the term off by using a hotspot analysis in exercise 1. For exercise 2, I used a geographically weighted regression (GWR). I think that GWR is a useful and powerful tool; however, I don’t think my dataset is big enough for it to be especially useful, at least in the way I was applying it. For exercise 3, I used a GLM and practiced moving between ArcMap and R. For the final piece of this project, I performed further analyses in both ArcMap and R, ultimately settling on the GAM presented here. There were numerous other trials of various tools that did not make it into any exercises or tutorials, but they were useful practice nonetheless.

One thing I think I could have spent some more time on this term is investigating spatial autocorrelation between my variables of interest, through approaches such as correlograms or variograms. Perhaps this is something I will spend a bit more time on as I move forward with my analysis.

I can certainly say that my abilities and my confidence in my use of both Arc and R have improved over the course of this term. While I was working with a dataset which is minimal compared to what I hope to include moving forward, I think that I am now prepared with several tools which will be useful when I incorporate more data. Finally, I found it tremendously helpful to discuss approaches with and receive feedback from my classmates!

Response to Comments from Previous Tutorials

In my tutorial 1, I used a geographically weighted regression to examine the relationship between chl-a concentration and blue whale group size at each blue whale sighting location. What I found was that the regression model could not fit a linear relationship between the two. Based on my findings and feedback I received, I used a non-linear model for my final analysis. One piece of feedback I received was a question about whether the behavior of the animals may affect their distribution and habitat use. I think that this is an interesting question, but that since I am just using one year of data for this analysis my sample size for different behaviors may not be large enough to draw meaningful conclusions. However, this is something I may pursue when I include more years of survey data in my analysis.

For my tutorial 2, I used a GLM to evaluate chl-a and SST as predictors of blue whale group size at blue whale sighting locations. It was suggested that I use a Poisson distribution in my regression model, which I did for the final analysis. For tutorial 2, I was not yet able to extract all of the absence values and so I was advised that my results would likely be more meaningful with the inclusion of points without whales. I was able to do this for the final analysis presented here, and the significance and interpretability of my results did improve. Additionally, I got a suggestion to use principal component analysis in order to decide which oceanographic variables to retain in my model. I think that this approach will be very useful moving forward once I have processed additional data and have several more variables to include.

References

Becker EA, Forney KA, Ferguson MC, Foley DG, Smith RC, Barlow J, Redfern J V. (2010) Comparing California current cetacean-habitat models developed using in situ and remotely sensed sea surface temperature data. Mar Ecol Prog Ser 413:163–183

Cotte C, d’Ovidio F, Chaigneau A, Lévy M, Taupier-Letage I, Mate B, Guinet C (2011) Scale-dependent interactions of Mediterranean whales with marine dynamics. Limnol Oceanogr 56:219–232

Croll DA, Tershy BR, Hewitt RP, Demer DA, Fiedler PC, Smith SE, Armstrong W, Popp JM, Kiekhefer T, Lopez VR, Urban J, Gendron D (1998) An integrated approach to the foraging ecology of marine birds and mammals. Deep-Sea Res II 45:1353+

Shirtcliffe TGL, Moore MI, Cole AG, Viner AB, Baldwin R, Chapman B (1990) Dynamics of the Cape Farewell upwelling plume, New Zealand. New Zeal J Mar Freshw Res 24:555–568

Torres LG (2013) Evidence for an unrecognised blue whale foraging ground in New Zealand. New Zeal J Mar Freshw Res 47:235–248

Torres LG, Gill PC, Graham B, Steel D, Hamner RM, Baker S, Constantine R, Escobar-Flores P, Sutton P, Bury S, Bott N, Pinkerton M (2015) Population, habitat and prey characteristics of blue whales foraging in the South Taranaki Bight, New Zealand.

 

June 5, 2017

My Spatial Problem Revised

Filed under: 2017,Final Project @ 1:03 pm

1) The Question I asked:

Does site aspect play a significant role in radar backscatter after processing?

2) Description of the Dataset

The backscatter dataset comes from the Sentinel-1 satellite constellation, which is equipped with C-band radar and has global coverage. The return time for a given area is 6–12 days, with a ground resolution on the order of 10 m (after pre-processing steps have been completed). I was looking at a single point in time covering much of western Oregon. I am using plots of land owned by Starker Forests Inc., which can be seen in Figure 1 below. Although information on forest cover was not provided, using these plots significantly reduced the size of the area of interest (for ease of use in ArcMap) and ensured that, at the least, the areas would be even-aged, monoculture stands of Douglas-fir.

Figure 1: Area of Interest with values extracted at points (green).

3) Hypotheses

The null hypothesis I was working with was that there is no significant difference between aspects in average backscatter. This is assumed to be true once the preprocessing steps have been conducted, but my project was to investigate if that is a safe assumption.

4) Approach

The first analysis I used was a test for autocorrelation. This was a frustrating process in ArcMap due to the clunky nature of that program; however, once I exported my data from Arc and brought it into R, the process became much smoother. The autocorrelation step was an investigative move to test whether the data points I would be using were a representative sample.

The next analysis was a comparison of aspect and backscatter values, which involved another grueling battle with Arc that ended with a semi-automated workflow exporting data for later use in R.

The final analysis was to look closer at the (non)relationship between aspect and backscatter using statistical tools in R (namely the Bartlett test and the Kruskal-Wallis test); a sketch of these calls is shown below.
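A minimal sketch of those two tests in R, assuming the extracted point values were exported to a CSV; the file name and the columns gamma0 (backscatter) and aspect (aspect class) are illustrative:

bs <- read.csv("backscatter_points.csv")   # hypothetical export from Arc
bs$aspect <- factor(bs$aspect)             # aspect classes as a grouping factor

bartlett.test(gamma0 ~ aspect, data = bs)      # homogeneity of variances across aspects
kruskal.test(gamma0 ~ aspect, data = bs)       # do the aspect groups share one distribution?
kruskal.test(log(gamma0) ~ aspect, data = bs)  # same test on the log-transformed values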

5) Results

I produced a few maps, but the important results came in the form of graphs and statistical values. The most important was the finding that backscatter showed no relationship with aspect in a Kruskal-Wallis test (Figure 5). Combined with a Bartlett test of homogeneity of variances, I found that the values of gamma-naught (backscatter) did not vary between the different aspects. These two tests show that there is no evidence that backscatter changes with aspect (all else being equal).

Figure 2: Variogram of the data points used in Exercise 1

Figure 3: Scatterplot of the backscatter values according to aspect (clockwise from top-left: NW, N, NE, E, SE, S, SW, W, center: Flat).

Figure 4: Boxplot showing the means and variation of backscatter (y axis) and aspect (x axis).

Figure 5: Results of Kruskal-Wallis test on both Gamma~Aspect and log(Gamma)~Aspect. Both tests resulted in high p-values (p > 0.6), which indicates the values of gamma-naught (backscatter) do not come from different distributions. If they did, that would mean that aspect plays a role in the distribution of backscatter.

6) Significance

The significance of this project is fairly limited to users of Sentinel-1 data who are using the same pre-processing as me. It is mainly a validation of the radiometric and geometric calibration process (an important step nonetheless).

7) What did I learn about programs?

I solidified my disdain for ArcGIS as a tool for statistical analysis on large datasets. I now see Arc as a gateway software that you can pass data through, pull out what you need based on geographic means (or with model builder), and then export the data to use in R or some other program.

I learned how useful R can be in geostatistical analysis, and was exposed to a few new software packages which I will be using in my thesis work.

8) What did I learn about statistics?

I learned a few new statistical tools such as variograms and the Kruskal-Wallis test. I didn’t learn enough to explain them very well to anyone else, but I did learn enough to know what the results from each mean in relation to my data.

9) Response to critique and comments

The responses and comments I received were very helpful. David and Marja both had thoughtful insight on my tutorial 2, and brought up approaches that I did not think to do. Lody and Audrey were also helpful during the first phases (tutorial 1) since they had similar projects to mine and a similar background so we could help guide each other.

The response from Dr. Jones during class was the most helpful though. I would frequently come to a dead end in my processing or my approach, but Dr. Jones always had an idea to change the direction toward an ultimate goal by taking smaller steps. Her comments on Canvas were not as helpful in giving direction.

June 2, 2017

Processing Multispectral Imagery from an Unmanned Aerial Vehicle For Phenotyping Seedlings in Common Garden Boxes: Part 2

Filed under: 2017,Tutorial 2 @ 1:52 pm

 

Consolidating Spectral Variables – PCA

Before any comparisons could be made between spatial and spectral variables in the study, I needed to look a bit closer at each set of variables. Retaining a large number of spectral variables in my models would make them much more complicated, so it would be advantageous to reduce that number. Based on a correlation matrix constructed using the R package ‘corrplot’ (Figure 14), there are several strong correlations between variables. In fact, the red edge band has strong positive correlations with the other 4 Micasense Rededge bands. This clued me in that principal components analysis (PCA) could be used to condense them.

Figure 14: Correlation matrix between mean crown response for 5 spectral bands from the Micasense Rededge sensor (blue, green, red, near IR, red edge) as well as two vegetative spectral indices (NDVI and TGI).
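A minimal sketch of this step in R, assuming the per-seedling crown means were exported to a CSV; the file name and column names are illustrative:

library(corrplot)

crowns <- read.csv("crown_means.csv")   # mean crown response per seedling
spec <- crowns[, c("blue", "green", "red", "nir", "rededge", "ndvi", "tgi")]

M <- cor(spec, use = "complete.obs")    # pairwise Pearson correlations
corrplot(M, method = "ellipse")         # visual matrix like Figure 14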

Though I knew I wanted to use PCA to reduce the number of spectral variables in my dataset, and had some evidence that it could work, it was necessary to decide how many components should be retained. By plotting the eigenvalues in a scree plot using the R package ‘doParallel’ (Figure 15), it became evident that retaining three principal components would be appropriate.

Figure 15: Scree plot for 7 spectral variables. Results indicate that 3 principal components should be retained for principal components analysis, based on the shape of the curve and the eigenvalue = 1 line.

The R package ‘psych’ was used to carry out PCA with 3 components on the 7 spectral variables. Results (Figure 16) indicate that the model accounts for 94% of the total variance in the data. The principal components RC1, RC2, and RC3 are representative mostly of green, infrared, and blue, respectively. In short, the spectral dataset can be consolidated from 7 variables to 3 without losing much precision.

Figure 16: Output for 3 factor principal components analysis to summarize 7 spectral variables. Results show that retaining these three components would explain 94% of the variation from the original model.
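Continuing with the spec data frame from the sketch above, one way to produce the scree plot and the 3-component solution with the ‘psych’ package (again a sketch, not the exact original script):

library(psych)

fa.parallel(spec, fa = "pc")                              # scree plot with eigenvalue = 1 line
pca <- principal(spec, nfactors = 3, rotate = "varimax")  # rotated components RC1-RC3
print(pca$loadings, cutoff = 0.4)                         # loadings and variance explained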

Choosing Spatial Variables – Hot Spot Analysis and GWR

Though several spatial variables had been added to the data, it was unclear what relationship those variables had with the spectral data. For preliminary investigation, an ordinary least squares (OLS) model was built in ArcMap with NDVI as the dependent variable and all 4 spatial variables as explanatory variables. The results of the variable distributions and relationships plot from the OLS output (Figure 17) suggest that box number is the spatial variable most likely to have a significant effect on spectral data. Both box and plot center seem to be slightly skewed in terms of their residual plots, and nearest neighbor appears to be normally distributed.

To further inform this analysis, hot spot analysis (Getis-Ord Gi*) was carried out for the NDVI variable (Figure 18). Results demonstrate clear spatial trends. The most obvious disparity lies between the boxes. There is also some evidence that position within a box has an effect: the healthiest seedlings tend to appear in the centers of boxes, while the least healthy seedlings tend to appear around the edges.

Figure 17: Variable and residual distributions for spatial variables in the study. Note that nearest neighbor (nn), box center, and plot center appear to have randomly distributed residuals, while box # (box_) does not.

Figure 18: Hot Spot analysis of seedlings based on NDVI shows spatial irregularities between and within boxes. This figure serves as a baseline for regressions that follow and suggests that both box # and distance to center of box may be significant covariates.

Using geographically weighted regression (GWR) in ArcMap, I was able to look at how NDVI variability changed with the inclusion of each of my spatial variables as effects. Because the box # seemed to have the greatest effect, I started by modeling it alone as a predictor of NDVI. The result (Figure 19) shows a large visual improvement in the randomness of variability of the data when compared to the hot spot analysis.  Adding box center to the model (Figure 20) only had a marginal effect on the result, though some outliers at the east and west extremes of the site were corrected.

Figure 19: GWR of NDVI by box. Reduction in number of severe outliers in box centers suggests that box effect is likely significant.

Figure 20: Adding distance to box center to model had only a small effect. The distribution of variability in the regressed model seems to be much more uniform and, presumably, closer to the corrected NDVI values without the spatial effects of box number and box center.

Though there is still much work to be done, this workflow sets the foundation for processing multispectral imagery that has, or may have, spatial biases, and proposes ways to evaluate and work through them. Many applications exist for this type of work, but in the case of my study, the hope is to use spatially corrected NDVI values to classify seedlings based on their drought response, which could end up looking very similar to Figure 21.

Figure 21: Potential application for spatially corrected data: cluster and outlier analysis to identify individuals and groups of individuals with unexpected NDVI values compared to predictions based on spatial variables. Seedlings in high clusters (pink) or high outliers (red) could be strong candidates for genetic drought resistance.

June 1, 2017

Processing Multispectral Imagery from an Unmanned Aerial Vehicle For Phenotyping Seedlings in Common Garden Boxes: Part 1

Filed under: Tutorial 1 @ 6:11 pm

Background

A total of three common garden sites were imaged by flying a UAV (Tarot 810 hex-copter) over the boxes at 15 m AGL. All three sites are within the Kaibab National Forest area near Jacob Lake, Arizona, USA (Figure 1). The aircraft carried two sensors on a servo-powered 2-axis gimbal. The first sensor, a Micasense Rededge multispectral imager, simultaneously records images in 5 discrete spectral bands, as seen in Figure 2. The second sensor, a FLIR Tau 2 longwave infrared thermal camera, records images in the 7.5–13.5 µm spectral band. Sensors were programmed to trigger so that images would achieve 70% overlap and sidelap. Because of the similarities between the three locations, this tutorial focuses on only one of the sites, named “Little Mountain”. For this site, one flight resulted in 540 multispectral images (108 per band) and 125 thermal infrared images that were used for the processing and analyses that follow.

Figure 1: The approximate location of the common garden boxes, about 70 miles north of the Grand Canyon’s North Rim in Northern Arizona, USA.

Figure 2: The designations and spectral characteristics of bands associated with Micasense Rededge sensor

Orthomosaic Reconstruction

Due to the large number of raw images, the processing workflow could be streamlined by consolidating them into orthomosaic raster files using Agisoft Photoscan. Conveniently, the entire workflow in the program is located under the ‘Workflow’ tab on the top/center of the screen. This includes the first step, using either the ‘Add Photos’ or ‘Add Folder’ button to load in the raw imagery. Once the photos have been added, the program will give a prompt and the user should select ‘Create Cameras From Images’ for most datasets. In the case of multispectral data, however, select ‘Create Multispectral Cameras From Images’. This feature allows for simultaneous processing of all 5 bands from the multispectral sensor. Now, you can verify the sensor specification information by clicking on ‘Camera Calibration’ under the ‘Tools’ tab. Make sure that the camera type, pixel size, and focal length inputs are correct. This will save extra steps later.

Next, select ‘Align Photos’ under the ‘Workflow’ tab. This step is quite computationally demanding, so prepare for long processing times. Also, as the number of photos in the reconstruction increases, the processing demands increase exponentially. To test the adequacy of your imagery for coverage, adjust the accuracy to a lower level and check if all the photos are registered in the alignment. If some of the photos are missing, it is likely due to insufficient overlap and/or sidelap. For a very high-quality reconstruction, set inputs to match those in Figure 3. If the highest possible resolution is required, increase key point limit to 200,000 and increase tie point limit to 40,000.

Figure 3: Input parameters for very high resolution initial reconstruction. To lessen computational demands, lower accuracy levels can be used. For highest resolution, increase key point limit to 200,000 and tie point limit to 40,000.

Once the initial alignment is complete, use the ‘Optimize Cameras’ button on the Reference Toolbar to increase the accuracy of the camera positions. Use the three-axis orb to move your reconstruction around in the viewing window, and toggle camera visibility with the ‘Show Cameras’ button on the Main Toolbar. The next step in processing is to densify the point cloud using ‘Build Dense Cloud’ under the ‘Workflow’ tab. Use the inputs in Figure 4 for the highest possible quality. The computing requirements for this step are less than for the initial reconstruction, so reducing the quality will not have a great impact on processing time.

Figure 4: Input parameters for building dense point cloud based on sparse cloud from initial photo alignment. Lowering quality level in this step will not result in much change in the processing time required.

Next, use ‘Build Mesh’ (Figure 5) to make a mesh surface from the dense cloud. Increasing the face count field is necessary when creating detailed surface models, but in this case it is not. Finally, use ‘Build Orthomosaic’ (Figure 6) to create the raster for export. Use ‘Top XY’ in the projection plane field to generate a nadir perspective image. To export the file, follow the path shown in Figure 7. Use TIFF format for the easiest transition into ArcGIS and the buffer creation steps that follow. The finished orthoimage can be seen in Figure 8.

Figure 5: Inputs for building mesh from dense cloud. Increase face count to generate more detailed mesh surface when planning to create surface models from data.

Figure 6: Input parameters for building orthomosaic image from mesh surface. This configuration will produce a nadir perspective orthophoto due to the ‘Planar > Top XY’ projection.

Figure 7: Location for exporting the orthomosaic in image file formats. Exporting as TIFF will allow for raster viewing and manipulation in ArcMap in subsequent steps.

Figure 8: The final product from image pre-processing:  an orthomosaic image of the area of interest. These common garden boxes are in the Kaibab National Forest, Arizona, USA.

Buffer Creation

Due to dense spacing within the boxes, a protocol was needed to identify where the plant centers were located and, ultimately, which pixels from the raster could be assigned to each seedling as part of its crown area. The first step toward accomplishing this was also the most tedious: making three new shapefiles, one with points at the center of each seedling, another with points at the center of each box, and one with a single point at the center of the plot (Figure 9). These shapefiles are used to calculate the spatial variables for the dataset displayed in Figure 10: distance to nearest neighbor (nn), distance to box center (bc), and distance to plot center.

Figure 9: Manually drawn points representing plant centers (yellow), box centers (red) and the plot center (green).  Note that the upper left box was omitted from analyses due to blurring effects in the reconstructed orthoimage.


Figure 10 (A-C): Visual representations of the spatial variables produced during buffer creation workflow. Buffer circles have been scaled by size according to distance to nearest neighbor (nn), center of box (bc), and center of plot (pc).

Circular buffer zones were drawn around plant centers using the formula R = nn/2, where R is the radius of the buffer and nn is the distance to the nearest neighboring point. This method was chosen over an arbitrary uniform buffer dimension (e.g., a 2 cm radius) due to its objectivity and repeatability. The new buffer layer represents the extent of the seedling crown areas, but still lacking is a filter to exclude non-vegetation pixels. The solution to this problem is in the next section.

Vegetation Masking

To ensure that my spectral analysis would only include values from ‘vegetation’ pixels, a mask layer was needed. Making this layer was relatively easy thanks to the normalized difference vegetation index (NDVI). An NDVI layer was added to the dataset by performing raster algebra on the rasters from bands 3 (red) and 4 (near infrared) from the Micasense sensor according to the formula NDVI = (NIR − Red) / (NIR + Red). One unique and particularly useful quality of NDVI imagery is the great contrast between vegetation and non-vegetation features (Figure 11). By performing an unsupervised natural breaks (Jenks) classification, I split the newly created raster into two classes, exported only the pixels classified as vegetation, and ‘Voila!’, the vegetation mask layer was born (Figure 12).

Figure 11: One common garden box viewed using the normalized difference vegetation index (NDVI) to distinguish vegetation features from non-vegetation.

Figure 12: Vegetation mask layer (yellow) created from natural breaks (Jenks) classification of NDVI layer into 2 classes. Pixels classified as ‘vegetation’ based on the classification were exported into a new ‘mask’ layer.
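The NDVI algebra and the two-class split can also be sketched in R; the band file names are hypothetical, and the classInt package stands in for ArcMap’s Jenks classifier:

library(raster)
library(classInt)

red  <- raster("band3_red.tif")    # Micasense band 3 (red)
nir  <- raster("band4_nir.tif")    # Micasense band 4 (near infrared)
ndvi <- (nir - red) / (nir + red)  # raster algebra, matching the formula above

# two-class natural breaks (Jenks); a random sample of pixel values keeps it fast
v   <- na.omit(sampleRandom(ndvi, 10000))
brk <- classIntervals(v, n = 2, style = "jenks")$brks
veg_mask <- ndvi >= brk[2]         # pixels above the break are 'vegetation'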

Once the mask layer was created, all of the necessary inputs were in place to transform the dataset using my R-code (shown in part 3). The code uses the scaled buffer zones and vegetation mask layers to include only vegetation pixels for each plant and take an average for each of the spectral variables from within the crown area. Also, the spatial variables for each seedling are included. As a result, the new attribute table contains 11 variables (7 spectral and 4 spatial) for all 1697 seedlings. It is good to note, however, that if spatial variables were in fact independent of spectral response then it would be reasonable to classify seedlings based on spectral response alone (Figure 13). Unfortunately, the spatial variables do seem to make a difference, setting the stage for Part 2 of this tutorial.

Figure 13: Example analysis application for data resulting from part 1 of this tutorial: Individual seedling Isocluster unsupervised classification by standard deviation based on mean crown NDVI. Similar analyses could be conducted using other spectral variables such as TGI, but more complex methods are required to investigate the relationship between spatial variables and spectral response.

 

May 26, 2017

A space for time substitution on an impacted landscape

Filed under: Tutorial 2 @ 1:49 pm

Question:

Has the total daily energy flow through a functional group at a given elevation changed over the eighty-year interval between 1931 and 2011?

Background:

I am hoping to better understand the forces structuring small mammal communities, specifically rodents, across space and through time. In this regard, I am using trap-effort-standardized abundance data to evaluate the response of species and functional groups to changes in anthropogenic land use practices and climate warming. These data are spatially explicit, and although their spatial coordinates reflect the patterns of trapping, their elevational distribution captures patterns that are reflective of biological responses to a changing environment. Increasing elevation is inversely correlated with summer high temperatures and collinear with winter low temperatures, and positively correlated with increasing precipitation. There is also a trend of decreasing human impact from land use with increasing elevation, and in the Toiyabe mountain range, human land use for cattle grazing and agriculture has also decreased over time. However, climate warming in the Great Basin has continued over the same time span and has shifted climatic bands upslope.

Species and functional groups respond to these changes over time in dynamic ways, which may be recapitulated in their distribution along elevation/climate gradients. This is referred to as the space-for-time hypothesis, which generally states that space capturing the temporal gradient of interest may be used to stand in for time, with the expectation that species distributions across space reflect their response to the variable of interest through time.

I am interested in the functional groups captured by diet preference, habitat affinity, and body size. Diet preference classes include herbivores, granivores, and omnivores, habitat affinity includes xeric adapted, mesic adapted, and generalist, and body size includes four size bins, 0-30g (size class 1), 40-80g (size class 2), 100-200g (size class 3), and 250-600g (size class 4). Any given species may be any combination of these three functional classes, although not all combinations are actually represented. Thus, patterns in the distribution of these functional groups also reflect tradeoffs that species face, and help to highlight species that may be at risk when functional categorizations become decoupled from environmental conditions. Thus, changes in the elevational distribution of functional groups over time provides key insights to species responses. However, total daily energy flow can also be estimated for these groups, because energy flow through a group is a function of both body size and abundance. This helps us to understand the interactions between abundance, body size, diet, and habitat affinity.

Thus, the question becomes: has the total daily energy flow through a functional group at a given elevation changed over the eighty years between the modern and historic surveys; and based on the cross-correlations between body size, diet, and habitat affinity, are there species or groups that appear especially at risk of local extirpation?

Analytical approach and brief description of steps followed to complete the analysis:

To perform this analysis, I used R to trap-effort standardize abundance data for rodent species in the Toiyabe mountain range for both the historical and modern data sets. The function in R is an iterative process that resamples all trap sites within the historic and modern data sets that have a trap effort greater than the site with the lowest number of trap nights. This process is automated in R and generates an average species occurrence from 1,000 iterations. I then used Excel to calculate the daily energy flow through each population based on body mass estimates and an equation that employs the principles of allometric scaling, E_i = a * N_i * M_i^b, where E_i is energy flow (kJ/day), a and b are allometric parameters (b is the scaling exponent, equal to 0.75), N_i is the abundance estimate (here, the trap-effort-standardized abundance), and M_i is the estimated body mass (Terry and Rowe 2015). A sketch of this calculation is shown below.
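A minimal sketch of the energy-flow step in R instead of Excel, assuming a table of standardized abundances; the file name, the column names (N, M, diet_group, elev_bin), and the value of the constant a are all placeholders:

comm <- read.csv("standardized_abundance.csv")   # one row per species x site
a <- 1      # placeholder normalization constant (not the published value)
b <- 0.75   # allometric scaling exponent, per Terry and Rowe (2015)

comm$E <- a * comm$N * comm$M^b                  # E_i = a * N_i * M_i^b, kJ/day

# total daily energy flow per functional group and elevation bin
aggregate(E ~ diet_group + elev_bin, data = comm, FUN = sum)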

All trap localities were paired for the historic and modern time periods, and trap site elevations were binned into three elevation ranges: below 5,500 ft, 5,500–6,500 ft, and above 6,500 ft. Energy flow for each functional group was calculated and plotted based on these elevation bins.

Brief description of results you obtained:

Total energy flow per day through the modern rodent community in the Toiyabe mountain range is lower than in the historic time period (Figure 1). This finding is consistent with the finding by Terry and Rowe (2015) that total energy flow through the rodent community in the Great Basin decreases sharply at the Holocene–Anthropocene boundary.

Using climate–elevation correlations, we can substitute elevation for time in the Holocene to model the warming climate from the early to late Holocene in the Great Basin. Temperatures increase with decreasing elevation, and precipitation decreases with decreasing elevation. Historically, energy flow through xeric-adapted species decreases with increasing elevation, while the opposite is true for mesic-adapted species. Notably, energy flow through the historic mesic-adapted species increases in a linear fashion with elevation, while in the modern data energy flow through this group roughly plateaus at the middle elevation (Figure 2). While both modern and historical patterns are generally consistent with Holocene trends for these functional groups, neither captures the distinctive decline in energy flow through the xeric-adapted group during the modern period. This suggests that space-for-time comparisons may be valid for pre-human-impact studies, but that they may not capture abrupt shifts in species responses during the Anthropocene.

Small-bodied, omnivorous habitat generalists demonstrate the greatest energy flow at middle elevations in both the historical and modern time periods; however, total energy flow through these groups is markedly decreased in the modern compared to the historic time period at these elevations. While generalists demonstrate lower energy flow at the middle and high elevations in the modern compared to the historic period, their low-elevation values remain unchanged (Figure 2). This is consistent with omnivorous habitat generalists through the Holocene and into the Anthropocene (Terry and Rowe 2015).

Finally, herbivores demonstrate a nearly four-fold decrease in energy flow at the lowest elevations over the eighty-year period between the historic and modern time periods (Figure 2). This is the only result that seems to strongly reflect the abrupt shift from the Holocene to the Anthropocene. It is a particularly surprising result, as human land use was greater historically for this mountain range than it is today, and ground cover by herbaceous plants and grasses has increased over this time period. This may suggest that increased temperatures at lower elevations have pushed herbivorous species upslope, indicating that these species may be more sensitive to climate. Conversely, energy flow through granivores at low elevations increased from the historic to the modern period, also consistent with Terry and Rowe (2015). This, however, may be expected, as granivorous rodents in this region also tend to be xeric adapted, while herbivorous rodents tend to be mesic adapted.

While inference based on this study is limited in scope due to the sample size of one (one mountain range), it does begin to make the case that space-for-time comparisons should not be assumed to hold as we move into a future where landscapes and climate are increasingly affected by human activity. If we assume they do hold, we may fail to capture the subtle ways in which certain ecological patterns are becoming decoupled, and thus must be wary of drawing spurious conclusions based on old assumptions.

Figure 1. Total energy flow (kJ/day) for each elevational bin in the historic and modern time periods.

Figure 2. From left to right; Energy flow (kJ/day) through functional groups in the historic and modern time periods along the elevation gradient. From top to bottom, functional classifications: size class, diet preference, and habitat affinity.

 

Critique of the method – what was useful, what was not?

Given the nature of the data, this method of analysis was useful for me. I learned something about the data that was not obvious before my analysis, or before comparison with the research at the greater temporal scale captured by Terry and Rowe (2015). While my inferences are limited, increasing the number of mountain ranges on which I perform this analysis will enable a number of more informative statistical analyses, and will let me test the spatial gradient at both the elevational and latitudinal scales, as the three mountain ranges for which similar data exist occur in a north-to-south array.

References:

Terry, R.C. and R.J. Rowe. 2015. Energy flow and functional compensation in Great Basin small mammals under natural and anthropogenic environmental change. PNAS, 112(31):9656-9661.

May 24, 2017

Tutorial 2: Identifying clustering with a geographically weighted regression

Question You Asked

How much spatial clustering is present in the regression model of vegetation response to canopy cover? I am interested in determining if a single equation can predict the way that the two variables interact within east-side ponderosa pine forests, or if multiple equations are necessary.

Name of Tool or Approach You Used

To answer this question, I used the Geographically Weighted Regression tool in ArcMap.

Brief description of steps you followed to complete the analysis

In the GWR tool, I used vegetation cover as the dependent variable, and canopy cover as the explanatory variable. Because my points are somewhat clustered across the landscape, I used an adaptive kernel with an AICc bandwidth method.

Brief description of results you obtained

The map of coefficients did reveal certain clusters of high coefficients and low coefficients across the landscape. However, judging by the GWR table below, this clustering may not be statistically significant. One anomaly of this assessment was the negative adjusted R2 value; an adjusted R2 below zero indicates that the model explains less variation than would be expected by chance once the number of model terms is penalized.


Map of clustered plots

VARNAME VARIABLE DEFINITION
Neighbors 46
ResidualSquares 14750.95231
EffectiveNumber 3.80525057
Sigma 18.69738296
AICc 405.3014401
R2 0.048524493
R2Adjusted -0.01473284
Dependent Field 0 Total_cvr
Explanatory Field 1 Canopy_Cov

Table of original GWR results

To remedy the negative adjusted R2, I tried adding in another explanatory variable (elevation). This appeared to help the model, reducing the residual squares and bringing the adjusted R2 value back above 0.

VARNAME VARIABLE DEFINITION
Neighbors 46
ResidualSquares 13904.07263
EffectiveNumber 5.22764912
Sigma 18.46665082
AICc 405.8795404
R2 0.103150476
R2Adjusted 0.01015694
Dependent Field 0 Total_cvr
Explanatory Field 1 Canopy_Cov
Explanatory Field 2 Elevation

Table of remedied GWR

Critique of the method – what was useful, what was not?

This method was useful in that I could process the data in ArcMap, which is where I was already analyzing my points. It was also very helpful to visualize the coefficients in the map below. However, I am still a little unsure why the coefficient map shows such strong clustering while the output table does not show any significance.
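As a possible cross-check outside ArcMap (not part of my original workflow), the same regression can be sketched in R with the spgwr package; the file name and the x/y coordinate fields are assumptions, while the field names match the tables above:

library(sp)
library(spgwr)

pts <- read.csv("plots.csv")   # hypothetical export of the plot data with x/y fields
coordinates(pts) <- ~ x + y    # promote to a SpatialPointsDataFrame

# adaptive bandwidth chosen by AIC, mirroring the adaptive kernel / AICc setting
bw  <- gwr.sel(Total_cvr ~ Canopy_Cov + Elevation, data = pts,
               adapt = TRUE, method = "aic")
fit <- gwr(Total_cvr ~ Canopy_Cov + Elevation, data = pts, adapt = bw)
fit   # prints coefficient ranges and a quasi-global R2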

Automating Feature creation of variables for explaining patterns of Socio-demographic and spatial variables of survey responses regarding flood-safety in neighborhoods

Filed under: Tutorial 2 @ 12:04 pm
  1. Question asked

As reported, I am handling more than 100 variables, and my data are constantly being updated. So, I needed a tool to create independent features for analyzing each of my variables. The question is: how can I automate feature creation of variables for explaining patterns of socio-demographic and spatial variables of survey responses regarding flood safety in neighborhoods?

  2. Name of the tool or approach that was used.

I used ArcPy with Python for the automation and, ArcMap for output visualization. ArcPy is a site package that builds on (and is a successor to) the successful arcgisscripting module.

  3. A brief description of steps you followed to complete the analysis.

First, we have our input feature as shown in Figure 1. Notice the attribute table containing all our variables.

Next, we take advantage of the Model Builder in ArcMap in order to visualize our model using the tool “Feature Class to Feature Class”, as shown in Figure 2.

 

In addition, we export the resulting python code. The exported code is presented in Figure 3.

 

This python code gives us an idea of how the “Feature Class to Feature Class” tool is working. So we proceed to create our own function file as depicted in Figure 4.

Within our function file, and based on the previously exported code, we build our own function, which is presented in Figure 5.

Once we have our function within our function file, we can now execute our function creating a script file. The content of our created script file is shown in Figure 6.

  4. A brief description of results you obtained.

The result of running our script file is shown in Figure 7. Here, we first read the variables of our input feature and, in the last row, the output field of the new feature is listed.

Finally, the newly created feature is seen in Figure 8. Here we can also see its corresponding attribute table. Notice that the created feature contains only the selected field.

  5. Critique of the method – what was useful, what was not?

  • Model Builder:

This method is useful for building and visualizing your model, and it can also export a python script of the model.

  • ArcPy:

ArcPy is a very useful package that builds on the successful ArcGIS scripting module. It provides a productive way to perform geographic data analysis, data conversion, data management, and map automation with Python. As a result, many variables can be analyzed without needing to open ArcMap, saving an enormous amount of time.

References

ArcGIS (2017). What is ArcPy. http://desktop.arcgis.com/en/arcmap/10.3/analyze/arcpy/what-is-arcpy-.htm

Tutorial 2:

Filed under: 2017,Tutorial 2 @ 11:59 am

Research Question

Building upon my question from exercise 2, my research question for exercise 3 asks how the covariance between SCF and EVI within a wildfire perimeter changes over time following the wildfire.

Approach/tools used

For this exercise, I utilized a variety of software programs. While the overall goal of variogram creation required using R, ArcMap and Excel were used significantly in order to process the data into a usable format. As a result, much hopping between programs was required, which was undoubtedly quite tedious/frustrating at times but ultimately proved to be a useful learning experience.

Analysis procedure

To create a cross-variogram in R using the gstat package, the process is much smoother with data in table format. As a result, one of the main challenges of creating a cross-variogram was converting my raster data into usable tabular data. To accomplish this, I first converted the SCF and EVI rasters into point shapefiles, then used the “Add XY Coordinates” tool in ArcMap to add two fields containing the X- and Y-coordinates to the point shapefiles. Next, the attribute table from each point shapefile was converted to a CSV table. The final step before stepping back into R was to combine all EVI and SCF data for each year into a single CSV table, to ease the script writing in R.

After creating the combined EVI-SCF table, I noticed that a few of the columns had different numbers of records. It turns out that I used a slightly different extraction method for a few SCF layers – a product of conducting this analysis across multiple days and not taking diligent notes on the layers used. After correcting this issue, I read the cleaned table into R and created cross-variograms using a code snippet along the lines of the sketch below.
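A minimal sketch of that snippet with the gstat package, assuming one combined CSV per year; the file name and the column names (POINT_X/POINT_Y from the Add XY Coordinates tool, scf, evi) are illustrative:

library(sp)
library(gstat)

d <- read.csv("scf_evi_2007.csv")       # combined table for one year (name illustrative)
coordinates(d) <- ~ POINT_X + POINT_Y   # promote to a spatial object

# register both variables, then compute direct and cross variograms together
g <- gstat(id = "scf", formula = scf ~ 1, data = d)
g <- gstat(g, id = "evi", formula = evi ~ 1, data = d)
v <- variogram(g)
plot(v)   # the scf.evi panel is the cross-variogram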

Results

My hypothesis for the years following the wildfire was to see the strongest covariance in the first year, then decreasing overall covariance in each year afterwards. However, the plots below tell a different story. There are a lot of things going on in the time-series of cross-variograms, but a couple of evident trends stick out to me. First, although each year shows two clear bumps in the cross-variogram, the lag distance of those bumps shift significantly from year to year. My guess would be that the initial shift from pre-fire to the first year post-fire was primarily due to the variation in burn severity across the landscape. The shifts in ensuing years may be due to varying rates of vegetation recovery or perhaps due to changes in snowpack distribution caused by the wildfire.

Another fascinating pattern is the change in sign of the covariance. Before the wildfire, the covariance is mostly negative except for a few thousand meters of lag distances. This suggests that between most pixels within the burn area, SCF and EVI vary negatively – if snowpack increases between two points, vegetation generally decreases between those two points. Following the wildfire, however, the covariance becomes more and more positive overall. By year 3 following the wildfire, at all distances the covariances are positive, but still showing a similar bumpiness as in the pre-burn cross-variogram. This outcome was a bit baffling, as I would expect the relationship to be strongest (most positive) in the first year post-fire, with a gradual retreat to the pre-fire state.

Another pattern of significance is the consistent cross-variogram valley that occurs at about 5000 meters in each year, even though the peaks on either side shift from year to year. This can be interpreted as the between-patch distance, and is perhaps the more important result of this method – that from year to year, even before the wildfire, the landscape shows similar distances between patches.

Figure 1: Pre-fire (2005) cross variogram between SCF and EVI

                     Figure 2: First year post-fire (2007) cross variogram between SCF and EVI

Figure 3: Second year post-fire (2008) cross variogram between SCF and EVI

Figure 4: Third year post-fire (2009) cross variogram between SCF and EVI

Method critique

By creating cross-variograms for each year following a wildfire and the pre-fire year, I was able to discern the patchiness of the EVI and SCF data as well as the shifts in the snowpack-vegetation relationship. The persistence of the bumpiness across years strengthens the evidence of two patch sizes, but the shift in bump locations results in a cloudier picture of exact patch size.

One disadvantage of this approach was the inability to determine the strength of the SCF-EVI relationship within a given year’s cross-variogram. After researching the interpretation of covariance, it became clear that covariance values can only tell you the sign of a relationship; they cannot lend any insight into its magnitude because the data are not standardized. However, because I generated covariograms across multiple years, I was able to glean information about the relative shifts in covariance, which is ultimately the type of information I’m most interested in producing.

For the final project, I plan on generating additional covariograms for the next couple of years in order to observe how the covariance continues to change. I’m interested to see at what point the covariance plot starts to resemble the pre-fire condition.

Tutorial 2: Nitrate Data in Relationship with Red Alder cover and Land Use

Filed under: 2017,Tutorial 2 @ 11:43 am

 

  1. Research Question 

Do the nitrate values that I collected in the field have a relationship with the percent of Red Alder cover in each watershed?

                Sub question: Do the nitrate values have a relationship with land use activities in each watershed?

  2. Name of the tool or approach that you used.

For this analysis I utilized ArcGIS and Microsoft Excel.

  3. Brief description of steps you followed to complete the analysis.

First I utilized an ArcGIS package called NetMap to delineate the 10 watersheds. This package uses GPS points to draw watershed boundaries based on elevation and flow paths.

Next I uploaded the 2011 red alder percent cover shapefile from the Oregon Spatial Library Database, and then clipped it to the watersheds.

By joining the attributes of these two shapefiles, I was able to obtain the total area of each watershed and the total area of red alder cover within it. Dividing the alder area by the watershed area gave percent alder cover, which I could compare to the nitrate concentrations at each site.

Table 1. Nitrate concentrations and percent alder cover by watershed.

Watershed              Area (m2)    Alder (m2)   Percent Alder Cover   Avg. Nitrate (ppm)
Cape Creek             4353700      3537.74      0.081%                1.88
Crooked Creek          31760400     3226.39      0.010%                1.35
Cummins Creek          21388800     222.89       0.001%                1.44
Gwynn Creek            2865000      5371.57      0.187%                1.61
Little Cummins Creek   3056600      361.35       0.012%                1.72
Yew Creek              9786600      969.92       0.010%                1.01
Kiser Creek            100500       1017.97      1.013%                0.23
Rock Creek             38375500     2359.26      0.006%                0.19
Middle Fork Creek      3294000      339.23       0.010%                0.17
Griffith Creek         4434800      2021.86      0.046%                0.13

 

Once this information was obtained, I used a scatterplot to look for a pattern.
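For reproducibility, the same calculation and scatterplot can be scripted in R (a sketch, with hypothetical file and column names matching Table 1):

# hypothetical file and column names matching Table 1
ws <- read.csv("watersheds.csv")   # Watershed, Area_m2, Alder_m2, AvgNitrate
ws$PctAlder <- 100 * ws$Alder_m2 / ws$Area_m2   # percent alder cover

plot(ws$PctAlder, ws$AvgNitrate,
     xlab = "Percent alder cover (%)", ylab = "Average nitrate (ppm)")
abline(lm(AvgNitrate ~ PctAlder, data = ws))    # trendline
cor.test(ws$PctAlder, ws$AvgNitrate)            # is the correlation significant?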

Figure 1. Relationship between percent alder cover and nitrate concentration.

 

Though the trendline indicates a negative correlation, the data point at 1 percent alder cover may be an outlier. The watershed it represents is small, which could inflate the apparent proportion of alder cover. With this in mind, no distinctive pattern was observed relating alder cover to nitrate concentration in the ten watersheds. This isn’t entirely surprising, as the nitrate concentrations were relatively low to begin with.

 

Land Use Analysis

Looking further, I utilized the 2011 NLCD data to evaluate land use in the 10 watersheds. With none of the usual suspects (agriculture, urban development, etc.) being particularly dominant in the watersheds, it was difficult to link nitrate concentrations to land use at the field sites. Most of the land cover was evergreen and mixed forest, which is natural for the areas sampled.

 

Figure 2. Coastal watersheds land use.

 

Figure 3. Coast Range watersheds land use.

  4. Brief description of results you obtained.

Overall, nitrate concentrations over the five months in the ten watersheds were relatively low (~0.1–1 ppm), and the results obtained were not particularly surprising. There was no clear pattern linking red alder cover and nitrate concentrations in these watersheds (Table 1). Additionally, there were no surprising land use features in the watersheds, and thus no apparent link to nitrate concentrations in this analysis. Most of the land in these watersheds is occupied by mixed and evergreen forests, neither of which is particularly linked to elevated nitrate concentrations. This analysis was consistent with my expectations: starting with low nitrate concentrations, and suspecting most of the land cover would be forest, I predicted there would not be a strong link between land use and nitrate concentration. However, I expected slightly more red alder cover in the coastal watersheds and was surprised how little that cover related to nitrate concentration.

 

  5. Critique of the method – what was useful, what was not?

This method was very useful: I was able to delineate the watersheds, clip layers to identify potential sources of nitrate, and use Excel to further analyze the relationships. I also used the join feature to gather information from two attribute tables to calculate the percentage of alder cover. I will use the skills learned and practiced in this exercise in the future to identify statistical relationships between spatially correlated (or not) data.

 

Spatial autocorrelation in NDVI for wheat fields and some characteristics

Filed under: 2017,Tutorial 2 @ 11:06 am

Tutorial 2 is based on tutorial 1 with a few alterations: 1) the variogram is standardized by dividing the semivariance by the standard deviation of that field, 2) only the wheat fields are used for the analysis so that the results are more easily compared, and 3) a time component is added; for 5 time steps the correlation between range and sill is determined and its development over time is analyzed (there was no biomass data for these dates).

 

Parts 1 and 2

Standardizing of variogram

var@variogram$gamma = var@variogram$gamma / cellStats(ras, 'sd', na.rm=TRUE, asSample=TRUE)

where var is an object of the formal class RasterVariogram. The variance column in this class can be accessed with @variogram$gamma. This column is replaced by the standardized variance, obtained by dividing the existing column by the standard deviation. The standard deviation is calculated with the cellStats function: the input raster is ras, and 'sd' is the statistical option for standard deviation. na.rm=TRUE is a logical to remove NA values, and asSample=TRUE means that the denominator for the standard deviation is n-1.

Doing the same analysis but now focusing on just one crop does not improve the results. Pearson’s correlation coefficients are low and insignificant, with a correlation coefficient for Range vs. Biomass of -0.31 (p = 0.3461) and for Sill vs. Biomass of 0.35 (p = 0.2922). In the color-coded scatter plot, no pattern is visible in the relationship among sill, range, and biomass.

Part 3

For part 3, the range and sill values for all wheat fields at 5 different dates are determined, and the correlation between range and sill is computed for every date. This Pearson correlation coefficient is then analyzed over time. The correlation appears to increase over time, and it is significant at the 5% level (p = 0.047).
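A sketch of this step, assuming a hypothetical table with the fitted range and sill for each field and date:

# hypothetical input: one row per wheat field per date
pars <- read.csv("wheat_variogram_params.csv")   # columns: Date, Field, Range, Sill

# Pearson correlation between range and sill, computed separately for each date
cors <- sapply(split(pars, pars$Date), function(d) cor(d$Range, d$Sill))

# how the correlation develops over time
plot(seq_along(cors), cors, type = "b", xaxt = "n",
     xlab = "Date", ylab = "Pearson r (range vs. sill)")
axis(1, at = seq_along(cors), labels = names(cors))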

Manipulating Layer Properties in ArcGIS ArcMap 10.4.1

Filed under: Tutorial 2 @ 1:09 am

Reva Gillman

Tutorial 2                                Manipulating Layer Properties in ArcGIS ArcMap 10.4.1

  1. Question that I asked:

What are the geographic distributions of some of the ‘riskier’ species that arrived on Japanese Tsunami Marine Debris (JTMD)? What do their native and non-native ranges look like, and how do they compare with each other?

First of all, I had to choose 4 species with clear invasion histories to focus on, out of the 104 JTMD species. Of the 31 with invasion history, I chose Asterias amurensis (a seastar) and Hemigrapsus sanguineus (the Japanese shore crab), since they are known to be an issue in other regions. I then chose two species with large non-native spread: Crassostrea gigas (the Japanese oyster), an economically important aquaculture species, and Teredo navalis, a shipworm with a nearly global distribution that has burrowed into wooden ships for hundreds of years.

  2. Name of the tool or approach that I used:

I manipulated Layer Properties in ArcGIS ArcMap 10.4.1 to change the symbology of the polygons in the shapefile, so that they appear in different colors according to which realms were documented as native or non-native regions for each species.

  3. Brief description of steps I followed to complete the analysis:

First I made an Excel spreadsheet of the species data by going through each ‘risky’ species’ distribution, converting regions to realms, and typing that up in the spreadsheet. Then I did a join on Realm_Code, which matched up with the attribute table from the shapefile. However, I ran into issues mapping these data, as this approach would only map the regions that were native, or non-native, without the other regions represented at all, so I had to figure out another method.

My second attempt was to directly select regions of the shapefile and change the symbology by going into Selection and clicking “use with this symbol”. This might have eventually worked, but it was very hard to figure out which regions to select for each species, and there was not an intuitive way to save the selections as a new layer.

Finally, I found out how to go into Layer Properties, and manipulate the shape file from there:

a. First, right-click on the layer you want to manipulate and select Properties from the bottom of the options.

b. Go into Symbology

c. Underneath the Categories options on the left, select unique values

d. From the value field, select the field that you are interested in, in my case that was Realm_Code

e. Select the Add all values button on the bottom, which will put all of the selected field options in the screen.

f. From there, you can select a field, and remove it, or add values as needed, which I did for each species

g. By double-clicking on each field, you can select the color that you want to represent each category; you can change the outline of the symbol, the fill color, and the texture. I used one color for realms where a species is invasive, another color for native realms, and another color/texture for realms where the species is both native and non-native.

h. Then, you can copy and paste the layer to manipulate again for the next species, so you don’t have to start from scratch.


4. Brief description of results you obtained.

Below, there are the 4 maps of each species distribution, to see the extent of their ranges.

 

Teredo navalis geographic distribution

 

 


 

 

Hemigrapsus sanguineus geographic distribution

Crassostrea gigas geographic distribution

 

For clearer viewing of the above graphs, please download: Tutorial2

5. Critique of the method – what was useful, what was not?

This method can be tedious if you want to look at many different map layers. For me, it worked out because I chose to look at four layers of species geographic distributions, but if the method were less tedious I might have looked at more. By manually selecting each polygon (realm) that I wanted to highlight, I had a lot of control over what the final map looked like, and I liked the level of detail available through symbology: polygon outlines, fill color, and fill texture. The final maps are straightforward, and it is easy to save each layer, then copy and paste it for the next species so you don’t have to start from scratch each time.

May 23, 2017

Tutorial 2: Extracting raster values to points in Arc for regression analysis in R

Filed under: Tutorial 2 @ 10:38 pm

Research Question

As I have described in my previous blog posts, I am investigating the spatial relationship between blue whales and oceanographic variables in the South Taranaki Bight region of New Zealand. In this tutorial, I expand on the approach I took for tutorial 1 to quantify whether blue whale group size can be predicted by remotely-sensed sea surface temperature and chlorophyll-a concentration values at each sighting location. To accomplish this I used ArcMap to align different streams of spatial data and extract the relevant information, and then used R to perform statistical analyses. Specifically, I asked the following question:

“How well do remotely sensed chlorophyll-a (chl-a) concentration and sea surface temperature (SST) predict blue whale group size?”

Approach

I downloaded raster layers for chl-a concentration and SST, and used the “extract values to points” tool in Arc to obtain chl-a and SST values for each blue whale sighting location. I then exported the attribute table from Arc and read it into R. In R, I fit both a generalized linear model (GLM) and a generalized additive model (GAM) to investigate how well chl-a and SST predict blue whale group size.

Methods

I downloaded both the chl-a and SST raster files from the NASA Moderate Resolution Imaging Spectroradiometer (MODIS Aqua) website (https://modis.gsfc.nasa.gov/data/). The raster cell values are averaged at 4 km spatial resolution over a one-month time period, and I selected February 2017, as that is when the study took place.

I then used the extract values to points tool, located in the extraction toolset within Arc’s Spatial Analyst toolbox, to extract values from both the chl-a concentration and SST rasters for each blue whale sighting data point. The resulting layer contained the location of each sighting, the number of whales sighted (group size), the chl-a concentration, and the SST for each location. All of this information is contained in the attribute table for the layer file (Fig. 1), so I exported the attribute table as a .csv file so that it could be read by programs besides Arc.

Figure 1. Attribute table for blue whale sightings layer, which contains blue whale group size, chl-a concentration, and SST for each sighting location.

In R, I read in the data frame containing the information from the attribute table. In an exploration of the data, I plotted chl-a against SST to get a sense of whether the two covary. Additionally, I plotted group size against chl-a and SST separately to visualize any relationship. Subsequently, I ran both a GLM and a GAM with group size as the response variable and chl-a concentration and SST as predictor variables. While a GLM fits a linear regression model, a GAM is more flexible and allows for non-linearity in the regression model. Since I am in the exploratory phase of my data analysis, I tried both approaches. The script I used to do this is copied below:
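A sketch of that script, with hypothetical column names standing in for the exported attribute table:

library(mgcv)   # gam()

# hypothetical column names from the exported attribute table
sightings <- read.csv("bw_sightings.csv")   # GroupSize, CHL, SST per sighting

# exploratory plots
plot(sightings$SST, sightings$CHL)          # chl-a vs. SST
plot(sightings$CHL, sightings$GroupSize)    # group size vs. chl-a
plot(sightings$SST, sightings$GroupSize)    # group size vs. SST

# GLM: linear effects of chl-a and SST on group size
fit_glm <- glm(GroupSize ~ CHL + SST, data = sightings)
summary(fit_glm)

# GAM: smooth terms allow non-linear effects
fit_gam <- gam(GroupSize ~ s(CHL) + s(SST), data = sightings)
summary(fit_gam)   # reports deviance explained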

Results

There appeared to be some relationship between SST and chl-a concentration (Fig. 2), which is what I expected considering that colder waters are generally more productive. There was no clear visual relationship between group size and chl-a or SST (Fig. 3, Fig. 4).

Figure 2. A scatter plot of chlorophyll-a concentration vs. sea surface temperature at each blue whale sighting location.

Figure 3. A scatter plot of chlorophyll-a concentration vs. blue whale group size at each blue whale sighting location.

Figure 4. A scatter plot of sea surface temperature vs. blue whale group size at each blue whale sighting location.

The result of the GLM was that there was no significant effect of either chl-a concentration (p=0.73) or SST (p=0.67) on blue whale group size. The result of the GAM was very similar to the result of the GLM in that it also showed no significant effect of either chl-a concentration (p=0.73) or SST (p=0.67) on blue whale group size, and the predictor variables could only explain 4.59% of the deviance in the response variable.

Critique of the Method

Overall, the method used here was useful, and it was good practice to step out of Arc and code in R. I found no significant results, and I think this may be partially due to the way that my data are currently organized. In this exercise, I extracted chl-a concentration and SST values only to the points where blue whales were sighted, which means I was working with presence-only data. I think that it might be more informative to extract values for other points along the ship’s track where we did not see whales, so that I can include values for chl-a and SST for locations where the group size is zero as well.

Additionally, something that warrants further consideration is the fact that the spatial relationship between blue whales and these environmental variables of interest likely is not linear, and so therefore might not be captured in these statistical methods. Moving forward, I will first try to include absence data in both the GLM and GAM, and then I will explore what other modeling approaches I might take to explain the spatial distribution of blue whales in relation to their environment.

Tutorial 2: Using kernel density probability surfaces to assess species interactions

Filed under: Tutorial 2 @ 4:48 pm

Overview: question clarification and approach

Continuing my exploration of changes in cougar predation patterns in time periods with and without wolves, I wanted to expand my evaluation of spatial repulsion and/or shifts to examine how cougar kill site distributions relate to wolf activity. I identified several elements in my data sets, besides the presence of wolves, which could produce a shift in kill density or distribution, including: 1) catch-per-unit-effort discrepancies (i.e. larger sample sizes of cougars, and kill sites, in one data set), or 2) time effects from seasonal distribution shifts (i.e. prey migration patterns). I accounted for catch-per-unit-effort issues in tutorial 1, but need to account for seasonal variation in prey distribution as part of my analysis of wolf influence on cougar kill distribution. Density features continued to be a good tool to explore my presence-only data (kill events). Studies of sympatric wolves and cougars have shown differences in the attributes of sites where each carnivore makes kills (slope, elevation, physiographic features), but I was interested in how cougars might be positioning themselves (e.g. where they are hunting and making kills) on the landscape relative to centers of wolf activity. Therefore, the next step in understanding my spatial problem was to determine if there were differences in when and where cougars were killing prey by asking:

Does wolf activity account for the distribution differences between pre- and post-wolf cougar kill sites?

For the purposes of this analysis the data would be “location, implicit” with the variable of interest (cougar kill sites) having a covariate (proximity or distance to wolf activity) measurable at each sampled location and the causal mechanism inferred from co-variation. Central areas of wolf activity could be identified from wolf kill density, estimated on a seasonal basis, and related to the distribution of cougar kill sites. A probability surface could be created from wolf kill density and used to calculate the proportion of pre- and post-wolf cougar kill sites within probability features as a proxy for the distribution of wolf activity. This relationship could then be compared as a joint proportion or as an expected/observed relationship across time periods. Alternatively, the isopleth polygon feature could be used to calculate a “distance-to-edge” feature relating each cougar kill to potentially increasing levels of wolf contact. This relationship could be evaluated between time periods and across seasons through ANOVA or regression.

My approach to this question was to relate the proportion of both pre- and post-wolf cougar kill sites (points) to wolf activity, using the probability contours (polygon features) of wolf kill site distribution (density raster) as a proxy for activity. I used several tools in ArcGIS, GME, and R to carry out this analysis. Geospatial Modeling Environment (GME) is a standalone application that makes use of ArcGIS shapefiles and R software to carry out spatial and quantitative analyses. It was created to take advantage of R computing and replaces the ‘Animal Movement’ plugin many movement ecologists made use of in previous versions of ArcGIS. GME allows a lot of flexibility in tasks related to dealing with large data sets and common analyses performed on animal location data (KDE, MCP, movement paths). Because it can operate as a GUI or as a command-driven interface to R, it is user friendly and allows for iterative processes, quick manipulations of data, and combinations of these processes not easily duplicated in Arc.

 

Methods

Prior to evaluating influences from wolf activity, it was necessary to address potential effects from seasonal shifts in prey distribution or density related to ungulate migration patterns. Therefore, the first step for this analysis was to subset kill data into winter (1 Nov – 30 Apr) and summer (1 May – 31 Oct) seasons. I used the ‘select by attribute’ feature in ArcGIS to subset each kill data set (wolf, cougar (pre-wolf), and cougar (post-wolf)) into season-specific shapefiles.

 

Figure 1. ArcGIS ‘select by attribute’ call from attribute table to subset data into season.

Similar to the procedure for tutorial 1, I used GME to create density and isopleth probability contours for wolf kill sites and the ‘countpntsinpolys’ command to add columns with the number of cougar kills (pre/post) to each seasonal wolf kill isopleth polygon. Finally, I used ArcMap to visualize the data and make a figure showcasing the process.
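For readers without GME, a rough R analogue of the density/isopleth step is sketched below (assuming projected coordinates and hypothetical file names; GME’s kde implementation differs in its bandwidth options):

library(MASS)   # kde2d()

# hypothetical inputs: kill coordinates in projected (e.g., UTM) units
wolf   <- read.csv("wolf_kills_winter.csv")         # columns: x, y
cougar <- read.csv("cougar_kills_post_winter.csv")  # columns: x, y

# kernel density of wolf kill sites on a regular grid
d <- kde2d(wolf$x, wolf$y, n = 200)

# density thresholds for the 25/50/75/95/99% volume contours (isopleths)
z   <- sort(as.vector(d$z), decreasing = TRUE)
cz  <- cumsum(z) / sum(z)
thr <- sapply(c(0.25, 0.50, 0.75, 0.95, 0.99), function(p) z[which.max(cz >= p)])

# wolf-kill density at each cougar kill (nearest grid node, approximate)
ix <- findInterval(cougar$x, d$x, all.inside = TRUE)
iy <- findInterval(cougar$y, d$y, all.inside = TRUE)
dens <- d$z[cbind(ix, iy)]

# count cougar kills per wolf-activity contour class
brks <- c(0, sort(thr), Inf)
labs <- c("Out", "99%", "95%", "75%", "50%", "25% (core)")
table(cut(dens, breaks = brks, labels = labs, include.lowest = TRUE))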

 

Figure 2. GME ‘kde’ call in the command builder GUI.

 

Results

Visual examination of season-specific cougar kills relative to wolf kill activity centers added further evidence toward a spatial shift in the distribution of cougar kills between time periods with and without wolves (Figure 3). The magnitude of the repulsive effect appeared stronger in winter than summer, evident in the uniformly decreased proportions of cougar kills per probability contour of wolf activity, and increased proportion of kills outside any wolf activity between time periods with and without wolves (Table 1). The relationship appeared opposite in summer, with higher proportions of cougar kills observed in the post-wolf period than expected in all but core (25% probability surface) and outside (no wolf activity) areas (Table 2).

 

Figure 3. Visual comparison of seasonal wolf activity based on kill site density probability contours and the distribution of cougar kill sites across time periods with and without wolves.

Table 1. Winter wolf kill site probability contour attribute table. Isopleths represent the 25th, 50th, 75th, 95th, and 99th percentile contours. ‘Out’ refers to areas outside the probability contour surface. Number of kills (No. Kills) is the number of cougar kill sites, and % Kills is the cumulative proportion of all cougar kills within each contour class.

 

Table 2. Summer wolf kill site probability contour attribute table. Isopleths represent the 25th, 50th, 75th, 95th, and 99th percentile contours. ‘Out’ refers to areas outside the probability contour surface. Number of kills (No. Kills) is the number of cougar kill sites, and % Kills is the cumulative proportion of all cougar kills within each contour class.

 

Method Critique

Accounting for seasonal shifts in kill density improved visual interpretation of spatial patterns and reduced a factor that might mask influences of wolf activity. However, the observed shift in winter could still be due to shifts in prey populations as a behaviorally mediated prey response to wolves (i.e. the shift is related to prey responding to wolves, and cougars indirectly shifting distribution to follow prey). The shift could also be related to differences in prey species-specific distributions, as cougars prey predominantly on deer but show seasonal selection of neonate elk calves in the summer, while wolves prey predominantly on elk (i.e. the abundance of newborn calves in the summer creates spatial tolerance between wolves and cougars). The comparison of pre- to post-wolf cougar kill sites relative to the probability of wolf kills may be misleading, since the relationship is based on pre-wolf cougar kill data overlapping a density feature that didn’t exist (i.e. no wolves on the landscape at the time those kills were made). However, the comparison does provide some insight, as the pre-wolf cougar kill distribution still represents a ‘prior’ distribution of cougar kills, and overlapping wolf activity demonstrates the proportion of kills we could expect if there were no competition (spatial avoidance/repulsion) between the two species. Using the probability contours as a ‘prior’ distribution of kill sites offered a more robust and meaningful measure to quantify the shift in cougar kill distribution. The method was useful and provided evidence toward a season-specific shift in where cougars are killing prey (i.e. the answer to my ‘question asked’ is yes, in winter), but the regression analysis discussed above may provide additional support.

 

Tutorial 2

Filed under: 2017,Tutorial 2 @ 3:50 pm

For exercise 3, I finally left the world of Arc and took my data to R.

Question/Objective:

Test the radar backscatter between different-facing aspects and determine if aspect influences radar signal.

Tool used:

I used R to run the statistical analysis, but still needed to extract the data using Arc.

What I did:

I extracted the data from ArcGIS by creating 1000 random points throughout my scene and running the “Extract Values to Points” tool to pull out the backscatter and aspect at each point. Then I exported the table of point data to Excel to clean up any missing values that might throw off R; there were 4 points in total missing either the backscatter or aspect value. Then I saved this table as a .csv and brought it into R.

Once in R, the data needed a little bit of cleaning up. Since the default aspects from Arc range from -1 (flat) and 0 (north) to 360 (still north, all the way around the compass), using just the numeric value to classify the points would not work well. I used the following lines of code to change the aspects to factors:

NEWDATA$ASPECT <- cut(DATA$ASPECT, breaks=c(-1,0,22.5,67.5,112.5,157.5,202.5,247.5,292.5,337.5,360), labels=c("FLAT","N","NE","E","SE","S","SW","W","NW","No"), include.lowest=TRUE)

levels(NEWDATA$ASPECT) <- c("FLAT","N","NE","E","SE","S","SW","W","NW","N")

The second line combines the two “north” intervals into one (i.e. 0-22.5 degrees and 337.5-360 degrees).

Results:

The results were promising. I expected to find no difference between aspects, and upon visual inspection the scatterplots showed no apparent difference among them.

A boxplot also showed little difference in the average Gamma0 values (the backscatter).

The next step was to run a statistical analysis. Since the data were not normally distributed, I had to use something other than an ANOVA, so after some internet research I went with a Kruskal-Wallis test, which tests whether population distributions are identical without assuming normality. The code is simple enough:

kruskal.test(GAMMA~ASPECT, data=DATA)

I also conducted an ANOVA on the log-transformed Gamma0 data, which appeared more normal than before. None of the results from any test were significant, meaning that aspect does not have a statistically significant effect on backscatter, which is the result I was hoping for.
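That last step, sketched (a minimal version, assuming the column names used above):

# assumes NEWDATA carries both the Gamma0 values and the ASPECT factor,
# and that Gamma0 is on a linear (positive) scale so the log is defined
NEWDATA$logGAMMA <- log(NEWDATA$GAMMA)
fit <- aov(logGAMMA ~ ASPECT, data = NEWDATA)
summary(fit)   # F-test for differences in mean log-backscatter among aspects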

Critique of the Method:

This method was useful for my dataset. I had a rather large dataset to work with in Arc, and didn’t find the statistical tools that I needed in Arc either. I am comfortable in R, and prefer it for larger datasets since it doesn’t have the bells and whistles that Arc does.

Tutorial 2: Investigate Spatial Autocorrelation (Mantel Test)

Filed under: 2017,Tutorial 2 @ 1:30 pm

For this tutorial I revisited a principle that I learned about in the Spatio-Temporal Statistics (Variation) course that I took with Dr. Jones last fall. Autocorrelation is an important component to address within ecological work, as well as geographic work. Plots that are closer to each other may have a greater probability of capturing the same species during fieldwork, and it is important to know whether my data are influenced by distance and spatial relationships. Examining autocorrelation within my data may also reveal information about spatial patterns related to patch size, or gradients at my field site (Salmon River Estuary).

Research Question:

My research question fits the framework “how is yi related to yi + h?” I am investigating spatial autocorrelation for my field survey plots and the species assemblages represented by those plots. I want to know if I can predict or expect the presence of a plant species in one plot, based on its proximity (distance) to another plot. The presence of spatial autocorrelation in my plots would imply some sort of dependency or relatedness between plots, i.e. the closer two plots are located to each other, the more likely they are to overlap in terms of species present. Spatial autocorrelation can provide context for interpreting the scale of patterns and causal variables related to spatial vegetation patterns (likely elevation, salinity, and land use history).

My Approach (Tools for Analysis):

PC-Ord was very helpful in structuring my data for the Mantel test. I was able to run the test several times with different combinations of data to detect patterns of autocorrelation. Again, I formatted my data in Excel and used the Excel sheets to produce matrices for the Mantel test. Each matrix pair had species data and UTM coordinate data.

Steps to Complete Analysis:

I organized my data in Excel, started a new project in PC-Ord, assigned species and location matrices, and ran a Mantel test (Group > Mantel test). I used Sorensen (ranked) distance measures for my percent cover data, and Euclidean distance for my coordinate data. I set the test to complete 999 randomized runs for a Monte Carlo comparison to test significance. I also used my species data to conduct an Indicator Species Analysis (ISA), to identify species that are more common, faithful, or exclusive to a particular marsh that I surveyed. I directed PC-Ord to conduct an ISA, grouping my data by tidal marsh surveyed. The ISA produced a number for each species indicating the percentage of plots that species was found in, and whether that value was significant. From the indicator analysis, I found that certain upland marsh species are indicative of remnant sites, certain species are indicative of salty conditions, and other species are indicative of restored sites.
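The same Mantel test can also be run in R with the vegan package (a sketch with hypothetical file names; the quantitative Sorensen measure corresponds to Bray-Curtis):

library(vegan)

# hypothetical inputs: plots x species percent cover, and plots x UTM coordinates
spe <- read.csv("species_cover.csv", row.names = 1)
xy  <- read.csv("plot_utm.csv",      row.names = 1)

d_spe <- vegdist(spe, method = "bray")    # quantitative Sorensen (Bray-Curtis)
d_geo <- dist(xy)                         # Euclidean distance between plots

mantel(d_spe, d_geo, permutations = 999)  # Monte Carlo test with 999 runs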

  

Examples of matrix inputs from Excel to PC-Ord. The data were structured as matrices with sample units (plots) x species cover, or sample units x UTM coordinates.

Results and Outcomes:

I conducted a Mantel test on all of my data and found that my plots were not spatially autocorrelated (Mantel’s r statistic, transect: r = 0.037797, p = 0.182182; MW plot: r = 0.027994, p = 0.164164; accept the null hypothesis of no relationship). This is a little surprising, but may be indicative of noise within my dataset and variation of species at this scale. It is possible that autocorrelation is not detectable at this scale, or perhaps I need a larger dataset with less proportional noise to sort out the autocorrelation signal. I was, however, able to detect spatial autocorrelation at the 1,000 square meter scale for the Modified Whittaker plots (r = 0.638224, p = 0.00010), suggesting that there may be more fine-scale patchiness, variation, or nestedness among plant species at each of the SRE tidal marshes. Salt may also be a confounding factor that drives spatial diversity of vegetation, in addition to dike removal, as salt is a limiting factor for some salt marsh plants; not all species are equally tolerant of it.

For my ISA test, I found that (1) Triglochin maritima, Sarcocornia perennis, and Distichlis spicata were significant indicators of salty, restored conditions, (2) Deschampsia caespitosa, Juncus arcticus var. littoralis, Potentilla pacifica, Glaux maritima, and Hordeum brachyantherum were significant indicators of upland, high-elevation conditions, and (3) Carex lyngbyei was a significant indicator of restored conditions. I broke my dataset up and conducted a Mantel test for each of these groups, using only plots that recorded the presence of at least one species in the group. I did not find any significant autocorrelation with any of the strong indicator species (which were all found in a high proportion of plots surveyed). I am curious whether, if my plots had been located closer to each other and/or I had surveyed more plots over a larger area, spatial autocorrelation patterns would begin to emerge.

An example of PC-Ord output for a Mantel test. The r statistic indicates the amount of correlation between species cover and distance; the p-value indicates the significance of the correlation/similarity in comparison to chance (computed by Monte Carlo randomizations).

I also produced a map in Arc to visualize the patterns of diversity I saw in my data. I coded each of my plot locations (for transects and Modified Whittaker plots) by the dominant plant strategy type: Carex lyngbyei dominant (a plot with at least 50% cover of Carex lyngbyei), salt dominant (at least 50% cover of one or more salt-tolerant species), or mid-high marsh dominant (at least 50% cover of a mid-high marsh species that is not Carex lyngbyei). I think the map helps to visualize spatial patterns of diversity on a broader scale, by grouping plots into different assemblies or types/guilds based on their content.

Critique of the method – what was useful, what was not?:

This method was manageable to set up and execute; however, it was difficult to derive useful outcomes with my data. It is highly plausible that my data simply do not capture these patterns, but it was also somewhat difficult for PC-Ord to work with my data on this problem. I have many zeroes in my dataset, as not all species are represented in every single plot (I have a grid of plot number x species, with cells filled in for percent cover ranging from 0 to 100). I was not able to complete Mantel tests with certain assemblage combinations, as PC-Ord could not compute a randomization outcome with the number of zeroes present (random shuffling of the dataset produced too many identical outputs). Ultimately, I think I would approach this problem again with a larger dataset over a greater extent of my study site.

 

May 22, 2017

Tutorial 2: Site Suitability

Filed under: Tutorial 2 @ 2:43 pm

Question/Objective:

For exercise 3, Dr. Jones allowed me to work on a big component of one of my thesis chapters that I needed done which was to determine site suitability for a field experiment. This wasn’t related to stats but required a lot of work in ArcMap. My questions are:

  • Are forest meadows reptile biodiversity hotspots?
  • Does aspect matter for reptiles in forests?

Forest meadows could be hotspots due to increased solar radiation, temperatures, heterogeneous conditions, increased productivity, and rocky soils. These questions are important because, if forest meadows are reptile habitat, there could be strong management implications: managers would need to retain/restore meadows, retain connectivity among meadows, and identify ‘climate-smart’ meadows if aspect is related to diversity.

I performed site suitability/selection for this study based on the following criteria:

  • Need to identify 4 sites:
    • each site would have 4 treatments (open meadow and closed forest, each on north and south aspects)
    • road access (<300 m from road)
    • sites at similar elevations
    • similar forest types
    • similar proximity to non-forest
    • the same proximity to water
    • areas of similar size
    • created by natural processes, not a clear-cut.

Tools:

I used many tools to determine site suitability for the field experiment:

Clip, Aspect, Extract by Attribute, Raster to Point, Near, Extract Values to Points, Reclassify, Select by Attribute

Methods/Walkthrough:

Map 1. Habitat types in Deschutes NF.

I started with a habitat type layer (Map 1) and DEM layer obtained from the Forest Service of Deschutes National Forest.

First, I needed a map showing aspect. To do this, I used the Aspect tool in the Spatial Analyst toolbox which used my DEM layer to create an aspect map of the area as shown in Map 2.

Once I had my aspect layer, I needed to identify areas within the habitat layer that were meadows. I used the Extract by Attribute tool to select only habitat classified as ‘meadow’. Once the habitat layer showed only meadows, I used the Raster to Point tool to generate a point for each meadow that could store additional information, as seen in Map 3.

From there, I inserted the roads layer, also obtained from the Forest Service. Roads were needed because my sites could only be <300 meters from the nearest road. Therefore, I used the Near tool to generate the distance from each point (meadow) to the nearest road. I also did this with my raster layer of waterbodies to make sure sites were in similar proximity to water, as that could influence reptile diversity.

Once the points (meadows) had information regarding proximity to roads and water, I then used the Extract Values to Points tool to obtain elevation data for each point within a meadow, because they have to be at similar elevations.

At this point, I had most of the information I needed regarding similarity between possible meadows. Next, I reclassified my aspect layer to show only north and south aspects (as seen in Map 4), because those were the only aspects I needed to be present within each meadow. From there, I opened the attribute table for my meadow points and used the Select by Attributes option to show only sites that were <300 m from a road and <1800 m in elevation (see Map 5). Once I narrowed down the possible sites, I manually examined every remaining meadow to see if north and south aspects were present within the open meadow and within the bordering forest.

Results: All of this resulted in 4 suitable sites that I will use for one of my research projects, determining reptile diversity in meadows. My 4 suitable sites can be seen in Map 6.

Critique: The way I did this worked, but I feel like there may be an easier way. This method took me a few weeks and a lot of frustration working with many types of files. I had never done site suitability before, and it proved to be harder than it seemed.

Map 2. Aspect created with ArcMap.

 

 

Stats:

Are meadows significant biodiversity hotspots for reptiles compared to forests?

Tools/Methods: For practice, I made artificial data resembling what I would obtain from this field experiment. I practiced in RStudio, because logistic regression isn’t possible in ArcMap, and most of my data will likely require logistic regression.

Critique: For the data, I formatted the meadows as 1’s and the forest plots as 0’s. Therefore, my Y variable was binary while my X variable, species diversity, was continuous. This method is useful for logistic regression, is pretty straightforward, and gives you results.

Results: My artificial data had a very small sample size, so it makes sense that the p-value was not significant. Overall, the results matched what I suspected from looking at the artificial data I made.

y = f(x)

> glm(type~diversity)

Call:  glm(formula = type ~ diversity)

Coefficients:

(Intercept)    diversity

0.006173     0.098765

Degrees of Freedom: 7 Total (i.e. Null);  6 Residual

Null Deviance:     2

Residual Deviance: 0.4198     AIC: 5.123
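One caveat on the call above: glm() defaults to the Gaussian family, so as written it fits an ordinary linear model. True logistic regression requires the binomial family, e.g.:

# 'plots' is a hypothetical data frame holding the binary type (0/1) and diversity
fit <- glm(type ~ diversity, family = binomial, data = plots)
summary(fit)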

Map 3. Raster to Points

Map 4. Reclassified aspects. North and south only.

Map 5. Select by attributes. Meadows <300m from road and <1800m elevation.

Map 6. Results. Meadow sites being used for study.

Tutorial 2: Examining Hydrograph Relationships

Filed under: 2017,Tutorial 2 @ 2:12 pm

Question

How does upstream-downstream connectivity, drainage area, and position within the catchment affect the relationship between different stage gage sites?

Tool/Approach

I chose to look at the relationships between the hydrographs for four sites that have upstream-downstream connectivity but different upstream drainage areas and locations on different tributaries. I used ArcGIS to obtain the drainage areas and Excel to analyze the hydrographs. The four sites are as follows: Site 26 has the largest drainage area, on Mainstem Mill Creek. The “Mainstem” site is also located on the mainstem, with a slightly smaller drainage area. Site 36 and South Fork are located on different tributaries, Gunn Creek and South Fork Mill Creek respectively, but have approximately the same drainage area, which is significantly smaller than that of either Site 26 or Mainstem (Figure 1).

Methods

First I used NetMap’s Watershed Delineation tool to create upstream drainage area polygons for each of the four sites I was interested in, using the attribute table to obtain the total area of each polygon (Table 1).

Then I graphed the difference in stage heights over time between Site 26, the largest site, and each of the other three sites to examine patterns in the hydrographs, creating three “stage difference” graphs. I removed any periods of missing data (especially a two-week period in February at the Mainstem site). I also graphed the raw hydrograph data for comparison (Figure 2).

Finally I combined the three stage difference graphs into one plot for easy comparison.
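Although I did these steps in Excel, the same difference series can be scripted (a sketch with hypothetical file and column names):

# hypothetical columns: Date plus one stage column per site
stage <- read.csv("stage_heights.csv")
stage$Date <- as.Date(stage$Date)

# stage differences relative to Site 26
stage$d_MS <- stage$Site26 - stage$Mainstem
stage$d_SF <- stage$Site26 - stage$SouthFork
stage$d_36 <- stage$Site26 - stage$Site36

# combined stage-difference plot
plot(stage$Date, stage$d_MS, type = "l", col = 1,
     ylim = range(stage[, c("d_MS", "d_SF", "d_36")], na.rm = TRUE),
     xlab = "Date", ylab = "Stage difference")
lines(stage$Date, stage$d_SF, col = 2)
lines(stage$Date, stage$d_36, col = 3)
legend("topright", legend = c("26 v. MS", "26 v. SF", "26 v. 36"),
       col = 1:3, lty = 1)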

Results

First I compared the hydrograph patterns of Mainstem (MS) and South Fork (SF), since they are on the same tributary. Both stage difference graphs showed little difference from Site 26 during baseflow conditions. The 26 v. MS graph was relatively flat, which makes sense: Site 26 and MS have more similar drainage areas, pass more similar amounts of water, and are closer together geographically, so they are expected to behave more similarly. However, the MS graph still had high peaks during the two largest floods, which indicates other sources of water influencing the hydrograph. The 26 v. SF graph had high positive peaks during floods, which indicates that a large source of water going through Site 26 is not accounted for in the SF hydrograph, probably the Mainstem site (Figure 3).

Then I compared Mainstem and Site 36, since they are a similar distance from Site 26 but have different drainage areas. The 26 v. 36 graph looked very similar to the 26 v. SF graph, which suggests that drainage area has a greater influence on the similarity of hydrographs than distance (Figure 3).

Finally I compared Site 36 and South Fork, since they have similar drainage areas but are on different tributaries that both drain into Site 26. The two graphs were very similar in shape, which further reinforced that drainage area is a primary driving factor. The two tributaries seem to function relatively similarly despite differences in geographic location. The Site 36 hydrograph was more similar to Site 26 during baseflow conditions, which makes sense since they are closer together in terms of streamwise distance and therefore the floodwave changes less between the two points (Figure 3).

Discussion

When examining the patterns in these graphs, it is obvious that a portion of the hydrograph is still missing. Most of the variability in peak flows is accounted for in the Mainstem hydrograph, but there are still minor peaks during the big floods. Little of the variability is accounted for in the Site 36 and South Fork hydrographs. Looking at the map, I hypothesize that North Fork Mill Creek, which is not instrumented, accounts for the differences in the hydrographs that are currently unexplained. It would be interesting to install a LevelLogger on this tributary in the future to see if it completes the picture.

This was an interesting analysis that allowed me to further explore my data and how my study basin reacts to precipitation events. It was useful in illustrating the influences of drainage area and stream connectivity on hydrographs in different parts of the basin, and pinpointing where there is still potentially missing data that could give a more complete picture of my watershed. It is still a pretty qualitative analysis in that I could not derive any real statistics from what is happening with my hydrographs, but it paints a clear picture of what’s going on and inspires further questions.

Tutorial 2: Automated Autocorrelation of Tsunami Inundation Time Series at Various Points of Interest

Filed under: Tutorial 2 @ 12:11 pm

Question

Tsunamis inundate shorelines in a chaotic manner, resulting in varying degrees of flooding from location to location. The spatial variability of tsunami hazards has important implications for the tsunami-resistant design of structures and for evacuation plans, because different safety margins may be required to achieve consistent reliability levels. Currently, tsunami mitigation strategies are primarily informed by prediction models, due to the lack of nearshore field data from tsunami disasters. Thus, to investigate the spatial variability of tsunami hazards, we analyze the relationship between different test locations, or points of interest (POIs), in our domain. The question is:

“How much does tsunami inundation vary spatially?”

Essentially: if one observes a certain flooding depth at their location, could they assume that a similar flood level is occurring elsewhere? How does this change with time? And how do certain coastal features affect this?

Tools

To answer this question, I examined the autocorrelation of the time series of inundation at various locations as well as the difference in inundation levels between these locations. In this case, 12 locations were selected and thus the process to analyze all of these locations was automated. All of this was implemented in R.

Methods

As before, the tsunami inundation data is stored in a large NetCDF file. Thus we use the same procedure as mentioned in Tutorial 1 to read the NetCDF file into R. This process requires the “ncdf4” package to be installed.

For this analysis, we begin by selecting the points of interest along the coast near Port Hueneme, California. These points were arbitrarily selected, but we ensured they were reasonably well distributed along the coastline and within the domain of the dataset. Figure 1 shows the locations of the points of interest (POIs) along the coast of Port Hueneme, CA. A total of 12 points were selected, each meant to represent a building or port location. Their names and geographic coordinates were stored in a .csv file as shown in Figure 2.

 

Figure 1 – POIs where time series of inundation were extracted along coast near Port Hueneme, CA

Figure 2 – CSV file storing locations and their geographic coordinates

The .csv file was read into R using the “read.csv” function, and this table was subsequently converted to a data frame as shown by the code in Figure 3. Next, the “which.min” function from the “stats” package was used in a for loop to find the grid indices nearest each POI. A loop was then used to extract the different time series and plot them accordingly. Figure 4 shows the time series of inundation for each of the locations stored in the CSV file. Note that the first 3 locations are in the ocean and the rest are on land.

The time series data were stored in a separate data frame, and then the “acf” function from the “stats” package in R was run in loop form to obtain the autocorrelation values. This was followed by another loop, with a nested loop, to take the difference in inundation level between each pair of locations; the autocorrelation function was applied to these differences as well. The code for these procedures is shown in Figure 3.
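A condensed sketch of those procedures (the NetCDF variable names and CSV column names are assumptions):

library(ncdf4)

# assumed file, variable, and column names
nc  <- nc_open("tsunami_inundation.nc")
lon <- ncvar_get(nc, "lon")
lat <- ncvar_get(nc, "lat")
eta <- ncvar_get(nc, "eta")            # inundation, dims assumed lon x lat x time

pois <- read.csv("POIs.csv")           # columns: Name, Lon, Lat

# extract the time series at the grid node nearest each POI
ts_list <- list()
for (i in seq_len(nrow(pois))) {
  ix <- which.min(abs(lon - pois$Lon[i]))
  iy <- which.min(abs(lat - pois$Lat[i]))
  ts_list[[as.character(pois$Name[i])]] <- eta[ix, iy, ]
}

# autocorrelation of each POI's inundation series
op <- par(mfrow = c(3, 4))
for (nm in names(ts_list)) acf(ts_list[[nm]], na.action = na.pass, main = nm)
par(op)

# ACFs of the pairwise differences relative to one POI
base <- "Port Hueneme"                 # assumed POI name
op <- par(mfrow = c(3, 4))
for (nm in setdiff(names(ts_list), base)) {
  acf(ts_list[[base]] - ts_list[[nm]], na.action = na.pass,
      main = paste(base, "-", nm))
}
par(op)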

Figure 3 – R code used to read in .csv file and procedures to perform autocorrelation analysis

Figure 4 – Time series of inundation for each location shown in Figure 1

Results

Figure 5 shows the autocorrelation plot for each of the inundation time series shown in Figure 4. These plots show that there are indeed variations between the different locations. The autocorrelation plots also appear to vary more as the POIs get further away from one another.

Figure 5 – Autocorrelation plots of the inundation time series from Figure 4

Figures 6-8 show some examples of the autocorrelation plots of differences in inundation level between two locations. Figure 6 shows the autocorrelation of the difference in inundation level between Port Hueneme and each of the other locations, with a placeholder image shown in place of the difference between Port Hueneme and itself (which doesn’t exist). Figure 7 shows the same plot for Del Monte Produce, and Figure 8 for the naval base Navsea 1388.

There are 9 more plots created by this process, but in the interest of not inundating this report, only 3 are shown. Each of these figures shows rather interesting differences among the locations.

Figure 6 – Autocorrelation plots between difference in inundation levels for all locations relative to Port Hueneme

Figure 7 – Autocorrelation plots between difference in inundation levels for all locations relative to Del Monte Fresh Produce

Figure 8 – Autocorrelation plots between difference in inundation levels for all locations relative to Navsea 1388

Critique of Method

The method provided a very convenient way of viewing the different locations and their associated autocorrelations at the touch of a button. POIs can easily be added or removed by modifying the CSV file, and the source code adjusts accordingly without any changes. This feature has already been tested and shown to be quite robust for handling any number of locations with coordinates within the data domain, which shows that the procedure is efficient and effective. Overall, I was very pleased with the operational capability of this method.

Observing the autocorrelations for each of the locations and for the inundation differences between them was quite interesting. At a glance, I could immediately tell that there were differences between the POIs. I think this information will indeed be quite useful, though interpreting these types of autocorrelation plots will require some additional thought.

May 21, 2017

Geographically Weighted Regression (GWR) in R

Filed under: 2017,handy links @ 10:18 pm

I found an interesting link about Geographically Weighted Regression (GWR) analysis in R. I think it might be useful.

https://rstudio-pubs-static.s3.amazonaws.com/44975_0342ec49f925426fa16ebcdc28210118.html

Geographically Weighted Regression Analysis of Tropical Forest

Filed under: 2017,Tutorial 2 @ 10:10 pm
  1. Question that you asked
  • Is crown width linearly related to height in space in the study area?
  2. Name of the tool or approach that you used.
  • I used the Geographically Weighted Regression (GWR) tool in ArcMap to assess the linear relationship between crown width and height in space in the study area. Since the data have not been georeferenced, the Height is the sum of the tree height and the elevation (ground height).
  • GWR constructs a separate equation for every feature in the dataset, incorporating the dependent and explanatory variables of features falling within the bandwidth of each target feature. The shape and extent of the bandwidth depend on user input for the Kernel type, Bandwidth method, Distance, and Number of neighbors parameters, with one restriction: when the number of neighboring features would exceed 1000, only the closest 1000 are incorporated into each local equation. GWR should be applied to datasets with several hundred features for best results; it is not an appropriate method for small datasets.

 

  3. Brief description of steps you followed to complete the analysis.
  • The first step is to load the dataset into ArcMap. As a reminder, this is the same dataset I used in exercise 2, generated from UAV visual images. The UAV images were processed through several software packages: Agisoft PhotoScan was used to make a 3D point cloud from the UAV images, and the point cloud was then used in the Fusion software to derive forest structure measurements, such as the number of trees, height, and canopy width. The result of this processing is a table of 804 detected trees with their height and crown width. Unfortunately, because the images have not been georeferenced yet, the height in this tutorial is the sum of tree height and elevation.
  • The second step is to open the Geographically Weighted Regression (GWR) tool, which appears as in Figure 1 below. We then choose the input features, dependent variable, and explanatory variable. We can choose more than one explanatory variable, but in this exercise I chose only one: my dependent variable is Height (ground height + tree height), and my explanatory variable is crown width.

Figure 1. Geographically Weighted Regression tool.

  • In the next step, we can choose the kernel type, bandwidth method, and number of neighbors. By default, the kernel type is “Fixed”, the bandwidth method is AICc, and the number of neighbors is 30. I decided to use the defaults because I think they are appropriate for my dataset. It is especially important to select Fixed for the kernel type whenever distance is a component of the analysis, as it is for my dataset.
  • To finish the analysis, just click “OK”. We can see the whole result in the attribute table (see Figure 2). ArcMap shows the Residual Standard Deviation (RSD) for each feature in a new layer, with different colors indicating the level or range of the RSD (see Figure 3).

Figure 2. Attribute table of the Geographically Weighted Regression result.

   

Figure 3. Distribution of Residual Standard Deviation in the study area.

 

  • However, it is more interesting and useful to see the coefficient of the explanatory variable instead of the Residual Standard Deviation (RSD). Therefore, we need to make some changes in the layer properties. To open the layer properties, right-click on the layer and choose “Properties”. In the layer properties, under “Symbology”, we can change the field value from Residual Standard Deviation (RSD) to the coefficient of the explanatory variable (see Figure 4). We can also change the number of classes and the colors. In my case, I chose three classes to distinguish features (trees) that have a positive linear relationship, a negative linear relationship, or a coefficient (linear relationship) close to zero. The layer after adjustment can be seen in Figure 5.

Figure 4. Layer properties: Symbology.

 

   

Figure 5. Distribution of coefficient of the explanatory variable.

 

  4. Brief description of results you obtained.
  • The results can be seen in Figures 2, 3, 4, and 5. The attribute table shown in Figure 2 contains all the values related to the regression analysis: observed and predicted values, the coefficient of the explanatory variable, the intercept, standard error, residual standard deviation, etc. In general, crown width and height are positively related, meaning the height increases for every one-unit increase in crown width. In other words, the bigger the crown, the taller the tree.
  • However, looking at individual features (trees), some trees have a positive linear relationship between crown width and height (ground height + tree height), and some have a negative one. The distribution of trees with positive and negative linear relationships can be seen in Figure 5. The red points indicate trees with a negative linear relationship between crown width and height, meaning trees with big crowns tend to be shorter. The blue points indicate trees with a positive linear relationship, meaning trees with big crowns tend to be taller. The white points indicate trees whose coefficients, whether positive or negative, are close to 0.
  • These differences in the sign of the linear relationship may arise from several factors, such as differences in tree species, differences in elevation, or errors from the Fusion software analysis. Tropical forests contain hundreds of tree species with different characteristics: some have big crowns and tall trunks, while others have big crowns and short trunks. In addition, elevation can have a significant effect because the Height data used here is the sum of ground height (elevation) and tree height. Trees with a positive linear relationship may be distributed in areas of higher elevation (hills, mountains, or peaks), while trees with a negative relationship may sit in areas of lower elevation (drainages or valleys). Coefficients close to zero may result from the data not yet being georeferenced, and from the Geographically Weighted Regression algorithm including data from neighboring features (trees), which can affect the coefficient values.
  • In general, the R-squared values are quite low, with most features (trees) below 0.2. To improve the regression analysis, I think I need to georeference the data, since a Height that is the sum of ground height (elevation) and tree height can distort the regression model (intercept, coefficient, etc.) between crown width and Height. I could also add an explanatory variable such as tree species to improve the linear model.
  1. Critique of the method – what was useful, what was not?
  • The method was really useful for generating a regression model for every feature (tree). It helps in understanding the distribution of trees with different coefficients and other values, such as the standard error and residual standard deviation, because the output layer displays those values across the study area.
  • However, dependent and explanatory variables should be numeric fields containing a variety of values. Linear regression methods like GWR are not appropriate for predicting binary outcomes (e.g., when all of the values for the dependent variable are either 1 or 0). In addition, the regression model will be misspecified if it is missing a key explanatory variable; in my case, elevation and tree species.
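For readers who want to reproduce this workflow outside ArcMap, the spgwr package in R offers a comparable analysis. The following is a minimal sketch under assumptions, not the tool used above: the file name treelist1.csv and the column names X, Y, Height, and Max.crown.width are taken from the tree list used in the variogram tutorial later in this archive, and gwr.sel()/gwr() stand in for the AICc bandwidth search and fixed-kernel fit.

library(sp)
library(spgwr)   # gwr() and gwr.sel() for geographically weighted regression

# Tree list with coordinates and attributes (file and column names assumed)
treelist <- read.csv("treelist1.csv")
coordinates(treelist) <- ~X + Y            # promote to a SpatialPointsDataFrame

# Fixed Gaussian kernel with an AIC-selected bandwidth, mirroring the ArcMap defaults
bw  <- gwr.sel(Height ~ Max.crown.width, data = treelist, method = "aic")
fit <- gwr(Height ~ Max.crown.width, data = treelist, bandwidth = bw, hatmatrix = TRUE)

# Per-tree intercepts, coefficients, and local R-squared, comparable to the
# attribute table in Figure 2
head(fit$SDF@data)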

 

Source

http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-statistics-toolbox/geographically-weighted-regression.htm

 

Tutorial 2: Calculate Disturbance Severity Patch Sizes with SDMtools in R

Filed under: Tutorial 2 @ 4:29 pm

Overview

Often patch dynamics are examined under the lens of single disturbance events, but contemporary disturbance regimes demonstrate that large-scale disturbances can overlap in time and space. The spatial configuration of patches from an initial disturbance may influence the spatial configuration of patches of a sequential disturbance. The broader research question is focused on determining whether there is a relationship between patches from an initial disturbance (beetle outbreak) and a sequential disturbance (wildfire). Thus far, I have conducted analyses to determine whether there is any inherent patchiness in the burn severity raster and the beetle-outbreak raster; both indicated some level of patchiness. I wanted to go back and characterize the patches for each disturbance to generate summary statistics about the patches for each disturbance. The Entiako burn perimeter delineated our landscape for both disturbance events; the Entiako fire burned in 2012 in central interior British Columbia. I asked the following question:

What is the variability in patch size for the burn severity and beetle severity classes?

Methods

The burn severity and beetle severity rasters were generated from Landsat imagery using dNBR and NDMI pixel indexing approaches, respectively. Initially, I planned to conduct the analysis in FRAGSTATS, a software package that assesses landscape spatial patterns, but the raster seemed unreadable in the commonly used GeoTIFF format. I ended up conducting the analysis in R version 3.3.2 (R Core Team 2016) with the SDMTools package (VanDerWal 2015). The SDMTools package is based on the work by McGarigal et al. (2012) in the FRAGSTATS software package and calculates many of the same metrics. The following spatial equation was considered as a guide: yi = f(yi-h, yi+h).

In order to calculate patch metrics, I converted the continuous rasters into classified rasters. The burn severity classification was based on break points in the dNBR values defined by Reilly et al. (2017), and the beetle severity classification was based on cross-validation with orthorectified aerial photographs. Burn classes included low (plus unburned), moderate, and high, and beetle classes included none, low-moderate, and moderate-high. In the SDMTools package, I calculated patch metrics for each disturbance severity class: burn severity classes (Figure 1) and beetle severity classes (Figure 2). The following PDF includes step-by-step instructions with R code for generating patch metrics, and a minimal sketch of the core steps is given after the link:

ATaluccigeog566_exercise3_r_19May2017
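The linked PDF walks through the full script; the sketch below is only an illustration under assumptions. The GeoTIFF name and the class code for high severity are hypothetical, while ClassStat(), ConnCompLabel(), and PatchStat() are the SDMTools functions for class summaries and per-patch metrics.

library(raster)
library(SDMTools)   # ClassStat(), ConnCompLabel(), PatchStat()

r <- raster("burn_severity_classes.tif")   # hypothetical file name
m <- as.matrix(round(r))                   # floating point raster -> integer class codes

# Landscape-level summary per class: number of patches, total area,
# proportion of landscape (as in Tables 1 and 2)
cls <- ClassStat(m, cellsize = 30)         # 30 m Landsat pixels

# Per-patch metrics for one class (here high severity, assumed to be coded 3)
patches <- ConnCompLabel(m == 3)           # label contiguous patches
pstats  <- PatchStat(patches, cellsize = 30)
pstats  <- pstats[pstats$patchID != 0, ]   # drop the background "patch"
summary(pstats$area / 10000)               # patch areas in hectares (as in Tables 3 and 4)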

 

Figure 1: Burn Severity Classes for Patch Metric calculations.

 

Figure 2: Beetle Severity Classes for Patch Metric Calculations

Results

 

In the SDMTools package, I ran two analyses: one on the whole landscape and one on each burn severity class. For the first analysis, I found that high burn severity was the smallest component of this fire event, accounting for only about 15 percent of the landscape (see Table 1). This was interesting because lodgepole pine is often characterized by ‘stand replacing’ high severity fire; however, moderate burn severity, which covered about 45 percent of the landscape, can also account for ‘stand replacing’ fire in lodgepole pine dominated landscapes (see Table 1). The landscape was dominated by low-moderate and moderate-high beetle severity, with only a small portion unaffected by beetle outbreak (see Table 2).

 

Table 1: Burn severity class summary for patches, area, and proportion of landscape

Burn Severity Class Number of Patches Total Area in Hectares Proportion of Landscape
Low 382 2767.14 0.394
Moderate 448 3179.79 0.452
High 422 1082.70 0.154

 

Table 2: Beetle severity class summary for patches, area, and proportion of landscape

Beetle Severity Class Number of Patches Total Area in Hectares Proportion of Landscape
None 985 780.93 0.111
Low-Moderate 656 2968.56 0.423
Moderate-High 709 3273.30 0.466

 

 

The second analysis examined the patch metrics for each disturbance severity class. Tables 3 and 4 include patch area metrics by burn severity and beetle severity, respectively. While the output includes many other variables, I wanted to look at the variability in patch size by disturbance type and class. The high burn severity maximum and mean patch sizes are the most interesting part of Table 3 because they are quite different from those for low and moderate burn severity. Further statistical analysis is needed to determine whether these numbers are statistically different. The summary of patch size metrics for both disturbance types indicates that there are many small patches for all class types.

 

Table 3: Burn severity classes patch size summary in hectares

Burn Class Number of patches Minimum Patch size Maximum Patch size Mean Patch Size Median Patch size SD
Low 382 0.09 1449.45 7.24 0.18 77.7
Moderate 448 0.09 1461.69 7.10 0.18 75.3
High 422 0.09 445.95 2.57 0.18 22.5

 

Table 4: Beetle severity classes patch size summary in hectares

Beetle Severity Class Number of Patches Minimum Patch Size Maximum Patch Size Mean Patch Size Median Patch Size SD
None 985 0.09 77.58 0.79 0.18 4.04
Low-Moderate 656 0.09 2380.86 4.53 0.18 93.07
Moderate-High 709 0.09 1461.60 4.62 0.18 57.68

Critique of the method – what was useful, what was not?

This analysis attempt was much more complicated than anticipated. The complications mostly came from the raster format: the raster was formatted as floating point, which was not compatible with either the standalone FRAGSTATS software or the Patch Grid extension in ArcGIS. The Patch Grid extension for ArcGIS is based on FRAGSTATS and generates similar patch metrics (Rempel et al. 2012). I did not realize the issue was related to the floating point raster until after I figured out how to run the SDMTools package in R (VanDerWal 2015). Troubleshooting the raster issue led to converting the raster from floating point to integer in R. Additionally, a major challenge was figuring out how patches are defined in the SDMTools package. FRAGSTATS and Patch Grid both offer an option to define the cell neighborhood rule as 4 or 8; this option was unclear to me in SDMTools. Based on a brief comparison of outputs from FRAGSTATS and SDMTools, I believe the patch identification tool in SDMTools classifies patches based on the 8-cell neighborhood rule. I think that calculating patch statistics in R is less cumbersome and allows for better documentation of methods. While the results of this analysis are interesting to consider, they do not address the overlap in disturbance events, which could be quite interesting.

 

 

References

McGarigal, K., A. Cushman, and E. Ene. 2012. FRAGSTATS v4: Spatial pattern analysis program for categorical and continuous maps. University of Massachusetts, Amherst, Massachusetts.

Reilly, M. J., C. J. Dunn, G. W. Meigs, T. A. Spies, R. E. Kennedy, J. D. Bailey, and K. Briggs. 2017. Contemporary patterns of fire extent and severity in forests of the Pacific Northwest, USA (1985-2010). Ecosphere 8:e01695.

Rempel, R. S., D. Kaukinen, and A. P. Carr. 2012. Patch Analyst and Patch Grid. Ontario Ministry of Natural Resources. Center for Northern Forest Ecosystem Research, Thunder Bay, Ontario.

R Core Team. 2016. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

VanDerWal, J. 2015. Species Distribution Modelling Tools: Tools for processing data associated with species distribution modeling exercises.

 

May 8, 2017

Tutorial 1: Nitrate Concentration Differences: Between Sites and Over Time

Filed under: 2017,Tutorial 1 @ 10:26 am

Research Question: 

Do the differences in nitrate concentrations indicate a pattern between site locations, and do they show a pattern over time?

Tools and Approach: 

I performed the analysis largely in Excel, though ArcGIS mapping software was useful for the spatial proximity analysis. I was able to produce a chart and a representative scatter plot of the nitrate concentration differences over time between various sites. The sites are naturally grouped into three groups based on spatial location: the Rock Creek basin (the example used in this tutorial), the Marys Peak streams, and the Coastal streams.

Analysis:

First, I sorted the concentration data that I had produced for Exercise 1 (Table 1). Table 1 represents total nitrate concentrations in ppm for all sites. Next, I subtracted the concentration data between streams within varying proximity (Table 2). The second table represents the differences in concentration values; a negative value corresponds to a higher nitrate concentration in the stream that was being subtracted. This analysis allowed me to compare the nitrate data between points (stream sites) and over time; a small R sketch of the same calculation follows.
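The same bookkeeping done in Excel can be scripted. This is purely an illustrative sketch: the file name and the site column names are hypothetical stand-ins for the columns of Table 1.

# Hypothetical layout: one row per sampling date, one nitrate (ppm) column per site
conc <- read.csv("nitrate_ppm.csv")

# Table 2 equivalent: pairwise difference between two sites over time;
# a negative value means the subtracted site had the higher concentration
diff_main_trib <- conc$RockCreek_main - conc$RockCreek_trib

plot(as.Date(conc$date), diff_main_trib, type = "b",
     xlab = "Sample date", ylab = "Nitrate difference (ppm)")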

Results

Rock Creek Basin: I found larger differences (indicating higher nitrate concentrations) in the mainstem rivers than in the tributaries. Additionally, I saw higher concentrations of nitrate in the spring samples as compared to the winter samples. I expected higher values for nitrate in the mainstem, such as Rock Creek, due to the concentration of water coming from upstream; naturally the smaller tributaries would exhibit lower concentrations than the larger-order stream that catches the two. I also expected nitrate differences and concentrations to decrease in the spring. If the inputs of nitrogen to the system remain relatively stable, then the uptake length of nitrate (the most sought-after nutrient in stream nutrient cycling) would shorten and cause the sampled nitrate concentrations to decrease.

Critique of Method:

This method was useful as it made me think about my data set in a different way. Using the differences in concentrations over time, I was able to consider two variables, difference in concentration (y) and stream site location, to assess why each site exhibited the data it did. I determined that upstream site locations typically have lower concentrations of nitrate than the downstream sites. I was also able to think about how nitrate availability changes as the seasons shift from winter to spring. At first I wasn’t able to understand why nitrate concentrations could decrease in the winter months, but a combination of lower flows and increased macroinvertebrate activity are potential contributors to a decrease in concentration.

 

May 7, 2017

Tutorial 1: Visual data analysis

Filed under: Tutorial 1 @ 11:30 pm

Reva Gillman

Geog 566

  1. Research question: What is the geographic distribution of the native regions of the Japanese Tsunami Marine Debris (JTMD) species? Which regions are the most prevalent for JTMD species to be native to, and which are least prevalent?
  2. Approach that I used: visually represent native region frequency of JTMD species within 12 different coastal geographic regions of the world.
  3. I obtained a shapefile of the Marine Ecoregions of the World (MEOW), geographic coastal regions that match my JTMD species data, brought it into ArcMap, and categorized the 12 realms with different colors. After that was complete, I made an Excel sheet of the species totals per region, with a realm column matching the attribute in the shapefile. I was then able to do a ‘join’ in ArcMap to align the species data with the geographic shapefile. Then I experimented with data intervals for the species-sum categories in the legend, to arrive at the following map of the JTMD native regions:


 

Marine Ecoregions of the World (MEOW)

  1. Results: The most prevalent region for JTMD species to be native to is the Temperate Northern Pacific. This seems obvious, as JTMD originated from Japan, so we expect most species to be native to that same region of the globe. The next most prevalent native region for JTMD species is the Eastern Indo-Pacific, the region southeast of Japan. After that, however, the prevalent native regions begin to span the globe: the Temperate Northern Atlantic and the Tropical Eastern Pacific. At the other end is the least prevalent region, the Southern Ocean; only one JTMD species is native to this cold southern region.
  2. Critique of the method – what was useful, what was not? This is a largely visual interpretation of the data, without statistical analyses. However, it is very useful to be able to visualize the native regions spatially on a map of the geographic regions, and to see the different species-sum categories according to the legend, to determine high and low frequencies of occurrence. It is a good start before I further analyze other aspects of the spatial dimensions of the JTMD species.

 

Spatial autocorrelation in NDVI for agricultural fields vs the average biomass of that field

Filed under: Tutorial 1 @ 4:29 pm

 

  1. Question that you asked…

Looking at the spatial autocorrelation of NDVI within a field and the average wet biomass of that field, is there a correlation? When the NDVI distribution is patchy, is the biomass lower? Or maybe higher? Or is there no relation at all?

  1. Name of the tool or approach that you used…

To assess the patchiness of the NDVI I used a variogram, which shows the characteristics of spatial autocorrelation. There are three parameters of interest that can be read from a variogram graph: a) the range, which gives an indication of the lag distance over which values are autocorrelated; b) the sill, which represents the variance of the variable at the range lag; and c) the nugget, which should be zero but can vary slightly due to errors in the data. The variograms were used for two analyses. First, we visually compare multiple variograms from different fields categorized by biomass. Second, we analytically compare the variograms with scatterplots between i) range and biomass, ii) sill and biomass, and iii) range and sill with a biomass color scheme. For the scatter plots the correlation coefficients are determined with Pearson’s r and a p-value.

 

  1. Brief description of steps you followed to complete the analysis…

The variograms are created in R with the ‘usdm’ package.

Step1. Create a list of all the geotiff files in a folder

Step2. Create for every file a variogram.

var<- Variogram(raster,size=50)

Make sure to use a capital V in Variogram.

Step3. Plot the variograms with points and lines (note that R is case-sensitive, so use lowercase plot and par):

plot(var)                              # points

plot(var, type='l')               # lines

Step4. Plot all the variograms in one graph

par(new=TRUE)                 # this line makes sure all graphs are drawn in the same figure

 

Step5. Include legend with field number and average biomass

Step6. For every variogram, estimate the range and the sill and put them in an Excel sheet.

Step7. Create scatterplots for i) range-biomass, ii) sill-biomass, and iii) range-sill color coded for biomass. Here I switched to MATLAB because I am more familiar with it, but R could do the trick as well (see the sketch after this list).

 

Step8. Calculate Pearson’s correlation coefficient and p-value.

[R,p]=corrcoef(range,biomass)                   % repeat this for the other scatter plots.
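Steps 1-4 can be collapsed into a short R loop. The sketch below is illustrative rather than the exact script used here: the folder name is hypothetical, and overlaying plots with par(new = TRUE) does not rescale the axes between fields.

library(raster)
library(usdm)   # Variogram() for raster layers

files <- list.files("ndvi_fields", pattern = "\\.tif$", full.names = TRUE)

for (i in seq_along(files)) {
  v <- Variogram(raster(files[i]), size = 50)   # empirical variogram per field
  if (i > 1) par(new = TRUE)                    # draw on the same figure
  plot(v, type = "l")
}
legend("bottomright", legend = basename(files), lty = 1)   # Step 5: label fields

# Step 8, as an R alternative to MATLAB's corrcoef:
# cor.test(range_vals, biomass_vals)   # Pearson's r with a p-value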

  1. Brief description of results you obtained…

In the variogram plot (Figure 1) we can see that there is high inter-field variability in the spatial autocorrelation of NDVI. It is difficult to tell from the graph whether there is a correlation between biomass and variogram type. Also, there is a difference in crop type between the fields, which has a considerable influence on the biomass; for further analysis, a distinction between crop types should be made.

 

 

Figure 1: Variograms for the different fields with average biomass

 

The relation between biomass and the variogram parameters is not apparent in the scatter plots either. The Pearson’s correlation coefficients are very low, between 0.049 and 0.158, with significant p-values (at the 5% level).

Figure 2: Scatter plots for Range vs Biomass and Sill vs Biomass

Figure 3: Scatter plot for Range vs Sill with biomass color coded

 

  1. Critique of the method – what was useful, what was not?…

This method did not show a correlation between variogram parameters and average biomass for the fields; however, it is possible that distinguishing crop types would improve the results. The biggest gap in this method is the subjective estimation of the range and sill values from the variograms. In the future, a variogram model could be fitted so that the range and sill parameters are generated automatically. However, decisions also need to be made for this variogram fitting, such as the fitting type, which could introduce its own subjectivity.

May 6, 2017

Forest Structure Assessment Using Variogram in R

Filed under: 2017,Tutorial 1 @ 8:16 am

Question and tool

The questions that I try to answer in Tutorial 1 are how height and crown width vary in space in the study area, and whether those variables are autocorrelated. I use variogram analysis in RStudio to answer these questions.

The variogram is defined as the variance of the difference between field values at two locations x and y across realizations of the field (Cressie 1993). The main goal of a variogram analysis is to construct a variogram that best estimates the autocorrelation structure of the underlying stochastic process. Most variograms are defined through several parameters; namely, the nugget effect, sill, and range.

Figure 1. An example of a variogram graph

  • The nugget effect represents micro-scale variation or measurement error. It is estimated from the empirical variogram as the value of γ(h) at h = 0.
  • The sill represents the variance of the random field.
  • The range is the distance (if any) at which data are no longer autocorrelated.

Data Analysis and Result

The data used in this tutorial were generated from UAV visual images, which were processed through several software packages. Agisoft PhotoScan was used to make a 3D point cloud from the UAV visual images. The 3D point cloud was then used in the Fusion software to derive forest structure measurements, such as the number of trees, height, and canopy width. The result of this processing is a table of 804 detected trees with their height and crown width. Unfortunately, because the images were not yet georeferenced, the height in this tutorial is the sum of tree height and elevation.

Figure 2. Detected trees by Fusion software.

In Rstudio, I use the R codes that are stated below to do the variogram analysis. First, I need to install the “gstat” package and load the package using the following R codes:

> install.packages("gstat")

> library(gstat)

Next, to load my data into RStudio, I have to set my working directory, which can be done through the “Session” toolbar. The first variogram analysis is for crown width against space (x and y). The R code used for this first variogram analysis can be seen below. Based on the graph, crown width is in general not autocorrelated, although there are some patches with different nugget, sill, and range. In the first part of the graph, there is a patch with a range (distance) of about 25, which is relatively larger than the other ranges. I think this can happen due to the different crown widths in this particular study area: there seem to be clusters of trees with big crowns and clusters of trees with smaller crowns.

> Treelist1 <- read.table("D:/Oregon State University/Term/Spring 2017/GEOG 566/Exercises/Exercise 2/treelist1.csv", sep=",", header=T)

> g1 <- gstat(id = "Max.crown.width", formula = Max.crown.width~1, locations = ~X+Y, data = Treelist1)

> plot(variogram(g1))

Figure 3. Variogram graph of crown width

The next variogram analysis is for height against location (x and y). The R code and graph are shown below. From the graph, it can be seen that the semivariance increases with range (distance), which means that height is autocorrelated. I think this happens because the height data are the sum of tree height and elevation; since elevation is included, the topography of the study area influences the variance and the variogram analysis.

> g2 <- gstat(id = "Height", formula = Height~1, locations = ~X+Y, data = Treelist1)

> plot(variogram(g2))

Figure 4. Variogram graph of height.

My next analysis is the cross-variogram analysis. The R code and graph are given below. It appears that the semivariances of height and crown width are negatively related: the cross-semivariance decreases as distance increases. Cross-variogram values can increase or decrease with distance h depending on the correlation between variable A and variable B, whereas semivariogram values are by definition positive. One thing that is confusing is the negative value of the semivariance; in my understanding, a semivariance should be positive, so perhaps there is a mistake in my R code that causes this confusing result.

> g3 <- gstat(g1, id = "Height", formula = Height~1, locations = ~X+Y, data = Treelist1)

> plot(variogram(g3))

Figure 5. Cross variogram graph of crown width and height.

Critique of the method

As explained in the data analysis and results, variogram analysis is useful for assessing how forest structure parameters, such as crown width and height, vary in space in the study area, and whether they are autocorrelated. In addition, for me it is faster to run variograms in R using the “gstat” package, especially when the data set is in table format. I tried using my raster data for variogram analysis in R, but R crashed because there was not enough memory; therefore, I prefer “gstat” with tabular data. However, variogram analysis in R is limited in terms of visual presentation: it is quite hard to identify which part of the study area has higher variability (larger patches) since the result is a graph rather than a map.

Sources

http://faculty.washington.edu/edford/Variogram.pdf

https://en.wikipedia.org/wiki/Variogram

May 5, 2017

Tutorial 1: Use of the Optimized Hot Spot Analysis for explaining patterns of Socio-demographic and spatial variables of survey responses regarding flood-safety in neighborhoods

Filed under: Tutorial 1 @ 11:52 pm
  • 1. Question Asked

    How do socio-demographic and spatial variables explain patterns of survey responses about perceptions and attitudes regarding flood safety in neighborhoods?

    Location of survey responses and variable values (variable: “perception of flooding reaching your home”) are shown in Figure 1.1.

    Figure 1.1. Location of survey responses and variable values (variable: “perception of flooding reaching your home”)

    2. Name of the tool or approach used

    Various tools are used to set up the inputs for the Optimized Hot Spot Analysis. The motivation for optimizing this analysis is that I was not able to identify any clear spatial pattern (I was getting trivial results) in my initial, un-optimized hot spot analysis.

Four tools are used to accomplish our objective:

  • For finding out the “Analysis Field” for optimizing calculations
    • Generate Near table in Arc Map

     

  • For missing data:
    • Replace all missing values in numerical variables with the mean using R or/and
    • Replace all missing values in categorical variables with the median using R

     

  • For the spatial analysis I am using:
    • Optimized Hot Spot Analysis in ArcMap

     

  • For creating a prediction surface map using results from the Hot Spot Analysis:
    • Kriging in ArcMap

3. A brief description of steps followed to complete the analysis

I followed the next steps:

  • For finding out an “Analysis Field” for optimizing calculations

Find distances from neighborhood lot properties to near streams

  • Use Near features tool in ArcMap. For “Input Feature” use the lot property shape file. For “Near Features” use the streams shape file.

  • In the output table, identify “IN_FID”, which is the ID of the input feature (lot property ID).

  • Join the output table to the input features table based on “OBJECTID” of the input feature and “IN_FID” of the output table.

  • Now you have the distances in the “Input Feature” table. Next, export the table and copy the variables of interest into the survey dataset as additional values.
  • For missing data:

I have used a very simple concept: replace all missing values in numerical variables with the mean, and replace all missing values in categorical variables with the median. The task of analyzing 108 variables was coded in R; a sketch of the approach follows.
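The original code excerpt did not survive into this post, so the following is a minimal sketch of the idea under assumptions: the file name is hypothetical, and categorical variables are assumed to be coded as factors whose ordered codes admit a median.

survey <- read.csv("survey_responses.csv")   # hypothetical file of survey variables

for (v in names(survey)) {
  miss <- is.na(survey[[v]])
  if (!any(miss)) next
  if (is.numeric(survey[[v]])) {
    survey[[v]][miss] <- mean(survey[[v]], na.rm = TRUE)        # numeric: mean
  } else {
    f   <- as.factor(survey[[v]])
    med <- round(median(as.integer(f), na.rm = TRUE))           # categorical: median code
    survey[[v]][miss] <- levels(f)[med]
  }
}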

  • For the spatial analysis:

Now we are ready to perform the hot spot analysis using the “Optimized Hot Spot Analysis” tool of ArcMap. As you can see in the figure below, there is an “Analysis Field (optional)” field, which in our case is quite useful.

For my problem:

Input Features: categorical survey responses with their corresponding locations.

Analysis field (optional): corresponds to each property’s distance to the water streams. This continuous variable was calculated in step 3.1.

  • For creating a prediction surface map:

For this purpose, I have used results from the Hot Spot Analysis in the “Kriging” tool of ArcMap.

For my problem:

Input point features: Feature results from the Hot Spot Analysis.

Z value field: corresponds to the “GiZ scores” results from the Hot Spot Analysis.
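Outside ArcMap, a comparable surface can be produced with ordinary kriging in R’s gstat package. This is a sketch under assumptions: hotspots stands for a SpatialPointsDataFrame exported from the hot spot analysis, and GiZScore is a hypothetical name for the Gi* z-score field.

library(sp)
library(gstat)

# hotspots: SpatialPointsDataFrame of hot spot output with a Gi* z-score column
v    <- variogram(GiZScore ~ 1, hotspots)
vfit <- fit.variogram(v, vgm("Sph"))       # spherical model; a judgment call

# Regular prediction grid over the study area bounding box
grd <- SpatialPoints(makegrid(hotspots, n = 5000),
                     proj4string = CRS(proj4string(hotspots)))
gridded(grd) <- TRUE

surf <- krige(GiZScore ~ 1, hotspots, newdata = grd, model = vfit)
spplot(surf["var1.pred"])                  # kriged Gi* z-score surface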

4. Brief description of obtained result

The following results correspond to one variable of the survey responses (the survey contains 104 categorical variables to be analyzed).

Three scenarios of spatial correlation have been analyzed.

Scenario 1: Patterns of residents’ perceptions of flooding reaching their home correlated to the distance to all streams (Willamette River, Marys River, Millrace) surrounding the neighborhood. The results are shown in Figure 4.1.

Figure 4.1. Patterns of residents’ perceptions of flooding reaching their home correlated to the distance to all streams (Willamette River, Marys River, Millrace) surrounding the neighborhood.

The hot spot analysis yields pattern formation correlated to the distance to all streams (Willamette River, Marys River, Millrace) surrounding the neighborhood.

Scenario 2: Patterns of residents’ perceptions of flooding reaching their home correlated to the distance to two major rivers (Willamette River and Marys River) surrounding the neighborhood. The results are shown in Figure 4.2.

Figure 4.2. Patterns of residents’ perceptions of flooding reaching their home correlated to the distance to two major rivers (Willamette River and Marys River) surrounding the neighborhood.

The hot spot analysis yields pattern formation correlated to the distance to two major rivers (Willamette River and Marys River) surrounding the neighborhood.

Scenario 3: Patterns of residents’ perceptions of flooding reaching their home correlated to the distance to a seasonal stream (Millrace) crossing the neighborhood. The results are shown in Figure 4.3.

Figure 4.3. Patterns of residents’ perceptions of flooding reaching their home correlated to the distance to a seasonal stream (Millrace) crossing the neighborhood.

The hot spot analysis yields pattern formation correlated to the distance to a seasonal stream (Millrace) crossing the neighborhood.

5. Critique of the method

  • Generate Near table:

This method is useful for finding the shortest distance between two features. This information was quite useful for my analysis.

  • Replace missing data:

Missing data was one of the issues I encountered in analyzing my data. I first tried to impute missing values using the “VIM” package in R, which uses multiple imputation methodologies, but this method didn’t work out in my case: I was getting errors about problems inverting matrices, which can be attributed to the small range of values in the categorical variable vectors. So I used a very simple concept: replace all missing values in numerical variables with the mean, and replace all missing values in categorical variables with the median (IBM Knowledge Center, n.d.). This helped me a lot.

  • Optimized Hot Spot Analysis:

For my problem, I found the “Optimized Hot Spot Analysis” tool more useful than the “Hot Spot Analysis (Getis-Ord Gi*)” tool because the “Analysis field” allowed me to find and optimize cluster formation in my data.

  • Kriging:

This ArcMap tool allowed me to map cluster formation based on the hot spot analysis outputs, and it allows better visualization of spatial patches.

References

Getis, A., & Ord, J. K. (2010). The Analysis of Spatial Association by Use of Distance Statistics. Geographical Analysis, 24(3), 189–206. https://doi.org/10.1111/j.1538-4632.1992.tb00261.x

IBM Knowledge Center – Estimation Methods for Replacing Missing Values. (n.d.). Retrieved May 8, 2017, from https://www.ibm.com/support/knowledgecenter/en/SSLVMB_20.0.0/com.ibm.spss.statistics.help/replace_missing_values_estimation_methods.htm

 

Tutorial 1: Calculation, Graphing, and Analysis of Correlation Coefficients in Excel

Filed under: 2017,Tutorial 1 @ 2:27 pm

Question: How well correlated are stage gage data at different locations within a single basin, based on the amount of distance between them?

Approach: To explore this question, I used the Correlation function in Excel to do a pairwise correlation analysis of my six sets of stage gage data, and then correlated the results with the distance between each pair of instruments, measured in ArcGIS.

Methods

Calculate correlation coefficients in Excel

  1. Import stage gage arrays into Excel with one column for each site. The measurement arrays won’t match exactly, so make sure that the first value in each array was recorded within 5-10 minutes of the other first values.
  2. Create a pairwise correlation table, blanking out one half to prevent redundant calculations. Correlating a site with itself will always result in a correlation coefficient of 1, so blank out the diagonal as well.
  3. Calculate correlation coefficient for each pair of sites using CORREL function in Excel: =CORREL(array1,array2) in the corresponding cell (Table 1).

Table 1. Pairwise correlation table filled in with example values.

Calculate distance between sites in ArcGIS

  1. Import stream layer and site points into ArcGIS (Figure 1).

Figure 1. Map with imported stream layer (blue line) and site coordinates (yellow and green points). Stage gage locations are indicated with red stars.

  2. Site locations will probably not all be exactly on top of the stream polyline. Snap site points to the stream polyline using the Editor Toolbar.
    1. Make sure Snapping is on in the Snapping Toolbar.
    2. Right click on site point layer and choose “Edit Features”, then “Start Editing”.
    3. Use the Edit tool to select the site you want to snap.
    4. Drag the point perpendicular (the shortest distance) from its current location toward the stream until it snaps to it.
    5. Repeat for all other sites that are not already snapped to streamline.
    6. “Save Edits” and “Stop Editing”.
  3. Split stream polyline at intersections with site points for easier measuring.
    1. Start an Edit session.
    2. Use the Split Tool to click at each intersection of a site point and a stream polyline. This will split the line in two at the place where you click.
    3. “Save Edits” and “Stop Editing”.
  4. Create another pairwise table in Excel to input distance data.
  5. Calculate distance between points.
    1. Start an Edit session.
    2. Highlight all portions of stream between one pair of points using the Edit tool. Hold down the shift key to select multiple portions at once (Figure 2).

Figure 2. Example selection of stream polyline sections between two stage gage sites. Selected lines are in light blue.

    3. Open the stream polyline attribute table and right click on “Length.”
    4. Use the Statistics function to calculate total length (Sum) (Figure 3).

Figure 3. Attribute table with selected stream polyline sections and statistics summary.

    5. Input the total length value into the Excel table.
  6. Repeat step 5 for all pairs of points.
    1. Use “Clear Selection” to clear previously selected features before starting the process again.

Make correlation graphs

  1. Make an x-column with pairwise distances and a y-column with the corresponding correlation coefficients. Include a third column identifying the pair for clarity (Table 2).

Table 2. Distance and correlation table.

  2. Graph x vs y and examine the relationship; an R alternative to the Excel steps is sketched after Figure 4.

Figure 4. Plot of relationship between streamwise distance and correlation coefficient of all possible pairs of six stage gage sites.
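The Excel steps above can also be reproduced in R once the distances have been measured in ArcGIS. This is a sketch under assumptions: both file names are hypothetical, and the distance table is assumed to use the same pair labels that the script builds.

stage <- read.csv("stage_gages.csv")        # one column of stage readings per site
dists <- read.csv("pair_distances.csv")     # columns: pair, distance_m (from ArcGIS)

cors <- cor(stage, use = "pairwise.complete.obs")   # all pairwise CORRELs at once

ut  <- upper.tri(cors)                      # keep each pair once; drop the diagonal of 1s
res <- data.frame(pair = outer(colnames(cors), colnames(cors),
                               paste, sep = "-")[ut],
                  r = cors[ut])

res <- merge(res, dists, by = "pair")
plot(res$distance_m, res$r,
     xlab = "Streamwise distance (m)", ylab = "Correlation coefficient")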

Results: When I started this analysis, I expected to see a clear negative relationship between the distance separating instruments and their correlation coefficient. Instead, the plot that I made showed only a slight negative relationship between distance and correlation coefficient. Upon further reflection, this relationship makes sense for the basin and is the result that I hoped to see: it shows that regardless of location in the basin, the water levels are responding relatively uniformly, with only a slight delay between sites that are spaced further apart.

The points representing the Site 9 vs Site 36 and Site 9 vs Mainstem correlation coefficients could be considered outliers, with their unusually low values. This could be because the sites in each pair are located on different tributaries and are two of the most widely spaced pairs.

Utility: This method was useful for illustrating the relationships between different stage gages within my study basin. The instruments are located on different tributaries, so the relationships are slightly more complicated than I expected, but I think I will still be able to apply what I have learned from this analysis to extrapolate to other parts of the basin where I do not have stage data.

Tutorial 1: Raster auto correlation in R

Filed under: Tutorial 1 @ 1:47 pm

A Quick Primer

My spatial question is how radar backscatter (measured by the Sentinel-1 satellite) relates to features on the ground. The backscatter is influenced by slope, aspect, and forest characteristics (tree height, density, moisture content, etc.).

The question I was asking for the first exercise was whether or not my backscatter data were spatially autocorrelated. If they were, then I would need to reconsider how to approach my question (spoiler alert: the data are not autocorrelated).

I tried a few approaches, but the one that worked best used R code and the raster package in R (not a default package, but easy enough to download). However, before I could do that, I had to reduce my dataset to something manageable.

The variogram is a useful tool to measure autocorrelation. The process boils down to this:

The value of interest at each pixel (or point) is compared to the values of interest at every other pixel, and the similarity between the values is examined spatially. If there is autocorrelation in the data, then pixel values will be more similar to each other in a spatially coherent way (another way of putting it: the value of a given pixel can be predicted from the values of nearby pixels). Spatially clumped data are an example of autocorrelated data, whereas uniform or random data typically do not display autocorrelation.

The backscatter data collected by the satellite should not display autocorrelation, unless the underlying features were spatially autocorrelated. Since I was looking at data for a hilly, forested area, it is expected that the radar data should not be autocorrelated.

Getting to the Good Stuff: Methods

First, I had to import my data into ArcMap. This was after pre-processing the radar data into a spatially and radiometrically corrected raster with pixel dimensions of 10 m x 10 m. I attempted to run a variogram on this raster in Arc, but it crashed. So I attempted to run a variogram on just a subset of this raster, and it crashed again. Eventually I managed to produce a variogram in Arc on a small subset of my data. Unfortunately, there were still so many points that the variogram looked more like the shower scene from Psycho.

On to R, where I brought in my original raster to run a variogram. This time it crashed the whole computer, so I decided a small subset would be better to work with. Using the same subset as before, I brought the GeoTIFF file into R (using the ‘raster’ package), making sure that the data were not corrupted (previously I had tried to export the file from Arc, but it gave each pixel an 8-bit RGB value rather than the float decimal value of backscatter).

Above, the raster as it appeared in R. The extent of the raster was much larger than the raster itself, which I attribute to the export procedures in ArcMap. I made a histogram in R to view the values of the backscatter raster (values were in digital numbers, DN):

Using the ‘usdm’ package in R, creating and viewing a variogram was as simple as:

library(usdm)   # provides Variogram() for raster layers
var2 = Variogram(Gamma0)
plot(var2)

The resulting plot showed little correlation beyond the immediate lag distance, and a rapid leveling off.

This corresponds to the data being relatively non-correlated, which is precisely what I was hoping for.

Critique

ArcMap is wholly unfit for processing raster autocorrelation beyond a certain data volume. I’m not sure exactly where that threshold is, but the data I was working with were too much for Arc to handle. They were even too much for R to handle, which is why I had to use a subset of my data and hope that it was a representative sample of the data at large. Comparing other projects within the class, it seems like autocorrelation within Arc using variograms works when you have limited data; but when the data are composed of 100,000 individual points (or pixels), it is far too much for ArcMap. I would recommend that anyone using high resolution data stay away from the variogram and use a different autocorrelation measure. This is probably due to the process that variograms use (i.e., comparing every point to every other point), which is susceptible to a rapid inflation of required computational time.

May 4, 2017

Tutorial 1: Using a Geographically Weighted Regression to assess the spatial relationship between blue whales and chlorophyll-a

Filed under: 2017,Tutorial 1 @ 4:41 pm

Research Question

The goal of my spatial problem is to assess the relationship between blue whales in the South Taranaki Bight region of New Zealand and the environmental factors which define their habitat and influence their spatial distribution. Chlorophyll-a (chl-a) concentration is an indicator of productivity in the marine environment. Chl-a can be remotely-sensed, and the concentration reflects the abundance of phytoplankton, the tiny photosynthesizing organisms which form the base of the marine food web. Blue whales do not directly feed on phytoplankton. However, krill, which feed on aggregations of phytoplankton, are the main prey type for blue whales. Remotely-sensed chl-a can therefore be used as a proxy for the location of productive waters where I might expect to find blue whales. For this exercise, I asked the following question:

“What is the spatial relationship between blue whale group size and chlorophyll-a concentration during the 2017 survey?”

Approach

I downloaded a chl-a concentration raster layer and used the “extract values to points” tool in Arc in order to obtain a chl-a value for each blue whale group size value. I then used the geographically weighted regression (GWR) tool from Arc’s spatial statistics toolbox in order to investigate the spatial relationship between the whales and the concentration of chl-a.

Methods

The chl-a concentration layer was downloaded from the NASA Moderate Resolution Imaging Spectroradiometer (MODIS Aqua) website. MODIS data can be accessed here, and available layers include numerous sources of remotely-sensed data besides chl-a concentration (Figure 1). Chl-a concentration values are averaged over a 4 km grid cell and a one-month time period. I downloaded the raster for February 2017, as our survey lasted for three weeks during the month of February.

Figure 1. NASA Moderate Resolution Imaging Spectroradiometer (MODIS) data sources, including the chlorophyll-a concentration used in this tutorial.

I then used the extract values to points tool, located in the extraction toolset within Arc’s spatial analyst toolbox, to extract a value from the chl-a concentration raster for each of the blue whale sighting data points. This resulted in a blue whale sightings layer containing the location of each sighting, the number of whales sighted at each location (group size), and the chl-a concentration at each location (Figure 2); a scripted equivalent is sketched below Figure 2.

Figure 2. Blue whale sighting locations and group sizes overlaid with chlorophyll-a concentration.
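The same extraction can be scripted with the raster package in R. This is a sketch under assumptions: the file names and the lon/lat/group_size column names are hypothetical.

library(raster)

chla <- raster("chla_feb2017.tif")          # monthly MODIS chl-a layer
sightings <- read.csv("bw_sightings.csv")   # columns: lon, lat, group_size

# Equivalent of "extract values to points": chl-a value at each sighting
sightings$chla <- extract(chla, sightings[, c("lon", "lat")])
head(sightings)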

The geographically weighted regression tool is found within the spatial statistics toolbox in Arc. The dependent variable I used in my analysis was blue whale group size, and the explanatory variable was chl-a concentration. I used a fixed kernel and a bandwidth parameter of 20 km (Figure 3). Functionally, this means the regression weights point values that are within 20 km of one another when fitting linear relationships across the study area.

Figure 3. The geographically weighted regression tool in Arc’s spatial statistics toolbox.

Results

The results of the GWR are shown graphically in Figure 4. The GWR fits a linear equation to the data, and the values plotted spatially on the map are coded according to their deviation from their predicted values. Many of the points are > 0.5 standard deviations from their predicted values. One point, which appears in red in Figure 4, is > 2.5 standard deviations above its predicted value, meaning that blue whale group size at that location was much higher than expected given the chl-a concentration at that same location.

Figure 4. Result of the geographically weighted regression (GWR). Color codes represent the deviation from the expected values for blue whale group size according to the chl-a concentration value at that location.

The attribute table from the GWR output layer shows all of the components of the equation fitted by the GWR, as well as the predicted values for the dependent variable according to the fitted linear equation (Figure 5). Several points are far from their predicted values according to the GWR, such as the row highlighted in Figure 5.

Figure 5. The attribute table associated with the geographically weighted regression (GWR) output layer. The highlighted row is an example of where the observed value was dramatically different from the predicted value.

The local R2 values are < 0.04 for all points, demonstrating that the data do not fit the relationship fitted by the GWR well at all. My interpretation of this result is that, across the spatial scale of my study area, the relationship between blue whale group size and chl-a concentration is not linear. This is apparent from the raw data presented in Figure 2: there are several whales in an area of high chl-a concentration and some large groups of whales in an area of low chl-a concentration.

Critique of the Method

Although the results showed that there was no linear relationship between blue whale group size and chl-a concentration across the spatial extent of my study area, this is still a very useful result. The ecological relationship between blue whales and chl-a concentration is not direct: remotely-sensed chl-a concentration indicates locations of productivity, where phytoplankton are present in high abundance. These areas of high phytoplankton abundance are presumed to drive the location of dense aggregations of krill, the prey source for blue whales. However, there is likely both spatial and temporal lag between the phytoplankton and the blue whales. That such an intricate ecological relationship is not captured by a simple linear model is therefore unsurprising. I look forward to exploring this relationship further, perhaps incorporating other environmental factors that contribute to the location of krill patches, such as sea surface temperature.

Tutorial 1: Using Logistic Regression vs Ordinary Least Squares Regression.

Filed under: Tutorial 1 @ 3:04 pm

Question:

  • How is chytrid fungus presence/absence related to the distance from road or trail? I chose this question because humans are known to be one of the primary causes of pathogen transmission.
    • This question involves a relationship between presence or absence of chytrid fungus (dependent variable) and distance from road or trail (explanatory).

Tools:

  • I attempted to relate my y variable (chytrid presence/absence) to an x variable (distance from road or trail) using both logistic regression and Ordinary Least Squares (OLS) regression, via a Python script and R. Although OLS may not be the best test for my data (logistic regression is shown to be the best, based on Table 6.1 in the exercise, because my dependent variable is binary), I wanted to try it anyway to see whether there was any linear relationship between x and y. The OLS tool also outputs some cool visuals, figures, and plots, which could be useful for someone who has continuous data.

Methods:

  • To run OLS and Logistic regression, I had to prep my data. I wanted to answer the question, is the presence of Bd more likely near a road or trail (where humans are transporting the pathogen)? Therefore, I had to get distance from sample points to the nearest road. To do this, I first had to bring in multiple layers of roads and trails within Deschutes National Forest given to me by an employee. I used the “Merge” tool to bring all the layers together. My next step was to find the distance from the sample point to the nearest road or trail in my new “merged roads and trails layer”. I used the “Near” tool which generated a value representing the distance from the sample point to the nearest road or trail. Once I had that information, I ran the Ordinary Least Squares tool and logistic regression where I used Bd as my dependent variable, and distance from road as my explanatory variable. Below is my code and used for both OLS and logistic regression.

Results:

  • I am not certain whether OLS was useful; I think not, because my dependent variable was not continuous, which violates one of the assumptions of the test. Even so, the OLS results pointed in the same direction as the logistic regression, suggesting a relationship between distance from road and Bd presence. However, while the logistic regression model indicated a significant relationship, OLS indicated a non-significant one. Logistic regression works better for my data and was useful because it showed a significant relationship between the two variables. Below are some figures produced by OLS, and above are the statistical results along with the script from both OLS and logistic regression.

May 3, 2017

Tutorial 1: Cross-Correlation between tsunami inundation time series extracted from gridded netCDF data

Filed under: Tutorial 1 @ 12:44 pm

Question

Tsunamis inundate shorelines in a chaotic manner, resulting in varying degrees of flooding from location to location. The spatial variability of tsunami hazards has important implications for the tsunami-resistant design of structures and for evacuation plans, because different safety margins may be required to achieve consistent reliability levels. Currently, tsunami mitigation strategies are primarily informed by prediction models, due to the lack of nearshore field data from tsunami disasters. Thus, to investigate the spatial variability of tsunami hazards, we analyze the relationship between different test locations in our domain. The question is:

“How much does tsunami inundation vary spatially?”

Essentially: if one observes a certain flooding depth at their location, could they assume that a similar flood level is occurring elsewhere? How does this change with time? And how do certain coastal features affect this?

Tools

To answer this question, I examined the cross correlation between time series of inundation extracted from simulated tsunami inundation data. All of the tools described in this tutorial were implemented in R.

Methods

My data were stored in large netCDF (.nc) files which had to be opened in R. This required the “ncdf4” package. The netCDF file was read into R using the “nc_open” function, and its contents were printed to the Console window in RStudio using “print”, as shown in Figure 1. At this step we can use the “ncvar_get” function to extract the time and geographic coordinate variables from the “ncin” object, which contains the data from the netCDF file.

Figure 1 – Reading and viewing contents of netCDF file containing tsunami simulation data.

After this, we want to extract the time series of inundation at particular geographical points specified by the user (me). In this case, I’m interested in the water level at the jetties for Channel Island Harbor (Figure 2) and Port Hueneme (Figure 3), and how correlated they are to one another. After determining the geographic coordinates for my points of interest, the closest points within the gridded data were identified using the minimization expression “which.min(abs(desired_value - data))” in R, as shown in Figure 4. Using the indices identified by the minimization procedure, the “ncvar_get” function is used to obtain the particular time series. In this case, however, since the inundation data are very large, we only want to extract the desired subset, and thus have to be careful with the “start” and “count” inputs to “ncvar_get” (see Figure 4); a sketch in the spirit of Figure 4 is given after the figure captions.

Figure 2 – Geographic location of Channel Island Harbor Jetty

Figure 3 – Geographic location of Port Hueneme Jetty

Figure 4 – Code excerpt for extracting time series from user specified geographical points
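In the spirit of the code excerpt in Figure 4, a minimal version of the extraction looks like the sketch below. The file name, the netCDF variable names, and the jetty coordinates are assumptions for illustration.

library(ncdf4)

nc  <- nc_open("tsunami_simulation.nc")
lon <- ncvar_get(nc, "lon")
lat <- ncvar_get(nc, "lat")

# Nearest grid indices to a point of interest (illustrative coordinates)
i <- which.min(abs(lon - (-119.22)))
j <- which.min(abs(lat - 34.16))

# Extract only the series at (i, j): start at that cell, read all time steps
ha.CHjetty <- ncvar_get(nc, "ha", start = c(i, j, 1), count = c(1, 1, -1))
nc_close(nc)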

After obtaining the time series, we plot to visualize in Figure 5.

Figure 5 – Time series of free surface variation at extracted jetty locations for Channel Island Harbor and Port Hueneme

Now that we have stored our inundation time series in arrays (ha.CHjetty and ha.PHjetty), we can perform a cross-correlation using the “ccf” function from the “stats” package in R.
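A minimal call, assuming both series share the same sampling interval (the lag.max value is illustrative):

xc <- ccf(ha.CHjetty, ha.PHjetty, lag.max = 200)   # also draws the correlogram
xc$lag[which.max(xc$acf)]                          # lag of the peak correlation
max(xc$acf)                                        # peak correlation coefficient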

Results

The result from this procedure was essentially a list of correlation coefficients for specific time lag values. This information is best visualized as a plot as shown in Figure 6.

Figure 6 – Cross correlation plot between the inundation time series at the Channel Island Harbor and Port Hueneme Jetties.

Figure 6 shows that the inundation signals at the two jetties are somewhat lagged from one another, but only by a short time. Additionally, there are significant differences between the observations, with maximum correlation coefficients of ~0.6.

Critique of Method

Overall this method was quite effective and very useful for the specific analysis. From a data analysis standpoint, this setup can be effectively automated and can be used to extract the time series at any given set of locations to compare correlations. This methodology was quite simple to implement and produced simple and interpretable results.

Furthermore, this information is useful from the bigger-question standpoint in the sense that we get pertinent information regarding the inundation levels at these different locations. If we treat the tsunami as incoming waves, we can interpret the lag as how long it takes for the wave to travel from one jetty to the other. In this case, only minutes separate the largest effects of the wave at one jetty from those at the other, which suggests that mitigation actions should be taken simultaneously at both.

The other aspect to note is that there are significant differences between the inundation levels observed at these two locations. This suggests that measures that take the magnitude of tsunami inundation into account should be determined individually for these two harbors/ports. This preliminary analysis highlights a need for more localized, site-specific tsunami mitigation measures.

May 1, 2017

Tutorial 1: Assessing distribution shifts in kernel density probability surfaces

Filed under: Tutorial 1 @ 1:48 pm

Overview: question & approach

Spatial repulsion and/or shifts in distribution are often examined to identify interspecific competition effects. I therefore wanted to assess the distribution of cougar kill sites, including spatial clustering and frequency, in pre- and post-wolf time periods, focusing on identifying shifts. A simple solution to my presence-only data (kill events) was to create a density feature. However, there are several elements in my data sets, besides the presence of wolves, that could produce a shift in kill density or distribution and will need to be accounted for, including: 1) catch-per-unit-effort discrepancies (i.e., larger sample sizes of cougars, and kill sites, in one data set), or 2) time effects from seasonal distribution shifts (i.e., prey migration patterns). Before getting into an investigation of causal mechanisms, a first step is to determine whether there are differences in where cougars are killing prey between study time periods by asking:

Is the distribution of post-wolf cougar kill sites different than the distribution of pre-wolf cougar kill sites?

There are two ways to assess this question. For the purposes of this analysis the data would be “location, implicit,” with the variable of interest (cougar kill sites post-wolf) having a covariate (cougar kill site density/distribution pre-wolf) measurable at each sampled location and the causal mechanism inferred from co-variation. Density measures for each kill site could be pulled from the density raster created, but would be most relatable at the prey species level, since elk and mule deer behave differently at a landscape level. Alternatively, an isopleth (contour line) probability surface created from the density features could be used to calculate the proportion of post-wolf cougar kill sites within specified probability features of the pre-wolf cougar kill site distribution.

 

My approach to this question was to relate the proportion of post-wolf cougar kill sites (points) to the pre-wolf cougar kill site density (raster) using the probability contours (polygon feature) of kill site density as a measure of distribution. I used several tools in ArcGIS and the Geospatial Modeling Environment (GME) to carry out this analysis. GME is a standalone application that makes use of ArcGIS shape files and R software to carry out spatial and quantitative analyses.

 

Figure 1. Geospatial Modeling Environment (GME) home page.

 

Methods

I had already created point shapefiles of cougar kill sites for each time period in ArcCatalog, and used the kde call in GME to calculate kernel density estimates, or utilization distributions, for each time period (exercise 1). For that analysis, I used the PLUGIN bandwidth and a 30-m resolution cell size. Multiple bandwidth estimators are available, as well as input options for scaling, weighted fields, data subsetting, and/or edge inflation.

Figure 2. GME kde call in the command builder GUI.
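Outside GME, a comparable plug-in-bandwidth KDE can be computed in R with the ks package. A minimal sketch, where the file and column names are hypothetical:

library(ks)

# Projected x/y kill coordinates (metres); placeholder file name
kills  <- read.csv("kill_sites.csv")
coords <- as.matrix(kills[, c("x", "y")])

H    <- Hpi(coords)            # plug-in bandwidth matrix
dens <- kde(x = coords, H = H) # kernel density estimate
plot(dens, display = "image")  # quick visual check of the density surface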

 

The next step was to standardize the two time periods' KDE rasters so that densities relative to catch (in this case, kill events) per sample (cougar) were comparable. This was necessary because the sample of kill sites (and of cougars) in the pre-wolf time period was higher (by 45-48%) than the post-wolf sample. I used the raster calculator in ArcGIS to divide each period's kill site KDE raster by that period's respective kills-per-cougar 'effort'.

Figure 3. Density raster data standardization using ArcGIS raster calculator.
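The same standardization can be scripted outside ArcGIS. A sketch with the R raster package, where file names and the kills-per-cougar values are placeholders:

library(raster)

kde_pre  <- raster("kde_prewolf.tif")
kde_post <- raster("kde_postwolf.tif")

# Divide each density surface by that period's kills-per-cougar 'effort'
# (values below are hypothetical)
kde_pre_std  <- kde_pre  / 5.2
kde_post_std <- kde_post / 3.4
writeRaster(kde_pre_std, "kde_prewolf_std.tif", overwrite = TRUE)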

 

Figure 4. Raw code for analysis. GME runs like an R GUI, therefore code can be written in any text editor and pasted into the command window.

 

Using the new density rasters, I then used the 'isopleth' command in GME to create a polygon probability surface. The resulting polygons represent quantiles of interest (programmable) expressed as proportions of the density surface volume. I specified isopleths for the 25th, 50th, 75th, 95th, and 99th percentiles. I also used the 'addarea' command in GME to calculate each polygon's area and perimeter and add them to the isopleth shapefiles. Next, I used the 'countpntsinpolys' command to add a column with the count of post-wolf cougar kills in each pre-wolf cougar kill isopleth polygon. In ArcGIS, I created another column in the attribute table and used the 'Field Calculator' to fill each field with the proportion of total kills. Finally, I used ArcMap to visualize the data and make a figure showcasing the process.
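For those without GME, the 'addarea' and 'countpntsinpolys' steps can be approximated in R with the sf package. A sketch, with hypothetical shapefile names:

library(sf)

iso   <- st_read("prewolf_isopleths.shp")
kills <- st_read("postwolf_kills.shp")

iso$area_m2 <- as.numeric(st_area(iso))            # 'addarea' equivalent
iso$n_kills <- lengths(st_intersects(iso, kills))  # 'countpntsinpolys' equivalent
iso$prop    <- iso$n_kills / nrow(kills)           # proportion of all post-wolf kills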

 

Figure 5. GME command text box where you can enter code directly, copy from the command builder, or paste code constructed in another editor. Because GME uses R, you can program iterative analyses (for loops) to batch-process large data sets.

 

Results

Visual examination of the standardized kill density rasters demonstrated that part of the distributional shift observed was related to catch-per-unit effort influences, but a shift between the two highest density areas of kills was still evident from pre- to post-wolf time periods (Figure 6). This suggests other variables, like time effects from seasonal prey distribution changes or the presence of wolves, could also be factors influencing the distribution of kill density.

Figure 6. Comparison of pre- and post-wolf cougar kill site kernel density estimate (KDE) rasters before and after standardization for catch-per-unit effort (kills/cougar), with 25th, 50th, 75th, 95th, and 99th isopleth probability contours. The isopleth contours for the pre-wolf cougar kill site distribution are also overlaid with post-wolf cougar kill point locations to demonstrate the posterior distributional shift.

 

The observed shift was further evident from the proportional changes in post-wolf cougar kills within the pre-wolf cougar kill probability surface. For example, only 28.5% of all post-wolf cougar kills were within the 50% pre-wolf cougar kill probability contour. Even if I exclude the kills outside the study area boundary, the proportion of kills in each probability contour was 5-22% lower than would be expected based on the pre-wolf kill site distribution.

 

Table 1. Pre-wolf cougar kill site probability contour attribute table. Isopleths represent the 25th, 50th, 75th, 95th, and 99th percentiles. 'Out' refers to areas outside the probability contour surface. Number of kills (No. Kills) is the number of post-wolf cougar kill sites, % is the proportion of all kills within each polygon 'donut', and % Kills is the cumulative proportion of all post-wolf cougar kills within each percentile class.

Method Critique

Standardization of the two kill density rasters improved visual interpretation of the spatial patterns and accounted for one of the other factors (catch-per-unit effort) that might mask distributional influences related to wolf presence. However, as with the visual examination of the non-standardized density rasters, this method allowed only an implicit understanding of space-time concepts and lacked inferential measures of significance to quantify the shifts and formally relate the patterns of cougar predation across time. Further, because the isopleth contours represent quantiles expressed as proportions of the density surface volume, they were identical whether created from the standardized or non-standardized density rasters. Using the probability contours as a 'prior' distribution of kill sites offered a more robust and meaningful measure to quantify the shift in cougar kill distribution. However, the mechanism behind the shift could still be related to factors other than wolf presence, like seasonal shifts in prey distribution or density. Both methods (standardization and prior distribution comparison) were useful and provide evidence of a shift in where cougars are killing prey (i.e., the answer to my question is yes), but further analysis is necessary to infer causal mechanisms.

April 30, 2017

Tutorial 1: An analysis of the spatial distribution of the North American deer mouse (Peromyscus maniculatus) in the Toiyabe mountain range, NV, USA.

Filed under: 2017,Tutorial 1 @ 10:07 pm

Background & Question

The North American deer mouse (Peromyscus maniculatus) is the most widely distributed rodent species in North America and has the highest proportional abundance across many of the rodent communities in the Great Basin. Understanding its distribution in space is important for understanding its impact on the spatial arrangement of the rest of the rodent community to which it belongs. In this regard, I wanted to evaluate the degree to which this species is clustered in the Toiyabe mountain range in central Nevada. I expect the pattern of P. maniculatus distribution in the Toiyabe range to reflect its distribution in other, similar mountain ranges in the Great Basin, and thus that it exerts similar forces across the landscape. The first step in understanding the relationship between the distribution of P. maniculatus and the other species in small mammal communities was to map the distribution of P. maniculatus. Here I used a descriptive statistical technique of spatial analysis, specifically Hot Spot analysis. Because the data are spatially explicit at the scale of interest, the spatial relationship I was interested in measuring can be expressed using the equation for spatial autocorrelation below (Eq. 1), where the number of P. maniculatus at location i is a function of the number of P. maniculatus at location i ± h, and h is some displacement in space.

Eq. 1.               P. maniculatus_i = f(P. maniculatus_(i±h))

I expected that, at the scale of interest, the number of individual P. maniculatus captured would be distributed approximately equally across trap sites. Hot Spot analysis provides an index of spatial clustering ranging from clustered to random to dispersed. Clustering suggests a spatial process of attraction, while dispersion suggests repulsion between points. These patterns could be explained by the ecological processes of intraspecific commensalism and competition, respectively. Alternatively, if the distribution of points is random, it may suggest that the underlying process producing the distribution is not captured by the sampling design, or that the distribution of individuals is not spatially dependent. I do not expect the distribution of P. maniculatus to be spatially dependent.

Methods

Raw data tables represent all trap data from 2009-2011 for the Toiyabe mountain range small mammal survey. Each individual animal is associated with explicit and implicit spatial data, as well as biological measurements including body size and age class. Additional information about collection and preservation methods is also associated with individual animals. I collapsed this comprehensive data set by constraining my analysis to one species, Peromyscus maniculatus. I then collapsed the dataset further by converting it to a count of individual P. maniculatus at each unique location. Explicit location data were recorded as latitude and longitude using the NAD 1927 datum.
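A minimal sketch of this collapsing step in R, with hypothetical file and column names:

# Placeholder file; assumes columns species, longitude, latitude
traps <- read.csv("toiyabe_traps.csv")
pm    <- subset(traps, species == "Peromyscus maniculatus")

# Count of individuals at each unique trap location
counts <- aggregate(list(n = pm$species),
                    by = list(lon = pm$longitude, lat = pm$latitude),
                    FUN = length)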

Using ArcMap 10, I projected these points onto a relief base map of North America. To accomplish this, I imported the raw data as a .csv file into ArcMap.

.csv file

After importing the .csv file, set the datum for the projection by right-clicking the .csv file in the table of contents, selecting Layer Properties, and setting the data source.

import .csv

Within the data source menu, select the datum your coordinates were measured in, and indicate the X and Y coordinates you wish to display.

.csv properties

Set datum and XY

Once the datum is set, it is easiest to select a base map to project your points onto. From your toolbar, select the Add Basemap button and choose an appropriate map. Here I used terrain with labels.

Select Basemap

Once your points are projected onto the base map, you will need to export them as a shapefile. Right-click the layer you wish to export and choose the option to export the data as a shapefile.

Export shape file

Perform Hot Spot analysis on this shape file by assigning the count data to each point.

Perform hotspot analysis

The output of the Hot Spot analysis is the confidence level that clustering of high values (hot spot) or of low values (cold spot) is occurring at each location.

Hot spot output 
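For reference, a comparable local G hot spot analysis can be run in R with the spdep package. A sketch using the 'counts' data frame built above; the 5 km neighbourhood distance is an arbitrary assumption:

library(spdep)

# Distance-based neighbours from lon/lat coordinates (distances in km)
xy <- cbind(counts$lon, counts$lat)
nb <- dnearneigh(xy, d1 = 0, d2 = 5, longlat = TRUE)
lw <- nb2listw(nb, style = "B", zero.policy = TRUE)

# Local G z-scores: |z| > 1.96 flags hot/cold spots at roughly 95% confidence
gi <- localG(counts$n, lw)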

Results

Hot Spot analysis of P. maniculatus based on trap line locality provided mixed insights. Specifically, this analysis does not show how P. maniculatus is distributed on the landscape, but rather how the researchers distributed trap lines on the landscape. However, it does demonstrate that across trap lines P. maniculatus is essentially evenly distributed. Only one trap line represents a hot spot for this species, while all other localities were neither significantly cold nor hot, that is, neither clustered nor dispersed. The single hot spot occurred at a point in space where a human food subsidy likely drove increased population density, as it was located within a campground.

 

Final map

Discussion

Despite the non-random distribution of point localities, the distribution of P. maniculatus across the points was consistent with the generalist ecology of this species. This suggests that members of the small mammal community at all measured points in space will be influenced by the presence of P. maniculatus. However, the results of this analysis do not suggest that the influence of this generalist species will be uniform across localities, as the habitat and the other rodent species present vary in space regardless of the distribution of P. maniculatus.

While Hot Spot analysis is a useful tool for understanding the distribution of individual occurrences in space, sampling exerts a strong influence on the inference one can make from it. Trap line data do not provide much inferential power because the distribution of points is determined by the researcher; at best, the hot spot analysis provides the same information that a histogram might, with count as the response variable and the latitude or longitude of each trap line as the explanatory variable. The greatest value in performing this analysis on this type of data is that it provides a standardized visual representation of the data, which may help me see how variable the distribution of a species is between trap lines. Moving forward, however, I will be using elevation rather than latitude and longitude as my spatially explicit variable.

 

April 28, 2017

Tutorial 1: Calculating Global Moran’s Index on Point Data

Filed under: 2017,Tutorial 1 @ 2:19 pm

A quick summary of my project and objectives: I am using a subset of the FIA (Forest Inventory and Analysis program run by the USFS) dataset, which is contained in a .csv file. I want to assess the understory vegetation response to ponderosa pine canopy cover and other environmental factors, and I am using USFS plot data to simulate the process I will be using on my own dataset in the future. The dataset currently looks like this:

Obviously there are multiple variables associated with each plot, but for the sake of this exercise, I will only be assessing canopy cover and total vegetation cover. Because my dataset is not a raster, nor is it regularly gridded, I decided not to attempt a semi-variogram. I was also unable to kick my ArcMap habit for this exercise. I decided to run a Global Moran’s Index assessment on the points in ArcMap.

The first step of this process was to load my .csv into Arc. This can be accomplished by adding the file as X,Y data. However, you can’t do much with X,Y data for our purposes, so I exported the X,Y data as a shapefile. This created an attribute table holding all the same information as the .csv file. I also added a boundary of Deschutes County, which is where my subset of data is located. I used the Project tool to make sure both the points and county boundary were properly oriented on the landscape, using the WGS_1984_UTM_Zone_10N coordinate system.

From here, the process was very simple. I used the search tool to seek out a spatial autocorrelation tool, which brought up the Global Moran’s Index. I selected my points layer as the input, and selected canopy cover and total vegetation as my input fields (in separate trials).
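The same workflow can be scripted in R with sf and spdep. A minimal sketch; the file and column names are hypothetical, and the 8-nearest-neighbour weights are an assumption (ArcMap's tool defaults to an inverse-distance conceptualization):

library(sf)
library(spdep)

plots <- read.csv("fia_subset.csv")
pts   <- st_as_sf(plots, coords = c("LON", "LAT"), crs = 4326)  # WGS84 lat/long
pts   <- st_transform(pts, 32610)                               # WGS 1984 UTM Zone 10N

# k-nearest-neighbour spatial weights, then Global Moran's I
lw <- nb2listw(knn2nb(knearneigh(st_coordinates(pts), k = 8)))
moran.test(pts$canopy_cover, lw)   # repeat with total vegetation cover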

The reports showed the levels of clustering for each variable. The reports are shown below.

Canopy cover spatial autocorrelation:

Vegetation percent cover spatial autocorrelation:

I found that the variables had very different levels of clustering. Canopy cover showed essentially no clustering: the Moran's Index value was -0.034, with a p-value of 0.57, meaning the clustering was non-significant. Vegetation cover, however, was significantly clustered, with a Moran's Index value of 0.0496 and a p-value of 0.00028.

This method was useful in giving me a different perspective on the data. It is difficult to simply review a spreadsheet and try to make any meaningful interpretation of the numbers. While the tool did not give any sort of visualization, it allowed me to imagine how the vegetation might be distributed across the landscape. The results mean that canopy cover varies across the focus area randomly, so stands with thin or dense canopies can be found just about anywhere. However, stands with dense understory vegetation are probably located in similar regions, as are stands with thin understories. This might imply that other environmental factors have a strong influence on how densely vegetation grows over a wide area.

Tutorial 1: Species Area Curves with Vegetation Community Data

Filed under: 2017,Tutorial 1 @ 1:58 pm

Salmon River Estuary: Map of study site and plot locations

Species-area curves represent the relationship between a specified area of habitat and the number of species present there. They are useful for comparing species richness data from sample units that differ in area, especially nested sampling plots. Empirically, the larger the area, the larger the number of species you are likely to capture; this principle has held remarkably consistent in ecology. Species-area curves can address the adequacy of sample size, estimate the overall diversity and spatial heterogeneity within a community dataset, and help conceptualize the relationship between species richness and spatial scale in a sampled environment. Typically, species-area curves take only species presence into account and show the average number of species per plot sampled. They can also help researchers determine the appropriate spatial grain for a study area, as the spatial grain (the size/dimensions of the smallest sample unit) has a strong impact on the shape of a species-area curve. Species-area curves are a basic diversity measure that is helpful for describing the heterogeneity of a community sample. Species accumulation curves are derived by a log transformation of the species-area curve, and describe a linear trend of how species richness increases with area sampled. The species accumulation curve is also called the 'effort' or 'discovery' curve, as the species richness measured is often related to the amount of time and space sampled by a researcher. The goal of my tutorial is to demonstrate how I created species-area and species accumulation curves to investigate spatial patterns of vegetative richness within and between tidal marshes at my study site.

Equation for the species-area relationship: S = cA^z, which in log form becomes log(S) = log(c) + z log(A)

Where S = species richness, A = area, and c and z are empirically determined constants from plotted data (Rosenzweig 1995).
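The constants can be estimated by fitting the log form directly. A minimal sketch in R with illustrative numbers (not the study's data):

# Hypothetical mean richness S for each sampled area A (m^2)
sar <- data.frame(A = c(1, 10, 100, 1000),
                  S = c(5, 9, 14, 21))

fit   <- lm(log10(S) ~ log10(A), data = sar)
z_hat <- unname(coef(fit)[2])     # slope z of the species-area relationship
c_hat <- 10^unname(coef(fit)[1])  # back-transformed intercept c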

The research question I have relevant to this exercise/tutorial is: Do species area relationships differ between restored and remnant tidal marshes at Salmon River Estuary? How do field methods, specifically, nested, rectangular (Modified Whittaker) plots and square, non-nested (Transect) plots capture these species-area relationships differently?

Between the two sampling techniques I applied, I would ultimately expect the Modified Whittaker plots, which cover more area, to capture at least as many if not more salt marsh species than the transect plots. I hypothesize that the reference marsh will have a greater number of species, and a steeper species-area curve, than the restored marshes. I would also expect more recently restored marshes to have less steep curves than marshes that were restored longer ago.

Name of the tool or approach that you used:

PC-ORD is a statistical software package, a counterpart to R, that performs ordinations and multivariate analyses on community datasets. Microsoft Excel was also used for graphics. Initially I used PC-ORD to generate the average number of species per plot over cumulative area sampled, for each plot size in each tidal marsh. I then exported the data to Excel, produced graphs of cumulative species over log10 area, and calculated the slope of each line. I kept it simple and used software I am familiar with, to expedite the process of producing useful results I can interpret and be confident in.

Brief description of steps you followed to complete the analysis:

Step 1: Enter data into an excel spreadsheet and format for PC-ORD

Open PC-ORD. Start a new project (File > New Project). Name the project appropriately and import the Excel sheet as the 'main matrix' input. Save the project. Use the summary command to produce a species-area curve output; the output will include the average number of species per plot over sampled distance. Repeat this for every plot size sampled (1 m², which occurs in both transect and MW sampling, plus the 10 m², 100 m², and 1,000 m² MW plots).

Save the PC-ORD output and import it back into Excel. Graph average species richness against area sampled for the species-area curves. Do this for the transect 1 m² plots and the MW 1 m² plots, for each tidal marsh, using Excel commands (highlight data > Insert > scatter plot). Calculate the log10 of area sampled and the log10 of species richness for all nested MW plot sizes (1 m², 10 m², 100 m², 1,000 m²). The trendline is a fitted linear trend that describes log species richness vs. log area.
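As an aside for readers working in R: the vegan package can generate comparable species accumulation curves directly. A minimal sketch, where 'marsh' (hypothetical) is a plots-by-species abundance matrix for one tidal marsh:

library(vegan)

sac <- specaccum(marsh, method = "exact")
plot(sac, xlab = "Plots sampled", ylab = "Cumulative species richness")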

Brief description of results:

The species-area curve for the transect data rises and plateaus quickly, whereas the MW curve rises more steadily and plateaus later (Figure 1 a-d, Figure 2 a-d). MW plots had higher overall species richness (21) and lower species turnover (1.7) than the transect data (richness 17, turnover 4.5). This was an expected result, as there were fewer, larger MW plots covering more area than the transect plots. Furthermore, the transect plots lie along environmental gradients related to salinity and elevation, suggesting greater species turnover within sample units compared to MW plots, which are not aligned with gradients (McCune and Grace 2002; Rosenzweig 1995). Average species accumulation for transects (16.26) is similar to that for MW plots (17.93), suggesting that both represent adequate sample size and appropriate spatial grain for this study site (Figure 2 a-d, Table 2). ANOVA and post hoc t-tests indicated that all tidal marshes have significantly different species accumulation rates.

Critique of the method – what was useful, what was not?

I would describe this method as useful, fast, and simple. It was reasonably accessible and easy to do without requiring extensive knowledge of statistical programs or software packages. However, I would not have found it so easy had I not just completed a course at OSU on PC-ORD (taught by Dr. Bruce McCune, who created the software). Needing two programs to get my results was a bit cumbersome; however, a quick initial Google search did not turn up any helpful code for producing semilog species-area graphs and species-area curves in R. Pursuing R seemed likely to be more frustrating, with results that would have been less clear to me. I am satisfied with my approach and pleased with my outcome, as it is simple and the results make sense to me.

Example of Results:

 

 

 

 

Tutorial 1: Using variograms to assess patchiness in raster pixels for a single wildfire event

Filed under: 2017,Tutorial 1 @ 10:32 am

Overview: Question & Tool

Often patch dynamics are examined under the lens of single disturbance events, but contemporary disturbance regimes demonstrate that large-scale disturbances can overlap in time and space. The spatial configuration of patches from an initial disturbance may influence the spatial configuration of patches of a sequential disturbance. The broader research question is focused on determining whether there is a relationship between patches from an initial disturbance (beetle outbreak) and a sequential disturbance (wildfire). Before addressing the broader question, I would like to determine whether the post-fire landscape indicates some inherent patchiness. The research question here asks:

What is the patchiness (variability) of fire severity for a single fire disturbance?

In order to do this, I employed a variogram to assess the spatial pattern of a post-fire image. The post-fire image was generated for the Entiako fire, which burned in 2012 in the central interior of British Columbia.

Methods

The burn severity raster was generated following protocols outlined by Key and Benson (2006). I collected pre- and post-fire Landsat (30 m) images for the Entiako fire. Both images were from the same month, to match phenology, and had minimal cloud cover. I calculated the differenced Normalized Burn Ratio (dNBR), a pixel-based index. First, NBR is computed for each individual image:

NBR = (NIR – SWIR) / (NIR + SWIR)

 

Then the difference is taken between the pre- and post-fire NBR images:

dNBR = NBR_prefire – NBR_postfire

The dNBR image is useful for differentiating a gradient of fire severity within a fire perimeter.
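A sketch of this calculation with the R raster package; band file names are placeholders, and which bands are NIR and SWIR depends on the Landsat sensor used:

library(raster)

nbr <- function(nir, swir) (nir - swir) / (nir + swir)

nbr_pre  <- nbr(raster("pre_nir.tif"),  raster("pre_swir.tif"))
nbr_post <- nbr(raster("post_nir.tif"), raster("post_swir.tif"))

dnbr <- nbr_pre - nbr_post          # differenced Normalized Burn Ratio
writeRaster(dnbr, "dnbr.tif", overwrite = TRUE)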

A variogram was used to assess the spatial variability between pixel values in the dNBR image. A variogram is a means of evaluating spatial correlation between locations, and the shape of the variogram may indicate the level of patchiness and average patch size. The following spatial equation was considered as a guide:

  • y_i = f(y_(i−h), y_(i+h))

The variogram analysis was conducted in R version 3.3.2 (R Core Team 2016) with the usdm package (Naimi 2015). I used the following code:

Variogram_Tutorial
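For readers without access to the linked script, a minimal sketch of the core call, assuming the dNBR image was written to dnbr.tif:

library(raster)
library(usdm)

dnbr <- raster("dnbr.tif")
v <- Variogram(dnbr)   # samples pixel pairs; re-runs vary slightly
plot(v)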

 

 

 

Results

The variogram analysis indicated a spherical relationship in the dNBR image, which suggests some level of patchiness (Figure 1). The lag distance at which the variogram levels off at its sill suggests that patches averaged about 1,500 m, which seemed a bit high (Figure 1). The dNBR image contained a couple of large bodies of water, which may influence the variogram, so I removed the water bodies and ran the variogram on the water-free image (Figure 1). The water-free analysis shows a shift in the sill location and indicates that average patch size might be closer to 500 m (Figure 1). The variogram without water also appears to have a double sill, which might indicate that there are both smaller and larger patches.

Critique of the method – what was useful, what was not?

The variogram seemed like a useful descriptive statistical tool for assessing the spatial relationship of pixels, both as an indicator of overall patchiness and as an indicator of average patch size. It allowed us to examine this relationship while maintaining the continuous dNBR raster; conducting the variogram in R was the key to doing so, since conducting the analysis in ArcGIS would have required converting the continuous raster to point data. This analysis also allowed us to consider both implicit and explicit concepts of time. The dNBR image captures implicit time by measuring the difference between the pre- and post-fire images, while the variogram captures explicit time by describing the inherent patchiness of the post-fire landscape. It is important to recognize that the variogram analysis takes a sample of the data, and that the sample depends on the spatial resolution and extent. If you rerun the variogram analysis, the curve is slightly different each time because the model re-samples the data set; in trying this a few times, the sill remained at a relatively consistent lag distance.

 

Figure 1. Variogram for the Entiako dNBR image with water included (a) and excluded (b). The red circles indicate the sill; the lag distance at which it is reached provides a general indication of patch size. With increasing distance (lag) there is increasing variability. The dNBR images (c and d) are for the Entiako fire, which burned in 2012. Image c was used for variogram a, and image d was used for variogram b. Removal of large bodies of water is visible in d (black circle), corresponding to variogram b.

 

 References

Key, C. H., and N. C. Benson. 2006. Landscape assessment (LA): Sampling and analysis methods. USDA Forest Service General Technical Report RMRS-GTR-164-CD:1–55.

Naimi, B. 2015. Package "usdm": Uncertainty analysis for species distribution models. R package documentation.

R Core Team. 2016. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

April 24, 2017

Japanese tsunami marine debris biota

Filed under: 2017,Final Project,My Spatial Problem @ 12:49 pm

 

Research problem: Japanese Tsunami Marine Debris species

Six years ago, the devastating Tohoku earthquake and tsunami struck the coast of Japan. Since then, it has become evident that hundreds of coastal species from Japan have crossed the Pacific Ocean on tsunami debris, including species that have become invasive and caused ecosystem/economic damage elsewhere. As of January 2017, scientists have documented the arrival of >650 debris items, referred to as Japanese Tsunami Marine Debris Biofouling Items (JTMD-BF) 1-651. Debris items include docks, buoys, boats, pallets, and wooden structures. Items were identified as JTMD if they 1) had clear identification, such as a serial or registration number, linked to an object lost during the tsunami of 2011; 2) had clear biological evidence of originating primarily from the Tohoku coast of Japan; or 3) a combination of these factors. The BF items are a subset of all the debris lost, as some debris items known to be lost from Japan remain undocumented – it is possible they ended up in remote locations. A huge effort by taxonomists to identify the species on JTMD has generated a comprehensive species list: currently, around 300 taxa have been collected on JTMD from North American and Hawaiian coastlines since 2012. I am interested in several spatial aspects of the dataset as research questions. First, I want to explore the geographic distributions of the species with invasion history compared to those without. I also want to look at a few species of interest and map their geographic distributions for comparison (both native and non-native on the same map). Lastly, I want to explore the different methods of transport, or vectors, that have been documented for 31 of the JTMD species, to better understand dispersal mechanisms and patterns.

Dataset: JTMD Database

My work is part of an international effort to evaluate the risks associated with JTMD and associated species.  I contributed to the development of a database of life history, distributional, and environmental attributes of many JTMD species. This database is being used for reference and analysis, and can aid in efforts to assess the risk associated with JTMD species, and in determining which species are of highest concern for establishment in certain areas. The database contains a little over 100 JTMD species, with 26 attributes per species. It has information for native geographic regions and non-native (if applicable) geographic regions, as well as vector information, meaning the methods of transport that species have used to disperse to different regions of the world.  The regions are classified as the Marine Ecoregions of the World. It also has survival temperature and salinity regimes, reproductive information, habitat, depth, trophic level, abundance data, and more. The database does not include temporal information, but these species arrived on debris items in pulses for years after the tsunami event, and that is another interesting aspect of the phenomenon.

Hypotheses: While the JTMD species are largely native to the Northwest Pacific (near Japan), they are also native to regions from all over the world. I hypothesize that most species without invasion history are native to the NW Pacific, and those with invasion history are native to regions all over the world. I expect to see this represented when their native regions are mapped. For the species that have been found outside their native range, I expect to see that some are only just outside native ranges in a few areas, but some with a history of invasion are more globally distributed.

For the individual maps of species of interest, I am not sure how they will compare, but I expect some to be more global, and some to be more localized, and only invasive in one realm.

For the vector types, I expect to see a lot of overlap of geographic regions between different types of vectors, because there is a lot of overlap in documentation. What I mean by overlap is that most species that were introduced to a certain region via ballast water were also introduced via aquaculture/fisheries trade, for example, and that can result in similar looking maps for different vector types.

Results: Please refer to these links for some more detail, and clearer maps on the JTMD species native region distributions: Reva Gillman_geog566exercise1 JTMD species of interest individual range maps: Reva Gillman_Exercise2 , and lastly JTMD Vector distribution: Reva Gillman_exercise3_geog566

 

JTMD species native regions

The following map shows the total number of JTMD species native to each region, with a legend on the side specifying what range each color represents.

Results: The most prevalent region that JTMD species are native to is the Temperate Northern Pacific. This seems obvious, as JTMD originated from Japan, so we expect most species to be native to that same region of the globe. The next most prevalent native region for JTMD species is the Eastern Indo-Pacific, the region southeast of Japan. After that, however, the prevalent native regions begin to span the globe: the Temperate Northern Atlantic and the Tropical Eastern Pacific. At the other end is the least prevalent region, the Southern Ocean; only one JTMD species is native to this cold southern region.

 

 

 

Looking at Individual Species geographic ranges

What are the geographic distributions of some of the ‘riskier’ species that arrived on Japanese Tsunami Marine Debris (JTMD)? What do their native and non-native ranges look like, and how do they compare with each other?

First of all, I had to choose 4 species with clear invasion history out of the 104 JTMD species to focus on. Out of the 31 with invasion history, I chose Asterias amurensis (a seastar) and Hemigrapsus sanguineus (the Japanese shore crab), since they are known to be an issue in other regions. I then chose two species with large non-native spread: Crassostrea gigas (the Japanese oyster), an economically important aquaculture species, and Teredo navalis, a shipworm with an almost global distribution that has burrowed into wooden ships all over the globe for hundreds of years.

The approach I used was within ArcMap. I manipulated the Layer Properties in ArcGIS ArcMap 10.4.1 to symbolize the polygons in the shapefile in different colors according to which realms were documented as native or non-native regions for each species.

Crassostrea gigas geographic distribution

 

 

Teredo navalis geographic distribution

JTMD species vector distribution:

The maps below show the number of species that were introduced to each realm via each vector type. As you can see by looking at the first map for natural dispersal, the most prevalent regions are Temperate Northern Atlantic, and Temperate Northern Pacific. For the second map of moveable structures, the most prevalent regions are Temperate Northern Pacific, and Southern Australia/ New Zealand.

For solid ballast, the most prevalent region for this vector is Southern Australia/ New Zealand. For recreation, the most prevalent regions are Temperate Northern Atlantic, along with Southern Australia/ New Zealand.

For the method of transport of ballast water, the most prevalent regions are Temperate Northern Pacific, Temperate Northern Atlantic, along with Southern Australia/ New Zealand, and Southern Africa.  For aquaculture and fisheries trade, the most prevalent regions are Temperate Northern Pacific, Temperate Northern Atlantic, along with Southern Australia/ New Zealand.

Approaches: The approach I used was within ArcMap. I chose to visually represent the native-region frequency of JTMD species within 12 different coastal geographic regions of the world. For the individual species maps, I manipulated the Layer Properties in ArcGIS ArcMap 10.4.1 to symbolize the polygons in the shapefile in different colors according to which realms were documented as native or non-native for each species. For the last part of my project, I visually represented the vector-distribution frequency of JTMD species within the same 12 coastal regions, with a different map for each vector type.

Significance: Human-mediated transport of marine species across the globe through ballast water and hull fouling has been a concern for some time. JTMD is unique in comparison to these marine vectors in that it can transport large numbers of marine species across ocean basins. Shipping routes are direct and arrive in known locations and at measurable frequencies, whereas JTMD, which is propelled by winds and currents and travels at much slower speeds than ships, can arrive almost anywhere at any time. JTMD is potentially the transport vector with the most random distribution yet described. Due to the slow rates of transport by currents rather than propulsion, the effects of drag and dislodgement are reduced on JTMD in comparison with ship hulls. Furthermore, JTMD transports large numbers of adults, rather than the larval stages that are more common in ballast water. As of January 2017, only one JTMD species, the striped beakfish Oplegnathus fasciatus, has been observed free-living along the west coast of North America (in Oregon and Washington). At this time, we do not know whether any of these JTMD species will become established outside of their current distributional range as a result of the earthquake and tsunami. But by studying the JTMD species' invasion histories, and the geographic distributions of both their native and non-native areas, we can better understand this large dispersal event and inform vector management response for future events.

The maps that were created during this class are useful for my research, and are really nice ways to visualize the native regions of the JTMD species, the distribution of the JTMD species, and the vector distribution of the JTMD species. I can potentially use these maps for reports and thesis work on JTMD species in the future.

My Learning: I started out the class as a beginner in spatial analysis, with no experience in GIS or Arc-Info. Being a beginner, I had to do a lot of self-exploration, tutorial watching, and asking for help from peers and teachers. I am happy to report that I gained a lot of comfort using ArcMap: I can now get a shapefile into ArcMap, add data as another layer over the shapefile, join data, manipulate layer properties, and create maps with legends. I am happy with these skills, learned a lot about beginner ArcMap usage, and feel more comfortable talking about GIS mapping with others now.

Statistical knowledge: I didn’t use statistical analysis with my project, as my data was for very large realms of the marine regions of the ocean, and didn’t lend itself to statistical analysis, so I did exploratory analysis and visualizing as means of analysis. However, by talking to others in the class, and learning about what their methods were, I learned some new methods for spatial analysis, like variograms, that I can potentially use in the future.

Comments: I got very insightful comments from students on my tutorials. One suggestion was to look at the frequency of occurrence for each debris item and map that, which I wasn't able to incorporate because I don't have the necessary data, but it would be a good approach if we did. I also used Julia's comments on my exercises to guide my exploration for later exercises. For example, one of Julia's suggestions was to map the number of species introduced to each region by a vector, and that is exactly what I ended up doing to visualize the vector distribution of JTMD species.

 

 

April 20, 2017

Exercise 1 Part 2: Assessing spatial autocorrelation of snowpack using semivariograms

Filed under: Tutorial 1 @ 10:50 am

Research Question

For this exercise, I elected to compare the spatial autocorrelation of snow cover frequency (SCF) before and after the Grays Creek wildfire in central Idaho.

Approach/tools used

After trying out a few different tools, I decided to use ArcGIS to create semivariogram plots for my SCF data within the Grays Creek wildfire perimeter.

Analysis procedure

First, a bit of data processing was required for this exercise. Since spatial autocorrelation analyses in ArcGIS are more straightforward with point data, I converted my SCF data from its original raster format into an array of equally spaced points (see example in the figure below), with a point created at the center of each 500 m pixel. This process was completed for both the winter (Oct. 1 – May 1) preceding the wildfire and the winter following it. Finally, the semivariogram clouds were produced using ArcMap's Geostatistical Analyst toolbox.
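An equivalent raster-to-points-to-cloud workflow can be sketched in R with the raster and gstat packages (the SCF file name is hypothetical):

library(raster)
library(gstat)

scf_r <- raster("scf.tif")
pts   <- rasterToPoints(scf_r, spatial = TRUE)  # one point per 500 m pixel centre
names(pts) <- "scf"

# Semivariance for every point pair plotted against separation distance
vc <- variogram(scf ~ 1, data = pts, cloud = TRUE)
plot(vc)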

Results

SCF Pre-fire Semivariogram cloud:

SCF Post-fire Semivariogram cloud:

Overall, the semivariograms display similar distributions between the pre- and post-wildfire data. The semivariogram clouds show the empirical semivariance for each pair of points plotted against the lag (distance) separating them. For these data, higher semivariance values correspond to large differences in SCF, while low values indicate similar SCF values. Although subtle, the post-fire semivariogram does appear to have a less defined trend of increasing semivariance with distance. There is not enough information to say whether this is a result of severely burned wildfire patches, but it would be interesting to examine this potential relationship further.

As expected, data point pairs generally become less similar as the distance between them increases. In both plots, the semivariogram clouds display a steady increasing trend. This trend is likely a product of the snowpack gradient that exists across elevations at a local scale – as elevation increases, so does snowpack. In other words, points that are closer together will be more similar in elevation, and since snowpack is closely linked with elevation at a local scale, nearby points will also have more similar SCF values.

 

Method critique

The main critique of this procedure was the variable that I chose to analyze. Because snowpack is undoubtedly spatially autocorrelated at close distances (due to the strict elevational relationship), the semivariogram did not reveal any patterns that I was not already aware of.

However, I anticipate the semivariogram to be a useful spatial analysis tool as I move forward with my research. The semivariogram statistics would probably be much more appropriate for either my vegetation index data or burn severity data as these datasets have a less predictable distribution across a landscape.

April 7, 2017

Topographic constraints on climate drive range shifts in Great Basin small mammals

Filed under: My Spatial Problem @ 8:55 pm

A description of the research question that you are exploring.

In 2015, Elsen and Tingley published "Global mountain topography and the fate of montane species under climate change" in the journal Nature Climate Change. They combined two data sets, the global data set of mountain ranges from Natural Earth's physical vectors and a high-resolution, near-global DEM (SRTM30), to evaluate the assumption that available non-vertical surface area decreases with increasing elevation in mountain ranges. This is an important question in community ecology and conservation, as it applies directly to a known relationship between species diversity and area established by the field of island biogeography and more widely applied in recent decades: as physical space increases, the number of species that space can hold increases. Elsen and Tingley addressed this question across 182 of the world's largest mountain ranges. However, they did not apply their analytical methods to any of the mountains in the Basin and Range Province, except for the Sierra Nevada range, which makes up its western border, and the Rocky Mountains on its eastern border. The Basin and Range is comprised of north-south trending mountain ranges and has been the focus of several studies addressing questions relevant to island biogeography. Additionally, modern (2009-2016) and historical (1929-1931) small mammal surveys have established small mammal community composition for a number of these mountain ranges. To better understand the dynamics underlying the distribution of small mammal species along elevation gradients in these ranges, I would like to perform an analysis similar to Elsen and Tingley's on three ranges for which historical and modern small mammal communities have been studied: the Toiyabe, Ruby, and Snake ranges. Specifically, these analyses would address the question: to which hypsographic classification are species range shifts subject? I hope to better contextualize the range dynamics of species in response to climate and vegetation changes. Furthermore, understanding how available area changes with elevation in these ranges will enable better predictions about a species' ability to remain within its thermal envelope by moving upslope.

 

A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

Predominantly, I plan to use the SRTM30 DEM in combination with Landsat data to evaluate area as a function of elevation. I am not sure which dataset will be best for the vegetation analysis. I have spatially explicit data on the distribution of small mammal species along elevation; however, the small mammal data are not continuous along the elevation gradient, or within habitat/elevation bands.

The following description of the SRTM30 dataset has been copied from

< http://topex.ucsd.edu/WWW_html/srtm30_plus.html>

SRTM30_PLUS V1.0 November 11, 2004

INTRODUCTION: This data consists of 33 files of global topography in the same format as the SRTM30 products distributed by the USGS EROS data center. The grid resolution is 30 seconds which is roughly one kilometer. Land data are based on the 1-km averages of topography derived from the USGS SRTM30 gridded DEM data product created with data from the NASA Shuttle Radar Topography Mission. GTOPO30 data are used for high latitudes where SRTM data are not available. Ocean data are based on the Smith and Sandwell global 2-minute grid between latitudes +/- 72 degrees. Higher resolution grids have been added from the LDEO Ridge Multibeam Synthesis Project and the NGDC Coastal Multibeam Data. Arctic bathymetry is from the International Bathymetric Chart of the Oceans (IBCAO) [Jakobsson et al., 2003]. All data are derived from public domain sources and these data are also in the public domain. The pixel-registered data are stored in 33 files with names corresponding to the upper left corner of the array shown below.

The USGS SRTM30 data and documentation is available at

ftp://edcsgs9.cr.usgs.gov/pub/data/srtm/SRTM30

 

Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

Elsen and Tingley produced four categories of hypsographic classification based on the relationship between area and elevation in mountain ranges: diamond, hourglass, inverse pyramid, and pyramid. These describe mountains whose area increases then decreases with elevation, decreases then increases, increases, or decreases, respectively. They found that approximately 68% of the 182 ranges they analyzed were not categorized by the pyramid shape. Importantly, the pyramid category describes "the dominant assumption in ecology and conservation that area decreases monotonically with elevation from a mountain range's base" (Elsen and Tingley 2015). The Basin and Range Province, however, is the result of 40 million years of geologic and geomorphic processes, including continental rifting, erosion, mantle uplift, Pleistocene pluvial lake bank erosion, glaciation, and Holocene warming and drying. Of these forces, I specifically expect the listric normal faulting that characterizes the Basin and Range to have resulted in pyramid-shaped mountains. I also expect that ranges will be spatially autocorrelated with respect to the skew of east- versus west-facing slopes.

 

Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data. 

I would like to learn how to overlay mountain range polygons atop a high-resolution digital elevation model. I will need to learn how to use the R package 'raster' to make the hypsographic curves described above and by Elsen and Tingley. I would also like to use Arc to make maps.
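As a starting point, a hypsographic (area-elevation) curve can be sketched with the raster package. The DEM file, boundary shapefile, and elevation breaks below are placeholders, and a projected equal-area CRS is assumed so that cell area is constant:

library(raster)

dem   <- raster("toiyabe_dem.tif")          # placeholder DEM path
range <- shapefile("toiyabe_boundary.shp")  # placeholder range polygon

dem_r <- mask(crop(dem, range), range)      # clip DEM to the range
elev  <- na.omit(values(dem_r))

bands    <- cut(elev, breaks = seq(1500, 3600, by = 100))  # 100 m elevation bands
area_km2 <- table(bands) * prod(res(dem_r)) / 1e6          # cell count x cell area

barplot(area_km2, horiz = TRUE, xlab = "Area (km^2)")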

 

Expected outcome:

I would like to elucidate the area-elevation relationship to which small mammal species, and all species, are subject in the mountains of the Great Basin. Ideally, I would also like to evaluate available area by habitat type, based on the distribution of vegetation type and percent cover along elevation. In this regard, I would like to produce maps of habitat types and graphs that illustrate the percent cover of a habitat type as a percent of total available space on the mountain. It would also be nice to show which microhabitat features are associated with the climate/weather measurement stations in place on these mountain ranges. In addition, I would be excited to make predictions about the constraints on a species' ability to move upslope based on available area; this may be represented as a plot of species richness against a combined elevation-area score.

 

Significance.  

The Great Basin has a unique geological and physiographic history in North America and has experienced an accelerated pace of human impact throughout the Anthropocene. The combination of human land-use practices and climate change has significantly altered small mammal community composition, pushing it outside its range of natural variability. Understanding the physical constraints on a species' ability to persist on the landscape is critical to predicting how shifting community dynamics are constrained by the physical environment.

 

Your level of preparation:

I feel comfortable using R and exploring new packages, and I am typically able to find the support I need when trying to understand or apply the tools in a new R package. I have limited experience with Arc-Info and/or ArcGIS, though I do know a number of the basic rules to follow when using these tools. I have extremely limited experience with Python.
