Category Archives: 2022

Spring 2022

Investigating the relationship between in-stream course woody debris and the riparian environment using terrestrial mobile LiDAR

The research question that you asked (provide one question for each exercise).

Ex 1: How does the morphology of stream segments with log buildup differ from those with less debris accumulation?

Ex 2: How are local in stream debris concentrations (pieces per unit area) related to stem density measured in intervals upstream?

Ex 3: How does debris volume influence surrounding slope?

A description of the dataset you examined, with spatial and temporal resolution and extent.

My data consist of a terrestrial mobile LiDAR-derived rasterized DEM. The data were acquired last summer in a tributary to McRae Creek in the HJ Andrews Experimental Forest. A 1km stream reach was surveyed with LiDAR over the span of a few hours. A section measuring roughly 50 meters in length was extracted from the full longitudinal survey to work on in this course.

Hypotheses: predictions of patterns and processes you looked for.

I expected to find clusters of debris along the stream reach, representing an uneven distribution through space. I expected that the underlying geomorphology of the streambed, the surrounding bank slopes, and tree stems would be related to this phenomenon. I also expected to find that a larger concentration and volume of debris would impart a more drastic local change on stream morphology than areas with smaller/less debris.

Approaches: analysis approaches you used.

Discrete LiDAR return elevation values were interpolated using a TIN algorithm and then rasterized. A hillshade was created from the raster DEM to aid in visual identification of individual debris points, which were then discretized with manually-placed points. Kernel density on the points was run.

Riparian stem points in the riparian buffer zone were assigned points with the method used for the in-stream debris. A valley bottom flowline was manually drawn, and buffered with fixed radius plots measuring 10 meters in diameter spaced in 5 meter intervals along the flowline. Debris and stem points were sampled and run through a cross-correlation function in R.

In CloudCompare, the CANUPO supervised objects classifier was trained with points manually identified as wood debris and segmented from the point cloud. Points classified as debris were fitted with cylindrical geometric primitives using the RANSAC algorithm. Best fitted solids were used to extract volume of individual debris. Volume values were assigned to discretized debris points in ArcGIS. Each point was buffered with a 3 meter fixed radius plot, wherein the DEM’s slope was sampled. Slope as a response to debris volume was fitted to a linear model in R.

Results: what did you produce — maps? statistical relationships? other? Present the key, important results you created.

Figure 1: Kernel density produced a heatmap of debris concentration along the stream corridor.

Figure 2: Cross correlation produced a model and associated charts that did not indicate any significant relationships.

Figure 3: Discretizing stem and debris across the DEM produced GIS feature layers.

Figures 4&5: The CANUPO and RANSAC algorithms produced a classified point and fitted cylinders, respectively.

What did you learn from each of the analyses you conducted (i.e., from each exercise)?

Exercise 1: Kernel density successfully identified hotspots of high debris concentration along the stream segment, confirming the patterns seen through visual inspection of the DEM. This method validated the proof of concept implementation of points as debris.

Exercise 2: The cross correlation function was unable to distinguish patterns of lagged correlation between stem and debris density, likely due to too small of a sampling spatial extent and/or sampling frequency. This suggests that small-scale relationships between stream morphology and debris may be too noisy for analysis.

Exercise 3: The CANUPO and RANSAC algorithms were moderately successful at extracting debris and volume from the stream point cloud. However, segmentation of debris was not precise enough to fully remove all points from the cloud, producing remnant chunks of debris points that likely interfered with linear modeling of slope and debris. This exercise also suggested that modeling the effects of debris volume on slope on a per-debris basis (as opposed to total volume in a logjam) may not show any reliable patterns.

For all exercises, the methods used for sampling were successful, in that they were able to extract desired metrics at a given spatial scale and extent.

Significance. How are these results important to science? to resource managers?

Understanding how debris and riparian trees/vegetation interacts with stream morphology is an important topic in watershed and riparian management. Relating riparian tree density and buffer width to stream morphology conditions is relevant to OFPA regulations that stipulate harvest retention guidelines around sensitive areas. Study of forest conditions that promote healthy streams is important. The use of terrestrial mobile LiDAR to study these phenomena offers a way in which innumerable metrics of the stream environment may be extracted with great precision and accuracy, paving the way toward more sophisticated study of watersheds.

Software learning. Your learning: what did you learn about software (a) Arc-Info, (b) GIS programming in Python, (c) programming in R, (d) Modelbuilder in Arc,or (e) other?

These activities have illustrated the relative strengths and weaknesses of ArcGIS’s toolbox of functions. Arc makes it easy to develop a sampling design and workflow and conduct some spatial analyses, but other programs offer greater flexibility, ease of use, and functionality in certain cases. CloudCompare has emerged as a powerful open-source program that I have relied on for working directly with the point cloud, which is something that Arc struggles with. R has worked well as an environment for experimenting with different methods and statistical analyses functions, allowing for ease of use when working with tabular data.

Statistics learning. What did you learn about statistics, including (a) hotspot, (b) spatial autocorrelation (including correlogram, wavelet, Fourier transform/spectral analysis), (c) cross-correlation/regression (cross-correlation, geographically weighted regression [GWR], regression trees, boosted regression trees), (d) multivariate methods (e.g., PCA, multiple component analysis), (e) other techniques (change detection/confusion matrices, other)?

Hotspot analysis produced a good visual aid for the logjams along the stream, which might end up as a figure in my dissertation. I suspect spatial cross correlation will serve well as a tool investigate change over distance along streams when I incorporate the whole of my datasets in the analysis.

Evolving question. How did the results of each analysis lead you to change/refine your question? Write out the original question you stated at the beginning of the class, and restate the question(s) you now plan to address.

Original question: How do physical aboveground elements in riparian areas such as herbaceous vegetation, course woody debris, and trees affected debris concentration in headwater streams?

Revised questions:

How does debris, locally and collectively, influence stream morphology, and what is the spatial scope of that influence?

How do riparian trees and their biometrics (height, DBH, volume, live crown ratio) relate to in-stream debris concentration and volume?

What mediating role does riparian terrain (slope and roughness) play between in-stream debris and riparian trees?

How do upstream trees and their biometrics relate to downstream debris concentrations and volume?

Future techniques. What techniques would you like to explore to answer your research questions in the future?

I would like to see how machine learning could predict how stream morphology relates to the debris and riparian metrics I explored in the course.

I would like to use deep-learning to extract advanced tree metrics from the point cloud (stem volume, branch volume, leaf/crown volume) to feed into the model.

I would like to modify my sample design with a dynamic stream buffering function that will extract bankfull width, so that purely in-stream metrics may be analyzed.

I would like to implement a VBET (valley bottom extraction tool) with the stream DEM to replace the manual drawing process. I would also like to automate point assignment to debris, possibly by working with a rasterized DEM of classified debris points produced by the CANUPO process. I would like to implement automated stem extraction from a LiDAR derived CHM (canopy height model) using the Lidr package in R.

I would like to incorporate herbaceous vegetation into my analysis.

I would like to explore point patterns of debris along the streamline using Ripley’s K.

Spatial Patterns of Vegetation in Restored Tidal Wetlands of a River Estuary

Research Questions

Exercise 1: What are the spatial patterns of vegetation species distribution across sampled points?

Exercise 2: Is there spatial cross-correlation between pairs of plant species presence along a sampled transect? Is there spatial cross-correlation between the presence of a given plant species and elevation along sampled transects?

Exercise 3: How did vegetation presence change between 2015 and 2019? Where was there a gain or loss of vegetation, and what areas remained vegetated or unvegetated?

Datasets

For the first component of Exercise 1, I examined a 2021 elevation and vegetation dataset. Elevation was collected with an RTK-GPS and vegetation species presence and maximum height were recorded at each point. Sampling was done every 50m on a grid. For the second component (autocorrelation) of Exercise 1, and for Exercise 2, I used an elevation and vegetation dataset from 2021 resampling of permanent transects (which have been sampled nearly annually since 2009). The same sampling method was used as the previously described dataset, except sampling was done every 1m along each 50m transect. Additional data was collected but was not used for these analyses.

For Exercise 3, I used 4-band multispectral aerial imagery from with 0.25m resolution from 2015 and 2019.

Hypotheses

I predicted that broadly, the spatial pattern of plant species would be clustered at the site scale, due to differences in abiotic conditions (i.e. salinity and elevation). I predicted that there would be some differences between spatial patterns, which I expected is due to smaller scale differences in abiotic conditions as well as biological interactions (not being investigated). In terms of site-wide vegetation change, I predicted there would be a net gain in the extent of emergent high marsh over time, if with the restoration of tidal influence there has been sufficient sediment availability for vertical accretion to occur via a positive feedback loop between accretion and vegetation growth (Kirwan et al. 2013). (I did not end up looking at specific habitat types or plant communities, but still would predict vegetation gain on the mudflat in higher elevation areas.)

Approaches

Exercise 1: point pattern analysis (average nearest neighbor) and autocorrelation

Exercise 2: cross-correlation

Exercise 3: confusion matrix and change detection map

Results

In Exercise 1, I produced maps displaying the presence and absence of two marsh plant species, marsh jaumea and saltgrass. I produced statistical relationships for the average nearest neighbor analysis. For this analysis, I selected a subset of points so that there would not be gaps in the sampling coverage; the sites sampled were not contiguous, creating gaps that would have affected results. The observed mean distance for saltgrass is 48.61m, which reflects the 50m sampling grid. The observed mean distance for marsh jaumea is 70.14m.

Clipped area for nearest neighbor analysis and results of analysis for saltgrass

For the autocorrelation component, I produced plots. For one of the transects, I found significant autocorrelation for lags 1-4. I then appended two transects from a different restoration unit to increase the sample size to 100 points, and found significant autocorrelation at all lags, decreasing over space.

Top: auto-correlation for saltgrass presence/absence on appended transects in Phase II; Bottom: presence/absence of saltgrass along transects in Phase II

For Exercise 2, I produced cross-correlation plots for the relationship between saltgrass and jaumea presence/absence along two transects and between saltgrass presence/absence and elevation. For the pairs of species, there was no significant cross-correlation on one of the transects. For the other transect, there is some significant cross-correlation (maximum ~0.35) for lags -4 to -13, decreasing over space. I believe this indicates significant cross-correlation of saltgrass to the left of a point with jaumea. For elevation and saltgrass, there is significant cross-correlation between lags -2 to 2, and the plot is fairly symmetric. For the other transect, there is significant cross-correlation from lags -4 to 6. These findings make sense to me, as I tended to see saltgrass at higher elevations within the tidal frame.

Cross-correlation for saltgrass and marsh jaumea

Cross-correlation for elevation (m, NAVD88) and saltgrass

For Exercise 3, I produced a map of vegetation change, and a sort of confusion matrix (summary statistics for the percent represented by each category). I believe that there are inaccuracies in this analysis (see next section), but it does appear that by 2019 there was some vegetation colonization surrounding vegetation that existed in 2015.

Analysis Learnings

In Exercise 1, the average nearest neighbor analysis didn’t turn out to work well for my data, due to gaps between sampled areas, as well as observer-determined sampling points (50m grid). For the autocorrelation analysis I used the transect data because pulling out a transect from the grid wasn’t enough data to use. I learned that the presence (or absence) of saltgrass at one point did tell me something about the presence (or absence) at the next few points for one transect, and when I appended transects for another unit, there was autocorrelation at every point.

In Exercise 2, I used a very limited set of data (one pair of species for two transects, and one species and elevation for two transects). The plots look relatively different for the sets of transects, which I expected as I consciously chose two that would be different (different habitat, one with a transition from mudflat to vegetation, etc.). There was significant cross-correlation between saltgrass and elevation, and I’m interested in investigating relationships for more species and transects. Additionally, I think using salinity instead of elevation will be informative.

Exercise 3 was primarily useful for learning the process of a basic change detection analysis and becoming aware of data issues I’ll need to address. For example, some parallel patterns of vegetation gain and loss on the northeastern island indicated that the channels are not lining up well and I will likely need to do new georeferencing. Additionally, I believe there are areas of the large mudflat that were categorized as vegetated but are actually algae, showing up as a loss of vegetation.

Significance

The results from autocorrelation and cross-correlation in Exercises 1 and 2 show some promise for predicting the presence of vegetation. If this is the case upon further investigation, this may be useful in modeling future post-restoration trajectories (i.e. under different sediment accretion scenarios, where would vegetation be predicted to colonize?). Additionally, cross-correlation between species may provide information on common plant associations. I think that these analyses are likely most useful as initial steps that might inform future analyses.

The “confusion matrix” and map of change detection give a site-wide view of vegetation change (though currently without the nuance of habitat type). Once some issues are addressed and new results are produced, the map will be a helpful visual of patterns of change over time. For example, a freshwater wetland transitioned to mudflat post-levee breach in 2009, and there need to be elevation gain for vegetation colonization to be possible. Maps such as this produce a visual of whether this has occurred, whether it’s occurring in specific areas and/or as a result of known or unknown processes (i.e. the eastern side where it was predicted there would be more sediment input). This analysis contributes to monitoring efforts, and can inform resource managers with adaptive management decision-making (I.e. could plantings or sediment application be warranted)? This research fits into a larger monitoring framework at the site. Additionally, monitoring contributes to knowledge about the time frame and trajectory of restoration, which may inform the design of future restoration projects.

Software learning

I used ArcGIS Pro for data visualization, point pattern analysis, and the confusion matrix and change map. The steps involved in these analyses introduced me to more geoprocessing/spatial analysis tools in Arc. I used R for data manipulation (into the presence/absence and elevation format I needed) and for spatial autocorrelation and cross-correlation, which were new R functions for me. I did not end up using Python or Modelbuilder in Arc.

Statistics learning

I learned that point pattern analysis is a good option for presence absence data, though my sampling points being observer-determined hindered the average nearest neighbor analysis (as well as gaps between sampling sites). I chose spatial autocorrelation because I have evenly spaced count data, and determined that presence/absence data would indeed work as a count. In the limited bit I’ve heard about autocorrelation in the past, it’s been in terms of checking for violations of model assumptions (I.e. regression analyses), so it was helpful to learn about new applications for univariate analysis.

Cross-correlation was a new statistical method to me. One limitation I found was that I was unable to run this analysis in the instance where a species was present at every sampling point. In working on change detection, I was able to think through ways to deal with the issue of NDVI not being standardized between years, and using unsupervised classification. I hadn’t run these analyses before, so I learned a lot about the process!

Evolving question

My objective in “My Spatial Problem” was to explore the spatial patterns of vegetation species and communities, how vegetation community structure has changed since restoration, and how this related to geomorphological change via changes in sediment delivery and inundation regimes.

Wow, that was a broad question! For the first two exercises, I ended up focusing on individual vegetation species, rather than tackling any sort of community analysis (to come in future research questions). My original question was missing relating vegetation species presence to a variable B (elevation for exercise 2); I had skipped ahead to broadly stating that I wanted to relate vegetative change to geomorphic change.

My restated questions are: How is species distribution related in space to physical environmental variables (i.e. elevation, salinity, proximity to channels)? How are patterns of vegetation change related to geomorphic change?

Future techniques

I would like to continue working on change detection, looking more into habitat classification methods. For example, I would like to learn more about unclassified habitat supervision and whether manual adjustments need to be made for areas of potential misclassification (such as algae being classified as veg). In the future, I will incorporate empirical data to classify points on spectral signatures, and then do image classification. Additionally, I’d like to look into other values for classification such as the Soil Adjusted Vegetation Index (SAVI) that adjust for soil reflectance, or other classification methods such as object-based classification. Ultimately, I will want to perform change detection by habitat type (mudflat, salt marsh, riparian floodplain, etc.) to better quantify restoration progress.

I would like to explore dissimilarity analyses, such as Bray Curtis, to look at changes over time in resampled vegetation quadrats. I’ll be exploring ordination techniques once I’m further along in thinking about vegetation community analysis. I’ll also be doing a lidar change detection analysis and relating this to vegetation change (technique options to be explored!).

Functional Diversity of Disparate Taxa Along the Steens Mountain-Alvord Desert Elevation Gradient

J.A. Laney – Final Project Blog Post GEOG 566

Background & Research Questions

One chapter of my dissertation research is focused on exploring patterns of functional diversity and heterogeneity–diversity relationships in communities of disparate small-bodied vertebrate taxa. I am interested in (1) understanding how functional diversity (estimated by metrics such as functional richness, functional divergence, functional redundancy, functional dispersion, etc.) vary within and across communities of songbirds and small mammals along the elevational gradient of my study system, and (2) relating these patterns to changes in habitat heterogeneity that occur across elevation. In this course I focussed on the first of these two objectives by exploring autocorrelation of functional metrics of communities and also by describing the dissimilarity of communities by distance.

In Exercise 1, I set out to assess the spatial autocorrelation of various functional metrics across the 16 localities for which I have both bird and mammal survey data. In Exercise 2, I asked how taxonomic and functional Sørensen dissimilarity of passerine bird and small mammal communities in my dataset vary as a function of geographic distance. In Exercise 3., I explored the sensitive of my analysis dissimilarity of bird-mammal communities to changes in the underlying data, specifically when filtering bird observations by distance from observation points?

Description of the Dataset

The dataset I analyzed is comprised of records of small mammal and bird occurrences, as well as associated habitat information, from a comprehensive biological survey project I am leading on a desert-montane gradient in the Northern Great Basin: The Steens Mountain Resurvey Project. To date, I have surveyed 21 discrete localities across the elevational gradient of this region with sites ranging in elevation from the mountain’s summit to the basin floor (Figure 1.). Each locality consists of a circular survey area with an approximate area of 0.8km. Localities were selected based on the availability of historical small mammal survey data and to maximize sampling of the elevations and habitats in the Steens Mountain-Alvord Desert system. Of the total localities in this project, I selected 16 sites where both bird and small mammal data were collected for the years 2019 and 2021.

Within each survey locality (Figure 1.), I conducted avian point count surveys for breeding songbirds and small mammal trapline surveys in the summers of 2019 and 2021. Point-counts were conducted within 4 hours of sunrise between early June and early July to coincide with hours of peak bird activity during the breeding season in this region. I detected birds by sight or sound and recorded each bird’s distance from the observer to the nearest meter using a digital rangefinder. Small mammal trapline surveys consist of removal trapping along multiple traplines (usually 3-6) arrayed at each survey locality to capture the heterogeneity of habitat types and plant communities and to detect the highest diversity in the shortest amount of time. Using a combination of baited Sherman live-traps, Havahart traps, museum snap traps, Victor rat traps, and pitfall traps along traplines, I targeted rodent and shrew species (Orders Rodentia and Eulipotypla) (<500g). In most scenarios, mammal surveys were conducted for a minimum of four nights to maximize sampling. Small mammal surveys are conducted between early July and mid-August, tailored to align with periods of historical sampling while accounting for moon phase.

Analytical Approach

Methods – Exercise 1.

To characterize functional diversity of the passerine-rodent-shrew communities, I calculated functional metrics, such as functional richness, functional redundancy, functional divergence, and functional dispersion of communities using unique trait combinations of the birds and mammals detected in each survey locality in this project. These metrics were derived using the “mFD” packing in program r for each locality using the abundance data of the birds and mammals detected during field surveys. To assess spatial autocorrelation of the various functional metrics across the 16 localities for which I had both bird and mammal survey data. To do so, I calculated the global Moran’s I correlation coefficient for each functional metric.

I began by sorting my bird point-count records by taxonomic order (Passeriformes) and distance (< 100 meter from observer). I then joined bird data with small mammal data by locality. I selected unique mensural traits from the ecological literature that were both biologically relevant and shared both the small-bodied songbirds and small mammals in this system. These traits included body mass, litter/clutch size, % diet invertebrate, % diet vertebrate, %diet scavenged, % diet seed, % diet fruit, and % diet plant other. Traits came from two different databases, the “Amniote” database, and “Elton” database.

I calculated standard functional metrics for each locality using the “mFD” r package and matrices of species by site, and traits by species. To calculate functional redundancy, I first binned traits using the Sturgis algorithm to derive “functional entities, or species with unique trait combinations (UTCs). In r, I generated a distance matrix of localities using their associated geographic coordinate information. I then took the inverse of the matrix values and replace the diagonal entries with zero to complete a distance matrix that I could use to assess spatial autocorrelation. Once I had functional metrics for each locality, I computed Moran’s I in the r programming language using the ‘Moran.I’ function in the ‘Ape’ package. This was a relatively straight forward process using minimal lines of code, though I fist had to create a custom function to compute the coefficient across multiple columns representing individual functional metrics.

Methods – Exercise 2.

For this exercise 2, I was interested in exploring how the taxonomic and functional dissimilarity of the passerine bird and small mammal communities in my dataset vary as a function of geographic distance. Specifically, I modeled the Sørensen dissimilarity between localities, as well as its turnover and nestedness components, as both a function of geographic distance in kilometers and xyz distance (latitude, longitude, and elevation) to see how dissimilarity is related to distance.

I began by calculating the pairwise taxonomic dissimilarity of all pairs of localities using the Sørensen index in the ‘betapart’ package in R. The output of this package provides overall Sørensen distance, as well as the turnover and nestedness components of the dissimilarity. I then calculated the pairwise functional dissimilarity of all pairs of localities using the functional beta diversity function in the ‘mFD’ package (which utilizes the ’betapart’ framework). The output provides the functional Sørensen distance between all pairs of localities, as well as the functional turnover and nestedness components of the dissimilarity. Next, using the ‘sf’ package, I created geographic centroids for all my localities and calculated the pairwise geographic distance in km for all localities using the ‘st_distance’ function in that same package. Considering that these localities are distributed along an elevation gradient in a mountain-desert system, elevation is “built into” the geographic distances between localities. However, I also wanted to explicitly incorporate elevation into the distances. Thus, I used the ‘scatterplot3d’ package to create a 3-dimensional space composed of the xy (latitude and longitude) and z (elevation in meters) coordinates of the localities (Figure 2.). I then used the ‘dist’ function to calculate the xyz distance between all pairs of localities in this space. Finally, I modeled distance decay of taxonomic and functional pairwise dissimilarity, turnover, and nestedness for all pairs of localities against geographic distance in xy space and xyz space and plotted the results. These analyses were done using the ‘decay.model’ and ‘plot.decay’ functions in the ‘betapart’ package.

Methods – Exercise 3.

For this exercise 3., I was interested in assessing parameter sensitivity of the spatial pattern I described in Exercise 2. I tested the sensitivity of the dissimilarity distance decay analyses by filtering the dataset so that it only contained passerine bird observation that were within 50 meters or less of the points during surveys. I use distance-detection methods in my avian point count surveys while in the field, whereby I estimate the distance from me to all birds detected during a count to the nearest meter (calibrated by a laser rangefinder). Thus, I have distance estimates for every data point. In Exercise 2, I filtered bird observation to 150 m from observer. This is large distance and probably more than is reasonable for the dataset as it could introduce error, such as misidentified birds and potentially double counting of birds within localities due to overlapping count radii. As I wanted to test the method while ensuring I had enough species points to calculate the functional diversity, I decided to leave in those observation for the the first pass of this analysis in Exercise 2. In this exercise, however, I filtered the bird observations to passerines only detected within 50m or less of the observer during point counts. This is a much more reasonable distance and is in line with other approaches in passerine data collection methods using point counts. This filtering step reduced the number of observations in the bird-mammal input dataset to 1211 from the original 1668 species records and from 56 bird species to 51. I did not modify the small mammal data.

Figure 2. A simple 3-dimensional representation of the space containing the xy (latitude and longitude) and z (elevation in meters) coordinates of the 16 localities used in this analyses created using the ‘scatterplot3d’ package in R.

Results

Using the results of the Moran’s I global test (Table 1.) in Exercise 1., we can reject the null hypothesis that there is no spatial autocorrelation present for a given functional metric across these localities if p-value is < 0.05. Based on these results, functional richness, functional originality, and the number of functional entities appear to be spatially autocorrelated. All other functional metrics are not spatially autocorrelated correlated across the localities within the extent of this study.

Table 1. Output from Moran’s I analysis of all functional metrics across all localities.

The results from Exercise 2. show that turnover is the primary driver of taxonomic Sørensen dissimilarity between all localities (Figure 3.). Unlike taxonomic dissimilarity, functional dissimilarity does not seem to be driven exclusively by either turnover nor nestedness components and the relationship is less clear. I did not see a major difference in distance decay when dissimilarity is plotted against either xy or xyz distance, thus I have only presented the results from the xyz distance analysis here. The results after adjusting the dataset were strikingly similar to the results I obtained in the previous analysis (Figure 4.). As in Exercise 2, we do see a triangle plot emerge. This shows that taxonomic dissimilarity can be both extreme and low in near localities, but only extreme in localities that are spatially separated to greater degrees. This may indicate some lower bound of taxonomic dissimilarity as distance increases.

Figure 3. Taxonomic dissimilarity and functional dissimilarity as a function of three-dimensional distance (latitude, longitude and elevation in meters) for communities of rodents, shrews, and passerines modeled using a power function. Triangle symbols denote pairwise comparisons of localities. The y-axis indicates dissimilarity, and its turnover and nestedness components, in species composition (A – C) and functional composition (D – F) between localities (measured using the Sorensen dissimilarity index), with higher values indicating more dissimilar communities. Also shown are the slope (b) and coefficient of determination (R2) for the fitted models.

Figure 4. Taxonomic dissimilarity and functional dissimilarity as a function of three-dimensional distance (latitude, longitude and elevation in meters) for communities of rodents, shrews, and passerines modeled using a power function after filtering the data to only include bird observations < 50 m from observer. Triangle symbols denote pairwise comparisons of localities. The y-axis indicates dissimilarity, and its turnover and nestedness components, in species composition (A – C) and functional composition (D – F) between localities (measured using the Sorensen dissimilarity index), with higher values indicating more dissimilar communities. Also shown are the slope (b) and coefficient of determination (R2) for the fitted models.

Interpretation & Significance

Functional richness (FRic), functional originality (FOri), and the number of functional entities appear to be spatially autocorrelated based on the results of the Moran’s I global analysis. FRic indicates reflects the amount of niche space filled by species in the community. In this analysis I primarily chose functional traits that correspond to diet, thus changes in species composition may modify the functional richness if those species consume drastically different resources. It makes sense that communities that have similar FRic would be geographically autocorrelated due to the environmental filtering along the elevational gradient, as communities closer to each other are most likely comprised of similar species that use similar dietary resources. FOri quantifies how changes in species abundances modify the functional redundancy between species (i.e., minimal functional distances among species pairs). Species tend to be functionally less original in the pool if they tend to share their traits more closely with other species. The interpretation here is a bit trickier, but it seems that localities that have similar abundances of particular species are autocorrelated.

Turnover appears to be the primary driver of taxonomic dissimilarity in the bird-mammal communities along this gradient. As turnover between communities is though to indicate environmental filtering processes structuring species composition, this pattern makes sense considering these localities are distributed along an elevation gradient with major differences in environmental conditions and habitat between localities at different elevations. Interestingly, we do see a triangle plot emerge (panels A and B in Figure 3.). This shows that taxonomic dissimilarity can be both extreme and low in near localities, but only extreme in localities that are spatially separated to greater degrees. This may indicate some lower bound of taxonomic dissimilarity as distance increases.

The fact that the same dissimilarity pattern emerges when I adjusted the dataset to only include bird observations within 50 m of the observer is interesting (Figure 4.) and somewhat of a relief. For one, it suggests that that my field sampling was thorough enough to capture similar species both near and far from observation points. More useful though, this sensitivity analysis shows that the dissimilarity patterns observed are robust to slight variations in the underlying data set. The numbers of bird species was reduced in the dataset by 5, and the total number of observations was reduced by 457. This did have an effect on functional richness, as the functional metric calculations used in the ‘mFD’ package take both species identity and the abundances of each species into account. However the overall pattern of dissimilarity remained the same between pairwise comparisons of sites across the elevation gradient despite the reduction of species data.

The findings I have produced in this course are an important part of describing the pattern in functional diversity of these bird-mammal communities along the gradient of the Steens-Alvord system. These results tell part of the story and will be valuable as I connect functional diversity of these disparate taxa to underlying environmental habitat characteristics.

Learning Reflection

Over the course of these three exercises I increased my learning of using program r to perform several analyses and wrangle data. Specifically, I used the ‘mFD’ and ‘betapart’ packages to perform multivariate analyses in order to calculate estimates of both taxonomic and functional diversity across communities, as well as dissimilarity of these communities as a function of geographic distance in three dimensions. I did a great deal of data sorting, which was time consuming but helpful in the long run—both for my dissertation chapter and my increased understand of working in r. I learned that certain spatial analyses were not appropriate for my data given the structure of the dataset and the aggregate metrics I was interested in. These techniques included kernel density and autocorrelation function, for example. I spent a great deal of time at the end of the course attempting to perform interpolation techniques on habitat data associate with my species observation data. Specifically I wanted to assess origins techniques to interpolate a habitat surface across locality landscapes from discrete habitat point data I have. Ultimately, I was unable to accomplish this in r, though I think this speaks more to my limitations that the program. I also failed at attempting to do this in QGIS before running out of time at the end of the course. I plan to continue attempting to achieve this goal.

Future Directions & Techniques

The work I have conducted in this course has been beneficial in that I have described spatial patterns of taxonomic and functional diversity in bird-mammal communities across the Steens-Alvord elevation gradient. This is a necessary first step for the work I am planning to do for this chapter of my dissertation. Ultimately I also plan to investigate the relationship between these patterns and covarying changes in habitat heterogeneity across elevation. To do this I will utilize habitat data I have calculated along small mammal trapline (i.e., structural complexity indices derived from desecrate habitat quadrate data collected in the field), as well as remotely sensed data I have pulled in using ArcGIS. I am interested in trying regression kriging and Empirical Bayesian kriging to develop habitat complexity surfaces that could be used as predictors in models that describe the influence of habitat heterogeneity an available area by elevation to functional metrics of these communities. Thus, my work continues.

Whale what pattern do we have here?

Research Question

My question and goal for the term stayed consistent across the term. I did drop one environmental driver, swell height, due to time constraints with downloading and extracting data. I also filtered my focus of the data from all years collected to just the month of the year with the most whale sightings. This month occurred during an El Nino year, so my hypotheses and initial goals were still applicable. Due to this shift in examining just the ten days, I stopped using the boundaries of the marine sanctuaries in my analysis in exercises 2 and 3; however, I did still use the clipped whale locations to those sanctuaries.

The original question was: The spatial problem I wanted to examine was the impact of ENSO on environmental drivers for fin whale area restricted search (foraging) behavior. This question was eventually broken down to what is the impact of ENSO on the spatial distribution of fin whales from August 1-10, 2016?

Data Description

The fin whale data used was from the Whale Habitat, Ecology, and Telemetry lab. The fin whales were clipped within the three marine sanctuaries, Cordell Bank, Monterey Bay, and the Greater Farallones, with a behavior state of 2 (area restricted search). The fin whale points have a resolution of 3 locations per 8 hours; however, some only transmitted one location for the day. The fin whale data was collected in 2004, 2006, and 2014-2018 during the summer-fall months. The whale locations were collected using Argos satellite tags and processed through a Bayesian switch state-space model, which produced regularized tracks and assigned behavior classifications due to the characteristics of the points.

All the area restricted search points had 714 point locations that were recorded daily with some uneven gaps. Their northernmost latitude is 38.99569 and their southernmost latitude is 35.54629. Their eastern-most longitude is -121.4425 and their westernmost longitude is -124.25.

When I restricted it to just August 1-10, 2016, the northernmost latitude is 37.82975 and the southernmost latitude 36.57525. The easternmost longitude is -122.0430 and the westernmost longitude is -123.4833.

SST: SST, GOES Imager, Day and Night, Western Hemisphere, 2000-2020 (1-Day Composite) from ERDDAP. This dataset had a resolution of 0.05 for latitude and longitude.

Chlorophyll-a: Chlorophyll-a, Aqua MODIS, NPP, L3SMI, Global, 4km, Science Quality, 2003-present (1-Day Composite) from ERDDAP. The dataset has a resolution of 0.0416 for latitude and longitude.

Disclaimer: Some chlorophyll-a concentration data was fabricated due to internal issues with the nibble program. I recommend using a different dataset for chlorophyll-a if someone were to recreate this analysis.

**Figure 1.** All area restricted search (ARS) locations for fin whales in the data set. Each year is a different shade of blue and the marine sanctuary borders are in different colors. Purple for Monterey Bay, orange for the Gulf of the Farallones, and green for Cordell Bank.

Figures 2-5. Chlorophyll-a concentration and sea surface temperature at the latitude and longitude of the fin whale locations were recorded between August 1-10, 2016.

Hypotheses

Exercise 1
1. During cold modes, the spatial pattern of whale area restricted search will be clustered in areas with low SST, high chlorophyll-a concentration, and higher swell. I expect the clusters of area restricted search to be tighter in those conditions due to them needing to travel shorter distances to find the best locations in the habitat for feeding.
Exercise 2
1. Cold sea surface temperature and high chlorophyll-a concentration in the central California coast area promote (enhances) fin whale area restricted search locations.
1. Warm sea surface temperatures and low chlorophyll-a concentration in the central California coast area limits (reduces) fin whale area restricted search locations.
Exercise 3
1. Based on the distribution of the whales and environmental drivers, the model will produce an output with the similar spatial distribution of whales under similar conditions for sea surface temperature and chlorophyll-a concentration.

Approaches

Point Pattern Analysis

In Exercise 1, I tried multiple methods to assess if the recorded whale locations were in a cluster or dispersal pattern. I also conducted some preliminary data visualization to examine where the patches were before statistical tests were applied. I had attempted to make a k-cluster test in python prior to this class, so I attempted that method first and moved on to Average Nearest Neighbor afterward when those results were difficult to assess in later years. This test was the most useful because the ratio value determined if the data was trending towards dispersal or clustering. I had extra time, so I also conducted point and kernel density on the points. While these helped generate future questions or focus points, they were not helpful for this exercise’s goal.

Cross-Correlation

My initial goal in the second exercise was to examine the autocorrelation between the two environmental drivers, sea surface temperature and chlorophyll-a concentration, and the whale locations. While this was interesting to examine, due to the nature of the telemetry data and the lag component of the autocorrelation and cross-correlation functions, I had difficulty interpreting the results, and it pushed my knowledge of R and statistics to the edge. Dr. Jones and I developed a personalized test using the kernel density value of the August 1-10, 2016, whales and an interpolated trend of the environmental data recorded during that time. This exercise improved my ability to problem solve and find personalized workarounds for my data that can still be understood and replicated by others. This problem helped me break down the larger question into smaller chunks that were easy to accomplish with my knowledge of ArcGIS Pro analysis tools.

Modeling

For the final exercise, my goal was to run a species distribution model that specifically examined the parameter sensitivity and impact on the spatial pattern of the whale locations. Due to the presence-only data, I started my process with the MaxEnt software but quickly ran into an issue because it was no longer supported by R studio. To counter this, I installed and learned the basics of Maxnet, MaxEnt’s successor in R. Despite my data being perfect for that type of model, the results in the output did not make sense. I found the ArcGIS equivalent of MaxEnt and ran into an issue from previous exercises: difficulty interpreting the results. While examining other model options in ArcGIS Pro, I found a random forest tool that predicted the latitude and longitudes of predicted whale locations based on the environmental driver parameters. From here, I devised some code in R to divide the location of the whales into 0.25-degree grid cells to count the number of whales present in each cell for the observed and predicted. I found that the model was both under and over-predicting values between displaying the results and calculating their residuals.

Results

**Table 1.** Tables of the five different average nearest neighbor calculations for each year.

The average nearest neighbor calculations for all the location data used in this exercise have statistically significant distances in all but one year, 2014. 2004’s p-value, while statistically significant, should be viewed with caution as the sample size is very small and likely impacts the p-value. 2015-2017 have a strong correlation between their distances.

The 2004 and 2014 data are considered trending toward dispersion due to their nearest neighbor ratios being higher than 1. 2015, 2016, and 2017 are considered trending towards clustered due to their nearest neighbor ratios being less than 1 (see Table 1). 2004, 2014, and 2015 were El Nino years, and the latter part of 2016 and all of 2017 were La Nina years. Using the Nearest Neighbor and point density results, 2016 and 2017 confirm the hypothesis I tested.

**Table 2.** Pearson’s product-moment correlation was conducted on the environmental factors of interest (SST and chl-a) and the longitude of the fin whale locations. The test produced a t-value, degrees of freedom (df), p-value, and 95% confidence interval.

All but one correlation test resulted in a statistically significant p-value. The test for chlorophyll-a and longitude was the only test to produce a confidence interval entirely in the negative range. The absolute value of the T statistic produced in this test is used to determine if the autocorrelation for a specific lag equals zero. A T value greater than 2 indicates the autocorrelation is not equal to zero. In the chl and lon test, depending on rounding conventions, this T statistic could indicate the autocorrelation is not equal to zero.

**Figure 6.** Kernel density of fin whale locations across August 1-10, 2016, overlayed with the interpolated trend of average SST in the same period recorded at the whale locations.

**Figure 7.** Kernel density of fin whale locations across August 1-10, 2016, overlayed with the interpolated trend of average chlorophyll-a concentration in the same period recorded at the whale locations.

The highest density of whales occurs in the 13-14 degrees Celsius range with a few in the warmer range. The highest density of whales occurs in the 6 mg m-3 range with a few in the higher concentration range.

**Figure 8.** Predicted vs Observed whales from the random forest model. The trendline was set using a linear regression method with a 95% confidence interval.

The random forest model in ArcGIS Pro produced predicted latitude and longitudes for the whales given the parameters of the SST and chlorophyll-a concentration data. When examined in a 0.25-degree cell, there grid cell 7 (-123, -122.7 and 37.1, 37.3) had the most whales with 53 observed and 125 predicted.

Based on the results from Figure 8, there are some aspects of the model that underperformed, the points below the line and confidence interval, and those that overperformed, the ones above the trend line.

Significance

The different results from the exercises help understand the relationship between location and environmental factors. While I focused on a specific month, I combined my oceanographic and biological knowledge with the results and interpreted the outputs constructively. The tools and knowledge gained from this project can be used by future handlers of the data to determine the next steps in spatial analysis and other relationships to examine.

Software-wise, this project was significant because I used several R and ArcGIS Pro combinations. Many complete their analysis in just one software, but I wanted to combine workflows. This desire expanded my problem-solving capabilities because I could draw from multiple sources to achieve the analysis tests.

My Improvements in Skills and Statistics

I initially started with working proficiency with R and ArcGIS Pro and novice skills in Python3. At the end of the term, I believe I am closer to an expert with ArcGIS Pro and R regarding problem-solving and creating custom workflows for the problem. I am now at working proficiency with Python3.

I gained beneficial experience in manipulating data and understanding why specific formats and data collection work with some analysis tools and not with others. I was working with telemetry track data, which resulted in presence-only values.

Evolving Questions and Future Techniques

I would like to make the plots and maps from exercise 2 and run the models used in exercise 3 on different temporal scales and compare the results of the models to the initial question asked and hypotheses. Based on the graph of the relationship between the predicted and observed whales, I would like to examine further where the model is under and overpredicting. Examining those locations and trends in the data may answer why the model produced the output it did.

I would also like to develop a better contingency plan for the fault in remote sensing data and how to fill gaps in the data extraction. I had several issues with the raster data that impacted the accuracy and replicability of this analysis. While I stand by the methods I used for this class project, given more time and resources, I think I could have found another workaround that would make the results publishable.

I would also include all the whale behavior state locations to explore a more comprehensive analysis and explanation regarding what influences foraging behavior locations and spatial patterns.

Ringtail Home range estimation and Species Distribution Modelling in southwestern Oregon

The research question that you asked (provide one question for each exercise).

Exercise 1:How much area does a ringtail territory occupy, is this consistent between individuals and sexes?

Exercise 2: What environmental variables influence ringtail distribution?

Exercise 3: What forested habitat types are ringtail selecting or avoiding, and do these relationships change at different scales?

Final Project: What does predicted ringtail distribution look like across the Applegate Wildlife Management Unit, and what factors are associated with ringtail presence?

Male Ringtail released after being fitted with a GPS collar

A description of the dataset you examined, with spatial and temporal resolution and extent.

I captured ringtail from October 2020 to May 2022 in the Applegate Wildlife Management Unit in southwestern Oregon (Figure 1). I deployed both GPS and VHF collars and retrieved 1,462 gps locations from 16 individuals. Collars were scheduled to record locations 3 times per night during foraging hours, and lasted approximately 3.5-5 months per deployment. My GPS locations are clustered on the Eastern portion of the Applegate Wildlife Management Unit (Figure 2).

In conjunction with my GPS data I used remotely derived environmental data including elevation, slope, aspect, and canopy cover at 30m resolution. I used NLCD landcover data types to create buffered habitat layers at the 0.1km and 0.5km scale.

**Figure 1.** The Applegate WMU is 57% public lands and is bordered by the state of California on the south and Grants Pass, OR on the North.

**Figure 2.** Outline of the Applegate WMU with ringtail gps locations in blue

Hypotheses: predictions of patterns and processes you looked for.

I was particularly interested in mapping my ringtail home ranges to derive estimates for space use, territorial overlap, and utilization distributions. I expected home range sizes to vary by sex, minimal territorial overlap among males or females, and clustered utilization distributions.

I was also interested in habitat associations for ringtail within my study area and the Applegate WMU. Previous data from a California species distribution modelling suggests that ringtail presence is positively influenced by presence of hardwoods, larger hardwoods, canopy closure, steeper slopes, and best modelled at a coarse scale 10km² (Campbell 2004). These factors likely influence the presence of ringtail in the state of Oregon, but available habitats differ from those available in California.

I expected a positive relationship between ringtail presence and slope, canopy cover, and hardwoods at all ages and scales. I expected a negative relationship between presence and habitats with old growth characteristics, primarily because these are often found at high elevations and do not have a hardwood component. I expected a polynomial relationship between presence and elevation because ringtail are a mid-elevation species (table 1).

Approaches: analysis approaches you used.

I used program R and the packages adehabitatHR, sp, rgdal, and raster to estimate utilization distributions and create home range polygons. Within the package adehaitatHR, I used the functions KernelUD, getverticeshr, and getvolumeUD to calculate utilization distributions as well as area estimates for each individual.

For habitat associations I used two methods, logistic regression modelling using pseudo absence data, and maximum entropy (MaxEnt) modelling using presence only data. All modelling was done within Program R using the packages raster, reshape2, dismo, maxnet, glmnet, MuMin, presenceAbsence, and ecospat.

Results: what did you produce — maps? statistical relationships? other? Present the key, important results you created.

For my first exercise I produced home range estimates (Table 1), home range polygons (Figure 3), and utilization distributions for individual ringtail (Figure 4). I was surprised to learn that home range sizes were highly variable between individuals and within sexes (mean 445 ha; range 58-795 ha). I created utilization distributions for each individual ringtail, most were unimodal and a few individuals had bimodal distributions.

For mapping ringtail distribution I produced probability maps using logistic regression modelling and Maxent models in the little Applegate valley where I collected my gps locations (Figure 5). I further expanded the extent of my Maxent model to include the entire Applegate WMU (Figure 6). Using my final model for the Applegate WMU, I produced variable response plots (Figure 6). All habitat types in my final model were buffered to the 0.5km scale.

Table 1. Home range size estimates for 16 ringtail using kernel polygon methods (*Female)

Individual ID	Home range estimates (ha)
R02	546.99
R05	615.08
R08	201.7
R09*	641.04
R10	472.95
R13	353.89
R14	232.99
R15	795.26
R17	312.68
R18	310.91
R19	621.33
R20	359.04
R22	430.49
R23	628.86
R24*	58.301
R25	539.92

**Figure 3.** Map of kernel polygon boundaries for individual ringtail, highlighting overlapping territories.

**Figure 4.** Bimodal utilization distribution for Ringtail R05 (male)

**Figure 5.** Ringtail presence probability map of the Applegate valley using maxent methods

**Figure 6.** Final Presence probability map for the Applegate WMU using the MaxEnt modelling method. Dark green indicates highest probability of occurrence.

**Figure 6.** Variable response plots created using the package maxnet. Response curves show how each the model prediction changes as each environmental covariate is varied (keeping all others covariates at their average).

What did you learn from each of the analyses you conducted (i.e., from each exercise)?

Exercise 1: Home range size was highly variable between individuals and within sexes (mean 445 ha; range 58-795 ha). The smallest home range (R24; 58 ha) belongs to a female, but so does one of the larger home ranges (R09; 641 ha). Ringtail do not use their entire home ranges equally and have unimodal or bimodal utilization distributions.

Exercise 2: Steeper slopes, SW aspect, increased canopy cover, and mid-elevations are all important variables when looking at ringtail distribution. Habitat types also influenced ringtail distribution, with an avoidance of grasslands, shrub covered sloped, and old growth stands. Only one habitat type had a significant positive relationship Mixed conifer (white/douglas fir) aged 31-80 years (p-value 0.0002)

Exercise 3: Scale matters when conducting modeling exercises. When habitat variables were buffered to 0.5km, their significance changed. For example at scales of 30-100m hardwoods were negatively associated with ringtail presence, but when buffered to 0.5km scale, hardwoods were positively associated with ringtail presence. I think this result is likely due to the nature of my GPS locations. They were collected during active foraging times and represent a particular behavior. Ringtail are known to use hardwoods as diurnal resting locations, and ringtail can travel >500m in a single evening. It is possible that foraging quality is reduced in hardwood habitats, or prey is not present in sufficient quantities at certain times of the year (ie winter/early spring).

Significance. How are these results important to science? to resource managers?

Knowing how ringtail use their surrounding environments, including the total space needed to support an individual, and the type and quality of habitat they require all add to our understanding of ringtail ecology. This information can help managers make informed decisions regarding proposed land use changes and their impacts to the species, how populations may react to climate change, and making informed decisions regarding species conservation status.

Software learning. Your learning: what did you learn about software (a) Arc-Info, (b) GIS programming in Python, (c) programming in R, (d) Modelbuilder in Arc,or (e) other?

I learned many new techniques for manipulating spatial data with program R. I am still more familiar with visualizing data using GIS, but the reproducibility of R code makes it an excellent tool, particularly for modelling.

Statistics learning. What did you learn about statistics?

I learned there are many ways to quantify the relationships between spatial data, and those methods can be easy to perform using programs such as R and ArcGIS. I found hotspot methods to be very useful for animal movement data, and regression techniques. I used Kernel density methods, utilization distributions (similar to hotspot), and regression methods for my analyses. Hotspot methods are useful for identifying the location and intensity of clustering within a dataset. Spatial autocorrelation can describe how your variable relates to itself, and cross-correlation can describe relationships between two variables. Regression is what I am currently most familiar with, and can be useful for describing how your response variable is influenced by explanatory variables.

Evolving question. How did the results of each analysis lead you to change/refine your question? Write out the original question you stated at the beginning of the class, and restate the question(s) you now plan to address.

Original Question: How is the distribution of ringtail related to the quality of available habitat used via the amount of food, water, and resting structures available?

Exercise 1:How much area does a ringtail territory occupy, is this consistent between individuals and sexes?

Exercise 2: What environmental variables influence ringtail distribution?

Exercise 3: What forested habitat types are ringtail selecting or avoiding, and do these relationships change at different scales?

Final Project: What does predicted ringtail distribution look like across the Applegate Wildlife Management Unit?

Future techniques. What techniques would you like to explore to answer your research questions in the future?

I want to explore my final model set for the Applegate WMU in greater detail, using step-wise selection and AIC and/or AUC to select the best model for my data.

I want to expand my dataset to include historical ringtail location data, and locations collected at diurnal rest site locations. Using rest site and foraging locations will improve the quality of my modelling efforts. I want to expand the extent of my current model to southwestern Oregon, which encompasses the known range of ringtail within the state. At this larger extent I want to try modelling variables at scales up to 10km2, as suggested in Campbell 2004.

Literature Cited

Campbell, L. A. 2004. Distribution and habitat associations of mammalian carnivores in the central and southern Sierra Nevada. Dissertation, University of California Davis, Sacramento, California, USA.

Final Project Blog Post. The relationship between Mean Sea Level Pressure and Geomagnetic activity in the year 2020.

The research question that you asked (provide one question for each exercise).

Exercise 1: “How is Mean Sea Level Atmospheric Pressure distributed across the time period of 1980 to 2020?”

Exercise 2: “Is there a global relationship between Mean Sea Level Pressure and Geomagnetic Intensity (year 2020 specifically)?”

Exercise 3: “Is there a significant difference in the global distribution of the residual values for Mean Sea Level Pressure from Geomagnetic Intensity for the 2010-2020 period?”

A description of the dataset you examined, with spatial and temporal resolution and extent.

Mean Sea Level Pressure data is retrieved from the Copernicus website from NOAA as a .grib global dataset with 0.25×0.25 for 1980 to 2020.

Global Geomagnetic data (specifically, Geomagnetic Intensity) was retrieved from the NOAA IGRF2015 model as a .csv file with a 0.5×0.5 resolution for the same periods.

Both variables are a set of classified values of a specific value, which is basically meteorological and geological data.

Hypotheses: predictions of patterns and processes you looked for.

My hypothesis was that higher Magnetic Intensity would result in higher Mean Sea Level Pressure, attracting heavy air particles closer to the area.

Approaches: analysis approaches you used.

I used hotspot analysis and Moran’s I autocorrelation for the first exercise.

I used Cross-Correlation for my second exercise.

Finally, I used the “Agreement/difference between two (raster or vector) layers” method, though I had to use several additional tools to substitute the “Confusion matrix”, due to data being too bulky to be able to be processed in ArcGIS.

Results: what did you produce — maps? statistical relationships? other? Present the key, important results you created.

Several maps and statistical relationships were produced to describe the relationship between Mean Sea Level Pressure and Geomagnetic Intensity.

First, I used the ArcGIS geoprocessing tool called Spatial Autocorrelation (Moran’s I global) to assess if the data is dispersed or clustered.

I expected it to show a clustered distribution with a Moran’s Index of 0.99 and a p-value of 0 due to the data being a 0.25×0.25 degree grid.

The results proved to be as assumed.

In order to define future areas of interest, a hotspot analysis was conducted.

I expected it to show a few spots of aggressive fluctuation in atmospheric pressure.

The results presented were quite interesting and, in some cases, unexpected. As presumed, most of the ocean and sea surface didn’t have any significant fluctuations. However, generally, only the equatorial area didn’t present any hotpots. Analysis showed that there are areas of interest near Chile and some mountain formations across Africa. Most of the map presented a large area of hot spot with 99% confidence in both hemispheres.

For the second exercise, Initially, I thought of using Geographically Weighted Regression. However, I was only able to perform it later, when comparing two time periods, and it was calculated with a lot of errors. Therefore, I decided to use Cross-Correlation, since it would present me more valuable information.

I expected it to plot a map in a form of a taster to assess areas with high correlation and, therefore, the possible relationship between Mean Sea Level Pressure and Geomagnetic Intensity (year 2020 specifically). However, it was plotted as a feature layer, which I had to transform into a raster later (both for a more visually comprehensive picture and to being able to perform raster analysis).

By assessing the following information, it is seen that there are areas that present a high relationship between Mean Sea Level Pressure and Geomagnetic Intensity. As such, these areas are mostly around the Arctic and Antarctic areas, where the relationship is negative, with several other areas:

1) Eastern part of the Asia

2) European area

3) The middle part of the Pacific Ocean near the coast of North America

4) Southern part of South America

The correlation index R^2 is equal to 0.04, which can tell us that the relationship, though weak generally, is still present. However, certain areas still have a high correlation. By assessing the graphs, it is evident that further research is needed to look for a possible lag in the relationship.

Lastly, for the third exercise, I plotted Standard Deviation from predicted values maps to compare the temporal difference within the assessed relationship.

I transformed both of my feature layers of Standard Deviation for 2010 and 2020 into rasters, using “Kriging” tool.

Then I combined both layers to look for any differences/similarities between the two rasters.

After that, I used the “Change Detection” tool to specifically show how the two rasters differ from each other.

I also plotted two graphs in order to look into the comparison between the Standard Residual and Normal Distribution for each time period.

My initial hypothesis was accurate to some extent. The overall global trend is similar within given time periods, taking the graphs for Standard Residual vs Normal Value distribution into account. However, the strength of the relationship between Geomagnetic Intensity and Mean Sea Level Pressure differs depending on the region. Looking into the Change detection raster, the most stable regions are seen where there is no color. Currently looking into the Tropical Cyclone distribution as a side project, this tells me that the tropical region near North America will be a perfect fit for further assessment.

The difference within certain regions might be caused by constant changes in magnetic fields. Therefore, the overall process of change would probably be due to different polarization of the Earth and certain areas being more magnetized than the others. One of the other assumptions previously thought of is that such distribution change could be due to anthropological influence in specific areas, such as Asia and North America. Though this is just speculation.

What did you learn from each of the analyses you conducted (i.e., from each exercise)?

Exercise 1: I learned that Moran’s Spatial Autocorrelation analysis makes no sense with grid data, because the cells themselves are already clustered. It can only show if there are any missing spots in the data itself. However, a hotspot is much more representative of the situation, though similar to the plotted map itself.

Exercise 2: I finally learned how to properly do correlation indices, though it took me a lot of steps. Most importantly, I learned that, apparently, Geographic Regression doesn’t work with that amount of data in ArcGIS, which requires a separate program written for it. Additionally, I learned how to properly set the same resolution for two different datasets manually since ArcGIS wouldn’t properly change the resolution with the geoprocessing tools it has.

Exercise 3: I learned that the best way to look for differences in distribution would be a hotspot analysis. Though, additionally, contour maps could be projected to see the movement of the variables. Furthermore, I learned that the temporal scale is very important, and future assessments should include a much larger period. In this exercise, I have also learned about Kriging, which I used to plot my rasters. It was a very useful tool to implement since my data was more visually presentable and didn’t have any missing data within it.

Significance. How are these results important to science? to resource managers?

I believe that this can be important to hydrometeorology, since, if proven to be right, the approach of forecasting several weather variables through geomagnetism (or including it in the existing forecasting models) could potentially improve the forecast period or the quality of them. However, originally, I was looking into this relationship to learn whether it’s possible to predict the formation of Tropical Cyclones.

When talking about resource managers, it’s possible to be able to direct resources to specific areas of need. For example, if we know that there’s a high magnetic field in a specific area of the Pacific Ocean, we will know to look for cyclogenesis and add geomagnetism to track its probable route. Therefore, resource managers would know where to spend their resources as means of mitigating the damages caused by cyclones.

Software learning. Your learning: what did you learn about software (a) Arc-Info, (b) GIS programming in Python, (c) programming in R, (d) Modelbuilder in Arc, or (e) other?

Throughout these exercises, I’ve used several different tools. However, not all of them were eventually used for the final assessment.

Excel: Terrible when data is too big. It’s more efficient and faster to either use Python, R, or SQL (which I might be learning next);

ArcGIS: I explored more tools that I can use in my future assessments. Moreover, some tools do not work as well, and it seems that Esri updates its software very rarely. I had a talk with one of the GIS specialists when applying for an internship. He said that Esri packages are very limited in their functionality and, due to that reason, they primarily use QGIS. Therefore, I might switch to QGIS, since it is Open Source, free, and will be easy to learn after using ArcGIS.

R: Initially, I thought that it would be useful to remember R and start learning it again. Seeing how everyone uses it made me think that it has improved. However, while trying to complete this assessment in R, I remembered why I quit coding in it and switched to Python. I know that R has a reputation for being a science/data oriented programming language, but it seems to be less intuitive than Python.

Python: I have never worked with GIS packages before in Python. I wasn’t able to use them to their fullest, but I was able to learn a new package for me, which is pandas. I remember using several other packages to be able to perform the same analysis. Now it requires less time and effort for me to perform data analysis, especially with big data. One of the most important things for me also was that I was able to return to programming once again, which got me into learning and exploring new packages.

Statistics learning. What did you learn about statistics, including (a) hotspot, (b) spatial autocorrelation (including correlogram, wavelet, Fourier transform/spectral analysis), (c) cross-correlation/regression (cross-correlation, geographically weighted regression [GWR], regression trees, boosted regression trees), (d) multivariate methods (e.g., PCA, multiple component analysis), (e) other techniques (change detection/confusion matrices, other)?

Hotspots: A perfect tool to use when having grid data. It allows assessing certain points of interest better visually. And being able to combine 2 layers of hotspots, makes it easier to define changes in, for example, temporal scale.

Spatial Autocorrelation: This doesn’t provide much information with grid data, except for, perhaps, some errors or missing information within the dataset itself. However, could be a useful tool for other types of data.

Cross-correlation/GWR: Personally, I love this method, since it was able to provide specific relationship variables for my datasets. It takes a lot of time and requires a lot of preparation if datasets are different (which was my case). Though it shows very representative and informative results.

Agreement/difference between two (raster or vector) layers: I mostly used this term to describe how I was able to combine two raster layers of hotspots to be able to visualize changes. However, I used additional statistics, such as Standard Residuals distributions to look if the pattern in different time periods would be different.

Evolving question. How did the results of each analysis lead you to change/refine your question? Write out the original question you stated at the beginning of the class, and restate the question(s) you now plan to address.

My initial question was: “How are the Atmospheric variables that describe the weather related to Geomagnetic properties of the Earth through the mechanism of heavy particles magnetization”.

After assessing the results of my analysis, I decided to be more specific in my variables and time periods. I wasn’t able to perform an analysis on a large temporal scale, so I shorten it quite much. I also wasn’t able to find data on some of the variables I initially wanted to look at (Kpa/AA).

Consequently, my question transformed and changed into the following: “Is there a relationship between Mean Sea Level Pressure and Geomagnetic Intensity for the time period of 2020?”

Future techniques. What techniques would you like to explore to answer your research questions in the future?

I would love to explore Geographically Weighted Regression more in the future. Currently, I am not sure whether this is an ArcGIS problem, or if I am doing something wrong, but I wasn’t able to perform it within the program without errors. The map I was able to get in ArcGIS resulted in a feature layer with a lot of errors in the form of a checkerboard (it was basically a grid, where the values would be good inside the grids themselves, but the “lines” would be error data).

When dividing my global data into separate areas, I would also like to work with a confusion matrix, since it seems that it won’t compute with such a huge amount of data. Which forces me to, perhaps, explore the QGIS’s open-source code.

I would also like to explore more plotting and GIS related packages in Python since it seems that this topic is of interest to me. Furthermore, I want to know if it is possible to implement this knowledge into AI programming.

“Geo-analysis of Wave Power Potential along the Oregon Coast”

GEOG 566 Spring 2022

Submitted by: Sanjaya Paudel

Questions asked:

Where are the hotspots for wave power in Oregon Coast?
How does the spatial pattern of Wave Power (Dependent variable) vary with locations?
What maybe the explanatory factors (Wave height and Peak Period) affecting the Wave Power in Oregon coast?
How does Bathymetry of the location affect wave power?
How does the natural and man-made features in coast affect the wave power potential?

A description of the dataset you examined, with spatial and temporal resolution and extent.

The datasets to be used in this research was downloaded from Department of Energy’s Water Power Technology Office’s (WPTO), and Marine and Hydrokinetic Toolkit (MHKit). The downloaded csv has Wave Characteristics: Significant Wave Height & Peak Period. Using these datasets, a novel Net Wave Power Assessment (WNPA) is performed which will give us the metrices of extractable wave power from each station downloaded. The dataset for bathymetry of the Oregon Coast was obtained from NOAA which was in netcdf(.nc) format. It was converted to raster using NETCDF tool in ArcGIS.

Hypotheses: predictions of patterns and processes you looked for.

The wave power varies by location because the wave characteristics responsible for wave power such as wave height and peak period varies by location.

More the distance away from coast more will be the wave power, however hotspots may arise near the coastal region too.

The man made and natural features may have affected on wave power potential.

The bathymetry of the location indirectly affects the wave characteristics such as wave height and peak period, however, the bathymetry may also affect wave power production directly.

Approaches: analysis approaches you used

I used ArcGIS Pro to visualize and quantify the spatial patterns in which I was interested. For Interpolation of my point data features I used three different approaches: Kriging, IDW and Thin plate spline. And for determining the relationship between my dependent variable and explanatory variables, I used Geographically Weighted Regression (GWR), a geoprocessing tool available in ArcGIS Pro toolbox.

I had a csv file containing 4,575 stations with coordinates, wave height, significant wave height, peak period, bathymetry, and wave power. First, I added the csv file in ArcGIS then displayed the stations using “Display X, Y Data” tool. The stations are shown in figure 1 below.

There were gaps in my datasets, I decided to use interpolation to properly visualize the wave power over this area and find the hot spots. There were different options of interpolation available in Arc tool box. I tried 3 of them and compared their results with each other to find out which interpolation technique is best suitable for my datasets. The resulted maps are shown in Figure 2 of result section. To quantify the differences between 3 different interpolation technique, I divided the total dataset into two parts: one for interpolation modeling and next one for validating. I choose 100 random stations from whole datasets and used it for validation and used rest of 4,475 stations for modeling all three interpolations. After interpolation of the surfaces, I used “Extract values by Point” tool in ArcGIS to extract the values obtained at that 100-point stations. I subtracted the true(original) value at that location with interpolated values, then calculated Root Mean Square (RMS) to see which has higher RMS. The method with less RMS is believed to be better model. The calculated RMS for 3 interpolation methods is shown in Table 1 below of Result Analysis section.

Later after viewing the interpolation result and visualizing the hotspots, I was eager to find out why there are such hotspots and what factors are driving it at that location. I then used GWR tool in ArcGIS to view the relationships with my dependent variable (wave power) and explanatory variables (wave height, peak period, bathymetry). We have to define the number of neighbors and some other parameters such as cell size for the GWR calculation. The GWR tool in ArcGIS provide us the summary report with the goodness of fit (R-squared). R square varies from 0.0 to 1.0, with higher values being preferable to higher influence. It can be thought of as the percentage of dependent variable variance that the regression model accounts for. The GWR tool also gives us the graphs showing the relationship between dependent variable and explanatory variables along with the relationship between the explanatory variables as well. The result is presented in the Figure 3 below.

Result Analysis:

Table 1: Comparison of 3 Interpolation techniques.

Kriging had less RMS value compared to IDW and Spline method.

Figure 2: Comparison of Interpolation techniques.

From the figure above, I found out that the hots-spots are around Newport and Yaquina head. The wave power potential at those locations were 5 kilowatts per meters. Close examination of Newport area showed that those region with high wave power has bathymetry around 15 to 20 meters. The close analysis of Yaquina head also showed the bathymetry to be around 15-20 meters. My next case study region was Waldport which had similar geographic feature as Newport. i.e with the mouth of river flowing to ocean). However, the wave power potential was only around 3 kilowatts per meter which is comparatively low with compared to Newport or Yaquina Head. To find out why, I compared the bathymetry of that location. Mean elevation of Waldport area was -24 m in case of Waldport but in case of Newport it was near -15 meters. However, we cannot conclude that the bathymetry of 15 meters is the only factor because in other region with same bathymetry they had wave power lower than Newport. Also, Newport region had a artificial manmade structure as Jetty controlling the flow of the river which may also be the factor for high wave power potential at Newport.

Figure 3: The graph obtained using GWR tool showing relationship between the Net Power (P_netA), Bathymetry (RASTERVALU), Peak Period (mean_peak_period) and Wave height ( mean_significant_wave_height ). The diagonal of the chart shows the histograms of each variable.

What did you learn from each of the analyses you conducted (i.e., from each exercise)?

I performed 3 different approaches in 3 different exercise. In first exercise, I compared different interpolation methods. Stochastic and deterministic, it turned out that the stochastic interpolation Kriging outperform deterministic method as IDW. I learned that the spatial autocorrelation is an important phenomenon to consider while performing analysis of the geographic datasets.

In second exercise, when performing GWR, I learned that we can calculate the relationship between the dependent variable and explanatory variables and more importantly, relationship between each variable as well to see which variables are closely related and which are not.

At last, I performed manual analysis to find out why there are hotspots in some areas and how the natural and man-made features affect the wave power. I learned that the natural and man-made feature do affect the wave power because the region in Newport where there was high potential for wave power had a man-made feature (jetty structure).

Significance. How are these results important to science? To resource managers?

The significance of this study is that it promotes the renewable and clean energy source. The wave energy generated is due to the natural phenomena which will continue until the sun and wind are prevalence in Ocean. The study is also important for the stake holders and investors who are looking for commercialization of the Wave power along the US coast.

Software learning. Your learning: what did you learn about software (a) Arc-Info, (b) GIS programming in Python, (c) programming in R, (d) Modelbuilder in Arc,or (e) other?

In this study, I used ArcGIS Pro for most of the analysis. This time new thing I learn was that we could obtain a graph showing distribution of each variables and their R-square value.

Statistics learning. What did you learn about statistics, including (a) hotspot, (b) spatial autocorrelation (including correlogram, wavelet, Fourier transform/spectral analysis), (c) cross-correlation/regression (cross-correlation, geographically weighted regression [GWR], regression trees, boosted regression trees), (d) multivariate methods (e.g., PCA, multiple component analysis), (e) other techniques (change detection/confusion matrices, other)?

Kriging was not only the best interpolation techniques among them, but Kriging was also helpful to examine the spatial autocorrelation of my dependent variables and explanatory variables using Geostatistical Wizard. It was informative to view the Semivariagram of each variable and see the resemblance of their relationships similar to obtained from GWR.

At first, I was trying to see where the hotspots of wave power are in Oregon Coast. Later after the interpolation and analysis of the results, new question evolved, why is there hotspots in some areas and what may be the factors causing it?

Future techniques. What techniques would you like to explore to answer your research questions in the future?

In future, I would like to see how the topography of the land near and away from the coast affect the wave power. The wave power is the function of wind and ocean interaction, the topography of the land may affect the wind flow strength and its direction.

Spatial Distribution of Volcanic Vents in Distributed Volcanic Fields

The research questions that you asked.

What is the spatial distribution of volcanic vents in Harrat Khaybar, western Saudi Arabia? Are there spatial trends in volcanic vent distributions among the eruptive phases? Does the spatial distribution differ at various spatial scales?
Is there a spatial correlation between the locations of volcanic vents and structural features (e.g., fractures, faults, and fissures) in Carrán-Los Venados Volcanic Field (CLV) in southern Chile? Does the relationship between the two variables differ at different spatial scales?
What factors could lead to the differences in spatial correlation between the two variables at various spatial scales?

A description of the dataset you examined, with spatial and temporal resolution and extent.

I examined spatial and temporal datasets including volcanic vent locations, fracture sites, and eruption ages of volcanic vents in two basaltic volcanic fields, Harrat Khaybar (HK) in western Saudi Arabaia and Carrán-Los Venados (CLV) Volcanic Field in southern Chile. Vent location (point-like features) and age data for the two study sites were available through my previous work (mapping and categorizing volcanic vents in Harrat Khaybar) and the literature (Bertin et al. 2019) using satellite images (spatial resolution = 30 m). However, structural feature data (line-like features) were limited to the CLV due to limitation of digital data in the HK. The temporal resolution component is not applicable in this project because the natural time spans of these geologic features are too wide. Thus, I estimated an overall time-averaged eruption rate of one eruption per 4427 years for the HK and one eruption per 250 years for the CLV assuming a Poisson distribution for inter-event times using available age data (Table. 1).

Table 1. Summary of the examined dataset in this project.

Hypotheses: predictions of patterns and processes you looked for.

I predicted clustering of volcanic vents along the fractures. I expected that because structural features like faults are most likely to represent areas of crustal weakness at the time of magma emplacement.

Approaches: analysis approaches you used.

Point pattern analysis in ArcGIS pro:
- Multi-distance spatial cluster analysis (Ripley’s K function)
- Average nearest neighbor (ANN) technique.
Quantitative and visual neighborhood analyses in ArcGIS pro and Excel:
- Concentric buffers and the number of points that fall within each buffer.
- Visual criteria.

Results: what did you produce — maps? statistical relationships? other?

I could produce spatial probability maps and statistical relationships between my variables.

Ex. 1:

**Figure 1a.** The NNA analysis for All vents.

*The ANN tool is useful for evaluating the overall spatial patterns while the Ripley’s K function is more useful for recognizing spatial patterns at various spatial scales. (i.e., it accounts for spatial variation in density with respect to distance).

**Figure 2.** Spatial distribution and Kernel density estimate for each eruptive stage and all vents together at the HK.

*Kernel density estimations for every eruptive phase, except Age-4 that has too few vents for such analysis, and for all vents together showing that volcanism at this volcanic field tends to cluster on the central part of the field throughout the time. Overall, all vents together indicate an elongated area of high probability density, (between 10^-5 and 10^-4) along the center of the harrat.

Ex. 2 and 3:

**Figure 3.** Map showing the concentric buffers around the fractures in the study area and the location of eruptive centers. It also depicts tectonic settings in the CLV (Strike-slip system).

Table 2. Summary of the total number of vents fall in each buffer size.

**Figure 4.** Scatter plot showing the distribution of total number of vents with respect to distance. It indicates that the total number of vents is increasing as the distance increases from the line features (fractures).

**Figure 5.** Spatial probability isocontour maps of all vents and fractures. It depicts the distribution of probabilities throughout the CLV based on the locations of vents and fractures.

*The neighborhood analysis is useful for evaluating the spatial correlation between the two variables in the CLV. However, it does not capture the spatial correlation at various scales. The findings of the analysis indicate that there is no spatial correlation between the variables at all while it can be clearly seen that all volcanic vents are concentrated in the center and surrounded by the fractures.

**Figure 6.** Sketch showing some of the factors that can influence the distribution of the volcanic vents in basaltic volcanic provinces and impact the spatial correlation between the two variables at different spatial scales.

What did you learn from your results?

Spatial patterns of volcanic vents could differ in each eruptive stage and at various spatial scales, ranging in spatial distribution from clustered to dispersed or normally distributed.
The spatial correlation between my two variables, volcanic vents and structural features, varies at different scales and from one field to another based on the geologic settings of each province.
There are several other factors that can influence the distribution of the volcanic vents in basaltic volcanic provinces such as the type of plate tectonics, size and shape of magma chamber/source(s), magmatic flux rate, and the thickness of the crust, Which impact the spatial correlation between the two variables at different spatial scales.

Significance. How are these results important to science? to resource managers?

My results can:
- Contribute to the scientific understanding of distributed volcanism, in particular the key controls of intraplate volcanic propagation and origin and migration of magma
- Contribute to the development of volcanic hazard analysis and risk assessments for distributed volcanic fields.
- Highlight that more data is needed to reveal the main factors that influence the spatial distribution of volcanic vents in distributed volcanic fields. A better understanding of the associated plate tectonics, modeling of magma sources and crustal thickness (profile), and better spatial and temporal records of both variables are needed since each field might have unique factors

Software learning. Your learning: what did you learn about software (a) Arc-Info, (b) GIS programming in Python, (c) programming in R, (d) Modelbuilder in Arc,or (e) other?

I learned conducting different spatial analyses in ArcGIS Pro including multi-distance spatial cluster analysis (Ripley’s K function) and Average nearest neighbor (ANN) analysis to reveal the point pattern of volcanic vents in two distributed volcanic fields as well as quantitative and visual neighborhood analyses using concentric buffers to evaluate the spatial correlation between my two variables.

Statistics learning. What did you learn about statistics, including (a) hotspot, (b) spatial autocorrelation (including correlogram, wavelet, Fourier transform/spectral analysis), (c) cross-correlation/regression (cross-correlation, geographically weighted regression [GWR], regression trees, boosted regression trees), (d) multivariate methods (e.g., PCA, multiple component analysis), (e) other techniques (change detection/confusion matrices, other)?

I learned about statistics that:
- Different techniques for hotspot analyses such as the ANN and Ripley’s K function could result in different levels of details (i.e., spatial resolution), where the ANN tool is useful for evaluating the overall spatial patterns while the Ripley’s K function is more useful for recognizing spatial patterns at various spatial scales. (i.e., it accounts for spatial variation in density with respect to distance).
- Some parameters in the kernel density function may have significant impact on the final products such as the bandwidth, area shape, and number of points.

Evolving question. How did the results of each analysis lead you to change/refine your question? Write out the original question you stated at the beginning of the class and restate the question(s) you now plan to address.

How is the spatial pattern of volcanic vents in the volcanic field of Khaybar related to the spatial pattern of regional structural features (faults, fissures, and fractures) via plate tectonics.
To what extent do the other factors (plate tectonic settings, size and shape of magma chamber/source(s), magmatic flux rate, and the thickness of the crust) influence the distribution of volcanic vents in distributed volcanic fields?

Future techniques. What techniques would you like to explore to answer your research questions in the future?

I would like to perform modeling techniques to model magma sources and crustal thickness (profile) using geology and remote sensing (e.g., LiDAR) to assess the contribution of the potential factors to the distribution of volcanic vents in basaltic volcanic fields.

Spatial Analysis of Trends in Tufted Puffin (Fratercula cirrhata) Breeding Habitat on the Oregon Coast

Background:

The Tufted Puffin (Fratercula cirrhata), is an iconic seabird that provides a wide range of ecological, economic, and historically important services such as ecotourism for local communities and bringing marine derived nutrients to terrestrial habitats. Tufted Puffin populations on the Oregon Coast have declined dramatically from over 5,000 birds in 1989 to 500 birds in 2021 (Figure 1). In 2018, the Tufted Puffin Species Status Assessment (SSA) determined that factors related to breeding site conditions are one of the most probable causes of puffin decline; however, little is known about the specific characteristics of nesting habitat along the Oregon coast, or how it relates to their population demographics. My research project will directly address this knowledge gap by analyzing how Tufted Puffin breeding habitat has changed on the Oregon Islands National Wildlife Refuge over the past 50 years.

Figure 1. Breeding Tufted Puffin Estimates from 4 Oregon Coast-Wide Surveys in 1979, 1988, 2008, and 2021. Coast-wide surveys obtained from USFWS.

Research Questions:

1) How is the spatial pattern of Tufted Puffin population occupancy related to the spatial pattern of vegetation cover in terms of burrowing habitat availability?

2) How have these spatial patterns at burrowing sites changed over time?

3) Is Tufted Puffin occupancy high when vegetation cover near breeding sites is high, and low when nearby vegetation cover is low?

Datasets:

The temporal extent of my research project is from 1979 to 2021, and the spatial extent is the 62 islands on Oregon Islands National Wildlife Refuge where Tufted Puffins have historically nested. For the purposes of this class and these exercises, I limited my spatial extent to 10 islands, and limited my temporal extent from 2009 to 2018.

To examine percent-cover of vegetation on Oregon Islands refuge over time, I used the National Agriculture Imagery Program (NAIP) imagery because the resolution is 1m, which is fine-scale enough to observe changes in NDVI on my island sites. I obtained NAIP imagery in Oregon for the years 2009 and 2018, because both these years had photography that was available in full color and near infrared. The spatial extent of the NAIP imagery includes the entire state of Oregon, so all my study sites are encapsulated in this dataset.

I also used the data from 4 Tufted Puffin Oregon coast wide surveys to compare Tufted Puffin population occupancy (Figure 2). These coast wide surveys were conducted in 1979, 1988, 2008, and 2021 to estimate the total population of Tufted Puffins in Oregon, and their distribution across islands.

Figure 2. Breeding Tufted Puffin Estimate on the Oregon Coast from 1979, 1988, 2008, and 2021 coast-wide surveys.

Hypotheses:

1) The spatial pattern of Tufted Puffin population occupancy will be clustered in response to the clustered spatial pattern of percent cover vegetation.

2) Suitable Tufted Puffin breeding habitat will have decreased from 2009 to 2018, with more areas being vulnerable to erosion and vegetative degradation.

3)Tufted Puffin occupancy will be high in areas where vegetative cover is high, and low where vegetative cover is low.

I expect percent cover of vegetation to be clustered, because these islands have large areas of sloped rock slabs where there will be no vegetation, close to areas with soil and dense vegetation. I expect the spatial pattern of Tufted Puffin density to be clustered in response to areas with clustered percent cover of vegetation because Tufted Puffin like lush, vegetated areas with grasses, forbs, and shrubs. I expect that vegetation cover will have decreased from 2009 to 2018, with more areas being vulnerable to erosion and vegetative degradation because of more severe and frequent weather conditions related to climate change.

Approaches:

To evaluate the spatial pattern of Tufted Puffin occupancy across islands, I used the average Nearest-Neighbor Distance (NND). To take this analysis one step further and determine if the spatial pattern of Tufted Puffin occupancy varied at different spatial scales, I used the Multi-Distance Spatial Cluster Analysis (Ripley’s K) function. I also used a kernel density estimator to determine if there were Tufted Puffin hot spots and identify where those might be along the Oregon Coast.

To examine the spatial relationship between Tufted Puffin occupancy and vegetation cover, I used multiple tools and steps. First, I downloaded and extracted polygon boundaries of each of the islands. Next, I downloaded 2018 and 2009 NAIP imagery into ArcGIS Pro and clipped the rasters to the extent of the polygon boundaries, so I was only analyzing the data I was interested in. Then, I used NDVI to quantify the vegetation cover on each island. After that, I used focal statistics to identify the mean NDVI value on each island. Finally, I plotted mean NDVI values against the Tufted Puffin occupancy of each island as a logistic regression in R.

Finally, I used a confusion matrix to observe changes in vegetation over time on one island. I began by clipping 2 NAIP rasters from 2009 and 2018 of the island to the previously extracted polygon boundary. Then, I used the iso cluster unsupervised classification tool to classify each raster into 2 classes. Next, I used the combine rasters tool to observe changes between years. The attribute table contained 4 columns that represented: 1) vegetation that stayed the same, 2) vegetation gained, 3) bare earth that stayed the same, 4) vegetation lost. The attribute table displayed the pixel values associated with each of these categories. By dividing each category’s number of pixels by the sum number of pixels in the raster, we could determine what percent each category was.

Results:

From the Ripley’s K and Nearest-Neighbor analysis, I determined that Tufted Puffin occupancy in Oregon is clustered. The kernel density estimator outputs a map displaying the hot spots of breeding Tufted Puffin occupancy on the Oregon Coast (Figure 3). There is a dense hotspot in Northern Oregon, and another dense hot spot in Southern Oregon. The NDVI function outputs a raster with values ranging from –1 to 1, which is used in many scientific applications (Figure 4). The logistic regression analyses output a graph depicting the NDVI values against breeding Tufted Puffin occupancy (Figure 5). I interpreted these results as a positive correlation between breeding bird occupancy and NDVI. The higher the NDVI is at sites, the higher probability there will be more breeding birds.

From the unsupervised classification and combine tools, I obtained a map of Goat Island with 4 color classes, where the colors corresponded to 1) vegetation gained, 2) bare rock that stayed consistent 3) vegetation that stayed the same between years, and 4) vegetation lost (Figure 6). I then used the attribute table to calculate the percentages of the number of pixels in each class relative to the overall number of pixels. From these results, I interpreted the vegetation decreased on Goat Island from 2009 to 2018 by 23%. The results also show an increase in vegetation from 2009 to 2018 by 8% (Table 1).

Figure 3. Hot spots of breeding Tufted Puffins on the Oregon Coast in 2021.

Figure 4. NDVI output raster of Goat Island from 2009 NAIP imagery.

Figure 5. Logistic Regression analyses of NDVI values and breeding Tufted Puffin occupancy.

Figure 6. Supervised classification of 2009 and 2018 NAIP imagery of Goat Island, followed by a raster combination. Red=vegetation lost between years, green=vegetation gained, yellow=vegetation that stayed the same, beige=bare rock that stayed the same.

Color	Pixel Count	Description	Percent
Green	29469	Veg gained	8%
Red	80737	Veg Lost	43%
Yellow	90247	Veg consistent	26%
Beige	152367	Bare consistent	23%

Table 1. Interpreted results from confusion matrix. Quantification of results from Figure 6- proportion of pixels representing vegetation gain, vegetation loss, consistent vegetation, and consistent bare rock.

What I learned from each analysis:

The Ripley’s K and hot spot analyses were very interesting because up until this class, I had mainly been focused on detecting changes on a temporal scale. However, these analyses helped me to reframe some of my initial questions to observe changes on a spatial scale as well. I learned that on a larger spatial scale, Tufted Puffin occupancy was clustered, and there was one Northern hot spot and one southern hot spot. This led me to consider other questions about why Tufted Puffins are not breeding in large quantities along the central coast of Oregon, and what site-specific or climatic factors might be associated with the hot spots. For Exercise 2, I determined the NDVI and plotted it against Tufted Puffin occupancy for high and low sites in a logistic regression analysis. Exercise 2 helped me to learn that Tufted Puffin occupancy is high where vegetation is high, and occupancy is low where vegetation is low. From here, I can further investigate the driving forces behind what might be causing vegetation to differ spatially at different sites. From the supervised classification exercise, I learned that utilizing the “combine” tool provides more information than the raster calculator tool alone, because we get 4 layers to display vegetation change.

Significance.

It is important for land managers to know where the hot spots are along the Oregon coast. Once we understand where high and low occupancy of Tufted Puffin are, we can begin to investigate what characteristics make successful breeding habitat, and why some sites might exhibit higher occupancy than others. For example, we can compare hot spots to “sinks,” and look at factors such as mean sea-surface temperature (SST), mean air temperature, distance to the mainland, amount of human disturbance, etc.

I also learned that there is a positive spatial relationship between Tufted Puffin occupancy and vegetation at breeding sites. Although managers will not be able to directly use the analyses I did in this class, I can build off these results, by incorporating more sites in my analyses, to give the logistic regression more statistical power. This spatial analysis will provide invaluable insight into the habitat requirements of Tufted Puffin- a species of greatest conservation need- in Oregon. Gaining a deeper understanding of the specific vegetative characteristics Tufted Puffin utilize during the breeding season can be used to further address conservation concerns, and guide refuge managers in decision making around habitat restoration.

Finally, the confusion matrix allowed me to quantify the amount of vegetation lost, the amount of vegetation gained, and identify where these processes are occurring. This is significant because it can better inform land managers about where habitat restoration might be effectively implemented. I anticipate that the proposed data collection and analysis will serve as a springboard for future conservation efforts, allowing managers to make comparisons with ecologically similar seabirds that also use the Oregon Islands National Wildlife Refuge for breeding.

What I learned (software):

I did the majority of my analyses in ArcGIS Pro, and after this course, I feel much more comfortable using this software. This was the first time I’ve explored ArcGIS Pro without following the steps of a strict tutorial, and I am very happy with how much I’ve learned and the progress I’ve made on my project. Most importantly, I learned that ArcGIS software can be tricky, but if the original plan does not work, there are lots of adjacent ways to address the same problem. This class helped me get into the habitat of talking my process out with other folks who are working on similar projects and google alternative solutions if I run into problems.

Aside from verbally communicating my results, I also became more skilled at visually communicating my results. One critical component to research I learned in this class was creating figures or graphs in Program R and Excel to better display the results of a spatial analysis.

What I learned (statistics):

One thing I learned about using hot spots was that a few outliers can dramatically skew the distribution of my data. It was very helpful for me to log-transform my data prior to using a kernel density analysis to work with more normally distributed data. I also learned a lot about GWR from talking with other students in this class, and determined I would like to use this approach for my own data further down the line. I might be able to compare Tufted Puffin occupancy with multiple variables, such as slope, elevation, vegetation, precipitation, temperature, and other climate data. The GWR tool in ArcGIS would also allow me to compare the Pearson’s correlation coefficient between my variables, which would help me decide which variables to include in candidate models.

Evolving question.

Initial questions from the beginning of class:

1) How is the spatial pattern of Tufted Puffin population density related to the spatial pattern of percent cover of vegetation, soil depth, elevation, and slope in terms of burrowing habitat availability?

2) How have these spatial patterns at burrowing sites changed over the last 50 years?

Questions I now plan to address:

How is the spatial and temporal pattern of Tufted Puffin population occupancy related to the spatial pattern of vegetation, elevation, and slope at Tufted Puffin breeding sites?
How have these spatial patterns at burrowing sites changed over the last 50 years?
How do these vegetation changes relate to the site-specific, climatic, and environmental variables influencing Tufted Puffin habitat at different spatial scales?

I still plan to identify how spatial patterns at burrowing sites have changed over the last 50 years. However, rather than exploring this question on just a temporal scale, I also plan to look at differences at different spatial scales. I originally planned to compare each island to a previous version of the same island. Now, I am excited to compare each island to other spatially distributed islands within the same year. For example, I might explore how cooler temperatures associated with islands on the Northern Coast affect the Tufted Puffin occupancy differently than warmer temperatures and less precipitation on the Southern Oregon Coast. For this third question, I am planning to look at vegetation cover as my independent variable, and a suite of climate processes as my response varaibles.

Future Research

This project allowed me to gain a deeper understanding of the relationship between Tufted Puffins and vegetative cover. My next step will be to observe this on a longer temporal scale, and a larger spatial scale. I am planning to now incorporate a separate aerial photography dataset I obtained from USFWS to observe changes in vegetation over 4 decades, rather than just 1 decade. I plan to use ArcGIS Pro to georeference this dataset, so the pixels of each island overlap from year to year, to make my results more meaningful. I am also planning to use a supervised classification technique to classify ground cover as vegetation, bare soil, rock, or water. To further compare vegetation cover to the occupancy of breeding Tufted Puffins on each island, I will use a logistic regression.

Moving forward, I am also planning to investigate the relationship between topography and Tufted Puffin occupancy over time. I plan to use the same basic process as I did when analyzing the vegetative cover, but instead of using NAIP rasters, I will use DEM/DTM rasters. I plan to clip DOGAMI DEMs and DTMs to the same polygon boundaries of the islands and subtract rasters from different years. I will also use spatial statistics to find the median elevation and slopes of the islands and use a regression approach to display the results.

I would also like to explore using GWR when I am further along in my research to determine which variables, such as slope, elevation, vegetation, precipitation, and temperature have the greatest impact on Tufted Puffin occupancy. To better understand the climate processes happening at each site, I am planning to download rasters of SST, precipitation, temperature, etc. From NOAA weather service sites, and extract values at each Tufted Puffin colony site.