Monthly Archives: April 2019

Determining suitable habitat for Olympia oysters in Yaquina Bay, OR

Exercise #1

Question that you asked:
My goal for my thesis work is to evaluate the distribution of native Olympia oysters in Yaquina Bay, Oregon by assessing habitat suitability through spatial analysis of three habitat parameters: salinity, substrate availability, and elevation. A map of predicted suitable habitat as a result of the spatial analysis will be compared with field observations of oyster locations within Yaquina Bay. The main research question I am examining in this project is:

How is the spatial pattern of three habitat parameters (salinity, substrate, elevation) [A] related to the spatial pattern of Olympia oysters in the Yaquina estuary [B] over time [C]?

For this blog post, I will be evaluating the [A] portion of this question and the three habitat parameters simultaneously to identify where habitat is least suitable to most suitable. To better understand the spatial pattern of the habitat parameters, I am evaluating a raster layer for each parameter, then combining them to determine where overlap between the layers shows the best environmental conditions for oysters to survive.

Name of the tool or approach that you used:
For this portion of my research analysis, I wanted to be able to make an educated guess about where the best and worst habitat for Olympia oysters would be located within Yaquina Bay by ranking different subcategories within each of the salinity, substrate, and elevation datasets.

To do this, I started by looking through the available literature on the subject and consulting with shellfish biologists to get an idea of what conditions oysters prefer in order to apply a ranking value. The following table is a compilation of that information:

| Habitat parameter | Subcategory | Variable range | Olympia oyster tolerance | Rank value applied |
| --- | --- | --- | --- | --- |
| Mean wet-season salinity (psu) | Upper estuary | < 16 psu | somewhat, but not long-term | 1 |
|  | Upper mid estuary | 16.1 – 23 psu | X | 4 |
|  | Lower mid estuary | 23.1 – 27 psu | X | 3 |
|  | Lower estuary | > 27 psu | somewhat | 2 |
| Substrate availability | 1.2 Unconsolidated mineral substrate |  | possible | 3 |
|  | Gravelly mud |  | unlikely | 2 |
|  | Sandy mud |  | no | 1 |
|  | 2 Biogenic substrate |  | yes | 4 |
|  | 3 Anthropogenic substrate |  | yes | 4 |
|  | 3.1 Anthropogenic rock |  | yes | 4 |
|  | 3.1.2 Anthropogenic rock rubble |  | unlikely | 2 |
|  | 3.1.3 Anthropogenic rock hash |  | no | 1 |
|  | Unclassified |  | uncertain |  |
| Bathymetric depth (compared to MLLW) | supratidal | 1.5 – 2.5 m | no | 1 |
|  | supratidal | 1 – 1.5 m | no | 1 |
|  | intertidal | 0.5 – 1 m | maybe | 2 |
|  | intertidal | 0 – 0.5 m | yes | 3 |
|  | intertidal | -2 – 0 m | yes | 4 |
|  | subtidal | -3 – -2 m | yes | 4 |
|  | subtidal | -4 – -3 m | yes | 4 |
|  | subtidal | -6 – -4 m | yes | 4 |
|  | subtidal | -8 – -6 m | yes | 3 |
|  | subtidal | -12.5 – -8 m | yes | 3 |

Once I established my own ranking values, I decided to use the ‘weighted overlay’ function, found within the Spatial Analyst toolbox in ArcGIS Pro. Weighted overlay applies a numeric rank to values within the raster inputs on a scale that the user sets; for example, on a scale from 1-9, a rank of 1 marks areas of least fit and 9 marks areas of best fit. This can be used to determine the most appropriate site or location for a desired product or phenomenon. I used a ranking scale of 1-4, where 1 indicates the lowest suitability of subcategories for that parameter and 4 indicates the highest suitability.

Brief description of steps you followed to complete the analysis:

To apply the weighted overlay function:

  1. Open the appropriate raster layers for analysis in ArcGIS Pro. Weighted overlay will only work with a raster input, specifically integer raster data. Here, I pulled all three of my habitat parameter raster layers from my geodatabase into the Contents pane and made each one visible in turn as I applied the weighted overlay function.
  2. In the Geoprocessing pane, type ‘weighted overlay’ into the search box. Weighted overlay can also be found in the Spatial Analyst toolbox.
  3. Once in the weighted overlay window within the Geoprocessing pane, determine the appropriate scale or ranking values for the analysis. I used a scale from 1-4, where 1 was low suitability and 4 was high suitability.
  4. Add raster layers for analysis by selecting them from your geodatabase and adding them into the window at the top left. To add more than one raster, click ‘Add raster’ at the bottom of the window.
  5. Select one of the raster inputs and the subcategories for that raster will appear on the upper right. Here, ranking values within the predetermined scale can be applied individually to the subcategories from a drop-down list. Do this for each subcategory within each raster input. I ranked each subcategory within each of my habitat rasters according to the ranks listed in the table above.
  6. Determine the weights of each raster input. The weights must add up to 100, but can be manipulated according to the needs of the analysis. A raster input can be given greater or lesser influence if that information is known. For my analysis, I made all three of my habitat raster inputs nearly equal weight (two inputs were assigned a weight of 33, one was weighted 34 to equal 100 total).
  7. Finally, run the tool; assuming no errors, an output raster will appear in the Contents pane and in the map window.
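The weighted overlay computation itself is simple to sketch outside of ArcGIS. Below is a minimal numpy version using made-up 3 x 3 rank rasters and the 33/33/34 weights described above; the values are illustrative only, not the actual Yaquina Bay layers.

```python
import numpy as np

# Hypothetical rasters already reclassified to suitability ranks 1-4
# (values are made up for illustration, not the Yaquina Bay data).
salinity_rank  = np.array([[4, 4, 3], [4, 3, 2], [3, 2, 1]])
substrate_rank = np.array([[4, 3, 3], [2, 4, 4], [1, 2, 2]])
depth_rank     = np.array([[3, 4, 4], [4, 4, 3], [2, 3, 1]])

# Weights must sum to 100 (33/33/34, as in the analysis above).
weights = {"salinity": 33, "substrate": 33, "depth": 34}

suitability = (salinity_rank  * weights["salinity"]
             + substrate_rank * weights["substrate"]
             + depth_rank     * weights["depth"]) / 100.0

# Weighted overlay rounds the weighted sum back to the integer rank scale.
overlay = np.rint(suitability).astype(int)
print(overlay)
```

Cells where all three parameters rank highly come out 4 (most suitable), and cells where they all rank poorly come out 1, mirroring the tool's output raster.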

Brief description of results you obtained:

The first three images show each habitat parameter weighted by suitability, with green indicating most suitable and red indicating least suitable.

Salinity —

Bathymetry —

Substrate —

The results of the final weighted overlay show that the oysters are most likely to be in the mid estuary where salinity, bathymetry, and substrate are appropriate.


Critique of the method – what was useful, what was not?:

The weighted overlay was a simple approach to combining all of the raster layers for habitat and creating something spatially meaningful for my research analysis. The areas indicated in green in the resulting map generally reinforce what was found in the literature and predicted by local shellfish biologists. While the weighted overlay tool did generate a useful visual, it is highly dependent on the quality of the raster inputs. In my analysis, the detailed resolution of the bathymetry layer was very helpful, but the substrate layer is a more generalized assessment of sediment types within Yaquina Bay. It doesn’t show the nuances of substrate availability that might be important for finding exactly where an opportunistic species like Olympia oysters might actually have settled. For example, in Coos Bay Olympia oysters have been found attached to shopping carts that have been dumped. The substrate raster is a generalized layer that uses standardized subcategories and does not pinpoint such small features.

Additionally, the salinity layer is an average of wet-season salinity, but salinity can change dramatically throughout the year. Some in situ measurements from Yaquina Bay this April showed that surface salinities within the subcategory range of 16 – 23 psu were actually < 10 psu. While it is reasonable to generalize salinity for the purposes of this analysis, it is important to note that the oysters are exposed to a greater range over time.

This spatial information serves as a prediction of suitable oyster habitat. The next step is to compare this predicted suitability to actual field observations. I’ve recently completed my first round of field surveys and will be analyzing how closely the observations align with the prediction in Exercise #2.

Spatial Pattern of NDVI change from the Center of an Artisanal Gold Mine

For Exercise 2, I wanted to explore how the spatial distribution of land use/land cover (LULC) varied in relation to an artisanal, small-scale gold mine (ASGM). To do this, I took the NDVI change maps I had generated for Exercise 1 (modified slightly to produce more accurate results) along with my dataset of mapped footprints of known areas of ASGM activity. I found the median center of those areas, generated buffers/donuts at 250m intervals, from 250m to 2000m, clipped the NDVI change layer to each buffer, and counted the number of each type of NDVI pixel contained within each buffer. In addition, I performed this same analysis around non-mining villages, to examine how the spatial pattern of NDVI loss changes with distance from the center of non-mining villages. With these data, I could generate a chart showing how the percentage of pixels representing loss or decrease in NDVI changed as you moved further away from the center of mining activity. This examination looked at nine mining villages and nine non-mining villages.

In order to perform this analysis, I continued my work using ArcGIS Pro, in addition to Excel for a bit of charting work.

To begin, I imported my NDVI change map, which detailed increases and decreases in NDVI values between imagery taken in April 2018 with Landsat 8, and imagery taken in April 2007 with Landsat 5, representing 11 years of NDVI change. I also imported my shapefile containing polygons which I had digitized over my high-resolution satellite imagery depicting areas of known ASGM activity. With this shapefile, I used ArcGIS’s Median Center tool, which found the median center of the mining areas near the village of Douta (fig. 1). From there, I generated buffers/rings at 250m intervals (e.g. 0-250m, 251-500m, 501-750m, etc.), from 250m to 2000m around this median center. I then used the clip tool to clip the overall NDVI change layer to each buffer, resulting in NDVI change for 0m to 250m from center, NDVI change for 251m to 500m from center, and so on.

Fig 1: Center of the village of Douta with a 2km NDVI buffer; the blue buffer shows the 1750m extent, whose values were subtracted from the 2000m values, to only give values between 1751 and 2000m

Once I had accomplished this, I generated histograms for each buffered NDVI change layer to count the number of pixels contained within each buffer and assign them to one of two classes: negative values, representing overall NDVI decrease from 2007 to 2018, and positive values, representing overall NDVI increase over the same period. I did not account for magnitude of change, as I wanted a general idea of how NDVI was changing from the center. Fig. 2 shows an example of these histograms, specifically for the 500m buffer; the values from the previous buffers (e.g. the one to 250m) were subtracted to show only values from 251 to 500m.

Fig. 2: Douta histogram

I entered the pixel counts for negative change at each distance into Excel, along with the counts for positive change, and generated a chart showing how the percentage of pixels representing NDVI decrease changed in each successive buffer from the center (fig. 3).
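The donut counting described above can be sketched outside of ArcGIS as well. This is a minimal numpy version on a synthetic NDVI change grid; the resolution, distances, and change values are made up, not the Douta data.

```python
import numpy as np

# Synthetic NDVI-change raster: negative = NDVI decrease, positive = increase.
# Decrease is concentrated near the (hypothetical) mine center.
rng = np.random.default_rng(0)
n = 400                                          # 400 x 400 pixels at 10 m each
yy, xx = np.mgrid[0:n, 0:n]
dist = np.hypot(xx - n // 2, yy - n // 2) * 10.0   # metres from center
change = rng.uniform(-0.05, 0.25, (n, n)) - 0.3 * np.exp(-dist / 300.0)

# Percent of pixels showing NDVI decrease in each 250 m donut, 0-2000 m;
# each ring excludes the previous buffer, mirroring the subtraction step above.
edges = np.arange(0, 2001, 250)
for inner, outer in zip(edges[:-1], edges[1:]):
    ring = (dist >= inner) & (dist < outer)
    pct_loss = 100.0 * np.mean(change[ring] < 0)
    print(f"{inner:4d}-{outer:4d} m: {pct_loss:5.1f}% of pixels decreased")
```

With decrease concentrated at the center, the percentage of loss pixels falls off with distance, which is the pattern the Excel chart is meant to reveal.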

Fig. 3: Pale blue lines indicate individual mining villages. The dark blue line indicates the average of those villages. Thin red lines indicate non-mining villages. The dark red line indicates the average for those villages.

On the whole, this exercise was useful for illustrating the problem I’m attempting to grapple with. I was frustrated with the lack of Landsat imagery from the late 2000s; I was unable to find any Landsat 5 imagery corrected for surface reflectance aside from the year 2007. Additionally, there are problems with comparing this 2007 image to the 2018 image. I found that the rainy season before the 2018 image was taken, in 2017, was wetter than average, while I was unable to determine the rainfall that preceded the 2007 image. As such, it is possible that the 2018 Landsat 8 image shows a non-normal vegetative pattern, or even that the 2007 image does. I require some more investigation into the historical meteorology of the area before I can say either way. Regardless, I feel that this is a useful first step in investigating how LULC change relates to the establishment of ASGM.

What is the spatial pattern of the ground cover vegetation within 100 m, 500 m and 1 km of 20 wetland sites?

Exercise 1

To determine these spatial patterns, I first imported the latitude and longitude of my wetland sites as a point layer: I added a csv file with these data from the Catalog to my map and chose the Display X, Y Data option after right-clicking on the imported file.
Display X, Y Data: X= longitude
Y= latitude

I then created 3 buffers around each point using the Buffer tool. The buffers had a 100m radius, a 500m radius, and a 1km radius.
Buffer: Input vector= wetland site points
Buffer= 100 (then 500, then 1000)
Unit= Meters

I used Clip in raster processing to clip my raster layer to the extent of my largest buffer layer so that the computer did not have to process so much extra raster data moving forward.
Clip: Input= raster layers
Clip to= 1kmBufferLayer

Then I created NDVI raster layers from Landsat data obtained from USGS Earth Explorer. I downloaded and then imported data from 1995 and 2019. The 2019 data was collected by the Landsat 8 satellite, which measures, among other things, Near Infra-Red (NIR) and red wavelength reflectance. NIR is categorized as Band 5 and red as Band 4. To create an NDVI layer from these data I had to rescale the Band 5 and Band 4 raster layers because the USGS self-correcting algorithm overcorrected for atmospheric disturbances. To do this I used the Raster Calculator in spatial management. This resulted in 2 new raster layers within the acceptable range (0-1000) for Band 5 and Band 4 of the 2019 data. I did not have to rescale the 1995 data because no self-correcting algorithm was applied to that data.
Raster Calculator: Query= [(grid – Min value from grid) * (Max scale value – Min scale
value) / (Max value from grid – Min value from grid)] + Min scale value

I used the Raster Calculator again to create the final NDVI layers, one for 1995 and one for 2019, using the NIR and red wavelength data layers. The 1995 data was collected by Landsat 4, which categorizes NIR wavelengths as Band 4 and red wavelengths as Band 3.
Raster Calculator: Query= ((“NIR_Layer”-“red_Layer”)/(“NIR_Layer” + “red_Layer”))
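For readers without ArcGIS, the two Raster Calculator queries above (the linear rescale and the NDVI ratio) can be sketched in numpy. The band values here are made up, not actual Landsat digital numbers.

```python
import numpy as np

# Hypothetical NIR and red digital numbers (real bands would come from Landsat).
nir = np.array([[12000., 9000.], [3000., 15000.]])
red = np.array([[ 3000., 4000.], [2500.,  2000.]])

def rescale(grid, lo=0.0, hi=1000.0):
    """Linearly rescale a grid to [lo, hi], as in the first Raster Calculator query."""
    g_min, g_max = grid.min(), grid.max()
    return (grid - g_min) * (hi - lo) / (g_max - g_min) + lo

nir_s, red_s = rescale(nir), rescale(red)

# NDVI = (NIR - red) / (NIR + red); values fall in [-1, 1].
ndvi = (nir_s - red_s) / (nir_s + red_s)
print(ndvi.round(3))
```

Because the rescale is linear per band, it preserves each band's relative structure while forcing both into the 0-1000 range before the ratio is taken.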

2019 NDVI result below

I then used the Raster to Point tool to convert both the 1995 and 2019 NDVI rasters into point layers.

Raster to Point: NDVI layers to point

Finally, I used the Clip tool to clip the NDVI point layers to the buffer layers I created earlier.
Clip: Clip “NDVI layers” to “Buffer layers”

Now I have the land cover type (in the form of a number signifying vegetation density) information for all of my sites within all of my buffer zones.
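A minimal numpy sketch of the same buffer-and-extract idea, assuming a regular grid and a hypothetical site at the center pixel (ArcGIS's Buffer and Clip do this against real coordinates):

```python
import numpy as np

# Toy NDVI grid at 10 m resolution; a hypothetical wetland site at the center.
rng = np.random.default_rng(1)
ndvi = rng.uniform(-0.2, 0.9, (201, 201))
site_row, site_col = 100, 100
cell = 10.0                                   # metres per pixel

yy, xx = np.mgrid[0:201, 0:201]
dist = np.hypot(xx - site_col, yy - site_row) * cell

# Mean NDVI inside each buffer radius, mimicking Buffer + Clip.
for radius in (100, 500, 1000):
    inside = dist <= radius
    print(f"{radius:4d} m buffer: mean NDVI = {ndvi[inside].mean():.3f}, "
          f"{inside.sum()} pixels")
```

Each larger buffer contains the smaller ones, so pixel counts grow roughly with the square of the radius.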

Ripley’s K analysis on two forested stands

Bryan Begay

  1. Question asked

Can a point pattern analysis help describe the relationship between the spatial pattern of trees in my area of interest and forest aesthetics? More specifically, how does Ripley’s K function describe forest aesthetics in different parts of the McDonald-Dunn Forest?

  2. Tool

The main tool that I used for the point pattern analysis was the Ripley’s K function.

  3. Steps taken for analysis

My workflow involved doing preprocessing of the LiDAR data, then creating a canopy height model to obtain an individual tree segmentation. The individual tree segmentation would then allow me to extract tree points with coordinates that could be usable points for the Ripley’s K Function.


LiDAR preprocessing

I started off by finding my harvest unit area of interest (Saddleback stand) and a nearby riparian stand that would be used to compare the Ripley’s K function outputs. I created polygons to clip the LiDAR point clouds to. I found the LiDAR files covering the AOIs and used the ArcMap Create LAS Dataset (Data Management) tool to make workable files, then clipped the datasets to the polygons using the Extract LAS (3D Analyst) tool. FUSION was used to merge the clipped LiDAR tiles into one continuous dataset for each AOI. I then normalized the point cloud in FUSION using a 2008 DEM raster from DOGAMI and the FUSION tools ASCII2DTM and ClipData.


CHM, tree segmentation, and Ripley’s K

With the normalized point cloud, a canopy height model (CHM) was created in RStudio, and then an individual tree segmentation was made with the R package lidR using a watershed algorithm. The segmented trees were exported as a polygon shapefile that could be used in ArcMap. The Feature to Point tool (Data Management) was used to calculate the centroids of the polygons, identifying individual trees as points. The points could then be used with the spatstat package in RStudio for the Ripley’s K function, which was calculated for both the Saddleback stand and a nearby riparian area.
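To illustrate what the Ripley's K function computes, here is a naive estimator in Python (the actual analysis used spatstat in R). It omits edge correction, and the "tree" coordinates are simulated clumps, not the segmentation output; for a clustered pattern, K(r) sits above the complete-spatial-randomness expectation of pi * r^2.

```python
import numpy as np

def ripley_k(points, r_values, area):
    """Naive Ripley's K estimate on a rectangular window (no edge correction)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Pairwise distance matrix between all points.
    d = np.hypot(pts[:, None, 0] - pts[None, :, 0],
                 pts[:, None, 1] - pts[None, :, 1])
    lam = n / area                          # point intensity
    k = []
    for r in r_values:
        pairs = (d <= r).sum() - n          # ordered pairs within r, minus self-pairs
        k.append(pairs / (n * lam))
    return np.array(k)

# Simulated clustered "tree points": two tight clumps in a 100 x 100 m window.
rng = np.random.default_rng(42)
trees = np.vstack([rng.normal(25, 3, size=(30, 2)),
                   rng.normal(70, 3, size=(30, 2))])

r = np.array([5.0, 10.0])
k_obs = ripley_k(trees, r, area=100 * 100)
k_csr = np.pi * r ** 2                      # CSR expectation: K(r) = pi * r^2
print(k_obs, k_csr)
```

spatstat's Kest additionally applies edge corrections and simulation envelopes, which is why the stand results above could be judged against a confidence envelope rather than the bare pi * r^2 line.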

  4. Results

The results show that the patterns for both the Saddleback stand and the riparian area were clustered. Both stands’ observed K lines plotted above the expected line for a random spatial pattern, and the departure was significant, lying above the upper confidence envelope. The riparian stand showed higher levels of clustering than the Saddleback stand, which was clustered as well, but not to the same degree.


  5. Critiques

One critique of my analysis is that a more robust individual tree segmentation algorithm could be used. For the sake of processing speed and creating delineated polygons with reduced noise, I used a resolution of 1 meter for my CHM. The 1 meter resolution smoothed the tree segmentation, possibly removing potential tree polygons but creating more defined segmented trees. The lower-resolution CHM was paired with a relatively simple watershed algorithm; algorithms I have used in the past showed better results than watershed but required more detailed inputs. Another criticism is that Feature to Point does not necessarily give the tree tops, but rather the centroids of polygons that the tree segmentation identified as individual trees. A more robust method for determining tree points would be preferable.

Recession Analysis in Two Coastal Basins

Research Question

The question that I asked for this exercise was: What is the spatial variability of the recession timescale at different flow rates in rain-dominated, coastal basins?


Stream flow regimes are generally defined by five components: magnitude, frequency, duration, timing, and rate of change (Poff et al., 1997). For the purposes of this exercise, I am investigating the rate of change (or “flashiness”) component of flow regime in two river systems in the Oregon Coast Range: the Nehalem and Alsea River watersheds. I am analyzing recession behavior in these two systems to quantify a rate of change metric. I used the recession curve method, largely developed by Brutsaert and Nieber (1977), and later built upon by Krakauer et al. (2011) among many others. Recession curves describe the rate at which streamflow recedes in various streamflow conditions. In more general terms, recession curves provide an indication of watershed storage and groundwater behavior.


To complete the recession analysis in the Nehalem and Alsea watersheds, I followed the following steps:

  1. Downloaded streamflow data for the available period of record in 1-hour observation intervals using the dataRetrieval and EGRET (both developed by the USGS) packages in R.
  2. The data were in cubic-feet-per-second (cfs) units, which I converted to unit discharge (mm/hour) using respective basin areas.
  3. For recession analyses, only data points with insignificant precipitation are viable. Therefore, all time steps when precipitation was greater than 10% of total streamflow were removed. To do this, local precipitation data was necessary; however, such data is often hard to come by. As a result, the precipitation data sources for the Nehalem and the Alsea are different and the methods diverge slightly hereafter.
    1. Nehalem precipitation data was sourced from the USGS Vernonia site, which is upstream from the mainstem river gauging site where streamflow data for this analysis was sampled. For the purposes of this exercise, I assumed that rainfall (measured in inches at 30-minute intervals) was spatially consistent across the basin. For a more precise estimate of precipitation, multiple, spatially distributed rain gages would be needed.
    2. Alsea precipitation data in hourly timesteps was not readily available. There are a couple of NOAA rain gages in the vicinity, but the data appeared to be spotty. As a result, I used daily precipitation data from PRISM to identify days in which total daily precipitation was greater than 10% of total daily streamflow, then subset the identified dates out of the hourly streamflow data.
  4. Calculating rate of change: For this component of the analysis, I only wanted data from the receding hydrograph. I estimated the hourly rate of change as dQ/dt ≈ (Q_{t+1} - Q_t)/Δt. Streamflow (Q) for the corresponding hour was estimated as the mean of the two observations, Q = (Q_{t+1} + Q_t)/2.
  5. dQ/dt is the rate at which flow changes from one time step to the next at a given flowrate. Because this is a recession analysis, I only wanted data on the receding limb and therefore subset the data to time steps when dQ/dt was negative.
  6. Lastly, I developed a simple linear regression model for dQ/dt vs. Q for each watershed using the following equation: log(-dQ/dt) = a + b·log(Q)
When log-transformed and plotted, discharge and its rate of change follow a linear relationship; however, the relationship differs for data analyzed at different temporal resolutions (Figures 1 a-b). The recession results from the daily timestep are likely muted because recession processes occur on shorter timescales, on the order of minutes to hours.
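The recession steps above (finite-difference dQ/dt, keeping only receding time steps, then a log-log linear fit) can be sketched as follows. The hourly streamflow series is synthetic (an exponential recession with noise), not Nehalem or Alsea data.

```python
import numpy as np

# Synthetic hourly recession: Q decays exponentially (-dQ/dt = k*Q, i.e. b = 1)
# with a little multiplicative noise so the fit has scatter to work with.
rng = np.random.default_rng(7)
hours = np.arange(500.0)
q = 5.0 * np.exp(-0.01 * hours) * (1 + rng.normal(0, 0.001, hours.size))

dq_dt = np.diff(q)                 # (Q_{t+1} - Q_t) / (1 hr)
q_mid = (q[1:] + q[:-1]) / 2       # Q for the corresponding time step

receding = dq_dt < 0               # keep only the receding limb
x = np.log10(q_mid[receding])
y = np.log10(-dq_dt[receding])

b, a = np.polyfit(x, y, 1)         # fits log(-dQ/dt) = a + b*log(Q)
print(f"a = {a:.3f}, b = {b:.3f}")
```

For this synthetic series the fitted b comes out near 1 and a near log10 of the decay constant; on real data, the scatter and the choice of timestep drive the differences seen between the daily and hourly coefficients in Table 1.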

Figures 1 a-b. On the left, recession curve results for the Alsea River on a daily time step, after removing days with high precipitation. On the right, recession curve results for the river on an hourly time step, after days with high precipitation were removed. Both a and b include only recession data.


Table 1. Table showing different coefficient values for different temporal resolutions of Alsea recession data.

| Data | a coefficient | b coefficient |
| --- | --- | --- |
| Alsea (daily) | -3.029 | 1.639 |
| Alsea (hourly) | -5.2346 | 0.9138 |


Recession analysis on year-round, hourly Nehalem River data resulted in an a coefficient of -4.696 and a b coefficient of 1.049, similar to the Alsea hourly results. However, the Nehalem recession data show two linear trends in the plotted data (Figure 2). The two trendlines remained after controlling for both diurnal flux and season (Figure 3).

Figure 2. Recession results from the Nehalem River data. The red line is the calculated linear regression model, however two lines may be more representative of the recession behavior, such as those estimated in blue.

Figure 3. Recession data separated into seasons, including only receding slopes, nighttime hours, and time steps with insignificant precipitation. The two trendlines are less distinct in certain seasons of the year.

Critique of Method

Recession analysis is a method that I will use in my research, and this exercise helped me understand the nuances of recession data analysis, including data acquisition and availability, and opportunities for improvement. Moving forward, this analysis would benefit from more spatially accurate precipitation data, further investigation of the explanations for the observed patterns and trends, and more sophisticated statistical comparison between sites.


PRISM Climate Group, Oregon State University, created 4 Feb 2004.

Krakauer, N. Y., & Temimi, M. (2011). Stream recession curves and storage variability in small watersheds. Hydrology and Earth System Sciences, 15(7), 2377–2389.

Sawaske, S. R., & Freyberg, D. L. (2014). An analysis of trends in baseflow recession and low-flows in rain-dominated coastal streams of the pacific coast. Journal of Hydrology, 519, 599–610.

Brutsaert, W., & Nieber, J. L. (1977). Regionalized drought flow hydrographs from a mature glaciated plateau. Water Resources Research, 13(3), 637–643.

Poff, N. L., Allan, J. D., Bain, M. B., Karr, J. R., Prestegaard, K. L., Richter, B. D., … Stromberg, J. C. (1997). The Natural Flow Regime. BioScience, 47(11), 769–784.

Leaf spot patch analysis caused by Blackleg

My research involves the classification of leaf spots on turnip caused by the pathogen blackleg. I hypothesized that spatial patterns of pixels based on image classification are related to manually classified spatial patterns of observed disease on turnip leaves, because disease has a characteristic spectral signature of infection on the leaves. This post focuses on the analysis of clusters based on image classification through segmentation, as well as the manually classified clusters. These clusters of pixels are expected to be representative of the diseased patches on the leaves. Here I seek to obtain patch statistics which will later be compared for relationships and accuracy. A large portion of this process went into image processing before the analysis could be conducted; all of the image processing took place in ArcGIS Pro. The patch analysis itself was conducted in Fragstats.

Some questions I asked myself about elements A & B of my hypothesis included:

Can diseased patches be successfully identified and isolated by computer image classification and through manual classification?

If yes, what are the area and perimeter of the diseased patches based on image classification and on manual classification? I was mostly looking to gain some experience with the patch analysis provided by Fragstats. Much more thorough analysis can and will be completed in Fragstats when variables A & B are compared for accuracy assessment in Exercise 2.

Tools used and methodology
The image processing and classification of pixels was conducted in ArcGIS Pro. This analysis required extracting both manually classified diseased patches and computer classified patches. I began by uploading band 3, which captures light energy in the red wavelength at 670 – 675 nm.

1. For computer classification of the image I used Segmentation.

a) Go to the Imagery tab and the Image Classification group. Click the drop-down arrow and an option for Segmentation should appear. Be sure the layer you want to segment is selected before opening the Segmentation classification tool. In the pane you have options to adjust Spectral detail, Spatial detail, and Minimum segment size in pixels. These are variable and depend on image resolution, the level of accuracy you wish to achieve, etc. Because my resolution was high, I chose higher spectral and spatial detail, adjusting the spectral detail from 15.50 to 17 and the spatial detail from 15 to 17. For the minimum segment size in pixels I chose 10. As mentioned, these values are variable; although the default values worked well for segmentation, I previewed other values and achieved better results by adjusting them. After previewing the options and deciding which values work best, click the Run button in the bottom right corner of the pane.

b) Masking all the cells that weren’t diseased was the next step and is crucial to the diseased-region selection process. To do this, go to the Analysis tab and the Raster group to find the Raster Functions. Open the Raster Functions pane and search for Mask. Open the segmented image in the raster box and three options will be available for Included Ranges, representing the blue, green, and red bands. Because the data were converted from 12-bit to 8-bit in the segmentation process, values range from 1 to 255. Despite having three band options, all the bands hold the same value because we are only working with the red band. Click on the segments you wish to include in the map window to check their values, then enter the same minimum and maximum for all three bands (1, 2, and 3) so that the desired segments are included. The range I used was 100 to 160, based on the RGB.Pixel Values that appear when clicking a pixel. Once these are entered, click Create new layer at the bottom of the pane.

c) Next, I used the Clip function by navigating to the Imagery tab and the Analysis group and selecting Raster Functions. A search bar can be found along the top of the pane where you can enter Clip. Although the clip function can be reached in other ways, I experienced different results depending on how I navigated to it, as well as different clipping options in the pane. In the Parameters section for Clip Properties, select the drop-down arrow for the raster you previously segmented. Leave Clipping Type and Clipping Geometry/Raster as the default. Adjust the active map view by zooming to the region you want preserved after the clip; I use as small an area as possible while keeping all required segmented regions. Click the Capture Current Map Extent button just to the right of the extent options in the pane (the green square map icon). To clip, click Create new layer at the bottom of the pane and a new layer with the clipped region should appear in the map contents.

d) From here, go to the Analysis tab and the Geoprocessing group to find the Tools. Click Tools to open the options in the left pane. Here we want Raster to Polygon (Conversion Tools), which can be found by searching at the top of the Tools pane. Use the most recent segmented, clipped raster as the Input raster, leave the Field blank, and select where you would like the Output Polygon features to save. I left the remaining options as the default and clicked Run in the bottom right corner. Although we need the final product to be in raster format, this step is important for creating separate polygons which can be grouped together as one unit.

e) As mentioned, we must convert the polygons back to a raster. Before using the Polygon to Raster tool, polygons which overlap must be merged; this is one of the two reasons we converted the raster to polygons. When the segmentation was conducted in step (a), diseased regions were not all categorized in the same bin. This resulted in regions which were maintained after masking but are different shades of grey; even one diseased patch may have two or three tones, resulting in multiple segments per patch. Now that everything is a polygon, we can merge the polygons that should be one. Click the Edit tab and select Modify in the Features group. In Modify Features, scroll down to the Construct group and click Merge. In Existing Features click Change the Selection and, while holding Shift, select the polygons which belong in one group. They will be highlighted in blue and appear in the pane if properly selected; if so, click Merge. Navigate to the Map tab and Selection group and click Clear. Repeat these merge steps for any other polygons which belong to the same diseased lesion but were classified separately in the raster segmentation.

f) Finally, the polygons can be converted back to a raster for the last step of image processing. Click the Analysis tab and find the Geoprocessing group and Tools once again. Search Polygon to Raster (Conversion Tools) in the Tools pane and select it. Use the polygon layer for Input Features and the pane options should adjust to accommodate your raster. The only adjustment I made was to change the Cellsize from 0.16 to 1 to maintain the cell size of my original raster. Select Run in the bottom right corner of the pane to create the final layer. Each polygon should now be a rasterized polygon with a unique ObjectID, Value, and Count, which can be found in the layer’s attribute table or by clicking on a group of pixels. This allows each diseased patch to be aggregated and analyzed as such in Fragstats.

g) Right-click the final raster layer, go to Data, and select Export Data. I exported a TIFF file for analysis in Fragstats.

2. To compare the accuracy of the segmentation as a classification method for diseased pixels, manual classification was used as a ground truth technique.

a) For manual classification of the original red band, I used the Training Sample Manager. It enables manual classification that can later feed supervised machine learning classifiers. Here, I simply used it to select diseased regions manually, though my end goal is to use a support vector machine model.

b) To get to the Training Sample Manager, go to the Imagery tab and the Image Classification group, select Classification Tools, and click Training Sample Manager. In the Image Classification window pane, go to create new scheme. Right-click New Scheme, choose Add New Class, name it diseased pixels, and supply an arbitrary value and a description if desired. Click OK, select the diseased pixels scheme, and then select the Freehand tool in the pane. Draw an outline around the diseased regions in the image. Save these polygons and the classification scheme. Go to Add Data in the Layer group on the Map tab to upload the training sample polygons.

c) Once the polygons are uploaded, use the Polygon to Raster protocol which can be found in step 1, part (f) followed by part (g).

Fragstats was used for the analysis of the patches. For exercise 1, the intent in Fragstats was simply to obtain information about the diseased patches extracted from the red band image: pixel count, area, perimeter, perimeter-area ratio, and shape index. Many more patch-analysis options are available and will be considered for exercise 2.
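The patch metrics listed above (area, perimeter, perimeter-area ratio, shape index) can be illustrated on a toy binary raster. This is a hedged stdlib-Python sketch, not Fragstats itself; the `patch_metrics` function and the unit cell size are assumptions for illustration, and the shape index uses the raster form 0.25·P/√A, which equals 1 for a square patch:

```python
import math

def patch_metrics(grid):
    """Area, perimeter, perimeter-area ratio, and shape index for the
    '1' cells of a binary raster, assuming unit cell size."""
    rows, cols = len(grid), len(grid[0])
    area = sum(cell for row in grid for cell in row)
    perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue
            # each side facing background (or the raster edge) adds one edge
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] != 1:
                    perimeter += 1
    return {"area": area,
            "perimeter": perimeter,
            "para": perimeter / area,                     # perimeter-area ratio
            "shape": 0.25 * perimeter / math.sqrt(area)}  # raster shape index

# a 2x2 square patch: area 4, perimeter 8, shape index 1.0 (most compact)
print(patch_metrics([[1, 1], [1, 1]]))
```

Less compact (more elongated or ragged) patches yield shape index values above 1, which is why the metric helps separate circular leaf spots from irregular reflection artifacts.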

1. Open up Fragstats for the patch analysis.

a) To import your first image click Add layer and go down to GeoTIFF grid and select it. In the Dataset name, search for your tiff file you created in ArcGIS Pro and select it and click ok. You have to do this for each of the datasets. Then go to the Analysis parameters and click Patch metrics in the Sampling strategy. For General options hit Browse to find a location for the output to save and click the Automatically save results checkbox.

b) Click on the red box called Patch metrics and in the Area – Edge tab select the Patch Area and Patch Perimeter. Click on the Shape tab and click Perimeter – Area ratio and Shape Index.

c) In the upper left corner, hit Run; Fragstats will check everything you have uploaded and the analyses you have selected. After it checks the model consistency, hit Proceed to run the analyses.

d) After this, open the Results options to view the output.

The patches aligned fairly well when the two layers were compared visually in ArcGIS Pro; the quantitative comparison between patches will be done in exercise 2. I did notice that the segmentation process missed many of the smaller diseased patches. Looking at Image 2, you can see that they were segmented from the surrounding regions but were lumped into a bin that also caught areas appearing lighter in the image because of a reflection. Different thresholds could be used in the segmentation process to include those smaller patches, perhaps combined with a parameter involving shape; this may be considered for future segmentation.

A concern is that setting the threshold too low includes many regions that are not diseased along with all the diseased regions, producing many false positives for disease. The alternative, a high threshold value for diseased regions, would produce false negatives. The idea is to find a good balance between the two. Currently the segmentation is based strictly on the segment's value, which reflects the reflectance in the red band. As mentioned, another parameter worth considering is shape: since the diseased regions tend to be leaf spots, which are typically circular, adjustments could be made to include more circular patches that do not extend past a certain number of pixels. In Table 1 we see that the number of pixels per patch ranges from 8 to 50. The patch analysis provides some insight into possible spatial factors that explain these segmented regions and how the classification process could be done more accurately.
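The threshold trade-off described above (too low gives false positives, too high gives false negatives) can be sketched with a toy threshold sweep. The pixel values and ground-truth labels here are hypothetical, standing in for red-band reflectance and the manual classification:

```python
def confusion_counts(values, truth, threshold):
    """Classify pixels as diseased when value >= threshold, then count
    false positives and false negatives against manual labels."""
    fp = sum(1 for v, t in zip(values, truth) if v >= threshold and not t)
    fn = sum(1 for v, t in zip(values, truth) if v < threshold and t)
    return fp, fn

# hypothetical red-band pixel values and manual labels (True = diseased)
values = [0.2, 0.4, 0.5, 0.7, 0.8, 0.9]
truth = [False, False, True, True, True, True]
for threshold in (0.3, 0.5, 0.8):
    print(threshold, confusion_counts(values, truth, threshold))
```

Sweeping the threshold like this over a labeled subset of leaves is one simple way to pick the balance point between false positives and false negatives before adding a shape criterion.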

Table 1. Showing the 11 different patches that were manually selected and analyzed in Fragstats. The color indicates the corresponding patch that was detected through segmentation shown in Table 2.

Table 2. The 4 different patches that were identified through segmentation and analyzed in Fragstats. The color indicates the corresponding manually classified patch shown in Table 1.


Critique of the method
The MicaSense red-edge camera has 5 bands, which can be very helpful for applying different vegetative indices and compositing bands to bring out the variation in spectral signatures between diseased and un-diseased tissue. Although the pixel size is just under 1 mm, which appears adequate for identifying lesions on the leaf, the bands do not overlap perfectly. This is due to the design of the camera, which has five different lenses for the five bands, each separated by one to two inches. This could be corrected by manually adjusting the extent of each of the five bands for near-perfect overlap. Until this is resolved, the red band shows the most variation in pixel value for diseased patches compared to the other four bands and was used for this analysis.

Additionally, the processing done in part 1 to get the segmented raster patches involves many steps. Methods for speeding up this process need to be considered for future analysis, especially when conducting it on all 500 leaves.

As mentioned before, Fragstats has many more statistical capabilities that were not applied in this analysis. Computing more statistics on the manual and segmented patches would help determine the level of accuracy, as well as identify other parameters worth considering.


Image 1. The red band tif file uploaded into ArcGIS Pro for classification. The white patches are what we would expect the diseased regions to look like.

Image 2. The result of the segmentation performed on Image 1, described in step 1 part (a).

Image 3. The patches determined based on segmentation in part 1 from Image 2. Separate patches were created by using the Raster to Polygon tool, followed by the Polygon to Raster tool. Two layers were turned on here for the intent of showing the overlap between the raster patches and the original image.

Image 4. The patches determined based on the manual classification described in part 2 from Image 1. Training sample manager followed by polygon to raster was used to get these patches. Two layers were turned on here for the intent of showing the overlap between the raster patches and the original image.

Spatial Patterns of Salmonella Rates in Oregon

  1. Question Asked

At this stage I asked several questions regarding the spatial distribution of population characteristics across all Oregon counties in 2014: What are the county-level spatial patterns of reported age-adjusted Salmonella rates within Oregon in 2014? Of the proportions of females? Of median age? Of the proportion of infants/young children aged 0-4 years?

To answer these questions I used several different datasets. The first dataset is a collection of all reported Salmonella cases in Oregon from 2008-2017, which includes information such as sex, age group, county in which the case was reported, and onset of illness. The information in this dataset was de-identified by the Oregon Health Authority. The second dataset is a collection of Oregon population estimates over the same time period, including sex- and age-group-specific county-level population information. I also obtained county-level median ages from American FactFinder. The last dataset is a shapefile from the Oregon Spatial Data Library containing polygon information for all Oregon counties.

  2. Names of analytical tools/approaches used

I used direct age adjustment (with the 2014 statewide population as the standard population) to obtain county-level age-adjusted Salmonella rates. After calculating county-level summary data (proportion of females, proportion of children aged 0-4, median age, and age-adjusted Salmonella rates), I merged this information with a spatial dataframe containing polygon data for every county in Oregon. I then computed both local (0-150 km) and global (statewide) spatial autocorrelation to obtain a Moran's I statistic for each of the population variables listed above. I also produced choropleth maps of each of the variables for Oregon. Finally, I produced a heatmap of county-level age-adjusted Salmonella rates using the Getis-Ord Gi* local statistic to evaluate statistically significant clustering of high/low rates of reported Salmonella cases.

  3. Description of the analytical process

After extensive reformatting, I was able to organize cases of Salmonella by age group and by county for the year 2014. I then formatted the 2014 county-level population estimates the same way and divided the Salmonella case dataframe by the population estimate dataframe to get rates by age group. To get county age-adjusted rates I created a “standard population”; in this case I used Oregon's statewide population broken down into the same age groups as above. I then multiplied each county's age-specific rates by the standard population of the matching age group to create a dataframe of hypothetical cases. This dataframe represents the number of cases we would expect in each county if it had the same population size and age distribution as Oregon as a whole. I summed the expected Salmonella cases by county and divided this number by the 2014 statewide population, which yielded age-adjusted reported Salmonella rates by county.
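The direct age-adjustment arithmetic described above can be sketched in a few lines. All counts here are hypothetical, and three age groups stand in for the real breakdown:

```python
# Hypothetical counts for one county in three age groups; the standard
# population is the statewide population in the same age groups.
county_cases = [2, 5, 6]
county_pop = [10000, 25000, 15000]
standard_pop = [800000, 2000000, 1200000]

# age-specific rates for the county
rates = [c / p for c, p in zip(county_cases, county_pop)]

# cases we would expect if the county had the statewide age structure
expected = [r * s for r, s in zip(rates, standard_pop)]

# age-adjusted rate per 100,000, using the statewide total as denominator
adjusted = sum(expected) / sum(standard_pop) * 100000
print(round(adjusted, 1))  # → 26.0
```

Because every county is weighted by the same standard age structure, the adjusted rates are comparable across counties even when their age distributions differ.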

Given that the population data contained county level populations broken down by age group and by sex I was able to calculate proportions of county populations which were female, and which were young children aged 0-4 years by dividing those respective group populations by the total county population.

After this I performed local and global spatial autocorrelation with Moran’s I using the county level median age, proportion of children, proportion of females, and age adjusted Salmonella rates which were associated with centroid points for each county. The global Moran’s I was calculated using the entire extent of the state and the local Moran’s I was calculated by limiting analysis to locations within 150 km of the centroid. Both global and local Moran’s I statistics were calculated using the Monte-Carlo method with 599 simulations.
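A minimal sketch of the global Moran's I with a Monte-Carlo pseudo p-value, as described above. The weights matrix is assumed row-standardized, and the four-region example is hypothetical, not the Oregon data:

```python
import random

def morans_i(values, weights):
    """Global Moran's I for a list of values and a spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    w_sum = sum(sum(row) for row in weights)
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

def mc_pvalue(values, weights, sims=599, seed=0):
    """Monte-Carlo pseudo p-value: rank the observed I among I values
    computed on randomly permuted data (599 simulations, as above)."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    greater = 0
    vals = list(values)
    for _ in range(sims):
        rng.shuffle(vals)
        if morans_i(vals, weights) >= observed:
            greater += 1
    return observed, (greater + 1) / (sims + 1)

# four hypothetical regions in a row with a strong spatial trend;
# row-standardized contiguity weights
values = [1.0, 2.0, 3.0, 4.0]
weights = [[0, 1, 0, 0],
           [0.5, 0, 0.5, 0],
           [0, 0.5, 0, 0.5],
           [0, 0, 1, 0]]
print(mc_pvalue(values, weights))
```

Permuting the values breaks any spatial structure, so the pseudo p-value is simply the share of permutations whose Moran's I matches or exceeds the observed statistic.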

Finally, I completed a hot spot analysis using Getis-Ord Gi* to assess for any statistically significant hot or cold spots in Oregon. This was done only for the age-adjusted Salmonella rates, using the same county centroid points as above. I used a local weights matrix with queen adjacency for neighbor connectivity, row-standardized so that each county's neighbor weights sum to 1.
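The Getis-Ord Gi* statistic behind the hot spot analysis can be sketched as a z-score per location. This is a simplified form, assuming a full weights matrix in which each location counts as its own neighbor (the * in Gi*); the five-area example is hypothetical:

```python
import math

def getis_ord_gi_star(values, weights):
    """Getis-Ord Gi* z-scores; `weights` is a full matrix in which each
    location counts as its own neighbor (the * in Gi*)."""
    n = len(values)
    xbar = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - xbar ** 2)
    z = []
    for i in range(n):
        w = weights[i]
        sw = sum(w)
        sw2 = sum(wj * wj for wj in w)
        num = sum(wj * xj for wj, xj in zip(w, values)) - xbar * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        z.append(num / den)
    return z

# five hypothetical areas in a row; binary contiguity weights including self
vals = [1, 1, 9, 9, 1]
w = [[1 if abs(i - j) <= 1 else 0 for j in range(5)] for i in range(5)]
z = getis_ord_gi_star(vals, w)
print([round(zi, 2) for zi in z])  # → [-1.33, -0.33, 1.33, 1.33, 0.33]
```

Large positive z-scores flag clusters of high values (hot spots) and large negative ones flag cold spots, which is how the 90%/95% confidence hotspots in the results are identified.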

  4. Brief description of results you obtained

Choropleth Maps of Oregon:

From the median age map, we can see that there are some clusters of older counties in the northeastern portion of the state and along the west coast. Overall, the western portion of Oregon is younger than the eastern portion of the state.

From the proportion of children map there are a few clusters of counties in the northern portion of the state with high proportions of children compared to the rest of the state. Overall, the counties surrounding the Portland metro area have higher proportions of children compared to the rest of the state.

From the proportion of females map, we can see that the counties with the highest proportion of females are clustered in the western portion of the state.

Finally, from the age-adjusted county Salmonella rates map we can see that the highest rates of Salmonella occur mostly in the western portion of the state with a few counties in the northeast having high rates as well. Overall, the counties surrounding Multnomah county have the highest rates of Salmonella.

The global Moran’s I statistics:

  • County proportions of females: 0.053 with a p-value of 0.15, suggesting slight clustering that is not statistically significant.
  • County median age: 0.175 with a p-value of 0.02, providing evidence of significant mild clustering.
  • County proportions of children: 0.117 with a p-value of 0.05, providing evidence of significant mild clustering.
  • County age-adjusted Salmonella rates: -0.007 with a p-value of 0.32, suggesting slightly more dispersion than expected, not statistically significant.

Local Moran’s I Statistics:

  • County proportions of females: 0.152 with a p-value of 0.02, suggesting significant mild clustering.
  • County median age: 0.110 with a p-value of 0.07, suggesting mild clustering that is not statistically significant.
  • County proportions of children: 0.052 with a p-value of 0.1617, suggesting slight clustering that is not statistically significant.
  • County age-adjusted Salmonella rates: -0.032 with a p-value of 0.5083, suggesting slightly more dispersion than expected, not statistically significant.

Getis-Ord Gi*:

  • The heatmap shows a significant hotspot (with 95% confidence) in Clackamas county with another hotspot (with 90% confidence) in Hood River County. Three cold spots (with 90% confidence) are seen in Malheur, Crook, and Morrow counties.

  5. Critique of Methods

The choropleth maps were very useful for showing areas with high/low values; however, this method cannot detect counties with significantly different values compared to their neighbors. Overall, it was useful as an exploratory tool. The global and local Moran's I calculations could detect whether high/low values were more clustered or more dispersed than expected. However, I am unsure whether this method was completely appropriate given the coarseness of the county-level data. At a local scale, only the proportion of females showed a significant amount of clustering, and globally, median age and proportion of children showed some significant clustering. Given that most of the Moran's I statistics were not significant, I don't believe this analytical method highlighted a particularly meaningful spatial pattern in my data. The heatmap provided evidence of some significant hot and cold spots in Oregon, although it was based on immediate-neighbor weights, and perhaps global weights would be more appropriate. Overall, this tool was very useful for detecting significantly high/low Salmonella rates.

Exercise 1: Preparing for Point Pattern Analysis

Exercise 1

The Question in Context

In order to answer my question (are the dolphin sighting data points clustered along the transect surveys, or are they evenly distributed?), I need to use point pattern analysis. I am trying to visualize where in space dolphins were sighted along the coast of California, specifically in my San Diego sighting area. In this exercise, the variable of interest is dolphin sightings: x,y coordinates (point data) indicating the presence of common bottlenose dolphins along a transect. However, these transect data were not recorded, and I needed to recreate the transect lines to the best of my ability. This process has been more challenging than anticipated, but will prove useful both in the short term for this class and project and in the long term for management ramifications.

The Tools

As part of this exercise, I used ArcMap 10.6, Google Earth, qGIS, and Excel. Although I intended only to import my Excel data, saved as a .csv file, into ArcMap, that was not working, so other tools were necessary. The final goal of this exercise was to complete point pattern analyses comparing distance along recreated transects to sightings. From there, the sightings would be broken down by year, season, or environmental factor (El Niño versus La Niña years) to look for distribution patterns, specifically whether the points were clustered or evenly distributed at different points in time.

Steps/Outputs/Review of Methods and Analysis

My first step was to clean up my sightings data enough that it could be exported as a .csv and imported as x-y data into ArcMap. However, ArcMap, no matter the transformation equation, did not seem to interpret the projected or geographic coordinate systems correctly. After many attempts, in which my data ended up along the east coast of Africa or in the Gulf of Mexico, I tried a workaround: I imported the .csv file into qGIS with the help of a classmate and exported it as a shapefile. I was then able to import that shapefile into ArcMap and select the correct geographic and projected coordinate systems. The points finally appeared off the coast of California.

I then found a shape file of North America with a more accurate coastline, to add to the base map. This step will be important later when I add in track lines, and how the distributions of points along these track lines are related to bathymetry. The bathymetric lines will need to be rasterized and later interpolated.

The next step was the track line recreation. I chose to focus on the San Diego study site, which has the most data and the most consistently and standardly collected data. The surveys always left the same port in Mission Bay, San Diego, CA, traveled north at 5-10 km/hr to a specific beach (landmark), then turned around. The sighting data note whether the track line was surveyed in both directions (south to north and north to south) or unidirectionally (south to north). Because some data were collected before GPS units were invented and commercially available, I have to recreate these track lines. I started trying to draw the lines in ArcMap but had difficulty. After many attempts, it was suggested that I use Google Earth. There I found a tool to create a survey line, marking the edges along the coastline at an approximate distance from shore, and then export that file. It took a while to realize that the file needed to be exported as a .kml and not a .kmz.

Once exported as a .kml, I was able to convert the file to a layer file and then to a shapefile in ArcMap. The next step is to associate all points within one kilometer of the track line (my spatial scale for this part of the project) with that track line. One idea was snapping the points to the line; however, this did not work. I am still stuck here, at the major step before I can associate my point data with the line and begin a point pattern analysis in ArcMap and/or RStudio.
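One alternative to snapping is a plain distance filter: once the track line exists as a set of vertices, sightings within one kilometer can be selected with a point-to-segment distance test. This is a hedged sketch in plain Python with hypothetical projected coordinates in metres, not the ArcMap snapping workflow:

```python
import math

def dist_point_to_segment(p, a, b):
    """Distance from point p to segment ab (planar coordinates, e.g. metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # projection parameter, clamped so the closest point stays on the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def sightings_near_track(points, track, max_dist=1000.0):
    """Keep sightings within max_dist metres of any segment of the track."""
    return [p for p in points
            if min(dist_point_to_segment(p, track[i], track[i + 1])
                   for i in range(len(track) - 1)) <= max_dist]

# hypothetical northbound track and two sightings (projected metres)
track = [(0, 0), (0, 5000), (500, 10000)]
sightings = [(300, 2500), (4000, 2500)]
print(sightings_near_track(sightings, track))  # → [(300, 2500)]
```

The same clamped-projection distance also gives each retained sighting a position along the track, which is the quantity needed for the planned point pattern analysis.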


Although I do not yet have full results from this exercise, I can say for certain that it has not been for lack of trying, nor am I stopping. I have been brainstorming and drawing on resources from classmates and teaching assistants about how to associate the sighting data points with the track line so I can do this cluster analysis. From there, the data can be exported to RStudio, where I can see distributions along the transect. I may be able to do a density-based analysis, which would show whether different sections along the transect (which I would need to designate, and potentially rasterize, first) have different densities of points. I would expect the sections to differ seasonally.


Although I gave my opinions on usefulness and ease above, I do believe this approach will be very helpful in analyzing distribution patterns. It is largely unknown whether there are differences in distribution patterns for this population, because the animals move rapidly and across great distances. By investigating data from only the San Diego site, I can determine whether there are temporal and spatial differences in distributions along the transects. In addition, the total count of sightings in each location per unit effort will be useful for seeing the influx to the entire survey area over time.

Contact information: this post was written by Alexa Kownacki, Wildlife Science Ph.D. Student at Oregon State University. Twitter: @lexaKownacki

Point Pattern Analysis of Tree Distribution by Height in the HJ Andrews Forest

1. Given that the HJ Andrews Experimental Forest is a 16,000-acre forest with manipulations spanning back to its establishment in 1948 and its later designation as a Long-Term Ecological Research (LTER) site, it is a highly spatially heterogeneous ecosystem. Forest harvest began in the 1950s and resulted in a mosaic of young plantation forest (~30 percent of total forest area) and old growth (~40 percent of forest). My objective is to quantify the spatial pattern of trees across the forest and eventually relate that to quantifiable landscape features.

Motivating Questions: How does the spatial pattern of trees vary across the HJ Andrews Forest? Specifically, I’m exploring the relationship between tree height and tree spacing. One specific question of interest is: How does the mean distance between trees in the same height class differ from the mean distance between a single height class of tree and all other trees? This question attempts to address the clustering vs. dispersion of trees by height.

This is an analysis of the spatial distribution of one variable, tree height, so a consideration of the internal processes that may influence the spatial distribution of this variable is necessary.

  • Microclimate caused by the clustering or dispersion of trees could be either an attraction or repulsion process. Microclimate influences relative humidity and exposure to wind, along with many other factors, so clusters of trees would tend to have different microclimate features than more dispersed trees.
  • Population and community dynamics will influence the spatial distribution of trees. The speed at which a colonizer can take over a space, and competition between different colonizers, influence the distribution. Aboveground and belowground tree growth and the spacing of trees will be influenced by these factors.
  • Source and sink processes in a forest may result from topographical features, like valleys and hillslopes. I expect valleys to be a source (capable of producing a surplus of organisms) since they tend to be near streams, so not water limited, and in areas that serve as catchments for nutrients, so not nutrient limited.
  • Spatial distribution of tree height certainly differs with scale. The spatial pattern looks different at the single-tree scale compared with a 50-m scale or a 5-km scale. Some of these differences are revealed by Figures 3, 4 and 5, below.

2. My approach is to use k-nearest neighbor analysis to examine the distance between a given tree and proximal trees. I created ten height classes by using kmeans to find ten cluster centers in the tree height data, then used k-nearest neighbor to examine the distance between each cluster center and the 30 closest trees.
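The kmeans step on a single variable (tree height) can be sketched with a plain Lloyd's algorithm. `kmeans_1d` is a hypothetical stdlib stand-in for R's kmeans, with quantile-based initialization assumed:

```python
def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's k-means on one variable (tree height), assuming k >= 2.
    Returns the sorted cluster centers."""
    vals = sorted(values)
    # initialize centers at evenly spaced quantiles
    centers = [vals[int(i * (len(vals) - 1) / (k - 1))] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vals:                      # assign each value to its nearest center
            j = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[j].append(v)
        # move each center to its group mean
        new = [sum(g) / len(g) if g else centers[j]
               for j, g in enumerate(groups)]
        if new == centers:                  # converged
            break
        centers = new
    return centers

# two obvious height clusters (m) → centers at the cluster means
print(kmeans_1d([10, 11, 12, 60, 61, 62], 2))  # → [11.0, 61.0]
```

Assigning every tree to its nearest center is exactly what partitions the height axis into the Voronoi cells mentioned later; with k=10, the boundaries fall midway between adjacent centers.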

3. For this analysis, I used a LiDAR dataset of vegetation heights downloaded from the HJ Andrews online data portal. The selected data is the third entry, titled “Vegetation Height digital elevation model (DEM) from 2011 LiDAR, Upper Blue River Watershed (Oct 27th, 2011 – Nov 1st, 2011).” A description of this dataset is available in the data portal.

I performed the majority of the analyses in R and used QGIS for data visualization. I used the ‘st_read’ function from the ‘sf’ package in R to read in the stem map shapefile (stem_map.shp). The stem map shapefile includes crown radii (m) and tree heights (m) as well as the point locations of trees.

The stem map shapefile essentially provides a census of trees within the HJA Forest, which is (1) far too much data to deal with at one time and (2) not useful for statistical analyses on its own, since we can simply map the data and visually discern where trees are located. I therefore used kmeans to group the trees into 10 height classes, which partitions the data space into Voronoi cells.

I used the ‘nngeo’ package to examine k-nearest neighbors relations between and within height clusters. I examined the relationship between a single tree (point) and its nearest neighbor of the same height class. I also examined the relationship between a single tree of one height class and its nearest neighbor of any height class to start to elucidate to what extent trees of similar heights cluster or are dispersed spatially.

I performed T-tests on each tree height class to test if there was a significant difference between (1) the mean distance between a tree and the next closest tree within the same height class and (2) the mean distance between a tree and the next closest tree of any height class. I calculated the mean, standard deviation and kurtosis of each height class distance (both within and between height classes).
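The t-test comparing within-class and any-class nearest-neighbor distances can be sketched as a Welch two-sample statistic (the default behavior of R's t.test). The distance samples below are hypothetical:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)   # sample variances
    se2 = va / na + vb / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# hypothetical nearest-neighbor distances (m): within the same height class
# vs. to the closest tree of any height class
within_class = [4.0, 5.2, 6.1, 3.8, 5.5]
any_class = [2.1, 1.8, 2.4, 2.0, 2.2]
t, df = welch_t(within_class, any_class)
print(round(t, 2), round(df, 1))
```

A large positive t here mirrors the reported result: trees sit farther from their nearest same-class neighbor than from their nearest neighbor of any class.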

4. Results

The table below shows the center of each of the ten height class groupings, mean distances between and within tree height classes, standard deviation and kurtosis of those means, and p-values. Results of all t-tests were significant (p<0.01), meaning there is strong evidence that the difference between the two mean distances is not zero: the mean distance between a tree and the next closest tree within the same height class differs from the mean distance between a tree and the next closest tree of any height class. In other words, the distance between a large tree and its nearest neighbor of any height class is significantly different from the distance between a large tree and its nearest neighbor within the large-tree height class. The same is true for trees within and outside the small height class, as well as each of the other ten height classes.

Table 1. Results from k-nearest neighbor analysis and t-tests

| Tree Class | Class Center Height (m) | Mean Dist. to 30 Closest Trees (m) | St Dev | Kurtosis | Mean Dist. to Closest Tree (m) | St Dev | Mean Dist. to Closest Tree Within Group (m) | St Dev | P-value |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 11.8 | 12.5 | 4.4 | 6.4 | 2.1 | 2 | 4 | 6.3 | <0.01 |
| 2 | 15.1 | 11.7 | 3.9 | 3 | 2.4 | 1.7 | 4.9 | 6.6 | <0.01 |
| 3 | 19.9 | 12.6 | 3.4 | 4 | 3.6 | 1.3 | 6.8 | 6.7 | <0.01 |
| 4 | 24.4 | 13.4 | 3.1 | 5.6 | 4 | 1.4 | 6.9 | 6.5 | <0.01 |
| 5 | 29.7 | 14.6 | 2.9 | 3.9 | 4.6 | 1.4 | 8 | 6.9 | <0.01 |
| 6 | 35.5 | 16.7 | 3 | 2.8 | 5.3 | 1.6 | 9.3 | 7.2 | <0.01 |
| 7 | 42 | 18.6 | 2.6 | 4.5 | 6.1 | 1.7 | 10.1 | 7 | <0.01 |
| 8 | 50 | 19.7 | 2.5 | 6.6 | 6.9 | 1.8 | 11.4 | 7.3 | <0.01 |
| 9 | 59 | 20.5 | 2.6 | 3.1 | 7.5 | 1.8 | 13.5 | 8.7 | <0.01 |
| 10 | 70 | 21.2 | 2.7 | 0.7 | 8.1 | 1.9 | 15 | 11 | <0.01 |

Fig 1. Distribution of all tree heights (m) in HJ Andrews Forest.




Fig 2. Mean distance (m) and standard deviation (m) between trees of each of ten height classes and the next closest 30 trees within that height class. Generally, as trees get larger, mean distance between them is larger.

Fig 3. Distribution of the tallest tree class (70 m tree class) across the HJ Andrews Forest.


Fig 4. A representative distribution of all tree height classes on either side of a road in the HJ Andrews Forest, showing the extent of clustering and the extent of dispersion of height classes. Height classes are in ascending order from smallest class (~12m tall; Class 1) to tallest class (70m tall; Class 10).


Fig 5. A closer look in the same area at Fig. 4, where small trees are clustered near the road and clustered tightly together, while larger trees are more dispersed.


Fig 6. Mean distance between trees of the same height class and other trees of the same height class (blue) and mean distance between trees of one height class and any other tree (red). The overlapping red confidence interval with the blue points suggests that the average distance between small trees is not significantly different than distance between small trees and any trees. The general upward trend suggests that as trees get taller, the distance between them increases and variance slightly decreases.

5. Critique of the method:

The results make sense, but do not provide much more information about the actual distribution of trees (clustering vs. dispersion) than simple maps of point data, so the next step might be to examine tree heights within different management regimes. The current analysis tells me that trees are somewhat clustered by height, and that the mean distance between a tree of one height class to a tree of the same height class is, in most cases, different from the mean distance of a tree of one height class to a tree of any other height class. I’ve examined a map of different management regimes within the HJ Andrews Forest and there are clear areas of old growth, harvested areas, clearly defined plots, etc., so I would expect some of these areas to show tree clustering by height class. The patterns I found using this analysis were not as clear as I was expecting. Using kmeans and nearest neighbor analysis is a great way to start to examine the spatial relationships between and among data, but with such a large and highly varied dataset there can be shortcomings, especially when it comes to drawing any concrete conclusions.


HJ Andrews Online Data Repository:

Johnson, S.; Lienkaemper, G. 2016. Stream network from 1997 survey and 2008 LiDAR flight, Andrews Experimental Forest. Long-Term Ecological Research. Forest Science Data Bank, Corvallis, OR. [Database]. Available: (10 April 2019) DOI:

Spies, T. 2016. LiDAR data (August 2008) for the Andrews Experimental Forest and Willamette National Forest study areas. Long-Term Ecological Research. Forest Science Data Bank, Corvallis, OR. [Database]. Available: (10 April 2019) DOI:

Spies, T. 2016. LiDAR data (October 2011) for the Upper Blue River Watershed, Willamette National Forest. Long-Term Ecological Research. Forest Science Data Bank, Corvallis, OR. [Database]. Available: (10 April 2019) DOI:

Spies, T. 2014. Forest metrics derived from the 2008 Lidar point clouds, includes canopy closure, percentile height, and stem mapping for the Andrews Experimental Forest. Long-Term Ecological Research. Forest Science Data Bank, Corvallis, OR. [Database]. Available: (10 April 2019) DOI: 10.6073/pasta/875e10383e8c8aee3c9a49e0155eef1d.

Exercise 1: Principal component analysis for rural index weighting & rural indicator measure contributions for Texas counties

Questions Being Asked:

How much do different indicator measures of rurality in Texas counties each contribute to a basic index of rurality both statistically and spatially? How much does the weighted basic index differ from a simple calculated mean for rurality?

Tools and Approaches Used

The major statistical method I used to answer these questions was principal component analysis (PCA) via the prcomp function in R and subsequent rurality indicator variable weighting for counties in Texas. The raw indicator data used in this analysis are as follows: Texas county median household income (2010 Census Data), Texas county population density (2010 Census data), and rasterized discrete data of land use for the state of Texas (2011 National Land Cover Database).

Description of Analysis Steps

1) Data Wrangling: Both income and population density data were natively in county-level polygon form, so no conversion was required in ArcGIS. Land use required extensive conversion to produce an indicator variable at the appropriate spatial resolution that measured the correct aspect of land use. The raw discrete land use data were first reclassified from many land categories, such as “deciduous forest,” “barren land,” and “scrub/shrub,” into binary “developed” and “undeveloped” classification groups for the entire state of Texas. The resulting binary raster was then converted to unsimplified polygons, and the “tabulate intersection” ArcGIS tool was used to intersect Texas county boundaries with those polygons. This tool allowed calculation of the percent of each Texas county's area that the USGS considers developed land.
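The Tabulate Intersection result amounts to counting developed cells inside each county's footprint. A hedged sketch with a toy binary land-cover grid and a hypothetical county membership mask (not the ArcGIS tool itself):

```python
def percent_developed(landcover, county_mask):
    """Percent of a county's cells classed as developed (1) in a binary
    land-cover grid, mimicking the Tabulate Intersection output."""
    total = developed = 0
    for lc_row, mask_row in zip(landcover, county_mask):
        for lc, inside in zip(lc_row, mask_row):
            if inside:
                total += 1
                developed += lc
    return 100 * developed / total

# toy reclassified raster (1 = developed) and a county membership mask
landcover = [[1, 0, 0],
             [1, 1, 0]]
mask = [[True, True, False],
        [True, True, True]]
print(percent_developed(landcover, mask))  # → 60.0
```

Run per county mask, this yields the percent-developed indicator that is later scaled alongside income and population density.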

2) Data Scaling: After the 3 variables were in analyzable form, each was scaled to a 0 to 100-point scale for visual comparison, PCA analysis, and computation of index weights.
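The scaling step can be expressed as simple min-max normalization (an illustrative pure-Python sketch; that the analysis used exactly this formula is an assumption):

```python
def scale_0_100(values):
    """Min-max scale a list of numbers onto a 0-100 scale."""
    lo, hi = min(values), max(values)
    return [100.0 * (v - lo) / (hi - lo) for v in values]

# Hypothetical county median incomes:
incomes = [35000.0, 50000.0, 80000.0]
scaled = scale_0_100(incomes)
# -> [0.0, ~33.3, 100.0]
```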

3) Creation of Maps of Scaled Variables for Comparison: ArcGIS was utilized to create maps of each of the scaled variables to visually compare the 3 factors to one another.

4) PCA and Weight Computation: The PCA was completed in R using the prcomp function, and variable weights were extracted by determining the proportion of variance each variable contributed to the first principal component. This step helps answer how much each indicator variable contributes to rurality in Texas.

5) Simple Means Map, Weighted Index Map, and Statistical Comparison: The scaled variables were used to calculate both an unweighted mean rurality score and a weighted mean rurality score for all Texas counties. Maps of each of these measures were produced for visual comparison. A Student’s t-test was then completed to compare unweighted and weighted rurality scores for counties.
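The unweighted and weighted scores can be sketched as follows (pure-Python illustration; the county values are made up, and the weights are the variance proportions reported in the PCA section):

```python
def unweighted_score(vals):
    """Simple mean of the scaled indicator values."""
    return sum(vals) / len(vals)

def weighted_score(vals, weights):
    """Weighted mean; weights are assumed to sum to 1."""
    return sum(v * w for v, w in zip(vals, weights))

# Hypothetical county with scaled (0-100) values for
# [population density, income, percent land area developed]:
county = [10.0, 40.0, 6.0]
pca_weights = [0.444, 0.111, 0.445]   # variance proportions from the PCA step
simple = unweighted_score(county)
weighted = weighted_score(county, pca_weights)
```

Because the weighted index down-weights income, a county with high income but low density and little developed land scores as more rural under the weighted index than under the simple mean.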


Maps of Scaled Variables of Texas Counties for Comparison

Rough but similar spatial patterns can be seen in all three maps. Counties with large cities and generally more urbanized counties have higher median income, population density, and percent of land area developed (Green=high, yellow=medium, red=low). There are more extreme values in the population density and percent of land area developed maps than median income.

PCA and Weighting Computation

The biplot of the PCA procedure helps visualize how the samples of the 3 variables relate to one another and how much each variable contributes to the first principal component. Important factors to notice in this plot are the direction of the arrows and clustering of the points. Typically, the more the variable arrow points toward the principal component 1 (PC1) axis, the more it contributes to the proportion of variance. The population density and percent land area developed variables happen to overlap one another in this plot, as they have highly similar contributions to the variance in PC1.

This table shows that population density accounts for 44.4% of the variance in PC1, income for 11.11%, and percent land area developed for 44.5%. The PCA therefore indicates that when weighting county rurality scores in Texas using these 3 variables, percent land area developed should receive the highest weight, population density the second highest, and income the lowest. The weighting of variables in the weighted county rurality mean score follows these proportions.
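For illustration, one way the weight-extraction step can be sketched in pure Python: compute the covariance matrix of the (already scaled) variables, find the first principal component by power iteration, and take weights proportional to the squared PC1 loadings. This particular weighting convention is an assumption, not a reproduction of the prcomp workflow:

```python
def first_pc_weights(rows):
    """rows: list of observations, each a list of scaled variable values.
    Returns index weights proportional to the squared loadings of the
    first principal component of the covariance matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1)
            for j in range(p)] for i in range(p)]
    v = [1.0] * p                       # power iteration for top eigenvector
    for _ in range(200):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    sq = [x * x for x in v]             # squared loadings
    total = sum(sq)
    return [s / total for s in sq]

# Toy data: columns 0 and 1 identical, column 2 weakly related.
rows = [[1.0, 1.0, 0.0], [2.0, 2.0, 1.0], [3.0, 3.0, 0.0], [4.0, 4.0, 1.0]]
weights = first_pc_weights(rows)  # the identical columns share equal weight
```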

Simple Means Map, Weighted Index Map, and Statistical Comparison

Visually comparing the two maps of rurality scores, there is some overlap, but also quite stark differences. County scores overall have become more rural (lower scores) based on simple visual comparison. Statistical analysis allows a more nuanced view of the actual differences between the two.

The computed t-test indicates that the weighted index, on average, produces county rurality scores that are significantly more rural (lower scores on the 0-100 scale) than the unweighted index (p<0.001). The overall unweighted rurality score mean for all Texas counties is ~17, while the overall weighted rurality score mean is ~7.5. This indicates that the basic weighted index may better capture the multidimensionality of rurality for Texas counties than unweighted mean scores.


Based on my understanding, PCA is quite useful for creation of weighted indexes. Unfortunately, the variables I utilized for this analysis have somewhat varying patterns of clustering and possible outliers, which can be seen in the above biplot. These possible outliers could have an effect on the weighting of variables in the index. Due to the small sample size of counties and in order to produce a complete picture of all counties in Texas, I did not want to remove any counties from the analysis. PCA as it relates to index weighting also requires that there is either an a priori understanding or statistical proof of a correlation between variables. Based on the directions of the arrows in the biplot, there is likely correlation between the variables, but more considerations may be needed to confirm this. Also, the scaling of the median income variable is slightly off because the maximum value is over 100. This will need to be corrected in a later analysis.

My future analysis will include more health-related variables in the PCA procedure to determine how health variables contribute to rurality in comparison to the basic rural indicator variables analyzed in this exercise.

Ex 1: Mapping the stain: Using spatial autocorrelation to look at clustering of infection probabilities for black stain root disease

My questions:

I am using a simulation model to analyze spatial patterns of black stain root disease of Douglas-fir at the individual tree, stand, and landscape scales. For exercise 1, I focused on the spatial pattern of probability of infection, asking:

  • What is the spatial pattern of probability of infection for black stain root disease in the forest landscape?
  • How does this spatial pattern differ between landscapes where stands are clustered by management class and landscapes where management classes are randomly distributed?

    Fig 1. Left: Raster of the clustered landscape, where stands are spatially grouped by each of the three forest management classes. Each management class has a different tree density, making the different classes clearly visible as three wedges in the landscape. Right: Raster of the landscape where management classes are randomly assigned to stands with no predetermined spatial clustering. The color of each cell represents the value for infection probability of that cell. White cells in both landscapes are non-tree areas with NA values.

Tool or approach that you used: Spatial autocorrelation analysis, Moran’s I, correlogram (R)

My model calculates probability of infection for each tree based on a variety of tree characteristics, including proximity to infected trees, so I expected to see spatial autocorrelation (when a variable is related to itself in space) with the clustering of high and low values of probability of infection. Because some management practices (i.e., high planting density, clear-cut harvest, thinning, shorter rotation length) have been shown to promote the spread of infection, there is reason to hypothesize that more intensive management strategies – and their spatial patterns in the landscape – may affect the spread of black stain at multiple scales.

I am interested in hotspot analysis to later analyze how the spatial pattern of infection hotspots map against different forest management approaches and forest ownerships. However, as a first step, I needed to show that there is some clustering in infection probabilities (spatial autocorrelation) in my data. I used the “Moran” function in the “raster” package (Hijmans 2019) in R to calculate the global Moran’s I statistic. The Moran’s I statistic ranges from -1 (perfect dispersion, e.g., a checkerboard) to +1 (perfect clustering), with a value of 0 indicating perfect randomness.

Example grids: Moran’s I = −1 (perfect dispersion, a checkerboard), Moran’s I = 0 (random), Moran’s I = +1 (perfect clustering).
I calculated this statistic at multiple lag distances, h, to generate a graph of the values of the Moran’s I statistic across various values of h. You can think of the lag distance as the size of the window of neighbors being considered for each cell in a raster grid. The graph produced by plotting the calculated value of Moran’s I across various lag values is called a “correlogram.”
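The statistic itself can be sketched in pure Python (illustrative only; the actual analysis used the Moran function in R’s raster package). Here `hollow=False` weights every cell within the lag window, while `hollow=True` weights only cells at exactly the lag distance, matching the two weights-matrix styles described in step 3:

```python
def morans_i(grid, lag, hollow=False):
    """Global Moran's I for a raster stored as a list of lists.
    Filled weights (hollow=False): all cells within `lag` cells get weight 1.
    Hollow weights (hollow=True): only cells at exactly `lag` cells do."""
    nr, nc = len(grid), len(grid[0])
    vals = [v for row in grid for v in row]
    n = len(vals)
    mean = sum(vals) / n
    num = w_sum = 0.0
    for r1 in range(nr):
        for c1 in range(nc):
            for r2 in range(nr):
                for c2 in range(nc):
                    if (r1, c1) == (r2, c2):
                        continue                      # skip the focal cell
                    d = max(abs(r1 - r2), abs(c1 - c2))  # distance in cells
                    if (d == lag) if hollow else (d <= lag):
                        num += (grid[r1][c1] - mean) * (grid[r2][c2] - mean)
                        w_sum += 1.0
    denom = sum((v - mean) ** 2 for v in vals)
    return (n / w_sum) * num / denom

checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]   # dispersed
clustered = [[0, 0, 1, 1] for _ in range(4)]                    # clustered
i_dispersed = morans_i(checker, 1, hollow=True)    # negative (checkerboard)
i_clustered = morans_i(clustered, 1, hollow=True)  # positive (two blocks)
```

Calling `morans_i` over a range of `lag` values gives the data points for a correlogram.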

What did I actually do? A brief description of steps I followed to complete the analysis

1. Imported my raster files, corrected the spatial scale, and re-projected the rasters to fall somewhere over western Oregon.

I am playing with hypothetical landscapes (with the characteristics of real-world landscapes), so the spatial scale (extent, resolution) is relevant but the geographic placement is somewhat arbitrary. I looked at two landscapes: one where management classes are clustered (“clustered” landscape), and one where management classes are randomly distributed (“random”). For each landscape, I used two rasters: probability of infection (continuous values from 0 to 1) and non-tree/tree (binary, 0s and 1s).

2. Masked non-tree cells

Since not all cells in my raster grid contain trees, I set all non-tree cells to NA for my analysis in order to avoid comparing the probability of infection between trees and non-trees. I used the tree rasters to create a mask.
c.tree[ c.tree < 1 ] <- NA # Set all non-tree cells in the tree raster to NA
c.pi.tree <- mask(c.pi, c.tree) # Combine the prob inf with tree, leaving all others NA
# Repeat with randomly distributed management landscape
r.tree[ r.tree < 1 ] <- NA # Set all non-tree cells in the tree raster to NA
r.pi.tree <- mask(r.pi, r.tree) # Combine the prob inf with tree, leaving all others NA

Fig 2. Filled and hollow weights matrices.

3. Calculated Global Moran’s I for multiple values of lag distance.

For each lag distance, I created a weights matrix so the Moran function in the raster package would know how to weight each neighbor pixel at a given distance. Then, I let it run, calculating Moran’s I for each lag to create the data points for a correlogram.

I produced two correlograms: one where all cells within a given distance (lag) were given a weight of 1, and another using a “hollow” weights matrix in which only cells at exactly a given distance were given a weight of 1 (see example below).

4. Plotted the global Moran’s I for each landscape and compared the results.







What did I find? Brief description of results I obtained.

The correlograms show that similar values become less clustered at greater distances, approaching a random distribution by a lag of about 50 cells. In other words, cells are more similar to the cells around them than they are to more-distant cells. The many peaks and troughs in the correlogram arise from the gaps between trees, which are regularly spaced under plantation management.

In general, the highest values of Moran’s I were similar between the clustered-management landscape and the landscape with randomly distributed management classes. However, Moran’s I declined with increasing lag distance faster for the random landscape than for the clustered landscape. In other words, similar infection probabilities formed larger clusters when forest management classes were clustered. For the clustered landscape, there was also spatial autocorrelation at lag distances of 100 to 150, likely because of the clusters of higher infection probability in the “old growth” management cluster.

Correlogram for the clustered and random landscape showing Moran’s I as a function of lag distance. “Filled” weights matrix.

Correlogram for the clustered and random landscape showing Moran’s I as a function of lag distance. “Hollow” weights matrix.














Critique of the method – what was useful, what was not?

My biggest issue initially was finding a package to perform a hotspot analysis on raster data in R. I found some packages with detailed tutorials (e.g., hotspotr), but some had not been updated recently enough to work in the latest version of R. I could have done this analysis in ArcMap, but I am trying to use open-source software and free applications and improve my programming abilities in R.

The Moran function I eventually used in the raster package worked quickly and effectively, but it does not provide statistics (e.g., p-values) to interpret the significance of the Moran’s I values produced. I also had to make the correlogram by hand with the raster package. Other packages do include additional statistics but are either more complex to use or designed for point data. There are also built-in correlogram functions in packages like spdep or ncf, but they were very slow, potentially taking hours on a 300 x 300 cell raster. That said, it may just be my inexperience that made a clear path difficult to find.


Glen, S. 2016. Moran’s I: Definition, Examples.

Robert J. Hijmans (2019). raster: Geographic Data Analysis and Modeling. R package version 2.8-19.


Exercise 1: Comparison of Interpolation Methods Using Fisheries Abundance Data

Question Asked

I would like to understand how the spatial variability of forage fish in the California Current Ecosystem is related to the spatial pattern of seascape classes (based on remotely sensed sea-surface variables). In Exercise 1, I will be asking, “what is the spatial pattern of forage fish in the California Current Ecosystem?” In order to address this question, I will be testing the use of different types of interpolation on my point data.

Approaches Used and Methods

To address these questions, the Kriging Interpolation and Inverse Distance Weighting Interpolation tools were employed in ArcMap. All processes were completed in ESRI ArcMap 10.6.1. Interpolation is described as a method of constructing new data using existing data as references. In spatial and temporal analyses, there are a range of different types of interpolation that can be used.

The original data, which includes a series of about 1300 trawls, the catch per unit effort (CPUE) per species per trawl, the latitude, longitude, and water column depth of each trawl, and the functional group of each species caught, were loaded into ArcMap using the “Display X,Y Point Data” function. The four functional groups used in this analysis are Forage, Young-of-the-Year Rockfish (YOYRock), YOYGround, and crustaceans. In the case of this analysis, all fishes that are part of the YOYRock functional group were included.

Representation of the raw YOYRock Trawl data in ArcMap

After the data were uploaded, they had to be converted to feature classes. For the purposes of this exercise, all trawl data from 2002 to 2015 was included as one feature class, though the way in which the data are organized make it easy to break the trawls down at a finer temporal resolution. The result was a Point Shapefile that was then binned into four abundance groups to make interpretation easier. The IDW and Kriging tools (from the “Spatial Analyst” Toolbox) were then employed. The base equation for both interpolations is virtually the same, and both are commonly used methods for continuous data in point form, but there are some major differences in the calculation of some of the stated variables:

Both the IDW and Kriging methods assign values to unmeasured cells with a weighted sum of the measured values: Ẑ(s0) = Σ(i=1..N) λi Z(si), where Z(si) = the measured value at the ith location, λi = an unknown weight for the measured value at the ith location, s0 = the prediction location, and N = the number of measured values.

1) IDW: IDW, or inverse-distance weighting, is what’s known as a deterministic interpolation method, as it relies directly on the surrounding measured values to populate the resulting surface. A major assumption of IDW is that the influence of the mapped variable decreases with distance from each measured location. The weight given to the “measured value at the ith location” is computed from inverse distance raised to a power, so influence falls off rapidly as you move farther from a given value. In this case, the “power” value, which controls this fall-off, was left at the default of 2 (inverse-square weighting).

2) Kriging: Kriging is an interpolation method based on geostatistics including autocorrelation. The equation used to calculate the missing values is the same as for IDW, except that the weight variable is calculated using a mathematical function within a certain specified radius of the missing value. For this reason, one of the main assumptions when using Kriging is that the “distance or direction between sample points reflects a spatial correlation that can be used to explain variation in the surface.” (ESRI, ND).
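The IDW half of this comparison is simple enough to sketch directly (a pure-Python illustration of the weighting equation above, not the ArcMap tool; kriging differs in that the λ weights would come from a fitted semivariogram model rather than from inverse distances):

```python
def idw(samples, x0, y0, power=2.0):
    """Inverse-distance-weighted estimate at (x0, y0).
    samples: list of (x, y, value). Weight_i = 1 / d_i**power, normalized."""
    weighted, total = [], 0.0
    for x, y, v in samples:
        d = ((x - x0) ** 2 + (y - y0) ** 2) ** 0.5
        if d == 0:
            return v              # exact hit: return the measured value
        w = 1.0 / d ** power
        weighted.append((w, v))
        total += w
    return sum(w * v for w, v in weighted) / total

# Two hypothetical trawl CPUE values 4 units apart:
pts = [(0, 0, 10.0), (4, 0, 20.0)]
mid = idw(pts, 2, 0)    # equidistant -> simple average, 15.0
near = idw(pts, 1, 0)   # closer to the 10.0 sample, so pulled below 15
```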


The resulting interpolations can be found below. I’ve included output that focuses on the significant region of Monterey Bay to provide context regarding the proximity of the trawls to one another and to show detail on the boundaries of the interpolations.

Full Interpolation using IDW

Full Interpolation using Kriging

Detail of IDW in Monterey Bay, CA

Detail of Kriging in Monterey Bay, CA

Critiques of Methods

Visual comparison of the results suggests that the Kriging tool provided a much more robust interpretation of the same patterns that can be observed in the original data and in the IDW interpolation. Both interpolations displayed significant patterns near Monterey Bay, and the Kriging interpolation also represented additional areas of higher abundance north of San Francisco. While I do believe that both interpolation methods provide a good place to begin analysis, several adjustments will have to be made in order to create a usable result.

My first mistake was using the entire time series – since the interpolations are distance-based and extremely sensitive to points in close proximity to one another, the clear clusters of points most likely influenced the interpolations. A next step for me will be breaking the data down to an annual resolution, as I feel that shapefiles with one point at each trawl location will provide better data for interpolation. Additionally, this will produce multiple maps, which will allow for a chance to observe how the modeled patterns of YOYRock abundance have changed over time.

Another next step will be exploring the fine adjustments available within each interpolation method. I now have a greater understanding of the mechanics which drive IDW, so I’m eager to rerun the analyses at different powers to see how each impacts the interpolation. Similarly, the radius used to decide which points are considered while calculating a Kriging interpolation can be adjusted, so that will be done in the future as an experiment.

Finally, I ran out of time to explore the symbology of the results – I hypothesize that classification by a smaller number of classes would result in more robust interpolation maps, as the visualizations now show a vast majority of the space to fall into the lowest class. The interpolation data are relatively bimodal in nature, so an adjustment in the symbology tab would likely result in a more accurate and precise representation of abundance.

Overall, I see interpolation as a valuable way to identify spatial patterns from point data. In the case of species abundance data, I believe that Kriging is the superior method, as it does not have the distance-decay assumption that’s baked into IDW. Additionally, the geostatistical methods used in Kriging generally allow for a more robust and precise interpolation regardless of the type of continuous data being used.

The next steps mentioned above will be taken before the presentation of Tutorial 1.


Santora, J.A., Hazen, E.L., Schroeder, I.D., Bograd, S.J., Sakuma, K.M., & Field, J.C. (2017). Impacts of ocean-climate variability on biodiversity of pelagic forage species in an upwelling ecosystem. Marine Ecology Progress Series 580. doi:10.3354/meps12278.


Determining spatial autocorrelation of refugee settlements in Uganda

  1. Question that you asked

In analyzing the distribution of settlement boundaries that I obtained from OpenStreetMap, I wanted to know the general clustering and regionality of settlements in order to understand how the spatial statistics I perform in my next step will behave, based on the results of this exploratory step. The question I’m introducing for Exercise A centers on how similar or different nearby settlements are to each other: is there regionality in settlement sizes? A future question that I’m considering is whether clustered settlements have higher detection in the World Settlement Footprint classification (which I’m using instead of GHSL because it’s a more localized classification). This would be determined using the percent of OSM settlement area that was detected by WSF.

  2. Name of the tool or approach that you used.

I decided to use spatial autocorrelation to answer this question, since the essence of my question centers around what the pattern of my data looks like, how clustered or not clustered are these settlements, and how that might affect my future analysis and considerations.

For this, I employed the Spatial Autocorrelation tool in ArcGIS Pro, which computes the Global Moran’s I statistic.

  3. Brief description of steps you followed to complete the analysis.

In order to identify the refugee settlements, I went through multiple rounds of querying with OpenStreetMap to extract the boundaries that are directly related to refugee camp boundaries. Hannah Friedrich and I have worked together on defining some of these boundaries, and she took some time to delineate boundaries for a separate study of hers. While I will incorporate these delineated ‘regions of interest,’ I will most likely not move forward with them in further analyses because they would present an issue with scaling up and manual interpretation. Below I list the three polygon datasets I’m analyzing here.

OSM Boundaries: This is the result of multiple query efforts within Overpass Turbo, an online tool for downloading OpenStreetMap data. Various queries on attribute tags were performed in order to select relevant refugee boundary data.

Regions of Interest: This is a result from Hannah’s efforts of limiting areas within the OSM Boundaries that appear to be built up using high resolution imagery available on Google Satellite.

Selected OSM Boundaries: This is a selection from OSM Boundaries that merges polygons of the same settlement into multipolygons; it is also restricted to boundary polygons smaller than 2,000 hectares, to exclude boundaries that are merely designated rather than actually settled.

I then ingested these boundaries into Google Earth Engine, extracted the pixel areas for the WSF pixels within the three different boundary layers, and re-exported these.

I then imported these layers into ArcGIS Pro and performed the Spatial Autocorrelation tool and experimented with multiple parameters.

I ultimately chose the “row standardization” option given the possible bias in ‘sampling’ design: these data are most often created by the HOT Uganda team, and resources might limit where they can travel and collect data.
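Row standardization itself is a simple operation: each row of the spatial weights matrix is divided by its row sum, so that features with many neighbors do not dominate the statistic. A minimal sketch with a hypothetical binary contiguity matrix:

```python
def row_standardize(w):
    """Divide each row of a spatial weights matrix by its row sum,
    so every feature's neighbors contribute proportionally (rows sum to 1).
    Rows with no neighbors are left as all zeros."""
    out = []
    for row in w:
        s = sum(row)
        out.append([x / s for x in row] if s else list(row))
    return out

# Hypothetical binary contiguity weights for 3 settlements:
w = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
ws = row_standardize(w)
# First row becomes [0, 0.5, 0.5]; the other rows stay [1, 0, 0].
```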

  4. Brief description of results you obtained.

Ultimately, my results showed clustering within the OSM-identified settlements, which is to be expected. A variety of questions arise from this that might confound my analysis and are still moderately troubling: the method of data recording, the spotlight effect, the presence of organizations able to record data, and inherent clustering driven by human behavior or by drivers such as conflict or environmental conditions. This initial analysis is really to understand the degree of clustering that I might expect in my future analyses – if there is high clustering, then I need to account for that clustering when testing additional questions in my next exercise, like nearest road or nearest urban area. The images below demonstrate the results of analyzing settlement area in the “Selected OSM Boundaries,” “OSM Boundaries,” and “Regions of Interest.”

Moran I, Selection of OSM Boundaries

Moran I, Delineated Regions of Interest

Moran I, All OSM Boundaries


While OSM Boundaries and the Selection from OSM Boundaries show high confidence of clustering, the “Regions of Interest” pattern shows a less confident result of clustering. This makes sense, given that the OSM Selection and OSM Boundaries both have overlapping polygons and varying sizes.

Below is a map illustrating the distribution of refugee areas in northwest Uganda, where most of the settlements are concentrated.

  5. Critique of the method – what was useful, what was not?

Given my shortcomings with understanding statistical analyses, the interpretation of the results was most difficult for me. While my patterns appeared mostly clustered, the z-score and Moran’s Index showed changes when different parameters of comparison were chosen. Ultimately, I’m not sure if there would be a more effective spatial analysis to analyze aspects of these settlements that would prove more helpful in figuring out relevant information for moving forward with Exercise 2 and beyond.

Ex. 1: Spatial Autocorrelation in Stream Cross-Sectional Change In the Andrews Forest

Question that you asked

How much spatial autocorrelation exists in historic streambed changes in the Andrews Forest, both in the across-stream direction and the along-stream direction?

Name of the tool or approach that you used

I addressed each direction of the problem separately. Since this resulted in one-dimensional, evenly spaced or approximately evenly spaced data, I used the acf function in R to analyze the data.

Critique of the method – what was useful, what was not?

The methods provided an interesting look at patterns within the cross section data set. However, I think the one-dimensional approach fails to capture interactions between across- and along- channel changes as well as any temporal autocorrelation.

Description of steps you followed to complete the analysis and results you obtained.

For my project, I am looking at changes in the streambed along 4 stream reaches (black boxes) along two streams (blue lines) in the Andrews Forest. The watershed of the larger stream, Lookout Creek, is shown in black.

Researchers in the Andrews forest repeatedly surveyed fixed cross sections along the reaches from 1978 to 2011. In this exercise, I worked with a subset of the data from 1978 to 1998. The cartoon below does not contain real data but shows the arrangement of numbered reaches (blue lines) and their fixed endpoint stakes (blue dots) along a stream.

Part A – across-channel changes

Each individual cross section spans from bank to bank and show the topography and bathymetry across the width of the stream. Below are two consecutive years of cross-sectional profiles of the stream bed at Lower Lookout Creek, cross section # 3.

Profile from 1995.

Profile from 1996.

It looks like there was a lot of change between these two profiles. The active stream channel appears to have avulsed from one side of the cross section to the other.

I compared the two profiles to see how much elevation was gained or lost.

Here the grey line represents the surface of the stream bed in 1995, and the black line represents the surface of the stream bed in 1996. The area shown in red represents material that was eroded away, and the green area represents material that was newly deposited.

We can simplify the plot above to show only the positive or negative elevation change across the channel. The figure below shows the same information as the figure above, but directly in terms of elevation change.

You can clearly see from this figure that this site at this year had one broad area of elevation gain or deposition, and one broad area of elevation loss or erosion. This isn’t precisely an attraction/repulsion phenomenon, but there can be positive feedbacks associated with erosion and deposition. For example, sediment deposited in a stream may locally obstruct or slow down the water passing by it and encourage deposition in adjacent areas.

Now, not every pair of years had this *much* elevation change, but also, not every pair of years shows such broad regions of positive or negative elevation change. Here is a figure of the elevation change at the same cross section between 1996 and 1997. You can see that elevation change is both smaller in magnitude and shows up in smaller patches.

I used spatial autocorrelation metrics to investigate the patch size. Note that autocorrelation at some scales is always expected when looking at topographic change. Even if you had extremely precise and accurate measurements, you would expect the stream bed change at one point to be strongly correlated with the change a few cm away both because the two points probably experience very similar forces and also because gravity tends to smooth out topographic irregularities in nature. But in most scenarios, you would probably not expect to find a correlation between the changes in a streambed with a change at a point 100 m away from the bank.

I picked two distances that seemed potentially interesting: 1 meter and 5 meters. Then I investigated autocorrelation at 1m and 5m lag distances using the acf function in R.
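The lag-k autocorrelation that acf reports can be sketched in pure Python (illustrative only; the change values below are made up):

```python
def acf_at_lag(x, k):
    """Sample autocorrelation of a 1-D series x at lag k
    (the quantity R's acf() plots for each lag)."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

# Hypothetical elevation-change samples across one cross section:
# one broad patch of deposition followed by one broad patch of erosion.
change = [0.0, 0.1, 0.3, 0.4, 0.3, 0.1, 0.0, -0.2, -0.4, -0.3, -0.1, 0.0]
r_short = acf_at_lag(change, 1)   # neighboring points: strongly positive
```

With real survey spacing, the 1 m and 5 m lags translate into lag counts of k = (1 m / point spacing) and k = (5 m / point spacing).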

These lag distances are superimposed on the first change graph in the figure below.

I expected that all cross sections would have higher autocorrelation at the shorter lag distance than the longer one, but I also expected that a set of cross sections with very large patch sizes might show relatively higher autocorrelation at the larger lag distance compared to the small-patched cross sections. I need to spend more time looking into how the magnitude of the changes influences the autocorrelation function so that I can understand it better.

I repeated the process for every cross section at every year. Results are shown below.

The cross sections showed mostly positive autocorrelation at 1 m. Some years showed more autocorrelation across the board than others, and some individual cross sections seemed like they frequently had high autocorrelation values at multiple years.

Different patterns emerge at this spatial scale. At a 5 m lag, neutral or negative autocorrelation was much more common.


Part B – along-channel changes

I also wanted to look at autocorrelation between cross sections. Was the change at one cross section related to the change at cross sections immediately up- or down-stream?

Because I don’t have true geographic locations of the cross sections, I couldn’t do any fancy two dimensional work like interpolating the data into a river bathymetry model. Instead, I consolidated the cross-sectional change data into a single metric for each cross section.

Here is another look at the change plot I showed earlier. This is the same plot as before, but I’ve colored the positive values green to show deposition and the negative values red to show erosion.

I simplified this plot by removing the horizontal across-stream component and summing the red and green areas separately.

Then I had a choice: I could add the red and green areas together to show the total area of the cross section that changed in any way, or I could subtract the red area from the green area to show the net deposition in the cross section. Each of these metrics could be interesting for different reasons. For example, the total area of change might be relevant for ecological processes, but the net overall change could be relevant for tracking net sediment transport in these streams.

I started by adding the boxes together to calculate the total area of change.
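The two candidate scores can be computed together from the same elevation-change samples (pure-Python illustration; the change values and the spacing dx are hypothetical):

```python
def change_scores(delta_z, dx=1.0):
    """delta_z: elevation change at evenly spaced points across the channel;
    dx: assumed spacing between points. Returns (total, net):
    the 'addition' score sums deposition and erosion areas,
    the 'subtraction' score nets erosion against deposition."""
    deposition = sum(d * dx for d in delta_z if d > 0)
    erosion = sum(-d * dx for d in delta_z if d < 0)
    return deposition + erosion, deposition - erosion

# Hypothetical cross section: deposition on one bank, erosion on the other.
dz = [0.0, 0.5, 0.8, 0.2, -0.3, -0.6, -0.1, 0.0]
total, net = change_scores(dz, dx=0.5)
# deposition area = 0.75, erosion area = 0.5 -> total 1.25, net 0.25
```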

In this way, I made a score for every cross section for every year pair. Hypothetical scores for the cartoon river are shown below.

Then I used the acf function again in R to determine how correlated adjacent cross section scores were with each other, compared to more distant cross section scores. This method is not fully spatially explicit, since I did not include any information about how far apart each cross section was from its neighbors; I just assumed that they were roughly equidistant within each reach. Indeed, the distance between cross sections can become hard to define when they are placed along curved parts of the stream.

I calculated these values with every year pair on each of the four reaches. I then repeated it with the subtraction method described above.

For the “addition” method, autocorrelation values were generally close to neutral with some reaches in some years tending towards positive values. Positive values could occur if changes in one reach directly influenced an adjacent reach (i.e. if upstream channel change influenced the hydraulics directly downstream) or if adjacent cross sections were influenced by similar other factors (i.e. two adjacent cross sections placed on a steep part of the reach might experience more similar changes to each other than two cross sections placed in a shallower part of the reach.)

The pattern is different for net change, where negative autocorrelation is more common. I think this is very interesting, because it implies that parts of the stream that had net deposition were more likely to be adjacent to parts of the stream that had net erosion. This could be an expression of a sink/source phenomenon where material scoured from one area is transported and deposited immediately downstream at the next cross section.

Exercise 1: Spatial Distribution of Juniper Saplings

1. Question asked and data used: 

  • What are the spatial patterns of juniper sapling establishment within a treated watershed 14 years after most juniper were removed?
  • Two data sets were used for this analysis:
    • Thirty-three belt transects (30 m in length, 3 m in width) were used to assess the density of juniper saplings within the watershed (data in the form of an Excel spreadsheet listing latitude and longitude with the number of juniper per transect)
    • An orthomosaic created from UAV imagery of a small area within the watershed (data in the form of a multispectral raster, with brightness values for red, green, blue, near-infrared, and red-edge wavelengths)
  • Watershed map (National Agricultural Imagery Program image from 2016 shown) with belt transects and UAV study area:

2. Approach and tools used: A number of techniques were initially applied to explore the data and to examine which approaches were more (or less) effective.

  • Belt transects:
    • Kriging
    • Inverse Distance Weighted (IDW) interpolation
    • Spatial autocorrelation (Global Moran’s I)
    • Hot-spot analysis
  • Classified raster (UAV orthomosaic):
    • Hot-spot analysis

3. Steps used in analysis:

  • Slope and aspect calculation:
    • The slope and aspect tools (Spatial Analyst Tools) were applied to a 30 m DEM. This information was noted for hot spots and cold spots but not used for further analysis during this exercise.
    • The Extract Multi Values to Points tool was used to extract the slope, aspect, and elevation values
  • Belt transects:
    • Projected data into NAD 1983 UTM Zone 10N
    • The location of each survey was symbolized by color based on the number of juniper found.
    • Kriging (Spatial Analyst Tools and Geostatistical Wizard) was used for initial exploratory analysis to assess general spatial distribution across the watershed
      • The number of juniper per transect used as the z-value field
      • Histogram and QQ plot used to assess the distribution of the number of juniper per transect for the geostatistical layer
      • For the raster layer, the predicted values were compared to the observed values by extracting the values at each survey location
    • IDW interpolation
      • Juniper per transect used as the input feature
      • Maximum neighbors: 15; minimum neighbors: 10
    • Spatial autocorrelation (Global Moran’s I) (Spatial Statistics Tools)
      • Calculated using the number of juniper per transect
      • HTML file created lists Moran’s index, z-score, and p-value
    • Hot Spot Analysis (Getis-Ord Gi*) (Spatial Statistics Tools)
      • Conceptualization of spatial relationships: Fixed distance band
      • Distance method: Euclidean distance
      • Default was used for the distance band
  • Classified Raster (orthomosaic):
    • Supervised classification (support vector machine) was used to identify juniper within the UAV orthomosaic
    • General procedure for classification: create training samples -> assess using interactive training sample manager and scatterplots -> create .ecd file -> apply classifier -> apply Majority Filter tool
    • A point file was created from pixel clusters identified as juniper within the image
    • Hot Spot analysis (Getis-Ord Gi*)

      • Conducted to assess areas of concentration of juniper saplings within the sample area
      • Integrate tool used to aggregate juniper data (5 m used for analysis, but other distances were initially tested)
      • Hot Spot analysis tool using aggregated data created in previous step as input layer (other inputs were the same as those used for the belt transect hot spot analysis)
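As a rough illustration of how the IDW step estimates juniper density between transects, here is a minimal Python sketch. The coordinates and counts are hypothetical, and it omits the min/max neighbor logic of the ArcGIS IDW tool:

```python
import numpy as np

def idw(xy_known, values, xy_query, power=2, n_neighbors=10):
    """Inverse-distance-weighted estimate at query points: a simplified
    stand-in for the ArcGIS IDW tool (hypothetical example data)."""
    xy_known = np.asarray(xy_known, float)
    values = np.asarray(values, float)
    out = []
    for q in np.asarray(xy_query, float):
        d = np.linalg.norm(xy_known - q, axis=1)
        if np.any(d == 0):                     # query coincides with a sample
            out.append(values[d == 0][0])
            continue
        nearest = np.argsort(d)[:n_neighbors]  # use the n closest transects
        w = 1.0 / d[nearest] ** power
        out.append(np.sum(w * values[nearest]) / np.sum(w))
    return np.array(out)

# Hypothetical transect coordinates (m) and juniper counts
pts = [(0, 0), (30, 0), (0, 30), (30, 30)]
counts = [2, 6, 1, 7]
print(idw(pts, counts, [(15, 15)]))  # estimate at the centroid
```

Because each estimate is a weighted average of nearby samples, values pulled toward each transect produce the "bulls-eye" pattern discussed in the results below.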

4. Overview of Results:

  • Belt Transects
    • Kriging. I used two different approaches to kriging. First, I used the kriging tool under spatial analyst tools, and then used the geostatistical wizard to calculate ordinary prediction kriging. The resulting maps from these two approaches were different. In particular, the spatial analyst kriging tool created maps more similar to the IDW, while the geostatistical wizard created a map with different contours.
      • Related statistics:
        • minimum:0.05
        • maximum: 6.97
        • mean: 2.83
        • standard deviation: 0.86
      • Spatial Analyst Kriging Map:
      • Using the export values function, the predicted values of this kriging method at each belt transect site were very similar (within 0.05) to the observed values.
      • Ordinary Prediction Kriging:
      • The QQ plot indicates that assumptions of normality may not be met:
      • The ordinary prediction kriging also tended to overestimate the juniper density based upon the distribution chart:
        • Related cross-validation/error statistics for the ordinary prediction kriging:
          • Mean: -0.0584
          • Root-Mean-Square: 2.0933
          • Mean Standardized: -0.0193
          • Root-Mean-Square Standardized: 0.9863
          • Average Standard Error: 2.0994
    • IDW
      • While general trends were similar to kriging, the size and shape of contours between the methods were different.
      • Mean: 3.08, range: 0.00 to 7.00, standard deviation: 1.16
      • IDW Map:
    • Spatial autocorrelation (Global Moran’s I)
      • Moran’s I based on juniper per transect: -0.019, with a p-value of 0.92
      • Indicates that the pattern of juniper density is not significantly different from a random distribution
    • Hot Spot analysis (Getis-Ord Gi*)
      • One hot spot found (90% confidence): northwestern aspect (300 degrees) with 8.8% slope
      • One cold spot found (95% confidence): north-northwestern aspect (347 degrees) with 11.5% slope
      • Remaining points were considered insignificant
      • Map of Hot Spot analysis:
  • Classified Raster
    • Hot Spot analysis (Getis-Ord Gi*)
      • Within the area covered by the orthomosaic, four hot spots were found:
        • One area with 99% confidence: 11% slope, 325 degree aspect
        • Three areas with 95% confidence: 3% slope, 254 degree aspect; 11% slope, 167 degree aspect; and 8% slope,137 degree aspect
      • UAV Orthomosaic Hot Spot map:
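For intuition about the Moran's I value reported above, here is a minimal Python sketch of the statistic with inverse-distance weights and hypothetical transect data (the actual value of -0.019 came from the ArcGIS Spatial Statistics tool, whose weighting options differ):

```python
import numpy as np

def morans_i(xy, z):
    """Global Moran's I with inverse-distance weights: values near 0
    suggest a random pattern, positive values clustering, negative
    values dispersion (hypothetical example data)."""
    xy = np.asarray(xy, float)
    z = np.asarray(z, float)
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    w = np.zeros_like(d)
    mask = d > 0
    w[mask] = 1.0 / d[mask]          # no self-weights on the diagonal
    zc = z - z.mean()
    num = np.sum(w * np.outer(zc, zc))
    return (n / w.sum()) * num / np.sum(zc ** 2)

# Hypothetical transect coordinates and juniper counts
xy = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2)]
counts = [3, 4, 2, 5, 1]
print(morans_i(xy, counts))
```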

5. Discussion/Critique of Methods.

Based on the maps produced using the IDW and kriging methods, some general trends in juniper spatial distribution exist within this watershed. For example, we see lower densities in the north-northeastern areas and greater densities of juniper in the southeastern and western areas. However, both the IDW and the kriging raster produced using the spatial analyst tool showed the characteristic “bulls-eye” pattern often associated with IDW. When compared to the ordinary prediction kriging maps, different interpretations of juniper density in the watershed might be made. Compared to the belt transect data, the kriging created using the spatial analyst tool was similar to the observed values, while the ordinary prediction kriging tended to overestimate the distribution. However, more ground data are necessary to determine how accurate these predictions would be in other areas. One of the biggest takeaways from this is to carefully consider the approach used and the nature of the data (for example, the use of ordinary versus simple kriging). From a user standpoint, I found the geostatistical wizard the most useful approach, particularly as it made inspecting the statistics and semivariogram very easy. However, in the future I would explore different methods of kriging within this tool.

The spatial autocorrelation and hot spot tools were useful, although the results did not provide much significant information regarding the spatial distribution of juniper in the cases examined here. For future analysis, particularly when examining the relationship between juniper density and other watershed characteristics (e.g., slope, aspect, etc.) these tools may become more important. In the case of the UAV orthomosaic, this provided a “test run” for what will be a larger dataset. The steps taken prior to analysis, particularly creating a point layer from the classified pixel clusters, will be time intensive and may require alternative approaches. At the limited scale of the UAV orthomosaics these hotspots did not provide much useful information, but if extrapolated over a larger area more patterns may be observed.

The distribution of juniper saplings in this case may be associated with a number of factors. Juniper seeds may be spread by birds or wind, resulting in the spread of juniper saplings throughout the watershed. At a much larger scale, if seed sources are less available, this distribution may be more localized. This pattern may also vary with the presence of mature juniper. As previously discussed, some patterns emerged in this data (lower densities in the northern areas of the study area, for example) which may be associated with the spatial distribution of other characteristics, such as soil moisture. For example, seed sources and areas of higher juniper density may include areas where we anticipate higher soil moisture, such as flatter slopes with northerly aspects. These factors will be assessed further in exercise 2.


Spatial variation of economic losses resulting from a joint earthquake/tsunami event: An application to Seaside, OR


What is the spatial variability of economic value of buildings as well as losses resulting from a joint earthquake/tsunami event? How does this spatial variability relate to independent earthquake and tsunami events, as well as the intensity of the hazard?

The purpose of this analysis was to consider the spatial variability of initial economic value, as well as economic losses resulting from a joint earthquake/tsunami event. The losses were deaggregated by hazard (earthquake only, tsunami only, joint earthquake/tsunami), as well as intensity of the event (100-year, 250-year, etc.).

Tool and approach

Two methods were implemented to evaluate the spatial variability of economic value and losses: (1) interpolation via kernel density, and (2) a hotspot analysis using the Getis-Ord Gi* statistic. The economic losses were determined using a probabilistic earthquake/tsunami damage and loss model. This model implements Monte-Carlo methods to estimate the expected economic losses following an earthquake/tsunami event. For this application, 10,000 simulations were run, from which the average loss at each building was computed for earthquake only, tsunami only, and a joint earthquake/tsunami event.
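The Monte-Carlo averaging step can be illustrated with a minimal Python sketch. The damage probability and repair cost below are hypothetical placeholders for the model's fragility-curve sampling:

```python
import random

def expected_loss(damage_prob, repair_cost, n_sims=10_000, seed=42):
    """Monte-Carlo estimate of the expected loss for one building: in each
    simulation the building is damaged with probability damage_prob
    (a hypothetical stand-in for fragility-based damage sampling)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        if rng.random() < damage_prob:
            total += repair_cost
    return total / n_sims

# Hypothetical building: 30% chance of damage, $250k repair cost
print(expected_loss(0.3, 250_000))  # converges toward 0.3 * 250k = $75k
```

Averaging over many simulations is what turns the binary damaged/undamaged outcomes into the smooth per-building expected losses mapped below.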

Description of steps

The average losses at each building resulting from the earthquake/tsunami loss model were added as an attribute to a GIS shapefile. Two methods to evaluate the spatial distribution were considered:

  1. Interpolation via kernel density: Spatial interpolation was performed using a kernel density estimate. A kernel with a size proportional to the value of the attribute under consideration (in this case economic value/loss) is placed at each data point. A map is then created by taking the sum of all kernels. The kernel radius and shape can be varied to produce different results. In this analysis, a quartic kernel was used with a radius of 200 meters. The interpolation was performed using the built-in interpolation feature in QGIS 3.
  2. Hotspot analysis using the Getis-Ord Gi* statistic: Hotspot analysis was performed using the Getis-Ord Gi* statistic. This statistic produces a p-value and z-score for each feature, providing insight into whether the null hypothesis (in this case, spatial randomness) can be rejected. Features with a small p-value and a very large (or very small) z-score indicate that the null can be rejected, i.e. that the data are not spatially random. Consequently, applied across an entire spatial dataset, the hotspot analysis identifies statistically significant clusters of high (or low) values. The hotspot analysis was performed using the available Hotspot Analysis plugin for QGIS 3.
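The quartic-kernel interpolation in step 1 can be sketched in Python as follows. The building locations and losses are hypothetical; the actual analysis used the QGIS interpolation feature with a 200 m radius:

```python
import numpy as np

def quartic_kernel_density(xy_pts, weights, xy_grid, radius=200.0):
    """Weighted quartic (biweight) kernel density surface: each point
    contributes its weight, tapered to zero at the kernel radius. A
    simplified analogue of the QGIS method (hypothetical data)."""
    xy_pts = np.asarray(xy_pts, float)
    weights = np.asarray(weights, float)
    out = []
    for g in np.asarray(xy_grid, float):
        d = np.linalg.norm(xy_pts - g, axis=1)
        u = d / radius
        k = np.where(u < 1, (1 - u**2) ** 2, 0.0)  # quartic kernel
        out.append(np.sum(weights * k))
    return np.array(out)

# Hypothetical building locations (m) and losses ($)
pts = [(0, 0), (100, 0), (500, 500)]
losses = [1e6, 2e6, 5e5]
print(quartic_kernel_density(pts, losses, [(0, 0)]))
```

The choice of radius controls how far each building's loss "spreads," which is exactly the subjectivity discussed in the critique below.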

Description of results

The results of the analysis are shown in Figure 1. The columns correspond to interpolation and hotspot analysis respectively. The first row shows the building values, whereas the second shows the economic losses resulting from a joint earthquake/tsunami event (2,500-year return period).

Areas of high economic value and losses can be easily observed from the interpolation analysis. Here, areas of red correspond to larger damages. Similarly, statistically significant clusters of large (and small) damages can be observed from the hotspot analysis. Again, red corresponds to a statistically significant hot spot (e.g. a cluster of large values and losses), whereas blue corresponds to a statistically significant cold spot (e.g. a cluster of small economic values and losses).

A large concentration of economic value is centrally located along the coast, and is due to the presence of resorts and condominium complexes. This area is observed from both the interpolation and hotspot analysis. Interestingly, more clusters are observed from the hotspot analysis as opposed to the interpolation. This could be explained by the scaling of the interpolation. In this case, the red regions correspond to a maximum value of $20M. If this value was reduced by half, more areas of high concentration would be observed.

The hotspot analysis provides insight into statistically significant clusters of high and low values, as opposed to single points of high values; however, when comparing interpolation and hotspot analysis, it should not be neglected that the results of the latter are visually more difficult to observe. This is due to the discrete nature of the Getis-Ord Gi* statistic (e.g. each point corresponds to a p-value and z-score, as opposed to the continuous surfaces of interpolation). This results in polygons that are shaded according to confidence levels.

Figure 1: Comparison of interpolation and hotspot analysis for both initial building value and economic losses

In addition to the initial value and economic losses resulting from the 2,500-year earthquake/tsunami event, interpolated maps were deaggregated based on hazard (earthquake only, tsunami only, combined) as well as intensity of the event (return years 100, 250, 500, 1,000, and 2,500). The results are shown in Figure 2, where each row corresponds to the hazard, and each column to the intensity of the event. Furthermore, the total economic losses to all buildings in Seaside were determined based on hazard and intensity (Figure 3).

Figure 2 shows that the economic losses are spatially consistent across Seaside for the 100-year event, and begin to exhibit spatial variability as the intensity increases. Losses begin to accumulate for the 250-year event near the center of Seaside, and it can be seen that the earthquake is the primary driving force. Similar trends are observed for the 500-year event. The 1,000- and 2,500-year events begin to see significant tsunami losses that are not as spatially concentrated as the earthquake losses, but are more evenly distributed along the coast. Figure 3 shows that the tsunami losses begin to dominate for the 1,000-year event.

Figure 2: Earthquake and tsunami hazard deaggregated by hazard and intensity

Figure 3: Total earthquake and tsunami damages across Seaside, OR


Both the interpolation and hotspot analyses have limitations. As previously mentioned the hotspot analysis can be aesthetically challenging. Additionally, difficulties may arise in communicating the confidence levels to community planners and resource managers who may not have a statistical background.

Similarly, spatial interpolation via kernel density has its own limitations. As there are subjective options when performing the interpolation and viewing the results (e.g. radius, color scheme, and maximum values), the resulting maps could easily be deceiving. Figure 4 shows the same data but use of a different radius to define the kernel. It can be seen that the map on the right appears more severe than the map on the left. The practicality of a spatial interpolation map ultimately depends on the GIS analyst.

Figure 4: Comparison of interpolation resulting from different kernel radii.

Courtney’s Ex 1: Spatial patterns of ion concentrations in groundwater

1: Question being asked

How do ion and isotope concentrations vary spatially? And what trends in concentrations account for variability between my samples?

For detailed background information, I point you towards my problem statement for this class. In short, I’m hoping to use geochemical and isotope values for the 32 wells I sampled last summer in order to better understand how groundwater flows across faults in the Walla Walla Subbasin in northeastern Oregon. In the analysis for this post, I only used a subset of 18 wells for which I had the largest suite of measured parameters.

2: Approach used

For this exercise, I’m investigating spatial patterns of my ion and isotope concentrations. I could do this by individually examining distributions of all 19 of the parameters that I sampled for… but that would be graphically overwhelming and not terribly useful. It’s too difficult to draw conclusions by comparing 19 different maps of distributions.  Instead, I’m using a method called principal component analysis (PCA for short) to statistically evaluate my samples by the groups of parameters that account for large amounts of the variation that differentiates one sample from another.

The nuts and bolts of PCA involve linear algebra: computing the eigenvectors and eigenvalues that describe the n-dimensional structure of a data set. I will let people more knowledgeable than me explain the details if you’re curious about how exactly PCA works.
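For the curious, the core computation can be sketched in a few lines. This is a Python sketch with made-up well-chemistry numbers (the analysis itself used R's prcomp, which this mirrors):

```python
import numpy as np

def pca(X):
    """PCA via SVD on standardized data, mirroring
    prcomp(center = TRUE, scale. = TRUE). Returns the loadings
    (columns = PCs) and the fraction of variance each PC explains."""
    X = np.asarray(X, float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # center and scale
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Vt.T, explained

# Hypothetical concentrations: 5 wells x 3 parameters
X = [[1.0, 2.1, 0.5],
     [1.2, 2.0, 0.7],
     [3.1, 0.9, 2.5],
     [2.9, 1.1, 2.4],
     [2.0, 1.5, 1.5]]
loadings, explained = pca(X)
print(explained)  # fractions sum to 1, in decreasing order
```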

R spits out two kinds of useful graphic output for PCA based on the concept of a “biplot”. The first is called a loading plot, where two principal components are graphed against each other, for example PC1 and PC2. The value for each parameter in PC1 and PC2 gives the point a Cartesian coordinate, which defines a vector relative to the origin. The second kind of output is called a score plot, which applies the same concept to the sample points (called “individuals” in the code) instead of to the parameters (called “variables” in the code). I used an R package called ggbiplot to combine the score and loading plots in one figure.

3: Methods

I used R to calculate the principal component vectors for my data and to create charts for them, and then symbolized my data in ArcGIS Pro to see if there was a spatial pattern to the information in the PCA output. The code in R is fairly manageable. I’ve included it below with my comments.

# Source:

# Preliminary set up stuff – add libraries, set working directory

library(ggbiplot)
library(factoextra)

setwd("~/_Oregon State University/Thesis/Summer Fieldwork/PCA")

# Before importing your data, make sure that every entry in its table has a numeric entry in it.
# This process will spit out garbage if you have any blank cells or text.
# The number of rows in the data set needs to exceed the number of columns. I used an argument like [,c(1:11)] to enforce this in my data.

# Process 1 of 2: read data set 1 into R, view it, compute PCA, display summaries and biplots
# it’s important to set the row names so we can use them as labels in ggbiplot

PCAdata.ions2 <- read.csv("ions_allchem20190401_usgsdwnld.csv", row.names = 1)
Ions2.pca <- prcomp(PCAdata.ions2[,c(1:11)], center = TRUE, scale. = TRUE)

ggbiplot(Ions2.pca, choices = c(1,2), labels = rownames(PCAdata.ions2))
ggbiplot(Ions2.pca, choices = c(1,3), labels = rownames(PCAdata.ions2))

# read data set 2 into R, view it, compute PCA, display summaries and biplots

PCAdata.nonions <- read.csv("NonIons_allchem20190401_usgsdwnld.csv", row.names = 1)
NonIons.pca <- prcomp(PCAdata.nonions[,c(1:8)], center = TRUE, scale. = TRUE)

ggbiplot(NonIons.pca, labels = rownames(PCAdata.nonions))
ggbiplot(NonIons.pca, choices = c(1,3), labels = rownames(PCAdata.nonions))

# if you want plots other than PC1 vs PC2 or PC1 vs PC3, just change the argument "choices = c(1,2)"

# Process 2 of 2: add tools to get individual coordinates

# Source

# only install the package if you haven’t before. You will need to have RTools 3.5 already installed
# install.packages("devtools")
# library("devtools")
# install_github("kassambara/factoextra")

# now run things, it will spit out tab-delimited data. The "ind" command outputs PC information
# for the individuals, and the "var" command outputs the PC information for the variables.

ind <- get_pca_ind(Ions2.pca)
ind2 <- get_pca_ind(NonIons.pca)

# copy that info into text files, import them into csv format, join in Arc!

var1 <- get_pca_var(Ions2.pca)
var2 <- get_pca_var(NonIons.pca)
Next, I symbolized my data in ArcGIS Pro. Because I only had 18 samples, it wasn’t too onerous to use selection in the attribute table to export data to another layer to symbolize the groups I saw in the principal component analysis biplot.

To examine how the principal components affected individual points, I used the second process in the above code to generate a tab-delimited chart of the principal component loadings for each sample. I copied this into a .txt file, imported it into Excel to create a CSV, and then joined that CSV to my shapefile of well locations. For the variables’ coordinates, I likewise created .txt files and imported them into Excel, but I symbolized them in Excel.

4: Results

In this section, I’ll show the graphs and maps that I created for the ion section of my data, and give an explanation of what I learned from this analysis.

For my ion data, the first two of the 11 principal components explain about 80% of the variation in my data.

The first one, PC1, explains 55% of the variation in my data set and is shown on the X axis of the graph below. It is strongly influenced by concentrations of calcium, magnesium, chloride, alkalinity, sulfate, bromide, and sodium, which all have eigenvalues on the X axis greater than |1|. Comparing the score plot to the loading plot, wells 56287, 3074, 4179, and 4167 are the primary individual drivers of this PC.

PC2 explains another 25% of the variance in my data, and is shown on the Y axis. Fluoride, silicon, and manganese are the primary variable drivers of variation between the individuals in PC2.

CvS PC1 PC2 ions biplot with groups

Based on the coordinate information for the variables, the variation explained by PC1 is due to increased levels of calcium, magnesium, potassium, sodium, alkalinity, bromide, chloride, and sulfate at four wells. This combination of ions all varying together could be explained by recent recharge influenced by agricultural chemicals, such as fertilizers. In an undisturbed basalt setting, through time, calcium and magnesium tend to decrease as sodium and potassium increase. The fact that these four parameters vary together in PC1 indicates that a process other than ion replacement is driving ion concentration. The variation in PC2 is predominantly explained by fluoride, dissolved silica, and manganese. Elevated concentrations of these three variables in comparison to the other variables indicate that the groundwater has spent more time in the basalt. In PC2, calcium and magnesium are inversely related to sodium, indicating that the cation concentrations are driven by the ion replacement process.

While the figure above locates wells based on their grouping on a biplot comparing two principal components, the following figures show the wells color-coded by their loadings in PC1 and in PC2. I’m hypothesizing that wells with smaller PC1 values are influenced by agricultural recharge, and wells with more positive values on PC2 are more influenced by the processes of ion dissolution as water travels through the basalt.

color-coded map of PC1 values for the 18 wells

PC1 is heavily driven by four particular wells, which show up here as being in colors of blue and blue-grey.

The distribution of PC2 values shows a pattern that I find really interesting. More negative – bluer – values indicate samples which were less influenced by the ion replacement and dissolution processes of groundwater traveling through the basalt. The values at high elevation are negative, which makes sense because they’re closer to the upland recharge zone, but so do some samples in the lowlands to the north which are downgradient of certain faults. This could indicate that those PC2 negative downgradient wells are cut off from the longer flowpaths of groundwater by faults that act as barriers to groundwater flow. The pinker – more positive – points indicate samples that were more influenced by the along-flowpath ion reactions.

Next, I present the PCA results for the non-ion parameters. Unlike the “ions” set whose title was somewhat self-explanatory, this PCA brought together different types of chemical and physical properties of water. These include pH, temperature, and specific conductivity which were measured in the field,  the analyzed values of hydrogen and oxygen isotopes as well as tritium, the elevation of the bottom of the well, and the cation ratio. The cation ratio is a standard calculated parameter for groundwater chemistry which is equal to ([Na] + [K]) / ([Ca] + [Mg]). As water spends more time in contact with the rock, its ion exchange with the rock leads to a smaller cation ratio. The first biplot graphs PC1 (~40% of variation) on the X  axis against PC2 (22% of variation) on the Y axis, and the second biplot graphs PC1 on the X axis versus PC3 (~18% of variation) on the Y axis.

The PC1 grouping explains variance among the samples by inversely correlating tritium, deuterium (d2H), and oxygen-18 (d18O) with pH and the cation ratio. More positive PC1 values show that the individual has high d18O and d2H values, and low cation ratio and pH values. All d18O and d2H values are negative; values that are closer to 0 (“larger” for the purposes of this plot) are more enriched in the heavier isotope. PC2 is predominantly driven by an inverse correlation between well bottom elevation and temperature. PC3 inversely correlates tritium, specific conductance, and well bottom elevation with temperature, deuterium, and oxygen-18.

Yellow/orange points on this map have more positive PC1 loadings. They have higher d18O and d2H values, and lower cation ratio and pH values. The lower cation ratio and pH values indicate that the sample is less chemically evolved based on the expected ion exchange reactions in the basalt. I found it interesting that samples on that end of the spectrum were also more enriched in the heavier isotopes of hydrogen and oxygen.

Unfortunately I can’t seem to upload the maps for PC2 and PC3 for non-ions, I’m getting a message that all of my allowance for image uploads has been used up…

5: Critique

I’m pretty happy with what I put together using principal component analysis. In a perfect world I would have learned to do the spatial visualization elegantly in R instead of messing around with exporting data to Arc, and made my variable charts in R instead of exporting them to Excel. However, for the purposes of this project the less elegant method was quicker for me.

Exercise 1: Ventenata spatial clustering

Question Asked

I am interested in understanding the invasion potential of the recently introduced annual grass ventenata (Ventenata dubia) across eastern Oregon. Here I ask, what is the spatial pattern of the ventenata invasion across the Blue Mountains Ecoregion of eastern Oregon?

Tools and Approaches Used

To address this question, I (1) tested for spatial correlation at various distances using Moran’s I spatial autocorrelation coefficients plotted with a correlogram, and (2) performed hot spot analysis (Getis-Ord Gi*) to identify statistically significant clusters of areas with high and low ventenata cover.

Description of Analysis Steps

1a) Moran’s I: To compute Moran’s I spatial autocorrelation coefficient for all of my sample units, I used the “ape” package in R version 3.5.1. The first step to this analysis was to convert the ventenata data and associated coordinates into a distance matrix. Once the distance matrix was created, the Moran.I function computed the observed and expected spatial autocorrelation coefficients for the variable of interest (ventenata abundance). The function produces a test statistic that tests the null hypothesis of no correlation. See Gittleman and Kot (1990) for details on how the Moran.I function calculates Moran’s I statistics.

1b) Correlogram: I plotted a correlogram using Moran’s I coefficients with increasing distances (lags) to examine patterns of spatial autocorrelation in my data. I used the correlog function in the ncf package in R to plot a correlogram with lag intervals of 10,000 m. The function has the option of randomly resampling the data at each increment to incorporate statistical significance. This randomization tests the null hypothesis of no autocorrelation. I ran the function with resamp = 100. Black points on the correlogram are indicative of Moran’s I values significantly larger or smaller than expected under the null hypothesis.
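The binned, lagged Moran's I calculation that underlies a correlogram can be sketched as follows. This is a Python sketch with hypothetical plot coordinates and cover values (the analysis itself used R):

```python
import numpy as np

def correlogram(xy, z, lag_width):
    """Moran's I per distance bin: pairs whose separation falls in
    [k*lag_width, (k+1)*lag_width) get weight 1, all others 0.
    Returns one Moran's I value per lag bin (NaN for empty bins)."""
    xy = np.asarray(xy, float)
    z = np.asarray(z, float)
    n = len(z)
    zc = z - z.mean()
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    results = []
    for k in range(int(d.max() // lag_width) + 1):
        w = ((d >= k * lag_width) & (d < (k + 1) * lag_width)).astype(float)
        np.fill_diagonal(w, 0.0)          # exclude self-pairs
        if w.sum() == 0:
            results.append(np.nan)        # no pairs at this lag
            continue
        I = (n / w.sum()) * np.sum(w * np.outer(zc, zc)) / np.sum(zc**2)
        results.append(I)
    return results

# Hypothetical plot coordinates (m) and ventenata cover (%):
# two tight patches, one invaded and one nearly empty
xy = [(0, 0), (5, 0), (10, 0), (50, 0), (55, 0), (60, 0)]
cover = [40, 35, 45, 2, 5, 1]
print(correlogram(xy, cover, 20))
```

In this toy example, short lags pair plots within the same patch (high positive Moran's I), while long lags pair the invaded patch against the empty one (negative values), which is the patchy signature described in the results.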

2) Hot Spot Analysis: I used the hot spot analysis (Getis-Ord Gi*) tool in Arc GIS to identify statistically significant clusters of areas with high and low ventenata cover across my study area. The tool produces z-scores and p-values that test the null hypothesis of a random distribution of high and low values rather than clusters of high or low values. High z-scores indicate clusters of high values and low z-scores indicate clusters of low values. Low p-values indicate that these clusters are more pronounced than would be expected by chance.


Description of Results

1a) Moran’s I: The Moran’s I spatial autocorrelation coefficient estimate for all of the points across the entire sample area was 0.3 ± 0.05 (p < 0.3). This value is not particularly informative on its own: it indicates that the data are positively spatially autocorrelated, but it does not describe the spatial pattern. I chose to follow the Moran’s I up with a correlogram to uncover the spatial pattern driving the autocorrelation.

1b) Correlogram: The Moran’s I spatial correlogram shows a general trend of decreasing autocorrelation from 0 to about 70,000 m, where sudden jumps in Moran’s I values occur up to ~0.3. Following this jump, the correlation decreases to between -0.5 and -0.2 at 120,000 to 152,000 m, increases to ~0.3 at 170,000 m, decreases to almost -1.0 just after 200,000 m, and finally increases to almost 1 at 220,000 m. The general trend is a decrease from 0.2 to -0.9 at 220,000 m with some high peaks interspersed. These high and low peaks indicate distinct ventenata patches distributed throughout the study area, suggesting a clustered spatial pattern of the ventenata invasion. The extreme high and low values at distances over 200,000 m are likely a result of the few sample units being compared at these distances, so they are not very informative of the overall spatial pattern.

2) Hot Spot Analysis: Hot spot analysis in ArcGIS depicted clusters ranging from high ventenata cover (large red circles) to low ventenata cover (small blue circles) across my study area (Fig. 2) using the calculated z-scores and p-values for each sample unit. The resulting map shows distinct clusters of high, low, and moderate ventenata cover distributed across seven sampled burn perimeters (displayed in light orange). The highest cover clusters are all located within the Ochoco and Aldrich Mountains in the center of the study region. The fires on the perimeters of the region exhibited clusters of low to no ventenata cover.

Critique of Methods Used

When run on all of the data across the entire region, Moran’s I did not produce a useful statistic, indicating only whether the data were spatially autocorrelated without revealing a spatial pattern. However, when visualized with a correlogram at varying distances, the correlation coefficients suddenly told a story of spatial clustering. The hot spot analysis reinforces the findings from the correlogram by clearly depicting clusters on a map of the study area: it maps the clusters of high and low ventenata cover on top of each of my sample units, providing a useful visualization of exactly where the clusters of high and low cover fall across the region.




Exercise 1: What is the spatial pattern of western hemlock dwarf mistletoe at the Wolf Rock reference stand?

For Exercise 1, I wanted to analyze the spatial pattern of western hemlock dwarf mistletoe infections in live western hemlocks on my 2.2 ha reference stand (Wolf Rock). This was without considering any attributes of the western hemlock trees themselves. Simply, what was the spatial pattern of infection?

To answer this I used the “Average Nearest Neighbor” tool in the Spatial Statistics toolbox in ArcMap. This tool calculates a nearest neighbor index along with a z-score and a p-value from the z distribution. This is a commonly used method in the dwarf mistletoe literature for assessing the clustering of infection centers. Note that the equations behind this tool assume that points are free to locate anywhere in space and that there are no barriers to spread.
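
The index and z-score behind this tool can be approximated with a short sketch using the standard expected-distance formulas for complete spatial randomness; the points below are made up, not my tree data:

```python
import math

def nearest_neighbor_index(points, area):
    """Average Nearest Neighbor ratio and z-score.

    Uses the usual expected mean distance 0.5 / sqrt(n / A) under
    complete spatial randomness; a ratio < 1 suggests clustering,
    > 1 suggests dispersion.
    """
    n = len(points)
    d_obs = sum(
        min(math.dist(p, q) for q in points if q is not p) for p in points
    ) / n
    d_exp = 0.5 / math.sqrt(n / area)
    se = 0.26136 / math.sqrt(n * n / area)
    return d_obs / d_exp, (d_obs - d_exp) / se

# Toy example: a tight clump of four points in a 10 x 10 plot.
pts = [(1.0, 1.0), (1.2, 1.0), (1.0, 1.2), (1.2, 1.2)]
nni, z = nearest_neighbor_index(pts, area=100.0)
# nni is far below 1 and z is strongly negative: clustered
```

The dependence on `area` is visible here: enlarging the plot shrinks the expected distance and makes the same points look more clustered, which is the study-area sensitivity discussed below.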

ArcMap makes running these analyses very simple, so I created a selection of infected trees (red dots), created a new feature, and then ran the tool. The p-value from my test was 0.097 and my Nearest Neighbor Index was 0.970, indicating that the spatial pattern of the infections is somewhat clustered at an alpha of 0.10.

Average Nearest Neighbor is a good test for analyzing whether or not a set of coordinates is clustered. The degree of clustering may be harder to interpret, as a lower p-value does not necessarily mean points are more clustered. Also, I was unable to see where my clusters are and whether my intuitions match the analysis (see map). One other important consideration is the study area: changes in analysis area can drastically change the result of your clustering analysis (i.e., larger study areas may make data look more clustered). Lastly, there was no option for edge correction. This may have skewed some of the clustering results along the edge of my study site, and 2.2 ha is pretty small to be subsampled without losing a lot of my data.


After confirming that my infections were clustered, I wanted to see if the pattern I saw in my map was actually on the ground. I wanted to know: where are infected trees clustered with infected trees, and where are uninfected trees clustered with uninfected trees? Again, this was without considering any attributes of the western hemlock trees themselves.

I used the “Optimized Hot Spot Analysis” tool in the Mapping Clusters toolbox to analyze the incidence of infection data (0 = absence, and 1 = presence). The Optimized Hot Spot Analysis tool can automatically aggregate incidence data that are normally not appropriate for hot spot analysis. It also calculates several other metrics for me that made analysis easy. I could take these automatically calculated metrics and alter them in a regular hot spot analysis if needed.

This map displays clustering that matched up closely with my intuitions from Map 1. On the left, the blue values show a cluster of uninfected trees closely clustered with other uninfected trees. The larger swath on the right shows a cluster of infected trees closely clustered with other infected trees. In the middle, uninfected and infected trees are mixed without displaying any significant clustering. Lastly, small clusters of infected trees in the top left and bottom left were identified. These clusters may be edges of larger clusters outside my stand, or lightly infected trees that are starting a new infection center. These results will be extremely valuable in informing my steps for Exercise 2 because I can assess the conditions of both patches and determine differences between the two. I can also determine whether distance to the refugia impacts the clustering of infection, because the infected cluster appears to be closer to the fire refugia.

The hot spot analysis was extremely useful for analyzing and displaying the clustering information I needed, and it built naturally on the Average Nearest Neighbor analysis.

My data set also included a severity rating for dwarf mistletoe infected western hemlocks in my study site. I ran a hot spot analysis similar to the one above to determine whether severity played out in the stand the same way the incidence data did. My ratings ranged from 0 to 5, with 0 indicating uninfected trees and 5 the most heavily infected. These are classified data, not continuous, but still appropriate for the optimized hot spot analysis. Western hemlock dwarf mistletoe forms infection centers, starting from a residual infected western hemlock that survived some disturbance; from there the infection spreads outwards. Another facet of infection centers is that the most heavily infected trees are almost always aggregated in the center, and infection severity decreases as you move towards the outside of the infection center. This is intuitive when you think about infected trees in terms of the time they have been exposed to a dwarf mistletoe seed rain: the trees in the center of the infection center have likely been exposed to infectious seed the longest. These trees can be rated using a severity rating system that essentially determines the proportion of the tree crown infected, calculated in a way that gives an easily interpretable rating, in this case 0-5.

This third map tells me how severity is aggregated in the stand. I can see that the wide swath in the middle of the stand, associated with the fire refugia, has the largest aggregation of severely infected trees. This is what I expected, because the trees in the fire refugia survived the fire and provide an infectious seed source for the post-fire regeneration. On the edges of this high severity cluster are lower severity values, indicating that the expected pattern of infection centers is playing out. The west side of the stand shows a large cluster of low severity ratings; the high density of uninfected trees falls into our cold spot of low or no severity. Interestingly, the hot spot of trees found previously in the southwest corner is actually a cluster of low severity trees. This may be a new infection center forming or the exterior edge of another infection center outside the plot. Lastly, the two pockets of low severity on the east side of the stand are more distinct when considering their severity.

This second application of hot spot analysis tells another story about my data and how dwarf mistletoe is patterned spatially. The non-significant swath in the center of my stand using the incidence data turns out to be a significant clustering of highly infected trees among other new observations.


Exercise 1: The Variogram


How does the variance between the depths to the first lava flow in my field area vary with increasing distance?

The Tool: Variogram

I used a variogram to analyze the variance within my dataset. Variograms are discrete functions calculated to measure the average correlation between pairs of measurements at various distances (Cameron and Hunter 2002). These distances are grouped into bins (Anselin, 2016). In this study, the binned distances determine the distances over which the depth to the first lava flow is autocorrelated (Anselin, 2016).

variog(geodata, coords = geodata$coords, data = geodata$data, max.dist = number) (R documentation)

The R code above needs the geodata (an object containing the data you are testing), the coords (the coordinates those data correspond to), and max.dist, the maximum distance (a number) over which pairs of points are compared.
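
Conceptually, what variog estimates per bin is the average semivariance of point pairs. A stripped-down sketch of that calculation, on made-up coordinates and depths rather than the well log data, might look like:

```python
import math

def empirical_variogram(coords, values, bin_width, max_dist):
    """Average semivariance of point pairs grouped into distance bins.

    Returns (bin_center, semivariance) for each non-empty bin, where
    the semivariance at lag h is the mean of (z_i - z_j)**2 / 2 over
    pairs of points separated by roughly h.
    """
    bins = {}
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            h = math.dist(coords[i], coords[j])
            if h > max_dist:
                continue
            b = int(h / bin_width)
            bins.setdefault(b, []).append(0.5 * (values[i] - values[j]) ** 2)
    return [((b + 0.5) * bin_width, sum(g) / len(g))
            for b, g in sorted(bins.items())]

# Toy example: depth increases smoothly to the east, so nearby wells
# have similar depths and semivariance grows with lag distance.
coords = [(i * 0.01, 0.0) for i in range(20)]   # decimal degrees
depths = [10.0 + 100.0 * x for x, _ in coords]  # meters
vg = empirical_variogram(coords, depths, bin_width=0.02, max_dist=0.1)
```

This makes the role of max.dist concrete: pairs farther apart than it never enter any bin, which is why changing it reshapes the plots below.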

Brief Methodology:

I selected data based on the quality of the descriptions in the well logs and assigned each well log I have interacted with a value from 0 to 3. Data with a score of 0 represent either a dry well or a destroyed well. Data with a score of 3 are well described and denote either the depth to first water or the standing water level. I used data with a score of 3 for this analysis, and noted the depth to the first lava flow in each of those well logs.

Figure 1: My field area; the well log locations are the blue circles, and a rough delineation of my field area is in white.

First I normalized my data, giving a mean of 0 and a standard deviation of 1. Then I chose a maximum distance (max.dist), which sets the maximum bin size the variogram uses and thus the lag increment, or the distance over which each increment is calculated (Anselin, 2016).

The data should be projected, with horizontal distance measured in meters; however, they are recorded in decimal degrees and located at the center of the township-and-range section in which they fall. Rather than convert my data from decimal degrees to meters, I used the facts that there are around 111 km in 1 degree (at the equator), that there are 0.6214 miles in a kilometer, and that there are 6 miles in a township. This helped me determine the maximum distance for the variog function in R; I settled on maximum distances of 0.09 and 0.5 degrees.
I then ran R’s variog function on the normalized data with maximum distances of 0.09 and 0.5 degrees.
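
The unit arithmetic behind the 0.09-degree choice can be checked directly with the conversion factors quoted above:

```python
km_per_degree = 111.0      # approximate, at the equator
miles_per_km = 0.6214
township_miles = 6.0

# One township expressed in degrees of longitude at the equator.
township_degrees = township_miles / miles_per_km / km_per_degree
# about 0.087 degrees, which is where the ~0.09-degree max.dist comes from
```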

Succinct Results:

Figure 2: Variogram of the MLV-LP-BVM triangle with a maximum distance of 0.09 degrees. Note the low semivariance at the smaller bin sizes (0.02 to 0.08 degrees).

Figure 3: Variogram of the MLV-LP-BVM triangle with a maximum distance of 0.5 degrees. Note the saw-toothed pattern of semivariance: from 0.01 to 0.08 degrees semivariance is low, it spikes up and lowers again around 0.2 degrees, spikes again at 0.3 degrees, and lowers again at 0.4.


I tested maximum bin sizes of 0.09 and 0.5 degrees to see how the variogram changes with increasing bin size. Changing the maximum bin size (max.dist in R) changes the variogram. The smaller value, 0.09 degrees, limits the variogram to a narrow range: in effect, it only tests the covariance of data points that are separated by at most 0.09 degrees. Increasing the value increases the distance over which the covariance of the data points is tested; thus, a maximum distance of 0.5 degrees results in a spiky plot. Normally, one would expect the variogram to plateau at larger bin sizes, representing large variance in the data at larger distances, but figure 3 does not.

At its simplest, the variogram in figure 2 implies that data points that are close in space are more likely to have similar depths to the first lava flow. You can see this in the low variance at locations that are close to each other (the smaller distances) and the higher variance at distances that are farther apart.

For figure 3, with the maximum distance of 0.5 degrees, one could make an argument for a distribution of lava flow locations based on the locations of volcanic centers. Medicine Lake Volcano lies in the north of the study area and Lassen Peak lies in the south. The two spikes in variance might be linked to the distances between those two centers. In other words, distances that correspond to low values of semivariance (< 1) correspond either to regions that lie on the same lava flow, or to another near-surface lava flow sourced from another volcanic center in the region (figure 3). Rather than finding the same lava flow at depth, we are seeing different lava flows at similar depths.

Figure 4: My field area with the rough delineations of two of the near surface lava flows in the region. 1 degree of longitude corresponds to roughly 111 km.

Lava flows are not attracted to or repulsed from each other, but they do follow the laws of physics. Volcanoes build topography; when lava erupts, it flows from the high elevations of the volcano to basins, using paleo-valleys as conduits. Thus, if you know the paleo-topography, you can understand where a lava might have flowed and where it might have emplaced. Predicting paleo-topography can be difficult in old volcanic regimes, but on the geologic timescales we are looking at, I can assume that the topography of the MLV-LP-BVM triangle has not changed much over the past 5 Ma.

Lava flows from high to low topography, away from volcanoes. The two main volcanoes in my field area are Lassen Peak in the south and Medicine Lake Volcano in the north. Lavas emplace in basins; if they sit long enough, soils form on top, the basin subsides, and the cycle repeats.

My data do have different spatial patterns at different scales. If you look at regions within 0.1 degrees of each other, you would expect to see a similar depth to the lava flow. If you move to 0.5 degrees, you see a see-saw effect, where the depth to the lava flow moves from lava close to the surface to lava deeper down. This variation stems from the proximity of the data points to the sources and sinks of the lava flows.

I would use this technique again; it helped me think about my problem.

Cameron, K., and P. Hunter. 2002. “Using Spatial Models and Kriging Techniques to Optimize Long-Term Ground-Water Monitoring Networks: A Case Study.” Environmetrics 13: 629–59.

Anselin, Luc. 2016. “Variogram.” PowerPoint file.

R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

Exercise 1: The Spatial Patterns of Natural Resource Governance Perceptions


What are the spatial patterns of natural resource governance perceptions in the Puget Sound?

Tools and Approaches

  1. Moran’s I (with correlograms) and Semivariograms in R studio
  2. Kriging and IDW in ArcGIS Pro
  3. Hotspot Analysis in ArcGIS Pro


Analysis Steps

  1. To compute Moran’s I, I used the “ape” library in R, which has a function called Moran.I(). This function takes the variable in question (governance perceptions) and a distance matrix, and computes the observed and expected values of Moran’s I as well as the standard deviation and a p-value. For this analysis, I also subset my data to examine spatial autocorrelation by demographics, including area (urban, suburban, rural), political ideology, life satisfaction, income, and cluster (created by running a cluster analysis on the seven variables that comprise the governance index). I created correlograms for the variables that were significant (urban, conservative, and liberal) using the “ncf” library and the correlog() function. These figures give a better picture of spatial autocorrelation at various distances. To create semivariograms, I used the “gstat” and “lattice” libraries, which contain a function called variogram(). This function takes the variable of interest along with latitude and longitude locations; the object created can then be plotted. For this analysis I used the same subsets as in the Moran’s I analysis.
  2. To perform interpolation on my data, I loaded my point data into ArcGIS Pro. I then used the Spatial Analyst toolbox to perform kriging and IDW and compare the outputs of the two techniques. I used my indexed variable of governance perceptions, whose values vary from 1 to 7. I also uploaded a shapefile bounding the sample area, as well as a shapefile of the shoreline, to better delineate my study area.
  3. To run a hotspot analysis, I used my previously loaded point data in ArcGIS Pro and performed the hotspot analysis on my indexed variable of governance perceptions, with values from 1 to 7. I used the shapefile of the shoreline to better delineate my study area.


  1. The Moran’s I calculation was insignificant for rural, suburban, cluster groups, life satisfaction, and income, suggesting no spatial autocorrelation of governance perceptions by these subsets.


The Moran’s I calculation was significant for urban:

Observed value: -0.014

P-value: 0.0002


The Moran’s I calculation was also significant for ideology:


Observed value: -0.006

P-value: 0.002


Observed value: -0.002

P-value: 0.05


This suggests that in these subsets there is statistically significant (though weak and negative) spatial autocorrelation between individual governance perceptions.

The semivariograms for the subsets that are significantly spatially autocorrelated are presented below.


None of these plots suggest high degrees of spatial autocorrelation. The urban plot does so more than the ideology plots, but the y axis scale is still very small.

The plots (top: Urban; bottom left: Liberal; bottom right: Conservative) help to confirm the findings from above. The Moran’s I fluctuates around zero without much variation. The large spikes in variation that the graphs do show are only for non-significant points: significant points are filled in, while non-significant points are open circles.

2. Interpolation

The kriging (bottom left) with individual points and IDW (bottom right) do not look very different in terms of general trends. The kriging with shoreline (top) gives possibly the most interesting visual of spatial patterns. In general, perceptions are better (more green) in the center, where there is more shoreline. There are also two sections that appear much more negative. To examine these locations further, I performed a hotspot analysis.
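
For reference, the core of IDW is simply a distance-weighted average of nearby sample values. A minimal sketch, with made-up points and the common power of 2 rather than the actual survey responses:

```python
def idw(points, values, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from sample points."""
    num = den = 0.0
    for (px, py), v in zip(points, values):
        d2 = (px - target[0]) ** 2 + (py - target[1]) ** 2
        if d2 == 0.0:
            return v                    # exact hit: honor the sample value
        w = 1.0 / d2 ** (power / 2.0)
        num += w * v
        den += w
    return num / den

# Toy example on the 1-7 governance index: a location next to the
# high-perception point is pulled strongly toward 7.
pts = [(0.0, 0.0), (10.0, 0.0)]
vals = [7.0, 1.0]
est = idw(pts, vals, (1.0, 0.0))        # close to 7
```

Kriging differs by weighting samples with a fitted variogram model instead of raw distance, which is why the two surfaces agree on broad trends but differ in smoothness.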

3.  Hotspot Analysis

This image confirms the two bright red spots from the interpolation to be “cold spots,” or spots where the value of perception is significantly lower than the average perception (neutral), at 99% confidence. The orange dots are at 95% confidence. The green corridor appears to hold in the southern part of the Sound and is confirmed as a “hotspot,” or a spot where the value of perception is significantly higher than the areas surrounding it, at a 99% confidence level.

The three main areas of red or orange correspond to the cities of Shelton (bottom), Port Angeles (west), and Everett with a little of south Whidbey Island (east).

Critique

I believe all the methods are useful, but some are redundant. It would probably be sufficient to do only one of each type of method—spatial autocorrelation and interpolation—but it is interesting and more convincing to see the same type of analysis done in different ways. The p-values from the Moran’s I appear to agree with the shapes of the curves in the semivariograms, where the smaller p-values have more defined shapes. The same goes for the interpolation methods: while they are interesting to see side by side, they essentially tell the same story. In this case, the hotspot analysis gives the most interesting interpretation of the data because it indicates areas of potential concern.

The Biogeography of Coastal Bottlenose Dolphins off of California, USA between 1981-2016


Common bottlenose dolphins (Tursiops truncatus), hereafter referred to as bottlenose dolphins, are long-lived marine mammals that inhabit the coastal and offshore waters of the California Current Ecosystem. Because of their geographical diversity, bottlenose dolphins are divided into many different species and subspecies (Hoelzel, Potter, and Best 1998). Bottlenose dolphins exist in two distinct ecotypes off the west coast of the United States: a coastal (inshore) ecotype and an offshore (island) ecotype. The coastal ecotype inhabits nearshore waters, generally less than 1 km from shore, between Ensenada, Baja California, Mexico and San Francisco, California, USA (Bearzi 2005; Defran and Weller 1999). Less is known about the range of the offshore ecotype, which is broadly defined as occurring more than 2 km offshore along the entire west coast of the USA (Carretta et al. 2016). Current population abundance estimates are 453 coastal individuals and 1,924 offshore individuals (Carretta et al. 2017). The offshore and coastal bottlenose dolphins off of California are genetically distinct (Wells and Scott 1990).

Both ecotypes breed in summer and calve the following summer, which may be a thermoregulatory adaptation (Hanson and Defran 1993). These dolphins are crepuscular feeders that predominantly hunt prey in the early morning and late afternoon (Hanson and Defran 1993), which correlates to the movement patterns of their fish prey. Out of 25 prey fish species, surf perches and croakers make up nearly 25% of coastal T. truncatus diet (Hanson and Defran 1993). These fish, unlike T. truncatus, are not federally protected, and neither are their habitats. Therefore, major threats to dolphins and their prey species include habitat degradation, overfishing, and harmful algal blooms (McCabe et al. 2010).

This project aims to better understand the distribution of coastal bottlenose dolphins in the waters off of California, specifically in relation to distance from shore, and how that distance has changed over time.


This part of the overarching project focuses on understanding the biogeography of coastal bottlenose dolphins. Later stages in the project will require the addition of offshore bottlenose sightings to compare population habitats.

Beginning in 1981, georeferenced sighting data on coastal bottlenose dolphins off the California, USA coast were collected by R.H. Defran and team. The data were provided in the NAD 1983 datum. Small boats less than 10 meters in length were used to collect the majority of the field data, including GPS points, photographs, and biopsy samples. These surveys followed similar tracklines with specific start and end locations, which will be used to calculate sightings per unit effort. Over the next four decades, varying amounts of data were collected in six different regions (Fig. 1). Coastal T. truncatus sightings from 1981-2015 parallel much of the California land mass, concentrating in specific areas (Fig. 2). Many of the sightings are clustered near larger cities due to the logistics of port locations. The greater number of coastal dolphin sightings is due to the bias in effort toward proximity to shore and the longer study period. All samples were collected under a NOAA-NMFS permit. Additional data required will likely be sourced from publicly available, long-term data collections, such as ERDDAP or MODIS.

Distance from shore will be calculated in a program such as ArcGIS or R. These data will be used later in the project for comparison with additional static, dynamic, and long-term environmental drivers. These factors will be tested as possible layers to add in mapping and, finally, estimating population distribution patterns of the dolphins.

Figure 1. Breakdown of coastal bottlenose dolphin sightings by decade. Image source: Alexa Kownacki.

I predict that the coastal bottlenose dolphins will be associated with different bathymetry patterns and appear clustered based on a depth profile via mechanisms such as prey distribution and abundance, nutrient plumes, and predator avoidance.


My objective is first to find a bathymetric layer that covers the entire California, USA coast to import into ArcMap 10.6. Then I need to interpolate the data to create a smooth surface, add my dolphin sighting points, and create a way to associate each point with a depth. These depth and point data would be exported to R for further analysis. Once I have extracted these data, I can run a KS-test to compare the shapes of distributions based on two different factors, such as points from El Niño years versus La Niña years, to see if there is a difference in average or most common sighting depth based on the climatic patterns. I am also interested in using the spatial statistic, Moran’s I, to see if the sightings are clustered. If so, I would run a cluster analysis to see if the sightings are clustered by depth. If not, then maybe there are other drivers that I can test, such as distance from shore, upwelling index values, or sea surface temperature. Additionally, these patterns would be analyzed over different time scales, such as monthly, seasonally, or decadally.
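
The two-sample KS test planned here compares the full shapes of two depth distributions, and its statistic is simply the largest vertical gap between the two empirical CDFs. A hand-rolled sketch, on made-up depth values rather than the sighting data:

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a - ECDF_b|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in sorted(set(a) | set(b)):
        fa = sum(1 for x in a if x <= v) / len(a)   # ECDF of sample a at v
        fb = sum(1 for x in b if x <= v) / len(b)
        d = max(d, abs(fa - fb))
    return d

# Hypothetical sighting depths (m): the two samples do not overlap at
# all, so the statistic reaches its maximum of 1.0.
depths_nino = [5, 6, 7, 8, 9, 10]
depths_nina = [20, 22, 25, 27, 30, 33]
d_stat = ks_statistic(depths_nino, depths_nina)   # 1.0
```

In practice a library routine (e.g., a two-sample KS function in R or SciPy) would also supply the p-value for the computed statistic.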

Expected Outcome:

Ideally, I would produce multiple maps from ArcGIS representing different spatial scales at defined increments, such as by month (all Januaries, all Februaries, etc.), by year or binned time increment (e.g., 1981-1989, 1990-1999), and potentially grouped by El Niño or La Niña year. Different symbologies would represent coastal dolphin sighting distances from shore. The maps would visually display seafloor depths in a color spectrum at 10-meter intervals. Because the coastlines of California vary in terms of depth profiles, I would expect clusters of sightings at different distances from shore but similar depth profiles, if my hypothesis is true. Also, the quantified seafloor depth would be associated with each data point (dolphin sighting) for further analysis in R.


This project draws upon decades of rich spatiotemporal and biological information on two neighboring long-lived cetacean populations that inhabit contrasting coastal and offshore waters of the California Bight. The coastal ecotype has a strong, positive relationship with distance to shore, in that it is usually sighted within five kilometers, and therefore is in frequent contact with human-related activities. However, patterns of distance to shore over decades, related to habitat type and possibly linked to prey species distribution or long-term environmental drivers, are largely unknown. By better understanding the distribution and biogeography of these marine mammals, managers can better mitigate the potential effects of humans on the dolphins and see where and when animals may be at higher risk of disturbance.


I have a moderate amount of experience in ArcMap from past coursework (GEOG 560 and 561), as well as practical applications and map-making. I have very little experience in Modelbuilder and Python-based GIS programming. I am becoming more familiar with the R program after two statistics courses and analyzing some of my own preliminary data. I am experienced in image processing in ACDSee, PhotoShop, ImageJ, and other analyses mainly from marine vertebrate data through NOAA Fisheries.

Literature Cited:

Bearzi, Maddalena. 2005. “Aspects of the Ecology and Behaviour of Bottlenose Dolphins (Tursiops truncatus) in Santa Monica Bay, California.” Journal of Cetacean Research and Management 7 (1): 75–83.

Carretta, James V., Kerri Danil, Susan J. Chivers, David W. Weller, David S. Janiger, Michelle Berman-Kowalewski, Keith M. Hernandez, et al. 2016. “Recovery Rates of Bottlenose Dolphin (Tursiops Truncatus) Carcasses Estimated from Stranding and Survival Rate Data.” Marine Mammal Science 32 (1): 349–62.

Carretta, James V, Karin A Forney, Erin M Oleson, David W Weller, Aimee R Lang, Jason Baker, Marcia M Muto, et al. 2017. “U.S. Pacific Marine Mammal Stock Assessments: 2016.” NOAA Technical Memorandum NMFS, no. June.

Defran, R. H., and David W. Weller. 1999. “Occurrence, Distribution, Site Fidelity, and School Size of Bottlenose Dolphins (Tursiops truncatus) Off San Diego, California.” Marine Mammal Science 15 (April): 366–80.

Hanson, Mark T, and R.H. Defran. 1993. “The Behavior and Feeding Ecology of the Pacific Coast Bottlenose Dolphin, Tursiops Truncatus.” Aquatic Mammals 19 (3): 127–42.

Hoelzel, A. R., C. W. Potter, and P. B. Best. 1998. “Genetic Differentiation between Parapatric ‘nearshore’ and ‘Offshore’ Populations of the Bottlenose Dolphin.” Proceedings of the Royal Society B: Biological Sciences 265 (1402): 1177–83.

McCabe, Elizabeth J. Berens, Damon P. Gannon, Nélio B. Barros, and Randall S. Wells. 2010. “Prey Selection by Resident Common Bottlenose Dolphins (Tursiops truncatus) in Sarasota Bay, Florida.” Marine Biology 157 (5): 931–42.

Wells, Randall S., and Michael D. Scott. 1990. “Estimating Bottlenose Dolphin Population Parameters From Individual Identification and Capture-Release Techniques.” Report International Whaling Commission, no. 12.


Contact information: this post was written by Alexa Kownacki, Wildlife Science Ph.D. Student at Oregon State University. Twitter: @lexaKownacki


My Spatial Problem: Streamflow variability and associated processes in rain-dominated, coastal basins

My research explores the variability in streamflow patterns in rain-dominated systems of the Pacific Northwest. Rain-dominated climate regimes occur primarily in the coastal portion of the PNW. Because precipitation occurs only as rain, and does not occur in significant quantities during the summer season, underground storage is a crucial component of both the water cycle and streamflow stability in these systems. The objectives of my research are: first, to describe variations in stream hydrograph stability across multiple catchments and multiple catchment scales; and second, to use estimations of catchment storage processes to help explain potential variations in streamflow patterns.

I will be analyzing multiple datasets in order to meet my objectives, including geophysical data, land-use/management data, and streamflow data. All landscape data will be analyzed at the finest spatial resolution available for each dataset. Hourly streamflow data will be analyzed for the available period of record, which varies by site. Second- to fourth-order streams in the Siletz and Smith river basins have hourly discharge data for the most recent 5-8 years; USGS hydrological stations have similar data for 10-30 years.

I expect to see that streams in different hydrogeologic settings demonstrate different streamflow patterns over time. Streams located in more permeable, thicker lithology may demonstrate more stable stream flows: in the winter, that may appear as muted storm peaks, while in the summer, that may appear as more sustained baseflows through the non-rainy season. Land cover may play an important role in streamflow regimes as well. The amount of water taken up and stored by vegetation may depend on the density, age, structure, and species composition of the forest. Watersheds with large areas managed for industrial timber production may confound streamflow behavior.

I envision approaching my objectives by first developing some descriptive statistics for the hydrographs at each site. Such descriptors may include recession analysis, dynamic storage, 7Q10, arc peak, and various other streamflow metrics with which I am not yet familiar. The sites have varying contributing areas, so that will have to be taken into consideration in the analysis. Once descriptive statistics are developed for each site, I would like to integrate an analysis of the landscape attributes as potential explanatory variables for any variations observed in hydrograph statistics across space, to see how much explanatory power they have and how much variation there is.
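
Of the metrics listed, 7Q10 is built from the annual minimum of the 7-day moving average of daily flow. A minimal sketch of that building block, on synthetic daily flows rather than the Siletz or Smith data:

```python
def min_7day_mean(daily_flows):
    """Lowest 7-day moving-average flow in one year of daily values."""
    return min(
        sum(daily_flows[i:i + 7]) / 7.0
        for i in range(len(daily_flows) - 6)
    )

# Synthetic year: flow recedes through summer to a 2 cfs baseflow, so
# the minimum 7-day mean is that sustained summer low.
flows = [50.0] * 180 + [10.0] * 90 + [2.0] * 95   # 365 daily values
low7 = min_7day_mean(flows)                        # 2.0
```

A full 7Q10 would then fit a frequency distribution to these annual minima across years and take the value with a 10-year recurrence interval.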

The product of this project is expected to include both maps and statistical relationships. For maps, I would like to produce: 1) a depiction of the hydrogeologic settings across the study area, and 2) a depiction of expected streamflow patterns given the hydrogeologic settings. I would like to understand the statistical relationship between various streamflow metrics and the hydrogeologic setting of the given stream.

The results of this analysis may be important to resource managers and the scientific community as it will contribute information about hydrological processes, with a focus on the critical zone, in the PNW coastal landscapes. The persistence of streams and rivers in this region is crucial to several species of native salmonids, as well as to the economic well-being of the local communities. With potential variations in the climate, understanding underlying hydrological processes and the drivers of such processes may contribute to more proactive management approaches. Furthermore, an analysis that is relevant to the fine-scale processes that exist in the study area may contribute more accurate classifications of hydrologic regimes and analyses of streamflow vulnerability to long-term changes in the hydrological cycle.

As far as preparation for this project: I am well-versed in ArcGIS software. I have used Model Builder and Python somewhat, though I could use more practice. I am most comfortable doing spatial and statistical analyses in R. I have only done image analysis in a class, and since I would like to use it in my analysis, I am looking forward to learning more through this course.

A stain on the record? Have forest management practices set up PNW landscapes for a black-stain-filled future?

Describe the research question that you are exploring.

I am looking at how forest management practices influence the spread of black stain root disease (BSRD), a fungal root disease that affects Douglas-fir in the Pacific Northwest. While older trees become infected, BSRD primarily causes mortality in younger trees (< 30-35 years old). Management practices (e.g., thinning, harvest) attract insects that carry the disease and are associated with increased BSRD incidence. As forest management practices in the Pacific Northwest change to favor shorter rotations of Douglas-fir monocultures, the distribution of Douglas-fir age classes is shifting towards younger stands and the frequency of harvest disturbance is increasing across the landscape. Though limited, our present understanding of this disease system indicates that these management trends, as well as the resulting disturbance regime and forest landscape age structure, may be creating favorable conditions for BSRD spread.

In this course, I would like to use spatial analyses to answer the question of whether forest management and the conditions that it creates act as a driver of the spread of black stain root disease. Specifically:

  • How do spatial patterns of forest management practices and the forest stand and landscape conditions that they create relate to spatial patterns of BSRD infection probabilities at the stand and landscape scale?
  • How do spatial patterns of forest management practices relate to landscape connectivity with respect to BSRD by affecting the area of susceptible forest and creating dispersal corridors and/or barriers throughout the landscape?

Example landscape with stands of three different forest management regimes (shades of green) and trees infected by black stain root disease (red). Forgive the 90s-esque graphics… NetLogo, the program I am using to develop and run my model, is powerful but old-school.

Describe the dataset you will be analyzing, including the spatial and temporal resolution and extent.

I will be analyzing the raster outputs of a spatial model that I built in the agent-based modeling program NetLogo (Wilensky 1999). The rasters contain the states of forested landscapes (managed as individual stands) at a given time during the model run. Variables include tree age, presence/absence of trees, management regime, probability of infection, infection status (infected/not infected), and cause of infection (root transmission, vector transmission).

The forested landscapes I am looking at are about 3,000 to 4,000 ha, with each pixel representing a ~1.5 m x 1.5 m area that can be occupied by one tree. I run each model for a 300-year time series with 1-year intervals, though raster outputs may be produced at 10-year intervals.
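Assuming the model rasters are exported as ESRI ASCII grids (a format NetLogo's GIS extension can write — an assumption about my own export settings), a minimal reader and one landscape-scale disease summary might look like:

```python
import numpy as np

def read_ascii_grid(path):
    """Read an ESRI ASCII grid: six header lines (ncols, nrows, xllcorner,
    yllcorner, cellsize, NODATA_value), then rows of cell values."""
    with open(path) as f:
        header = {}
        for _ in range(6):
            key, value = f.readline().split()
            header[key.lower()] = float(value)
        grid = np.loadtxt(f)
    return header, grid

def infected_fraction(status, nodata=-9999):
    """Landscape-scale summary: fraction of valid cells whose
    infection status equals 1 (infected)."""
    valid = status != nodata
    return float((status[valid] == 1).sum()) / valid.sum()
```

Applied to the 10-year-interval output rasters, `infected_fraction` yields one landscape-scale infection trajectory per model run.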

Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

I hypothesize that landscapes with higher proportions of intensively managed, short-rotation stands will have higher probabilities of BSRD infection at the stand and landscape scales. In landscapes with a high proportion of short-rotation stands, there will be large areas of suitable habitat for the pathogen and its vectors, frequent harvest that attracts disease vectors, and greater levels of connectivity for the spread of disease. In landscapes with a large proportion of older forests managed for conservation, I hypothesize that these forests will act as barriers to the spread of BSRD. High connectivity could be evidenced by greater landscape-scale dispersion of infections, whereas low connectivity would lead to a high degree of clustering of infections in the landscape.
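One simple way to quantify clustering versus dispersion of infected trees is the Clark-Evans nearest-neighbor ratio. A brute-force sketch (my own naming; no edge correction, which a real analysis would add, and coordinates and area must be in consistent units):

```python
import numpy as np

def nearest_neighbor_distances(pts):
    """Distance from each point to its nearest other point (brute force;
    fine for moderate point counts)."""
    diff = pts[:, None, :] - pts[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)
    return d.min(axis=1)

def clark_evans(points, area):
    """Clark-Evans ratio R: observed mean nearest-neighbor distance over
    the expectation under complete spatial randomness. R < 1 suggests
    clustered infections; R > 1 suggests dispersion."""
    pts = np.asarray(points, float)
    observed = nearest_neighbor_distances(pts).mean()
    expected = 0.5 / np.sqrt(len(pts) / area)
    return observed / expected
```

Computed on infected-tree coordinates for each landscape, R gives a single clustering index that can be compared across management scenarios.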

I also hypothesize that intensively managed, short-rotation stands will have the highest probabilities of infection, followed by intensively managed, medium-rotation stands, and finally old-growth stands. However, I hypothesize that each stand’s probability of infection will depend not only on its own management but also on the management of neighboring stands and the broader landscape. At some threshold proportion of intensive management in the landscape, I hypothesize that there will be a shift in the scale of the drivers of infection, such that landscape-scale management patterns overtake stand-scale management as a predictor of infection probability.

Approaches: Describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.

I would like to learn about landscape connectivity analyses and spatial statistics such as clustering/dispersion as well as spatiotemporal analyses to analyze the relationships between discrete disturbance events and disease spread. I would like to learn how to separate the effects of connectivity from the effect of the area of suitable pathogen habitat. I am most interested in using R or Python to analyze my data, and I would like to move away from ESRI programs because of my interest in open-source and free tools for science and the prohibitive cost of ESRI software licenses for independent researchers and organizations with limited financial means.

Expected outcome – What do you want to produce – Maps? Statistical relationships?

My primary interest is to evaluate statistical relationships between spatial patterns of management and disease measures, but I would also like to produce maps to demonstrate model inputs and outputs (i.e., figures for my thesis).

Significance – How is your spatial problem important to science? To resource managers?

From a scientific perspective, this research aims to contribute to the body of research examining relationships between spatial patterns and ecological processes and complex behaviors in ecological systems. This research will examine how the diversity of the landscape age structure and disturbance regimes affect the susceptibility of the landscape to disease, contributing to literature relating diversity and stability in ecological systems. In addition, “neighborhood” and “spillover” effects will be tested by analyzing stand-scale infection probability with respect to the infection probability of neighboring stands and more broadly in the landscape. Analysis of threshold responses to changes in stand- and landscape-scale management patterns and shifts in the scale of disease drivers will contribute to understanding of cross-scale system interactions and emergent properties in the field of complex systems science.

From an applied perspective, the goal of this research is to inform management practices and understand the potential threat of black stain root disease in the Pacific Northwest. This will be achieved by improving understanding of the drivers of BSRD spread at multiple scales and highlighting priority areas for future research. This project is a first step towards identifying evidence-based, landscape-scale management strategies that could be taken to mitigate BSRD disease issues. In addition, the structure of this model provides a platform for looking at multi-scale interactions between forest management and spatial spread processes. Its use is not restricted to a specific region and could be adapted for other current and emerging disease issues.

Your level of preparation – How much experience do you have with: (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) image processing, (e) other relevant software

Over the past 5 years, I have worked on and off with all the programs/platforms listed. For some, I have been formally trained, but for others, I have been largely self-taught. However, lack of continuous use has eroded my skills to some degree.

a. I have frequently used ArcInfo for making maps, visualizing data, and processing and analyzing spatial data. However, I do not have a lot of experience with spatial statistics in ArcInfo.

b. Modelbuilder/Python: Last spring, I took GEOG 562 and learned to program in Python, developing a script that used arcpy to prepare and manipulate spatial data for my final project. I felt comfortable programming in Python at that time, but I have not used Python much since the course.

c. I have frequently used R to clean and prepare data, perform simple statistical analyses (ANOVA, linear regression), and create plots. I have taken several workshops on using R for spatial analysis, but I have rarely used the R packages I learned about outside of those workshops.

d. I have used ENVI to correct, patch, and combine satellite images, and I have performed supervised classifications to create land cover maps. I have worked primarily with LANDSAT images. I have also used CLASlite (an image processing software designed for classifying tropical forest cover).

e. Covered in part d.


Wilensky, U. (1999). NetLogo. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.

Adam Bouché

A UAS and LiDAR based approach to maximizing forest aesthetics in a timber harvest

Bryan Begay

Research Question: 

Can LiDAR derived from an Unmanned Aerial System (UAS) create a point cloud driven visualization model for maximizing forest aesthetics in a highly visible timber harvest?


A variable retention thinning is planned for a harvest unit on the McDonald-Dunn Forest, in a visible area near Corvallis. UAS platforms offer an efficient way to collect LiDAR data over large areas, producing high-quality point clouds that capture the structure of a forest stand. There is a need for a methodology that uses UAS LiDAR point clouds to generate a visualization model for planning timber harvests in highly visible areas while maximizing forest aesthetics. Inputs for the model include DTMs, the Google Earth Pro viewshed tool, and point clouds. The point clouds can be manipulated to visualize an optimal silvicultural prescription that maximizes forest landscape aesthetics, with viewsheds and DTM-derived terrain serving as ancillary inputs to the visualization model.

A description of the data set you will be analyzing, including the spatial and temporal resolution and extent: 

The data set will include high-resolution LiDAR point clouds of the stand, Digital Terrain Models (DTMs) derived from LiDAR previously flown by the USFS, and ancillary data from Google Earth Pro, whose viewshed tool will be used to assess the visual impact of regions in the harvest unit. The spatial extent is an area of a few square kilometers covered by high-resolution point clouds. Temporally, data acquisition will occur before the harvest, with an assessment of the computer-based prescription after harvest; the UAS point cloud will be collected in a discrete time frame of one day. The dates of the DTM and Google Earth Pro data sets will vary, but I anticipate using recent high-resolution Google Earth imagery and high-resolution LiDAR data sets.


I hypothesize that LiDAR point clouds can be used in a visualization model to create a silvicultural prescription that maximizes forest aesthetics in a logged area. The Google Earth Pro viewshed tool, high-quality LiDAR point clouds, and a large body of literature on forest aesthetics together provide a rich set of inputs for a visualization model of timber harvests that maximizes forest aesthetics.


I would like to do some sort of analysis looking at the spatial relationship between forest aesthetics and timber harvests. A part of this analysis would look at the relationship of the spatial pattern of residual structure left from the thinning and the landscape aesthetics.

Expected outcome:

The expected outcome is a visualization model of the harvest unit that combines viewsheds and point clouds to maximize forest aesthetics in a high-viewership area.


This spatial problem is important to the profession of forestry as well as other land managers, since it helps maintain the social license for foresters to practice forestry in areas that are highly visible. Public acceptance of harvesting practices is increased when forest aesthetics is taken into account, so creating a methodology and model to assist in creating silvicultural prescriptions that increase forest aesthetics is critical for public acceptance of forestry.

Level of preparation:

A. I have experience in ArcGIS.

B. No experience in modelbuilder and Python programming in GIS.

C. Some experience in R.

D. Experience in Digital Image Processing.

E. I’ve used Google Earth Engine and very little experience with MATLAB.

How winter wetland habitat change over time affects songbird communities

A description of the research question that you are exploring.

For my research, I am exploring how the spatial pattern of differences between present (2019) and past (1995) wintering songbird community composition metrics (abundance, richness, evenness, and weighted rarity) relates to the spatial pattern of changes in landscape-level land cover variables (listed here: Landscape Variables) over the same timeframe, via the mechanisms of change over time and the importance of landscape-level variables to habitat suitability. I will be looking at data for 20 wetland wintering habitat sites in the Corvallis area.

I am interested in comparing the community composition metrics (abundance, richness, evenness, weighted rarity) of songbirds from 1995 to those found in 2019. I will then look at how the above-mentioned landscape level variables at these sites (within a 100m buffer, 500m buffer, and 1km buffer from the wetland) have changed from 1995 to 2019 at those same sites to determine if and what variable changes influence songbird community composition.
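The composition metrics themselves are straightforward to compute from per-species counts. A sketch using Shannon-based (Pielou's) evenness — one common choice; the original study may define evenness or weighted rarity differently:

```python
import numpy as np

def community_metrics(counts):
    """Richness, total abundance, and Pielou's evenness from per-species
    counts at one site (species with zero counts are ignored)."""
    counts = np.asarray([c for c in counts if c > 0], float)
    richness = len(counts)
    abundance = counts.sum()
    p = counts / abundance                     # relative abundances
    shannon = -(p * np.log(p)).sum()           # Shannon diversity H'
    evenness = shannon / np.log(richness) if richness > 1 else 0.0
    return richness, abundance, evenness
```

The 1995-versus-2019 comparison is then simply a per-site difference of these values, which can be regressed against the land cover changes within each buffer.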

Other spatial factors likely influence songbird community composition metrics but I am only concerned with those that were included in Adamus’s 2002 dissertation study.

A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

I have 1995 survey data in the form of a spreadsheet, including songbird abundance for my 20 sites recorded via point count surveys from January 4th to March 20th. I have the same information, collected by the same methods at the same sites, from January 4th to March 20th, 2019. I will use these data to calculate the species metrics listed above.

I am still in the process of gathering my spatial datasets for this project. As of now, I have open-source aerial imagery from Google Earth Engine that my advisor used to analyze the sites in 1995, and the same aerial imagery from 2019. I hope to locate LiDAR data, NDVI data, and more sophisticated ground cover data for my sites in this class. One of the reasons I enrolled in this class was to get ideas and help with obtaining these data.

Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

I predict that the greater the change in the landscape-level variables at a site from 1995 to 2019, the greater the difference in community composition metrics between those years, via the influence of those variables on habitat suitability for wintering songbirds. Additionally, I think changes in the variables that my advisor found most influential to wintering songbird community composition at these sites in 1995 will have the greatest effect on the change in species metrics from 1995 to 2019. For example, he found that wetlands with a higher percentage of open-canopy forest cover in the surrounding area were positively correlated with high abundance (Adamus, 2002), so I hypothesize that the sites that have lost the most open-canopy forest will also show the greatest decrease in abundance.

Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.

I would like to learn how to use LiDAR and/or NDVI data to determine ground cover with respect to my advisor's categories (attached). I would also like to explore methods, suitable to my data and purposes, for comparing the amount of change at my sites from 1995 to 2019.
Expected outcome: what do you want to produce — maps? statistical relationships? other?

I expect to produce a spreadsheet with quantified variable change as well as species metrics, in order to reevaluate variable importance and the statistical relationship between landscape-level variable differences and species metric differences. As an intermediate step, I will also produce a map (or maps) portraying the change at my sites from 1995 to 2019.

Significance. How is your spatial problem important to science? to resource managers?

The quality of wintering habitat is correlated with songbirds' overall survival and reproductive success the following year (Norris et al. 2004). It is important to know how these habitats have changed, as well as the consequences of those changes for songbird community metrics; this makes the work important to both science and resource managers. If we want to ensure that our environment remains healthy and balanced, with stable songbird communities, this work and work like it is necessary. It is also important to those who manage songbird populations, so they know which habitat variables to preserve and where to allocate resources.

Your level of preparation: how much experience do you have with (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) image processing, (e) other relevant software

a. I am between proficient and having a working knowledge of the basics of Arc-Info, from taking various courses and teaching the basics of ArcGIS Pro.
b. I have a working knowledge of Modelbuilder and GIS programming in Python: I took courses on both subjects and now help teach a programming-for-ArcGIS class.
c. I have a working knowledge of R: I took an R course related to species distributions and have used R in two statistics courses.
d. I am a novice in image processing, but I did take a digital terrain modeling course last term.
e. I am a novice with, but would like to learn about, software for analyzing NDVI and LiDAR data.


Adamus P. 2002. Multiscale Relationships of Wintering Birds with Riparian and Wetland Habitat in the Willamette Valley, Oregon. Oregon State University.

Norris R., Marra P., Kyser K., Sherry T., & Ratcliffe L. 2004. Tropical Winter Habitat Limits Reproductive Success on the Temperate Breeding Grounds in a Migratory Bird. Proceedings of the Royal Society B: Biological Sciences (271).

Assessing western juniper sapling re-establishment in a semiarid watershed

1. My spatial problem

  • A description of the research question that you are exploring.

Research question: How is the spatial pattern of juniper density (A) related to the spatial pattern of slope, aspect, and a combination of the two (B) via soil moisture and solar radiation?

The expansion of western juniper has become a concern in many rangeland areas and is associated with a number of ecological and hydrological impacts [1,2]. The study site is a semiarid watershed in central Oregon, established to assess ecohydrological characteristics associated with juniper expansion and removal. The majority of western juniper was removed from this watershed in 2005-2006, and juniper saplings have since re-established in the area. The objectives of this project are to 1) determine the relationship between the density of western juniper sapling re-establishment and slope and aspect in this watershed, and 2) assess changes in vegetation cover at this watershed since juniper removal.

The intent of this project is also to establish a methodology that can be expanded to include patterns of soil moisture, soil surface temperature, and soil type (once additional data are collected). While juniper density and vegetation cover are the dependent variables for the purposes of this project, in the future I will explore how certain ecohydrological characteristics vary with juniper density.

  • A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

A combination of National Agriculture Imagery Program (NAIP) imagery, unmanned aerial vehicle (UAV) imagery, and ground-based data will be used. Forty-one belt transects (3 m by 30 m), representing areas of varying aspect and slope, were surveyed in summer 2018 to assess juniper density across the entire watershed. UAV-based multispectral imagery (red, green, blue, red-edge, and near-infrared bands) was collected at a study plot in the watershed in October 2018, but needs to be expanded to represent the topography of the watershed. Resolution of the UAV imagery is approximately 2.5 cm/pixel. NAIP imagery (1 m resolution) will be used to assess density over time. A 10 m digital elevation model (DEM) will be used to model the topography of the watershed; alternatively, a higher-resolution DEM created from UAV imagery may be used if available. Prior to analysis, support vector machine (SVM) supervised classification will be used to identify juniper in the UAV and NAIP imagery and to assess overall vegetation cover in the NAIP imagery. It should be noted that while I have used SVM classification successfully with multispectral UAV imagery to assess juniper density (shown in the figure below), the accuracy of this process will need to be assessed with the NAIP data, as there is a large difference in spatial resolution (2.5 cm compared to 1 m). If I am unable to identify juniper successfully in NAIP imagery, then only UAV and ground-based data will be used to assess juniper density.
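The SVM classification step can be prototyped with scikit-learn before being applied to real pixels. This sketch trains on synthetic five-band reflectance values as a stand-in for labeled training pixels — all numbers here are invented for illustration, not real juniper spectra:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic labeled pixels: five bands (red, green, blue, red-edge, NIR).
# Label 1 = juniper, 0 = other cover. Band means are made up.
rng = np.random.default_rng(42)
juniper = rng.normal([0.05, 0.08, 0.04, 0.20, 0.45], 0.02, (200, 5))
other = rng.normal([0.15, 0.14, 0.12, 0.18, 0.25], 0.02, (200, 5))
X = np.vstack([juniper, other])
y = np.r_[np.ones(200), np.zeros(200)]

# Hold out pixels for accuracy assessment, as would be done with
# ground-truth transect data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

With real imagery, `X` would be pixel band values flattened from the raster, and the held-out accuracy would be reported separately for the 2.5 cm UAV data and the 1 m NAIP data to test whether the coarser resolution degrades classification.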


  • Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

I expect that juniper density will be positively related to north aspects and negatively related to slope angle, as these characteristics promote higher levels of soil moisture. However, these patterns may vary with both soil type and the amount of non-juniper vegetation cover. Similar to past studies [2], we anticipate that overall vegetation cover may change at this watershed in response to the removal of juniper. Changes in vegetation cover associated with juniper density may be related to juniper transpiration, soil moisture, and canopy interception.

  • Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.

Initially, I will assess the spatial pattern of slope, aspect, and their combination in the watershed. I will then assess the relationship between observed juniper density and the slope and aspect characteristics associated with the highest soil moisture. I would like to integrate the NAIP imagery, DEM, transect data, and UAV imagery for the juniper density analysis. This term, the model will focus on slope and aspect, but I would like to establish a model I can expand to assess other characteristics as well (such as soil moisture, soil temperature, and vegetation cover). Using the NAIP imagery, I would like to assess temporal changes in vegetation cover. In particular, I am interested in understanding how best to use data with different resolutions and how to expand this methodology for more complicated analyses in future research.
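Slope and aspect can be derived directly from the DEM with finite differences. A minimal sketch (my own function, assuming a north-up raster whose first array axis runs north to south; aspect follows the usual GIS convention of the downslope direction in degrees clockwise from north):

```python
import numpy as np

def slope_aspect(dem, cellsize):
    """Slope (degrees) and aspect (degrees clockwise from north, downslope
    direction) from a DEM array via finite-difference gradients."""
    dz_drow, dz_dcol = np.gradient(dem, cellsize)
    slope = np.degrees(np.arctan(np.hypot(dz_dcol, dz_drow)))
    # Rows run north -> south, so the northward elevation gradient is
    # -dz_drow; aspect points downslope.
    aspect = np.degrees(np.arctan2(-dz_dcol, dz_drow)) % 360
    return slope, aspect
```

Classifying the aspect grid into north/south-facing bins and overlaying it with classified juniper pixels would then give per-class density estimates for the slope-aspect analysis.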

  • Expected outcome: what do you want to produce — maps? statistical relationships?

I would like to create maps characterizing juniper density within the watershed and determine if there is a statistical relationship between juniper density and slope and aspect (that can be expanded as more variables are addressed). Additionally, I would like to assess changes in total vegetation cover over time in this watershed.

  • How is your spatial problem important to science? to resource managers?

The expansion of western juniper is associated with a number of ecohydrological changes, including reduced undercanopy soil moisture [3], reduced productivity [4], and increased erosion [5]. The removal of western juniper is labor-intensive, expensive, and can be difficult to implement. An improved understanding of the spatial patterns associated with juniper re-establishment after removal can be used to target treatment when appropriate and to improve our understanding of the ecohydrologic impacts of juniper expansion within these communities. This analysis can also indicate which areas are at greatest risk based on topography. Additionally, understanding changes in vegetation cover helps to determine the timing and impacts of removal. While beyond the scope of this project, this information may be used to determine how the rate of juniper expansion relates to other ecohydrological characteristics over time.

2. Your level of preparation: how much experience do you have with (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) image processing, (e) other relevant software

  • I have used ArcMap/Pro a moderate amount in a classroom setting (GIS 1&2, GIS in water resources), for independent research, and in the workplace. Outside of a classroom setting, I have largely used ArcInfo for map creation and geospatial analysis (primarily unsupervised and supervised classification).
  • I have used Modelbuilder a limited amount, and it has largely been limited to class assignments in Geog 560/561. I do not have experience with GIS programming in Python.
  • I have some experience using R, primarily for basic statistical analysis. I have no experience using R for spatial statistics.
  • I have limited experience using ENVI, but I would need to refresh the basic procedures. I do not have experience with MATLAB.
  • I have some experience with Google Earth Engine, although I would need to review the basics to be effective.


  1. Coultrap, D.E.; Fulgham, K.O.; Lancaster, D.L.; Gustafson, J.; Lile, D.F.; George, M.R. Relationships between western juniper (Juniperus occidentalis) and understory vegetation. Invasive Plant Sci. Manag. 2008, 1, 3–11.
  2. Dittel, J.W.; Sanchez, D.; Ellsworth, L.M.; Morozumi, C.N.; Mata-Gonzalez, R. Vegetation response to juniper reduction and grazing exclusion in sagebrush-steppe habitat in eastern Oregon. Rangel. Ecol. Manag. 2018, 71, 213–219.
  3. Lebron, I.; Madsen, M.D.; Chandler, D.G.; Robinson, D.A.; Wendroth, O.; Belnap, J. Ecohydrological controls on soil moisture and hydraulic conductivity within a pinyon-juniper woodland. Water Resour. Res. 2007, 43, W08422.
  4. Miller, R.F.; Svejcar, T.J.; Rose, J.A. Impacts of western juniper on plant community composition and structure. J. Range Manag. 2000, 53, 574–585.
  5. Reid, K.; Wilcox, B.; Breshears, D.; MacDonald, L. Runoff and erosion in a pinon-juniper woodland: Influence of vegetation patches. Soil Sci. Soc. Am. J. 1999, 63, 1869–1879.

Cross-sectional Change in the Andrews Forest

Background and Research Questions

This project explores stream bed mobility in the HJ Andrews Experimental Forest in relationship to peak discharge events and channel geometry. The HJ Andrews Experimental Forest is a Long Term Ecological Research (LTER) site located east of Eugene on the western slope of the Cascade Range. The site has been managed since the 1950s for ecological and forestry research, and the largest stream in the forest has been gauged since 1950. From 1978 to 2011, researchers conducted repeated cross-sectional surveys on five reaches in three different creeks in the forest. An analysis of these cross-sectional profiles will help researchers and managers gain a better understanding of how, when, and where stream beds respond to extreme hydrologic events.

Water flowing through streams exerts stresses on the bed material. Whether or not these stresses have the capacity to mobilize sediment and change the shape of a stream bed depends on the amount of water moving through the stream along with other factors. As with many geomorphological processes, bed sediment transport is dominated by movement during extreme events. Although cross sectional changes overall appear to be strongly related to the magnitude of greatest flow between measurements, I am interested in investigating confounding factors including the extent of temporal and spatial autocorrelation in the data set.

I would like to explore (a) whether and (b) in what manner aggradation or erosion in one cross section might be related to changes in adjacent cross sections. I would also like to know whether aggradation or erosion in one cross section in one year is related to changes in the same cross section in an adjacent year.


I am analyzing cross-sectional measurements of five reaches within the Andrews Forest, which range from 12 m to 55 m in horizontal extent and 1.2 m to 7.3 m in vertical extent. The cross sections are variously spaced along the stream, and they were surveyed every one to five years over the 30-year period between 1978 and 2011. The vertical precision of the data set is roughly 1 cm (though the data are unlikely to be accurate to 1 cm), and the horizontal precision varies from 1 cm to several decimeters.

The cross-sectional change between two pairs of adjacent survey years at one cross section is shown below. Areas of erosion are shown in red and areas of aggradation in green.
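Change of the kind shown in the figure can be quantified as signed areas between repeat surveys. A minimal sketch (my own naming, assuming both surveys are sampled at the same stations; real resurveys at different stations could first be resampled with `np.interp`):

```python
import numpy as np

def _trapezoid(y, x):
    """Trapezoidal integral of y over stations x."""
    y = np.asarray(y, float)
    x = np.asarray(x, float)
    return float(((y[1:] + y[:-1]) / 2 * np.diff(x)).sum())

def cross_section_change(x, z_before, z_after):
    """Aggradation (+) and erosion (-) areas between two surveys of one
    cross section, integrating the bed elevation difference."""
    dz = np.asarray(z_after, float) - np.asarray(z_before, float)
    aggradation = _trapezoid(np.clip(dz, 0, None), x)
    erosion = _trapezoid(np.clip(dz, None, 0), x)
    return aggradation, erosion
```

Keeping aggradation and erosion separate (rather than only their net sum) preserves the distinction between the red and green areas in the figure, which net change would cancel out.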

Hypothesis and Approach

I predict that there will be a relationship between changes at one cross section during one year and the same cross section at another year. Portions of the stream bed that have been recently scoured or contain newly emplaced sediment may be less armored than undisturbed portions of the stream bed. These less armored portions of the stream bed may be more susceptible to future disturbance. Alternately, changes at a cross section may represent longer-term processes related to channel geometry: e.g. a series of cross sections could show continued incision of a cut bank over multiple years.

I do not expect to see a detectable relationship between changes at adjacent cross sections because I expect that the most important geographic controls on channel change are either smaller (e.g. pools) or much larger (e.g. along-stream variation in discharge) than the distance between cross sections.

I want to use this class as an opportunity to explore statistical relationships, but I don’t know yet what kinds of analyses are best suited to this problem. I’d like to learn more in general about how to handle spatial autocorrelation, and when it is and isn’t a statistical issue.
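As a simple starting point for the temporal side of that question, the lag-1 autocorrelation of net change at a single cross section is easy to compute. This sketch ignores the uneven one-to-five-year survey spacing, which a real analysis would need to account for:

```python
import numpy as np

def lag1_autocorrelation(series):
    """Lag-1 autocorrelation of a sequence, e.g. net cross-sectional
    change at one section across successive surveys."""
    s = np.asarray(series, float)
    s = s - s.mean()
    return float((s[:-1] * s[1:]).sum() / (s ** 2).sum())
```

A value near zero would support treating successive surveys as independent; a strongly positive or negative value would argue for explicitly modeling the temporal dependence before relating change to peak flows.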

This project could be scientifically useful for improving our understanding of sediment transport in Pacific Northwest mountain streams. Resource managers may also have an interest in sediment transport because it relates to stream channel mobility (“How much can we depend on this creek staying in the same place?”) and ecology (“How vulnerable is stream life to disruption via bed transport?”). From a resource management perspective, it is becoming increasingly useful to study peak flow events because downscaled climate models for our region predict increased frequencies of large storms.


I feel good about my experience with spatial technology, and I’m most interested in learning about how to use that technology to answer statistical questions. I am highly proficient with Arc software. I have TAed one undergraduate class and independently taught another short undergraduate class in ArcGIS. I have a working knowledge of ArcPy, but I still need to use references regularly to write code. I conducted some undergraduate remote sensing research using ArcPy and used ArcPy for research at a government science agency. I work extensively in R. I’ve done image processing using Arc software, Python, ENVI, ImageJ, and raster tools in R, but it’s a very broad field, and I definitely think I could learn more.


Aerial remote sensing detection of Leptosphaeria spp. on Turnip


Among the many pathogens disrupting healthy growth of brassica species in the Willamette Valley is the fungal pathogen commonly referred to as blackleg. It has been reported to nearly wipe out canola production in Europe, Australia, and Canada, and in more recent years has devastated crops in the United States (West et al., 2001). In 2013 the pathogen was reported in the Pacific Northwest for the first time since the 1970s (Agostini et al., 2013) and has since been reported in the Willamette Valley (Claassen, 2016). There are two known species of blackleg, Leptosphaeria maculans and Leptosphaeria biglobosa. These are not to be mistaken for the potato bacterial pathogen Pectobacterium atrosepticum, which is also referred to as blackleg. While much of the crop failure in canola has been associated with Leptosphaeria maculans, both species are found in the valley and seem to be of similar consequence to turnip.

Classification of Cercospora leaf spot, for instance, has been accomplished with a support vector machine model, but that work relied on a hyperspectral camera with high spectral and spatial resolution (Rumpf et al., 2010). Because plant diseases can often be difficult to see even with the naked eye, researchers have struggled to detect specific plant diseases as spatial resolution decreases. While this analysis focuses solely on detection at 1.5 meters, detection of blackleg may remain possible despite the lower spatial resolution that results from higher flying elevations.

Here we consider how spatial patterns of diseased leaves, based on spectral signatures, are related to ground truth disease ratings of turnip leaves. With the application of a support vector machine model, classification of diseased versus non-diseased tissue is expected to yield a predictive model. This model will be used to determine whether individual diseased and non-diseased turnip leaves are accurately categorized relative to the ground truth.



This project will use a data set derived from roughly 500 turnip leaves, harvested here in the valley from February to April of 2019. Roughly 200 of these leaves will be used to train the model; the remaining leaves will be used to assess the model's accuracy against ground truth. The spatial resolution is less than 1 mm, and the images are rasters of single turnip leaves with five bands (blue, green, red, red-edge, NIR). I do not anticipate using every band for the analysis and will likely apply some vegetation index (VI) as a variable. The camera being used is a MicaSense RedEdge, which has 12-bit radiometric resolution.



I hypothesize that spatial patterns of pixels based on image classification are related to manually classified spatial patterns of observed disease on turnip leaves because disease has a characteristic spectral signature of infection on the leaves.



To accomplish this analysis, I will be using ArcGIS Pro, where I have quite a bit of experience, though not with this particular subject or type of analysis. The workflow will begin with image processing, where I have little experience but do not require expertise. I plan to conduct the image processing in Pix4D, starting with image calibration based on the reflectance panel in each image, followed by cropping each image down to just the leaf under assessment. From there, some smoothing and contrast enhancement may be applied, but this is still undetermined.

Images will then be brought into ArcGIS Pro for spatial analysis. For exercise 1, I intend to compare the spatial pattern of manually classified disease against an unsupervised segmentation of the leaves. For exercise 2, I plan to use this information in a spatial regression to improve image-based classification. For exercise 3, I intend to use the support vector machine wizard to train a model. This involves highlighting regions of diseased tissue and regions of non-diseased tissue until a sufficient number of pixels is reached to create the support vectors. The x- and y-axes for the model are yet to be determined but will likely be NIR and red-edge digital number values; an alternative is to use a different VI, such as NDVI, as an explanatory variable.

Turnip images that were never used for training will then be used to assess the support vector machine's ability to classify diseased and non-diseased regions, and then whole leaves. I anticipate that every leaf will have at least a few pixels that classify as diseased, so I will set a threshold for the maximum number of diseased pixels an image can contain while still being classified as non-diseased. I may also require a certain number of pixels to border one another in order to qualify as a diseased region. The methodology may require some troubleshooting, but the expectations are clear and the methods to reach that outcome are mostly drawn out.
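As a rough illustration of the pixel-level classification and leaf-level thresholding steps (sketched in scikit-learn rather than the ArcGIS Pro wizard), here is a minimal example; the band values, class separation, and 5% leaf-level threshold are all hypothetical, not measurements from my data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic training pixels: [NIR, red-edge] values where healthy tissue
# has higher NIR than diseased tissue (hypothetical numbers, for illustration).
healthy = rng.normal(loc=[0.7, 0.4], scale=0.05, size=(200, 2))
diseased = rng.normal(loc=[0.4, 0.5], scale=0.05, size=(200, 2))
X = np.vstack([healthy, diseased])
y = np.array([0] * 200 + [1] * 200)  # 0 = non-diseased, 1 = diseased

clf = SVC(kernel="rbf").fit(X, y)

# Classify the pixels of a new leaf, then apply a leaf-level threshold:
# the leaf is called diseased only if more than 5% of its pixels classify
# as diseased (the 5% cutoff is an arbitrary placeholder).
leaf_pixels = rng.normal(loc=[0.68, 0.41], scale=0.06, size=(500, 2))
frac_diseased = clf.predict(leaf_pixels).mean()
leaf_is_diseased = frac_diseased > 0.05
```

A contiguity requirement (a minimum number of adjacent diseased pixels) could be added on top of this by running a connected-components pass over the classified raster before applying the threshold.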


Expected Outcome

I expect the model to achieve very high accuracy once it is fine-tuned, based on the contrast in spectral signatures I expect to see between diseased and non-diseased leaves. Below I have outlined the three outcomes I would ultimately like to achieve. Due to time restrictions, the scope of my research is limited to outcomes 1 and 2.

  1. Train a support vector machine model for classification of pixels in turnip leaves as either diseased or non-diseased.
  2. Accurately apply the SVM model to turnip leaves from many geographical locations in the valley, with different levels of disease severity and at different times of the year.
  3. Scale up from 1.5 meters and test the ability of the model to maintain accurate classification of blackleg on turnip.



I intend to publish this work and further the collective knowledge in aerial remote sensing, specifically for those working in agronomy and plant pathology. This is very applied science and will serve as a resource for the agriculture industry.

Traditionally, detection of this pathogen has depended on a reliable field scout who may need to cover fifty acres or more looking for signs or symptoms of the disease. Today, precision agriculture has introduced drones to perform unbiased field scouting for the grower, which saves time and can be very reliable if done properly. An important aspect of disease control is early detection. If early detection can be accomplished, growers have time to respond accordingly. This may allow for earlier sprays with lighter application rates or less controlled substances, cultural control of nearby fields, etc., in order to stop the spread of disease.


Works cited

Agostini, A., Johnson, D. A., Hulbert, S., Demoz, B., Fernando, W. G. D., & Paulitz, T. (2013). First report of blackleg caused by Leptosphaeria maculans on canola in Idaho. Plant disease, 97(6), 842-842.

Claassen, B. J. (2016). Investigations of Black Leg and Light Leaf Spot on Brassicaceae Hosts in Oregon.

Rumpf, T., Mahlein, A. K., Steiner, U., Oerke, E. C., Dehne, H. W., & Plümer, L. (2010). Early detection and classification of plant diseases with Support Vector Machines based on hyperspectral reflectance. Computers and Electronics in Agriculture, 74(1), 91-99.

West, J. S., Kharbanda, P. D., Barbetti, M. J., & Fitt, B. D. (2001). Epidemiology and management of Leptosphaeria maculans (phoma stem canker) on oilseed rape in Australia, Canada and Europe. Plant pathology, 50(1), 10-27.

Courtney’s Spatial Problem

In the formula “How is the spatial pattern of A related to the spatial pattern of B via mechanism C?”, my research question for this class is made of the following parts:

  • A: ion and isotope concentrations in wells tapping the basalt aquifer
  • B: mapped faults
  • C: groundwater flow as determined by hydraulic conductivity of the geologic setting

This all adds up into:

“How is the spatial pattern of ion and isotope concentrations in wells tapping the basalt aquifer related to the spatial pattern of mapped faults via the mechanism of groundwater flow as determined by hydraulic conductivity of the geologic setting?”

This question might have thrown you, the reader, into the figurative deep end! For context, I'm studying the basalt aquifer in the Oregon portion of the Walla Walla Basin, located in northeastern Oregon, about thirty minutes northeast of Pendleton.

map showing the state of Oregon, with an inset map showing the study area

Figure 1: Walla Walla Sub-Basin Location

The wells that I am studying are drilled into the three most extensive formations of the Columbia River Basalts present in my study area: the shallow Saddle Mountains Basalt, the intermediate Wanapum Basalt, and the deepest Grande Ronde Basalt. The wells and faults that I am studying lie predominantly in the transition area between the "lowland" areas, where the basalts are covered by layers of sediment deposited in the ~15 million years since the basalts were emplaced, and the upland areas, where the basalt is exposed at the surface.

geologic map showing formations, folds, and faults in the entire Walla Walla Basin, highlighting my study area

Figure 2: Geologic map of the study area

Wells in this area are between 300 and 1,200 feet deep, and primarily serve irrigation and municipal uses. Over the past 50 years there has been a noticeable downward trend in groundwater elevations in many of the wells. My research is part of a larger Oregon Water Resources Department project that seeks to better understand this groundwater system, faults and all. Faults add an element of the unknown to models of groundwater flow unless they are specifically studied, as they can form either barriers or pathways for groundwater flow depending on their structures and characteristics. Faults with low hydraulic conductivity can block or significantly slow groundwater flow, while faults with higher hydraulic conductivity allow water to flow through them more easily. My research uses ion chemistry and isotope concentrations to characterize the path that groundwater has taken through the subsurface to the well.


A: In my research I have analytical data for 32 wells, whose XY locations were determined by field confirmation of my collaborators' well log database. As groundwater is a 3D system, I also have to consider Z values. The well depths and lithology information are also from my collaborators' database, and are based on information of varying quality recorded at the time each well was drilled. My analytical data provide a snapshot of water chemistry during the summer of 2018; I have only one temporal data point per well. At all 32 wells, I collected samples to be analyzed for pH, temperature, specific conductivity, oxygen isotopes 16 and 18, and hydrogen isotopes 1 and 2. At a subset of 18 of those wells I collected additional samples for tritium, carbon-14, and major ion analysis.

B: The shapefile of faults mapped at the surface was created by Madin and Geitgey of the USGS in their 2007 publication on the geology of the Umatilla Basin. There is some uncertainty in my analysis as far as extending this surface information into the subsurface. USGS studies have constrained proposed ranges of dip angles for the families of faults that I am studying, but not exact angles for any single mapped fault.

Figure 3: Locations of sampled wells, shown with mapped fault locations.


Where faults act as barriers, I hypothesize that parameter values will differ in groups on either side of a fault. Specifically, a barrier fault might cause older, warmer water to rise into upper aquifer layers, and the downstream well might show a signature of more local recharge.

Where faults act as conduits, I hypothesize that water chemistry and isotopes of samples from wells on either side of the fault would indicate a relatively direct flowpath from the upstream well to a downstream well. Over a short distance, this means that ion and isotope concentrations would not differ significantly in wells across the fault.


I would like to use principal component analysis to identify grouping trends among the samples, and then map the results. A bivariate comparison of wells on either side of a fault could also be informative. Additionally, I would like to find some way to bring distance from a fault into the model.
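As a rough sketch of the PCA step, here is a minimal example using scikit-learn on a made-up ion-chemistry matrix; the analytes, concentrations, and dimensions are hypothetical placeholders, not my field data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical ion-chemistry matrix: 18 wells x 4 analytes
# (e.g. Na, Cl, HCO3, SO4 in mg/L; all values are invented).
X = rng.normal(loc=[40, 15, 150, 20], scale=[10, 5, 30, 8], size=(18, 4))

# Standardize first so analytes with large absolute concentrations
# (like HCO3) don't dominate the components.
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
scores = pca.fit_transform(X_std)  # one (PC1, PC2) point per well

# The scores can then be joined back to well coordinates and mapped,
# e.g. coloring wells by PC1 to look for grouping across a fault.
print(pca.explained_variance_ratio_)
```

Mapping PC1 scores at the well locations, symbolized against the fault lines, would make it easy to eyeball whether chemistry groups split across a fault before any formal test.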

Expected outcome: My output would be a mixture of statistical relationships and maps of those relationships.

Significance.  How is your spatial problem important to science? to resource managers?

The Walla Walla Subbasin's basalt aquifers have recently been deemed over-allocated by the Oregon Water Resources Department (OWRD), and water managers are looking for methods to better regulate the aquifer when wells run dry. However, the faults are a big unknown when considering how to enforce the prior appropriation doctrine, under which junior permit holders are regulated off in times of water shortage. If a junior and a senior water permit holder have wells that are separated by a fault, is it likely that stopping the junior permittee's water use would actually result in more water available to the senior permit holder?

My approach is not novel in my scientific field. Several studies have evaluated similar parameters elsewhere in the Columbia River Basalts and also used statistical methods, but have not focused specifically on faults.

Your level of preparation: how much experience do you have with (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) image processing, (e) other relevant software

Arc-GIS: Highly proficient in Desktop and Pro

R – novice, can copy/paste/edit code to suit my basic needs

Python – novice, took a class two years ago but have forgotten much of it

Image processing – working knowledge of ENVI from GEOG 580, and of Gravit from GEOG 572

Other spatiotemporal analysis – I haven’t really worked with software besides ArcGIS and QGIS.



Individual foraging specialisations of gray whales along the Oregon coast through prey preferences

Research Question

The Pacific Coast Feeding Group (PCFG) is a subgroup of the Eastern North Pacific (ENP) population of gray whales (Scordino et al. 2018). The ENP, a population of 24,000-26,000 individuals, migrates along the U.S. west coast from breeding grounds in the lagoons of Baja California, Mexico to feeding grounds in the Bering Sea (Omura 1988). The PCFG, currently estimated at 264 individuals (Calambokidis et al. 2017), strays from this norm and does not complete the full migration, instead spending the summer feeding months along the Pacific Northwest coast (Scordino et al. 2011). Since gray whales as a species already exhibit specialization (being the only baleen whale that forages benthically), and since the PCFG display a second tier of specialization by not using the Bering Sea feeding grounds, it seems plausible that individuals within the PCFG might have individual foraging specializations and preferences. Therefore, my research aims to investigate whether individual gray whales in Port Orford exhibit individual foraging specializations. Individual foraging specializations can occur along a number of dimensions, including habitat type (rocky reef vs. sand/soft sediment), distance to kelp, time and distance of foraging bouts, and prey type and density. For this class, my research question is whether prey species drives the amount of time a foraging whale will spend at a specific foraging location.



Prey data

The prey data has been obtained through zooplankton tow net samples from a research kayak in the summers of 2016, 2017 and 2018. Kayak sampling effort varies widely between the three years due to weather creating unsafe conditions for the team to collect samples. These samples have been sorted and enumerated to the zooplankton species level so that for each day when a prey sample was collected, we have known absolute abundances of prey species at each sampling station.

Whale data

The whale data is in the form of theodolite tracklines of gray whales that used the Port Orford study area during the summers of 2016, 2017 and 2018. Since whale tracking occurs at the same sites as prey sampling, we are able to map the prey community present at a particular location that whales forage at. The tracklines occur on a very fine spatial resolution as the study area is approximately 2.5 km in diameter, though some of the tracklines extend out to approximately 8 km offshore. Furthermore, as whales forage in the area, photographs are taken of each individual in order to match the trackline with a particular individual. This way, potential individual specializations may be detected if there are repeat tracklines of an individual.



There will be differences in the time spent by individual gray whales foraging in areas with different prey communities. However, these differences will likely not be constant/stable over time. Most likely, foraging will be largely driven by availability of prey, and therefore individual whales will be rather flexible in their foraging strategies.



Individual patterns in time and space use within the Port Orford study area will be assessed through the identification of foraging bouts. Theodolite tracks from 2016-2018 longer than one hour will be included in this analysis. Using ArcGIS, tracklines will be clipped so that only foraging points within the study sites are included in this analysis. Radii will be created around each of the 12 zooplankton sampling locations to identify overlap between whale foraging and known prey communities at sampling stations. Time spent within these radii will be calculated. Statistical analyses (likely GAMs) will be run in order to identify whether individuals spend more and/or less time at a foraging location based on the prey species that are present at that location.
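The time-within-radius step described above might be sketched as follows; the coordinates, timestamps, station locations, and 250 m buffer radius are all hypothetical, and a real implementation would use the projected theodolite fixes and the actual 12 sampling stations:

```python
import numpy as np

# Hypothetical trackline: timestamped positions (seconds, x m, y m) for one whale.
track = np.array([
    [0,    100.0, 200.0],
    [300,  120.0, 210.0],
    [600,  400.0, 450.0],
    [900,  410.0, 460.0],
])

# Hypothetical zooplankton sampling stations and an assumed 250 m buffer radius.
stations = np.array([[110.0, 205.0], [405.0, 455.0]])
radius = 250.0

# Sum the duration of each track segment whose starting fix falls inside a
# station's buffer (a crude but transparent time-in-radius estimate).
time_in_radius = np.zeros(len(stations))
for (t0, x0, y0), (t1, _, _) in zip(track[:-1], track[1:]):
    dists = np.hypot(stations[:, 0] - x0, stations[:, 1] - y0)
    inside = dists <= radius
    time_in_radius[inside] += t1 - t0

print(time_in_radius)  # seconds spent near each station
```

Per-station times computed this way, joined to the prey abundances at each station and to the photo-identified individual, would form the response and predictors for the GAMs.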


Expected Outcomes

This analysis will likely result in plots of time spent at foraging locations against abundance of different prey species, based on the results of the statistical analyses. This will allow comparison of whether individual whales have preferences for different prey species. Maps might also be very useful for visualizing the movements of whales, though I am unsure how the time element of foraging bouts could be incorporated (perhaps with color shading?).



This spatial problem is important to science since genetic evidence suggests that there are significant differences in mtDNA between the ENP and the PCFG (Frasier et al. 2011; Lang et al. 2014), and it has therefore been recommended that the PCFG be recognized as demographically independent. In the face of a proposed resumption of the Makah gray whale hunt, as well as increased anthropogenic coastal use, there is a strong need to better understand the distribution and foraging ecology of the PCFG. This subgroup has important economic value to many coastal PNW towns, as many tourists are interested in seeing the gray whales. Therefore, understanding what drives their distribution and foraging habits will allow us to properly manage the areas where they prefer to forage.



I have novice/working knowledge of Arc-Info and Modelbuilder. I have never used Python before. I have working knowledge of R and am proficient in image processing.


Literature Cited

Calambokidis, J. C., Laake, J. L. and A. Pérez. 2017. Updated analysis of abundance and population structure of seasonal gray whales in the Pacific Northwest, 1996-2015. Draft Document for EIS.

Frasier, T. R., Koroscil, S. M., White, B. N. and J. D. Darling. 2011. Assessment of population substructure in relation to summer feeding ground use in the eastern North Pacific gray whale. Endangered Species Research 14:39-48.

Lang, A. R., Calambokidis, J. C., Scordino, J., Pease, V. L., Klimek, A., Burkanov, V. N., Gearin, P., Litovka, D. I., Robertson, K. M., Mate, B. R., Jacobsen, J. K. and B. L. Taylor. 2014. Assessment of genetic structure among eastern North Pacific gray whales on their feeding grounds. Marine Mammal Science 30(4):1473-1493.

Omura, H. 1988. Distribution and migration of the Western Pacific stock of the gray whale. The Scientific Reports of the Whales Research Institute 39:1-9.

Scordino, J., Bickham, J., Brandon, J. and A. Akmajian. 2011. What is the PCFG? A review of available information. Paper SC/63/AWMP1 submitted to the International Whaling Commission Scientific Committee.

Scordino, J., Weller, D., Reeves, R., Burnham, R., Allyn, L., Goddard-Codding, C., Brandon, J., Willoughby, A., Lui, A., Lang, A., Mate, B., Akmajian, A., Szaniszlo, W. and L. Irvine. 2018. Report of gray whale implementation review coordination call on 5 December 2018.

Grant Z’s Spatial Problem

My spatial problem is about land use/land cover (LULC) change associated with the establishment of artisanal, small-scale gold mines (ASGM) in rural Senegal. Put in the vernacular we’ve learned in the course, I’d phrase my question this way:

How is the spatial pattern of LULC change related to the spatial pattern of ASGM establishment via the mechanism of deforestation?

As part of their establishment, ASGM sites require clearing the land where the mines will be, as well as additional timber harvesting to build the homes where the miners will live while working and to bolster the mine shafts themselves. As such, I'm curious what exact change accompanies ASGM establishment. This is a subset of my graduate thesis, which seeks to understand more broadly how ASGM impacts the environment and the lives of the miners themselves, and ultimately whether a household that diversifies into ASGM is better positioned to adapt to future climate change than one that hasn't.

The dataset I have is very high resolution (VHR) satellite imagery (panchromatic and multispectral) courtesy of the Digital Globe Foundation. The panchromatic imagery has a resolution of 0.3 m while the MS imagery is around 1.24 m; as part of the preprocessing I've pansharpened the images, so the overall imagery is still sub-meter, which is necessary to investigate ASGM because its footprint is too small for detection with Landsat or other sensors. The imagery covers about 16 gold mines I've identified, with scenes from 2018 and 2009/2010.

My present hypothesis is that, while there will obviously be a loss of vegetation cover at the mine itself, the area around the mine will also show some change; in my literature review, I've found evidence that the environment up to 20 km around a gold mine can be impacted. I think in this case the impact won't be as drastic, but it's what I'm expecting.

Currently I’m not sure how to approach the problem. The first step will be to map out the mining locations in ArcGIS Pro and from there try to see how the forest cover has changed between past and present. Beyond that I’m not sure how to answer the question.
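One common first pass at the forest-cover question is differencing a vegetation index between the two dates. Below is a minimal sketch using synthetic red/NIR rasters; the reflectance values and the -0.2 change threshold are made up and would need calibration against the actual imagery (and careful handling of the seasonal differences noted below):

```python
import numpy as np

def ndvi(red, nir):
    """NDVI = (NIR - red) / (NIR + red), guarding against division by zero."""
    red = red.astype(float)
    nir = nir.astype(float)
    return (nir - red) / np.maximum(nir + red, 1e-9)

# Hypothetical red/NIR reflectance rasters for the same area at two dates.
rng = np.random.default_rng(1)
red_2009, nir_2009 = rng.uniform(0.05, 0.2, (50, 50)), rng.uniform(0.4, 0.6, (50, 50))
red_2018, nir_2018 = rng.uniform(0.1, 0.3, (50, 50)), rng.uniform(0.2, 0.5, (50, 50))

change = ndvi(red_2018, nir_2018) - ndvi(red_2009, nir_2009)

# Flag pixels with a large NDVI drop as candidate clearing
# (the -0.2 threshold is arbitrary and would need calibration).
cleared = change < -0.2
print(cleared.mean())  # fraction of pixels flagged as candidate clearing
```

Summing flagged pixels within buffers around each mapped mine would then give a per-mine estimate of cleared area.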

Ultimately I’d like to produce maps, to demonstrate the environmental impact that ASGM has (or not, potentially!). I’ve also considered producing similar maps showing the relative impact subsistence agriculture has had in non-mining villages as a comparison.

I feel somewhat prepared for this task. I’m comfortable working with ArcGIS and have some knowledge of ArcPy (though I’m a bit rusty). I can use ModelBuilder pretty competently and feel that I have a good grasp of what to do in that arena. My major hurdle right now is not knowing what I don’t know — I need some more exposure to the tools available to me for addressing this problem. However, I feel confident that with enough time and work, this problem is not intractable!

One problem is that, given the rural area and part of the world I’m looking at, Digital Globe does not frequently sample the area and so the imagery I have is not necessarily on the anniversary date. Given the drastic climate differences (rainy season and dry season), the vegetation may look drastically different.

As an example, below is Kharakhena in November 2008, with the original village highlighted in red.

Here is the village of Kharakhena in March 2017. The village is highlighted in yellow and the mine in blue.

The difference is very dramatic, but I’m not sure how to approach this analytically.

This is my spatial problem! I believe that this is significant as a part of my thesis work exploring livelihood diversification in rural Kedougou. As 70% of the population in the region lives below the poverty line, and most people engage in subsistence farming as their primary livelihood, I am interested in how ASGM operates as a livelihood diversification option. Specifically, I’m interested in assessing whether it is an “erosive” coping strategy: one which households undertake out of lack of other options and one which may diminish the household’s overall capacity to react to future stresses and shocks. I’m assessing this through an evaluation of the “Three Capitals” (sometimes five) which constitute a household’s assets. The three capitals are Economic, Environmental, and Social. I’m assessing the impact of ASGM on miners’ Social and Economic capital through interviews which I’ve already conducted, and I’m assessing the impact of ASGM on miners’ Environmental capital through remote sensing. Together, I hope to paint a holistic picture of ASGM’s impact on miners’ livelihood capitals in the region in order to better understand if it is indeed an “erosive” coping strategy, which, if it is, needs to be known in order to help miners and farmers find different ways to adapt to future climate change.

Spatial pattern of invasion

  1.  A description of the research question that you are exploring.

I am interested in how the spatial pattern of invasion by the recently introduced annual grass, ventenata, is influenced by the spatial pattern of suitable habitat patches (scablands) via the susceptibility of these habitat patches to invasion and ventenata’s invasion potential.  Habitat invasibility is determined by the environmental characteristics, community composition, and spatial arrangement of suitable habitat patches, while ventenata’s invasiveness is influenced by its dispersal ability and fecundity.

I am also interested in understanding how spatial autocorrelation influences relationships between ventenata, plant community composition, and environmental variables within and across my sample units.

  1. A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

I collected spatial data (coordinates and environmental variables) and plant species abundance data (including ventenata) for 110 plots within and surrounding seven burn perimeters across the Blue Mountain Ecoregion of eastern Oregon (Fig. 1). Target areas were located to capture a range of ventenata cover from 0% to over 90%, across a range of plant community types and environmental variables including aspect, slope, and canopy cover, within and just outside recently burned areas. Once a target area was identified, plot centers were randomly located using a random azimuth and a random number of paces between 5 and 100 from the target area. Sample plots were restricted to public lands within 1600 m of the nearest road to aid plot access. Environmental data for sample plots include canopy cover, soil variables (depth, pH, carbon content, texture, color, and phosphorus content), rock cover, average yearly precipitation, elevation, slope, aspect, litter cover, and percent bare ground cover. I plan to identify and calculate the spatial pattern of habitat patches using Simpson potential vegetation type raster data (Simpson 2013) at 30 m resolution in ArcGIS, which was developed to identify potential vegetation types across the Blue Mountain Ecoregion.

  1. Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

Ventenata appears to readily invade unforested, rocky scabland patches, but is less prominent in surrounding forested areas. These patches may act as “hotspots” of invasion from which ventenata can spread into surrounding, less ideal habitat types. The “biodiversity-invasion hypothesis” (Elton 1958) posits that more biodiverse areas will be less susceptible to invasion, but propagule pressure hypotheses suggest that areas close to heavily invaded areas will be more likely to be invaded (Colautti et al. 2006). If environmental factors such as biodiversity influence invasion success, I would expect diverse habitats to have higher resistance to the ventenata invasion and be less invaded; but if propagule pressure is a stronger driver of the invasion, diversity may be trumped by proximity to invaded patches, which may increase these patches’ risk of invasion regardless of species composition.

  1. Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.

I would like to develop an overall understanding of the spatial pattern of invasion by performing Moran’s I to test for spatial autocorrelation. Additionally, I would like to identify hotspots of invasion using hotspot analysis. From these hotspots, I am interested in predicting spread using kernel density functions that estimate ventenata distribution. Finally I would like to relate the spatial pattern of vententata to environmental characteristics of habitat patches (including species composition) through the use of cross-correlation and geographically weighted regression.
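The Moran's I step can be sketched as a small self-contained computation with inverse-distance weights; the plot coordinates and cover values below are made up, and a real analysis would use a dedicated package (e.g. spdep in R or esda in Python) with a properly chosen weights scheme:

```python
import numpy as np

def morans_i(values, coords):
    """Global Moran's I with inverse-distance weights (zero diagonal)."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = len(values)
    # Pairwise distances, then inverse-distance weights with a zero diagonal.
    d = np.hypot(coords[:, None, 0] - coords[None, :, 0],
                 coords[:, None, 1] - coords[None, :, 1])
    with np.errstate(divide="ignore"):
        w = np.where(d > 0, 1.0 / d, 0.0)
    z = values - values.mean()
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

# Hypothetical plots: ventenata cover clustered in one corner of a 5x5 grid.
coords = [(x, y) for x in range(5) for y in range(5)]
cover = [90 if (x < 2 and y < 2) else 5 for x, y in coords]
print(morans_i(cover, coords))  # positive -> spatial clustering of cover
```

A significance test would compare the observed statistic against values from random permutations of the cover vector over the same coordinates.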

  1. Expected outcome: what do you want to produce — maps? statistical relationships? other?

I would like to produce figures that represent multivariate statistical relationships between ventenata, patch size, location, and environmental/ community variables. I am also interested in creating a map depicting areas at highest risk of invasion based on spatial and environmental data if appropriate considering the statistical relationships that result from the analysis.

  1. How is your spatial problem important to science? to resource managers?

Ventenata is rapidly invading natural and agricultural areas throughout the inland Northwest, where associated ecological and economic losses are readily becoming evident. However, possibly the most concerning aspect of the invasion is ventenata’s potential to increase fire intensity and frequency in invaded scabland patches that, prior to invasion, supported light fuel loads and acted as natural fire breaks for the surrounding forest. Such shifts to the fire regime could dramatically alter landscape-scale biodiversity and cause additional socioeconomic losses. Despite these concerns, little is known of the drivers influencing ventenata’s invasion potential and few management options exist. Understanding how the spatial arrangement and size of scabland patches influence susceptibility to invasion by ventenata could help managers target areas at the highest risk for invasion and mitigate losses.

  1. Your level of preparation: how much experience do you have with (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) image processing, (e) other relevant software

I have a working knowledge of programming in R and manipulating spatial data/ map making in ArcGIS. I have taken introductory level classes in R and ArcGIS, and used these tools for work and for my research. I have no experience in modelbuilder, GIS programming in python, and image processing. I am eager to learn how to use these tools and apply them to help answer ecological research questions.


Colautti, R. I., Grigorovich, I. A., & MacIsaac, H. J. (2006). Propagule pressure: a null model for biological invasions. Biological Invasions, 8(5), 1023-1037.

Elton, C. S. (1958). The ecology of invasions by animals and plants. Methuen and Co., London.

Simpson, M. 2013. Developer of the forest vegetation zone map. Ecologist, Central Oregon Area Ecology and Forest Health Program. USDA Forest Service, Pacific Northwest Region, Bend, Oregon, USA


Using Spatial Statistics to Determine the Subsurface Spatial Distribution of Lava Flows in Northern California

Research Question

I am trying to determine potential fluid flow paths based on the spatial distribution of lava flows in the Medicine Lake, Lassen Peak, and Big Valley Mountain area of Northern California. The final goal of this project is a 3-D subsurface framework of the geology from which we can model fluid and heat flow.

Thus we want to know: "How is the spatial pattern of the depth-to-lava-flow attributes of wells related to the spatial pattern of lava flow depth (B1), which in turn is related to pre-lava-flow topography (B2) and the regional geology (B3), because lava flows follow topography, subside post-emplacement at rates that follow basin subsidence rates in the region, and form barriers for, or conduits of, groundwater (C1)?"


My data are a series of more than 1,500 well logs. Each contains a map coordinate and information about changes in lithology, or rock type, with respect to depth. A well log is collected the year the well is drilled; mine range from the 1960s to the present. Temporal data are less important for the initial part of my study (Fig 1).

Figure 1: What is a well log? A well log is a record of the changes in rock type that occur with depth at a given location on the surface of the earth.

I also have data from geologic maps. Geologic maps provide the spatial extent of the surface exposure of a geologic unit, which includes information about its contacts with other rock units in x,y space as well as information about the elevation (Fig 2).

Figure 2: My study area in Northern California lies between Big Valley Mountain, Medicine Lake Volcano, and Lassen Peak, with the Pit River draining the middle of it. The green dots represent the centers of the township-and-range sections that the wells are located in, and the size of each circle indicates the number of wells in that section.



Figure 3: Cross section of a basalt column (Lyle 2000). In volcanic systems, fluid flow is limited to the vesicular flow tops, the rubbly bases (P.P.C. in this diagram), and the sedimentary interbeds that lie between vertically stacked flows. These sections of the rock are the only locations with permeability and porosity high enough to allow the movement of water.

This means that if we know where the lava beds lie, and we know their contacts, then we can outline potential zones of flow. Determining the patterns of how lavas flow and are emplaced then allows us to determine each lava flow's spatial extent, and therefore potential flow paths. I expect lavas to follow the laws of physics. They travel as viscous fluids and fill basins, so I would expect them to be thickest where paleotopography was lowest, to thin out at their edges, and to lie downslope from the volcano or vent that erupted them. Given the regional geology, I would expect the thickest lava flows to lie in the Pit River basin, near the range-bounding Big Valley Mountain Fault (Fig 3).

At its simplest, we expect lava flows to follow the geologic principles of original horizontality and of cross-cutting relationships.


I would like to learn both how variograms and kriging work and how to apply them, plus any new techniques that I am not yet aware of. We can also make the assumption that all our residuals will follow the rules of stationarity. This means that any irregularities in the data represent unacknowledged geologic features.
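
Before fitting anything in specialized software, the empirical semivariogram at the heart of kriging is simple to compute. The sketch below is a minimal pure-Python illustration, assuming made-up well coordinates and depth-to-contact values rather than my actual well-log data:

```python
import math
from collections import defaultdict

def empirical_semivariogram(points, values, lag_width):
    """Bin half the mean squared difference of value pairs by separation.

    points: (x, y) well locations; values: e.g. depth to a lava-flow contact.
    Returns {lag_bin_index: semivariance}; under stationarity this should
    rise with distance and level off at the sill.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1])
            b = int(h // lag_width)          # which distance bin the pair falls in
            sums[b] += (values[i] - values[j]) ** 2
            counts[b] += 1
    return {b: sums[b] / (2 * counts[b]) for b in sums}

# Hypothetical wells along a line, depths in meters
gamma = empirical_semivariogram([(0, 0), (1, 0), (2, 0)],
                                [10.0, 12.0, 15.0], lag_width=1.5)
```

Kriging then fits a model (spherical, exponential, etc.) to these binned values and uses it to weight nearby wells when interpolating a surface.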

Expected outcome

My first goal is to create a 3-D subsurface map of the connectivity and contacts of lava flows in the Medicine Lake, Lassen Peak, and Big Valley Mountain region. Ideally I would begin the process in GEOG 566. I would like to have a few surfaces I can test in the field by the end of this term, but understanding the different methods with which I can build this framework is my first goal.


Understanding the distribution of lava flows in the region ties into the regional geology of my study area. As I stated in Question 3, lavas fill basins. Basin morphology, and the amount of space lavas can fill, depends on the slip rates and distribution of faults in the Medicine Lake, Lassen Peak, and Big Valley Mountain triangle. By constraining slip rates in the basin, we both build a better picture of the regional geology and can make more rigorous checks on the validity of our statistical outputs.


Another equally important point is the next step in the project. After we make the 3-D framework, the USGS will use it to model fluid and heat flow in the region. Understanding potential changes in groundwater flow will allow city managers in the region to better manage water in the future.




  • Arc-Info: not much; ArcMap, yes
  • ModelBuilder and/or GIS programming in Python: working knowledge of Python outside of GIS programming
  • R: none
  • Image processing: working knowledge
  • Other relevant software: Matlab



Lyle, P. (2000). The eruption environment of multi-tiered columnar basalt lava flows. Journal of the Geological Society, 157, 715–722.

Epidemiology of Western Hemlock Dwarf Mistletoe Post – Mixed Severity Fire

Description of the Research Question

My study site is located in the HJ Andrews Experimental Forest, where the two most recent mixed-severity fires burned, leaving a sizable fire refugia in the middle of the stand. Western hemlock dwarf mistletoe (WHDM) survived the fire in this refugia. WHDM spreads via explosively discharged seed and rarely by animals. This means that for the pathogen to spread, the seed must reach susceptible hosts. WHDM is an obligate parasite of western hemlock. The regeneration post-fire may pose a barrier to the pathogen's spread because of its structure and composition. Lastly, the structure of the fire refugia may determine the rate of spread, because that structure largely determines where infections exist in the vertical profile of the canopy.

My research question is: How is the spatial pattern of WHDM spread extent and intensification from fire refugia related to the spatial pattern of the structure and composition of the fire refugia and the surrounding regeneration via barriers to viable seed reaching susceptible hosts?

I have three objectives related to the spatial organization of the regenerating forest surrounding the fire refugia and one related to the fire refugia itself. My objectives are to determine how fire refugia affect dwarf mistletoe's spread and intensification through:

  1. The stand density of regenerating trees and of surviving trees, post fire.
  2. The stand age and structure of regenerating trees and surviving trees in fire refugia.
  3. The tree species composition of the regenerating trees adjacent to fire refugia.


  4. Whether intensification dynamics of WHDM inside a fire refugia resemble those of WHDM in other infection centers.

Dataset Description

Spatial locations of the extent of spread and intensity (also referred to as severity) of WHDM infection include presence/absence of infection in susceptible hosts and will have a severity rating for each infected tree. The presence/absence data will be two measurements, one from 1992 and one from 2019. The intensity rating will be from 2019 only.

Spatial locations/descriptions of the structure and composition of the fire refugia and the regeneration surrounding the refugia include the X,Y location of each tree, a variety of forest inventory attributes such as diameter, height, and species for each tree, and a delineation of the fire refugia boundaries. These data have been measured several times: 1992, 1997, 2013, and 2019. The GPS coordinates were recorded with handheld GPS units, most likely under canopy cover, so resolution may vary.

These data sets are bounded by a 2.2 ha rectangle that confines the study area.


The spatial pattern of western hemlock dwarf mistletoe in the study site will be several discrete clusters. This arrangement forms because WHDM spreads from an initial infection point outward. The initial infection serves as an approximate center point, and the cluster, or infection center, grows outward from there. Separations between clusters are maintained by forest structure and composition or by disturbances. New clusters form from remnant trees surviving disturbances or from occasional long-distance dispersal of seed by animals. In this case, the fire refugia protect the remnant trees from disturbance, and these trees are the new focal points for infection centers.

The spatial pattern of forest structure and composition will drive the direction and rate of spread of WHDM because variation in these two attributes creates varying barriers to WHDM seed dispersal. Because seeds are shot from an infected tree, each seed needs to reach a new, uninfected branch. This means physical barriers can limit seed dispersal, and non-susceptible species will stop spread.


I would like to utilize spatial analyses that allow me to understand how the clustering of WHDM is related to the boundaries of the fire refugia; how the infection centers have changed over time, using forest structure and composition metrics of the regeneration surrounding the refugia; and, lastly, an approach that can incorporate a severity rating on a scale, rather than simple presence/absence data, and describe how the distribution of severely versus lightly infected trees relates to the fire refugia boundary and the forest metrics of the surrounding regeneration.
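
One simple way to quantify the clustering I expect, sketched here purely as an illustration (hypothetical tree coordinates, and not necessarily the statistic I will end up using), is the Clark-Evans nearest-neighbour ratio: values below 1 indicate clustering into discrete infection centers, and values near 1 indicate spatial randomness.

```python
import math

def clark_evans_ratio(points, area):
    """Observed mean nearest-neighbour distance / expected under randomness.

    points: (x, y) locations of infected trees; area: study-area size in
    the same squared units (e.g. a 2.2 ha plot -> area = 22000 in m^2).
    """
    n = len(points)
    nn_sum = 0.0
    for i, (xi, yi) in enumerate(points):
        # distance from tree i to its nearest neighbour
        nn_sum += min(math.hypot(xi - xj, yi - yj)
                      for j, (xj, yj) in enumerate(points) if j != i)
    observed = nn_sum / n
    expected = 0.5 / math.sqrt(n / area)   # mean NN distance if trees were random
    return observed / expected

# Four trees packed into one corner of a 10 x 10 plot -> strongly clustered
r = clark_evans_ratio([(0, 0), (0, 1), (1, 0), (1, 1)], area=100.0)
```

An edge-corrected statistic such as Ripley's K would be the more rigorous choice near the refugia boundary, but the idea is the same.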

Expected Outcome

I would like to produce statistical relationships that can determine the significance of forest density, species composition, age, and structure on the ability of WHDM to spread. Also, I would like to produce statistical relationships that can describe whether or not a fire refugia alters the way WHDM spreads and intensifies when compared to commonly observed models. Maps as visuals for describing the change over time would be a useful end product as well.


Understanding the spread patterns of WHDM is important for resource managers seeking to increase biodiversity and produce forest products. Focus has shifted to creating silvicultural prescriptions that emulate natural disturbances while remaining economically viable and maintaining ecosystem functions. Disturbance events can control WHDM but also create opportunities for its spread and intensification, so managers need an understanding of how a particular forest structure will affect WHDM. Also, if we want to maintain biodiversity, understanding how WHDM infection centers created by fire develop is important. Fire frequency and severity may increase in the future, and the loss of mixed-severity fire would mean a significant loss of WHDM. Land managers seeking to emulate burns can use this information to plan burns that preserve patches of WHDM, if desired, and to understand how the pathogen will progress 25 years later. This is not usually the case for forest pathogens.

Level of Preparation

I have worked in ArcMap quite a bit, but I don't have much experience with the wider functionality of ArcInfo. I have used ModelBuilder somewhat to keep my queries and basic analyses organized. I have no experience programming in Python. I have taken two stats classes using R, feel I have a working knowledge of it, and have no problem learning new tools in it. However, I have done very little image processing, such as working with rasters, and have only small exposure to LiDAR processing.

Predicting spatial distribution of Olympia oysters in the Yaquina estuary, OR

My spatial problem

  • A description of the research question that you are exploring.

My research aims to determine the current abundance and spatial distribution of native Olympia oysters in Yaquina Bay, Oregon. This oyster species has experienced massive decline in population due to overharvest during European settlement of the western United States. Yet its value to the ecosystem, its cultural importance, and its tastiness have made the Olympia oyster a current priority for population enhancement. For my research, I will be focusing on a local population of Olympia oysters in the Yaquina estuary. The goal of my project is to gather baseline information about their current abundance and spatial distribution, then develop a repeatable biological monitoring protocol for assessing this population in the future. Using spatial technology, I will first assess whether the distribution of Olympia oysters can be predicted using three habitat parameters: salinity, substrate availability, and elevation. In collaboration with the Oregon Department of Fish and Wildlife (ODFW), I will use the results of this spatial analysis and field surveys to determine ‘index sites’, which are specific locations within the estuary that are indicative of the larger population. These index sites will be revisited in the future by ODFW’s Shellfish and Estuarine Assessment of Coastal Oregon (SEACOR) team to assess changes in population size and spread over time. If predictions of Olympia oyster distribution are accurate based on the habitat parameters I’ve identified, then I’d also like to analyze potential species distribution under future environmental conditions and under different management scenarios, including habitat restoration and population enhancement.

For this course, I will be exploring this main research question:

How is the spatial pattern of Olympia oysters in the Yaquina estuary [A] related to (caused by) the spatial pattern of three habitat parameters (salinity, substrate, elevation) [B]?


  • A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

I will be using three spatial datasets, representing each of the habitat parameters, overlaid on one another to rank the most to least likely locations for Olympia oyster presence. The salinity dataset is based on historical measurements (1960-2006) and represents a gradient from highest salinity (~32 psu) at the mouth of the estuary to fresher water upstream (<16 psu). Elevation is represented through a bathymetric dataset from 2001-2002, sourced from the Environmental Protection Agency office in Newport, OR. The substrate data come from the Oregon ShoreZone mapping effort in 2014, which is managed and updated by the Oregon Coastal Management Program. There are a couple of different ways these data can be used: either as a substrate layer that characterizes substrate type broadly (low resolution) or as vector data with associated tables that describe the substrate within a tidal zone along the shoreline (higher resolution, but limited spatial extent).
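
The overlay step itself amounts to a cell-wise combination of the three ranked layers. The sketch below assumes hypothetical rank values and equal weights (my actual ranks come from the literature and expert consultation, and real rasters would be processed with GIS tools rather than nested lists):

```python
def suitability(salinity_rank, substrate_rank, elevation_rank, weights=(1, 1, 1)):
    """Cell-wise weighted sum of three equally sized ranked grids.

    Grids are lists of rows of integer ranks (higher = more suitable);
    a rank of 0 in any layer marks the cell unsuitable (score 0).
    """
    rows, cols = len(salinity_rank), len(salinity_rank[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ranks = (salinity_rank[r][c], substrate_rank[r][c], elevation_rank[r][c])
            if 0 in ranks:                 # one missing requirement vetoes the cell
                continue
            out[r][c] = sum(w * k for w, k in zip(weights, ranks))
    return out

# Two hypothetical cells: all three conditions met vs. no suitable substrate
scores = suitability([[4, 1]], [[4, 0]], [[2, 3]])
```

Unequal weights could later encode the expectation, noted below, that substrate matters more than salinity.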

The images here show the three habitat parameter spatial datasets:

Yaquina Bay bathymetry derived from subtidal soundings in 1953, 1998, 1999, and 2000 by the U.S. Army Corps of Engineers.
Data from EPA.

Salinity figure digitized from Lewis et al. (2019) based on Oregon's wet-season salinity measurements (average salinity November-April).
Lewis, N. S., E. W. Fox, and T. H. DeWitt. 2019. Estimating the distribution of harvested estuarine bivalves with natural history-based habitat suitability models. Estuarine, Coastal and Shelf Science, 219: 453-472.

Substrate component classes of Yaquina Bay based on data classifications from Coastal and Marine Ecological Classification Standard (CMECS) ‘Estuarine Substrate Component’ layer.
Data from Oregon Coastal Management Program.

  • Predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

I hypothesize that the distribution and spread of Olympia oysters in Yaquina Bay are influenced by the availability of appropriate habitat: where the three parameters align within suitable ranges will determine where the oysters can be found. However, I think I will find that the parameters do not all influence oyster distribution equally. For example, Olympia oysters have been observed to tolerate a broad salinity range, but they are simply not present without suitable substrate. I also expect the influence of a particular habitat parameter to change depending on where the oysters are located within the estuary. I'm curious to see, if possible, which parameter is most important at which life stage and what may drive changes in population at specific sites in the estuary.

I do expect that I will be able to make a prediction about where the oysters will be located based on the habitat parameters, though I am uncertain that the resolution of the spatial data is sophisticated enough to capture nuances in distribution. For example, Olympia oysters are known to be opportunistic in finding suitable substrate and will settle on a wide variety of hard surfaces, including derelict boating equipment, discarded shopping carts, and pilings.


  • Describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.

I want to be able to produce a model that can predict where the oysters are located based not just on the three habitat parameters of interest, but under various environmental conditions and different management scenarios. For example, where might the oysters settle in a given year if rainfall is substantially higher, or if adult oysters spawn earlier, or if oyster growers create Olympia oyster beds for harvest, or if a new habitat restoration site is established, etc.

I’m not especially handy at statistical analysis, so I would like to gain a better understanding of statistics through spatial data. I know that I will need statistics to determine how well the predicted distribution of Olympia oysters aligns with actual observations in the field, but I am currently unsure how to do that. A recent study in the Yaquina estuary used a similar approach to predict the distribution of five other bivalve species; it used R to generate a logistic regression model estimating the probability of each species' presence within a given area. I would like to do something similar for my analysis, but need some help.
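
The referenced study fit its models in R; purely as a sketch of the underlying idea, the snippet below fits a presence/absence logistic regression by plain gradient descent (the predictors and data are synthetic, and a real analysis would use an established routine such as R's glm):

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit P(presence) = sigmoid(b0 + b . x) by batch gradient descent.

    X: rows of habitat predictors (e.g. [salinity, elevation]);
    y: 0/1 observed absence/presence. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            z = b0 + sum(bj * xj for bj, xj in zip(b, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi   # predicted minus observed
            g0 += err
            for j in range(p):
                g[j] += err * xi[j]
        b0 -= lr * g0 / n
        b = [bj - lr * gj / n for bj, gj in zip(b, g)]
    return b0, b

def predict(b0, b, x):
    """Probability of presence at predictor values x."""
    z = b0 + sum(bj * xj for bj, xj in zip(b, x))
    return 1.0 / (1.0 + math.exp(-z))

# Toy single-predictor data: absent at low values, present at high values
b0, b = fit_logistic([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
```

Comparing the predicted probabilities against held-out field observations is then one way to quantify how well the prediction aligns with reality.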


  • What do you want to produce — maps? statistical relationships?

The desired products of this research are habitat suitability maps of the current and future (pending the success of the initial effort) distribution of native Olympia oysters for use by ODFW. As a part of this effort, I will create a map of index site locations to be used in future species monitoring. I would also like to generate a predictive model that can determine distribution of oysters based on annual changes in the local environment (El Nino conditions, heavy rainfall, restoration efforts, introduction of invaders, etc.). While salinity, substrate, and elevation seem to be the main factors influencing oyster distribution, there are a number of other factors that can have effects, including temperature, proximity to the mouth of the estuary, and tidal retention.


  • How is your spatial problem important to science? To resource managers?

ODFW currently does not have reliable baseline information on the distribution of Olympia oysters in Oregon. As an ecological engineer, the species provides a number of important benefits to the ecosystem, including water filtration and habitat for other marine creatures. It is also culturally significant to local tribes, including the Confederated Tribes of Siletz. The species is not currently listed as threatened or endangered, but if it becomes listed one day, that designation will trigger a number of mitigation and conservation measures that will be difficult and expensive for agencies and private landowners. Additionally, there has been some exploration of whether, if the population becomes robust again, the species could be grown and harvested as a specialty food product. Given the current slow food movement and interest in local products, Olympia oysters could fit well into this niche.


  • How much experience do you have with:

(a) Arc-Info – Little experience, used a bit with older versions of ArcMap.

(b) Modelbuilder and/or GIS programming in Python – I am comfortable with ModelBuilder, but have no experience with Python.

(c) R – Some experience; I took Stats 511 where we used R heavily in a series of lab exercises. I have not applied my own data in R.

(d) Image processing – I have used a variety of Adobe products for graphic design, including Photoshop and InDesign.

My Spatial Problem – Elevated Blood Lead Levels in Bangladeshi Children: What is to blame?


No level of lead exposure is considered safe; lead is toxic to nearly every organ system, especially the central nervous system (1,2). Exposure is particularly hazardous for children, whose neurological systems are still developing (2).

Environmental contamination is a widespread public health issue throughout Bangladesh. In the 1990s, widespread arsenic poisoning was described in populations that used groundwater wells contaminated with naturally occurring arsenic for drinking water (3). In addition, Bangladesh is a developing country with increasing industry and continued burning of gasoline, pointing to high lead exposure through air pollution (4). Children also make up about 13% of the labor force in Bangladesh, working in jobs that include pottery glazing, lead melting, welding, car repair, and ship breaking, all of which are known sources of lead exposure (5). Given the historical air pollution and the industries within urban areas of Bangladesh, these exposures and the known elevated childhood blood lead levels alarm health authorities, and more information is needed to determine the exact sources of exposure within communities.

I will utilize a maternal cohort study developed over 16 years (R01ES015533 PI: Christiani/CoI: Kile; P42ES16454: Bellinger/CoI: Kile; R01ES023441, PI: Kile) in the Pabna and Sirajdikhan Upazilas of Bangladesh to examine possible spatial relationships of lead exposures. Children born to the mothers participating in the maternal study were followed from early childhood and were tested for blood lead levels at 4 to 5 years old. Many children in this cohort showed blood lead levels above the U.S. recommendation level of 5 ug/dL (Table 1).

Table 1: Summary results of blood lead levels (ug/dL) of children aged 4-5 years in Pabna and Sirajdikhan (n=348)

Group                  Minimum   Mean   Maximum   Above 5 ug/dL (n, %)
Overall (n=348)        3.86      -      19.6      165 (47%)
Pabna (n=183)          1.32      10.8   -         51 (28%)
Sirajdikhan (n=165)    0.00      6.67   19.6      114 (69%)

Research question:

What are the potential spatial relationships of blood lead levels in children aged 4-5 years old within two areas of Bangladesh, and are they associated with possible sources of exposure such as high-traffic or urban areas?

Dataset analyzing:

Blood lead concentrations (ug/dL) of children aged 4-5 years, taken at a single timepoint, with the coordinates of their homes (n=348). There is one additional categorical variable indicating whether each child is above or below the 5 ug/dL U.S. recommendation for lead exposure. The resolution is fine-scale, with homes within meters of one another, over a span of square kilometers. I will also be employing base maps of the two districts where the children live. This will allow review of the major traffic and highway network and of distance to urban centers. I have a slight hesitation about the level of street-network information available for Bangladesh; my pilot analysis in R made these data difficult to gather. I am looking to import the data into Arc for more extensive analysis.


I hypothesize that the blood lead levels of the children are spatially correlated and that increased blood lead levels, whether in specific hot spots or in locations closer to roads and urban centers, will be due to inhalation and dermal exposure from air pollution.


I would first like to perform a spatial autocorrelation analysis to determine whether the blood lead levels of children are spatially correlated. Secondly, I would like to employ the hot-spot analysis tool in Arc to determine whether there are specific hotspots of higher blood lead levels within the sample. We do know that of the two districts sampled, the district closer to the urban landscape has higher, more prevalent levels of lead in children (Table 1). To pinpoint more specific exposures, I would like to intersect the locations of the children's homes with a buffer (100 m) around major highway and traffic areas. I would then aggregate the intersected homes and determine whether their blood lead levels are higher than those of children who live more than 100 m away. This first attempt would be a crude within/outside-the-buffer analysis, but if individuals within the buffer do have significantly higher blood lead levels, I would like to see whether there is a drop-off between 100-250 m, 250-500 m, and beyond 500 m from high-traffic roadways. Lastly, if we do not find trends between distance to roadways and blood lead levels, I will overlay base maps of industry-related areas and identify whether they overlap with hot spots in the spatial spread of blood lead levels.
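
As a sketch of the first step only, global Moran's I with inverse-distance weights can be computed directly; ArcGIS's Spatial Autocorrelation tool reports this same statistic along with a significance test, which this toy version omits. The home coordinates and lead values below are hypothetical:

```python
import math

def morans_i(points, values):
    """Global Moran's I with inverse-distance weights.

    Values above the expectation of -1/(n-1) suggest positive spatial
    autocorrelation: nearby homes have similar blood lead levels.
    """
    n = len(points)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num, wsum = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1])
            w = 1.0 / d                      # inverse-distance weight
            num += w * dev[i] * dev[j]
            wsum += w
    den = sum(z * z for z in dev)
    return (n / wsum) * (num / den)

# Two tight pairs of homes, each pair sharing similar lead levels -> I > 0
i_stat = morans_i([(0, 0), (0, 1), (10, 0), (10, 1)], [1.0, 1.0, 5.0, 5.0])
```

A permutation test (or Arc's z-score output) would still be needed to judge whether an observed value is significantly different from random.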

Expected outcomes:

  1. Spatial autocorrelation: I hope to characterize the blood lead level data as spatially autocorrelated, producing plots as a visual representation of the autocorrelation.
  2. Hot spot analysis: From Arc, I will produce maps through the hot-spot analysis tool to home in on areas of interest that might have the highest levels comparatively. This map would be useful for understanding blood lead level relationships within the two districts and for identifying areas of interest for further analysis.
  3. Distance to roadways: From Arc, I want to produce maps that show a 100 m buffer around major roadway systems intersected with the locations of children's homes from the cohort. I would then produce a simple boxplot of blood lead levels within and outside the buffer zone to compare mean blood lead levels. If I find differences, I would like to produce another map of the gradient in blood lead levels with increasing distance from high-traffic roadways (further buffer/intersection analysis).
  4. Intersection of hot spots and industry: I would like to produce a map similar to the hot-spot analysis that overlays areas of known industry from a base map with the potential hot spots of children's homes.
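
The buffer/drop-off comparison in item 3 could be prototyped outside Arc as a simple distance-band aggregation. Everything below is hypothetical: point samples stand in for real road line geometry, and the homes and lead levels are made up:

```python
import math

def mean_by_distance_band(homes, levels, road_points, bands=(100, 250, 500)):
    """Average blood lead level per distance band from the nearest road point.

    homes: (x, y) home coordinates; levels: blood lead (ug/dL) per home;
    road_points: sampled points along major roads; bands: band edges in
    the same units as the coordinates (metres assumed). Empty bands -> None.
    """
    edges = list(bands)
    sums = [0.0] * (len(edges) + 1)
    counts = [0] * (len(edges) + 1)
    for (hx, hy), lvl in zip(homes, levels):
        d = min(math.hypot(hx - rx, hy - ry) for rx, ry in road_points)
        band = sum(d >= e for e in edges)    # index of the band d falls in
        sums[band] += lvl
        counts[band] += 1
    labels = (["<%g m" % edges[0]]
              + ["%g-%g m" % pair for pair in zip(edges, edges[1:])]
              + [">=%g m" % edges[-1]])
    return [(lab, sums[i] / counts[i] if counts[i] else None)
            for i, lab in enumerate(labels)]

# Three hypothetical homes at 50 m, 300 m, and 600 m from one road point
band_means = mean_by_distance_band([(50, 0), (300, 0), (600, 0)],
                                   [10.0, 6.0, 2.0], [(0, 0)])
```

A declining mean across bands would support the roadway-exposure hypothesis; a formal comparison would still need the boxplots and significance tests described above.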


Within a cohort followed for more than 16 years to understand the consequences of chronic arsenic exposure, my work is significant in helping communities protect children from the undue burden of environmental pollution. I hope to bring the communities a better understanding of why their children have high blood lead levels compared to U.S. averages. My research will help explain whether the areas the children live in are correlated with increased exposures and will support public health actions to limit lead exposures.

Level of preparation:

a) Intermediate knowledge of ArcGIS Pro

b) I will concurrently be taking a GIS programming course in Python, and I have taken intro Python coding coursework.

c) I have completed coursework (GEOG 561) programming in R with these data to understand spatial relationships. I also have extensive statistical modeling experience in R.

Works Cited:

  1. Tong S, Schirnding YE von, Prapamontol T. Environmental lead exposure: a public health problem of global dimensions. Bull World Health Organ. 2000;78:1068–77.
  2. WHO | Lead [Internet]. WHO. [cited 2019 Mar 17]. Available from:
  3. Smith AH, Lingas EO, Rahman M. Contamination of drinking-water by arsenic in Bangladesh: a public health emergency. Bull World Health Organ. 2000;78(9):1093–103.
  4. Kaiser R, Henderson A K, Daley W R, Naughton M, Khan M H, Rahman M, et al. Blood lead levels of primary school children in Dhaka, Bangladesh. Environ Health Perspect. 2001 Jun 1;109(6):563–6.
  5. Mitra AK, Haque A, Islam M, Bashar SAMK. Lead Poisoning: An Alarming Public Health Problem in Bangladesh. Int J Environ Res Public Health. 2009 Jan;6(1):84–95.

Faye Andrews, MPH

Deaggregation of infrastructure damages and functionality based on a joint earthquake/tsunami event: an application to Seaside, Oregon.

Research Question and Background

The Pacific Northwest is subject to a rupture of the Cascadia Subduction Zone (CSZ), which will consequently result in both an earthquake and a tsunami. While all communities along the coast are vulnerable to the earthquake hazard (e.g. ground shaking), low-lying communities are particularly vulnerable to both the earthquake and the subsequent tsunami. Completely mitigating all damage resulting from the joint earthquake/tsunami event is impossible; however, understanding the risks associated with each hazard individually can allow community planners and resource managers to isolate particularly vulnerable areas and infrastructure within the city.

The city of Seaside, Oregon is a low-lying community that is subject to both the earthquake and tsunami resulting from a rupture of the CSZ. The infrastructure at Seaside can be divided into four components: (1) buildings, (2) electric power system, (3) transportation system, and (4) water supply system. Similarly, the hazards can be viewed jointly (both earthquake and tsunami), as well as independently (just earthquake or tsunami).

Within this context, I’m particularly interested in looking at how the spatial pattern of infrastructure damage and functionality is related to individual earthquake and tsunami hazards via ground shaking and inundation respectively. Furthermore, I’m interested in looking at how these spatial patterns change as the intensity of the hazard increases.

Description of Dataset

The dataset I will be analyzing consists of two components: (1) spatial maps, and (2) infrastructure damage and functionality codes. Part of this analysis will be merging these two components to spatially view the infrastructure damage and functionality.

The spatial maps consist of:

  1. Building locations (represented as tax lots)
  2. Hazard maps: earthquake ground shaking and tsunami inundation hazard maps

The infrastructure damage and functionality codes implement Monte-Carlo methods to probabilistically define damages, losses, and connectivity. The four infrastructure codes consist of:

  1. Buildings: expected damage and economic losses to buildings.
  2. Electric power system: a connectivity analysis of each building to the electric substation. There is one electric substation within Seaside.
  3. Transportation system: a connectivity analysis of each building to critical infrastructure. Critical infrastructure at Seaside consists of two fire stations and one hospital.
  4. Water supply system: a connectivity analysis of each building to their respective pumping station. There are three water pumping stations within Seaside, and each building is assigned to a single pumping station.
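
The damage codes themselves are not reproduced here, but the Monte-Carlo idea can be sketched: for each building and trial, draw a uniform random number and compare it to the probabilities of reaching or exceeding each damage state. The fragility-style probabilities below are placeholders, not values from the Seaside models:

```python
import random

def sample_damage_states(p_exceed, n_trials, seed=0):
    """Monte-Carlo damage sampling for a single building.

    p_exceed: non-increasing probabilities of reaching-or-exceeding damage
    states 1..k (e.g. slight/moderate/complete). Returns the fraction of
    trials ending in each state 0..k (0 = no damage).
    """
    rng = random.Random(seed)                 # seeded for repeatability
    counts = [0] * (len(p_exceed) + 1)
    for _ in range(n_trials):
        u = rng.random()
        state = sum(u < p for p in p_exceed)  # highest state still exceeded
        counts[state] += 1
    return [c / n_trials for c in counts]

# Placeholder exceedance probabilities for three damage states
fractions = sample_damage_states([0.9, 0.5, 0.1], n_trials=10000)
```

Repeating this per building, and attaching a loss value to each damage state, is what yields probabilistic damage and loss estimates per tax lot.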


I hypothesize that the infrastructure damage is not spatially variable for the earthquake hazard; however, it will be for the tsunami hazard (e.g. with distance from the coast). The relative damages due to the tsunami will also increase as the intensity of the hazard increases. That is, for small events the damages will be dominated by the earthquake, whereas for larger events the damages will be dominated by the tsunami.


While color-coordinating tax-lots based on economic losses provides a means to visualize damages throughout a study region, I am interested in learning about kernel density estimation and hot spot analysis to identify vulnerable regions (not just individual buildings). I am also interested in learning about different spatial network analysis methods, as only connectivity analyses within the infrastructure networks (electric, transportation, and water) have been considered so far.
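
As a minimal sketch of the kernel density idea, assuming hypothetical event coordinates (e.g. heavily damaged tax lots) and a hand-picked bandwidth:

```python
import math

def kde_at(location, events, bandwidth):
    """2-D Gaussian kernel density estimate at one map location.

    events: (x, y) coordinates of damaged tax lots; bandwidth: kernel
    standard deviation in map units (larger = smoother surface).
    Evaluating this over a grid of locations yields the density surface.
    """
    x0, y0 = location
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(events))
    total = 0.0
    for x, y in events:
        d2 = (x - x0) ** 2 + (y - y0) ** 2
        total += math.exp(-d2 / (2 * bandwidth ** 2))  # Gaussian kernel
    return norm * total

# Density is higher near the pair of close-together hypothetical events
events = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]
near = kde_at((0.0, 0.0), events, 1.0)
far = kde_at((8.0, 8.0), events, 1.0)
```

Hot-spot analysis (e.g. Getis-Ord Gi* in Arc) adds a significance test on top of this kind of local intensity measure, flagging clusters of high or low values rather than raw density.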

Expected outcome

I’m hoping to produce maps showing how damages and economic losses relate to both joint hazards (earthquake and tsunami), as well as independent hazards (just earthquake or tsunami). I would also like to produce maps showing the connectivity of individual tax-lots to critical infrastructure. Furthermore, I would like to investigate visualizing both the economic losses and connectivity analysis through color-coordinating tax-lots, kernel density estimation and hot-spot analysis.


The ability to spatially isolate vulnerable areas will give community planners and resource managers a means to better prepare mitigation plans. Deaggregating the damages and losses by infrastructure and hazard will isolate the relative importance of each and can assist in mitigation measures. For example, if planners and resource managers identify that the earthquake is the dominant force producing building damage within a specific region, they can support retrofit options for homeowners within that region.

Level of preparation

  1. Arc-info: novice
  2. ModelBuilder and/or GIS programming in Python: Although I haven’t done GIS programming in Python, I am highly proficient in Python and am comfortable working with GIS data. Learning how to merge Python and GIS should not be difficult.
  3. R: novice
  4. Image processing: novice
  5. Other relevant software: I’m proficient in QGIS.

Examining the Spatial Relationships between Seascapes and Forage Fishes

Description of Research Question

My objective is to study the spatial relationships between sea-surface conditions and assemblages of forage fish in the California Current System from 1998 to 2015. Forage fish are a class of fishes that are of importance to humans and resource managers, as they serve as the main diet for economically and recreationally valuable large-game fishes. Using a combination of remotely sensed and in-situ data, sea-surface conditions can be classified into distinct classes, known as “seascapes,” that change gradually over time. These seascapes, which are based on a conglomeration of measurable oceanographic conditions, can be used to infer conditions within the water column. My goal is to determine if any relationship exists between forage fish assemblages and certain seascape classes by examining the changes in the spatial patterns related to each over time. Forage fish assemblage may be related to seascapes as certain seascape classes may correspond to physical (temperature) or biological (chlorophyll concentration) conditions, either on the surface or in the water column, which happen to be favorable for a specific species or group of species.

My question can be formatted as: “How is the spatial pattern of forage fish assemblage in the California Current System related to the spatial pattern of seascapes based on the sea-surface conditions used to classify the seascapes (temperature, salinity, and chlorophyll)?”

Description of Data

Midwater trawls have been conducted annually by the National Oceanic and Atmospheric Administration’s (NOAA) Southwest Fisheries Science Center (SWFSC) in an attempt to monitor the recruitment of pelagic rockfish (Sebastes spp.) and other epipelagic micronekton at SWFSC stations off California. The trawls have informed a dataset that represents overall abundance of all midwater pelagic species that commonly reside along the majority of the nearshore coast of California from 1998 to 2015. Each trawl contains both fish abundance, recorded in absolute abundance, and location data, recorded in the form of latitude and longitude. The dataset also includes a breakdown of species by taxa, which will be used to determine if a fish is a “forage fish.”

Seascapes have been classified using a combination of in-situ data (from the trawls) and remotely sensed data from NASA’s MODIS program. Seascapes were classified using the methods described in Kavanaugh et al., 2014 and represent the seascape class in the immediate area that each trawl occurred. Seascapes are classified at 1 km and 4 km spatial resolution and at 8-day and monthly temporal resolution. Each seascape has been assigned an ID number which is used to identify similar conditions throughout the dataset.

The map below shows the locations of every trawl over the course of the study.

Figure 1: Map showing all trawl sites contained in the dataset. Trawls occurred at a consistent depth using consistent methods between and including the years 1998 and 2015.


I hypothesize that any measurable changes in the spatial extent of certain seascape classes will also be identifiable in the spatial variability of forage fish assemblage over time. Preliminary multivariate community structure analysis has shown some statistically significant relationships between certain species and certain seascape classes using these data. If spatial patterns do exist, I expect there to be some relationship between the surface conditions and the fish found at the depth of the midwater trawls.

Hypothesis: I expect the spatial distribution of forage fish species to be related to spatial distribution of seascape conditions based on the variables used to classify the seascapes (temperature, salinity, chlorophyll).

Potential Approaches

I hope to utilize the tools within both R and the ArcGIS Suite of products to identify and measure spatial patterns in both seascape classes and forage fish assemblages over the designated time period. I also aim to run analyses to determine if any relationship exists between the variability in spatial extent of each variable. These analyses will be used to supplement the previously completed multivariate community structure analyses done on these data.

For Exercise 1, I will identify and test for the spatial patterns of the forage fish family Gobiidae (gobies) and Seascape Class 10, as initial indicator species analyses suggested that there may be a relationship between the two. In Exercise 2, cross-correlation and/or geographically weighted regression (GWR) will be used to examine relationships between these patterns.
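
A simple version of that cross-correlation is a lagged Pearson correlation between two annual series, for example the areal extent of Seascape Class 10 and Gobiidae abundance. The series below are synthetic stand-ins for the real data:

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Pearson-style correlation of x[t] against y[t + lag] for each lag."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    corr = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            corr[lag] = np.mean(x[-lag:] * y[:lag])
        elif lag > 0:
            corr[lag] = np.mean(x[:-lag] * y[lag:])
        else:
            corr[lag] = np.mean(x * y)
    return corr

# Synthetic 18-year series (1998-2015): abundance echoes extent one year later.
rng = np.random.default_rng(0)
extent = rng.normal(size=18)
abundance = np.concatenate(([0.0], extent[:-1]))
corr = lagged_correlation(extent, abundance, max_lag=3)
best_lag = max(corr, key=corr.get)   # the peak recovers the one-year lag
```

A positive best lag here would mean the seascape pattern leads the fish pattern, which is the direction the surface-to-depth hypothesis predicts.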

Expected Outcome/Ideal Outcome

Ideally, I would like to determine and define the relationship between seascape classes and forage fishes in the California Current System over the designated period of time. Any sort of definitive answer, positive, negative, or none, provides valuable insight into the relationships between this remotely sensed data and these fishes. If that claim could be bolstered by a visual which outlines the relationship between my variables (or lack thereof), that would be icing on the theoretical cake.

Significance of Research

Measuring the predictability of forage fish assemblage has wide-ranging impacts and could be found useful by policymakers, fishermen, conservationists, and even members of the general public. Additionally, this research can be used to underscore the importance of seascape-based management or seascape approaches to ecology or management. This research could also be used as inspiration for future studies about different species, taxa, or geographic locations.

Level of Preparation

I completed a minor in GIS during my undergraduate studies, but have not had to utilize those skills for about 15 months. After some time, I believe that I will be extremely comfortable using the software. I have basic exposure to R (mostly in the context of statistical analysis) and have used Codecademy to further my understanding of Python. I did some image processing during my undergraduate studies as well, but am not particularly comfortable with that set of skills. I have used Leaflet to embed my maps and create time series before, so that could be an option for this work.


Kavanaugh M. T., Hales B., Saraceno M., Spitz Y.H., White A. E., Letelier R. M. 2014. Hierarchical and dynamic seascapes: A quantitative framework for scaling pelagic biogeochemistry and ecology, Progress in Oceanography, Volume 120, Pages 291-304, ISSN 0079-6611,

Sakuma, K., Lindley, S. 2017. Rockfish Recruitment and Ecosystem Assessment Cruise Report.  United States Department of Commerce: National Oceanic and Atmospheric Administration, National Marine Fisheries Service.

-Willem Klajbor, 2019

Seth Rothbard My Spatial Problem

A description of the research question that you are exploring

Of the 31 pathogens known to cause foodborne illness, Salmonella is estimated to contribute to the second highest number of illnesses, the most hospitalizations, and the highest number of deaths in the US when compared to other domestically acquired foodborne illnesses1. Salmonellosis is the bacterial illness caused by Salmonella infection. It is estimated there are approximately 1.2 million cases of salmonellosis and around 450 deaths every year in the US due to Salmonella1. Over time there has been marked variability in the number of reported cases per year. Salmonellosis is a mandatory reportable illness in Oregon, and available information indicates that incidence rates of this disease have been stable since the new millennium2. The objective of this study is to perform spatial analysis of lab-confirmed Salmonella in Oregon counties for the years 2008-2017, for which county-level data are available, and to determine whether some counties have a higher risk of Salmonella infection than others. I also wish to explore the socioeconomic factors associated with high-incidence-rate counties. The research question I wish to explore is: How are spatial patterns of Salmonella related to spatial patterns of socioeconomic factors? Certain socioeconomic patterns, such as lower levels of education and income, may increase rates of Salmonella as a result of improperly preparing/cooking foods, less strict sanitation practices, and/or higher rates of eating high-risk foods.

A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent

The Oregon Health Authority has created a database called the Oregon Public Health Epidemiology User System (ORPHEUS) as a repository for relevant exposure and geospatial data related to disease cases reported to public health departments all across the state. This database has been maintained by the state since 1989 and includes information regarding various diseases. The dataset I will be using is a collection of every reported non-typhoidal Salmonella case within Oregon from 2008-2017. The distinction is that the typhoidal variety of Salmonella causes typhoid fever, while non-typhoidal Salmonella causes salmonellosis (a common gastrointestinal disease, usually referred to as a type of “food poisoning”). The spatial resolution of these data has been obscured to the county level to protect personal privacy and confidentiality. I will also be using data from the American Community Survey and the CDC’s Social Vulnerability Index. These datasets contain social vulnerability-related variables for Oregon at the county level. American Community Survey data are available for the years 2009-2017, and the Social Vulnerability Index has data available for 2014 and 2016. Yearly county population estimates will also be used from Portland State University’s Population Research Center. Because all datasets are reported for 2014, I will start my exploratory analysis with that year.

Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

I expect counties with younger populations (higher proportions of infants and newborns), as well as counties with higher proportions of females, to have higher adjusted incidences of Salmonella. Prior surveillance suggests that children under the age of 5 are at the highest risk for Salmonella infection, likely due to their developing immune systems and how they interact with their environment. Specifically, many young children do not or are unable to wash their hands prior to touching their mouths. Females are also known to have a higher risk of Salmonella infection; the mechanism behind this is relatively unknown, with some explanations suggesting that females are more likely to have more interactions with young children. I also expect counties with lower Social Vulnerability scores to have higher rates of Salmonella infections. Higher rates of poverty and lower levels of education are often associated with more negative health outcomes.

Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.

I would like to calculate age- and sex-adjusted rates of disease for each county in Oregon. I am also interested in undertaking cluster analysis and calculating spatial autocorrelation among Oregon counties over time. Finally, I would like to perform a regression of county disease incidence rates on the different socioeconomic factors found in the American Community Survey and Social Vulnerability Index. I would be interested in learning about spatial Poisson regression to assess which variables are significantly associated with the presence of disease. I would also be interested in learning about hotspot analysis to evaluate whether there are areas of Oregon with significantly higher disease rates. Ideally, all of my analyses will be performed in R and ArcGIS.
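
The age-adjusted rates can be computed by direct standardization: weight each county's stratum-specific rates by a standard population's distribution (sex adjustment works the same way with sex strata). A minimal sketch with hypothetical counts:

```python
def directly_standardized_rate(county_cases, county_pop, std_pop):
    """Direct standardization: weight the county's age-specific rates by a
    standard population's age distribution; returns a rate per 100,000."""
    total_std = sum(std_pop.values())
    rate = 0.0
    for age, std_n in std_pop.items():
        age_rate = county_cases[age] / county_pop[age]  # stratum-specific rate
        rate += age_rate * (std_n / total_std)          # standard-pop weight
    return rate * 100_000

# Hypothetical example: one county, two age strata.
std_pop = {"0-4": 20_000, "5+": 80_000}
county_cases = {"0-4": 10, "5+": 10}
county_pop = {"0-4": 1_000, "5+": 9_000}
rate = directly_standardized_rate(county_cases, county_pop, std_pop)
```

Using one standard population (e.g. the statewide PSU estimates) for every county makes the adjusted rates directly comparable across counties.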

Expected outcome: what do you want to produce — maps? statistical relationships? other?

I would like to produce choropleth maps of adjusted Salmonella infection rates as well as for hotspot analysis. I want to produce regression models to describe how incidence rates of Salmonella vary across different socioeconomic indicators. I also want to create graphs to describe spatial autocorrelation patterns as well as to show disease rates over time.
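
For the spatial autocorrelation graphs, a natural starting point is global Moran's I on county rates with a binary neighbor matrix. A minimal sketch with a toy chain of counties (not the actual Oregon adjacency):

```python
import numpy as np

def morans_i(values, W):
    """Global Moran's I: near +1 clustered, near 0 random, negative dispersed."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    return (len(z) / W.sum()) * (W * np.outer(z, z)).sum() / (z ** 2).sum()

# Toy example: six counties in a row with alternating high/low rates.
rates = [1, 0, 1, 0, 1, 0]
W = np.zeros((6, 6))
for i in range(5):                     # chain adjacency (symmetric, binary)
    W[i, i + 1] = W[i + 1, i] = 1
# morans_i(rates, W) comes out to -1.0 here: perfectly dispersed
```

Repeating this per year would give the over-time autocorrelation series mentioned above; packages like spdep in R compute the same statistic with significance tests.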

Significance. How is your spatial problem important to science? to resource managers?

This analysis will be helpful to identify county populations which are at higher risk for Salmonella infections. The inclusion of social vulnerability variables will be useful for state/local policy makers. Reforms can be proposed or further studied to assess how addressing the needs of particularly vulnerable populations will affect the incidence of Salmonella. This research will be beneficial for further public health research as trends found here may also hold true for other foodborne illness. The aim of this research is to benefit the health of communities in Oregon by highlighting the association between social vulnerability and the risk of foodborne illness.

Your level of preparation: how much experience do you have with (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) image processing, (e) other relevant software

I have no experience with Arc-Info, programming in Python, and image processing. I have some limited experience within Modelbuilder. I am very comfortable performing statistical analyses within R and have some experience using the software to create maps using various packages.


  1. Estimates of Foodborne Illness in the United States. Centers for Disease Control and Prevention. Published July 15, 2016. Accessed July 31, 2018.
  2. Oregon Health Authority. Salmonellosis 2016 Report. Oregon Public Health Division. Available at: Accessed July 31, 2018.

Natural Resource Governance Perceptions and Environmental Restoration

Research Question

How is the spatial pattern of individuals’ perceptions of natural resource governance related to the spatial pattern of environmental restoration sites, via distance to and abundance of improved sites?


My Datasets

Puget Sound Partnership Environmental Outputs Data

The Puget Sound Partnership—a governmental monitoring entity—keeps records of environmental restoration projects throughout the Sound. There are GPS points for restoration site locations across their governing boundaries. I have downloaded the points, but I am still working on figuring out this dataset. There are over 12,000 entries, and many appear duplicative.

Puget Sound Partnership Social Data

I drew a stratified random sample (28% response, n = 2,323) of the general public in the Puget Sound region of Washington from one time period. The data are from a survey of subjective wellbeing related to natural environments. I am specifically examining the first block of seven questions related to perceptions of natural resource governance. These questions have been indexed into one perception score. Around 1,770 individuals gave location data (cross street and zip code), which have been converted to GPS points. I also have demographic information for individuals.



Based on current research, there is a significant correlation between subjective wellbeing and environmental metrics such as green space and air pollution (Diener, Oishi, and Tay 2018). I hypothesize that (1) shorter distances between individuals and restoration sites, and greater numbers of restoration sites near individuals, will correlate positively with governance perceptions, and (2) positive environmental outcomes will correlate positively with governance perceptions.



I would like to test the statistical significance of distance from individual to restoration sites on governance perceptions, and test whether the number of sites within a radius moderates that relationship. I have previously created a plot of perception versus distance from other individuals, and perceptions are not spatially autocorrelated. To expand on this work, I would like to use spatial relationship modeling approaches, such as geographically weighted regression.
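
The two covariates of interest here, distance to the nearest restoration site and the number of sites within a radius, can be computed directly from the GPS points with the haversine great-circle distance. The coordinates below are hypothetical:

```python
import math

def site_metrics(person, sites, radius_km):
    """Distance (km) to the nearest restoration site and the count of
    sites within radius_km, using the haversine great-circle formula."""
    def haversine(p, q):
        lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
        a = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2)
             * math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371.0 * math.asin(math.sqrt(a))   # Earth radius ~6371 km
    dists = [haversine(person, s) for s in sites]
    return min(dists), sum(d <= radius_km for d in dists)

# Hypothetical (lat, lon) points: one respondent, two restoration sites.
person = (47.60, -122.33)
sites = [(47.60, -122.33), (48.00, -123.00)]
nearest_km, n_within = site_metrics(person, sites, radius_km=5.0)
```

With ~1,770 individuals and ~12,000 sites the brute-force loop is still tractable, though a spatial index (e.g. a k-d tree) would speed it up considerably.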


Expected outcome

I would like to produce statistical relationships between my dependent and independent variables. My dependent variables are good governance and life satisfaction (collected with demographic information). My independent variables are age, sex, race, area (self-indicated urban, suburban, or rural), years lived in the Puget Sound, political ideology (a proxy from voting precincts), income, education, number of restoration sites, and environmental improvement score.

I expect my relationships to be correlational and to produce betas, p-values, and r2 values, which I will display as tables. Given the large volume of points (n = 1,770 individuals and roughly 12,000 restoration sites), I do not believe maps would provide visually useful images. I already have maps of both the perception points and the restoration points.



Incorporating aspects of subjective wellbeing and general public perspectives about natural resources into scientific assessment and decision-making processes could help managers improve human wellbeing and environmental outcomes simultaneously. The links between metrics of subjective wellbeing related to natural environments and metrics of ecosystem health have not been studied holistically, and there are gaps in knowledge around the connections among these systems. Research suggests that good governance plays an important role in improving wellbeing because governing systems provide goods and services that make people better off (Landman 2003). Current research on good governance perceptions has shown links to support for environmental improvement measures, but also shows that individuals care less about the environmental effectiveness of those measures (Bennett et al. 2017). Little is known about whether positive perceptions are linked to environmental conditions. To understand the connections between natural systems and subjective wellbeing, further research is needed that includes case studies that can illuminate general trends, as well as analyses that can show connections spatially (Milner‐Gulland et al. 2014).


Level of preparation

  • Arc-Info

I have taken one class that used ArcPro; GEOG 560.

  • Modelbuilder and/or GIS programming in Python

In GEOG 560 we completed one exercise that used ModelBuilder.

  • R

I have taken one class on R (FW 599), and have been using it actively for my own analyses for a few months, as well as taken GEOG 561, which primarily used R.

  • image processing

I took three digital photo classes using adobe photoshop and am very proficient in its use. I often use it to amend maps I make in Arc.

  • other relevant software

I do not believe I have expertise in any other relevant software.


Literature Cited

Bennett, Nathan J., Robin Roth, Sarah C. Klain, Kai Chan, Patrick Christie, Douglas A. Clark, Georgina Cullman, et al. 2017. “Conservation Social Science: Understanding and Integrating Human Dimensions to Improve Conservation.” Biological Conservation 205 (January): 93–108.

Diener, Ed, Shigehiro Oishi, and Louis Tay. 2018. “Advances in Subjective Well-Being Research.” Nature Human Behaviour 2 (4): 253.

Landman, Todd. 2003. “Map-Making and Analysis of the Main International Initiatives on Developing Indicators on Democracy and Good Governance.” Human Rights Centre University of Essex.

Milner‐Gulland, E. J., J. A. Mcgregor, M. Agarwala, G. Atkinson, P. Bevan, T. Clements, T. Daw, et al. 2014. “Accounting for the Impact of Conservation on Human Well-Being.” Conservation Biology 28 (5): 1160–66.


Exploring spatial variation in drivers of soil CO2 efflux in HJ Andrews Forest

Description of Research Question

My objective is to capture modes of variance that exist between and within a subset of variables that I expect to correlate most strongly with soil CO2 efflux in the HJ Andrews (HJA) forest. By stratifying the forest, I plan to determine future sampling sites that will be used to explore the relationship between soil C inputs, soil C stocks and CO2 outputs. Aboveground and belowground biomass are major sources of soil carbon and drivers of soil respiration, so biomass will be used as a proxy for soil CO2 efflux for the purposes of this analysis.

Research Question: How is the spatial distribution of biomass in the HJA forest related to stand age, slope, aspect, elevation and geomorphon as a result of varying degrees of exposure to solar radiation, wind gusts, precipitation, humidity, etc.? What is the overall variance of the HJA forest along these vectors and is that variance spatially autocorrelated?

Description of Dataset

I will use LiDAR data from 2011 and 2014 at 0.3-0.5 m vertical resolution and at 1 m2 horizontal resolution covering the Lookout Creek Watershed and the Upper Blue River Watershed to the northern extent of HJA. These LiDAR data include a high hit model and a bare earth model. I will also use NAIP imagery to approximate forest stand age, which is 1 m resolution and covers years between 2002 and 2018.


I expect areas containing more biomass to correlate positively with south-facing slopes, due to more exposure to solar radiation resulting in faster rates of vegetation growth. I expect older stands to correlate positively with greater biomass. I expect steeper slopes to correspond to less biomass due to more weathering and thinner soil horizons, supporting less growth. I expect higher elevations to correspond to lower biomass due to greater sustained winds, higher windspeeds, and more snow accumulation. As geomorphon describes the geometric structure of the terrain, it is a collection of multiple factors that could correlate positively or negatively with biomass. For example, I expect a ridge to correlate negatively with biomass because of the combination of greater slopes and more exposure to winds, while I expect a valley to correlate positively with biomass due to more wind protection and thicker soil horizons with more organic matter and more water retention.


With an end goal of identifying sampling sites, I’ll need to cluster or stratify the HJA forest. I’ll begin with a clustering analysis, then perform supervised and unsupervised classification, followed by sensitivity analysis comparing the results of the clustering analysis and both classifications. I will need to address spatial autocorrelation either as part of the clustering analysis or separately. I’ll need to plan sampling by accessibility, so I’ll examine an HJA roads layer as well.
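
The clustering step could start with a plain k-means over pixel covariates (e.g. standardized elevation, slope, and biomass) stacked into a feature matrix. Below is a self-contained sketch of Lloyd's algorithm with farthest-point initialization, run on synthetic features rather than the actual HJA layers:

```python
import numpy as np

def kmeans(X, k, n_iter=20):
    """Minimal Lloyd's k-means for stratifying pixels by covariate similarity."""
    centers = [X[0]]                       # farthest-point initialization
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign each pixel to its nearest center, then recompute centroids
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Synthetic pixels: two terrain types in (elevation, slope, biomass) space.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 3)), rng.normal(3, 0.1, (20, 3))])
labels, centers = kmeans(X, k=2)
# pixels 0-19 end up in one stratum and pixels 20-39 in the other
```

For the real rasters, pixels would be flattened to rows and the covariates standardized first so that no single layer dominates the distance metric; spatial autocorrelation would still need separate treatment.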

Expected Outcome

I plan to produce maps (or use/improve on already available ones) at a spatial scale relevant to my study so I can identify potential sampling sites. I plan to map biomass across the HJA forest and produce a stratification of factors most closely related to soil CO2 efflux. Depending on resolution of the data and the results of the stratification, I may need to constrain my analysis. I plan to produce a statistical summary of the strata relating and describing the covariates and how much variance is explained by the stratification.


Carbon sequestration is a highly relevant research area where many unknowns still exist. Given that soil is an enormous C reservoir, small changes in soil C stocks can have huge impacts on the rest of the C cycle. As CO2 is a potent greenhouse gas, more release of C from soil can cause greater warming of our planet and can lead to a positive feedback loop where the warming cycle is amplified. It is in our best interest to have a good understanding of current soil C stocks and fluxes in forested systems so we can hypothesize how they might change under different or future conditions. By using biomass as a proxy for soil CO2 efflux, I will identify locations that are likely to have greater CO2 efflux and I will be able to make informed predictions about which drivers are most significantly correlated to CO2 efflux. I will be able to test these analyses in the future using field sampling techniques.

My level of preparation/proficiency

I have limited experience with Arc-Info (GEOG 560), and I’ve used ModelBuilder a few times. I have no experience with GIS programming in Python or R, but I am proficient with general coding in R (three statistics courses and my own data analysis) and am comfortable seeking answers to questions in the R environment. I have no experience with image processing.

What is rural? Creation and comparison of health disparity-inclined rural indices in Texas

A description of the research question that you are exploring.

I am exploring rural classification of counties in the state of Texas by creating two rural indices and comparing them to one another to determine the effects of specific weighted measures on rurality index score. One of the indices will contain basic rural indicator variables, while the other will contain the basic variables plus more complex indicators of rurality. Specifically, I would like to compare the indicator variables to one another to see how much each contributes to an overall rurality score in Texas.

A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

Various rural indicator variables will need to be obtained before combining them into indices. Previous geographical research has indicated that a variety of measures can indicate how rural or urban an area is. Some of these measures include population density, ethnic diversity, land use, household income, road density, percent of population with health insurance, and more. For the majority of these indicator variables, the sources will be basic 2010 US census data, 2014 US census TIGER/Line data, and 2011 national land cover database (NLCD) data. Basic US census data exists at the census block and census tract level in polygons, while NLCD data exists at 30m by 30m spatial resolution in raster grid form and TIGER/Line data exists at 1km spatial resolution in raster grid form.

Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

I expect there will be significant differences in rurality index score for Texas counties when comparing a basic rural index containing only population density, income, and land use to a more complex index that also measures rural/urban status via diversity, percent uninsured, and road density. Rural areas commonly have lower healthcare access, lower average socioeconomic status, and a higher percent Caucasian population than urban areas, so these variables could be indicative of what constitutes rural and urban. I also expect specific variables to contribute significantly more to rurality than others; for example, population density is likely to contribute heavily. More specifically, I expect the spatial and statistical pattern of rurality in Texas to become more dispersed and even across the state when health-related variables are included, because of the increased multidimensionality and contextual factors these variables will provide.

Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.

I am planning to use various methods to convert the census block/tract and raster grid indicator variables to county-level data, likely via zonal statistics or other similar methods in ArcGIS. I would also like to use statistical weighting procedures to create both the “basic” and “complex” rural indices. Some weighting procedures that could work for this include principal component analysis (PCA) and factor analysis. A PCA procedure specifically could be used because of its robust ability to produce indices weighted by the proportion of variance that can be attributed to each variable in the measurement of rurality.
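
The PCA weighting can be sketched by standardizing the county indicator variables and projecting them onto the leading eigenvector of their correlation matrix; the indicator columns below are made up for illustration:

```python
import numpy as np

def pca_index(X):
    """Composite index = first principal component of standardized indicators.

    Returns county scores, the variable weights (loadings), and the share of
    total variance the first component explains.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)            # standardize columns
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    w = eigvecs[:, -1]                  # eigenvector of the largest eigenvalue
    if w.sum() < 0:                     # eigenvector sign is arbitrary; fix it
        w = -w
    share = eigvals[-1] / eigvals.sum() # variance explained by the index
    return Z @ w, w, share

# Hypothetical counties: three indicators tracking one common gradient.
base = np.linspace(0, 1, 10)
X = np.column_stack([base, 2 * base + 1, 5 * base - 3])
scores, weights, share = pca_index(X)
# scores rank the counties along the shared gradient
```

In practice, indicators that decrease with rurality (e.g. population density) would be sign-flipped before standardizing so that all loadings point the same direction.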

Expected outcome: what do you want to produce — maps? statistical relationships? other?

I would like to create maps of Texas comparing county rural index scores for the two indices for visual comparison. In addition, I would like to statistically compare the two indices and determine which specific indicator variables attributed most to the differences in county rurality scores between the two indices.

Significance. How is your spatial problem important to science? to resource managers?

This spatial problem is significant because rural/urban classification is inconsistent in rural health disparities research and is commonly an after-thought in comparison to the health outcome being studied. Existing measures of rurality were not created for health disparities research and are instead most useful for bureaucratic and economic purposes (Meilleur et al., 2013). This research will improve the classification methods for rurality by introducing a more scientific and health research-inclined method. Further, this research statistically compares specific indicator variables within indices to determine those that are most significant for rural classification.

Your level of preparation: how much experience do you have with (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) image processing, (e) other relevant software

I am an intermediate user of ArcGIS but have little experience in modelbuilder and Python GIS programming. I am proficient in statistical programming in R, an intermediate user of ENVI for image processing, and have also used R for spatial analysis.



Meilleur, A., Subramanian, S. V., Plascak, J. J., Fisher, J. L., Paskett, E. D., & Lamont, E. B. (2013). Rural residence and cancer outcomes in the United States: issues and challenges. Cancer Epidemiology and Prevention Biomarkers, 22(10), 1657–1667.