Category Archives: Exercise 2

Exploring GPS Collar Fix Success as a Function of Environmental Characteristics

Global Positioning System (GPS) technology has improved the ability of wildlife biologists to gain insight into wildlife behavior, movements, and space use. This is achieved by fitting wildlife, such as deer, with a collar equipped with a GPS unit. My research relies on GPS data from Columbian black-tailed deer (Odocoileus hemionus columbianus) in western Oregon to assess habitat and space use. Deer were captured and monitored from 2012-2017 in the Alsea, Trask, Dixon, and Indigo Wildlife Management Units. Although this technology offers plentiful opportunities to answer research questions, GPS data are accompanied by errors in the form of missed fixes or inaccurate locations that should be corrected prior to conducting analyses. A common cause of GPS error is physical obstruction, such as topography or vegetation, between the GPS collar and satellites. One way to correct for habitat-induced bias is to place GPS collars at fixed locations with known environmental characteristics and model GPS fix success rate (FSR) as a function of topographic and vegetative attributes. Nine GPS collars were left out for 5-7 day intervals at fixed locations (n=53) in the study area in February and March 2020, on the same fix schedule used on the deer collars. Locations were chosen specifically to represent the variety of terrain and vegetation in the study area. Results from the stationary collar test can be used to create a predictive map of FSR and as weights in subsequent habitat selection regression analyses.

  1. Question

1.1         What is the spatial extent of the influence of sky availability on GPS collar fix success?

1.2         What vegetative and topographic features best predict average GPS collar fix success rates?

  2. Name of the tool or approach that you used.

2.1         I used focal mean statistics (a form of moving window analysis) to derive sky availability values at varying spatial scales. I performed focal mean statistics in R using the ‘raster’ package, and the ‘Extract Multi-Values to Points’ tool in ArcGIS to obtain values from test site locations. I used the ‘lme4’ and ‘MuMIn’ packages in R to perform linear regression modeling and model selection.

2.2         I used generalized linear mixed regression modeling to predict fix success rates as a function of environmental attributes. I used the ‘lme4’ and ‘MuMIn’ packages in R to perform linear regression modeling and model selection.

Covariate | Data type | Resolution | Source | Software | Tool/Package
Canopy cover (%) | Raster | 30×30 m | GNN | – | –
Slope (°) | Raster | 10×10 m | Derived from DEM raster | ArcGIS | Slope (Spatial Analyst)
Aspect (°) | Raster | 10×10 m | Derived from DEM raster | ArcGIS | Aspect (Spatial Analyst)
Land cover | Vector | 30×30 | Oregon Statewide Habitat Map (2018) | – | –
Sky availability (%) | Raster | 10×10 m | Derived from DEM | ArcGIS | Sky-View Factor (Relief Visualization Toolbox)
Sky availability (%) at varying focal windows | Raster | 10×10 m | Derived from sky availability raster | R | ‘raster’
Table 1. Most explanatory variables for the stationary collar test were measured on the ground by technicians (canopy cover, slope, aspect, land cover type) but I created the sky availability variable (described in 3.1). The variables described here will be used in ‘Exercise 3’ to create a predictive fix success map across the study area.

  3. Brief description of steps you followed to complete the analysis.

Response variable: Fix Success Rate

GPS fix data were downloaded from each of the 9 GPS collars, and FSR was calculated for each test site by dividing the number of successfully obtained fixes by the total number of attempted fixes. Collar data include coordinates, date, time, the number of satellites used to obtain a location, and dilution of precision (DOP). DOP is calculated by the GPS collar and is a measure of accuracy based on satellite quantity and position: the higher the DOP, the less accurate the fix record.

3.1         Moving Window Analysis

I loaded the sky availability raster into R and created new rasters with the ‘focal’ function. Six window sizes were chosen subjectively, based on what distances seemed reasonable to test: 30x30 m, 50x50 m, 110x110 m, 210x210 m, 510x510 m, and 1010x1010 m. With this technique, each focal cell is reclassified as the average sky availability within the moving window.
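
A minimal sketch of this step, assuming the 10 m sky availability raster is stored as ‘skyavail.tif’ (the file path and object names are placeholders):

library(raster)

sky <- raster("skyavail.tif")   # 10 m sky availability raster (placeholder path)

# Window widths in cells (10 m cells): 3, 5, 11, 21, 51, and 101 cells
# correspond to 30, 50, 110, 210, 510, and 1010 m squares
widths <- c(3, 5, 11, 21, 51, 101)
focal_layers <- lapply(widths, function(w) {
  focal(sky, w = matrix(1, w, w), fun = mean, na.rm = TRUE)
})
focal_stack <- stack(focal_layers)
names(focal_stack) <- paste0("f", widths)

writeRaster(focal_stack, "skyavail_focal.tif", overwrite = TRUE)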

After creating new raster data, I imported the raster stack of different scales into ArcGIS, along with the coordinates of the stationary collar test locations (n=53). I extracted the sky availability values using the ‘Extract Multi-Values to Points’ tool, and exported the attribute table as a .csv file to bring back into R.

A linear mixed model was used to predict FSR as a function of sky availability at each test site, with collar ID as a random intercept because fix success can also vary among individual collars. I ran 7 univariate models and used corrected Akaike’s Information Criterion (AICc) for model selection. The top model was used in subsequent analyses.

library(lme4)   # lmer
library(MuMIn)  # model.sel

# Univariate mixed models: FSR as a function of scaled sky availability
# at each moving-window size, with collar ID as a random intercept
m.SA1  <- lmer(Fix.Success.Rate ~ skyavail_s + (1|CollarID), data = cont.data)  # original 10 m cell
m.f3   <- lmer(Fix.Success.Rate ~ f3_s + (1|CollarID), data = cont.data)        # 30 m window
m.f5   <- lmer(Fix.Success.Rate ~ f5_s + (1|CollarID), data = cont.data)        # 50 m window
m.f11  <- lmer(Fix.Success.Rate ~ f11_s + (1|CollarID), data = cont.data)       # 110 m window
m.f21  <- lmer(Fix.Success.Rate ~ f21_s + (1|CollarID), data = cont.data)       # 210 m window
m.f51  <- lmer(Fix.Success.Rate ~ f51_s + (1|CollarID), data = cont.data)       # 510 m window
m.f101 <- lmer(Fix.Success.Rate ~ f101_s + (1|CollarID), data = cont.data)      # 1010 m window

# AICc-based model selection across the candidate set
skyavail.AICc <- model.sel(m.SA1, m.f3, m.f5, m.f11, m.f21, m.f51, m.f101)
Univariate generalized linear mixed model set that was used to compare FSR as a function of average sky availability at varying spatial moving window sizes.

3.2         Collar FSR Linear Regression

Most variables were measured on the ground by technicians (canopy cover, slope, aspect, land cover type) but sky availability was obtained as described above.

Using a generalized linear mixed model framework, I developed competing models using test site as the sample unit and collar ID as the random intercept. Models were considered competitive if they were within 2 AICc units of the top model, and model selection was performed in a tiered approach. The first-tier model set included univariate models of scaled covariates, interactions, and any non-linear forms that seemed appropriate (see the sketch below). Competitive models would proceed to subsequent modeling efforts.
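
A minimal sketch of what the first tier looks like in code, using the model names from Table 3 (only a subset of covariates is shown, and the data object is the same one used above):

library(lme4)
library(MuMIn)

# illustrative first-tier models: scaled covariates plus a quadratic canopy cover term,
# with collar ID as a random intercept
m.null     <- lmer(Fix.Success.Rate ~ 1 + (1|CollarID), data = cont.data)
m.CanCov_s <- lmer(Fix.Success.Rate ~ CanCov_s + (1|CollarID), data = cont.data)
m.CanCov2  <- lmer(Fix.Success.Rate ~ CanCov_s + I(CanCov_s^2) + (1|CollarID), data = cont.data)
m.slope_s  <- lmer(Fix.Success.Rate ~ slope_s + (1|CollarID), data = cont.data)
m.aspect_s <- lmer(Fix.Success.Rate ~ aspect_s + (1|CollarID), data = cont.data)

model.sel(m.null, m.CanCov_s, m.CanCov2, m.slope_s, m.aspect_s)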

  4. Brief description of results you obtained.

4.1         Moving Window Analysis

Model selection (Table 2) produced an ambiguous outcome. Sky availability averaged within a 110x110 m moving window best predicted variability in FSR; however, the evidence for this is not very strong. All models were competitive (delta AICc < 2), and the top model carried only 3% more of the weight than the next most competitive model.

Model | df | AICc | delta | weight
m.f11 | 4.00 | -47.57 | 0.00 | 0.18
m.f5 | 4.00 | -47.26 | 0.31 | 0.15
m.f3 | 4.00 | -47.22 | 0.34 | 0.15
m.f101 | 4.00 | -47.08 | 0.49 | 0.14
m.f21 | 4.00 | -47.00 | 0.57 | 0.13
m.SA1 | 4.00 | -46.93 | 0.64 | 0.13
m.f51 | 4.00 | -46.64 | 0.93 | 0.11
Table 2. Model output from the sky availability moving window analysis. All models were competitive (within 2 AICc units of the top model). The ‘top’ model was used in subsequent modeling for the purposes of this exercise.

4.2         Collar FSR Linear Regression

Judging from the raw data, the only nonlinear term that seemed appropriate was the quadratic form of canopy cover (Figure 1). This model was the most competitive, carrying 73% of the weight in the model set (Table 3), with a conditional coefficient of determination of 0.72. The scaled (linear) covariate of canopy cover was the next most competitive model, but with a delta AICc slightly greater than 2.

Figure 1. GPS fix success rate as a function of environmental attributes; these plots were used to inform non-linear relationships to include in model sets.

Model | AICc | delta | weight
m.CanCov2 | -61.31 | 0.00 | 0.76
m.CanCov_s | -58.60 | 2.70 | 0.20
m.null | -55.29 | 6.01 | 0.04
m.aspect_s | -49.00 | 12.31 | 0.00
m.SkyAvail | -47.57 | 13.74 | 0.00
m.slope_s | -47.43 | 13.87 | 0.00
m.SA.aspect_s | -32.83 | 28.47 | 0.00
m.SA.aspect_s1 | -32.83 | 28.47 | 0.00
m.Landcover | -32.52 | 28.79 | 0.00
m.SA.slope_s | -31.91 | 29.40 | 0.00
m.SA.slope_s1 | -31.91 | 29.40 | 0.00
m.full | -9.92 | 51.39 | 0.00
Table 3. Model selection results relating FSR to topographic and vegetative variables in western Oregon. The most competitive model was the quadratic form of canopy cover.

  5. Critique of the method – what was useful, what was not?

5.1         Moving Window Analysis

My next step is to re-examine the focal window type I used and try an annulus-shaped window instead of the full square to calculate average sky availability. I hope this will give a clearer result regarding which spatial scale of sky availability most strongly influences FSR. The ‘focal’ function itself was easy and efficient to use.

5.2         Collar FSR Linear Regression

I was a little surprised that topography didn’t play a stronger role in influencing FSR, but the importance of canopy cover does make sense. However, I am surprised that the quadratic form of canopy cover explained the most variation in FSR, because it doesn’t make sense that FSR would also be low at lower canopy cover proportions.

Moving forward, I plan to filter the fix data to reclassify locations that are likely to be inaccurate, using cutoff values for the number of satellites and DOP. Doing so will decrease the overall FSR, introduce more variability into the data, and could potentially show different results.

Exercise 2: Assessing relationship between zooplankton density and kelp

Question asked

Following exercise 1 (where I compared different interpolations to estimate zooplankton density), I wanted to make my interpolations more ecologically informed. I hypothesize that zooplankton will be found closer to kelp and that zooplankton density will decrease with increasing distance from kelp. Therefore, my question for exercise 2 was: “Does zooplankton density decrease with increasing distance from kelp?”

Approaches/tools used and methods

I used a simple linear regression in RStudio to address this question.

At the end of every field season, we use a theodolite to trace the outlines of surface kelp, which produces many lat/lon points. Therefore, I first needed to create a kelp polygon layer from these points in R. Since I had never worked with spatial data in R before, it took me quite some time to figure out how to do this and to ensure that I was working in the correct coordinate system.
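
Roughly, this polygon-building step looks like the sketch below using the sf package (the file name, the patch-ID column, and the UTM zone are placeholders, and it assumes the trace points are ordered along each kelp outline):

library(sf)
library(dplyr)

kelp_pts <- read.csv("kelp_trace_points.csv")              # columns: lon, lat, patch_id (placeholders)
kelp_sf  <- st_as_sf(kelp_pts, coords = c("lon", "lat"), crs = 4326)

# one polygon per traced kelp patch (points assumed ordered along the outline)
kelp_polys <- kelp_sf %>%
  group_by(patch_id) %>%
  summarise(do_union = FALSE) %>%
  st_cast("POLYGON")

# project to a metric CRS so later distances come out in metres (UTM zone is a placeholder)
kelp_utm <- st_transform(kelp_polys, 32610)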

Next, I needed to calculate the distance between each of the sampling stations and the edge of each kelp polygon, and then select the kelp patch with the minimum distance for each of my 12 sampling stations.

Finally, I plotted the zooplankton densities for each station against the minimum distance to kelp, and ran a simple linear regression to estimate the relationship between zooplankton density and distance to kelp.
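
The distance and regression steps can be sketched as follows, continuing from the kelp polygons above (the station file, column names, and CRS are placeholders/assumptions):

stations <- read.csv("stations.csv")                       # columns: station, lon, lat, zoop_density (placeholders)
stations_utm <- st_transform(
  st_as_sf(stations, coords = c("lon", "lat"), crs = 4326), 32610)

# distance from every station to every kelp patch, then the nearest patch per station
dist_m <- st_distance(stations_utm, kelp_utm)              # stations x patches matrix, in metres
stations_utm$min_dist_m <- apply(dist_m, 1, min)

# simple linear regression; the logged version mentioned below would be
# lm(log(zoop_density) ~ log(min_dist_m), data = stations_utm)
fit <- lm(zoop_density ~ min_dist_m, data = stations_utm)
summary(fit)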

Results

I was only able to run this analysis on the 2017 and 2018 data because I have not yet finished scoring the 2019 images for zooplankton density. However, the relationship is the same for both 2017 and 2018: zooplankton density decreases with increasing distance to kelp.

Since there is a big jump in distances to kelp (this is because two of the sampling stations are ‘controls’ which are over sandy bottom habitat where there is no kelp nearby), I also decided to log both axes to try and normalize the distance axis.

Critique

While I would have liked to explore a more complex tool or technique, this exercise taught me to always keep my data in mind and acknowledge their limitations. I only have 12 sampling stations, so I am rather limited in the analyses I can reliably run. Sometimes the data truly conform to a simple solution; in my case, simple linear regression was the most straightforward and obvious way to answer my question about whether zooplankton density and distance to kelp are correlated. I feel fairly confident in the R2 and p-values from the regressions above, so my next step will be to figure out how to incorporate this linear function into my interpolations.

Exercise 2: Relationship between Road Density and Landslide Buffer Zones

1. Questions

In Exercise 1, I tried to generate point data for landslides across all of Oregon. Due to survey limitations, the resulting hot spot analysis did not represent the statewide landslide situation well. So in Exercise 2, I first generate a hot spot map based on landslide volume and select about 30 points from it to create buffer zones. I then calculate the road density of the analysis area. By comparing the relationship between road density and buffers of different radii, I analyze at which spatial scale the relationship is strongest.

2. Name of the tool or approach that you used

a) Hot Spot Analysis based on Volume of Landslide

Hot spot analysis was run with landslide volume as the analysis field. From the hot spot map, I identified the clustered points and selected about 30 of them to create buffer zones as reference objects for later comparison.

b) Create Buffer and Multiple Ring Buffer

Buffer zones were created around the selected points with two different radii: 250 meters and 500 meters. The reason for creating the buffers is to compare them with road density and find at which spatial scale the relationship is strongest.

c) Line Density

In ArcGIS Pro, the roads in the analysis region were imported, and the Line Density tool was used to calculate road density as a reference for later comparison.

3. Results

Figure 1. Portland Landslide Hot Spot Analysis

Figure 1 shows a hot spot map of landslides in the Portland area based on landslide volume. The landslides in the mountains near Oregon City are similar in volume and clustered, so the buffers in Exercise 2 were created around these points.

Figure 2. Main Region Road Density Map
Chart 1. Main Region Road Density

Figure 2 shows the distribution of road density in the analysis area; the darker the color, the greater the road density. By observation, landslides are roughly concentrated in the areas with the densest roads. Chart 1 shows the average road density in the area, which is 0.316.

Figure 3. 250m Buffer zone map
Figure 4. 500m Buffer zone map

Figures 3 and 4 show the relationship between roads and landslides in buffer zones with different radii. Although most of the landslides occurred around the mountains, within both the 250 m and 500 m buffers, greater road density corresponds to greater impact. Analyzing Figures 2, 3, and 4 together, where road density around the mountains is greater, more landslides occur and the affected area is larger.

4. Critique of the method

In Exercise 2, my data are mainly concentrated around Oregon City and Portland, so the hot spot analysis was efficient. The hot spot analysis based on landslide volume shows whether the landslide data are clustered in the Portland area and where the data are most tightly clustered.
When using the buffer tool, because different points represent different landslide volumes, I was not sure whether to create buffers scaled by volume or to give all points a uniform radius. For this exercise, however, the tool was effective.

Hongyu Lu

Lab blog – Exercise 2 _ Crop fires and human health

  1. Question that you asked

With the data I had, I was interested in examining whether proximity to crop fires was associated with individuals’ health outcomes. I restricted my analysis to crop fires because I want to explore whether there is a health-related argument for making the burning of crop residue – the source of these fires – illegal.

  2. Name of the tool or approach that you used.

I estimated the following logit regression model:

H_it = β_0 + β_1 EXPOSURE_it + e_it

where H_it is a binary variable for reported illness (1 if the individual reported illness, 0 otherwise) and EXPOSURE_it is the count of fires experienced by the household within a 5, 10, or 15 km distance, respectively. As a next step, it may be possible to control for other factors at a broader scale, e.g., by including district-level factors such as population and road density in the regression model.
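
For reference, a minimal sketch of this logit specification as it would be coded in R (variable and file names are placeholders):

# logit model of reported illness on fire exposure within a 5 km radius
dat <- read.csv("individual_exposure.csv")        # placeholder merged individual-level file
fit_5km <- glm(illness ~ exposure_5km, data = dat, family = binomial(link = "logit"))
summary(fit_5km)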

  3. Brief description of steps you followed to complete the analysis.

In my data, I had point locations of daily fires as well as the addresses of households.

  (i) In order to better manage my data, I assigned a week of fire to my entire daily fire dataset and then aggregated the data into 52 weekly fire files. Each fire had an associated longitude and latitude.
  (ii) I wrote code in Stata to calculate weekly counts of fires within a 5 km radius of each household (see the R sketch after this list for the equivalent spatial step). This resulted in 52 variables (weekly fires) with values corresponding to the count of fires within the buffer distance for each household.
  (iii) I then summed the fires for the 10 weeks prior to the date of the interview to calculate the exposure variable for each household.
  (iv) I repeated steps (ii) and (iii) for 10 km and 15 km distances.
  (v) As a last step before the regression, I merged this with the individual-level data file.
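
The buffer-count step in (ii) was written in Stata; a rough equivalent in R using sf might look like this (file names, column names, and the projected CRS are assumptions):

library(sf)

households <- st_transform(
  st_as_sf(read.csv("households.csv"), coords = c("lon", "lat"), crs = 4326),
  32643)                                          # a metric CRS for the study area (placeholder)
fires_wk <- st_transform(
  st_as_sf(read.csv("fires_week01.csv"), coords = c("lon", "lat"), crs = 4326),
  32643)

# count of fires within 5 km of each household for this week
households$fires_wk01_5km <- lengths(
  st_is_within_distance(households, fires_wk, dist = 5000))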
  4. Brief description of results you obtained.

Looking at the distribution of our sample (Table 1), it does not appear that any single demographic group has disproportionate exposure to fires. Across genders and all age cohorts, about 40% of individuals were exposed to at least one fire within a 15 km distance, and about 15% to at least one fire within a 5 km distance over the past 10 weeks. The count and percentage of individuals exposed to fires within 5 km is highest in the province of Punjab. Given the distribution across provinces, it may be worthwhile to restrict the logit regression to a sample based on Punjab and Sindh.

Group | 5 km N (%) | 10 km N (%) | 15 km N (%)
Male | 13,429 (14.8) | 27,437 (30.4) | 37,385 (41.4)
Female | 13,613 (14.9) | 27,797 (30.6) | 37,794 (41.6)
Age: up to 5 years | 4,187 (15.4) | 8,550 (31.4) | 11,795 (43.3)
Age: 6-17 years | 10,585 (14.9) | 21,378 (30.3) | 29,053 (41.3)
Age: 18-60 years | 11,123 (14.7) | 22,907 (30.2) | 31,084 (41)
Age: greater than 60 | 1,147 (15) | 2,399 (31.4) | 3,248 (42.5)
Province: Punjab | 20,722 (32.8) | 37,042 (58.6) | 45,086 (71.4)
Province: Sindh | 5,583 (10.5) | 14,746 (27.6) | 24,331 (45.6)
Province: KPK | 643 (1.5) | 2,940 (6.7) | 4,794 (11)
Province: Baluchistan | 94 (0.4) | 506 (2.4) | 968 (4.5)
Occupation: Agriculture | 1,383 (14.6) | 3,049 (32.3) | 4,614 (48.9)
Occupation: Mining/Fisheries | 1,951 (17.4) | 3,889 (34.7) | 4,850 (43.3)
Table 1: Sample distribution.
Note: The table presents the count (N) and percentage of individuals with at least 1 fire within a 5 km, 10 km, and 15 km buffer distance in a 2-month period. Percentages are rounded to one decimal place.

The results of the logit analysis are presented in Table 2 below. The magnitude as a percentage of the mean for two of the dependent variables – respiratory illness and blood pressure – suggests a substantial positive association and is consistent with the original hypothesis. However, these are preliminary estimates, and other explanatory variables will need to be included to gauge the impact of exposure to crop fires.

Table 2: Logit regression for the impact of exposure on reported illness
Dependent variable | 5 km (1) | 10 km (2) | 15 km (3)
Respiratory illness (mean = 0.013) | 0.017* (0.009) | 0.007** (0.003) | 0.003* (0.002)
Cough, cold, fever (mean = 0.15) | 0.0002 (0.003) | 0.002* (0.001) | 0.001** (0.001)
Blood pressure (mean = 0.013) | 0.032*** (0.000) | 0.016*** (0.000) | 0.0079*** (0.000)
*** p<0.01, ** p<0.05, * p<0.1. Standard errors reported in parentheses.

  5. Critique of the method – what was useful, what was not?

The logit regression is well suited to a binary dependent variable, and in this case it is useful for estimating the overall association between illness and exposure. It is also possible to predict errors from the regression, which can be tested for spatial autocorrelation; if the error terms are spatially correlated, it would suggest that the incidence of illness is spatially correlated. However, I was also interested in exploring the temporal distribution of the fires, and I was not able to use the logit analysis for that purpose.

Unmanned aerial vehicle (UAV)-based photogrammetric approach to delineate tree seedlings

Exercise 2: Assessing the growth of artificially regenerated seedlings using UAV imagery

  1. The question addressed in this study

Evaluating tree health and growth after artificial regeneration is an important task, and monitoring seedlings after regeneration provides vital information for forest managers making future decisions. However, monitoring needs can be unique to site characteristics and may be complex depending on site accessibility and terrain conditions, making conventional monitoring approaches challenging. With the emergence of remote sensing technology, and especially unmanned aerial vehicles (UAVs), there is a novel approach for seedling detection and monitoring, even in challenging terrain and remote areas (Buters et al., 2019).

After artificial regeneration, the growth and health of each seedling may depend on different parameters such as soil type, exposure to sun and wind, drainage conditions, space constraints, and insect and disease susceptibility (International Society of Arboriculture, 2011). UAV-based monitoring approaches have high potential to classify and identify plants in disturbed forest environments and to monitor plant health (Berni et al., 2009; Lehmann et al., 2017). Thus, in this exercise, as a first step, I am interested in developing an automated method to detect individual seedlings and their attributes (i.e., location, height, canopy area, number of trees in a given plot, etc.) from UAV-based imagery. The second step of this study focuses on evaluating how the surrounding surface material affects the health and growth of seedlings.

2. Name of the tool or approach used, and steps followed.

Step 1:  Detecting individual seedlings and their attributes

Canopy height model

I used R software packages to detect the seedlings (automatically) from the ortho-mosaic image.

  • Packages used: lidR, raster, sp, rgdal, and EBImage

As the initial step, point cloud data were extracted using AgiSoft software (Fig. 1(a)). Then the lasground function in the lidR package was used to classify points as ground or not ground using the cloth simulation filter (Fig. 1(b)). Points classified as ‘ground’ are assigned a value of 2, according to the LAS specification (The American Society for Photogrammetry & Remote Sensing, 2011). In the next step, I used the lasnormalize function in lidR to remove the topography from the point cloud (Fig. 1(c)). Next, the normalized point cloud was cropped with the area of interest (AOI) shapefile (Fig. 1(d)) using the lasclip function.

Figure 1. Map of (a). point clouds generated over the study site, (b). point clouds after the cloth simulation function performed, (c). normalized point cloud, and (d). finalized point cloud mapped in AOI.

I then used the output of lasclip to build the canopy height model with the grid_canopy function, using 0.01 m as the resolution, which matches the resolution of the raster image. The dsmtin algorithm was used to create the digital surface model. After that, a variable window filter algorithm was used to identify treetops from the canopy height model. The minimum height for detecting a seedling was set to 0.2 m; all canopy height pixels beneath this value were masked out. This threshold was used to minimize the effect of woody debris and small shrubs. Finally, the detected seedling locations were exported to ArcMap and plotted on top of the AOI raster to assess the performance of the algorithm.
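
Putting the steps above together, a rough sketch of the workflow using the lidR 2.x function names mentioned in the text is shown below (file paths and the AOI shapefile are placeholders, and ForestTools::vwf is shown as one possible implementation of the variable window filter):

library(lidR)
library(rgdal)
library(ForestTools)

las <- readLAS("agisoft_pointcloud.las")            # point cloud exported from AgiSoft (placeholder path)
las <- lasground(las, csf())                        # cloth simulation filter; ground points get class 2
las <- lasnormalize(las, tin())                     # remove topography from the point cloud
aoi <- readOGR("aoi.shp")                           # area-of-interest polygon (placeholder)
las_aoi <- lasclip(las, aoi)

chm <- grid_canopy(las_aoi, res = 0.01, dsmtin())   # 0.01 m canopy height model

# treetop detection with a variable window filter and a 0.2 m minimum height
win_fun <- function(h) h * 0.05 + 0.3               # example window-size function (an assumption)
tops <- vwf(CHM = chm, winFun = win_fun, minHeight = 0.2)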

Figure 2. Map of (a). area of interest (b). canopy height model, (c). locations detected of seedlings, and (d). canopy areas of detected seedlings.

Step 2: Geographically weighted regression

Image classification tools available in ArcMap 10.7:

  1. Supervised image classification (SIL): Maximum Likelihood Algorithm (explained in blogpost 1)
  2. Spatial Analyst Tool: Tools available in Zonal
  3. Geographically weighted regression

Quantifying the surface materials

In order to identify the surface material surrounding each seedling, a supervised classification was performed. The supervised classification approach is more accurate and minimizes the effect of shadow and other noise compared to unsupervised classification. The raster image was classified into five classes, including tree seedlings, woody materials, grass, and barren land.

As the next step, I created three buffer zones around each seedling, with radii of 1 m, 2 m, and 3 m (Fig. 3(a)). Next, the ‘Tabulate Area’ tool available in the Spatial Analyst toolbox was used to extract the corresponding area of each class (Fig. 3(b)).

Figure 3. Map of (a). 1-meter buffer zones around each tree seedling, and (b). a schematic diagram showing the extracted classes from the classified image by using a 1-meter buffer zone.

Geographically weighted regression

Geographically weighted regression is an extension of the traditional regression approach that provides a local estimate for each location instead of a single regression for the entire area. In this approach, observations are weighted according to their proximity to a point i, which is determined by the kernel size. “This ensures that the weighting of an observation is no longer constant in the calibration, but instead varies with point i. As a result, observations closer to point i have a stronger influence on the estimation of the parameters for location i.” (The Landscape Toolbox, 2012)
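
For reference, the general form of the model can be written as

y_i = β_0(u_i, v_i) + Σ_k β_k(u_i, v_i) x_ik + ε_i

where (u_i, v_i) are the coordinates of location i, so each coefficient β_k is a function of location rather than a single global value.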

To determine the effect of surrounding surface materials on seedling growth, two types of geographically weighted regression analyses were performed using ArcMap 10.7: (1) the area of bare ground (Fig. 4) and (2) the area of woody debris (Fig. 5) within the three buffer zones (1 m, 2 m, and 3 m) were used as the independent variable, while estimated seedling height was the dependent variable.

  • Effect of bare ground

Generally, bare ground reflects subsurface soil conditions such as nutrient deficiency, the behavior of the subsurface water table, and water-holding capacity. Hence, the presence of bare ground around a seedling can be used as an indirect indicator for predicting the health condition of that seedling.

Figure 4. Map of geographically weighted regression by considering seedling height as the dependent variable and area of bare ground within (a). 1-meter buffer, (b). 2-meter buffer, and (c). 3-meter buffer as the independent variable.

  • Effect of woody debris

The presence of woody debris around seedlings can have both positive and negative effects. For example, old tree stumps and debris can act as microsites: seedlings shaded by stumps and logs tend to grow well, particularly on hot, dry sites (Landis et al., 2010). However, woody debris and slash may also act as obstacles, limiting seedlings’ access to sunlight and their growth. To study the effect of woody debris on seedling growth, I performed another geographically weighted regression using the three buffer sizes (Fig. 5).

 Figure 5. Map of geographically weighted regression by considering seedling height as the dependent variable and area of woody debris within (a). 1-meter buffer, (b). 2-meter buffer, and (c). 3-meter buffer as the independent variable.

Additionally, I created a map of the normalized difference green/red index (NDGRI) to validate the results obtained by geographically weighted regression (Fig. 6(a)). Furthermore, I generated a geographically weighted regression map with seedling height as the dependent variable and the canopy area of each seedling within the 1-m buffer zone as the independent variable, assuming that healthy seedlings are taller and have a larger canopy area.
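
A minimal sketch of the NDGRI layer, assuming the orthomosaic's red and green bands are bands 1 and 2 (the band order and file names are assumptions):

library(raster)

ortho <- brick("orthomosaic.tif")        # placeholder path
red   <- ortho[[1]]
green <- ortho[[2]]

# green/red normalized difference: (green - red) / (green + red)
ndgri <- (green - red) / (green + red)
writeRaster(ndgri, "ndgri.tif", overwrite = TRUE)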

Figure 6. Map of (a) the normalized difference green/red index (NDGRI), and (b) geographically weighted regression with seedling height as the dependent variable and area of canopy cover within a 1-meter buffer as the independent variable.

3. Results

  • Bare ground coefficients:

Results from the geographically weighted regression of height on bare ground showed a decreasing trend in coefficient values with increasing buffer size (Fig. 7(b)). Furthermore, within a given buffer zone the coefficient values do not vary drastically (change is only observable in the fourth decimal place). These results suggest that the presence of bare ground within the 1-meter and 2-meter buffer zones has a considerable impact on seedling growth. The observed trends indicate that seedlings located in the northwest part of the study area have lower coefficient values, while seedlings located in the southeast part have relatively higher negative coefficient values (Fig. 7(b)). A similar trend was observed when the area of woody debris was used as the independent variable: woody debris within a 1-meter buffer zone shows positive coefficients for seedlings located in the northwest part of the study region, and the magnitudes of the coefficients decrease with increasing buffer size. Additionally, I evaluated the variation of the coefficients (from both geographically weighted regressions) with seedling heights (Fig. A1, Appendix A). The observed results indicate that taller seedlings tend to show either positive or smaller negative coefficients.

Figure 7. (a) Map of seedling heights and corresponding seedling IDs; plots of coefficients from geographically weighted regression with seedling height as the dependent variable and (b) area of bare ground and (c) area of woody debris within (1) 1-meter, (2) 2-meter, and (3) 3-meter buffers as independent variables. Coefficients are on the x-axis and seedling ID on the y-axis.

From the observed results, we can interpret that the area of bare ground and the presence of woody debris have an impact on the growth of seedlings. For example, trees growing surrounded by more woody debris have the highest (negative) coefficient values and the lowest heights. Similarly, areas dominated by bare ground show a negative effect on seedling growth. The results illustrated in Figure 6 show that trees growing where bare ground and woody debris are prominent do not form a healthy seedling population. Overall, the geographically weighted regression results indicate that the type and amount of material around the seedlings is important to their health and growth.

  • Possible background factors

Woody debris: generally, the presence of woody debris and its decay products can provide nutrients to the soil and help seedling growth (Orman et al., 2016). However, this process may vary with the size of the woody debris. For example, coarse debris takes longer to degrade and can act as an obstacle to seedling growth; this can be observed in the southeast part of the study region (Fig. 2(a) and Fig. 8(a)). Furthermore, the northwest part of the region has smaller woody debris, which may degrade at a faster rate and provide some nutrients to the soil (Fig. 2(a) and Fig. 8(b)).

Additionally, down wood acts as a moisture reservoir and provides persistent microsites for regenerating seedlings (Amaranthus et al., 1989). However, excess moisture may negatively affect seedling growth by rotting the root system, and the presence of a large amount of woody debris may tend to accumulate more moisture and thus act adversely on seedling growth (as explained under Geography).

Figure 8. Seedlings surrounded by (a). coarse woody debris, and (b). small woody debris.  

Bare ground: visual observation of bare ground can provide some essential clues about subsurface soil conditions; for example, bare ground can indicate a lack of nutrients, problems associated with the groundwater table, soil pH, subsurface geology, etc. Hence, it is essential to have some prior information about subsurface soil conditions to understand the possible factors affecting seedling growth. Overall, seedling growth is reduced where bare ground is present, so we can interpret that poor soil conditions hinder seedling growth.

Geography: To identify the possible effects of terrain conditions on the results obtained from geographically weighted regression, I generated a digital elevation model (Fig. 9) and searched for prior information on site conditions. The digital elevation model suggests that the majority of healthy seedlings are located in the northwest part of the study area at relatively high elevations (Fig. 2(a) and Fig. 9). In contrast, seedlings growing under stressed conditions in the southeast part of the study region are located in lower-elevation areas (Fig. 2(a) and Fig. 9). Consequently, we can hypothesize that the accumulation of excess water (groundwater or surface water) in lower-elevation areas negatively affects seedling growth. This might also be an indirect cause of the bare ground in lower-elevation areas (i.e., stagnant water may damage the root systems of grass, which then dies and exposes bare ground). Similarly, the presence of excess water may hinder the growth of the seedlings themselves, and groundwater combined with woody debris may increase moisture in the subsurface soil layer, acting as a negative factor for seedling growth, especially for root systems.

Figure 9. Digital elevation model of the study area.

4. Critique

Overall, the automated detection of seedlings and the derived attributes show promising results. However, the presence of woody debris may affect tree detection, especially for small, unhealthy seedlings; for example, the model can falsely detect piles of woody debris as seedlings. According to the canopy height model, most woody debris is less than 0.2 m tall, so to avoid this type of interference I introduced the 0.2 m threshold in the canopy height model.

Additionally, the lack of prior background information (i.e., ground conditions, soil and hydrological conditions, quantitative information on woody debris, etc.) made interpreting the results of this study quite challenging. Therefore, the collection of in-situ information would be beneficial for future research. A prior assessment of the woody debris and its state of decay would also be a valuable supporting attribute for estimating seedling health.

Appendix A

Figure A1. Estimated coefficients from geographically weighted regression for the area of bare ground (top) and area of woody debris (bottom)  with seedlings height.

References

Amaranthus, M.P.; Parrish, D.S.; Perry, D.A. 1989. Decaying logs as moisture reservoirs after drought and wildfire. In: Alexander, E., ed. Proceedings of the watershed 1989 symposium on stewardship of soil, air, and water resources. R10-MB-77. Juneau, AK: U.S. Department of Agriculture, Forest Service, Alaska Region: 191–194.

 Berni, J.A.J., Zarco-Tejada, P.J., Sepulcre-Cantó, G., Fereres, E., Villalobos, F., 2009. Mapping canopy conductance and CWSI in olive orchards using high resolution thermal remote sensing imagery. Remote Sens. Environ. 113, 2380–2388. https://doi.org/10.1016/j.rse.2009.06.018

Buters, T., Belton, D., Cross, A., 2019. Seed and Seedling Detection Using Unmanned Aerial Vehicles and Automated Image Classification in the Monitoring of Ecological Recovery. Drones 3, 53. https://doi.org/10.3390/drones3030053

International Society of Arboriculture., 2011. Tree Selection and Placement [WWW Document]. Tree Sel. Place. URL https://www.treesaregood.org/portals/0/docs/treecare/TreeSelection.pdf

Landis, T.D., Dumroese, R.K., Haase, D.L., 2010. Seedling Processing, Storage, and Outplanting, in: The Container Tree Nursery Manual.

Lehmann, J.R.K., Prinz, T., Ziller, S.R., Thiele, J., Heringer, G., Meira-Neto, J.A.A., Buttschardt, T.K., 2017. Open-Source Processing and Analysis of Aerial Imagery Acquired with a Low-Cost Unmanned Aerial System to Support Invasive Plant Management. Front. Environ. Sci. 5. https://doi.org/10.3389/fenvs.2017.00044

Orman, O., Adamus, M., Szewczyk, J., 2016. Regeneration processes on coarse woody debris in mixed forests: Do tree germinants and seedlings have species-specific responses when grown on coarse woody debris? ResearchGate. URL https://www.researchgate.net/publication/304404774_Regeneration_processes_on_coarse_woody_debris_in_mixed_forests_Do_tree_germinants_and_seedlings_have_species-specific_responses_when_grown_on_coarse_woody_debris (accessed 5.8.20).

The American Society for Photogrammetry & Remote Sensing, 2011. LAS SPECIFICATION VERSION 1.4 – R13.

The Landscape Toolbox, 2012. Geographically Weighted Regression [WWW Document]. Landsc. Toolbox. URL https://wiki.landscapetoolbox.org/doku.php/spatial_analysis_methods:geographically_weighted_regression

The spatial and temporal correlation between Particulate Organic Phosphorus (POP) and Primary Production (PP) in the Bermuda Ocean

Question

What is the spatial and temporal correlation between Particulate Organic Phosphorus (POP) and Primary Production (PP) in the Bermuda Ocean?

My raw data are awkward in that the dates and coordinates of the POP and PP samples do not match each other (only eight records match exactly), so I needed to adjust the analysis in several ways:

  1. For spatial correlation, I expected that the POP coordinates nearest to PP would have a very high correlation. My hypothesis is that primary production is affected by phosphorus coming from dust deposition. The sampling location is in the ocean southeast of Bermuda, so the dust deposition is expected to originate from the Sahara Desert.
  2. For temporal correlation, I divided the correlations by the four seasons. Since the site is in the Northern Hemisphere, winter is December to February, spring is March to May, summer is June to August, and autumn is September to November. Dust from the Sahara is transported in summer; heavier particles are deposited first, nearer the Sahara Desert, and smaller particles are deposited later by precipitation. Based on this, the hypothesis is that there will be a high correlation between POP and PP in autumn and winter.

Tool and Approach

At first, I wanted to use neighborhood analysis; however, after examining the raw data, I instead divided the data based on the nearest distance between POP and PP samples.

  1. ArcMap 10.7.1. I used a table join to combine the POP and PP data based on date, since the coordinates differ and the FID fields have different lengths.
  2. MATLAB. I used MATLAB to average the data by distance and to generate the R-squared values and scatter plots.

Description of Analysis  

  1. Spatial Category

The spatial categories are based on the distance between POP and PP sample locations, which I calculated with the Pythagorean formula (Euclidean distance in degrees). I then divided the data into three categories: I (0-0.030 degrees), II (0.031-0.67 degrees), and III (0.68-0.9 degrees).

  2. Temporal Category

At first, the temporal categories were based on all data from 2012-2016; I then also analyzed the temporal pattern within each distance category.

  3. Time series plots to see the propagation of POP and PP over the five-year period.
  4. Scatter plots and regression analysis

Scatter plots were made for both the spatial and temporal groupings to examine the correlation between POP and PP spatially and temporally, and regression analysis was done to test the significance of the correlation.

Formula (MATLAB):

% least-squares regression of POP on PP
w = regress(POPw, PPw);

% build a table and fit a linear model to get the R-squared and p-values
tblw = table(POPw, PPw);
lmw = fitlm(tblw, 'linear')

Result

  1. Time Series POP
Figure 1. POP time series in Bermuda Ocean from 2012-2016

From Figure 1, we can see that there are gaps in the temporal pattern: in 2012 there is no record for May, in 2014 there are no records for January and February, and in 2015 there are no records for January and October. In addition, in 2016 the only winter data available are from March. Overall, however, the POP pattern is clustered from January to August, especially in 2013-2015, where the highest POP values converge in January and April.

2. Time Series PP

Figure 2. PP time series in Bermuda Ocean from 2012-2016

From Figure 2, gaps also appear: in 2013 there is no record for March, while in 2014 there are no records for January, February, and May. In 2015 the temporal gaps are similar to those of POP, and in 2016 there are no records for January and February. However, much like POP, the temporal pattern is clustered from January to July, and the highest PP is concentrated from January to April.

3. Scatter Plot based on Seasonal

Figure 3. Scatter plots of POP vs. PP by season

From Figure 3, we can see that the highest PP occurs in winter, reaching 20.87 mgC/m3/day, accompanied by POP values up to 0.034 umol/kg. PP in spring is also high, but the POP values are concentrated between 0 and 0.01, a range in which PP is still relatively high. However, when I checked this finding against the R-squared values, the highest R-squared is in autumn (0.17); the lowest is in spring (0.00458), followed by winter (0.0773) and summer (0.0846). This suggests that the deposition of POP in the Bermuda Ocean involves both dry and wet deposition.

4. Scatter Plot based on distance category

Figure 4. Scatter plot of POP vs. PP for distances of 0-0.03 degrees

Figure 4 shows the nearest-distance category between POP and PP, and the result is not what I expected: in autumn the PP is relatively high, but this is not matched by the POP values, which are lowest in autumn. The R-squared values indicate that in this category the summer season has the highest correlation (0.227), followed by spring (0.117), autumn (0.00146), and winter (0.000297).

Figure 5. Scatter plot of POP vs. PP for distances of 0.031-0.67 degrees

Overall, Figure 5 shows the lowest R-squared compared to distance categories I and III (Figures 4 and 6), at about 0.0036. Looking at individual seasons, however, winter has the highest R-squared (0.0613) and summer the lowest (0.0117), although the winter relationship is not significant compared to the other seasons. With reference to the hypothesis, the result for this category matches the hypothesis very well.

Figure 6. Scatter plot of POP vs. PP for distances of 0.68-0.9 degrees

Figure 6 shows the relationship for the farthest distances between POP and PP. The R-squared pattern is similar to Figure 5, with the highest value in winter (0.509). However, the trend in PP differs from categories I and II: the highest PP for category III occurs in summer, whereas for the other two categories it occurs in autumn.

Critique of the method

My raw data are a little messy, and I need to think more about how to process them properly. I am not satisfied with the analysis and results I got in this exercise. I did not check for normality before running the regressions, which I suspect is part of why I could not get a significant finding and why the R-squared values are very low compared to the global average. However, dividing the data into several categories lets me determine the spatial and temporal patterns that I can link to the HYSPLIT back-trajectory data, because that step requires exact coordinates and times to run the model.

Assessing relationship between channel width and wood cover in South Fork McKenzie River

Question

My question for exercise 2 involves the relationship between wood cover (in this case, wood area (m^2)/ plot area (m^2)) and channel width through plot center (m). My response variable is wood cover, and my explanatory variable is channel width.

Approach

I used geographically weighted regression (ArcGIS Pro) and simple linear regression (RStudio) to assess this relationship.

Steps

My initial approach involved using the wood accumulation index (WAI) raster I developed in exercise 1. It became clear this was not the best approach, and Julia and I agreed I should instead investigate the relationship between wood cover and stream width. My hypothesis was that as channel width increases, wood should accumulate, and we would therefore see higher wood cover in plots at wider portions of the channel.

Approach:

Step 1. I delineated a length of stream that approximated the main flow of the South Fork McKenzie River (Fig. 1). This line feature was approximately 850 m long. In order to produce enough observations for regression analysis, I divided the length by 30 and plotted evenly spaced points along the line at 28.3 m intervals.

Step 2. Next, I produced plots around these points with 10 m buffers and used the Calculate Geometry tool to calculate the area encompassed by these plots (312 m^2).

Step 3. I dissolved all features classified as wood into one feature and intersected this shapefile with my plots shapefile, resulting in 30 in-plot wood features. Next, I added a new column for wood area and calculated the area of these 30 features.

Step 4. To develop a wood cover variable, I created another column and used the field calculator tool to calculate wood cover: (WoodArea) / (PlotArea) * 100.

Step 5. To produce my explanatory variable (channel width through plot center), I used the measure tool to measure a line segment from one side of the channel to the other, passing through plot center. I approximated the shortest such segment and recorded these observations in a new column (StreamWid).

Step 6. I used the geographically weighted regression tool to calculate coefficients for StreamWid. These were all very low.

Step 7. I imported the associated database file into RStudio and performed a simple linear regression.
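
A sketch of step 7 (the .dbf file name and the wood-cover column name are placeholders):

library(foreign)

plots <- read.dbf("wood_plots.dbf")               # attribute table exported from ArcGIS Pro
fit <- lm(WoodCover ~ StreamWid, data = plots)    # wood cover (%) as a function of channel width (m)
summary(fit)                                      # p-value and r-squared reported in the Results section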


Figure 1: Map demonstrating wood plot sampling method

Results

Results from this exercise are not promising. The p-value associated with my linear regression model (Figure 2) was 0.72; therefore, stream width does not appear to be a significant predictor of wood cover. Additionally, the associated r^2 is 0.005, indicating very little correlation between stream width and wood cover.

Figure 2: Simple Linear Regression Model

Results from the GWR were similar. Local r^2 values were low, ranging from 0.009 to 0.016 (Figure 3).

Figure 3: R^2 results from GWR

Critique

The approach I used for this exercise is sound and easily applicable to other hydrological modeling (albeit, this is not my realm of expertise). Stream width does not appear to correlate with wood cover along the particular flow transect that I sampled. However, it may be interesting to explore other explanatory variables, including distance to the nearest land feature, water indices, or various plot sizes.

Can Human Behavior be Delimited to the Interactions of Categorical Artifacts in Space?

As the title of this section alludes to, I chose to assess the spatial interactions that categorical artifacts have with each other in space. The assumption that must be made is that areas of the archaeological site that once contained human activities create a source for specific artifact patterns. This assumption can then be broken down into a series of corollaries:

C1: Human behavior associated with domestic activities will result in a source for a medium count of diverse artifact types (i.e., charcoal, bone, debitage).

C2: Human behavior associated with tool production will result in a source for formal and informal tools, a high abundance of debitage, no abundance of bone and charcoal.

C3: Human behavior associated with animal processing will result in a source for blades, scrapers/unifaces, utilized flakes, a medium abundance of debitage, a high abundance of bone, and no abundance of charcoal.

C4: Human behaviors that produce features such as hearths and pits create a strong source for a diversity of artifacts within an immediate vicinity of the feature.

The above corollaries walk through an assumption that is generally made about archaeological remains when observing the distribution of artifacts. Each one of these corollaries could be broken down further and more specifically, but because I am not conducting an experimental archaeological investigation (in which these corollaries would be hypotheses), I have chosen to keep them general. Additionally, in my statistical calculations I have included only the three most abundant artifact types: debitage, bone, and charcoal. These provide adequate sample sizes, and I hypothesize that they share the most variance with human activities in the past.

The question I wish to address in this blog post is: How do debitage, bone, and charcoal interact with each other to form clustered or dispersed boundaries reflecting human use and perception of space throughout the site, at one point in time and through time?

Cautious Approach to Spatial Autocorrelation

To assess the question stated above, I used spatial autocorrelation on debitage, bone, and charcoal at this archaeological site. Using Ripley’s K and the Pair Correlation Function (PCF) in R, I analyzed the autocorrelation between all artifacts, individual artifact types, and across artifact types at the horizontal intervals of 0-25cm, 25-50cm, 50-60cm, 60-74cm, and 74-100cm. I chose these segments judgmentally through the steps outlined in my blog post 1. By using autocorrelation, I hope to identify boundaries, clusters, and dispersion of human activities across the site at each interval. This type of analysis must be taken with caution because, as an archaeologist, I cannot assume that 10-25cm intervals contain a single occupancy period at this site. Therefore, in assessing these intervals, I am not defining a single point in time; rather, I argue that these intervals indicate shifts in how people at the site organized themselves in terms of human behavior and spatial perception.

An additional cautionary step was correcting for inhomogeneity in my dataset by using ‘Kinhom’ for Ripley’s K and ‘pcfinhom’ for the PCF in the spatstat package. This step is necessary when the data are not homogeneously distributed across the study area, which can be assessed easily by producing density maps and looking for observable clusters. Ultimately, the inhomogeneous corrections for these autocorrelation methods provide a more realistic comparison between the observed and Poisson distributions.
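
A minimal sketch of these calls in spatstat, assuming ‘arts’ is a marked ppp object of artifact locations for one interval (the object name and mark levels are placeholders):

library(spatstat)

# inhomogeneous Ripley's K and PCF with envelopes from 99 simulations of
# complete spatial randomness (the envelope() default)
K_env   <- envelope(arts, Kinhom,   nsim = 99)
pcf_env <- envelope(arts, pcfinhom, nsim = 99)
plot(K_env)
plot(pcf_env)

# inhomogeneous cross-type PCF between two artifact types (e.g., bone vs. debitage)
pcf_cross <- envelope(arts, pcfcross.inhom, nsim = 99, i = "bone", j = "debitage")
plot(pcf_cross)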

Now, onto the fun part.

Results

Figure 1 shows the results of Ripley’s K and the PCF for the 0-25cm interval. I had to restrict the boundary of the archaeological site due to the excavation methods and the amorphous topography of the surface; therefore, I am operating under the assumption that the archaeological data here adequately represent human behavior within the whole site. Comparing Ripley’s K and the PCF, there is a suggested tendency for clustering (above the red line) from 0-20cm (not significant) and from 90-115cm (significant) at this interval. From the density map, it is clear that the clustered areas reflect the 90-115cm scale, but there is also evidence of clustering at 0-20cm that may not be highlighted in the density map.

Figure 1 Contains a raw data map of all artifact types at 0-25cm (top left), kernel density interpolation for all artifact types (top right), Ripley’s K corrected for inhomogeneity (bottom left), and the PCF corrected for inhomogeneity (bottom right). The red dashed line represents complete random distribution surrounded by a confidence envelope of the maximum and minimum distribution resulting from 99 Poisson simulations. The black line represents the observed distribution. If the observed distribution is above the red line, then it suggests a clustered behavior and if the observed distribution is below the red line, then it suggests a dispersed behavior.

Figure 2 shows Ripley’s K and the PCF for the 25-50cm interval. Both identify clustering at 0-15cm (significant), and the PCF additionally shows clustering at 100-120cm. These are fascinating results because, looking at the raw data, one might infer a single larger cluster near the left margin of the site, but the functions reflect the two cluster sizes that are identified in the density map.

Figure 2 Contains a raw data map of all artifact types at 25-50cm (top left), kernel density interpolation for all artifact types (top right), Ripley’s K corrected for inhomogeneity (bottom left), and the PCF corrected for inhomogeneity (bottom right). The red dashed line represents complete random distribution surrounded by a confidence envelope of the maximum and minimum distribution resulting from 99 Poisson simulations. The black line represents the observed distribution. If the observed distribution is above the red line, then it suggests a clustered behavior and if the observed distribution is below the red line, then it suggests a dispersed behavior.

Figure 3 shows Ripley’s K and the PCF for the 50-60cm interval. In this instance, the results suggest a rapid drop-off in clustering at close range, with clustering starting at 15-25cm (not significant). This best represents the spatial locations of artifacts in the raw data map, because the density map appears to reflect slightly larger clusters.

Figure 3 Contains a raw data map of all artifact types at 50-60cm (top left), kernel density interpolation for all artifact types (top right), Ripley’s K corrected for inhomogeneity (bottom left), and the PCF corrected for inhomogeneity (bottom right). The red dashed line represents complete random distribution surrounded by a confidence envelope of the maximum and minimum distribution resulting from 99 Poisson simulations. The black line represents the observed distribution. If the observed distribution is above the red line, then it suggests a clustered behavior and if the observed distribution is below the red line, then it suggests a dispersed behavior.

Figure 4 is my personal favorite because of the fascinating oscillation that occurs around the boundary for random distributions. Ripley’s K and the PCF both show significant clustering at 0-20cm; beyond 20cm, artifacts appear to be randomly distributed in space. Again, this slightly contradicts the observable characteristics of the raw data and density maps, which warrants further investigation when assigning specific clusters and human behaviors to this surface in the future.

Figure 4 Contains a raw data map of all artifact types at 60-74cm (top left), kernel density interpolation for all artifact types (top right), Ripley’s K corrected for inhomogeneity (bottom left), and the PCF corrected for inhomogeneity (bottom right). The red dashed line represents complete random distribution surrounded by a confidence envelope of the maximum and minimum distribution resulting from 99 Poisson simulations. The black line represents the observed distribution. If the observed distribution is above the red line, then it suggests a clustered behavior and if the observed distribution is below the red line, then it suggests a dispersed behavior.

Figure 5 shows a tendency for clustering at 0-15cm (not significant) with two further tendencies for clusters at 25-40cm (not significant) and 60-80cm (not significant). These cluster extents can be seen within the raw data and density maps but may warrant further analysis prior to assigning a specified number of clusters to this segment.

Figure 5. Raw data map of all artifact types at 74-100cm (top left), kernel density interpolation for all artifact types (top right), Ripley’s K corrected for inhomogeneity (bottom left), and the PCF corrected for inhomogeneity (bottom right). The red dashed line represents complete spatial randomness, surrounded by a confidence envelope spanning the maximum and minimum of 99 Poisson simulations; the black line represents the observed distribution. Values above the red line suggest clustering, and values below it suggest dispersion.

Lastly, Figure 6 shows the cross correlation between each artifact type, using a PCF corrected for inhomogeneity, compared between the 50-60cm and 60-74cm intervals. This figure addresses my initial question of how each artifact type relates to the others in space. By identifying shifts in artifact-artifact relationships, I hope to strengthen my argument that people in these two intervals perceived and used space differently at this site. Through these graphics, there are some similarities and differences that we can identify. The correlation between bone and debitage appears closer at 60-74cm, while at 50-60cm it begins around 20cm (not significant). The correlation between charcoal and debitage is fairly similar between the intervals. This aligns with my corollaries showing that debitage and charcoal interact with one another only in specific domestic human behaviors, so it can be assumed that this specific interaction may produce similar spatial patterns through time. The last correlation, between charcoal and bone, is fascinating and warrants further investigation because the graphics almost appear to be mirror reflections of one another, showing a closer correlation at the 50-60cm interval.

Figure 6. Cross correlation of Bone-Debitage, Charcoal-Debitage, and Charcoal-Bone, computed with a PCF corrected for inhomogeneity and compared between the 50-60cm and 60-74cm intervals. The red dashed line represents complete spatial randomness and the black line represents the observed distribution. Values above the red line suggest clustering, and values below it suggest dispersion.
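Cross-type comparisons of the kind shown in Figure 6 can likewise be sketched in spatstat, here assuming the same hypothetical ‘finds’ data frame with an added ‘type’ column (“bone”, “debitage”, “charcoal”). Note that this sketch uses random relabelling of the marks as the null model, a common choice for testing independence between artifact types; it is an assumption on my part and differs slightly from the Poisson simulations used for the single-type plots above.

```r
library(spatstat)

# Hypothetical marked point pattern: artifact locations with a 'type' mark.
mpts <- ppp(finds$x, finds$y, window = owin(c(0, 200), c(0, 200)),
            marks = factor(finds$type))

# Inhomogeneous cross-type PCF between bone and debitage, with a 99-simulation
# envelope generated by randomly relabelling the marks.
env_bd <- envelope(mpts, pcfcross.inhom, i = "bone", j = "debitage",
                   nsim = 99, simulate = expression(rlabel(mpts)))
plot(env_bd)
```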

Critique

The major critiques I have of this approach are the effect borders have on statistical interpretations and, in a similar vein, the scale of the data. Borders have always been a barrier within statistical analyses (pun intended). The problem with borders is the absence of data lying beyond them, data that is both there and not there (kind of like Schrodinger’s cat). I would have to do more research into the specific mathematical properties affected by borders, but it is important to understand that they influence both calculations and interpretations. The second critique, on scale, is also important: Ripley’s K and a PCF can be run on very few data points, but they will always suggest the data are dispersed. From a mathematical view this is a correct and appropriate result, but it is trivial to obtain by simply looking at your raw data. So, do not waste time trying these functions on sparse data that is visibly dispersed already.

Assessing patterns of race and education in Portland over two decades

For Exercise 2 I changed the dataset and question that I am asking, so I will be sharing the results from a hotspot analysis (which would have been Exercise 1) and a geographically weighted regression. I am interested in identifying gentrification in Portland, OR, so I am looking at race, income, and education data at the census tract level. I obtained these data for 1990, 2000, and 2010 and have been experimenting to see which will be useful. An important note is that census tract boundaries change over the years, so for this class I am using the 2010 census tracts and acknowledging the error associated with this decision. I obtained the census tracts for all of Multnomah County and trimmed them down to exclude large census tracts on the outskirts with small populations, census tracts with no data in 1990, and some tracts east of the city of Portland (closer to Gresham).

The race and education data came as counts and the income is median income per census tract. I added in the raw population numbers in order to calculate race and education percentages. These are percentage of population that is white and percentage that has a bachelor’s degree or higher.

The raw percentages of white population show that in 1990, Portland was over 80% white in most tracts except between the Willamette River and the Portland Airport. Over the next decade the whole eastern side of the city became less predominantly white, a trend that continued through 2010. The area in the northeast that was less white to begin with saw an increase in the proportion of white population.

I conducted a hotspot analysis of the changes in each demographic variable from 1990-2000 and 2000-2010. This turned out to be very useful for finding patterns in the data (more useful than when I ran it on my previous dataset). Below I have shown the hotspot map for change in race from 2000-2010 (which is similar to that of the previous decade). It becomes very obvious that there is an increase in white people (in red) in the north part of Portland (between the airport and the Willamette River) and a decrease in the eastern part of the city. I also found a visualization of race data made by Portland State University that displays a dot per person, colored by race. That map swipes between 2000 and 2010 and confirms the pattern I see, but adds the insight that white people are replacing the black population in the northern part of the city and that the population of people of Asian descent is increasing in the east.

Hot spot analysis of change in white population from 1990-2000. The red shows hot spots of where there has been an increase in proportion of the population that is white, and the blue shows a cold spot of where there has been a decrease in proportion of population that is white.
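The ArcGIS Hot Spot Analysis tool implements the Getis-Ord Gi* statistic. Purely as a point of comparison, here is a minimal sketch of the same statistic in R using ‘sf’ and ‘spdep’; the file name and the field name d_white (change in percent white) are hypothetical and not the actual layer used above.

```r
library(sf)
library(spdep)

tracts <- st_read("portland_tracts_2010.shp")          # hypothetical tract layer

nb <- poly2nb(tracts, queen = TRUE)                     # queen contiguity neighbours
lw <- nb2listw(include.self(nb), style = "B")           # include each tract itself for Gi* (binary weights)

tracts$gi_star <- as.numeric(localG(tracts$d_white, lw))  # z-scores; |z| > 1.96 ~ hot/cold spots
plot(tracts["gi_star"])
```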

The hotspots for both education and income are generally in the same regions as the hotspots for race, in both decades, but are smaller. This led me to the next step, which was conducting a geographically weighted regression.

Understanding the spatial distribution of the different variables and their correlation is important to understanding their relationship. Since my thesis research will entail mapping gentrification in order to map the spatial correlation with urban gardens, I need to ensure that I have a reliable source of information for gentrification. My hope was that I could use these three variables to map gentrification.  My hypothesis for the pattern between the three variables is that they will be relatively similar. I imagine in such a white city there will be places with increases in income and no change in race. I also understand that there is an increase in Asian population in the east, which unlike black populations is not necessarily lower income on average.

I ran the Geographically Weighted Regression tool in ArcGIS a variety of ways, testing each of the three variables as the dependent variable with either one or two of the remaining variables as explanatory variables. Overall, I do not think this tool provided extremely useful results because too many of its assumptions are not valid for these data. The results show that the variables are somewhat correlated. The standardized residual maps show that most of the census tracts have similar changes in the three variables. When change in race is the dependent variable and education and income are both explanatory variables, the northern region that is likely gentrifying is identified as having a higher standardized residual. The graphs show that change in race and change in education have some correlation, but change in income has very little correlation with either race or education. The graphs below show these relationships between the variables for 2000-2010.

GWR results with change in race from 2000-2010 as the dependent variable and change in education and income as the explanatory variables. (race_00_1 is the change in proportion of population that is white, inc_00_10 is change in median income, and edu_00_10 is the change in proportion of the population with a bachelor’s or higher.)
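For what it is worth, the same model could be sketched outside ArcGIS with the ‘spgwr’ package in R. This is only an illustration under assumed field names (d_white, d_edu, d_inc for the 2000-2010 changes) and a hypothetical file name, not the tool settings used above.

```r
library(sf)
library(spgwr)

tracts <- st_read("portland_tracts_2010.shp")          # hypothetical tract layer
sp_tr  <- as(tracts, "Spatial")                        # spgwr works on sp objects (uses polygon centroids)

# Cross-validated fixed bandwidth, then the GWR fit with local diagnostics.
bw  <- gwr.sel(d_white ~ d_edu + d_inc, data = sp_tr)
fit <- gwr(d_white ~ d_edu + d_inc, data = sp_tr, bandwidth = bw, hatmatrix = TRUE)

fit                           # prints the global model plus quartiles of the local coefficients
summary(fit$SDF$d_edu)        # distribution of local coefficients for education
```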

Overall, I think that the Geographically Weighted Regression tool was not extremely useful. It helped identify that there is some correlation between the change in race and education, but the hotspots show a slightly different story. The hotspots show that the three variables have increases in similar parts of the city. It is possible that I am not using the tool correctly, but I believe that the assumptions needed to utilize the GWR tool are ones that I cannot make with this data.

Exercise 2 – Spatial-Temporal Autocorrelation in Migration and Insurance Claims

1. Question and Significance

My goal for this exercise: to identify to what extent my county-level data is spatially and temporally autocorrelated. This is part of a larger effort to identify a suitable data sample that does not violate the assumption of independent observations. 

Exercise 2 significance in context: Do we see significant out-migration in instances of high flood claims or is there simply a trend of adjacent counties having similar values due to space and time similarities? Or simply, is this a local or larger scale phenomenon?

2. Tool or Approach Utilized

Originally, I was interested in applying geographically weighted regression (GWR) to examine the spatial relationship between flood insurance claims and migration flows. However, after reading in-depth about GWR, I found that not all assumptions of Ordinary Least Squares could be satisfied. Thus, I was interested in finding suitable subsets within my data that could be used for analysis. I was primarily interested in spatial and temporal autocorrelation (which violates the assumption that all observations are independent). To examine the extent of autocorrelation, I created subsets of my data and a series of scatterplots to visualize the spatial and temporal relationships of adjacent counties.

3. Description of Methods

I first created two subset datasets, one of urban, well-populated adjacent counties and the other of rural, low-population adjacent counties, all in Florida. Florida was used due to its high participation in flood insurance. I used Broward County as the metro county of comparison (a high-population center) to compare with adjacent high-population counties. For the rural subset, I used three low-population counties in northeastern Florida, with Dixie County selected as the primary county for comparison with adjacent rural, low-population counties.

I then explored trends in the data related to spatial autocorrelation by plotting urban and rural out-flow rates over time. I then plotted one urban county’s out-flow rate against the out-flow rates of adjacent counties and did the same for the rural counties. I repeated this for a hurricane period and a non-hurricane period. Finally, I followed the same steps for hurricane and non-hurricane periods but with the variables of in-flow rates and claims per county population.

Next, I examined temporal autocorrelation by selecting a county and plotting out-flow rates at time 1 on one axis and time 2 on the other axis, for both hurricane and non-hurricane periods and for both urban and rural counties. I then tried this with the variable of claims per county population.
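As an illustration of the plotting approach (not the exact dataset), the sketch below builds both kinds of scatterplot in R from a hypothetical long-format data frame ‘fl’ with columns county, year, and outflow_rate; the adjacent-county names used for the Dixie County example are assumptions.

```r
library(dplyr)
library(ggplot2)

# Spatial comparison: Dixie County out-flow vs. the mean of assumed adjacent rural counties.
focal <- fl %>% filter(county == "Dixie") %>% arrange(year)
adj   <- fl %>% filter(county %in% c("Lafayette", "Gilchrist", "Levy")) %>%   # assumed neighbours
  group_by(year) %>% summarise(adj_outflow = mean(outflow_rate))

ggplot(left_join(focal, adj, by = "year"), aes(outflow_rate, adj_outflow)) +
  geom_point() + geom_smooth(method = "lm")

# Temporal comparison: out-flow rate at year t vs. year t + 1 for the same county.
focal <- focal %>% mutate(outflow_next = lead(outflow_rate))
ggplot(focal, aes(outflow_rate, outflow_next)) + geom_point() +
  labs(x = "Out-flow rate, year t", y = "Out-flow rate, year t + 1")
```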

4. Brief Description of Results

I first examined the general trend of out-flow rates across the whole time period (1990-2010). Looking at the plots above, we can see there is a general relationship between out-flow rates in spatially adjacent counties across time.

A) Hurricane Period Out-Migration Plots

Looking at these two plots, there are obviously far fewer data points, meaning the results will be much more affected by outliers. For the metro hurricane period (1998-2001) we see similar out-flow rates, with higher rates in adjacent counties for this period. For the rural hurricane period (1994-1996) we see generally higher out-migration rates with a somewhat positive linear relationship, meaning that as the Dixie County out-migration rate increases, the adjacent county out-migration rates also increase (though not causally).

B) Non-Hurricane Period Out-Migration Plots

The metro county out-migration rates reflected here are for the lowest insurance claim years (1990, 1993, 2002, 2003), and the rural county years are 1999-2002. We see no real discernible trend for non-hurricane metro out-flow rates, which appear approximately similar to the metro hurricane-year plot above. The rural non-hurricane period also does not demonstrate a significant linear trend, with approximately the same values as the hurricane years. This is likely due to the small number of data points. I saw similar results when plotting the variables of in-flow rates and claims per county population.

Temporal Autocorrelation Plots

A) Hurricane Period Time 1 and Time 2 for single County Out-migration Rates

These plots are the result of subsetting Broward County (metro) and Wakulla County (rural) out-flow rates, with different times on the X and Y axes, during a hurricane period. We can see that out-flow rates seem relatively constant across the different times, indicating evidence of temporal autocorrelation.

B) Non-Hurricane Period Time 1 and Time 2 for single County Out-Migration Rates

These plots give similar results to the hurricane period; however, the Broward plot does not have enough non-hurricane time periods to show this trend. As with the hurricane time period, we see evidence of temporal autocorrelation here. Similar results were seen with other variables such as claims per county population and in-flow rates.

Critique of Method

This was a useful exercise to help examine the spatial and temporal relationships at a small scale. I would say this was useful in painting a partial picture of the extent of the autocorrelation issue, however, not a complete one. This method did not help me narrow down a sample dataset to use for future methods and I am unsure of what my next steps are in continuing with this data in an attempt to fit an econometric model. I would be curious to find some sort of model that allows looser assumption requirements or transformation methods that may help me overcome these autocorrelation issues.

Greenland Ice Velocity vs. Terminus Retreat (II)

Introduction:

I decided to simplify my Exercise 2 for a number of reasons, primarily because the raw annual data turned out to be more interesting than I anticipated, and because I wanted to leave some tools and questions for Exercise 3 (also, generating annual data was not a cakewalk). As a result, Exercise 2 has become entirely focused on the annual correlation between terminus retreat of a glacier and the corresponding glacier velocity at the terminus location. ArcGIS Pro was used to generate the terminus retreat distances for each available year, to create very small buffers around termini, and to extract underlying velocity raster values. Once a master dataset was generated, I assessed the relationships in Excel.

QUESTION:

1. For any given year in my dataset, what is the correlation/relationship between the distance of individual glacier terminus retreat and individual glacier velocity? (on an individual scale and a Greenland-wide scale)

Simplified: At “T1” how is local glacier velocity correlated to local glacier retreat?

Context from our lab exercises:

  1. What kinds of enhancing or limiting processes might occur that allow B to influence A, in space?

I presume that glacier velocity and retreat distance will be linked via the mechanism of basal melting (wet basal conditions will not only allow glaciers to move faster, but can also be representative of negative mass balance). Fully understanding basal slip is still a major problem in glacier physics. My hypothesis is based purely on the fact that glacier velocity seems to increase early in the melt season- indicating it is linked to negative seasonal mass balance. The spatial influence is obvious in that the retreat and velocity are being measured on the same glacier. More distal influences could include snow accumulation and ice temperature.

  • What kinds of enhancing or limiting processes might occur that allow B to influence A, in time?

This is a more difficult question. I would think that a glacier that experiences basal slip at T would be more likely to experience it at T+1, in a sort of positive feedback. Basal melt could reduce mass balance, which reduces thermal inertia and in turn results in more basal melt. That however does not explain how a brief pulse of velocity at T, followed by a slowdown at T+1, would influence terminus retreat at T+1. I cannot think of a mechanism in which velocity at T would influence terminus retreat at T+1 in this scenario.

APPROACH: “Buffer, Extract, and Correlate”

ArcGIS Pro Tools used:

(for generating annual datasets: Feature Vertices to Points, Near, Summary statistics, Join field)

Buffer, Construct Pyramids, Extract Multivalues to Points, Join field, Table to Excel

Excel Tools used:

Time series (plots), R-squared regression analysis, normalization

Steps:

Data was manipulated as such:

(step 0: Using “Approach #2” from exercise 1, annual datasets of terminus retreat were generated, and annual velocity rasters were added to the map)

1. The BUFFER tool was used to create very small 500m buffers around the glacier point data (because the velocity data comes at 500m resolution, each buffer can be thought of almost, but not quite, like a point source). A rough R equivalent of steps 1, 3, 6, and 7 is sketched after this list.

2. For some reason the raster datasets had pyramids, so I had to go through and use CONSTRUCT PYRAMIDS to set each to 0.

3. EXTRACT MULTIVALUES TO POINTS was used to extract and average the velocity values sitting directly beneath the buffer shapes, for each year in the dataset. These values are automatically added to the buffered shapefile’s feature class attribute table (see below).

4. The TABLE TO EXCEL tool was used to export the buffer attribute table, containing terminus retreat and underlying velocity data for every available year, to an Excel file. I find it easier to manipulate and visualize data in other software, as Arc can get a bit clunky.

5. Time series plots containing both distance retreated and local velocity were constructed for every glacier and visually inspected.

6. R-squared linear regression plots were produced for each year to assess the statistical correlation between the two variables at T1.

7. Glacier velocities and retreats were normalized on individual glacier scales (each year’s value minus the study-period mean, divided by that mean) to eliminate absolute magnitude effects.

8. Normalized data were plotted in R-squared linear regression plots.

9. Time series were averaged over the entire island on an annual basis, plotted, and visually inspected.
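To make the workflow concrete, here is a rough R equivalent of steps 1, 3, 6, and 7, assuming a hypothetical terminus point shapefile with a retreat_2017 field and a hypothetical 2017 velocity GeoTIFF; the ArcGIS Pro tools listed above remain the actual workflow.

```r
library(sf)
library(raster)

term <- st_read("terminus_points_2017.shp")            # hypothetical terminus points
vel  <- raster("greenland_velocity_2017.tif")          # hypothetical 500 m velocity raster

# Step 1: 500 m buffers around each terminus point.
buf <- st_buffer(term, dist = 500)

# Step 3: mean velocity under each buffer.
term$vel_2017 <- raster::extract(vel, as(buf, "Spatial"), fun = mean, na.rm = TRUE)

# Step 6: R-squared of retreat vs. velocity for this year.
summary(lm(retreat_2017 ~ vel_2017, data = term))$r.squared

# Step 7: normalise a glacier's series as (value - mean) / mean.
norm_rel <- function(x) (x - mean(x, na.rm = TRUE)) / mean(x, na.rm = TRUE)
```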

RESULTS:

The methods I used ended up being simpler than I anticipated because the results were interesting and perplexing, and I wanted to investigate them further. To assess glaciers on an individual scale, I created a map with inlaid data (I originally intended to include more time series, but I ran into an error with data alignment that I need to figure out, hence some arrows leading to nowhere):

Because I assume this will come out at terrible resolution in the blog post, here is a guide to the time series: red lines represent glacier velocity and blue lines represent terminus retreat, plotted by year. The raster overlay is Greenland surface velocity in the year 2017, measured in meters/year. The points represent each glacier terminus and are colored according to their overall retreat (2005-2017). For some glaciers, velocity over the years seems to roughly follow the retreat, particularly in the northwest of the island. For other glaciers, notably several on the east coast, the glacier velocity does not appear particularly related to terminus retreat. Another purely visual assessment is that, on identical color scales, the dots (retreat magnitudes) do not always seem to match up with underlying velocities. In simpler terms, just by the eye test, the fastest glaciers do not always retreat the fastest. But perhaps there is an overall trend? To assess this, I made a simple r-squared linear regression graph for each of these (~240) pairs, color coding each point by year:

The results indicate no clear correlation. Each point on the graph represents an individual glacier in a specific year, so each glacier is represented a total of 7 times. The linear R-squared relationship between these two variables is never greater than 0.002 for any individual year. As a result, we can say that there is no correlation between the ABSOLUTE magnitude of glacier velocity and the ABSOLUTE magnitude of glacier retreat. That still doesn’t explain why a fair number of glaciers appear to have curves that follow each other on relative scales. After all, when each year is averaged out among all 240 glaciers, we get the following time series graph:

Here, we notice two things in particular. First, the overall curves seem to covary to a certain extent. Second, there appear to be two “steady states,” so to speak: one pre-2008 and one post-2008. Before 2008, ice retreat and ice velocity were both almost negligible compared to their post-2008 counterparts. The correlation between the two can be isolated by plotting the 240-glacier averages over the course of the study period together:

Here, each point represents a single year. When annually averaged across all glaciers, the r-squared of terminus retreat vs. ice velocity becomes 0.53. When 2006 and 2007 are removed, that R-squared jumps to 0.65. These values are more in line with what I was expecting to see based on the map and time series produced above. So how can I reconcile the two linear regressions? Clearly the two variables are connected in some way, but I need to figure out how. My first thought is that perhaps I need to eliminate the absolute magnitude of the variables for each glacier and normalize them into relative magnitudes. That way I can compare relatively large years and small years in a less biased way; in other words, for each glacier, does a relative change in velocity correlate to a relative change in terminus retreat? I normalized measurements for each glacier by subtracting the study-period mean from each annual measurement and dividing by the mean, plotted here:

While the R-squared is no better here (and is not even displayed), the graph does depict a reality with two clear steady states: one pre-2008 and one post-2008. Normalized velocities were minuscule before 2008 and then quite significant after. Furthermore, the normalized terminus retreat appears to be far more variable in the years post-2008 (the values tend to span a taller height on the graph). These are clear trends I want to dig further into in Exercise 3.

CRITIQUE:

If I were to go back and do this exercise over again, I would probably increase the size of the buffer. In creating such small buffers, I created some shapes that did not even overlap with velocity data (in many cases, velocity data stopped farther up-valley than the terminus location, e.g. the blue circles from my first figure would be sitting over nothing in some years). Buffer zones under these conditions extracted null values and left the velocity data relatively sparsely populated. While most of the data was well populated for retreat values, it was somewhat difficult to find glaciers with a full 8 years of velocity data. As a result, all of these correlations are performed on velocity data that is more sparsely populated than I would like. Several glaciers’ time series are missing many years of velocity data that could be holding important information.

I would say that overall I am disappointed with how much I got done in exercise 2, though I think in order to do this project correctly it is important to look closely at the data I produce before asking totally new questions. I got stuck in a bit of a rabbit hole looking at all the data I produced and took it out of spatial context. Plus, the project is pretty fluid since it was just generated for this class, so I am free to investigate things that capture my interest rather than a specific hard-nosed question.

MOVING FORWARD:

Looking forward to Exercise 3, I now want to begin investigating varying buffer sizes, calculating and observing rates of change (first derivatives), and possibly time lags (though looking at the time series data, I doubt time lags are going to have a big effect on any of the correlations). I also want to do more spatial cross correlation in the next exercise to investigate these factors (perhaps IDW, and more Moran’s I on varying radii).

Geographically Weighted Regression Analysis on Benthic Organism Count and Total Organic Carbon off Newport, OR

For Exercise 2, I continued my investigation into the relationship between benthic abundance and total organic carbon (TOC) between Point Conception, CA, and the Canada border with a special emphasis on a region off Newport, OR. For the Newport region, I hypothesize that there is a seasonally fluctuating positive relationship between benthic organism count and TOC. I further hypothesize that commercial fishermen tossing bait overboard, river discharge, summer upwelling, and the dumping of dredge materials outside Yaquina Bay impact this relationship. To investigate my hypothesis, I conducted geographically weighted regressions (GWRs) on benthic box core data from five months (April, June, August, October, and December) in 2011. I have ~12 samples from each of the months. Although the analysis did show some interesting results, I am skeptical of these results due to the lack of variation between data taken from each month and small sample size. For all months, TOC levels were less than 1% and mean organism abundance was about the same. I will run the samples through a t-test to determine if there is any statistical difference between samples from each month. Also, I discovered the GWR requires a large sample size (in the hundreds at least) to produce trustworthy results. Accordingly, I expanded my sampling to a larger area off Newport and included monthly data from all sampled years (2010-2016) so that I could have 100+ data points for each month (April, June, August, October – no December data was available for other years). The results of my analysis on the expanded area differ from my results from the limited region, suggesting a negative relationship between organism abundance and TOC for some months.

These conflicting results led me to expand my investigation area even further; I looked at the relationship between organism abundance and TOC for the entire Oregon Coast and the entire sampling area, with mixed results. Because of my inconclusive results, I believe that I need to take a more focused approach with regard to the species I’m investigating. My advisor, Dr. Sarah Henkel, suggested that I look at the relationship between the ratio of polychaetes (bristle worms) to crustaceans and TOC in order to investigate potentially polluted areas. Previous researchers have determined that the ratio between polychaetes and crustaceans can be used as an indicator of pollution. This line of inquiry could help me to identify areas where increased TOC results from pollution (like the dumping of dredge materials and agricultural discharge). I also decided that I need to spend some time reading more papers about benthic TOC levels in coastal areas in order to understand why TOC on the Oregon coast is so much lower than in other areas. One hypothesis is that benthic TOC is low because the waters off the coast of Oregon are incredibly productive – especially during summer upwelling. Huge plankton blooms occur, and those plankton are preyed upon by other organisms before they die and sink to the bottom. Oregon’s rivers also transport large quantities of nitrogen and carbon into coastal areas (especially during the fall-winter rainy season). However, the same storms that cause nutrient transport also cause high wave energy and mixing along the coastline. Transported nutrients therefore remain suspended rather than settling in the benthos. Previous researchers have described both a shift in benthic organismal community structure and a “TOC horizon” off the Oregon coast at 90m depth. Perhaps the coastal processes of upwelling and wave action (among others, e.g. longshore transport) are less impactful on benthic TOC accumulation at that depth.

For this blog post, I will explain the work I’ve done so far using figures and then muse over the process of conducting geographically weighted regressions, keeping my hypotheses in mind.

These points depict the mean centers of each sample collection site. The numbers tell you the approximate depth of each site. All samples depicted here were taken in 2011.

For my initial analysis, I examined 12 sample sites off Newport, OR, in five months (April, June, August, October, and December). All of the samples were in slightly different locations because, of course, it’s impossible to get a boat back to the exact same spot. The points depicting the sample sites above are placed at the centroids of the true sample points. Not all of the stations had samples for each of the five months, but most of them did. I ran a geographically weighted regression analysis for each month using a fixed kernel size based on AICc (corrected Akaike Information Criterion). I would have preferred to run the analysis with an adaptive bandwidth using the 30 nearest sample points to a given sample, but I could not use that approach because I had an n value of ~12/month.
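For context on the two kernel choices, below is a minimal sketch of both settings using the ‘GWmodel’ package in R; the file name and field names (count, toc) are hypothetical, and my actual runs used the ArcGIS GWR tool.

```r
library(sf)
library(GWmodel)

box <- as(st_read("newport_boxcores_april2011.shp"), "Spatial")   # GWmodel works on sp objects

# Fixed kernel: bandwidth in map units selected by corrected AIC
# (the setting used for the ~12-sample months).
bw_fixed  <- bw.gwr(count ~ toc, data = box, approach = "AIC",
                    kernel = "gaussian", adaptive = FALSE)
fit_fixed <- gwr.basic(count ~ toc, data = box, bw = bw_fixed,
                       kernel = "gaussian", adaptive = FALSE)

# Adaptive kernel: with adaptive = TRUE, 'bw' is a number of nearest neighbours,
# so bw = 30 corresponds to the 30-nearest-neighbour setting used for the extended area.
fit_adapt <- gwr.basic(count ~ toc, data = box, bw = 30,
                       kernel = "bisquare", adaptive = TRUE)

summary(fit_adapt$SDF$toc)    # local slopes of organism count on TOC
```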

These graphs illustrate the seasonal changes in GWR coefficient and R2 values at each sample site.

In order to understand my results, I graphed the GWR regression coefficients and r2 values for each site (note that MB40 only has four months of data and NS50 has three months of data). The graphs reveal a spatially complex and seasonal relationship between organism count and TOC within the sample area. Overall, I saw the strongest positive relationships between organism count and TOC in April and August. The largest r2 values were associated with April (~0.6 at some sites) and June (~0.7 at some sites) sampling. These results could support my hypothesis that discarded bait from the commercial Dungeness crab fishery in the winter and early spring causes an increase in TOC and organism abundance. Also, the results for August suggest that organism counts increase as a result of spring and summer upwelling. However, as I mentioned earlier, I am skeptical of these results because of the small sample size and the relatively small differences in TOC and organism count between months. For instance, I believe the August results might be skewed by one sample site (BB50) that had over 200 bristle worms (Spiophanes norrisi) in a single box core.

200+ Spiophanes norrisi were collected at one sample site in August 2011.

To increase my sample size, I included samples taken from 2010-2016 off Newport, OR. Unfortunately, the only December samples that I have are from 2011, so n=12 for December even in my extended analysis. The other sample months were April, June, August, and October. For this analysis, I used an adaptive GWR based on the 30 nearest neighbors to each sample point. While my analysis of the smaller area returned only one negative coefficient (August 2011 @ NS50), my larger analysis area returned both positive and negative coefficients for each month. I created a box and whisker plot of the results to visualize this relationship more clearly.

Box and whisker plots of seasonal correlation coefficient in the extended Newport sampling area, depicted in the included map.

I also created box and whisker plots of the organism count and TOC data from each month from the extended Newport sample area (pictured above).

Seasonal box and whisker plots of total organism count from the extended Newport sample area, 2010-2016.
Seasonal box and whisker plots of TOC from the extended Newport sample area, 2010-2016.

My results show seasonal complexity, with peak mean GWR regression coefficient, mean organism count, and mean TOC all occurring in August. Notably, all months included both positive and negative GWR coefficients, and both April and October have negative coefficient outliers. Also, as you can see, the TOC box and whisker plots are somewhat ugly in that the data have a non-normal distribution. A few next steps will be to explore methods of data transformation and to investigate non-linear relationships or relational thresholds within the organism count/TOC correlation. Dr. Henkel told me that she usually uses a fourth-root transformation of organism count data. I suspect that if there is a linear relationship between organism count and TOC, that relationship might only hold true within a range of values. Specifically, I hypothesize that extremely high TOC might be correlated with decreased organism count above a certain level.

To provide context for my Newport area results, I conducted an adaptive GWR using 30 nearest neighbors on an EPA 2003 dataset that ranges from Point Conception to Canada. Unfortunately, I do not know specifically what months the data was collected, but I know that it was taken some time in the summer or early fall. I will dig into the metadata to see if I can find that information. In any case, the results allowed me to see the complexity in the relationship between organism abundance and TOC along the coast.

GWR coefficients from EPA 2003 data.

Red values represent positive GWR coefficients and blue values represent negative GWR coefficients. Because I know that the relationship between organism count and TOC varies seasonally, some of the apparent spatial variability depicted above could actually be seasonal variability; more investigation is required.

My GWR Process

For me, the process of conducting GWR in ArcMap highlighted the impacts of sample size, spatial scale, and adaptive vs. fixed kernel size on results. When I looked at a small sample size (off Newport), the resulting GWR coefficients were largely positive. However, when I expanded the sample area, I saw both positive and negative coefficients within each month. I also noticed that the settings I selected within the GWR tool in ArcMap caused variation in the results. This observation provided me with an example of why it is so important to select the appropriate statistical methodology based on the type, spatial scale, distribution, and sample size of your dataset. For my analysis, I found the adaptive 30n kernel to be most appropriate and useful. 

A Century of Oregon Salt Marsh Expansion and Contraction (Part II)

Background

As a reminder, my ultimate research goal is to quantitatively characterize salt marsh volumetric change from 1939 to the present within Oregon estuaries using a combination of sediment cores, for which I have excess 210Pb-derived sediment accumulation rates, and historical aerial photographs (HAPs), from which I plan to measure horizontal expansion and retreat. For my Exercise 1, I successfully georeferenced and digitized HAPs from Alsea Bay that are roughly decadal, spanning 1939 to 1991. The next major step is analyzing my data using the USGS Digital Shoreline Analysis System (DSAS; Himmelstoss et al. 2018). However, prior to analyzing my data in the framework, I must first tie up a number of loose ends; I will explore the following questions for my Exercise 2:

Questions

  1. What is an appropriate method of quantifying uncertainty associated with each salt marsh outline?  
  2. How should shoreline be defined? More specifically, to what degree should I incorporate channels into my digitization of each layer?
  3. How should I define a modern shoreline?

Methods

Step 1: Determine Uncertainties

Each digitized layer must have an associated uncertainty to be run through DSAS. Moore (2000) details all possible errors associated with analyzing aerial photography.

  1. Radial and tangential distortions from camera lenses, together with film deformations caused during collection, processing, and storage, combine to produce significant image-space distortions. These distortions are theoretically accounted for by the transformations applied to georeferenced images; I used a second-order polynomial transformation, which is appropriate for HAPs. However, errors are associated with each ground control point, and without ground-truthing, which is beyond the scope of this study, it is practically impossible to determine the accuracy of my georeferencing. As others have done, I will report the root mean square error (RMSE) associated with each georeferenced aerial image.
  2. Moore (2000) additionally details that objects may appear displaced from their true ground positions due to differences in relief, distance from the center of the photo, camera tilt, flight altitude, and atmospheric refraction. In my HAPs, errors associated with object-space displacements are particularly great along the upslope edges. However, as I wrote in my Exercise 1 blog post, I am not interested in retreat of the upslope edge. Laura Brophy and her colleagues have been working on this issue (Brophy et al. 2019), and I do not believe that the rate of retreat will be significant in my timeframe of interest. Currently I have digitized the upslope edge of all fringing marsh using PMEP’s (Pacific Marine and Estuarine Fish Habitat Partnership) Current Wetland Extent Map. In the future I might either use polylines rather than polygons, or I will mask out the tree line in the finished product.
  3. Other errors are associated with interpretation and digitization of the features of interest (Moore 2000). Others have simply reported the scale at which their images have been digitized. I considered comparing a modern digitization completed by myself to PMEP’s Current Wetland Extent Map; however, this map is primarily based on LiDAR imagery with 5 m resolution, which is much coarser than the resolution of my imagery. Thus, to account for errors associated with digitization I will adopt the common method of assuming 1 m of uncertainty based on previous studies (Ruggiero et al. 2013).      

Based on Ruggiero et al. (2013), I plan to characterize the error for each year using the equation:

Up = √(Ug² + Ud²)

where Up is the total uncertainty of the shoreline position, Ug is the uncertainty associated with georeferencing (the RMSE of the control points used to georeference each image), and Ud is the uncertainty of the digitization (1 m). The uncertainty associated with NOAA T- and H-sheets is considered to be 10 m (Ruggiero et al. 2013).
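As a quick check, this quadrature sum reproduces the totals in Table 1; for example, for 1939:

```r
Ug <- 1.9                  # georeferencing RMSE (m)
Ud <- 1.0                  # digitization uncertainty (m)
Up <- sqrt(Ug^2 + Ud^2)    # = 2.15 m, reported as 2.2 m in Table 1
```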

Step 2: Reduce Channel Complexity

Though I have not assessed the impact of scale on the overall area of each marsh platform, I hypothesize that digitizing at smaller spatial scales would result in decreased apparent area as more and more tidal creeks are incorporated. Because I have digitized my salt marshes at the smallest spatial scale possible for each photo, and the highest quality HAPs are 1939 and 1952, these may erroneously appear relatively too small due to higher channel density than in the lowest quality HAPs, 1983 and 1991. Thus, I must decide how to correct for differences in salt marsh channel digitizing due to differences in spatial and spectral resolution between photographs, as this alters the overall area of the salt marsh complex.

Preliminary results suggest that resampling the highest resolution photograph (1952, 0.2 m) to the resolution of the lowest resolution photograph (1983, 0.7 m) using the ArcGIS Pro Raster Cell Resample tool (Nearest Neighbor and Cubic resample types) does not produce a noticeable difference. Thus, it appears as though differences in spectral resolution primarily drive the apparent differences in image quality, and thus the ability to distinguish tidal creeks. Unfortunately, because no metadata besides the date was provided by the UO Historical Photography Library, assessing spectral quality is outside the scope of this study.
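For reference, the same resampling comparison could be sketched in R with the ‘terra’ package (file names hypothetical):

```r
library(terra)

hi <- rast("alsea_1952.tif")   # ~0.2 m resolution (hypothetical file name)
lo <- rast("alsea_1983.tif")   # ~0.7 m resolution

hi_down <- resample(hi, lo, method = "cubic")   # or method = "near" for nearest neighbour
plot(hi_down)
```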

My next attempt to standardize channel digitization was to limit channels to widths greater than or equal to the uncertainty associated with each image. This was performed using the ArcGIS Pro Buffer tool: I selected the marsh polygon of interest, specified the buffer distance as the error (e.g., 1 m), then selected that buffered polygon and buffered it again, specifying the buffer distance as the negative error (e.g., -1 m). Comparing the second buffered polygon with the original indicates that little to no resolution was lost along the marsh edges, while the channels were appropriately limited. I then deleted the original polygon and the first buffered layer.

Upon completion of this first round of Buffering, I realized that because the error is different between each year, the channel density is still different. Thus, I repeated the first step using the worst uncertainty estimate of 3.8 m. At this point all marsh outlines should have a channel density not primarily driven by differences in spectral resolution.   
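The same buffer-out/buffer-in operation can be sketched with the ‘sf’ package in R, shown here with the final 3.8 m value; the shapefile names are hypothetical.

```r
library(sf)

marsh <- st_read("alsea_marsh_1952.shp")   # one digitized marsh polygon layer, units in metres
err   <- 3.8                               # worst-case shoreline uncertainty (m)

# Buffer outward then inward by the same distance: narrow tidal creeks are filled in
# while the outer marsh edge is left essentially unchanged.
marsh_simplified <- st_buffer(st_buffer(marsh, dist = err), dist = -err)

st_write(marsh_simplified, "alsea_marsh_1952_simplified.shp")
```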

Step 3: Determine a Modern Shoreline

As stated previously, the PMEP map uses LiDAR with a resolution of 5 m, which is too low for comparison with my HAPs. An OSU MS thesis from 2004 (Russel Scranton, 2004) digitized Alsea Bay using digital ortho quads from 2001 and ground-truthing with RTK GPS. I incorporated this layer, though it required some edits made using the original 2001 digital ortho quads. Because I am comparing horizontal growth rates with sediment core derived vertical accretion rates (and my cores were collected between 2013 and 2017) I then needed to decide what imagery to use from the last ~20 years. I settled on 2009 and 2018 as these years have aerial imagery taken at relatively high resolution (< 0.5 m). I digitized these in a similar manner to how I digitized my HAPs.   

Step 4: Merge Data into a Single Feature Class

DSAS requires data be merged into a single feature class using the Merge tool in ArcGIS Pro. This file must also have two columns in the attribute table noting the date and uncertainty of each shoreline. I specified Jan. 1 for each date as I mostly only had the year of collection, and I specified the uncertainty using the above method. Originally, I merged my polygon data but after realizing DSAS only runs with polyline data, I converted each individual layer using the Polygon to Line tool in ArcGIS Pro, and then re-Merged and edited the attribute table of my shoreline layer. Before placing the data into a personal geodatabase that will be accessed by DSAS in ArcMap, I made certain my layer was in meters.  

Step 5: Initial DSAS Run

DSAS requires a baseline layer for orientation when casting transects. I decided to use my 1939 polyline buffered at 10 m as the baseline. In ArcMap, I specified my baseline, making certain that the program would cast offshore. I then performed an initial casting of transects using a 10 m search radius, 10 m transect density, and a value of 50 for transect smoothing. After calculating transect statistics, I colored my transects based on rates of growth or erosion.         

Results

The final results for this exercise relate to the estimates of uncertainty and the spatial resolution for each year. Uncertainty ranged from 1.0 to 3.8 m among the shorelines digitized from the last century. These values are actually quite low compared to other similar studies (e.g. Schieder et al. 2018). Because Oregon salt marshes are typically much smaller than those on the US East and Gulf coasts and European coasts, my georeferencing RMSEs are likely smaller, as images need not be stretched over great distances.

The 1887 shoreline has an uncertainty of 10 m, but I have not yet included it in the analysis. Because the change in shoreline shape between 1887 and 1939 is so stark, it should be analyzed separately from the 1939-to-present data.

The spatial resolution ranged from 0.30 to 0.73 m. Estimating spectral resolution is beyond the scope of this study.

Table 1. Uncertainties (calculated based on georeferencing RMSEs and digitization uncertainty of 1 m) and spatial resolution for each year.

Year   RMSE (m)   Total Uncertainty (m)   Cell Size X (m)   Cell Size Y (m)
1887   -          10                      -                 -
1939   1.9        2.2                     0.23              0.24
1952   2.0        2.2                     0.46              0.45
1959   2.2        2.4                     0.28              0.28
1972   3.1        3.3                     0.55              0.54
1983   1.8        2.1                     0.73              0.71
1991   3.6        3.8                     0.71              0.71
2001   0.91       1.4                     1.0               1.0
2009   0.18       1.0                     0.60              0.60
2018   0.09       1.0                     0.30              0.30

The initial DSAS run clearly highlights that, though there are a few areas of accumulation, the Alsea Bay salt marshes are primarily experiencing erosion. It appears as though the northern edge of the largest salt marsh island has been experiencing the greatest rates of erosion. This pattern is likely related to the diking of the northern fork of the river in the middle of the century, which would have reduced connectivity, and thus sediment transport, to this portion of the bay, resulting in erosion under rising relative sea level. The northern portion of the fringing marsh appears to have experienced the slowest rates of erosion. The reason for this is unclear but is potentially related to oscillating periods of accumulation and erosion, as evidenced by a detailed look at the different shoreline layers.

Figure 1. Preliminary results: DSAS calculated transects displaying rates of erosion (red = 4.1 m/y) and expansion (blue = 0.6 m/y; white = neutral) of the Alsea Bay salt marshes from 1939 to 2018.

Despite these initial exciting results, there are clearly many areas where this initial DSAS run can be improved and these are both highlighted in the image and briefly outlined below.

Critique

As a general note, I am very happy I chose to move forward with just Alsea Bay salt marshes for this project, rather than georeferencing and digitizing all of my salt marshes at once. Determining all of the little quirks of the Editing tools in ArcGIS Pro and the specifications of DSAS on one dataset will allow me to move much faster through successive datasets. For instance, I did not realize that DSAS required polylines rather than polygons. Going forward I will digitize all of my marshes using polylines rather than polygons and I believe this may actually save me time since I will not have to digitize any upslope edges.   

The methods I used for determining uncertainty seem to be universally accepted given the difficulty of estimating true accuracies of shoreline positions back through time. I will continue to use this method for the remainder of this project.

The Buffering tool saved me a lot of time. Considering a digitization for a single year takes me approximately one day to complete, not having to re-digitize at a universal resolution or to a specific channel width allowed me to make more progress in other aspects of Exercise 2.

I am really excited to move forward with the DSAS program. For my next exercise, I plan to play around with the DSAS settings to improve the transect projections. Depending on the difficulty of this, I may also have time to begin incorporating my sediment core data. Below are some initial areas in which the DSAS projection could be improved.

Figure 2. Closeup view of transect projections.
  1. Remove or mask out all upslope edges since these are not of interest and were all digitized using the same PMEP boundaries.
  2. Remove or mask out all complexity within the salt marsh complexes, especially within the northern portion of the larger island and the southern portion of the fringing marsh.
  3. Play around with the transect search distance to remove transects bridging the gap between the largest and smallest islands. This may require running each marsh complex separately (which I would rather not do, so that all transect symbology remains on the same scale).
  4. Play around with transect smoothing to reduce the number of intersecting transects in areas with complex shoreline.   

Literature

Brophy, L. S., Greene, C. M., Hare, V. C., Holycross, B., Lanier, A., Heady, W. N., … & Dana, R. (2019). Insights into estuary habitat loss in the western United States using a new method for mapping maximum extent of tidal wetlands. PLoS ONE, 14(8).

Himmelstoss, E.A., Henderson, R.E., Kratzmann, M.G., & Farris, A.S. (2018). Digital Shoreline Analysis System (DSAS) version 5.0 user guide: U.S. Geological Survey Open-File Report 2018–1179, 110 p., https://doi.org/10.3133/ofr20181179.

Moore, L. J. (2000). Shoreline mapping techniques. Journal of Coastal Research, 111-124.

Ruggiero, P., Kratzmann, M. G., Himmelstoss, E. A., Reid, D., Allan, J., & Kaminsky, G. (2013). National assessment of shoreline change: historical shoreline change along the Pacific Northwest coast. US Geological Survey.

Schieder, N. W., Walters, D. C., & Kirwan, M. L. (2018). Massive upland to wetland conversion compensated for historical marsh loss in Chesapeake Bay, USA. Estuaries and Coasts, 41(4), 940-951.

Scranton, R. W. (2004). The application of Geographic Information Systems for delineation and classification of tidal wetlands for resource management of Oregon’s coastal watersheds (Master’s thesis, Oregon State University).