Category Archives: 2020


Exercise 3: Investigating spatial and temporal relationships between prey density sampling stations

Question asked

Following exercise 2, it seems that the relationship between distance to kelp and zooplankton density cannot appropriately be explained by a linear function. I therefore wanted to explore whether certain stations are correlated with one another, as a step toward creating an ecologically informed spatial layer of zooplankton density across my entire study area. My questions for exercise 3 were:

  • “Are there correlations between zooplankton density values at different stations?”
  • “If so, what is the nature of these correlations – positive/negative?”
  • “How does distance factor into these relationships?”

Approaches/tools used, methods & results

I first visualized the zooplankton density at all of my stations for 2018 across time (Fig 1). Since this plot looks a little convoluted, I also decided to split it out by the two sites (Mill Rocks and Tichenor Cove; Fig 2) as well as by station (Fig 3). Just by visual inspection, it does look like there may be peaks and troughs in zooplankton density occurring at the same time across several stations.

To investigate this, I then ran a bivariate correlation matrix on my data to compare prey density between all pairs of stations on the same days (Fig 4), which resulted in only 3 pairs being significantly correlated (Fig 5). 
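As a sketch of this step, the pairwise correlations between stations' density time series can be computed by pivoting the survey table so that stations align on sampling day. The station names, dates, and densities below are invented placeholders, not the study data:

```python
import numpy as np
import pandas as pd

# Hypothetical long-format survey data: one row per station per sampling day.
# Column names ("station", "date", "density") are illustrative only.
rng = np.random.default_rng(0)
dates = pd.date_range("2018-06-01", periods=30)
data = pd.DataFrame({
    "date": np.tile(dates, 3),
    "station": np.repeat(["TC1", "TC2", "MR1"], 30),
    "density": rng.lognormal(mean=2.0, sigma=0.5, size=90),
})

# Pivot so each station is a column, rows aligned on sampling day;
# corr() then gives the bivariate correlation matrix across station pairs.
wide = data.pivot(index="date", columns="station", values="density")
corr = wide.corr(method="pearson")
print(corr.round(2))
```

Each off-diagonal entry of `corr` is the same-day Pearson correlation for one station pair, which is what the significance tests in Fig 5 were run on.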

While this was interesting to see, these plots did not help elucidate where these correlations are occurring in space. Therefore, to add the spatial element, I plotted the correlation coefficient of each pair against the distance between that pair of stations (Fig 6). Looking at the plot, there is no obvious, discernible trend. Interestingly, some stations are strongly positively correlated but far apart, while others are positively correlated and close together. The same applies to negative correlations, though the negative correlations are not as strong as the positive ones. However, a majority of the station pairs fall in the middle, indicating that for many stations, zooplankton density at one station does not relate to density at another.

Fig 6

Although this plot did bring the distance between stations into context, it is hard to visualize what this actually looks like. Therefore, I visualized this plot in two ways. One was by drawing color-coded lines of the strongest positive and negative correlations on my site map (Fig 7), while the second was the same plot as Fig 6, but with the points color-coded by the habitat type (Fig 8) of the two stations (since habitat type is likely also a determinant of where we find zooplankton).


It has become very clear to me that trying to interpolate prey density across my sites is not as straightforward as I thought it was going to be, which can feel very frustrating at times. However, I simply need to remind myself that the marine environment is incredibly complex and dynamic, and that it would be near miraculous if the relationships between habitat, environment, prey, and predator could be easily defined by one variable. For now, exercise 3 has continued to uncover the patterns that do and don't exist in my data and has led to more analyses to keep disentangling those patterns. My next step for the final project will be to investigate temporal cross-correlation, to see whether the density at station x1 at time t is related to the density at station x2 at time t-1. I am a little doubtful about whether this will work, because my sampling frequency varies between stations (sometimes we can't do certain stations due to unworkable conditions) and there are sometimes 3-day gaps in sampling. However, I shall persevere and see how it goes!
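The planned lag-1 cross-correlation can be sketched by pairing each day's density at one station with the previous day's density at another and simply dropping days where either value is missing, which tolerates the sampling gaps. Everything below (station series, gap positions) is synthetic, illustrative data:

```python
import numpy as np
import pandas as pd

# Hypothetical daily densities at two stations with gaps (NaN = day not sampled).
rng = np.random.default_rng(1)
idx = pd.date_range("2018-06-01", periods=40)
x2 = pd.Series(rng.lognormal(2, 0.4, 40), index=idx)
x1 = x2.shift(1) * rng.lognormal(0, 0.1, 40)  # x1 loosely tracks x2 one day later
x1.iloc[[5, 6, 7, 20]] = np.nan               # simulate a 3-day sampling gap
x2.iloc[[12, 13]] = np.nan

# Correlate density at x1 on day t with density at x2 on day t-1,
# dropping any day where either value is missing.
pairs = pd.DataFrame({"x1_t": x1, "x2_tminus1": x2.shift(1)}).dropna()
r = pairs["x1_t"].corr(pairs["x2_tminus1"])
print(f"lag-1 cross-correlation: {r:.2f}, based on {len(pairs)} paired days")
```

Because the pairing happens on calendar dates before `dropna()`, the lag stays an honest one day even when the series have holes; the cost is simply fewer paired days.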

Multiple Regression Analysis Relating to Benthic Polychaetes Amphipods (BPA) Ratios off the Oregon and Washington Coasts

  • Question I asked:
  1. Is the Benthic Polychaetes Amphipods (BPA) ratio positively correlated to total organic carbon (TOC) off the Oregon and Washington coasts?
  2. Is BPA ratio positively correlated to depth off the coast of Washington State?
  3. How strong of predictors of BPA are TOC and Depth off the Washington coast?
  4. Does the relationship between distance to nearest harbor/river and BPA differ at different depth ranges off the Oregon coast?
  5. How effectively can you determine TOC based on depth and distance to nearest harbor or river in Oregon?
  • Tool or Approach Used:
  1. Benthic Polychaetes Amphipods (BPA) Index (See Above)
  2. Generate Near Table (Analysis) Tool in ArcMap
  3. Join (Data Management) Tool in ArcMap
  4. Multiple Regression Analysis w/Interaction Term
  • Steps Followed:
  1. First, I queried my data in Access to determine the total number of polychaetes and amphipods in each box core sample. Then, I joined that information to an existing Excel file and generated a new column to determine BPA.
  2. I typed the BPA equation listed above into Excel and used it to populate a new column to generate BPA index values for each benthic box core sample.
  3. I created a shapefile that contained the GPS locations of all the major rivers outlets and harbors in Oregon.
  4. I created a shapefile of all the benthic box core samples off the coast of Washington and then a shapefile of all the samples off the coast of Oregon.
  5. I used the Generate Near Table (Analysis) Tool in ArcMap with Oregon benthic box core samples as the “Input features” and major river mouths & harbors in Oregon as the “Near features.” This generated a distance value from each box core sample to the nearest river or harbor.
  6. I then used the Join tool to join the distance to nearest river or harbor values to the benthic box core data. After I completed the join, I exported the attribute table to a text file and opened it in Excel using the “Add Data” function.
  7. In Excel, I used the correlation and Data Analysis regression tools to answer the questions posed above.
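Outside of ArcMap, the Generate Near Table step amounts to a nearest-neighbor distance search. A minimal brute-force sketch, with invented projected coordinates standing in for the box-core and river/harbor shapefiles:

```python
import numpy as np

# Hypothetical projected coordinates (meters); these stand in for the
# box-core sample and river/harbor shapefile locations, not the real data.
samples = np.array([[0.0, 0.0], [5000.0, 2000.0], [12000.0, -3000.0]])
rivers = np.array([[1000.0, 0.0], [11000.0, -2500.0]])

# Brute-force equivalent of ArcMap's Generate Near Table: for each sample,
# the Euclidean distance to (and index of) the nearest river/harbor point.
diff = samples[:, None, :] - rivers[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))        # shape (n_samples, n_rivers)
near_idx = dist.argmin(axis=1)                 # nearest feature per sample
near_dist = dist.min(axis=1)
for i, (j, d) in enumerate(zip(near_idx, near_dist)):
    print(f"sample {i}: nearest feature {j} at {d:.0f} m")
```

For large datasets a spatial index (e.g., a k-d tree) would replace the brute-force distance matrix, but the output table is the same: one nearest-feature index and distance per sample, ready to join back to the attribute data.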
  • Results:
  1. Is the Benthic Polychaetes Amphipods (BPA) ratio positively correlated to total organic carbon (TOC) off the Oregon and Washington coasts? – Washington: Pearson’s r correlation coefficient: 0.178 – WA regression equation: y = 0.1332x + 0.803, y=BPA, x=TOC – WA polynomial relationship between TOC & BPA: y = -0.6365x³ + 0.5372x² + 0.4564x + 0.7076
This figure shows that off the coast of Washington, BPA tends to increase at higher levels of TOC. TOC is reported in terms of percent organic carbon present in the benthic box core sample.
A cubic polynomial rather than a linear function fits the relationship between TOC and BPA off the Washington coast more closely. The R-squared value for the linear fit was 0.0317, while the R-squared for the polynomial fit is 0.1313.

– Oregon: Pearson’s r correlation coefficient: 0.187 – OR regression equation: y = 0.0906x + 0.8261, y=BPA, x=TOC

This linear regression function suggests that BPA tends to increase at increased levels of TOC off the Oregon coast. Notice that maximum TOC levels are much higher in Oregon than in Washington. The max TOC in Washington was 1.4%, while in Oregon it is over 3%.

2. Is BPA ratio positively correlated to depth off the coast of Washington? Oregon? – WA Pearson’s r correlation coefficient: 0.651 – WA regression equation: y = 0.0079x + 0.3124, R² = 0.4237, y=BPA, x=Depth

This figure shows that BPA tends to increase at increased depth off the Washington coast. Notice the relatively high R-squared value here, at 0.4237.

– OR Pearson’s r: 0.179 – OR regression: y = 0.0005x + 0.8135, R² = 0.0322, y=BPA, x=Depth

This figure shows that overall, BPA tends to increase with increased depth off the Oregon coast. However, notice that there seems to be a stronger positive trend between 0 and 200 m depth and a negative trend between 200 and 600 m.

– Because I saw a positive relationship between BPA and depth from 0-200 m and a negative relationship at 200-600 m, I ran Pearson’s r correlations on those two subsets of the depth data and found: – OR 0-200 m correlation: 0.504 – OR 200-600 m correlation: -0.354

3. How strong of predictors of BPA are TOC and Depth off the Washington coast? – The results of a multiple regression analysis revealed an R² value of 0.4357 with y=BPA, x1=TOC, and x2=Depth.

4. Does the relationship between distance to nearest harbor/river and BPA differ at different depth ranges off the Oregon coast?

Here we see the relationship between distance from the nearest major river or harbor and BPA for depths up to 50 m. Note that there is a negative relationship for this depth range but a positive relationship at all other depth ranges.

– Multiple regression (at all depths): y = 0.760833 – 0.000279x1 + 0.741952x2 – 0.000207x1x2, y=BPA, x1=Depth, x2=distance from river or harbor (the final coefficient is the x1·x2 interaction term)
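A multiple regression with an interaction term like the ones above can be fit with ordinary least squares by adding an x1·x2 column to the design matrix. The sketch below uses synthetic data generated from known coefficients, not the benthic samples:

```python
import numpy as np

# Synthetic stand-in data (not the real samples): BPA generated from
# known coefficients plus noise, then refit by least squares.
rng = np.random.default_rng(2)
n = 200
depth = rng.uniform(10, 600, n)          # x1: depth (m)
dist = rng.uniform(0, 40, n)             # x2: distance to river/harbor (km)
y = (0.76 - 0.0003 * depth + 0.02 * dist
     + 0.0001 * depth * dist + rng.normal(0, 0.05, n))

# Design matrix with intercept, both predictors, and the interaction term.
X = np.column_stack([np.ones(n), depth, dist, depth * dist])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

yhat = X @ coef
r2 = 1 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print("b0..b3:", np.round(coef, 5), " R2:", round(r2, 3))
```

The interaction coefficient (b3) answers exactly the question posed above: whether the distance-to-river effect on BPA changes with depth. In R this is the `lm(y ~ x1 * x2)` formula the author mentions wanting to try.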

5. How effectively can you determine TOC based on depth and distance from river or harbor in Oregon? – Multiple regression: y = -0.141824 + 0.003378x1 + 1.147497x2 + 0.003877x1x2, y=TOC, x1=Depth, x2=Distance from river or harbor, R² = 0.6707 (the final coefficient is the x1·x2 interaction term)

This figure shows the relationship between depth and TOC off the Oregon coast from a single-variable linear regression. The multiple regression R-squared value (0.67) is a slightly better fit than the single-variable value of 0.62.
  • I found the multiple regression analysis, Pearson’s r correlation coefficients, and ArcMap tools to be effective for my analysis. It was somewhat cumbersome to run these analyses in Excel, however. I would like to work with multiple regressions in R for ease of use, to determine interaction coefficients, and to graph results.

Exercise 3: Work in Progress

  • For the third exercise, I am exploring the MODIS fire data in more detail to assess spatial patterns and seasonal impacts. I am still relying on mapping exercises and hotspot analysis to examine variations following the wheat harvest season in April and the rice harvest season in October. Some graphs are shown below:
The map above shows wheat production (in ’000 tonnes) across districts. This serves as a useful reference for examining post-wheat-harvest crop burning.
Results of the hotspot analysis using fire radiative power (FRP) show that clusters of fires tend to occur in the northern part of the Punjab province. In the winter months, post rice harvest, the intensity of fires is lower than in April-May 2011. There are some high-intensity fires in the rice-cultivating districts of the Sindh province.
Hotspot analysis using data for 2013 shows some high-intensity fires in the winter months as well. However, spatially, the fires appear to be concentrated in the same region as in 2011.
For 2014, in the post-wheat-harvest season, there are higher-intensity fires clustered in parts of southern Punjab – specifically Bahawalnagar district, where a significant amount of wheat is cultivated. There also appear to be clusters of high-intensity fires in Sindh.
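The hotspot statistic behind these maps (Getis-Ord Gi*) can be computed from scratch; the sketch below is a minimal stand-in for ArcMap's Hot Spot Analysis tool, run on an invented grid of FRP values rather than the MODIS data:

```python
import numpy as np

def getis_ord_gstar(values, coords, d):
    """Gi* z-scores with binary distance-band weights (self included).
    A from-scratch sketch of the hotspot statistic, not ArcMap's tool."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    xbar = values.mean()
    s = np.sqrt((values ** 2).mean() - xbar ** 2)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = (dist <= d).astype(float)            # binary weights, w_ii = 1
    z = np.empty(n)
    for i in range(n):
        wi = w[i]
        sw, sw2 = wi.sum(), (wi ** 2).sum()
        num = wi @ values - xbar * sw
        den = s * np.sqrt((n * sw2 - sw ** 2) / (n - 1))
        z[i] = num / den
    return z

# Toy grid of "fire" points: high FRP cluster in one corner (synthetic data).
coords = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float)
frp = np.ones(25)
frp[[0, 1, 5, 6]] = 10.0                     # hot cluster near (0, 0)
z = getis_ord_gstar(frp, coords, d=1.5)
print("z at cluster:", round(z[0], 2), " z far away:", round(z[24], 2))
```

Points with z above roughly 1.96 are statistically significant hot spots at the 95% level, which is what the red clusters on the maps represent.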

Exercise 3: Influence of landslide volume on travel distance and time


In Exercise 2, I created a buffer zone and compared the relationship between road density and landslides. In Exercise 3, I want to further analyze the impact of landslides on roads and traffic. For large landslides, how do they affect travel distance and time? If landslides at the same location have different volumes, do people choose different detour routes?

2. Tools or approaches used

Buffer: First, use the attribute table to find the volume and area of each landslide. Based on the landslide area, determine whether a construction site is needed; if so, create a buffer with an appropriate distance.

Network analysis: Network analysis is used to compute trip time and distance with the buffer in place. First click Network Analysis, then choose Route as the target. Once the route is presented, click Edit and Create Features to add stops and the buffer; the stops should be the same when the route is analyzed with and without the buffer.

Show directions: This tool estimates trip time and distance. The difference between highways and urban roads will also be shown.


Figure 1. 213-highway normal trip map
Figure 2. 213-highway 250m buffer zone detour map
Figure 3. 213-highway 500m buffer detour map
Figure 4. Directions of normal trip on 213.
Figure 5. Directions of 500m buffer on 213

By comparing Figures 4 and 5, it can be seen that the 500 m buffer zone causes a 5-minute detour. Due to differences in speed limits and traffic flow between highways and urban roads, the actual time may be longer.
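The with/without-buffer comparison that Network Analyst performs is, at its core, two shortest-path solves: one on the full network and one with the road segment inside the buffer removed. A toy sketch follows; the node names and travel times are invented, not the Highway 213 network:

```python
import heapq

def shortest_time(graph, start, goal):
    """Dijkstra on travel-time weights (minutes); returns total minutes."""
    dist = {start: 0.0}
    pq = [(0.0, start)]
    while pq:
        t, u = heapq.heappop(pq)
        if u == goal:
            return t
        if t > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            nt = t + w
            if nt < dist.get(v, float("inf")):
                dist[v] = nt
                heapq.heappush(pq, (nt, v))
    return float("inf")

# Toy network (minutes per directed edge) standing in for the highway and
# its urban detour; all names and times here are illustrative.
roads = {
    "A": [("B", 10), ("C", 7)],
    "B": [("D", 10)],
    "C": [("D", 18)],          # slower urban detour
}
normal = shortest_time(roads, "A", "D")

# Landslide buffer closes the A->B highway segment: drop that edge.
blocked = {k: [(v, w) for v, w in e if (k, v) != ("A", "B")]
           for k, e in roads.items()}
detour = shortest_time(blocked, "A", "D")
print(f"normal: {normal} min, with buffer: {detour} min, delay: {detour - normal} min")
```

A larger landslide volume simply means a larger buffer, which removes more edges and typically forces a longer detour, which is exactly the comparison in Figures 2 and 3.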

Unmanned aerial vehicle (UAV)-based photogrammetric approach to delineate tree seedlings

Exercise 3: Visualizing the seedling structure and assessing seedling growth using point cloud data

  1. The question addressed in this study

Characterization of the forest canopy and projection of forest inventory from light detection and ranging (lidar) data have been widely used over the last two decades. Active remote sensing approaches (i.e., lidar) can be used to construct three-dimensional (3D) structures of tree canopies and the underlying terrain (Pearse et al., 2019). However, drawbacks associated with lidar data, such as low point density, the high cost of data acquisition, and difficulties in applying it to small areas, limit its practical use (Tao et al., 2011). The emergence of unmanned aerial vehicle (UAV) photogrammetric point cloud technology can address most of these drawbacks. In particular, lightweight UAVs provide a better platform to acquire digital photos at lower operational cost (Ota et al., 2017). Therefore, dense 3D point clouds generated from Structure-from-Motion (SfM) photogrammetry can serve as an ample substitute for lidar point cloud data and can be used to construct 3D structures for estimating forest biophysical properties.

Consequently, voxel-based metrics approaches are commonly used in forestry applications, especially for characterizing the forest canopy and other forest attributes in 3D space. With these approaches, point clouds are split “along both the vertical and horizontal axes to form volumetric pixels (volume pixel or voxel)” (Pearse et al., 2019). The main advantage of voxel-based prediction is that we can extract information within different layers of the forest canopy. However, most voxel-based metrics have been applied to lidar point cloud data and terrestrial laser scanning data (Pearse et al., 2019). Since the acquisition of both lidar and terrestrial laser scanning data is expensive, UAV-based point cloud data can be used as an alternative. Therefore, in this exercise, I am interested in characterizing forest attributes (in 3D space) using a voxel-based metrics approach with UAV photogrammetric point cloud data. Specifically, I want to identify “how the tree crown completeness varies as a function of the number of points arranged in the 3D point cloud / voxel resolution”.

2. Name of the tool or approach used, and steps followed.

Step 1:  Pre-processing point cloud data

Selecting Area of Interest

I used R software packages to develop point cloud data from UAV images.

  • Packages used: lidR, sp, raster, rgdal, and EBImage

As the initial step, point cloud data were extracted using AgiSoft software (Fig. 1(a)). Then the lasground function in the lidR package was used to classify points as ground or not ground using the “cloth simulation filter” (Fig. 1(b)). In the next step, I used the lasnormalize function in the lidR package to remove the topography from the point cloud (Fig. 1(c)). Next, the normalized point clouds were cropped to the area of interest (AOI) shapefile (Fig. 1(d)) using the lasclip function.

Figure 1. Map of (a). point clouds generated over the study site, (b). point clouds after the cloth simulation function performed, (c). normalized point cloud, and (d). finalized point cloud mapped in AOI and selected sub-set of AOI.

Step 2:  Voxelizing the point cloud data

For better visualization and to improve computational efficiency, I selected a subset of the AOI to illustrate the voxel models (Fig. 1(d)). First, the “lasfilterduplicates” function (in the lidR package) was used to remove duplicate points from the point cloud data. After removing the duplicate points, voxelization of the point cloud data was performed using the “lasvoxelize” function in the lidR package. This function allows two different options (cubic and non-cubic) to reduce the number of points by voxelizing the point cloud. Figure 2 represents the cubic voxel approach, where the relative x, y, and z dimensions indicate the voxel resolution. Figure 3 represents the non-cubic approach, with the relative voxel dimensions given in each diagram; in this approach, the z-axis dimension differs from the x and y dimensions.

Figure 2. Voxel models developed using the cubic voxel approach; voxel resolution (a) 1 m, (b) 0.1 m, (c) 0.01 m, and (d) 0.001 m. The top row represents the side view, and the bottom row represents the top view of the voxel models.

Figure 3. Voxel models developed using the non-cubic voxel approach; voxel dimensions (a) 1 x 1 x 2 m, (b) 0.1 x 0.1 x 0.2 m, (c) 0.01 x 0.01 x 0.02 m, and (d) 0.001 x 0.001 x 0.002 m. The top row represents the side view, and the bottom row represents the top view of the voxel models.
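The core of the lasvoxelize step can be sketched in a few lines: snap each point to a grid cell and keep one representative point per occupied voxel. This is a minimal stand-in for the lidR function, not its implementation, run on a random toy point cloud:

```python
import numpy as np

# Toy point cloud (x, y, z in meters); a stand-in for the UAV-SfM points,
# not the study data.
rng = np.random.default_rng(3)
pts = rng.uniform(0, 2, size=(10000, 3))

def voxelize(points, res):
    """Snap points to a voxel grid of the given (dx, dy, dz) resolution and
    keep one representative point (the voxel center) per occupied voxel."""
    res = np.asarray(res, dtype=float)
    idx = np.floor(points / res).astype(int)
    occupied = np.unique(idx, axis=0)
    return (occupied + 0.5) * res              # voxel centers

cubic = voxelize(pts, (0.1, 0.1, 0.1))         # cubic voxels
noncubic = voxelize(pts, (0.1, 0.1, 0.2))      # non-cubic: coarser z
print(len(pts), "points ->", len(cubic), "cubic /", len(noncubic), "non-cubic voxels")
```

Shrinking the resolution preserves more structural detail at the cost of more voxels to store and render, which is the trade-off explored across Figures 2 and 3.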

Step 3:  Visualization approaches

The viewshed3d package uses 3D point cloud data to visualize different environmental settings, especially for constructing 3D environments for ecological research. Due to its potential usefulness, I used this package to visualize the study area and the canopy structure of the seedlings in 3D space. At the initial stage, duplicated points were removed from the point cloud. Then the noise associated with the point cloud data was removed, and the reconstruct_ground() function was used to reconstruct the ground layer of the study area. Finally, an appropriate voxel model (described above) was used to visualize the seedlings in 3D space.

Figure 4. (a). Example of detected noise in the point cloud (red), (b). reconstruction of the ground and illustration of the canopy cover of seedlings, and (c) and (d). illustration of canopy cover over the AOI.

Step 4: Geographically weighted regression (continued from exercise 2)

Image classification tools available in ArcMap 10.7:

  • Constructing a digital elevation model
  • Spatial Analyst Tool: Tools available in Zonal
  • Geographically weighted regression

Assessing the effect of geography  

To identify the influence of geography on seedling growth, I constructed a digital elevation model for the study area. Relative elevation values for each seedling location were extracted using the Zonal tool available in ArcMap 10.7. Next, a geographically weighted regression was performed with tree height as the dependent variable and the relative elevation of the seedling locations as the independent variable. The observed coefficients for the geographically weighted regression are illustrated in Figure 5.

Figure 5. Map of (a). coefficients observed from geographically weighted regression by considering seedling height as the dependent variable and elevation of seedling locations as the independent variable and (b). orthomosaic image of the study area.
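The geographically weighted regression itself can be sketched as a weighted least squares fit repeated at every seedling location, with a Gaussian kernel down-weighting distant points. The data below are synthetic, with an elevation effect that drifts across the site; they are not the study measurements:

```python
import numpy as np

# Synthetic stand-in data: seedling locations, elevation (predictor), and
# height (response) whose elevation effect varies across space.
rng = np.random.default_rng(4)
n = 150
xy = rng.uniform(0, 100, size=(n, 2))
elev = rng.uniform(0, 10, n)
slope = 0.5 + 0.01 * xy[:, 0]                  # coefficient drifts west->east
height = 2.0 + slope * elev + rng.normal(0, 0.3, n)

def gwr_coefs(xy, x, y, bandwidth):
    """Local [intercept, slope] at every point via Gaussian-kernel weighted
    least squares -- a bare-bones GWR sketch, not ArcMap's implementation."""
    X = np.column_stack([np.ones_like(x), x])
    out = np.empty((len(y), 2))
    for i in range(len(y)):
        d2 = ((xy - xy[i]) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))  # Gaussian spatial kernel
        W = X * w[:, None]                      # weighted design matrix
        out[i] = np.linalg.solve(X.T @ W, W.T @ y)
    return out

coefs = gwr_coefs(xy, elev, height, bandwidth=20.0)
print("local slopes range:", coefs[:, 1].min().round(2),
      "to", coefs[:, 1].max().round(2))
```

Mapping the per-point slope column is what produces a coefficient surface like the one in Figure 5: where the local slope is large, elevation matters more for seedling height.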


  • Voxelization of point cloud data

Results obtained by voxelization of the point cloud data indicate that reducing the voxel volume can provide better outputs in terms of visualizing seedlings in 3D space. For example, a 1 x 1 x 1 m3 cubic voxel cloud could not visualize the structure of the seedlings in the sub-AOI, while a 0.1 x 0.1 x 0.1 m3 voxel could demonstrate the seedling architecture appropriately. After several trials, a volume of 0.01 x 0.01 x 0.01 m3 produced the best voxel model for visualizing the seedling architecture. Even if the voxel volume is reduced further (beyond 0.01 x 0.01 x 0.01 m3), the visualization of the seedling architecture does not improve drastically, while requiring more time and computing power to process. Therefore, I selected the 0.01 x 0.01 x 0.01 m3 voxel volume (model) as the best cubic voxel model for identifying seedling architecture in this study.

Additionally, I performed voxelization of the point cloud data with the non-cubic approach, and the observed results are similar to the cubic voxel models. The only difference in the non-cubic approach is the change in the z-axis (vertical resolution). We can change the resolution of the voxel image by changing the z-axis dimension based on our interest. For example, we can characterize/observe the canopy structure by changing the resolution of the z-axis, as shown in Figure 6. The non-cubic approach is appropriate if we are interested in studying the vertical variation of canopy structure.

The results obtained by visualizing the study area (step 2) provide a better platform for understanding the actual canopy structure of each seedling. Mainly, this helps to compare the results we obtained in the 2D environment (in exercise 2) and to assess them with the help of 3D representations. Looking at the canopy structures of the seedlings, we can confirm that the northwest side of the AOI has relatively larger and healthier seedlings, while the southeast side of the AOI has smaller and unhealthier seedlings. This observation agrees with the results obtained by geographically weighted regression (exercises 2 and 3). Both cubic and non-cubic approaches are applicable and can be utilized based on the desired output. The identified optimal voxel resolution (0.01 x 0.01 x 0.01 m3) can be used as a reference for further voxel-based analysis, which will save data processing time and require less computing power.

Figure 6. Variation of non-cubic voxels relative to z-axis changes (top) (a). 1 m, (b). 0.1 m, (c). 0.01m, (bottom) (d). 2 m, (e). 0.2m, and (f) 0.02 m.

Results obtained by geographically weighted regression (height as the dependent variable and relative elevation of seedling locations as the independent variable) showed a decreasing trend in coefficient values with decreasing elevation (Fig. 5(a)). Furthermore, the trends indicate that seedlings located in the northwest part of the study area have higher coefficient values, while seedlings located in the southeast part have relatively lower coefficient values (Fig. 5(a)). Generally, the elevation of seedling locations may indicate essential parameters about the geography and water table of the study area. We can assume that lower-elevation areas are more prone to accumulating water, and the groundwater table may be closer to the surface compared to higher elevations. As described in the previous exercise, this might be an indirect cause of the presence of bare ground in lower-elevation areas (i.e., stagnant water may damage the root systems of grass, which may die due to rotten roots and expose bare ground). Similarly, the presence of excess water may hinder the growth of seedlings as well. As a side effect, groundwater combined with woody debris may increase excess moisture in the subsurface soil layer and act as a negative factor for seedling growth, especially for the root system.


Overall, the visualization of seedlings using the voxel approach showed promising results, especially in identifying the optimal voxel size/volume that can be utilized for developing seedling architecture in future studies. However, to obtain better visualization outputs, we need to create high-density point clouds (described in step 1), which requires more time and computing power.

If we are interested in a relatively larger area, the visualization of objects may need to be enhanced. Therefore, after selecting the appropriate voxel model (i.e., 0.001 x 0.001 x 0.001 m3), the plotting tools available in the lidR package can be used to create better visualization structures, as illustrated in Figure A1.

Additionally, the detection of branches in early-stage seedlings may be difficult due to their unique shape and small branches. However, the developed approach can be useful for identifying the number of branches of mature trees.

Appendix A

Figure A1. Different methods of plotting the voxel model using the plotting functions available in the lidR package to increase the visibility of voxel models.


Ota, T., Ogawa, M., Mizoue, N., Fukumoto, K., Yoshida, S., 2017. Forest structure estimation from a UAV-Based photogrammetric point cloud in managed temperate coniferous forests. Forests 8, 343.

Pearse, G.D., Watt, M.S., Dash, J.P., Stone, C., Caccamo, G., 2019. Comparison of models describing forest inventory attributes using standard and voxel-based lidar predictors across a range of pulse densities. Int. J. Appl. Earth Obs. Geoinformation 78, 341–351.

Tao, W., Lei, Y., Mooney, P., 2011. Dense point cloud extraction from UAV captured images in forest area, in: Proceedings 2011 IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services. Presented at the Proceedings 2011 IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services, pp. 389–392.

What Factors Contribute to Primary Production in the Bermuda Ocean?

Background and Research Question

Mineral dust is the most important external source of phosphorus (P), a key nutrient controlling phytoplankton productivity and carbon uptake, to the offshore ocean (Stockdale et al., 2016). Paytan and McLaughlin (2007) emphasized that atmospheric P can be important as the major external supply to the offshore ocean, particularly in oligotrophic areas of the open ocean and areas that are P-limited, such as the ocean around Bermuda. The most important source of atmospheric P is desert dust, which has been estimated to supply 83% (1.15 Tg⋅a−1) of the total global sources of atmospheric P (Mahowald et al., 2008). Of that dust, it is estimated that 10% is leachable P (Stockdale et al., 2016). In addition, Saharan dust supplies a significant fraction of the P budget of the highly weathered soils of America’s tropical forests and of the oligotrophic water of the Atlantic Ocean, increasing the fertility of these ecosystems (Gross et al., 2015).

Hence, this background is my starting point for analyzing the correlation between Particulate Organic Phosphorus (POP) and Primary Production (PP) in the Bermuda Ocean. In exercise 2, I tried to answer these questions; however, the result was not what I expected. In exercise 3, I explored my raw data in depth and realized that the way I divide the data greatly affects the result. In this exercise I dug into the time series, regression, and auto-correlation functions in R.

However, because I misinterpreted the data in exercise 2, I added one more variable in exercise 3 to broaden my analysis and strengthen my result. Previous studies revealed that most phosphorus (P) and iron (Fe) are present as minerals that are not immediately soluble in water, and hence not bioavailable (Lidewijde et al. (2000), Shi et al. (2012)). Lidewijde et al. (2000) also stated that phosphorus (P) and iron (Fe), if deposited to the surface ocean, may pass through the photic zone with no effect on primary productivity, owing to their high settling velocity and low solubility. The photic zone has relatively low nutrient concentrations; as a result, phytoplankton do not receive enough nutrients. Moreover, several factors contribute to primary production, such as physical factors (temperature, hydrostatic pressure, turbulent mixing), chemical factors (oxygen and trace elements), and biological factors. Hence, I added the temperature variable in this exercise.

In this exercise 3, I would like to find out what factors contribute to PP in the Bermuda Ocean, and what the time cycles of the three variables (PP, POP, and Temperature) look like.


Dr. Julia helped me pre-process the data using Excel, and for further analysis, I used three tools in R:

  1. Time series function: This function helps us plot the time series trend for each dataset. Ex code: plot.ts(POPR['PP'], main="PP depth 0-7 meter", ylab="PP")
  2. Linear regression model: This function helps us identify the correlation between two designated variables; in this exercise, the dependent variable is Primary Production and the independent variables are POP and Temperature. Ex code: POPR.lm<-lm(POP~PP,data=POPR) and summary(POPR.lm). To plot it as a scatter with a linear fit, I used the ggplot function. Ex code: ggplot(POPR, aes(x=POP, y=PP))+ geom_point() + geom_smooth(method=lm,se=FALSE)
  3. Auto-correlation function: This function is used to find patterns in the data. Specifically, the autocorrelation function tells you the correlation between points separated by various time lags. The correlation values range from +1 to -1, where +1 means perfectly related and -1 means inversely related. In the acf chart, the dashed line represents the significance boundary of the correlation. Ex code: acf(POPR['PP'],lag.max = 51,type = c("correlation", "covariance", "partial"),plot = TRUE,na.action = na.contiguous,demean = TRUE) In this code, we can set lag.max to the number of data points if we want to assess the lag coefficients individually, or use NULL for the default. For this exercise I set lag.max according to the number of my data points.
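What R's acf() computes can be written out by hand, which makes the significance bound (the dashed line) explicit. The sketch below uses a synthetic monthly series with a 12-month cycle, not the Bermuda data:

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag (what R's acf() plots)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = (x ** 2).sum()
    return np.array([(x[k:] * x[:len(x) - k]).sum() / denom
                     for k in range(max_lag + 1)])

# Synthetic monthly "temperature" with a 12-month cycle (illustrative only).
rng = np.random.default_rng(5)
t = np.arange(60)
temp = 20 + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, 60)
r = acf(temp, max_lag=24)

# Approximate 95% significance bound used for the dashed line in an acf plot.
bound = 1.96 / np.sqrt(len(temp))
print("acf at lag 12:", round(r[12], 2), " bound:", round(bound, 2))
```

For a strongly seasonal series like this, the acf spikes above the bound at the cycle length (lag 12) and dips negative at the half-cycle (lag 6), which is the signature seen in the temperature ACF chart below.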

Steps of Analysis

  1. Divided the data based on the depth categories:
Category     Depth (meter)
POP 1        0-7
POP 2        9-12
POP 3        18-22
POP 4        37-42
POP 5        57-62
POP 6        77-85
POP 7        98-102
POP 8        108-122
POP 9        138-144
POP 10       158-164
POP 11       197-200
Table 1. The category of POP based on depth

2. Regression analysis between PP vs POP and PP vs Temperature. Since the PP and temperature data cover only 0-8 meter depth, the regression was conducted only for this depth category. In order to perform the regression, I pulled out four outliers:

Table 2. The outlier to perform regression analysis

3. Assessing the temporal pattern of PP, temperature, and POP using the auto-correlation function and the time series function. Due to some missing data, especially for POP and temperature at 0-7 meter depth, the dates 2014/03/06, 2015/02/04, and 2015/12/13 were removed for POP, and the dates 2014/03/06 and 2014/12/11 were removed for temperature.


  1. Regression
Figure 1. The regression analysis between PP vs POP and PP vs Temperature
Figure 2. R square analysis every month

From figure 1, overall, POP has a positive correlation with PP, and temperature has a negative correlation with PP (R square for PP vs POP is 0.3064; the PP vs temperature coefficient is -0.183). From this analysis I can assume that POP is a factor that contributes to primary production in the Bermuda Ocean. However, to see the detail of this analysis, I looked at the regression per month over the 5-year period. I did not perform the regression in January due to a lack of data. From figure 2, we can see that from April to December and in February, the pattern of correlation between PP vs POP is similar to the pattern of PP vs temperature, where the highest is in May and the lowest is in December and February. The significant difference happens in March, where the gap is +0.98 for PP vs POP and -0.99 for PP vs Temperature.

2. Auto-correlation and Time Series PP, POP and Temperature depth 0-7 meter

Figure 3. The auto-correlation and time series graph of PP, POP and Temperature depth 0-7 meter

From figure 3, we can see that the temporal patterns of PP and POP are similar, with the highest peak at the beginning of each year, except in 2012 and 2016; this is because the 2012 data start in June and the 2016 data start in March. This pattern is inverted for temperature: at the beginning of the year the temperature is very low, and it is high from June to July. To confirm this pattern, we can look at the autocorrelation (ACF) chart, where temperature is significantly related to time: the vertical lines cross the horizontal dashed line, which means there is a repeating cycle in temperature over time.

There is also a repeating temporal pattern for POP and PP, with the highest values occurring in January-March. The difference from temperature is that the repetitions in temperature have almost exactly the same values, which is not the case for POP and PP: there is a large gap between the January-February values of POP and PP in 2013 and those in other years. I think this is why there is only one significant correlation line at the beginning of the ACF charts for POP and PP.
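The analysis used R's acf(); the same quantity can be sketched in Python to show what the ACF bars represent (a toy sinusoidal series with a 12-step "annual" cycle, standing in for temperature):

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag, matching R's acf() definition."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return [np.sum(x[:n - k] * x[k:]) / denom for k in range(max_lag + 1)]

# 48 monthly steps = 4 clean annual cycles; the cycle reappears as a peak at lag 12
t = np.arange(48)
series = np.sin(2 * np.pi * t / 12)
print(round(acf(series, 12)[12], 2))
```

For a clean periodic series like this the lag-12 autocorrelation is a strong peak; for POP and PP, where cycle amplitudes differ year to year, that peak is weaker, which matches the behavior described above.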

3. Auto-correlation and Time Series of POP by depth categories

Figure 4. The auto-correlation and time-series graph of POP based on 11 depth categories

From figure 4 we can see that the temporal pattern from lag 0-5 is similar for all depth categories, while across all lags only POP 1 to 3 (depth 0-22 meter) are similar. This suggests that deposited POP can reach 22 meters depth at roughly the same time.

Looking at POP 4 to POP 6, there is no repetitive pattern in either the ACF chart or the time-series chart. A repetitive pattern does appear again from POP 7 to POP 10 (98-164 meter depth).

Looking at POP 11 (depth 197-200 meter), only a small amount of POP reaches this depth.

Critique of the method – what was useful, what was not?

Overall, exercise 3 made me realize that the way we process the data affects the result. In exercise 2 I tended to oversimplify the processing, which made the data difficult to interpret, and the results were not what I expected.

In exercise 3 I learned a lot about the autocorrelation and time-series functions in R. In my opinion, acf is most useful for stable data like temperature, where the repeating pattern keeps roughly the same values. The significance of the ACF depends on those values, so in the temperature ACF chart (figure 3) most lags show a significant correlation over the time period (both positive and negative), which does not happen for the POP and PP data.

POP and PP do have a pattern, but their values vary over the time period; so although the pattern is visible in the time-series and ACF charts, significance only appears for the first months of 2013. In my opinion, for assessing the significance of a cycle in unstable data like POP and PP, where values differ widely between cycles, the ACF is not really suitable. However, if we just want to see the pattern, we can rely on the time-series and ACF plots.



Continuing to understand the spatial and temporal correlation between census variables and their relation to gentrification

For this exercise I was interested in continuing to test out different variable combinations within the Geographically Weighted Regression and further exploring the meaning of the relationships between these variables. This framed exercise three with the following research questions:

  • What are the individual relationships between change in race, education, and income across both decades?
  • How does each change variable relate to itself across the two decades?
  • Do these results indicate where gentrification might be occurring?
  • How do these results compare to where others have identified gentrification and how they have calculated it?

Geographically Weighted Regression

I continued using the GWR tool to make more maps of how the variables are correlated. I expected some similarity in the patterns between variables, and I hypothesized that they would be in the northern central part of the city. When I compare each decade’s variables to the same variable in the second decade, I expect some correlation in areas that have steady change, but lower correlation in areas that change rapidly in one decade and not the next.

In order to do this, I used the GWR tool in ArcGIS and used the number of neighbors as the neighborhood type. I ran the tool with change in race from 1990-2000 as the dependent variable and change in education from the same decade as the explanatory variable. I repeated this with race and income, and then all of it with the second decade. I then ran it with education as the dependent variable and income as the explanatory variable for each decade.
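The GWR tool in ArcGIS is largely a black box, but conceptually it fits a weighted least-squares regression at each location, down-weighting distant observations with a kernel. A toy sketch of that idea (all names and data here are hypothetical, not the actual census variables or the ArcGIS implementation):

```python
import numpy as np

def gwr_coefficients(coords, x, y, bandwidth):
    """Local (intercept, slope) at each location via a Gaussian distance kernel —
    a toy version of the idea behind a GWR tool."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), x])       # design matrix with intercept
    betas = []
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-(d / bandwidth) ** 2)       # Gaussian kernel weights
        XtW = Xd.T * w                          # X^T W without building a diagonal matrix
        betas.append(np.linalg.solve(XtW @ Xd, XtW @ y))
    return np.array(betas)

# Hypothetical tracts: change in education (x) explains change in race (y),
# with the slope growing smoothly from west to east
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(50, 2))
x = rng.normal(size=50)
true_slope = 0.5 + 0.1 * coords[:, 0]
y = true_slope * x + rng.normal(scale=0.05, size=50)
b = gwr_coefficients(coords, x, y, bandwidth=3.0)
print(b.shape)  # one (intercept, slope) pair per tract
```

The recovered local slopes vary across space, which is exactly the kind of coefficient surface the ArcGIS tool maps.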

Comparison to Other Gentrification Maps

I have begun to look at other maps of gentrification in Portland, OR to help make sense of the information I am getting out of the GWR tool, and to compare results and methods. The Bureau of Planning and Sustainability for the City of Portland has produced maps of gentrification, also by census tract throughout the city. They created a taxonomy of gentrification ranging from susceptible to late gentrification (and even continued loss after gentrification). They also published maps of the variables that went into their maps, which consisted of percentages of populations that were: renters, of color, without a bachelor’s degree, and households below 80% of the median family income. These were all combined into a vulnerability map. The changes in renters, race, education, and income were combined to create a map of demographic change. The 1990-2010 changes in median home value and appreciation were combined to create a housing market conditions map. All of these were combined to create the final gentrification map.


The results of the GWR tool produced maps comparing two variables for one decade. As mentioned above, I ran the tool for all variable combinations for both decades. These results show that each combination of variables changes together in a different part of the city. Change in race and education occurs in the northwest part of the city in the first decade and the north central/west part in the second. Race and income change in the eastern part of the city in the first decade, and in some eastern and some northwestern areas in the second. Education and income change in the southern part of the city in the first decade and the southeast in the second. It is interesting that all of the combinations are slightly different but share some patterns.

The regression coefficients from the model runs comparing each variable with itself across the two decades show interesting patterns too. The changes in race were relatively similar across the city in both decades, with the highest coefficients in the southern part of the city. Changes in education were similar in both decades in the eastern part of the city, and changes in income were similar in both decades in the north/central part of the city.

Figure 1: This figure shows the coefficients from the GWR between two variables. a) Coefficients of change in race and education. b) Coefficients of change in race and income. c) Coefficients of change in education and income.

When compared with the Bureau of Planning and Sustainability gentrification maps, there are some patterns of similarity. They find that gentrification is currently happening in the northwest part of the city, the central part is in a late stage of gentrification, the south has some early-stage gentrification, and the eastern side is susceptible.

Figure 2: Bureau of Planning and Sustainability for the City of Portland: Gentrification Neighborhood Typology map

The correlation between the changes in race and education seem to have similar patterns to current gentrification and areas that are susceptible. Change in race and income seem to follow a similar pattern to susceptible areas. Change in education and income seem to have the highest correlation in areas that are susceptible or are in early stages of gentrification.

I think the biggest difference in their calculations is that they used the opposite side of each variable. For race they used the percentage of the population in communities of color, whereas I used the percentage that is white. They used the percentage without a bachelor’s degree, whereas I used the percentage with one. They also used the percentage of the population who are renters. These differences tell a slightly different story, since there are multiple patterns of change in the city of Portland.


I think one limitation of the GWR tool is that if you include multiple explanatory variables, you cannot get a single coefficient for how they all change together; you get a separate coefficient for each explanatory variable. Granted, I’m not sure how this would be possible, but it would be useful to see whether the three variables share any patterns, and this makes it difficult to compare three variables at the same time.

Exercise 3: so far…


# Lewis et al. 2007, D’eon and Delparte 2005. data screening based on fix type and PDOP (DOP)

# calculate data retention for: (Blair 2019)

# 1. Converting all 2D locations to “missed fix”

# 2. Converting all 2D locations AND 3D with DOP>5 to “missed fix”

# 3. Converting all 2D locations AND 3D with DOP>7 to “missed fix”

# 4. Converting all 2D with DOP >5 AND 3D with DOP >10 to “missed fix”

To account for fixes that are likely inaccurate (not enough satellites), there are various ways to screen the data. I tried 4 different methods to see which method had the highest data retention. Fixes that are considered inaccurate past a certain threshold are flagged as a “missed fix” even if the collar obtained a location.
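The four screening rules can be expressed as simple filters over the fix records; a sketch with toy data (the fix-type/DOP tuples and retention values here are hypothetical, not the actual collar data):

```python
# Toy fix records as (fix_type, dop) pairs; field names are assumptions
fixes = [("3D", 2.0), ("2D", 4.0), ("3D", 6.0), ("2D", 8.0), ("3D", 9.5), ("3D", 12.0)]

# Each rule returns True for fixes that are KEPT (everything else becomes a "missed fix")
rules = {
    "method_1": lambda t, d: t == "3D",                                    # drop all 2D
    "method_2": lambda t, d: t == "3D" and d <= 5,                         # drop 2D and 3D with DOP > 5
    "method_3": lambda t, d: t == "3D" and d <= 7,                         # drop 2D and 3D with DOP > 7
    "method_4": lambda t, d: (t == "2D" and d <= 5) or (t == "3D" and d <= 10),
}

def retention(rule):
    """Proportion of attempted fixes kept after flagging missed fixes."""
    kept = [f for f in fixes if rule(*f)]
    return len(kept) / len(fixes)

for name, rule in rules.items():
    print(name, round(retention(rule), 2))
```

Computing retention this way for each rule, then comparing the proportions, mirrors the comparison summarized in the table below.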

No Screening | Method 1 | Method 2 | Method 3 | Method 4
Comparison of data retention given 4 different screening methods, versus no screening method. The original proportion of successful fixes out of the total number of attempted fixes was 0.91, and the next highest was method 4 (Removing all 2D with DOP >5 AND 3D with DOP >10).

Option 4 (removing all 2D fixes with DOP > 5 AND 3D fixes with DOP > 10) had the highest data retention after removing suspicious locations. Method 1 (removing all 2D locations) was a close second.

The next step is to run the models for FSR as a function of environmental attributes again with the new screened data (derived from Method 4).


First attempt at moving window analysis used very large window sizes.

The effect of sky availability at various spatial scales on FSR. All models were competitive.

Decided to try again with smaller window sizes.

05/22/2020 finished redoing window sizes, next need to extract values from test sites and re-run the regression analysis.


Context: I calculated the difference between each test site location (obtained via handheld GPS unit) and collar locations. Common data screening practices utilize cutoff values based off of fix type (2D vs 3D fixes) and Dilution of Precision (DOP), which are both recorded with the coordinates obtained by the GPS collar. Locations predicted to be more accurate (less location error) are normally 3D, and have low DOP values.

A common practice is to remove all 2D locations, or remove all locations with DOP>5.

The graphs below are meant to help me choose a cutoff value for the stationary collar test data. The cutoff value chosen here will also be applied to the deer GPS data. The graphs result from a series of subsets of the data based on DOP values and fix type, to help build a clearer picture of the data.

There is obviously one outlier 3D location at about 4000m away from the test site location.
To get a better look at the data, I removed all locations with DOP>10.
All 2D locations and their associated distances to the actual true coordinates of the test site.
All 3D locations and their associated distances to the actual true coordinates of the test site.

I was hoping to see a clearer “threshold” pattern showing that above a certain DOP value, distances dramatically increase. As the “data cloud” in the first scatterplot shows, however, there isn’t much of a pattern. However…

Keeping all data only within 30m to match raster data resolution results in 72% data retention (loss of 28% of the data), and doesn’t include any 2D fixes.

If we look at only the locations within 30 meters to match raster data resolution, we see that this data includes DOP as high as 18.5 and does not include any 2D data. The fact that these more accurate locations also include such high DOP values is unexpected. I’m not sure what the best cutoff value is based off of these plots.

Ex 3 Accuracy Assessment of Woody Debris Classifier

  1. For exercise 3, I am interested in the relationship between accuracy of wood detection using a random forest classifier and distance from training samples used to train the classifier. Specifically, do accuracy metrics (i.e. accuracy, kappa, Type I and Type II error) change with distance to training area? To train the classifier, I selected a 3.5-acre area near the center of my area of interest. I selected this area because I felt it had a variety of features—islands, wood jams, shadows, woody debris, rocks, etc.—representative of features found throughout the 150 acres of the site. I hypothesize that kappa will decrease in sample plots that are further from the training area.
  2. I have not yet executed the approach I intend to use because I am still wrangling data and creating confusion matrices for the sample plots (I am attempting to batch-clip plots in ArcGIS Pro and merge rasters into a 2-band stack with reference data in one band and predictions in the other). Ultimately, I intend to use an approach similar to exercise 2: simple linear regression in RStudio and geographically weighted regression in ArcGIS Pro.
  3. I will calculate distance from sample plot centers to training area using the near tool in ArcGIS Pro. This distance will be the explanatory variable in a linear model, and kappa will be the dependent variable. I would like to append this data in tabular format as illustrated in Table 1 below.

Table 1

Plot ID | Kappa | Accuracy | Distance to Train Area

I will calculate kappa and distance to training area (ideally for 72 sampled plots). Then, I will use these variables in a simple linear regression in RStudio and a Geographically Weighted Regression in ArcGIS Pro.
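Once the per-plot confusion matrices exist, kappa is straightforward to compute. A minimal sketch for the binary wood / not-wood case (toy counts, not actual plot data):

```python
def cohens_kappa(tp, fp, fn, tn):
    """Cohen's kappa from a binary (wood / not-wood) confusion matrix."""
    n = tp + fp + fn + tn
    po = (tp + tn) / n                              # observed agreement
    p_wood = ((tp + fn) / n) * ((tp + fp) / n)      # chance agreement on wood
    p_not = ((fp + tn) / n) * ((fn + tn) / n)       # chance agreement on not-wood
    pe = p_wood + p_not                             # total chance agreement
    return (po - pe) / (1 - pe)

# Toy plot: 40 wood pixels hit, 10 missed, 5 false alarms, 45 correct background
print(cohens_kappa(40, 5, 10, 45))  # → 0.7
```

Pairing each plot's kappa with its distance to the training area yields the table structure above, ready for the regressions.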

4. I anticipate the results of this analysis will indicate a decrease in kappa as plots are located further from the training area. I hope to see this indicated by a low p-value in the SLR. The GWR may suggest there are areas of the site where the classifier performed poorly, likely due to lighting conditions (either shadows or very high reflectance on woody debris due to sun angle). If there is time, I would like to assess the relationship between kappa and wood area, because I currently cite Landis & Koch (1977) to assess kappa values. That paper uses a somewhat arbitrary scale for judging classifier accuracy based on kappa, so it would be interesting to see the effect kappa has on wood area estimates.

5. I will critique the method when I have results. My early impressions are this methodology is a little tedious due to number of plots. I am having a lab assistant manually delineate the wood polygons in plots that were also sampled on the ground. These plots will then be rasterized and clipped. I also need to clip the classified raster by these plots (there is a batch clip tool available in arc). From here, I will write a loop in R that accepts a path, locates tiffs in the folder, and calculates confusion matrix metrics for each plot.


Landis, J. R. and Koch, G. G. 1977. The Measurement of Observer Agreement for Categorical Data. Biometrics, 33(1): 159. doi: 10.2307/2529310

Ex. 3 – Spatial Model Specification

1. Question that you asked

For this week’s exercise I was interested in refining my spatial model specifications based on my findings of previous weeks. I was interested in asking the question: 

Can I determine a spatial linear model to describe the relationship between county out migration flows and flood claim payments and claim rates?

Significance in context: Modeling the relationship between county migration flows and flood claims allows for discussions of the implications of flood insurance on population mobility and how natural hazards interact with the US population landscape in general. 

2. Name of the tool or approach that you used.

In order to test the fit of two model frameworks (the spatial lag and the spatial error model), I used the Lagrange Multiplier Test, which tests for spatial dependence in the model fit under both typical and robust forms. I then fit the model variables to the preferred model form using the “spatialreg” package in R and mapped the residuals to check for spatial autocorrelation.

3. Brief description of steps you followed to complete the analysis.

I started by subsetting my data to only counties in Gulf Coast states (Texas, Louisiana, Mississippi, Alabama, and Florida). I then created two versions of this subset: one for a high flood-instance year (2005) and one for a low flood-instance year (2003).

Next, I created an Ordinary Least Squares model where log base 10 county out-flow is a function of log base 10 total amount paid and claim rate per county (log(out_flow) ~ log(total_amount_paid) + claim_rate). Following this, I ran a Lagrange Multiplier Test and used the outputs to determine the preferred alternative model based on spatial dependence. I applied the following framework created by Luc Anselin (2005) to determine the preferred alternative model.

I then proceeded in running the preferred model, which, in my case was the spatial error model. The model structure is composed of the same variables described in the OLS version. I then map the residuals by county of the error model and the residuals of the original OLS model for comparison.
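Mapping residuals is a visual check for leftover spatial dependence; the standard numeric check is Moran's I of the residuals. A minimal numpy sketch of that statistic (the 4-county chain weights matrix and residual values are hypothetical; the actual analysis used the spatialreg package in R):

```python
import numpy as np

def morans_i(values, W):
    """Moran's I of a variable (e.g. model residuals) under weights matrix W."""
    z = values - values.mean()
    n = len(values)
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

# Toy weights: 4 counties in a line (0-1-2-3), row-standardized
W = np.array([[0, 1, 0, 0],
              [.5, 0, .5, 0],
              [0, .5, 0, .5],
              [0, 0, 1, 0]])
residuals = np.array([1.0, 1.0, -1.0, -1.0])   # hypothetical clustered residuals
print(morans_i(residuals, W))   # positive: neighboring counties share similar residuals
```

A spatial error model that fits well should push this statistic toward zero relative to the OLS residuals, which is the comparison the residual maps illustrate.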

4. Brief description of results you obtained.

The spatial error model results can be seen below with mapped residuals alongside for 2005 and 2003. 

The OLS mapped residuals can be seen below for comparison for 2005 only. 

These results indicate a statistically significant, positive relationship between insurance payment amounts and county out-flows: as payment amounts increase, we expect more people to leave. Interestingly, there was a notable difference between 2005 and 2003 in the coefficient estimate associated with claim rate, which I would attribute to the relatively lower claim rate in 2003 compared with 2005. Next, I will be interested to see whether this negative (not statistically significant) relationship remains when I expand the model and test all years. Notably, the spatial error model greatly reduces the spatial dependence in the residuals, which can be seen by comparing the OLS and error-model residual maps.

5. Critique of the method – what was useful, what was not?

This method was very useful in helping me find a model specification that more precisely fits the data without spatial dependence. I will have to determine if the same model structure is viable at a larger scale spatially and temporally next. 

The only drawback of this approach is the extensive background research required to understand the data and the diagnostic test results well enough to proceed and communicate the interpretations.

Velocity Along Ice Flows vs. Rate of Terminus Retreat Among Greenland Outlet Glaciers (Pt. III)


I initiated exercise 3 expecting to glean useful information from GWR and global Moran’s I experiments, but preliminary exploration of these tools indicated that they wouldn’t be particularly useful given how my data is currently set up. As a result, I decided to attempt my own version of a network analysis to better characterize ice velocity around the borders of Greenland. In exercise 2, I simply pulled velocity data from right by the glacier termini, which is not necessarily the be-all and end-all of glacier velocity. My time is better spent developing a more robust velocity dataset and returning to more conventional techniques for interpreting statistical relationships. Furthermore, I went to great lengths to improve data quality (fewer glaciers with gaps in their records) in hopes that this would manifest in clearer statistical relationships.


1)How does velocity along entire Greenland ICE FLOWS (or ice streams) correlate to terminus retreat rates for their associated outlet glaciers?

In simpler terms: Do glaciers that are fed by larger/ faster ice flows display a greater likelihood of rapid retreat?

2) Does the relative annual change in ice flow velocity (variability in %) correlate to relative annual change in glacier retreat (variability in %)?

In simpler terms: if ice velocity in a location jumps from one year to the next, do we see a greater likelihood of terminus retreat increasing as well?


Given time constraints for the class, and my particular questions, I had to use some intuition to rule out various paths forward. Exploratory statistics on my velocity and retreat datasets suggest that there is really no relevant geospatial clustering (sure, high velocity clusters around valleys, but this is where all the outlet glaciers occur anyway, so that is relatively obvious). It seems clear that I need to represent velocity data on a wider spatial scale in order to accurately capture the relationship between ice velocity and terminus retreat. To do this, I attempted a rudimentary “network analysis” to incorporate velocity data along entire flow lines of ice. This seems far more valuable than fiddling with GWR and correlogram outputs on relatively messy datasets.

APPROACH: “Characterizing Ice Stream Velocity”:

ArcGIS Pro Tools used:

Drawing polylines, Generate Points along Lines, Extract Multivalues to Points, Summary Statistics, Join Field, Buffer, Intersect, Table to Excel

Excel Tools used:

Time series, R- squared simple regressions, normalization, first derivatives


1.DRAWING FLOW LINES: A feature class was created and populated with flow lines. Flow lines were drawn with the feature editor, using visual cues from the velocity datasets. Care was taken to ensure flow lines spanned the entire study period (now 2008-2016) so as not to introduce gaps in the dataset. Glaciers were excluded if they did not appear to be associated with ice flows, fell within a velocity satellite “dead zone”, or lacked terminus data for most years. These limitations shrank the dataset from 240 glaciers to 113.

Figure 1: Example of flow lines drawn in ice streams.
Figure 2: Complete set of drawn flow lines. Green lines represent digitized ice streams, and red (lines) represent available terminus data. This figure just represents the “trimming” of data from 240 glaciers to 113 glaciers.

2)GENERATE POINTS ALONG LINES tool was used to split each flow line into 20 parts (5% chunks). Each point could then be used to extract underlying velocity data from annual velocity rasters.

Figure 3: The GENERATE POINTS ALONG LINES TOOL was used to generate 20 points along each digitized flow line.
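A rough stand-in for the Generate Points Along Lines step, sketched in plain Python (the real flow lines are ArcGIS polylines; this toy two-segment line is hypothetical):

```python
import math

def points_along_line(vertices, n_points):
    """Place n_points evenly spaced points along a polyline — a rough stand-in
    for ArcGIS's Generate Points Along Lines tool."""
    seg_lens = [math.dist(a, b) for a, b in zip(vertices, vertices[1:])]
    total = sum(seg_lens)
    points = []
    for i in range(n_points):
        d = total * i / (n_points - 1)            # target distance along the line
        for (a, b), L in zip(zip(vertices, vertices[1:]), seg_lens):
            # walk segments until the target distance falls inside one
            if d <= L or (a, b) == (vertices[-2], vertices[-1]):
                t = d / L
                points.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
                break
            d -= L
    return points

# A hypothetical flow line with two segments (lengths 3 and 4), split into 5 points
line = [(0, 0), (3, 0), (3, 4)]
print(points_along_line(line, 5))
```

With 20 points instead of 5, each point marks a 5% chunk of the flow line, ready for velocity extraction.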

3)MULTIVALUES TO POINTS tool was used to extract velocity data from velocity rasters spanning the study period. This data is saved in the point attribute table.

4)SUMMARY STATISTICS tool was used to summarize these velocities. For each glacier, the MEAN and the SUM of velocity along flow line points was recorded.

5)JOIN FIELD tool was used to apply these new MEAN and SUM variables to the flow line attribute table.

6)BUFFER tool was used to expand previous glacier point data representing glacier ID, as well as previously calculated terminus retreat data over the study period.

7)INTERSECT tool was used to merge the flow line dataset (containing new velocity data) with the glacier point dataset (containing old retreat data). This was to consolidate data into one attribute table.

8)TABLE TO EXCEL tool was used to export the attribute table for use in other programs.


An exhaustive array of graphs and statistical relationships were generated in order to try to understand how ice velocity and terminus retreat interact, but I will share only the most insightful in this blog post. For the most part, the results are very similar to those from exercise 2, which isolated velocity data to point sources right near termini.

Figure 4: Absolute terminus retreat (m) plotted with absolute ice velocity along flow lines (mean) averaged over all of Greenland.

When averaged across the entire island, there appears to be a relatively strong correlation between average ice-flow velocity and average terminus retreat, indicating that the two variables are connected on an annual timescale. However, with only 5 discontinuous years to plot, this conclusion would benefit from a larger pool of sample years. With an R-squared of 0.63, we can say, with large uncertainty, that on average, years with faster glacier velocity in ice streams also experience somewhat faster retreat at the associated glacier termini. This is a very similar conclusion (with a very similar R-squared) to the one reached in exercise 2 using point-sourced velocity data. It is worth noting that from 2008 to the 2016-2017 season there is no increasing trend in terminus retreat or in velocity along flow lines, which does not follow local temperature trends. This suggests a response-time lag longer than the study period (not at all surprising).

Figure 5: The CHANGE in distance retreated plotted with the CHANGE in velocity between sample periods, averaged over all of Greenland.

In addition to plotting the absolute magnitudes of these two variables, I wanted to plot the temporal change. This was a bit unorthodox given the discontinuous nature of the sample years, but it still provides valuable information. Again averaged over the entire island, we see another trend: glaciers that experience an increase in velocity from t1 to t2 are more likely to increase their rate of terminus retreat from t1 to t2. These variables can be conceptualized as the first derivatives of the records above, focusing on the rate of change rather than the magnitude of change itself.
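These "first derivative" variables are just differences between consecutive sample periods; a sketch with hypothetical values (the real sample years are discontinuous, as noted above):

```python
def period_changes(years, values):
    """Change in a glacier record between consecutive sample periods
    (sample years need not be contiguous)."""
    return [(f"{y1}-{y2}", v2 - v1)
            for (y1, v1), (y2, v2) in zip(zip(years, values), zip(years[1:], values[1:]))]

# Hypothetical island-averaged retreat record over discontinuous sample years
years = [2008, 2014, 2015, 2016]
retreat = [120.0, 150.0, 145.0, 160.0]
print(period_changes(years, retreat))
```

Note that a "change" spanning 2008-2014 mixes six years into one difference, which is part of why comparisons across uneven periods must be interpreted carefully.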

Figure 6: There is no clear correlation between the change in ice velocity and the change in retreat rate, in any sample period with available data.

While annual data averaged over the entire island showed clear trends, much as it did last time, individual glaciers in any given sample year tell a different story. Not displayed here: there was no trend between absolute velocity along flow lines (MEAN or SUM) and absolute distance retreated in any given year. As you can see above, there is also virtually no trend between their first derivatives. Glaciers with bigger jumps in velocity from t1 to t2 do not necessarily experience greater jumps in rate of retreat from t1 to t2. This goes against my initial hypothesis from the start of the project (that faster-moving glaciers also retreat faster).

Figure 7: Normalizing the changes in velocity and in retreat rate does not improve the statistical relationship.

Normalizing these changes over time also does not reveal a visible trend. The figure above is merely an example, but all four time jumps show no clear trend even on a relative scale. These regression plots could theoretically tell us something about variability in a given year if the sample periods were better represented (consistent time jumps). For example, glacier velocity seems to have been much more variable over 2008-2014 than over 2014-2016. However, this is not a fair comparison, because the first two jumps span several years while the second two are actual year-to-year changes. (It is also not fair to divide the first two by the length of their period, because that assumes the changes were incremental and constant, which they most certainly were not.)


The problem with my approach is that I do not have the necessary data to draw a firm conclusion. There are far too many temporal gaps in the dataset, which also spans too short a period to observe trends in glaciers, which have notoriously slow response times. I did my best to eliminate glaciers that lacked the necessary data (whittling down from 240 glaciers to 113), but I cannot generate data for years that were simply not provided. As a result, my conclusions can only be as comprehensive as the data allows.

As Julia brought up in my previous draft, my hypothesis also applies to a specific subset of glaciers: those with wet basal conditions. It is reasonable to presume that, moving forward, more and more outlet glaciers will develop these conditions, but once again, glaciers are slow to respond to climate changes. It is also quite hard to find a dataset identifying which of these glaciers are wet-based vs. dry-based: we know for a fact that certain glaciers are or are not wet-based, but glaciers where these conditions are well constrained are few and far between. One idea for a future project would be to limit this analysis to glaciers known to have wet basal conditions; as of right now, it seems those are the only glaciers over which velocity might have predictive power to begin with.

Finally, the hypothesis may simply be wrong. I was operating on the assumption that vulnerable glaciers tend to warm and develop wet beds, which cause them to flow faster. These same glaciers would hypothetically also be more vulnerable to terminus retreat because of an assumed negative mass balance. An alternative mechanism, as Julia pointed out, is that glaciers that FLOW faster and are indeed melting might not necessarily exhibit terminus retreat, because the lost “real estate” at the terminus would be quickly replenished by new ice from further inland. That too is a very plausible mechanism.

Moving forward, I think the only thing I can still extract from this dataset without starting an entirely new project is regional trends. I cannot distinguish glaciers by their basal conditions with the available data, but I CAN distinguish them by location. Perhaps different regions of Greenland are experiencing different trends. The final step will be to divide the data produced above into 6 separate regional groups, indicated below (over GRACE mass-anomaly data), and recreate the graphs above to see whether any trends emerge in particular regions. It is reasonable to assume that some regions may have higher proportions of wet beds than others.

Figure 8: For the final week of class, I propose to separate data based on regions, to see if any clear trends emerge in particular geographic areas (overlaid on GRACE 2013 mass anomaly data, our best reference for Greenland mass balance).

Temporal Variability in Cultural and Natural Deposition

How is Time Measured?

              Up until this point, I have worked through 2-dimensional representations of the artifact assemblage from Cooper's Ferry, Idaho. My first blog post contained visualizations of the artifacts in 3 dimensions and 2 dimensions using kernel density. These graphics were used to identify intervals at this archaeological site that appear to be separated by extended periods of sediment deposition and/or time. Under the hypothesis that human behavior at the site differs between these "gaps" in time, I chose five unique intervals to examine. Within a 1-meter depth, I chose the intervals 0-25cm, 25-50cm, 50-60cm, 60-74cm, and 74-100cm. Geoff Bailey (2007), when discussing time perspectivism, states that archaeological assemblages may be best studied through multiple scopes in time and space. The idea represented here alludes to a much more formalized theory in fractals (Mandelbrot 1967; Plotnick 1986; Sadler 1999). In a general sense, a fractal is a pattern that is repeated at different scales. Mandelbrot (1967) discusses how fractals can be used to define the coastline of Britain at different spatial scales. A larger scale uses fewer line segments to define the coastline of Britain, whereas a smaller scale requires more segments because more detail in the coastline becomes identifiable as one "zooms in" on a map. My question for this blog post mirrors this discussion on fractals; the main difference in my study is that, instead of looking at coastlines, I wish to understand how the temporal pattern can be observed within the archaeological assemblage between and within each interval listed above. To explore this inquiry, I used a model developed by Maarten Blaauw and J. Andrés Christen that has been named after Francis Bacon (AD 1561-1626).

Question: How is sediment accumulation rate associated with artifact deposition through different temporal scales at the archaeological site in Cooper’s Ferry, Idaho and how can these rates of deposition imply natural and cultural transformations on archaeological deposits?


              For the purpose of thinking about how the artifacts in this assemblage are separated by time, I explored the concept of age-depth models, which I was directed to with the assistance of Erin Peck. Age-depth models, with the proper input, facilitate the understanding of depositional rates throughout a core of sediment. The primary input for the age-depth model I used is C14-dated material that was recovered from the archaeological context in question. I utilized the 'rbacon' package built for the R environment. Following the guidance and recommendations given by Blaauw et al. (2018), I input the carbon dates I had access to (fig. 1) and produced an age-depth model correlating to those known dates (fig. 2).
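To make the idea of an age-depth model concrete, here is a minimal Python sketch. The (age, depth) pairs are invented stand-ins for the 8 dates in fig. 1, and linear interpolation between dated depths is only a crude substitute for Bacon's Bayesian piecewise model:

```python
import numpy as np

# Hypothetical (age, depth) pairs standing in for the dated samples in fig. 1;
# the real values live in the chart and are not reproduced here.
depths_cm = np.array([10.0, 30.0, 55.0, 80.0])                # depth of dated material
ages_calBP = np.array([13500.0, 14200.0, 15100.0, 16000.0])   # calibrated ages

# Bacon fits a Bayesian model; a bare-bones stand-in is linear
# interpolation of age between the dated depths.
def age_at_depth(d):
    return np.interp(d, depths_cm, ages_calBP)

# Sediment accumulation rate (yr/cm) between consecutive dated depths:
rates = np.diff(ages_calBP) / np.diff(depths_cm)
print(age_at_depth(40.0))   # age midway between the 30 cm and 55 cm dates
print(rates)
```

The real model also carries uncertainty (the 95% envelope in fig. 2), which this sketch ignores entirely.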

[Chart columns: age (cal BP), error, depth (cm)]
Figure 1. This chart shows the 8 C14 dates that have been collected from the context of the current archaeological assemblage. Dates and associated errors are given in calibrated years (cal) Before Present (BP) at the depth (cm) at which the material was recovered.
Figure 2. Age-depth model produced from the 8 C14 dates. The top left image represents Monte Carlo simulations of the data. The top middle image shows a histogram of sediment accumulation at a rate of 50 yrs/cm. The top right image shows a similar histogram of 'memory' which, from my understanding, reflects how strongly the accumulation rate at one depth carries over to the next (the prior's autocorrelation). The central image shows the predicted age for each depth. The shaded area represents a 95% confidence interval; blue figures represent the predicted ages of the dated material.

How to Cook the Bacon

              There are a large number of components associated with creating an age-depth model within the 'rbacon' package. I have explored multiple approaches with certain settings set to 'on' and others set to 'off'. One can simply add the dates recovered from the context in question and run the program, but there is a lot of information to consider implementing to produce stronger, more accurate age-depth models. The main algorithm behind this model production (fig. 2) is Bayesian, taking prior knowledge and information into account to improve inferential statistical analyses (Blaauw et al. 2018). One of the most important inclusions to consider is known or hypothesized hiatuses. [I am sure you were waiting until fractals came back into the conversation] Hiatuses have been shown by Sadler (1999) and Plotnick (1986) to arise from a process of temporary deposition followed by an erosional event. A hiatus can be thought of as a period of no sedimentation, but it can also lead to a net loss in sediment via erosional events. When these patterns appear in the sediment record at small time frames, they are understood as a net-gain/net-loss system. When the time frame is broadened, the pattern identified in the record appears to represent a combined net-gain and net-loss sedimentation rate of zero. When this model is plotted with time on the x-axis and cumulative deposition on the y-axis, the figure resembles stairs. Thus, the horizontal segments of the stairs represent periods of no net deposition within a broad scale of time, and hence a hiatus is brought into the picture [and fractals are born]. In the archaeological sciences, Schiffer (1987) hypothesizes that these types of events can displace and reposition cultural materials through both natural and cultural transformation of the area.
Therefore, it is important to include as many known or hypothesized dates as possible for hiatuses or other surface-altering events (e.g., volcanic eruptions, landslides). Unfortunately, there is no known or hypothesized hiatus within the context of this archaeological assemblage, because identifying one requires a more in-depth look at the geological processes in the region during this time; something that might be great for future work and interpretation.
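The scale dependence described above can be sketched numerically. This toy Python example (all values invented) alternates deposition and erosion: fine-scale rates swing between +1 and -1 cm per step, while each full cycle averages to zero, the flat "tread" of the staircase:

```python
import numpy as np

# Toy depositional history: 10 steps of +1 cm deposition followed by
# 10 steps of -1 cm erosion, repeated 5 times (100 time steps total).
increments = np.tile(np.r_[np.ones(10), -np.ones(10)], 5)
depth = np.cumsum(increments)   # the "staircase" when plotted against time

# Fine scale: the rate over short windows swings between +1 and -1 cm/step.
fine = (float(increments[:5].mean()), float(increments[10:15].mean()))

# Broad scale: the rate over each full 20-step cycle nets to zero --
# what a coarse record would read as a hiatus.
broad = increments.reshape(5, 20).mean(axis=1)
print(fine)    # (1.0, -1.0)
print(broad)   # [0. 0. 0. 0. 0.]
```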

              Thinking about hiatuses in terms of sedimentation and cultural activity is a separate, yet intriguing, concept to consider in this instance. A hiatus in sedimentation can often represent relative stability in the landform, allowing a pedostratigraphic horizon to begin forming. Pedostratigraphic layers, from a geological view, are horizons within sediment that allow soils and organic materials to form. [People really like soil and do not always enjoy rocky or sandy surfaces] The presence of pedostratigraphic soils often represents ancient surfaces that have been buried by the sediment of the past 6 million days. These ancient surfaces have a high probability of containing cultural materials because soils seem to attract human activities. To summarize: when we have a hiatus in sedimentation, we have stability in the landform, so we get soil formation, and hence a lot of human activity. [In a simple format, sediment hiatus = lots of people] On the other hand, a hiatus in human activity can sometimes, though not always, represent instability in the landform and thus rapid sedimentation. [From this, high sedimentation rate = minimal rates of human activity] Now, can dense human activity/occupation indicate a hiatus in sedimentation, and can rapid sedimentation predict a hiatus in human activity? The results from the age-depth model will be discussed in the next section, and I will briefly explain how the scope and time frame of investigation [spoiler alert: the model does not seem viable to answer my question] needs to be much tighter than the current results allow to determine how sedimentation at this site affected human activity.

What does the Bacon look like When it is Done?

              As you can see in figure 2, there is a fairly straightforward linear relationship between the dates I provided and the predicted depth per time interval. As previously mentioned, the inclusion of many more dates would further refine the sedimentation rate through time (Blaauw et al. 2018). Unfortunately, the process of collecting further dates may be beyond the scope [pun intended] of my thesis project. To answer my question about temporal variation in human behavior, I would like dates that are much closer together and can identify patterns of sedimentation by century, at the very least, since human movement and activity vary so sporadically through time. One aspect that may help connect human activities with age-depth models derives from the question I posed in the previous section: treating dense human occupational evidence as indicating a hiatus in sedimentation. Regardless, this perspective may prove to be beyond the goals of my current research endeavors.

              So, what does this current model (fig. 2) show? Within the parameters of the main age-depth function 'Bacon', I assigned the minimum depth as 0 cm and the maximum depth as 90 cm. Furthermore, the minimum date was set to 13,000 cal BP and the maximum to 16,500 cal BP. This was done because any date beyond these limits lies outside the archaeological context containing the current artifacts and dates. In the model, the light blue shapes indicate the statistically estimated dates that came from the 8 C14 samples (fig. 1). The shaded region around the red dashed line represents the confidence interval. It is clear that the more dates the model has, the tighter the confidence interval will be; this is evident from the expansion of the confidence interval from 60-80 cm. Ultimately, with an accumulation rate of 50 yrs/cm, there appears to be a slight increase in deposition starting at a 25 cm depth and a similar decrease in sediment deposition around the 55 cm depth. Of course, this only scratches the surface of the capabilities of the 'rbacon' package. From here, one can identify deposition rate by year or by depth by producing graphics representing changes in depositional patterns (fig. 3). Figure 3 can be interpreted as the sediment accumulation rate per depth in centimeters and per age in cal BP, respectively.

              Ultimately, I am unsure how I will integrate this analysis into interpretations of human behavior and occupancy at this archaeological site because, as I have noted previously, small-scale studies require a large number of known dates for specific events or materials. That being said, the theory and understanding of how sediments may have accumulated through time at this site are very important to consider when striving to make archaeological inferences and arguments. In the near future, I will begin to examine the temporal scope of artifact placement with sedimentation in mind. For this analysis, I will assume vertical displacement represents temporal displacement and analyze the rate of artifact deposition by depth to determine aggregates through time. This will allow me to examine observed temporal displacement of artifacts, which differs from blog posts 1 and 2 by adding a third dimension: time.

Figure 3. The chart on the left represents accumulation rates for sediment by depth (cm). The dashed line shows the confidence interval. The chart on the right represents a similar pattern in accumulation rates by date (cal BP).


Bailey, Geoff. "Time perspectives, palimpsests and the archaeology of time." Journal of Anthropological Archaeology 26, no. 2 (2007): 198-223.

Blaauw, Maarten, J. Andrés Christen, K. D. Bennett, and Paula J. Reimer. "Double the dates and go for Bayes—Impacts of model choice, dating density and quality on chronologies." Quaternary Science Reviews 188 (2018): 58-66.

Mandelbrot, Benoit. "How long is the coast of Britain? Statistical self-similarity and fractional dimension." Science 156, no. 3775 (1967): 636-638.

Plotnick, Roy E. "A fractal model for the distribution of stratigraphic hiatuses." The Journal of Geology 94, no. 6 (1986): 885-890.

Sadler, P. M. "The influence of hiatuses on sediment accumulation rates." In GeoResearch Forum, vol. 5, no. 1, pp. 15-40. 1999.

Schiffer, Michael B. Formation Processes of the Archaeological Record. University of New Mexico Press, 1987.

A Century of Oregon Salt Marsh Expansion and Contraction (Part III Draft)


As a reminder, my ultimate research goal is to quantitatively characterize salt marsh volumetric change from 1939 to the present within Oregon estuaries using a combination of sediment cores, for which I have excess 210Pb-derived sediment accumulation rates, and historical aerial photographs (HAPs), from which I plan to measure horizontal expansion and retreat. For my Exercise 1, I successfully georeferenced and digitized HAPs from Alsea Bay that are roughly decadal, spanning 1939 to 1991. The next major step was analyzing my data using the USGS Digital Shoreline Analysis System (DSAS; Himmelstoss et al. 2018). However, prior to analyzing my data in the DSAS framework, I had to first tie up a number of loose ends, which I did for my Exercise 2; these included determining an uncertainty associated with each shoreline layer, attempting to account for differences in shoreline resolutions related to different spectral resolutions of each aerial photograph, and determining modern shorelines (2001, 2009, and 2018). My next tasks included answering the following questions:


  1. What final edits must be made to successfully run DSAS on my Alsea Bay shoreline layer?
  2. What are the net rates of change in Alsea Bay salt marshes? (sub-question: from those that DSAS calculates, what are the most appropriate statistics to characterize rates of change?)  
  3. How does the Alsea Bay salt marsh shoreline change over time?


Step 1: Reduce unnecessary complexity in the shoreline layer

First, I removed all upslope edges on the fringing marsh, since these are not of interest and were all digitized using the same PMEP boundaries. I then removed all unnecessary complexity within the salt marsh complexes, especially within the northern portion of the larger island and the southern portion of the fringing marsh.

Step 2: Determine best DSAS settings using trial and error

Next, I began to play around with the DSAS settings to produce the best results (assessed visually). These included:

  1. Adjusting the transect search distance to remove transects bridging the gap between the largest and smallest islands. This may require running each marsh complex separately (which I would rather not do, so that all transect symbology is on the same scale). 
  2. Playing around with transect smoothing to reduce the number of intersecting transects in areas with complex shoreline.  
Figure 1. Edited long-term rates of salt marsh change for Alsea Bay (LRR = linear regression rate).

Step 3: Calculate DSAS statistics

To calculate DSAS statistics, I selected the Calculate Rates tool, selected “select all” statistics, applied an intersection threshold of a minimum of 6 shorelines, and changed the confidence interval to 95%. DSAS reports the following statistics:

  1. Net Shoreline Movement (NSM): NSM is the distance between the oldest and youngest shorelines.
  2. Shoreline Change Envelope (SCE): SCE is the greatest distance among all the shorelines that intersect a given transect; these values are therefore always positive.
  3. End Point Rate (EPR): EPR is the NSM value divided by the time elapsed between the oldest and youngest shorelines. Even if the shoreline contracts and then expands, so that the maximum distance (SCE) is not the distance between the oldest and youngest layers, EPR and NSM are still calculated from the oldest and youngest layers.
  4. Linear Regression Rate (LRR): LRR is calculated by fitting a least squares regression line to all shoreline points for a transect. All transect data are used in LRR regardless of changes in shoreline trend or accuracy; this method is susceptible to outlier effects and tends to underestimate the rate of change relative to EPR. DSAS additionally provides the standard error of the estimate (LSE), the standard error of the slope at the selected confidence interval (LCI), and the R-squared value (LR2) for each LRR value.    
  5. Weighted Linear Regression (WLR): WLR is very similar to LRR except that it gives greater weight to transect data points with smaller positional uncertainty. DSAS additionally provides the standard error of the estimate (WSE), the standard error of the slope at the selected confidence interval (WCI), and the R-squared value (WR2) for each WLR value.      
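These statistics are simple to compute by hand. A Python sketch with hypothetical shoreline positions along a single transect (years loosely mirroring my photo record) shows how NSM, SCE, EPR, and LRR relate:

```python
import numpy as np

# Hypothetical shoreline positions (m, relative to a baseline) along one
# transect; the years loosely echo the decadal photo record.
years = np.array([1939, 1952, 1967, 1981, 1991, 2001, 2009, 2018], float)
pos   = np.array([ 25.0, 23.0, 24.5, 21.0, 19.5, 20.0, 18.0, 16.0])

nsm = pos[-1] - pos[0]               # Net Shoreline Movement (m)
sce = pos.max() - pos.min()          # Shoreline Change Envelope (m), always >= 0
epr = nsm / (years[-1] - years[0])   # End Point Rate (m/yr)
lrr = np.polyfit(years, pos, 1)[0]   # Linear Regression Rate (m/yr): OLS slope

print(nsm, sce, round(epr, 3), round(lrr, 3))
```

WLR would follow the same pattern with a weighted least squares fit, down-weighting years with larger positional uncertainty.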

Step 4: Rerun all previous steps with sets of dates from 1939 to 2018

Using the same DSAS steps, I reran each group of shoreline layers from 1939 to 2018. Following each run, I had to clip each rate layer using the DSAS Data Visualization tool to fit the extent of the transects specific to the timeframe of interest. I additionally changed the color ramps to be consistent for all 8 time points. LRR and WLR cannot be calculated from only two shoreline layers, so EPR will be the primary statistic I use for this time series analysis.

Step 5: Analyze DSAS statistics

I exported the DSAS rates files using the ArcMap Table to Excel tool. I then plotted and analyzed the data in MATLAB. I made histograms with kernel smoothing function fits using the function histfit. I chose kernel smoothing over a normal distribution fit because the normal fit did not effectively display the uneven tails or the humps on the sides of the NSM and EPR plots. I additionally made box and whisker plots of the EPRs over time for each marsh complex: fringing, large island, and medium island.   
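For readers without MATLAB, a rough Python analogue of histfit's kernel option (using synthetic EPR values, not my real data) would be:

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.integrate import trapezoid

rng = np.random.default_rng(0)
# Synthetic end point rates (m/yr) with a heavy negative tail, standing in
# for the real DSAS output.
epr = np.concatenate([rng.normal(-0.1, 0.05, 400), rng.normal(-0.4, 0.1, 60)])

kde = gaussian_kde(epr)                        # kernel analogue of histfit
grid = np.linspace(epr.min(), epr.max(), 200)
density = kde(grid)                            # smoothed density to overlay

# Unlike a single normal fit, the kernel fit follows the uneven tail.
print(density.max() > 0, trapezoid(density, grid) > 0.9)
```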

Preliminary Results

Figure 2. Histograms of the DSAS statistics calculated for all Alsea Bay salt marshes of interest.

The histogram of the DSAS net shoreline movement for the entirety of Alsea Bay approximates a normal, Gaussian curve, with the majority of points falling below 0 m, indicating net erosion of the Alsea Bay salt marshes from 1939 to 2018. This could indicate widespread drowning of these marshes. These results are consistent with net vertical accretion rates lower than 20th-century relative sea level rise rates for the bay (Peck et al. 2020). Rates of change (EPR, LRR, WLR) for the entire bay additionally approximate a normal distribution and tend to be negative. LRR and WLR are likely better predictors of long-term rates of change since they better account for intermittent periods of erosion and contraction. Though the uncertainty does not vary much between shoreline years (~1 to 4 m), I will likely use WLR to ensure uncertainty is incorporated appropriately.

Figure 3. Histograms of the DSAS statistics calculated for individual Alsea Bay salt marshes of interest.

Preliminary results comparing the three salt marsh complexes display similar patterns as the total bay histograms. The large island values tend to display a greater range of movements and rates, especially towards growth. Despite these growth rates in the large island salt marsh, the majority of observations still skew towards erosion.     

Preliminary analysis of the box plots shows that the EPRs have changed through time for each salt marsh complex. Box plots were likely not the best way to visualize this data so I plan to focus on this moving forward.

Figure 4. Box plots of EPRs over time for each individual salt marsh within Alsea Bay.


I have no negative critique of my methods so far for Exercise 3. I'm very excited about how the project is progressing. I think DSAS is a great method for addressing my questions. I'm still working on the best way to plot the figures, however. One next step I envision is creating a GIF displaying the colored transects that cycles through each shoreline layer from 1939 to 2018. Additionally, I would like to plot the mean EPRs for each time step for each salt marsh complex as bar plots with ranges. I think this will display trends better. I additionally think it's time to start incorporating my salt marsh accretion data!     


Himmelstoss, E.A., Henderson, R.E., Kratzmann, M.G., & Farris, A.S. (2018). Digital Shoreline Analysis System (DSAS) version 5.0 user guide: U.S. Geological Survey Open-File Report 2018–1179, 110 p., ofr20181179.

Peck, E. K., Wheatcroft, R. A., & Brophy, L. S. (2020). Controls on sediment accretion and blue carbon burial in tidal saline wetlands: Insights from the Oregon coast, USA. Journal of Geophysical Research: Biogeosciences, 125(2), e2019JG005464.

Exploring GPS Collar Fix Success as a Function of Environmental Characteristics

Global Positioning System (GPS) technology has improved the ability of wildlife biologists to gain insight into wildlife behavior, movements and space use. This is achieved by fitting wildlife, such as deer, with a collar equipped with a GPS unit. My research relies on GPS data from Columbian black-tailed deer (Odocoileus hemionus columbianus) in western Oregon to assess habitat and space use. Deer were captured and monitored from 2012-2017 in the Alsea, Trask, Dixon and Indigo Wildlife Management Units. Although this technology offers plentiful opportunities to answer research questions, GPS data are accompanied by errors in the form of missed fixes or inaccurate locations that should be corrected prior to conducting analyses. A common cause of GPS error is physical obstruction, such as topography or vegetation, between the GPS collar and satellites. One way to correct for habitat-induced bias is to place GPS collars at fixed locations with known environmental characteristics and model GPS fix success rates (FSR) as a function of topographic and vegetative attributes. Nine GPS collars were left out for 5-7 day intervals at fixed locations (n=53) in the study area in February and March 2020. Fix schedules matched those used on the deer collars. Locations were chosen specifically to represent a variety of terrain and vegetation in the study area. Results from the stationary collar test can be used to create a predictive map of FSR and to derive weights for subsequent habitat selection regression analyses.

  1. Question

1.1         What is the spatial extent of the influence of sky availability on GPS collar fix success?

1.2         What vegetative and topographic features best predict average GPS collar fix success rates?

  2. Name of the tool or approach that you used.

2.1         I used focal mean statistics (a form of moving window analysis) to derive sky availability values at varying spatial scales. I performed focal mean statistics in R using the ‘raster’ package, and the ‘Extract Multi-Values to Points’ tool in ArcGIS to obtain values from test site locations. I used the ‘lme4’ and ‘MuMIn’ packages in R to perform linear regression modeling and model selection.

2.2         I used generalized linear mixed regression modeling to predict fix success rates as a function of environmental attributes. I used the ‘lme4’ and ‘MuMIn’ packages in R to perform linear regression modeling and model selection.

| Covariate | Data type | Resolution | Source | Software | Tool/Package |
|---|---|---|---|---|---|
| Canopy cover (%) | Raster | 30×30 | GNN | | |
| Slope (°) | Raster | 10×10 | Derived from DEM raster | ArcGIS | Slope (Spatial Analyst) |
| Aspect (°) | Raster | 10×10 | Derived from DEM raster | ArcGIS | Aspect (Spatial Analyst) |
| Land cover | Vector | 30×30 | Oregon Statewide Habitat Map (2018) | | |
| Sky availability (%) | Raster | 10×10 | Derived from DEM | ArcGIS | Sky-view Factor (Relief Visualization Toolbox) |
| Sky availability (% of varying focal windows) | Raster | 10×10 | Derived from sky availability raster | R | 'raster' |
Table 1. Most explanatory variables for the stationary collar test were measured on the ground by technicians (canopy cover, slope, aspect, land cover type) but I created the sky availability variable (described in 3.1). The variables described here will be used in ‘Exercise 3’ to create a predictive fix success map across the study area.

  3. Brief description of steps you followed to complete the analysis.

Response variable: Fix Success Rate

GPS fix data were downloaded from each of the 9 GPS collars, and FSR was calculated for each test site by dividing the number of successfully obtained fixes by the total number of attempted fixes. Collar data include: coordinates, date, time, the number of satellites used to obtain a location, and Dilution of Precision (DOP). DOP is calculated by the GPS collar and is a measurement of accuracy based on satellite quantity and position. The higher the DOP, the less accurate the fix record is.
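As a concrete illustration, the FSR calculation and the kind of DOP/satellite screening discussed later look like this in Python (the fix records and cutoff values here are invented):

```python
# Hypothetical fix records for one test site: (dop, n_satellites) per
# successful fix; the attempted-fix count comes from the fix schedule.
attempted = 40
fixes = [(2.1, 7), (3.5, 5), (9.8, 3), (2.7, 6), (12.4, 3), (4.0, 5)]

fsr = len(fixes) / attempted
print(round(fsr, 3))          # 6 successes / 40 attempts = 0.15

# Screening out likely-inaccurate fixes (cutoffs here are made up) lowers FSR:
good = [f for f in fixes if f[0] <= 10 and f[1] >= 4]
print(round(len(good) / attempted, 3))
```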

3.1         Moving Window Analysis

I loaded the sky availability raster into R and created new rasters with the ‘focal’ function. Six different window sizes were subjectively chosen based on what distances seemed reasonable to me to test: 30x30m, 50x50m, 110x110m, 210x210m, 510x510m, 1010x1010m. With this technique, each focal cell is reclassified as the average sky availability within each moving window.
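A Python sketch of the same focal-mean idea, with scipy's uniform_filter standing in for the 'focal' function in R (the raster values are random):

```python
import numpy as np
from scipy.ndimage import uniform_filter

# Toy 10 m sky-availability raster (values 0-1); a 30x30 m window is 3x3 cells.
rng = np.random.default_rng(1)
sky = rng.uniform(0.2, 1.0, size=(8, 8))

# Moving-window (focal) mean, analogous to raster::focal with a mean function.
focal_3x3 = uniform_filter(sky, size=3, mode="nearest")

# Each interior focal cell now holds the mean of its 3x3 neighbourhood:
i, j = 4, 4
print(np.isclose(focal_3x3[i, j], sky[i-1:i+2, j-1:j+2].mean()))
```

Larger windows (e.g. size=11 for 110x110 m) follow the same pattern.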

After creating new raster data, I imported the raster stack of different scales into ArcGIS, along with the coordinates of the stationary collar test locations (n=53). I extracted the sky availability values using the ‘Extract Multi-Values to Points’ tool, and exported the attribute table as a .csv file to bring back into R.

A linear mixed model was used to predict FSR as a function of sky availability per test site, with collar ID as a random intercept because fix success can also vary between individual collars. I ran 7 univariate models and used corrected Akaike’s Information Criterion (AICc) to perform model selection. The top model was used in subsequent analyses.

library(lme4)   # lmer()
library(MuMIn)  # model.sel()
# 'sites' is the test-site data frame (name assumed here); covariates are scaled
m.SA1  <- lmer(Fix.Success.Rate ~ skyavail_s + (1|CollarID), data = sites)
m.f3   <- lmer(Fix.Success.Rate ~ f3_s + (1|CollarID), data = sites)
m.f5   <- lmer(Fix.Success.Rate ~ f5_s + (1|CollarID), data = sites)
m.f11  <- lmer(Fix.Success.Rate ~ f11_s + (1|CollarID), data = sites)
m.f21  <- lmer(Fix.Success.Rate ~ f21_s + (1|CollarID), data = sites)
m.f51  <- lmer(Fix.Success.Rate ~ f51_s + (1|CollarID), data = sites)
m.f101 <- lmer(Fix.Success.Rate ~ f101_s + (1|CollarID), data = sites)
skyavail.AICc <- model.sel(m.SA1, m.f3, m.f5, m.f11, m.f21, m.f51, m.f101)
Univariate generalized linear mixed model set that was used to compare FSR as a function of average sky availability at varying spatial moving window sizes.

3.2         Collar FSR Linear Regression

Most variables were measured on the ground by technicians (canopy cover, slope, aspect, land cover type) but sky availability was obtained as described above.

Using a generalized linear mixed model framework, I developed competing models using test site as the sample unit, and collar ID as the random intercept. Models were considered competitive if they were within 2 AICc units of the top model, and model selection was performed in a tiered approach. The first tier model set included univariate models of scaled covariates, interactions and any non-linear forms that seemed appropriate. Competitive models would proceed to subsequent modeling efforts.
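For reference, AICc is just AIC plus a small-sample correction. A quick Python sketch (log-likelihoods and parameter counts invented) shows how delta AICc values like those in Table 2 arise:

```python
def aicc(log_lik, k, n):
    """AICc = AIC + the small-sample correction 2k(k+1)/(n-k-1)."""
    aic = -2 * log_lik + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# Two hypothetical models fit to the n = 53 test sites:
m1 = aicc(log_lik=-40.0, k=4, n=53)   # covariate + intercept + RE + residual
m2 = aicc(log_lik=-39.5, k=5, n=53)
delta = m2 - min(m1, m2)
print(round(m1, 2), round(m2, 2), round(delta, 2))  # delta < 2 => competitive
```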

  4. Brief description of results you obtained.

4.1         Moving Window Analysis

Model selection output (Table 2) was ambiguous. Sky availability within a 110x110m moving window appears to be the magic number that best predicts variability in FSR, but the evidence for this is not strong: all models were competitive (delta AICc <2), and the top model carried only 3% more of the weight than the next most competitive model.

Table 2. Model output from sky availability moving window analysis. All models were competitive (within 2 AIC units from the top model). The ‘top’ model was used in subsequent modeling for the purposes of this exercise.

4.2         Collar FSR Linear Regression

Judging from the raw data, the only nonlinear term that seemed appropriate was the quadratic form of canopy cover (Figure 1), which was the most competitive model, carrying 73% of the weight in the model set (Table 3) with a conditional coefficient of determination of 0.72. The scaled linear covariate of canopy cover was the next most competitive model, but with a delta AICc score slightly >2.

Figure 1. GPS fix success rate as a function of environmental attributes, and was used to inform non-linear relationships to include in model sets.

Table 3. Model selection results relating FSR to topographic and vegetative variables in western Oregon. The most competitive model was the quadratic form of canopy cover.

  5. Critique of the method – what was useful, what was not?

5.1         Moving Window Analysis

My next step is to re-examine the focal window type I used and try an annulus instead of the full square to calculate the average sky availability values, in hopes of getting a clearer result on which spatial scale of sky availability most influences FSR. The 'focal' function was very easy and efficient.

5.2         Collar FSR Linear Regression

I was a little surprised that topography didn't play a stronger role in influencing FSR, but canopy cover does make sense. However, I am surprised that the quadratic form of canopy cover explained the most variation in FSR, because it doesn't make sense that FSR would also be low at low canopy cover proportions.

Moving forward, I plan to filter the fix data to reclassify locations that are likely to be inaccurate using cutoff values for number of satellites and DOP values. Doing so will decrease the overall FSR, and introduce more variability into the data and could potentially show different results.

Exercise 2: Assessing relationship between zooplankton density and kelp

Question asked

Following exercise 1, (where I compared different interpolations to estimate zooplankton density), I wanted to make my interpolations more ecologically informed. I hypothesize that zooplankton will be found closer to kelp and that with increasing distance to kelp, there will be decreasing zooplankton density. Therefore, my question for exercise 2 was “Does zooplankton density decrease with increasing distance from kelp?”.

Approaches/tools used and methods

I used a simple linear regression in RStudio to address this question.

At the end of every field season, we use the theodolite to trace outlines of surface kelp. This produces many lat/lon points, so I first needed to create a kelp polygon layer from all of these points in R. Since I had never worked with spatial data in R before, it took me quite some time to figure out how to do this and to ensure that I was working in the correct coordinate system. 

Next, I needed to calculate the distance between each of the sampling stations and the edge of each of the kelp polygons. Once this was done, I needed to select the kelp patch with the minimum distance for each of my 12 sampling stations.

Finally, I plotted the zooplankton densities for each of those stations against the minimum distance for each station, and ran a simple linear regression to quantify the relationship between zooplankton density and distance to kelp.
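The analysis itself was done in R, but the core computation is simple. Here is a Python sketch with invented station distances and densities mimicking the pattern I observed, including the log-log transform used later to handle the big jump in distances:

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical minimum distance to kelp (m) and zooplankton density for the
# 12 stations; the two far stations stand in for the sandy-bottom controls.
dist = np.array([5, 8, 12, 15, 20, 25, 30, 40, 55, 70, 400, 520], float)
dens = np.array([90, 85, 70, 75, 60, 55, 50, 42, 35, 30, 12, 10], float)

# Log-log transform evens out the big jump in distances before fitting
# an ordinary least squares regression:
fit = linregress(np.log10(dist), np.log10(dens))
print(fit.slope < 0, round(fit.rvalue**2, 2))   # negative slope expected
```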


I was only able to run this analysis on 2017 and 2018 data because I am not yet finished with scoring the 2019 images for zooplankton density. However, the relationship is the same for both 2017 and 2018, namely, that zooplankton density decreases with increasing distance to kelp.

Since there is a big jump in distances to kelp (this is because two of the sampling stations are ‘controls’ which are over sandy bottom habitat where there is no kelp nearby), I also decided to log both axes to try and normalize the distance axis.


While I would have liked to explore a more complex tool or technique, this exercise has taught me to always keep my data in mind and to acknowledge the limitations contained within them. I only have 12 sampling stations and so am rather limited in the analyses that I can reliably run. Sometimes our data truly conform to a simple solution; in my case, the simple linear regression seemed the most straightforward and obvious way to answer my question about whether or not zooplankton density and distance to kelp are correlated. I feel pretty confident in the R2 and p-values that the above regressions show, so my next step will be to figure out how I can incorporate this linear function into my interpolations.

Exercise 2: Relationship between Road Density and Landslide Buffer Zones


In Exercise 1, I tried to generate point data for landslides across all of Oregon. Due to survey limitations, the resulting hot spot analysis did not represent the statewide landslide situation. So in Exercise 2, I first generate a hot spot analysis map based on landslide volume and select about 30 points from it to create buffer zones. I then calculate the road density of the analysis area. By comparing the relationship between road density and buffers of different radii, I analyze at which spatial scale the relationship is strongest.

2. Name of the tool or approach that you used

a) Hot Spot Analysis based on Volume of Landslide

Hot spot analysis was run with volume as the field. Using the hot spot map, I identified clusters and selected about 30 points from them to create buffer zones as reference objects for later comparison.

b) Create Buffer and Multiple Ring Buffer

Buffer zones were created around the selected points with two different radii: 250 meters and 500 meters. The buffers are compared with road density to find at which spatial scale the relationship is strongest.

c) Line Density

In ArcGIS Pro, the roads in the analysis region were imported, and the Line Density tool was used to calculate road density as a reference for later comparison.
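Conceptually, the Line Density tool computes, for each cell, the total road length inside a circular neighbourhood divided by the neighbourhood area. A minimal Python sketch of that idea (not the ArcGIS implementation; the road segment below is hypothetical):

```python
# Sketch of the line-density idea at one cell: road length falling
# inside a circular neighbourhood, divided by the neighbourhood area.
import math

def clipped_length(seg, center, radius):
    """Length of the part of segment seg inside a circle (sampled)."""
    (ax, ay), (bx, by) = seg
    n = 1000  # sampling resolution for this sketch
    seg_len = math.hypot(bx - ax, by - ay)
    inside = sum(
        math.hypot(ax + (bx - ax) * (i + 0.5) / n - center[0],
                   ay + (by - ay) * (i + 0.5) / n - center[1]) <= radius
        for i in range(n))
    return seg_len * inside / n

def line_density(segments, center, radius):
    """Total in-circle line length per unit area."""
    total = sum(clipped_length(s, center, radius) for s in segments)
    return total / (math.pi * radius ** 2)

# Hypothetical road through the cell centre, 500 m neighbourhood radius
roads = [((-1000, 0), (1000, 0))]
d = line_density(roads, (0, 0), 500)  # expect 1000 m of road / circle area
```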

3. Results

Figure 1. Portland Landslide Hot Spot Analysis

Figure 1 shows a hot spot map of landslides in the Portland area based on landslide volume. The landslides in the mountains near Oregon City are similar in volume and clustered. Therefore, the buffers in Exercise 2 were created around these points.

Figure 2. Main Region Road Density Map
Table 1. Main Region Road Density

Figure 2 shows the distribution of road density in the analysis area; the darker the color, the greater the road density. By inspection, landslides are roughly concentrated in the areas of highest road density. Table 1 shows that the average road density in the area is 0.316.

Figure 3. 250m Buffer zone map
Figure 4. 500m Buffer zone map

Figures 3 and 4 show the relationship between roads and landslides in buffer zones of different radii. Although most of the landslides occurred around the mountains, within both the 250 m and 500 m buffers, greater road density is associated with greater impact. Reading Figures 2, 3, and 4 together, areas around the mountains with greater road density show more landslides and larger affected areas.

4. Critique of the method

In Exercise 2, my data are mainly concentrated around Oregon City near Portland, so the hot spot analysis was efficient. The hot spot analysis based on landslide volume shows whether the landslide data are clustered in the Portland area and where the strongest clusters are.
When using the buffer tool, because different points represent different landslide volumes, I was not sure whether to scale each buffer by volume or to give all points a uniform radius. For this exercise, however, the tool was effective.

Hongyu Lu

Lab blog – Exercise 2: Crop fires and human health

  1. Question that you asked

With the data I had, I was interested in examining whether proximity to crop fires was associated with individuals’ health outcomes. I restricted my analysis to crop fires because I wanted to explore whether there is a health-related argument for outlawing the burning of crop residue, which is the source of these fires.

  2. Name of the tool or approach that you used.

I estimated the following logit regression model:

H_it = β0 + β1 EXPOSURE_it + e_it

where H is a binary variable for reported illness (=1 if the individual reported illness and 0 otherwise). EXPOSURE is the count of fires experienced by the household within a 5, 10, or 15 km distance, respectively. As a next step, it may be possible to control for other factors at a broader scale, e.g., by including district-level factors such as population and road density in the model.

  3. Brief description of steps you followed to complete the analysis.

In my data, I had point locations of daily fires as well addresses of household.

  i. In order to better manage my data, I assigned a week of fire to my entire daily fire dataset. I then aggregated the data into 52 weekly fire files; each fire had an associated longitude and latitude.
  ii. I wrote code in Stata that allowed me to calculate weekly counts of fires within a 5 km radius. This resulted in 52 variables (weekly fires) with values corresponding to the count of fires within the buffer distance for each household.
  iii. I then summed the fires for the 10 weeks prior to the date of the interview to calculate the exposure variable for each household.
  iv. I repeated steps (ii) and (iii) for the 10 km and 15 km distances.
  v. As a last step before the regression, I merged this with the individual-level datafile.
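The fire-counting step can be sketched in Python (the actual analysis was done in Stata; the household and fire coordinates below are hypothetical):

```python
# Sketch of the exposure count: number of fire points within a given
# radius of a household, using great-circle distance.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def fires_within(household, fires, radius_km):
    """Count fires within radius_km of a (lat, lon) household location."""
    return sum(haversine_km(*household, *fire) <= radius_km for fire in fires)

household = (31.5, 74.3)                       # hypothetical location
fires = [(31.52, 74.31), (31.6, 74.3), (32.5, 74.3)]
print(fires_within(household, fires, 5))       # only the closest fire counts
```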
  4. Brief description of results you obtained.

Looking at the distribution of our sample (Table 1), it does not appear that any single demographic has disproportionate exposure to fires. Across genders and all age cohorts, about 40% of individuals have been exposed to at least one fire within 15 km, and about 15% to at least one fire within 5 km, over the past 10 weeks. The count and percentage of individuals exposed to fires within 5 km is highest in the province of Punjab. Given the distribution across provinces, it may be worthwhile to restrict the logit regression to a sample from Punjab and Sindh.

                      5 km              10 km             15 km
                   Count      %      Count      %      Count      %
 Up to 5 years     4,187    15.4     8,550    31.4    11,795    43.3
 6 – 17 years     10,585    14.9    21,378    30.3    29,053    41.3
 18 – 60 years    11,123    14.7    22,907    30.2    31,084    41.0
 Greater than 60   1,147    15.0     2,399    31.4     3,248    42.5

Table 1: Sample Distribution
Note: The table presents the count and percentage of individuals with at least 1 fire within a 5 km, 10 km and 15 km buffer distance – in a 2-month period. Percentages are rounded to one decimal place.

The results of the logit analysis are presented in Table 2 below. The magnitude as a percentage of mean for two of the dependent variables – respiratory illness and blood pressure suggests a substantial positive association and confirms the original hypothesis. However, these are preliminary estimates and other explanatory variables will need to be included to gauge the impact of exposure to cropfires.

Table 2: Logit regression for the impact of exposure on reported illness

                          5 km        10 km       15 km
 Respiratory Illness     0.017*      0.007**     0.003*
   Mean = 0.013         (0.009)     (0.003)     (0.002)
 Cough, cold, fever      0.0002      0.002*      0.001**
   Mean = 0.15          (0.003)     (0.001)     (0.001)
 Blood pressure          0.032***    0.016***    0.0079***
   Mean = 0.013         (0.000)     (0.000)     (0.000)

*** p<0.01, ** p<0.05, * p<0.1. Standard errors reported in parentheses.

  5. Critique of the method – what was useful, what was not?

The logit regression is well suited to a binary dependent variable, and in this case it is useful for estimating the overall association between illness and exposure. It is also possible to predict the errors from the regression, which can be tested for spatial autocorrelation; if the error terms are spatially correlated, this would suggest that the incidence of illness is spatially correlated. However, I was also interested in exploring the temporal distribution of the fires, and I was not able to use the logit analysis for that purpose.

Unmanned aerial vehicle (UAV)-based photogrammetric approach to delineate tree seedlings

Exercise 2: Assessing the growth of artificially regenerated seedlings using UAV imagery

  1. The question addressed in this study

Evaluating tree health and growth after artificial regeneration is an important task, and monitoring seedlings after regeneration potentially provides vital attributes for forest managers to make future decisions. However, monitoring can be unique to site characteristics and may be complex depending on site accessibility and terrain conditions, making conventional monitoring approaches challenging. The emergence of remote sensing technology, especially unmanned aerial vehicles (UAVs), provides a novel approach for seedling detection and monitoring, even in challenging terrains and remote areas (Buters et al., 2019).

After artificial regeneration, the growth and health of each seedling may depend on different parameters such as soil type, exposure to sun and wind, drainage conditions, space constraints, insect and disease susceptibility, etc. (International Society of Arboriculture, 2011). UAV-based monitoring approaches have high potential both to classify and identify plants in disturbed forest environments and to monitor plant health (Berni et al., 2009; Lehmann et al., 2017). Thus, in this exercise, I am interested in developing, as a first step, an automated method to detect individual seedlings and their attributes (i.e., location, height, canopy area, number of trees in a given plot, etc.) from UAV-based imagery. The second step of this study focuses on evaluating how the surrounding material affects the health and growth of seedlings.

2. Name of the tool or approach used, and steps followed.

Step 1:  Detecting individual seedlings and their attributes

Canopy height model

I used R software packages to detect the seedlings (automatically) from the ortho-mosaic image.

  • Packages used: lidR, raster, sp, raster, rgdal, and EBImage

As the initial step, point cloud data were extracted using AgiSoft software (Fig.1(a)). Then the lasground function in the lidR package was used to classify points as ground or not ground using the cloth simulation filter (Fig.1(b)); points classified as ground are assigned a value of 2, according to the LAS specification (The American Society for Photogrammetry & Remote Sensing, 2011). Next, I used the lasnormalize function in the lidR package to remove the topography from the point cloud (Fig.1(c)). Finally, the normalized point cloud was cropped to the area of interest (AOI) shapefile using the lasclip function (Fig.1(d)).

Figure 1. Map of (a). point clouds generated over the study site, (b). point clouds after the cloth simulation function performed, (c). normalized point cloud, and (d). finalized point cloud mapped in AOI.

I then used the output of lasclip to make the canopy height model with the grid_canopy function, using a 0.01 m resolution, similar to the resolution of the raster image; the dsmtin algorithm was used to create the digital surface model. After that, a variable window filter algorithm was used to identify treetops from the canopy height model. The minimum height for detecting a seedling (a potential treetop) was set to 0.2 m, and all canopy height pixels beneath this value were masked out; this threshold was used to minimize the effect of woody debris and small shrubs. Finally, the detected seedling locations were exported to ArcMap and plotted on top of the AOI raster to assess the performance of the algorithm.
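The treetop-detection logic (local maximum search plus the 0.2 m threshold) can be illustrated with a simplified fixed-window sketch in Python (the actual analysis used lidR in R with a variable window; the CHM below is hypothetical):

```python
# Sketch of treetop detection on a canopy height model: a pixel is a
# candidate treetop if it exceeds the 0.2 m height threshold and is the
# maximum within a fixed window around it.
def find_treetops(chm, min_height=0.2, win=1):
    """Return (row, col, height) of local maxima in a 2-D height grid."""
    rows, cols = len(chm), len(chm[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            h = chm[r][c]
            if h < min_height:
                continue  # mask out woody debris / shrubs below threshold
            neighbourhood = [chm[i][j]
                             for i in range(max(0, r - win), min(rows, r + win + 1))
                             for j in range(max(0, c - win), min(cols, c + win + 1))
                             if (i, j) != (r, c)]
            if all(h > v for v in neighbourhood):
                tops.append((r, c, h))
    return tops

# Hypothetical 5x5 CHM (metres): one seedling, plus low debris at 0.15 m
chm = [[0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.3, 0.4, 0.3, 0.0],
       [0.0, 0.4, 0.9, 0.4, 0.15],
       [0.0, 0.3, 0.4, 0.3, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0]]
print(find_treetops(chm))  # → [(2, 2, 0.9)]
```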

Figure 2. Map of (a). area of interest (b). canopy height model, (c). locations detected of seedlings, and (d). canopy areas of detected seedlings.

Step 2: Geographically weighted regression

Image classification tools available in ArcMap 10.7:

  1. Supervised image classification (SIL): Maximum Likelihood Algorithm (explained in blogpost 1)
  2. Spatial Analyst Tool: Tools available in Zonal
  3. Geographically weighted regression

Quantifying the surface materials

In order to identify the surface material surrounding each seedling, a supervised classification was performed; compared with unsupervised classification, this approach is more accurate and minimizes the effect of shadow and other noise. The raster image was classified into five classes, including tree seedlings, woody materials, grass, and barren land.

As the next step, I created three buffer zones around each seedling, with radii of 1 m, 2 m, and 3 m (Fig.3(a)). Next, the Tabulate Area tool in the Spatial Analyst toolbox was used to extract the corresponding area of each class (Fig.3(b)).
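What a per-buffer class-area tabulation returns can be sketched as follows (a simplified Python illustration, not the ArcGIS tool; the grid, classes, and buffer below are hypothetical):

```python
# Sketch: count the classified-raster cells whose centres fall within
# the buffer radius, per class, and convert counts to area.
import math
from collections import Counter

def class_areas(raster, cell_size, center, radius):
    """Area per class for cells whose centres lie within radius of center."""
    counts = Counter()
    for r, row in enumerate(raster):
        for c, cls in enumerate(row):
            cx, cy = (c + 0.5) * cell_size, (r + 0.5) * cell_size
            if math.hypot(cx - center[0], cy - center[1]) <= radius:
                counts[cls] += 1
    return {cls: n * cell_size ** 2 for cls, n in counts.items()}

# Hypothetical 4x4 grid of 1 m cells: grass with a bare patch near centre
raster = [["grass", "grass", "grass", "grass"],
          ["grass", "bare",  "bare",  "grass"],
          ["grass", "bare",  "grass", "grass"],
          ["grass", "grass", "grass", "grass"]]
areas = class_areas(raster, 1.0, (2.0, 2.0), 1.0)
print(areas)
```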

Figure 3. Map of (a). 1-meter buffer zones around each tree seedling, and (b). a schematic diagram showing the extracted classes from the classified image by using a 1-meter buffer zone.

Geographically weighted regression

Geographically weighted regression is an extension of the traditional regression approach that provides a local estimate for each location instead of a single regression for the entire area. In this approach, observations are weighted according to their proximity to a point (point I), which is determined by the kernel size. “This ensures that the weighting of observation is no longer constant in the calibration, but instead varies with point I. As a result, observations closer to the point I have a stronger influence on the estimation of the parameters for location I.” (The Landscape Toolbox, 2012)
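The local-weighting idea can be illustrated with a minimal single-predictor sketch (a Gaussian-kernel weighted least squares in Python; this is a simplification of what ArcMap computes, and all data below are hypothetical):

```python
# Sketch of the GWR idea: at each target location, fit a weighted
# least-squares line in which nearby observations get higher weight.
import math

def gwr_coeff(target, points, x, y, bandwidth):
    """Local (intercept, slope) at target, with Gaussian kernel weights."""
    w = [math.exp(-0.5 * (math.hypot(px - target[0], py - target[1])
                          / bandwidth) ** 2)
         for px, py in points]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    slope = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / \
            sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return my - slope * mx, slope

# Hypothetical data: bare-ground area (x) depresses seedling height (y)
# in one corner of the site but not the other, so the local slope varies.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
bare = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
height = [0.5, 0.5, 0.5, 0.9, 0.7, 0.5]
b0, b1 = gwr_coeff((10, 10), points, bare, height, 2.0)  # negative slope here
```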

To determine the effect of surrounding surface materials on seedling growth, two types of geographically weighted regression analyses were performed using ArcMap 10.7: (1) the area of bare ground (Fig. 4) and (2) the area of woody debris (Fig. 5) within each of the three buffer zones (1 m, 2 m, and 3 m) were considered as the independent variable, while estimated tree height was the dependent variable.

  • Effect of bare ground

Generally, bare ground reflects subsurface soil conditions such as insufficient nutrients, the behavior of the subsurface water table, and water-holding capacity. Hence, the presence of bare ground around a seedling can be used as an indirect indicator of seedling health.

Figure 4. Map of geographically weighted regression by considering seedling height as the dependent variable and area of bare ground within (a). 1-meter buffer, (b). 2-meter buffer, and (c). 3-meter buffer as the independent variable.

  • Effect of woody debris

The presence of woody debris around seedlings can have both positive and negative impacts. For example, old tree stumps and debris can act as microsites: seedlings shaded by stumps and logs tend to grow well, particularly on hot, dry sites (Landis et al., 2010). However, woody debris/slash may also act as an obstacle, limiting seedlings’ access to sunlight and hindering growth. To study the effect of woody debris on seedling growth, I performed another geographically weighted regression using the three buffer sizes (Fig.5).

 Figure 5. Map of geographically weighted regression by considering seedling height as the dependent variable and area of woody debris within (a). 1-meter buffer, (b). 2-meter buffer, and (c). 3-meter buffer as the independent variable.

Additionally, I created a map of the normalized difference green/red index (NDGRI) to validate the results obtained by geographically weighted regression (Fig.6(a)). Furthermore, I generated a map of geographically weighted regression with seedling height as the dependent variable and canopy area within the 1 m buffer zone as the independent variable, assuming that healthy seedlings are taller and have a larger canopy area.
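Indices of this green/red family are commonly computed per pixel as (G − R)/(G + R), so that healthy vegetation (green-dominated) gives positive values; a small sketch with hypothetical band values:

```python
# Hedged sketch of a green/red difference index, computed per pixel as
# (G - R) / (G + R); band values below are hypothetical.
def green_red_index(green, red):
    """Return the normalized green/red difference for one pixel."""
    return (green - red) / (green + red) if (green + red) else 0.0

healthy = green_red_index(120, 60)    # green-dominated pixel
stressed = green_red_index(60, 90)    # red-dominated pixel
print(healthy > 0 > stressed)         # → True
```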

Figure 6. Map of (a). normalized difference green/red index (NDGRI), and (b). geographically weighted regression by considering seedling height as the dependent variable and area of canopy cover within 1-meter buffer as the independent variable.


  • Bare ground coefficients:

Results obtained by geographically weighted regression using height and bare ground showed a decreasing trend in coefficient values with increasing buffer size (Fig.7(b)). Furthermore, within a given buffer zone the coefficients do not vary drastically (changes appear only in the fourth decimal place). These results suggest that the presence of bare ground within the 1 m and 2 m buffer zones has a considerable impact on seedling growth. The trends indicate that seedlings in the northwest part of the study area have lower coefficient values, while seedlings in the southeast part have relatively stronger negative coefficients (Fig.7(b)). A similar trend was observed with the area of woody debris as the independent variable: woody debris within a 1 m buffer shows positive coefficients for seedlings in the northwest part of the study region, and the magnitudes of the coefficients decrease with increasing buffer size. Additionally, I evaluated the variation of the coefficients (from both regressions) with seedling height (Fig. A1, Appendix A); taller seedlings consistently tend to show either positive or weaker negative coefficients.

Figure 7. (a). Map of seedling heights and corresponding seedling I.D.’s, plots of coefficients of geographically weighted regression by considering seedling height as the dependent variable, and (b). area of bare ground, and (c). area of woody debris within (1). 1-meter buffer, (2). 2-meter buffer, and (3). 3-meter buffer as independent variables. Trends of coefficients on the x-axis and seedling ID. on the y-axis.

From these results, we can infer that the area of bare ground and the presence of woody debris affect the growth of seedlings. For example, trees surrounded by more woody debris have the strongest negative coefficient values and the lowest heights; similarly, areas dominated by bare ground show a negative effect on seedling growth. The results in Figure 6 suggest that trees growing where bare ground and woody debris are prominent do not form a healthy seedling population. Overall, the geographically weighted regression results indicate that the type and amount of material around seedlings are vital for their health and growth.

  • Possible background factors

Woody debris: generally, woody debris and its degradation products can provide nutrients to the soil and help seedling growth (Orman et al., 2016). However, this process may vary with the size of the debris; coarse debris takes longer to degrade and can act as an obstacle to seedling growth. This can be observed in the southeast part of the study region (Fig.2(a) and Fig.8(a)). In contrast, the northwest part of the region has smaller woody debris, which may degrade at a faster rate and supply some nutrients to the soil (Fig.2(a) and Fig.8(b)).

Additionally, down wood acts as a moisture reservoir and provides persistent microsites for regenerating seedlings (Amaranthus et al., 1989). However, excess moisture may negatively affect seedlings by rotting their root systems, and an excess of woody debris may accumulate more moisture and thereby hinder seedling growth (as explained under Geography).

Figure 8. Seedlings surrounded by (a). coarse woody debris, and (b). small woody debris.  

Bare ground: visual observation of bare ground can provide some essential facts about subsurface soil conditions, indicating, for example, a lack of nutrients, problems with the groundwater table, soil pH, or subsurface geology. Hence, it is essential to have some prior information about subsurface soil conditions to understand the possible factors affecting seedling growth. Overall, seedling growth is lower in the presence of bare ground, so we can interpret that poor soil conditions hinder seedling growth.

Geography: to identify possible effects of terrain conditions on the results of the geographically weighted regressions, I generated a digital elevation model (Fig. 9) and searched for prior information on site conditions. The digital elevation model suggests that the majority of healthy seedlings are located in the northwest part of the study area, at relatively high elevations (Fig.2(a) and Fig.9). In contrast, the stressed seedlings in the southeast part of the study region are located in lower-elevation areas (Fig.2(a) and Fig.9). Consequently, we can hypothesize that the accumulation of excess water (groundwater or surface water) in low-lying areas negatively affects seedling growth. This may also be an indirect cause of the bare ground in low-lying areas (i.e., stagnant water may rot the roots of grass, which then dies and exposes bare ground). Similarly, excess water may hinder seedling growth directly, and groundwater combined with woody debris may increase moisture in the subsurface soil layer, further harming seedling root systems.

Figure 9. Digital elevation model of the study area.


Overall, the automated detection of seedlings and derived attributes shows promising results. However, the presence of woody debris may interfere with tree detection, especially for small unhealthy seedlings; for example, the model can falsely detect piles of woody debris as seedlings. According to the canopy height model, most woody debris is less than 0.2 m tall, so to avoid this type of interference I introduced a 0.2 m threshold for the canopy height model.

Additionally, the lack of prior background information (i.e., ground conditions, soil and hydrological conditions, quantitative information on woody debris, etc.) made interpreting the results of this study quite challenging. Therefore, collecting in-situ information would be beneficial for future research, and a prior assessment of woody debris and its state of decay would be a valuable supporting attribute for estimating seedling health.

Appendix A

Figure A1. Estimated coefficients from geographically weighted regression for the area of bare ground (top) and area of woody debris (bottom) with seedling height.


Amaranthus, M.P.; Parrish, D.S.; Perry, D.A. 1989. Decaying logs as moisture reservoirs after drought and wildfire. In: Alexander, E., ed. Proceedings of the watershed 1989 symposium on stewardship of soil, air, and water resources. R10-MB-77. Juneau, AK: U.S. Department of Agriculture, Forest Service, Alaska Region: 191–194.

 Berni, J.A.J., Zarco-Tejada, P.J., Sepulcre-Cantó, G., Fereres, E., Villalobos, F., 2009. Mapping canopy conductance and CWSI in olive orchards using high resolution thermal remote sensing imagery. Remote Sens. Environ. 113, 2380–2388.

Buters, T., Belton, D., Cross, A., 2019. Seed and Seedling Detection Using Unmanned Aerial Vehicles and Automated Image Classification in the Monitoring of Ecological Recovery. Drones 3, 53.

International Society of Arboriculture., 2011. Tree Selection and Placement [WWW Document]. Tree Sel. Place. URL

Landis, T.D., Dumroese, R.K., Haase, D.L., 2010. Seedling Processing, Storage, and Outplanting, in: The Container Tree Nursery Manual.

Lehmann, J.R.K., Prinz, T., Ziller, S.R., Thiele, J., Heringer, G., Meira-Neto, J.A.A., Buttschardt, T.K., 2017. Open-Source Processing and Analysis of Aerial Imagery Acquired with a Low-Cost Unmanned Aerial System to Support Invasive Plant Management. Front. Environ. Sci. 5.

Orman, O., Adamus, M., Szewczyk, J., 2016. Regeneration processes on coarse woody debris in mixed forests: Do tree germinants and seedlings have species-specific responses when grown on coarse woody debris?

The American Society for Photogrammetry & Remote Sensing, 2011. LAS SPECIFICATION VERSION 1.4 – R13.

The Landscape Toolbox, 2012. Geographically Weighted Regression [WWW Document]. Landsc. Toolbox. URL

The spatial and temporal correlation between Particulate Organic Phosphorus (POP) and Primary Production (PP) in the Bermuda Ocean.


What is the spatial and temporal correlation between Particulate Organic Phosphorus (POP) and Primary Production (PP) in the Bermuda Ocean?

My raw data are unusual in that the dates and coordinates of the POP and PP samples do not match each other; only eight records match exactly. Hence, I needed to adjust the data in several ways:

  1. For spatial correlation, I expected that POP coordinates nearest to PP would show a very high correlation. My hypothesis is that primary production is affected by phosphorus that comes from dust deposition. The sampling location is in the ocean southeast of Bermuda, so the dust deposition is expected to originate from the Sahara Desert.
  2. For temporal correlation, I divided the data by the four seasons. Since the site is in the Northern Hemisphere, winter is December to February, spring March to May, summer June to August, and autumn September to November. Dust from the Sahara is transported in summer; heavier particles are deposited first, nearer the Sahara Desert, while small particles are deposited later by precipitation. Based on this, the hypothesis is that there will be a high correlation between POP and PP in autumn and winter.
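The seasonal binning described above can be sketched as (Northern Hemisphere meteorological seasons):

```python
# Sketch of the month-to-season mapping used for the temporal analysis.
def season(month):
    """Return the Northern Hemisphere season for a month number (1-12)."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "autumn"

print(season(12), season(4), season(7), season(10))  # → winter spring summer autumn
```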

Tool and Approach

At first I wanted to use neighborhood analysis; however, after examining the raw data, I instead divided the data based on the nearest distance between POP and PP samples.

  1. ArcMap 10.7.1. I used a table join for the POP and PP data based on date, since the coordinates differ and the FID fields have different lengths.
  2. MATLAB. I used MATLAB to average the data by distance and to generate R² values and scatter plots.

Description of Analysis  

  1. Spatial Category

The spatial category is based on the distance between POP and PP samples; in this step I used the Pythagorean formula. I then divided the data into three categories: I (0 – 0.030 degrees), II (0.031 – 0.67 degrees), and III (0.68 – 0.9 degrees).
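The distance binning can be sketched as follows (a Python illustration; the coordinates are hypothetical, and the category boundaries are those listed above):

```python
# Sketch: Euclidean ("Pythagorean") distance in degrees between a POP
# and a PP sample, mapped onto the three distance categories.
import math

def degree_distance(p1, p2):
    """Planar distance in degrees between two (lat, lon) pairs."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

def distance_category(d):
    """Assign a distance (degrees) to category I, II, or III."""
    if d <= 0.030:
        return "I"
    elif d <= 0.67:
        return "II"
    elif d <= 0.9:
        return "III"
    return None  # outside the categories considered

# Hypothetical pair of sampling coordinates (lat, lon in degrees)
d = degree_distance((31.70, -64.17), (31.68, -64.16))
print(distance_category(d))  # → I
```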

  2. Temporal Category

At first, the temporal category is based on all data from 2012 – 2016; I then also analyzed the temporal pattern within each distance category.

  3. Time series plots to see the evolution of POP and PP over the five-year period.
  4. Scatter plots and regression analysis

Scatter plots were produced for both the spatial and temporal categories to see the correlation between POP and PP, and regression analysis was done to test the significance of the correlation.


% linear fit of PP on POP for the winter subset
% (regress requires an explicit intercept column in the design matrix)
w = regress(PPw, [ones(size(POPw)) POPw]);
tblw = table(POPw, PPw);
lmw = fitlm(tblw, 'linear')   % reports R^2 and p-values for the same model


  1. Time Series POP
Figure 1. POP time series in Bermuda Ocean from 2012-2016

From figure 1, we can see gaps in the temporal pattern: in 2012 there is no record in May, in 2014 there is no record for January and February, and in 2015 there is no record for January and October. In addition, in 2016 the only winter data are from March. Overall, however, the POP pattern is clustered from January to August, especially in 2013-2015, where the highest POP values are concentrated in January through April.

2. Time Series PP

Figure 2. PP time series in Bermuda Ocean from 2012-2016

In figure 2, gaps also appear: in 2013 there is no record in March, and in 2014 there is no record for January, February, and May. In 2015 the gaps are similar to POP, and in 2016 there is no record for January and February. Much like POP, however, the temporal pattern is clustered from January to July, and likewise the highest PP is concentrated from January to April.

3. Scatter Plot based on Seasonal

Figure 3. The scatter plot of seasonal pattern between POP vs PP

From figure 3 we can see that the highest PP occurs in winter, reaching 20.87 mgC/m³/day, accompanied by POP of up to 0.034 umol/kg. PP in spring is also high, but there the POP values are concentrated between 0 and 0.01, where PP is relatively high. However, when I checked this against the R² values, the highest R² is in autumn (0.17); the lowest is in spring (0.00458), followed by winter (0.0773) and summer (0.0846). This suggests that the deposition of POP in the Bermuda Ocean occurs through both dry and wet deposition.

4. Scatter Plot based on distance category

Figure 4. The scatter plot of POP vs PP for distances of 0 – 0.03 degrees

Figure 4 shows the nearest-distance category between POP and PP, and the result is not what I expected: in autumn PP is relatively high, but this is not matched by POP, which has its lowest values in autumn. The R² values indicate that in this category summer has the highest correlation (0.227), followed by spring (0.117), autumn (0.00146), and winter (0.000297).

Figure 5. The scatter plot of POP vs PP for distances of 0.031 – 0.67 degrees

Overall, figure 5 shows the lowest R² compared with distance categories I and III (figures 4 and 6), at about 0.0036. By season, however, winter has the highest R² (0.0613) and summer the lowest (0.0117), although the winter correlation is not statistically significant. With reference to the hypothesis, the results for this category match it fairly well.

Figure 6. The scatter plot of POP vs PP for distances of 0.68 – 0.9 degrees

Figure 6 shows the relationship for the farthest distances between POP and PP. The R² pattern is similar to figure 5, with the highest value in winter (0.509). However, the trend in PP differs from categories I and II: the highest PP in category III occurs in summer, whereas in the other two it occurs in autumn.

Critique of the method

My raw data are a little messy, and I need to think more about how to process them properly; I am not satisfied with the analysis and results I obtained in this exercise 2. I did not check for normality before running the regressions, which I suspect is one reason I could not get a significant finding, and the R² values are very low compared with the global average. However, dividing the data into several categories did let me determine the spatial and temporal patterns that I can clip to the HYSPLIT back-trajectory data, since that step requires exact coordinates and times to run the model.

Assessing relationship between channel width and wood cover in South Fork McKenzie River


My question for exercise 2 involves the relationship between wood cover (in this case, wood area (m^2)/ plot area (m^2)) and channel width through plot center (m). My response variable is wood cover, and my explanatory variable is channel width.


I used geographically weighted regression (ArcGIS Pro) and simple linear regression (R Studio) to assess this relationship.


My initial approach involved using the wood accumulation index (WAI) raster I developed in exercise 1. It became clear this was not the best approach, and Julia and I agreed I should instead investigate the relationship between wood cover and stream width. My hypothesis was that as channel width increases, wood should accumulate, and we should therefore see higher wood cover in plots at wider portions of the channel.


Step 1. I delineated a length of stream that approximated the main flow of the South Fork McKenzie River (Fig. 1). This line feature was approximately 850m in length. In order to produce enough observations for regression analysis, I divided the length by 30 and plotted evenly spaced points along this line at 28.3m intervals.

Step 2. Next, I produced plots around these points with 10m buffers and used the Calculate Geometry tool to calculate the area encompassed by these plots (312 m^2).

Step 3. I dissolved all features classified as wood into one feature. I then intersected this shapefile with my plots shapefile. This resulted in 30 in-plot wood features. Next, I made a new column for wood area and calculated area for these 30 features.

Step 4. To develop a wood cover variable, I created another column and used the field calculator tool to calculate wood cover: (WoodArea) / (PlotArea) * 100.

Step 5. To produce my explanatory variable (channel width through plot center), I used the measure tool to measure a line segment from one side of the channel to the other that passed through plot center. I approximated the shortest segment, and I recorded these observations in a new column (StreamWid).

Step 6. I used the geographically weighted regression tool to calculate coefficients for StreamWid. These were all very low.

Step 7. I imported the associated database file into R studio and performed a simple linear regression.
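The regression in Step 7 was run in R (essentially `lm(WoodCover ~ StreamWid)` on the imported table; the exact column names are my shorthand). An equivalent stdlib-Python sketch of the same fit, with made-up plot values rather than the real attribute table:

```python
from statistics import mean

def simple_linear_regression(x, y):
    """Ordinary least squares fit y = a + b*x; returns (intercept, slope, r_squared)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    syy = sum((yi - my) ** 2 for yi in y)
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return a, b, r2

# Illustrative values: channel width (m) vs. wood cover (%) for a few plots
stream_wid = [18.0, 22.5, 25.0, 30.0, 35.5]
wood_cover = [4.2, 1.1, 6.3, 2.0, 3.8]
intercept, slope, r2 = simple_linear_regression(stream_wid, wood_cover)
```

A near-zero r2 from such a fit, as reported below, indicates the straight line explains almost none of the variance in wood cover.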


Figure 1: Map demonstrating wood plot sampling method


Results from this exercise are not promising. The p-value associated with my linear regression model (figure 2) was 0.72, so stream width does not appear to be a significant predictor. Additionally, the associated r^2 is 0.005, indicating very little correlation between stream width and wood cover.

Figure 2: Simple Linear Regression Model

Results from the GWR were similar. Local r^2 values were low and ranged from .009 to .016 (figure 3).

Figure 3: R^2 results from GWR


The approach I used for this exercise is sound and easily applicable to other hydrological modeling (albeit outside my realm of expertise). Stream width does not appear to correlate with wood cover along the particular flow transect I sampled. However, it may be interesting to explore other explanatory variables, including distance to nearest land feature, water indices, or various plot sizes.

Can Human Behavior be Delimited to the Interactions of Categorical Artifacts in Space?

As the title of this section alludes to, I chose to assess the spatial interactions that categorical artifacts have with each other in space. The assumption that must be made is that areas of the archaeological site that once contained human activities create a source for specific artifact patterns. This assumption can then be broken down into a series of additional corollaries:

C1: Human behavior associated with domestic activities will result in a source for a medium count of diverse artifact types (i.e., charcoal, bone, debitage).

C2: Human behavior associated with tool production will result in a source for formal and informal tools, a high abundance of debitage, no abundance of bone and charcoal.

C3: Human behavior associated with animal processing will result in a source for blades, scrapers/unifaces, utilized flakes, a medium abundance of debitage, a high abundance of bone, and no abundance of charcoal.

C4: Human behaviors that produce features such as hearths and pits create a strong source for a diversity of artifacts within an immediate vicinity of the feature.

The above corollaries walk through an assumption that is generally made about archaeological remains when observing the distribution of artifacts. Each one of these corollaries can be broken down even further and more specifically, but because I am not conducting an experimental archaeological investigation (in which these corollaries would be hypotheses), I have chosen to maintain a level of generality. Additionally, in my statistical calculations I have included only the three most abundant artifact types: debitage, bone, and charcoal. These provide adequate sample sizes, and I hypothesize that they share the most variance with human activities in the past.

The question I wish to address in this blog post is: how do debitage, bone, and charcoal interact with each other to form clustered or dispersed boundaries reflecting human use and perception of space throughout the site, both at one point in time and through time?

Cautious Approach to Spatial Autocorrelation

To assess the question stated above, I used spatial autocorrelation on debitage, bone, and charcoal at this archaeological site. Using Ripley’s K and the pair correlation function (PCF) in R, I analyzed the autocorrelation among all artifacts, individual artifact types, and across artifact types at the horizontal intervals of 0-25cm, 25-50cm, 50-60cm, 60-74cm, and 74-100cm. I chose these segments judgmentally through the steps outlined in my blog post 1. By using autocorrelation, I hope to identify boundaries, clusters, and dispersion of human activities across the site at each interval. This type of analysis must be taken with caution because, as an archaeologist, I cannot assume that 10-25cm intervals contain a single occupancy period at this site. Therefore, in assessing these intervals I am not defining a single point in time; rather, I am arguing that these intervals indicate shifts in how people at the site organized themselves in terms of human behavior and spatial perception.

An additional cautionary step was correcting for inhomogeneity in my dataset by using ‘Kinhom’ for Ripley’s K and ‘pcfinhom’ for the PCF in the spatstat package. This step is necessary when the data being analyzed are not homogeneously distributed across the study area, which can easily be assessed by producing density maps showing observable clusters. Ultimately, the inhomogeneous corrections for these autocorrelation methods more realistically compare the observed and Poisson distributions.
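For intuition about what the statistic measures, a naive, uncorrected Ripley's K can be sketched in a few lines; spatstat's `Kinhom` and `pcfinhom` add the edge and inhomogeneity corrections discussed above, which this toy estimator deliberately omits. All coordinates here are invented:

```python
from itertools import combinations
from math import hypot, pi

def ripley_k(points, r_values, area):
    """Naive Ripley's K (no edge or inhomogeneity correction):
    K(r) = area / n^2 * number of ordered pairs within distance r."""
    n = len(points)
    dists = [hypot(p[0] - q[0], p[1] - q[1]) for p, q in combinations(points, 2)]
    # combinations() yields unordered pairs, so double the count for ordered pairs
    return [area / (n * n) * 2 * sum(1 for d in dists if d <= r) for r in r_values]

# Under complete spatial randomness, K(r) ~ pi * r^2; values above that suggest clustering
pts = [(0.1, 0.1), (0.12, 0.11), (0.11, 0.13), (0.8, 0.8)]  # illustrative coordinates
k = ripley_k(pts, [0.05, 0.5], area=1.0)
```

With the three tightly grouped points, K at r = 0.05 far exceeds the CSR expectation of pi * 0.05^2, i.e., clustering at short range.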

Now, onto the fun part.


Figure 1 shows the results of Ripley’s K and the PCF operated on the 0-25cm interval. I had to segregate the boundary of the archaeological site due to the excavation methods and the amorphous topography of the surface. Therefore, I am under the assumption that the archaeological data here adequately represent human behavior within the whole site. Comparing Ripley’s K and the PCF, there is a suggested tendency for clustering (above the red line) from 0-20cm (not significant) and from 90-115cm (significant) at this interval. From the density map, it is clear that the clustered areas reflect the 90-115cm size, but there is also evidence of suggested clustering at 0-20cm that may not have been highlighted in the density map.

Figure 1 Contains a raw data map of all artifact types at 0-25cm (top left), kernel density interpolation for all artifact types (top right), Ripley’s K corrected for inhomogeneity (bottom left), and the PCF corrected for inhomogeneity (bottom right). The red dashed line represents complete random distribution surrounded by a confidence envelope of the maximum and minimum distribution resulting from 99 Poisson simulations. The black line represents the observed distribution. If the observed distribution is above the red line, then it suggests a clustered behavior and if the observed distribution is below the red line, then it suggests a dispersed behavior.

Figure 2 shows Ripley’s K and the PCF operated on the interval of 25-50cm. Ripley’s K and the PCF identify clustering at 0-15cm (significant). Additionally, the PCF shows clustering at 100-120cm. These are fascinating results because from looking at the raw data one might infer a larger cluster located near the left margins of the site, but the functions both reflect two sized clusters that are identified through the density map.

Figure 2 Contains a raw data map of all artifact types at 25-50cm (top left), kernel density interpolation for all artifact types (top right), Ripley’s K corrected for inhomogeneity (bottom left), and the PCF corrected for inhomogeneity (bottom right). The red dashed line represents complete random distribution surrounded by a confidence envelope of the maximum and minimum distribution resulting from 99 Poisson simulations. The black line represents the observed distribution. If the observed distribution is above the red line, then it suggests a clustered behavior and if the observed distribution is below the red line, then it suggests a dispersed behavior.

Figure 3 shows Ripley’s K and the PCF operated on the interval 50-60cm. In this instance, the results suggest a rapid drop-off in clustering in close proximity, with clustering starting at 15-25cm (not significant). This best represents the spatial location of artifacts within the raw data map, because the density map appears to reflect slightly larger clusters.

Figure 3 Contains a raw data map of all artifact types at 50-60cm (top left), kernel density interpolation for all artifact types (top right), Ripley’s K corrected for inhomogeneity (bottom left), and the PCF corrected for inhomogeneity (bottom right). The red dashed line represents complete random distribution surrounded by a confidence envelope of the maximum and minimum distribution resulting from 99 Poisson simulations. The black line represents the observed distribution. If the observed distribution is above the red line, then it suggests a clustered behavior and if the observed distribution is below the red line, then it suggests a dispersed behavior.

Figure 4 is my personal favorite because of the fascinating oscillation that occurs around the boundary for random distributions. Ripley’s K and the PCF both show significant clustering at 0-20cm. Any artifact past 20cm appears to be randomly distributed within space. Again, this slightly contradicts the observable characteristics in the raw data and density maps, leading to further investigation when imposing specific clusters and human behavior within this surface in the future.

Figure 4 Contains a raw data map of all artifact types at 60-74cm (top left), kernel density interpolation for all artifact types (top right), Ripley’s K corrected for inhomogeneity (bottom left), and the PCF corrected for inhomogeneity (bottom right). The red dashed line represents complete random distribution surrounded by a confidence envelope of the maximum and minimum distribution resulting from 99 Poisson simulations. The black line represents the observed distribution. If the observed distribution is above the red line, then it suggests a clustered behavior and if the observed distribution is below the red line, then it suggests a dispersed behavior.

Figure 5 shows a tendency for clustering at 0-15cm (not significant) with two further tendencies for clusters at 25-40cm (not significant) and 60-80cm (not significant). These cluster extents can be seen within the raw data and density maps but may warrant further analysis prior to assigning a specified number of clusters to this segment.

Figure 5 Contains a raw data map of all artifact types at 74-100cm (top left), kernel density interpolation for all artifact types (top right), Ripley’s K corrected for inhomogeneity (bottom left), and the PCF corrected for inhomogeneity (bottom right). The red dashed line represents complete random distribution surrounded by a confidence envelope of the maximum and minimum distribution resulting from 99 Poisson simulations. The black line represents the observed distribution. If the observed distribution is above the red line, then it suggests a clustered behavior and if the observed distribution is below the red line, then it suggests a dispersed behavior.

Lastly, figure 6 shows the cross correlation between each artifact type within a PCF corrected for inhomogeneity compared between the 50-60cm interval and 60-74cm interval. This figure helps relate to my initial question as to how each artifact relates to the other within space. By addressing shifts in artifact-artifact relationships, I hope to strengthen my argument that people between these two intervals perceived and used space differently at this site. Through these graphics, there are some similarities and differences that we can identify. The correlation between bone and debitage is suggested to be closer at 60-74cm, while it begins around 20cm (not significant) at 50-60cm. The correlation between charcoal and debitage is suggested to be fairly similar between the intervals. This aligns with my corollaries showing that debitage and charcoal interact with one another only in specific domestic human behaviors. So, it can be assumed that this specific interaction may influence similar spatial patterns through time. The last correlation between charcoal and bone at these intervals is fascinating and warrants further investigation because the graphics almost appear to be mirror reflections of one another, showing a closer correlation at the 50-60cm interval.

Figure 6 Contains the cross correlation of Bone-Debitage, Charcoal-Debitage, and Charcoal-Bone operated on by a PCF corrected for inhomogeneity comparing both intervals at 50-60cm and 60-74cm. The red dashed line represents complete spatial randomness, the black line represents the observed distribution. If the observed distribution is above the red line, then it suggests a clustered behavior and if the observed distribution is below the red line, then it suggests a dispersed behavior.
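The cross-type comparison works like the univariate case but counts pairs across types; spatstat's `Kcross` and `pcfcross` are the corrected, real implementations. A naive sketch of the bivariate K statistic, with invented coordinates standing in for artifact locations:

```python
from math import hypot

def cross_k(type1, type2, r_values, area):
    """Naive bivariate (cross-type) K estimator (no edge correction):
    counts type-2 points within distance r of each type-1 point."""
    n1, n2 = len(type1), len(type2)
    dists = [hypot(p[0] - q[0], p[1] - q[1]) for p in type1 for q in type2]
    return [area / (n1 * n2) * sum(1 for d in dists if d <= r) for r in r_values]

# Illustrative coordinates for two artifact types (not the site data)
bone = [(0.2, 0.2), (0.25, 0.22)]
debitage = [(0.21, 0.19), (0.9, 0.9)]
k12 = cross_k(bone, debitage, [0.1], area=1.0)
```

Values above pi * r^2 would suggest the two types cluster around each other; values below suggest segregation.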


The major critiques I have of this approach are the effect borders have on statistical interpretations and, in a similar vein, the scale of the data. Borders have always been a barrier within statistical analyses (pun intended). The problem with borders is the absence of data lying beyond them, data that is both there and not there (kind of like Schrodinger’s cat). I would have to do more research into the specific mathematical properties affected by borders, but it is important to understand that they influence both calculations and interpretations. The second critique, on scale, is also important because Ripley’s K and a PCF can be run on very few data points, but they will always suggest the data are dispersed. From a mathematical view this is a correct and appropriate result, but it is trivial to interpret by simply looking at your raw data. So, do not waste time trying these functions on sparse data that is visibly dispersed already.

Assessing patterns of race and education in Portland over two decades

For Exercise 2 I changed the dataset and question that I am asking, so I will be sharing the results from a hotspot analysis (which would have been exercise 1) and a geographically weighted regression. I am interested in identifying gentrification in Portland, OR, so I am looking at race, income, and education data at the census-tract level. I obtained these data for 1990, 2000, and 2010 and have been experimenting to see which will be useful. An important note is that census tracts change shape over the years, so for this class I am using the 2010 census tracts and acknowledging the error associated with this decision. I obtained the census tracts for all of Multnomah County and trimmed them down to exclude large, sparsely populated tracts on the outskirts, tracts with no data in 1990, and some tracts east of the city of Portland (closer to Gresham).

The race and education data came as counts and the income is median income per census tract. I added in the raw population numbers in order to calculate race and education percentages. These are percentage of population that is white and percentage that has a bachelor’s degree or higher.
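The count-to-percentage step can be sketched as follows; the field names, tract ID, and numbers are illustrative, not the actual census schema:

```python
def to_percentages(tracts):
    """Convert raw census counts to percent-white and percent-bachelor's-or-higher.

    Each tract record is assumed to hold: total population ("pop"), white
    count ("white"), and count with a bachelor's degree or higher
    ("bachelors"). Field names are hypothetical placeholders.
    """
    out = {}
    for tract_id, rec in tracts.items():
        pop = rec["pop"]
        out[tract_id] = {
            "pct_white": 100.0 * rec["white"] / pop,
            "pct_bachelors": 100.0 * rec["bachelors"] / pop,
        }
    return out

# One made-up tract for illustration
tracts = {"41051000100": {"pop": 4000, "white": 3400, "bachelors": 1200}}
pcts = to_percentages(tracts)
```

Change-over-decade variables would then be differences of these percentages between census years for the same (2010-vintage) tract.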

The raw percentages of white population show that in 1990, Portland overall is over 80% white in most tracts, except between the Willamette River and the Portland Airport. Over the next decade the whole eastern side of the city becomes less predominantly white, a trend that continues to 2010. The area in the northeast that was less white to begin with saw an increase in the proportion of white population.

I conducted a hotspot analysis of the changes in each demographic from 1990-2000 and 2000-2010. This turned out to be very useful for finding the patterns in the data (more useful than when I ran it for my previous data). Below I have shown the hotspot map for change in race from 2000-2010 (which is similar to that of the previous decade). It becomes very obvious that there is an increase in white people (in red) in the north part of Portland (between the airport and the Willamette river) and a decrease in the eastern part of the city. I found this visualization of race data made by Portland State University that displayed a dot per person and assigned them a color indicating their race. The map swipes between 2000 and 2010 and confirms the pattern I see but provides insight that white people are replacing the black population in the northern part of the city and the population of people with Asian descent is increasing in the east.

Hot spot analysis of change in white population from 1990-2000. The red shows hot spots of where there has been an increase in proportion of the population that is white, and the blue shows a cold spot of where there has been a decrease in proportion of population that is white.

The hotspots for both education and income are generally in the same regions as the hotspots for race, in both decades, but are smaller. This led me to the next step, which was conducting a geographically weighted regression.

Understanding the spatial distribution of the different variables and their correlation is important to understanding their relationship. Since my thesis research will entail mapping gentrification in order to map the spatial correlation with urban gardens, I need to ensure that I have a reliable source of information for gentrification. My hope was that I could use these three variables to map gentrification.  My hypothesis for the pattern between the three variables is that they will be relatively similar. I imagine in such a white city there will be places with increases in income and no change in race. I also understand that there is an increase in Asian population in the east, which unlike black populations is not necessarily lower income on average.

I ran the Geographically Weighted Regression tool in ArcGIS in a variety of ways. I tested each of the three variables as the dependent variable and tried models with one and with two explanatory variables. Overall, I think this tool did not provide extremely useful results because too many of its assumptions are not valid here. The results show that the variables are somewhat correlated. The standardized-residual maps show that most census tracts have similar changes in the three variables. When change in race is the dependent variable and education and income are both explanatory variables, the northern region that is likely gentrifying is identified as having a higher standardized residual. The graphs show that change in race and change in education have some correlation, but change in income has very little correlation with either. The graphs below show these relationships between the variables in 2000-2010.

GWR results with change in race from 2000-2010 as the dependent variable and change in education and income as the explanatory variables. (race_00_1 is the change in proportion of population that is white, inc_00_10 is change in median income, and edu_00_10 is the change in proportion of the population with a bachelor’s or higher.)

Overall, I think that the Geographically Weighted Regression tool was not extremely useful. It helped identify that there is some correlation between the change in race and education, but the hotspots show a slightly different story. The hotspots show that the three variables have increases in similar parts of the city. It is possible that I am not using the tool correctly, but I believe that the assumptions needed to utilize the GWR tool are ones that I cannot make with this data.
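For readers unfamiliar with what the GWR tool does under the hood, the core idea is an ordinary regression refit at each location with distance-decayed weights. A minimal single-location, single-predictor sketch (all coordinates, values, and the bandwidth are made up; ArcGIS additionally selects bandwidths and reports local diagnostics such as local r-squared):

```python
from math import exp

def local_wls(target, xy, x, y, bandwidth):
    """Geographically weighted regression at one location: Gaussian-kernel
    weighted least squares for a single explanatory variable.
    Returns (intercept, slope) of the locally fitted line."""
    # Gaussian weight for each observation based on distance to the target location
    w = [exp(-((px - target[0]) ** 2 + (py - target[1]) ** 2) / (2 * bandwidth ** 2))
         for px, py in xy]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

# Illustrative tract centroids, change-in-education (x) and change-in-race (y)
xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
x = [0.10, 0.20, 0.15, -0.05]
y = [0.08, 0.18, 0.12, 0.30]
a, b = local_wls((0.0, 0.0), xy, x, y, bandwidth=1.0)
```

Because the fourth, distant tract is down-weighted to almost nothing, the local slope is driven by the three nearby tracts, which is exactly why GWR coefficients vary across the map.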

Exercise 2 – Spatial-Temporal Autocorrelation in Migration and Insurance Claims

1. Question and Significance

My goal for this exercise: to identify to what extent my county-level data is spatially and temporally autocorrelated. This is part of a larger effort to identify a suitable data sample that does not violate the assumption of independent observations. 

Exercise 2 significance in context: Do we see significant out-migration in instances of high flood claims or is there simply a trend of adjacent counties having similar values due to space and time similarities? Or simply, is this a local or larger scale phenomenon?

2. Tool or Approach Utilized

Originally I was interested in applying geographically weighted regression (GWR) to examine the spatial relationship between flood insurance claims and migration flows. After reading in depth about GWR, I found that not all assumptions of ordinary least squares could be satisfied. Thus, I was interested in finding suitable subsets within my data that could be used for analysis. I was primarily concerned with spatial and temporal autocorrelation (violations of the assumption that all observations are independent). To examine the extent of autocorrelation, I created subsets of my data and a series of scatterplots to visualize the spatial and temporal relationships of adjacent counties.

3. Description of Methods

I first created two subset datasets, one of urban, well-populated adjacent counties and one of rural, low-population adjacent counties, all in Florida. Florida was used due to its high participation in flood insurance. I used Broward County as the metro county of comparison (a high-population center) against adjacent high-population counties, and a subset of three rural, low-population counties in northeastern Florida, with Dixie County as the primary county for comparison with its adjacent rural, low-population counties.

I then explored trends in the data related to spatial autocorrelation in urban and rural out-flow rates over time. I plotted one urban county’s out-flow rate against the out-flow rates of adjacent counties, and did the same for rural counties. I repeated this for a hurricane period and a non-hurricane period, and finally followed the same steps for hurricane and non-hurricane periods with the variables of in-flow rates and claims per county population.

Next, I examined temporal autocorrelation by selecting a county and plotting out-flow rates at time 1 on 1 axis and time 2 on another axis for both hurricane and non-hurricane periods with both urban and rural counties. I then tried this with the variable of claims per county population. 
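The time-1 vs. time-2 plots are essentially a visual check of lag-1 autocorrelation, which can also be computed directly. The series below is invented, not actual county out-flow data:

```python
from statistics import mean

def lag1_autocorr(series):
    """Lag-1 autocorrelation: correlate the series at time t with itself at
    t+1 (the quantity behind the time-1 vs. time-2 scatterplots)."""
    x, y = series[:-1], series[1:]
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Illustrative annual out-flow rates for one county (not real data)
outflow = [0.020, 0.021, 0.022, 0.024, 0.026, 0.027, 0.028]
r = lag1_autocorr(outflow)
```

Values near 1, as this smooth illustrative series produces, are the signature of temporal autocorrelation that violates the independence assumption.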

4. Brief Description of Results

I first examined the general trend of out-flow rates across the whole time period (1990-2010). Looking at the plots above, we can see a general relationship between out-flow rates in spatially adjacent counties across time.

A) Hurricane Period Out-Migration Plots

Looking at these two plots, there are obviously far fewer data points, meaning they will be much more affected by outliers. For the metro hurricane period (1998-2001) we see similar out-flow rates, with higher rates in adjacent counties. For the rural hurricane period (1994-1996) we see generally higher out-migration rates with a somewhat positive linear relationship, meaning that as the Dixie County out-migration rate increases, the adjacent counties’ out-migration rates also increase (not causal).

B) Non-Hurricane Period Out-Migration Plots

The metro county out-migration rates reflected here are for the lowest insurance-claim years (1990, 1993, 2002, 2003), and the rural county years are 1999-2002. We see no real discernible trend for non-hurricane metro out-flow rates, which seem approximately similar to the metro hurricane-years plot above. The rural non-hurricane period also does not demonstrate a significant linear trend, with approximately the same values as the hurricane years. This is likely due to the small number of data points. I saw similar results when plotting the variables of in-flow rates and claims per county population.

Temporal Autocorrelation Plots

A) Hurricane Period Time 1 and Time 2 for single County Out-migration Rates

These plots are the result of subsetting Broward County (metro) and Wakulla County (rural) out-flow rates at different times on the X and Y axes during a hurricane period. Out-flow rates seem relatively constant across the different times, indicating evidence of temporal autocorrelation.

B) Non-Hurricane Period Time 1 and Time 2 for single County Out-Migration Rates

These plots give similar results to the hurricane period, although the Broward plot does not have enough non-hurricane time periods to show the trend. As with the hurricane period, we see evidence of temporal autocorrelation here. Similar results were seen with other variables such as claims per county population and in-flow rates.

Critique of Method

This was a useful exercise to help examine the spatial and temporal relationships at a small scale. I would say this was useful in painting a partial picture of the extent of the autocorrelation issue, however, not a complete one. This method did not help me narrow down a sample dataset to use for future methods and I am unsure of what my next steps are in continuing with this data in an attempt to fit an econometric model. I would be curious to find some sort of model that allows looser assumption requirements or transformation methods that may help me overcome these autocorrelation issues.

Greenland Ice Velocity vs. Terminus Retreat (II)


I decided to simplify my exercise 2 for a number of reasons: primarily because the raw annual data turned out to be more interesting than I anticipated, and because I wanted to leave some tools/questions for exercise 3 (also, generating annual data was no cakewalk). As a result, exercise 2 has become entirely focused on the annual correlation between terminus retreat of a glacier and the corresponding glacier velocity at the terminus location. ArcGIS Pro was used to generate the terminus retreat distances for each available year, to create very small buffers around termini, and to extract underlying velocity raster values. Once a master dataset was generated, I assessed the relationships in Excel.


1. For any given year in my dataset, what is the correlation/relationship between the distance of individual glacier terminus retreat and individual glacier velocity (on both the individual and Greenland-wide scales)?

Simplified: At “T1” how is local glacier velocity correlated to local glacier retreat?

Context from our lab exercises:

  1. What kinds of enhancing or limiting processes might occur that allow B to influence A, in space?

I presume that glacier velocity and retreat distance are linked via the mechanism of basal melting (wet basal conditions not only allow glaciers to move faster, but can also be indicative of negative mass balance). Fully understanding basal slip is still a major problem in glacier physics. My hypothesis is based purely on the fact that glacier velocity seems to increase early in the melt season, indicating it is linked to negative seasonal mass balance. The spatial influence is obvious in that the retreat and velocity are measured on the same glacier. More distal influences could include snow accumulation and ice temperature.

  2. What kinds of enhancing or limiting processes might occur that allow B to influence A, in time?

This is a more difficult question. I would think that a glacier that experiences basal slip at T would be more likely to experience it at T+1, in a sort of positive feedback. Basal melt could reduce mass balance, which reduces thermal inertia and in turn results in more basal melt. That however does not explain how a brief pulse of velocity at T, followed by a slowdown at T+1, would influence terminus retreat at T+1. I cannot think of a mechanism in which velocity at T would influence terminus retreat at T+1 in this scenario.

APPROACH: “Buffer, Extract, and Correlate”

ArcGIS Pro Tools used:

(for generating annual datasets: Feature Vertices to Points, Near, Summary statistics, Join field)

Buffer, Construct Pyramids, Extract Multivalues to Points, Join field, Table to Excel

Excel Tools used:

Time series (plots), R-squared regression analysis, normalization


Data was manipulated as such:

(step 0: Using “Approach #2” from exercise 1, annual datasets of terminus retreat were generated, and annual velocity rasters were added to the map)

1. The BUFFER tool was used to create very small 500m buffers around the glacier point data (because the velocity data comes at 500m resolution, each buffer can be thought of almost as a point source, but not quite).

2. For some reason the raster datasets had pyramids, so I had to go through and use CONSTRUCT PYRAMIDS and set each to 0.

3. The EXTRACT MULTIVALUES TO POINTS tool was used to extract and average the velocity values sitting directly beneath the buffer shapes for each year in the dataset. These data are automatically added to the buffered shapefile’s attribute table (see below).

4. The TABLE TO EXCEL tool was used to export the buffer attribute table, containing terminus retreat and underlying velocity data for every available year, to an Excel file. I find it easier to manipulate and visualize data in other software, as Arc can get a bit clunky.

5. Time series plots containing both distance retreated and local velocity were constructed for every glacier and visually inspected.

6. R-squared linear regression plots were plotted together for each year to assess the statistical correlation between the two variables at T1.

7. Glacier velocities and retreats were normalized on individual glacier scales (for each glacier, the study-period mean was subtracted from each annual value, which was then divided by that mean) to eliminate absolute-magnitude effects.

8. Normalized data was plotted in R-squared linear regression plots.

9. Time series were averaged over the entire island on an annual basis, and plotted, visually inspected.
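The normalization in step 7 can be sketched as follows (the velocities are illustrative, not the real glacier data):

```python
from statistics import mean

def normalize_by_mean(values):
    """Relative anomaly used in step 7: (annual value - study-period mean) / mean.
    Removes each glacier's absolute magnitude so fast and slow glaciers
    can be compared on the same relative scale."""
    m = mean(values)
    return [(v - m) / m for v in values]

# Illustrative annual velocities (m/yr) for one glacier over seven years
velocity = [100.0, 110.0, 90.0, 120.0, 100.0, 95.0, 85.0]
anom = normalize_by_mean(velocity)
```

By construction the anomalies sum to zero for each glacier, so a value of 0.2 means "20% above that glacier's own study-period mean," regardless of whether the glacier is fast or slow in absolute terms.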


The methods I used ended up being simpler than I anticipated because the results I ran into were interesting and perplexing, and I wanted to investigate them further. To assess glaciers on an individual scale, I created a map with inlaid data (I originally intended to include more time series, but I ran into a data-alignment error that I need to figure out, hence some arrows leading nowhere):

Because I assume this will come out at terrible resolution in the blog post: in the time series, red lines represent glacier velocity and blue lines represent terminus retreat, plotted by year. The raster overlay is Greenland surface velocity in 2017, measured in meters/year. The points represent each glacier terminus and are colored according to overall retreat (2005-2017). For some glaciers, velocity over the years roughly follows retreat, particularly in the northwest of the island. For other glaciers, notably several on the east coast, velocity does not appear particularly related to terminus retreat. Another purely visual assessment is that, on identical color scales, the dots (retreat magnitudes) do not always match the underlying velocities; in simpler terms, just by the eye test, the fastest glaciers do not always retreat the fastest. But perhaps there is an overall trend? To assess this, I made a simple r-squared linear regression graph for each of these (~240) pairs, color-coding each point by year:

The results indicate no clear correlation. Each point on the graph represents an individual glacier in a specific year, so each glacier is represented a total of 7 times. The linear R-squared relationship between these two variables is never greater than 0.002 for any individual year. As a result, we can say that there is no correlation between the ABSOLUTE magnitude of glacier velocity and the ABSOLUTE magnitude of glacier retreat. That still doesn't explain why a fair number of glaciers appear to have curves that follow each other on relative scales. After all, when each year is averaged out among all 240 glaciers, we get the following time series graph:
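For illustration, the per-year R-squared check boils down to a simple least-squares fit between paired values. A hypothetical Python sketch (one velocity/retreat pair per glacier for a single year; the values are made up, not actual glacier data):

```python
def r_squared(x, y):
    """R^2 of a simple least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy ** 2 / (sxx * syy)

# One (velocity, retreat) pair per glacier for a single year (hypothetical):
velocity = [1200.0, 3400.0, 800.0, 5600.0, 2100.0]  # m/yr
retreat = [40.0, 35.0, 52.0, 38.0, 47.0]            # m/yr
print(r_squared(velocity, retreat))
```

An R-squared near zero, as here, means knowing a glacier's absolute speed tells you essentially nothing about its absolute retreat that year.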

Here, we notice two things in particular. First, the overall curves seem to covary to a certain extent. Second, there appear to be two “steady states,” so to speak: one pre-2008 and one post-2008. Before 2008, ice retreat and ice velocity were both almost negligible compared to their post-2008 counterparts. The correlation between the two can be isolated by plotting the 240-glacier averages over the course of the study period together:

Here, each point represents a single year. When annually averaged across all glaciers, the R-squared of terminus retreat vs. ice velocity becomes 0.53. When 2006 and 2007 are removed, that R-squared jumps to 0.65. These values are more in line with what I was expecting to see based on the map and time series produced above. So how can I reconcile the two linear regressions? Clearly the two variables are connected in some way, but I need to figure out how. My first thought is that perhaps I need to eliminate the absolute magnitude of the variables for each glacier and normalize them into relative magnitudes. That way I can compare relatively large years and small years in a less biased way; in other words: for each glacier, does a relative change in velocity correlate with a relative change in terminus retreat? I normalized the measurements for each glacier by subtracting the study-period mean from each annual measurement and dividing by the mean, plotted here:
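The normalization described here is just a mean anomaly expressed as a fraction of the mean, i.e. (x - mean) / mean. A minimal Python illustration (the retreat values are hypothetical, not actual glacier data):

```python
def normalize(series):
    """Mean-normalized anomaly: (x - mean) / mean for each annual value."""
    mean = sum(series) / len(series)
    return [(x - mean) / mean for x in series]

retreat = [10.0, 12.0, 8.0, 30.0, 40.0]  # m/yr over five years (hypothetical)
print(normalize(retreat))  # prints [-0.5, -0.4, -0.6, 0.5, 1.0]
```

A value of 1.0 means that year's retreat was double the glacier's own average, regardless of whether the glacier is large or small in absolute terms.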

While the R-squared here is no better (and is not even displayed), the graph does depict a reality with two clear steady states: one pre-2008 and one post-2008. Normalized velocities were minuscule before 2008, and then quite significant after. Furthermore, the normalized terminus retreat appears to be far more variable in the years post-2008 (the points tend to span a taller height on the graph). These are clear trends I want to dig further into in Exercise 3.


If I were to go back and do this exercise over again, I would probably increase the size of the buffer. In creating such small buffers, I created some shapes that were not even overlapping with velocity data (in many cases, velocity data stopped farther up-valley than the terminus location; e.g., the blue circles from my first figure would be sitting over nothing in some years). Buffer zones under these conditions extracted nil values and left the velocity data relatively sparsely populated. While retreat values were well populated for most glaciers, it was somewhat difficult to find glaciers with a full 8 years of velocity data. As a result, all of these correlations are performed on velocity data that is more sparsely populated than I would like. Several of the glacier time series are missing many years of velocity data that could hold important information.

I would say that overall I am disappointed with how much I got done in exercise 2, though I think in order to do this project correctly it is important to look closely at the data I produce before asking totally new questions. I got stuck in a bit of a rabbit hole looking at all the data I produced and took it out of spatial context. Plus, the project is pretty fluid since it was just generated for this class, so I am free to investigate things that capture my interest rather than a specific hard-nosed question.


Looking forward to Exercise 3, I now want to begin investigating varying buffer sizes and calculating and observing rates of change (first derivatives), as well as possibly time lags (though looking at the time series data, I doubt time lags are going to have a big effect on any of the correlations). I also want to do more spatial cross-correlation in the following exercise to investigate these factors (perhaps IDW, and more Moran's I on varying radii).

Geographically Weighted Regression Analysis on Benthic Organism Count and Total Organic Carbon off Newport, OR

For Exercise 2, I continued my investigation into the relationship between benthic abundance and total organic carbon (TOC) between Point Conception, CA, and the Canada border with a special emphasis on a region off Newport, OR. For the Newport region, I hypothesize that there is a seasonally fluctuating positive relationship between benthic organism count and TOC. I further hypothesize that commercial fishermen tossing bait overboard, river discharge, summer upwelling, and the dumping of dredge materials outside Yaquina Bay impact this relationship. To investigate my hypothesis, I conducted geographically weighted regressions (GWRs) on benthic box core data from five months (April, June, August, October, and December) in 2011. I have ~12 samples from each of the months. Although the analysis did show some interesting results, I am skeptical of these results due to the lack of variation between data taken from each month and small sample size. For all months, TOC levels were less than 1% and mean organism abundance was about the same. I will run the samples through a t-test to determine if there is any statistical difference between samples from each month. Also, I discovered the GWR requires a large sample size (in the hundreds at least) to produce trustworthy results. Accordingly, I expanded my sampling to a larger area off Newport and included monthly data from all sampled years (2010-2016) so that I could have 100+ data points for each month (April, June, August, October – no December data was available for other years). The results of my analysis on the expanded area differ from my results from the limited region, suggesting a negative relationship between organism abundance and TOC for some months.

These conflicting results led me to expand my investigation area even further; I looked at the relationship between organism abundance and TOC for the entire Oregon Coast and the entire sampling area, with mixed results. Because of my inconclusive results, I believe that I need to take a more focused approach with regard to the species I'm investigating. My advisor, Dr. Sarah Henkel, suggested that I look at the relationship between TOC and the ratio of polychaetes (bristle worms) to crustaceans in order to investigate potentially polluted areas. Previous researchers have determined that the ratio between polychaetes and crustaceans can be used as an indicator of pollution. This line of inquiry could help me to identify areas where increased TOC results from pollution (like the dumping of dredge materials and agricultural discharge). I also decided that I need to spend some time reading more papers about benthic TOC levels in coastal areas in order to understand why TOC on the Oregon coast is so much lower than in other areas. One hypothesis is that benthic TOC is low because the waters off the coast of Oregon are incredibly productive, especially during summer upwelling. Huge plankton blooms occur, and those plankton are preyed upon by other organisms before they die and sink to the bottom. Oregon's rivers also transport large quantities of nitrogen and carbon into coastal areas (especially during the fall-winter rainy season). However, the same storms that cause nutrient transport also cause high wave energy and mixing along the coastline. Transported nutrients therefore remain suspended rather than settling in the benthos. Previous researchers have described both a shift in benthic organismal community structure and a “TOC horizon” off the Oregon coast at 90 m depth. Perhaps the coastal processes of upwelling and wave action (among others, e.g. longshore transport) are less impactful on benthic TOC accumulation at that depth.

For this blog post, I will explain the work I’ve done so far using figures and then muse over the process of conducting geographically weighted regressions, keeping my hypotheses in mind.

These points depict the mean centers of each sample collection site. The numbers tell you the approximate depth of each site. All samples depicted here were taken in 2011.

For my initial analysis, I examined 12 sample sites off Newport, OR, in five months (April, June, August, October, and December). All of the samples were in slightly different locations because, of course, it's impossible to get a boat back to the exact same spot. The points depicting the sample sites above are placed at the centroids of the true sample points. Not all of the stations had samples for each of the five months, but most of them did. I ran a geographically weighted regression analysis for each month using a fixed kernel size based on AICc (corrected Akaike Information Criterion). I would have preferred to run the analysis with an adaptive bandwidth parameter using the 30 nearest sample points to a given sample, but I could not use that approach because I had an n value of ~12/month.

These graphs illustrate the seasonal changes in GWR coefficient and R2 values at each sample site.

In order to understand my results, I graphed the GWR regression coefficients and r2 values for each site (note that MB40 only has four months of data and NS50 has three). The graphs reveal a spatially complex and seasonal relationship between organism count and TOC within the sample area. Overall, I saw the strongest positive relationships between organism count and TOC in April and August. The largest r2 values were associated with April (~.6 at some sites) and June (~.7 at some sites) sampling. These results could support my hypothesis that discarded bait from the commercial Dungeness crab fishery in the winter and early spring causes an increase in TOC and organism abundance. The results for August also suggest that organism counts increase as a result of spring and summer upwelling. However, as I mentioned earlier, I am skeptical of these results because of the small sample size and the relatively small differences in TOC and organism count between months. For instance, I believe the August results might be skewed by one sample site (BB50) that had over 200 bristle worms (Spiophanes norrisi) in a single box core.

200+ Spiophanes norrisi were collected at one sample site in August 2011.

To increase my sample size, I included samples taken from 2010-2016 off Newport, OR. Unfortunately, the only December samples that I have are from 2011, and so n=12 even in my extended analysis. The other sample months were April, June, August, and October. For this analysis, I used an adaptive GWR based on 30 nearest neighbors to each sample point. While my analysis of the smaller area only returned one negative coefficient (August 2011 @ NS50), the results from each month in my larger analysis area returned both positive and negative coefficients for each month. I created a box and whisker plot of the results to visualize this relationship more clearly.
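Under the hood, GWR fits a separate weighted regression at each location, down-weighting distant samples; with an adaptive kernel, the bandwidth at each point is set by the distance to its k-th nearest neighbor. Below is a bare-bones Python sketch of one such local fit using a bisquare kernel. The coordinates and values are made up, and the real analysis used the GWR tool in ArcMap, which additionally selects bandwidths and reports diagnostics:

```python
import math

def local_coefficient(target, coords, x, y, k):
    """Local slope of y on x at `target`, with adaptive bisquare weights."""
    d = [math.dist(target, c) for c in coords]
    bw = sorted(d)[k - 1]  # adaptive bandwidth: distance to the k-th nearest neighbor
    w = [(1 - (di / bw) ** 2) ** 2 if di < bw else 0.0 for di in d]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return num / den

# Hypothetical sample sites: two clusters with different count/TOC regimes
coords = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
toc = [0.4, 0.5, 0.45, 0.9, 0.95]      # % TOC (hypothetical)
count = [120, 150, 140, 60, 55]        # organisms per core (hypothetical)
print(local_coefficient((0, 0), coords, toc, count, k=4))
```

Because each location gets its own fit, one part of the study area can show a positive coefficient while another shows a negative one, which is exactly the pattern seen in the expanded Newport analysis.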

Box and whisker plots of seasonal correlation coefficient in the extended Newport sampling area, depicted in the included map.

I also created box and whisker plots of the organism count and TOC data from each month from the extended Newport sample area (pictured above).

Seasonal box and whisker plots of total organism count from the extended Newport sample area, 2010-2016.
Seasonal box and whisker plots of TOC from the extended Newport sample area, 2010-2016.

My results show seasonal complexity, with the peak mean GWR regression coefficient, mean organism count, and mean TOC all occurring in August. Notably, all months included both positive and negative GWR coefficients. Both April and October have negative coefficient outliers. Also, as you can see, the TOC box and whisker plots are somewhat ugly in that the distributions are non-normal. A few “next steps” will be to explore methods of data transformation and to investigate non-linear relationships or relational thresholds within the organism count/TOC correlation. Dr. Henkel told me that she usually uses a fourth-root transformation of organism count data. I suspect that if there is a linear relationship between organism count and TOC, that relationship might only hold true within a range of values. Specifically, I hypothesize that extremely high TOC might be correlated with decreased organism count above a certain level.
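As a quick illustration of why the fourth-root transform helps with right-skewed counts (the counts below are hypothetical): it pulls extreme values, like the 200+ worms in a single core, much closer to the rest of the distribution.

```python
# Fourth-root transform of right-skewed count data (hypothetical counts)
counts = [3, 15, 42, 88, 256, 1296]
transformed = [c ** 0.25 for c in counts]
print([round(t, 2) for t in transformed])  # 256 -> 4.0 and 1296 -> 6.0
```

The raw counts span three orders of magnitude, while the transformed values span less than one, which makes them far friendlier to regression assumptions.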

To provide context for my Newport area results, I conducted an adaptive GWR using 30 nearest neighbors on an EPA 2003 dataset that ranges from Point Conception to Canada. Unfortunately, I do not know specifically what months the data was collected, but I know that it was taken some time in the summer or early fall. I will dig into the metadata to see if I can find that information. In any case, the results allowed me to see the complexity in the relationship between organism abundance and TOC along the coast.

GWR coefficients from EPA 2003 data.

Red values represent positive GWR coefficients and blue values represent negative GWR coefficients. Because I know that the relationship between organism count and TOC varies seasonally, some of the apparent spatial variability depicted above could actually be seasonal variability; more investigation is required.

My GWR Process

For me, the process of conducting GWR in ArcMap highlighted the impacts of sample size, spatial scale, and adaptive vs. fixed kernel size on results. When I looked at a small sample size (off Newport), the resulting GWR coefficients were largely positive. However, when I expanded the sample area, I saw both positive and negative coefficients within each month. I also noticed that the settings I selected within the GWR tool in ArcMap caused variation in the results. This observation provided me with an example of why it is so important to select the appropriate statistical methodology based on the type, spatial scale, distribution, and sample size of your dataset. For my analysis, I found the adaptive 30n kernel to be most appropriate and useful. 

A Century of Oregon Salt Marsh Expansion and Contraction (Part II)


As a reminder, my ultimate research goal is to quantitatively characterize salt marsh volumetric change from 1939 to the present within Oregon estuaries using a combination of sediment cores, for which I have excess 210Pb-derived sediment accumulation rates, and historical aerial photographs (HAPs), from which I plan to measure horizontal expansion and retreat. For my Exercise 1, I successfully georeferenced and digitized HAPs from Alsea Bay that are roughly decadal, spanning 1939 to 1991. The next major step is analyzing my data using the USGS Digital Shoreline Analysis System (DSAS; Himmelstoss et al. 2018). However, prior to analyzing my data in the framework, I must first tie up a number of loose ends; I will explore the following questions for my Exercise 2:


  1. What is an appropriate method of quantifying uncertainty associated with each salt marsh outline?  
  2. How should shoreline be defined? More specifically, to what degree should I incorporate channels into my digitization of each layer?
  3. How should I define a modern shoreline?


Step 1: Determine Uncertainties

Each digitized layer must have an associated uncertainty to be run through DSAS. Moore (2000) details all possible errors associated with analyzing aerial photography.

  1. Radial and tangential distortions from camera lenses, along with film deformations caused during collection, processing, and storage, combine to produce significant image-space distortions. Distortions are theoretically accounted for by the transformations applied to georeferenced images. I used a second-order polynomial transformation, which is appropriate for HAPs. However, errors are associated with each ground control point, and without ground-truthing, which is beyond the scope of this study, it is practically impossible to determine the accuracy of my georeferencing. As others have done, I will report the root mean square error (RMSE) associated with each georeferenced aerial image.
  2. Moore (2000) additionally details that objects may appear displaced from their true ground positions due to differences in relief, distance from the center of the photo, camera tilt, flight altitude, and atmospheric refraction. In my HAPs, errors associated with object-space displacements are particularly great along the upslope edges. However, as I wrote in my Exercise 1 blog post, I am not interested in retreat of the upslope edge. Laura Brophy and her colleagues have been working on this issue (Brophy et al. 2019), and I do not believe that the rate of retreat will be significant in my timeframe of interest. Currently I have digitized the upslope edge of all fringing marsh using PMEP's (Pacific Marine and Estuarine Fish Habitat Partnership) Current Wetland Extent Map. In the future I might either use polylines rather than polygons, or I will mask out the tree line in the finished product.
  3. Other errors are associated with interpretation and digitization of the features of interest (Moore 2000). Others have simply reported the scale at which their images were digitized. I considered comparing my own modern digitization to PMEP's Current Wetland Extent Map; however, that map is primarily based on LiDAR imagery with 5 m resolution, which is much coarser than the resolution of my imagery. Thus, to account for errors associated with digitization, I will adopt the common method of assuming 1 m of uncertainty based on previous studies (Ruggiero et al. 2013).

Based on Ruggiero et al. (2013), I plan to characterize error for each year using the equation:

Up = √(Ug² + Ud²)

where Up is the total uncertainty of the shoreline position, Ug is the uncertainty associated with georeferencing (the RMSE of the control points used to georeference each image), and Ud is the uncertainty of the digitization (1 m). Uncertainty associated with NOAA T- and H-sheets is considered to be 10 m (Ruggiero et al. 2013).
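In code, this quadrature sum is a one-liner. The RMSE values below are hypothetical, but note that with Ud fixed at 1 m, plausible georeferencing RMSEs produce totals in the 1-4 m range:

```python
import math

def total_uncertainty(u_g, u_d=1.0):
    """Total shoreline-position uncertainty: quadrature sum of the
    georeferencing RMSE (u_g) and digitizing uncertainty (u_d, in m)."""
    return math.sqrt(u_g ** 2 + u_d ** 2)

for rmse in [0.5, 2.0, 3.66]:  # hypothetical per-image RMSEs (m)
    print(round(total_uncertainty(rmse), 2))  # 1.12, 2.24, 3.79
```

Adding the terms in quadrature (rather than summing them) reflects the assumption that the two error sources are independent.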

Step 2: Reduce Channel Complexity

Though I have not assessed the impact of scale on the overall area of each marsh platform, I hypothesize that smaller spatial scales would result in decreased area as more and more tidal creeks are incorporated. Because I have digitized my salt marshes at the smallest spatial scale possible for each photo, and the highest-quality HAPs are 1939 and 1952, these may erroneously appear relatively too small due to higher channel density than in the lowest-quality HAPs, 1983 and 1991. Thus, I must decide how to correct for differences in salt marsh channel digitizing caused by differences in spatial and spectral resolution between photographs, as this alters the overall area of the salt marsh complex.

Preliminary results suggest that resampling the highest-resolution photograph (1952, 0.2 m) to the cell size of the lowest-resolution photograph (1983, 0.7 m) using the ArcGIS Pro Resample tool (Nearest Neighbor and Cubic resample types) does not produce a noticeable difference. Thus, it appears as though differences in spectral resolution primarily drive the apparent differences in image quality, and thus the ability to distinguish tidal creeks. Unfortunately, because no metadata besides date was provided by the UO Historical Photography Library, assessing spectral quality is outside the scope of this study.

My next attempt to standardize channel digitization was to limit channels to widths greater than or equal to the uncertainty associated with each image. This was performed using the ArcGIS Pro Buffer tool. I simply selected the marsh polygon of interest, ran the Buffer tool with the buffer distance set to the error (e.g., 1 m), then selected that buffered polygon and buffered it again, specifying the buffer distance as the negative error (e.g., -1 m). Comparison of the second buffered polygon with the original indicates that little to no resolution was lost on the marsh edges but that the channels were appropriately limited. I then deleted the original polygon and the first buffered layer.
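The buffer-out/buffer-in trick is equivalent to a morphological closing: any channel narrower than roughly twice the buffer distance is filled in, while the outer marsh edge is left nearly unchanged. A toy raster analogue in Python (a 0/1 grid with 1 = marsh; this illustrates the geometry only, not the actual ArcGIS workflow):

```python
def morph(grid, op):
    """Apply op (max = dilate, min = erode) over each cell's 3x3 neighborhood."""
    rows, cols = len(grid), len(grid[0])
    return [[op(grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols)))
             for c in range(cols)]
            for r in range(rows)]

def close_gaps(grid):
    """Dilate then erode: fills gaps narrower than the 3x3 window,
    just as buffering by +d then -d removes channels narrower than ~2d."""
    return morph(morph(grid, max), min)

marsh = [[1, 1, 0, 1, 1],  # a one-cell-wide channel splits the marsh
         [1, 1, 0, 1, 1],
         [1, 1, 0, 1, 1]]
for row in close_gaps(marsh):
    print(row)  # the narrow channel is filled in every row
```

A channel wider than the window survives the closing unchanged, which is the desired behavior: only sub-resolution creeks are standardized away.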

Upon completion of this first round of buffering, I realized that because the error is different for each year, the channel density was still different between years. Thus, I repeated the first step using the worst uncertainty estimate, 3.8 m. At this point all marsh outlines should have a channel density that is not primarily driven by differences in spectral resolution.

Step 3: Determine a Modern Shoreline

As stated previously, the PMEP map uses LiDAR with a resolution of 5 m, which is too low for comparison with my HAPs. An OSU MS thesis from 2004 (Russel Scranton, 2004) digitized Alsea Bay using digital ortho quads from 2001 and ground-truthing with RTK GPS. I incorporated this layer, though it required some edits made using the original 2001 digital ortho quads. Because I am comparing horizontal growth rates with sediment core derived vertical accretion rates (and my cores were collected between 2013 and 2017) I then needed to decide what imagery to use from the last ~20 years. I settled on 2009 and 2018 as these years have aerial imagery taken at relatively high resolution (< 0.5 m). I digitized these in a similar manner to how I digitized my HAPs.   

Step 4: Merge Data into a Single Feature Class

DSAS requires data be merged into a single feature class using the Merge tool in ArcGIS Pro. This file must also have two columns in the attribute table noting the date and uncertainty of each shoreline. I specified Jan. 1 for each date as I mostly only had the year of collection, and I specified the uncertainty using the above method. Originally, I merged my polygon data but after realizing DSAS only runs with polyline data, I converted each individual layer using the Polygon to Line tool in ArcGIS Pro, and then re-Merged and edited the attribute table of my shoreline layer. Before placing the data into a personal geodatabase that will be accessed by DSAS in ArcMap, I made certain my layer was in meters.  

Step 5: Initial DSAS Run

DSAS requires a baseline layer for orientation when casting transects. I decided to use my 1939 polyline buffered at 10 m as the baseline. In ArcMap, I specified my baseline, making certain that the program would cast offshore. I then performed an initial casting of transects using a 10 m search radius, 10 m transect density, and a value of 50 for transect smoothing. After calculating transect statistics, I colored my transects based on rates of growth or erosion.         


Final results from this project so far relate to the estimates of uncertainty and the spatial resolution for each year. Uncertainty ranged from 1.0 to 3.8 m amongst shorelines digitized from the last century. These values are actually quite low compared to other similar studies (e.g., Schieder et al. 2018). Because Oregon salt marshes are typically much smaller than those on the US East and Gulf coasts and in Europe, my georeferencing RMSEs are likely smaller, as images need not be stretched over great distances.

The 1887 shoreline has an uncertainty of 10 m, but I have not yet included it in the analysis. Because the change in shoreline shape between 1887 and 1939 is so stark, it should be analyzed separately from the 1939-to-present data.

The spatial resolution ranged from 0.30 to 0.73 m. Estimating spectral resolution is beyond the scope of this study.

Table 1. Uncertainties (calculated based on georeferencing RMSEs and a digitization uncertainty of 1 m) and spatial resolution for each year.

Year | RMSE (m) | Total Uncertainty (m) | Cell Size X (m) | Cell Size Y (m)
1887 |          | 10                    |                 |

The initial DSAS run clearly highlights that, though there are a few areas of accumulation, erosion dominates in the Alsea Bay salt marshes. It appears as though the northern edge of the largest salt marsh island has been experiencing the greatest rates of erosion. This pattern is likely related to the diking of the northern fork of the river in the middle of the century. This would have reduced connectivity, and thus sediment transport, to this portion of the bay, resulting in erosion under rising relative sea level. The northern portion of the fringing marsh appears to have experienced the slowest rates of erosion. The reason for this is unclear but is potentially related to oscillating periods of accumulation and erosion, as evidenced by a detailed look at the different shoreline layers.

Figure 1. Preliminary results: DSAS calculated transects displaying rates of erosion (red = 4.1 m/y) and expansion (blue = 0.6 m/y; white = neutral) of the Alsea Bay salt marshes from 1939 to 2018.

Despite these initial exciting results, there are clearly many areas where this initial DSAS run can be improved and these are both highlighted in the image and briefly outlined below.


As a general note, I am very happy I chose to move forward with just Alsea Bay salt marshes for this project, rather than georeferencing and digitizing all of my salt marshes at once. Determining all of the little quirks of the Editing tools in ArcGIS Pro and the specifications of DSAS on one dataset will allow me to move much faster through successive datasets. For instance, I did not realize that DSAS required polylines rather than polygons. Going forward I will digitize all of my marshes using polylines rather than polygons and I believe this may actually save me time since I will not have to digitize any upslope edges.   

The methods I used for determining uncertainty seem to be universally accepted given the difficulty of estimating true accuracies of shoreline positions back through time. I will continue to use this method for the remainder of this project.

The Buffering tool saved me a lot of time. Considering a digitization for a single year takes me approximately one day to complete, not having to re-digitize at a universal resolution or to a specific channel width allowed me to make more progress in other aspects of Exercise 2.

I am really excited to move forward with the DSAS program. For my next exercise, I plan to play around with the DSAS settings to improve the transect projections. Depending on the difficulty of this, I may also have time to begin incorporating my sediment core data. Below are some initial areas in which the DSAS projection could be improved.

Figure 2. Closeup view of transect projections.
  1. Remove or mask out all upslope edges since these are not of interest and all digitized using the same PMEP boundaries.
  2. Remove or mask out all complexity within the salt marsh complexes, especially within the northern portion of the larger island and the southern portion of the fringing marsh.
  3. Play around with the transect search distance to remove transects bridging the gap between the largest and smallest islands. This may require running each marsh complex separately (which I would rather not do, so that all transect symbology is on the same scale).
  4. Play around with transect smoothing to reduce the number of intersecting transects in areas with complex shoreline.   


Brophy, L. S., Greene, C. M., Hare, V. C., Holycross, B., Lanier, A., Heady, W. N., … & Dana, R. (2019). Insights into estuary habitat loss in the western United States using a new method for mapping maximum extent of tidal wetlands. PLoS ONE, 14(8).

Himmelstoss, E.A., Henderson, R.E., Kratzmann, M.G., & Farris, A.S. (2018). Digital Shoreline Analysis System (DSAS) version 5.0 user guide: U.S. Geological Survey Open-File Report 2018–1179, 110 p., ofr20181179.

Moore, L. J. (2000). Shoreline mapping techniques. Journal of Coastal Research, 111-124.

Ruggiero, P., Kratzmann, M. G., Himmelstoss, E. A., Reid, D., Allan, J., & Kaminsky, G. (2013). National assessment of shoreline change: historical shoreline change along the Pacific Northwest coast. US Geological Survey.

Schieder, N. W., Walters, D. C., & Kirwan, M. L. (2018). Massive upland to wetland conversion compensated for historical marsh loss in Chesapeake Bay, USA. Estuaries and Coasts, 41(4), 940-951.

Scranton, R. W. (2004). The application of Geographic Information Systems for delineation and classification of tidal wetlands for resource management of Oregon’s coastal watersheds. MS thesis, Oregon State University.

Exercise 1: Topographic Influence on GPS Collar Success

Ultimately I am interested in how topography hinders GPS collars from obtaining fixes, but for the purposes of this exercise I asked: How does sky availability change with scale?

A close-up of 6 test sites (red triangles). Raw sky availability raster (10x10m resolution) prior to focal analysis (left) and same spatial extent after performing mean focal statistics with 25×25 moving window size.

I already had a raster of sky availability for my study area, which is the proportion of sky unobstructed by topography (it does not take vegetation into account). I loaded the raster into R and used the function “focal” from the raster package to perform a mean moving window analysis. Focal analysis computes an output value for each pixel using a neighborhood. The moving window is a rectangular neighborhood applied at each pixel in the raster, and it is specified in the “weights” argument of the R code. The neighborhood can vary in size: the larger the neighborhood, the smoother the output values become. Alternatively, I could have used ArcGIS's Focal Statistics tool. Below is an example of the R code I used for a moving window size of 3 pixels by 3 pixels (one pixel on either side of the focal pixel):

library(raster)

# 3x3 mean moving window; pad the edges so border cells are computed, ignoring NAs
f3 <- focal(skyavail.rast, w = matrix(1, nrow = 3, ncol = 3), fun = mean, pad = TRUE, na.rm = TRUE)

I compared the 7 different sizes of focal windows to the original sky availability values from 54 locations within part of my study area (western slopes of the Cascade Mountain Range). The neighborhood sizes were 3, 5, 9, 11, 17, 21 and 25. The locations I used were the same locations as test sites where I will be assessing GPS collar accuracy and rate of fix success in the future.  

After exporting the new focal window rasters into ArcGIS, I extracted values from each of the 7 focal rasters, plus the raw sky availability data, to assess how these values varied with different smoothing parameters. To accomplish this I used the Extract Multi-Values to Points tool in ArcGIS. Once the moving window values were added to the test site locations shapefile, I exported the attribute table as a .csv file, from which I made graphs to compare and visualize the shift in values.
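Conceptually, the Extract Multi-Values to Points step just maps each point's coordinates to a raster cell through the raster's origin and cell size. A Python sketch of nearest-cell extraction for a north-up raster (the origin, coordinates, and sky availability values are made up; only the 10 m cell size matches the real data):

```python
def extract(raster, x0, y0, cell, points):
    """Nearest-cell values at (x, y) points.
    raster[row][col]; (x0, y0) = top-left corner; cell = pixel size in m."""
    values = []
    for x, y in points:
        col = int((x - x0) // cell)
        row = int((y0 - y) // cell)  # rows count downward from the top edge
        values.append(raster[row][col])
    return values

sky = [[0.90, 0.85, 0.80],
       [0.70, 0.60, 0.50],
       [0.40, 0.30, 0.20]]
# 10 m cells; hypothetical top-left corner at (500000, 4900000), e.g. UTM meters
print(extract(sky, 500000, 4900000, 10, [(500005, 4899995), (500025, 4899975)]))
# prints [0.9, 0.2]
```

Repeating this against each of the smoothed rasters yields one column of values per window size, which is exactly the table exported to .csv above.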

Sky availability values were extracted from 54 locations, testing 7 differently sized moving windows.

Looking at sky availability values across study sites made comparisons difficult, so I averaged the values for each neighborhood size across test sites and produced a line graph. This graphic shows a threshold relationship between sky availability and neighborhood size. The proportion of unobstructed sky is high for smaller window sizes and then decreases sharply after a neighborhood size of 9.

When sky availability values were averaged over the 54 sites, a more clear relationship among moving window sizes can be seen. The proportion of sky unobstructed by topography decreases with moving window size, possibly due to “over-smoothing.”

Since the resolution of these rasters is 10 x 10 m, my interpretation is that sky availability is stable out to a window of about 90 x 90 m (45 m on either side of the focal pixel), but larger windows may smooth the values too much; this is the point where we lose detail that may influence GPS collar data. This was a useful exercise in learning how to perform focal statistics, but I'm not sure how to proceed from here to apply these results to GPS fix success.

Oregon State Landslide analysis

  1. Research Question

Oregon’s west coast highways are important links between Washington and California, and the impact of landslides on traffic is huge. Landslides often occur in certain areas, and some of those areas contain national highways or other frequently used roads. Once a landslide occurs, travelers must spend a lot of time detouring, which is very inefficient. In my research, I hope to find the most susceptible areas by analyzing the spatial pattern of a landslide susceptibility map and a heat map, marking the dangerous areas, and producing a high-risk area map. Based on that, the cost, severity, and timing of landslides can be analyzed to better understand their impact. In Exercise 1, a hot spot analysis of annual cost is generated.

2. Tools

I think hot spot analysis is the most suitable method for the Oregon landslide point data, so the Hot Spot Analysis tool in ArcGIS Pro was used.

3. Methodology

Basically, the steps for generating a hot spot analysis in ArcGIS Pro are very simple. First, add the Oregon landslide point data in ArcGIS Pro. Then open the toolbox and search for Hot Spot Analysis; the tool is shown below.

Figure 1. Hot Spot Analysis Tool

For Input Feature Class, the point data should be added. For Input Field, I chose Annual Cost for the analysis.

Then, click the Run button, and the hot spots based on annual cost are generated.
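For readers without ArcGIS Pro, the statistic the tool computes can be sketched directly. This is an illustrative plain-Python implementation of the Getis-Ord Gi* formula with a fixed distance band, not the ArcGIS code; the coordinates and annual-cost values below are invented.

```python
# Hedged sketch of the Getis-Ord Gi* statistic with a fixed distance
# band. Each point's neighborhood (including itself) is weighted 1
# within `band` and 0 beyond; the result is a z-score per point.
import math

def gi_star(points, values, band):
    """Return a Gi* z-score for each point."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    zscores = []
    for (xi, yi) in points:
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= band else 0.0
             for (xj, yj) in points]
        wsum = sum(w)
        num = sum(wi * v for wi, v in zip(w, values)) - mean * wsum
        den = s * math.sqrt((n * sum(wi * wi for wi in w) - wsum ** 2) / (n - 1))
        zscores.append(num / den)
    return zscores

pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]   # invented locations
cost = [9.0, 8.0, 9.5, 1.0, 1.5]                     # invented annual costs
z = gi_star(pts, cost, band=2.0)
print([round(v, 2) for v in z])  # large positive z -> hot spot candidate
```

Points with strongly positive z-scores correspond to the red hot spots the tool draws, and strongly negative z-scores to the blue cold spots.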

4. Result

Figure 2. Oregon State Landslide Hot Spot Analysis By Annual Cost Map
Figure 2. Oregon State Landslide Susceptibility Map
Figure 3. NW Oregon State Landslide Hot Spot Analysis Map

The map records the landslides that have occurred in Oregon since 1990. The Hot Spot Analysis tool was used to analyze the data, with annual cost as the analysis field. Through observation, it is found that landslides occur most often in NW Oregon, but SW Oregon spends the most on landslides.
The susceptibility map shows that the entire west coast is highly susceptible to landslides. However, because most highways and important traffic corridors run along the west coast, landslide protection there needs a larger budget. As shown in Figure 2, NW Oregon has lower costs than SW Oregon; the Oregon government should add budget in the northwest area to prevent landslides.

In the hot spot analysis map, the red points are hot spots and the blue points are cold spots, representing clusters of similar landslide values by year.

As shown in Figure 3, I chose NW Oregon for further analysis by location. In Figure 3, many landslides occurred in the lower right, and the hot spots there are shown in orange, which means the landslides in this area are relatively similar, or the survey only occurred in this area. These areas are surrounded by Corvallis, Newport, and Monroe. Further analysis will focus on this area because it contains many roads, highways, and residential areas.

5. Critique

In the analysis process, the tools in ArcGIS Pro are very efficient, so it does not take much time to generate hot spots. Hot spot analysis is useful here because the research is about landslide distributions, and further information can also be generated by hot spot analysis, for example annual cost, landslide volume, or landslide area. So I think the tools in ArcGIS are very effective. However, before using the Hot Spot Analysis tool, I failed to use the Create Space Time Cube tool, so there is no more detailed hot spot generation. This can be improved in later exercises.

In Exercise 1, due to the different soil characteristics of the various regions in Oregon, landslides mainly occur in certain specific areas. In the point data, these points represent the places where landslides occurred, and in high-risk areas they are attracted to each other. Furthermore, the point data and the landslide susceptibility map are also associated: the red and blue high-risk areas are mutually attractive with the point data, while the green low-risk areas are mutually exclusive with it.

Hongyu Lu


Unmanned aerial vehicle (UAV)-based photogrammetric approach to delineate tree seedlings

Exercise 1: Image classification for detecting seedlings based on UAV derived imagery

  1. The question addressed in this study

Detection of trees is a primary requirement for most forest management practices and forestry research applications. Modern remote sensing techniques provide time- and cost-effective tree detection compared to conventional methods. Notably, image classification is considered one of the vital tools in remote sensing applications (Lu and Weng, 2007). During the last few decades, scientists and practitioners have developed various image classification methods to detect land-cover and land-use features from remotely sensed data (Franklin et al., 2002; Gong and Howarth, 1992; Pal and Mather, 2003). However, the output of image classification may be affected by various factors, such as terrain conditions and their complexity, the remote sensing platform used and its performance, the image processing tools, and the classification approach applied (Lu and Weng, 2007). Therefore, in this part of the study, I am interested in evaluating “how feasible is detecting individual seedling crowns using unmanned aerial vehicle (UAV) based RGB imagery by applying supervised and unsupervised classification techniques?”

Tree seedlings are considered individual entities, and attraction or repulsion between seedlings may depend on how they are arranged or planted spatially. The spatial arrangement of the seedlings changes according to their regeneration process. On a broader scale, both natural and artificial regeneration can be observed in the main study area: systematically arranged, artificially regenerated seedlings represent attraction, while naturally regenerated seedlings show a degree of repulsion. For this particular study, the northwest part of the study area was selected, where artificially regenerated seedlings are prominent. Two plots of approximately 450 m2 each were selected for image classification (Fig. 1): Plot 1 was used for collecting training data, and Plot 2 for assessing the accuracy of the image classification performed in this study.

Figure 1. Spatial arrangement of the plots used in this study. Plot 1 was used as the training site and Plot 2 as the testing site for image classification.

2. Name of the tool or approach used and steps followed.

Image classification tools available in ArcMap 10.7:

  • Supervised image classification: Maximum Likelihood Algorithm
  • Unsupervised image classification:  Iso Clustering approach  

2.1 Agisoft Metashape

AgiSoft Metashape software was used to produce orthomosaic images at the pre-processing stage of the UAV imagery. As shown in Figure 2, the collected UAV images (287 images) were aligned and georeferenced; an additional step was performed to attach the GPS coordinates to each UAV image (Link 1: Appendix 1). After georeferencing the images, the dense point cloud was produced. Depending on the desired image quality and the available time and disk space, building the dense cloud can be carried out at five different quality settings (i.e., lowest, low, medium, high, highest); the required computing power and image processing time generally increase from lowest to highest. Finally, the orthomosaic image was produced and saved as a .tif file, to be used in image classification in the next stage.

Figure 2. Flow diagram of producing the orthomosaic image from the collected UAV imagery.

2.2 ArcMap

2.2.1 Supervised image classification

  • Collecting training data

The produced orthomosaic image was added to ArcMap 10.7 for image classification. Two blocks were selected from the raw image as training and testing areas, Plot 1 and Plot 2, respectively (Fig. 1). The training data were collected using the Image Classification tool available in ArcMap 10.7, with the Draw Polygon option used to digitize samples covering the entire area (Fig. 3). The Training Sample Manager was used to merge the training data for each class, and a total of five classes were defined for this classification (Fig. 3). The number of polygons per class depends on the total number of classes of interest; as a rule of thumb, for better classification the number of polygons for each class should be about 10 times the number of classes (i.e., no. of polygons for class 1 = 10 x 5). Finally, the polygons were saved as a signature file, which is required for the image classification in the next stage.

Figure 3. RGB image of the training site (left) and the distribution of polygons used to collect training data (right), and the image of the attribute table detailing the classes and their respective outputs (bottom).
  • Collecting testing data and image classification

To evaluate the performance of the supervised image classification, a set of testing data points was collected over the testing area (Plot 2) (Fig. 4). The collected reference points were then converted into raster form using the Point to Raster conversion tool available in the ArcMap toolbox. Next, the raster file of Plot 2 (Fig. 4) and the signature file from the training data set (from Plot 1) were used as input files to perform the Maximum Likelihood Classification. Finally, the Spatial Analyst tools were used to combine the classified image with the rasterized testing data (from Plot 2) to assess accuracy. The output of this process was used to create the confusion matrix and evaluate the accuracy of the classified image.

Figure 4. RGB image of the testing site (left) and the distribution of testing data points used to assess the accuracy of the classified image (right).

2.2.2 Unsupervised image classification

For the unsupervised classification approach, the Iso Cluster method was used, which is available under the classification tool category. This approach lets us define the number of classes expected from the unsupervised classification, as well as the minimum class size and the sample interval of interest. A total of 8 classes were used initially to get a better classification scheme; these initial eight classes were then reduced to three major classes by comparing features in the orthomosaic and the classified image (Fig. 5). The Reclassify tool was used to merge similar classes into the final classified image.
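At its core, Iso clustering groups pixels by iteratively migrating class means, an idea close to k-means. Below is a hedged, minimal stand-in in Python, not the ArcMap tool (which adds class-size and sampling-interval controls on top); the single-band pixel values and the choice of k = 3 are illustrative only.

```python
# Tiny k-means on scalar (single-band) pixel values, as a rough
# stand-in for the migrating-means idea behind Iso clustering.

def kmeans_1d(values, k, iters=20):
    """Cluster scalar values into k classes; returns sorted class centers."""
    centers = sorted(values[:: max(1, len(values) // k)][:k])  # crude init
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[idx].append(v)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return sorted(centers)

# Invented band values: shadows (~10), soil (~120), seedling canopy (~200).
pixels = [8, 12, 10, 118, 122, 125, 119, 198, 202, 205, 119, 199]
print(kmeans_1d(pixels, k=3))
```

Merging similar spectral classes afterwards, as done with the Reclassify tool above, corresponds to collapsing nearby centers into one thematic class.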

Figure 5. The initial stage of the unsupervised classified image(left) and reclassified image by matching the features of the RGB image(right).

The Spatial Analyst tools were again used to combine the classified image with the rasterized testing data (from Plot 2) to assess accuracy. The output of this process was used to create the confusion matrix and evaluate the accuracy of the classified image.

3. Results

Generally, the error-matrix approach is widely used for assessing the performance/accuracy of image classification (Foody, 2002). Further, the error matrix can be used to produce additional accuracy assessment parameters such as overall accuracy, omission error, and commission error (Lu and Weng, 2007). In this study, the supervised image classification approach showed higher overall accuracy (94%) than the unsupervised classification (93%). For the seedling class, the supervised approach showed 94% user’s accuracy and 10% commission error, whereas the unsupervised classification showed 4% commission error and 94% user’s accuracy, respectively. Similarly, the supervised approach showed 97% producer’s accuracy and 3% omission error for the seedling class, and the unsupervised approach showed 94% producer’s accuracy and 2% omission error (Fig. 6).
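The accuracy figures quoted above all fall out of the error (confusion) matrix in a mechanical way. As a sketch, here is how overall, user's, and producer's accuracy are computed from a confusion matrix; the matrix values below are made up, not this study's.

```python
# Accuracy metrics from an error matrix (rows = classified/map classes,
# columns = reference classes). Commission error = 1 - user's accuracy;
# omission error = 1 - producer's accuracy.

def accuracy_metrics(matrix):
    """Return (overall, user's per class, producer's per class)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    users = [matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))]
    producers = [matrix[i][i] / sum(row[i] for row in matrix)
                 for i in range(len(matrix))]
    return correct / total, users, producers

m = [[45, 2, 3],    # classified as seedlings (invented counts)
     [1, 30, 4],    # classified as soil
     [2, 1, 12]]    # classified as other
overall, users, producers = accuracy_metrics(m)
print(round(overall, 3))
print([round(u, 2) for u in users])
print([round(p, 2) for p in producers])
```

With these toy counts the overall accuracy is the diagonal sum over the grand total; the per-class figures in the results section were derived the same way from the real matrices.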

Figure 6. The output of the supervised classification (top) and unsupervised classification (bottom).

4. Critique

Overall, the estimated accuracies show promising results for identifying tree seedlings using both supervised and unsupervised classification. However, estimated classification accuracy depends on some essential parameters, including the sampling design and the estimation and analysis procedures (Stehman and Czaplewski, 1998). Additionally, sampling strategy plays an important role, especially in determining sample size, sample unit (either pixels or polygons), and sample design (Muller et al., 1998). Therefore, the observed results are highly dependent on the parameters described above. For this study, a random sampling approach was used to collect data covering the study area so as to represent most of the objects in the orthomosaic imagery.

Spatial resolution is another important factor that can affect image classification performance. One of the main advantages of UAV imagery is its fine resolution. Fine-resolution images reduce mixed-pixel problems while providing more landcover information compared to medium- and coarser-resolution imagery (Lu and Weng, 2007). Hence, fine-resolution images have a higher potential to provide better image classification results. However, there are a few drawbacks associated with fine-resolution imagery, such as the effects of shadows and high spatial variation within the same landcover class; these problems tend to reduce the accuracy of image classification (Cushnie, 1987). For example, the classifier may detect dark objects as tree seedlings because shadows may have pixel values similar to seedlings (Fig. 7).

Figure 7. RGB image showing the shadow of a cut-down tree stem (left) and unsupervised classified image for the same area (right).

Another potential problem associated with UAV-based orthomosaic imagery is error introduced during data processing. In this study in particular, I found low detection of seedlings in both the supervised and unsupervised classified images in a coarsely pixelated area of the RGB image (circled in Fig. 8).

Figure 8. Figure showing the effect of coarse pixels for both supervised (bottom) and unsupervised (top) classified images.

By considering the spatial aspects of the classified images, we can notice that some tree seedlings were missing compared to the RGB image (Fig. 9). This type of issue may occur due to poor health of the seedlings or to drawbacks in the image classification settings (i.e., the pixel values of these trees may be lower than those of healthy trees).

Figure 9. RGB image showing the spatial distribution of seedlings (left) and the distribution of the same seedlings detected from unsupervised classification (right).

Overall, this image classification exercise showed the potential of both supervised and unsupervised classification to detect tree seedlings. The minor drawbacks observed can be addressed by improving the quality of the image processing and classification.

Appendix 1


Future works:

In addition to the classification methods above, I tried the random forest classification approach. By changing the input band values for the random forest algorithm, the detection of tree seedlings can be enhanced (Fig. A1). However, further work is required to perform a better classification using the random forest approach.

Figure A1. Image showing the outputs derived from the random forest classification approach.


Cushnie, J.L., 1987. The interactive effect of spatial resolution and degree of internal variability within land-cover types on classification accuracies. Int. J. Remote Sens. 8, 15–29.

Foody, G.M., 2002. Status of land cover classification accuracy assessment. Remote Sens. Environ. 80, 185–201.

Franklin, S.E., Peddle, D.R., Dechka, J.A., Stenhouse, G.B., 2002. Evidential reasoning with Landsat TM, DEM and GIS data for landcover classification in support of grizzly bear habitat mapping. Int. J. Remote Sens. 23, 4633–4652.

Gong, P., Howarth, P.J., 1992. Frequency-based contextual classification and gray-level vector reduction for land-use identification. Photogramm. Eng. Remote Sens. 58, 423–437.

Lu, D., Weng, Q., 2007. A survey of image classification methods and techniques for improving classification performance. Int. J. Remote Sens. 28, 823–870.

Muller, S.V., et al., 1998. Accuracy Assessment of a Land-Cover Map of the Kuparuk River Basin, Alaska: Considerations for Remote Regions. Photogramm. Eng. Remote Sens.

Pal, M., Mather, P.M., 2003. An assessment of the effectiveness of decision tree methods for land cover classification. Remote Sens. Environ. 86, 554–565.

Stehman, S.V., Czaplewski, R.L., 1998. Design and Analysis for Thematic Map Accuracy Assessment: Fundamental Principles. Remote Sens. Environ. 64, 331–344.

Exercise 1


Are there nonrandom spatial patterns in reported illness across the country?

Using data from Pakistan for the years 2011, 2013 and 2014, I explore rates of respiratory illness for districts across the country. I further examine whether these patterns vary by age (children under 5) and gender. Intuitively, I expect that underlying mechanisms such as poverty and pollution exposure will influence spatial clustering, i.e., poorer areas may have a greater incidence of illness, just as areas exposed to a greater pollution source would also have more people falling ill.

Attraction and Repulsion: One “attraction” mechanism is exposure to crop fires. Areas that have greater exposure may have higher rates of reported illness.  

Name of the tool or approach that you used

I used the Getis-Ord Gi* (hot spot) statistic to assess whether there is statistically significant clustering of high/low rates of reported respiratory illness across districts in the country. This initial analysis can be used to gauge whether areas are more or less susceptible, so as to later develop a causal understanding of the mechanisms that explain these illnesses.

Sources and Sinks: A univariate analysis of the count of reported illness across districts provides limited information, as less populated districts will have a lower count than more congested ones. To account for spatial patterns in population, I use the total number of people interviewed in each district as the base for calculating the rates of illness across districts.

Steps to complete the analysis

I used survey data from Pakistan on individual responses about illness and created a binary response variable: 1 if illness was experienced and 0 otherwise. The steps for analyzing the spatial distribution of this variable were as follows:

  1. Treating all family members in a household as independent respondents, I aggregated the count of illnesses across districts. I use this for choropleth maps (as in Figure 2) and also to calculate district level rates for the hotspot analysis.
  2. For each district, I also aggregate the number of individuals interviewed and use this as the denominator to calculate the rate of illness across districts.
  3. I used the collapse command in Stata to get one observation per district from my entire dataset.
  4. I then merged the above data with a district level shapefile and subsequently carried out the hotspot analysis.
  5. The input field was the rates I had calculated, and I used the fixed distance band conceptualization of spatial relationships to compute the Getis-Ord Gi* statistic.
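Steps 1–3 above (done with Stata's collapse command in the actual analysis) amount to a group-by aggregation. The following Python sketch shows the same computation on fabricated survey rows; the district names and responses are invented.

```python
# Aggregate a binary illness indicator to district-level rates:
# numerator = count of ill respondents, denominator = interviewed.
from collections import defaultdict

rows = [                     # (district, ill 1/0), one row per person
    ("District A", 1), ("District A", 0), ("District A", 1), ("District A", 1),
    ("District B", 0), ("District B", 0), ("District B", 1),
    ("District C", 0), ("District C", 0),
]

counts = defaultdict(lambda: [0, 0])      # district -> [ill, interviewed]
for district, ill in rows:
    counts[district][0] += ill
    counts[district][1] += 1

rates = {d: ill / n for d, (ill, n) in counts.items()}
for d in sorted(rates):
    print(d, round(rates[d], 3))
```

The resulting per-district rates are what feed the Gi* hot spot computation as the input field.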


At this stage, I am only looking at the distribution of my outcome variable. It is not possible to draw inference since the sampling is not representative of the district population. My purpose at this point is to explore the data and then move on to do an analysis of exposure to see if there is association between this outcome variable (respiratory illness) and a household’s proximity to fire locations.

The results of the hotspot analysis (Figure 1) show that significant clusters exist in Punjab. These results make intuitive sense, as the province has the largest share of the country’s population. The results also indicate that agricultural crop residue burning (occurring primarily in Punjab) may be a worthwhile cause to explore further in the analysis.

Figure 2 is a mapping of point data using household addresses that may later be used to explore household-level explanations of the causes of illness in a community. Figures 3–5 are spatial distributions based on aggregate as well as yearly data. It is evident that the patterns exhibited by the simple count of reported illnesses do not predict the presence of the statistically significant clusters presented in Figure 1. This is likely because the districts with higher counts also have higher population density.

Figure 1: Spatial Clustering of Respiratory illnesses across districts
Fig 2: Point locations of households interviewed
Figure 3: Count of individuals with respiratory illness across districts
Figure 4: Individuals interviewed across districts
Figure 5: Rate of illness across districts


The analysis at present is at the district level, which does not reflect household-level variation. Such broad aggregation does not tell us much about the neighborhood-level factors (access to sanitation, type of energy source, exposure to pollution, etc.) that are likely to be strongly associated with the current outcome variable. Another limitation of the present analysis is that the district level of aggregation limits the use of spatial regression due to small n: there are only 77 districts covered in the survey.

However, the hotspot analysis is useful to understand the distribution of respiratory illnesses. It may be more useful for my research if I am able to do a hotspot analysis of the residuals of my regression model.

Exercise 1: Comparison of Interpolation Methods to Estimate Zooplankton Density

Question asked

To understand the foraging patterns of predators such as gray whales, it is important to have a grasp of the patterns of their prey, which in the case of gray whales is zooplankton. The distribution and abundance of zooplankton is often described as patchy and variable. Therefore, my question for exercise 1 was “What is the spatial and temporal pattern of zooplankton density in the inshore waters off of Port Orford, OR?”. To address this question, I tested different types of interpolation on my point data of prey density sampled at 12 stations during the month of August in 2018.

Approaches/tools used and methods

To address the question, I wanted to explore various types of interpolation to see how the density of prey collected at my sampling stations mapped out to my entire study site, filling the gaps where we do not have known prey density. I had planned on doing this in ArcGIS (a program I am not too familiar with); however, due to the slowness of the remote connection and my lack of knowledge/skills, I decided to teach myself how to do interpolations in R instead, since I am more savvy in R. I quite enjoyed the process of trying out new things and figuring out kinks in lines of code that I found online. Since interpolations in R were also new to me, I followed a very helpful tutorial. Because of this, I was limited to the types of interpolation presented in said tutorial, namely Inverse Distance Weighting (IDW), 1st and 2nd order polynomials, and kriging. The other main type of interpolation not covered in this tutorial was splines. However, after some research, I discovered that splines do not tend to do well when there is a lot of variation in the data being interpolated. After some initial data exploration, I found that there was significant variation in my data, such that on most days there were 1-2 stations with a very high density while the other 4-5 had a relatively low density. Therefore, I decided it was not too important that I was unable to run a spline interpolation.

Raw relative density values of zooplankton at sampling stations in Mill Rocks (left) and Tichenor Cove (right) on August 2nd in 2018.

I will be sharing code for how to make interpolations in R during my tutorial; however, I will detail the rough steps involved here. First, I had to import coordinates (latitude, longitude) for the boundary of my study sites and convert them into a spatial polygon. Then, after splitting my data into daily data frames, I converted them into spatial points. I used several packages that contain the functions to run different steps of the data preparation and interpolation, including ‘spatstat’, ‘gstat’, ‘tmap’, and ‘sp’. All of these packages have other dependencies that also need to be installed and loaded before these packages and functions will run.

Before I move on to some preliminary results, I want to explain some of the core differences between the different kinds of interpolations I employed.

IDW is a deterministic interpolation method, which relies on surrounding values to populate the resulting surface. One of its major assumptions is that the influence of the mapped variable decreases as you move away from a given value. In IDW, the weight given to any point decreases with its distance from the location being estimated.
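As a sketch of the idea (my actual interpolations were done in R via the gstat package), here is a minimal IDW estimator in Python. The station coordinates and density values are invented for illustration.

```python
# Minimal inverse distance weighting: the estimate at a target location
# is a weighted mean of station values, with weight = 1 / distance**power.

def idw(stations, densities, target, power=2):
    """IDW estimate of density at `target` from (x, y) stations."""
    num = den = 0.0
    for (x, y), z in zip(stations, densities):
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        if d2 == 0:
            return z                      # target sits exactly on a station
        w = d2 ** (-power / 2)
        num += w * z
        den += w
    return num / den

stations = [(0, 0), (1, 0), (0, 1), (1, 1)]   # invented station layout
density = [10.0, 2.0, 2.0, 2.0]               # one high-density station
print(idw(stations, density, (0.25, 0.25)))   # pulled up toward the peak
print(idw(stations, density, (0.9, 0.9)))     # far from the peak, near 2
```

Note how the estimate stays close to the high station value only near that station, which is why IDW preserves local patchiness better than a fitted polynomial surface.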

1st and 2nd order polynomials fit a smooth mathematical surface to the data points. Polynomial surfaces change gradually and are therefore good at capturing coarse changes in data. It’s kind of like fitting a flat sheet of paper to a surface of points and trying to find the best straight fit; that is what a 1st order polynomial interpolation does. In a 2nd order polynomial, the ‘sheet of paper’ is folded once. Polynomials tend to be very good at representing a gradual change in the variable of interest.

Kriging is different from IDW and polynomials as it is not a deterministic interpolation but instead is a geostatistical approach. Kriging is a little more advanced than other interpolations because it not only considers the actual value of the variable at a spatial location, but it also involves an exploration of the spatial autocorrelation between points. Therefore, kriging not only has the ability to produce a predicted surface but also provides a measure of certainty or accuracy of the predictions.


The resulting interpolations for my two field sites can be found below. I have selected the interpolations from the same date at both sites so that some initial visual comparisons can be made.

IDW interpolations for Mill Rocks (left) and Tichenor Cove (right).
1st order polynomial interpolations for Mill Rocks (left) and Tichenor Cove (right).
2nd order polynomial interpolations of Mill Rocks (left) and Tichenor Cove (right). The interpolation for Mill Rocks says 1st order polynomial and this is because the 1st and 2nd order polynomials for Mill Rocks were the same for every day, so the same interpolation is presented here.
Universal kriging interpolations of Mill Rocks (left) and Tichenor Cove (right).

Critiques of the method

Based on my research on the different interpolations, it seems that my data are best suited for IDW interpolations. This is because I have very few data points (only 6 stations per site) and there is quite a lot of variation in the density of zooplankton within a site on the same day. Since polynomials and kriging apply functions and equations, they will inevitably smooth the data to some extent. However, I am precisely trying to capture this variability in the data as my next step will be to see whether gray whales prefer sites that have high densities of prey. Therefore, if I select an interpolation that smooths out this variability, then I may not find the right relationships between predator and prey that I am investigating. 

Even though the polynomials may make sense, as they seem to imply a gradient of density across the field sites (and the raw data may in fact reflect this), the polynomials (and the kriging) create very unrealistic values. If you look at the density scales next to the plots, you will see that both the polynomial and kriging interpolations end up with negative density values, which, ecologically, does not make any sense.

I still have a lot of refinement to do with these interpolations, including calculating root mean square errors for each of the interpolations to quantitatively compare them to one another to see which one performs best. Furthermore, since I think the IDW interpolations are the most suitable based on my data, I think it would be good to continue exploring the different interpolation variables, like grid cell size or buffers around the points, to see what variations will look best and most ecologically relevant.
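One common way to do the RMSE comparison described above is leave-one-out cross-validation: predict each station from the remaining stations and score each interpolator by the root mean square of its errors. The sketch below assumes the toy IDW helper defined inside it; the six stations and density values are invented.

```python
# Leave-one-out RMSE for comparing interpolators.
import math

def idw(stations, values, target, power=2):
    """Toy inverse-distance-weighting predictor (illustrative only)."""
    num = den = 0.0
    for (x, y), z in zip(stations, values):
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        if d2 == 0:
            return z
        w = d2 ** (-power / 2)
        num += w * z
        den += w
    return num / den

def loo_rmse(stations, values, predict):
    """Predict each station from the others; return the RMSE."""
    errs = []
    for i in range(len(stations)):
        rest_s = stations[:i] + stations[i + 1:]
        rest_v = values[:i] + values[i + 1:]
        errs.append(predict(rest_s, rest_v, stations[i]) - values[i])
    return math.sqrt(sum(e * e for e in errs) / len(errs))

stations = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
density = [1.0, 8.0, 1.5, 1.2, 7.5, 1.0]
print(round(loo_rmse(stations, density, idw), 3))
```

The same `loo_rmse` harness could score the polynomial and kriging predictions for a fair side-by-side comparison.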

Assessing the Incremental and Spatial Autocorrelation of Particulate Organic Phosphorus (POP) in the Ocean off Bermuda (Exercise 1)

1. Research Question
To select the right HYSPLIT back-trajectory data, the first thing I must do is assess the spatial and temporal pattern of POP. The initial POP data look clustered; however, more spatial statistical analysis is needed to determine where the strongest clustering occurs, with exact coordinates and times. Hence, the questions I would like to ask for this exercise are: what is the cluster pattern of POP, and when does the highest POP value occur?

2. Tools and Approach
I decided to use incremental and spatial autocorrelation, comparing Hot Spot Analysis (fixed distance band) with Local Moran’s I. By looking at the results of these tools, I can decide which is appropriate for my data.
I used Spatial Statistic Tools in ArcMap 10.7.1:
a) Analyzing patterns using Incremental Spatial Autocorrelation and Spatial Autocorrelation (Moran’s I).
b) Mapping clusters using Hot Spot Analysis (Getis-Ord General G and Optimized Hot Spot Analysis) and Cluster and Outlier Analysis (Anselin Local Moran’s I).

3. Step of Analysis
a) My POP data consist of many values at different depths for each coordinate, so to start analyzing the spatial and temporal pattern I needed to aggregate the data by averaging the POP values over depth. This gave one POP value for each X, Y coordinate. I used Matlab for this step.

b) The next step was determining the rank of POP for clustering purposes. The rank was defined using the mean value: POP values higher than the mean were considered high, and values lower than the mean were considered low. I also used Matlab to perform this step.

c) Since the data from Matlab are in txt format, I loaded them into ArcMap to display the X and Y data and converted them to a shapefile.
d) Analyzing the pattern using Spatial Autocorrelation (Moran’s I) with a fixed distance band, Euclidean distance, and no standardization, and using Incremental Spatial Autocorrelation with row standardization.
e) The mapping clusters were generated as follows:
1. Cluster and Outlier Analysis (Anselin Local Moran’s I) with a fixed distance band, Euclidean distance, and no standardization.
2. Hot Spot Analysis using Getis-Ord General G with a fixed distance band and Euclidean distance, and Optimized Hot Spot Analysis with the “snap nearby incidents to create weighted points” aggregation method.
f) Joining the attribute table of the initial data to the attribute table of the spatial autocorrelation result to get the highest clustered POP values by coordinate and time.
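Steps (a) and (b) were done in Matlab (that code is not reproduced here); as an illustration of the same logic, here is a Python sketch that averages POP over depth per coordinate and then ranks each averaged value against the overall mean. All coordinates and POP values below are invented.

```python
# Average POP over depth for each (x, y) site, then label each site
# "high" or "low" relative to the overall mean of the site averages.
from collections import defaultdict

samples = [                 # (x, y, depth, POP): several depths per site
    (64.2, -31.7, 10, 0.8), (64.2, -31.7, 50, 0.4),
    (64.5, -31.9, 10, 0.3), (64.5, -31.9, 50, 0.1),
]

by_site = defaultdict(list)
for x, y, depth, pop in samples:
    by_site[(x, y)].append(pop)

avg = {site: sum(v) / len(v) for site, v in by_site.items()}
overall = sum(avg.values()) / len(avg)
rank = {site: ("high" if v > overall else "low") for site, v in avg.items()}
print(avg)
print(rank)
```

The resulting one-value-per-coordinate table is what gets exported as txt and converted to a shapefile in step (c).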

4. Result
a) The data were significantly reduced from 929 points to 82 points.

Viewed as a three-dimensional graph, the POP pattern appears clustered over the five-year time interval; however, I need a more robust spatial autocorrelation analysis to identify the most strongly clustered values.

b) The pattern analysis was performed on the rank values. The result matched my expectation: the highest values are clustered in one area.

To strengthen confidence in this result, I also performed incremental spatial autocorrelation, which produced a result similar to Moran's I.

c) The Mapping clusters

The maps show the results of the different cluster-mapping methods. Both Local Moran's I and the hotspot analysis indicate a similar result: the hotspot and high values are clustered around the black dot. Based on this result, the HYSPLIT back-trajectory analysis will be conducted on the black dot in the Moran's I cluster.

5. Critique of the method – what was useful, what was not?
My data are clustered in one area, but it would take considerable effort and time to run HYSPLIT back trajectories for every date of the year across that clustered area. The spatial autocorrelation gives me a defensible way to narrow the spatial and temporal scope of the HYSPLIT back-trajectory analysis.
However, I am not certain that I chose the right conceptualization of spatial relationships; I need to deepen my knowledge of how to select a conceptual spatial relationship based on the input data. Changing the conceptualization of spatial relationships changes the resulting z-scores as well.

Looking for Ancient Surfaces using Kernel Density Interpolations

Primary Question

The main question I sought to answer in this first exercise is whether artifacts in an assemblage are dispersed or aggregated. Treating the assemblage as a point pattern, I implemented an interpolation method based on the density of physical points in space. The first hypothesis I am assessing is whether my assemblage is clustered in physical space, regardless of artifact attributes. I am proceeding on the assumption that artifact attributes do not produce an internal source or sink in my dataset. Similar artifact attributes may appear attracted to one another in space mainly because the external force behind their creation produces very consistent patterns: human behavior is the external factor primarily responsible for the perceived sources and sinks in typical artifact assemblages. For this reason, I have set aside artifact attributes (e.g. length, width, material, type) in this initial hypothesis.

In addition to the previous assumption, since the dataset spans a depth of 1 meter, identifying clustered behavior in this artifact assemblage in two dimensions is not sufficient. I must consider how the artifacts cluster in three-dimensional space: longitude, latitude, and depth. Where many might attempt to cluster within this expanded dimensionality directly, I have chosen to segregate the dataset into horizontal segments based on observable characteristics from visualizations of the artifacts in 2-D and 3-D space. This approach carries a potential for greater error, because I made visual judgments based on what I could see in the assemblage, with little quantification at this step.

In conclusion, I seek to identify dispersion within the artifact assemblage in 2-D segments of varying vertical extent, using interpolation methods based on the kernel density function.

Proceeding with Interpolation

A tool that I used to assess dispersion within the horizontal segments of the artifact assemblage is the kernel density function 'kde3d', located in the R package 'misc3d'. Since this tool is already established in R, my first step was assigning the proper coordinates for each artifact within 'kde3d'. One of the main parameters in this function is the number of grid points to use; ultimately, this number controls the accuracy and "smoothness" of the resulting image. A density value is then assigned to each grid cell within the cubic extent of the data frame. When these values are rendered as contour lines using 'contour3d', from the same package, I am able to draw the contours and visualize the 3-D kernel density of the artifact assemblage (figure 1). The residual points not included in the contoured features are outliers and do not constitute a highly dense area. Ultimately, figure 1 shows areas that contain significant clusters of artifacts. I then used the resulting 3-D kernel density image (figure 1) in conjunction with a vertical window into the artifact assemblage (figure 2) in order to identify the depths that I chose as my horizontal segments.
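The workflow above used R's 'kde3d' and 'contour3d'. For readers without R, roughly the same 3-D density grid can be sketched in Python with scipy's `gaussian_kde`; the coordinates below are synthetic, and the units/scales are assumptions:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Hypothetical artifact coordinates: easting, northing, depth (n = 200)
pts = rng.normal(loc=[100.0, 200.0, -0.5], scale=[2.0, 2.0, 0.2], size=(200, 3))

# Fit a 3-D kernel density estimate (gaussian_kde expects rows = dimensions)
kde = gaussian_kde(pts.T)

# Evaluate on a coarse 10x10x10 grid spanning the data extent
xs, ys, zs = (np.linspace(pts[:, i].min(), pts[:, i].max(), 10) for i in range(3))
grid = np.stack(np.meshgrid(xs, ys, zs, indexing="ij"), axis=-1).reshape(-1, 3)
density = kde(grid.T).reshape(10, 10, 10)
```

The grid resolution (here 10 points per axis) plays the role of kde3d's grid-point parameter: more points give a smoother, more detailed density volume at higher cost.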

Figure 1 3-D Kernel Density

Figure 2 illustrates one part of the process used to judgmentally determine the most significant surfaces to analyze within 2-D horizontal segments, with the x-axis assigned to the Easting and the y-axis to the Elevation. The areas, or levels, chosen here will be referred to as surfaces of interest. They were judged significant because the artifacts appear to follow an increasing trend along a similar angle to the recorded topographic surface, there is a gap between artifacts and these intervals representing a possible period with no occupation, and there is an aggregation of artifacts within each surface of interest. This is just a single example of a window cut into the dataset; similar procedures were done with multiple other windows. Figure 3 represents an example of what I refer to as a "window" cut into the dataset within a 1-meter interval spanning from West to East across the entire site extent.

I have chosen five initial horizontal segments to analyze further. These segments were analyzed for clustering using the 'density' function in the R package 'spatstat' (figure 4). This density function uses a similar kernel density calculation, but instead of 3-D space, I created an interpolated surface in 2-D space based on the density of artifacts contained in each surface interval. Using 'spatstat', I separated the artifact types at each surface in order to visualize spatial patterns among all the artifacts, and then among each artifact type, at every 5 cm interval in my dataset. For the purpose of this exercise, I have chosen to include only the density plots representing the artifacts at the five identified surfaces (figure 4).

Figure 2 Vertical windows cut into the artifact assemblage. Moving from left to right mirrors moving from West to East.
Figure 3 Plan view of archaeological site representing the method used to determine surfaces of interest.


From this procedure, I was able to visually identify clustered areas of artifacts within both 2-Dimensional and 3-Dimensional space. Figure 1 identified high-density areas of artifacts in 3-D space and helped me identify the five primary horizontal segments of the dataset that I will focus on. Within these five segments, I analyzed the interpolated density surfaces containing all artifacts and surfaces containing a single type of artifact (figure 4). There appear to be clustered areas within the artifact assemblage that may be representative of different human activities at this site. Figure 4 shows highly dense areas of artifacts in specific parts of the site, and as we move between the different surfaces in figure 4, the dispersion and density of artifacts change as well. Thus, there appears to be some variable that is shifting the spatial distribution of artifacts throughout time in this assemblage. The next step is to analyze what type of variable may be affecting artifact distribution within each horizontal segment.

The primary critique of this interpolation approach concerns the scale of investigation. For a specific example, my dataset has one entry labeled Fire Cracked Rock (FCR). Even though there is only one FCR record in my data, when I interpolate the surface containing this object, the result shows a very prominent density in the surrounding area. A closer look reveals that the scale of that particular frame is adjusted to indicate density values below 0.5; the value is this low precisely because there is only one object identified as FCR. Visually, there appears to be a significant amount of FCR in the identified area, when in reality there is only a single object at that location. In light of this critique, I must adjust my interpretation of which artifacts present a clustered area and which ones could simply be outliers within the dataset. Visualizing the point patterns in 2-D and 3-D space helps mitigate the effects of this misleading representation.

Figure 4 Density maps of the five surfaces of interest

Ex. 1 Spatial-Temporal Patterns of Flood Insurance Claims

  1. Research Question 
My research question, which I continue to work toward answering, is: How is the temporal pattern of population flow by county related to the temporal pattern of flood insurance claims by county via risk compensation? This week I was particularly interested in the temporal pattern of insurance claims by county. I examined single-year extents as well as the whole study period. From the exercise 1 lab I was also drawn to the question of possible source/sink locations, which, in the case of my data, could signify vulnerable areas located near the coasts as sources of hazard, while sinks are areas of higher population adjacent to these more hazardous source locations. Down the line, this source/sink concept could be interesting once I include population flow data in the mix. See below for a bubble map denoting total flood insurance claims over the study period (1990-2010) by county.
Total NFIP Claims by county visualized by proportional bubbles

2. Approach

I was interested in and attempted a few different approaches to explore my data. I started out by attempting spatial autocorrelation using Moran’s I to attempt to disaggregate county-level vector data and examine the spatial dependence significance. Next I attempted an Inverse Distance Weighted Interpolation. Finally, I attempted to simply examine and plot the temporal variation in my insurance claims data.

3. Steps and Results

Using a Monte-Carlo simulation of Moran's I in R, I observed a significant pseudo p-value (less than 0.01), which indicates significant spatial autocorrelation between contiguous counties; we would not expect to see this relationship if the claims data were randomly distributed across counties.
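To make the pseudo p-value concrete, here is a pure-numpy sketch of the permutation test (the actual analysis was done in R; the weight matrix and claim values below are toys, not the county data):

```python
import numpy as np

def morans_i(y, w):
    """Global Moran's I for values y and a binary contiguity weight matrix w."""
    z = y - y.mean()
    return (len(y) / w.sum()) * (z @ w @ z) / (z @ z)

rng = np.random.default_rng(42)
# Toy example: 4 regions in a line, rook contiguity weights
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
claims = np.array([50.0, 45.0, 5.0, 2.0])   # spatially autocorrelated toy data

obs = morans_i(claims, w)

# Monte-Carlo reference distribution: shuffle the values across regions
sims = np.array([morans_i(rng.permutation(claims), w) for _ in range(999)])
pseudo_p = (1 + (sims >= obs).sum()) / (1 + len(sims))
```

With 999 permutations, the smallest attainable pseudo p-value is 0.001; in this tiny example the pattern is not extreme enough to reach p < 0.01, but the mechanics are the same.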

Inverse Distance Weighted interpolation predicts values at unsampled locations from nearby known values. Since my data are attached to county polygons, I converted them to point data and then ran an IDW function over my entire claims dataset (total claims from 1990-2010) to show interpolated values across the country (see picture below). I also tried this for single years of data (see below for 1998 as an example year). This map, along with other maps I have created from this data, helped inform where some "hotspot" regions exist for future work.

Finally, I summarized and plotted the data to examine the temporal variation. I plotted the counts of claim frequency at the national scale and also selected 3 hotspot counties (Houston, New Orleans, and Miami) to plot as well (see charts below). These plots are helpful in showing the volatile nature of flood claims. Since I have been working with this data for some time, I had some preconceived ideas of its temporal characteristics, so I understand the up-and-down nature of flood claim frequency responding to the acute nature of natural hazards. I also created plots for mean annual claims, which I included below. I attempted to create a mean-difference map to show anomalous deviations in the data, but so far it has not depicted the data in a new way relative to previous mapping efforts.

4. Critiques of Methods

Of the three approaches, the IDW interpolation was the least helpful. As I now understand, this type of analysis is not particularly suited to my type of data, and I could essentially see the same results from mapping the data by county. I found the spatial autocorrelation approach interesting; however, the results were not surprising given the known spatial distribution of disaster claims from the visualized maps. Finally, the temporal plots yielded helpful information regarding the temporal variation in the data and within hotspot areas.

Determining Patterns of Spatial Distribution of Nearshore Benthic Organism Abundance from Point Conception to Canada

Exercise #1: Determining Patterns of Spatial Distribution of Nearshore Benthic Organism Abundance from Point Conception to Canada

Research Question: Using ~80,000 benthic box core samples along the California, Oregon, and Washington coasts, I investigated methods to determine patterns of spatial distribution of benthic organism abundance (both infauna and epifauna). This process was the first step towards examining the correlation between total benthic organism count and total organic carbon (TOC). I conducted this investigation at three spatial scales: (1) the entire study area (Pt. Conception, CA to the Canada border), (2) a ~60 km2 area off Yaquina Head near Newport, OR, and (3) a ~760 km2 area off Newport, OR from Beverly Beach to Seal Rock.

Hypothesis: I hypothesize that total organism count will have a clustered spatial distribution due to both internal and external processes, including attraction/repulsion and sources/sinks. Predator/prey interactions, mating, territorialism, and colonization are all internal processes of attraction/repulsion that impact the spatial distribution of total organism count. For instance, if a large Dungeness crab megalope recruitment event occurs in Yaquina Bay, predators will congregate in that area to take advantage of the abundant food source, resulting in a concentrated area of high organism abundance. Salinity, upwelling, sediment grain size, pH, temperature, turbidity, wave intensity, current, light availability, depth, total nitrogen, total organic carbon, chemical pollution, and noise pollution could all serve as either external sources or sinks of organism abundance depending on the adaptations of particular species. I hypothesize that total organism count, especially of detritivores, will be higher in areas with relatively high levels of total organic carbon.

Tools/Approaches Used & Steps: I focused on three Spatial Analyst tools in ArcMap 10.7.1 to conduct my analysis: Hotspot, Kriging (Interpolation), and Inverse Distance Weighting (IDW, Interpolation). Before I could begin my analysis, I first merged various tables from an Access database and created Pivot Tables in Excel so that I could have datapoints with total organism count, TOC, lat/long, and sampling date information to make the “next steps” in my analysis easier. After that I began playing around with various tools at my three spatial scales, a process I’ll summarize below:

  1. Total Study Area: In order to generate as accurate an interpolation of total organism count as possible, I loaded all my data into ArcMap, opened the attribute table, and then used the “Select by Attribute” function to select only box core samples that were sifted using a 1 mm mesh screen. Different mesh sizes would of course impact the number of organisms counted (smaller critters fall through bigger holes). I then used the Inverse Distance Weighting (IDW) (Spatial Analyst) tool to generate an interpolation raster for the entire study area. The raster output was a large rectangle that included all the points. Because my data points are distributed over a large area along the coast, the raster covered significant areas of land and sea. I felt that the interpolations could only be “trusted” near the actual data points, so I decided to clip the raster to a smaller area. In order to do this, I buffered all the points and dissolved the buffer to generate a polygon. I then used the Extract by Mask tool to create a raster within the bounds of the buffer polygon. Some of the raster area was still over land, so I imported a polygon of California, Oregon, and Washington and then used the Erase function to delete the portion of the raster that intersected with the state areas. I ended up with an IDW interpolation raster for the coastal area. Next, I conducted a Hotspot analysis on the 1mm mesh organism count data. Hotspot analysis doesn’t do well with unevenly distributed data points (which I have), so I selected data points with values near mean organism counts within highly sampled areas. I ultimately determined that Hotspot analysis is not the most appropriate method for determining spatial distribution given my data, a conclusion I will discuss in more detail below.  
  2. Newport Limited Region: Looking towards the future, I plan to examine if/how the relationship between total organism count and TOC changes seasonally. With that in mind, I generated interpolations for five months in 2011 (April, June, August, October, & December) of a ~60km2 area off Yaquina Head near Newport, OR. I once again used the IDW (Spatial Analyst) tool in ArcMap 10.7.1. I then ran a sixth interpolation using all my data for that area to compare to the five monthly interpolations. Next, I used the “Minus” function to subtract the mean pixel values of all the data from the pixel values of the monthly rasters. These difference rasters allowed me to more easily compare the monthly organism count data.
  3. Newport Extended Region: In the Newport Limited Region, I only looked at data from 2011. In the Newport Extended Region, I looked at all the monthly data I have between 2010 and 2016 from a ~760km2 area off Newport between Beverly Beach and Reedsport, OR. I determined that I have sufficient data from April, June, August, & October. I also had some September and December data, but not enough to make accurate interpolations for the whole area. For these interpolations, I decided to compare results from the Kriging and IDW methods.
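The "Minus" step in item 2 above is just per-pixel subtraction of co-registered rasters; a minimal numpy sketch with made-up values:

```python
import numpy as np

# Hypothetical interpolated rasters on the same grid:
# one month's IDW surface vs. the all-data mean surface
monthly = np.array([[4.0, 6.0],
                    [2.0, 8.0]])
all_data = np.array([[5.0, 5.0],
                     [5.0, 5.0]])

# Equivalent of ArcMap's "Minus" tool: a per-pixel difference raster
diff = monthly - all_data
```

Positive pixels in `diff` mark areas where the month ran above the long-term surface, negative pixels where it ran below.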


Results:

  1. Total Study Area: The IDW interpolation for the entire study area suggests that the areas with the highest overall numbers of benthic organisms occur near San Francisco Bay, Monterey Bay, and off the coast of Washington State north of Puget Sound (see Figures 1 and 2). Generally speaking, the Oregon coast seems to have lower total organism counts than the other two states. The Hotspot analysis showed similar results. 
  2. Newport Limited Area: The difference rasters helped me to see some temporal change in total organism count, but I will save a discussion of those findings in a later exercise. I included an example of how I generated a difference raster in Figure 3 below.
  3. Newport Extended Area: For the larger Newport region, I decided to compare the IDW interpolations to Kriging interpolations for four different months. I observed some clustering of total organism count (see Figure 4).
Figure 4: Visual comparison of IDW and Kriging interpolation methods.

Methods Critique: Broadly speaking, this exercise provided me with an excellent learning opportunity. I found it useful to practice working through a large dataset, switching between different forms of software analysis (ArcMap, Access, and Excel), and using various spatial analysis tools in ArcMap that I haven’t used recently. I also learned about the advantages and disadvantages of various methods for determining spatial distribution patterns, namely IDW, Kriging, and Hotspot analyses. 

Inverse Distance Weighting: The assumption behind this method is that points that are closer together will be more similar than points that are farther apart. Unknown values are determined through the weighted average of known values within a certain radius, with spatially closer values given more weight than those farther away. Weights are proportional to the inverse of the distance raised to an assigned power p; I used p = 2 for my analyses. IDW has the advantage of relative simplicity, which makes the results easy to interpret, but it does have some disadvantages. For instance, interpolated IDW rasters can be skewed when data points are highly clustered. Additionally, IDW cannot interpolate values above the maximum or below the minimum values in the dataset. IDW also does not generate variance rasters to help the viewer judge the trustworthiness of the results.
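The weighting scheme described above can be written out directly; a numpy sketch with illustrative sample points (not the box-core data):

```python
import numpy as np

def idw(xy_known, values, xy_query, p=2.0):
    """Inverse distance weighted estimate at query points (power parameter p)."""
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)          # avoid division by zero at known points
    w = 1.0 / d ** p                  # weights ~ 1 / distance^p
    return (w * values).sum(axis=1) / w.sum(axis=1)

# Three known sample points and one query point halfway between the first two
known = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]])
vals = np.array([10.0, 30.0, 0.0])
est = idw(known, vals, np.array([[1.0, 0.0]]))
```

Note the estimate stays between the minimum and maximum of the known values, illustrating the limitation mentioned above.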

  • Kriging: Like IDW, the Kriging interpolation method estimates values for unsampled areas using known data, drawing on a certain number of points or points within a certain radius of the estimated location. This technique uses both the distance between points and the degree of variation between points when interpolating values. One difference between Kriging and IDW is that Kriging can interpolate values outside the range of the known dataset. However, the interpolation does not necessarily pass through known points, so the interpolated raster values can be higher or lower than the known values at a sampled location. The results of Kriging and IDW showed similar patterns of total organism abundance in this instance.
  • Hotspot: I ran the Getis-Ord Gi* Hotspot analysis tool in ArcMap 10.7.1, which identifies statistically significant hot spots and cold spots in spatial data. The tool generates an Arc feature class that contains the statistical p-value and z-value for each feature (in this case, point) in a designated feature class. After consultation with Dr. Jones, I learned that hotspot analysis is not an appropriate tool when sample sites are determined by researchers. My data comes from multiple projects, and as a result some of the sample sites are spread out while others are clustered closely together. These clusters can influence the results of hotspot analysis.

Exercise 1 – Spatial patterns of population change

My dataset A is population by county in Nebraska, Kansas, Oklahoma, and Texas in 1992 and 2017. My research question is asking what the spatial patterns in population are in 1992 and 2017. I also wanted to identify the spatial patterns in the change in population between 1992 and 2017.

My hypotheses for this analysis are that there will be much higher populations in cities than in rural places, and more importantly, that population increase will occur in urban and suburban counties. I also suspect that there will be decreasing populations in rural counties. I hypothesize this shift toward cities because of the attractions of cities (and repulsions of rural areas). People frequently move in search of more job opportunities, which are typically in or around cities (PRB, 2008). A major industry in rural areas is agriculture, which is a shrinking industry, especially due to the concentration of farms. Fewer job opportunities in rural areas act as a repulsion process, and the larger economies in cities act as an attraction process.

In order to identify spatial patterns in the population data I created maps of the two years of data as well as a map of the difference in population. To help highlight the areas with higher rates of increase and decrease in populations, I conducted a hotspot analysis.

Since I have continuous county-level data, I determined that a hotspot analysis would be the best way to highlight the patterns. I used an approach similar to the Getis-Ord Gi* hotspot analysis, but since I was working in Python, I used the Exploratory Analysis of Spatial Data package to conduct spatial autocorrelation and identify hot spots, cold spots, and spatial outliers.

In order to conduct a hotspot analysis, I had to first join the population datasets from 1992 and 2017 with the shapefile of counties. This took some reformatting of the datasets and their county identifier codes to match them up correctly. I then selected out the four states that I am working with, but found that the populations of cities in eastern Texas are significantly larger than anywhere else in those four states, so I separated eastern Texas from the rest.

In order to calculate hotspots, I had to first calculate the spatial lag. This function returns the average of each observation's neighbors. Then, using weights, the spatial lag, and a few other inputs, the analysis identifies counties that are hot spots and cold spots.
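The spatial-lag step can be sketched with plain numpy; the toy row-standardized contiguity matrix below stands in for the real county weights:

```python
import numpy as np

# Binary contiguity for 4 counties in a line, then row-standardized so each
# row sums to 1 (each county's neighbors are averaged, not summed)
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
w = w / w.sum(axis=1, keepdims=True)

# Hypothetical population change per county, 1992 to 2017
pop_change = np.array([500.0, -200.0, -300.0, 100.0])

# Spatial lag: the average population change among each county's neighbors
lag = w @ pop_change
```

Comparing each county's value against its lag is the basis for labeling it a hot spot (high value, high-valued neighbors) or cold spot (low, low).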

My results show some obvious hotspots over cities, although not all cities are hotspots. It is possible that some have populations that are not as large as others, and therefore do not stand out as hotspots. I found the analysis helpful for identifying cold spots, or areas with dramatic decreases in population. There is a large cold spot between Nebraska and Kansas on the eastern side of the states. There are some others that I will keep an eye on as well when I bring dataset B into the picture.

On the left is the hot and cold spot map of population change between 1992 and 2017. On the right is the raw change in population (2017 – 1992).

I would say that this method has been useful for highlighting the hot and cold spots in a busy dataset, but it has not created new insights. The tool that I used in Python is also not documented in much detail, so there was a lot of front-loaded effort to get it to work.


Population Reference Bureau. 2008. Population Losses Mount in U.S. Rural Areas. Retrieved from

Assessing Greenland Outlet Glacier Retreat from Terminus Lines (Exercise 1)


My work for Exercise 1 was geared toward developing a method to calculate the magnitude of retreat of Greenland outlet glaciers based on their terminus lines. Because there is no obvious tool or set of tools to do this in ArcGIS Pro, I developed two distinct workflows to produce relatively accurate datasets representing the spatial retreat of glacier boundaries. Once these datasets were constructed, I was able to conduct some very preliminary spatial statistics to determine their distribution.


1. How can I calculate and represent glacier retreat solely from polylines of glacier termini? How is this (constructed) variable of glacial retreat distributed across Greenland?

What kinds of attraction/repulsion processes might occur in your variable? 

Outlet glaciers tend to be connected (or their qualities may be “spatially attracted”) to those nearest to them because they are all fed by a single ice mass. For example, if the southeast portion of the Greenland ice sheet experiences higher temperatures, the outlet glaciers that it feeds will likely have a higher likelihood of experiencing retreat.

Furthermore, temperature and precipitation in and of themselves tend not to affect entire regions universally. For example, certain regions warm up more than others, which could lead to a “clustered” distribution of glacier mass balances. Greenland as a whole is not warming at a constant rate across its geographic extent.

Local geography (valley geometry, rain-shadow effects, etc.) has a certain element of randomness that can contribute to repulsion behavior (adjacent glaciers experiencing similar forcings do not necessarily respond the same way).

What kinds of source/sink processes might occur in your variable? 

I think, in general, the primary sources of material to outlet glaciers are ice from the interior of the ice sheet and snow that falls directly in the accumulation zone of each glacier (locally). The primary sinks, which contribute to ablation, are surface melt (a function of temperature), subaerial melt (within the glacier), and ice-ocean interaction (calving and subaqueous melting). Together, these competing forces determine the mass balance of any given glacier. All of these forces have local and global trends, which could lead to different spatial patterns at different scales.

Approach #1- Drawing Polygons:

ArcGIS Pro Tools Used:

Merge, Feature Vertices to Points, Points to Lines, Feature to Polygon, Hotspot Analysis


To experiment with techniques, I limited my datasets to just 2 years, 2005 and 2017, to calculate the magnitude of retreat over the entire study period. Termini were distributed as such (There is an unfortunate absence of data in the Southwest of Greenland, a region considered to be melting the fastest based on gravitational anomaly data):


Datasets were manipulated as such (zoomed to view individual glacier example):

1. Polylines from each dataset were merged into one dataset using the MERGE tool.

2. Endpoints from the polylines were isolated using the FEATURE VERTICES TO POINTS tool, selecting the "both start and end vertices" parameter.

3. Endpoints of the polylines were connected using the POINTS TO LINES tool.

4. Outlines were converted to polygons using the FEATURE TO POLYGON tool. The polygon dataset is now complete.

5. Map generated from polygon attributes (Area) to represent “Area of ice retreated” across Greenland:

6. The tool “HOTSPOT ANALYSIS” was used to understand spatial trends in glacier retreat, yielding a predictable result:
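Steps 1-4 above amount to closing a ring from the two termini and reading off its area; a hedged numpy sketch with made-up terminus coordinates:

```python
import numpy as np

def shoelace_area(ring):
    """Area of a simple polygon given its (x, y) vertices in order."""
    x, y = ring[:, 0], ring[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Hypothetical termini: the 2005 line and the retreated 2017 line (km)
t2005 = np.array([[0.0, 0.0], [1.0, 0.2], [2.0, 0.0]])
t2017 = np.array([[0.0, 1.0], [1.0, 1.2], [2.0, 1.0]])

# Close the ring: walk along 2005, then back along 2017 in reverse
ring = np.vstack([t2005, t2017[::-1]])
area_km2 = shoelace_area(ring)
```

As in the ArcGIS workflow, this only behaves when the two termini do not cross; crossing lines would make the ring self-intersecting and the area misleading.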

Approach#1 Results and Critique:

I was not happy with this approach for several reasons. First, it required several parameters to be just right in order for a constructed polygon to accurately represent an area of retreat. The two terminus lines must be entirely separate (i.e., the lines from T1 and T2 can never cross), otherwise they will create multiple polygons and overestimate the area retreated. As a result, glaciers that did not retreat significantly always received a larger estimate than they should have. Also, glaciers in unique geographic locations (like curved valleys) would not retreat straight back, but perhaps around a corner, causing termini at T1 and T2 to sit ~90 degrees from one another. For these glaciers, straight lines between terminus edges would clearly cut off parts of the retreated area, and the estimated area clearly undershot the true area retreated. The screenshots above were cherry-picked from a cooperative glacier; in reality, about 30-40% of my calculations seemed to significantly overshoot or undershoot what I believed to be the true area lost.

Second, the AREA lost in and of itself seemed to be a substandard parameter to calculate. Wider glaciers, which create longer termini, would always produce larger polygons. On a relative scale, this makes it seem like only the widest glaciers retreated significantly, whereas smaller, thinner glaciers only lost a minuscule amount. This makes the two maps above deceiving: glaciers in the north of Greenland were much larger and had much longer terminus lines, so their retreat polygons came out massive. Along the east and west of Greenland, there appeared to be very many smaller glaciers, but their polygons came out very tiny. Comparing raw data of a few huge retreats and hundreds of small retreats made the hotspot analysis entirely skewed to the north. I could normalize for this if I wanted to, but in exercises 2 and 3 I want to work with all of these glaciers individually when compared to local glacier velocity, so I want to keep their "raw data" as intact as possible. (It is worth noting that glacial geologists usually use the distance of a glacier's retreat, in conjunction with the movement of the equilibrium line, to estimate temperature changes; they do not care about area.)

Finally, I don’t like this approach because the hotspot analysis makes it look like the ONLY location with significantly retreating glaciers is northern Greenland. While this could indeed be the region experiencing the most calving and ablation at the terminus, it does not even remotely resemble the Greenland mass anomalies detected with the GRACE tandem satellites (shown below). Of course, this brings up the question: how exactly do glaciers retreat? While one obvious and dramatic mechanism is calving, which occurs at the terminus, there are very significant mass fluxes from surface water melt and subaerial melt, which don’t necessarily occur at the terminus. The discrepancy between these two datasets makes me wonder if (climatically) short-term changes in terminus position are dominated by calving and subaqueous frontal melt, while the total mass flux of a glacier (including surface and subaerial melt) takes slightly longer timescales to kick in and be “felt” at the terminus line. In other words: the equilibrium line of a glacier, and thus its terminus position, is ultimately controlled by glacier mass balance. However, glaciers that experience particularly vigorous calving might appear to retreat faster than their surface-melting counterparts. Those melt processes might take longer to be expressed at the terminus line, and might manifest more immediately as glacier THINNING. Perhaps glaciers in the north of Greenland experience more calving, whereas glaciers in central and south Greenland experience more surface and subaerial melt. It is not a question I can tackle with the current project and datasets, but a very interesting one nonetheless.


Approach #2- Average Distance Retreated:

ArcGIS Pro Tools Used:

Feature Vertices to Points, Near, Summary Statistics, Join Field, Hotspot Analysis, Spatial Autocorrelation (Global Moran’s I)


Data was manipulated as such:


1. The tool FEATURE VERTICES TO POINTS was used once again on a single dataset (in this case 2005 rather than 2017). This time, the parameter “All Vertices” was selected, to convert virtually the entire polyline into points.

2. The NEAR tool was used to calculate the closest distance between each individual point on the 2005 line and the 2017 line. The blue hand-drawn lines in the figure above visually represent this process. NOTE: the distance calculated is NOT perpendicular to the 2005 line; it is merely the distance between a given point on that line and the closest point on the 2017 line. I attempted to visualize this concept above as well.

3. The SUMMARY STATISTICS tool was used to generate a table isolating the mean distance of these points for each glacier, as well as the median distance of these points for each glacier. The parameter “Case Field” was set to each glacier’s unique “glacier ID,” to ensure individual values were produced for each glacier, and not each separate point.

4. The JOIN FIELD tool was used to merge the newly created statistics table to a separate dataset containing only point locations of each glacier (created previously).

5. Maps were constructed representing the newly constructed variable: “Mean Distance Retreated” across Greenland (NOTE: a different symbology was used than in the previous method, because graduated symbol size became quite confusing to view):

6. The tool “HOTSPOT ANALYSIS” was used to understand spatial trends in glacier retreat, based on length:

7. The tool SPATIAL AUTOCORRELATION (GLOBAL MORAN’S I) was used to determine the clustered behavior of glacier distance retreated across the island:
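The core of steps 2 and 3 (the Near distances and their per-glacier averages) can be sketched outside of ArcGIS. Below is a minimal planar sketch in Python with toy coordinates, not my actual terminus data: for every vertex on the 2005 line, find the closest point on any segment of the 2017 line, then summarize.

```python
import math
from statistics import mean, median

def point_segment_dist(p, a, b):
    # Distance from point p to the segment a-b (planar coordinates, e.g. metres).
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection of p onto the segment to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def near_distances(vertices_2005, line_2017):
    # For each 2005 vertex, the closest distance to any segment of the 2017 line
    # (this mimics the NEAR tool: not perpendicular, just nearest).
    segs = list(zip(line_2017[:-1], line_2017[1:]))
    return [min(point_segment_dist(p, a, b) for a, b in segs) for p in vertices_2005]

# Toy example: a 2005 terminus sitting 100 m up-fjord of a straight 2017 terminus.
old = [(0, 100), (50, 100), (100, 100)]
new = [(0, 0), (100, 0)]
d = near_distances(old, new)
print(mean(d), median(d))  # 100.0 100.0
```

The mean and median of these per-vertex distances are what SUMMARY STATISTICS then aggregates per glacier ID.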

Approach #2 Results and Critique:

I consider this approach far superior to the previous one. The Near tool gives me a distance between termini at very close intervals that can then be averaged with summary statistics, and when going back to manually check each average “distance” it seemed far more accurate than the constructed polygons. In this approach, the map produced by the eye test alone seems to have a more even distribution, not heavily weighting everything to the north. The hotspot analysis picks up more hotspots in different locations spread across the island. While there are clearly still 3 hotspots in the north, there are also 3 in central Greenland. I am surprised a hotspot didn’t appear in NW Greenland, where there is a clear abundance of high retreat values. Moving forward, I think I will use this type of dataset to compare glacier retreat to glacier velocity in Exercise 2 and 3. This process can be replicated relatively simply on various time series throughout the study period, to get annual resolution of terminus retreat.

Does your variable A have different spatial patterns at different scales?

From the eye test, it seems there are some groups of high-retreat glaciers around the hotspots, interspersed with glaciers that are lower retreat, which indicates a somewhat clustered distribution. Furthermore, in both distance and area datasets, high retreat values appear to be clustered in northern Greenland. We can assess this further with a spatial autocorrelation tool (Global Moran’s I).

The Moran’s I index, which calculated a default neighborhood search area of 162 km, came back as 0.018, which reflects a very, very slight clustering. But with an index value so close to zero, a p value of 0.5, and a Z score of 0.6, this is nowhere near enough to reject the null hypothesis, and indicates the distribution may just be random. This is not an exhaustive analysis of the distribution of these values, just something I performed quickly to try to include last minute in Exercise 1. I hope to perform more spatial autocorrelations and more closely assess the distribution of these values, as well as their relatedness to glacier velocity, in Exercise 2.
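For intuition about what the tool is computing, here is a minimal sketch of Global Moran's I with binary distance-band weights (the analogue of the 162 km neighborhood search), run on toy values rather than my glacier data:

```python
import math

def morans_i(coords, values, band):
    # Global Moran's I with binary weights: w_ij = 1 if 0 < dist(i, j) <= band.
    # I = (n / S0) * sum_ij w_ij * z_i * z_j / sum_i z_i^2, where z = x - mean(x).
    n = len(values)
    xbar = sum(values) / n
    z = [v - xbar for v in values]
    num = s0 = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if math.dist(coords[i], coords[j]) <= band:
                num += z[i] * z[j]
                s0 += 1
    return (n / s0) * num / sum(zi * zi for zi in z)

pts = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(morans_i(pts, [10, 10, 0, 0], 1.5))   # similar neighbors -> positive (~0.33)
print(morans_i(pts, [10, 0, 10, 0], 1.5))   # alternating values -> negative (-1.0)
```

Values near zero, as in my result of 0.018, fall between these extremes and are consistent with spatial randomness.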

A Century of Oregon Salt Marsh Expansion and Contraction (Part I)


My main goal for this course is to determine the methods needed to quantify salt marsh volumetric change from 1939 to the present within Oregon estuaries. I have historical aerial photographs (HAPs) and sediment cores from five estuaries along the Oregon coast – Nehalem, Netarts, Salmon, Alsea, and Coquille. Ultimately, through a combination of rates of area change measured from the HAPs with rates of vertical accretion measured using excess 210Pb within my sediment cores, I think I will be able to develop a history of salt marsh volumetric change. (See my first blog post for more background on the significance of this research.)

Based on my goals outlined in my first blog post, Julia made a number of helpful suggestions for how to best frame my first exercise. First, she suggested that I need to begin by mapping salt marsh horizontal change from the HAPs by creating outlines of my marshes for each year I have photos. Second, Julia suggested that rather than just focusing on net area change, I also investigate portions of the salt marsh that exhibit growth or contraction as these areas might relate to larger changes within the estuary and watershed. Third, Julia suggested I first focus on one estuary to nail down my methods as a proof of concept. I can then apply my method to each individual salt marsh.  

Throughout this blog post I will reference many of the topics we have discussed in class. For instance, I will analyze hot spots of accumulation and contraction through analysis of digitized areas from georeferenced HAPs. I will briefly discuss how I hypothesize that the scale at which I digitize a salt marsh area will also change the overall area. Additionally, though I am not directly using an interpolation tool in my geospatial analysis yet, I will use interpolation between years in my initial analysis of salt marsh area change. Ultimately, I will be characterizing whether the salt marsh is a source or sink of sediment through analysis of the volumetric change.

Specific Questions

  1. What is the best method to measure change in salt marsh area from HAPs?
  2. How have different regions of the salt marsh within Alsea Bay changed in area since 1939?
  3. How do horizontal changes relate to vertical changes?


Step 1: Scan images

HAPs, which are panchromatic and roughly decadal spanning 1939 to 1991, were scanned at the University of Oregon’s Map & Aerial Photography Library. No information was provided for each photo besides the year it was taken (and sometimes the day), so the tidal stage at the time of each photo is unknown. This is frustrating because when photographs were taken at high tide, areas of the vegetated salt marsh are inundated and edges are less easily distinguished – especially within tidal creeks. Before bringing these images into ArcGIS Pro, they were cropped to remove borders and converted to TIFF files in Adobe Photoshop.

Step 2: Georeference images   

I played around with stitching the images together in Fiji (ImageJ), which is a powerful, open-source image processing program that I highly recommend. However, since the HAPs were taken from different angles, the photographs stitch together very strangely, especially for tall features such as trees, buildings, and bridges.

All images were georeferenced in ArcGIS Pro using ≥ 10 control points. With an effort to select areas evenly across the photographs, control points were preferentially placed at road intersections; however, creek intersections were also used as control points in portions of the images without roads. Comparison between HAPs indicates that tidal creeks are surprisingly stable throughout the last ~80 years in Alsea Bay. Historical aerial photographs combined with control points were fit using second-order polynomial transformations. The root mean square error (RMSE) was calculated for each georeferenced image; no map had a RMSE >7 m and the mean RMSE was 3 m.
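The per-image RMSE reported above is just the root-mean-square of the control-point residuals (the offsets remaining after the polynomial transformation is applied). A quick sketch with made-up residuals:

```python
import math

def georef_rmse(residuals_m):
    # residuals_m: (dx, dy) offsets in metres at each control point
    # after applying the fitted transformation.
    return math.sqrt(sum(dx * dx + dy * dy for dx, dy in residuals_m) / len(residuals_m))

# Hypothetical residuals at three control points:
print(round(georef_rmse([(2.0, 1.0), (-1.0, 2.0), (0.0, -3.0)]), 2))  # 2.52 (m)
```

An image with an RMSE like this would fall comfortably under my 7 m ceiling.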

Step 2.5: Try automated image classification

Initially I tried classifying images using the ArcGIS Pro Classification Wizard. To reduce the complexity and size, I first cropped the georeferenced images. Object-based supervised classification was used. Trying different degrees of spectral and spatial detail, I then trained the Classification Wizard with examples of forest, vegetated salt marsh, and mudflat/open water. One major issue observable in the classifications was that the vegetated surfaces were sometimes white, causing the software to confuse them with the white produced by light reflected off the water’s surface. Thus, the salt marsh edge is seemingly too complex for supervised image classification when images are only black and white. After receiving these poor results with every adjustment and speaking with colleagues about automated/manual classification, I decided manual digitization is my best option. (A review of the literature also indicates that others have found manual digitization preferable to supervised/unsupervised image classification when analyzing panchromatic photographs of salt marshes.)

Figure 1. Example of classified salt marsh surface (green area) compared to heads-up digitized salt marsh (blue line). Classification does not capture low marsh surface in gray.

Step 3: Heads-up digitization

Digitized areas were initially limited to regions of the estuary present within the majority of photographs. These areas were further limited to least-disturbed tidal saline wetlands from which vertical sediment accumulation rates were available. I then performed heads-up digitization of the edges between unvegetated mudflat and vegetated marsh using the create polygon tool… Because trees and their shadows obscure land-ward edges, these boundaries were digitized using a combination of PMEP’s (Pacific Marine & Estuarine Fish Habitat Partnership) “West Coast USA Estuarine Biotic Habitat” maps and the georeferenced aerial photography.

Step 4: Observe changes

Preliminary analysis of horizontal changes was observational. I additionally plotted net changes in salt marsh area over time for each major salt marsh complex.   

Figure 2. Digitized Alsea Bay salt marshes. Red is 1939, orange is 1952, yellow is 1959, green is 1972, blue is 1982, and navy is 1991.


In response to question 1, heads-up digitization is much better suited to producing salt marsh outlines than an automated image classification method, such as those available within the ArcGIS Pro Classification Wizard. This result is unfortunate because heads-up digitization is obviously very time consuming and strenuous on your eyes (my right eye started to twitch) and mouse hand. Supervised image classification may allow for an initial distinction between unvegetated mudflat and vegetated marsh surface. Classified images and georeferenced aerial photography could then be referenced in combination to digitize the marsh boundaries.

Figure 3. Changes in salt marsh area based on changes between HAPs and net changes since 1939 for the three islands and the total area.

A preliminary analysis of the digitized layers shows hot spots of salt marsh growth and contraction. For instance, the upstream, easternmost salt marsh complex has shown very little change over the last observed record, though over the total record, its area has decreased by ~4%. Comparatively, after first experiencing a ~5% decrease in size in the 1940s, the large, westernmost island increased in area by ~10% from 1952 to 1959, then, after modest growth in the 1960s to 80s, declined to end the record in the early 1990s only ~3% above its original 1939 area. The salt marsh fringing the NE portion of the bay additionally observed a modest 3% increase in area in the 1950s, but then contracted in size by ~13% in the mid-1960s, and expanded again by ~10% and ~5% between the 1970s and 1980s, respectively. This area observed a net growth of ~4% from 1939 to 1991.

The wet phase of the Pacific Decadal Oscillation (PDO) from 1944 to 1977 has been observed to increase peak river flows (e.g., Wheatcroft et al. 2013). Coincidently logging intensified in Lincoln county during the same period (Andrews & Kutura 2005). These events may have contributed to increased area of the large, west-most island in that time frame. The contraction of the salt marsh fringing the NE portion of the bay in the mid-1960s to 70s is unclear but perhaps related to increased wake from boats in that area.    

A preliminary comparison between histories of sediment accumulation and marsh growth indicate very similar patterns. For instance, the sediment core collected from the small island shows very little change in accumulation over the last century. Additionally, the sediment core collected from the large island exhibits a period of rapid accumulation in the 1960s. Vertical accumulation increased in the 1950s in the core collected from the fringing marsh, also similar to its horizontal accumulation record.     

Figure 4. Example comparison between history of discharge and timber harvest, sediment core data (mass accumulation rate – brown, dry bulk density – navy), and horizontal change.

Based on the net changes in area for the three salt marsh complexes of interest and the net vertical sediment accumulation from three cores collected within each marsh, the volumetric changes for the small island, large island, and fringing marsh are -160, 220, and 250 m3, respectively. With an estimated dry bulk density of ~0.5 g cm-3, these values equate to ~ -80, ~110, and ~125 t of sediment trapped from 1939 to 1991 on the small island, large island, and fringing marsh, respectively. The Alsea Bay salt marshes have thus remained a net sink during this timeframe. I will improve these volumetric estimates by incorporating more sediment core data and by comparing the history of volumetric change with the sediment discharge record for the Alsea River, thereby creating estimates of trapping efficiencies for the estuary.


I am foremost interested in assessing changes within the high marsh extent of my salt marshes because these are the areas considered in quasi-equilibrium with relative sea level rise. As a bit of background, intertidal areas are typically divided between unvegetated mud flats (inundated most frequently and exposed only at low tide), low marsh (inundated less frequently with high tides), and high marsh (inundated the least frequently with only high higher high tides). However, in Oregon, high marsh and low marsh are both densely vegetated and look very similar in panchromatic aerial photographs. This represents one of the first major downsides to analyzing salt marsh change from HAPs – low marsh and high marsh must be considered together regardless of the research question. Fortunately, Oregon salt marshes typically exhibit very little low marsh area so my estimates are hopefully not too different from what they would be if I could distinguish between the two habitat classes.

Another major issue with analyzing HAPs is that the landward edge between high marsh and forest is difficult to distinguish due to the trees obscuring the edge. Depending on the location of the plane when taking the photographs and the time of day, the edge is either obscured by the trees themselves or by the shadows they cast. Without ground truthing, it is difficult to speculate how much this impacts estimates of the landward edge; fortunately for me, however, Oregon salt marshes typically extend to the base of Coast Range hills and so it is unlikely that there has been much landward migration of salt marshes over the last century. (This process is called coastal squeeze, and it poses a serious threat to long-term survival of Oregon salt marshes under accelerated sea level rise.)    

The method of heads-up digitization, while straightforward, is time consuming; it nonetheless seems best suited to digitizing panchromatic images. Unfortunately, I have not yet figured out the best method of assessing error associated with heads-up digitization. Additionally, I have not assessed the impact of scale on changing area. I have digitized my salt marshes at the smallest spatial scale possible for each photo. I hypothesize that smaller spatial scales would result in decreased area, and hence volume, as more and more tidal creeks are incorporated. The highest quality HAPs are from 1939 and 1952. This is something I will have to think more about in the future.

A serious issue in sediment stratigraphy is the inability to easily observe reductions in accumulation, hiatuses, and periods of erosion. Thus comparison between the HAPs and core data is much clearer for periods of increased accumulation, as I have done. Though not much can be done to remedy this issue (think of publication impact factor if I could!), I will continue to acknowledge it.  

Remaining questions and future directions

My immediate questions are:

  1. How should I include a modern-day layer? Just outline the satellite basemap in ArcGIS Pro? (Hand digitization would be more precise than PMEP’s layers.)
  2. How should I measure horizontal change more robustly – i.e. with better statistics?
  3. How should I estimate error associated with heads-up digitization?
  4. How should I be digitizing channels?

Going forward I plan to answer these questions; depending on how tricky they prove (simple questions always seem to be the hardest to answer), I plan to focus on at least questions 1, 2, and 3 for my next exercise. To answer question 1, the USGS Digital Shoreline Analysis System (DSAS; Himmelstoss et al. 2018) seems very promising, so I will focus on getting my Alsea data into this program. DSAS requires error estimates for each shoreline layer and suggests a number of resources, including Ruggiero et al. (2013), that I will investigate.


Andrews, A., & Kutura, K. Oregon’s timber harvests: 1949 – 2004. (2005). Oregon Department of Forestry. Salem, OR.

Himmelstoss, E.A., Henderson, R.E., Kratzmann, M.G., & Farris, A.S. (2018). Digital Shoreline Analysis System (DSAS) version 5.0 user guide: U.S. Geological Survey Open-File Report 2018–1179, 110 p., ofr20181179.

Ruggiero, P., Kratzmann, M. G., Himmelstoss, E. A., Reid, D., Allan, J., & Kaminsky, G. (2013). National assessment of shoreline change: historical shoreline change along the Pacific Northwest coast. US Geological Survey.

Wheatcroft, R. A., Goñi, M. A., Richardson, K. N., & Borgeld, J. C. (2013). Natural and human impacts on centennial sediment accumulation patterns on the Umpqua River margin, Oregon. Marine Geology, 339, 44-56.

Ex1. Wk2 Lab

What kinds of attraction/repulsion processes might occur in your variable? 

Due to the different soil characteristics of various regions in Oregon, landslides mainly occur in certain specific areas. The points in my dataset represent locations where landslides occurred. In Exercise 1, these points attract one another within high-risk areas.

Furthermore, the point data and the landslide susceptibility maps are also associated: the red and blue high-risk areas attract the landslide points, while the green low-risk areas repel them.

What kinds of source/sink processes might occur in your variable? 

After generating the hot spot map, differing soil characteristics appear to be the source process for this variable: some soils are more prone to landslides, so high-risk areas act as sources that generate landslides. In Exercise 2, the spectral differences of high-risk areas will be analyzed.

Does your variable A have different spatial patterns at different scales?

At the original spatial scale, the point data appear scattered; after hot spot analysis, clusters of points emerge.

Gray Whale Foraging and Zooplankton Density

Research Question

The Pacific Coast Feeding Group (PCFG) is a subgroup of the Eastern North Pacific (ENP) population of gray whales (Scordino et al. 2018). The ENP, a population of 24,000-26,000 individuals, migrates along the U.S. west coast from breeding grounds in the lagoons of Baja California, Mexico to feeding grounds in the Bering Sea (Omura 1988). The PCFG, currently estimated at 264 individuals (Calambokidis et al. 2017), stray from this norm and do not complete the full migration, instead choosing to spend their summer feeding months along the Pacific Northwest coast (Scordino et al. 2011). Since gray whales as a species already exhibit specialization (by being the only baleen whale that benthically forages) and since the PCFG display a second tier of specialization by not using the Bering Sea feeding grounds, it seems plausible that individuals within the PCFG might have individual foraging specializations and preferences. Therefore, my research aims to investigate whether individual gray whales in Port Orford exhibit individual foraging specializations. Individual foraging specializations can occur in a number of different ways including habitat type (rocky reef vs sand/soft sediment), distance to kelp, time and distance of foraging bouts, and prey type and density. For this class, my research question is whether prey quantity and/or community drives whale foraging.


Prey data

The prey data has been obtained through zooplankton tow net samples from a research kayak in the summers of 2016, 2017, 2018 and 2019. Kayak sampling effort varies widely among the four years due to weather creating unsafe conditions for the team to collect samples. These samples have been sorted and enumerated to the zooplankton species level so that for each day when a prey sample was collected, we have known absolute abundances of prey species communities at each sampling station. Additionally, a novel GoPro method has been used to quantify relative zooplankton density at the same sampling stations.

Whale data

The whale data is in the form of theodolite tracklines of gray whales that used the Port Orford study area during the summers of 2016, 2017, 2018 and 2019. Since whale tracking occurs at the same sites as prey sampling, we are able to map the prey community present at a particular location that whales forage at. The tracklines occur on a very fine spatial resolution as the study area is approximately 2.5 km in diameter, though some of the tracklines extend out to approximately 8 km offshore. Furthermore, as whales forage in the area, photographs are taken of each individual in order to match the trackline with a particular individual. This way, potential individual specializations may be detected if there are repeat tracklines of an individual. These tracklines have been analyzed using Residence in Space and Time, which assigns a behavior to each spatial point. The behaviors are broken down into three categories: foraging, searching, and travelling. Each of these behaviors is assigned a residual value (-1, 1, 0, respectively).


Gray whales will prefer areas with the densest prey, regardless of the community composition of prey found at different locations.


Daily interpolation layers will be created for prey density between the two sites, Mill Rocks and Tichenor Cove. These interpolations will then also be weighted according to the percentage that mysids and amphipods (the two main taxonomic prey groups) made up the community at each location. This will result in 3 layers per day: an overall interpolated prey density layer, an interpolated layer weighted by mysid abundance, and an interpolated layer weighted by amphipod abundance. Once these layers have been created, the RST analyzed whale tracks will be overlaid onto them. Statistical analyses (likely linear mixed models) will be used to determine whether overall prey density or specific prey communities drive gray whale foraging by relating the interpolated layers with the behavior residual values.
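One simple way to build such layers is inverse-distance weighting (IDW) between the station samples, with each taxon-weighted layer obtained by multiplying a station's density by the fraction of mysids (or amphipods) in its sample before interpolating. A toy sketch with two hypothetical stations standing in for Mill Rocks and Tichenor Cove (coordinates and densities are made up):

```python
def idw(x, y, stations, power=2):
    # Inverse-distance-weighted estimate at (x, y).
    # stations: list of ((sx, sy), density) pairs.
    num = den = 0.0
    for (sx, sy), v in stations:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0:
            return v  # exactly at a station
        w = 1.0 / d2 ** (power / 2)
        num += w * v
        den += w
    return num / den

stations = [((0.0, 0.0), 40.0), ((2.0, 0.0), 10.0)]  # hypothetical densities
mysid_frac = [0.8, 0.2]  # hypothetical share of each sample made up of mysids

overall = idw(1.0, 0.0, stations)
mysid = idw(1.0, 0.0, [(p, v * f) for (p, v), f in zip(stations, mysid_frac)])
print(overall, mysid)  # 25.0 17.0
```

At the midpoint both stations get equal weight, so the overall layer averages the two densities while the mysid-weighted layer averages the mysid-scaled densities; the real daily layers would do this over a grid spanning both sites.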

Expected Outcomes

There will be many interpolated layers of prey density and weighted prey density. Furthermore, there could be some plots of overlaid tracklines showing where whales prefer to forage and search vs travel.


This spatial problem is important to science since genetic evidence suggests that there are significant differences in mtDNA between the ENP and PCFG (Frasier et al. 2011; Lang et al. 2014), and therefore it has been recommended that the PCFG should be recognized as being demographically independent. In the face of a proposed resumption of the Makah gray whale hunt as well as increased anthropogenic coastal use, there is a strong need to better understand the distribution and foraging ecology of the PCFG. This subgroup has an important economic value to many coastal PNW towns as many tourists are interested in seeing the gray whales. Therefore, understanding what drives their distribution and foraging habits will allow us to properly manage the areas where they prefer to forage. 


I have novice/working knowledge of Arc-Info and Modelbuilder. I have never used Python before. I am proficient in R and image processing.

1a. What is A?

A is the behavior displayed by a gray whale at a certain location.

1b. What is B?

B is the density of prey as well as the weighted density of mysids and amphipods.

1c. What is the relationship you want to test between B and A?

Whether prey quantity and/or species drives whale foraging at a certain location.

1d. Why or how does B cause/influence A? What is your hypothetical explanation for why or how B causes A (this is mechanism C)?

Gray whales, like most baleen whales, have a breeding season which they spend in warm waters and a feeding season which is spent in colder, more productive waters. For a subgroup of the Eastern North Pacific gray whale population, this summer feeding season is spent along the NW U.S. coastline (northern California, Oregon, Washington and southern British Columbia). The activities undertaken during each season are so distinct that no feeding is undertaken during the breeding season, and vice versa. As such, gray whales must regain 11-29% of critical body mass during the feeding season in order to prepare themselves for the breeding season during which they do not feed. Therefore, it seems logical to assume that their distributions and movements should be entirely dictated by where their zooplankton prey is and also, where it is the most abundant or most dense, since their sole purpose during the summer is to feed.

1e. Now, write your question in this form: “How is [the spatial pattern of] A related to [the spatial pattern of] B via mechanism C?”

How is gray whale behavior (specifically foraging) related to the spatial pattern of prey density via optimal foraging theory?

Literature Cited

Calambokidis, J. C., Laake, J. L. and A. Pérez. 2017. Updated analysis of abundance and population structure of seasonal gray whales in the Pacific Northwest, 1996-2015. Draft Document for EIS. 

Frasier, T. R., Koroscil, S. M., White, B. N. and J. D. Darling. 2011. Assessment of population substructure in relation to summer feeding ground use in the eastern North Pacific gray whale. Endangered Species Research 14:39-48.

Lang, A. R., Calambokidis, J. C., Scordino, J., Pease, V. L., Klimek, A., Burkanov, V. N., Gearin, P., Litovka, D. I., Robertson, K. M., Mate, B. R., Jacobsen, J. K. and B. L. Taylor. 2014. Assessment of genetic structure among eastern North Pacific gray whales on their feeding grounds. Marine Mammal Science 30(4):1473-1493.

Omura, H. 1988. Distribution and migration of the Western Pacific stock of the gray whale. The Scientific Reports of the Whales Research Institute 39:1-9.

Scordino, J., Bickham, J., Brandon, J. and A. Akmajian. 2011. What is the PCFG? A review of available information. Paper SC/63/AWMP1 submitted to the International Whaling Commission Scientific Committee.

Scordino, J., Weller, D., Reeves, R., Burnham, R., Allyn, L., Goddard-Codding, C., Brandon, J., Willoughby, A., Lui, A., Lang, A., Mate, B., Akmajian, A., Szaniszlo, W. and L. Irvine. 2018. Report of gray whale implementation review coordination call on 5 December 2018. 

My Spatial Problem _ An analysis of health patterns in Pakistan


Agricultural crop residue burning is a common practice among farmers in many developing countries including Pakistan. It is used to clear the fields of stalks left behind after the use of Combines to cut the harvest. For most farmers, it serves as a low-cost alternative to prepare fields for planting because the fires remove most post-harvest vegetative material and reduce the risk of pest and disease. However, this practice also contributes to air pollution by increasing emissions of fine particulate matter (PM2.5). Smoke from these fires can trigger and exacerbate respiratory diseases especially among children (Chakrabarti et al., 2019; Zhuang et al., 2018; Awasthi et al., 2010; Balmes, 2010). PM2.5 exposure has been linked to increased hospitalizations and asthma related emergency visits as well as reduced life expectancy (Chen et al., 2017; Adar et al., 2013).

Research question: Is there any association between individual-level health outcomes and exposure to the pollution source, i.e., the burning of agricultural crop residue?

Data: There are two main sources of data. The first is a nationwide household survey data for Pakistan for the three years 2011, 2013, 2014. Around 8000 households were repeatedly surveyed across the three years and location data in the form of latitude and longitude is available.

Individual responses reported within the health module are used to measure health outcomes – specifically type of illness (if any) experienced in the last 2 months.

The data on agricultural residue crop fires is from NASA’s Moderate-Resolution Imaging Spectroradiometer (MODIS) satellite data on active fires. Daily fire (point) data for Pakistan is available and a global landcover raster can be used to see which of the fires are occurring over agricultural land.

Land cover raster data from the European Space/Climate Change Initiative (ESA/CCI) with a 300m spatial resolution was used to identify fires occurring on cropped land.

Hypotheses: I hypothesize that respiratory illnesses are spatially correlated and associated with higher frequency of exposure to crop fires.

Approaches: I would like to use clustering/hotspot analysis to assess possible spatial correlation, but my concern is that for the entire country it may not be useful, since the exhibited pattern would mirror population distribution and not provide useful results.

I hope to use buffer analysis to measure proximity to fires. To empirically test the association between crop fires and health outcomes I would use the following regression estimation:

H_it = β1(Exposure to crop fires)_it + β·Z_it + ε_it

where H_it = 1 if individual i reported experiencing respiratory illness in the 2 months prior to the interview at time t, and 0 otherwise. Exposure is the number of fires within a buffer radius over the past 3 months, Z_it represents a vector of individual- and household-level controls, and ε_it is the idiosyncratic error term. I can run three estimations based on buffer distances of 5 km, 10 km and 15 km.
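Building the exposure regressor from the MODIS fire points amounts to counting detections within each buffer distance of a household. A minimal sketch using the haversine great-circle distance (the household and fire coordinates below are hypothetical):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in km between two (lat, lon) points in degrees.
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def exposure(household, fires, radius_km):
    # Number of fire detections within the buffer radius of a household.
    lat, lon = household
    return sum(1 for flat, flon in fires
               if haversine_km(lat, lon, flat, flon) <= radius_km)

hh = (31.5, 74.3)                                   # hypothetical household
fires = [(31.52, 74.31), (31.9, 74.3), (30.0, 72.0)]  # hypothetical detections
print([exposure(hh, fires, r) for r in (5, 10, 15)])  # [1, 1, 1]
```

In practice the fire points would first be filtered to those falling on cropped land using the ESA/CCI land cover raster, and to the 3-month window before each interview.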

However, my concern is again that number of fires may not be a meaningful assessment of exposure. At present, I do not have another way to measure impact of agricultural crop residue burning.

Expected Outcome: I expect that health outcomes are spatially correlated and in a way that can be explained by patterns of agricultural crop fires.

Significance: The proposed analysis would examine individual level health outcomes in regard to what is a prominent source of pollution in South Asia – Agricultural crop residue burning. Political inaction on this issue can to some degree, be tied to the lack of rigorous evidence on the health effects of this practice. While this study would only look at the immediate, short-term impact of exposure, any significant results would show that there is in fact a causal association of burning crop residue on the health of individuals living within some proximity of the fires. Such results may contribute to policy emphasis on addressing a potential environmental concern and to the formulation of agricultural policy that may help farmers to switch to alternative harvesting practices.

Level of Preparation: ArcMap – working knowledge; R – between novice and working knowledge; Python – no experience.

I am proficient in Stata and have been using that for data manipulation and basic spatial commands (distance calculations).

Concerns: Most studies have used a fire "event", which makes it easier to assess impact across time and areas (control and treatment). Agricultural crop fires are a continuous phenomenon across the year, though there is a spike in the post-harvest period (figure below). I am concerned about the time dimension of my data.


Adar, S. D., Sheppard, L., Vedal, S., et al. (2013). Fine particulate air pollution and the progression of carotid intima-medial thickness: a prospective cohort study from the Multi-Ethnic Study of Atherosclerosis and Air Pollution. PLoS Medicine, Vol. 10.

Awasthi, A., Singh, N., Mittal, S., Gupta, P. K., & Agarwal, R. (2010). Effects of agriculture crop residue burning on children and young on PFTs in North West India. Science of the Total Environment, Vol. 408: 4440-45.

Balmes, J. R. (2010). When smoke gets into your lungs. Proceedings of the American Thoracic Society, Vol. 7(2): 98-101.

Chakrabarti, S., Khan, M. T., Kishore, A., Roy, D., & Scott, S. P. (2019). Risk of acute respiratory infection from crop burning in India: estimating disease burden and economic welfare from satellite and national health survey data for 250,000 persons. International Journal of Epidemiology, pp. 1113-1124.

Chen, L., Li, C., Ristovski, Z., et al. (2017). A review of biomass burning and impacts on air quality, health and climate in China. Science of the Total Environment, Vol. 579.

Zhuan, Y., Chen, D., Li, R., et al. (2018). Understanding the influence of crop residue burning on PM2.5 and PM10 concentrations in China from 2013 to 2017 using MODIS data. International Journal of Environmental Research and Public Health, Vol. 15.

Unmanned aerial vehicle (UAV)-based photogrammetric approach to delineate tree seedlings

A description of the research question that you are exploring.

The use of unmanned aerial vehicle (UAV) platforms has expanded rapidly in forestry remote sensing applications, from stand-level to individual-tree-level measurement (i.e., precision forestry, forest management planning, biomass estimation, and modeling forest growth) (Koch et al., 2006). Among those applications, acquisition of three-dimensional (3D) structural data from high-resolution imagery is a novel and powerful approach made possible by advances in UAV technology (Puliti et al., 2015; Zarco-Tejada et al., 2014). Photogrammetrically derived tree canopy/crown information can be used to detect individual trees and, specifically, to monitor tree structure (St-Onge, 1999; Jozkow et al., 2016).

In this study, I am interested in observing how tree crown completeness varies as a function of the number of points arranged in the 3D point cloud. Since voxel size is important for visualizing tree architecture, the number of points per unit volume (voxel size) is considered an essential measurement in photogrammetric analysis, especially for mapping the 3D structure of trees or determining the desired image quality of tree architecture (Fig. 1). Estimating the optimal number of points or voxels required to delineate tree structure is important in terms of time and cost. In this study, I seek to determine the threshold value for voxel size (or point cloud density) associated with the crown completeness/desired image quality that can explain the architecture of tree seedlings (Fig. 2(a)).

Figure 1: Changing of tree architecture or the crown description with voxel size. (source: Hess et al., 2018).

A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

For this study, I will use a hexacopter unmanned aircraft system (UAS) equipped with a high-resolution multispectral camera with five channels (red, blue, green, near-infrared, and red edge) to acquire imagery of the study area. The spatial resolution of the multispectral sensor is 2.5 cm, and imagery collection for this study will be based on a single day of data collection (temporal resolution). The study extent is approximately four acres, located in Benton County, Oregon, USA. The initial data (raw images) collected from the UAV flights will be processed in Agisoft Metashape. Additionally, I will produce a 3D point cloud using a structure-from-motion algorithm in Agisoft Metashape. These products will be used for spatial and statistical analyses in RStudio, ArcGIS Pro, Agisoft Metashape, etc.

Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

In this study, I am interested in observing how voxel size varies as a function of point cloud density and how voxel size affects the desired image quality (Fig. 2(b)). With this intention, I expect to find a relationship between voxel size and the desired image quality of the tree architecture (Fig. 2(a)), where increasing the voxel size changes the desired image quality (Fig. 1).

Therefore, I hypothesize that there is a threshold voxel size for tree crown completeness/desired image quality of tree architecture, and I expect the desired image quality of the tree architecture/crown description to decrease beyond this threshold.

Figure 2: (a) Schematic diagram showing desired image quality (crown completeness) with respect to the voxel size for tree seedling (*the shape of this distribution may be changed). (b) Schematic diagram showing the overview of the voxel model. This represents the relationship of point cloud data to the voxel model (Source: Fujiwara et al., 2017).

Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.

Orthomosaic images can be used to detect the centroid and canopy cover of each tree. I will perform a supervised classification to identify seedlings and other land cover types (alternatively, a binary classification separating seedlings (1) from other land cover (0)). Approximately 70% of the pixels will be used as the training data set, and the remaining 30% as validation data. Random Forest (RF) and Support Vector Machine (SVM) classifiers will be used for the classification. I will then produce confusion matrices to evaluate the performance of the RF and SVM classifiers and identify individual seedlings and their canopy cover in the initial stage. Additionally, the location of each centroid will be assessed.
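A minimal sketch of this classification step with scikit-learn, using synthetic five-band pixel spectra in place of the real training polygons (the band values and the NIR separation are invented for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score

rng = np.random.default_rng(42)

# Synthetic 5-band pixel spectra: class 1 = seedling, class 0 = other cover.
n = 500
X0 = rng.normal(0.2, 0.05, size=(n, 5))   # background reflectance
X1 = rng.normal(0.2, 0.05, size=(n, 5))
X1[:, 3] += 0.3                           # seedlings brighter in the NIR band
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

# 70/30 train/validation split, as described above.
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

accs = {}
for name, clf in [("RF", RandomForestClassifier(random_state=0)),
                  ("SVM", SVC(kernel="rbf"))]:
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_va)
    accs[name] = accuracy_score(y_va, pred)
    print(name, accs[name])
    print(confusion_matrix(y_va, pred))
```

The same train/fit/confusion-matrix pattern would apply once the real labeled pixels replace the synthetic spectra.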

Further, delineation of individual tree seedlings can be done using the point cloud data. I will vary the spatial arrangement of points as a function of data processing (i.e., by changing the parameters used to generate the point clouds) and characterize crown completeness with respect to the number of points and the voxel size (Fig. 3).

Figure 3: Voxel resolution and point cloud binning process for a square field plot clipped from the ALS data. (Source: Almeida et al., 2019).
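The binning shown in Figure 3 can be sketched in a few lines: points are assigned to cubic voxels by integer division, and the number of occupied voxels is tracked as voxel size shrinks. The point cloud here is random, standing in for a seedling crown exported from Agisoft Metashape.

```python
import numpy as np

def voxel_counts(points, voxel_size):
    """Bin 3D points into cubic voxels; return the number of occupied voxels."""
    idx = np.floor(points / voxel_size).astype(int)
    return len(np.unique(idx, axis=0))

# Toy point cloud: 1,000 random points in a 2 m cube (stand-in for a crown).
rng = np.random.default_rng(1)
pts = rng.uniform(0.0, 2.0, size=(1000, 3))

for v in (1.0, 0.5, 0.25, 0.1):
    occ = voxel_counts(pts, v)
    total = int((2.0 / v) ** 3)
    print(f"voxel {v} m: {occ}/{total} voxels occupied")
```

Plotting occupied-voxel fraction against voxel size for a real seedling cloud would give the completeness curve sketched in Fig. 2(a).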

Moreover, I am interested in studying the vitality of tree seedlings based on the point cloud map, together with copula modeling of the multispectral data, to evaluate seedling structure/architecture.

Expected outcome: what do you want to produce — maps? statistical relationships?

From this study, I would like to identify the spatial distribution of seedlings and their centroid locations using UAV remote sensing techniques. I will produce maps of the spatial distribution of seedlings and their canopy cover. Additionally, I expect to define a threshold voxel size for determining the seedling architecture/crown description. Finally, I will produce a vitality map of the seedlings using the point cloud results together with copula modeling.

How is your spatial problem important to science? to resource managers?

In sustainable forest management practices, estimation of macroscopic, tree-level attributes, including tree counts, locations of individual trees, tree crown architecture, and tree heights, is essential for monitoring forest regeneration (Strigul, 2012; Mohan et al., 2017). In particular, assessing the vitality of seedlings is an important task for forest managers. With the proposed method, seedlings can be characterized using remote sensing data cost-effectively and accurately compared to conventional field surveys, which are expensive and time-consuming. Another strength of this study is that the proposed method can be applied to collect periodic data on individual tree seedlings, which helps in understanding seedling growth and vitality over time, another essential attribute for forest management practices.

Level of preparation:

(a) Arc-Info: I have experience in ArcMap gained from previous classes and some research projects.

(b) Modelbuilder and/or GIS programming in Python: no previous experience.

(c) R: substantial experience with statistical analysis, but limited knowledge of spatial analysis.

(d) Image processing: I have gained some experience with Google Earth Engine and ENVI in my previous classes.

(e) Other relevant software: Agisoft Metashape


Almeida, D. R. A. D., Stark, S. C., Shao, G., Schietti, J., Nelson, B. W., Silva, C. A., … & Brancalion, P. H. S. (2019). Optimizing the remote detection of tropical rainforest structure with airborne lidar: Leaf area profile sensitivity to pulse density and spatial sampling. Remote Sensing, 11(1), 92.

Fujiwara, T., Akatsuka, S., Kaneko, R., & Takagi, M. (2017). Construction Method of Voxel Model and the Application for Agro-Forestry. Internet Journal of Society for Social Management Systems, Vol. sms17.

Hess, C., Härdtle, W., Kunz, M., Fichtner, A., & von Oheimb, G. (2018). A high‐resolution approach for the spatiotemporal analysis of forest canopy space using terrestrial laser scanning data. Ecology and evolution, 8(13), 6800-6811.

Jozkow, G., Totha, C., & Grejner-Brzezinska, D. (2016). UAS topographic mapping with Velodyne LiDAR sensor. ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences, 3(1).

Koch, B., Heyder, U., & Weinacker, H. (2006). Detection of individual tree crowns in airborne lidar data. Photogrammetric Engineering & Remote Sensing, 72(4), 357-363.

Mohan, M., Silva, C. A., Klauberg, C., Jat, P., Catts, G., Cardil, A., … & Dia, M. (2017). Individual tree detection from unmanned aerial vehicle (UAV) derived canopy height model in an open canopy mixed conifer forest. Forests, 8(9), 340.

Puliti, S., Ørka, H. O., Gobakken, T., & Næsset, E. (2015). Inventory of small forest areas using an unmanned aerial system. Remote Sensing, 7(8), 9632-9654.

St-Onge, B. A. (1999). Estimating individual tree heights of the boreal forest using airborne laser altimetry and digital videography. International Archives of Photogrammetry and Remote Sensing, 32(part 3), W14.

Strigul, N. (2012). Individual-based models and scaling methods for ecological forestry: implications of tree phenotypic plasticity. Sustainable forest management, 359-384.

Zarco-Tejada, P. J., Diaz-Varela, R., Angileri, V., & Loudjani, P. (2014). Tree height quantification using very high resolution imagery acquired from an unmanned aerial vehicle (UAV) and automatic 3D photo-reconstruction methods. European journal of agronomy, 55, 89-99.

Spatial Problem Blog Post by Hongyu Lu

  1. A description of the research question that you are exploring.

Landslides often occur in certain areas, some of which contain national highways or other frequently used roads. Once a landslide occurs, travelers must spend a lot of time detouring, which is very inefficient. In my research, I hope to find the most susceptible areas by analyzing the spatial pattern of a landslide susceptibility map and an associated heat map, mark the dangerous areas, and produce a high-risk area map.

  • A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

The dataset I will analyze consists of the Oregon statewide landslide map and landslide susceptibility map from the Oregon Department of Geology and Mineral Industries. The spatial pattern of the landslide-susceptible areas is expected to be dispersed, so further analysis of this pattern will be done to delineate high-risk areas. The high-risk landslide areas should be clustered, so they will be shown clearly.

  • Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

I expect my pattern to have hotspots. First, the spatial analysis will be run and a hexagon hotspot map produced; from this map, the areas where landslides occur most frequently can be identified.
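As a rough sketch of the hotspot step, the snippet below bins landslide points into a grid (a square grid standing in for the hexagon tessellation built in ArcGIS) and flags cells whose counts exceed the 95th percentile; the landslide coordinates are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical landslide points: background scatter plus one dense cluster.
background = rng.uniform(0, 100, size=(200, 2))
cluster = rng.normal(75, 3, size=(100, 2))
points = np.vstack([background, cluster])

# Bin points into a 10 x 10 grid (stand-in for hexagon binning).
counts, xedges, yedges = np.histogram2d(points[:, 0], points[:, 1],
                                        bins=10, range=[[0, 100], [0, 100]])

# Flag cells above the 95th percentile of counts as candidate high-risk cells.
threshold = np.percentile(counts, 95)
hot = np.argwhere(counts > threshold)
print("candidate high-risk cells (grid indices):", hot.tolist())
```

A formal hotspot statistic such as Getis-Ord Gi* (available in ArcGIS) would replace the simple percentile cutoff in the real analysis.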

  • Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.

I would like to learn classification analysis based on spectral differences. I also want to learn how to generate the most efficient alternative route if a main route is closed.

  • Expected outcome: what do you want to produce — maps? statistical relationships? other?

I want to produce two maps: first, a hotspot map, and second, a spectral classification map.

  • Significance. How is your spatial problem important to science? to resource managers?

Landslides are a very serious hazard worldwide; they not only disrupt traffic but also cause casualties and substantial economic losses. If the road or land in these areas could be strengthened, or if highways could avoid these areas, it would help a lot. So it is important to do landslide analysis.

  • Your level of preparation: how much experience do you have with (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) image processing, (e) other relevant software

ArcInfo: novice

Modelbuilder/Python: novice

R: novice

Image processing: Very basic working knowledge

I also have very basic working knowledge of Google Earth Engine and ArcGIS Pro.

A. My spatial problem: Habitat-induced GPS Fix Success Bias

My master’s thesis is studying habitat quality of black-tailed deer in western Oregon, and the link between habitat selection and survival rates.

Although there are lots of spatial and temporal relationships I want to analyze, for this blog post I will focus on the question: what environmental characteristics (topography, vegetation) are correlated with missed fixes in my study area? The extent of this problem is important to identify and account for in habitat selection studies.

An adult female Columbian black-tailed deer freshly captured and released, sporting a GPS collar programmed to collect a location every 4 hours.
Deer were captured in 4 Wildlife Management Units (WMUs) in western Oregon: the Alsea, Trask, Indigo and Dixon. Fix success test sites are located in the Indigo and Dixon WMUs, and sites reflect the variation in topography and vegetation across the entire study area. Fix success testing is still in progress at the time of this post, and the map does not include all of the test sites to be used in the final analysis.

A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

Data downloaded from Lotek 3300S GPS collars include coordinates, date, and time, and were obtained from collars deployed on black-tailed deer in 4 Wildlife Management Units (WMUs: Alsea, Trask, Indigo, Dixon) in western Oregon. Collars stay on a deer for up to 72 weeks, though deployments may be shorter if a collar malfunctions or the deer dies. Deer capture efforts lasted from 2012-2017, and I will be focusing on the winter season only (January 1 – March 31).

Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

My response variable is fix success rate (number of acquired fixes divided by number of total fix attempts) and my predictor variables are environmental characteristics.

Fix success will likely be higher where there is less obstruction to satellites. I expect that fix success will be negatively related to steep terrain, dense canopy cover, larger trees and forested land cover types. I predict that there will be a positive association between fix success and flat terrain, low canopy cover, small trees, and non-forested land covers.

Examples of predicted relationships of fix success rate with canopy cover and slope.

I am testing the magnitude of GPS bias (location error and missed fixes) by deploying GPS collars for 1 week at test sites in my study area. At the moment we are still collecting data and have one week left to complete the tests.

Test sites have known environmental characteristics and were chosen in a systematic sampling design to capture variation in topography and vegetation across the study area (n=50). Nine collars were rotated between sites over the course of 6 weeks. Measurements are taken on the ground at each site (technicians were told to focus on a 30x30m area), and variables include: canopy cover, slope, aspect, dominant land cover type, dominant tree species and their associated diameter at breast height, tree height, and whether the stand is even-aged or not. Other variables are remotely sensed, including topographic position index (10x10m) and elevation (10x10m). Collars were programmed with the same fix schedule as the collars deployed on deer: 1 fix every 4 hours.

Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.

I will determine the probability of fix success (PFIX) for each stationary collar test site using logistic regression and perform model selection using AICc. I will use the top model to predict the probability of fix success for each pixel on the landscape in my study area using remotely sensed data (GNN 30x30m; Statewide Habitat Layer 30x30m; DEM 10x10m). This "predicted fix success map" will be used as a correction factor in habitat selection analyses (also a logistic regression analysis): each deer GPS location will be weighted by the inverse of the probability of fix success (1/PFIX) for its pixel.
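A simplified sketch of the PFIX step with scikit-learn: the covariates, coefficients, and data below are invented; the real model would be fit to the stationary-collar test data and go through the full candidate set and AICc selection rather than a single model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

# Synthetic test-site data: one row per fix attempt (hypothetical values).
n = 2000
canopy = rng.uniform(0, 100, n)   # % canopy cover
slope = rng.uniform(0, 45, n)     # degrees
# Assumed true process: fix success declines with canopy cover and slope.
logit = 3.0 - 0.04 * canopy - 0.05 * slope
p = 1 / (1 + np.exp(-logit))
success = rng.binomial(1, p)      # 1 = fix acquired, 0 = missed fix

X = np.column_stack([canopy, slope])
model = LogisticRegression().fit(X, success)

# Predicted PFIX for a flat, open pixel vs. a steep, densely forested one,
# and the corresponding habitat-selection weights (1/PFIX).
pfix = model.predict_proba([[10, 5], [90, 35]])[:, 1]
print("PFIX:", pfix)
print("weights (1/PFIX):", 1 / pfix)
```

Locations in obstructed pixels (low PFIX) receive larger weights, compensating for the fixes that were likely missed there.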

Expected outcome: what do you want to produce — maps? statistical relationships? other?

I want to produce a map of predicted fix success for the study area.

Significance. How is your spatial problem important to science? to resource managers?

Although GPS technology has opened the door for researchers to gain massive amounts of information about their study systems, it is not perfect. Habitat-induced bias caused by obstruction of satellite signals produces location error and missed fixes and can bias habitat selection studies; ultimately, researchers and managers can misinterpret which habitat variables are actually being selected or avoided.

Columbian black-tailed deer population trends are declining across their range and are relatively understudied compared to other ungulate species. As an important prey item, game species, and charismatic megafauna, Oregon state wildlife managers are interested in determining causes behind the apparent population decline.

Your level of preparation: how much experience do you have with (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) image processing, (e) other relevant software

Arc-Info: ArcMap- working knowledge.

Modelbuilder and/or GIS programming in Python: none

R: Working knowledge

image processing: none

other relevant software: FRAGSTATS – very little. I've attempted using the R package landscapemetrics and the program FRAGSTATS on my own, but not very successfully.

Exploring Patterns of Past Human Behavior through the Use of 3-Dimensional Models

­Research Question

Human behavior in North America can easily be identified through the lens and in the voice of early Europeans who colonized and claimed rights over every living species on this "new found land," even fellow humans. Through this lens, the stories and lives of early Indigenous communities in the Americas have been silenced and, in the void, we can read about every "discovery" that early settlers made across this "untouched" land. For the first time, there is a chance to intensively study an area that contains some of the oldest evidence of human activity in North America, located in the state of Idaho and dating to roughly 16,000 years ago (Davis et al. 2019).

My thesis research seeks to better explain the human activities at this archaeological site in order to shed more light on how early humans interacted within their landscape through time. This will create a space for discussions on how Indigenous early settlers used their landscape for over 15,000 years prior to when European “discoveries” were made. There must be something truly unique about this area in Idaho for people to keep coming back and utilizing this landscape. Humans are, after all, animals of patterns and consistency; this is what my project in this class will be focused on.

How is the spatial distribution and patterning of artifacts related to human behavior and natural transforms at one point in time and through time at this archaeological site?

Description of Dataset

The dataset that I am working with is an artifact assemblage of 327 artifacts that have been recorded with x-y-z coordinates within a 10-meter x 5-meter x 1-meter excavation interval. The temporal setting has been calculated from charcoal remains of two hearths that date to around 12,000 years ago; everything else must be placed in a setting relative to these dates. This is only a partial segment of the archaeological site, but my hope is to create a standardized procedure that can be used for any number of artifacts, within any constraint, and at any location.


Hypothesis 1. The artifacts will exhibit significant clustering in 2-Dimensional and 3-Dimensional perspectives.

This pattern is commonly produced through human use and re-use of specific areas across the site. Utilization of space by humans leaves evidence of use in the form of features or artifacts such as hearths and stone tools.

Hypothesis 2. The clustered areas will show significant differences in artifact type (human behavior) both across the site horizontally and vertically.

  1. Differences in horizontal dispersion shows diverse site use during one point in time.
  2. Similarities in horizontal dispersion may present unsystematic variation in my dataset.
  3. Differences in vertical dispersion shows diverse site use through multiple points in time.
  4. Similarities in vertical dispersion shows similar site use through multiple points in time.

A pattern such as this is caused by segregated use of space across the site at any one point in time. For example, it would not make sense to produce stone tools near an area used for processing food, for fear of contamination. Therefore, a processing area should be indicated by specific stone tools, bones, and very little debris from stone tool production. On the other hand, a stone tool production area should contain a relatively high amount of stone debris with little evidence of food preparation.

Hypothesis 3. Using observational statistics, the physical distances between clusters and within the clusters will represent how human behavioral practices were employed at the site and how space was utilized by past people.

Hypothesis 2 addresses the similarity of artifact types within each cluster horizontally and vertically; this approach uses feature space. Hypothesis 3 seeks to address the physical spatial relationships between the clusters as separate entities and between the artifacts within each of the clusters. My thinking in this regard is: as people use space in their environment, highly organized clusters may indicate an area of primary refuse (planned use of the environment), whereas, less organized clusters may represent an area of low frequency use (not-planned use of the environment). Furthermore, the physical space between clusters may indicate how past people perceived their environments and the choices made to organize it.


I have begun to utilize spatial clustering tests on the dataset that I have available for this project. Among these, I have explored Ripley's K, the L-function, and the G-function. For calculating accurate clusters in an archaeological setting, I have also explored k-means clustering and a density-based clustering algorithm; both have advantages and disadvantages for my specific aims (e.g., Baxter 2015, Ducke 2015). For the purpose of this class, I wish to expand my knowledge and understanding of these functions, as well as of general statistical methods and procedures. My hope is to identify specific clusters within my dataset, statistically correlate the artifacts within each cluster with each other, and then correlate each cluster with the other clusters identified across the site horizontally and vertically.
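As a sketch of the density-based option in three dimensions, the snippet below runs DBSCAN on synthetic artifact coordinates in a 10 x 5 x 1 m block; the two "activity areas," the background scatter, and the eps/min_samples choices are all illustrative, not the actual assemblage.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(3)

# Synthetic artifact coordinates (m): two dense activity areas plus
# scattered background finds within the 10 x 5 x 1 m excavation block.
hearth_area = rng.normal([2.0, 1.5, 0.3], 0.2, size=(60, 3))
knapping_area = rng.normal([7.0, 3.5, 0.6], 0.2, size=(60, 3))
background = rng.uniform([0, 0, 0], [10, 5, 1], size=(30, 3))
artifacts = np.vstack([hearth_area, knapping_area, background])

# Density-based clustering in full 3D; label -1 marks unclustered noise.
labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(artifacts)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters)
print("unclustered (noise) artifacts:", int(np.sum(labels == -1)))
```

Unlike k-means, DBSCAN does not require the number of clusters in advance and leaves isolated finds unassigned, which suits the "planned vs. unplanned use" distinction in Hypothesis 3.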

Expected Outcome

As an outcome for this project, I expect to arrive at many statistical relationships, but I wish to emphasize the best ways to visualize these relationships. I seek this type of visualization because, unlike many other studies, one of the most influential factors for these relationships is human behavior and we all know how hard it can be to predict human behavior. Thus, for this reason, I want to express my results in a visual way in order to take the numbers and algorithms out of the data because computations will never fully and accurately predict human behavior or correlation. It is important to recognize the potential ethical problems with representationalism given the strict computational approach I am seeking. Therefore, to put computed numbers and a purely human visualization of data on a more even playing field, I seek both statistical relationships as well as 2-Dimensional and 3-Dimensional mappings of the data and corresponding relationships.


I listed the global significance within the ‘Research Question’ section of this post; that is, the examination of human behavior within an area that contains the oldest evidence for human occupancy in North America. In terms of this course, the significance of my proposed procedure is found within the emerging theme of 3-Dimensional representations in the archaeological science. Space has always been on the mind of archaeologists but only recently has there been the capabilities for a push in the direction of 3-Dimensional and even 4-Dimensional understandings of human behavior. This project will not only solidify an understanding in the perception of N-Dimensional space, but it will help to pave a path towards a more standardized approach in identifying past human behavior through time and space. Additionally, the way in which archaeological sites are excavated in the future will be forever changed based upon the intricacy that is required for producing the models that I seek to examine.


I would say that I am proficient at using both ArcGIS and ArcScene. I have used the two synchronously for the past couple of years within an academic setting for the production of predictive maps, simulations, and a plethora of other academic inquiries. Additionally, I have some experience working with LiDAR data within these systems. I have only used ModelBuilder within ArcGIS a couple of times and would say I am a novice at it, as well as at GIS programming in Python. I have coding experience in Java and R but very little in Python. I would say that I am proficient at utilizing the tools and at understanding and implementing ideas within R, though my code may not be as efficient as it could be. I do not have much image processing experience or experience with other software systems that could be of use during this project.


Baxter, M. J. “Spatial k-means clustering in archaeology–variations on a theme.” Academia (2015).

Davis, Loren G., David B. Madsen, Lorena Becerra-Valdivia, Thomas Higham, David A. Sisson, Sarah M. Skinner, Daniel Stueber et al. “Late Upper Paleolithic occupation at Cooper’s Ferry, Idaho, USA, ~ 16,000 years ago.” Science 365, no. 6456 (2019): 891-897.

Ducke, Benjamin. “Spatial cluster detection in archaeology: Current theory and practice.” Mathematics and Archaeology (2015): 352-368.

Matt Barker Spatial Problem

Description of research question

How is the density of woody debris related to window size and resolution? I will use window sizes at various spatial scales (e.g., 1 x 1 m, 3 x 3 m, etc.), and I will manipulate the resolution of the associated classified raster (native resolution: 0.0588 m).

Description of dataset

We acquired multispectral imagery from three UAS flights on 9/23/2019 between approximately 12:00 and 13:45 PDT. Imagery is also available from an August 2019 flight for potential temporal analysis. Resolution (EO bands) is 5.9 cm/pix. Phase I of the South Fork McKenzie Floodplain Enhancement Project covers approximately 150 acres. I have produced a shapefile of classified woody debris using a random forest supervised classification (kappa ~ 0.75).


I hypothesize wood density will decrease with coarser resolution because I will lose detail of smaller diameter wood. 

  1. I expect the spatial pattern of wood density to be clustered. There will be areas of moving water that have relatively low wood density, and still areas where wood will be deposited with high density.
  2. I anticipate this clustered pattern of wood density due to fluvial patterns of the river.


I would like to either develop a tool in ArcGIS Pro or write a script in R that adjusts the resolution of the raster and the sliding window. In this way, I will be able to simplify wood density outputs by adjusting only the resolution and window size.
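A rough sketch of that script's core, assuming the classified raster is loaded as a binary array (1 = wood, 0 = other): resolution is coarsened by block aggregation with a majority rule, and density is summarized with a sliding window. The raster pattern and the majority rule are illustrative choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def coarsen(raster, factor):
    """Aggregate a binary wood raster to coarser resolution by majority rule."""
    h, w = raster.shape
    h2, w2 = h // factor, w // factor
    blocks = raster[:h2 * factor, :w2 * factor].reshape(h2, factor, w2, factor)
    return (blocks.mean(axis=(1, 3)) >= 0.5).astype(int)

def window_density(raster, win):
    """Mean wood density within each win x win sliding window."""
    return sliding_window_view(raster, (win, win)).mean(axis=(2, 3))

# Toy classified raster: ~20% scattered wood pixels (hypothetical pattern).
rng = np.random.default_rng(0)
raster = (rng.random((60, 60)) < 0.2).astype(int)

for factor in (1, 2, 4):
    r = coarsen(raster, factor) if factor > 1 else raster
    print(f"aggregation factor {factor}: mean density {r.mean():.3f}")
```

With scattered small-diameter wood, blocks rarely reach 50% cover, so the majority rule drops them at coarser resolution, consistent with the hypothesis that density decreases as resolution coarsens.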


I want to produce a map of wood density throughout the South Fork McKenzie site at various resolutions. Additionally, I would like to analyze spatial statistics of woody debris distribution.


Woody debris is a critical component of forest ecosystems and essential for wildlife habitat, especially for fish (Howson et al. 2012). Current methods for detecting and quantifying woody debris rely on ground surveys that are expensive, time-consuming, and potentially dangerous when working around moving water. Development of remote sensing methods using relatively low-cost unmanned aircraft promises a safer, more efficient alternative compared to current ground surveys.

Level of preparation

  • ArcInfo – no experience
  • Modelbuilder – novice: some experience in GEOG 561
  • Python – working knowledge: I took the Python GIS programming course (GEOG 562), but I am a little rusty.
  • R – proficient: this is the software I use most frequently for statistical analysis, and I have done some spatial analysis with it as well.
  • Trimble eCognition – novice: I have some experience, but I may have difficulty accessing the software due to the COVID-19 outbreak.
  • Agisoft Metashape – working knowledge: I use this software frequently to produce orthomosaics and point clouds from UAS imagery.

Howson, T.J., Robson, B.J., Matthews, T.G., and Mitchell, B.D. 2012. Size and quantity of woody debris affects fish assemblages in a sediment-disturbed lowland river. Ecological Engineering 40: 144-152.

Dust and Marine Productivity

1. Research Description

My current research is focused on assessing the sources of Fe in the North Atlantic Ocean using HYSPLIT back trajectory modeling. For this class, I am going to focus on the relationship of dust deposition to marine productivity in the ocean. The dust source will be derived from HYSPLIT back trajectory modeling from NOAA, and marine productivity will be assessed from BATS (Bermuda Atlantic Time-series Study) discrete bottle data near Bermuda, using the Particulate Organic Phosphorus (POP) and Primary Production (PP) components.

The challenge is the time series and depth structure of the datasets: the three datasets have different time intervals, and for some data there is no correlation between depth and the values of POP and PP, even though POP and PP shallower than 200 meters are expected to be affected by dust deposition.

Fig 1a. The spatial pattern of POP and PP
Fig 1b. Depth pattern of POP and PP

From Fig 1a and 1b, we can see the spatial pattern of the POP and PP locations; this study will therefore focus on those hotspot locations to generate the HYSPLIT back-trajectory data.

2. Dataset

1.  POP and PP data from BATS bottle.

This dataset is derived from BATS bottle data; the temporal resolution is daily, and only the appropriate data are used for this analysis. These are in-situ data. Figure 2 shows the availability of data by date of the year.

Figure 2. Day-of-year availability of POP and PP data near Bermuda, 2012-2016

2. HYSPLIT output is vector data that will show us the source of the dust affecting marine productivity near Bermuda. The starting height of the HYSPLIT trajectories is 500 meters AGL. The back trajectories will be generated from the cluster coordinates in Fig 1 and the dates in Fig 2. The temporal resolution is hourly.

Fig 3. HYSPLIT back-trajectory data with endpoint on May 29, 2015

3. Hypotheses

1. The spatial pattern of POP and PP is clustered or has hotspots, and marine productivity is abundant from spring until fall.

2. Atmospheric transport and deposition of dust-associated iron (Fe) drive Fe and phosphorus (P) sufficiency and limitation, which in turn affect the marine ecosystem (Letelier et al., 2019). Near-crustal dust, from organic sources or the Sahara Desert, has been shown to be largely insoluble and to have a less significant effect on North Atlantic productivity than the soluble anthropogenic dust from North America (Conway et al., 2019). The general hypothesis of this study is that the HYSPLIT back trajectories will show that marine productivity near Bermuda is more likely driven by anthropogenic dust from North America.

4. Methodology Approach

1. Eliminate missing and bad POP and PP data using Matlab;
2. Run time-series analysis to see the temporal and spatial patterns;
3. Clip the HYSPLIT data by the results of step 2;
4. Analyze the sources of dust that affect marine productivity using regression;
5. Run seasonal time-series analysis using Matlab.
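Steps 1, 2, and 5 above could be sketched as follows (the author plans to use Matlab; this is a minimal Python/pandas analogue with hypothetical column names `date`, `depth_m`, and `POP`, and made-up bottle values):

```python
import numpy as np
import pandas as pd

# Hypothetical BATS bottle records; real data would be read from file.
df = pd.DataFrame({
    "date": pd.to_datetime(["2015-03-10", "2015-03-10", "2015-06-02",
                            "2015-06-02", "2015-09-15", "2015-09-15"]),
    "depth_m": [50, 300, 80, 250, 120, 400],
    "POP": [0.012, np.nan, 0.020, -9.99, 0.015, 0.008],  # -9.99 = bad flag
})

# Step 1: drop missing values and obvious bad flags (negative concentrations).
clean = df.dropna(subset=["POP"])
clean = clean[clean["POP"] > 0]

# Keep only the upper 200 m, where dust deposition is expected to matter.
surface = clean[clean["depth_m"] < 200]

# Steps 2/5: aggregate to a monthly series to inspect the seasonal pattern.
monthly = surface.set_index("date")["POP"].resample("MS").mean().dropna()
print(monthly)
```

The same cleaning rules would apply to PP, and the resulting monthly series is what the seasonal analysis in step 5 would operate on.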

5. Outcome

The outcomes of this study are a map of the dust trajectories and an evaluation of the seasonal dynamics of marine productivity near Bermuda.

6. The significance

This study will help us quantify the role of ocean-atmosphere coupling in the air-sea exchange of carbon exported to the ocean interior. Carbon export includes both inorganic and organic carbon (OC), and for the latter, both particulate and dissolved phases. Of particular interest, in light of clear evidence for ocean acidification, this study will help us understand how our behavior contributes to the production of anthropogenic aerosols. These anthropogenic aerosols affect marine productivity, particularly the pelagic ecosystem.

7. Level of proficiency

I am proficient with ArcGIS Desktop, which I have been using since my undergraduate studies; however, I have never used ArcGIS Pro. My R knowledge covers statistical analysis only, not spatial statistics, and I do not know how to connect R and ArcGIS, so I will not use R in this study. I used Python as an undergraduate, but I never went deep into it and have not used it since. I took the Digital Image Processing class (GEOG 581) and am very confident with Google Earth Engine coding. I also use Matlab for geoshow and spatial analysis.

Greenland Outlet Glacier Dynamics

1. My primary research question is: How is the spatial pattern of Greenland outlet glacier extent (or, more specifically, their rate of growth/retreat) related to the spatial pattern of local glacier velocity via the mechanisms of glacier mass balance and basal slip?

Put simply: Are the glaciers that are retreating faster also physically flowing faster, or not?

2. I have two primary datasets for this project. The first covers what I consider to be my response variable: the rate at which Greenland outlet glacier termini are moving (either advancing or retreating). This variable will be constructed from annual polyline shapefiles lying on the visible edge of the glacier, over ~4 different years. These termini lie along the outer edge of Greenland's geographic border, because that is where outlet glaciers terminate.

  1. Data: This dataset consists of a glacier ID consistent across every year (I will likely use 2015, 2016, 2017, possibly 2001). The dataset was produced via image processing of data from satellite sensors indicated in the link (there are several). Each shapefile consists of a vector polyline that represents the terminus extent of the glacier in a given year (winter max extent).


My second dataset covers what I consider to be my influential variable: glacier velocity. This dataset is a raster that covers the entire Greenland Ice Sheet. There are clearly dendritic hotspots along the outer edge of the island, where the ice moves the fastest. The center, where most of the ice accumulates, has very little horizontal movement (only a few meters/year).

  1.  Data: This larger dataset represents annual ice velocity magnitudes across the surface of Greenland at 200 m or 500 m spatial resolution. Data are also available for the separate x and y components of velocity, as well as their errors. This raster file covers the entire Greenland ice sheet, including all outlet glaciers mentioned above.


3. My initial hypothesis is that, statistically, faster-flowing glaciers will tend to retreat faster. My explanation would be that vulnerable glaciers that are melting typically develop wet, slippery basal conditions. The slippage along the ground also allows the glacier to move faster. These same melting glaciers (with a negative mass balance) also typically experience a retreat of their termini, because they are melting and calving away. This is why I would generally expect a positive correlation between a glacier's velocity and the distance or area of retreat.

However, among healthy, dry-based glaciers, the relationship might be totally different. In these cases, glaciers with the highest accumulation (aka most snowfall) flow the fastest because they have so much mass to move to maintain equilibrium. This provides an alternate hypothesis that the fastest glaciers are the ones with the highest accumulation, regardless of terminus behavior.

As far as geospatial distribution goes, I’d expect rapidly retreating glaciers to have a clustered geospatial distribution. There are probably some areas where most of the glaciers are retreating, and a few areas where the majority of the glaciers are stable or growing. That hypothesis is just based on the observation that entire ice sheets do not melt at the same rate (for example, the Antarctic Peninsula is retreating far faster than the East Antarctic Ice Sheet (EAIS)). Rather, ice sheets move/deform based on their own individual dynamics and local climate, which includes variables like temperature, precipitation, wind, cloudiness, and perhaps most importantly, ocean conditions. Considering the southeast portion of Greenland is subjected to the “rapidly warming and weakening” Gulf Stream, I’d predict that region of Greenland to be the most unstable and to display the greatest reduction in glacier extent.

4. I suppose Exercise 1 for me will focus on my response variable: the termini positions. I will need to develop a method to calculate the distance between 2 (jagged) polylines, and then determine whether that magnitude is considered an advance or a retreat… over multiple different years (with multiple years I can also determine whether the RATE of retreat is speeding up or slowing down). I can then create “points” out of these measurements, and determine if points of similar magnitude are clustered or evenly distributed, etc.
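One first cut at the jagged-polyline distance is to average the shortest distance from each vertex of one year's terminus to the other year's line. The sketch below uses made-up coordinates and plain Python; a real workflow would densify the lines first and assign the advance/retreat sign from flow direction or metadata:

```python
import math

def point_seg_dist(p, a, b):
    """Distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = 0.0 if seg2 == 0 else max(0.0, min(1.0,
        ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def line_dist(p, line):
    """Shortest distance from point p to a polyline (list of vertices)."""
    return min(point_seg_dist(p, a, b) for a, b in zip(line, line[1:]))

def mean_separation(line_a, line_b):
    """Average distance from line_a's vertices to line_b: a rough
    magnitude of terminus change between two years."""
    return sum(line_dist(p, line_b) for p in line_a) / len(line_a)

# Hypothetical terminus traces for one glacier in two years.
term_2015 = [(0, 0.0), (1, 0.4), (2, 0.1), (3, 0.5)]
term_2016 = [(0, 1.0), (1, 1.3), (2, 1.0), (3, 1.4)]
print(round(mean_separation(term_2015, term_2016), 2))
```

Libraries like shapely offer the same point-to-line distances (plus Hausdorff distance) directly on shapefile geometries, so this logic would not need to be hand-rolled in practice.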

For the second step, I am envisioning working with the influential variable: glacier velocity. I think it would be interesting to create a hotspot analysis of this map, as well as create a map displaying vector directions and magnitudes of ice movement, to visualize directionality. I also think I will need to find a way to vectorize/section off individual glaciers from the raster, possibly using a fixed buffer from the termini data. I’m also debating bringing in some mass balance data (gravitational mass anomaly), to see how glacier mass balance correlates with velocity.

For the final step, I’d like to connect the 2 datasets to interpret their relationship. I’d like to run a couple of correlation tools (Global Moran’s or something) to see quantitatively how distance retreated compares to glacier velocity. Finally, since I can’t expect the entire region to behave similarly, I’d like to cut Greenland into 6-8 sections and see how each behaves, to compare and contrast.
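Global Moran's I itself is compact enough to sketch. The version below uses inverse-distance weights and made-up retreat rates at hypothetical terminus points; in practice, PySAL's `esda` package provides a tested implementation with permutation-based significance tests:

```python
import numpy as np

def morans_i(values, coords):
    """Global Moran's I with inverse-distance spatial weights."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = len(values)
    # Pairwise distances; inverse-distance weights, zero on the diagonal.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    with np.errstate(divide="ignore"):
        w = np.where(d > 0, 1.0 / d, 0.0)
    z = values - values.mean()
    # I = (n / sum of weights) * (sum_ij w_ij z_i z_j) / (sum_i z_i^2)
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

# Hypothetical retreat rates (m/yr) at terminus points: two clusters of
# nearby glaciers retreating at similar rates.
coords = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
retreat = [80, 85, 78, 10, 12, 9]
print(round(morans_i(retreat, coords), 3))
```

With two tight clusters of similar values like these, I comes out strongly positive; values near zero would suggest the retreat rates are spatially random.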

5. The expected outcomes are pretty diverse. The big thing I’m after is a statistical relationship between these variables, which is really just a number (R-squared). However, there are several maps I am interested in making, including those depicting calculated rates of retreat, one depicting vector directions and magnitudes of ice movement, and a gridded map of Greenland depicting the R-squared between my variables in each region. While the answer to the question is a statistical relationship, there are several informative maps to be made along the way.

6. I think the scientific significance of this is mainly pointed towards making accurate predictions of sea level rise. Developing such predictions involves taking into account several not-so-obvious glacial mechanisms, and identifying which glaciers are most at risk. If ice velocity gives some indication towards which glaciers might contribute to sea level in the near future, then it could make the difficult job of modeling sea level a bit easier. It is also information that could be useful when put towards ecology and a handful of local communities. Data on recently uncovered land could help explain ecological shifts and invasive or unfamiliar species.

7. Proficiency:

-I have a working knowledge of ArcGIS Pro, based on the GIS I, GIS II and GIS III classes in the certificate program. This is more a textbook knowledge rather than applied experience.

-I have no experience whatsoever with R. I am not sure I wish to learn it, but if this class convinces me it is critical to a solid geospatial repertoire, I will concede.

-I have a working knowledge of Python, and I can use online resources to get most things done.

-I’ve used QGIS, ENVI, and MATLAB, but am not particularly well versed in any of them.

Agricultural Intensification and Population Decline in the Great Plains

My Research Question:

How are changing farm sizes and population correlated? There have been trends of agricultural intensification in Nebraska, Kansas, Oklahoma, and Texas. I am curious to see whether the shift in the distribution of farm sizes, from many smaller farms toward fewer, larger farms, has an effect on population sizes.


In order to address this question, I will be looking at the United States Census for population data at the county level in 2010 and 2018. The 2018 data are estimates since the census is only collected every ten years. The United States Department of Agriculture (USDA) National Agricultural Statistics Service (NASS) runs a Census of Agriculture every five years, which collects information about farms throughout the country. I will use 2012 and 2017 Ag Census data at the county level. The Ag Census reports the number of farms in a handful of size ranges as well as the area of farms within those size ranges. They also report income from farm production, and whether the farms are owned by individuals, partnerships, or corporations, which I might include in the study to provide more context. I will include county-level data from the US Census and Agricultural Census for all of Nebraska, Kansas, Oklahoma, and Texas.

Figure 1: Sample map from the interactive Ag. Census viewer of increases and decreases in the number of farms with 2,000 acres or more between 2012 and 2017. This is the largest farm size category in the data.


In the Agricultural Census data, I expect to see an increase in farms with larger sizes and a decrease in farms with smaller sizes. I do not know enough about the region to predict the pattern of the increase. Yan and Roy (2016) calculated field size with remote sensing, and from their maps it appears that the western sides of these states had much larger farms in data from 2010. I predict that in the time frame I am studying, the range of large farms will extend more broadly across the Midwest.

In the US Census data, I expect to see slight decreases of populations in rural counties but increases of populations in counties that contain cities and those immediately surrounding cities. McGranahan and Beale (2002) found that was the case in the 1990s because of lack of income opportunities in rural areas, forcing many to move into urban and suburban counties in search of work.

I hypothesize that there will be overlap between the counties that have increasing numbers of large farms and those with decreasing populations. I hypothesize that this pattern occurs because individuals who farm for large corporations have funds available to buy large plots of land, and are then able to manage farms with higher economic efficiency and therefore sell their products at cheaper prices. The smaller farms in the area are then priced out of the market, cannot afford to continue farming, and leave for urban areas in search of other sources of income. This then takes away business from local small businesses such as doctors’ offices, post offices, and other local stores.


I would like to explore using Python to analyze the patterns and do some statistical analysis with these data. I plan on exploring statistical analysis with spatial data and testing whether any correlation between increases in farm size and decreases in population is statistically significant. I will also have to aggregate the farm size data from categories of farm size into a general increase or decrease, or potentially do separate analyses for the different groupings. I may have to expand my analysis into R to do some of the statistical computations.
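The aggregation and correlation steps might look like this in Python/pandas (a minimal sketch with made-up county values and hypothetical column names; the real table would come from joining the US Census and Ag Census at the county level):

```python
import pandas as pd

# Hypothetical county-level table: farm counts in the largest Ag Census
# size class ("2,000 acres or more") plus total population, both censuses.
df = pd.DataFrame({
    "county": ["A", "B", "C", "D", "E"],
    "farms_2000ac_2012": [40, 55, 30, 20, 60],
    "farms_2000ac_2017": [52, 70, 31, 18, 75],
    "pop_2010": [9000, 12000, 30000, 45000, 7000],
    "pop_2018": [8400, 11000, 31000, 47000, 6300],
})

# Collapse each census pair into a single change measure per county.
df["farm_change"] = df["farms_2000ac_2017"] - df["farms_2000ac_2012"]
df["pop_change_pct"] = 100 * (df["pop_2018"] - df["pop_2010"]) / df["pop_2010"]

# Pearson correlation between large-farm growth and population change.
r = df["farm_change"].corr(df["pop_change_pct"])
print(round(r, 2))
```

In these toy numbers the counties gaining large farms are also the ones losing population, so the correlation is strongly negative; significance testing (e.g., with `scipy.stats.pearsonr`) would be the next step on real data.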

Expected Outcome:

I expect to produce maps with the patterns that I find in both datasets and between the two. I will also produce statistics on the relationships between the patterns. I will also likely produce some maps with contextual information about agriculture, economics, and population.


This research is important to policymakers because it is important to understand the impact of changing industry on people’s living and economic situations.  Municipalities are typically concerned with their economies and population growth. If there is a clear correlation between a change in an industry and a change in population and available business, it might be reason for more research or a change in policy. This correlation might also have implications for soil health and the general environmental sustainability of agriculture.


I have many years of using ArcGIS (both in an out of school) and will likely use it for the final cartographic steps in making my maps.

I have a little bit of experience in both Python and R, which I will use for my statistical analyses. I don’t think I will be using remote sensing in this project, but I am proficient in JavaScript for Google Earth Engine if I need it.


McGranahan, D. A., & Beale, C. L. (2002). Understanding rural population loss. Rural America/Rural Development Perspectives, 17(4). United States Department of Agriculture, Economic Research Service.

Yan, L., & Roy, D. P. (2016). Conterminous United States crop field size quantification from multi-temporal Landsat data. Remote Sensing of Environment, 172, 67–86.