Category Archives: Exercise 3

Exercise 3: Investigating spatial and temporal relationships between prey density sampling stations

Question asked

Following exercise 2, it seems that the relationship between distance to kelp and density of zooplankton cannot appropriately be explained via a linear function. Therefore, I wanted to explore whether certain stations are correlated with one another to continue to try and figure out a way to create a spatial layer of zooplankton density across my entire study area that is ecologically informed. Therefore, my questions for exercise 3 were:

  • “Are there correlations between zooplankton density values at different stations?”
  • “If so, what is the nature of these correlations – positive/negative?”
  • “How does distance factor into these relationships?”

Approaches/tools used, methods & results

I first visualized the zooplankton density at all of my stations for 2018 across time (Fig 1). Since this plot looks a little convoluted, I also decided to split it out by the two sites (Mill Rocks and Tichenor Cove; Fig 2) as well as by station (Fig 3). Just by visual inspection, it does look like there may be peaks and troughs in zooplankton density occurring at the same time across several stations.

To investigate this, I then ran a bivariate correlation matrix on my data to compare prey density between all pairs of stations on the same days (Fig 4), which resulted in only 3 pairs being significantly correlated (Fig 5). 

While this was interesting to see, these plots did not help elucidate where these correlations are occurring in space. Therefore, to add the spatial element into this, I plotted the correlation coefficients of each pair by the distance between that pair of stations (Fig 6). Looking at the plot, there is no obvious, discernable trend. Interestingly, there appear to be some stations that are strongly positively correlated but far apart, while there are also stations that are positively correlated that are close to each other. The same trend applies for negative correlations, though there is not as strong of a negative correlation as there is a positive one. However, a majority of the stations also fall in the middle, indicating that for a lot of stations, zooplankton density at one station does not relate to density at another station.

Fig 6

Although this plot did bring the distance between stations into context, it is hard to visualize that this actually looks like. Therefore, I visualized this plot in two ways. One was by drawing color-coded lines of the strongest positive and negative correlations on my site map (Fig 7), while the second was the same plot as Fig 6, however the points were color-coded by the habitat type (Fig 8) of the two stations (since habitat type is likely also a determinant of where we find zooplankton). 


It has become very clear to me that trying to figure out a way to interpolate prey density across my sites is not as straightforward as I thought it was going to be, which can feel very frustrating at times. However, I simply need to remind myself that the marine environment is incredibly complex and dynamic and that it would be near miraculous if the relationships between habitat, environment, prey, and predator could be easily defined by one variable. For now, exercise 3 has continued to uncover the patterns that do and don’t exist in my data and has led to more analyses to try to keep disentangling the patterns. My next step for the final project will be to investigate temporal cross-correlation to see whether the density at station x1 at time t is related to the density at station x2 at time t-1. I am a little doubtful about whether or not this will work because my sampling frequency varies between stations (sometimes we can’t do certain stations due to unworkable conditions) and there are sometimes 3 day gaps in sampling. However, I shall persevere and see how it goes!

Multiple Regression Analysis Relating to Benthic Polychaetes Amphipods (BPA) Ratios off the Oregon and Washington Coasts

  • Question I asked:
  1. Is Benthic Polychaetes Amphipods (BPA) ratio positively correlated to total organic carbon TOC off the Oregon and Washington coasts?
  2. Is BPA ratio positively correlated to depth off the coast of Washington State?
  3. How strong of predictors of BPA are TOC and Depth off the Washington coast?
  4. Does the relationship between distance to nearest harbor/river and BPA differ at different depth ranges off the Oregon coast?
  5. How effectively can you determine TOC based on depth and distance to nearest harbor or river in Oregon?
  • Tool or Approach Used:
  1. Benthic Polychaetes Amphipods (BPA) Index (See Above)
  2. Generate Near Table (Analysis) Tool in ArcMap
  3. Join (Data Management) Tool in ArcMap
  4. Multiple Regression Analysis w/Interaction Term
  • Steps Followed:
  1. First, I queried by data in Access to determine the total number of polychaetae and amphipods in each box core sample. Then, I joined that information to and existing Excel file and generated a new column to determine BPA.
  2. I typed the BPA equation listed above into Excel and used it to populate a new column to generate BPA index values for each benthic box core sample.
  3. I created a shapefile that contained the GPS locations of all the major rivers outlets and harbors in Oregon.
  4. I created a shapefile of all the benthic box core samples off the coast of Washington and then a shapefile of all the samples off the coast of Oregon.
  5. I used the Generate Near Table (Analysis) Tool in ArcMap with Oregon benthic box core samples as the “Input features” and major river mouths & harbors in Oregon as the “Near features.” This generated a distance value from each box core sample to the nearest river or harbor.
  6. I then used the Join tool too join the distance to nearest river or harbor values to the benthic box core data. After I completed the join, I exported the Attribute table to a text file and opened it in Excel using the “Add Data” function.
  7. In Excel, I used the correlation and Data Analysis regression tools to answer the questions posed above.
  • Results:
  1. Is Benthic Polychaetes Amphipods (BPA) ratio positively correlated to total organic carbon TOC off the Oregon and Washington coasts? – Washington: Pearson’s I Correlation Coefficient: 0.178069128 – WA Regression equation: y = 0.1332x + 0.803, y=BPA, x=TOC – WA Polynomial Relationship between TOC & BPA: y = -0.6365×3 + 0.5372×2 + 0.4564x + 0.7076
This figure shows that off the coast of Washington, BPA tends to increase at higher levels of TOC. TOC is reported in terms of percent organic carbon present in the benthic box core sample.
A square polynomial rather than linear function fits the relationship between TOC and BOA off the Washington Coast more closely. The R-squared value for the linear function was 0.0317 while the R-squared function for the polynomial function is 0.1313.

– Oregon: Pearson’s I Correlation Coefficient: 0.186643563 – OR Regression Equation: y = 0.0906x + 0.8261, y=BPA, x=TOC

This linear regression function suggests that BPA tends to increase at increased levels of TOC off the Oregon coast. Notice that maximum TOC levels are much higher in Oregon than in Washington. The max TOC in Washington was 1.4%, while in Oregon it is over 3%.

2. Is BPA ratio positively correlated to depth off the coast of Washington? Oregon? – WA Pearson’s I Correlation Coefficient: 0.650932232 – WA Regression equation: y = 0.0079x + 0.3124, R2=0.4237, y=BPA, x=Depth

This figure shows that BPA tends to increase at increased depth off the Washington coast. Notice the relatively high R-squared value here, at 0.4237.

– OR Pearson’s: 0.179339185 – OR Regression: y = 0.0005x + 0.8135, R2=0.0322, y=BPA, x=Depth

This figure shows that overall, BPA tends to increase with increased depth off the Oregon coast. However, notice that there seems to be a stronger positive trend between 0 and 200 m depth and a negative trend between 200 and 600 m.

-Because I saw a positive relationship between BPA and TOC from 0-200 m and a negative relationship at 200-600 m, I ran a Pearson’s I correlation coefficient on those two subsets of the depth data and found: – OR 0-200 m correlation: 0.504465898 – OR 200 -600 m correlation: -0.354042185

3. How strong of predictors of BPA are TOC and Depth off the Washington coast? – The results of a multiple regression analysis revealed an R2 value of 0.435745363315495 with y=BPA, x1=TOC, and x2=Depth.

4. Does the relationship between distance to nearest harbor/river and BPA differ at different depth ranges off the Oregon coast?

Here we see that relationship between distance from the nearest major river or harbor and BPA for depths up to 50m. Note that there is a negative relationship for this depth range but a positive relationship at all other depth ranges.

– Multiple Regression (at all depths): y = 0.760833428 – 0.000278547x1 + 0.741952481x2 -0.000206669, y= BPA, x1=Depth, x2= distance from river or harbor

5. How effectively can you determine TOC based on depth and distance from river or harbor in Oregon?   – Multiple Regression: y = -0.141824199 + 0.003378366x1 + 1.147497178x2 + 0.003876666, y = TOC, x1=Depth, x2= Distance from river or harbor, R2 = 0.670711926

This figure shows the relationship between depth and TOC off the Oregon coast. This is a single variable linear regression. This multiple regression R-squared value is a slightly better fit at 0.67 rather than 0.62.
  • I found the Multiple Regression Analysis, Pearson’s I correlation coefficients, and ArcMap tools to be effective for my analysis. It was somewhat cumbersome to run these analyses in Excel, however. I would like to work with multiple regressions in R for ease of use, to determine interaction coefficients, and to graph results.

Exercise 3_ Work in Progress

  • For the third exercise, I am exploring the MODIS fire data in more detail to access spatial patterns and seasonal impacts. I am still relying on mapping exercises and hotspot analysis to examine variations following the wheat harvest season in April and the rice harvest season in October. Some graphs are shown below:
The map above is intended to show wheat production (in 000 Tonnes) across districts. This serves as a useful reference for examining post wheat harvest crop burning.
Results of the Hotspot Analysis using fire radiative power (FRP) show that for the clusters of fires tend to occur in the northern part of the Punjab province. In winter month – post rice harvest, the intensity of fires is lower than in April – May 2011. There are some high intensity fires that occur in rice cultivating districts of the Sindh province.
Hotspot analysis using data for 2013 shows some high intensity fires in the winter months also. However, spatially the fires appear to be concentrated in the same region, as in 2011.
For2014 – in the post wheat harvest season, there are fires of higher intensity clustered in parts of southern Punjab – specifically Bahwalnagar district where a significant amount of wheat is cultivated. There also appear to be clusters of high intensity fires occurring in SIndh.

Exercise3. Influence of landslide volume on traffic distance and time


In Exercise 2, I created a buffer zone and compared the relationship between road density and landslides. Exercise 3 I want to further analyze the impact of landslides on roads and traffic. For large landslides, how does it affect travel distance and time? If the landslides at the same location have different volumes, do people choose different ways to detour?

2. Tool or approach that be used

Buffer: First of all, attribute a table to find out what is volume and area of landslide, based on the area of landslide, determine if the construction site is needs, if yes, create buffer with appropriate distance.

Network analysis: Network analysis is to analyze what is trip time and distance with buffer. First click Network Analysis, then choose route as target. The route is presented, then click edit and create features, add stops and buffer, stops should be same when the route with or without buffer is analyzed.

Show directions: This tool is to estimate trip time and dictance. Also the difference between highways and urban roads will also be shown.


Figure 1. 213-highway normally trip map
Figure 2. 213-highway 250m buffer zone detour map
Figure 3. 213-highway 500m buffer detour map
Figure 4. Directions of normal trip on 213.
Figure 5. Directions of 500m buffer on 213

By comparing Figures 4 and 5, it can be seen that the 500m buffer zone will cause a 5-minute detour. Due to the difference in speed limit and traffic flow on highways and urban roads, the time may be longer.

Unmanned aerial vehicle (UAV)-based photogrammetric approach to delineate tree seedlings

Exercise 3: Visualizing the seedling structure and assessing seedling growth using point cloud data

  1. The question that is addressing in this study

Characterization of the forest canopy and projection of forest inventory from Light detection and ranging (lidar) data has been widely used over the last two decades. Active remote sensing approaches (i.e., lidar) can use to construct three dimensional (3D) structures of tree canopies and underlying terrains (Pearse et al., 2019). However, due to some drawbacks associated with lidar data such as the low density, expensiveness of acquisition of data, and difficulties for utilizing for small areas, the practical use of lidar data becomes limited (Tao et al., 2011). However, the emergence of unmanned areal vehicles (UAVs) photogrammetric point cloud technology could able to address most of the drawbacks associated with lidar data. Especially, lightweight UAVs provide a better platform to acquire digital photos with lower operational costs (Ota et al., 2017). Therefore dense 3D point clouds generated from Structure-from-Motion (SFM) photogrammetry can use as ample source to substitute the lidar point cloud data and utilize to construct 3D structures to estimate forest biophysical properties.

Consequently, voxel-based matrics approaches are commonly used in forestry applications, especially for characterizing the forest canopy and other forest attributes in 3D space. With these approaches, point clouds are split “along both the vertical and horizontal axes to form volumetric pixels (volume pixel or voxel)” (Pearse et al., 2019). The main advantage of the voxel-based prediction is that we can extract the information within different layers of the forest canopy. However, most of the voxel-based matrices have been utilized with lidar point cloud data and terrestrial laser scanning data (Pearse et al., 2019). Since the acquisition of both lidar point cloud data and terrestrial laser scanning data expansive, UAVs based point cloud data use as an alternative solution. Therefore, in this exercise, I am interested in characterizing the forest attributes (in 3D space) using a voxel-based matrics approach by utilizing UAVs photogrammetric point cloud data. Specifically, identify “how the tree crown completeness varies as a function of the number of points that arranged in the 3D point cloud/ voxel resolution”.

2. Name of the tool or approach used, and steps followed.

Step 1:  Pre-processing point cloud data

Selecting Area of Interest

I used R software packages to develop point cloud data from UAV images.

  • Packages used: lidR, sp, raster, rgdal, and EBImage

As the initial step, point cloud data were extracted using AgiSoft software (Fig.1(a)). Then the lasground function in the lidR package was used to classify the points as ground or not ground by using the “cloth simulation filter” (Fig.1 (b)). In the next step, I used the lasnormalize function in the lidR package to remove the topography from the point cloud created (Fig.1 (c)). Next, the normalized point clouds were cropped with the area of interest (AOI) shapefile (Fig. 2 (d)) using lasclip function.

Figure 1. Map of (a). point clouds generated over the study site, (b). point clouds after the cloth simulation function performed, (c). normalized point cloud, and (d). finalized point cloud mapped in AOI and selected sub-set of AOI.

Step 2:  Voxelizing the point cloud data

For better visualization and to increase the efficiency of computer performance, I selected a subset of AOI to illustrate the voxel models(Fig. 1(d)). First, the “lasfilterduplicates” function (in lidR package) was used to remove the duplicate point from point cloud data. After removing the duplicate points, voxelization of point cloud data performed using the “lasvoxelize”  function in the lidR package. This function allows two different options (cubic and non-cubic) to reduce the number of points by voxelizing the point cloud. Figure 2 represents the cubic voxel approach, and relative x, y, and z dimensions indicate the resolution of the voxel. Figure 3 represents the non-cubic approach, and relative dimensions for the voxel represent in each diagram. Especially in this approach, z-axis dimensions are different from x and y dimensions.

Figure 2. Voxel models developed by using a Cubic voxel approach, voxel resolution (a) 1m, (b), 0.1 m, (c). 0.01 m, and (d). 0.001 m. The top row represents the side view, and the bottom row represents the top view of the voxel models.  

Figure 3. Voxel models developed by using a Non-cubic voxels approach, voxel dimensions  (a) 1 x 1x 2 m, (b), 0.1 x0.1 x 0.2 m. (c). 0.01 x0.01 x0.02 m, and (d). 0.001 x 0.001 x0.002 m. The top row represents the side view, and the bottom row represents the top view of the voxel models.

Step 3:  Visualization approaches

The viewshed3d package uses 3D point cloud data for visualizing the different environmental settings, especially for construction 3D environments for ecological research. Due to the potential usefulness of this package, I used it for visualization of the study area and canopy structure of the seedlings in 3D space. At the initial stage, duplicated points were removed from the point cloud. Then the noise associated with point cloud data removed, and the reconstruct_ground() function was used to reconstruct the ground layer of the study area. Finally, an appropriate voxel model (described in section 3) used to visualize the seedlings in 3D space.

Figure 4. (a). Example of detected noise of point clouds (red), (b). reconstruction of the ground and illustrating canopy cover os seedlings, and (c) and (d). illustration of canopy cover over the AOI.

Step 4: Geographically weighted regression(continued from exercise 2)

Image classification tools available in ArcMap 10.7:

  • Constructing a digital elevation model
  • Spatial Analyst Tool: Tools available in Zonal
  • Geographically weighted regression

Assessing the effect of geography  

To identify the influence of geography for seedling growth, I constructed a digital elevation model for the study area. Relative elevation values for each seedling location were extracted by using the Zonal tool available in ArcMap 10.7. As the next step, a geographically weighted regression performed by considering the tree height as the dependent variable and relative elevation of the seedling locations as the independent variable. The observed coefficients for the geographically weighted regression illustrated in figure 5.

Figure 5. Map of (a). coefficients observed from geographically weighted regression by considering seedling height as the dependent variable and elevation of seedling locations as the independent variable and (b). orthomosaic image of the study area.


  • Voxelization of point cloud data

Results obtain by voxelization of point cloud data indicate that reducing of voxel volume can provide better outputs in terms of visualization of seedlings in 3D space. For example, 1 x1 x 1 m3 cubic voxel cloud not able to visualize the structure of the seedlings in sub-AIO, while a volume of 0.1 x0.1x 0.1 m3 voxel could able to demonstrate the seedling architecture in an appropriate manner. After certain trials, a volume of 0.01 x 0.01 x0.01 m3 showed relatively the best voxel model for visualizing the seedling architecture. Even the voxel volume reduces (beyond 0.01 x 0.01 x0.01 m3), the visualization of the seeling article does not improve drastically. Further reducing the voxel volume requires more time and computer power to process the seedling article. Therefore, I selected 0.01 x 0.01 x0.01 m3 voxel volume (model) as the best cubic voxel model to identify the seedling architecture for this study.

Additionally, I performed voxelization of point cloud data with a non-cubic approach, and the observed results are similar to the cubic voxel models. The only difference for the non-cubic approach is the z-axis change (change of vertical resolution). We can change the resolution of the voxel image by changing the z-axis dimensions based on our interest. For example, we can characterize/observe the canopy structure by changing the resolution of the z-axis, as shown in figure 6. Non-cubic approach appropriate if we are interested in studying the variation of canopy structure relative vertically.

Based on the results obtained by visualization of the study area (step2), provide a better platform to understand the actual canopy structures of each seedling. Mainly, this helps to compare the results we obtained in the 2D environment (in exercise 2) and assess the results with the help of 3D representations. By looking at the canopy structures of the seedlings, we can confirm that seedlings growing in the northwest side of the AOI has relatively larger and healthy seedlings while the southeast side of the AOI has smaller and unhealthy seedlings. This observation agrees with the results obtained by geographically weighted regression (exercise 2 and 3). Both cubic and non-cubic approaches are applicable and can be utilized based on our desired output. The identified optimal voxel resolution (0.01 x 0.01 x0.01 m3) can be used as a reference for further voxel-based analysis, and it will help to save data processing time and with less computer power.

Figure 6. Variation of non-cubic voxels relative to z-axis changes (top) (a). 1 m, (b). 0.1 m, (c). 0.01m, (bottom) (d). 2 m, (e). 0.2m, and (f) 0.02 m.

Results obtained by geographically weighted regression (height as the dependent variable and relative elevation of seedling locations as the independent variable) showed a decreasing trend of coefficient values with decreasing the elevation profile (Fig.4 (a)). Furthermore, obtained trends indicate that seedlings located in the northwest part of the study area have higher coeffects values, while seedlings located in the southeast part have relatively lower coefficient values (Fig.4(a)). Generally, the elevation of seedling locations may indicate essential parameters about the geography and water table of the study area. We can assume that lower elevation areas are more prone to accumulate water, and the groundwater table may close to the surface compare to the higher elevations. As described in the previous exercise, this might be the indirect cause for the presence of bare ground in lower elevated areas (i.e., stagnant water may damage the root systems of grass and may die due to rotten roots and will expose the bare ground). Similarly, the presence of excess water may hinder the growth of seedlings as well. As a side effect of the presence of groundwater with woody debris may tend to increase excess moisture conditions in subsurface soil layer and will act as a negative factor for seedling growth, especially for their root system.


Overall, the visualization of seedlings using the voxel approached showed promising results, especially identifying the optimal voxel size/volume that can be utilized for developing seedling architecture in future studies. However, to obtain better visualization outputs, we need to create high-density point clouds (described in step 1), and that requires more time and computer power.

If we are interested in a relative lager area, visualization of objects may need to be enhanced. Therefore, after selecting the appropriate voxel model (i.e., 0.001 x0.001 x 0.001 m3), plotting tools available in lidR package can be used to create better visualization structures, as illustrated in Figure A1.

Additionally, the detection of branches in early-stage seedlings may be difficult due to its unique shape and small branches. However, the developed approach can be useful for identifying the number of branches of a mature trees.

Appendix A

Figure A1. Different methods of plotting the voxel model using the plotting function available in lidR package to increase the visibility of voxel models.


Ota, T., Ogawa, M., Mizoue, N., Fukumoto, K., Yoshida, S., 2017. Forest structure estimation from a UAV-Based photogrammetric point cloud in managed temperate coniferous forests. Forests 8, 343.

Pearse, G.D., Watt, M.S., Dash, J.P., Stone, C., Caccamo, G., 2019. Comparison of models describing forest inventory attributes using standard and voxel-based lidar predictors across a range of pulse densities. Int. J. Appl. Earth Obs. Geoinformation 78, 341–351.

Tao, W., Lei, Y., Mooney, P., 2011. Dense point cloud extraction from UAV captured images in forest area, in: Proceedings 2011 IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services. Presented at the Proceedings 2011 IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services, pp. 389–392.

What factors that contribute to the Primary Production in Bermuda Ocean

Background and Research Question

Mineral dust is the most important external source of phosphorus (P), a key nutrient controlling phytoplankton productivity and carbon uptake, to the offshore ocean (Stockdale et al., 2016). Paytan and MacLaughin (2007) emphasized that atmospheric P can be important as the major external supply to the offshore ocean, particularly in oligotrophic areas of the open ocean and areas that are P-limited, such as Bermuda Ocean. The most important source of atmospheric P is desert dust, which has been estimated to supply 83% (1.15 Tg⋅a−1) of the total global sources of atmospheric P (Mahowald et al., 2008). Of that dust, it is estimated that 10% is leachable P (Stockdale et al., 2016). In addition, Saharan dust supplies a significant fraction of the P budget of the highly weathered soils of America’s tropical forests and of the oligotrophic water of the Atlantic Ocean, increasing the fertility of these ecosystems (Gross et al., 2015).

Hence, this background is my starting point to try analyzing the correlation between the Particulate Organic Phosphorus (POP) and Primary Production (PP) in Bermuda Ocean. In exercise 2, I tried to answer these questions, however, the result was not likely that I expected. In this exercise 3, I explored a lot about my raw data and I realized that the way I divide the data will affect the result a lot. In this exercise I dug up a lot about the time series, regression, and auto-correlation function in R.

However, due to my misinterpret data in exercise 2, I ended up adding one more variable in this exercise 3 to broaden my analysis and convince my result. The previous study revealed that most phosphorus (P) and iron (Fe) are present as minerals that are not immediately soluble in water, hence not bioavailable (Lidewijde et al (2000), Shi et al (2012)). Lidewijde et al (2000) also stated that phosphorus (P) and iron (Fe), if deposited to the surface ocean, may pass through the photic zone with no effect on primary productivity, owing to their high settling velocity and low solubility. The photic zone has relatively low levels of nutrient concentrations, as a result, phytoplankton does not receive enough nutrients. Moreover, there are several factors that contribute to the primary production, such as physical factors (temperature, hydrostatic pressure, turbulent mixing), chemical factors (oxygen and trace elements), and biological factors. Hence, I added the temperature variable in this exercise 3.

In this exercise 3, I would like to find out what factor that contributes to the PP in Bermuda Ocean and how is the time cycle of three variables (PP, POP, and Temperature)?


Dr. Julia helped me in pre-processing data by using excel, and for further analysis, I used three tools in R:

  1. Time series function: This function will help us to plot the time series trend for every data. Ex code: plot.ts(POPR['PP'], main="PP depth 0-7 meter", ylab="PP")
  2. Linear regression model: This function will help us to identify the correlation between two designated variables, in this exercise, the dependent variable is Primary Production and the independent variables are POP and Temperature. Ex code: POPR.lm<-lm(POP~PP,data=POPR) and summary(POPR.lm). To plot it into a scatter and linear line, I used ggplot function. Ex code: ggplot(POPR, aes(x=POP, y=PP))+ geom_point() + geom_smooth(method=lm,se=FALSE)
  3. Auto-correlation function: This function is used to find patterns in the data. Specifically, the autocorrelation function tells you the correlation between points separated by various time lags. In this exercise, the lag ranges from +1 to -1, where +1 is perfectly related and -1 is inversely related. In afc chart, the dashed line represents the boundary of the significance of correlation. Ex code: acf(POPR['PP'],lag.max = 51,type = c("correlation", "covariance", "partial"),plot = TRUE,na.action = na.contiguous,demean = TRUE) In this code, we can change the lag.max to the number of the data if we want to access the lag coefficient individually, or uses NULL to default. For this exercise I used lag.max which according to the number of my data.

Steps of Analysis

  1. Divided the data based on the depth categories:
CategoryDepth (meter)
POP 10-7
POP 29-12
POP 318-22
POP 437-42
POP 557-62
POP 677-85
POP 798-102
POP 8108-122
POP 9138-144
POP 10158-164
POP 11197-200
Table 1. The Category of POP based on depth

2. Regression analysis for between PP vs POP and PP vs Temperature. Since the data of PP and temperature is only for 0-8 meter depth, hence the regression is only conducted for in this depth category. In order to perform the regression, I pulled out four outliers:

Table 2. The outlier to perform regression analysis

3. Accessing the temporal pattern of variable PP, temperature, and POP by using the auto-correlation function and also time series function. Due to some missingness data, especially for POP and temperature in 0-7 meter depth, hence the date 2014/03/06, 2015/02/04, 2015/12/13 is pulled out for POP and date 2014/03/06 and 2014/12/11 is pulled out for temperature.


  1. Regression
Figure 1. The regression analysis between PP vs POP and PP vs Temperature
Figure 2. R square analysis every month

From figure 1, overall the POP has a positive correlation to the PP, and the temperature has a negative correlation to POP (R square PP vs POP is 0.3064 and R square PP vs temperature is -0.183). From this analysis I can assume that POP is a factor that contributes to the Primary Production in Bermuda ocean. However, to see the detail of this analysis I tried to see the regression analysis per month in a 5 year interval period. I did not perform the regression in January due to a lack of data. From figure 2, we can see that from April to December and February, the pattern of correlation between PP vs POP is similar to the pattern of PP vs temperature, where the highest is in May and the lowest is in December and February. The significant difference happens in March, where the gap is +0.98 for PP vs POP and -0.99 for PP vs Temperature.

2. Auto-correlation and Time Series PP, POP and Temperature depth 0-7 meter

Figure 3. The auto-correlation and time series graph of PP, POP and Temperature depth 0-7 meter

From figure 3, as we can see the temporal pattern of PP and POP is likely similar, where the highest peak is at the beginning of the year for every year, except for 2012 and 2016. This is due to in 2012 the data start in June and March in 2016. This pattern is inversely for temperature, wherein the beginning of the year the temperature is very low and high from June to July. To confirm this pattern, we can see the autocorrelation (ACF) chart, where the temperature has significantly related to the function of time, The vertical line crosses the horizontal dash line which means there is a repetition cycle in time for temperature.

As a temporal pattern, there is a repetition pattern for POP and PP, where the highest value happens in January-March. However the difference from temperature is the repetition in temperature has the exact same value, which does not happen for POP and PP. There is a big gap between the value of POP and PP in January and February from 2013 to January and February in other years. I think that is why there is only one significant correlation line in the beginning ACF chart for POP and PP.

3. Auto-correlation and Time Series of POP by depth categories

Figure 4. The auto-correlation and time-series graph of POP based on 11 depth categories

From figure 4 we can see that the temporal pattern from lag 0-5 is similar for all depth categories where for overall lag only POP 1 to 3 is similar (depth 0-22 meter). It indicates that the deposition of the POP can reach 22 meters depth at the same time.

If we see at POP 4 to POP 6, there is no repetitive pattern both in the ACF chart and the time series chart. However the repetitive pattern happens from POP 7 to POP 10, or from 98-164 meter depth.

By looking at the number of POP 11 (depth 197-200 meter), there is only a small number of POP can reach this depth.

Critique of the method – what was useful, what was not?

Overall this exercise 3 made me realized the way we process the data will affect the result. In exercise 2, I was likely to simplify the process, hence I got difficulties in order to interpret my data and the result is not likely what I expected.

In this exercise 3 I learned a lot about the Auto-correlation function and time series function in R. In my opinion, acf is very useful for stable data like temperature, where the repetitive pattern is along with the value. The significance of ACF is based on the value, hence in the ACF temperature chart (figure 3) we can see that most ACF has a significant correlation overtime period (both positive and negative), which does not happen in POP and PP data.

POP and PP have a pattern, however, the value varies overtime period, hence although we can see the pattern in time-series and ACF charts, the significance only appears in the beginning month in 2013. In my opinion, if we would like to access the significance cycle, the unstable data like POP and PP where the is a big gap value overtime period, ACF is not really suitable. However, if we just would like to see the pattern we can relly on time-series and ACF function as well.


Gross, A., Goren, T., Pio, C., Cardoso, J., Tirosh, O., Todd, M. C., Rosenfeld, D., Weiner, T., Custódio, D., & Angert, A. (2015). Variability in Sources and Concentrations of Saharan Dust Phosphorus over the Atlantic Ocean. Environmental Science & Technology Letters, 2(2), 31–37.

Eijsink LM, Krom MD, Herut B (2000) Speciation and burial flux of phosphorus in the surface sediments of the eastern Mediterranean. Am J Sci 300(6):483–503. doi: 10.2475/ajs.300.6.483.

Mahowald, N., Jickells, T. D., Baker, A. R., Artaxo, P., Benitez-Nelson, C. R., Bergametti, G., Bond, T. C., Chen, Y., Cohen, D. D., Herut, B., Kubilay, N., Losno, R., Luo, C., Maenhaut, W., McGee, K. A., Okin, G. S., Siefert, R. L., & Tsukuda, S. (2008). Global distribution of atmospheric phosphorus sources, concentrations and deposition rates, and anthropogenic impacts. Global Biogeochemical Cycles, 22(4).

Paytan, A., & McLaughlin, K. (2007). The Oceanic Phosphorus Cycle. Chemical Reviews, 107(2), 563–576.

Stockdale, A., Krom, M. D., Mortimer, R. J. G., Benning, L. G., Carslaw, K. S., Herbert, R. J., Shi, Z., Myriokefalitakis, S., Kanakidou, M., & Nenes, A. (2016). Understanding the nature of atmospheric acid processing of mineral dusts in supplying bioavailable phosphorus to the oceans. Proceedings of the National Academy of Sciences, 113(51), 14639.

Zongbo Shi, Michael D. Krom, Timothy D. Jickells, Steeve Bonneville, Kenneth S. Carslaw, Nikos Mihalopoulos, Alex R. Baker, Liane G. Benning. Impacts on iron solubility in the mineral dust by processes in the source region and the atmosphere: A review. Aeolian Research, Volume 5, 2012, Pages 21-42. ISSN 1875-9637.

Continuing to understand the spatial and temporal correlation between census variables and their relation to gentrification

For this exercise I was interested in continuing to test out different variable combinations with in the Geographically Weighted Regression and further exploring the meaning of the relationship between these variables. This framed exercise three with the following research questions:

  • What are the individual relationships between change in race, education, and income across both decades?
  • How does each change variable relate to itself across the two decades?
  • Do these results produce information that provide information on where there might be gentrification?
  • How do these results compare to where others have identified gentrification and how they have calculated it?

Geographically Weighted Regression

I continued using the GWR tool to make more maps of how the variables are correlated. I expected some similarity in the patterns between variables, and I hypothesized that they would be in the northern central part of the city. When I compare each decade’s variables to the same variable in the second decade, I expect some correlation in areas that have steady change, but lower correlation in areas that change rapidly in one decade and not the next.

In order to do this, I used the GWR tool in ArcGIS and used the number of neighbors as the neighborhood type. I ran the tool with change in race from 1990-2000 as the dependent variable and change in education from the same decade as the explanatory variable. I repeated this with race and income, and then all of it with the second decade. I then ran it with education as the dependent variable and income as the explanatory variable for each decade.

Comparison to Other Gentrification Maps

I have begun to look at other maps of gentrification in Portland, OR to help make sense of the information that I am getting out of the GWR tool, compare results, and compare methods. Bureau of Planning and Sustainability for the City of Portland has produced maps of gentrification, also by census tract throughout the city. They created a taxonomy of gentrification from susceptible to late gentrification (and even continued loss after gentrification). They also have maps of the variables that went into their maps, which consisted of percentages of populations that were: renters, of color, without a bachelor’s degree, and how many households were below 80% of the median family income. These were all combined into a vulnerability map. The changes in renters, race, education, and income were combined to create a map of demographic change. The change in median home value and appreciation from 1990-2010 maps were combined to create a housing market conditions map. All of these were combined to create the final gentrification map.


The results of the GWR tool produced maps comparing two variables for one decade. As mentioned above, I ran the tool for all variable combinations for both decades. These results show that each combination of variables changes together in a different part of the city. Change in race and education occurs in the northwest part of the city in the first decade and north central/west part in the second decade. Race and education change in the eastern part of the city in the first decade, and some in the east and some in the northwest in the second decade. Education and income change in the southern part of the city in the first decade and southeast in the second decade. It is interesting that all of the combinations are slightly different but have some patterns.

The regression coefficients from the model runs when comparing each variable across the two decades show interesting patterns too. The changes in race each decade was relatively similar across the city but had the highest coefficients in the southern part of the city. Changes in education both was the same in the eastern part of the city, and changes in income both decades were similar in the north/central part of the city.

Figure 1: This figure shows the coefficients from the GWR between two variables. a) Coefficients of change in race and education. b) Coefficients of change in race and income. c) Coefficients of change in education and income.

When compared with the Bureau of Planning and Sustainability gentrification maps, there are some patterns of similarity. They find that the northwest part of the city there is gentrification currently happening, the central part of the city has a late stage of gentrification, the south part has some early stage gentrification, and the eastern side is susceptible.

Figure 2: Bureau of Planning and Sustainability for the City of Portland: Gentrification Neighborhood Typology map

The correlation between the changes in race and education seem to have similar patterns to current gentrification and areas that are susceptible. Change in race and income seem to follow a similar pattern to susceptible areas. Change in education and income seem to have the highest correlation in areas that are susceptible or are in early stages of gentrification.

I think the biggest difference in their calculations are using the opposite side of the data. For race they used percentage of populations that were communities of color, whereas I used percentage that is white. They used percentage without a bachelor’s degree, whereas I used percentage with a bachelor’s degree. They also used the percentage of the population that are renters. These differences tell a slightly different story since there are multiple patterns of change in the city of Portland.


I think one downfall of the GWR tool is that if you include multiple explanatory variables, you cannot get a coefficient value for how they all change together, you get separate coefficients for each explanatory variable. Granted, I’m not sure how this would be possible, but it would be useful to see if the three variables have any patterns together. This makes it difficult to compare three variables at the same time.

Exercise 3: so far…


# Lewis et al. 2007, D’eon and Delparte 2005. data screening based on fix type and PDOP (DOP)

# calculate data retention for: (Blair 2019)

# 1. Converting all 2D locations to “missed fix”

# 2. Converting all 2D locations AND 3D with DOP>5 to “missed fix”

# 3. Converting all 2D locations AND 3D with DOP>7 to “missed fix”

# 4. Converting all 2D with DOP >5 AND 3D with DOP >10 to “missed fix”

To account for fixes that are likely inaccurate (not enough satellites), there are various ways to screen the data. I tried 4 different methods to see which method had the highest data retention. Fixes that are considered inaccurate past a certain threshold are flagged as a “missed fix” even if the collar obtained a location.

No ScreeningMethod.1Method.2Method.3Method.4
Comparison of data retention given 4 different screening methods, versus no screening method. The original proportion of successful fixes out of the total number of attempted fixes was 0.91, and the next highest was method 4 (Removing all 2D with DOP >5 AND 3D with DOP >10).

Option 4 (Removing all 2D with DOP >5 AND 3D with DOP >10) had the highest data retention after removing suspicious locations. Method 1 (removing all 2D locations) was a closet second.

The next step is to run the models for FSR as a function of environmental attributes again with the new screened data (derived from Method 4).


First attempt at moving window analysis used very large window sizes.

The effect of sky availability at various spatial scales on FSR. All models were competitive.

Decided to try again with smaller window sizes.

05/22/2020 finished redoing window sizes, next need to extract values from test sites and re-run the regression analysis.


Context: I calculated the difference between each test site location (obtained via handheld GPS unit) and collar locations. Common data screening practices utilize cutoff values based off of fix type (2D vs 3D fixes) and Dilution of Precision (DOP), which are both recorded with the coordinates obtained by the GPS collar. Locations predicted to be more accurate (less location error) are normally 3D, and have low DOP values.

A common practice is to remove all 2D locations, or remove all locations with DOP>5.

The graphs below are meant to help me figure out a cutoff value for the stationary collar test data. The cutoff value used here will also be applied to the deer GPS data. The graphs are the result of a series of subsetting the data based off of DOP values and Fix Type to help gain a clearer picture of the data.

There is obviously one outlier 3D location at about 4000m away from the test site location.
To get a better look at the data, I removed all locations with DOP>10.
All 2D locations and their associated distances to the actual true coordinates of the test site.
All 3D locations and their associated distances to the actual true coordinates of the test site.

I was hoping to see a more clear pattern or “threshold” pattern that would show that above a certain DOP value, distances dramatically increased. However as the “data cloud” in the first scatterplot shows, there isn’t much of a pattern. However…

Keeping all data only within 30m to match raster data resolution results in 72% data retention (loss of 28% of the data), and doesn’t include any 2D fixes.

If we look at only the locations within 30 meters to match raster data resolution, we see that this data includes DOP as high as 18.5 and does not include any 2D data. The fact that these more accurate locations also include such high DOP values is unexpected. I’m not sure what the best cutoff value is based off of these plots.

Ex 3 Accuracy Assessment of Woody Debris Classifier

  1. For exercise 3, I am interested in the relationship between accuracy of wood detection using a random forest classifier and distance from training samples used to train the classifier. Specifically, do accuracy metrics (i.e. accuracy, kappa, Type I and Type II error) change with distance to training area? To train the classifier, I selected a 3.5-acre area near the center of my area of interest. I selected this area because I felt it had a variety of features—islands, wood jams, shadows, woody debris, rocks, etc.—representative of features found throughout the 150 acres of the site. I hypothesize that kappa will decrease in sample plots that are further from the training area.
  2. I have not executed the approach I intend to use because I am still in the process of data wrangling and creating confusion matrices for sample plots (I am attempting to batch clip plots in ArcGIS Pro, and merge rasters into a 2-band raster stack with reference data in one band and predictions in the other band), however, I ultimately intend to use a similar approach as I implemented in exercise 2 using simple linear regression in RStudio and geographically weighted regression in ArcGIS Pro.
  3. I will calculate distance from sample plot centers to training area using the near tool in ArcGIS Pro. This distance will be the explanatory variable in a linear model, and kappa will be the dependent variable. I would like to append this data in tabular format as illustrated in Table 1 below.

Table 1

Plot IDKappaAccuracyDistance to Train Area

I will calculate kappa and distance to training area (ideally for 72 sampled plots). Then, I will use these variables in a simple linear regression in RStudio and a Geographically Weighted Regression in ArcGIS Pro.

4. I anticipate the results of this analysis will indicate a decrease in kappa as plots are located further form the training area. I hope to see this indicated by a low p-value in SLR. The GWR may suggest there are areas of the site where the classifier performed poorly, and I believe this will be likely due to lighting conditions in the area (either shadows, or very high reflectance on woody debris due to sun angle). If there is time, I would like to assess the relationship between kappa and wood area because I currently cite Landis & Koch 1977 to assess kapa values. This paper refers to a somewhat arbitrary scale for assessing the accuracy of a classifier based on kappa. It would be interesting to see the effect kappa has on wood area estimates.

5. I will critique the method when I have results. My early impressions are this methodology is a little tedious due to number of plots. I am having a lab assistant manually delineate the wood polygons in plots that were also sampled on the ground. These plots will then be rasterized and clipped. I also need to clip the classified raster by these plots (there is a batch clip tool available in arc). From here, I will write a loop in R that accepts a path, locates tiffs in the folder, and calculates confusion matrix metrics for each plot.


Landis, J. R. and Koch, G. G. 1977. The Measurement of Observer Agreement for Categorical Data. Biometrics, 33(1): 159. doi: 10.2307/2529310

Ex. 3 – Spatial Model Specification

1. Question that you asked

For this week’s exercise I was interested in refining my spatial model specifications based on my findings of previous weeks. I was interested in asking the question: 

Can I determine a spatial linear model to describe the relationship between county out migration flows and flood claim payments and claim rates?

Significance in context: Modeling the relationship between county migration flows and flood claims allows for discussions of the implications of flood insurance on population mobility and how natural hazards interact with the US population landscape in general. 

2. Name of the tool or approach that you used.

In order to test the fit of two model frameworks (the spatial lag and spatial error model) I used the Lagrange Multiplier Test which tests spatial dependence in model fit for these two models under typical and robust forms. I then fit model variables to the preferred model form using the “spatialreg” package in R and map the residuals to observe spatial autocorrelation. 

3. Brief description of steps you followed to complete the analysis.

I started by subsetting my data into only counties in Gulf Coast states (Texas, Louisiana, Mississippi,Alabama, and Florida). I then created 2 different versions of this subset of states for a high flood instance year (2005) and a low flood instance year (2003). 

Next, I created an Ordinary Least Squares model where log base 10 county out-flow is a function of log base 10 total amount paid and claim rate per county (log(out_flow) ~ log(total_amount_paid) + claim_rate). Following this, I ran a Lagrange Multiplier Test and used the outputs to determine the preferred alternative model based on spatial dependence. I applied the following framework created by Luc Anselin (2005) to determine the preferred alternative model.

I then proceeded in running the preferred model, which, in my case was the spatial error model. The model structure is composed of the same variables described in the OLS version. I then map the residuals by county of the error model and the residuals of the original OLS model for comparison.

4. Brief description of results you obtained.

The spatial error model results can be seen below with mapped residuals alongside for 2005 and 2003. 

The OLS mapped residuals can be seen below for comparison for 2005 only. 

These results indicate there is statistical relationship in insurance payment amounts that is positively associated with county out flows. This means that as payment amounts increase we expect there to be more people leaving. Interestingly enough there was a notable difference in 2005 and 2003 for the coefficient estimate associated claim rate and I would attribute this to the relatively lower claim rate in 2003 compared with 2005. Next, I will be interested to see if this negative (not statistically significant) relationship remains when I expand the model and test for all years. Notably, we see the model is much improved in spatial dependence by using the spatial error model which can be seen comparing the residuals mapped in the OLS map and the Error maps. 

5. Critique of the method – what was useful, what was not?

This method was very useful in helping me find a model specification that more precisely fits the data without spatial dependence. I will have to determine if the same model structure is viable at a larger scale spatially and temporally next. 

The only cons of this approach is the extensive research required before understanding the data and diagnostic test results well enough to proceed and communicate the interpretations.

Velocity Along Ice Flows vs. Rate of Terminus Retreat Among Greenland Outlet Glaciers (Pt. III)


I initiated exercise 3 expecting to glean useful information from GWR and global Moran’s I experiments, but preliminary exploration of these tools indicated that they wouldn’t be particularly useful to me given how my data is currently set up. As a result, I decided to go ahead and attempt my own version of a network analysis, to better characterize ice velocity around the borders of Greenland. In exercise 2, I simply pulled velocity data from right by glacier termini- which is not necessarily the be all and end all of glacier velocity. My time is better spent trying to develop a more robust velocity dataset, and returning to more conventional techniques for interpreting statistical relationships. Furthermore, I went to great lengths to improve data quality (fewer glaciers with gaps in their datasets…) in hopes that this manifest in clearer statistical relationships.


1)How does velocity along entire Greenland ICE FLOWS (or ice streams) correlate to terminus retreat rates for their associated outlet glaciers?

In simpler terms: Do glaciers that are fed by larger/ faster ice flows display a greater likelihood of rapid retreat?

2) Does the relative annual change in ice flow velocity (variability in %) correlate to relative annual change in glacier retreat (variability in %)?

In simpler terms: if ice velocity in a location jumps from one year to the next, do we see a greater likelihood of terminus retreat increasing as well?


Given time constraints for the class, and my particular questions, I had to use some intuition to rule out various paths forward. Exploratory statistics in my velocity and retreat datasets that there is really no relevant geospatial clustering. (sure, high velocity clusters around valleys, but this is where all the outlet glaciers occur anyways, so that is relatively meaningless/ obvious). It seems clear to me that I need to better represent velocity data, on a wider spatial scale, in order to accurately represent the relationship between ice velocity and terminus retreat. To do this, I attempted a rudimentary “network analysis” to incorporate velocity data along entire flow lines of ice. This seems far more valuable to me than fiddling with meaningless GWR and correlogram data on relatively messy datasets.

APPROACH: “Characterizing Ice Stream Velocity”:

ArcGIS Pro Tools used:

Drawing polylines, Generate Points along Lines, Extract Multivalues to Points, Summary Statistics, Join Field, Buffer, Intersect, Table to Excel

Excel Tools used:

Time series, R- squared simple regressions, normalization, first derivatives


1.DRAWING FLOW LINES: A feature class was created and populated with flow lines. Flow lines were drawn using the feature editor, and using visual queues from velocity datasets. Care was taken to ensued flow lines spanned the entire study period (now 2008-2016) so as not to have gaps in my dataset. If glaciers did not appear to be associated with ice flows, fell within a velocity satellite “dead zone”  or did not have terminus data for most years, they were excluded from the dataset. The total 240 glacier dataset was shrunk down to 140 glaciers following these limitations.

Figure 1: Example of flow lines drawn in ice streams.
Figure 2: Complete set of drawn flow lines. Green lines represent digitized ice streams, and red (lines) represent available terminus data. This figure just represents the “trimming” of data from 240 glaciers to 113 glaciers.

2)GENERATE POINTS ALONG LINES tool was used to split each flow line into 20 parts (5% chunks). Each point could then be used to extract underlying velocity data from annual velocity rasters.

Figure 3: The GENERATE POINTS ALONG LINES TOOL was used to generate 20 points along each digitized flow line.

3)MULTIVALUES TO POINTS tool was used to extract velocity data from velocity rasters spanning the study period. This data is saved in the point attribute table.

4)SUMMARY STATISTICS tool was used to summarize these velocities. For each glacier, the MEAN and the SUM of velocity along flow line points was recorded.

5)JOIN FIELD tool was used to apply these new MEAN and SUM variables to the flow line attribute table.

6)BUFFER tool was used to expand previous glacier point data representing glacier ID, as well as previously calculated terminus retreat data over the study period.

7)INTERSECT tool was used to merge the flow line dataset (containing new velocity data) with the glacier point dataset (containing old retreat data). This was to consolidate data into one attribute table.

8)TABLE TO EXCEL tool was used to export the attribute table for use in other programs.


An exhaustive array of graphs and statistical relationships were generated in order to try to understand how ice velocity and terminus retreat interact, but I will share only the most insightful in this blog post. For the most part, the results are very similar to those from exercise 2, which isolated velocity data to point sources right near termini.

Figure 4: Absolute terminus retreat (m) plotted with absolute ice velocity along flow lines (mean) averaged over all of Greenland.

When averaged across the entire island, there appears to be a relatively strong correlation between the average ice flow velocity and the average terminus retreat, indicating that these two variables are indeed connected on an annual timescale. However, considering there are only 5 discontinuous years to plot, this conclusion would benefit from a greater pool of sample years. With an R-squared of 0.63, we can with large uncertainty say that on average, years that experience faster glacier velocity in ice streams also experience somewhat faster retreat at their associated glacier termini. This is a very similar conclusion (with a very similar R-squared) to what was reached in exercise 2 using point-sourced velocity data. It is worth noting that from 2008 to the 2016-2017 season, there does not appear to be an increasing trend in terminus retreat or in velocity along flow lines, which does not follow local temperature trends. This indicates a potential lag in response time larger than that of the study period (not at all surprising).

Figure 5: The CHANGE in distance retreated plotted with the CHANGE in velocity between sample periods, averaged over all of Greenland.

In addition to plotting the absolute magnitudes of these two variables, I wanted to plot the temporal change. This was a bit unorthodoxed given the discontinuous nature of the sample years, but it still provides valuable information. Again averaged over the entire island we see another trend: Glaciers that experience an increase in velocity from t1 to t2 are more likely to increase their rate of terminus retreat from t1 to t2. These variables can be conceptualized as the first derivatives of the records provided above- focusing on the rate of change rather than the magnitude of change itself.

Figure 6: There is no clear correlation between the change in ice velocity and the change in retreat rate, in any sample period with available data.

While annual data averaged over the entire island showed clear trends, much like it did last time, individual glaciers for every given sample year tell a different story. Not displayed here: there was no trend between absolute velocity along flow lines (MEAN OR SUM) and absolute distance retreated in any given year. As you can see above, there is also virtually no trend between their first derivatives either. Glaciers with bigger jumps in velocity from t1 to t2 do not necessarily experience greater jumps in rate of retreat from t1 to t2. This goes against my initial hypothesis from the start of the project (that faster moving glaciers also retreat faster.

Figure 7: Normalizing the changes in velocity and in retreat rate does not improve the statistical relationship.

Normalizing these changes over time also does not induce a visible trend. The figure above is merely an example figure, but all four time jumps show no clear trend even on a relative scale. However these regression plots could theoretically tell us something about variability in any given year if the sample periods were better represented (consistent time jumps). For example, glacier velocity seems to have been much more variable in the 2008-2014 period, compared to 2014-2016. However, this is not a fair comparison, because the first 2 show a change over several years, whereas the second 2 are actual year to year changes. (its also not fair to divide the first 2 over their study period, because that assumes the changes were incremental and constant, which they most certainly were not.)


The problem with my approach is that I do not have the necessary data to draw a reasonable conclusion. There are just far too many temporal gaps in the dataset(which span too short of a time period to observe trends in glaciers- which have notoriously slow response times). I did my best to eliminate glaciers that did not have all the necessary data (whittling down from 240 glaciers to 113 based on available data), but I cannot generate data for years that were simply not provided. As a result, my conclusions can really only be as comprehensive as the data allows.

As Julia brought up in my previous draft, my hypothesis also applies to a specific subset of glaciers- those with wet basal conditions. It is prudent to presume that moving forward, more and more outlet glaciers will develop these conditions, but once again, glaciers are slow to respond to climate responses. It is also quite hard to find a dataset that determines which of these glaciers are wet-based vs. dry based: sure we know for a fact that certain glaciers may or may not be wet-based, but glaciers where these conditions are well constrained are not just scattered across the continent. One idea for a future project would be to limit this analysis only to glaciers we know have wet basal conditions. As of right now, it seems those are the only glaciers velocity might have predictive power over to begin with.

Finally, the hypothesis can simply be wrong. I was operating based on an assumption that vulnerable glaciers tend to heat up and develop wet beds- which cause them to flow faster. These same glaciers would hypothetically also be more vulnerable to retreat at the terminus because they have an assumed negative mass balance. An alternative mechanism- as Julia pointed out, could be that glaciers that FLOW faster and are indeed melting might not necessarily exhibit terminus retreat, because that lost “real estate” at the terminus would be quickly replenished by new ice from further inland. That too is a very valid hypothetical mechanism.

Moving forward, I think the only thing I could milk from this dataset without starting an entirely new project is to try to identify regional trends. I cannot distinguish glaciers based on their basal conditions with the available data, but I CAN distinguish them based on their location. Perhaps the different regions of Greenland glaciers are experiencing different trends. The final step I will complete will be to divide the data I produced above into 6 separate regional groups, indicated below (over GRACE- mass anomaly data) and recreate the graphs above to distinguish any trends in different regions, if there are any present. It could be reasonable to assume that some of these regions might have higher proportions of wet beds than others.

Figure 8: For the final week of class, I propose to separate data based on regions, to see if any clear trends emerge in particular geographic areas (Overlaid over GRACE 2013 mass anomaly data- our best reference for greenland mass balance.)

Temporal Variability in Cultural and Natural Deposition

How is Time Measured?

              Up until this point, I have worked through 2-Dimensional representations of the artifact assemblage from Cooper’s Ferry, Idaho. My first blog post contained visualizations of the artifacts within 3-Dimensions and 2-Dimensions by using kernel density. These graphics were used to identify intervals at this archaeological site that appear to be separated by an extended period of sediment deposition and/or time. Under the hypothesis that human behavior at the site differs between these “gaps” in time, I chose five unique intervals to examine. Within a 1-meter depth, I chose the intervals 0-25cm, 25-50cm, 50-60cm, 60-74cm, and 74-100cm. Geoff Bailey (2007), when discussing time perspectivism, states that archaeological assemblages may be best studied through multiple scopes in time and space. The idea represented here alludes to a much more formalized theory in fractals (Mandlebrot 1967; Plotnick 1986; Sadler 1999). In a general sense, a fractal is a pattern that is repeated at different scales. Mandlebrot (1967) discusses how fractals can be used to define the coastline of Britain at different spatial scales. The larger scale uses fewer fractals, or lines, to define the coastline of Britain, whereas a smaller scale requires more fractals because there is more identifiable detail in the coastline when one “zooms in” on a map. My question for this blog post mirrors this discussion on fractals; the main difference in my study is that, instead of looking at coast lines, I wish to understand how the temporal pattern can be observed within the archaeological assemblage between and within each interval listed above. To explore this inquiry, I used a model developed by Maarten Blaauw and J. Christen that has been named after Francis Bacon (AD 1561-1626).

Question: How is sediment accumulation rate associated with artifact deposition through different temporal scales at the archaeological site in Cooper’s Ferry, Idaho and how can these rates of deposition imply natural and cultural transformations on archaeological deposits?


              For the purpose of thinking about how the artifacts in this assemblage are separated by time, I explored the concept of age-depth models that I was directed to with the assistance of Erin Peck. Age-depth models, with the proper input, facilitate the understanding of depositional rates in sedimentation throughout a core of sediment. The primary input for the depth-age model I used is C14 dated materials that was recovered from the archaeological context in question. I utilized the ‘rbacon’ package built for the R environment. Following the guidance and recommendations given by Blaauw et al (2018), I implemented the carbon dates I had access to (fig. 1) and produced an age-depth model correlating to those known dates (fig. 2).

age cal BPerrordepth (cm)
Figure 1. This chart shows the 8 C14 dates that have been collected from the context of the current archaeological assemblage. Dates and associated errors are given in calibrate years (cal) Before Present (BP) at the related depth the material was recovered (cm).
Figure 2. Age depth model produced from the 8 C14 dates. The top left image represents Monte-Carlo simulations of the data. Top middle image shows a histogram of sediment accumulation at a rate of 50yrs/cm. Top right image shows a similar histogram of ‘memory’ which, from my understanding, relates to the associated prior information with the strength to produce accurate inferential data. The central image shows the predicted sediment accumulation for each age per depth. Shaded area represents a 95% confidence interval, blue figures represent the predicted ages of the dated material.

How to Cook the Bacon

              There is a large number of components associated with creating an age-depth model within the ‘rbacon’ package. I have explored multiple different approaches with certain settings set to ‘on’ and others set to ‘off’. Now, one can simply add the dates recovered from the context in question and run the program, but there is a lot of information to think about and consider implementing to produce stronger and more accurate age-depths. The main algorithm behind this model production (fig. 2) consists of Bayesian statistics which takes prior knowledge and information in account for the improvement of inferential statistical analyses (Blaauw et al 2018).  One of the most important inclusions to consider is known or hypothesized hiatuses. [I am sure you were waiting until fractals came back into conversation] Hiatuses have been shown by Sadler (1999) and Plotnick (1986) to create a process of temporary deposition followed by an erosional event. A hiatus can be thought of as a period of no sedimentation, but it can also lead to a net loss in sediment via erosional events. When these patterns appear in the sediment record at small time frames it is understood through a net-gain/net-loss system. When the time frame is broadened, the pattern identified in the record appears to represent a net-gain and net-loss sedimentation rate of zero. When this model is plotted on a graph, the figure resembles stairs when the x-axis represents time and the y-axis represents depositional rates. Thus, the horizontal segments of the stairs resemble a period of no deposition within a broad scale of time, henceforth a hiatus is brought into the picture [and fractals are born]. In the archaeological sciences, Schiffer (1987) hypothesizes that these types of events can displace and reposition cultural materials through both natural and cultural transformation of the area. Therefore, it is important to implement as many known or hypothesized dates for a hiatus or other surface altering events (i.e. volcanic eruptions, landslides, etc). Unfortunately, there is no known or hypothesized hiatus within the context of this archaeological assemblage because it requires a more in-depth look at the geological processes during this time in the region; something that might be great for future work and interpretation.

              Thinking about hiatuses in terms of sedimentation and cultural activity are a separate, yet intriguing, concept to consider in this instance. A hiatus in sedimentation can often represent relative stability in the land form and allow for a pedostratigraphic horizon to begin forming. Pedostratigraphic layers, from a geological view, are horizons within sediment that allow soils and organic materials to form. [People really like soil and do not always enjoy rocky or sandy surfaces] The presence of pedostratigraphic soils often represent ancient surfaces that have been buried by the sediment of the past 6 million days. These ancient surfaces have a high probability of containing cultural materials because soils seem to attract human activities. To summarize, when we have a hiatus in sedimentation, we have stability in the land form, so we get soil formation, and henceforth a lot of human activity. [In a simple format, sediment hiatus = lots of people] On the other hand, a hiatus in human activity can sometimes, not always, represent instability in the land form and thus, rapid sedimentation. [From this, high sedimentation rate = minimal rates of human activity] Now, can dense human activity/occupation indicate a hiatus in sedimentation, and can rapid sedimentation predict a hiatus in human activity? The result from the age-depth model production will be discussed in the next section and I will briefly explain how the scope and time frame of investigation [spoiler alert: the model does not seem viable to answer my question] needs to be much tighter than the current results to determine how sedimentation at this site affected human activity.

What does the Bacon look like When it is Done?

              As you can see in figure 2, there is a pretty straightforward linear relationship between the dates I provided and the predicted depth per time interval. As previously mentioned, the inclusion of many more dates would further define the sedimentation rate through time (Blaauw et al 2018). Unfortunately, the process of collecting further dates may be beyond the scope [pun intended] of my thesis project. In order to answer my question about the temporal variation in human behavior, I would like to have dates that are much closer together and can identify patterns of sedimentation by century, at the very least since human movement and activity varies so sporadically through time. One aspect that may help to implement human activities with age-depth models is derived from the proposed question I left in the previous section associated with choosing large human occupational evidence as a hiatus in sedimentation. Regardless, this perspective may prove to be beyond the goals of my current research endeavors.

              So, what does this current model (fig. 2) show? Within the parameters of the main age-depth function ‘Bacon’, I assigned the minimum depth at 0cm and maximum depth at 90cm. Furthermore, the minimum date was set to 13,000 cal BP and the maximum set to 16,500 cal BP. This was done because any date beyond these lies outside the archaeological context containing the current artifacts and dates. In the model, the light blue shapes indicate the statistically estimated dates that came from the 8 C14 samples (fig. 1). The shaded region around the red dashed line represents confidence intervals. It is clear that the more dates the model has, the tighter the confidence interval will be. This is evident by the expansion in the confidence interval ranging from 60-80cm. Ultimately, with an accumulation rate of 50yrs/cm, there appears to be a slight increase in deposition starting at a 25cm depth and a similar decrease in sediment deposition around the 55cm depth. Of course, this is just the surface of capabilities within the ‘rbacon’ package. From here, one would be able to identify deposition rate by year or by depth by producing graphics representing changes in depositional patterns (fig. 3). Figure 3 can be interpreted as the sediment accumulation rate per depth in centimetres and age in cal BP, respectively.

              Ultimately, I am unsure how I will integrate this analysis into the interpretations of human behavior and occupancy at this archaeological site because, as I have noted previously, the main requirement for small scale studies requires a large amount of known dates for specific events or materials. That being said, the theory and understanding of how sediments may have accumulated throughout the time at this site is very important to consider when striving to make archaeological inferences and arguments. In the near future, I will begin to examine the temporal scope of artifact placement with sedimentation in mind. For this analysis, I will assume vertical displacement represents temporal displacement and analyze the rate of artifact deposition by depth to determine aggregates through time. Thus, allowing me to examine an observed temporal displacement of artifacts which differs from blog post 1 and 2  by adding the third dimensions, that is time.

Figure 3. The chart on the left represents accumulation rates for sediment by depth (cm). The dashed line shows the confidence interval. The chart on the right represents a similar pattern in accumulation rates by date (cal BP).


Bailey, Geoff. “Time perspectives, palimpsests and the archaeology of time.” Journal of anthropological archaeology 26, no. 2 (2007): 198-223.

Blaauw, Maarten, J. Andrés Christen, K. D. Bennett, and Paula J. Reimer. “Double the dates and go for Bayes—Impacts of model choice, dating density and quality on chronologies.” Quaternary Science Reviews 188 (2018): 58-66.

Mandelbrot, Benoit. “How long is the coast of Britain? Statistical self-similarity and fractional dimension.” science 156, no. 3775 (1967): 636-638.

Plotnick, Roy E. “A fractal model for the distribution of stratigraphic hiatuses.” The Journal of Geology 94, no. 6 (1986): 885-890.

Sadler, P. M. “The influence of hiatuses on sediment accumulation rates.” In GeoResearch Forum, vol. 5, no. 1, pp. 15-40. 1999.

Schiffer, Michael B. “Formation processes of the archaeological record.” (1987).

A Century of Oregon Salt Marsh Expansion and Contraction (Part III Draft)


As a reminder, my ultimate research goal is to quantitatively characterize salt marsh volumetric change from 1939 to the present within Oregon estuaries using a combination of sediment cores, for which I have excess 210Pb-derived sediment accumulation rates, and historical aerial photographs (HAPs), from which I plan to measure horizontal expansion and retreat. For my Exercise 1, I successfully georeferenced and digitized HAPs from Alsea Bay that are roughly decadal, spanning 1939 to 1991. The next major step was analyzing my data using the USGS Digital Shoreline Analysis System (DSAS; Himmelstoss et al. 2018). However, prior to analyzing my data in the DSAS framework, I had to first tie up a number of loose ends, which I did for my Exercise 2; these included determining an uncertainty associated with each shoreline layer, attempting to account for differences in shoreline resolutions related to different spectral resolutions of each aerial photograph, and determining modern shorelines (2001, 2009, and 2018). My next tasks included answering the following questions:


  1. What final edits must be made to successfully run DSAS on my Alsea Bay shoreline layer?
  2. What are the net rates of change in Alsea Bay salt marshes? (sub-question: from those that DSAS calculates, what are the most appropriate statistics to characterize rates of change?)  
  3. How does the Alsea Bay salt marsh shoreline change over time?


Step 1: Reduce unnecessary complexity in the shoreline layer

First, I removed all upslope edges on the fringing marsh since these are not of interest and all digitized using the same PMEP boundaries. I then removed all unnecessary complexity within the salt marsh complexes, especially within the northern portion of the larger island and the southern portion of the fringing marsh.

Step 2: Determine best DSAS settings using trial and error

Next I began to play around with the DSAS settings to produce the best results (assessed visually). These included:

  1. transect search distance to remove transects bridging the gap between the largest and smallest islands. This may require running each marsh complex separately (which I would rather not due so that all transect symbology are on the same scale). 
  2. Play around with transect smoothing to reduce the number of intersecting transects in areas with complex shoreline.  
Figure 1. Edited long-term rates of salt marsh change for Alsea Bay (LRR = linear regression rate).

Step 3: Calculate DSAS statistics

To calculate DSAS statistics, I selected the Calculate Rates tool, selected “select all” statistics, applied an intersection threshold of a minimum of 6 shorelines, and changed the confidence interval to 95%. DSAS reports the following statistics:

  1. Net Shoreline Movement (NSM): NSM is the distance between the oldest and youngest shorelines
  2. Shoreline Change Envelope (SCE): SCE is the greatest distance among all the shorelines that intersect a given transect, these values are therefore always positive
  3. End Point Rate (EPR): EPR is the NSM value divided by the time elapsed between the oldest and youngest shoreline; even if the shoreline contracts, then expands so that the maximum distance (SCE) is not the same as the distance between the oldest and youngest layers, EPR and NSM values are still calculated based on the oldest and youngest layers
  4. Linear Regression Rate (LRR): LRR is calculated by fitting a least squares regression line to all shoreline points for a transect; all transect data is used in LRR regardless of changes in shoreline trend or accuracy; this method is susceptible to outlier effects and tends to underestimate the rate of change relative to EPR. DSAS additionally provides the standard error estimate (LSE), the standard error of the slope given the selected confidence interval (LCI), and the R-squared value (LR2) for each LRR value    
  5. Weighted Linear Regression (WLR): WLR is very similar to LRR except it gives greater weight to transect data points for which the position uncertainty is smaller. DSAS additionally provides the standard error estimate (WSE), the standard error of the slope given the selected confidence interval (WCI), and the R-squared value (WR2) for each LRR value      

Step 4: Rerun all previous steps with sets of dates from 1939 to 2018

Using all of the same steps DSAS, I reran each group of shoreline layers from 1939 to 2018. Following each run, I had to clip each rate layer using the DSAS Data visualization tool to fit the size of the transects specific to the timeframe of interest. I additionally changed the color ramps to be consistent for all 8 time points. LRR and WLR are not calculated for only two shoreline layers so EPR will be the primarly statistic I use for this time series analysis.

Step 5: Analyze DSAS statistics

I exported the DSAS rates files using the ArcMap Table to Excel tool. I then plotted and analyzed the data in MATLAB. I made histograms with Kernel smoothing function fits using the function histfit. I chose kernel smoothing over a normal distribution fit because the normal distribution fit did not effectively display the uneven tails or the humps on the sides of the NSM or EPR plots. I additionally made box and whisker plots of the EPRs over time for each marsh complex: fringing, large island, and medium island.   

Preliminary Results

Figure 2. Histograms of the DSAS statistics calculated for all Alsea Bay salt marshes of interest.

The histograms of the DSAS net shoreline movement for the entirety of Alsea bay approximates a normal, gaussian curve, with the majority of points falling the 0 m, indicating net erosion of the Alsea Bay salt marshes from 1939 to 2018. This could indicate widespread drowning of these marshes. These results confirm similar patterns of net vertical accretion rates lower than 20th relative sea level rise rates for the bay (Peck et al. 2020). Rates of growth (EPR, LRR, WLR) for the entire bay are additionally approximate a normal distribution and tend to be negative. LRR and WLR are likely better predictors of long-term rates of change since they better account for intermittent periods of erosion and contraction. Though the uncertainty does not vary much between shoreline year (~1 to 4 m), I will likely use WLR to ensure uncertainty is incorporated appropriately.

Figure 3. Histograms of the DSAS statistics calculated for individual Alsea Bay salt marshes of interest.

Preliminary results comparing the three salt marsh complexes display similar patterns as the total bay histograms. The large island values tend to display a greater range of movements and rates, especially towards growth. Despite these growth rates in the large island salt marsh, the majority of observations still skew towards erosion.     

Preliminary analysis of the box plots shows that the EPRs have changed through time for each salt marsh complex. Box plots were likely not the best way to visualize this data so I plan to focus on this moving forward.

Figure 4. Box plots of EPRs over time for each individual salt marsh within Alsea Bay.


I have no negative critique of my methods so far for Exercise 3. I’m very excited with how the project is progressing. I think DSAS is a great method for addressing my questions. I’m still working on the best way to plot the figures, however. One next step I envision is creating a GIF displaying the colored transects that cycles through each shoreline layer from 1939 to 2018. Additionally, I would like to plot the mean EPRs for each time step for each salt marsh complex as bar plots with ranges. I think this will display trends better. I additionally think it’s time to start incorporating my salt marsh accretion data!     


Himmelstoss, E.A., Henderson, R.E., Kratzmann, M.G., & Farris, A.S. (2018). Digital Shoreline Analysis System (DSAS) version 5.0 user guide: U.S. Geological Survey Open-File Report 2018–1179, 110 p., ofr20181179.

Peck, E. K., Wheatcroft, R. A., & Brophy, L. S. (2020). Controls on sediment accretion and blue carbon burial in tidal saline wetlands: Insights from the Oregon coast, USA. Journal of Geophysical Research: Biogeosciences, 125(2), e2019JG005464.