Ultimately I am interested in how topography hinders GPS collar fixes from being obtained, but for the purposes of this exercise I asked: how does sky availability change with scale?
I already had a raster of sky availability for my study area, which is the proportion of sky unobstructed by topography (it does not take vegetation into account). I loaded the raster into R and used the function "focal" from the raster package to perform a moving-window mean analysis. Focal analysis computes an output value for each pixel from its neighborhood: a rectangular moving window, specified in the "weights" argument of the R code, whose function is applied at each pixel in the raster. The neighborhood can vary in size, and the larger it is, the smoother the output values become. Alternatively, I could have used ArcGIS's Focal Statistics tool. Below is an example of the R code I used for a moving window of 3 pixels by 3 pixels (one pixel on either side of the focal pixel):
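For readers without R, the same computation can be sketched in pure Python. This is a hypothetical 3 x 3 example with invented values, not the author's code:

```python
# Focal (moving-window) mean for a small raster, mirroring what
# raster::focal(r, w = matrix(1, 3, 3), fun = mean) does in R.
# Values below are made up for illustration.

def focal_mean(grid, size=3):
    """Mean of the size x size neighborhood around each cell.
    Edge cells use only the neighbors that exist."""
    n_rows, n_cols = len(grid), len(grid[0])
    half = size // 2
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            vals = [grid[r][c]
                    for r in range(max(0, i - half), min(n_rows, i + half + 1))
                    for c in range(max(0, j - half), min(n_cols, j + half + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out

sky = [[0.9, 0.8, 0.7],
       [0.6, 0.5, 0.4],
       [0.3, 0.2, 0.1]]
smoothed = focal_mean(sky, size=3)
print(round(smoothed[1][1], 3))  # center cell: mean of all nine values
```

Larger window sizes simply enlarge the neighborhood each output cell averages over, which is why the values smooth out as the window grows.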
I compared the 7 different sizes of focal windows to the original sky availability values at 54 locations within part of my study area (the western slopes of the Cascade Range). The neighborhood sizes were 3, 5, 9, 11, 17, 21, and 25 pixels. These were the same locations as the test sites where I will be assessing GPS collar accuracy and fix success rate in the future.
After exporting the new focal rasters into ArcGIS, I extracted values from each of the 7 focal rasters and from the raw sky availability data to assess how the values varied with different smoothing parameters. To accomplish this I used the Extract Multi-Values to Points tool in ArcGIS. With the moving window values added to the test site locations shapefile, I exported the shapefile as a .csv file and made graphs to compare and visualize the shift in values.
Looking at sky availability values across individual sites made comparisons difficult, so I averaged the values for each neighborhood size across test sites and produced a line graph. This graphic shows a threshold relationship between sky availability and neighborhood size: the proportion of unobstructed sky is high for smaller window sizes, then decreases sharply beyond a neighborhood size of 9.
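That averaging step is a simple reduction. A minimal sketch, with invented values for three of the window sizes:

```python
# Average the extracted sky-availability values across test sites for each
# window size, the summary plotted in the line graph. Values are invented.

def mean_by_window(extracted):
    """extracted: {window size in pixels: [values at each test site]}."""
    return {w: sum(v) / len(v) for w, v in extracted.items()}

extracted = {3: [0.92, 0.88, 0.95],
             9: [0.90, 0.86, 0.91],
             25: [0.60, 0.55, 0.58]}
means = mean_by_window(extracted)
print(means[25])  # the drop past a 9-cell neighborhood shows up here
```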
Since the resolution of these rasters is 10 x 10 m, my interpretation is that sky availability is stable within about 45 meters of a location (a 90 x 90 m window), but larger windows smooth the values too much; this is the point where we lose detail that may influence GPS collar data. This was a useful exercise in learning how to perform focal statistics, but I'm not sure how to proceed from here to apply these results to GPS fix success.
Oregon's west coast has important highways that connect Washington and California, and the impact of landslides on traffic is huge. Landslides tend to recur in certain areas, some of which carry national or heavily used highways. Once a landslide occurs, travelers spend a lot of time detouring, which is very inefficient. In my research, I hope to find the most sensitive areas by analyzing the spatial pattern of a landslide susceptibility map together with a heat map, marking the dangerous areas to produce a high-risk area map. Based on that, the cost, severity, and timing of landslides can be analyzed to better understand their impact. In Exercise 1, a hot spot analysis of annual cost was generated.
I think hot spot analysis is most suitable for the Oregon statewide landslide point data, so the Hot Spot Analysis tool in ArcGIS Pro was used.
The steps for generating a hot spot analysis in ArcGIS Pro are simple. First, add the Oregon landslide point data in ArcGIS Pro. Then open the toolbox and search for Hot Spot Analysis; the tool appears as shown below.
For Input Feature Class, the point data should be added. For Input Field, I chose Annual Cost for the analysis.
Then click the Run button, and the hot spot map based on annual cost is generated.
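Under the hood, the Hot Spot Analysis tool computes a Getis-Ord Gi* z-score for each feature. A hand-rolled sketch of that statistic, with invented coordinates and costs, shows what the red and blue symbols summarize:

```python
# A hand-rolled Getis-Ord Gi* (what ArcGIS Pro's Hot Spot Analysis computes
# per feature). Points and annual costs below are invented for illustration.
from math import hypot, sqrt

def gi_star(xy, values, band):
    """Gi* z-score for each point, using a fixed distance band
    (w_ij = 1 when the distance from i to j is <= band, self included)."""
    n = len(values)
    mean = sum(values) / n
    s = sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for xi, yi in xy:
        w = [1.0 if hypot(xi - xj, yi - yj) <= band else 0.0
             for (xj, yj) in xy]
        sw = sum(w)
        num = sum(wi * v for wi, v in zip(w, values)) - mean * sw
        den = s * sqrt((n * sum(wi * wi for wi in w) - sw * sw) / (n - 1))
        scores.append(num / den)
    return scores

# Six sites: a tight cluster of high annual costs plus distant low-cost ones.
xy = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
cost = [90, 95, 92, 5, 8, 4]
z = gi_star(xy, cost, band=2.0)
print([round(v, 2) for v in z])  # positive z = hot spot, negative z = cold spot
```

Large positive z-scores map to the red (hot spot) points and large negative ones to the blue (cold spot) points.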
The map records the landslides that have occurred in Oregon since 1990. The Hot Spot Analysis tool was used to analyze the map, with annual cost as the input field. Through observation, NW Oregon has the most landslides, but SW Oregon spends the most on them. The susceptibility map shows that the entire west coast is highly susceptible to landslides. However, since most highways and important traffic corridors are on the west coast, landslide protection there needs more budget. As shown in Figure 1, NW Oregon has lower costs than SW Oregon; the Oregon government should add budget in the northwest to prevent landslides.
On the hot spot analysis map, the red points are hot spots and the blue points are cold spots, representing clusters of similar annual landslide values.
As shown in Figure 3, I chose NW Oregon for further analysis by location. In Figure 3, the lower right has many recorded landslides, and the hot spot is shown in orange in this area, which means landslides there are relatively similar (or the survey was concentrated there). The area is surrounded by Corvallis, Newport, and Monroe. Further analysis will focus on this area because it has many roads, highways, and residential areas.
In the analysis process, the tools in ArcGIS Pro are very efficient, so it does not take much time to generate hot spots. Hot spot analysis is useful here: the research is about landslide distributions, and further information can also be generated with it, for example on annual cost, landslide volume, or landslide area. So I think the tools in ArcGIS are very effective. However, before using the Hot Spot Analysis tool, I failed to use the Create Space Time Cube tool, so there is no more detailed hot spot generation; this can be improved in later exercises.
In Exercise 1, because soil characteristics vary across regions of Oregon, landslides mainly occur in certain specific areas. In the point data, each point represents a place where a landslide occurred, and these points are attracted to each other in high-risk areas. Furthermore, the point data and the landslide susceptibility map are also associated: the red and blue high-risk areas are mutually attractive with the point data, while the green low-risk areas are mutually exclusive with it.
Exercise 1: Image classification for detecting seedlings based on UAV derived imagery
The question addressed in this study
Detection of trees is a primary requirement for most forest management practices and forestry research applications. Modern remote sensing techniques provide time- and cost-effective tree detection capabilities compared to conventional tree detection methods. Notably, image classification is considered one of the vital tools in remote sensing applications (Lu and Weng, 2007). During the last few decades, scientists and practitioners have developed various image classification methods to detect land-cover and land-use features from remotely sensed data (Franklin et al., 2002; Gong and Howarth, 1992; Pal and Mather, 2003). However, the output of image classification may be affected by various factors, such as terrain conditions and their complexity, the remote sensing platform used and its performance, the image processing tools, and the classification approach (Lu and Weng, 2007). Therefore, in this part of the study, I am interested in evaluating "how feasible is detecting individual seedling crowns using unmanned aerial vehicle (UAV) based RGB imagery by applying supervised and unsupervised classification techniques?"
Tree seedlings can be considered individual entities, and attraction or repulsion between seedlings depends on how they are arranged or planted spatially. The spatial arrangement of seedlings changes according to their regeneration process. On a broader scale, both natural and artificial regeneration can be observed in the main study area: systematically arranged artificially regenerated seedlings represent attraction, while naturally regenerated seedlings represent some degree of repulsion. However, for this particular study, the northwest part of the study area was selected, where artificially regenerated seedlings were prominent. Two plots of approximately 450 m2 each were selected for image classification (Fig. 1): Plot 1 was used for collecting training data, and Plot 2 for assessing the accuracy of the image classification performed in this study.
2. Name of the tool or approach used and steps followed
Image classification tools available in ArcMap 10.7:
Supervised image classification: Maximum Likelihood Algorithm
Unsupervised image classification: Iso Clustering approach
2.1 Agisoft Metashape
AgiSoft Metashape software was used to produce orthomosaic images at the pre-processing stage of the UAV imagery. As shown in Figure 2, the collected UAV images (287 images) were aligned and georeferenced. An additional step was performed to feed the GPS coordinates to each UAV image (Link 1: Appendix 1). After georeferencing the images, a dense point cloud was produced. Depending on the desired image quality, available time, and data space, building the dense cloud can be carried out at five quality levels (i.e., lowest, low, medium, high, highest); the required computing power and image processing time generally increase from lowest to highest. Finally, the orthomosaic image was produced and saved as a .tif file, to be used for image classification in the next stage.
2.2.1 Supervised image classification
Collecting training data
The produced orthomosaic image was added to ArcMap 10.7 for image classification. Next, two blocks were selected from the raw image for training and testing, Plot 1 and Plot 2, respectively (Fig. 1). The training data were collected using the image classification tool available in ArcMap 10.7: the draw polygon option was used, and data were collected covering the entire area (Fig. 3). The training sample manager table was used to merge the training data for each class, and a total of five classes were defined for this classification (Fig. 3). The number of polygons per class depends on the total number of classes of interest; generally, for better classification, the number of polygons for each class should be about 10 times the number of classes (i.e., no. of polygons for class 1 = 10 x 5). Finally, the polygons were saved as a signature file, which is required for the image classification in the next stage.
Collecting testing data and image classification
To evaluate the performance of the supervised image classification, a set of testing data points was collected over the testing area (Plot 2) (Fig. 4). The collected reference points were converted into pixels using the Points to Raster conversion tool in the ArcMap toolbox. Next, the raster file of Plot 2 (Fig. 4) and the signature file from the training data set (Plot 1) were used as inputs to the Maximum Likelihood classification. Finally, the Spatial Analyst tools were used to combine the classified image and the raster reference data (from Plot 2) to assess the accuracy parameters. The output of this process was used to create the confusion matrix and evaluate the accuracy of the classified image.
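The idea behind Maximum Likelihood classification is that each pixel is assigned to the class whose fitted Gaussian model gives the highest likelihood. A minimal one-band sketch with invented training pixel values (the real tool fits multivariate normals over all bands):

```python
# Maximum Likelihood assigns each pixel to the class whose Gaussian model
# (fitted from training samples) gives the highest likelihood.
# One-band sketch; all digital numbers below are hypothetical.
from math import log, pi

def fit(samples):
    """Per-class mean and variance from training pixels."""
    m = sum(samples) / len(samples)
    v = sum((s - m) ** 2 for s in samples) / len(samples)
    return m, v

def classify(pixel, classes):
    """Return the class name with the highest Gaussian log-likelihood."""
    def loglik(m, v):
        return -0.5 * (log(2 * pi * v) + (pixel - m) ** 2 / v)
    return max(classes, key=lambda name: loglik(*classes[name]))

# Hypothetical training values for two of the five classes.
training = {"seedling": [42, 45, 40, 44, 43],
            "bare_soil": [180, 175, 185, 178, 182]}
classes = {name: fit(px) for name, px in training.items()}
print(classify(41, classes))   # close to the seedling mean
print(classify(179, classes))  # close to the bare-soil mean
```

The signature file saved from the training polygons plays the role of the fitted means and (co)variances here.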
2.2.2 Unsupervised image classification
For the unsupervised classification approach, the Iso Cluster method was used, which is available under the classification tool category. With this approach, we can define the number of classes we expect from the unsupervised classification, as well as the minimum class size and sample interval. A total of 8 classes were used to get a better classification scheme; the initial eight classes were then reduced to three major classes by comparing the orthomosaic and classified image features (Fig. 5). The Reclassify tool was used to merge similar classes into the final classified image.
As in the supervised case, the Spatial Analyst tools were used to combine the classified image and the raster reference data (from Plot 2) to assess the accuracy parameters, and the output was used to create the confusion matrix and evaluate the accuracy of the classified image.
Generally, the error matrix approach is widely used for assessing the performance and accuracy of image classification (Foody, 2002). Further, the error matrix can be used to produce additional accuracy assessment parameters such as overall accuracy, omission error, and commission error (Lu and Weng, 2007). In this study, the supervised image classification approach showed higher overall accuracy (94%) than the unsupervised classification (93%). Detection of tree seedlings in the supervised approach showed 94% user accuracy and 10% commission error, while seedling detection based on the unsupervised classification showed 94% user accuracy and 4% commission error. Similarly, the supervised approach showed 97% producer accuracy and 3% omission error for the seedling class, and the unsupervised approach showed 94% producer accuracy and 2% omission error for the seedling class (Fig. 6).
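These measures all derive from the error matrix in a standard way: user's accuracy is read along a classified row, producer's accuracy down a reference column, and commission/omission are their complements. A small sketch with invented counts:

```python
# Accuracy measures from an error (confusion) matrix: rows = classified,
# columns = reference. The counts below are invented for illustration.

def accuracy_report(matrix, labels):
    total = sum(sum(row) for row in matrix)
    overall = sum(matrix[i][i] for i in range(len(labels))) / total
    report = {"overall": overall}
    for i, name in enumerate(labels):
        row = sum(matrix[i])             # all pixels classified as this class
        col = sum(r[i] for r in matrix)  # all reference pixels of this class
        user = matrix[i][i] / row        # commission error = 1 - user
        producer = matrix[i][i] / col    # omission error = 1 - producer
        report[name] = {"user": user, "producer": producer,
                        "commission": 1 - user, "omission": 1 - producer}
    return report

labels = ["seedling", "other"]
matrix = [[47, 3],   # classified seedling: 47 correct, 3 actually other
          [2, 48]]   # classified other: 2 actually seedling, 48 correct
rep = accuracy_report(matrix, labels)
print(rep["overall"], rep["seedling"]["user"], rep["seedling"]["producer"])
```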
Overall, the estimated accuracies show promising results for identifying tree seedlings using both supervised and unsupervised classification. However, estimated classification accuracy depends on several essential parameters, including sampling design, estimation, and analysis procedures (Stehman and Czaplewski, 1998). The sampling strategy also plays an important role, especially in determining sample size, sample unit (either pixels or polygons), and sample design (Muller et al., 1998). Therefore, the observed results are highly dependent on the parameters described above. For this study, a random sampling approach was used to collect data covering the study area, so as to represent most of the objects in the orthomosaic imagery.
Spatial resolution is another important factor that can affect image classification performance. One of the main advantages of UAV imagery is its fine resolution: fine-resolution images reduce mixed-pixel problems while providing more land-cover information than medium- and coarser-resolution imagery (Lu and Weng, 2007). Hence, fine-resolution images have a higher potential to provide better classification results. However, there are a few drawbacks associated with fine-resolution imagery, such as the effects of shadows and high spatial variation within the same land-cover class; these problems tend to reduce classification accuracy (Cushnie, 1987). For example, the classifier may detect dark objects as tree seedlings because shadows can have pixel values similar to seedlings (Fig. 7).
Another potential problem with UAV-based orthomosaic imagery is error introduced during data processing. In this study, I found areas of low seedling detection in both the supervised and unsupervised classified images (Fig. 8). Figure 8 shows the coarse-scale pixelated area (circled) in the RGB image and the corresponding low detection of tree seedlings in the unsupervised and supervised classifications, respectively.
Considering the spatial aspects of the classified images, we can notice that some tree seedlings were missing compared to the RGB image (Fig. 9). This type of issue may occur due to poor health of the seedlings or due to drawbacks in the image classification settings (i.e., the pixel values of these trees may be lower than those of a healthy tree).
Overall, this image classification exercise showed the potential of both supervised and unsupervised classification to detect tree seedlings. The observed minor drawbacks can be addressed by improving the quality of image processing and classification.
In addition to the above classification methods, I tried the random forest classification approach. By changing the input band values for the random forest algorithm, the detection of tree seedlings can be enhanced (Fig. A1). However, further work is required to perform better classification using the random forest approach.
Cushnie, J.L., 1987. The interactive effect of spatial resolution and degree of internal variability within land-cover types on classification accuracies. Int. J. Remote Sens. 8, 15–29. https://doi.org/10.1080/01431168708948612
Foody, G.M., 2002. Status of land cover classification accuracy assessment. Remote Sens. Environ. 80, 185–201. https://doi.org/10.1016/S0034-4257(01)00295-4
Franklin, S.E., Peddle, D.R., Dechka, J.A., Stenhouse, G.B., 2002. Evidential reasoning with Landsat TM, DEM and GIS data for landcover classification in support of grizzly bear habitat mapping. Int. J. Remote Sens. 23, 4633–4652. https://doi.org/10.1080/01431160110113971
Are there nonrandom spatial patterns in reported illness across the country?
Using data from Pakistan for the years 2011, 2013, and 2014, I explore rates of respiratory illness across districts. I further examine whether these patterns vary by age (children under 5) and gender. Intuitively, I expect that underlying mechanisms such as poverty and pollution exposure will influence spatial clustering: poorer areas may have greater incidence of illness, just as areas exposed to greater pollution sources would have more people falling ill.
Attraction and Repulsion: One “attraction” mechanism is exposure to crop fires. Areas that have greater exposure may have higher rates of reported illness.
Name of the tool or approach that you used
I used the Getis-Ord Gi* (hot spot) statistic to assess whether there is statistically significant clustering of high or low rates of reported respiratory illness across districts in the country. This initial analysis can be used to gauge whether areas are more or less susceptible, so as to later develop a causal understanding of the mechanisms that explain these illnesses.
Sources and Sinks: A univariate analysis of the count of reported illnesses across districts provides limited information, as less populated districts will have a lower count than more congested ones. To account for spatial patterns in population, I use the total number of people interviewed in each district as the base for calculating illness rates across districts.
Steps to complete the analysis
I used survey data from Pakistan to analyze individual responses on illness and created a binary response variable: 1 if illness was experienced and 0 otherwise. The steps for analyzing the spatial distribution of this variable were as follows:
Treating all family members in a household as independent respondents, I aggregated the count of illnesses across districts. I use this for choropleth maps (as in Figure 2) and also to calculate district level rates for the hotspot analysis.
For each district, I also aggregate the number of individuals interviewed and use this as the denominator to calculate the rate of illness across districts.
I used the collapse command in Stata to get one observation per district from my entire dataset.
I then merged the above data with a district level shapefile and subsequently carried out the hotspot analysis.
The input field was the rates I had calculated, and I used the fixed distance band conceptualization of spatial relationships to compute the Getis-Ord Gi* statistic.
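The aggregation in steps above (counts, denominators, and rates per district, the same reduction Stata's collapse performs) can be sketched as follows. District names and responses are invented for illustration:

```python
# Collapse individual survey responses to district-level illness rates:
# count of reported illnesses divided by the number interviewed.
# Records below are invented, not the actual survey data.
from collections import defaultdict

def district_rates(records):
    """records: (district, ill) pairs, where ill = 1 if illness reported.
    Returns district -> rate of reported illness among those interviewed."""
    counts = defaultdict(lambda: [0, 0])  # district -> [ill, interviewed]
    for district, ill in records:
        counts[district][0] += ill
        counts[district][1] += 1
    return {d: ill / n for d, (ill, n) in counts.items()}

records = [("Lahore", 1), ("Lahore", 0), ("Lahore", 1), ("Lahore", 1),
           ("Quetta", 0), ("Quetta", 1)]
rates = district_rates(records)
print(rates)  # e.g. Lahore: 3 of 4 interviewed reported illness
```

Using the interviewed count as the denominator is what keeps densely populated districts from dominating the hot spot statistic purely through their raw counts.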
At this stage, I am only looking at the distribution of my outcome variable. It is not possible to draw inference since the sampling is not representative of the district population. My purpose at this point is to explore the data and then move on to do an analysis of exposure to see if there is association between this outcome variable (respiratory illness) and a household’s proximity to fire locations.
The results of the hotspot analysis (Figure 1) show that significant clusters exist in Punjab. This makes intuitive sense, as the province holds the largest share of the country's population. The results also indicate that agricultural crop residue burning (occurring primarily in Punjab) may be a worthwhile cause to explore further in the analysis.
Figure 2 maps point data from household addresses that may later be used to explore household-level explanations of the causes of illness in a community. Figures 3–5 are spatial distributions based on aggregate as well as yearly data. It is evident that the patterns exhibited by the simple counts of reported illness do not predict the statistically significant clusters presented in Figure 1. This is likely because the districts with higher counts also have higher population density.
The analysis at present is at the district level, which does not reflect household-level variation. Such broad aggregation tells us little about the neighborhood-level factors (access to sanitation, type of energy source, exposure to pollution, etc.) that are likely to be strongly associated with the current outcome variable. Another limitation is that district-level aggregation limits the use of spatial regression due to small n: only 77 districts are covered in the survey.
However, the hotspot analysis is useful to understand the distribution of respiratory illnesses. It may be more useful for my research if I am able to do a hotspot analysis of the residuals of my regression model.
To understand the foraging patterns of predators such as gray whales, it is important to have a grasp of the patterns of their prey, which in the case of gray whales is zooplankton. The distribution and abundance of zooplankton is often described as patchy and variable. Therefore, my question for Exercise 1 was: "What is the spatial and temporal pattern of zooplankton density in the inshore waters off Port Orford, OR?" To address this question, I tested different types of interpolation on my point data of prey density, sampled at 12 stations during August 2018.
Approaches/tools used and methods
To address the question, I wanted to explore various types of interpolation to see how the density of prey collected at my sampling stations mapped onto my entire study site, filling the gaps where prey density is unknown. I had planned to do this in ArcGIS (a program I am not too familiar with), but due to the slowness of the remote connection and my lack of knowledge/skills, I decided to teach myself interpolation in R instead, since I am more savvy in R. I quite enjoyed trying out new things and working out the kinks in lines of code I found online. Since interpolation in R was also new to me, I followed a very helpful tutorial. Because of this, I was limited to the types of interpolation presented in that tutorial, namely Inverse Distance Weighting (IDW), 1st- and 2nd-order polynomials, and kriging. The main type of interpolation not covered was splines; however, after some research, I discovered that splines do not tend to do well when there is a lot of variation in the data being interpolated. Initial data exploration showed significant variation in my data: on most days, 1–2 stations had a very high density while the other 4–5 had relatively low density. Therefore, I decided it was not too important that I was unable to run a spline interpolation.
I will share code for interpolation in R during my tutorial, but I will outline the rough steps here. First, I imported coordinates (latitude, longitude) for the boundary of my study sites and converted them into a spatial polygon. Then, after splitting my data into daily data frames, I converted them into spatial points. I used several packages containing the functions for the data preparation and interpolation steps, including 'spatstat', 'gstat', 'tmap', and 'sp'. All of these packages have dependencies that must be installed and loaded before their functions will run.
Before I move on to some preliminary results, I want to explain some of the core differences between the different kinds of interpolations I employed.
IDW is a deterministic interpolation method that relies on surrounding values to populate the resulting surface. One of its major assumptions is that a measured value's influence decreases as you move away from it: the weight given to any point falls off with distance from the prediction location.
1st- and 2nd-order polynomial interpolation fits a smooth mathematical surface to the data points. Polynomial surfaces change gradually and are therefore good at capturing coarse trends in the data. A 1st-order polynomial is like fitting a flat sheet of paper to a surface of points, finding the best straight fit; in a 2nd-order polynomial, the 'sheet of paper' is allowed one fold. Polynomials tend to be very good at representing gradual change in the variable of interest.
Kriging differs from IDW and polynomials in that it is not a deterministic interpolation but a geostatistical approach. It is a little more advanced than the other interpolations because it not only considers the actual value of the variable at each spatial location but also explores the spatial autocorrelation between points. Therefore, kriging produces not only a predicted surface but also a measure of the certainty or accuracy of the predictions.
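To make the IDW weighting concrete, here is a minimal sketch of the method. It is written in Python rather than the R/'gstat' code I actually used, and the station values are invented:

```python
# Inverse Distance Weighting: the predicted value is a weighted average of
# the sampled stations, with weights 1/d^p. Stations below are invented.
from math import hypot

def idw(x, y, stations, power=2):
    """stations: list of (x, y, value). Returns the IDW estimate at (x, y)."""
    num = den = 0.0
    for sx, sy, v in stations:
        d = hypot(x - sx, y - sy)
        if d == 0:
            return v  # exactly at a station: return the observed value
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

# Hypothetical zooplankton densities at four stations: one high, three low.
stations = [(0, 0, 100.0), (1, 0, 20.0), (0, 1, 20.0), (1, 1, 20.0)]
print(round(idw(0.1, 0.1, stations), 1))  # pulled toward the nearby high value
print(round(idw(0.9, 0.9, stations), 1))  # pulled toward the low values
```

Because every weight is positive, IDW can never predict values outside the range of the observations, which is why it avoids the negative densities the polynomial and kriging surfaces produced.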
The resulting interpolations for my two field sites can be found below. I selected interpolations from the same date at both sites so that some initial visual comparisons can be made.
Critiques of the method
Based on my research on the different interpolations, it seems that my data are best suited for IDW interpolation. This is because I have very few data points (only 6 stations per site) and there is a lot of variation in zooplankton density within a site on the same day. Since polynomials and kriging fit functions and equations, they inevitably smooth the data to some extent. However, I am precisely trying to capture this variability, as my next step will be to see whether gray whales prefer sites with high densities of prey. If I select an interpolation that smooths out this variability, I may not find the predator–prey relationships I am investigating.
Even though the polynomials may make sense, since they imply a density gradient across the field sites that the raw data may in fact reflect, both the polynomials and the kriging produce very unrealistic values. If you look at the density scales next to the plots, you will see that the polynomial and kriging interpolations end up with negative density values, which, ecologically, makes no sense.
I still have a lot of refinement to do with these interpolations, including calculating root mean square errors for each interpolation to quantitatively compare them and see which performs best. Furthermore, since I think the IDW interpolations are the most suitable for my data, it would be good to continue exploring the interpolation parameters, like grid cell size or buffers around the points, to see which variations look best and are most ecologically relevant.
1. Research Question
In order to run the HYSPLIT back-trajectory analysis correctly, the first thing I must do is assess the spatial and temporal pattern of POP. The initial POP data look clustered, but more spatial statistical analysis is needed to determine where the clustering is strongest, with exact coordinates and times. Hence, the questions I would like to ask in Exercise A are: what is the cluster pattern of POP, and when does the highest POP value occur?
2. Tools and Approach
I decided to use incremental spatial autocorrelation, comparing Hot Spot Analysis (fixed distance band) with Local Moran's I. By looking at the results of these tools, I can decide which one is appropriate for my data.
I used Spatial Statistic Tools in ArcMap 10.7.1:
a) Analyzing the pattern using Incremental Spatial Autocorrelation and Spatial Autocorrelation (Moran's I).
b) Mapping clusters using Hot Spot Analysis (Getis-Ord General G and Optimized Hot Spot) and Cluster and Outlier Analysis (Anselin Local Moran's I).
3. Steps of Analysis
a) My POP data consisted of many values at different depths for each coordinate, so to begin analyzing the spatial and temporal pattern I needed to aggregate the data by averaging the POP values over depth. This gave one POP value per X, Y coordinate. I used Matlab for this, with code as follows:
b) The next step was to rank the POP values for clustering purposes. The rank is defined using the mean value: POP values higher than the mean are considered high, and values lower than the mean are considered low. I also used Matlab for this job, with code as follows:
c) Since the Matlab output is in txt format, I loaded it into ArcMap to display the X and Y data and converted it to a shapefile.
d) I analyzed the pattern using Spatial Autocorrelation (Moran's I) with a fixed distance band, Euclidean distance, and no standardization, and a second time using Incremental Spatial Autocorrelation with row standardization.
e) The mapping clusters were generated as follows:
1. Cluster and Outlier Analysis (Anselin Local Moran's I) with a fixed distance band, Euclidean distance, and no standardization.
2. Hot Spot Analysis using Getis-Ord General G with a fixed distance band and Euclidean distance, and Optimized Hot Spot with the "snap nearby incidents to create weighted points" aggregation method.
f) I joined the attribute table of the initial data to the attribute table of the spatial autocorrelation result to get the most clustered POP values by coordinate and time.
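Steps a) and b) above can be sketched in Python instead of Matlab (the Matlab snippets are not reproduced here, and all values below are invented):

```python
# Step a): average POP over depth for each X, Y coordinate.
# Step b): rank each averaged point high/low against the overall mean.
# All coordinates and POP values are hypothetical.
from collections import defaultdict

def average_over_depth(samples):
    """samples: (x, y, depth, pop). Returns {(x, y): mean pop over depth}."""
    sums = defaultdict(lambda: [0.0, 0])
    for x, y, _depth, pop in samples:
        sums[(x, y)][0] += pop
        sums[(x, y)][1] += 1
    return {xy: s / n for xy, (s, n) in sums.items()}

def rank_against_mean(averaged):
    """'high' if a point's POP exceeds the mean of all points, else 'low'."""
    overall = sum(averaged.values()) / len(averaged)
    return {xy: ("high" if v > overall else "low") for xy, v in averaged.items()}

samples = [(0, 0, 5, 8.0), (0, 0, 10, 12.0),   # two depths at one coordinate
           (1, 1, 5, 2.0), (2, 2, 5, 3.0)]
avg = average_over_depth(samples)
ranks = rank_against_mean(avg)
print(avg[(0, 0)])  # depth-averaged POP at the first coordinate
print(ranks)        # the first coordinate ranks 'high', the others 'low'
```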
a) The data were reduced significantly, from 929 points to 82 points.
Viewed in a three-dimensional graph, the POP pattern is clustered over the five-year interval; however, I need a more robust spatial autocorrelation analysis to identify the most clustered values.
b) The pattern analysis was performed on the ranked values, and the result is as I expected: the highest values are clustered in one area.
To strengthen confidence in the result, I also performed incremental spatial autocorrelation, and the result is similar to Moran's I.
c) The Mapping clusters
The maps show the results of the different cluster mapping methods. Both Local Moran's I and the hot spot analysis indicate similar results, with the hot spot of high values clustered at the black dot. Based on this result, the HYSPLIT back trajectories will be run for the black dot on the Moran's I cluster map.
5. Critique of the method – what was useful, what was not?
My data are in fact clustered in one area, but it would take a lot of effort and time to run HYSPLIT back trajectories for every date of the year across that whole clustered area. This spatial autocorrelation analysis gives me a plausible basis for choosing the spatial and temporal window for the HYSPLIT back-trajectory analysis.
However, I am not sure that I have conducted the right spatial relationship or not, I need to deepen my knowledge on how to determine the conceptual spatial relationship based on the initial data. Once we change the conceptual spatial relationship the result of the Z-score will change as well.
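The sensitivity just described can be demonstrated with a small sketch: computing Global Moran’s I for the same points under two different fixed distance bands gives two different statistics. This is an illustrative pure-numpy example (the band widths, point count, and helper names are invented for the demo), not the ArcGIS implementation:

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I for values x under a spatial weights matrix W."""
    z = x - x.mean()
    return len(x) / W.sum() * (z @ W @ z) / (z @ z)

def fixed_distance_band(coords, band):
    """Binary 'fixed distance band' weights: 1 if within `band` units, else 0."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    return ((d > 0) & (d <= band)).astype(float)

rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(50, 2))   # 50 random point locations
x = coords[:, 0] + rng.normal(0, 1, 50)     # values trending west to east
i_narrow = morans_i(x, fixed_distance_band(coords, 2.0))
i_wide = morans_i(x, fixed_distance_band(coords, 8.0))
# Same data, different conceptualization of spatial relationships,
# different Moran's I (and hence different z-scores).
```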
The main question I sought to answer in this first exercise concerns the dispersion and aggregation of artifacts in an assemblage. Treating this assemblage as a point pattern, I implemented an interpolation method based upon the density of physical points in space. The first hypothesis I am seeking to assess is whether my assemblage is clustered in physical space, regardless of artifact attributes. I am moving forward with the assumption that artifact attributes do not produce an internal source or sink in my dataset. The main reason similar artifact attributes may appear attracted to one another in space is that the external force behind their creation produces very consistent patterns in this regard. Human behavior is the external factor primarily responsible for the perceived sources and sinks in typical artifact assemblages. For this reason, I have neglected artifact attributes (e.g., length, width, material, type) in this initial hypothesis.
In addition to the previous assumption, since the dataset is confined within a depth of 1 meter, identifying clustered behavior in this artifact assemblage in two dimensions alone is not sufficient. I must consider how the artifacts are clustered in three-dimensional space involving longitude, latitude, and depth. Where many might attempt to cluster within this expanded dimensionality, I have chosen to segregate the dataset into horizontal segments based upon observable characteristics drawn from visualizations of the artifacts in 2-D and 3-D space. This approach carries a potential for greater error because I have made visual judgments based upon what I can see in the assemblage, with little quantification at this step.
In conclusion, I seek to identify dispersion within the artifact assemblage in two-dimensional segments of varying vertical extent, using interpolation methods based upon the kernel density function.
Proceeding with Interpolation
A tool that I used to assess dispersion within the horizontal segments of the artifact assemblage is the kernel density function ‘kde3d’ located in the R package ‘misc3d’. Since this tool has already been created and established in R, the first step I undertook was assigning the proper coordinates for each artifact within ‘kde3d’. One of the main parameters in this function is the number of grid points to use; ultimately, this number controls the accuracy and “smoothness” of the resulting image. A number is then assigned to each grid cell determining the density within a cubed extent of the data frame. When the resulting values are converted into contour lines using ‘contour3d’ from the same package, I am able to draw these contours to visualize the 3-D kernel density of the artifact assemblage (figure 1). The residual points not included in the contoured features are outliers and do not constitute a highly dense area. Ultimately, figure 1 shows areas that contain a significant cluster of artifacts. I then used the resulting 3-D kernel density image (figure 1) in conjunction with a vertical window into the artifact assemblage (figure 2) to identify depths that I will choose as my horizontal segments.
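For readers without the ‘misc3d’ package at hand, the core of a 3-D kernel density estimate can be sketched in a few lines of numpy. This is a simplified, illustrative analogue of ‘kde3d’ (a single isotropic bandwidth `h` and a cubic grid, both invented for the sketch), not the actual R implementation:

```python
import numpy as np

def kde3d(x, y, z, n=20, h=0.5):
    """Evaluate a 3-D Gaussian kernel density on an n x n x n grid.

    Simplified analogue of R's misc3d::kde3d; h is one isotropic
    bandwidth rather than a per-axis vector.
    """
    pts = np.column_stack([x, y, z])                    # (m, 3) artifact coordinates
    axes = [np.linspace(c.min(), c.max(), n) for c in (x, y, z)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    grid = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])
    # Squared distances between every grid node and every data point
    d2 = ((grid[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
    dens = np.exp(-d2 / (2 * h ** 2)).sum(axis=1)
    dens /= len(pts) * (h * np.sqrt(2 * np.pi)) ** 3    # normalize the kernel
    return axes, dens.reshape(n, n, n)

rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 100))                     # toy "artifact" cloud
axes, dens = kde3d(x, y, z)
```

Contouring the resulting `dens` array at a chosen level is then the 2-line equivalent of the ‘contour3d’ step.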
Figure 2 represents one part of the process undertaken to judgmentally determine the most significant surfaces to analyze within 2-D horizontal segments, with the x-axis assigned to Easting and the y-axis assigned to Elevation. The areas, or levels, chosen here will be referred to as surfaces of interest. These have been deemed significant because the artifacts appear to follow an increasing trend along an angle similar to the recorded topographic surface, there is a gap between artifacts and these intervals representing a possible period with no occupation, and there is an aggregation of artifacts within each surface of interest. This is just a single example of a window cut into the dataset; similar procedures were done with multiple other windows. Figure 3 represents an example of what I refer to as a “window” cut into the dataset within a 1-meter interval spanning from west to east across the entire site extent.
I have chosen five initial horizontal segments to analyze further. These horizontal segments have been analyzed for clustering using the ‘density’ function in the R package ‘spatstat’ (figure 4). This density function uses a similar kernel density calculation, but instead of 3-D space, I created an interpolated surface in 2-D space based upon the density of artifacts contained in each surface interval. Using the ‘spatstat’ package, I separated the artifact types at each surface in order to visualize spatial patterns among all the artifacts and then among each artifact type at every 5 cm interval in my dataset. For the purpose of this exercise, I have chosen to include only the density plots representing the artifacts at the five identified surfaces in the dataset (figure 4).
Resulting from this procedure, I was able to visually identify clustered areas of artifacts in both 2-dimensional and 3-dimensional space. Figure 1 identified high-density areas of artifacts in 3-D space and facilitated the identification of five primary horizontal segments of the dataset that I will focus on. Within these five segments, I have analyzed the interpolated density surfaces containing all artifacts and surfaces containing a single type of artifact (figure 4). There appear to be clustered areas within the artifact assemblage that may be representative of different human activities at this site. Figure 4 shows highly dense areas of artifacts across the entire site in specific areas. As we move between the different surfaces in figure 4, the dispersion and density of artifacts change as well. Thus, there appears to be some kind of variable that is shifting the spatial distributions of artifacts throughout time in this assemblage. The next step is to analyze the type of variable that may be affecting artifact distribution throughout each horizontal segment.
The primary critique of this interpolation approach has to do with the scale of investigation. For a specific example, my dataset has one entry labeled as Fire Cracked Rock (FCR). Even though there is only one count of FCR in my data, when I interpolate the surface containing this object, the result shows a very prominent density in the surrounding area. If one looks closer, they will notice that the scale of this particular frame is adjusted to indicate a density value less than 0.5. This value occurs because there is only one object identified as FCR in the dataset. Thus, visually, there appears to be a significant amount of FCR in the identified area, when in reality there is only a single object in that location. In light of this critique, I must adjust my interpretation of which artifacts present a clustered area and which ones could simply be outliers within the dataset. Visualizing point patterns in 2-D and 3-D space comes into play here to mitigate the effects of this misrepresented data.
Research Question: The research question I continue to work toward answering is: How is the temporal pattern of population flow by county related to the temporal pattern of flood insurance claims by county via risk compensation? This week I was particularly interested in the temporal pattern of insurance claims by county. I examined single-year as well as whole-study-period extents. From the Exercise 1 lab I was also drawn to the question of possible source/sink locations, which, in the case of my data, could mean vulnerable areas near the coasts acting as sources of hazard, while areas of higher population adjacent to these more hazardous locations act as sinks. Down the line, this source/sink concept could be interesting once I include population flow data. See below for a bubble map denoting total flood insurance claims over the study period (1990-2010) by county.
I was interested in and attempted a few different approaches to explore my data. I started by attempting spatial autocorrelation using Moran’s I on the county-level vector data to examine the significance of spatial dependence. Next I attempted an inverse distance weighted (IDW) interpolation. Finally, I simply examined and plotted the temporal variation in my insurance claims data.
3. Steps and Results
Using the Monte Carlo simulation of Moran’s I in R, I observed a significant pseudo p-value (less than 0.01), which indicates significant spatial autocorrelation between contiguous counties; we would not expect to see this relationship if the claims data were randomly distributed across counties.
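The permutation logic behind that pseudo p-value can be sketched in pure Python/numpy (an illustrative stand-in for R’s `moran.mc` in spdep; the toy chain of counties and helper names are invented for the demo):

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I for values x and spatial weights matrix W."""
    z = x - x.mean()
    return len(x) / W.sum() * (z @ W @ z) / (z @ z)

def moran_mc(x, W, nsim=999, seed=0):
    """Monte Carlo (permutation) test: pseudo p-value for the observed I."""
    rng = np.random.default_rng(seed)
    obs = morans_i(x, W)
    sims = np.array([morans_i(rng.permutation(x), W) for _ in range(nsim)])
    p = (np.sum(sims >= obs) + 1) / (nsim + 1)   # one-sided pseudo p-value
    return obs, p

# Toy example: a 1-D "chain" of counties with rook contiguity and a
# smooth trend in claims, which should be positively autocorrelated.
n = 30
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1     # adjacent counties are neighbors
claims = np.arange(n, dtype=float)        # smoothly increasing claim counts
obs, p = moran_mc(claims, W)
```

With the trend data, the observed I is strongly positive and far outside the permutation distribution, mirroring the “p < 0.01” result above.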
Inverse distance weighted interpolation predicts values at locations near points with known values. Since my data are attached to county polygons, I converted them to point data and then ran an IDW function over my entire claims dataset (total claims from 1990-2010) to show interpolated values across the country (see picture below). I also tried this for single years of data (see below for 1998 as an example year). This map, along with other maps I have created from these data, helped indicate where some “hotspot” regions exist for future work.
Finally, I summarized and plotted the data to examine the temporal variation. I plotted counts of claim frequency at the national scale and also selected 3 hotspot counties (Houston, New Orleans, and Miami) to plot as well (see charts below). These plots are helpful in showing the volatile nature of flood claims. Since I have been working with this data for some time, I had some preconceived ideas of the data’s temporal characteristics; thus, I understand the up-and-down nature of flood claim frequency responding to the acute nature of natural hazards. I also created plots for mean annual claims, which I included below. I attempted to create a mean-difference map to show anomalous deviations in the data, but so far it has not depicted the data in a new way compared with previous mapping efforts.
4. Critiques of Methods
Of the three approaches, the IDW interpolation was the least helpful. As I now understand, this type of analysis is not particularly suited to my type of data, and I could essentially see the same results from mapping the data by county. I found the spatial autocorrelation approach interesting; however, the results were not surprising given the known spatial distribution of disaster claims from the visualized maps. Finally, the temporal plots yielded helpful information about the temporal variation in the data and within hotspot areas.
Exercise #1: Determining Patterns of Spatial Distribution of Nearshore Benthic Organism Abundance from Point Conception to Canada
Research Question: Using ~80,000 benthic box core samples along the California, Oregon, and Washington coasts, I investigated methods to determine patterns of spatial distribution of benthic organism abundance (both infauna and epifauna). This process was the first step towards examining the correlation between total benthic organism count and total organic carbon (TOC). I conducted this investigation at three spatial scales: (1) the entire study area (Pt. Conception, CA, to the Canadian border), (2) an approximately 60 km2 area off Yaquina Head near Newport, OR, and (3) an approximately 760 km2 area off Newport, OR, from Beverly Beach to Seal Rock.
Hypothesis: I hypothesize that total organism count will have a clustered spatial distribution due to both internal and external processes, including attraction/repulsion and sources/sinks. Predator/prey interactions, mating, territorialism, and colonization are all internal processes of attraction/repulsion that impact the spatial distribution of total organism count. For instance, if a large Dungeness crab megalopae recruitment event occurs in Yaquina Bay, predators will congregate in that area to take advantage of the abundant food source, resulting in a concentrated area of high organism abundance. Salinity, upwelling, sediment grain size, pH, temperature, turbidity, wave intensity, current, light availability, depth, total nitrogen, total organic carbon, chemical pollution, and noise pollution could all serve as either external sources or sinks of organism abundance depending on the adaptations of particular species. I hypothesize that total organism count, especially of detritivores, will be higher in areas with relatively high levels of total organic carbon.
Tools/Approaches Used & Steps: I focused on three Spatial Analyst tools in ArcMap 10.7.1 to conduct my analysis: Hotspot, Kriging (Interpolation), and Inverse Distance Weighting (IDW, Interpolation). Before I could begin my analysis, I first merged various tables from an Access database and created Pivot Tables in Excel so that I could have datapoints with total organism count, TOC, lat/long, and sampling date information to make the “next steps” in my analysis easier. After that I began playing around with various tools at my three spatial scales, a process I’ll summarize below:
Total Study Area: In order to generate as accurate an interpolation of total organism count as possible, I loaded all my data into ArcMap, opened the attribute table, and then used the “Select by Attribute” function to select only box core samples that were sifted using a 1 mm mesh screen. Different mesh sizes would of course impact the number of organisms counted (smaller critters fall through bigger holes). I then used the Inverse Distance Weighting (IDW) (Spatial Analyst) tool to generate an interpolation raster for the entire study area. The raster output was a large rectangle that included all the points. Because my data points are distributed over a large area along the coast, the raster covered significant areas of land and sea. I felt that the interpolations could only be “trusted” near the actual data points, so I decided to clip the raster to a smaller area. In order to do this, I buffered all the points and dissolved the buffer to generate a polygon. I then used the Extract by Mask tool to create a raster within the bounds of the buffer polygon. Some of the raster area was still over land, so I imported a polygon of California, Oregon, and Washington and then used the Erase function to delete the portion of the raster that intersected with the state areas. I ended up with an IDW interpolation raster for the coastal area. Next, I conducted a Hotspot analysis on the 1mm mesh organism count data. Hotspot analysis doesn’t do well with unevenly distributed data points (which I have), so I selected data points with values near mean organism counts within highly sampled areas. I ultimately determined that Hotspot analysis is not the most appropriate method for determining spatial distribution given my data, a conclusion I will discuss in more detail below.
Newport Limited Region: Looking towards the future, I plan to examine if/how the relationship between total organism count and TOC changes seasonally. With that in mind, I generated interpolations for five months in 2011 (April, June, August, October, & December) of a ~60km2 area off Yaquina Head near Newport, OR. I once again used the IDW (Spatial Analyst) tool in ArcMap 10.7.1. I then ran a sixth interpolation using all my data for that area to compare to the five monthly interpolations. Next, I used the “Minus” function to subtract the mean pixel values of all the data from the pixel values of the monthly rasters. These difference rasters allowed me to more easily compare the monthly organism count data.
Newport Extended Region: In the Newport Limited Region, I only looked at data from 2011. In the Newport Extended Region, I looked at all the monthly data I have between 2010 and 2016 from a ~760km2 area off Newport between Beverly Beach and Reedsport, OR. I determined that I have sufficient data from April, June, August, & October. I also had some September and December data, but not enough to make accurate interpolations for the whole area. For these interpolations, I decided to compare results from the Kriging and IDW methods.
Total Study Area: The IDW interpolation for the entire study area suggests that the areas with the highest overall numbers of benthic organisms occur near San Francisco Bay, Monterey Bay, and off the coast of Washington State north of Puget Sound (see Figures 1 and 2). Generally speaking, the Oregon coast seems to have lower total organism counts than the other two states. The Hotspot analysis showed similar results.
Newport Limited Area: The difference rasters helped me to see some temporal change in total organism count, but I will save a discussion of those findings for a later exercise. I included an example of how I generated a difference raster in Figure 3 below.
Newport Extended Area: For the larger Newport region, I decided to compare the IDW interpolations to Kriging interpolations for four different months. I observed some clustering of total organism count (see Figure 4).
Methods Critique: Broadly speaking, this exercise provided me with an excellent learning opportunity. I found it useful to practice working through a large dataset, switching between different forms of software analysis (ArcMap, Access, and Excel), and using various spatial analysis tools in ArcMap that I haven’t used recently. I also learned about the advantages and disadvantages of various methods for determining spatial distribution patterns, namely IDW, Kriging, and Hotspot analyses.
Inverse Distance Weighting: The assumption behind this method is that points that are closer together will be more similar than points that are farther apart. Unknown values are estimated as the weighted average of known values within a certain radius. Spatially closer values are given more weight than those farther away within the set radius. Weights are proportional to the inverse of the distance raised to an assigned power p; I used p = 2 for my analyses. IDW has the advantage of relative simplicity, which makes the results easy to interpret, but it does have some disadvantages. For instance, interpolated IDW rasters can be skewed when data points are highly clustered. Additionally, IDW cannot interpolate values above or below the maximum and minimum values in the dataset. IDW also does not generate variance rasters to help the viewer determine the trustworthiness of the results.
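The weighted-average calculation described above can be sketched in a few lines of numpy. This is an illustrative global version without a search radius (function name, `eps` guard, and toy coordinates are invented), not the Spatial Analyst implementation; note that the outputs stay within the range of the known values, consistent with the limitation mentioned:

```python
import numpy as np

def idw(known_xy, known_vals, query_xy, p=2, eps=1e-12):
    """Inverse distance weighting with weights 1/d**p (p = 2 here,
    matching the exercise). A query point coinciding with a sample
    effectively returns that sample's value."""
    d = np.linalg.norm(query_xy[:, None, :] - known_xy[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, eps) ** p        # inverse-distance weights
    w /= w.sum(axis=1, keepdims=True)        # normalize so weights sum to 1
    return w @ known_vals

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
vals = np.array([10.0, 20.0, 30.0])
est = idw(pts, vals, np.array([[0.0, 0.0], [0.5, 0.5]]))
# est[0] -> 10.0 (coincides with a sample); est[1] -> 20.0 (equidistant
# from all three samples, so the plain average)
```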
Kriging: Like IDW, the Kriging interpolation method estimates values for unsampled areas using known data, either a certain number of points or points within a certain radius of the estimated location. This technique uses both the distance between points and the degree of variation between points when interpolating values. One difference between Kriging and IDW is that Kriging can interpolate values outside the range of the known dataset. However, the interpolated surface does not necessarily pass through the known points, so the interpolation raster values could be higher or lower than the known values at a sampled location. The results of the Kriging and IDW analyses showed similar patterns of total organism abundance in this instance.
Hotspot: I ran the Getis-Ord Gi* Hotspot analysis tool in ArcMap 10.7.1, which identifies statistically significant hot spots and cold spots in spatial data. The tool generates an Arc feature class that contains the statistical p-value and z-value for each feature (in this case, each point) in a designated feature class. After consultation with Dr. Jones, I learned that hotspot analysis is not an appropriate tool when sample sites are determined by researchers. My data come from multiple projects, and as a result some of the sample sites are spread out while others are clustered closely together. These clusters can influence the results of hotspot analysis.
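For reference, the z-score the tool reports can be sketched as follows. This is a simplified, illustrative Gi* with a binary weights matrix that includes each point as its own neighbor; the 1-D toy transect of organism counts is invented:

```python
import numpy as np

def getis_ord_gistar(x, W):
    """Getis-Ord Gi* z-scores; W is a binary weights matrix whose
    diagonal of ones makes each point part of its own neighborhood."""
    n = len(x)
    xbar = x.mean()
    s = np.sqrt((x ** 2).mean() - xbar ** 2)   # population std. dev. of x
    wsum = W.sum(axis=1)                       # sum of weights per feature
    num = W @ x - xbar * wsum                  # local sum minus its expectation
    den = s * np.sqrt((n * (W ** 2).sum(axis=1) - wsum ** 2) / (n - 1))
    return num / den

# Toy transect: 9 stations, a block of low counts then a block of high counts,
# with neighborhoods of self + adjacent stations.
W = np.eye(9)
idx = np.arange(8)
W[idx, idx + 1] = W[idx + 1, idx] = 1
counts = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0])
gi = getis_ord_gistar(counts, W)
# Stations inside the high block get positive z-scores (hot spots);
# stations inside the low block get negative ones (cold spots).
```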
My dataset A is population by county in Nebraska, Kansas, Oklahoma, and Texas in 1992 and 2017. My research question asks what the spatial patterns in population were in 1992 and 2017. I also wanted to identify the spatial patterns in the change in population between 1992 and 2017.
My hypotheses for this analysis are that there will be much higher populations in cities than in rural places and, more importantly, that population increase will occur in urban and suburban counties. I also suspect that there will be decreasing populations in rural counties. I hypothesize this shift toward cities because of the attractions of cities (and repulsions of rural areas). People frequently move in search of more job opportunities, which are typically in or around cities (PRB, 2008). A major industry in rural areas is agriculture, which is a shrinking industry, especially due to the concentration of farms. Fewer job opportunities in rural areas act as a repulsion process and the larger economies in cities act as an attraction process.
In order to identify spatial patterns in the population data I created maps of the two years of data as well as a map of the difference in population. To help highlight the areas with higher rates of increase and decrease in populations, I conducted a hotspot analysis.
Since I have continuous county-level data, I determined that a hotspot analysis would be the best way to highlight the patterns. I used an approach similar to the Getis-Ord Gi* hotspot analysis, but since I was working in Python, I used the Exploratory Spatial Data Analysis package (esda, part of PySAL) to conduct spatial autocorrelation and identify hot spots, cold spots, and spatial outliers.
In order to conduct a hotspot analysis, I had to first join the population datasets from 1992 and 2017 with the shapefile of counties. This took some reformatting of the datasets and their county identifier codes to match them up correctly. I then selected out the four states that I am working with but found that the population in cities in eastern Texas are significantly larger than anywhere else in those four states, so I separated eastern Texas from the rest.
In order to calculate hotspots, I had to first calculate the spatial lag. This function returns the average value of each observation’s neighbors. Then, using weights, the spatial lag, and a few other inputs, the analysis identifies counties that are hot spots and cold spots.
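The spatial-lag step can be sketched in pure numpy (an illustrative stand-in for PySAL’s `lag_spatial` with a row-standardized binary contiguity matrix; the four-county ring is an invented toy):

```python
import numpy as np

def spatial_lag(x, W):
    """Row-standardized spatial lag: the average value of each
    county's neighbors."""
    Wr = W / W.sum(axis=1, keepdims=True)   # row-standardize binary contiguity
    return Wr @ x

# Toy: 4 counties arranged in a ring, each bordering its two ring neighbors
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
pop = np.array([100.0, 50.0, 80.0, 10.0])
lag = spatial_lag(pop, W)   # -> [30., 90., 30., 90.]
```

A county with both a high value and a high lag falls in the high-high quadrant, which is what the hot-spot classification picks out.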
My results show some obvious hotspots over cities, although not all cities are hotspots. It is possible that some have populations that are not as large as others and therefore do not stand out as hotspots. I found the analysis helpful in identifying cold spots, or areas with dramatic decreases in population. There is a large cold spot on the eastern side of the Nebraska-Kansas border. There are some others that I will keep an eye on as well when I bring dataset B into the picture.
I would say that this method has been useful for highlighting the hot and cold spots in a busy dataset, but it has not created new insights. The tool that I used in Python is also not documented in much detail, so there was a lot of up-front effort to get it to work.
My work for Exercise 1 was geared towards developing a method to calculate the magnitude of retreat of Greenland outlet glaciers, based on their terminus lines. Because there is no obvious tool or set of tools to do this in ArcGIS pro, I developed two distinct workflows to produce relatively accurate datasets representing the spatial retreat of glacier boundaries. Once these datasets were constructed, I was able to conduct some very preliminary spatial statistics to determine their distribution.
1. How can I calculate and represent glacier retreat solely from polylines of glacier termini? How is this (constructed) variable of glacial retreat distributed across Greenland?
What kinds of attraction/repulsion processes might occur in your variable?
Outlet glaciers tend to be connected (or their qualities may be “spatially attracted”) to those nearest to them because they are all fed by a single ice mass. For example, if the southeast portion of the Greenland ice sheet experiences higher temperatures, the outlet glaciers that it feeds will likely have a higher likelihood of experiencing retreat.
Furthermore, temperature and precipitation in and of themselves tend not to affect entire regions universally. For example, certain regions warm up more than others, which could lead to a “clustered” distribution of glacier mass balances. Greenland as a whole is not warming at a constant rate across its geographic extent.
Local geography (valley geometry, rainshadow effects, etc.) has a certain element of randomness that can contribute to repulsion behavior (adjacent glaciers experiencing similar forcings do not necessarily respond the same way).
What kinds of source/sink processes might occur in your variable?
I think, in general, the primary “sources” of material to outlet glaciers are ice from the interior of the ice sheet and snow that falls directly in the accumulation zone of each glacier (locally). The primary sinks, which contribute to ablation, are surface melt (a function of temperature), subaerial melt (within the glacier), and ice-ocean interaction (calving and subaqueous melting). Together, these competing forces determine the mass balance of any given glacier. All of these forces have local and global trends, which could lead to different spatial patterns at different scales.
Approach #1- Drawing Polygons:
ArcGIS Pro Tools Used:
Merge, Feature Vertices to Points, Points to Lines, Feature to Polygon, Hotspot Analysis
To experiment with techniques, I limited my datasets to just 2 years, 2005 and 2017, to calculate the magnitude of retreat over the entire study period. Termini were distributed as such (There is an unfortunate absence of data in the Southwest of Greenland, a region considered to be melting the fastest based on gravitational anomaly data):
Datasets were manipulated as such (zoomed to view individual glacier example):
1. Polylines from each dataset were merged into one dataset using the MERGE tool.
2. Endpoints of the polylines were isolated using the FEATURE VERTICES TO POINTS tool, selecting the “both start and end vertices” parameter.
3. Endpoints of the polylines were connected using the POINTS TO LINES tool.
4. Outlines were converted to polygons using the FEATURE TO POLYGON tool. The polygon dataset is now complete.
5. Map generated from polygon attributes (Area) to represent “Area of ice retreated” across Greenland:
6. The tool “HOTSPOT ANALYSIS” was used to understand spatial trends in glacier retreat, yielding a predictable result:
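The geometric idea behind steps 1-4 can be sketched without ArcGIS: walk along the T1 terminus, jump to T2, walk back, and apply the shoelace formula to the resulting ring. This is an illustrative sketch (function name and toy coordinates are invented) that shares the workflow’s limitation of assuming the two lines do not cross:

```python
def retreat_area(t1, t2):
    """Area of the polygon bounded by two terminus polylines.

    t1 and t2 are lists of (x, y) vertices ordered along each terminus.
    Mimics the Merge -> endpoints -> Feature To Polygon workflow by
    walking along T1, then back along T2, and applying the shoelace
    formula. Assumes the two lines never cross (the same limitation
    noted in the critique).
    """
    ring = list(t1) + list(reversed(t2))     # closed boundary of the retreat polygon
    area2 = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        area2 += x1 * y2 - x2 * y1           # shoelace formula, twice the area
    return abs(area2) / 2.0

# Toy termini: a 4-unit-wide glacier front that retreated from y=0 to y=2
t2005 = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
t2017 = [(0.0, 2.0), (2.0, 2.0), (4.0, 2.0)]
area = retreat_area(t2005, t2017)            # -> 8.0 (a 4 x 2 rectangle)
```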
Approach#1 Results and Critique:
I was not happy with this approach for several reasons. First, it required several parameters to be just right in order for a constructed polygon to accurately represent an area of retreat. The 2 terminus lines must be entirely separate (aka the lines from T1 and T2 can never cross) otherwise they will create multiple polygons and overestimate the area retreated. As a result, glaciers that did not retreat significantly always received a larger estimate than they should have. Also, glaciers in unique geographic locations (like curved valleys) would not retreat straight back, but perhaps around a corner, causing termini at T1 and T2 to be ~90 degrees from one another. For these glaciers, straight lines between terminus edges would clearly cut off parts of the retreated area. In these scenarios, the estimated area clearly undershot the true area retreated. The screenshots above were cherry-picked from a cooperative glacier, but in reality, about 30-40% of my calculations seemed to significantly overshoot or undershoot what I believed to be the true area lost.
Second, the AREA lost in and of itself seemed to be a substandard parameter to calculate. Wider glaciers, which create longer termini, would always produce larger polygons. On a relative scale, this makes it seem like only the widest glaciers retreated significantly, whereas smaller, thinner glaciers lost only a minuscule amount. This makes the 2 maps above deceptive: glaciers in the north of Greenland were much larger and had much longer terminus lines, so their retreat polygons came out massive. Along the east and west of Greenland, there appeared to be many smaller glaciers, but their polygons came out very tiny. Comparing raw data of a few huge retreats and hundreds of small retreats made the hotspot analysis entirely skewed to the north. I could normalize for this if I wanted to, but in Exercises 2 and 3 I want to work with all of these glaciers individually when compared to local glacier velocity, so I want to keep their “raw data” as intact as possible. (It is worth noting that glacial geologists usually use the distance of a glacier’s retreat, in conjunction with the movement of the equilibrium line, to estimate temperature changes. They do not care about area.)
Finally, I don’t like this approach because the hotspot analysis makes it look like the ONLY location with significantly retreating glaciers is northern Greenland. While this could indeed be the region experiencing the most calving and ablation at the terminus, it does not even remotely resemble the Greenland mass anomalies detected with the GRACE tandem satellites (shown below). Of course, this brings up the question: how exactly do glaciers retreat? While one obvious and dramatic mechanism is calving, which occurs at the terminus, there are very significant mass fluxes from surface water melt and subaerial melt, which don’t necessarily occur at the terminus. The discrepancy between these 2 datasets makes me wonder whether (climatically) short-term changes in terminus position are dominated by calving and subaqueous frontal melt, while the total mass flux of a glacier (including surface and subaerial melt) takes slightly longer timescales to kick in and be “felt” at the terminus line. In other words: the equilibrium line of a glacier, and thus its terminus position, is entirely controlled by glacier mass balance. However, glaciers that experience particularly vigorous calving might appear to retreat faster than their surface-melting counterparts. The latter processes might take longer to be expressed at the terminus line, and might manifest more immediately as glacier THINNING. Perhaps glaciers in the north of Greenland experience more calving, whereas glaciers in central and southern Greenland might experience more surface and subaerial melt. This is not a question I can tackle with the current project and datasets, but a very interesting one nonetheless.
1. The tool FEATURE VERTICES TO POINTS was used once again on a single dataset (in this case 2005 rather than 2017). This time, the parameter “All Vertices” was selected, converting virtually the entire polyline into points.
2. The NEAR tool was used to calculate the closest distance between each individual point on the 2005 line and the 2017 line. The blue hand-drawn lines in the figure above visually represent this process. NOTE: the distance calculated is NOT perpendicular to the 2005 line; it is merely the distance between a given point on that line and the closest point on the 2017 line. I attempted to visualize this concept above as well.
3.The SUMMARY STATISTICS tool was used to generate a table isolating the mean distance of these points for each glacier, as well as the median distance of these points for each glacier. The parameter “Case Field” was set to each glacier’s unique “glacier ID,” to ensure individual values were produced for each glacier, and not each separate point.
4. The JOIN FIELD tool was used to merge the newly created statistics table to a separate dataset containing only point locations of each glacier (created previously).
5. Maps were constructed representing the newly constructed variable: “Mean Distance Retreated” across Greenland (NOTE: a different symbology was used than in the previous method, because graduated symbol size became quite confusing to view):
6. The tool “HOTSPOT ANALYSIS” was used to understand spatial trends in glacier retreat, based on length:
7. The tool SPATIAL AUTOCORRELATION: GLOBAL MORAN'S I was used to determine the clustered behavior of glacier distance retreated across the island:
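For intuition, the NEAR-then-SUMMARY STATISTICS portion of this workflow (steps 2–3) can be sketched outside ArcGIS. The Python snippet below is a minimal illustration, not the tools' actual implementation; it assumes planar projected coordinates and uses hypothetical function names:

```python
import math

def nearest_distance(point, line_points):
    # Closest straight-line distance from one 2005 vertex to any vertex on
    # the 2017 line (analogous to the NEAR tool with planar coordinates).
    return min(math.dist(point, q) for q in line_points)

def mean_retreat_by_glacier(vertices_2005, vertices_2017):
    # vertices_2005 / vertices_2017: {glacier_id: [(x, y), ...]}
    # Returns {glacier_id: mean nearest distance}, mirroring NEAR followed
    # by SUMMARY STATISTICS with "glacier ID" as the Case Field.
    means = {}
    for gid, pts in vertices_2005.items():
        dists = [nearest_distance(p, vertices_2017[gid]) for p in pts]
        means[gid] = sum(dists) / len(dists)
    return means
```

As in the note above, the distance for each point is simply the minimum to the other line, not a perpendicular offset.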
Approach #2 Results and Critique:
I consider this approach far superior to the previous one. The NEAR tool gives me distances between termini at very close intervals that can then be averaged with summary statistics, and when I went back to manually check each average distance, the values seemed far more accurate than those from the constructed polygons. In this approach, the map produced by the eye test alone seems to have a more even distribution, rather than weighting everything heavily to the north. The hotspot analysis picks up more hotspots spread across different locations on the island. While there are clearly still three hotspots in the north, there are also three in central Greenland. I am surprised a hotspot didn’t appear in NW Greenland, where there is a clear abundance of high retreat values. Moving forward, I think I will use this type of dataset to compare glacier retreat to glacier velocity in Exercises 2 and 3. This process can be replicated relatively simply on various time series throughout the study period, to get annual resolution of terminus retreat.
Does your variable A have different spatial patterns at different scales?
From the eye test, it seems there are some groups of high-retreat glaciers around the hotspots, interspersed with lower-retreat glaciers, which indicates a somewhat clustered distribution. Furthermore, in both the distance and area datasets, high retreat values appear to be clustered in northern Greenland. We can assess this further with a spatial autocorrelation tool (Global Moran’s I).
The Moran’s I index, calculated with a default neighborhood search distance of 162 km, came back as 0.018, which reflects very slight clustering. But with an index value so close to zero, a p-value of 0.5, and a Z-score of 0.6, this is nowhere near enough to reject the null hypothesis, and the distribution may simply be random. This is not an exhaustive analysis of the distribution of these values, just something I performed quickly to include at the last minute in Exercise 1. I hope to perform more spatial autocorrelation analyses and more closely assess the distribution of these values, as well as their relationship to glacier velocity, in Exercise 2.
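To make the index itself less of a black box, Global Moran's I can be computed by hand on toy data. The sketch below is an illustrative pure-Python implementation of the statistic only; it uses a hypothetical binary weight matrix rather than ArcGIS's distance-based neighborhood weighting, and omits the z-score/p-value machinery:

```python
def morans_i(values, weights):
    # Global Moran's I: (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean)
    #                            / sum_i (x_i - mean)^2
    # values: n observations (e.g., mean distance retreated per glacier)
    # weights: n x n symmetric 0/1 neighbor matrix
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)
```

Values near +1 indicate clustering of similar values, near -1 dispersion, and near 0 (as in my 0.018 result) a pattern indistinguishable from random.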
My main goal for this course is to determine the methods needed to quantify salt marsh volumetric change from 1939 to the present within Oregon estuaries. I have historical aerial photographs (HAPs) and sediment cores from five estuaries along the Oregon coast – Nehalem, Netarts, Salmon, Alsea, and Coquille. Ultimately, through a combination of rates of area change measured from the HAPs with rates of vertical accretion measured using excess 210Pb within my sediment cores, I think I will be able to develop a history of salt marsh volumetric change. (See my first blog post for more background on the significance of this research.)
Based on my goals outlined in my first blog post, Julia made a number of helpful suggestions for how to best frame my first exercise. First, she suggested that I need to begin by mapping salt marsh horizontal change from the HAPs by creating outlines of my marshes for each year I have photos. Second, Julia suggested that rather than just focusing on net area change, I also investigate portions of the salt marsh that exhibit growth or contraction as these areas might relate to larger changes within the estuary and watershed. Third, Julia suggested I first focus on one estuary to nail down my methods as a proof of concept. I can then apply my method to each individual salt marsh.
Throughout this blog post I will reference many of the topics we have discussed in class. For instance, I will analyze hot spots of accumulation and contraction through analysis of digitized areas from georeferenced HAPs. I will briefly discuss my hypothesis that the scale at which I digitize a salt marsh will also change the measured overall area. Additionally, though I am not directly using an interpolation tool in my geospatial analysis yet, I will use interpolation between years in my initial analysis of salt marsh area change. Ultimately, I will be characterizing whether the salt marsh is a source or sink of sediment through analysis of the volumetric change.
What is the best method to measure change in salt marsh area from HAPs?
How have different regions of the salt marsh within Alsea Bay changed in area since 1939?
How do horizontal changes relate to vertical changes?
Step 1: Scan images
HAPs, which are panchromatic and roughly decadal spanning 1939 to 1991, were scanned at the University of Oregon’s Map & Aerial Photography Library. No information was provided for each photo besides the year it was taken (and sometimes the day). This is frustrating because when photographs were taken at high tide, areas of the vegetated salt marsh are inundated and edges are less easily distinguished – especially within tidal creeks. Before bringing these images into ArcGIS Pro, they were cropped to remove borders and converted to TIFF files in Adobe Photoshop.
Step 2: Georeference images
I played around with stitching the images together in Fiji (ImageJ), which is a powerful, open-source image processing program that I highly recommend. However, since the HAPs were taken from different angles, the photographs stitch together very strangely, especially for tall features such as trees, buildings, and bridges.
All images were georeferenced in ArcGIS Pro using ≥ 10 control points. With an effort to select areas evenly across the photographs, control points were preferentially placed at road intersections; however, creek intersections were also used as control points in portions of the images without roads. Comparison between HAPs indicates that tidal creeks are surprisingly stable throughout the last ~80 years in Alsea Bay. Historical aerial photographs combined with control points were fit using second-order polynomial transformations. The root mean square error (RMSE) was calculated for each georeferenced image; no map had an RMSE >7 m and the mean RMSE was 3 m.
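The RMSE reported by the georeferencing tool is just the root mean square of the control-point residuals. A minimal Python sketch of that calculation (function name hypothetical, residuals in metres):

```python
import math

def georef_rmse(residuals):
    # residuals: list of (dx, dy) offsets, in metres, between each
    # transformed control point and its true map location.
    sq = [dx * dx + dy * dy for dx, dy in residuals]
    return math.sqrt(sum(sq) / len(sq))
```

So a mean RMSE of 3 m means the control points land, on average, within a few pixels of their true positions at this photo scale.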
Step 2.5: Try automated image classification
Initially I tried classifying images using the ArcGIS Pro Classification Wizard. To reduce the complexity and size, I first cropped the georeferenced images. Object-based supervised classification was used. Trying different degrees of spectral and spatial detail, I then trained the Classification Wizard with examples of forest, vegetated salt marsh, and mudflat/open water. One major issue observable in the classifications was that vegetated surfaces sometimes appeared white, causing the software to confuse them with the white produced by light reflected off the water’s surface. Thus, the salt marsh edge is seemingly too complex for supervised image classification when images are only black and white. After receiving these poor results with every adjustment and speaking with colleagues about automated/manual classification, I decided manual digitization is my best option. (A review of the literature indicates that others have likewise found manual digitization preferable to supervised/unsupervised image classification when analyzing panchromatic photographs of salt marshes.)
Step 3: Heads-up digitization
Digitized areas were initially limited to regions of the estuary present within the majority of photographs. These areas were further limited to least-disturbed tidal saline wetlands from which vertical sediment accumulation rates were available. I then performed heads-up digitization of the edges between unvegetated mudflat and vegetated marsh using the create polygon tool… Because trees and their shadows obscure land-ward edges, these boundaries were digitized using a combination of PMEP’s (Pacific Marine & Estuarine Fish Habitat Partnership) “West Coast USA Estuarine Biotic Habitat” maps and the georeferenced aerial photography.
Step 4: Observe changes
Preliminary analysis of horizontal changes was observational. I additionally plotted net changes in salt marsh area over time for each major salt marsh complex.
In response to question 1, heads-up digitization is much better suited to producing salt marsh outlines than an automated image classification method, such as those available within the ArcGIS Pro Classification Wizard. This result is unfortunate because heads-up digitization is obviously very time consuming and strenuous on your eyes (my right eye started to twitch) and mouse hand. Supervised image classification may allow for an initial distinction between unvegetated mudflat and vegetated marsh surface. Classified images and georeferenced aerial photography could then be referenced in combination to digitize the marsh boundaries.
A preliminary analysis of the digitized layers shows hot spots of salt marsh growth and contraction. For instance, the upstream, east-most salt marsh complex has shown very little change over the last observed record, though over the total record, its area has decreased by ~4%. Comparatively, after first experiencing a ~5% decrease in size in the 1940s, the large, west-most island increased in area by ~10% from 1952 to 1959, then after modest growth in the 1960s to 80s, declined to only ~3% of its original 1939 area by the early 1990s. The salt marsh fringing the NE portion of the bay additionally showed a modest 3% increase in area in the 1950s, but then contracted in size by ~13% in the mid-1960s, and expanded again by ~10% and ~5% between the 1970s and 1980s, respectively. This area showed a net growth of ~4% from 1939 to 1991.
The wet phase of the Pacific Decadal Oscillation (PDO) from 1944 to 1977 has been observed to increase peak river flows (e.g., Wheatcroft et al. 2013). Coincidentally, logging intensified in Lincoln County during the same period (Andrews & Kutura 2005). These events may have contributed to the increased area of the large, west-most island in that time frame. The cause of the contraction of the salt marsh fringing the NE portion of the bay in the mid-1960s to 70s is unclear, but it is perhaps related to increased wake from boats in that area.
A preliminary comparison between histories of sediment accumulation and marsh growth indicates very similar patterns. For instance, the sediment core collected from the small island shows very little change in accumulation over the last century. Additionally, the sediment core collected from the large island exhibits a period of rapid accumulation in the 1960s. Vertical accumulation increased in the 1950s in the core collected from the fringing marsh, also similar to its horizontal accumulation record.
Based on the net changes in area for the three salt marsh complexes of interest and the net vertical sediment accumulation from three cores collected within each marsh, the volumetric changes for the small island, large island, and fringing marsh are -160, 220, and 250 m3, respectively. With an estimated dry bulk density of ~0.5 g cm-3, these values equate to ~ -80, ~110, and ~125 t of sediment trapped from 1939 to 1991 on the small island, large island, and fringing marsh, respectively. The Alsea Bay salt marshes have thus remained a net sink during this timeframe. I will improve these volumetric estimates by incorporating more sediment core data and also comparing the history of volumetric change with the sediment discharge record for the Alsea River, thereby creating estimates of trapping efficiencies for the estuary.
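The volume-to-mass conversion above is a simple unit exercise, sketched here in Python (function name hypothetical) to make the arithmetic explicit:

```python
def sediment_mass_tonnes(volume_m3, dry_bulk_density_g_cm3):
    # Convert a volumetric change (m^3) and dry bulk density (g/cm^3)
    # to sediment mass in metric tonnes.
    # 1 g/cm^3 = 1000 kg/m^3, and 1 tonne = 1000 kg.
    density_kg_m3 = dry_bulk_density_g_cm3 * 1000.0
    return volume_m3 * density_kg_m3 / 1000.0
```

With a density of ~0.5 g cm-3, the -160, 220, and 250 m3 volumetric changes give the ~ -80, ~110, and ~125 t figures quoted above.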
I am foremost interested in assessing changes within the high marsh extent of my salt marshes because these are the areas considered in quasi-equilibrium with relative sea level rise. As a bit of background, intertidal areas are typically divided between unvegetated mud flats (inundated most frequently and exposed only at low tide), low marsh (inundated less frequently with high tides), and high marsh (inundated the least frequently with only high higher high tides). However, in Oregon, high marsh and low marsh are both densely vegetated and look very similar in panchromatic aerial photographs. This represents one of the first major downsides to analyzing salt marsh change from HAPs – low marsh and high marsh must be considered together regardless of the research question. Fortunately, Oregon salt marshes typically exhibit very little low marsh area so my estimates are hopefully not too different from what they would be if I could distinguish between the two habitat classes.
Another major issue with analyzing HAPs is that the landward edge between high marsh and forest is difficult to distinguish due to the trees obscuring the edge. Depending on the location of the plane when taking the photographs and the time of day, the edge is either obscured by the trees themselves or by the shadows they cast. Without ground truthing, it is difficult to speculate how much this impacts estimates of the landward edge; fortunately for me, however, Oregon salt marshes typically extend to the base of Coast Range hills and so it is unlikely that there has been much landward migration of salt marshes over the last century. (This process is called coastal squeeze, and it poses a serious threat to long-term survival of Oregon salt marshes under accelerated sea level rise.)
Heads-up digitization, while straightforward, is time consuming, but it seems to be the method best suited to digitizing panchromatic images. Unfortunately, I have not yet figured out the best method of assessing the error associated with heads-up digitization. Additionally, I have not assessed the impact of scale on changing area. I have digitized my salt marshes at the smallest spatial scale possible for each photo. I hypothesize that smaller spatial scales would result in decreased volume as more and more tidal creeks are incorporated. The highest quality HAPs are 1939 and 1952. This is something I will have to think more about in the future.
A serious issue in sediment stratigraphy is the inability to easily observe reductions in accumulation, hiatuses, and periods of erosion. Thus, comparison between the HAPs and core data is much clearer for periods of increased accumulation, as I have done here. Though not much can be done to remedy this issue (think of the publication impact factor if I could!), I will continue to acknowledge it.
Remaining questions and future directions
My immediate questions are:
How should I include a modern-day layer? Just outline the satellite basemap in ArcGIS Pro? (Hand digitization would be more precise than PMEP’s layers.)
How should I measure horizontal change more robustly – i.e. with better statistics?
How should I estimate error associated with heads-up digitization?
How should I be digitizing channels?
Going forward I plan to answer these questions; depending on how tricky they prove (simple questions always seem to be the hardest to answer), I plan to focus on at least questions 1, 2, and 3 for my next exercise. To answer question 1, the USGS Digital Shoreline Analysis System (DSAS; Himmelstoss et al. 2018) seems very promising, so I will focus on getting my Alsea data into this program. DSAS requires error estimates for each shoreline layer and suggests a number of resources, including Ruggiero et al. (2013), that I will investigate.
Andrews, A., & Kutura, K. Oregon’s timber harvests: 1949 – 2004. (2005). Oregon Department of Forestry. Salem, OR.
Himmelstoss, E.A., Henderson, R.E., Kratzmann, M.G., & Farris, A.S. (2018). Digital Shoreline Analysis System (DSAS) version 5.0 user guide: U.S. Geological Survey Open-File Report 2018–1179, 110 p., https://doi.org/10.3133/ofr20181179.
Ruggiero, P., Kratzmann, M. G., Himmelstoss, E. A., Reid, D., Allan, J., & Kaminsky, G. (2013). National assessment of shoreline change: historical shoreline change along the Pacific Northwest coast. US Geological Survey.
Wheatcroft, R. A., Goñi, M. A., Richardson, K. N., & Borgeld, J. C. (2013). Natural and human impacts on centennial sediment accumulation patterns on the Umpqua River margin, Oregon. Marine Geology, 339, 44-56.
What kinds of attraction/repulsion processes might occur in your variable?
Because soil characteristics differ among regions of Oregon, landslides occur mainly in certain specific areas. Each point in the point dataset represents a place where a landslide occurred. In Exercise 1, these points appeared to attract each other within high-risk areas.
Furthermore, the point data and the landslide susceptibility map are also associated: the red and blue high-risk areas are mutually attractive with the landslide points, while the green low-risk areas are mutually repulsive with them.
What kinds of source/sink processes might occur in your variable?
The hot spot map suggests that differing soil characteristics act as the source process for this variable: some soils are more prone to landslides than others. In Exercise 2, the spectral differences of the high-risk areas will be analyzed. High-risk areas act as sources where landslides are likely to be generated.
Does your variable A have different spatial patterns at different scales?
At the original spatial scale, the point data appear scattered; after hot spot analysis, which aggregates the points at a coarser scale, the point data appear clustered.
The Pacific Coast Feeding Group (PCFG) is a subgroup of the Eastern North Pacific (ENP) population of gray whales (Scordino et al. 2018). The ENP, a population of 24,000-26,000 individuals, migrates along the U.S. west coast from breeding grounds in the lagoons of Baja California, Mexico to feeding grounds in the Bering Sea (Omura 1988). The PCFG, currently estimated at 264 individuals (Calambokidis et al. 2017), stray from this norm and do not complete the full migration, instead choosing to spend their summer feeding months along the Pacific Northwest coast (Scordino et al. 2011). Since gray whales as a species already exhibit specialization (by being the only baleen whale that benthically forages) and since the PCFG display a second tier of specialization by not using the Bering Sea feeding grounds, it seems plausible that individuals within the PCFG might have individual foraging specializations and preferences. Therefore, my research aims to investigate whether individual gray whales in Port Orford exhibit individual foraging specializations. Individual foraging specializations can occur in a number of different ways including habitat type (rocky reef vs sand/soft sediment), distance to kelp, time and distance of foraging bouts, and prey type and density. For this class, my research question is whether prey quantity and/or community drives whale foraging.
The prey data has been obtained through zooplankton tow net samples from a research kayak in the summers of 2016, 2017, 2018, and 2019. Kayak sampling effort varies widely between the four years due to weather creating unsafe conditions for the team to collect samples. These samples have been sorted and enumerated to the zooplankton species level so that for each day when a prey sample was collected, we have known absolute abundances of prey species communities at each sampling station. Additionally, a novel GoPro method has been used to quantify relative zooplankton density at the same sampling stations.
The whale data is in the form of theodolite tracklines of gray whales that used the Port Orford study area during the summers of 2016, 2017, 2018 and 2019. Since whale tracking occurs at the same sites as prey sampling, we are able to map the prey community present at a particular location where whales forage. The tracklines occur at a very fine spatial resolution, as the study area is approximately 2.5 km in diameter, though some of the tracklines extend out to approximately 8 km offshore. Furthermore, as whales forage in the area, photographs are taken of each individual in order to match the trackline with a particular individual. This way, potential individual specializations may be detected if there are repeat tracklines of an individual. These tracklines have been analyzed using Residence in Space and Time, which assigns a behavior to each spatial point. The behaviors are broken down into three categories: foraging, searching, and travelling. Each of these behaviors is assigned a residual value (-1, 1, 0, respectively).
Gray whales will prefer areas with the densest prey, regardless of the community composition of prey found at different locations.
Daily interpolation layers will be created for prey density between the two sites, Mill Rocks and Tichenor Cove. These interpolations will then also be weighted according to the percentage that mysids and amphipods (the two main taxonomic prey groups) made up the community at each location. This will result in 3 layers per day: an overall interpolated prey density layer, an interpolated layer weighted by mysid abundance, and an interpolated layer weighted by amphipod abundance. Once these layers have been created, the RST analyzed whale tracks will be overlaid onto them. Statistical analyses (likely linear mixed models) will be used to determine whether overall prey density or specific prey communities drive gray whale foraging by relating the interpolated layers with the behavior residual values.
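The interpolation and weighting steps above can be sketched in miniature. The Python snippet below is an illustrative inverse-distance-weighted (IDW) estimator and a community-weighting helper, not the GIS tool I will actually use; all names are hypothetical and coordinates are assumed planar:

```python
import math

def idw(stations, values, target, power=2.0):
    # Inverse-distance-weighted estimate of prey density at `target`
    # from sampled station densities. stations: [(x, y), ...].
    num = den = 0.0
    for (x, y), v in zip(stations, values):
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return v  # target coincides exactly with a station
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

def weighted_values(densities, fractions):
    # Scale each station's density by the fraction of the community a
    # taxon (mysids or amphipods) makes up, before interpolating that layer.
    return [d * f for d, f in zip(densities, fractions)]
```

Interpolating the raw densities gives the overall layer; interpolating the weighted values gives the mysid- and amphipod-weighted layers.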
There will be many interpolated layers of prey density and weighted prey density. Furthermore, there could be some plots of overlaid tracklines showing where whales prefer to forage and search vs travel.
This spatial problem is important to science since genetic evidence suggests that there are significant differences in mtDNA between the ENP and PCFG (Frasier et al. 2011; Lang et al. 2014), and therefore it has been recommended that the PCFG should be recognized as being demographically independent. In the face of a proposed resumption of the Makah gray whale hunt as well as increased anthropogenic coastal use, there is a strong need to better understand the distribution and foraging ecology of the PCFG. This subgroup has an important economic value to many coastal PNW towns as many tourists are interested in seeing the gray whales. Therefore, understanding what drives their distribution and foraging habits will allow us to properly manage the areas where they prefer to forage.
I have novice/working knowledge of Arc-Info and Modelbuilder. I have never used Python before. I am proficient in R and image processing.
1a. What is A?
A is the behavior displayed by a gray whale at a certain location.
1b. What is B?
B is the density of prey as well as the weighted density of mysids and amphipods.
1c. What is the relationship you want to test between B and A?
Whether prey quantity and/or species drives whale foraging at a certain location.
1d. Why or how does B cause/influence A? What is your hypothetical explanation for why or how B causes A (this is mechanism C)?
Gray whales, like most baleen whales, have a breeding season which they spend in warm waters and a feeding season which is spent in colder, more productive waters. For a subgroup of the Eastern North Pacific gray whale population, this summer feeding season is spent along the NW U.S. coastline (northern California, Oregon, Washington and southern British Columbia). The activities undertaken during each season are so distinct that no feeding is undertaken during the breeding season, and vice versa. As such, gray whales must regain 11-29% of critical body mass during the feeding season in order to prepare themselves for the breeding season, during which they do not feed. Therefore, it seems logical to assume that their distributions and movements should be entirely dictated by where their zooplankton prey is and, also, where it is the most abundant or most dense, since their sole purpose during the summer is to feed.
1e. Now, write your question in this form: “How is [the spatial pattern of] A related to [the spatial pattern of] B via mechanism C?”
How is gray whale behavior (specifically foraging) related to the spatial pattern of prey density via optimal foraging theory?
Calambokidis, J. C., Laake, J. L. and A. Pérez. 2017. Updated analysis of abundance and population structure of seasonal gray whales in the Pacific Northwest, 1996-2015. Draft Document for EIS.
Frasier, T. R., Koroscil, S. M., White, B. N. and J. D. Darling. 2011. Assessment of population substructure in relation to summer feeding ground use in the eastern North Pacific gray whale. Endangered Species Research 14:39-48.
Lang, A. R., Calambokidis, J. C., Scordino, J., Pease, V. L., Klimek, A., Burkanov, V. N., Gearin, P., Litovka, D. I., Robertson, K. M., Mate, B. R., Jacobsen, J. K. and B. L. Taylor. 2014. Assessment of genetic structure among eastern North Pacific gray whales on their feeding grounds. Marine Mammal Science 30(4):1473-1493.
Omura, H. 1988. Distribution and migration of the Western Pacific stock of the gray whale. The Scientific Reports of the Whales Research Institute 39:1-9.
Scordino, J., Bickham, J., Brandon, J. and A. Akmajian. 2011. What is the PCFG? A review of available information. Paper SC/63/AWMP1 submitted to the International Whaling Commission Scientific Committee.
Scordino, J., Weller, D., Reeves, R., Burnham, R., Allyn, L., Goddard-Codding, C., Brandon, J., Willoughby, A., Lui, A., Lang, A., Mate, B., Akmajian, A., Szaniszlo, W. and L. Irvine. 2018. Report of gray whale implementation review coordination call on 5 December 2018.
Agricultural crop residue burning is a common practice among farmers in many developing countries, including Pakistan. It is used to clear the fields of stalks left behind after the use of combines to cut the harvest. For most farmers, it serves as a low-cost alternative to prepare fields for planting, because the fires remove most post-harvest vegetative material and reduce the risk of pests and disease. However, this practice also contributes to air pollution by increasing emissions of fine particulate matter (PM2.5). Smoke from these fires can trigger and exacerbate respiratory diseases, especially among children (Chakrabarti et al., 2019; Zhuang et al., 2018; Awasthi et al., 2010; Balmes, 2010). PM2.5 exposure has been linked to increased hospitalizations and asthma-related emergency visits as well as reduced life expectancy (Chen et al., 2017; Adar et al., 2013).
Research question: Is there any association between individual-level health outcomes and exposure to the pollution source, i.e., burning of agricultural crop residue?
Data: There are two main sources of data. The first is a nationwide household survey for Pakistan covering the three years 2011, 2013, and 2014. Around 8,000 households were repeatedly surveyed across the three years, and location data in the form of latitude and longitude is available. Individual responses reported within the health module are used to measure health outcomes, specifically the type of illness (if any) experienced in the last 2 months.
The data on agricultural residue crop fires is from NASA’s Moderate Resolution Imaging Spectroradiometer (MODIS) active fire product. Daily fire (point) data for Pakistan is available, and a global land cover raster can be used to see which of the fires are occurring over agricultural land. Land cover raster data from the European Space Agency Climate Change Initiative (ESA CCI), with a 300 m spatial resolution, was used to identify fires occurring on cropped land.
Hypotheses: I hypothesize that respiratory illnesses are spatially correlated and associated with higher frequency of exposure to crop fires.
Approaches: I would like to use clustering / hotspot analysis to assess possible spatial correlation, but my concern is that for the entire country it may not be useful, since the exhibited pattern would mirror the population distribution and not provide useful results.
I hope to use buffer analysis to measure proximity to fires. To empirically test the association between crop fires and health outcomes, I would use the following regression estimation:

H_it = β1 (Exposure to crop fires)_it + β Z_it + ε_it

where H_it = 1 if the individual reported experiencing respiratory illness in the 2 months prior to the interview and 0 otherwise. Exposure is the number of fires within a buffer radius over the past 3 months, and Z_it represents a vector of individual- and household-level controls. ε_it is the idiosyncratic error term. I can run three estimations based on multiple buffer distances of 5 km, 10 km, and 15 km.
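The exposure variable in this regression is just a count of fire detections within a buffer. A hypothetical Python sketch of that count, using great-circle (haversine) distance on the MODIS fire points (my actual workflow uses Stata/GIS distance commands, and function names here are illustrative):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two lat/lon points, in kilometres.
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def fire_exposure(household, fires, radius_km):
    # Number of fire detections within `radius_km` of a household.
    # household: (lat, lon); fires: list of (lat, lon) detections
    # already filtered to cropland and to the relevant 3-month window.
    return sum(1 for f in fires
               if haversine_km(household[0], household[1], f[0], f[1]) <= radius_km)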
However, my concern is again that the number of fires may not be a meaningful measure of exposure. At present, I do not have another way to measure the impact of agricultural crop residue burning.
Expected Outcome: I expect that health outcomes are spatially correlated, and in a way that can be explained by the pattern of agricultural crop fires.
Significance: The proposed analysis would examine individual-level health outcomes in relation to what is a prominent source of pollution in South Asia: agricultural crop residue burning. Political inaction on this issue can, to some degree, be tied to the lack of rigorous evidence on the health effects of this practice. While this study would only look at the immediate, short-term impact of exposure, any significant results would show that there is in fact a causal association between burning crop residue and the health of individuals living within some proximity of the fires. Such results may contribute to a policy emphasis on addressing a potential environmental concern and to the formulation of agricultural policy that may help farmers switch to alternative harvesting practices.
Level of Preparation: ArcMap – working knowledge; R – between novice and working knowledge; Python – no experience. I am proficient in Stata and have been using it for data manipulation and basic spatial commands (distance calculations).
Concerns: Most studies have used a discrete fire ‘event,’ which makes it easier to assess impact across time and across areas (control and treatment). Agricultural crop fires are a continuous phenomenon across the year, though there is a spike in the post-harvest period (figure below). I am concerned about the time dimension of my data.
Adar, S. D., Sheppard, L., Vedal, S., et al. (2013). Fine particulate air pollution and the progression of carotid intima-medial thickness: a prospective cohort study from the Multi-Ethnic Study of Atherosclerosis and Air Pollution. PLoS Medicine. Vol 10.
Awasthi, A., Singh, N., Mittal, S., Gupta, P. K., & Agarwal, R. (2010). Effects of agriculture crop residue burning on children and young on PFTs in North West India. Science of the Total Environment. Vol 408: 4440–45.
Balmes, J. R. (2010). When smoke gets into your lungs. Proceedings of the American Thoracic Society. Vol 7(2): 98–101.
Chakrabarti, S., Khan, M. T., Kishore, A., Roy, D., & Scott, S. P. (2019). Risk of acute respiratory infection from crop burning in India: estimating disease burden and economic welfare from satellite and national health survey data for 250,000 persons. International Journal of Epidemiology. pp 1113–1124.
Chen, L., Li, C., Ristovski, Z., et al. (2017). A review of biomass burning and impacts on air quality, health and climate in China. Science of the Total Environment. Vol 579.
Zhuang, Y., Chen, D., Li, R., et al. (2018). Understanding the influence of crop residue burning on PM2.5 and PM10 concentrations in China from 2013 to 2017 using MODIS data. International Journal of Environmental Research and Public Health. Vol 15.
A description of the research question that you are exploring.
The use of unmanned aerial vehicle (UAV) platforms has rapidly expanded in forestry remote sensing applications, especially from stand-level to individual-tree-level measurement (i.e., precision forestry, forest management planning, biomass estimation, and modeling forest growth) (Koch et al., 2006). Among those applications, the acquisition of three-dimensional (3D) structural data from high-resolution imagery is a novel and powerful approach made possible by advances in UAV technology (Puliti et al., 2015; Zarco-Tejada et al., 2014). Photogrammetrically derived tree canopy/crown information can be used to detect individual trees and, specifically, to monitor tree structure information (St-Onge, 1999; Jozkow et al., 2016).
In this study, I am interested in observing how tree crown completeness varies with voxel size as a function of the number of points arranged in the 3D point cloud. Since voxel size is important for visualizing tree architecture, the number of points per unit volume (voxel size) is considered an essential measurement in photogrammetric analysis, especially for mapping the 3D structure of trees or determining the desired image quality of tree architecture (Fig. 1). Estimating the optimal number of points or voxels required to delineate tree structure is important in terms of time and cost. In this study, I seek to determine the threshold value of voxel size (or point cloud density) associated with the crown completeness/desired image quality needed to describe the architecture of tree seedlings (Fig. 2(a)).
Figure 1: Changing of tree architecture or the crown description with voxel size. (source: Hess et al., 2018).
A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.
For this study, I will use a hexacopter unmanned aircraft system (UAS) equipped with a high-resolution multispectral camera with five channels (red, green, blue, near-infrared, and red edge) to acquire imagery of the study area. The spatial resolution of the multispectral sensor is 2.5 cm, and imagery collection for this study will be based on a single day of data collection (temporal resolution). The study extent is approximately four acres, located in Benton County, Oregon, USA. The raw images collected from the UAV flights will be processed using Agisoft Metashape, and I will produce a 3D point cloud using the structure-from-motion algorithm in Agisoft Metashape. These products will be used for spatial and statistical analyses in RStudio, ArcGIS Pro, Agisoft Metashape, etc.
Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.
In this study, I am interested in observing how voxel size varies as a function of point cloud density and how voxel size affects the desired image quality (Fig. 2(b)). With this intention, I expect a relationship between voxel size and the desired image quality of tree architecture (Fig. 2(a)), where increasing the voxel size changes the desired image quality (Fig. 1).
Therefore, I hypothesize that there is a threshold voxel size for tree crown completeness/desired image quality of tree architecture. I expect the desired image quality of tree architecture/crown description to decrease beyond this threshold value.
Figure 2: (a) Schematic diagram showing desired image quality (crown completeness) with respect to the voxel size for tree seedling (*the shape of this distribution may be changed). (b) Schematic diagram showing the overview of the voxel model. This represents the relationship of point cloud data to the voxel model (Source: Fujiwara et al., 2017).
Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.
Orthomosaic images can be used to detect the centroids and canopy cover of each tree. I will perform a supervised classification to separate seedlings from other land cover properties (alternatively, a binary classification: seedlings = 1, other land cover = 0). Approximately 70% of the labeled pixels will be used as training data, while 30% will be used as validation data. Random Forest (RF) and Support Vector Machine (SVM) classifiers will be used for classification. Further, I will produce confusion matrices to evaluate the performance of the RF and SVM classifiers and identify individual seedlings and their canopy cover in the initial stage. Additionally, the location of each centroid will be assessed.
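The confusion-matrix step above can be sketched in a few lines. This is only a minimal illustration: the labels below are hypothetical (1 = seedling, 0 = other land cover), not real classification output, and a full workflow would use the RF/SVM predictions over the validation pixels.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Rows = actual class, columns = predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

def overall_accuracy(matrix):
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical validation labels for eight pixels
actual    = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 0, 1, 1, 1, 0]
m = confusion_matrix(actual, predicted, classes=[0, 1])
acc = overall_accuracy(m)   # 0.75 for this toy example
```

The same matrix also yields per-class omission and commission errors, which is useful when comparing RF against SVM.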
Further, individual tree seedlings can be delineated using the point cloud data. I will examine the spatial arrangement of points as a function of data processing (i.e., by changing the parameters used to generate the point clouds) and define crown completeness with respect to the number of points and voxel size (Fig. 3).
Figure 3: Voxel resolution and point cloud binning process for a square field plot clipped from the ALS data. (Source: Almeida et al., 2019).
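The voxel-binning idea can be prototyped by mapping each 3D point to a voxel index and counting the occupied voxels at several voxel sizes. The seedling points below are invented placeholders; with real data the point list would come from the Metashape-exported point cloud.

```python
def occupied_voxels(points, voxel_size):
    """Map each (x, y, z) point to a voxel index; count unique occupied voxels."""
    voxels = {(int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
              for x, y, z in points}
    return len(voxels)

# Hypothetical point cloud for one seedling crown (units: metres)
points = [(0.05, 0.05, 0.10), (0.12, 0.06, 0.18),
          (0.32, 0.27, 0.41), (0.37, 0.22, 0.46)]
for size in (0.5, 0.25, 0.05):
    print(size, occupied_voxels(points, size))
```

As the voxel size shrinks, the count of occupied voxels rises toward the raw point count; plotting that curve against voxel size is one simple way to look for the completeness threshold described above.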
Expected outcome: what do you want to produce — maps? statistical relationships?
From this study, I would like to identify the spatial distribution of seedlings and their centroid locations using UAV remote sensing techniques. I will produce maps of the spatial distribution of seedlings and their canopy cover. Additionally, I expect to define a threshold voxel size for describing seedling architecture/crown description. Finally, I will make additional maps showing seedling vitality using the results obtained from the point cloud data together with copula modeling.
How is your spatial problem important to science? to resource managers?
In sustainable forest management, estimation of tree-level attributes, including tree counts, locations of individual trees, tree crown architecture, and tree heights, is essential for monitoring forest regeneration (Strigul, 2012; Mohan et al., 2017). In particular, assessing the vitality of seedlings is an important task for forest managers. With the proposed method, seedlings can be characterized from remote sensing data cost-effectively and accurately compared to conventional field surveys, which are expensive and time-consuming. Another strength of this study is that the proposed method can collect periodic data for individual tree seedlings, which helps in understanding the growth and vitality of seedlings over time, another essential attribute for forest management practices.
Level of preparation:
(a) Arc-Info: I have experience in ArcMap gained from previous classes and some research projects.
(b) Modelbuilder and/or GIS programming in Python: no previous experience.
(c) R: substantial experience with statistical analysis, with limited knowledge of spatial analysis.
(d) Image processing: I have gained some experience with Google Earth Engine and ENVI from previous work.
Almeida, D. R. A. D., Stark, S. C., Shao, G., Schietti, J., Nelson, B. W., Silva, C. A., … & Brancalion, P. H. S. (2019). Optimizing the remote detection of tropical rainforest structure with airborne lidar: Leaf area profile sensitivity to pulse density and spatial sampling. Remote Sensing, 11(1), 92.
Fujiwara, T., Akatsuka, S., Kaneko, R., & Takagi, M.
(2017). Construction Method of Voxel Model and the Application for
Agro-Forestry. Internet Journal of Society for Social Management Systems, Vol.
Hess, C., Härdtle, W., Kunz, M., Fichtner, A., & von
Oheimb, G. (2018). A high‐resolution approach for the spatiotemporal analysis
of forest canopy space using terrestrial laser scanning data. Ecology and
evolution, 8(13), 6800-6811.
Jozkow, G., Totha, C., & Grejner-Brzezinska, D. (2016). UAS topographic mapping with Velodyne LiDAR sensor. ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences, 3(1).
Koch, B., Heyder, U., & Weinacker, H. (2006). Detection
of individual tree crowns in airborne lidar data. Photogrammetric Engineering
& Remote Sensing, 72(4), 357-363.
Mohan, M., Silva, C. A., Klauberg, C., Jat, P., Catts, G.,
Cardil, A., … & Dia, M. (2017). Individual tree detection from unmanned
aerial vehicle (UAV) derived canopy height model in an open canopy mixed
conifer forest. Forests, 8(9), 340.
Puliti, S., Ørka, H.O., Gobakken, T. and Næsset, E., 2015.
Inventory of small forest areas using an unmanned aerial system. Remote
Sensing, 7(8), pp.9632-9654.
St-Onge, B. A. (1999). Estimating individual tree heights of
the boreal forest using airborne laser altimetry and digital videography.
International Archives of Photogrammetry and Remote Sensing, 32(part 3), W14.
Strigul, N. (2012). Individual-based models and scaling
methods for ecological forestry: implications of tree phenotypic plasticity.
Sustainable forest management, 359-384.
Zarco-Tejada, P. J., Diaz-Varela, R., Angileri, V., &
Loudjani, P. (2014). Tree height quantification using very high resolution
imagery acquired from an unmanned aerial vehicle (UAV) and automatic 3D
photo-reconstruction methods. European journal of agronomy, 55, 89-99.
A description of the research question that you are exploring.
Landslides often recur in certain areas, some of which contain national or heavily used highways. Once a landslide closes a road, travelers spend a lot of time detouring, which is very inefficient. In my research, I hope to find the most susceptible areas by analyzing the spatial pattern of a landslide susceptibility map and a heat map, mark the dangerous areas, and produce a high-risk area map.
A description of the dataset you will be analyzing,
including the spatial and temporal resolution and extent.
The datasets I will analyze are the Oregon statewide landslide map and the landslide susceptibility map, both from the Oregon Department of Geology and Mineral Industries. The spatial pattern of landslide-susceptible areas is expected to be dispersed, so further analysis will be performed on this pattern to delineate high-risk areas. The high-risk landslide areas themselves should be clustered, so they will show clearly on the map.
Hypotheses: predict the kinds of patterns you expect to
see in your data, and the processes that produce or respond to these patterns.
I expect my pattern to contain hotspots. First, the spatial analysis will be performed and a hexagon hotspot map produced; from this map I will identify the areas where landslides occur most frequently.
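One simple way to prototype the hotspot idea is to bin landslide point locations into grid cells and flag cells whose counts exceed a threshold. The sketch below uses a square grid as a stand-in for hexagonal binning, and the coordinates and threshold are invented, not DOGAMI data.

```python
from collections import Counter

def grid_counts(points, cell_size):
    """Bin landslide point locations into square grid cells."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)

def hotspots(points, cell_size, threshold):
    """Cells whose landslide count meets the threshold are flagged high-risk."""
    counts = grid_counts(points, cell_size)
    return {cell for cell, n in counts.items() if n >= threshold}

# Hypothetical landslide coordinates (e.g. km eastings/northings)
pts = [(1.2, 1.1), (1.4, 1.3), (1.1, 1.8), (5.0, 5.2), (1.3, 1.2)]
risky = hotspots(pts, cell_size=2.0, threshold=3)
```

A formal hotspot statistic such as Getis-Ord Gi* (available in ArcGIS Pro) would replace the simple count threshold in a real analysis.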
Approaches: describe the kinds of analyses you ideally
would like to undertake and learn about this term, using your data.
I would like to learn classification analysis based on spectral differences. I also want to learn how to generate the most efficient alternative route when a main route is closed.
Expected outcome: what do you want to produce — maps?
statistical relationships? other?
I want to produce two maps: first, a hotspot map, and second, a spectral classification map.
Significance. How is your spatial problem important to science? to resource managers?
Landslides are a very serious hazard worldwide; they not only disrupt traffic but also cause casualties and substantial economic losses. If vulnerable roads or slopes could be strengthened, or highways routed to avoid those areas, it would help a great deal. It is therefore important to perform landslide analysis.
Your level of preparation: how much experience do you have with (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) image processing, (e) other relevant software?
Image processing: very basic working knowledge. I also have very basic working knowledge of Google Earth Engine and ArcGIS Pro.
My master’s thesis is studying habitat quality of
black-tailed deer in western Oregon, and the link between habitat selection and
Although there are lots of spatial and temporal
relationships I want to analyze, for this blog post I will focus on the
question: what environmental characteristics (topography, vegetation) are
correlated with missed fixes in my study area? The extent of this problem is
important to identify and account for in habitat selection studies.
A description of the dataset you will be analyzing, including the
spatial and temporal resolution and extent.
Data downloaded from Lotek 3300S GPS collars include
coordinates, date and time and are obtained from GPS collars that were deployed
on black-tailed deer in western Oregon in 4 Wildlife Management Units (WMUs;
Alsea, Trask, Indigo, Dixon). Collars stay on the deer for up to 72 weeks, but
may be shorter if there is a collar malfunction or if the deer dies. Deer capture
effort lasted from 2012-2017, and I will be focusing on winter season only
(January 1 – March 31).
Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.
My response variable is fix success rate (number of acquired fixes divided by number of total fix attempts), and my predictor variables are environmental characteristics of topography and vegetation.
Fix success will likely be higher where there is less
obstruction to satellites. I expect that fix success will be negatively related
to steep terrain, dense canopy cover, larger trees and forested land cover
types. I predict that there will be a positive association between fix success
and flat terrain, low canopy cover, small trees, and non-forested land covers.
I am testing the magnitude of GPS bias (location error and
missed fixes) by deploying GPS collars for 1 week at test sites in my study
area. At the moment we are still collecting data and have one week left to
complete the tests.
Test sites have known environmental characteristics, and
were specifically chosen in a systematic sampling design to capture variation
in topography and vegetation across the study area (n=50). Nine collars were
rotated between sites over the course of 6 weeks. Measurements are taken on the
ground for each site (technicians were told to focus on a 30x30m area), and
variables include: canopy cover, slope, aspect, dominant land cover type,
dominant tree species and their associated diameter at breast height, tree
height, and whether the stand is even-aged or not. Other variables that are remotely sensed include topographic position index (10x10m) and elevation
(10x10m). Collars were programmed to reflect the same fix schedule as the
collars deployed on deer: 1 fix every 4 hours.
Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.
I will determine probability of fix success (PFIX)
for each stationary collar test site using logistic regression and perform
model selection using AICc. I will utilize the top model to predict
the probability of fix success for each pixel on the landscape in my study area
using remotely sensed data (GNN 30x30m; Statewide Habitat Layer 30x30m, DEM
10x10m). This “predicted fix success map” will be used as a correction factor
in habitat selection analyses (also a logistic regression analysis). Each deer’s
GPS location will be weighted by the inverse probability of fix success (1/PFIX)
associated with that pixel’s fix success.
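The inverse-probability weighting described above can be sketched with a fitted logistic model. The intercept and coefficients below are hypothetical placeholders, not estimates from the actual stationary-collar tests; the point is only to show how each deer location's weight, 1/PFIX, is derived from the pixel's covariates.

```python
import math

def pfix(intercept, coefs, covariates):
    """Probability of fix success from a fitted logistic regression."""
    eta = intercept + sum(b * x for b, x in zip(coefs, covariates))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients: negative effects of slope and canopy cover
intercept, coefs = 2.0, (-0.03, -1.5)   # slope (degrees), canopy cover (0-1)

# Each deer GPS fix is weighted by 1/PFIX of the pixel where it landed
pixels = [(5, 0.2), (30, 0.9)]          # (slope, canopy) at two fixes
weights = [1.0 / pfix(intercept, coefs, p) for p in pixels]
```

Fixes landing in steep, densely forested pixels (where fixes are more likely to be missed) receive larger weights, which is exactly the correction factor intended for the habitat selection model.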
Expected outcome: what do you want to produce — maps? statistical relationships? other?
I want to produce a map of predicted fix success for the study area.
Significance. How is
your spatial problem important to science? to resource managers?
Although GPS technology has opened the door for researchers to gain massive amounts of information about their study systems, it is not perfect. Habitat-induced bias caused by obstruction to satellites produces location error and missed fixes, can bias habitat selection studies, and can ultimately lead researchers and managers to misinterpret which habitat variables are actually being selected or avoided.
Columbian black-tailed deer populations are declining across their range, and the species is relatively understudied compared to other ungulates. Because black-tailed deer are an important prey item, game species, and charismatic megafauna, Oregon state wildlife managers are interested in determining the causes behind the apparent population decline.
Your level of
preparation: how much experience do you have with (a) Arc-Info, (b)
Modelbuilder and/or GIS programming in Python, (c) R, (d) image processing, (e)
other relevant software
Arc-Info: ArcMap- working knowledge.
Modelbuilder and/or GIS programming in Python: none
R: Working knowledge
image processing: none
other relevant software: FRAGSTATS: very little. I've attempted using the landscapemetrics R package (an implementation of FRAGSTATS) on my own, but not extensively.
Human behavior in North America can easily be
identified through the lens and in the voice of early Europeans who colonized
and claimed rights over every living species on this "new found land", even fellow humans. Through this lens, the stories and lives of early
Indigenous communities in the Americas have been silenced and, in the void, we
can read about every “discovery” that early settlers made across this
"untouched" land. For the first time, there is a chance to intensively study an area in North America that contains some of the oldest evidence of human activity on the continent: a site in the state of Idaho roughly 16,000 years old (Davis et al. 2019).
My thesis research seeks to better explain the human
activities at this archaeological site in order to shed more light on how early
humans interacted within their landscape through time. This will create a space
for discussions on how Indigenous early settlers used their landscape for over
15,000 years prior to when European “discoveries” were made. There must be
something truly unique about this area in Idaho for people to keep coming back
and utilizing this landscape. Humans are, after all, animals of patterns and
consistency; this is what my project in this class will be focused on.
How is the spatial distribution and
patterning of artifacts related to human behavior and natural transforms at one
point in time and through time at this archaeological site?
Description of Dataset
The dataset that I am working with is an artifact
assemblage of 327 artifacts that have been recorded with x-y-z-coordinates in the
constraints of a 10-meter x 5-meter x 1-meter excavation interval. The temporal
setting has been calculated through charcoal remains of two hearths that date
to around 12,000 years ago. Everything else must be placed in a relative temporal setting to these dates. This is only a partial segment of the archaeological site, but my hope is
to create a standardized procedure that can be used for any number of artifacts
within any constraint and at any location.
Hypothesis 1. The artifacts will exhibit significant clustering in 2-Dimensional and 3-Dimensional space.
This pattern is commonly produced through human use and re-use of specific areas across the site. Utilization of space by humans leaves evidence of use in the form of features or artifacts such as hearths and stone tools.
Hypothesis 2. The
clustered areas will show significant differences in artifact type (human
behavior) both across the site horizontally and vertically.
Differences in horizontal dispersion show diverse site use during one point in time.
Similarities in horizontal dispersion may present unsystematic variation in my dataset.
Differences in vertical dispersion show diverse site use through multiple points in time.
Similarities in vertical dispersion show similar site use through multiple points in time.
A pattern such as this is caused by segregated use across
the site at any one point in time. For example, it would not make logical sense
to produce stone tools near an area that is used for processing food, for fear
of contamination. Therefore, a processing area should be shown by specific
stone tools, bones, and very little debris from stone tool production. On the other hand, a stone tool production area should contain a relatively high amount of stone debris with little evidence of food preparation.
Hypothesis 3. Using
observational statistics, the physical distances between clusters and within
the clusters will represent how human behavioral practices were employed at the
site and how space was utilized by past people.
Hypothesis 2 addresses the similarity of artifact
types within each cluster horizontally and vertically; this approach uses
feature space. Hypothesis 3 seeks to address the physical spatial relationships
between the clusters as separate entities and between the artifacts within each
of the clusters. My thinking in this regard is: as people use space in their
environment, highly organized clusters may indicate an area of primary refuse
(planned use of the environment), whereas, less organized clusters may
represent an area of low frequency use (not-planned use of the environment).
Furthermore, the physical space between clusters may indicate how past people
perceived their environments and the choices made to organize it.
I have begun to utilize spatial clustering tests on the dataset that I have available for this project and study. Among these tests, I have explored Ripley's K, the L-function, and the G-function. For the purpose of calculating accurate clusters in an archaeological setting, I have also explored k-means clustering and a density-based clustering algorithm. Both
have their advantages and disadvantages for my specific aims (e.g. Baxter 2015,
Ducke 2015). For the purpose of this class, I wish to expand on the knowledge
and understanding of these functions as well as an increased understanding for
general statistical methods and procedures. My hope is to identify specific
clusters within my dataset and then statistically correlate each artifact
within that cluster with each other and then correlate that cluster with the other
clusters identified across the site horizontally and vertically.
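The k-means approach mentioned above can be illustrated with a bare-bones 2-D implementation. The artifact coordinates and starting centroids below are invented for illustration; in practice the x-y (and z) coordinates would come from the excavation records, and initialization would be chosen more carefully.

```python
def kmeans(points, centroids, iters=20):
    """Plain 2-D k-means: assign each artifact to its nearest centroid, then update."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[d.index(min(d))].append(p)
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical artifact x-y coordinates (metres within the excavation block)
pts = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.3), (4.0, 4.1), (4.2, 3.9), (3.9, 4.0)]
cents, clusters = kmeans(pts, centroids=[(0.0, 0.0), (5.0, 5.0)])
```

A density-based algorithm such as DBSCAN avoids having to pre-specify the number of clusters, which is often more faithful to how activity areas emerge archaeologically (Ducke 2015).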
As an outcome for this project, I expect to arrive at many statistical relationships, but I wish to emphasize the best ways to visualize these relationships. I seek this type of visualization because, unlike many other studies, one of the most influential factors for these relationships is human behavior and we all know how hard it can be to predict human behavior. Thus, for this reason, I want to express my results in a visual way in order to take the numbers and algorithms out of the data because computations will never fully and accurately predict human behavior or correlation. It is important to recognize the potential ethical problems with representationalism given the strict computational approach I am seeking. Therefore, to put computed numbers and a purely human visualization of data on a more even playing field, I seek both statistical relationships as well as 2-Dimensional and 3-Dimensional mappings of the data and corresponding relationships.
I listed the global significance within the ‘Research
Question’ section of this post; that is, the examination of human behavior
within an area that contains the oldest evidence for human occupancy in North
America. In terms of this course, the significance of my proposed procedure is
found within the emerging theme of 3-Dimensional representations in archaeological science. Space has always been on the mind of archaeologists, but
only recently has there been the capabilities for a push in the direction of 3-Dimensional
and even 4-Dimensional understandings of human behavior. This project will not
only solidify an understanding in the perception of N-Dimensional space, but it
will help to pave a path towards a more standardized approach in identifying
past human behavior through time and space. Additionally, the way in which
archaeological sites are excavated in the future will be forever changed based
upon the intricacy that is required for producing the models that I seek to examine.
I would say that I am proficient at using both ArcGIS
and ArcScene. I have used these two synchronously for the past couple years
within an academic setting for the production of predictive maps, simulations,
and a plethora of other academic inquiries. Additionally, I have some
experience playing around with LiDAR data within these systems. I have only
used model builder within ArcGIS a couple times and would say I am a novice at
this function as well as GIS programming in python. I have coding experience in
java and R but have very little in python. I would say that I am proficient at
utilizing the tools, understanding, and implementing ideas within R, though the
code may not be as efficient as it could be. I do not have much image processing experience, nor experience with other software systems that could be of use during this project.
Baxter, M. J. “Spatial k-means clustering in archaeology–variations on a theme.” Academia (2015).
Davis, Loren G., David B. Madsen, Lorena Becerra-Valdivia, Thomas Higham, David A. Sisson, Sarah M. Skinner, Daniel Stueber et al. "Late Upper Paleolithic occupation at Cooper's Ferry, Idaho, USA, ~16,000 years ago." Science 365, no. 6456 (2019).
Ducke, Benjamin. “Spatial cluster
detection in archaeology: Current theory and practice.” Mathematics
and Archaeology (2015): 352-368.
How is the density of woody debris related to window size and raster resolution? I will compute wood density using window sizes at various spatial scales, i.e. 1x1 m, 3x3 m, etc., and I will manipulate the resolution of the associated classified raster (native resolution is 0.0588 m).
Description of dataset
We acquired multispectral imagery from three UAS
flights on 9/23/2019 between approximately 12:00 and 13:45 PDT. Imagery is also available from an August 2019 flight for potential temporal analysis.
Resolution (EO bands) is 5.9 cm/pix. Phase I of South Fork McKenzie Floodplain
Enhancement Project is approximately 150 acres. I have produced a shapefile of
classified woody debris using a random forest supervised classification (kappa ~
I hypothesize that wood density will decrease with coarser resolution because I will lose detail of smaller-diameter pieces of wood.
I expect the spatial pattern of wood density to be
clustered. There will be areas of moving water that have relatively low wood
density, and still areas where wood will be deposited with high density.
I anticipate this clustered pattern of wood
density due to fluvial patterns of the river.
I would like to either develop a tool in ArcGIS Pro or write a script in R that adjusts the resolution of the raster and the sliding window. In this way, I will be able to produce simplified wood density outputs by adjusting only resolution and window size.
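The resolution/window workflow described above can be sketched with two small functions: one that coarsens a binary wood raster by an integer factor, and one that computes wood density inside a square moving window (much like the R `focal` call in the earlier post). The tiny raster and the "any wood present" aggregation rule are assumptions for illustration only.

```python
def aggregate(raster, factor):
    """Coarsen a binary wood raster: a coarse cell is 1 if any fine cell holds wood."""
    rows, cols = len(raster), len(raster[0])
    return [[int(any(raster[r][c]
                     for r in range(i, min(i + factor, rows))
                     for c in range(j, min(j + factor, cols))))
             for j in range(0, cols, factor)]
            for i in range(0, rows, factor)]

def window_density(raster, i, j, half):
    """Proportion of wood cells in a (2*half+1)-cell square window around (i, j)."""
    rows, cols = len(raster), len(raster[0])
    cells = [raster[r][c]
             for r in range(max(0, i - half), min(rows, i + half + 1))
             for c in range(max(0, j - half), min(cols, j + half + 1))]
    return sum(cells) / len(cells)

# Hypothetical 4x4 classified raster (1 = wood, 0 = not wood)
wood = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]
coarse = aggregate(wood, 2)          # 2x2 coarsened raster
dens = window_density(wood, 2, 2, 1) # 3x3 window density at cell (2, 2)
```

Swapping the `any` rule for a mean (fractional wood cover) gives the alternative aggregation under which density can drop at coarser resolutions, which is the hypothesized behavior.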
I want to produce a map of wood density
throughout the South Fork McKenzie site at various resolutions. Additionally, I
would like to analyze spatial statistics of woody debris distribution.
Woody debris is a critical component of forest
ecosystems and essential for wildlife habitat, especially for fish (Howson et al. 2012). Current methods for detecting and quantifying
woody debris rely on ground surveys that are expensive, time-consuming, and
potentially dangerous when working around moving water. Development of remote
sensing methods using relatively low-cost unmanned aircraft promises a safer,
more efficient alternative compared to current ground surveys.
Level of preparation
ArcInfo – No experience
Modelbuilder – novice: some experience in GEOG
Python – working knowledge: I took the python GIS programming course
(GEOG 562), but I am a little rusty.
R – Proficient: This is the software I use most
frequently for statistical analysis and have done some spatial analysis with it
Trimble eCognition – Novice: I have some
experience, but I may have difficulty accessing software due to COVID-19
Agisoft Metashape – Working knowledge: I use this software frequently to produce orthomosaics and point clouds from UAS imagery.
Howson, T.J., Robson, B.J., Matthews, T.G.,
and Mitchell, B.D. 2012. Size and quantity of woody debris affects fish
assemblages in a sediment-disturbed lowland river. Ecological Engineering 40: 144-152. doi:https://doi.org/10.1016/j.ecoleng.2011.12.007.
My current research is focused on assessing the sources of Fe over the North Atlantic Ocean using HYSPLIT back-trajectory modeling. For this class, I am going to focus on my research about the relationship of dust deposition to marine productivity in the ocean. The dust source will be derived from NOAA's HYSPLIT back-trajectory model, and marine productivity will be assessed from BATS (Bermuda Atlantic Time-series Study) discrete bottle data near Bermuda using the Particulate Organic Phosphorus (POP) and Primary Production (PP) components.
The challenge is the time series and depth structure of the datasets: the three datasets have different time intervals, and for some data there is no correlation between depth and the POP and PP values, whereas it is expected that above 200 meters depth POP and PP will be affected by dust deposition.
From Figs. 1a and 1b, we can see the spatial pattern of the POP and PP locations; this study will therefore focus on those hotspot locations to generate the HYSPLIT back-trajectory data.
1. POP and PP data from BATS bottles. This dataset is derived from BATS bottle data; the temporal resolution is daily, and only the appropriate data are used for this analysis. It is in-situ data. Figure 2 describes the availability of data by date of year.
2. HYSPLIT output is vector data that will show us the source of the dust affecting marine productivity near Bermuda. The HYSPLIT back trajectories are computed at 500 m above ground level (AGL). They will be generated based on the cluster coordinates in Fig. 1 and the dates in Fig. 2. The temporal resolution is hourly.
1. The spatial pattern of POP and PP is clustered or has hotspots, and the abundance of marine productivity lasts from spring until fall.
2. Atmospheric transport and deposition of dust-associated iron (Fe) causes Fe and Pi (phosphorus) sufficiency and limitation, and thus affects the marine ecosystem (Letelier et al., 2019). Near-crustal dust, from organic sources or the Sahara desert, has proven insoluble and has a less significant effect on North Atlantic Ocean productivity compared to the anthropogenic dust from North America, which is soluble (Conway et al., 2019). The general hypothesis of this study is that the HYSPLIT back trajectories will show that marine productivity near Bermuda is more likely driven by anthropogenic dust from North America.
4. Methodology Approach
1. Eliminate missing and bad POP and PP data using Matlab; 2. Perform time series analysis to examine the temporal and spatial patterns; 3. Clip the HYSPLIT data by the results of step 2; 4. Analyze the sources of dust that affect marine productivity by regression; 5. Perform seasonal time series analysis using Matlab.
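The regression in step 4 can be prototyped with a plain least-squares fit of productivity against a dust metric. The monthly values below are invented placeholders, not BATS or HYSPLIT data; in practice the x series would be a dust-deposition index derived from the clustered back trajectories.

```python
def linregress(x, y):
    """Ordinary least-squares slope and intercept for two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical monthly series: dust deposition index vs. primary production
dust = [1.0, 2.0, 3.0, 4.0]
pp   = [2.1, 3.9, 6.1, 7.9]
slope, intercept = linregress(dust, pp)
```

A positive, significant slope would be consistent with the hypothesis that dust deposition enhances productivity; Matlab's `fitlm` would give the same fit plus standard errors.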
The outcome of this study is a map of the dust trajectories and an evaluation of the seasonal dynamics of marine productivity near Bermuda.
6. The significance
This study will help us to quantify the role of ocean-atmosphere coupling on the air-sea exchange of carbon export to the ocean interior. Carbon export includes both inorganic and organic carbon (OC), and for the latter, both particulate and dissolved phases. Of particular interest, in light of clear evidence for ocean acidification, this study will help us to understand how our behavior contributes to the production of the anthropogenic aerosol. This anthropogenic aerosol will affect marine productivity, particularly the pelagic ecosystem.
7. Level of proficiency
I have proficient skills in ArcGIS Desktop because I have been using it since my undergraduate studies; however, I have never used ArcGIS Pro. My R knowledge covers statistical analysis, not spatial statistics. I do not know how to connect R and ArcGIS, so in this study I will not use R. I used Python as an undergraduate but never went deep into it and have not used it since. I took the Digital Image Processing (GEOG 581) class and am very confident with Google Earth Engine coding. I also use Matlab for geoshow and spatial analysis.
1.My primary research question is: How is the spatial pattern of Greenland outlet glacier extent (or, more specifically, their rate of growth/retreat) related to the spatial pattern of local glacier velocity via the mechanisms of glacier mass balance and basal slip?
Put simply: Are the glaciers that are retreating faster also
physically flowing faster, or not?
2.I have 2 primary datasets for this project. The first covers what I consider to be my response variable: the rate at which Greenland outlet glacier termini are moving (either advancing or retreating). This variable will be constructed from annual polyline shapefiles sitting on the visible edge of the glacier, over ~4 different years. The spatial pattern of these termini lies along the outer edge of Greenland's geographic border, because that is where outlet glaciers terminate.
This dataset consists of a glacier ID consistent across every year (I will
likely use 2015, 2016, 2017, possibly 2001). The dataset was produced via image
processing of data from satellite sensors indicated in the link (there are
several). Each shapefile consists of a vector polyline that represents the terminus
extent of the glacier in a given year (winter max extent).
SAMPLE IMAGE OF DATA:
My second dataset covers what I consider to be my influential variable: glacier velocity. This dataset is a raster that covers the entire Greenland Ice Sheet. There are clearly dendritic hotspots along the outer edge of the island, where the ice moves the fastest. The interior, where most of the ice accumulates, has very little horizontal movement (only a few meters per year).
This larger dataset represents annual ice velocity magnitudes across the surface of Greenland at 200 m or 500 m spatial resolution and annual temporal resolution. Data are also available for the separate x and y components of velocity, as well as their errors. This raster file covers the entire Greenland ice sheet, including all outlet glaciers mentioned above.
SAMPLE IMAGE OF DATA:
3. My initial hypothesis is that statistically, faster-flowing glaciers will tend to retreat faster. My explanation would be that vulnerable glaciers that are melting typically develop wet, slippery basal conditions. The slippage along the ground also allows the glacier to move faster. These same melting glaciers (with a negative mass balance) also typically experience a retreat of their termini, because they are melting and calving away. This is why I would generally expect a positive correlation between a glacier's velocity and the distance or area of retreat.
However, among healthy, dry-based glaciers, the relationship
might be totally different. In these cases, glaciers with the highest
accumulation (aka most snowfall) flow the fastest because they have so much
mass to move to maintain equilibrium. This provides an alternate hypothesis
that the fastest glaciers are the ones with the highest accumulation,
regardless of terminus behavior.
As far as geospatial distribution goes, I’d expect rapidly
retreating glaciers to have a clustered geospatial distribution. There are
probably some areas where most of the glaciers are retreating, and a few areas
where the majority of the glaciers are stable or growing. That hypothesis is
just based on the observation that entire ice sheets do not melt at the same
rate (for example, the Antarctic Peninsula is retreating far faster than the
Eastern Antarctic Ice Sheet (EAIS)). Rather, ice sheets move/deform based on their own individual dynamics and local climate, which includes variables like temperature, precipitation, wind, cloudiness, and perhaps most importantly, ocean conditions. Considering the southeast portion of Greenland is subjected to the "rapidly warming and weakening" Gulf Stream, I'd predict that region of Greenland to be the most unstable and to display the greatest reduction in terminus extent.
4. I suppose Exercise 1 for me will focus on my response variable: the termini positions. I will need to develop a method to calculate the distance between two (jagged) polylines, and then determine whether that magnitude is considered an advance or a retreat, over multiple different years (with multiple years I can also determine whether the RATE of retreat is speeding up or slowing down). I can then create "points" out of these measurements, and determine whether points of similar magnitude are clustered or evenly distributed, etc.
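As a rough sketch of the distance step, the average separation between two termini can be approximated by sampling the vertices of the newer polyline and measuring each one's distance to the older line. The coordinates below are hypothetical, purely for illustration; a real workflow would read vertices from the shapefiles (e.g., with geopandas or arcpy):

```python
import math

def point_segment_dist(p, a, b):
    """Distance from point p to segment a-b (all (x, y) tuples)."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter, clamped so we stay on the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def mean_terminus_change(old_line, new_line):
    """Average distance from each vertex of the new terminus to the old one.

    Sign (advance vs. retreat) must still be assigned separately, e.g. by
    checking which side of the old line each new vertex falls on.
    """
    dists = [min(point_segment_dist(p, a, b)
                 for a, b in zip(old_line, old_line[1:]))
             for p in new_line]
    return sum(dists) / len(dists)

# Hypothetical termini: the 2016 line sits roughly 100 m "behind" the 2015 line
terminus_2015 = [(0, 0), (500, 50), (1000, 0)]
terminus_2016 = [(0, -100), (500, -50), (1000, -100)]
print(mean_terminus_change(terminus_2015, terminus_2016))
```

For jagged real termini, a symmetric measure (averaging both directions, or a Hausdorff distance) would be more robust than this one-sided mean.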
For the second step, I am envisioning working with the influential
variable: glacier velocity. I think it would be interesting to create a hotspot
analysis of this map, as well as create a map displaying vector directions and magnitudes
of ice movement, to visualize directionality. I also think I will need to find
a way to vectorize/ section off individual glaciers from the raster, possibly
using a fixed buffer from the termini data. I’m also debating bringing in some
mass balance data (gravitational mass anomaly), to see how glacier mass balance
correlates with velocity.
For the final step, I'd like to connect the two datasets to interpret their relationship. I'd like to run a couple of correlation tools (Global Moran's I or something similar) to see quantitatively how distance retreated compares to glacier velocity. Finally, since I can't expect the entire region to behave similarly, I'd like to cut Greenland into 6-8 sections and see how each behaves, to compare and contrast.
5. The expected outcomes are pretty diverse. The big thing I'm after is a statistical relationship between these variables, which is really just a number (R-squared). However, there are several maps I am interested in making, including those depicting calculated rates of retreat, one depicting vector directions and magnitudes of ice movement, and a gridded map of Greenland depicting the R-squared between my variables in each region. While the answer to the question is a statistical relationship, there are several informative maps to be made along the way.
6. I think the scientific significance of this is mainly pointed towards making accurate predictions of sea level rise. Developing such predictions involves taking into account several not-so-obvious glacial mechanisms, and identifying which glaciers are most at risk. If ice velocity gives some indication towards which glaciers might contribute to sea level in the near future, then it could make the difficult job of modeling sea level a bit easier. It is also information that could be useful when put towards ecology and a handful of local communities. Data on recently uncovered land could help explain ecological shifts and invasive or unfamiliar species.
-I have a working knowledge of ArcGIS Pro, based on the GIS
I, GIS II and GIS III classes in the certificate program. This is more a
textbook knowledge rather than applied experience.
-I have no experience whatsoever with R. I am not sure I
wish to learn it, but if this class convinces me it is critical to a solid
geospatial repertoire, I will concede.
-I have a working knowledge of Python, and I can use online
resources to get most things done.
-I’ve used QGIS, ENVI, and MATLAB, but am not particularly well versed in any of them.
How are changing farm sizes and population correlated? There have been trends of agricultural intensification in Nebraska, Kansas, Oklahoma, and Texas. I am curious to see whether the shift in the distribution of farm sizes, from many smaller farms to fewer, larger farms, has an effect on population sizes.
In order to address this question, I will be looking at the United States
Census for population data at the county level in 2010 and 2018. The 2018 data are
estimates since the census is only collected every ten years. The United States
Department of Agriculture (USDA) National Agricultural Statistics Service (NASS)
runs a Census of Agriculture every five years, which collects information about
farms throughout the country. I will use 2012 and 2017 Ag Census data at the
county level. The Ag Census reports the number of farms in a handful of size
ranges as well as the area of farms within those size ranges. They also report income
from farm production, and whether the farms are owned by individuals, partnerships,
or corporations, which I might include in the study to provide more context. I
will include county-level data from the US Census and Agricultural Census for all
of Nebraska, Kansas, Oklahoma, and Texas.
Figure 1: Sample map from the interactive Ag. Census viewer
of increases and decreases in the number of farms with 2,000 acres or more between
2012 and 2017. This is the largest farm size category in the data.
In the Agricultural Census data, I expect to see an increase in farms with larger sizes and a decrease in farms with smaller sizes. I do not know enough about the region to predict the spatial pattern of the increase. Yan and Roy (2016) calculated field size with remote sensing, and from their maps it appears that the western sides of these states had much larger farms in 2010. I predict that over the time frame I am studying, the range of large farms will extend more broadly across the Midwest.
In the US Census data, I expect to see slight decreases of populations in rural counties but increases of populations in counties that contain cities and those immediately surrounding cities. McGranahan and Beale (2002) found that this was the case in the 1990s because of a lack of income opportunities in rural areas, forcing many to move into urban and suburban counties in search of work.
I hypothesize that there will be overlap in the counties
that have increasing numbers of large farms and decreasing populations. I
hypothesize that this pattern occurs because individuals who farm for large
corporations have funds available to buy large plots of land, and then are able
to manage farms with higher economic efficiency and therefore sell their products
for cheaper prices. The smaller farms in the area are then priced out of the
market and can’t afford to continue farming and leave for urban areas in search
of other sources of income. This then takes away business from the local small
businesses such as doctors’ offices, post offices, and other local stores.
I would like to explore using Python to analyze the patterns and do some statistical analysis with these data. I plan on exploring statistical analysis with spatial data and calculating whether any correlation between increases in farm size and decreases in population is statistically significant. I will also have to aggregate the farm size data from categories of farm size into a general increase or decrease, or potentially do separate analyses for the different groupings. I may have to expand my analysis into R to do some of the statistical computations.
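As a minimal sketch of the correlation test, a Pearson coefficient and its t-statistic can be computed from paired county values. The farm-change and population-change numbers below are made up for illustration; real work would pull them from the Census and Ag Census tables (scipy.stats.pearsonr would also return the p-value directly):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical county data: change in number of large farms (2012-2017)
# and percent population change (2010-2018)
farm_change = [12, 8, 5, 1, -2, -4]
pop_change = [-3.1, -2.0, -1.5, 0.2, 1.8, 2.5]
r = pearson_r(farm_change, pop_change)
# t-statistic for H0: r = 0, with n - 2 degrees of freedom
t = r * math.sqrt((len(farm_change) - 2) / (1 - r ** 2))
print(r, t)
```

Note that ordinary Pearson significance tests assume independent observations; spatially autocorrelated counties violate that, which is one reason to follow up with explicitly spatial statistics.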
I expect to produce maps with the patterns that I find in
both datasets and between the two. I will also produce statistics on the relationships
between the patterns. I will also likely produce some maps with contextual
information about agriculture, economics, and population.
This research is important to policymakers because it is important
to understand the impact of changing industry on people’s living and economic
situations. Municipalities are typically
concerned with their economies and population growth. If there is a clear
correlation between a change in an industry and a change in population and available
business, it might be reason for more research or a change in policy. This
correlation might also have implications for soil health and the general environmental
sustainability of agriculture.
I have many years of experience using ArcGIS (both in and out of school) and will likely use it for the final cartographic steps in making my maps. I have a little bit of experience in both Python and R, which I will use for my statistical analyses. I don't think I will be using remote sensing in this project, but I am proficient in JavaScript for Google Earth Engine if I need it.
McGranahan, D. A., & Beale, C. L. (2002). Understanding rural population loss. Rural America/Rural Development Perspectives, 17(4). United States Department of Agriculture, Economic Research Service.
Yan, L., & Roy, D. P. (2016). Conterminous United
States crop field size quantification from multi-temporal Landsat data. Remote
Sensing of Environment, 172, 67–86. https://doi.org/10.1016/j.rse.2015.10.034
Research Question: How is the temporal pattern of population flow by county related to the temporal pattern of flood insurance claims by county via risk compensation?
Flooding is widely regarded as the costliest and most prevalent natural hazard around the world and within the United States (Miller et al., 2008; Kousky, 2018). On average, flooding in the United States causes approximately $7.96 billion in damages per year (NWS, 2018). Both the magnitude and frequency of floods are expected to increase due to climate change (IPCC 2007, 2012), along with population growth and increased economic assets in coastal zones (Jongman et al., 2012). My analysis seeks to contribute to the body of natural hazards and risk research by seeking to understand the influence of flood insurance on population mobility over time. I use "risk compensation" as the operative term for the mechanism by which flood insurance economically values the risk of its participants according to their asset exposure and steps toward hazard mitigation.
2. Data Description
National Flood Insurance Program (NFIP) claims data, administered through the Federal Emergency Management Agency (FEMA), include the dollar amount paid, the date of loss for the filing, and the ZIP code and county for the claim. The data span 1977-2018. From the dataset I am able to aggregate the number of claims filed for a given county annually. I can create a threshold for flood occurrence versus no flood occurrence based on the annual number of claims through the creation of a binary dummy variable (i.e., 0 = no flood occurrence, 1 = flood occurrence). My analysis will be focused only on the counties with active insurance claims in the continental United States. See below for a figure of the number of insurance claims in force by state.
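The dummy-variable step can be sketched in a few lines. The claim counts and the cutoff of 10 claims per year below are illustrative assumptions, not values taken from the NFIP data:

```python
# Annual claim counts for a hypothetical county (year -> claims filed)
annual_claims = {2014: 0, 2015: 2, 2016: 41, 2017: 3, 2018: 87}

# Threshold on claim volume: years at or above it count as flood years.
# The value 10 is illustrative; the real cutoff would be chosen from the data.
THRESHOLD = 10
flood_occurrence = {yr: int(n >= THRESHOLD) for yr, n in annual_claims.items()}
print(flood_occurrence)  # {2014: 0, 2015: 0, 2016: 1, 2017: 0, 2018: 1}
```

The same comprehension applied county by county yields the binary panel variable for the regressions.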
Internal Revenue Service (IRS) migration data shows the number of tax filings by location at the county level from 1990-2010. From this data I can determine origin and destinations of tax filers over time which can be represented by in-flows (in-migrants), out-flows (out-migrants), and net-flows (net migration). I will be only including population data for the counties to which I have a match for flood insurance claims. See below for a sample county (Beaufort County, North Carolina) data screenshot that includes these population metrics.
I expect to see a statistically significant relationship between total population movement (in-flow or out-flow) and high volumes of flood claims in a county. I also expect counties with high monetary amounts paid out from claims to show lowered population out-flows.
I have created a large panel dataset from which I will perform multivariate regressions to gauge statistical significance in a relationship between population and insurance claims. I will seek to test that all assumptions of my regression model are appropriately met (normality, no multicollinearity, homoscedasticity, etc.).
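The core of each regression can be illustrated with a bivariate OLS fit; the full multivariate panel models and assumption checks would be run in R or Stata as described, and the county-year numbers below are hypothetical:

```python
def ols(xs, ys):
    """Simple bivariate OLS fit: returns (slope, intercept, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    # R-squared from residual and total sums of squares
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical county-years: annual claim count vs. net migration (tax filers)
claims = [0, 2, 5, 12, 30, 55]
net_flow = [150, 120, 90, -10, -220, -400]
slope, intercept, r2 = ols(claims, net_flow)
print(slope, intercept, r2)
```

A panel specification would additionally include county and year fixed effects; that is where statsmodels, R, or Stata take over from a sketch like this.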
5. Expected Outcomes
I wish to demonstrate a statistical relationship between variables and visualize my results appropriately through plots, charts, and maps.
6. Significance of Research
My analysis applies to both policymakers and to the larger body of natural hazards and risk research. As stated above, flooding and flood risk is an increasingly costly issue that requires interdisciplinary responses. My research can help inform these varied response efforts.
7. Level of Preparation
I have significant experience using R to process and transform datasets as well as for statistical analysis and graphing. I also have experience using Stata for statistical tests and creating result outputs. I have extensive experience using ArcGIS and moderate experience with Google Earth Engine. I have some experience with Python and will be taking a Python workshop concurrently at the start of this quarter to advance my skills further in this. I hope to utilize R for data transformation as well as mapping and result communication. I will use Stata in conjunction with R for data analysis and statistics.
Field, Christopher B., et al. “IPCC, 2012: Managing the risks of extreme events and disasters to advance climate change adaptation. A special report of Working Groups I and II of the Intergovernmental Panel on Climate Change.” Cambridge University Press, Cambridge, UK, and New York, NY, USA 30.11 (2012): 7575-7613.
IPCC, Climate Change. “The physical science basis. Contribution of working group I to the fourth assessment report of the Intergovernmental Panel on Climate Change.” Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA 996 (2007): 2007.
Jongman, B., Ward, P. J., & Aerts, J. C. J. H. (2012). Global exposure to river and coastal flooding: Long term trends and changes. Global Environmental Change, 22(4), 823–835. https://doi.org/10.1016/j.gloenvcha.2012.07.004
Kousky, C. (2018). Financing Flood Losses: A Discussion of the National Flood Insurance Program. Risk Management and Insurance Review, 21(1), 11–32. https://doi.org/10.1111/rmir.12090
Miller, S., Muir-Wood, R., & Boissonnade, A. (2008). An exploration of trends in normalized weather-related catastrophe losses. Climate extremes and society, 12, 225-247.
Research suggests that most Oregon salt marshes have survived 20th century relative sea level rise by accumulating sediment at a pace similar to or exceeding the rate of sea level rise, except the Salmon River Estuary and Alsea Bay (Peck et al. 2020). Additionally, though we predict that salt marsh growth can only occur under rising sea level and is limited by the rate of relative sea level rise, Nehalem Bay salt marshes are growing much faster than the local pace of sea level rise, and salt marshes in the Coquille River estuary are growing vertically despite experiencing relative sea level fall. Are these patterns also reflected in the patterns of horizontal growth of the salt marshes?
In this course, I would like to georeference and digitize historical aerial photographs from at least two Oregon estuaries, Nehalem Bay and Alsea Bay, and estimate the error associated with my digitization. I would then like to compute changes in vegetated salt marsh extent from 1939 to the present and combine this with vertical accretion rates measured using excess 210Pb from sediment cores. Ideally, I would be able to create a model of 3D change for the last ~80 years. If this method works, I will expand my research to two additional estuaries, likely the Salmon River Estuary and the Coquille River Estuary.
My main dataset will be historical aerial photographs, roughly decadal from 1939 to the 1990s, that span four Oregon estuaries: Nehalem Bay, the Salmon River Estuary, Alsea Bay, and the Coquille River Estuary. The resolution varies considerably among the images, with the 1939 photographs typically being the highest quality (I estimate ~0.5 m; see Fig 1). My sediment core dates typically extend ~120 ± 20 y and are also roughly decadal. I have between ~8 and 14 cores per estuary. I only plan on analyzing the parts of the salt marsh from which I was able to collect sediment cores.
Based on the historical aerial photograph analysis, the Alsea Bay "least disturbed" salt marshes of interest will exhibit net erosion, while the Nehalem Bay "least disturbed" salt marshes of interest will exhibit net sediment accumulation, similar to the patterns observed in the vertical sediment accumulation data.
Time periods of horizontal growth of the salt marsh will correlate with time periods of vertical growth measured within sediment core data previously collected in Alsea Bay and Nehalem Bay.
Nehalem Bay will exhibit more periods of growth, observed in both horizontal and vertical data, than Alsea Bay, despite experiencing similar rates of relative sea level rise.
Periods of growth, observed in both horizontal and vertical data, will correspond with periods of intensive land use (especially logging), with wildfire (especially the Tillamook Burns), and with large flood events (especially the 1996 500-y return interval flood). Differences in the intensity of logging and wildfire between the two watersheds may explain the apparent drowning of Alsea Bay even though it experienced similar 20th century relative sea level rise to Nehalem Bay.
To test these hypotheses, I know I will be doing a lot of georeferencing, digitizing, and associated error analysis but I’m uncertain what to call the rest of what I’ve described – i.e., creation of a 3D model of change. To my current knowledge, many have performed similar methods using Landsat images (e.g., Miller et al. 2017) that may also be applicable to panchromatic historical aerial photographs. The methods of others who have used historical aerial photographs in their analysis seem promising; however, these often rely on ground-truthing (e.g., Ballanti et al. 2017), which is beyond the scope of this study. Other promising methods I will investigate over the coming days/weeks are those by Goodwin & Mudd (2020).
I would like to publish this project as a manuscript. The publication would
include numerous figures I would like to produce in this class including a map
of hotspots of salt marsh growth or erosion. I also envision a lot of figures comparing
horizontal and vertical changes within the estuaries.
This project has both local significance, specifically for Oregon coastal community members, and global significance, specifically for the intertidal ecogeomorphology community. Primarily, salt marshes provide numerous important ecosystem services, including habitat for economically and culturally important fish and shellfish; flood protection; filtration of nutrients, pollutants, and sediment; recreation; and carbon burial. However, salt marshes are threatened by rising sea level and reduced fluvial suspended sediment loads. Despite their importance and threatened status, we are still unsure of the mechanisms by which they grow or erode. Theoretically, we expect that as relative sea level rises, both accommodation space and hydroperiod increase, causing salt marshes to grow at a similar pace; however, despite no sediment limitation, some Oregon estuaries appear to be drowning (Salmon River and Alsea) while others are growing faster than what we would expect given relative sea level rise (Nehalem and Coquille). It is likely that fluvial sediment supply, in both magnitude and timing, plays a role in salt marsh growth, but the relationship remains unclear. By comparing growth patterns to the histories of landscape change and climate oscillations within the associated watersheds, perhaps we can elucidate these linkages.
My level of preparation:
ArcGIS Pro: working knowledge
Modelbuilder and/or GIS programming in Python: none
Image processing: novice/working knowledge in Fiji/ImageJ
Other relevant software: MATLAB, working knowledge
Ballanti, L., Byrd, K. B., Woo, I., & Ellings, C. (2017). Remote sensing for wetland mapping and historical change detection at the Nisqually River Delta. Sustainability, 9(11), 1919.
Goodwin, G. C., & Mudd, S. M. (2020). Detecting the Morphology of Prograding and Retreating Marsh Margins—Example of a Mega-Tidal Bay. Remote Sensing, 12(1).
Miller, J., Morris, J. T., & Wang, C. (2017). Mapping salt marsh dieback and condition in South Carolina's North Inlet-Winyah Bay National Estuarine Research Reserve using remote sensing. AIMS Environmental Science, 4(5).
Peck, K., Wheatcroft, R. A., & Brophy, L. S. (2020). Controls on sediment accretion and blue carbon burial in tidal saline wetlands: Insights from the Oregon coast, USA. Journal of Geophysical Research: Biogeosciences, 125(2).
1. A description of the research question that you are exploring.
For my thesis, I am researching the relative contribution of discarded bait from the commercial Dungeness crab fishery, wild benthic prey, and cannibalism to the diets of Dungeness crab. The commercial fishery for Dungeness crab is exceptionally sustainable and productive: yield has increased over time despite increased fishing effort. I hypothesize that fishermen are sustaining crab populations by throwing used bait overboard, "farming" crab unintentionally. Benthic box core sampling off the Oregon coast has revealed low abundance of benthic organisms, a natural prey source for the crabs. These findings further support the hypothesis that the crabs are largely living off bait.
In order to understand the feeding ecology of Dungeness crab, it’s important to have a solid grasp of the wild prey available to them. For this class, I will investigate the relationship between total organic carbon on the seafloor and the abundance and spatial distribution of benthic deposit feeders. I also hope to explore seasonal variability in the spatial distribution of total organic carbon levels and prey abundance. I hypothesize that total organic carbon levels and benthic prey abundance will be higher near designated crab collection sites during the peak winter crabbing season and will be more dispersed in the summer during the off-season.
2. A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.
The dataset I’ll be exploring comes from benthic box core sampling conducted between Fort Bragg, CA, and Grays Harbor, WA from 2010 to 2016. The box cores were taken at between 60 and 525 m depth. The spatial distribution varies, and I have not yet determined the range of that variation.
3. Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.
I hypothesize that the spatial distribution of benthic deposit feeders will be clustered around areas with high levels of total organic carbon (TOC) because deposit feeders eat organic detritus from the sea floor. Additionally, I hypothesize that clusters of high TOC will be evident in the winter during the fishing season, but TOC will be more dispersed during the summer and fall. Similarly, I hypothesize that the spatial distribution of benthic deposit feeders will be more clustered during the winter and more dispersed during the summer.
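One common way to quantify the clustered-versus-dispersed distinction in these hypotheses is the Clark-Evans average nearest neighbor ratio. A minimal sketch, using hypothetical core locations (ArcGIS's Average Nearest Neighbor tool computes the same statistic with an added z-score):

```python
import math

def nearest_neighbor_ratio(points, area):
    """Clark-Evans average nearest neighbor ratio.

    Values < 1 suggest clustering, ~1 a random pattern, > 1 dispersion.
    points are (x, y) sample locations; area is the study area in the
    same distance units squared.
    """
    n = len(points)
    nn_dists = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
                for i, p in enumerate(points)]
    observed = sum(nn_dists) / n
    # Expected mean nearest-neighbor distance under complete spatial randomness
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected

# Hypothetical core locations (km) in a 100 km x 100 km study area:
# two tight clumps, so the ratio comes out well below 1
clustered = [(10, 10), (11, 10), (10, 11), (80, 80), (81, 80), (80, 81)]
print(nearest_neighbor_ratio(clustered, 100 * 100))
```

Computing the ratio separately for winter and summer samples would give a direct test of the seasonal clustering hypothesis above.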
4. Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.
I don’t know what types of analyses I would like to undertake –
I hope this class will help me figure that out!
5. Expected outcome: what do you want to produce — maps? statistical relationships? other?
I want to produce maps of the seasonal spatial distribution of TOC and benthic deposit feeders. I also hope to determine the statistical relationship between amount of TOC and abundance & spatial density of benthic deposit feeders.
6. Significance: How is your spatial problem important to science? to resource managers?
As mentioned above, the commercial Dungeness crab fishery has been historically sustainable and productive despite increased fishing effort. However, the fishery currently faces closures from whale entanglement in crabbing gear, domoic acid outbreaks, and now the COVID-19 pandemic. If my hypothesis that fishermen are sustaining the crab population through discarded bait is correct, then reducing effort could cause Dungeness crab populations to decline. The work I do for this class will help me to understand the impact of seasonal TOC levels on benthic prey availability for the crabs.
7. Your level of preparation: how much experience do you have with (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) image processing, (e) other relevant software?
ArcInfo: working knowledge
R: working knowledge
Image processing: working knowledge
Working knowledge of Google Earth Engine, ENVI, and ArcGIS Pro. Proficient in ArcMap.