GEOG 566

Advanced spatial statistics and GIScience

June 6, 2018

Observed vs. expected time spent in vegetation types as a function of distance to water

Filed under: 2018,Exercise/Tutorial 3 2018 @ 10:32 am

Disclaimer – this post is long, particularly in the ‘steps taken’ section. I may use some of these methods in the Fall, so I made it very detailed.

Question asked

For this analysis I wanted to dig a little deeper into the relationship between vegetation type, distance to water, and the movement of recreationists. In Exercise 2 I learned that coniferous woodland dominates both the study area and the hikers’ paths. However, one could wonder whether people are less likely to spend time near water if the surrounding vegetation is dense with conifer. Perhaps this dense vegetation is less appealing than the lake feature, and the hiker consequently keeps trekking until they reach a more open area. Understanding the significance of this relationship is important before applying the data to a hidden Markov model. Results from this exercise may indicate that I should include vegetation data in the model because vegetation (such as coniferous woodland, meadows, etc.) may explain the movement of recreationists in addition to, or regardless of, distance to water.

My questions for this exercise are:

1.) What are the distributions of the length of the hikers’ tracks versus the time they spend in the area as a function of distance to water?

2.) Are hikers spending significantly more or less time in certain vegetation types despite being near water?

Tool/Approach Used

I used ArcGIS and R to complete my analysis. I did a few things differently in this exercise than in Exercise 2. The steps are outlined below:

(1) Simplify data and reduce to 1-minute time intervals between points: I am using a sub-sample of 5 GPS tracks. These tracks collected point data every 20 seconds. I decided to aggregate the points to 1-minute intervals to make them a little easier to analyze and interpret. To do this I used the dplyr package in R to group the GPS points by 1-minute timestamp and average the xy coordinates within each minute.
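The aggregation in step (1) can be sketched as follows. This is a minimal illustration in standard-library Python rather than the R/dplyr code used for the analysis. The tuple layout (track id, timestamp, x, y) and the sample coordinates are assumptions made up for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate_to_minute(points):
    """Group GPS fixes by track and by minute, averaging x and y.

    points: iterable of (track_id, timestamp, x, y) tuples (hypothetical layout).
    Returns a list of (track_id, minute, mean_x, mean_y), sorted by track and time.
    """
    groups = defaultdict(list)
    for track_id, ts, x, y in points:
        # Truncate the timestamp to the start of its minute.
        minute = ts.replace(second=0, microsecond=0)
        groups[(track_id, minute)].append((x, y))
    return [
        (track_id, minute, mean(p[0] for p in pts), mean(p[1] for p in pts))
        for (track_id, minute), pts in sorted(groups.items())
    ]


# Made-up fixes at 20-second spacing for one hiker.
fixes = [
    ("hiker1", datetime(2018, 6, 1, 10, 0, 0), 100.0, 200.0),
    ("hiker1", datetime(2018, 6, 1, 10, 0, 20), 102.0, 202.0),
    ("hiker1", datetime(2018, 6, 1, 10, 0, 40), 104.0, 204.0),
    ("hiker1", datetime(2018, 6, 1, 10, 1, 0), 110.0, 210.0),
]
minute_points = aggregate_to_minute(fixes)
```

Here the three fixes in the 10:00 minute collapse to one averaged point, and the single 10:01 fix is kept as it is.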

(2) Import revised and reduced spatial dataframe to ArcGIS: I ran into a snag here. In the process of trying to convert the spreadsheet to a shapefile I kept losing the timestamp information. As a workaround, rather than converting the spreadsheet to a shapefile, I converted it to a dBASE table which (fortunately) preserved the timestamp information.

(3) Create 5 unique track layers: At this point, my data were still in one large table. I then split up the merged table into the five unique tracks and layers. To do this I used the select by attribute tool and create layer from selection tool.

(4) Build attribute table: I wanted the following variables in the attribute tables of each track: distance to water feature, vegetation type, length of track, timestamp, xy coordinates. So far, I had the xy coordinates and timestamps. To calculate distance to the water feature I used the joins and relates tool to calculate the nearest distance between each GPS point and the outline of the water feature (I had previously created a polygon of the water feature using the digitize tool in editor mode; see Figure 1). To extract vegetation data I used the spatial join tool to link the vegetation type to each GPS point (Figure 2).

(4b) Calculating length of track: the final attribute I needed was the length of the GPS track. To do this, I used the Tracking Analyst tool (thank you, Sarah, for the suggestion!) and then the track intervals to feature option. This calculates the distance between one point and the subsequent point, which is why it was crucial for me to preserve the timestamps: I needed to ensure that the tool was using the timestamp to determine the subsequent point when calculating length. The tool added a new column to the attribute table of each track giving the calculated distance between each point and the next. Now I have information on the actual distance (in meters) traveled.
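The idea behind the interval calculation can be sketched outside ArcGIS as below. This is an assumption-laden illustration, not a description of how Tracking Analyst works internally: it assumes projected coordinates in meters, uses straight-line distance, and assigns 0 to the last point of a track because it has no successor. The function name and sample points are hypothetical.

```python
import math


def segment_lengths(track):
    """Distance from each point to the next one in time order.

    track: list of (timestamp, x, y) with x and y in meters (assumed projected).
    Returns a list of (timestamp, distance_to_next_point).
    """
    # Sorting by timestamp is what guarantees "next" means next in time.
    ordered = sorted(track, key=lambda p: p[0])
    result = []
    for current, nxt in zip(ordered, ordered[1:]):
        distance = math.hypot(nxt[1] - current[1], nxt[2] - current[2])
        result.append((current[0], distance))
    if ordered:
        # Assumption: the final point has no successor, so its length is 0.
        result.append((ordered[-1][0], 0.0))
    return result


# Made-up points, deliberately out of order (timestamps in seconds).
track = [(120, 3.0, 10.0), (0, 0.0, 0.0), (60, 3.0, 4.0)]
lengths = segment_lengths(track)
total_length = sum(d for _, d in lengths)
```

If the rows were processed in table order instead of time order, the first segment would be measured from the wrong point, which is the problem that preserving the timestamps avoids.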

(5) At this point, I was very happy: I finally had an attribute table with all the information I needed for analysis. I exported the table as a text file and imported it into R for manipulation, visualization, and analysis.

In R:

(6) Bin distance to water data: to create histograms representing proportions I needed to bin the distance to water data. I used the cut function in R to bin the distance to water data into groups of 20 meters (i.e. 0-20, 21-40, etc.)
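A small Python sketch of the same binning idea is shown here. The bin edges are an assumption: each class is closed on the right (a point exactly 20 m from shore falls in 0-20), which mirrors the default right-closed intervals of R's cut, and a distance of exactly 0 is placed in the first class. The function name is hypothetical.

```python
import math


def distance_class(distance_m, width=20):
    """Return a label such as '0-20' or '21-40' for a distance in meters."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    # Right-closed bins: (0, 20], (20, 40], ... with 0 itself in the first bin.
    index = max(0, math.ceil(distance_m / width) - 1)
    lower = 0 if index == 0 else index * width + 1
    upper = (index + 1) * width
    return f"{lower}-{upper}"


# Made-up distances to the lake shore, in meters.
labels = [distance_class(d) for d in (0, 12.5, 20, 20.5, 61, 80)]
```

Which edge is inclusive only matters for points that land exactly on a bin boundary, but it is worth fixing the rule once so that the length and time summaries use identical classes.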

(7) Create dummy variables for vegetation type: I used ifelse statements with the dplyr package to create new columns for conifer presence (0 = not present, 1 = present), meadow presence, and residential facilities presence.

(8) Calculate total length of the hikers’ paths in each vegetation type by distance class: I again used dplyr to calculate the total length of each hiker’s path grouped by distance class. I then used a conditional statement to sum only the length of data points in each vegetation type (i.e. conifer, meadow, residential).

(9) Calculate time spent in each distance class: I first created a column of 1’s to represent 1 minute per point. I then calculated the time each hiker spent in each distance class. I used a conditional statement to sum only the time spent in each vegetation type (i.e. conifer, meadow, residential).
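Steps (7) through (9) can be sketched together as one grouped summary. This is an illustration in plain Python, not the dplyr code used for the analysis; the record keys (dist_class, veg, length_m) and the sample rows are assumptions. Each row stands for one 1-minute point, so counting rows counts minutes.

```python
from collections import defaultdict


def summarise(points, veg_type):
    """Total and vegetation-specific length and time for each distance class.

    points: list of dicts with keys 'dist_class', 'veg', 'length_m' (hypothetical).
    Returns {dist_class: {'length', 'minutes', 'veg_length', 'veg_minutes'}}.
    """
    totals = defaultdict(
        lambda: {"length": 0.0, "minutes": 0, "veg_length": 0.0, "veg_minutes": 0}
    )
    for p in points:
        row = totals[p["dist_class"]]
        in_veg = 1 if p["veg"] == veg_type else 0  # dummy variable, as in step (7)
        row["length"] += p["length_m"]
        row["minutes"] += 1  # one row = one minute, as in step (9)
        row["veg_length"] += in_veg * p["length_m"]
        row["veg_minutes"] += in_veg
    return dict(totals)


# Made-up 1-minute points.
points = [
    {"dist_class": "0-20", "veg": "conifer", "length_m": 30.0},
    {"dist_class": "0-20", "veg": "residential", "length_m": 10.0},
    {"dist_class": "0-20", "veg": "residential", "length_m": 0.0},
    {"dist_class": "21-40", "veg": "conifer", "length_m": 50.0},
]
conifer = summarise(points, "conifer")
```

A point with zero length (someone standing still for a minute) still adds a minute of time, which is what lets time and length diverge.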

(10) I then calculated ‘Observed’ versus ‘Expected’ proportions.

Expected = length of [vegetation type] / total length within distance class (assumes people are traveling at a constant speed)

Observed = time spent in [vegetation type] / total time within distance class

This will tell me if people are spending more or less time in a certain vegetation type despite being close to water.
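As a worked illustration of the two definitions above, the following sketch computes both proportions for each distance class. The input layout and the numbers are made up for the example and are not results from the study.

```python
def observed_expected(summary):
    """Expected and observed proportions per distance class.

    summary: {dist_class: {'length', 'veg_length', 'minutes', 'veg_minutes'}}
    (hypothetical layout). Returns {dist_class: (expected, observed)}.
    """
    result = {}
    for dist_class, row in summary.items():
        # Expected: share of track length in the vegetation type
        # (what the time share would be at constant speed).
        expected = row["veg_length"] / row["length"]
        # Observed: share of minutes actually spent in the vegetation type.
        observed = row["veg_minutes"] / row["minutes"]
        result[dist_class] = (expected, observed)
    return result


# Made-up totals for one vegetation type.
summary = {
    "0-20": {"length": 1000.0, "veg_length": 300.0, "minutes": 50, "veg_minutes": 10},
    "21-40": {"length": 400.0, "veg_length": 100.0, "minutes": 20, "veg_minutes": 10},
}
proportions = observed_expected(summary)
```

In this invented example, the vegetation type covers 30% of the track length in the 0-20 class but accounts for only 20% of the time there, so people would be passing through it faster than a constant pace implies.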

(11) I did a chi-square test for each vegetation type to see if the observed vs expected distributions were significantly different.
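One way such a test could be set up is as a goodness-of-fit test across distance classes. The sketch below is an assumption about the setup, not a record of the exact call used in R: observed counts are minutes in a vegetation type per distance class, and expected proportions come from the share of track length. It computes only the statistic and degrees of freedom, then compares the statistic with standard 0.05 critical values instead of computing a p-value. All numbers are made up.

```python
# Standard chi-square critical values at alpha = 0.05 for selected df.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 7: 14.067}


def chi_square_gof(observed_counts, expected_props):
    """Chi-square goodness-of-fit statistic and degrees of freedom.

    observed_counts: counts per category (e.g. minutes per distance class).
    expected_props: expected share per category; rescaled to sum to 1.
    """
    total = sum(observed_counts)
    scale = sum(expected_props)
    expected_counts = [total * p / scale for p in expected_props]
    statistic = sum(
        (o - e) ** 2 / e for o, e in zip(observed_counts, expected_counts)
    )
    df = len(observed_counts) - 1
    return statistic, df


# Made-up example with four distance classes.
statistic, df = chi_square_gof([30, 20, 10, 40], [0.25, 0.25, 0.25, 0.25])
significant = statistic > CRITICAL_05[df]
```

With four classes there are 3 degrees of freedom, so a statistic above 7.815 would be significant at the 0.05 level. Expected counts below about 5 in any class make the approximation unreliable, which is relevant when some distance classes contain very few points.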

Description of Results Obtained:

The first three figures I generated showed me the overall proportions of the length of the hikers’ tracks grouped by distance class, and the time spent in each distance class. Figure 4 indicates that the five hikers spent 75% of their time within 0-20 meters of the lake. Additionally, over 60% of the entire track length was within 0-20 meters of the lakeshore. From a visual check, it looks like my hypothesis is being supported: people are spending more time near the water. I wonder if that’s a function of the lake or another variable?

Laura introduced me to a neat way to conceptualize differences in distributions, particularly when the distributions represent different units of measurement (i.e. time vs. length). Creating an odds ratio essentially divides the proportion of time by the proportion of length within each distance class. A value less than 1 indicates that people spent less time in the distance class than they would have if they had been traveling at a constant pace. A value greater than 1 indicates that people spent more time in the distance class than they would have if traveling at a constant pace. Figure 5 demonstrates that people were spending more time in the area when they were 0-20 meters from the lake shore and 61-80 meters from the lake shore. I am ignoring the 141-160 and 161-180 distance classes because the sample sizes are too small. Once again, it looks like the hypothesis is being supported. But is this because of the water feature? Or vegetation? Or both?
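The ratio described above can be sketched as follows, assuming per-class totals of minutes and track length are already available. The function name and the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def time_to_length_ratio(minutes_by_class, length_by_class):
    """Proportion of time divided by proportion of length, per distance class.

    Both arguments are dicts keyed by distance class (hypothetical layout).
    Classes with zero track length are skipped to avoid dividing by zero.
    """
    total_minutes = sum(minutes_by_class.values())
    total_length = sum(length_by_class.values())
    ratios = {}
    for dist_class, length in length_by_class.items():
        if length == 0:
            continue
        time_prop = minutes_by_class.get(dist_class, 0) / total_minutes
        length_prop = length / total_length
        ratios[dist_class] = time_prop / length_prop
    return ratios


# Made-up totals: 75% of time but 60% of length falls in the first class.
ratios = time_to_length_ratio(
    {"0-20": 75, "21-40": 25},
    {"0-20": 600.0, "21-40": 400.0},
)
```

In this invented case the ratio for 0-20 is about 1.25 (more time than a constant pace implies) and for 21-40 about 0.625 (less time). Classes with very few points produce unstable ratios, which is the reason given above for ignoring the two most distant classes.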

Figure 6 illustrates the proportion of vegetation types within each distance class. It looks like 0-20 meters from the lake shore, the predominant vegetation types are conifer woodland (~ 50%) and residential and facilities (~ 40%), i.e. paved parking lots, toilets, picnic areas, paved sidewalks.

The next set of figures illustrates the expected amount of time a person would spend in specific vegetation types within each distance class if they were traveling at a constant rate, compared to the observed amount of time that person actually spent in each vegetation type within each distance class. Coniferous woodland was the first vegetation type I examined. Figure 7 indicates that people were spending less time in conifer despite being closer to water! The further away from water, the more time they spent in conifer. The chi-square test indicates these differences in distributions are statistically significant (χ2 = 90.359, df = 7, p-value < .001). Figure 8 demonstrates the odds ratio of Observed/Expected. Interestingly, Figure 5 suggests people are spending more time near water, yet when there’s conifer present, this is not the case; I wonder if there is another vegetation type explaining this?

Figure 9 represents the observed versus expected distributions for herbaceous vegetation types, in other words, open meadows. Interestingly, it appears that the hikers spent more time in meadows than was expected (see Figure 10 for the odds ratio). It also appears that meadows are prevalent further away from shore. Back in Figure 5 we learned that people are spending, on the whole, more time in areas 0-20 meters from shore and 61-80 meters from shore. Perhaps for the 61-80 meter distance class, this can be explained by the presence of open meadows. However, the chi-square test indicates that these distributions are not significantly different from one another (χ2 = 7.4523, df = 3, p-value = .06).

Figure 11 is very revealing as it indicates that people are spending more time in residential facilities that are close to shore. This includes areas with picnic tables, paved sidewalks, and some areas of the parking lot. This makes sense, as I imagine people are drawn to the amenities that are offered. Also, the chi-square test indicates that the differences in these distributions are significant (χ2 = 197.86, df = 3, p-value < .01). Seeing these results, I wonder if next time I should modify my hypothesis and examine whether and how distance from vehicle explains behavior.

Critique of Method

I created a workflow that made sense to me, but overall the process was very time consuming, particularly in ArcMap. In my experience thus far, I have appreciated the visualization ArcMap provides and find the tools easy to learn and navigate. However, during this particular exercise ArcMap was especially clunky and slow to process. I was glad to use ArcMap so I could gain experience using the software, but many hours elapsed before I could build an adequate attribute table for subsequent analysis in R. Fortunately, thanks to this class, I am also growing more and more comfortable in R, so, with relative ease, I was able to quickly modify my data and build code that allowed me to analyze and visually represent the data.

Another limitation and critique of this method is that I am only using a conveniently chosen subset of 5 GPS tracks. Therefore, my sample size of humans (not GPS points) is low and not randomly selected; thus, I can’t generalize beyond the five hikers in the subsample. Given the time constraints at this point in the term, I decided to stick with the five hikers rather than boosting my sample size. Despite the low sample size, this type of analysis introduced me to a new approach for examining relationships between recreationists and their surrounding environment. I plan to adopt these methods for future research.

Driven by curiosity, at the end of this exercise I looked at the tracks in ArcMap again. It appears that the majority of use is occurring near the parking lot. Additionally, the ‘meadow’ in this dataset is sandwiched between residential facilities (i.e. the parking lot and picnic tables). This situation makes me wonder if distance to vehicle or parking lot is also playing a role in the response. Further, I wonder if I should even classify that polygon as meadow, especially since it is unclear if people are spending time there because of the vegetation or because of the distance to the vehicle/easy access to facilities. If I had more time, I would examine this additional variable.

