**The research question that you asked.**

Broadly, my research question at the beginning of the course was: “what spatial and temporal patterns emerge from day-use hikers in Grand Teton National Park?”

I appreciated starting out with a broad, exploratory question as it allowed me to think creatively and learn a variety of approaches for analyzing and conceptualizing human behavior using GPS data. As the course progressed, and as I examined patterns within the data, my research question became more specific:

*“What are the relationships among the spatial and temporal behavior of recreationists and the environmental features within the recreation area?”*

**A description of the dataset you examined, with spatial and temporal resolution and extent.**

Throughout the course, I used a subset of five GPS tracks (i.e., five hikers) collected on July 19, 2017. This subset came from a collection of 652 GPS tracks of day-use visitors at String and Leigh Lakes in Grand Teton National Park. The GPS units were distributed to a random sample of visitors between July 15 and September 8, 2017. Each intercepted visitor was asked to carry the GPS unit for the duration of their visit at String and Leigh Lakes. When deploying the units, study technicians also recorded the total number of people in the group and the intended destination for their day visit. To maintain independence between samples, only one GPS unit was given to each group.

The GPS units used in this study were Garmin eTrex 10 units. These units collected point data every 5 seconds. The GPS tracks were saved as point features for analysis in ArcGIS so that each visitor’s hiking path could be represented by a series of points. The positional accuracy of these units can vary by up to 15 meters. However, the Garmin units were calibrated against a high-accuracy Trimble GPS unit, which indicated a low average positional error of 1.18 meters.

**Hypotheses: predictions of patterns and processes you looked for.**

*I hypothesize that people will spend more time, have shorter step lengths, and have more acute turning angles the closer they are to a water feature.*

This hypothesis is grounded in an assumption that summertime recreationists are drawn to open viewscapes, particularly cooling water features like lakes, waterfalls, and streams. The recreation site where the data were collected contains stunning lakes that are couched right on the edge of the Teton mountain range. Perhaps hikers will feel compelled to stop and sightsee the closer they are to these water features, and will stop less often the farther away they are from them.

**Approaches: analysis approaches you used.**

I sourced data layers representing vegetation cover and elevation from https://irma.nps.gov/. I used tools in ArcMap to build and calculate attributes for analysis. All analyses were conducted in R.

*Exercise 1:* (a) I used R to plot spatially explicit graphs of human movement through space and time. This allowed me to visually examine how people were using the recreation site and provide context for subsequent analysis.

(b) I created histograms representing the proportions of the step lengths and turning angles of the five recreationists. This approach provided me with a better understanding of the distributions and characteristics of the response variable.

*Exercise 2*: I generated box plots and scatter plots that represented the relationships among various environmental features — my independent variables — in the study area. The variables I examined were: vegetation, elevation, and distance to water.

*Exercise 3*: I conducted OE analysis (observed vs. expected) to examine if hikers were spending significantly more or less time in certain vegetation types despite being near water. I followed this with a chi-square test to see if the differences in distributions of observed vs expected were statistically significant.

*Final analysis:* (a) I ran the data through the hidden Markov model package “moveHMM” in R to estimate the probability of people changing from state 1 (short steps, acute turning angles) to state 2 (long steps, obtuse turning angles) as a function of their distance to water.

(b) I also attempted a multiple linear regression on the data. Unfortunately, I couldn’t normalize the distribution of step length, which made me realize this approach may not be an adequate choice for this dataset.

**Results: what did you produce — maps? Statistical relationships? other?**

Throughout the course I produced numerous graphs representing recreationists’ movements through space and time. Figure 1 shows a simple plot of the five tracks I worked with. This plot indicates that most people in this sample stayed close to shore and relatively close to the parking lot. This makes sense, as the trail hugs the lake shore.

**Figure 1.** A visual representation of the five tracks used throughout the analysis. All tracks collected on July 19, 2017.

Figure 2 summarizes the distribution of step lengths and turning angles for all five hikers. Step length is calculated as the distance from one GPS point to the next (all GPS points have a temporal resolution of 1 minute). This figure suggests that these hikers typically walked straight, and that nearly 40% of the time their step lengths, or distances between points, were 41-60 meters.

**Figure 2.** Distributions of turning angle (left) and step length (right).
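To make the two response variables concrete, here is a minimal sketch of how step length and turning angle could be computed from an ordered list of points. The analysis in this post was done in R; this Python version is only an illustration. It assumes planar coordinates in meters (for example, a projected coordinate system rather than latitude/longitude), and it takes turning angle to mean the change in heading between consecutive steps, which is the usual convention in movement analysis but is an assumption here. The function name and the example track are hypothetical.

```python
import math

def steps_and_turning_angles(points):
    """Return (step_lengths, turning_angles) for an ordered list of (x, y) points.

    Assumes planar coordinates in meters (e.g., a projected CRS), not lat/lon.
    """
    step_lengths = []
    headings = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        step_lengths.append(math.hypot(dx, dy))  # straight-line distance of the step
        headings.append(math.atan2(dy, dx))      # direction of travel, in radians

    turning_angles = []
    for h0, h1 in zip(headings, headings[1:]):
        # Change in heading between consecutive steps, wrapped into [-pi, pi)
        turning_angles.append((h1 - h0 + math.pi) % (2 * math.pi) - math.pi)
    return step_lengths, turning_angles


# Hypothetical track: two 50 m steps east, then one 50 m step north.
track = [(0.0, 0.0), (50.0, 0.0), (100.0, 0.0), (100.0, 50.0)]
steps, angles = steps_and_turning_angles(track)
```

A hiker walking straight produces turning angles near zero, while someone milling around a picnic area produces short steps and large turning angles. Note that a track of n points yields n - 1 steps but only n - 2 turning angles.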

Once I had a better understanding of the features of my response variable, I was curious to learn more about the environmental variables that I thought could be influencing human behavior. I also wanted to explore these variables to check for any confounding relationships between human movement and distance to water (my overarching hypothesis).

Examining the environmental features suggested that elevation may not be an influential variable in relation to the behavior of the recreationists (Figure 3). The range in elevation within the hikers’ movement paths was only 10 meters. However, the results did indicate that the predominant vegetation type for the hikers was coniferous woodland, *and* that conifer was primarily located near the lake shore. Perhaps the presence of conifer is playing a role in the amount of time a hiker spends near water. In other words, is the conifer deterring a person from stopping, even though they are near water? These questions encouraged me to examine the relationships among the recreationists’ *time* spent near water and the *vegetation type* they were in.

**Figure 3**. Box plots representing vegetation type relative to elevation (top left) and distance to water (top right). Elevation plotted against distance to water.

Before diving into the relationship between time, distance to water, and vegetation, I first calculated the overall proportions of the *length of the hikers’ tracks* grouped by distance class, and the *time spent in each distance class*. Figure 4 indicates that the five hikers spent 75% of their time 0-20 meters from the lake. Additionally, over 60% of the entire track length is 0-20 meters from the lakeshore. From an ocular check, it looks like my hypothesis is being supported: people are spending more time near the water. But is this a function of the lake feature or some other variable?

**Figure 4.** Proportion of time spent in the area compared to the length of the track (left). Frequency of vegetation types along every distance class (right).

The next set of figures illustrates the *expected* amount of time a person would spend in specific vegetation types within each distance class if they were traveling at a constant rate, compared to the *observed* amount of time that person actually spent in each vegetation type within each distance class. Coniferous woodland was the first vegetation type I examined.

Additionally, I was introduced to a neat way to conceptualize differences in distributions, particularly when the distributions represent different units of measurement (i.e., time vs. length). Creating an odds ratio essentially divides the proportion of time by the proportion of track length in each distance class. A value less than 1 indicates that people spent less time in the distance class than they would have if they had been traveling at a constant pace. A value greater than 1 indicates that people spent more time in the distance class than they would have if traveling at a constant pace.
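As a rough illustration of the ratio described above, the sketch below divides the share of time by the share of track length for each distance class. The shares are made-up values chosen for easy arithmetic, not results from this study, and the function name is hypothetical.

```python
def time_to_length_ratio(time_share, length_share):
    """Divide the share of time by the share of track length, per distance class.

    Returns None for a class with no track length, where the ratio is undefined.
    """
    ratios = {}
    for distance_class, t in time_share.items():
        l = length_share[distance_class]
        ratios[distance_class] = t / l if l > 0 else None
    return ratios


# Hypothetical shares (each set sums to 1); NOT the values from this study.
time_share = {"0-20 m": 0.50, "21-40 m": 0.30, "41-60 m": 0.20}
length_share = {"0-20 m": 0.40, "21-40 m": 0.40, "41-60 m": 0.20}

ratios = time_to_length_ratio(time_share, length_share)
```

In this toy example the nearest class has a ratio above 1 (people linger), the middle class has a ratio below 1 (people pass through), and the farthest class sits at 1 (time is proportional to distance covered).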

Figure 5 indicates that people were spending *less time* in conifer despite being closer to water! The farther from water they were, the *more time* they spent in conifer. The chi-square test indicates these differences in distributions are statistically significant **(χ² = 90.359, df = 7, p-value < .001).**

**Figure 5.** Observed/Expected for coniferous vegetation types grouped by distance class.
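The observed-versus-expected comparison can be sketched as a Pearson chi-square goodness-of-fit calculation. This is an assumption about the form of the test, not a record of the exact call used for this post. The expected time in each distance class is taken to be the total observed time multiplied by that class's share of track length (the constant-pace assumption). The counts and shares below are hypothetical.

```python
def chi_square_observed_vs_expected(observed, length_share):
    """Pearson chi-square statistic for observed time vs. constant-pace expectation.

    observed     : time (e.g., minutes or GPS fixes) recorded in each distance class
    length_share : share of track length in each class (must sum to 1, all > 0)
    """
    total = sum(observed)
    # Under a constant pace, time is distributed in proportion to track length.
    expected = [total * share for share in length_share]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # goodness-of-fit degrees of freedom
    return expected, statistic, df


# Hypothetical values for three distance classes; NOT data from this study.
observed_minutes = [30, 50, 20]
track_length_share = [0.5, 0.3, 0.2]

expected_minutes, chi2, df = chi_square_observed_vs_expected(
    observed_minutes, track_length_share
)
```

The p-value then comes from the chi-square distribution with `df` degrees of freedom, for example via `chisq.test` in R or `scipy.stats.chi2.sf` in Python. The degrees of freedom reported in this post (7 and 3) would correspond to 8 and 4 distance classes under this form of the test.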

Figure 6 represents the observed versus expected distributions for herbaceous vegetation types (e.g., open meadows). Interestingly, it appears that the hikers spent *more time* in meadows than was expected. It also appears that meadows are prevalent farther from shore. Back in Figure 5 we learned that people are spending, on the whole, more time in areas 0-20 meters from shore and 61-80 meters from shore. Perhaps for the 61-80 meter distance class, this can be explained by the presence of open meadows. However, the chi-square test indicates that these distributions are not significantly different from one another **(χ² = 7.4523, df = 3, p-value = .06).**

**Figure 6.** Observed/Expected for meadow vegetation types grouped by distance class.

Figure 7 is very revealing, as it indicates that people are spending more time in residential facilities that are close to shore. This includes areas with picnic tables, paved sidewalks, and some areas of the parking lot. This makes sense, as I imagine people are drawn to the amenities that are offered. Also, the chi-square test indicates that the differences in these distributions are significant **(χ² = 197.86, df = 3, p-value < .01).** Seeing these results, I wonder if next time I should modify my hypothesis and examine if and how *distance from vehicle* explains behavior.

**Figure 7.** Observed/Expected for residential facilities grouped by distance class.

These results suggested that environmental variables may play a larger role in human movement and behavior than originally hypothesized. As a final step in this exploratory journey I experimented with the moveHMM package to identify what the probabilities are of a person changing from state 1 (short step lengths, acute turning angles) to state 2 (long step lengths, obtuse turning angles) as a function of their distance to water. It is important to note that I still don’t have a comprehensive understanding of hidden Markov Models. Therefore, the following figures represent only a fraction of the output generated from this package. Further, the following figures aim to demonstrate how outdoor recreation scientists could potentially use these methods to better understand human behavior.

Figure 8 represents the state-dependent distributions of step length and turning angle in the two-state model. Figure 9 shows us that, indeed, as the distance to water increases, the likelihood that a person will transition from state 1 (slow) to state 2 (fast) increases. According to this output, my hypothesis is being supported!

While HMMs provide a statistically rigorous framework for incorporating covariates, allow for the autocorrelation commonly experienced with GPS data, and enable researchers to make inferences about changes in behavioral states, my own knowledge of these processes remains limited. Although it was relatively simple for me to push the data through an HMM model, this final step in my analysis still leaves me with a lot of questions. Further, I was unable to build an HMM model that controls for the vegetation type; thus I have a feeling that only including distance to water as a covariate may not be telling the whole story.

**Figure 8.** State-dependent distributions in the 2 state model.

**Figure 9.** Effect of the covariate ‘distance to water’ on the transition probabilities.
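For readers wondering how a covariate enters the transition probabilities: moveHMM links covariates to transitions through a multinomial logit, which in a two-state model reduces to a logistic curve for each off-diagonal transition. The sketch below shows that curve for the state 1 to state 2 transition. The coefficient values are hypothetical and chosen only for illustration; they are not the estimates from the fitted model in this post.

```python
import math

def transition_prob_1_to_2(distance_m, beta0, beta1):
    """Probability of switching from state 1 to state 2 in a two-state model.

    Uses a logistic link on a linear predictor: eta = beta0 + beta1 * distance.
    beta0 and beta1 are hypothetical coefficients, not fitted values.
    """
    eta = beta0 + beta1 * distance_m
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients: a positive slope means the chance of switching
# to the long-step state grows with distance to water.
BETA0, BETA1 = -2.0, 0.05

probs = {d: transition_prob_1_to_2(d, BETA0, BETA1) for d in (0, 40, 80)}
```

With a positive slope, the curve rises with distance, which is the shape that would support the hypothesis. Adding vegetation type as a second covariate would mean adding further terms to the linear predictor.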

As a final exercise, I was curious to see if it was possible to do a multiple linear regression on my own as another way to address the hypothesis. First, I checked for autocorrelation in the data. The output below represents one track.

**Figure 10.** Autocorrelation on distance to water (above) and step length (below).

The ACF plot informed me that the spacing at which observations are no longer autocorrelated is about 2 minutes. I subsequently subsampled the track to select only points that are more than two minutes apart. After re-sampling, I re-tested for autocorrelation: both variables are no longer autocorrelated (Figure 11).

**Figure 11.** Autocorrelation on re-sampled data on distance to water (above) and step length (below).
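The check-then-thin procedure can be sketched as follows. This is an illustrative Python version rather than the R code used for the post. It computes the sample autocorrelation at a given lag, the approximate 95% white-noise bound that autocorrelation plots commonly draw, and a thinned series that keeps every k-th observation. With 1-minute fixes, keeping every third point leaves observations more than two minutes apart; that choice of k is an assumption based on the description above, and the example series is made up.

```python
import math

def autocorrelation(values, lag):
    """Sample autocorrelation of a series at the given lag (lag 0 gives 1.0)."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    num = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return num / denom

def white_noise_bound(n):
    """Approximate 95% bound for the autocorrelation of an uncorrelated series."""
    return 1.96 / math.sqrt(n)

def thin(values, every):
    """Keep every `every`-th observation, starting with the first."""
    return values[::every]


# Hypothetical, strongly trending series standing in for distance to water.
series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
r1 = autocorrelation(series, 1)
bound = white_noise_bound(len(series))
thinned = thin(series, 3)  # every third 1-minute fix
```

After thinning, the same `autocorrelation` function would be run on the shorter series to confirm that the remaining lags fall inside the bound. The trade-off is sample size: thinning discards most of the fixes.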

I then tested for normality. The distributions were skewed, so I did a square root transformation on the *distance to water*. This normalized the distribution and a Shapiro-Wilk normality test provided a p-value > .05, suggesting the distribution is not statistically different from the normal curve.

**Figure 12**. Density plot of distance to water before (above) and after (below) transformation.
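One quick way to see what a square root transformation does to a right-skewed variable is to compare sample skewness before and after. The sketch below is only an illustration with made-up numbers; it does not replace a formal test such as Shapiro-Wilk (`shapiro.test` in R, `scipy.stats.shapiro` in Python), and the function name is hypothetical.

```python
import math

def sample_skewness(values):
    """Moment-based sample skewness: positive means a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical right-skewed distances to water, in meters; NOT study data.
distance_to_water = [1, 4, 9, 16, 25, 36, 100]
sqrt_distance = [math.sqrt(d) for d in distance_to_water]

skew_raw = sample_skewness(distance_to_water)
skew_sqrt = sample_skewness(sqrt_distance)
```

Here the transformation pulls in the long right tail, so the skewness drops. A transformation is not guaranteed to help, though, as the step length results show.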

Unfortunately, I wasn’t able to normalize step length. Doing a log transform or square root just made the skewness even more severe.

**Figure 13.** Density plot of step length. The responses are not normally distributed.

At this point, I realized that a regression was perhaps not the best approach, especially since I wasn’t able to normalize the distribution of the response variable (step length). However, despite the cliffhanger ending, I intentionally included this final exercise in the blog post to demonstrate how autocorrelation can be checked and accounted for in GPS data.

**Significance. What did you learn from your results? How are these results important to science? to resource managers?**

These results revealed to me that environmental features such as vegetation and open viewsheds influence the behavior of recreationists. In particular, vegetation type played a larger role than I originally anticipated: I was surprised to see how much it influenced the amount of time the hikers spent in the area.

Overall, these results also demonstrate how recreationist activity can be measured and analyzed to develop deeper understandings of behavior.

These results are important to both science and resource managers. Parks and protected area land managers strive to provide a quality user experience while also protecting natural and cultural resources. Accurately understanding how people move and behave in a recreation system allows for more informed management decision making. For example, understanding what environmental conditions influence behavior could indicate a need for additional infrastructure, signage, or educational initiatives, depending on the management objectives for the area. Additionally, when statistical tools are used to analyze these relationships, the results can have predictive power for managers.

In the scientific and academic communities, applying spatial methods to outdoor recreation science allows for a more accurate understanding of how people move, experience, and interact in outdoor spaces. By integrating GIScience with other common social science techniques in outdoor recreation — such as surveys, observations, and interviews — scientists glean richer results that can support and contribute to existing theory, generate deeper understandings about human behavior, and inspire additional studies.

**Your learning: what did you learn about software (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) other?**

I worked exclusively in ArcMap and R. I went into this class with very limited knowledge and minimal confidence with the software; now, I am coming out of the class with more confidence in my ability to learn and figure it out. I appreciated the exploratory, self-guided structure of the class which provided space to play around with the technology and make mistakes. Even more so, I appreciated the insight of Julia and Laura, as well as the tips, tricks, and support from fellow classmates.

I learned how to work with GPS data in ArcMap and source data layers that were relevant to my research question. I learned various tools in ArcMap that allowed me to make calculations (e.g., the TrackAnalyst tool and spatial join tools).

I learned how to work with spatial data in R and became more efficient at aggregating and manipulating dataframes for analysis. I also learned how to represent the data in a way that is meaningful to other audiences.

**What did you learn about statistics, including (a) hotspot, (b) spatial autocorrelation (including correlogram, wavelet, Fourier transform/spectral analysis), (c) regression (OLS, GWR, regression trees, boosted regression trees), and (d) multivariate methods (e.g., PCA)?**

I learned how to do observed vs. expected analysis and how to use a chi-square test to check for significant differences in the distributions. I learned more about the application of hidden Markov models in developing probability models representing changes in human behaviors. I also was able to work a little with autocorrelation.

Beyond my project I learned quite a bit from listening to the tutorials of other classmates. I am now (ever so slightly) more familiar with concepts like spatial and temporal autocorrelation, kriging, and geographically weighted regression. Additionally, it was extremely helpful working with Susie who had a similar dataset and was also using a hidden Markov Model approach.

**References**

Michelot, T., Langrock, R., Patterson, T. (2017). *An R package for the analysis of animal movement data.* Available: https://cran.r-project.org/web/packages/moveHMM/vignettes/moveHMM-guide.pdf.

leatherl — June 15, 2018 @ 3:40 pm

Jenna, it’s been so cool to see and talk to you about the progression of this project! The higher observed frequency in residential areas definitely sticks out. I like and echo your and Julia’s suggestions about including distance from vehicle as a variable. Another variable to work up might be “distance from vegetation?” E.g., if recreationists are further away from trees, are they spending more time in residential areas? The trick here seems to be to get at combinations of variables that capture the behavior of the recreationist, rather than the configuration of the landscape! Nice work!

jonesju — June 15, 2018 @ 7:10 am

Excellent work. Next steps: Maybe try calculating spatial adjacency of vegetation types to other types, which will show that the meadow is next to the parking lot and the facilities. Also use distance from the car as a possible explanatory variable, as you mentioned.

swanssam — June 15, 2018 @ 6:55 am

Hey Jenna, great work. It looks like you became much more comfortable using R for data analysis and representation throughout this course. That’s really difficult and something you should be proud of! I mentioned this in Suzie’s post as well, but I’m curious if there are features on the landscape specific to Grand Teton National Park that may influence the behavior of visitors. For example, GPS tracks of tourists in Yosemite likely would show high step lengths and low turning angles until reaching the base of Half Dome, then short step lengths and high turning angles as they peruse the many shops, visitor centers, and short hiking trails in Yosemite Valley. Is there any similar feature in your study area? Either way, nicely done and good luck in your future endeavors!