*Question Asked*

For Exercise 3, I wanted to explore the degree to which two environmental covariates influence the transition probabilities between two behavioral states using a hidden Markov model approach. To operationalize the behavioral states, paired step length and turning angle measurements were generated from the raw GPS tracks, as described in my Exercise 1 blogpost. Histograms of the step length and turning angle distributions for five sample tracks suggested that two states may be emerging from the data: 1) a state characterized by small step lengths and wide turning angles and 2) a state characterized by large step lengths and very narrow (near zero) turning angles. These two patterns resemble the behavioral states commonly used to describe animal movement, to which hidden Markov model approaches have been applied.
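As a rough illustration of what paired step length and turning angle measurements are, the sketch below derives both from a short sequence of fixes. It is not the Exercise 1 workflow: it assumes the GPS fixes have already been projected to planar x/y coordinates in meters, and the function name `steps_and_turns` and the `track` values are hypothetical.

```python
import math

def steps_and_turns(points):
    """Return (step_lengths, turning_angles) for projected (x, y) fixes in meters."""
    steps = []
    headings = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        steps.append(math.hypot(dx, dy))      # straight-line distance between fixes
        headings.append(math.atan2(dy, dx))   # direction of travel in radians
    turns = []
    for h0, h1 in zip(headings, headings[1:]):
        d = h1 - h0
        # wrap the change in heading into [-pi, pi]
        turns.append(math.atan2(math.sin(d), math.cos(d)))
    return steps, turns

# Hypothetical four-fix track: two steps east, then a left turn north.
track = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0), (200.0, 50.0)]
steps, turns = steps_and_turns(track)
print(steps, turns)
```

A track with n fixes yields n - 1 step lengths and n - 2 turning angles, which is why the first step of a track has no turning angle paired with it.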

In order to fit a hidden Markov model to the step length and turning angle data using the moveHMM tool (Exercise 3, Part 2), a distribution and its associated parameters must be defined for step length and for turning angle in each state. The moveHMM tool states that one of three possible distributions must be chosen for step length (gamma, Weibull, or lognormal) and one of two for turning angle (von Mises or wrapped Cauchy). Therefore, for Exercise 3 Part 1, I pose the following research questions related to the distributions of step length and turning angle:

- Of gamma, lognormal, and Weibull distributions, which distribution best fits the step length dataset?
- For the best fitting distribution, what are the associated model parameters for state 1 and state 2 behaviors?

- Of von Mises and wrapped Cauchy, which distribution best fits the turning angle dataset?
- For the best fitting distribution, what are the associated model parameters for state 1 and state 2 behaviors?

*Tool/Approach Used*

Earlier in the term, a classmate faced with a similar need to define distributions for her data discovered and presented on an R package called fitdistrplus (Delignette-Muller & Dutang, 2014), which fits a variety of distributions to user-provided data. Remembering that she had presented on Weibull, lognormal, and gamma distributions, I decided to answer my research questions using the fitdistrplus R package.

*Description of Steps Used to Complete the Analysis*

My overall approach to the analysis was exploratory: I followed a learn-by-doing approach to understanding the best inputs for fitting the model. Of the variety of model-fitting options available within the fitdistrplus R package, I ended up using the following three on my data:

- Maximum likelihood estimation
- Maximum goodness of fit estimation with a Cramer-von Mises distance
- Maximum goodness of fit estimation with a Kolmogorov-Smirnov distance

I selected these approaches because, of the methods provided by the program (moment matching estimation and quantile matching estimation are also options), these three seemed the most straightforward, were referenced by other researchers in the literature and in discussion, and would provide the parameter estimates needed in later analytical phases.
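To make the three criteria more concrete, here is a minimal sketch in plain Python rather than fitdistrplus itself. It uses the lognormal distribution because its maximum likelihood estimates have a closed form, then evaluates the Kolmogorov-Smirnov and Cramer-von Mises distances between the data and the fitted curve. The maximum goodness of fit methods search for the parameters that minimize those distances; this sketch only computes the distances at the maximum likelihood fit. The step lengths and all function names are hypothetical.

```python
import math

def lognormal_mle(data):
    """Closed-form maximum likelihood estimates (meanlog, sdlog)."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    meanlog = sum(logs) / n
    sdlog = math.sqrt(sum((v - meanlog) ** 2 for v in logs) / n)
    return meanlog, sdlog

def lognormal_cdf(x, meanlog, sdlog):
    """Cumulative distribution function of the lognormal distribution."""
    z = (math.log(x) - meanlog) / (sdlog * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def ks_distance(data, cdf):
    """Kolmogorov-Smirnov distance: largest gap between empirical and fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))

def cvm_distance(data, cdf):
    """Cramer-von Mises distance: sum of squared gaps between empirical and fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    return 1.0 / (12 * n) + sum(
        (cdf(x) - (2 * i + 1) / (2 * n)) ** 2 for i, x in enumerate(xs)
    )

# Hypothetical step lengths in meters, not the data from this exercise.
steps = [35.0, 60.0, 80.0, 120.0, 150.0, 210.0, 480.0, 590.0, 640.0, 700.0]
meanlog, sdlog = lognormal_mle(steps)
fitted_cdf = lambda x: lognormal_cdf(x, meanlog, sdlog)
ks = ks_distance(steps, fitted_cdf)
cvm = cvm_distance(steps, fitted_cdf)
print(meanlog, sdlog, ks, cvm)
```

The Kolmogorov-Smirnov distance responds only to the single worst mismatch, while the Cramer-von Mises distance accumulates mismatch across the whole range of the data, so the two can favor slightly different parameter values.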

I also generated parameter estimates for three different slices of my data. First, I generated parameter estimates for all of the step length and turning angle data. Then I generated estimates for a rough cut at two behavioral states based on step length: step lengths less than 400 meters (state 1) and step lengths of 400 meters or greater (state 2). This cutoff point emerged from my data as a trough in an otherwise roughly bimodal step length distribution, with one grouping of small step lengths (100 meters or less) and another of larger step lengths (around 600 meters).
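The slicing step can be sketched as follows. Only the 400 meter cutoff comes from the text above; the step lengths and the helper name `split_by_cutoff` are hypothetical.

```python
from statistics import mean

CUTOFF_M = 400.0  # trough in the step length histogram, as described in the text

def split_by_cutoff(step_lengths, cutoff=CUTOFF_M):
    """Split step lengths into state 1 (< cutoff) and state 2 (>= cutoff)."""
    state1 = [s for s in step_lengths if s < cutoff]
    state2 = [s for s in step_lengths if s >= cutoff]
    return state1, state2

# Hypothetical step lengths in meters.
steps = [35.0, 60.0, 95.0, 150.0, 399.9, 400.0, 580.0, 610.0, 655.0]
state1, state2 = split_by_cutoff(steps)
print(len(state1), mean(state1))
print(len(state2), mean(state2))
```

A hard cutoff like this assigns each step to exactly one state, whereas the hidden Markov model allows the two state distributions to overlap, so estimates from the slices are best treated as rough values.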

To fit the individual models, I used the “fitdist” function for each combination of data and model fitting type. I also used the “plot” and “fitdistres” functions to generate visualizations of the distributions.

*Description of Results Obtained*

Through my exploration of the tool, the various methods for fitting models, and my three different data slices, I ended up generating parameter estimates for 45 different combinations of distribution, fitting method, and data slice. Of those combinations, the results presented below were the most interesting and the most influential in my conclusions:

*Step Length*

Figure 1 displays the histogram and theoretical densities for the step length dataset as a whole, showing curves for the Weibull, lognormal, and gamma distributions, generated using maximum likelihood estimation as the model fitting technique. Among the three model fitting techniques tested, I did not see any practically significant differences between the curves produced or the estimated model parameters. Therefore, for the remainder of this blogpost I’ll present figures from the maximum likelihood estimation method, which is the default method for the fitdistrplus tool. Looking at the distribution of the step length data all together, I concluded that the Weibull and gamma distributions did not differ from each other in a practical sense, at least for the dataset as a whole. I also noticed the bimodal nature of the step length histogram (x axis labeled “data”) and realized that I should probably divide the data into two sets in order to generate parameter estimates for two behavioral states.

Figure 2 displays the histogram and theoretical densities for the state 1 slice of the step length dataset (step lengths less than 400 meters), showing curves for the Weibull, lognormal, and gamma distributions, generated using maximum likelihood estimation as the model fitting technique. Looking at the state 1 distributions, I noticed that the gamma curve fell between the Weibull curve, the more conservative of the three, and the lognormal curve, the more extreme. Given that the lognormal curve appeared to be driven more by the height of the frequency distribution and the Weibull curve more by its width, the gamma distribution emerged as a potential contender for use in modeling the data.

Figure 3 displays the histogram and theoretical densities for the state 2 slice of the step length dataset (step lengths of 400 meters and greater), showing curves for the Weibull, lognormal, and gamma distributions, generated using maximum likelihood estimation as the model fitting technique. Looking at the state 2 distributions, noticeable differences emerged between the Weibull curve and the gamma and lognormal curves. The Weibull curve seemed to be much more influenced by the height of the frequency distribution than by its width. Some background exploration of my data revealed that the height was being driven primarily by a single track’s data (of the set of 5 tracks under exploration in this class). Therefore, to avoid unduly weighting one track’s behavior over the others’, I ruled out Weibull for use in the moveHMM tool.

Between gamma and lognormal, gamma is more frequently used by movement ecologists in modeling step length data. Given that the results of the distribution explorations did not practically differ between the lognormal and gamma, I felt confident moving forward with the gamma parameter estimates. The estimates of the gamma shape and rate parameters for each state are as follows:

State 1 Shape = 2.07, Rate = 0.031

State 2 Shape = 89.09, Rate = 0.140
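Shape and rate are not very intuitive on their own, so a small sketch can translate them into a mean and standard deviation using the standard gamma identities mean = shape / rate and sd = sqrt(shape) / rate. My understanding is that moveHMM expects the gamma step length distribution in terms of mean and standard deviation, but that is an assumption to check against the moveHMM documentation. The function name `gamma_mean_sd` is hypothetical.

```python
import math

def gamma_mean_sd(shape, rate):
    """Convert gamma shape/rate to mean and standard deviation."""
    mean = shape / rate
    sd = math.sqrt(shape) / rate
    return mean, sd

# Shape and rate estimates reported above for each state.
estimates = {"state 1": (2.07, 0.031), "state 2": (89.09, 0.140)}

converted = {state: gamma_mean_sd(shape, rate) for state, (shape, rate) in estimates.items()}
for state, (mean, sd) in converted.items():
    print(f"{state}: mean = {mean:.1f} m, sd = {sd:.1f} m")
```

The implied mean step lengths, roughly 67 meters for state 1 and roughly 636 meters for state 2, line up with the two groupings seen in the step length histogram.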

*Turning Angle*

The turning angle parameters showed less variability from the outset of the project; my use of the fitdistrplus tool for turning angle was more for my own learning than for deriving data-driven parameter estimates for the moveHMM model. Through the fitdistrplus tool I wanted to explore whether wrapped Cauchy or von Mises was a better fit for the data. Unlike the step length plots, I was unable to create comparative plots for the wrapped Cauchy and von Mises distributions (this might be user error). Instead, I generated the diagnostic plots below for each distribution for each state. I ran the model fitting analyses for turning angle using the states defined by step length presented previously, because I do not yet know enough about either distribution or the turning angle data generally to make an informed estimate of what the values might be for each state. Figures 4 and 5 below present the diagnostic plots for the wrapped Cauchy distribution. Ultimately, I selected this distribution because its Q-Q and P-P plots indicated a marginally better model fit than those for von Mises; however, both sets of diagnostic plots indicated well-fitting distributions, as shown by the small deviation of the points from the reference lines.

The wrapped Cauchy parameter estimates (location and concentration) for the two states are as follows:

State 1 Location = -0.014, Concentration = 0.15

State 2 Location = -0.015, Concentration = 0.06
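To get a feel for what these numbers imply, the sketch below evaluates the wrapped Cauchy density at the estimated parameters. It assumes the concentration reported above is the usual parameter rho between 0 and 1, with density (1 - rho^2) / (2 pi (1 + rho^2 - 2 rho cos(theta - mu))); if the fitting function used a different parameterization, the values would not carry over. Function names are hypothetical.

```python
import math

def wrapped_cauchy_pdf(theta, location, concentration):
    """Wrapped Cauchy density at angle theta (radians); assumes 0 <= concentration < 1."""
    rho = concentration
    denom = 2.0 * math.pi * (1.0 + rho ** 2 - 2.0 * rho * math.cos(theta - location))
    return (1.0 - rho ** 2) / denom

def integrate_over_circle(pdf, n=10000):
    """Midpoint-rule integral of pdf over [-pi, pi)."""
    width = 2.0 * math.pi / n
    return sum(pdf(-math.pi + (k + 0.5) * width) for k in range(n)) * width

# Location and concentration estimates reported above for each state.
estimates = {"state 1": (-0.014, 0.15), "state 2": (-0.015, 0.06)}

uniform = 1.0 / (2.0 * math.pi)  # density of a completely undirected turning angle
peaks = {}
for state, (loc, rho) in estimates.items():
    peaks[state] = wrapped_cauchy_pdf(loc, loc, rho)  # density at its highest point
    total = integrate_over_circle(lambda t: wrapped_cauchy_pdf(t, loc, rho))
    print(state, peaks[state], peaks[state] / uniform, total)
```

The ratio of the peak density to the uniform density gives a quick sense of how strongly turning angles cluster around the location: a ratio near 1 means the angles are spread almost evenly around the circle.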

*Critique of Method*

Overall, I found the tool to be very useful in allowing me to compare and contrast the fit of various distributions for my step length and turning angle data. It was convenient and straightforward to use a single tool to fit the various distributions, rather than separate tools for step length and turning angle and/or separate tools for each distribution.

My critique of the tool lies in my own lack of knowledge of how best to interpret the outputs of the distributional fits. I appreciated that the tool presented both numeric estimates of the parameters and comparative plots, but my unfamiliarity with each of the distributions I was fitting made it difficult for me to evaluate the output. Through online research, the moveHMM tool documentation, movement ecology publications, and the fitdistrplus documentation, I feel I was able to put together enough of a working understanding to make sense of the output; however, my interpretation would certainly benefit from a better understanding of the distributions and the model fitting techniques themselves.

*References*

Delignette-Muller, M.L., & Dutang, C. (2014). fitdistrplus: An R package for fitting distributions. *Journal of Statistical Software, 64*(4). [online] https://www.jstatsoft.org/article/view/v064i04.