# GEOG 566

May 10, 2018

### Multivariate analysis of the location of behavior changes

Filed under: Exercise/Tutorial 2 2018 @ 1:25 pm

Question:

Because the spatial distribution of behavior changes did not change with boom angle, the results of Exercise 1 imply that hydraulic conditions do not drive fish behavior in this experiment (Figure 1). However, because analogous research and common sense both suggest that hydraulic thresholds do affect fish behavior, a statistical analysis is necessary to determine whether environmental and/or internal factors affected the locations of the behavior changes we observed. Five hydraulic variables – water speed (m/s), turbulent kinetic energy (TKE, m²/s²), TKE gradient (m²/s²/m), velocity gradient (m/s/m, or s⁻¹), and acceleration (m/s²) – were extracted at the location of every behavior change from Exercise 1. The locations of behavior changes were then compared with channel hydraulics, boom angle, and each fish’s visual fitness (as measured by an optomotor assay) using three methods: 1) multivariate regression analysis, 2) principal component analysis (PCA), and 3) partial least squares regression. With this analysis, we hope to answer the question: do any of the five hydraulic variables, the geometry of the channel, or the visual fitness of a fish correlate well with the location of its behavior change?

##### Figure 1. The results of Exercise 1 indicate that, despite differences in hydraulics created by varying boom angle, the spatial distribution of behavior changes did not change between boom angles. Contour lines show two-dimensional 95% confidence intervals.

Method and steps for analysis:

Three methods of regression analysis were used to answer the question above. Although previous work was conducted in Python, R was judged better suited to these statistical analyses. First, a multivariate regression examined the correlation between the locations of behavior change and the independent variables. A multivariate regression enables the analysis of more than one dependent variable – in this case, the X- and Y-coordinates of a behavior change. This is not to be confused with multiple linear regression, which only analyzes the correlation of predictor variables with one dependent variable (i.e., just X or just Y). Interestingly, angle, water speed, and velocity gradient show significant, positive correlations with location (where positive locations are downstream and against the left channel wall; Figure 2). Acceleration and TKE gradient, on the other hand, show significant negative correlations with the location of behavior change. TKE (red box) shows no significant correlation, nor does visual fitness (blue box). The strong correlations with several hydraulic variables warrant further analysis to better understand which hydraulic variable, if any, truly influences behavior changes.

##### Figure 2. The results of a multivariate regression analysis of hydraulic, geometric, and visual variables on the locations of behavior change.
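To make the distinction between multivariate and multiple regression concrete, here is a minimal sketch in Python with NumPy. It is not the R analysis behind Figure 2: the data are synthetic, and the three predictor columns and the coefficient values are hypothetical placeholders. It fits both response columns (X and Y) in a single least-squares solve and does not compute significance tests.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 200 behavior changes, 3 predictors.
n = 200
predictors = rng.normal(size=(n, 3))   # placeholder columns, e.g. angle, speed, velocity gradient

# Hypothetical "true" slopes: one row per predictor, one column per response (X, Y).
true_coefs = np.array([[1.5, -0.5],
                       [0.8,  0.0],
                       [0.0,  2.0]])
noise = 0.1 * rng.normal(size=(n, 2))
locations = predictors @ true_coefs + noise   # columns: X and Y coordinates

# Add an intercept column, then solve for both responses at once.
design = np.column_stack([np.ones(n), predictors])
coefs, _, _, _ = np.linalg.lstsq(design, locations, rcond=None)

intercepts = coefs[0]   # shape (2,): one intercept per response
slopes = coefs[1:]      # shape (3, 2): predictor-by-response slope matrix

print(np.round(slopes, 2))
```

The slope estimates match what two separate multiple regressions (one for X, one for Y) would give; what the multivariate framing adds is the ability to test predictors against both responses jointly, which this sketch leaves out.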

Principal component analysis is one method of identifying important variables (in the form of components) from a larger set of variables, especially when they’re highly correlated. A principal component is a linear combination of the input variables, and the components are ordered by how much of the variation in the original data each explains. Quantifying the variation explained by each component, and the weight of each variable within it, indicates which variables are influential. In the case of X (the downstream position of a behavior change), the first and second principal components account for over 85% of the variation observed (Figure 3). Within Principal Components 1 and 2, boom angle, water speed, and velocity gradient influence the dependent variable, X, to the greatest degree, as indicated by the lengths of the arrows in Figure 4. Again, visual fitness, TKE, and now TKE gradient lack influence on X. Although promising, the results of PCA are limited by the analysis of only one dependent variable. A partial least squares regression allows a similar investigation into the principal components of behavior-change locations in both the X and Y dimensions.

##### Figure 4. The contribution (as indicated by the length of the arrows) of independent variables to Principal Components 1 and 2. Variables whose arrows extend toward the bottom, top, and left of the graph have more influence than those that stay near the origin of the arrows.
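The two quantities discussed above, the share of variation explained by each component and the arrow length of each variable on the first two components, can be sketched as follows. This is an illustration on synthetic data, assuming PCA on standardized predictor columns computed through a singular value decomposition; the function name `pca` and the three columns are hypothetical and do not reproduce the analysis behind Figures 3 and 4.

```python
import numpy as np

def pca(data):
    """PCA via SVD on standardized columns.

    Returns (explained_ratio, loadings, scores), where column k of
    `loadings` holds the weight of each variable in component k.
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained_ratio = s**2 / np.sum(s**2)   # fraction of total variance per component
    loadings = vt.T
    scores = u * s                          # observations projected onto the components
    return explained_ratio, loadings, scores

rng = np.random.default_rng(1)
n = 300
shared = rng.normal(size=n)

# Hypothetical variables: two that are strongly correlated, one independent.
data = np.column_stack([
    shared + 0.1 * rng.normal(size=n),
    shared + 0.1 * rng.normal(size=n),
    rng.normal(size=n),
])

ratio, loadings, scores = pca(data)

# Length of each variable's arrow in the plane of components 1 and 2.
arrow_length = np.hypot(loadings[:, 0], loadings[:, 1])

print(np.round(ratio, 3))
print(np.round(arrow_length, 3))
```

Because the two correlated columns carry nearly the same information, most of their shared variation collapses onto a single component, which is the behavior that makes PCA useful for correlated hydraulic variables.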

Partial least squares regression, or PLS regression, combines the component extraction of PCA with the linear regression of multivariate regression. It suits our data because multiple dependent variables can be analyzed against components built from many correlated predictor variables. However, the results of PLS perhaps realize our original fear – that no consistent hydraulic variable emerges as a strong predictor of the location of behavior change in our experiment (Figure 5). Instead, TKE gradient now dominates Axis 1 (analogous to Principal Component 1), while velocity gradient and angle largely influence Axis 2. A PLS regression run within each boom angle also failed to identify consistent hydraulic or visual variables that dominate the axes.

##### Figure 5. Partial least squares regression of independent hydraulic, visual, and geometric variables on X and Y, the locations of behavior change. TKE gradient now dominates Axis 1, while velocity gradient and boom angle influence Axis 2 most strongly.
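For readers who want to see how PLS axes are built from several predictors and two responses, here is a minimal sketch. It assumes a textbook PLS2 variant in which each axis's weight vector is the dominant left singular vector of the predictor-response cross-product matrix, followed by deflation. This is not necessarily the algorithm the R package behind Figure 5 uses. The function names `standardize` and `pls2` and all data are hypothetical.

```python
import numpy as np

def standardize(a):
    """Center each column and scale it to unit sample standard deviation."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

def pls2(x, y, n_components):
    """PLS regression with multiple responses (SVD-based weights, with deflation).

    Returns (weights, scores, coefs); `coefs` maps standardized predictors
    to standardized responses.
    """
    x_res, y_res = standardize(x), standardize(y)
    w_list, p_list, q_list, t_list = [], [], [], []
    for _ in range(n_components):
        # Direction in predictor space with maximal covariance with the responses.
        u, _, _ = np.linalg.svd(x_res.T @ y_res, full_matrices=False)
        w = u[:, 0]
        t = x_res @ w                    # scores along this axis
        p = x_res.T @ t / (t @ t)        # predictor loadings
        q = y_res.T @ t / (t @ t)        # response loadings
        x_res = x_res - np.outer(t, p)   # deflate: remove what this axis explained
        y_res = y_res - np.outer(t, q)
        w_list.append(w); p_list.append(p); q_list.append(q); t_list.append(t)
    W, P = np.column_stack(w_list), np.column_stack(p_list)
    Q, T = np.column_stack(q_list), np.column_stack(t_list)
    coefs = W @ np.linalg.solve(P.T @ W, Q.T)
    return W, T, coefs

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=(n, 3))   # hypothetical predictors
e = 0.1 * rng.normal(size=(n, 2))
# Hypothetical responses (X and Y locations); predictor 0 drives both.
y = np.column_stack([2.0 * x[:, 0] + 0.5 * x[:, 2],
                     x[:, 0] - x[:, 1]]) + e

weights, scores, coefs = pls2(x, y, n_components=2)

# Predictor with the largest absolute weight on each axis.
dominant = np.argmax(np.abs(weights), axis=0)
print(dominant)
```

Reading the largest absolute weights per axis is the numerical counterpart of judging which variable "dominates" an axis in a PLS plot; the weights are only directly comparable because the predictors were standardized first.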

Results:

No hydraulic or visual variable measured in this experiment consistently predicted the locations of the behavior changes we observed. If anything, boom angle most consistently dominates the correlations, principal components, and axes of the multivariate regression, PCA, and PLS, respectively. This implies that channel geometry, independent of the hydraulics it created, had the largest influence on fish behavior as we observed it. Taken at face value, this result seems illogical. However, it may indicate that a bias existed in the behavior changes we observed.

Critique of methods:

Multivariate regression analysis easily identifies correlations of independent variables with more than one response variable. However, highly correlated independent variables require PCA or PLS to explain variation among many variables. Although powerful, PCA and PLS are more difficult to interpret, and PCA cannot investigate more than one response variable.