Whale what pattern do we have here?

Research Question

My question and goal stayed consistent across the term. I did drop one environmental driver, swell height, because of the time needed to download and extract the data. I also narrowed my focus from all years collected to the month with the most whale sightings, and ultimately to ten days within it (August 1-10, 2016). This month occurred during an El Niño year, so my hypotheses and initial goals were still applicable. Because of this shift to examining just ten days, I stopped using the marine sanctuary boundaries in my analysis in Exercises 2 and 3; however, I still used the whale locations clipped to those sanctuaries.

The original spatial problem I wanted to examine was the impact of ENSO on the environmental drivers of fin whale area restricted search (foraging) behavior. This question was eventually narrowed to: what is the impact of ENSO on the spatial distribution of fin whales from August 1-10, 2016?

Data Description

The fin whale data came from the Whale Habitat, Ecology, and Telemetry lab. The fin whale locations were clipped to the three marine sanctuaries (Cordell Bank, Monterey Bay, and the Greater Farallones) and filtered to a behavior state of 2 (area restricted search). The fin whale points have a resolution of 3 locations per 8 hours; however, some tags transmitted only one location for the day. The data were collected in 2004, 2006, and 2014-2018 during the summer-fall months. The whale locations were collected using Argos satellite tags and processed through a Bayesian switching state-space model, which produced regularized tracks and assigned behavior classifications based on the characteristics of the points.

The full set of area restricted search points comprised 714 locations, recorded daily with some uneven gaps. Their northernmost latitude is 38.99569 and their southernmost latitude is 35.54629. Their easternmost longitude is -121.4425 and their westernmost longitude is -124.25.

When I restricted the data to just August 1-10, 2016, the northernmost latitude is 37.82975 and the southernmost latitude is 36.57525. The easternmost longitude is -122.0430 and the westernmost longitude is -123.4833.

SST: SST, GOES Imager, Day and Night, Western Hemisphere, 2000-2020 (1-Day Composite) from ERDDAP. This dataset has a resolution of 0.05 degrees in latitude and longitude.

Chlorophyll-a: Chlorophyll-a, Aqua MODIS, NPP, L3SMI, Global, 4km, Science Quality, 2003-present (1-Day Composite) from ERDDAP. This dataset has a resolution of 0.0416 degrees in latitude and longitude.

Disclaimer: Some chlorophyll-a concentration data was fabricated due to internal issues with the nibble program. I recommend using a different dataset for chlorophyll-a if someone were to recreate this analysis.

**Figure 1.** All area restricted search (ARS) locations for fin whales in the data set. Each year is a different shade of blue, and the marine sanctuary borders are shown in different colors: purple for Monterey Bay, orange for the Greater Farallones, and green for Cordell Bank.

**Figures 2-5.** Chlorophyll-a concentration and sea surface temperature at the latitude and longitude of the fin whale locations recorded between August 1-10, 2016.

Hypotheses

Exercise 1
1. During cold modes, the spatial pattern of whale area restricted search will be clustered in areas with low SST, high chlorophyll-a concentration, and higher swell. I expect the clusters of area restricted search to be tighter in those conditions because the whales need to travel shorter distances to find the best feeding locations in the habitat.
Exercise 2
1. Cold sea surface temperature and high chlorophyll-a concentration in the central California coast area promote (enhance) fin whale area restricted search locations.
2. Warm sea surface temperature and low chlorophyll-a concentration in the central California coast area limit (reduce) fin whale area restricted search locations.
Exercise 3
1. Based on the distribution of the whales and environmental drivers, the model will produce an output with a similar spatial distribution of whales under similar conditions for sea surface temperature and chlorophyll-a concentration.

Approaches

Point Pattern Analysis

In Exercise 1, I tried multiple methods to assess whether the recorded whale locations were clustered or dispersed. I also conducted some preliminary data visualization to examine where the patches were before applying statistical tests. I had attempted a k-cluster test in Python prior to this class, so I tried that method first and moved on to Average Nearest Neighbor when those results were difficult to assess in later years. Average Nearest Neighbor was the most useful test because its ratio indicates whether the data trend toward dispersion (ratio greater than 1) or clustering (ratio less than 1). I had extra time, so I also ran point density and kernel density on the points. While these helped generate future questions and focus points, they were not helpful for this exercise's goal.
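To make the ratio concrete, here is a minimal Python sketch of the Average Nearest Neighbor calculation. It is not the ArcGIS Pro tool's implementation. It assumes planar (projected) coordinates and a study-area size supplied by the user, and the function name `average_nearest_neighbor_ratio` is hypothetical.

```python
import math


def average_nearest_neighbor_ratio(points, area):
    """Observed mean nearest-neighbor distance divided by the distance
    expected for a random pattern with the same density.

    points: list of (x, y) tuples in planar units (assumption: projected).
    area:   study-area size in the same units squared (assumption: user-supplied).
    """
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")

    # Observed: average distance from each point to its closest neighbor.
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n

    # Expected mean distance under complete spatial randomness.
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


# Toy data only, not the whale locations.
tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
spread = [(2.5, 2.5), (7.5, 2.5), (2.5, 7.5), (7.5, 7.5)]
print(average_nearest_neighbor_ratio(tight, area=100.0))   # below 1: clustered
print(average_nearest_neighbor_ratio(spread, area=100.0))  # above 1: dispersed
```

A ratio below 1 means neighbors are closer together than a random pattern would predict, which is how the clustered and dispersed years are separated in the results.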

Cross-Correlation

My initial goal in the second exercise was to examine the autocorrelation between the two environmental drivers, sea surface temperature and chlorophyll-a concentration, and the whale locations. While this was interesting to examine, the nature of the telemetry data and the lag component of the autocorrelation and cross-correlation functions made the results difficult to interpret, and the work pushed my knowledge of R and statistics to its limit. Dr. Jones and I developed a personalized test using the kernel density value of the August 1-10, 2016, whales and an interpolated trend of the environmental data recorded during that time. This exercise improved my ability to problem-solve and find workarounds for my data that can still be understood and replicated by others. It also helped me break the larger question into smaller pieces that I could accomplish with my knowledge of ArcGIS Pro analysis tools.

Modeling

For the final exercise, my goal was to run a species distribution model that specifically examined parameter sensitivity and its impact on the spatial pattern of the whale locations. Because my data were presence-only, I started with the MaxEnt software but quickly ran into an issue because it was no longer supported by RStudio. To counter this, I installed and learned the basics of maxnet, MaxEnt's successor in R. Despite my data being well suited to that type of model, the output did not make sense. I then found the ArcGIS equivalent of MaxEnt and ran into an issue from previous exercises: difficulty interpreting the results. While examining other model options in ArcGIS Pro, I found a random forest tool that predicted the latitudes and longitudes of whale locations based on the environmental driver parameters. From here, I wrote some code in R to divide the whale locations into 0.25-degree grid cells and count the number of whales in each cell for the observed and predicted locations. After displaying the results and calculating their residuals, I found that the model was both under- and over-predicting values.
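The gridding step can be sketched as follows. This is an illustrative Python version, not the R code used in the project. The function names `grid_counts` and `count_residuals` are hypothetical, and the residual is taken as observed minus predicted, which is an assumption about sign convention.

```python
import math
from collections import Counter


def grid_counts(points, cell_size=0.25):
    """Count (lon, lat) points per square grid cell.

    Cells are keyed by integer (column, row) indices, so cell (c, r) covers
    longitudes [c * cell_size, (c + 1) * cell_size) and likewise for latitude.
    """
    counts = Counter()
    for lon, lat in points:
        key = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        counts[key] += 1
    return counts


def count_residuals(observed_pts, predicted_pts, cell_size=0.25):
    """Observed minus predicted count for every cell that has any points.

    Positive values mean the model under-predicted in that cell,
    negative values mean it over-predicted.
    """
    obs = grid_counts(observed_pts, cell_size)
    pred = grid_counts(predicted_pts, cell_size)
    return {cell: obs[cell] - pred[cell] for cell in set(obs) | set(pred)}


# Toy coordinates only, not the whale data.
observed = [(-122.9, 37.2), (-122.8, 37.1), (-123.4, 36.6)]
predicted = [(-122.9, 37.2)]
for cell, resid in sorted(count_residuals(observed, predicted).items()):
    print(cell, resid)
```

Keying cells by integer indices rather than by rounded coordinates avoids floating-point mismatches when the observed and predicted counts are joined.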

Results

**Table 1.** Average nearest neighbor calculations for each of the five years.

The average nearest neighbor results were statistically significant in every year except 2014. The p-value for 2004, while statistically significant, should be viewed with caution because the sample size is very small and likely affects the p-value. The years 2015-2017 have a strong correlation between their distances.

The 2004 and 2014 data trend toward dispersion because their nearest neighbor ratios are greater than 1. The 2015, 2016, and 2017 data trend toward clustering because their nearest neighbor ratios are less than 1 (see Table 1). 2004, 2014, and 2015 were El Niño years, and the latter part of 2016 and all of 2017 were La Niña years. Based on the Nearest Neighbor and point density results, 2016 and 2017 support the hypothesis I tested.

**Table 2.** Pearson’s product-moment correlation was conducted on the environmental factors of interest (SST and chl-a) and the longitude of the fin whale locations. The test produced a t-value, degrees of freedom (df), p-value, and 95% confidence interval.

All but one correlation test resulted in a statistically significant p-value. The test for chlorophyll-a and longitude was the only test to produce a confidence interval entirely in the negative range. The absolute value of the t-statistic produced in this test is used to determine whether the autocorrelation for a specific lag equals zero. An absolute t-value greater than 2 indicates the autocorrelation is not equal to zero. In the chlorophyll-a and longitude test, depending on rounding conventions, the t-statistic could indicate the autocorrelation is not equal to zero.
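For reference, the t-value and degrees of freedom reported by a Pearson product-moment test follow directly from the correlation coefficient r and the sample size n: df = n - 2 and t = r * sqrt(df / (1 - r^2)). The sketch below shows that arithmetic in Python with toy numbers. The function name `pearson_t` is hypothetical, and this is not the code that produced Table 2.

```python
import math


def pearson_t(x, y):
    """Return (r, t, df) for Pearson's product-moment correlation."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")

    r = sxy / math.sqrt(sxx * syy)
    df = n - 2
    if abs(r) >= 1:
        # Perfect correlation: the t-statistic is unbounded.
        t = math.copysign(math.inf, r)
    else:
        t = r * math.sqrt(df / (1 - r * r))
    return r, t, df


# Toy values only, not the SST, chlorophyll-a, or longitude data.
r, t, df = pearson_t([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(r, 3), round(t, 3), df)
```

The "greater than 2" rule of thumb roughly matches the 5% two-sided critical value of the t-distribution when the degrees of freedom are large; with few observations the critical value is higher.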

**Figure 6.** Kernel density of fin whale locations across August 1-10, 2016, overlaid with the interpolated trend of average SST in the same period recorded at the whale locations.

**Figure 7.** Kernel density of fin whale locations across August 1-10, 2016, overlaid with the interpolated trend of average chlorophyll-a concentration in the same period recorded at the whale locations.

For SST, the highest density of whales occurs in the 13-14 degrees Celsius range, with a few in the warmer range. For chlorophyll-a, the highest density of whales occurs in the 6 mg m-3 range, with a few in the higher concentration range.

**Figure 8.** Predicted vs Observed whales from the random forest model. The trendline was set using a linear regression method with a 95% confidence interval.

The random forest model in ArcGIS Pro produced predicted latitudes and longitudes for the whales given the SST and chlorophyll-a concentration parameters. When examined in 0.25-degree cells, grid cell 7 (longitude -123 to -122.7, latitude 37.1 to 37.3) had the most whales, with 53 observed and 125 predicted.

Based on Figure 8, the model underperformed in some cells (the points below the trend line and confidence interval) and overperformed in others (the points above the trend line).

Significance

The different results from the exercises help clarify the relationship between location and environmental factors. While I focused on a specific month, I combined my oceanographic and biological knowledge with the results and interpreted the outputs constructively. The tools and knowledge gained from this project can be used by future handlers of the data to determine the next steps in spatial analysis and other relationships to examine.

Software-wise, this project was significant because I used several R and ArcGIS Pro combinations. Many people complete their analysis in just one software package, but I wanted to combine workflows. This desire expanded my problem-solving capabilities because I could draw from multiple sources to complete the analyses.

My Improvements in Skills and Statistics

I started the term with working proficiency in R and ArcGIS Pro and novice skills in Python 3. At the end of the term, I believe I am closer to an expert with ArcGIS Pro and R in terms of problem-solving and creating custom workflows. I am now at working proficiency with Python 3.

I gained beneficial experience in manipulating data and understanding why specific formats and data collection work with some analysis tools and not with others. I was working with telemetry track data, which resulted in presence-only values.

Evolving Questions and Future Techniques

I would like to make the plots and maps from Exercise 2 and run the models used in Exercise 3 on different temporal scales, then compare the model results to the initial question and hypotheses. Based on the graph of the relationship between the predicted and observed whales, I would like to examine further where the model is under- and overpredicting. Examining those locations and trends in the data may explain why the model produced the output it did.

I would also like to develop a better contingency plan for faults in remote sensing data and for filling gaps during data extraction. I had several issues with the raster data that impacted the accuracy and replicability of this analysis. While I stand by the methods I used for this class project, given more time and resources, I think I could have found another workaround that would make the results publishable.

I would also include all the whale behavior state locations to explore a more comprehensive analysis and explanation regarding what influences foraging behavior locations and spatial patterns.

GEOG 566

Advanced spatial statistics and GIScience