Research Question: What is the correlation between the location of leatherback sea turtles, sea surface temperature, and chlorophyll?
Leatherback sea turtle locations: The leatherback turtle dataset was obtained from http://seamap.env.duke.edu/dataset in point feature format. The extent of the dataset is within the Gulf of Mexico. Observations of each sea turtle were collected from January through December 2005.
Figure 1: Sea turtle locations in the Gulf of Mexico, derived from http://seamap.env.duke.edu/
Sea Surface Temperature: The Sea Surface Temperature (SST) dataset was downloaded from the NOAA website in .csv format, with a temperature value associated with each latitude and longitude coordinate. The extent of the dataset was the Gulf of Mexico. The SST data was the average daily maximum for January and December 2005. Values ranged from 2.44 to 26.89 degrees Celsius, with a mean of 23.57 degrees Celsius.
Figure 2: Sea Surface Temperature (Celsius) data for January 2005 within the Gulf of Mexico, derived from NOAA.
Chlorophyll-a: The chlorophyll data was downloaded from the NOAA website in .csv format, with a chlorophyll value associated with each latitude and longitude coordinate. The extent of the dataset was the Gulf of Mexico. The chlorophyll data was a 3-day composite in December 2005. Values ranged from 0 to 1.63 mg/m^3, with a mean of 0.29 mg/m^3.
Figure 3: Chlorophyll-a (mg/m^3) data for January 2005 within the Gulf of Mexico, derived from NOAA.
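A minimal Python sketch of how the range and mean quoted for each dataset could be checked from a downloaded .csv file. The column names (latitude, longitude, sst), the NaN missing-value marker, and the four sample rows are assumptions for illustration; they are not the actual NOAA export.

```python
import csv
import io
import math

# Hypothetical sample in the layout described above: one row per grid point.
# Column names and the "NaN" missing-value marker are assumptions.
SAMPLE_CSV = """latitude,longitude,sst
25.0,-90.0,24.1
25.5,-90.0,23.8
26.0,-90.0,NaN
26.5,-90.0,22.9
"""

def summarize(csv_text, column):
    """Return (minimum, maximum, mean) of a numeric column, skipping missing cells."""
    values = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            value = float(row[column])
        except (TypeError, ValueError):
            continue  # blank or non-numeric cell
        if not math.isnan(value):
            values.append(value)
    if not values:
        raise ValueError("no valid values in column " + column)
    return min(values), max(values), sum(values) / len(values)

low, high, mean = summarize(SAMPLE_CSV, "sst")
print(f"range {low:.2f} to {high:.2f}, mean {mean:.2f}")
```

Skipping missing cells before averaging matters here, because gaps in the gridded data would otherwise distort the summary.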
Hypothesis: I expected leatherback sea turtles to be clustered in areas based on high jellyfish concentrations. Jellyfish tend to be located in areas that have higher chlorophyll-a concentrations and where sea surface temperature is low. Thus sea turtle locations should correlate well with higher values of chlorophyll-a and lower values of SST.
Analysis Approaches: To test my hypothesis, I used the Hotspot Analysis, Spatial Autocorrelation (Moran’s I), Ordinary Least Squares, Geographically Weighted Regression, and Kernel Density tools in ArcGIS. I also used the R statistical package to plot the relationship between chlorophyll and SST.
a. Hotspot Analysis: The spatial distribution of the sea turtle locations appeared to be clustered toward the middle of the Gulf of Mexico. The hotspot analysis tool helps to identify where statistically significant hotspots or clusters of sea turtles are located within the Gulf of Mexico.
b. Spatial Autocorrelation (Moran’s I): This tool measures spatial autocorrelation using feature locations and feature values simultaneously. The Moran’s I index generally falls between -1 and 1. Positive values indicate that similar values are clustered, negative values indicate that they are dispersed, and values close to zero indicate a random pattern. The tool also generates a z-score and p-value, which help evaluate the significance of the Moran’s index. I tested the spatial autocorrelation of chlorophyll and sea surface temperature at each feature location. I used inverse distance as the conceptualization of spatial relationships and Euclidean distance as the distance method. I selected a 500 km distance (smaller distances were too small for the study site).
c. Ordinary Least Squares: This tool performs a global linear regression to “generate predictions or model a dependent variable in terms of its relationships to a set of explanatory variables.” Before conducting this test, I sampled the SST and Chl-a values at each of the feature locations (sea turtle locations) using the Extract Multi Values to Points tool. This tool “Extracts cell values at locations specified in a point feature class from one or more rasters, and records the values to the attribute table of the point feature class.” The model was run three separate times, adding one more explanatory variable each time. Each OLS run used chlorophyll as the dependent variable. The first run used SST as the explanatory variable; the second used SST and depth (m); and the third used SST, depth, and turtle count.
d. Geographically Weighted Regression: Because the OLS analysis indicated that the data were nonstationary, I decided to conduct a Geographically Weighted Regression (GWR) analysis. This tool performs a local form of linear regression used to model spatially varying relationships. The dependent variable was chlorophyll and the explanatory variable was SST.
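The sampling step described under Ordinary Least Squares (reading the raster cell value under each turtle point) can be sketched in plain Python. This is not the Extract Multi Values to Points tool itself. The grid, its origin, the 1-degree cell size, and the point coordinates are all hypothetical.

```python
import math

def sample_grid(grid, lat0, lon0, cell_size, lat, lon):
    """Return the value of the grid cell containing (lat, lon), or None.

    Row r covers latitudes starting at lat0 + r * cell_size and column c
    covers longitudes starting at lon0 + c * cell_size. None marks no data.
    """
    row = math.floor((lat - lat0) / cell_size)
    col = math.floor((lon - lon0) / cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None  # point falls outside the raster

# Hypothetical 2 x 3 SST grid (degrees Celsius) with one no-data cell.
sst = [[24.0, 23.5, None],
       [23.0, 22.5, 22.0]]
points = [(25.2, -91.7), (25.1, -89.5), (26.4, -90.2), (30.0, -80.0)]

samples = [sample_grid(sst, 25.0, -92.0, 1.0, lat, lon) for lat, lon in points]

# Keep only points that actually received a value before running a regression.
usable = [(p, v) for p, v in zip(points, samples) if v is not None]
print(len(usable), "of", len(points), "points have an SST value")
```

Points that land on a no-data cell or outside the grid come back empty, so they have to be dropped (or filled some other way) before any regression is run.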
Hotspot Analysis: The results of the hotspot analysis (shown below) suggest that the turtle locations are significantly clustered off the coast of Louisiana and Texas at water depths of approximately 2,000 to 3,000 m. However, the results of this analysis appear to be quite deceptive. Upon taking measurements of the turtles in the hotspot cluster, they appear to be more dispersed than the result suggests. Further analysis is needed to identify additional patterns in sea turtle locations.
Figure 4: Results of the hotspot analysis for leatherback sea turtle locations
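As a rough illustration of what a hotspot z-score measures, the sketch below computes the Getis-Ord Gi* statistic in plain Python. It assumes binary weights within a fixed distance band, planar coordinates in kilometers, and a count value per location. The coordinates, counts, and the 2 km band are hypothetical, and the ArcGIS tool offers other weighting options that are not reproduced here.

```python
import math

def gi_star(points, values, band):
    """Getis-Ord Gi* z-score for each feature, using binary fixed-distance weights."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for xi, yi in points:
        # w_ij = 1 when feature j lies within the band of feature i (i itself included)
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= band else 0.0
             for xj, yj in points]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * v for wj, v in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores

# Hypothetical locations (km) and turtle counts: one dense group, one sparse group.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (20, 0)]
counts = [9, 8, 9, 1, 2, 1, 1]

scores = gi_star(points, counts, band=2.0)
for point, score in zip(points, scores):
    print(point, round(score, 2))
```

A large positive score means a location and its neighbors together hold unusually high values. The statistic depends heavily on the chosen distance band, which is one reason a hotspot map can look more clustered than the points really are.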
Spatial Autocorrelation (Moran’s I) – sea surface temperature: The results of the Spatial Autocorrelation tool suggest that the pattern of sea surface temperature at the feature locations is clustered. The Moran’s Index was 0.402514, the z-score was 2.608211, and the p-value was 0.009102. Because the z-score was greater than the critical value of 2.58, there is less than a 1-percent likelihood that the clustered pattern is the result of random chance.
Figure 5: Sea surface temperature results for Moran’s I tool.
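For reference, the global Moran’s I index can be sketched in a few lines of Python. This is only an illustration of the formula with inverse-distance weights and a distance cutoff. The cutoff behavior, the six locations spaced 100 km apart, and the values are assumptions, and the ArcGIS tool additionally computes the z-score and p-value.

```python
import math

def morans_i(points, values, threshold):
    """Global Moran's I with inverse-distance weights inside a distance threshold."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0  # sum over pairs of w_ij * dev_i * dev_j
    s0 = 0.0     # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1])
            if 0 < d <= threshold:
                w = 1.0 / d
                s0 += w
                cross += w * dev[i] * dev[j]
    return (n / s0) * cross / sum(z * z for z in dev)

# Hypothetical locations on a line, 100 km apart (coordinates in km).
line = [(100.0 * i, 0.0) for i in range(6)]
clustered = [1, 1, 1, 5, 5, 5]    # similar values sit next to each other
alternating = [1, 5, 1, 5, 1, 5]  # neighbors differ

print(round(morans_i(line, clustered, 500.0), 4))
print(round(morans_i(line, alternating, 500.0), 4))
```

The clustered arrangement gives a positive index and the alternating one a negative index, which matches the interpretation of the sign used in this report.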
Spatial Autocorrelation (Moran’s I) – Chlorophyll-a: The results of the Spatial Autocorrelation tool suggest that the pattern of chlorophyll at the feature locations is clustered. The Moran’s Index was 0.346961, the z-score was 2.216243, and the p-value was 0.026675. The z-score was less than 2.58 but greater than 1.96, suggesting that there is less than a 5-percent likelihood that the clustered pattern is the result of random chance.
Figure 6: Results of the spatial autocorrelation Moran’s I for chlorophyll-a at the leatherback sea turtle locations.
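The link between the z-scores and p-values reported for both variables can be checked with a short Python sketch. It assumes the p-values are two-tailed probabilities from the standard normal distribution, which is consistent with the 1.96 and 2.58 critical values used above.

```python
import math

def two_tailed_p(z):
    """Two-tailed p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# z-scores reported by the Moran's I tool in this report
for label, z in (("SST", 2.608211), ("Chl-a", 2.216243)):
    print(label, round(two_tailed_p(z), 6))
```

Under that assumption the computed values should land very close to the reported p-values of 0.009102 and 0.026675.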
Ordinary Least Squares: The model was run three separate times, adding one more explanatory variable each time. Each OLS run used chlorophyll as the dependent variable. The first run used SST as the explanatory variable; the second used SST and depth (m); and the third used SST, depth, and turtle count.
Model Structure: Chl-a = f(SST)
a) Overall r2: 0.449473
b) Coefficient on SST in model: -0.052654
The results suggest that Chl-a is negatively related to SST. Given the reported p-value of 0.000, SST is a significant predictor of chlorophyll-a.
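To make the single-variable model concrete, here is a minimal Python sketch of an ordinary least squares fit of chlorophyll on SST, returning the coefficient and R-squared. The four data pairs are hypothetical and chosen only to show a negative slope. They are not the sampled turtle-location values, and the ArcGIS tool reports many more diagnostics.

```python
def fit_line(x, y):
    """Least-squares line y = intercept + slope * x; returns (intercept, slope, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return intercept, slope, 1.0 - ss_res / ss_tot

# Hypothetical samples: SST in degrees Celsius, chlorophyll-a in mg/m^3.
sst = [20.0, 22.0, 24.0, 26.0]
chl = [0.50, 0.40, 0.32, 0.18]

intercept, slope, r2 = fit_line(sst, chl)
print(f"Chl-a = {intercept:.3f} + ({slope:.3f}) * SST, r2 = {r2:.3f}")
```

A negative slope means chlorophyll drops as SST rises, and R-squared is the share of the variation in chlorophyll that the line accounts for.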
Model Structure: Chl-a = f(SST, Depth)
a) Overall r2: 0.452436
b) Coefficient on SST: -0.051769
c) Coefficient on depth: -0.000015
The results suggest that both SST and depth are negatively related to chlorophyll. With a p-value of 0.056397, the relationship for depth is not statistically significant at the 0.05 level. The p-value for SST is 0.0000, so SST is statistically significant and a better predictor of chlorophyll-a than depth.
Model Structure: Chl-a = f(SST, Depth, Count)
a) Overall r2: 0.460299
b) Coefficient on SST: -0.050454
c) Coefficient on Depth: -0.000014
d) Coefficient on Count: 0.121644
The results suggest that SST and depth are negatively related to chlorophyll and that count is positively related. With a p-value of 0.067482, the relationship for depth is not statistically significant at the 0.05 level. The p-value for count is 0.001815 and the p-value for SST is 0.0000, so both relationships are statistically significant. In this model, the number of turtles found at each location and the SST values both have statistically significant relationships to chlorophyll. SST has the lowest p-value, which suggests that it is the best indicator of chlorophyll, though we should not discount the count variable.
Overall results: The model with three explanatory variables gave the best overall R-squared value, 0.46. Because the Koenker (BP) statistic was statistically significant, I used the Joint Wald Statistic to assess overall model significance. The Joint Wald Statistic was significant, as shown below:
Figure 7: OLS model 3 – Joint Wald Statistic
The Koenker statistic is used to assess model stationarity. It was significant (p-value < 0.05), which indicates that the model was not stationary in geographic space and/or data space, as shown below:
Figure 8: OLS model 3 – Koenker BP Statistic
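The idea behind the Koenker (BP) test can be sketched in Python: fit the regression, regress the squared residuals on the explanatory variable, and multiply the R-squared of that second fit by the number of observations. The sketch below assumes a single explanatory variable, so the statistic is compared to a chi-square distribution with one degree of freedom. The data are hypothetical, with scatter that grows as x grows, and this is not the ArcGIS implementation.

```python
import math

def fit_line(x, y):
    """Least-squares line y = intercept + slope * x; returns (intercept, slope, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return intercept, slope, r2

def koenker_bp(x, y):
    """Koenker (studentized Breusch-Pagan) statistic and p-value, one explanatory variable."""
    intercept, slope, _ = fit_line(x, y)
    squared = [(b - intercept - slope * a) ** 2 for a, b in zip(x, y)]
    _, _, aux_r2 = fit_line(x, squared)  # auxiliary regression on squared residuals
    statistic = max(0.0, len(x) * aux_r2)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical data: two observations per x value, scatter proportional to x.
x = [float(v) for v in range(1, 9) for _ in range(2)]
y = [2.0 * v + (0.2 * v if i % 2 == 0 else -0.2 * v) for i, v in enumerate(x)]

statistic, p_value = koenker_bp(x, y)
print(round(statistic, 3), p_value)
```

A small p-value says the size of the residuals changes systematically with the explanatory variable, which is the kind of inconsistency the report describes as nonstationarity.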
Because the Koenker statistic was significant, it was appropriate to look at the robust probabilities of each variable to assess their effectiveness. The robust probabilities show that count and SST are statistically significant (as shown below), so they are important to the regression model. Because the model is nonstationary, it also appears to be a good candidate for Geographically Weighted Regression.
Figure 9: OLS model 3 – Variable Statistics
Geographically Weighted Regression:
Model Structure: Chl-a = f(SST, Depth, Count)
a) spatial pattern of r2 values (map)
Figure 10: Geographically Weighted Regression Analysis: Map of the Local R-Squared values
After running the GWR analysis with chlorophyll as the dependent variable and sea surface temperature as the explanatory variable, I mapped the local R-squared value at each feature location to show where the model predicted well and where it predicted poorly. The map shows that predictions are best where turtle locations fall in cooler sea surface temperatures off the Texas coastline.
b) Spatial pattern of coefficients for SST
Figure 11: Geographically Weighted Regression Analysis: Map of the coefficients
I mapped the coefficients in order to understand regional variation in the model. When using GWR to model chlorophyll (the dependent variable), I was interested in understanding the factors that contribute to the turtle locations (or to chlorophyll at each of the turtle locations). I was also interested in examining the stationarity of the data, since the OLS model revealed that it was not stationary. To do this, I mapped the coefficient distribution as a surface to show where and how much variation was present. As the map shows, sea surface temperature has a negative relationship with chlorophyll. The range of the coefficients is -0.066176 to -0.64989. There is very little variation in the coefficients. The results of this test can help to inform policies at a regional scale.
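To show where local coefficients come from, the sketch below fits one distance-weighted regression per location in plain Python. It assumes a Gaussian kernel with a fixed bandwidth and a single explanatory variable. The two groups of points, their values, and the 5 km bandwidth are hypothetical, and the ArcGIS tool has kernel and bandwidth options that are not reproduced here.

```python
import math

def gwr_local_fits(points, x, y, bandwidth):
    """Fit a Gaussian-weighted regression of y on x at every feature location.

    Returns one (intercept, slope) pair per point.
    """
    fits = []
    for px, py in points:
        # Nearby observations get weights near 1; distant ones fade toward 0.
        w = [math.exp(-0.5 * (math.hypot(px - qx, py - qy) / bandwidth) ** 2)
             for qx, qy in points]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx
        fits.append((my - slope * mx, slope))
    return fits

# Hypothetical western and eastern groups (coordinates in km) with different
# SST-to-chlorophyll slopes: -0.05 in the west and -0.08 in the east.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (100, 0), (101, 0), (102, 0), (103, 0)]
sst = [20.0, 22.0, 24.0, 26.0] * 2
chl = [1.5 - 0.05 * t for t in sst[:4]] + [2.2 - 0.08 * t for t in sst[4:]]

slopes = [slope for _, slope in gwr_local_fits(points, sst, chl, bandwidth=5.0)]
print("local SST coefficients range from", round(min(slopes), 3),
      "to", round(max(slopes), 3))
```

The spread between the smallest and largest local coefficient is what the coefficient map summarizes: a narrow spread suggests one relationship holds across the study area, and a wide one suggests it changes by region.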
Significance: What did you learn from your results? How are these results important to science? to resource managers?
The results of the tests suggest that chlorophyll-a has a negative relationship with sea surface temperature at leatherback sea turtle locations in the Gulf of Mexico. Sea turtles are located in areas where sea surface temperature is low and chlorophyll values are higher. After mapping the coefficients of the variables, we see that there is little variation in the data, suggesting that policies to protect leatherback sea turtles should extend throughout the entire Gulf of Mexico rather than cover only a few selected areas.
Programming in R:
With the help of Mark Roberts, I used R and the ggplot2 package to plot the correlation between chlorophyll-a and sea surface temperature. Lots of room for improvement here! The code used to make the plot and the resulting plot are shown below:
Figure 12: ggplot code used in R to plot the correlation between chlorophyll-a and sea surface temperature.
Figure 13: Correlation between chlorophyll-a and sea surface temperature.
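The plot shows the relationship visually. A single number for it is the Pearson correlation coefficient, sketched here in Python rather than in the R code shown in Figure 12. The four data pairs are hypothetical. The last lines use the fact that, in a one-variable regression, the correlation is the square root of R-squared carrying the sign of the slope, applied to the first OLS run reported above.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical samples: SST in degrees Celsius, chlorophyll-a in mg/m^3.
sst = [20.0, 22.0, 24.0, 26.0]
chl = [0.50, 0.40, 0.32, 0.18]
r = pearson_r(sst, chl)
print(round(r, 4))

# First OLS run in this report: r2 = 0.449473 with a negative SST coefficient.
r_from_ols = -math.sqrt(0.449473)
print(round(r_from_ols, 3))
```

By that relation, the first OLS run corresponds to a correlation of about -0.67 between SST and chlorophyll-a at the turtle locations.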
What did you learn about spatial statistics:
I think scale is an important factor. You need to understand your data and not take the results at face value. You should investigate everything to make sure that the data and results make sense. If the results appear incorrect, they probably are incorrect. I also learned that it is dangerous to run spatial statistical analyses on data unless you can judge whether the results make sense. The spatial autocorrelation conducted on SST and Chl-a indicated that they were clustered, but I was still confused about this outcome. While conducting research it is important to think about the scale of the data to ensure that all of the data line up. I encountered an issue with the SST data because of the way it was sampled: many sea turtle locations did not have sea surface temperature data associated with them. A classmate pointed out that I did not consider the effects of currents on jellyfish. This could be a significant factor in understanding why sea turtles are present in the Gulf of Mexico, since jellyfish are partially distributed by currents. Another lesson learned is that it is important to remember the basic principles of geographic information systems while conducting these analyses. For instance, it is important to turn on extensions and to be sure that all the data is properly projected prior to conducting any analysis, as this can and will throw off the results. Overall, it is important to know your data.