GEOG 566

         Advanced spatial statistics and GIScience

May 23, 2017

Tutorial 2: Extracting raster values to points in Arc for regression analysis in R

Filed under: Tutorial 2 2017 @ 10:38 pm

Research Question

As I have described in my previous blog posts, I am investigating the spatial relationship between blue whales and oceanographic variables in the South Taranaki Bight region of New Zealand. In this tutorial, I expand on the approach I took for tutorial 2 to quantify whether blue whale group size can be predicted by remotely-sensed sea surface temperature and chlorophyll-a concentration values at each sighting location. To accomplish this I used ArcMap to align different streams of spatial data and extract the relevant information, and then used R to perform statistical analyses. Specifically, I asked the following question:

“How well do remotely sensed chlorophyll-a (chl-a) concentration and sea surface temperature (SST) predict blue whale group size? “


I downloaded raster layers for chl-a concentration and SST, and used the the “extract values to points” tool in Arc to obtain chl-a and SST values for each blue whale sighting location. I then exported the attribute table from Arc and read it into R. In R, I performed both a general linear model (GLM) and a generalized additive model (GAM) to investigate how well chl-a and SST predict blue whale group size.


I downloaded both the chl-a and SST raster files from the NASA Moderate Resolution Imaging Spetrometer (MODIS aqua) website ( The raster cell values are averaged over a 4 km2 spatial area and a one-month time period, and for my purposes I selected the month of February 2017 as that is when the study took place.

I then used the extract values to points tool, which is located in the extraction toolset within Arc’s spatial analyst toolbox, to extract values from both the chl-a concentration and SST rasters for each of the blue whale sighting data points. The resulting contained the location of each sighting, and the number of whales sighted (group size), the chl-a concentration, and the SST for each location. All of this information is contained in the attribute table for the layer file (Fig. 1), and so I exported the attribute table as a .csv file so that it could be read in other programs besides Arc.

Figure 1. Attribute table for blue whale sightings layer, which contains blue whale group size, chl-a concentration, and SST for each sighting location.

In R, I read the in the data frame containing the information from the attribute table. In an exploration of the data, I plotted chl-a against SST to get a sense of whether there is spatial autocorrelation between the two. Additionally, I plotted group size against chl-a and SST separately to visualize any relationship.  Subsequently, I ran both a GLM and a GAM with group size as the response variable and chl-a concentration and SST as predictor variables. While a GLM fits a linear regression model, a GAM is more flexible and allows for non-linearity in the regression model. Since I am in the exploratory phase of my data analysis, I tried both approaches. The script I used to do this is copied below:


There appeared to be some relationship between SST and chl-a concentration (Fig. 2), which is what I expected considering that colder waters are generally more productive. There was no clear visual relationship between group size and chl-a or SST (Fig. 3, Fig. 4).

Figure 2. A scatter plot of chlorophyll-a concentration vs. sea surface temperature at each blue whale sighting location.

Figure 3. A scatter plot of chlorophyll-a concentration vs. blue whale group size at each blue whale sighting location.

Figure 4. A scatter plot of sea surface temperature vs. blue whale group size at each blue whale sighting location.

The result of the GLM was that there was no significant effect of either chl-a concentration (p=0.73) or SST (p=0.67) on blue whale group size. The result of the GAM was very similar to the result of the GLM in that it also showed no significant effect of either chl-a concentration (p=0.73) or SST (p=0.67) on blue whale group size, and the predictor variables could only explain 4.59% of the deviance in the response variable.

Critique of the Method

Overall, the method used here was useful, and it was good practice to step out of Arc and code in R. I found no significant results, and I think this may be partially due to the way that my data are currently organized. In this exercise, I extracted chl-a concentration and SST values only to the points where blue whales were sighted, which means I was working with presence-only data. I think that it might be more informative to extract values for other points along the ship’s track where we did not see whales, so that I can include values for chl-a and SST for locations where the group size is zero as well.

Additionally, something that warrants further consideration is the fact that the spatial relationship between blue whales and these environmental variables of interest likely is not linear, and so therefore might not be captured in these statistical methods. Moving forward, I will first try to include absence data in both the GLM and GAM, and then I will explore what other modeling approaches I might take to explain the spatial distribution of blue whales in relation to their environment.

Print Friendly, PDF & Email

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

© 2019 GEOG 566   Powered by WordPress MU    Hosted by