Analyzing Fish Distribution in Goose Lake Basin, OR

Question That Was Asked

My project this term explored abundance and distribution data collected during native and invasive fish surveys conducted in 2007 in the Goose Lake basin. This analysis will support my master's research project: producing updated abundance and distribution estimates for the basin. The wetlands and riparian areas within this ecosystem are highly sensitive to climate-mediated disturbances such as shifting thermal regimes, drought, and wildfire. More frequent disturbance events may limit the quantity and quality of habitat available to native fishes, while the expanding range of non-native fishes may place additional stress on vulnerable species. The 2007 distribution and abundance of native fishes will therefore serve as a baseline for comparison with current distribution and abundance. I explored several explanatory variables that could account for the distribution and abundance of fish species throughout the basin.

Exercise 1 Question:

What are the distribution patterns of redband trout across the Goose Lake sub-basin?

Exercise 2 Question:

How are redband trout populations related to land cover within the watershed area upstream of the survey sites?

Exercise 3 Question:

Is there spatial autocorrelation among the sites sampled in 2007 that should inform our site selection methods for this summer's field season?

Dataset Description

The dataset consists of sample sites selected with a Generalized Random Tessellation Stratified (GRTS) sample design, which drew representative sites from a pre-determined distribution of fish within the stream network. Each sample site is associated with UTM coordinates. Distribution data for native species (Goose Lake redband trout, Goose Lake lamprey, Goose Lake tui chub, Goose Lake sucker, Modoc sucker, speckled dace, Pit roach, Pit sculpin) and non-native species (fathead minnow, brown bullhead, white crappie, yellow perch, pumpkinseed, brook trout) were collected throughout the Oregon portion of the Goose Lake basin in 2007. Completed sample sites were geographically stratified across the Goose Lake sub-basin: 40 sites in the Drews Creek drainage, 35 in the Cottonwood Creek drainage, 38 in the Thomas Creek drainage, 17 in the small tributaries on the east side of Goose Lake, and 13 in the Dry Creek drainage. Sites located in irrigation canals were counted as part of the nearest drainage in these totals.

The data collected at each site included water temperature, site dimensions (mean depth, maximum depth, thalweg length, average width), and physical habitat variables characterizing habitat complexity (number of pieces of large wood, number of aggregates of large wood, substrates, channel roughness, percentage of bank that is undercut, number of channels). Each captured fish was identified to species (when possible), measured, and counted. The dominant land-use type was also recorded at each site where fish were sampled; the possible types were shrub/rangelands, orchard/vineyards, row crops, forest, grass/pasture/hay lands, grain crops/water/wetlands, and developed/barren.

Hypotheses

I expected species composition and relative abundance to vary between sites, and I expected species to favor sites within a certain range of habitat characteristics (i.e., temperature, dominant land cover, and elevation). In particular, I expected native species with narrow habitat preferences, such as a preference for conditions associated with a specific land use category, to show clustered distributions across the basin, while more generalist species would be more evenly distributed.

Analysis Approaches Used

  • For Exercise 1, I compared and contrasted trend, spline, and inverse distance weighted (IDW) interpolation for estimating the distribution of redband trout across the basin. I also used the acf function in R to check whether redband trout counts showed broad patterns across bands of longitude or bands of latitude.
  • For Exercise 2, I used a form of neighborhood analysis, summarizing land cover within polygons drawn around the upstream reaches of each site, to see how redband trout populations are related to land cover within the watershed area upstream of the sampled points.
  • For Exercise 3, I used the R package gstat to create all-directional, North-South, and East-West directional semivariograms for all 19 fish species sampled in 2007. I also created semivariograms for the total fish count at each site.
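The latitude/longitude-band check from Exercise 1 can be sketched as follows. This is an illustrative Python version, not the R code used for the analysis: it assumes sites are binned into equal-width bands of one UTM coordinate, trout counts are summed within each band, and the band totals are treated as an ordered series. The names `band_totals` and `acf` are hypothetical; the `acf` estimator is the standard sample autocorrelation (lag-k autocovariance divided by lag-0 autocovariance), which is what R's `acf()` reports by default.

```python
import numpy as np

def band_totals(coord, counts, n_bands):
    """Sum fish counts within equal-width bands of one coordinate (e.g. UTM northing).

    Hypothetical helper: band k covers [edge_k, edge_{k+1}); the maximum
    coordinate value is placed in the last band.
    """
    coord = np.asarray(coord, dtype=float)
    counts = np.asarray(counts, dtype=float)
    edges = np.linspace(coord.min(), coord.max(), n_bands + 1)
    # np.digitize returns 1..n_bands+1 here; shift to 0-based and clip the top edge
    idx = np.clip(np.digitize(coord, edges) - 1, 0, n_bands - 1)
    return np.bincount(idx, weights=counts, minlength=n_bands)

def acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag.

    r_k = sum_{t=1}^{n-k} (x_t - mean)(x_{t+k} - mean) / sum_t (x_t - mean)^2
    """
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)
    n = len(x)
    return np.array([np.sum(d[: n - k] * d[k:]) / denom for k in range(max_lag + 1)])

# Hypothetical usage with made-up site data (UTM northing in m, redband trout count)
northing = [4_640_000, 4_642_500, 4_655_000, 4_661_000, 4_670_000, 4_679_000]
trout = [12, 0, 3, 25, 7, 0]
series = band_totals(northing, trout, n_bands=5)
r = acf(series, max_lag=3)  # r[k]: correlation between band totals k bands apart
```

Under this setup, lag k means "k bands apart", i.e. a north-south separation of k times the band width; running the same code on eastings gives the longitude version.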

Results

Exercise 1

Using the three interpolation methods, I produced three maps predicting the presence of redband trout at unsampled points throughout the Goose Lake basin (Figure 1). The maps suggest that redband trout are clustered at several distinct locations around the basin.
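Of the three methods, IDW is the simplest to write out: the prediction at an unsampled point is a weighted average of the observed counts, with weights proportional to 1/d^p. The NumPy sketch below is illustrative only and is not necessarily how the Figure 1 maps were produced; it assumes projected UTM coordinates in meters, straight-line distance, and a power p = 2, a common default. The function name `idw` and the sample data are hypothetical.

```python
import numpy as np

def idw(xy_obs, z_obs, xy_new, power=2.0):
    """Inverse distance weighted prediction at new points.

    z_hat(s0) = sum_i w_i z_i / sum_i w_i, with w_i = 1 / d(s0, s_i)**power.
    Distances are straight-line (Euclidean), so coordinates should be projected.
    """
    xy_obs = np.asarray(xy_obs, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)
    xy_new = np.atleast_2d(np.asarray(xy_new, dtype=float))
    # Pairwise distances, shape (n_new, n_obs)
    d = np.linalg.norm(xy_new[:, None, :] - xy_obs[None, :, :], axis=2)
    out = np.empty(len(xy_new))
    for i, row in enumerate(d):
        exact = row == 0
        if exact.any():
            # IDW is an exact interpolator: return the observed value at a sampled site
            out[i] = z_obs[exact][0]
        else:
            w = 1.0 / row**power
            out[i] = np.sum(w * z_obs) / np.sum(w)
    return out

# Hypothetical usage: predict trout counts on a coarse grid from three made-up sites
sites = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)]
counts = [10, 0, 4]
gx, gy = np.meshgrid(np.linspace(0, 1000, 5), np.linspace(0, 1000, 5))
grid = np.column_stack([gx.ravel(), gy.ravel()])
surface = idw(sites, counts, grid).reshape(gx.shape)
```

Because the distance is measured in a straight line across land and the lake, this sketch ignores the stream network entirely, which is exactly the kind of limitation a network-based approach such as the torgegram is meant to address.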

Figure 1. Interpolation maps predicting the presence of redband trout at unsampled points throughout the basin

Exercise 2

Following the steps described above, I produced two bar graphs showing land cover in the upstream reaches of each site (Figures 2 and 3). These plots show large differences in upstream land cover between sites with high and low numbers of trout. Low-trout sites have large areas of hay/pasture land upstream, while none of the high-trout sites have any hay/pasture land upstream. Low-trout sites also have more cultivated crops and developed land upstream than high-trout sites. This suggests that some aspect of hay/pasture and developed land makes downstream reaches less hospitable to redband trout; possible mechanisms include fertilizer runoff, pollution, or contamination from livestock waste. Shrub/scrub land also appears more prevalent upstream of the low-trout sites. I hypothesize that this is because shrub/scrub provides less canopy cover than evergreen forest, leading to higher stream temperatures and poorer trout habitat.
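Land-cover summaries like those in Figures 2 and 3 amount to tallying raster cells by class inside each upstream polygon. A minimal NumPy sketch follows. It assumes the land cover is a categorical raster and that each hand-drawn polygon has already been rasterized to a boolean mask on the same grid; the NLCD-style class codes, the function names, and the tiny raster are hypothetical, since the source does not name the land-cover product used.

```python
import numpy as np

# Illustrative class codes assuming an NLCD-style product; the source does not
# say which land-cover dataset was used, so treat this mapping as an assumption.
CLASS_NAMES = {21: "Developed, Open Space", 42: "Evergreen Forest", 52: "Shrub/Scrub",
               81: "Pasture/Hay", 82: "Cultivated Crops"}

def cover_fractions(landcover, upstream_mask):
    """Fraction of cells in each land-cover class inside one upstream polygon.

    landcover: 2-D integer array of class codes.
    upstream_mask: boolean array of the same shape, True inside the polygon.
    In a projected raster every cell has equal area, so cell fractions are area fractions.
    """
    cells = landcover[upstream_mask]
    classes, n = np.unique(cells, return_counts=True)
    return {int(c): float(k / cells.size) for c, k in zip(classes, n)}

def labelled(fractions):
    """Swap class codes for readable names where the mapping knows them."""
    return {CLASS_NAMES.get(c, str(c)): f for c, f in fractions.items()}

def group_mean(fraction_dicts, code):
    """Mean fraction of one class across a group of sites (a missing class counts as 0)."""
    return float(np.mean([f.get(code, 0.0) for f in fraction_dicts]))

# Hypothetical usage with a tiny made-up raster and one polygon mask
lc = np.array([[81, 81, 42],
               [52, 42, 42]])
mask = np.array([[True, True, False],
                 [True, True, False]])
site_a = cover_fractions(lc, mask)  # {42: 0.25, 52: 0.25, 81: 0.5}
```

Comparing `group_mean` for a class such as Pasture/Hay between the high-trout and low-trout groups reproduces the kind of contrast described above; a standardized buffer would simply swap in a different mask.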

Exercise 3

None of the semivariograms I produced indicated spatial autocorrelation among the sites sampled in 2007 (Figures 5 and 6); none showed a discernible trend in semivariance with distance.

Figure 5. All-directional semivariogram
Figure 6. N-S and E-W directional semivariograms

In a semivariogram that indicates spatial autocorrelation, semivariance starts near 0 at short distances and increases clearly with distance (Figure 7).
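For reference, the empirical semivariogram plots γ(h) = (1 / 2N(h)) Σ (z_i − z_j)² over the N(h) site pairs separated by roughly distance h. The NumPy sketch below computes it for one variable (for example, one species' counts), either in all directions or within an angular window around a chosen azimuth. The azimuth convention (degrees clockwise from north, 0 = N-S, 90 = E-W) is intended to match gstat's `alpha` argument, but the binning scheme, the 22.5° tolerance, the function name, and the sample data are assumptions of this sketch, not settings taken from the analysis.

```python
import numpy as np

def semivariogram(xy, z, bin_width, n_bins, direction=None, tol=22.5):
    """Empirical semivariogram: gamma(h) = sum over pairs in bin of (z_i - z_j)**2 / (2 N(h)).

    Bin k holds pairs with separation in [k*bin_width, (k+1)*bin_width).
    direction: azimuth in degrees clockwise from north (0 = N-S, 90 = E-W);
    None gives the all-directional version. tol: half-width of the angular window.
    """
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)            # every pair of sites once
    dx = xy[j, 0] - xy[i, 0]
    dy = xy[j, 1] - xy[i, 1]
    h = np.hypot(dx, dy)
    sq = (z[i] - z[j]) ** 2
    keep = np.ones(h.shape, dtype=bool)
    if direction is not None:
        # Pair azimuth folded into [0, 180): a pair and its reverse point the same way
        az = np.degrees(np.arctan2(dx, dy)) % 180.0
        diff = np.abs(az - direction % 180.0)
        diff = np.minimum(diff, 180.0 - diff)      # angular distance on the folded circle
        keep = diff <= tol
    b = (h // bin_width).astype(int)
    keep &= b < n_bins
    gamma = np.full(n_bins, np.nan)
    npairs = np.zeros(n_bins, dtype=int)
    for k in range(n_bins):
        sel = keep & (b == k)
        npairs[k] = sel.sum()
        if npairs[k] > 0:
            gamma[k] = sq[sel].mean() / 2.0
    lags = (np.arange(n_bins) + 0.5) * bin_width   # bin midpoints
    return lags, gamma, npairs

# Hypothetical usage for one species' counts at made-up sites (coordinates in m)
sites = [(0, 0), (500, 0), (0, 800), (900, 900), (1500, 200)]
counts = [3, 0, 5, 1, 0]
lags, g_all, n_all = semivariogram(sites, counts, bin_width=500, n_bins=4)
lags, g_ns, n_ns = semivariogram(sites, counts, bin_width=500, n_bins=4, direction=0)
```

Under spatial autocorrelation, γ(h) would be small in the first bins and rise with h; a flat series suggests no spatial structure at the lags considered. Checking `npairs` matters with only a few dozen sites per drainage, because bins with very few pairs give unstable estimates.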

Figure 7. Model semivariogram

What was learned from each analysis?

  • From Exercise 1, I learned that interpolation is a useful way to visualize the likely distribution of redband trout across the entire Goose Lake basin. However, it was awkward to apply here: the survey sites were in tributaries, but the interpolation had to cover the entire map area, including land, tributaries, and Goose Lake itself. It would be interesting to repeat this analysis using the torgegram method.
  • From Exercise 2, I learned that neighborhood analysis works well for fine-scale analysis of a specific area of land (an irregular polygon surrounding the upstream reaches), as opposed to a standardized buffer around each point. I also learned that this approach may have introduced some inaccuracies, because the polygon layers around the upstream reach areas were drawn freehand.
  • From Exercise 3, I learned that directional semivariograms work well for analyzing spatial autocorrelation in data with a single x variable, a single y variable, and a single z variable. I was able to restructure my data to fit this format, but the approach is not well suited to analyzing spatial autocorrelation among sites with many z variables.

Significance

Identifying where native fish occur in the Goose Lake basin, and why, matters to science and to resource management: it can inform state and federal managers about the population status of at-risk native fish species, while an assessment of habitat quality will support actionable management outcomes (such as restoration efforts). Analyzing the data collected in 2007 also helps me set up the sampling design and site selection for my field season this summer.

Software Learning

I learned where to find and download publicly available raster and vector datasets (such as watershed boundaries and elevation layers) and how to import them into my analysis. I learned about the torgegram as a method for characterizing spatial dependence among observations of a variable on a stream network, and about the pros and cons of the trend, spline, and IDW interpolation methods. I learned how to run an autocorrelation function and interpret what its lags represent, how to run a neighborhood analysis, and how to use ggplot2 in R to create plots that communicate results clearly. Finally, I learned how to use the gstat package in R to create semivariograms for investigating spatial autocorrelation between points.

Statistics Learning

I learned about the importance of keeping potential statistical power in mind when selecting sites for a study. When time and budget limit the number of sites that can be visited, sites must be chosen deliberately to maximize both the spatial extent of the study and the statistical power of the conclusions we will be able to draw.

Evolving Question

My original question was to explore abundance and distribution estimates collected during native fish surveys in 2007 in the Goose Lake basin. The analyses I ran during the course led me to refine it into several questions:

  • What features of different land use types (pollution, shade cover, fertilizer runoff, etc.) lead them to influence fish numbers at downstream sites?
  • What do Bray-Curtis dissimilarity vs. distance semivariograms look like for the 2007 sites?
  • Is there a negative correlation between the numbers of invasive and native species at each site?
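The Bray-Curtis dissimilarity between two sites with species counts u and v is Σ|u_k − v_k| / Σ(u_k + v_k), which ranges from 0 (identical composition) to 1 (no species shared). A minimal sketch of pairing it with inter-site distance follows, assuming a site-by-species count matrix and projected UTM coordinates; the function names and sample data are hypothetical, and `scipy.spatial.distance.braycurtis` provides the same dissimilarity if SciPy is available.

```python
import numpy as np

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two species-count vectors.

    BC = sum_k |u_k - v_k| / sum_k (u_k + v_k); 0 = identical, 1 = no species shared.
    Undefined (returned as NaN) when neither site has any fish.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    total = np.sum(u + v)
    if total == 0:
        return float("nan")
    return float(np.sum(np.abs(u - v)) / total)

def dissimilarity_vs_distance(xy, counts):
    """Distance and Bray-Curtis dissimilarity for every pair of sites.

    xy: (n_sites, 2) projected coordinates; counts: (n_sites, n_species) matrix.
    """
    xy = np.asarray(xy, dtype=float)
    counts = np.asarray(counts, dtype=float)
    i, j = np.triu_indices(len(xy), k=1)
    dist = np.hypot(xy[i, 0] - xy[j, 0], xy[i, 1] - xy[j, 1])
    bc = np.array([bray_curtis(counts[a], counts[b]) for a, b in zip(i, j)])
    return dist, bc

# Hypothetical usage: three made-up sites, three species
sites = [(0, 0), (3000, 4000), (6000, 0)]
counts = [[6, 7, 4], [10, 0, 6], [0, 2, 0]]
dist, bc = dissimilarity_vs_distance(sites, counts)
```

Averaging `bc` within distance classes and plotting the means against distance gives a dissimilarity-vs-distance curve; a rising curve would indicate that nearby sites share more similar fish communities.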

Future Techniques

My next analyses follow the questions I plan to explore for my master's thesis. First, I would like to complete a geographically weighted regression to investigate whether invasive and native species are negatively correlated in the system. I would also like to put what I learned about the torgegram into practice by applying it to this dataset to investigate spatial correlation along the stream networks. Finally, to extend my semivariogram analysis, I want to construct a Bray-Curtis dissimilarity vs. distance curve for all of my sites.
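A geographically weighted regression fits a separate weighted least-squares regression at each site, down-weighting distant sites with a kernel, so the slope linking invasive and native counts is allowed to vary across the basin. The sketch below is a bare-bones Gaussian-kernel version in NumPy. It assumes one predictor (e.g. invasive fish count) and one response (e.g. native fish count) per site, straight-line distances between UTM coordinates, and a fixed bandwidth supplied by the user; in practice the bandwidth is usually chosen by cross-validation or AICc, and count data might call for a Poisson GWR instead. The function name and the sample data are hypothetical.

```python
import numpy as np

def gwr_local_coefs(xy, x, y, bandwidth):
    """Local intercept and slope of y = b0 + b1 * x at every site.

    Each site gets its own weighted least-squares fit, using Gaussian kernel
    weights w_j = exp(-0.5 * (d_ij / bandwidth)**2) on all sites j.
    """
    xy = np.asarray(xy, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])          # design matrix [1, x]
    coefs = np.empty((len(y), 2))
    for i in range(len(y)):
        d = np.hypot(xy[:, 0] - xy[i, 0], xy[:, 1] - xy[i, 1])
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        XtW = X.T * w                                  # equals X.T @ diag(w)
        coefs[i] = np.linalg.solve(XtW @ X, XtW @ y)   # solves (X'WX) b = X'Wy
    return coefs

# Hypothetical usage with made-up counts at five sites (coordinates in m)
sites = [(0, 0), (2000, 0), (4000, 0), (0, 3000), (3000, 3000)]
invasive = [0, 4, 10, 1, 6]
native = [20, 15, 3, 18, 9]
local = gwr_local_coefs(sites, invasive, native, bandwidth=3000.0)
slopes = local[:, 1]  # one local invasive-vs-native slope per site
```

Consistently negative local slopes would be consistent with the hypothesized negative relationship, while slopes that change sign across the basin would point to local differences worth examining.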