1. The research question that you asked.

Understanding spatial patterns in how disease spreads through agricultural fields is important: it helps identify what causes an infection and how to prevent further spread. Originally, the idea was to study Red Blotch disease in vineyards, but due to delays no data were available. With the same goal in mind, I looked at NDVI (Normalized Difference Vegetation Index) data for agricultural fields and tried to understand the ‘spreading’ of NDVI, in order to develop methods for analyzing spreading of any sort in remotely sensed data over time.

In my ‘updated’ spatial problem I looked at:

1) the spatial and temporal autocorrelation of the NDVI parameter, and

2) the correlation between the pattern of NDVI and average biomass in agricultural fields in Manitoba, Canada.

This results in the final research question: Does patchiness in NDVI relate to biomass and does it change over time?

2. A description of the dataset you examined, with spatial and temporal resolution and extent.

I used data from about 30 fields, each around 1 km². Over the growing season, from May until September, there is RapidEye-derived NDVI at a spatial resolution of 6.5 by 6.5 m. Each field has on average 15,000 pixels. The biomass data are from the SMAPVEX12 study and contain the wet biomass averaged per field.

3. Hypotheses: predictions of patterns and processes you looked for.

It is expected that patchiness in NDVI is an indicator of low biomass, and therefore of low yield. A uniform NDVI implies that the crops are all in similar health, whereas patchiness could mean that some plants are suffering stress. Over time, the patchiness in the NDVI is expected to change: either the patches become more distinct, or the fields become more uniform.

4. Approaches: analysis approaches you used.

The analysis can be split into two parts. Part one focuses on quantifying the pattern of NDVI with variograms. Part two analyzes the correlation between the variogram parameters and biomass, and how the parameters change over time.

Part 1:

To assess the patchiness of the NDVI I used a variogram, which characterizes spatial autocorrelation. Three parameters of interest can be read from a variogram graph: a) the range, the lag distance beyond which the semivariance levels off and pixels are no longer spatially autocorrelated; b) the sill, the semivariance reached at the range; and c) the nugget, the semivariance at zero lag, which should be zero in theory but can deviate slightly because of errors in the data. The variogram is standardized by dividing the semivariance by the standard deviation of that field.
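To make the variogram step concrete, here is a minimal sketch of an empirical semivariogram for a gridded field, written in plain Python rather than the R I actually used. The function name `empirical_variogram`, the `scale_by` option, and the binning of lags to the nearest whole pixel are my own illustrative choices and not the exact procedure from the analysis. The default follows the standardization described above (dividing by the field's standard deviation); dividing by the variance instead is a common alternative that puts the sill near 1.

```python
import math
from collections import defaultdict


def empirical_variogram(grid, pixel_size, max_lag, scale_by="sd"):
    """Isotropic empirical semivariogram of a 2D grid (list of rows).

    Returns a sorted list of (lag_distance, semivariance) pairs.
    Pixel pairs are binned to the nearest whole-pixel distance.
    scale_by: "sd" (standard deviation), "var" (variance), or None.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    max_steps = int(max_lag / pixel_size + 1e-9)
    sq_diff_sum = defaultdict(float)
    pair_count = defaultdict(int)

    for dr in range(0, max_steps + 1):
        for dc in range(-max_steps, max_steps + 1):
            if dr == 0 and dc <= 0:
                continue  # skip the zero offset and mirrored duplicates
            steps = math.hypot(dr, dc)
            if steps * pixel_size > max_lag + 1e-9:
                continue
            lag_bin = round(steps)
            for r in range(n_rows - dr):
                for c in range(max(0, -dc), min(n_cols, n_cols - dc)):
                    diff = grid[r][c] - grid[r + dr][c + dc]
                    sq_diff_sum[lag_bin] += diff * diff
                    pair_count[lag_bin] += 1

    # Field-wide statistics used for standardization.
    values = [v for row in grid for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    if scale_by == "sd":
        scale = math.sqrt(variance)
    elif scale_by == "var":
        scale = variance
    else:
        scale = 1.0
    if scale == 0:
        scale = 1.0  # constant field: nothing to standardize

    result = []
    for lag_bin in sorted(pair_count):
        gamma = sq_diff_sum[lag_bin] / (2 * pair_count[lag_bin])
        result.append((lag_bin * pixel_size, gamma / scale))
    return result


# Hypothetical toy field: a low-NDVI patch on the left, high on the right.
toy_field = [[0.2] * 5 + [0.8] * 5 for _ in range(10)]
for lag, gamma in empirical_variogram(toy_field, pixel_size=6.5, max_lag=40.0):
    print(f"lag {lag:5.1f} m  standardized semivariance {gamma:.3f}")
```

On a real field of roughly 15,000 pixels this pure-Python loop would be slow, so it is meant only to show what the variogram measures, not as a replacement for a geostatistics package.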

Part 2:

We compare the variograms using scatterplots of i) range against biomass, ii) sill against biomass, and iii) range against sill, color-coded by biomass. For each scatterplot, Pearson’s correlation coefficient r and its p-value are determined. Next, a time component is added: for 5 time steps the correlation between range and sill is determined and its development over time is analyzed (there was no biomass data for these dates).
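As an illustration of the correlation step, the sketch below computes Pearson’s r in plain Python and estimates a two-sided p-value with a permutation test. The permutation test is a stand-in I chose because it needs no statistics library; it is not necessarily how the p-values in this post were obtained, and the standard t-based test will give somewhat different numbers. The function names and the example values are invented placeholders, not project data.

```python
import math
import random


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: the share of shuffles of y whose |r| is at
    least as large as the observed |r| (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Placeholder values for illustration only (not measurements).
example_range_m = [40.0, 55.0, 60.0, 75.0, 90.0, 110.0]
example_biomass = [3.1, 2.4, 2.9, 2.2, 2.5, 1.8]
r = pearson_r(example_range_m, example_biomass)
p = permutation_p_value(example_range_m, example_biomass)
print(f"r = {r:.2f}, permutation p = {p:.3f}")
```

With only a handful of fields per crop, a permutation test has the advantage of making no assumption about the distribution of the variogram parameters.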

5. Results: what did you produce — maps? statistical relationships? other?

Part 1:

In the variogram plot, Figure 1, we can see that there is high inter-field variability in the spatial autocorrelation of NDVI. It is difficult to tell from the graph whether there is a correlation between biomass and variogram type. Also, crop type differs between the fields, which has a considerable influence on the biomass. For further analysis, a distinction between crop types should be made.

Part 2:

Repeating the analysis for a single crop, wheat, does not improve the results. Pearson’s correlation coefficients are low and not significant: -0.31 for Range vs Biomass (p-value 0.3461) and 0.35 for Sill vs Biomass (p-value 0.2922). In the scatterplot color-coded by biomass, no pattern is visible.

The sill-range correlation appears to increase over time: across the 5 time steps this trend has a correlation coefficient of 0.88 with a p-value of 0.047, which is significant at the 5% level. One hypothesis is that this is caused by the development of the shapes of the variograms. At the beginning of the growing season the patterns are ill-defined, which makes the variograms erratic, and the subjective reading of the variograms may amplify this. During the growing season the patterns become more distinct, and the variograms approach the shape of a theoretical variogram. For a perfect variogram one expects a perfect correlation of 1 between the sill and the range.

From the two plots showing the change of range and sill over time per field we cannot draw any conclusions. There is no trend visible.

6. Significance. What did you learn from your results? How are these results important to science? to resource managers?

This ‘spatial problem’ has been useful for exploring methods that quantify spatial autocorrelation and relate it to other parameters. The approach could be used, for example, to understand and predict the spread of diseases in agricultural fields.

7. Your learning: what did you learn about software (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) other?

I already had some experience with the software mentioned, so I didn’t improve my skills significantly. The only software I was not comfortable with was R, which I used to calculate and visualize the variograms. The language is not much different from Matlab and Python.

8. What did you learn about statistics, including (a) hotspot, (b) spatial autocorrelation (including correlogram, wavelet, Fourier transform/spectral analysis), (c) regression (OLS, GWR, regression trees, boosted regression trees), and (d) multivariate methods (e.g., PCA)?

This class was an eye-opener for me regarding statistics. I investigated Moran’s I, local Moran’s I, variograms, and Pearson’s correlation coefficient. I had never used a variogram before, and it gave me insight into spatial autocorrelation. It was a way to quantify the patterns in the fields. For my future research, I’m interested in Principal Component Analysis.

9. In addition, your final blog post should include a section in which you describe how you have responded to (a) the comments you received from your fellow students on Tutorial 1 and Tutorial 2 and (b) comments on Canvas from Dr. Jones on your exercises 1, 2, and 3.

During the presentations in groups of three, we discussed the approach and analysis. One thing that came up was the subjectivity of reading the variograms by eye and using those readings for correlations. I considered fitting a variogram model and extracting the sill and range automatically, but this was too complicated and still needed my subjective input. Another comment was to use more data; however, I calculated the significance of the correlation coefficients, which indicated that there was enough data.
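For anyone who wants to try the automatic extraction I considered, a rough sketch of one way to do it follows. It assumes a spherical variogram model with zero nugget (my assumption; other model shapes exist) and finds the range by trying a list of candidate values, solving for the best-fitting sill at each one by least squares. The function names and the candidate list are hypothetical, and the choice of model and candidates is still a subjective input.

```python
def spherical_shape(lag, range_m):
    """Spherical model with unit sill and zero nugget."""
    if lag >= range_m:
        return 1.0
    ratio = lag / range_m
    return 1.5 * ratio - 0.5 * ratio ** 3


def fit_spherical(lags, gammas, candidate_ranges):
    """Grid search over the range; the sill has a closed-form
    least-squares solution once the range is fixed."""
    best = None
    for range_m in candidate_ranges:
        shape = [spherical_shape(h, range_m) for h in lags]
        denom = sum(s * s for s in shape)
        if denom == 0:
            continue
        sill = sum(g * s for g, s in zip(gammas, shape)) / denom
        sse = sum((g - sill * s) ** 2 for g, s in zip(gammas, shape))
        if best is None or sse < best["sse"]:
            best = {"sill": sill, "range": range_m, "sse": sse}
    return best


# Synthetic check: build a variogram from known parameters, then refit it.
lags = [6.5 * k for k in range(1, 21)]
true_sill, true_range = 0.8, 50
gammas = [true_sill * spherical_shape(h, true_range) for h in lags]
fit = fit_spherical(lags, gammas, candidate_ranges=range(10, 151, 5))
print(fit["range"], round(fit["sill"], 3))
```

Applied to every field and date, a fit like this would at least make the sill and range readings repeatable, even if it does not remove the judgment involved in choosing the model.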

Dr. Jones advised me to standardize the variograms so that they can be compared between fields.