## Question Asked:

The question I asked in this exercise was: **What degree and timing of temporal autocorrelation exist for light use efficiency (LUE) and gross primary production (GPP)** derived from eddy covariance flux data, and **how do the degree and timing differ** between a C3 and a C4 grassland site?

I aim to use an autocorrelation analysis to assess temporal patterns in these production indices, and to assess whether they reflect distinct seasonality at the C3 and C4 sites. That is, if the production indices show distinct autocorrelation patterns, that may indicate that production at the two sites is responding to distinct environmental drivers. Because this is an autocorrelation analysis and not a cross-correlation analysis, I am not quantifying the relationships between production and environmental drivers in this exercise.

## Tool or approach used:

I used the R function stats::acf() to compute an autocorrelation function for my time series of production indices.
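
For reference, the estimator that stats::acf() computes by default can be written as follows, where $x_1, \dots, x_n$ is the series in the order supplied, $\bar{x}$ is its mean, and $k$ is the lag. The notation is mine and is not taken from the exercise:

$$
c_k = \frac{1}{n} \sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x}), \qquad r_k = \frac{c_k}{c_0}
$$

The lag $k$ counts positions in the vector, so it corresponds to calendar time only when the observations are evenly spaced and no rows have been removed.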

## Steps followed to complete the analysis:

My data are derived from eddy covariance flux tower measurements, which are taken every 30 minutes. My data are from two grasslands: Konza Prairie Biological Station (US-Kon), which is 99% C4 grass, and the University of Kansas Field Station, which is 75% C3 grass. I obtained gap-filled, 30-minute resolution data from the site PIs for 2008-2015.

From the 30-minute resolution data, I calculated GPP and LUE at daily, monthly, and annual time steps. I first summed the EC flux measurements of GPP over each day and converted them to units of gC/m^2/day. I calculated LUE as the total daily GPP divided by the total daily photosynthetic photon flux density (PPFD), in units of gC/MJ/m^2/day. I then smoothed the daily series using a 7-day rolling mean and filtered it to remove extreme values that are artifacts of pre-processing or instrument error.
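
The daily aggregation described above could be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code used in the exercise. The column names, the raw units (umol m-2 s-1 for both GPP and PPFD), the PAR conversion of about 4.57 umol photons per joule, and the outlier threshold are all assumptions introduced for the example.

```r
# Synthetic half-hourly data for 14 days. Column names and raw units are
# hypothetical; the real tower files may differ.
ts30 <- seq(as.POSIXct("2008-06-01 00:00", tz = "UTC"),
            by = "30 min", length.out = 48 * 14)
hour  <- as.numeric(format(ts30, "%H")) + as.numeric(format(ts30, "%M")) / 60
light <- pmax(0, sin(pi * (hour - 6) / 12))      # zero at night
flux <- data.frame(
  timestamp = ts30,
  gpp_umol  = 20 * light,     # assumed umol CO2 m-2 s-1
  ppfd_umol = 1800 * light    # assumed umol photons m-2 s-1
)

# Unit conversions per 30-minute step
sec_per_step <- 1800
gC_per_umol  <- 12.011e-6          # g C per umol CO2
MJ_per_umol  <- 1 / 4.57 / 1e6     # ASSUMPTION: ~4.57 umol photons per J of PAR

flux$date   <- as.Date(flux$timestamp, tz = "UTC")
flux$gpp_gC <- flux$gpp_umol  * sec_per_step * gC_per_umol
flux$par_MJ <- flux$ppfd_umol * sec_per_step * MJ_per_umol

# Daily totals, then LUE as the ratio of the daily sums
# (gC per MJ of PAR under the conversion assumed above)
daily <- aggregate(cbind(gpp_gC, par_MJ) ~ date, data = flux, FUN = sum)
daily$lue <- daily$gpp_gC / daily$par_MJ

# Centered 7-day rolling mean; the first and last 3 days become NA
roll7 <- function(x) as.numeric(stats::filter(x, rep(1 / 7, 7), sides = 2))
daily$gpp_smooth <- roll7(daily$gpp_gC)
daily$lue_smooth <- roll7(daily$lue)

# Flag extreme values as NA (hypothetical threshold) instead of dropping rows,
# so that every calendar day keeps its position in the series
lue_max <- 3
daily$lue_smooth[which(daily$lue_smooth > lue_max)] <- NA
```

Marking filtered values as NA rather than deleting the rows is a design choice made for this sketch; it keeps one row per calendar day.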

For the monthly time interval, I calculated the monthly average of the daily values of GPP and LUE by grouping the data by month and taking the mean of all values in that month.

For the annual time interval, I calculated the total annual GPP and an annual LUE. I summed the daily values of GPP and of PPFD over each year, and then divided total GPP by total PPFD to obtain annual LUE. This yields GPP in gC/m^2/year and LUE in gC/MJ/m^2/day.
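
The monthly and annual aggregation steps can be sketched in base R as below. The data are synthetic and the column names (`gpp_gC`, `par_MJ`, `lue`) are hypothetical. The sketch also computes the annual mean of daily LUE next to the ratio of annual sums, since the two definitions generally give different numbers.

```r
set.seed(42)
dates <- seq(as.Date("2008-01-01"), as.Date("2009-12-31"), by = "day")
doy   <- as.numeric(format(dates, "%j"))

# Synthetic daily values (hypothetical columns, not real tower data)
daily <- data.frame(
  date   = dates,
  gpp_gC = pmax(0, 6 * sin(pi * (doy - 90) / 180)) + runif(length(dates), 0, 0.5),
  par_MJ = 6 + 4 * sin(2 * pi * (doy - 80) / 365)
)
daily$lue   <- daily$gpp_gC / daily$par_MJ
daily$month <- format(daily$date, "%Y-%m")   # year-month grouping key
daily$year  <- format(daily$date, "%Y")

# Monthly means of the daily values
monthly <- aggregate(cbind(gpp_gC, lue) ~ month, data = daily, FUN = mean)

# Annual totals, and annual LUE as the ratio of the annual sums
annual <- aggregate(cbind(gpp_gC, par_MJ) ~ year, data = daily, FUN = sum)
annual$lue_ratio <- annual$gpp_gC / annual$par_MJ

# For comparison: the annual mean of daily LUE (a different quantity)
annual$lue_mean_daily <- aggregate(lue ~ year, data = daily, FUN = mean)$lue
```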

However, the quality of these data varies widely, due to environmental inconsistency and equipment error. Thus, even though the instruments are ostensibly recording data every 30 minutes, after post-processing and gap-filling I still end up with entire days, and sometimes months, of data missing.

For each dataset (daily, monthly, and annual GPP and LUE), I used the R stats::acf() function to compute estimates of the autocorrelation function.
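
As a rough sketch of this step, the same call can be applied to each resolution in turn. The series below are synthetic, and the `lag_max` values are illustrative choices rather than the ones used for the figures.

```r
set.seed(7)
dates <- seq(as.Date("2008-01-01"), as.Date("2015-12-31"), by = "day")
doy   <- as.numeric(format(dates, "%j"))

# Synthetic daily GPP with an annual cycle plus noise
gpp <- 3 + 3 * sin(2 * pi * (doy - 80) / 365) + rnorm(length(dates), sd = 0.5)

series <- list(
  daily   = gpp,
  monthly = as.numeric(tapply(gpp, format(dates, "%Y-%m"), mean)),
  annual  = as.numeric(tapply(gpp, format(dates, "%Y"), sum))
)

# Hypothetical maximum lags, in units of each series' own time step
lag_max <- c(daily = 800, monthly = 40, annual = 7)

acfs <- Map(function(x, k) stats::acf(x, lag.max = k, plot = FALSE),
            series, lag_max)

# Autocorrelation at a lag of about one year in the daily and monthly series
# (element 1 of $acf is lag 0, so lag k sits at index k + 1)
r_daily_1yr   <- acfs$daily$acf[365 + 1]
r_monthly_1yr <- acfs$monthly$acf[12 + 1]
```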

## Results:

Fig. 1: A plot of autocorrelation versus lag, for daily total GPP and LUE, for each site. Dashed lines represent the upper and lower bounds of the 95% confidence interval, and vertical lines mark lags of 365 and 730 days, to approximate 1- and 2-year lags.
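
A plot of this kind could be produced along the following lines. The series is synthetic and the plot title is a placeholder. The dashed bounds drawn by plot() for an acf object are, by default, the white-noise bounds at plus or minus qnorm(0.975) / sqrt(n), which the sketch also computes explicitly.

```r
set.seed(3)
n <- 3 * 365
x <- sin(2 * pi * seq_len(n) / 365) + rnorm(n, sd = 0.3)   # synthetic daily series

a <- stats::acf(x, lag.max = 800, plot = FALSE)

# Approximate 95% bounds under a white-noise null hypothesis
ci <- qnorm(0.975) / sqrt(a$n.used)

plot(a, ci = 0.95, main = "Daily GPP (synthetic)")
abline(v = c(365, 730), lty = 3)    # 1- and 2-year reference lags

# Lags whose autocorrelation falls outside the bounds
sig_lags <- a$lag[abs(a$acf) > ci]
```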

For the daily data, the autocorrelation functions show distinct patterns at the two sites. These plots appear to reflect distinct seasonality of production between the C3 and the C4 sites.

However, I'm not certain whether the different timing of autocorrelation instead reflects data that are missing due to filtering: I excluded values that fell outside thresholds I had set, rather than setting them to 0 or NA. If dates are missing from the series, the remaining observations are treated as consecutive, which would affect the autocorrelation.
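
The concern can be illustrated with a small synthetic experiment: a noise-free annual cycle, with and without a block of days removed each year. The removed window (day of year 150 to 240) is arbitrary and chosen only for the demonstration.

```r
# Three non-leap years of a pure annual cycle
dates <- seq(as.Date("2009-01-01"), as.Date("2011-12-31"), by = "day")
doy   <- as.numeric(format(dates, "%j"))
x     <- sin(2 * pi * (doy - 80) / 365)

# Drop the same 91 days from every year, as row-wise filtering would
keep   <- !(doy >= 150 & doy <= 240)
x_drop <- x[keep]

# Lag (within a search window) at which the autocorrelation peaks
peak_lag <- function(a, lags = 200:400) lags[which.max(a$acf[lags + 1])]

a_full <- stats::acf(x,      lag.max = 400, plot = FALSE)
a_drop <- stats::acf(x_drop, lag.max = 400, plot = FALSE)

peak_full <- peak_lag(a_full)   # close to 365 rows, i.e. one calendar year
peak_drop <- peak_lag(a_drop)   # close to the number of rows kept per year
```

In the filtered series the annual peak should move from roughly 365 to roughly the number of rows retained per year, even though the underlying seasonality is unchanged.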

Fig. 2: Plot of autocorrelation vs. lag for the monthly average of daily GPP and LUE. Dashed lines represent 95% confidence intervals for the autocorrelation function. Vertical lines represent lags of 12, 24, and 36 months.

For the monthly data, in contrast to the daily data, the autocorrelation of the production indices is very similar between the two sites.

Fig. 3: Plot of autocorrelation vs. lag for total annual GPP. Dashed lines represent 95% confidence intervals for the autocorrelation function, colored for each site.

Similar to the monthly averages of daily GPP and LUE, the autocorrelation of total annual GPP and annual LUE shows similar patterns at the two sites.

The daily, monthly, and annual datasets suggest different conclusions about the temporal autocorrelation of the data. Because the data are filtered heavily to exclude outlying values, some days and even months of data are missing and are not accounted for in the time series of observations.

## Critique of the method:

Because the autocorrelation function calculates lags from the order of the observations that are *present*, and does not use a date-time field, it cannot accurately represent missing data or irregular observation intervals. The unseen, missing data, particularly in the daily dataset, appear to be causing the offset in the autocorrelation function between the two sites. If the sites are missing data from different times, and are missing different numbers of observations at those times, then their autocorrelation functions will look different even if the underlying seasonality is the same.
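
One possible workaround, sketched here on synthetic data and not something done in this exercise, is to put the observations back on a complete daily calendar with NA for the missing dates, and then call stats::acf() with na.action = na.pass so that each lag is estimated from the pairs of dates that are both present. The column name `gpp` and the removed window are hypothetical.

```r
# Synthetic daily series with the same block of days removed each year
dates <- seq(as.Date("2009-01-01"), as.Date("2011-12-31"), by = "day")
doy   <- as.numeric(format(dates, "%j"))
obs   <- data.frame(date = dates, gpp = sin(2 * pi * (doy - 80) / 365))
obs   <- obs[!(doy >= 150 & doy <= 240), ]      # rows dropped by filtering

# Rebuild a complete calendar; dates without observations become NA
calendar <- data.frame(date = seq(min(obs$date), max(obs$date), by = "day"))
regular  <- merge(calendar, obs, by = "date", all.x = TRUE)

# na.pass lets acf() compute each lag from complete pairs only
a_reg <- stats::acf(regular$gpp, lag.max = 400,
                    na.action = na.pass, plot = FALSE)

lags     <- 200:400
peak_reg <- lags[which.max(a_reg$acf[lags + 1])]   # back near 365 days
```

With the calendar restored, a lag of 365 again means 365 days at both sites, so differences between the sites' functions are less likely to be an artifact of which rows were removed. Long gaps still leave few pairs at some lags, so those estimates are noisier.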

I will continue to explore gap-filling and extrapolation methods for my data in order to compare autocorrelation on sub-month time intervals.