**Background and Research Question**

Mineral dust is the most important external source of phosphorus (P), a key nutrient controlling phytoplankton productivity and carbon uptake, to the offshore ocean (Stockdale et al., 2016). Paytan and McLaughlin (2007) emphasized that atmospheric P can be the major external supply to the offshore ocean, particularly in oligotrophic and P-limited areas of the open ocean, such as the ocean near Bermuda. The most important source of atmospheric P is desert dust, which has been estimated to supply 83% (1.15 Tg a⁻¹) of the total global sources of atmospheric P (Mahowald et al., 2008). Of that, an estimated 10% is leachable P (Stockdale et al., 2016). In addition, Saharan dust supplies a significant fraction of the P budget of the highly weathered soils of America’s tropical forests and of the oligotrophic water of the Atlantic Ocean, increasing the fertility of these ecosystems (Gross et al., 2015).

This background is my starting point for analyzing the correlation between Particulate Organic Phosphorus (POP) and Primary Production (PP) in the ocean near Bermuda. In exercise 2, I tried to answer this question, but the result was not what I expected. In exercise 3, I explored my raw data in more depth and realized that the way I divide the data strongly affects the result. I also learned a lot about time series, regression, and the auto-correlation function in R.

Because I misinterpreted my data in exercise 2, I added one more variable in exercise 3 to broaden my analysis and support my result. Previous studies showed that most phosphorus (P) and iron (Fe) are present as minerals that are not immediately soluble in water and hence not bioavailable (Eijsink et al., 2000; Shi et al., 2012). Eijsink et al. (2000) also stated that P and Fe, if deposited to the surface ocean, may pass through the photic zone with no effect on primary productivity, owing to their high settling velocity and low solubility. The photic zone has relatively low nutrient concentrations; as a result, phytoplankton may not receive enough nutrients. Moreover, several factors contribute to primary production: physical factors (temperature, hydrostatic pressure, turbulent mixing), chemical factors (oxygen and trace elements), and biological factors. Hence, I added temperature as a variable in exercise 3.

In exercise 3, I would like to find out which factors contribute to PP in the ocean near Bermuda, and what the temporal cycle of the three variables (PP, POP, and temperature) looks like.

**Tools**

Dr. Julia helped me pre-process the data in Excel. For further analysis, I used three tools in R:

- Time series function: This function plots the time series trend for each variable.
Example code: `plot.ts(POPR['PP'], main="PP depth 0-7 meter", ylab="PP")`

- Linear regression model: This function quantifies the linear relationship between two designated variables. In this exercise, the dependent variable is Primary Production and the independent variables are POP and temperature.
Example code: `POPR.lm <- lm(PP ~ POP, data = POPR)` followed by `summary(POPR.lm)`

To draw the scatter plot with the fitted line, I used the ggplot function. Example code: `ggplot(POPR, aes(x = POP, y = PP)) + geom_point() + geom_smooth(method = lm, se = FALSE)`

- Auto-correlation function: This function is used to find patterns in the data. Specifically, the autocorrelation function gives the correlation between points separated by various time lags. The autocorrelation coefficient ranges from +1 to -1, where +1 means perfectly positively related and -1 means perfectly inversely related. In the ACF chart, the dashed lines mark the boundary of statistical significance.
Example code: `acf(POPR['PP'], lag.max = 51, type = "correlation", plot = TRUE, na.action = na.contiguous, demean = TRUE)`

In this code, lag.max can be set to the number of lags we want to inspect, or to NULL to use the default. For this exercise I set lag.max according to the number of observations in my data.
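To make the reading of the ACF chart concrete, here is a minimal sketch on simulated monthly data, not the Bermuda data. The series `temp_sim`, its amplitude, and its noise level are assumptions chosen only to produce a clear annual cycle. The sketch extracts the coefficients with `plot = FALSE` and computes the approximate 95% bound that R draws as the dashed lines, which is `qnorm(0.975) / sqrt(n)` for `n` observations.

```r
# Simulated monthly series with an annual cycle (illustrative values only).
set.seed(1)
n <- 60                                   # 5 years of monthly samples
month_index <- seq_len(n)
temp_sim <- 23 + 4 * sin(2 * pi * month_index / 12) + rnorm(n, sd = 0.5)

# Autocorrelation up to 24 lags; plot = FALSE returns the coefficients.
temp_acf <- acf(temp_sim, lag.max = 24, type = "correlation", plot = FALSE)
r <- as.numeric(temp_acf$acf)             # r[1] is lag 0, r[13] is lag 12

# Approximate 95% bound, drawn as the dashed lines in the ACF chart.
bound <- qnorm(0.975) / sqrt(n)

# Lags (1 to 24) whose coefficient falls outside the dashed lines.
significant_lags <- which(abs(r[-1]) > bound)
```

With an annual cycle, the coefficient should be strongly negative near lag 6 (half a year apart) and strongly positive near lag 12 (one year apart). The coefficient at lag 0 is always 1, because every value is perfectly correlated with itself.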

**Steps of Analysis**

1. Divided the data based on the depth categories:

| Category | Depth (meter) |
|---|---|
| POP 1 | 0-7 |
| POP 2 | 9-12 |
| POP 3 | 18-22 |
| POP 4 | 37-42 |
| POP 5 | 57-62 |
| POP 6 | 77-85 |
| POP 7 | 98-102 |
| POP 8 | 108-122 |
| POP 9 | 138-144 |
| POP 10 | 158-164 |
| POP 11 | 197-200 |
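The depth categories could be assigned in R roughly as follows. This is a sketch under assumptions: the helper `assign_category` and the column name `depth_m` are hypothetical, and only the depth ranges come from the table. Because the ranges have gaps between them, a lookup against explicit bounds is used, and a depth that falls in a gap is returned as `NA`.

```r
# Depth ranges copied from the table; names below are illustrative.
depth_bins <- data.frame(
  category = paste("POP", 1:11),
  min_m = c(0, 9, 18, 37, 57, 77, 98, 108, 138, 158, 197),
  max_m = c(7, 12, 22, 42, 62, 85, 102, 122, 144, 164, 200),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the category label for each depth in meters.
# Depths outside every range (for example 8 m) give NA.
assign_category <- function(depth_m, bins = depth_bins) {
  vapply(depth_m, function(d) {
    hit <- which(d >= bins$min_m & d <= bins$max_m)
    if (length(hit) == 1) bins$category[hit] else NA_character_
  }, character(1))
}

# Made-up sample depths to show the lookup.
samples <- data.frame(depth_m = c(5, 10, 40, 100, 199, 8))
samples$category <- assign_category(samples$depth_m)
```

Keeping the bounds in one data frame means that changing how the data are divided only requires editing `depth_bins`, which makes it easier to check how sensitive the results are to the chosen categories.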

2. Regression analysis of PP vs POP and PP vs temperature. Since PP and temperature data are available only for the 0-8 meter depth, the regression is conducted only for this depth category. To perform the regression, I removed four outliers:

3. Assessing the temporal pattern of PP, temperature, and POP using the auto-correlation function and the time series function. Because of missing data for POP and temperature at 0-7 meter depth, the dates 2014/03/06, 2015/02/04, and 2015/12/13 were removed for POP, and the dates 2014/03/06 and 2014/12/11 were removed for temperature.
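How missing values are handled changes which observations enter the ACF. The sketch below uses a made-up data frame `obs` (placeholder dates and values, not the real samples) to compare two options: dropping the rows with missing POP, and `na.contiguous`, which keeps only the longest run of consecutive non-missing values.

```r
# Placeholder data: eight monthly samples with two missing POP values.
obs <- data.frame(
  date = seq(as.Date("2014-01-01"), by = "month", length.out = 8),
  POP  = c(0.012, 0.015, NA, 0.011, 0.010, 0.009, NA, 0.013)
)

# Option A: drop only the rows where POP is missing.
pop_rows <- obs[!is.na(obs$POP), ]

# Option B: na.contiguous keeps the longest unbroken run of values
# and discards everything before and after it.
pop_run <- as.numeric(na.contiguous(obs$POP))
```

In this toy example option A keeps six values while option B keeps three. Note that option A closes the gap, so a lag of 1 no longer always means one month; which trade-off is acceptable depends on how many dates are missing.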

**Result**

**1. Regression**

From figure 1, POP overall has a positive correlation with PP, and temperature has a negative correlation with PP (R square for PP vs POP is 0.3064 and for PP vs temperature is -0.183). From this analysis I infer that POP is a factor that contributes to Primary Production in the ocean near Bermuda. To look at this in more detail, I ran the regression separately for each month over the 5-year period. I did not perform the regression for January because of a lack of data. From figure 2, we can see that in February and from April to December, the pattern of correlation for PP vs POP is similar to that for PP vs temperature: the highest value is in May and the lowest values are in December and February. The notable difference occurs in March, where the value is +0.98 for PP vs POP and -0.99 for PP vs temperature.
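The overall and per-month analysis can be sketched as follows on simulated data. The data frame `sim`, its column values, and the assumed linear relation between PP and POP are all illustrative, not the Bermuda measurements. The sketch also shows how the two statistics differ: in a simple regression with an intercept, R-squared equals the square of Pearson's r, so it lies between 0 and 1, while r itself carries the sign of the relationship.

```r
# Simulated data: 5 years, February to December (no January), made-up values.
set.seed(42)
n_years <- 5
sim <- data.frame(
  month = rep(month.abb[2:12], times = n_years),
  POP   = runif(11 * n_years, min = 0.005, max = 0.03),
  stringsAsFactors = FALSE
)
sim$PP <- 2 + 150 * sim$POP + rnorm(nrow(sim), sd = 0.5)

# Overall regression with PP as the dependent variable.
fit <- lm(PP ~ POP, data = sim)
r_squared <- summary(fit)$r.squared       # between 0 and 1
r_overall <- cor(sim$PP, sim$POP)         # between -1 and 1, keeps the sign

# Pearson correlation of PP vs POP within each calendar month.
r_by_month <- sapply(split(sim, sim$month), function(d) cor(d$PP, d$POP))
r_by_month <- r_by_month[month.abb[2:12]] # reorder from alphabetical to calendar
```

The same pattern applies to PP vs temperature by swapping the column. With only about five points per month, the monthly coefficients can swing widely, so values close to +1 or -1 in a single month deserve caution.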

**2. Auto-correlation and Time Series PP, POP and Temperature depth 0-7 meter**

From figure 3, the temporal patterns of PP and POP appear similar: the highest peak occurs at the beginning of each year, except in 2012 and 2016. This is because the data start in June in 2012 and in March in 2016. The pattern is inverted for temperature, which is very low at the beginning of the year and high from June to July. To confirm this pattern, we can look at the autocorrelation (ACF) chart, where temperature is significantly related to time: the vertical lines cross the horizontal dashed lines, which means there is a repeating cycle in time for temperature.

There is also a repeating temporal pattern for POP and PP, where the highest values occur in January-March. The difference from temperature is that the temperature cycle repeats with almost the same values, which does not happen for POP and PP. The POP and PP values in January and February 2013 differ greatly from those in January and February of other years. I think that is why there is only one significant correlation line at the beginning of the ACF chart for POP and PP.

**3. Auto-correlation and Time Series of POP by depth categories**

From figure 4 we can see that the temporal pattern for lags 0-5 is similar for all depth categories, whereas over all lags only POP 1 to POP 3 (depth 0-22 meter) are similar. This suggests that deposited POP can reach 22 meters depth at about the same time.

For POP 4 to POP 6, there is no repetitive pattern in either the ACF chart or the time series chart. However, a repetitive pattern appears from POP 7 to POP 10, that is, from 98-164 meter depth.

Looking at the values for POP 11 (depth 197-200 meter), only a small amount of POP reaches this depth.

**Critique of the method – what was useful, what was not?**

Overall, exercise 3 made me realize that the way we process the data affects the result. In exercise 2, I oversimplified the process, so I had difficulty interpreting my data and the result was not what I expected.

In exercise 3 I learned a lot about the auto-correlation function and the time series function in R. In my opinion, ACF is very useful for stable data like temperature, where the repetitive pattern also repeats in value. The significance of the ACF depends on the values, so in the ACF chart for temperature (figure 3) most lags show a significant correlation over the time period (both positive and negative), which does not happen for the POP and PP data.

POP and PP have a pattern, but their values vary over the time period. Hence, although we can see the pattern in the time series and ACF charts, the significance only appears in the beginning months of 2013. In my opinion, ACF is not really suitable for assessing the significance of a cycle in unstable data like POP and PP, where the values differ greatly over the time period. However, if we just want to see the pattern, we can rely on the time series and ACF functions as well.

**References**

Gross, A., Goren, T., Pio, C., Cardoso, J., Tirosh, O., Todd, M. C., Rosenfeld, D., Weiner, T., Custódio, D., & Angert, A. (2015). Variability in Sources and Concentrations of Saharan Dust Phosphorus over the Atlantic Ocean. Environmental Science & Technology Letters, 2(2), 31–37. https://doi.org/10.1021/ez500399z

Eijsink, L. M., Krom, M. D., & Herut, B. (2000). Speciation and burial flux of phosphorus in the surface sediments of the eastern Mediterranean. American Journal of Science, 300(6), 483–503. https://doi.org/10.2475/ajs.300.6.483

Mahowald, N., Jickells, T. D., Baker, A. R., Artaxo, P., Benitez-Nelson, C. R., Bergametti, G., Bond, T. C., Chen, Y., Cohen, D. D., Herut, B., Kubilay, N., Losno, R., Luo, C., Maenhaut, W., McGee, K. A., Okin, G. S., Siefert, R. L., & Tsukuda, S. (2008). Global distribution of atmospheric phosphorus sources, concentrations and deposition rates, and anthropogenic impacts. Global Biogeochemical Cycles, 22(4). https://doi.org/10.1029/2008GB003240

Paytan, A., & McLaughlin, K. (2007). The Oceanic Phosphorus Cycle. Chemical Reviews, 107(2), 563–576. https://doi.org/10.1021/cr0503613

Stockdale, A., Krom, M. D., Mortimer, R. J. G., Benning, L. G., Carslaw, K. S., Herbert, R. J., Shi, Z., Myriokefalitakis, S., Kanakidou, M., & Nenes, A. (2016). Understanding the nature of atmospheric acid processing of mineral dusts in supplying bioavailable phosphorus to the oceans. Proceedings of the National Academy of Sciences, 113(51), 14639. https://doi.org/10.1073/pnas.1608136113

Shi, Z., Krom, M. D., Jickells, T. D., Bonneville, S., Carslaw, K. S., Mihalopoulos, N., Baker, A. R., & Benning, L. G. (2012). Impacts on iron solubility in the mineral dust by processes in the source region and the atmosphere: A review. Aeolian Research, 5, 21–42. https://doi.org/10.1016/j.aeolia.2012.03.001

jonesjuLia, This is good progress.

Your interpretation of the acf needs some work.

A positive value of the acf means that values separated by that lag (on x-axis) are positively related.

Because all of your particulate organic P acf plots have significant positive values at a lag of 1, this means that if you know the value of phosphorus in one month, you can predict the value of phosphorus in the next month. In other words, this means that there are periods of 2 (or sometimes 3) consecutive months in which values of phosphorus are either high, or low.

As you point out, the acf functions for the upper layers (POP1 to POP3) and deeper layers (POP7 to POP10) have an alternating pattern, with + and – values of the autocorrelation coefficient. This indicates that there is an approximately annual cycle of phosphorus at these depths. It is interesting that this annual cycle is less clear in depths POP5 to POP6. Why would this be?

I am confused about what data you used for the acf functions. In Figure 3, the middle right hand plot is titled POP, 0-7 m. However, this is not the same as the plot in Figure 4, which has the same label. Also, the resulting acf functions are different. What data did you use for these two analyses?