Significant others? Thinking beyond p-values in science

By Natalie Chazal, PhD student, OSU Department of Fisheries, Wildlife, & Conservation Sciences, Geospatial Ecology of Marine Megafauna Lab

Scientific inquiry relies on quantifying how certain we are of the differences we see in observations. This means that we must evaluate phenomena based on probabilities calculated from observed data, i.e., data collected through sampling efforts. Historically, p-values have served as a relatively ubiquitous tool for assessing the strength of evidence in support of a hypothesis. However, as our understanding of statistical methods evolves, so does the scrutiny surrounding the appropriateness and interpretation of p-values. Within the research community, the use of p-values for determining statistical significance has sparked considerable debate and reflection. 

What is a p-value?

To understand the debate itself, we need to understand what a p-value is. The p-value represents the probability of obtaining a result as extreme as, or more extreme than, the observed data, under the assumption that there is no true difference or relationship between groups or variables. Traditionally, a p-value below a predetermined threshold (often 0.05) is considered statistically significant, suggesting that the observed data are unlikely (i.e., less than a 5% probability) to have occurred by chance alone. Many statistical tests provide p-values, which give us a unified framework for interpretation across a range of analyses.

To illustrate this, imagine a study aimed at investigating the effects of underwater noise pollution on the foraging behavior of gray whales. Researchers collect data on the diving behavior of gray whales in both noisy and quiet regions of the ocean.

Drawings of gray whales with tags (depicted by orange shapes) in quiet areas (left) and noisy areas (right). 

In this example, the researchers hypothesize that gray whales stop foraging and ultimately change their diving behavior in response to increased marine noise pollution. The data collected from this hypothetical scenario could come from tags equipped with sensors that record diving depth, duration, and location, allowing us to calculate the exact length of time spent foraging. Data would be collected from both noisy areas (maybe near shipping lanes or industrial sites) and quiet areas (more remote regions with minimal human activity). 

To assess the significance of the differences between the two noise regimes, researchers may use statistical tests like t-tests to compare two groups. In our example, researchers use a t-test to compare the average foraging time between whales in noisy and quiet regimes. The next step would be to define hypotheses about the differences we expect to see. The null hypothesis (HN) would be that there is no difference in the average foraging time (X) between noisy and quiet areas: 

Scenario where the noisy area does not elicit a behavioral response that can be detected by the data collected by the tags (orange shapes on whales back). The lower graph shows the distribution of the data (foraging time) for the noisy and the quiet areas. The means of this data (X) are not different. 

And the alternative hypothesis (HA) would be that there is a difference between the noisy and quiet areas: 

Scenario where the noisy area elicits a behavioral response (swimming more towards the surface instead of foraging) that can be detected by the data collected by the tags (orange shapes on whales back). The lower graph shows the distribution of the data (foraging time) for the noisy and the quiet areas. The means of this data (X) are different with the noisy mean foraging time (pink) being lower than the quiet mean foraging time (blue).

For now, we will skip over the nitty gritty of a t-test and just say that the researchers get a “t-score” that quantifies how different the means (X) of the quiet and noisy areas are, relative to the variability in the data. A larger t-score indicates a bigger difference between the means, whereas a smaller t-score indicates that the means are more similar. This t-score comes with a p-value. Let’s say we get a t-score (green dot) that is associated with a p-value of 0.03, shown as the yellow area under the curve: 

The t-score is a test statistic that tells us how different the means of our observed data groups are from each other (green dot). The area under the t-distribution that is above the t-score is the p-value (yellow shaded area).

A p-value of 0.03 means that, if the null hypothesis were true (i.e., there is no difference), there would be a 3% probability of observing a difference in foraging time between noisy and quiet areas at least as large as the one we measured, purely by chance. We usually compare this p-value to a threshold value, set before looking at the results of the test, to decide whether the finding is significant. If the threshold is above our p-value, like 0.05, then we can “reject the null hypothesis” and conclude that there is a significant difference in foraging time between noisy and quiet areas (green check mark scenario). On the flip side, if the threshold we set beforehand is stricter (say 0.01) and falls below our p-value, then we will “fail to reject the null hypothesis” and conclude that we did not detect a significant difference in foraging time between noisy and quiet areas (red check mark scenario). The reason that we don’t ever “accept the null” is that we are testing an alternative hypothesis with observations; if those observations are consistent with the null rather than the alternative, this is not positive evidence for the null, because the same data could also be consistent with a different alternative hypothesis that we have not yet tested.

When our pre-set significance threshold is greater than our calculated p-value, we have enough evidence to ‘reject the null hypothesis’ (left figure), whereas when the threshold is smaller than our calculated p-value, we ‘fail to reject the null hypothesis’ (right figure).

In this example, the use of p-values helps the researchers quantify the strength of evidence for their hypothesis and determine whether the observed differences in gray whale behavior are likely to be meaningful or merely due to chance. 
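To make this concrete in code, here is a minimal sketch of the same comparison in Python using simulated foraging times. All of the numbers, sample sizes, and group means below are invented for illustration; they are not real tag data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical foraging times (minutes per dive) for tagged whales;
# these values are invented purely for illustration.
quiet = rng.normal(loc=14.0, scale=3.0, size=25)   # quiet areas
noisy = rng.normal(loc=11.5, scale=3.0, size=25)   # noisy areas

# Two-sample t-test comparing mean foraging time between the groups
t_score, p_value = stats.ttest_ind(noisy, quiet)

alpha = 0.05  # significance threshold chosen before looking at the results
print(f"t = {t_score:.2f}, p = {p_value:.3f}")
if p_value < alpha:
    print("Reject the null hypothesis: mean foraging times differ.")
else:
    print("Fail to reject the null hypothesis.")
```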

The Debate

Despite its widespread use, the reliance on p-values has been met with criticism. First, because p-values are so ubiquitous, it is easy to calculate them without enough critical thinking or interpretation. That critical thinking should include an understanding of what is biologically relevant, and should avoid the trap of reducing results to binary language like “significant” or “non-significant” instead of looking directly at the uncertainty around them. Another common misconception about p-values is that they measure the probability of the null hypothesis being true. As useful as that would be, in reality a p-value only describes the probability of our observed data under the null hypothesis. Additionally, it’s common to conflate the significance or magnitude of the p-value with effect size (the strength of the relationship between the variables). You can have a small p-value for an effect that isn’t very large or meaningful, especially if you have a large sample size. Sample size is therefore an important metric to report: a larger sample generally means more precise estimates, higher statistical power, increased generalizability, and a better chance of replication.

Furthermore, in studies that involve multiple comparisons (i.e., multiple statistical tests performed in a single study), there is an increased likelihood of false positives because each test carries its own chance of producing a significant result through random variability alone. In p-value language, a “false positive” is when you call something significant (below your p-value threshold) when it actually is not, and a “false negative” is when you call something non-significant (above the threshold) when it actually is. So, if no adjustments are made for the increased risk of false positives across multiple comparisons, the result can be inaccurate conclusions of significance.
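Here is a small sketch of how false positives pile up across many tests, using simulated data where the null hypothesis is true for every comparison. The Bonferroni adjustment at the end is just one of several possible corrections.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n_tests = 0.05, 20

# Simulate 20 comparisons where the null hypothesis is TRUE for every one:
# both groups are drawn from the same distribution.
p_values = []
for _ in range(n_tests):
    group_a = rng.normal(0, 1, 30)
    group_b = rng.normal(0, 1, 30)
    p_values.append(stats.ttest_ind(group_a, group_b).pvalue)
p_values = np.array(p_values)

# Without any correction we expect roughly alpha * n_tests = 1 false positive.
print("Uncorrected 'significant' results:", np.sum(p_values < alpha))

# A simple Bonferroni adjustment divides the threshold by the number of tests.
print("Bonferroni 'significant' results: ", np.sum(p_values < alpha / n_tests))
```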

In our example using foraging time in gray whales, we didn’t consider the context of our findings. To make this a more reliable study, we have to consider factors like the number of whales tagged (sample size!), the magnitude of noise near the tagged whales, other variables in the environment (e.g. prey availability) that could affect our results, and the ecological significance of the difference in foraging time that was found. To make robust conclusions, we need to carefully build hypotheses and study designs that will answer the questions we seek to address. We must then carefully choose the statistical tests that we use and explore how well our data align with the assumptions that these tests make. It’s essential to contextualize our results within the bounds of our study design and the broader ecological system. Finally, performing sensitivity analyses (e.g. running the same tests multiple times on slightly different datasets) helps confirm that our results are stable across a variety of model parameters and assumptions. 

In the real world, there have been many studies done on the effects of noise pollution on baleen whale behavior that incorporate multiple sources of variance and bias to get robust results that show behavioral responses and physiological consequences to anthropogenic sound stressors (Melcón et al. 2012, Blair et al. 2016, Gailey et al. 2022, Lemos et al. 2022).

Moving Beyond P-values

There has been growing interest in reassessing the role of p-values in scientific inference and publishing. Scientists appreciate p-values because they provide a single, clear numeric threshold for determining the significance of their results. However, the reality is more complicated than this binary approach. We have to explore the uncertainty around these estimates and test statistics (e.g. the t-score) and what they represent ecologically. One avenue is to focus more on effect sizes and confidence intervals as more informative measures of the magnitude and precision of observed effects. There has also been a shift towards Bayesian methods, which allow for the incorporation of prior knowledge and a more nuanced quantification of uncertainty.
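As a sketch of what reporting an effect size and confidence interval might look like for our hypothetical foraging-time data, here is the standard pooled-standard-deviation version of Cohen's d together with an equal-variance t interval for the difference in means. The data are the same invented values as above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
quiet = rng.normal(14.0, 3.0, 25)   # hypothetical foraging times (minutes)
noisy = rng.normal(11.5, 3.0, 25)

# Effect size: Cohen's d with a pooled standard deviation
n1, n2 = len(noisy), len(quiet)
pooled_sd = np.sqrt(((n1 - 1) * noisy.var(ddof=1) +
                     (n2 - 1) * quiet.var(ddof=1)) / (n1 + n2 - 2))
cohens_d = (quiet.mean() - noisy.mean()) / pooled_sd

# 95% confidence interval for the difference in means (equal-variance t interval)
diff = quiet.mean() - noisy.mean()
se = pooled_sd * np.sqrt(1 / n1 + 1 / n2)
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)
ci = (diff - t_crit * se, diff + t_crit * se)

print(f"Difference in means: {diff:.2f} min, Cohen's d: {cohens_d:.2f}")
print(f"95% CI for the difference: ({ci[0]:.2f}, {ci[1]:.2f}) min")
```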

Bayesian methods in particular are a leading alternative to p-values because, instead of asking how likely our observations are given a null hypothesis, they give us a direct probability statement about hypotheses given our data. For example, we can use a Bayes factor for our noisy vs quiet gray whale behavioral t-test (Johnson et al. 2023). A Bayes factor compares how likely the observed data are under each hypothesis separately (instead of assuming the null hypothesis is true). So if we calculate a Bayes factor of 3 in favor of the alternative hypothesis (HA), we can say that our data are 3 times more likely under the hypothesis of decreased foraging time in the noisy area than under the hypothesis of no difference between noisy and quiet areas. But that is just one example of Bayesian methods at work. The GEMM Lab uses Bayesian methods in many projects, from Lisa’s spatial capture-recapture models (link to blog) and Dawn’s blue whale abundance estimates (Barlow et al. 2018) to quantifying the uncertainty associated with drone photogrammetry data collection methods in KC’s body size models (link to blog). 
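For a rough idea of what this looks like in practice, here is a sketch that reports a Bayes factor for the hypothetical foraging-time comparison. It assumes the third-party pingouin package is installed, which returns a Bayes factor (BF10) alongside the usual t-test output; this is just one convenient option, not the method used in Johnson et al. (2023).

```python
import numpy as np
import pingouin as pg  # third-party package; assumed available

rng = np.random.default_rng(42)
quiet = rng.normal(14.0, 3.0, 25)   # hypothetical foraging times (minutes)
noisy = rng.normal(11.5, 3.0, 25)

# pingouin's t-test reports a Bayes factor (BF10) alongside the usual statistics.
result = pg.ttest(noisy, quiet)
print(result[["T", "p-val", "cohen-d", "BF10"]])

# BF10 = 3 would mean the observed data are 3 times more likely under the
# alternative hypothesis (a difference exists) than under the null (no difference).
```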

Ultimately, the debate surrounding p-values highlights the necessity of nuanced and transparent approaches to statistical inference in scientific research. Rather than relying solely on arbitrary thresholds, researchers can consider the context, relevance, and robustness of their findings. From justifying our significance thresholds to directly describing parameters based on probability, we have increasingly powerful tools to improve the methodological rigor of our studies. 

References

Agathokleous, E., 2022. Environmental pollution impacts: Are p values over-valued? Science of The Total Environment 850, 157807. https://doi.org/10.1016/j.scitotenv.2022.157807

Barlow, D.R., Torres, L.G., Hodge, K.B., Steel, D., Baker, C.S., Chandler, T.E., Bott, N., Constantine, R., Double, M.C., Gill, P., Glasgow, D., Hamner, R.M., Lilley, C., Ogle, M., Olson, P.A., Peters, C., Stockin, K.A., Tessaglia-Hymes, C.T., Klinck, H., 2018. Documentation of a New Zealand blue whale population based on multiple lines of evidence. Endangered Species Research 36, 27–40. https://doi.org/10.3354/esr00891

Blair, H.B., Merchant, N.D., Friedlaender, A.S., Wiley, D.N., Parks, S.E., 2016. Evidence for ship noise impacts on humpback whale foraging behaviour. Biol Lett 12, 20160005. https://doi.org/10.1098/rsbl.2016.0005

Brophy, C., 2015. Should ecologists be banned from using p-values? Journal of Ecology Blog. URL https://jecologyblog.com/2015/03/06/should-ecologists-be-banned-from-using-p-values/ (accessed 4.19.24).

Castilho, L.B., Prado, P.I., 2021. Towards a pragmatic use of statistics in ecology. PeerJ 9, e12090. https://doi.org/10.7717/peerj.12090

Gailey, G., Sychenko, O., Zykov, M., Rutenko, A., Blanchard, A., Melton, R.H., 2022. Western gray whale behavioral response to seismic surveys during their foraging season. Environ Monit Assess 194, 740. https://doi.org/10.1007/s10661-022-10023-w

Halsey, L.G., 2019. The reign of the p-value is over: what alternative analyses could we employ to fill the power vacuum? Biology Letters 15, 20190174. https://doi.org/10.1098/rsbl.2019.0174

Johnson, V.E., Pramanik, S., Shudde, R., 2023. Bayes factor functions for reporting outcomes of hypothesis tests. Proceedings of the National Academy of Sciences 120, e2217331120. https://doi.org/10.1073/pnas.2217331120

Lemos, L.S., Haxel, J.H., Olsen, A., Burnett, J.D., Smith, A., Chandler, T.E., Nieukirk, S.L., Larson, S.E., Hunt, K.E., Torres, L.G., 2022. Effects of vessel traffic and ocean noise on gray whale stress hormones. Sci Rep 12, 18580. https://doi.org/10.1038/s41598-022-14510-5

Lu, Y., Belitskaya-Levy, I., 2015. The debate about p-values. Shanghai Arch Psychiatry 27, 381–385. https://doi.org/10.11919/j.issn.1002-0829.216027

Melcón, M.L., Cummins, A.J., Kerosky, S.M., Roche, L.K., Wiggins, S.M., Hildebrand, J.A., 2012. Blue Whales Respond to Anthropogenic Noise. PLOS ONE 7, e32681. https://doi.org/10.1371/journal.pone.0032681

Murtaugh, P.A., 2014. In defense of P values. Ecology 95, 611–617. https://doi.org/10.1890/13-0590.1

Vidgen, B., Yasseri, T., 2016. P-Values: Misunderstood and Misused. Front. Phys. 4. https://doi.org/10.3389/fphy.2016.00006

From Bytes to Behaviors: How AI is Used to Study Whales

By Natalie Chazal, PhD student, OSU Department of Fisheries, Wildlife, & Conservation Sciences, Geospatial Ecology of Marine Megafauna Lab

In today’s media, artificial intelligence, or AI, has captured headlines that can stir up strong emotions and opinions. From promises of seemingly impossible breakthroughs to warnings of job displacement and ethical dilemmas, there is a lot of discourse surrounding AI. 

But what actually is artificial intelligence? The term artificial intelligence (or AI) was originally defined by computer scientist John McCarthy as “the science and engineering of making intelligent machines,” and it generally describes a suite of methods used to simulate human information processing. 

AI actually began in the 1950s with puzzle-solving robots and networks that identified shapes. But because the computational power required to run these complex networks was too high, and funding was cut, an “AI winter” followed for the next few decades. The 1990s brought a boom in advancement, driven by renewed interest in AI, advances in machine learning algorithms, and improved computational power. The 2010s then saw a resurgence of deep learning (a subfield of AI), enabled by the availability of large datasets and improvements in optimization algorithms. Currently, AI is being used in extremely diverse ways because of its ability to handle large quantities of unstructured data.

Figure 1. An intuitive visualization of the nested relationship between AI, machine learning, and deep learning as subdomains (Rubbens et al. 2023)

To place AI in a better context, we should clarify some of the buzzwords I’ve mentioned: artificial intelligence (AI), machine learning, and deep learning. There are a few schools of thought, but a generally accepted view is that AI is a broad category of methods and techniques for building systems that mimic human intelligence. Machine learning falls under this AI category, but rather than relying on explicitly programmed rules to make decisions, we “train” these systems so that they essentially learn from the data we provide. Lastly, deep learning falls under machine learning because it uses the same principle of “learning” from data, but does so by building neural networks.

While AI is generally rooted in computer science, statistics provides the foundation for many AI techniques. In particular, statistical learning is a combined field that adapts machine learning methods to more statistics-based settings. Trevor Hastie, a leader in statistical learning, defines the field as “a set of tools for modeling and understanding complex datasets” (Hastie et al. 2009); it is used to explore patterns in data, but within a statistical framework. 

Continuously improving methods like statistical learning and AI provide us with very powerful tools to collect data, automate processing, handle large datasets, and understand complex processes. 

How do marine mammal ecologists employ AI?

Even on small scales, marine mammal research often involves vast amounts of data collected from many different sources, including drone and satellite imagery, acoustic recordings, boat surveys, buoys, and more. New deep learning tools, such as neural networks, can perform tasks with remarkable precision and speed that traditionally had to be done painstakingly by hand. For example, researchers spend hours poring over thousands of drone images and videos to understand the behavior and health of whales. In the GEMM Lab, postdoc KC Bierlich is leading the development of AI models to automatically measure important whale metrics from these images. These advancements streamline the process of understanding whale ecology and make it easier to identify stressors that may be affecting these animals.

For photographic analyses, we can leverage Convolutional Neural Networks for tasks like feature extraction, where we can automatically get morphological measurements like body length and body area indices from drone imagery to understand the health of whales. This can provide valuable insight into the stressors placed on these animals. 
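As a purely hypothetical sketch (not the GEMM Lab's actual model), a CNN that regresses a single morphometric value from an overhead drone image might look something like this; the layer sizes, image dimensions, and training data are placeholders.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Hypothetical CNN that maps a downsampled overhead drone image to a single
# morphometric value (e.g., body length in meters). Architecture and sizes
# are illustrative only.
model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),          # grayscale drone image
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),                            # predicted body length (m)
])
model.compile(optimizer="adam", loss="mse")
model.summary()

# Training would pair images with lengths measured manually in photogrammetry
# software, e.g. model.fit(images, lengths, epochs=20, validation_split=0.2)
```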

We can also identify whale species from boat and aerial imagery (Patton et al. 2023). Projects like Flukebook and Happywhale have even been able to identify individual humpback whales with techniques like this one. 

Figure 2. Flukebook neural networks can use the edges of flukes to identify individuals by mapping marks to a library of known individuals (Flukebook)

AI also excels at prediction, especially with non-linear responses. Ecology is filled with thresholds, stepwise changes, and chaotic dynamics that may not be captured by linear models. Being able to predict these responses is particularly important when we want to examine how whale populations respond to different facets of their environment. Ensemble machine learning algorithms like Random Forests or Gradient Boosting Machines are commonly used to model species-habitat relationships and can predict how whale distributions will change in response to variables like sea surface temperature or ocean currents (Viquerat et al. 2022). 
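A minimal sketch of a species-habitat model of this kind is below, assuming invented survey data with a made-up presence rule so the forest has something to learn; the covariates and sample sizes are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Simulated survey data: each row is a location/time with environmental
# covariates and whether whales were present. All values are invented.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "sst": rng.uniform(8, 18, n),             # sea surface temperature (deg C)
    "depth": rng.uniform(10, 200, n),         # water depth (m)
    "chlorophyll": rng.lognormal(0, 0.5, n),  # proxy for productivity
})
# Fake, non-linear presence rule just to give the model a signal to recover
presence_prob = 1 / (1 + np.exp(-(12 - df["sst"]) - 0.01 * (100 - df["depth"])))
df["whale_present"] = rng.random(n) < presence_prob

X_train, X_test, y_train, y_test = train_test_split(
    df[["sst", "depth", "chlorophyll"]], df["whale_present"], random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))
print("Variable importances:",
      dict(zip(X_train.columns, rf.feature_importances_.round(2))))
```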

Even spatial data, which can be tricky to work with analytically, can be used in a machine learning framework. Data from satellite and acoustic tags can be analyzed with hidden Markov models and Gaussian mixture models. The results can help identify diving behaviors and habitat preferences, delineate migration corridors, and aid in marine spatial planning (Quick et al. 2017; Lennox et al. 2019). 
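As a sketch, assuming the third-party hmmlearn package and invented dive summaries, a two-state hidden Markov model could be fit to depth and duration data like this; the "travel" and "forage" modes and their values are made up for illustration.

```python
import numpy as np
from hmmlearn import hmm  # third-party package; assumed available

# Simulated dive summaries (depth in m, duration in min) alternating between
# two invented behavioral modes: shallow travelling vs deep foraging dives.
rng = np.random.default_rng(0)
travel = rng.normal([15, 2], [5, 0.5], size=(100, 2))
forage = rng.normal([60, 6], [10, 1.0], size=(100, 2))
dives = np.vstack([travel, forage, travel])

# A 2-state Gaussian hidden Markov model assigns each dive to a latent state.
model = hmm.GaussianHMM(n_components=2, covariance_type="diag",
                        n_iter=100, random_state=0)
model.fit(dives)
states = model.predict(dives)
print("State means (depth, duration):\n", model.means_.round(1))
print("First 10 decoded states:", states[:10])
```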

While all of these projects and methods are very exciting, AI is not a panacea. We have to take into account the amount of data that AI models rely on. Some of these methods require very high-resolution data, and without enough data to train the models, results can be biased or inaccurate. Data deficiency can be especially problematic for rare, elusive, and quiet animals. Methods that use complex architectures and non-linear transformations can often be viewed as “black boxes” that are difficult to interpret at first; however, there are techniques for retracing the steps of a model that can improve the interpretability of its results. AI also requires supervision: while AI methods can operate autonomously, oversight and evaluation are always necessary to validate their reliability in each application. Lastly, there are also concerns about the use of AI (particularly Large Language Models) in scientific writing, but that’s a whole separate beast. 

With careful consideration, AI can be a powerful method for addressing the unique and challenging problems in marine mammal research. 

Using AI to find dinner

Last fall, I wrote a blog post introducing my project, which involves looking at echograms from the past 8 years of GRANITE effort to characterize prey availability within our study region off the Oregon coast. To automate the process of finding zooplankton swarms in 8 years of echosounder data, I’m planning to use deep learning methods to look for structures in our echograms that look like mysid swarms. Instead of reviewing over 500 hours of echosounder data to manually identify mysid swarms (which may produce biased or inaccurate results from human error), I can apply AI methods to process the echogram data quickly and with consistent rules. I’ll specifically be using image segmentation, which can fall under any of the AI, machine learning, or deep learning umbrellas depending on the specific algorithms used. 

Another way AI can come into my project is after I gather the mysid swarm data from the image segmentation. While the exact structure of the resulting relative zooplankton abundance data will influence how I can use it, I could combine these prey data at a given place and time with a suite of environmental parameters to make predictions about the health and behavior of PCFG gray whales. This type of analysis could involve models that fall within AI and machine learning, similar to the Boosted Regression Trees used by GEMM Lab postdoc Dawn Barlow. Barlow et al. (2020) used Boosted Regression Trees to test the predictive relationships between oceanographic variables, relative krill abundance, and blue whale presence. Building on that work, Barlow et al. (2022) developed a forecasting model based on these relationships to predict where blue whales will be in New Zealand’s South Taranaki Bight (read more about this conservation tool here!).

Hopefully by now you’ve gained a better sense of what AI actually is and its application in marine mammal ecology. AI is a powerful tool and has its value, but is not always a substitute for more established methods. By carefully integrating AI methodologies with other techniques, we can leverage the strengths of both and enhance existing approaches. The GEMM Lab aims to use AI methods to observe and understand the intricacies of whale ecology more accurately and efficiently to ultimately support effective conservation strategies.

References

  1. Rubbens, P., Brodie, S., Cordier, T., Destro Barcellos, D., Devos, P., Fernandes-Salvador, J.A., Fincham, J.I., Gomes, A., Handegard, N.O., Howell, K., Jamet, C., Kartveit, K.H., Moustahfid, H., Parcerisas, C., Politikos, D., Sauzède, R., Sokolova, M., Uusitalo, L., Van den Bulcke, L., van Helmond, A.T.M., Watson, J.T., Welch, H., Beltran-Perez, O., Chaffron, S., Greenberg, D.S., Kühn, B., Kiko, R., Lo, M., Lopes, R.M., Möller, K.O., Michaels, W., Pala, A., Romagnan, J.-B., Schuchert, P., Seydi, V., Villasante, S., Malde, K., Irisson, J.-O., 2023. Machine learning in marine ecology: an overview of techniques and applications. ICES Journal of Marine Science 80, 1829–1853. https://doi.org/10.1093/icesjms/fsad100
  2. Hastie, T., Tibshirani, R., Friedman, J., 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York, NY.
  3. Slonimer, A.L., Dosso, S.E., Albu, A.B., Cote, M., Marques, T.P., Rezvanifar, A., Ersahin, K., Mudge, T., Gauthier, S., 2023. Classification of Herring, Salmon, and Bubbles in Multifrequency Echograms Using U-Net Neural Networks. IEEE Journal of Oceanic Engineering 48, 1236–1254. https://doi.org/10.1109/JOE.2023.3272393
  4. Viquerat, S., Waluda, C.M., Kennedy, A.S., Jackson, J.A., Hevia, M., Carroll, E.L., Buss, D.L., Burkhardt, E., Thain, S., Smith, P., Secchi, E.R., Santora, J.A., Reiss, C., Lindstrøm, U., Krafft, B.A., Gittins, G., Dalla Rosa, L., Biuw, M., Herr, H., 2022. Identifying seasonal distribution patterns of fin whales across the Scotia Sea and the Antarctic Peninsula region using a novel approach combining habitat suitability models and ensemble learning methods. Frontiers in Marine Science 9.
  5. Patton, P.T., Cheeseman, T., Abe, K., Yamaguchi, T., Reade, W., Southerland, K., Howard, A., Oleson, E.M., Allen, J.B., Ashe, E., Athayde, A., Baird, R.W., Basran, C., Cabrera, E., Calambokidis, J., Cardoso, J., Carroll, E.L., Cesario, A., Cheney, B.J., Corsi, E., Currie, J., Durban, J.W., Falcone, E.A., Fearnbach, H., Flynn, K., Franklin, T., Franklin, W., Galletti Vernazzani, B., Genov, T., Hill, M., Johnston, D.R., Keene, E.L., Mahaffy, S.D., McGuire, T.L., McPherson, L., Meyer, C., Michaud, R., Miliou, A., Orbach, D.N., Pearson, H.C., Rasmussen, M.H., Rayment, W.J., Rinaldi, C., Rinaldi, R., Siciliano, S., Stack, S., Tintore, B., Torres, L.G., Towers, J.R., Trotter, C., Tyson Moore, R., Weir, C.R., Wellard, R., Wells, R., Yano, K.M., Zaeschmar, J.R., Bejder, L., 2023. A deep learning approach to photo–identification demonstrates high performance on two dozen cetacean species. Methods in Ecology and Evolution 14, 2611–2625. https://doi.org/10.1111/2041-210X.14167
  6. https://happywhale.com/whaleid
  7. https://www.flukebook.org/
  8. Quick, N.J., Isojunno, S., Sadykova, D., Bowers, M., Nowacek, D.P., Read, A.J., 2017. Hidden Markov models reveal complexity in the diving behaviour of short-finned pilot whales. Sci Rep 7, 45765. https://doi.org/10.1038/srep45765
  9. Lennox, R.J., Engler-Palma, C., Kowarski, K., Filous, A., Whitlock, R., Cooke, S.J., Auger-Méthé, M., 2019. Optimizing marine spatial plans with animal tracking data. Can. J. Fish. Aquat. Sci. 76, 497–509. https://doi.org/10.1139/cjfas-2017-0495
  10. Barlow, D.R., Bernard, K.S., Escobar-Flores, P., Palacios, D.M., Torres, L.G., 2020. Links in the trophic chain: modeling functional relationships between in situ oceanography, krill, and blue whale distribution under different oceanographic regimes. Marine Ecology Progress Series 642, 207–225. https://doi.org/10.3354/meps13339

Sonar savvy: using echo sounders to characterize zooplankton swarms

By Natalie Chazal, PhD student, OSU Department of Fisheries, Wildlife, & Conservation Sciences, Geospatial Ecology of Marine Megafauna Lab

I’m Natalie Chazal, the GEMM Lab’s newest PhD student! This past spring I received my MS in Biological and Agricultural Engineering with Dr. Natalie Nelson’s Biosystems Analytics Lab at North Carolina State University. My thesis focused on using shellfish sanitation datasets to look at water quality trends in North Carolina and to forecast water quality for shellfish farmers in Florida. Now, I’m excited to be studying gray whales in the GEMM Lab!

Since the beginning of the Fall term, I’ve jumped into a project that will use the past 8 years of sonar data collected with a Garmin echo sounder during GRANITE project work with gray whales off the Newport, OR coast. Echo sounder data are commonly used recreationally to detect bottom depth and to find fish, and my goal is to use these data to assess relative prey abundance at gray whale sightings over time and space. 

There are also scientific-grade echo sounders that are built to be incredibly precise and exact in the projection and reception of sonar pulses. Both types of echo sounders can be used to determine the depth of the ocean floor, structures within the water column, and organisms swimming within the sonar’s “cone” of acoustic sensing. The precision and stability of scientific-grade equipment allow us to answer questions about the specific species of organisms, the substrate type at the sea floor, and even animal behavior. However, scientific-grade echo sounders can be expensive, too large for our small research vessel, and require expertise to operate. For generalist predators like gray whales, we can answer questions about relative prey abundance without such exact equipment (Benoit-Bird 2016; Brough 2019). 

While there are many variations of echo sounders that are specific to their purpose, commercially available, single-beam echo sounders generally function in the same way (Fig. 1). First, a “ping”, or short burst of sound at a specific frequency, is produced by a transducer. The ping travels downward and, once it hits an object, some of the sound energy bounces off of the object and some moves into it. The sound that bounces off the object is either reflected or scattered, and the portion that is reflected or scattered back in the direction of the source is received by the transducer. We can then calculate the depth of the object from the time the ping took to travel down and back (SeaBeam Instruments 2000).

Figure 1. Diagram of how sound is scattered, reflected, and transmitted in marine environments (SeaBeam Instruments, 2000).
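In code, that depth calculation is a one-liner. The sketch below assumes a nominal sound speed of 1500 m/s in seawater; real systems use measured or modeled sound-speed profiles.

```python
# Minimal sketch of how an echo sounder converts ping travel time to depth.
SOUND_SPEED_MS = 1500.0  # nominal sound speed in seawater (m/s); an assumption

def depth_from_travel_time(two_way_travel_time_s: float) -> float:
    """Depth (m) of a target from the round-trip travel time of a ping (s)."""
    return SOUND_SPEED_MS * two_way_travel_time_s / 2.0

# Example: a ping that returns after 0.08 s implies a target roughly 60 m down.
print(depth_from_travel_time(0.08))  # 60.0
```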

The data produced by this process are displayed in real time on a screen on board the boat. Figure 2 is an example of the display that we see while on board RUBY (the GEMM Lab’s rigid-hull inflatable research boat): 

Figure 2. Photo of the echo sounder display on board RUBY. On the left is a map that is used for navigation. On the right is the real-time feed, where the ocean bottom is shown as the bright yellow area with the distinct boundary towards the lower portion of the screen. The more orange layer above it, with the more “cloudy” structure, is a mysid swarm.

Once off the boat, we can download the echo sounder data and process it in the lab to recreate echograms similar to those seen on the boat. The echograms are plotted with time on the x-axis and depth on the y-axis, and are colored by the intensity of the sound that was returned (Fig. 3). Echograms give us a sort of picture of what is in the water column. When we look at these images, we can infer what the objects are, given that we know what habitat we were in. Below (Fig. 3) are some example classifications of different fish and zooplankton aggregations and what they look like in an echogram (Kaltenberg 2010).

Figure 3. Panel of echogram examples, from Kaltenberg 2010, for different fish and zooplankton aggregations that have been classified both visually (like we do in real time on the boat) as well as statistically (which we hope to do with the mysid aggregations). 
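As a sketch, a toy echogram with this layout (depth by time, colored by return intensity) can be drawn with matplotlib. The background noise, the mid-water "mysid-like" patch, and the seafloor return below are all invented for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic echogram: rows are depth bins, columns are pings (time), and values
# are return intensities. All features here are made up.
rng = np.random.default_rng(0)
n_depth, n_pings = 100, 300
echo = rng.normal(-80, 2, size=(n_depth, n_pings))   # background noise (dB-like)
echo[85:, :] = -20                                    # strong seafloor return
echo[30:45, 120:200] += 25                            # mid-water "mysid-like" patch

plt.imshow(echo, aspect="auto", cmap="viridis",
           extent=[0, n_pings, n_depth, 0])           # depth increases downward
plt.xlabel("Ping number (time)")
plt.ylabel("Depth bin")
plt.colorbar(label="Return intensity")
plt.title("Synthetic echogram")
plt.show()
```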

For our specific application, we are going to focus on characterizing mysid swarms, which are considered the main prey target of PCFG whales in our study area. From the echograms generated during GRANITE fieldwork, we can estimate relative mysid swarm densities, giving us an idea of how much prey is available to foraging gray whales. Because we have 8 years of GRANITE echosounder data, with 2,662 km of tracklines at gray whale sightings, we are going to need an automated process. This is where image segmentation comes in! If we treat our echograms like photographs, we can train models to identify mysid swarms within them, reducing our echogram processing load. Automating and standardizing the process also helps reduce error. 

We are planning to use U-Nets, a method of image segmentation in which the image goes through a series of compressions (an encoder) and expansions (a decoder), as is common when using convolutional neural networks (CNNs) for image segmentation. The encoder is generally a pre-trained classification network (CNNs work very well for this) that classifies pixels at progressively lower resolutions. The decoder then takes the low-resolution categorized pixels and projects them back up into an image to produce a segmented mask. What makes U-Nets unique is that they re-introduce the higher-resolution encoder information into the decoder through skip connections. This allows the segmentation to generalize without sacrificing fine-scale detail (Brautaset 2020; Ordoñez 2022; Slonimer 2023; Vohra 2023).

Figure 4. Diagram of the encoder, decoder architecture for U-Nets used in biomedical image segmentation. Note the skip connections illustrated by the gray lines connecting the higher resolution image information on the left, with the decoder process on the right (Ronneberger 2015)
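To show the idea (and not the architecture I will ultimately use), here is a deliberately tiny U-Net sketch in Keras, with placeholder layer sizes and a single output class for "mysid swarm" vs background.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def tiny_unet(input_shape=(128, 128, 1)):
    """A small U-Net sketch: two downsampling and two upsampling steps, with
    skip connections carrying encoder features into the decoder."""
    inputs = layers.Input(shape=input_shape)

    # Encoder: convolutions followed by downsampling
    c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)
    p2 = layers.MaxPooling2D()(c2)

    # Bottleneck
    b = layers.Conv2D(64, 3, padding="same", activation="relu")(p2)

    # Decoder: upsample and concatenate matching encoder features (skip connections)
    u2 = layers.UpSampling2D()(b)
    u2 = layers.Concatenate()([u2, c2])
    c3 = layers.Conv2D(32, 3, padding="same", activation="relu")(u2)
    u1 = layers.UpSampling2D()(c3)
    u1 = layers.Concatenate()([u1, c1])
    c4 = layers.Conv2D(16, 3, padding="same", activation="relu")(u1)

    # One output channel: per-pixel probability of "mysid swarm"
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    return Model(inputs, outputs)

model = tiny_unet()
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
# Training would pair echogram tiles with hand-labelled swarm masks, e.g.
# model.fit(echogram_tiles, swarm_masks, epochs=50, validation_split=0.2)
```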

What we hope to get from this analysis is an output that isolates the parts of the echogram containing mysid swarms. Once the mysid swarms are found within the echograms, we can use both the intensity and the size of each swarm as a proxy for the relative abundance of gray whale prey. We plan to quantify these estimates across multiple spatial and temporal scales to link prey availability to changing environmental conditions and to gray whale health and distribution metrics. This application is what will make our study particularly unique: by leveraging the GRANITE project’s extensive datasets, this study will be one of the first to quantify prey variability in the Oregon coastal system and use those results to directly assess the effect of prey availability on the body condition of gray whales. 
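Assuming the segmentation step yields a boolean swarm mask aligned with the echogram, a sketch of how swarm size and intensity could be summarized might look like the following; the per-pixel resolution values and the toy data are placeholders.

```python
import numpy as np

def swarm_metrics(echogram, mask, pixel_height_m=0.1, pixel_width_s=1.0):
    """Summarize a segmented mysid swarm as a rough relative-abundance proxy.

    echogram: 2-D array of return intensities (depth x time)
    mask: boolean array of the same shape, True where the model found swarm
    pixel_height_m, pixel_width_s: assumed per-pixel resolution (placeholders)
    """
    n_pixels = mask.sum()
    area = n_pixels * pixel_height_m * pixel_width_s   # swarm "size" in depth-time units
    mean_intensity = echogram[mask].mean() if n_pixels else float("nan")
    return {"n_pixels": int(n_pixels), "area": area, "mean_intensity": mean_intensity}

# Example with a toy echogram and mask
rng = np.random.default_rng(0)
echo = rng.normal(-80, 2, size=(50, 60))
mask = np.zeros_like(echo, dtype=bool)
mask[20:30, 10:40] = True
echo[mask] += 25
print(swarm_metrics(echo, mask))
```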

However, I have a little while to go before the data will be ready for any analysis. So far, I’ve been reading as much as I can about how sonar works in the marine environment, how sonar data structures work, and how others are using recreational sonar for robust analyses. There have been a few bumps in the road while starting this project (especially with disentangling the data structures produced from our particular GARMIN echosounder), but my new teammates in the GEMM Lab have been incredibly generous with their time and knowledge to help me set up a strong foundation for this project, and beyond. 

References

  1. Kaltenberg A. (2010) Bio-physical interactions of small pelagic fish schools and zooplankton prey in the California Current System over multiple scales. Oregon State University, Dissertation. https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/z890rz74t
  2. SeaBeam Instruments. (2000) Multibeam Sonar Theory of Operation. L-3 Communications, East Walpole MA. https://www3.mbari.org/data/mbsystem/sonarfunction/SeaBeamMultibeamTheoryOperation.pdf
  3. Benoit-Bird K., Lawson G. (2016) Ecological insights from pelagic habitats acquired using active acoustic techniques. Annual Review of Marine Science. https://doi.org/10.1146/annurev-marine-122414-034001
  4. Brough T., Rayment W., Dawson S. (2019) Using a recreational grade echosounder to quantify the potential prey field of coastal predators. PLoS One. https://doi.org/10.1371/journal.pone.0217013
  5. Brautaset O., Waldeland A., Johnsen E., Malde K., Eikvil L., Salberg A, Handegard N. (2020) Acoustic classification in multifrequency echosounder data using deep convolutional neural networks. ICES Journal of Marine Science 77, 1391–1400. https://doi.org/10.1093/icesjms/fsz235
  6. Ordoñez A., Utseth I., Brautaset O., Korneliussen R., Handegard N. (2022) Evaluation of echosounder data preparation strategies for modern machine learning models. Fisheries Research 254, 106411. https://doi.org/10.1016/j.fishres.2022.106411
  7. Slonimer A., Dosso S., Albu A., Cote M., Marques T., Rezvanifar A., Ersahin K., Mudge T., Gauthier S., (2023) Classification of Herring, Salmon, and Bubbles in Multifrequency Echograms Using U-Net Neural Networks. IEEE Journal of Oceanic Engineering 48, 1236–1254. https://doi.org/10.1109/JOE.2023.3272393
  8. Ronneberger O., Fischer P., Brox T. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. https://doi.org/10.48550/arXiv.1505.04597