Demystifying the algorithm

By Clara Bird, Masters Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab

Hi everyone! My name is Clara Bird and I am the newest graduate student in the GEMM lab. For my master’s thesis I will be using drone footage of gray whales to study their foraging ecology. I promise to talk about how cool gray whales are in a following blog post, but for my first effort I am choosing to write about something that I have wanted to explain for a while: algorithms. As part of previous research projects, I developed a few semi-automated image analysis algorithms and I have always struggled with that jargon-filled phrase. I remember being intimidated by the term algorithm and thinking that I would never be able to develop one. So, for my first blog I thought that I would break down what goes into image analysis algorithms and demystify a term that is often thrown around but not well explained.

What is an algorithm?

The dictionary broadly defines an algorithm as “a step-by-step procedure for solving a problem or accomplishing some end” (Merriam-Webster). Imagine an algorithm as a flow chart (Fig. 1), where each step is some process that is applied to the input(s) to get the desired output. In image analysis the output is usually isolated sections of the image that represent a specific feature; for example, isolating and counting the number of penguins in an image. Algorithm development involves figuring out which processes to use in order to consistently get desired results. I have conducted image analysis previously and these processes typically involve figuring out how to find a certain cutoff value. But, before I go too far down that road, let’s break down an image and the characteristics that are important for image analysis.

Figure 1. An example of a basic algorithm flow chart. There are two inputs: variables A and B. The process is the calculation of the mean of the two variables.
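To make the flow chart concrete, here is a minimal sketch of the same idea in Python. The function name and the example numbers are my own illustration; the only thing taken from the figure is the structure: two inputs go in, one process is applied, one output comes out.

```python
def mean_of_two(a, b):
    """The 'process' box of the flow chart: average the two inputs."""
    return (a + b) / 2


# Inputs: variables A and B (made-up values for illustration)
variable_a = 4
variable_b = 10

# Output: the mean of the two variables
print(mean_of_two(variable_a, variable_b))  # 7.0
```

Every algorithm discussed below follows this same pattern, just with more boxes in the chart.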

What is an image?

Think of an image as a spreadsheet, where each cell is a pixel and each pixel is assigned a value (Fig. 2). Each value is associated with a color and when the sheet is zoomed out and viewed as a whole, the image comes together. In color imagery, which is also referred to as RGB, each pixel is associated with the values of the three color bands (red, green, and blue) that make up that color. In a thermal image, each pixel’s value is a temperature value. Thinking about an image as a grid of values is helpful to understand the challenge of translating the larger patterns we see into something the computer can interpret. In image analysis this process can involve using the values of the pixels themselves or the relationships between the values of neighboring pixels.

Figure 2. A diagram illustrating how pixels make up an image. Each pixel is a grid cell associated with certain values. Image Source: https://web.stanford.edu/class/cs101/image-1-introduction.html
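The spreadsheet idea can be sketched in a few lines of Python. The tiny 3 x 4 grid below is invented for illustration (0 is black, 255 is white), and the helper name neighbor_values is hypothetical. Real imagery would normally be loaded with an image library, but plain lists are enough to show the two things an algorithm can look at: a pixel's own value and the values of its neighbors.

```python
# A tiny grayscale "image": each number is one pixel (0 = black, 255 = white).
gray_image = [
    [12, 15, 240, 18],
    [10, 235, 245, 14],
    [11, 13, 16, 12],
]

# In an RGB image each pixel holds three band values instead of one.
rgb_pixel = (255, 0, 0)  # red band, green band, blue band: pure red


def neighbor_values(image, row, col):
    """Return the values of the pixels directly above, below, left and right."""
    values = []
    for d_row, d_col in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        r, c = row + d_row, col + d_col
        # Skip neighbors that would fall outside the image.
        if 0 <= r < len(image) and 0 <= c < len(image[0]):
            values.append(image[r][c])
    return values


print(gray_image[1][1])                   # 235
print(neighbor_values(gray_image, 1, 1))  # [15, 13, 10, 245]
```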

Our brains take in the whole picture at once and we are good at identifying the objects and patterns in an image. Take Figure 3 for example: an astute human eye and brain can isolate and identify all the different markings and scars on the fluke. Yet, this process would be very time consuming. The trick to building an algorithm to conduct this work is figuring out what processes or tools are needed to get a computer to recognize what is marking and what is not. This iterative process is the algorithm development.

Figure 3. Photo ID image of a gray whale fluke.

Development

An image analysis algorithm will typically involve some sort of thresholding. Thresholds are used to classify an image into groups of pixels that represent different characteristics. A threshold could be applied to the image in Figure 3 to separate the white color of the markings on the fluke from the darker colors in the rest of the image. However, this is an oversimplification, because while it would be pretty simple to examine the pixel values of this image and pick a threshold by hand, this threshold would not be applicable to other images. If a whale in another image is a lighter color or the image is brighter, the pixel values would be different enough from those in the previous image for the threshold to inaccurately classify the image. This problem is why a lot of image analysis algorithm development involves creating parameterized processes that can calculate the appropriate threshold for each image.
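Here is a small sketch of that problem, using invented pixel values rather than the real fluke photo. The function names and the hand-picked cutoff of 100 are assumptions for illustration. The second "photo" is the same scene made brighter by 100, capped at the maximum value of 255.

```python
def apply_threshold(image, threshold):
    """Classify each pixel: 1 if brighter than the threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in image]


def count_marked(mask):
    """Count the pixels classified as 'marking'."""
    return sum(sum(row) for row in mask)


# Made-up grayscale values: three bright "marking" pixels on a dark "whale".
dark_photo = [
    [12, 15, 240, 18],
    [10, 235, 245, 14],
    [11, 13, 16, 12],
]

# The same scene, but brighter overall.
bright_photo = [[min(255, value + 100) for value in row] for row in dark_photo]

HAND_PICKED_THRESHOLD = 100  # chosen by eye for dark_photo only

print(count_marked(apply_threshold(dark_photo, HAND_PICKED_THRESHOLD)))    # 3
print(count_marked(apply_threshold(bright_photo, HAND_PICKED_THRESHOLD)))  # 12
```

The cutoff that cleanly found three marking pixels in the first photo labels every single pixel as marking in the brighter one.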

One successful method used to determine thresholds in images is to first calculate the frequency of color in each image, and then apply the appropriate threshold. Fletcher et al. (2009) developed a semiautomated algorithm to detect scars in seagrass beds from aerial imagery by applying an equation to a histogram of the values in each image to calculate the threshold. A histogram is a plot of the frequency of values binned into groups (Fig. 4). Essentially, it shows how many times each value appears in an image. This information can be used to define breaks between groups of values. If the image of the fluke were transformed to grayscale, then the values of the marking pixels would be grouped around the value for white and the other pixels would group closer to black, similar to what is shown in Figure 4. An equation can be written that takes this frequency information and calculates where the break is between the groups. Since this method calculates an individualized threshold for each image, it’s a more reliable method for image analysis. Other characteristics could also be used to further filter the image, such as shape or area.
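As an illustration of calculating a threshold from a histogram, the sketch below uses Otsu's method, a widely used technique that picks the cutoff that best separates the pixel values into two groups. To be clear, this is a stand-in I chose for the example and not the equation from Fletcher et al. (2009). The pixel values and function names are invented for illustration.

```python
def histogram(image, levels=256):
    """Count how many times each pixel value appears in the image."""
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    return counts


def otsu_threshold(image, levels=256):
    """Return the cutoff t that maximizes the between-group variance,
    where the dark group is values <= t and the bright group is values > t."""
    counts = histogram(image, levels)
    total = sum(counts)
    total_sum = sum(value * count for value, count in enumerate(counts))
    best_t, best_variance = 0, -1.0
    dark_count, dark_sum = 0, 0
    for t in range(levels):
        dark_count += counts[t]
        dark_sum += t * counts[t]
        bright_count = total - dark_count
        if dark_count == 0:
            continue  # no pixels in the dark group yet
        if bright_count == 0:
            break  # every pixel is already in the dark group
        dark_mean = dark_sum / dark_count
        bright_mean = (total_sum - dark_sum) / bright_count
        variance = dark_count * bright_count * (dark_mean - bright_mean) ** 2
        if variance > best_variance:
            best_t, best_variance = t, variance
    return best_t


def count_marked(image, threshold):
    """Count pixels brighter than the threshold."""
    return sum(1 for row in image for value in row if value > threshold)


dark_photo = [
    [12, 15, 240, 18],
    [10, 235, 245, 14],
    [11, 13, 16, 12],
]
bright_photo = [[min(255, value + 100) for value in row] for row in dark_photo]

for photo in (dark_photo, bright_photo):
    t = otsu_threshold(photo)
    print(t, count_marked(photo, t))
```

Because the cutoff is recalculated from each photo's own histogram, both the dark and the bright version end up with the same three marking pixels, even though the cutoff values themselves differ.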

However, that approach is not the only way to make an algorithm applicable to different images; semi-automation can also be helpful. Semi-automation involves some kind of user input. After uploading the image for analysis, the user could also provide the threshold, or the user could crop the image so that only the important components were maintained. Keeping with the fluke example, the user could crop the image so that it was only of the fluke. This would help reduce the variety of colors in the image and make it easier to distinguish between dark whale and light marking.
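A user-supplied crop is simple to sketch. In the made-up example below, the top-left corner of the photo contains bright glare that is not part of the whale. The crop box coordinates stand in for whatever the user would draw by hand, and the function names and the cutoff of 100 are assumptions for illustration.

```python
def crop(image, top, left, bottom, right):
    """Keep rows top..bottom-1 and columns left..right-1 of the image."""
    return [row[left:right] for row in image[top:bottom]]


def count_bright(image, threshold):
    """Count pixels brighter than the threshold."""
    return sum(1 for row in image for value in row if value > threshold)


# Invented values: glare in the top-left corner, three marking pixels on the fluke.
photo = [
    [250, 248, 20, 22, 21],
    [249, 19, 240, 18, 20],
    [21, 235, 245, 17, 19],
    [20, 18, 16, 15, 22],
]

# Box chosen by the user so that only the fluke remains.
fluke_only = crop(photo, top=1, left=1, bottom=4, right=5)

print(count_bright(photo, 100))       # 6 (glare is counted as marking)
print(count_bright(fluke_only, 100))  # 3
```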

Figure 4. Example histogram of pixel values. Source: Moallem et al. 2012

Why algorithms are important

Algorithms are helpful because they make our lives easier. While it would be possible for an analyst to identify and digitize each individual marking from a picture of a gray whale, it would be extremely time consuming and tedious. Image analysis algorithms significantly reduce the time it takes to process imagery. A semi-automated algorithm that I developed to count penguins from still drone imagery can count all the penguins on a 1 km² island in about 30 minutes, while it took me 24 long hours to count them by hand (Bird et al. in prep). Furthermore, the process can be repeated with different imagery and analysts as part of a time series without bias because the algorithm eliminates human error introduced by different analysts.

Whether it’s a simple combination of a few processes or a complex series of equations, creating an algorithm requires breaking down a task to its most basic components. Development involves translating those components step by step into an automated process, which after many trials and errors, achieves the desired result. My first algorithm project took two years of revising, improving, and countless trials and errors. So, whether creating an algorithm or working to understand one, don’t let the jargon or the endless trials and errors stop you. Like most things in life, the key is to have patience and take it one step at a time.

References

Bird, C. N., Johnston, D.W., Dale, J. (in prep). Automated counting of Adelie penguins (Pygoscelis adeliae) on Avian and Torgersen Island off the Western Antarctic Peninsula using Thermal and Multispectral Imagery. Manuscript in preparation

Fletcher, R. S., Pulich, W., & Hardegree, B. (2009). A Semiautomated Approach for Monitoring Landscape Changes in Texas Seagrass Beds from Aerial Photography. https://doi.org/10.2112/07-0882.1

Moallem, Payman & Razmjooy, Navid. (2012). Optimal Threshold Computing in Automatic Image Thresholding using Adaptive Particle Swarm Optimization. Journal of Applied Research and Technology. 703.

Challenges of fecal analyses (Round 1)

By Leila Lemos, Ph.D. Student, Department of Fisheries and Wildlife, OSU

Fieldwork is done for the year and lab analyses have just started, with some challenges. This is not unexpected since no previous hormonal analysis has been conducted with any gray whale tissue, and whale fecal sample analysis is a relatively new technique. So, I have been thinking, learning, consulting, and creating a methodology as I go along. I am grateful for the expert advice and help from many great collaborators:

  • Kathleen Hunt (Northern Arizona University, AZ, United States)
  • Shawn Larson (Seattle Aquarium, WA, United States)
  • Amy Green (Seattle Aquarium, WA, United States)
  • Rachel Ann Hauser-Davis (Fiocruz, RJ, Brazil)
  • Maziet Cheseby (Oregon State University, OR, United States)
  • Scott Klasek (Oregon State University, OR, United States)

I have learned that an important step before undertaking a fecal hormone analysis is desalting the samples, since salts can interfere with hormone determinations and lead to false results. In order to remove salt content, each sample was first filtered (Fig. 1A) to remove the majority of the salt water (Fig. 1B) that is inevitably collected along with the fecal sample. Each sample was then re-suspended in ultra-pure water to dilute the remaining salt content in a larger water volume (Fig. 1C).

Figure 1: Analytical processes: (A) Filtration of the samples; (B) Result from filtration; (C) Addition of pure water to the samples.

After these steps were completed for each sample, the samples were centrifuged (Fig. 2A) to precipitate the fecal matter and leave the lighter salt ions in the supernatant (the liquid lying above a solid residue; Fig. 2B). After finishing these two phases, the water was removed with the aid of a plastic pipette (Fig. 2C), and I was left with only desalted fecal matter at the bottom of the tubes (Fig. 2D).

Figure 2: Analytical processes: (A) Sample centrifugation; (B) Result from the centrifugation; (C, D) Results from separating water and sample.

The fecal samples were then frozen at -80°C (Fig. 3A & 3B) and then freeze-dried on a lyophilizer for 2 days to remove all remaining water content (Fig. 3C). Finally, I have what I need: desalted, dry fecal samples, ready for hormone analysis (Fig. 3D).

Figure 3: Analytical processes: (A) Freezing process of the samples; (B) Frozen samples ready to go to the lyophilizer; (C) Samples in the lyophilizer; (D) Final result of the lyophilizing process.

Writing this now, this process seems simple, but it was laborious, and it took time to find the equipment needed at the right times. The end product is crucial to get a good final result, so my time investment (and my own increased stress level!) was worth it. This type of analysis is very new for marine mammals and our research lab is still learning the best methods. Along the way we were unsure of some decisions, some mistakes were made, and we were afraid of losing precious fecal material. But, this is the fun and challenge of working with a new species and new type of sample and, importantly, we have developed a working protocol that should make the process more efficient and reduce our stress levels next time around.

At the end of this sample preparation process, our 53 samples look great and are ready to be analyzed during my training at the Seattle Aquarium. We are also planning to analyze the water that was removed from the samples (Fig. 2D) to see if any hormone leached out from the poop into the water.

Results from this process will aid in future whale fecal hormone studies. Perhaps only the centrifugation step is needed and we can discard the water without losing hormone content. Or, perhaps we need to analyze both portions of the sample and sum the hormones found in each. We shall know the answer when we get our hormone metabolite results. Just another protocol to be worked out as I move ahead with the hormone analysis of these fecal samples. And through all these challenges I keep the end goal of this work in my mind: to learn about the reproductive and stress hormonal variation in gray whales and to link these variations to nutritional status and noise events. Onward!