Technology

Responsibilities

I am in charge of a fairly specific part of our Genre Classification Neural Network project. For our machine learning model to process audio, we first need to convert it into a format the model can read. My job is to build that bridge: a process that reliably converts audio files into spectrogram images.

Technologies

The most crucial technology I use for this portion of the project is a Python library called Librosa. Librosa loads audio files and converts them into usable data such as waveforms, spectrograms, or MFCCs (mel-frequency cepstral coefficients), which is what we will likely be using from this point on in the project. Switching from spectrograms to MFCCs is a little intimidating to me, but the good news is that with Librosa the two processes are very similar.

One of the biggest challenges we have run into on this project is that Librosa cannot elegantly process some audio file types. So far it has had no problem with .au and .mp3 files, but files like .mp4, .m4a, and .webm have been more of a struggle. My teammate who is in charge of building out the dataset found a quick solution before I could: ffmpeg, a separate command-line program for converting audio and video. When Librosa cannot read a file directly, it can fall back on other tools such as audioread, which can in turn use ffmpeg, and this has been super useful.
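Another way to sidestep the file type problem is to convert everything to WAV up front by calling ffmpeg from Python. The sketch below is one possible approach, not necessarily what my teammate did. The helper names `build_ffmpeg_command` and `convert_to_wav` are made up for this example, and it assumes ffmpeg is installed and on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst, sample_rate=22050):
    """Build the ffmpeg argument list for extracting mono audio."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", str(src),          # input file (.mp4, .m4a, .webm, ...)
        "-vn",                   # drop any video stream
        "-ac", "1",              # mix down to one channel
        "-ar", str(sample_rate), # resample to a fixed rate
        str(dst),
    ]


def convert_to_wav(src, out_dir, sample_rate=22050):
    """Convert one file to WAV in out_dir and return the new path."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    dst = Path(out_dir) / (Path(src).stem + ".wav")
    subprocess.run(
        build_ffmpeg_command(src, dst, sample_rate),
        check=True,            # raise if ffmpeg reports an error
        capture_output=True,   # keep ffmpeg's log out of the console
    )
    return dst


# Show the command that would be run for a hypothetical file name.
command = build_ffmpeg_command("song.m4a", "song.wav")
print(" ".join(command))
```

Converting once ahead of time means the feature extraction step only ever has to deal with a single, well-supported format.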

Products

Spectrograms are a type of image used to interpret audio visually. Typically, a spectrogram plots frequency against time, with color intensity representing how strong each frequency is at each moment. A well-known example, shown in the Wikipedia article listed under Sources, is the spectrogram of a person saying the words “nineteenth century”. Spectrograms were our initial plan for the input into the neural network, but we realized that converting directly from audio to MFCCs would be a bit more efficient.
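To make the frequency-against-time idea concrete, here is a bare-bones spectrogram computed with NumPy alone. This is only a sketch of the concept, not what Librosa does internally, and the function name `spectrogram_db` and the frame sizes are assumptions. The signal is cut into short overlapping frames, each frame goes through an FFT, and the magnitudes are converted to decibels. Those decibel values are what a plot would show as color.

```python
import numpy as np


def spectrogram_db(y, n_fft=1024, hop_length=256):
    """Return a decibel spectrogram shaped (frequency bins, time frames).

    Assumes len(y) >= n_fft. Frame sizes are illustrative.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        y[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # One FFT per frame gives the strength of each frequency in that frame.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Clamp before the log so silent bins do not produce -inf.
    db = 20 * np.log10(np.maximum(magnitude, 1e-10))
    return db.T


# Demo: one second of a 1000 Hz sine wave sampled at 8000 Hz.
sr = 8000
n_fft = 1024
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 1000.0 * t)

S = spectrogram_db(y, n_fft=n_fft)
freqs = np.fft.rfftfreq(n_fft, d=1 / sr)
peak_hz = freqs[S.mean(axis=1).argmax()]
print(S.shape, peak_hz)
```

In a plot of `S`, the pure tone would appear as a single bright horizontal line at its frequency, while speech or music fills in many bands that change over time.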

MFCCs are a lot like spectrograms, and much of the process of creating them is exactly the same. The difference is that MFCCs represent audio as arrays of numbers rather than as images. This makes it easy to store them in JSON files instead of thousands of JPEG spectrograms. Additionally, the MFCCs for many audio files can be bundled together into a single JSON file rather than producing an output file for every single audio file in the database.
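Bundling many tracks into one JSON file could look something like the sketch below. The layout (a `labels` list next to an `mfcc` list), the function names, and the genre names are all hypothetical, and the tiny hand-written matrices stand in for real MFCC output. One detail worth knowing is that NumPy arrays cannot be written to JSON directly, so each value is converted to a plain Python float first.

```python
import json
import tempfile
from pathlib import Path


def save_dataset(samples, path):
    """Write (label, mfcc) pairs into a single JSON file.

    Each mfcc is a sequence of frames, and each frame a sequence of numbers.
    NumPy arrays work too, since every value is converted with float().
    """
    data = {"labels": [], "mfcc": []}
    for label, mfcc in samples:
        data["labels"].append(label)
        data["mfcc"].append([[float(v) for v in frame] for frame in mfcc])
    with open(path, "w") as fp:
        json.dump(data, fp)


def load_dataset(path):
    """Read the bundled dataset back as plain Python lists."""
    with open(path) as fp:
        return json.load(fp)


# Two made-up tracks with tiny stand-in "MFCC" matrices.
samples = [
    ("rock", [[1.0, 2.0], [3.0, 4.0]]),
    ("jazz", [[5.0, 6.0], [7.0, 8.0]]),
]
out_path = Path(tempfile.gettempdir()) / "mfcc_dataset_example.json"
save_dataset(samples, out_path)

restored = load_dataset(out_path)
print(restored["labels"], len(restored["mfcc"]))
```

Keeping labels and features at matching positions in two lists makes it simple to load the whole file and hand both straight to the training code.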

Sources

https://en.wikipedia.org/wiki/Spectrogram
