Blog Post #2

Hello again! Luckily, my group was assigned the project that we ranked first: Top-n Music Genre Classification Neural Networks! Since my last post, we’ve written our Team Standards and have submitted our Project Plan. To choose the technologies for our project, we looked at our project’s two main components: (1) audio processing and (2) neural network training/implementation.

Here are some highlights of the technologies we’ll be using:

  • Dataset: GTZAN dataset
    • Why we picked it: Tzanetakis and Cook created this popular dataset for their influential 2002 study on music genre classification. The dataset contains 1,000 30-second music files categorized into 10 genres. Though Tzanetakis and Cook didn’t use deep learning for their original study, later music classification studies that do utilize deep learning methods often use the GTZAN dataset.
    • How we’ll use it: We’ll store the raw .wav files in Google Cloud Storage. Then, we’ll create a MongoDB database where each sample is represented as a document containing the sample’s URL, genre, and other metadata. This dataset will serve as the training data for our neural network.
    • Pros: The GTZAN dataset is clean, easy-to-use, and created specifically for music genre classification. From the most basic standpoint, it’s almost plug-n-play.
    • Cons: It’s a relatively small dataset, especially in the realm of deep learning. We also don’t know how Tzanetakis and Cook chose the samples. Since it’s a popular dataset, we risk following in the footsteps of other studies and possibly replicating their shortcomings. To overcome the small data size, we’re considering splitting the clips into shorter samples and/or adding new samples to the dataset.
  • Music Processing: Librosa
    • Why we picked it: Because we’ll already be using TensorFlow from Python to train our neural network, it makes sense to also use a Python library for the audio processing portion. Librosa is a widely used open-source Python library for music and audio analysis.
    • How we’ll use it: With Librosa, we’ll load the GTZAN audio samples and produce a mel spectrogram for each one. These visual representations can act as inputs for our convolutional neural network (CNN), which will map the features it identifies to genre categories.
    • Pros: Librosa is an open-source library with thorough documentation and an active online community. Because Librosa can produce mel spectrograms, we can treat each audio sample as an image and use a CNN (which is widely used for image analysis) instead of a different neural network model.
    • Cons: None of us have a background in music theory or audio technology, so understanding the documentation is difficult. I had no idea what a mel spectrogram was before starting my research for this project, and that doesn’t even begin to scratch the surface of Librosa’s features and capabilities.
  • Machine Learning Framework: TensorFlow
    • Why we picked it: Python seems to be the de facto language for machine learning, and TensorFlow is one of the most popular Python machine learning frameworks. It also doesn’t hurt that all of us are most comfortable with Python (as opposed to other languages)!
    • How we’ll use it: We’ll feed the mel spectrograms into our neural network as its input. TensorFlow will allow us to train the neural network and eventually output music genre classifications for new sample data.
    • Pros: There are ample online resources for learning how to use TensorFlow, and the TensorFlow documentation is well-organized and easy to understand. TensorFlow will take care of the underlying calculus and linear algebra: setting up input matrices, computing gradients through backpropagation, running gradient descent, etc.
    • Cons: It might have a steep learning curve, especially since none of us had any prior conceptual knowledge of neural networks or experience using TensorFlow. There’s a risk that we’ll follow tutorials without understanding what’s actually going on.
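
To make the dataset plan a bit more concrete, here’s a rough sketch of how splitting each 30-second clip into shorter samples could translate into the metadata documents we’d store. Nothing here is final: the 3-second segment length, the field names, the `SampleDocument` class, and the bucket URL are all placeholders made up for illustration, and the sketch only builds plain Python dictionaries (it doesn’t touch MongoDB or Google Cloud Storage).

```python
from dataclasses import dataclass, asdict

CLIP_SECONDS = 30.0  # length of each GTZAN clip


@dataclass
class SampleDocument:
    """Hypothetical metadata record for one audio segment (field names are assumptions)."""
    url: str
    genre: str
    start_seconds: float
    duration_seconds: float


def segment_starts(clip_seconds, segment_seconds):
    """Return the start offsets of the non-overlapping segments that fit in a clip."""
    count = int(clip_seconds // segment_seconds)
    return [i * segment_seconds for i in range(count)]


def build_documents(url, genre, segment_seconds=3.0, clip_seconds=CLIP_SECONDS):
    """Build one dictionary per segment of a single clip."""
    return [
        asdict(SampleDocument(url, genre, start, segment_seconds))
        for start in segment_starts(clip_seconds, segment_seconds)
    ]


# Placeholder URL: the bucket name and path are invented for this example.
docs = build_documents("gs://example-bucket/gtzan/blues/blues.00000.wav", "blues")
print(len(docs))  # 10 segments per clip
```

With these assumed numbers, each clip yields 10 segments, so the 1,000 clips would become 10,000 training samples. Any leftover audio shorter than one segment is simply dropped in this sketch, which is one of several choices we’d need to revisit.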

This project is a lot to handle conceptually, but I’m excited to see what we can achieve in the next few weeks!
