Required – Jenna Bucien's CS467 Capstone Blog

Hello again! Luckily, my group was assigned the project that we ranked first: Top-n Music Genre Classification Neural Networks! Since my last post, we’ve written our Team Standards and have submitted our Project Plan. To choose the technologies for our project, we looked at our project’s two main components: (1) audio processing and (2) neural network training/implementation.

Here are some highlights of the technologies we’ll be using:

Dataset: GTZAN dataset
- Why we picked it: Tzanetakis and Cook created this popular dataset for their influential 2002 study on music genre classification. The dataset contains 1,000 30-second music files categorized into 10 genres. Though Tzanetakis and Cook didn’t use deep learning for their original study, later music classification studies that do utilize deep learning methods often use the GTZAN dataset.
- How we’ll use it: We will store the raw .wav files in Google Cloud Storage. Then, we’ll create a MongoDB database where each sample is represented as a document containing the sample URL, genre, and other metadata. This dataset will serve as the training data for our neural network.
- Pros: The GTZAN dataset is clean, easy to use, and created specifically for music genre classification. From the most basic standpoint, it’s almost plug-and-play.
- Cons: It’s a relatively small dataset, especially in the realm of deep learning. We also don’t know how Tzanetakis and Cook chose the samples. Since it’s a popular dataset, we risk following in the footsteps of other studies and possibly replicating their shortcomings. To overcome the small data size, we’re considering splitting the clips into shorter samples and/or adding new samples to the dataset.
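To make the "splitting the clips into shorter samples" idea concrete, here is a rough sketch rather than our actual pipeline. It assumes a 22,050 Hz sample rate and a 3-second segment length, and the metadata field names and the `gs://example-bucket/...` URL are placeholders I made up for illustration. A silent array stands in for a real 30-second clip.

```python
import numpy as np

SAMPLE_RATE = 22050     # assumed sample rate in Hz
SEGMENT_SECONDS = 3     # hypothetical segment length


def split_clip(samples, sample_rate=SAMPLE_RATE, segment_seconds=SEGMENT_SECONDS):
    """Cut a 1-D audio array into equal, non-overlapping segments.

    Leftover samples that do not fill a whole segment are dropped.
    """
    segment_len = int(sample_rate * segment_seconds)
    n_segments = len(samples) // segment_len
    return [samples[i * segment_len:(i + 1) * segment_len] for i in range(n_segments)]


def make_documents(url, genre, n_segments, segment_seconds=SEGMENT_SECONDS):
    """Build one metadata dict per segment (field names are illustrative)."""
    return [
        {
            "url": url,
            "genre": genre,
            "segment_index": i,
            "start_seconds": i * segment_seconds,
            "duration_seconds": segment_seconds,
        }
        for i in range(n_segments)
    ]


# Stand-in for one 30-second clip: silence instead of real audio.
clip = np.zeros(SAMPLE_RATE * 30, dtype=np.float32)
segments = split_clip(clip)
documents = make_documents(
    "gs://example-bucket/gtzan/blues/blues.00000.wav",  # placeholder URL
    "blues",
    len(segments),
)
print(len(segments), documents[0])
```

Dicts like these could be handed to a MongoDB collection (for example with pymongo's `insert_many`). One thing to watch for with this approach: segments cut from the same track should stay together on the same side of any train/test split, or the test scores will look better than they really are.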
Music Processing: Librosa
- Why we picked it: Because we’ll already be using TensorFlow (through its Python API) to train our neural network, it makes sense to also use a Python library for the audio processing portion. Librosa is a widely used open-source Python library for audio and music analysis.
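- How we’ll use it: With Librosa, we’ll load the GTZAN audio samples and produce a mel spectrogram for each one (a picture of how the audio’s energy is spread across frequencies over time, with the frequency axis on the perceptually motivated mel scale). These visual representations can act as inputs for our convolutional neural network (CNN), which will map the features it identifies into genre categories.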
- Pros: Librosa is an open source library with thorough documentation and an active online community. Librosa’s capability to produce mel spectrograms allows us to use a CNN (which is widely used for visual imagery analysis) instead of a different neural network model.
- Cons: None of us have a background in music theory or audio technology, so understanding the documentation is difficult. I had no idea what a mel spectrogram was before starting my research for this project, and that doesn’t even begin to scratch the surface of Librosa’s features and capabilities.
Machine Learning Framework: TensorFlow
- Why we picked it: Python seems to be the de facto language for machine learning, and TensorFlow is one of the most popular Python machine learning frameworks. It also doesn’t hurt that all of us are most comfortable with Python (as opposed to other languages)!
- How we’ll use it: We’ll feed the mel spectrograms into the input layer of our neural network. TensorFlow will allow us to train the neural network and eventually output music genre classifications for new sample data.
- Pros: There are ample online resources for learning how to use TensorFlow, and the TensorFlow documentation is well-organized and easy to understand. TensorFlow will take care of the underlying calculus and linear algebra necessary for setting up input matrices, running gradient descent, performing backpropagation, etc.
- Cons: It might have a steep learning curve, especially since none of us had any prior knowledge of neural networks conceptually or experience using TensorFlow. There’s a risk that we’ll follow tutorials without actually understanding what’s going on.
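To give a feel for what "a CNN that turns spectrograms into top-n genres" could mean in code, here is a small Keras sketch. It is not our final architecture: the layer sizes, the `(128, 130, 1)` input shape, and the `build_model` and `top_n` helper names are all assumptions for illustration, and the model is untrained, so its predictions are meaningless until it sees real data.

```python
import numpy as np
import tensorflow as tf

NUM_GENRES = 10                # GTZAN has 10 genres
INPUT_SHAPE = (128, 130, 1)    # assumed (mel bands, time frames, channels)


def build_model(input_shape=INPUT_SHAPE, num_genres=NUM_GENRES):
    """Build a small, untrained CNN that outputs one probability per genre."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(16, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_genres, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer genre labels
        metrics=["accuracy"],
    )
    return model


def top_n(probabilities, n=3):
    """Return the indices of the n most likely genres, best first."""
    return np.argsort(probabilities)[::-1][:n]


model = build_model()

# Stand-in input: one all-zero "spectrogram" instead of real data.
dummy_batch = np.zeros((1,) + INPUT_SHAPE, dtype=np.float32)
probs = model(dummy_batch).numpy()[0]
print(top_n(probs, n=3))
```

The softmax layer is what makes "top-n" possible: instead of a single answer, the network gives a probability for every genre, and we can report the n highest.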

This project is a lot to handle conceptually, but I’m excited to see what we can achieve in the next few weeks!

Hello world! My name is Jenna Bucien. I currently live in San Jose, CA and will graduate this June.

In this blog, I will reflect on my journey completing my CS 467 capstone project. Here’s my self-introduction:

Why did you choose to go into Computer Science?

Like all students in Oregon State’s Computer Science Post-Bacc program, I hold a bachelor’s degree in another field. In my case, I graduated from Barnard College in 2019 with a BA in East Asian Studies. After graduation, I worked as a travel planner for a boutique travel agency in New York. Unfortunately, roughly seven months later, the Covid-19 pandemic crashed the travel industry, and thus my new professional life. Yikes!

While sitting at home unemployed, I decided to pivot towards a different career direction. I was always attracted to Computer Science because its concepts are the backbone of our world’s technological progress and innovation. How do these seemingly magical technologies work? It would be amazing to contribute to their development and advancement! So, to test the waters, I enrolled in a self-paced Python course. I ended up loving the analytical, problem-solving aspects of coding, and shortly afterwards decided to apply to OSU. And here I am!

Now that I’ve almost completed this computer science degree, I can say that I have no regrets. I still love working iteratively to solve problems. Although there are definitely lots of frustrating moments, the high I get from making a tiny bit of progress is addictive. I’m excited to enter the professional tech field and continue my computer science adventure!

Why did you choose the projects you did on the survey? What makes them interesting to you?

The projects that I chose are all related to artificial intelligence and machine learning. Here’s my ranked list:

Top-n Music Classification
AI/ML Trading Bot
Investor Match.ai
ML Breakout
Smart Recycling Bin
Data Mining of Disparate Data Sets

It’s not shocking to any of us that AI/ML is shaking up the world. Headlines about the coming AI revolution are constant, and every company wants in on the latest AI technology. (Even my 90-year-old grandpa knows what ChatGPT is!)

Unfortunately, because AI is an advanced, complex, and vast field, I haven’t been exposed to any AI/ML concepts in my time at OSU thus far. So, the opportunity to scratch the surface of AI through these capstone projects strongly appeals to me.

In particular, my top choice is the Top-n Music Classification project because I am interested in how to train, test, and validate neural networks using Keras or TensorFlow. I think it will be fascinating to explore how programs learn to classify objects in datasets. What characteristics and methodologies do they use? How quickly do they learn? What biases and/or unexpected results should we watch out for? I’d love to gain a basic understanding of neural networks, as well as how they can be applied to a subject all humans relate to: music! (Learning how Python can perform audio processing is also a plus.)

I hope that I learn valuable skills through this capstone project, and I’m excited to share my progress with you through this blog. I look forward to writing my next entry!
