Blog Post #4

We are still plugging away at our project, which is due in a few short weeks! In this post, I’ll describe some strategies I’ve used to complete my work so far:

How do you approach learning something new, like a new technology?

This project has required me to quickly learn many new technologies (at least at a basic level). For example, I needed a crash course in Tensorflow, a popular machine learning library. I approached learning Tensorflow similarly to how I’ve familiarized myself with new material in the past.

First, I poked around Google's official Tensorflow developer website. The Tensorflow ecosystem is enormous, so it was a bit overwhelming at first. I found a section of the website called Introduction to Tensorflow, which acted as a gateway to beginner-friendly learning resources. For example, it linked to a great tutorial called Basic Classification: Classify Images of Clothing. Not only was this tutorial super topical (as my group had decided to convert our audio samples to image representations), but it was interactive (always great) since it was built on Google Colab.

Besides looking at official learning materials and documentation, I also like reading books (yes, printed paper books). There's something about the thoroughness and structure of books that appeals to me. For this project, I read the first couple of chapters of Deep Learning with Python by Francois Chollet (the creator of Keras). The book provided a great conceptual overview of machine learning and neural networks before diving into how to actually use Tensorflow. I also really enjoyed the chapters on computer vision.

Finally, I also like watching YouTube videos. The quality can be a bit hit-or-miss (and you have to use your best judgment about the accuracy of the information, which can be tricky if you're a novice), but I still love YouTube as a supplementary resource. Finding a great YouTube tutorial is like striking gold. For this project, I enjoyed watching 3Blue1Brown's series on neural networks, as well as Valerio Velardo – The Sound of AI's series about audio processing with Tensorflow.

To sum up, I use a multi-pronged approach when learning a new technology: several resources in tandem that cover the technology from both a theoretical and a practical standpoint, mixing passive learning (reading, watching) with active learning (doing tutorials). So far, I think my approach has been successful.

Do you use ChatGPT or other AI tools? In what way?

I've only used ChatGPT briefly. During the research phase of this project, I had to gain a basic understanding of deep learning, neural networks, computer vision, etc. Without any prior knowledge of these topics, I was a bit overwhelmed. For example, many sources explaining neural networks assumed an understanding of calculus and linear algebra (mathematics has always been my weakness).

I had read that many people were having success using ChatGPT to clearly summarize complicated topics by prompting it with phrases like "explain <insert complex concept here>," so I decided to try that. For example, I asked, "Can you explain how neural networks use backpropagation simply?" ChatGPT immediately produced an easy-to-understand response that clarified what I had already read about backpropagation.

Then, I decided to put ChatGPT to a bit of a harder test. I asked it, "How can I use a neural network to categorize music into genres?" In other words, I wanted to see how it would (broadly) design our project! The results were pretty stunning. It immediately generated an 8-step guide that reaffirmed, almost exactly, the steps my group was planning to take. The guide provided a broad overview of dataset preparation, feature extraction, data preprocessing, neural network architecture, model training, and model evaluation (about 2-3 sentences for each step). Even more impressively, when I prompted it with follow-up questions, the bot provided answers that made sense contextually within the scope of our conversation. For instance, when I asked it to give "more information about Step 2," it knew what I was referring to and immediately delved into examples of audio feature extraction. Impressive (and a little frightening)!

That was the extent of my experimentation with ChatGPT for this project. I can definitely see how ChatGPT can be a great resource for breaking down tough concepts, augmenting background research, and planning projects. However, there are still limitations. For example, it didn't provide any specific sources. It could give me complete garbage, and if I weren't already at least a little versed in the subject, I might believe I was getting correct information. That's why I think it should always be used in conjunction with other research methods, not as a replacement for them.

In conclusion, ChatGPT is an exciting, albeit scary, tool that will undoubtedly change how we live, learn, and work. We must all think deeply about how to use this technology ethically (although I doubt we will reach consensus on what "ethically" entails).

_______

Anyway, this is my last assigned blog post. I hope you’ve enjoyed reading about my capstone journey (and its ups and downs)!


Blog Post #3

Hi again! As I approach the halfway mark of my team's Capstone project, I'd like to reflect on some of our hits and misses (and there have been a lot of misses!) so far. To be blunt, it's been a rough journey since the last time I checked in here.

I think my group has two main issues:

  1. Our project requires us to understand concepts and use technologies that we have absolutely no experience in.
  2. Our communication and information sharing have been poor.

These two issues feed off each other, which is why our journey has been less than smooth. Let me go into more detail:

To review, my group is working on the Top-N Music Genre Classification Neural Network project. When we chose this project, none of us had any clue what a neural network was, let alone how to answer the basic questions the project raises, such as:

  • What types of datasets are suitable for this project? How should we store the dataset in a way that makes it easy for Tensorflow to process?
  • How do we process the audio in our dataset? What format should the processed audio be in?
  • What inputs does Keras accept that would be easiest to implement?
  • Are we able to use Tensorflow locally, or should we use Google Colab or some other cloud-based service that provides virtual machines?
  • How do we deploy our classification model as a web service?

And most problematically, we were (and still are) such novices that we didn't even know what steps were required in the first place. As a result, our Project Plan was vague. Too vague.

For example, my main task for last week was to create the audio pipeline. When we created the Project Plan, we didn’t really even know what an “audio pipeline” meant, let alone what kind of work it would entail. Therefore, we didn’t break down this large task into manageable, “trackable” subtasks. We also didn’t have a concrete “definition of done,” nor did we think about how exactly the processed audio would be fed to the neural network. Yikes!

So, I ended up dividing each audio sample in the GTZAN dataset into three 10-second segments. Then, I transformed each segment into a mel spectrogram image. But because I didn't have an underlying understanding of librosa (the audio processing Python library) before starting the task, I didn't realize that librosa outputs a mel spectrogram as a 2D numpy array. To create an actual human-readable image, I would need to use matplotlib's pyplot module to create a plot, then export the plot as a JPG file. I had looked up an overview of creating a convolutional neural network with Tensorflow, so I knew that image data is first converted to tensors, which are similar to numpy arrays. So, I figured it would be best to save the librosa mel spectrogram data as numpy arrays rather than convert each one to a JPG. Seems simple enough, right?
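For context, here's a minimal sketch of that segmentation-and-spectrogram step, assuming librosa's default settings; the file paths and parameter values are illustrative, not our exact script:

```python
# Minimal sketch: split one GTZAN clip into three 10-second segments and
# compute a mel spectrogram (a 2D numpy array!) for each segment.
import numpy as np
import librosa

SEGMENT_SECONDS = 10

def clip_to_mel_segments(wav_path, sr=22050, n_mels=128):
    y, sr = librosa.load(wav_path, sr=sr)  # mono waveform as a numpy array
    samples_per_segment = SEGMENT_SECONDS * sr
    mels = []
    for start in range(0, len(y) - samples_per_segment + 1, samples_per_segment):
        segment = y[start:start + samples_per_segment]
        mel = librosa.feature.melspectrogram(y=segment, sr=sr, n_mels=n_mels)
        mels.append(librosa.power_to_db(mel, ref=np.max))  # an array, not an image
    return mels

# e.g. np.save("blues.00000.seg0.npy", clip_to_mel_segments("blues.00000.wav")[0])
```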

Wrong!

Although I had told the teammate assigned to building the neural network that I intended to do this, he hadn't actually begun implementing his program (it was earlier in the week). Later, I realized that he was using the Keras utility image_dataset_from_directory to input the dataset. This utility requires the dataset to be organized as image files located in labeled subdirectories. Instead, I had put the data in a MongoDB database, where each document consisted of the sample's name, genre, and mel spectrogram (as a compressed numpy array). This format was completely incompatible with the Keras utility. And if we wanted to continue with this format, we'd have to do a lot more preprocessing with Tensorflow itself (without the nice, high-level Keras library).
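To make the mismatch concrete, here's roughly the layout and call that image_dataset_from_directory expects (the directory names and image dimensions are illustrative):

```python
# The Keras utility infers labels from a folder-per-class layout, e.g.:
#
#   spectrograms/
#     blues/      blues.00000.seg0.jpg ...
#     classical/  classical.00000.seg0.jpg ...
#     ...         (one subdirectory per genre)
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "spectrograms",          # root directory; labels come from subdirectory names
    image_size=(128, 431),   # assumed spectrogram dimensions; yours will differ
    batch_size=32,
)
```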

So, I spent a decent amount of additional time (roughly sketched after this list):

  • Exporting the MongoDB collection as multiple JSON files (one JSON file per genre)
  • Decompressing each document’s numpy array
  • Converting each numpy array into a JPG image
  • Reorganizing the file structure to conform to the layout the Keras utility expects
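Here's a minimal sketch of what the conversion boiled down to. It assumes each spectrogram was stored with np.savez_compressed under a hypothetical "mel" key; that key and the storage format are stand-ins for illustration, since our actual MongoDB export was messier:

```python
# Minimal sketch: decompress a stored mel spectrogram and write it out
# as a JPG inside a genre-labeled subdirectory. The "mel" key and the
# npz storage format are assumptions for illustration.
import os
import numpy as np
import matplotlib.pyplot as plt

def export_jpg(npz_path, genre, name, out_root="spectrograms"):
    mel_db = np.load(npz_path)["mel"]         # decompress the stored numpy array
    out_dir = os.path.join(out_root, genre)   # one subdirectory per genre
    os.makedirs(out_dir, exist_ok=True)
    # Render the 2D array directly as an image file.
    plt.imsave(os.path.join(out_dir, name + ".jpg"), mel_db, origin="lower")
```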

This extra work could have been avoided had my teammate and I communicated more clearly. However, I think it also shows how our shallow understanding of how neural networks are programmed undermines our ability to plan and work efficiently. And of course, being unfamiliar with tools such as librosa, Keras, and Tensorflow is also challenging. I'm sure there's a far better way to solve the issue I described above, but given how unfamiliar we all are with everything, we can't see it.

This has been a learning experience for sure, and I think my group still has a chance of meeting our project goals if we take what we learned these past weeks into consideration. I’m still proud of how much we’ve achieved in this short period of time.


Blog Post #2

Hello again! Luckily, my group was assigned the project that we ranked first: Top-n Music Genre Classification Neural Networks! Since my last post, we’ve written our Team Standards and have submitted our Project Plan. To choose the technologies for our project, we looked at our project’s two main components: (1) audio processing and (2) neural network training/implementation.

Here are some highlights of the technologies we'll be using:

  • Dataset: GTZAN dataset
    • Why we picked it: Tzanetakis and Cook created this popular dataset for their influential 2002 study on music genre classification. The dataset contains 1,000 30-second music files categorized into 10 genres. Though Tzanetakis and Cook didn’t use deep learning for their original study, later music classification studies that do utilize deep learning methods often use the GTZAN dataset.
    • How we'll use it: We will store the raw .wav files in Google Cloud Storage. Then, we'll create a MongoDB database where each sample is represented as a document containing the sample URL, genre, and other metadata. This dataset will serve as the training data for our neural network.
    • Pros: The GTZAN dataset is clean, easy to use, and created specifically for music genre classification. From the most basic standpoint, it's almost plug-and-play.
    • Cons: It’s a relatively small dataset, especially in the realm of deep learning. We also don’t know how Tzanetakis and Cook chose the samples. Since it’s a popular dataset, we risk following in the footsteps of other studies and possibly replicating their shortcomings. To overcome the small data size, we’re considering splitting the clips into shorter samples and/or adding new samples to the dataset.
  • Music Processing: Librosa
    • Why we picked it: Because we’ll already be using Tensorflow (which uses Python) to train our neural network, it makes sense to also use a Python library for the audio processing portion. Librosa is the go-to open source Python library for anything audio.
    • How we’ll use it: With Librosa, we’ll load the GTZAN audio samples and produce mel spectrograms for each one. These visual representations can act as inputs for our convolutional neural network, which will map the features it identifies into genre categories.
    • Pros: Librosa is an open source library with thorough documentation and an active online community. Librosa’s capability to produce mel spectrograms allows us to use a CNN (which is widely used for visual imagery analysis) instead of a different neural network model.
    • Cons: None of us have a background in music theory or audio technology, so understanding the documentation is difficult. I had no idea what a mel spectrogram was before starting my research for this project, and that doesn’t even begin to scratch the surface of Librosa’s features and capabilities.
  • Machine Learning Framework: Tensorflow
    • Why we picked it: Python seems to be the de facto language for machine learning, and Tensorflow is one of the most popular Python machine learning frameworks. It also doesn’t hurt that all of us are most comfortable with Python (as opposed to other languages)!
    • How we'll use it: We'll use the mel spectrograms as the input layer for our neural network. Tensorflow will allow us to train the neural network and eventually output music genre classifications for new sample data (a rough sketch of this kind of model follows this list).
    • Pros: There are ample online resources for learning how to use Tensorflow, and the Tensorflow documentation is well-organized and easy to understand. Tensorflow will take care of the underlying calculus and linear algebra necessary for setting up input matrices, calculating gradient descent, performing backpropagation, etc.
    • Cons: It might have a steep learning curve, especially since none of us has any prior conceptual knowledge of neural networks or experience using Tensorflow. There's a risk that we'll follow tutorials without actually understanding what's going on.
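Since we're planning a CNN over spectrogram images, here's a minimal sketch of the kind of model we have in mind, using the Keras Sequential API. The layer sizes and input shape are placeholders, not a final architecture:

```python
import tensorflow as tf
from tensorflow.keras import layers

# A toy CNN for 10-genre classification. The input shape assumes RGB
# spectrogram images of an illustrative size, not our final design.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 431, 3)),
    layers.Rescaling(1.0 / 255),              # scale pixel values to [0, 1]
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10),                         # one logit per GTZAN genre
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```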

This project is a lot to handle conceptually, but I’m excited to see what we can achieve in the next few weeks!


Blog Post #1

Hello world! My name is Jenna Bucien. I currently live in San Jose, CA and will graduate this June.

In this blog, I will reflect on my journey completing my CS 467 capstone project. Here’s my self-introduction:

Why did you choose to go into Computer Science?

Like all students in Oregon State's Computer Science Post-Bacc program, I hold a bachelor's degree in another field. In my case, I graduated from Barnard College in 2019 with a BA in East Asian Studies. After graduation, I worked as a travel planner for a boutique travel agency in New York. Unfortunately, roughly seven months later, the Covid-19 pandemic crashed the travel industry, and with it my new professional life. Yikes!

While sitting at home unemployed, I decided to pivot towards a different career direction. I was always attracted to Computer Science because its concepts are the backbone of our world’s technological progress and innovation. How do these seemingly magical technologies work? It would be amazing to contribute to their development and advancement! So, to test the waters, I enrolled in a self-paced Python course. I ended up loving the analytical, problem-solving aspects of coding, and shortly afterwards decided to apply to OSU. And here I am!

Now that I’ve almost completed this computer science degree, I can say that I have no regrets. I still love working iteratively to solve problems. Although there are definitely lots of frustrating moments, the high I get from making a tiny bit of progress is addictive. I’m excited to enter the professional tech field and continue my computer science adventure!

Why did you choose the projects you did on the survey? What makes them interesting to you?

The projects that I chose are all related to artificial intelligence and machine learning. Here’s my ranked list:

  1. Top-n Music Classification
  2. AI/ML Trading Bot
  3. Investor Match.ai
  4. ML Breakout
  5. Smart Recycling Bin
  6. Data Mining of Disparate Data Sets

It’s not shocking to any of us that AI/ML is shaking up the world. Headlines about the coming AI revolution are constant, and every company wants in on the latest AI technology. (Even my 90-year-old grandpa knows what ChatGPT is!)

Unfortunately, because AI is an advanced, complex, and vast field, I haven't been exposed to any AI/ML concepts in my time at OSU thus far. So, the opportunity to scratch the surface of AI through these capstone projects strongly appeals to me.

In particular, my top choice is the Top-n Music Classification project because I am interested in how to train, test, and validate neural networks using Keras or Tensorflow. I think it will be fascinating to explore how programs learn to classify objects in datasets. What characteristics and methodologies do they use? How quickly do they learn? What biases and/or unexpected results should we watch out for? I'd love to gain a basic understanding of neural networks, as well as how they can be applied to a subject all humans relate to: music! (Learning how Python can perform audio processing is also a plus.)

I hope that I learn valuable skills through this capstone project, and I’m excited to share my progress with you through this blog. I look forward to writing my next entry!