
Blog Post #4

We are still plugging away at our project, which is due in a few short weeks! In this post, I’ll describe some strategies I’ve used to complete my work so far:

How do you approach learning something new, like a new technology?

This project has required me to quickly learn many new technologies (at least at a basic level). For example, I needed a crash course in TensorFlow, a popular machine learning library. I approached learning TensorFlow similarly to how I’ve familiarized myself with new material in the past.

First, I poked around Google’s official TensorFlow developer website. The TensorFlow ecosystem is enormous, so it was a bit overwhelming at first. I found a section of the website called Introduction to TensorFlow, which acted as a gateway to beginner-friendly learning resources. For example, it linked to a great tutorial called Basic Classification: Classify Images of Clothing. Not only was this tutorial super topical (as my group had decided to convert our audio samples to image representations), but it was also interactive (always great) since it was built on Google Colab.

Besides looking at official learning materials and documentation, I also like reading books (yes, printed paper books). There’s something about the thoroughness and structure of books that appeals to me. For this project, I read the first couple of chapters of Deep Learning with Python by François Chollet (the creator of Keras). The book provided a great conceptual overview of machine learning and neural networks before diving into how to actually use TensorFlow. I also really enjoyed reading the specific chapters on computer vision.

Finally, I also like watching YouTube videos. The quality can be a bit hit-or-miss (and you have to use your best judgement about the accuracy of the information, which can be tricky if you’re a novice), but I still love YouTube as a supplementary resource. Finding a great YouTube tutorial is like striking gold. For this project, I enjoyed watching 3Blue1Brown’s series on neural networks, as well as Valerio Velardo – The Sound of AI’s series about audio processing with TensorFlow.

So, to sum up, I basically use a multi-pronged approach when trying to learn a new technology. I use a combination of resources in tandem to gain knowledge about the technology from both a theoretical and a practical standpoint. I combine a mix of passive learning (reading, watching) and active learning (doing tutorials). So far, I think my approach has been successful.

Do you use ChatGPT or other AI tools? In what way?

I’ve only used ChatGPT briefly. During the research phase of this project, I had to gain a basic understanding of deep learning, neural networks, computer vision, etc. Without any prior knowledge of these topics, I was a bit overwhelmed. For example, many sources explaining neural networks assumed an understanding of calculus and linear algebra (mathematics has always been my weakness).

I had read that many people were having success using ChatGPT to clearly summarize complicated topics by prompting it with phrases like “explain <insert complex concept here>,” so I decided to try that. For example, I asked, “Can you explain how neural networks use backpropagation simply?” ChatGPT immediately produced an easy-to-understand response that clarified what I had already read about backpropagation.

Then, I decided to put ChatGPT to a bit of a harder test. I asked it, “How can I use a neural network to categorize music into genres?” In other words, I wanted to see how it would (broadly) design our project! The results were pretty stunning. It immediately generated an 8-step guide that reaffirmed, almost exactly, the steps my group was planning to take. The guide provided a broad overview of dataset preparation, feature extraction, data preprocessing, neural network architecture, model training, and model evaluation (about 2–3 sentences for each step). Even more impressively, when I prompted it with follow-up questions, the bot provided answers that made sense contextually within the scope of our conversation. For instance, when I asked it to give “more information about Step 2,” it knew what I was referring to and immediately delved into examples of audio feature extraction. Impressive (and a little frightening)!

That was the extent of my experimentation with ChatGPT for this project. I can definitely see how ChatGPT can be a great resource for breaking down tough concepts, augmenting one’s background research, and planning projects. However, there are still limitations. For example, it didn’t provide any specific sources. It could give me complete garbage, and if I weren’t already at least a little versed in the subject, I might believe I was getting correct information. That’s why I think it should always be used in conjunction with other research methods, and not as a replacement for them.

In conclusion, ChatGPT is an exciting albeit scary tool that will undoubtedly change how we live, learn, and work. We must all deeply consider how to use this technology ethically (although I doubt we will reach consensus on what “ethically” entails).

_______

Anyway, this is my last assigned blog post. I hope you’ve enjoyed reading about my capstone journey (and its ups and downs)!


Blog Post #3

Hi again! As I approach the half-way mark of my team’s Capstone project, I’d like to reflect on some of our hits and misses (and there have been a lot of misses!) so far. To be blunt, since the last time I checked in here, it’s been a rough journey.

I think my group has two main issues:

  1. Our project requires us to understand concepts and use technologies that we have absolutely no experience with.
  2. Our communication and sharing of information have been poor.

These two issues feed off of each other, which is why our journey has been less-than-smooth. Let me get into some more detail:

To review, my group is working on the Top-N Music Genre Classification Neural Network project. When we chose this project, none of us had any clue what a neural network was, let alone how to answer the basic questions the project raised, such as:

  • What types of datasets are suitable for this project? How should we store the dataset in a way that makes it easy for TensorFlow to process?
  • How do we process the audio in our dataset? What format should the processed audio be in?
  • What inputs does Keras accept that would be easiest to implement?
  • Are we able to use TensorFlow locally, or should we use Google Colab or some other cloud-based service that provides virtual machines?
  • How do we deploy our classification model as a web service?

And most problematically, we were (and still are) such novices that we didn’t even know what steps were required in the first place. Therefore, our Project Plan was vague. Too vague.

For example, my main task for last week was to create the audio pipeline. When we created the Project Plan, we didn’t really even know what an “audio pipeline” meant, let alone what kind of work it would entail. Therefore, we didn’t break down this large task into manageable, “trackable” subtasks. We also didn’t have a concrete “definition of done,” nor did we think about how exactly the processed audio would be fed to the neural network. Yikes!

So, I ended up dividing each audio sample in the GTZAN dataset into three 10-second segments. Then, I transformed each segment into a mel spectrogram image. But because I didn’t have an underlying understanding of librosa (the audio processing Python library) before starting the task, I didn’t realize that librosa outputs a mel spectrogram as a 2D NumPy array. To create an actual human-readable image, I needed to use Matplotlib’s pyplot module to create a plot, then export the plot as a JPG file. I had looked up an overview of creating a convolutional neural network with TensorFlow, so I knew that image data is first converted to tensors, which are similar to NumPy arrays. So, I figured it would be best to save the librosa mel spectrogram data as NumPy arrays rather than convert each one to a JPG. Seems simple enough, right?
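For the curious, here’s a minimal sketch of roughly what that first attempt looked like. The paths, filenames, and segment count are hypothetical placeholders, but the librosa calls are the standard ones for this kind of pipeline.

```python
# Rough sketch of the segmentation + mel spectrogram step described above.
# Paths are hypothetical; GTZAN clips are roughly 30 seconds long, so each
# yields three 10-second segments.
import librosa
import numpy as np

SEGMENT_SECONDS = 10
SAMPLE_RATE = 22050  # librosa's default sample rate

def audio_to_mel_segments(audio_path):
    """Split one audio file into 10-second segments and return a
    dB-scaled mel spectrogram (2D NumPy array) for each segment."""
    y, sr = librosa.load(audio_path, sr=SAMPLE_RATE)
    samples_per_segment = SEGMENT_SECONDS * sr
    mels = []
    for i in range(3):  # three segments per ~30-second clip
        segment = y[i * samples_per_segment:(i + 1) * samples_per_segment]
        if len(segment) < samples_per_segment:
            break  # skip a trailing partial segment
        mel = librosa.feature.melspectrogram(y=segment, sr=sr)
        mels.append(librosa.power_to_db(mel, ref=np.max))
    return mels

# Save each segment as a compressed NumPy array (my original plan)
for i, mel in enumerate(audio_to_mel_segments("gtzan/blues/blues.00000.wav")):
    np.savez_compressed(f"processed/blues.00000.seg{i}.npz", mel=mel)
```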

Wrong!

Although I told my teammate assigned to building the neural network that I intended to do this, he hadn’t actually begun implementing his program (it was earlier in the week). Later, I realized that he was using the Keras utility image_dataset_from_directory to load the dataset. This utility requires the dataset to be organized as image files located in labeled subdirectories. Instead, I had put the data in a MongoDB database, where each document consisted of the sample’s name, genre, and mel spectrogram (as a compressed NumPy array). This format was completely incompatible with the Keras utility. And if we wanted to continue with that format, we’d have to do a lot more preprocessing with TensorFlow itself (without the nice, high-level Keras library).
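To illustrate the mismatch, here’s a minimal sketch of the kind of loading code he was relying on. The directory name and image size are placeholders, but the one-folder-per-label layout is genuinely what image_dataset_from_directory expects.

```python
# Expected layout (labels are inferred from the subdirectory names):
# spectrograms/
#   blues/      blues.00000.seg0.jpg, ...
#   classical/  classical.00000.seg0.jpg, ...
#   ...one subdirectory per genre
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "spectrograms",
    labels="inferred",       # genre label comes from the folder name
    label_mode="int",
    image_size=(128, 128),   # placeholder; images are resized on load
    batch_size=32,
)
```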

So, I spent a decent amount of additional time:

  • Exporting the MongoDB collection as multiple JSON files (one JSON file per genre)
  • Decompressing each document’s NumPy array
  • Converting each NumPy array into a JPG image
  • Reorganizing the file structure to conform to the layout the Keras utility expects (a rough sketch of this conversion follows the list)
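Here’s roughly what the array-to-JPG conversion and reorganization looked like, assuming the MongoDB export and decompression steps had already produced .npz files on disk. Filenames and paths are hypothetical.

```python
# Convert saved mel spectrogram arrays into JPGs organized one folder per
# genre, which is the layout image_dataset_from_directory expects.
from pathlib import Path
import numpy as np
import matplotlib.pyplot as plt

SRC = Path("processed")       # e.g. processed/blues.00000.seg0.npz
DEST = Path("spectrograms")   # will contain one subdirectory per genre

for npz_path in SRC.glob("*.npz"):
    mel = np.load(npz_path)["mel"]        # the 2D dB-scaled array
    genre = npz_path.name.split(".")[0]   # "blues" from "blues.00000.seg0.npz"
    out_dir = DEST / genre
    out_dir.mkdir(parents=True, exist_ok=True)
    # imsave maps the 2D array through a colormap and writes an image file
    plt.imsave(out_dir / f"{npz_path.stem}.jpg", mel, cmap="magma", origin="lower")
```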

This extra work could have been avoided had my teammate and I communicated more clearly. However, I think it also shows how our novice understanding of how neural networks are programmed is undermining our ability to plan and work efficiently. And of course, being unfamiliar with tools such as librosa, Keras, and TensorFlow is also challenging. I’m sure there’s a far better way out there to solve the issue I described above, but given how unfamiliar we all are with everything, we can’t see it.

This has been a learning experience for sure, and I think my group still has a chance of meeting our project goals if we take what we learned these past weeks into consideration. I’m still proud of how much we’ve achieved in this short period of time.