Hi again! As I approach the halfway mark of my team's Capstone project, I'd like to reflect on some of our hits and misses (and there have been a lot of misses!) so far. To be blunt, since the last time I checked in here, it's been a rough journey.
I think my group has two main issues:
- Our project requires us to understand concepts and use technologies that we have absolutely no experience with.
- Our communication and information sharing have been poor.
These two issues feed off of each other, which is why our journey has been less-than-smooth. Let me get into some more detail:
To review, my group is working on the Top-N Music Genre Classification Neural Network project. When we chose this project, none of us had any clue what a neural network was, let alone how to answer any of the basic questions the project raises, such as:
- What types of datasets are suitable for this project? How should we store the dataset in a way that makes it easy for TensorFlow to process?
- How do we process the audio in our dataset? What format should the processed audio end up in?
- Which of the input formats Keras accepts would be easiest to implement?
- Can we run TensorFlow locally, or should we use Google Colab or some other cloud-based service that provides virtual machines?
- How do we deploy our classification model as a web service?
And most problematically, we were (and still are) such novices that we didn't even know what steps were required in the first place. As a result, our Project Plan was vague. Too vague.
For example, my main task for last week was to create the audio pipeline. When we created the Project Plan, we didn’t really even know what an “audio pipeline” meant, let alone what kind of work it would entail. Therefore, we didn’t break down this large task into manageable, “trackable” subtasks. We also didn’t have a concrete “definition of done,” nor did we think about how exactly the processed audio would be fed to the neural network. Yikes!
So, I ended up dividing each audio sample in the GTZAN dataset into three 10-second segments. Then, I transformed each segment into a mel spectrogram image. But because I didn't have a working understanding of librosa (the audio-processing Python library) before starting the task, I didn't realize that librosa outputs a mel spectrogram as a 2D numpy array. To create an actual human-readable image, I would need to use matplotlib's pyplot module to create a plot, then export the plot as a JPG file. I had looked up an overview of creating a convolutional neural network with TensorFlow, so I knew that image data is first converted to tensors, which are similar to numpy arrays. So, I figured it would be best to save the librosa mel spectrogram data as numpy arrays rather than convert each one to a JPG. Seems simple enough, right?
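Concretely, my first pass looked roughly like this. (A simplified sketch: the file path, segment length, and mel parameters here are assumptions for illustration, not necessarily what we actually used.)

```python
import librosa
import numpy as np

def audio_to_mel_segments(path, segment_seconds=10, sr=22050, n_mels=128):
    """Split one audio file into fixed-length segments and return a
    mel spectrogram (a 2D numpy array) for each segment."""
    y, sr = librosa.load(path, sr=sr)  # decode the audio into a 1D float array
    samples_per_segment = segment_seconds * sr
    segments = []
    for start in range(0, len(y) - samples_per_segment + 1, samples_per_segment):
        chunk = y[start:start + samples_per_segment]
        # librosa returns the spectrogram as an array of shape
        # (n_mels, n_frames) -- not as a human-readable image file
        mel = librosa.feature.melspectrogram(y=chunk, sr=sr, n_mels=n_mels)
        segments.append(librosa.power_to_db(mel, ref=np.max))
    return segments

# Each 30-second GTZAN clip yields three 10-second spectrograms
segments = audio_to_mel_segments("genres/blues/blues.00000.wav")
```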
Wrong!
Although I had told my teammate assigned to building the neural network that I intended to do this, he hadn't actually begun implementing his program (it was earlier in the week). Later, I realized that he was using the Keras utility image_dataset_from_directory to load the dataset. This utility requires the dataset to be organized as image files located in labeled subdirectories. Instead, I had put the data in a MongoDB database, where each document consisted of the sample's name, genre, and mel spectrogram (as a compressed numpy array). This format was completely incompatible with the Keras utility. And if we wanted to continue with this format, we'd have to do a lot more pre-processing with TensorFlow itself (without the nice, high-level Keras library).
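To make the incompatibility concrete: image_dataset_from_directory infers each image's label from the name of the subdirectory the file lives in, so the dataset has to exist on disk as actual image files. A minimal sketch of what my teammate's loading code needed (the directory name, image size, and batch size are my assumptions):

```python
import tensorflow as tf

# Expected on-disk layout -- one subdirectory per genre, image files inside:
#   dataset/
#     blues/blues.00000.seg0.jpg, ...
#     classical/classical.00000.seg0.jpg, ...
#     ...
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset",
    labels="inferred",        # label each image by its subdirectory's name
    label_mode="int",
    image_size=(128, 128),    # images are resized on load
    batch_size=32,
)
```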
So, I spent a decent amount of additional time on the following (a sketch of the script comes after this list):
- Exporting the MongoDB collection as multiple JSON files (one JSON file per genre)
- Decompressing each document’s numpy array
- Converting each numpy array into a JPG image
- Reorganizing the file structure to conform to the layout the Keras utility expects.
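Here's the gist of that cleanup script. A big caveat: the field names, the zlib/base64 encoding, and the stored shape field are assumptions I'm making to keep the sketch self-contained, not an exact record of our schema.

```python
import base64
import json
import zlib
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

GENRES = ["blues", "classical", "country"]  # ...and the rest of GTZAN's genres

for genre in GENRES:
    out_dir = Path("dataset") / genre  # the labeled layout Keras expects
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(f"export/{genre}.json") as f:
        documents = json.load(f)  # one exported JSON file per genre
    for doc in documents:
        # undo the compression and rebuild the 2D spectrogram array
        raw = zlib.decompress(base64.b64decode(doc["mel_spectrogram"]))
        mel = np.frombuffer(raw, dtype=np.float32).reshape(doc["shape"])
        # render the array to a JPG inside the genre's subdirectory
        plt.imsave(out_dir / f"{doc['name']}.jpg", mel, cmap="magma")
```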
This extra work could have been avoided had my teammate and I communicated more clearly. However, I think it also shows how our shallow understanding of how neural networks are programmed is undermining our ability to plan and work efficiently. And of course, being unfamiliar with tools such as librosa, Keras, and TensorFlow is also challenging. I'm sure there's a far better way out there to solve the issue I described above, but given how unfamiliar we all are with everything, we can't see it.
This has been a learning experience for sure, and I think my group still has a chance of meeting our project goals if we take what we learned these past weeks into consideration. I’m still proud of how much we’ve achieved in this short period of time.