In CS467, when I selected a project described as the development of “an interactive web application,” I didn’t imagine it would involve anything to do with the field of computer vision. Before enrolling in the course, I had spent some time dusting off my JavaScript skills, thinking I could elect to work on a project that would have me honing them on the front end of some web-based app, or perhaps learning a commercially relevant framework like AngularJS. What I didn’t know was that my expectations were about to be upended. I don’t mean that in a bad way; it just wasn’t the trajectory I expected, and the last thing I anticipated was training models for object detection, classification, or segmentation.
From our group’s initial meeting, I understood that the web app was not going to be the focus of our endeavor for the quarter. Words like “AI” were being used in conversation, acronyms like “LLM” and terms like “neural network” were touched on, and I had a sinking feeling that I was out of my depth.

Without clear direction, I reached for every viable resource I could lean on to bring myself up to speed on what training a custom model involved. I quickly understood that Python would be leveraged, and while many CS students may feel comfortable with the language’s syntax, I had navigated my course load without much need of it. Feeling uneasy, I worked through several tutorials before setting up my own testing environment, where I trained my own custom model repeatedly. The process was slow (very slow). But from that point, I started to tinker with various options and parameters to tweak my model’s output. I can’t get into specifics for fear of violating an NDA, but I began to see a workflow come together, and I spent some time afterward documenting my steps: procuring images, annotating those images, training the custom model, and eventually testing.

Presently, the training process moves a bit quicker than before, and thanks to some scripting tools, I’ve streamlined getting the image files I need onto my local machine. But annotation remains a tedious, time-consuming process, and I’m starting to see why services like Amazon SageMaker Ground Truth on AWS have sprung up and charge a premium. Where things go from here, I don’t know. I assume I’ll one day be grateful for the exposure this project has given me to such a unique technology, but it’s a bit too early in the quarter to see the proverbial “light at the end,” and I need to keep things moving.
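To give a flavor of the kind of scripting tool I mean, without touching anything NDA-covered: one common step before training is shuffling your procured images into training and validation folders. The sketch below is purely illustrative; the function name, directory layout, and 80/20 split are my own assumptions, not the project’s actual code.

```python
import random
import shutil
from pathlib import Path

def split_dataset(src_dir, out_dir, val_fraction=0.2, seed=42):
    """Copy .jpg images from src_dir into out_dir/train and out_dir/val.

    A hypothetical helper sketching the train/validation split step;
    a fixed seed keeps the shuffle reproducible between runs.
    """
    images = sorted(Path(src_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n_val = int(len(images) * val_fraction)

    # First n_val shuffled files become validation, the rest training.
    for subset, files in (("val", images[:n_val]), ("train", images[n_val:])):
        dest = Path(out_dir) / subset
        dest.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy2(f, dest / f.name)

    return len(images) - n_val, n_val
```

Small utilities like this are the unglamorous part of the workflow, but they’re what turned my “very slow” manual process into something repeatable.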