Unlike last week, we have finally been in contact with our client and learned what we are specifically supposed to do on the project. Most of the work on the neural network has already been done. Our job is instead to improve how well the network handles the recognition process. Currently the project recognizes images with an error rate that is above what is allowable.
Another part of the project which I hadn’t even considered is the inclusion of three-dimensional convolutional neural networks to pattern match a video segment as opposed to a single image. Because some letters in American Sign Language require an aspect of movement, like the letter J, the network would need to analyze multiple images that are in sequence. Out of everything that has been gone over so far, things like LiDAR (which, loosely like GPS, works out distance from how long a signal takes to travel) and CNNs, this is what I’m most unfamiliar with. I believe the ramp up to understanding how we will incorporate a system to recognize movement will be fairly fast, but it will be difficult in and of itself to implement. Many tensor libraries have limited support for 3D convolutions, and there are not very many guides that cover the subject.
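To convince myself why a stack of frames matters, here is a minimal sketch of a 3D convolution written with plain NumPy loops. This is not the project's code: the `conv3d_valid` helper, the toy clip, and the two-frame kernel are all made up for illustration, and a real network would learn its kernels rather than have them written by hand.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D cross-correlation over (frames, height, width)."""
    T, H, W = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # The window spans several frames, not just one image
                window = clip[t:t + kt, y:y + kh, x:x + kw]
                out[t, y, x] = np.sum(window * kernel)
    return out

# Hypothetical clip: 8 grayscale 16x16 frames with a dot moving right
clip = np.zeros((8, 16, 16))
for t in range(8):
    clip[t, 8, 4 + t] = 1.0

# Hand-written kernel that compares each pixel with the next frame
kernel = np.zeros((2, 1, 1))
kernel[0, 0, 0] = -1.0
kernel[1, 0, 0] = 1.0

# Response for the moving dot, and for the first frame repeated 8 times
motion = conv3d_valid(clip, kernel)
static = conv3d_valid(np.repeat(clip[:1], 8, axis=0), kernel)

print(motion.shape)
print(np.abs(motion).sum(), np.abs(static).sum())
```

The frozen clip gives no response at all while the moving dot does, which is the kind of signal a single-image network never sees. That is roughly why a letter like J needs the time axis.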
It’s a bit disappointing to pick up directly where someone left off when the most exciting part has already been set up. From here on out it will essentially just be changing the size of the neural network, or changing the hyperparameters. The same goes for the actual LiDAR camera: creating the processing for the images would be really cool, but that is not an aspect of the project we are really working on. This is a minor complaint, though, as there are still a lot of cool things to mess around with.