The past term has been a pretty hectic one when it comes to project management. Although the capstone is basically finished on the neural network side, the GUI still needed to be completed, which was quite a bit of work. On top of this, I have been working on another project that is, I would say, much more complex than the capstone. It involves using a YOLO algorithm on an image and its depth-based counterpart from a stereoscopic camera to find the corners of a box and measure depth. The most difficult parts of that project have been calibrating the cameras and finding the pixel-perfect corners of a box. I have attempted many different methods, but every one runs into some problem. The original idea was to use a YOLO network alone and have it solve for depth from a monoscopic camera. When it came to gathering data, though, it would take an unreasonable amount of time to generalize over position and box attributes, because I cannot use mutation.
This is something that apparently is not discussed much, but transformations of a 2D image warp the object in 3D space, at least according to a pinhole model. This is due to the oblique nature of how 3D space is projected onto a 2D image. That is normally fine for most networks if you are simply drawing dots on corners or detecting an object, but when it comes to measuring depth, everything must be perfect. The warping usually shows up when parts of an object are obscured: a shift along the 2D image should bring sections of the object into view that were not there before, and a flat translation cannot do that. On top of this, there is no good guide out there for solving for depth under mass transformations of shifting and rotation, i.e., mapping a 2D mutation back into 3D space. I drew a ton of diagrams trying to solve this problem and decided it was just easier to get a stereoscopic camera.
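To make that concrete, here is a minimal numpy sketch of the pinhole projection (all numbers made up): it shows that a 2D image shift can match a true 3D translation at one depth but not at another, which is exactly the warping problem.

```python
import numpy as np

# Pinhole model: a 3D point (X, Y, Z) projects to (f*X/Z, f*Y/Z).
def project(point, f=1.0):
    X, Y, Z = point
    return np.array([f * X / Z, f * Y / Z])

near = np.array([0.0, 0.0, 2.0])   # corner 2 m away
far = np.array([0.0, 0.0, 3.0])    # corner 3 m away

# Mutating the image shifts every projection by the same amount...
shift = np.array([0.1, 0.0])
print(project(near) + shift, project(far) + shift)   # both move by 0.1

# ...but actually moving the object 0.2 m sideways shifts near points
# more than far points (parallax), so no flat 2D shift reproduces it.
dx = np.array([0.2, 0.0, 0.0])
print(project(near + dx), project(far + dx))         # 0.1 vs ~0.067
```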
The next idea was to use Sobel edge detection along with the depth map from a calibrated stereo pair to segment objects based on adjacency. This works some of the time; however, depth maps have a lot of artifacting near object edges, because one camera can see a sliver of an object that is hidden from the other camera. On top of that, it relies on an environment where the colors are very distinct from one another; otherwise there will be open edges, and an adjacency-fill algorithm will flood the whole image instead of just sections of the box. Where I am right now: throw the image and depth map at the network and hope it finds a good corner, training on a very limited set of angles rather than all possible ones. Then I can just ask my client to place the box a specific way. From there, epipolar geometry and convolution matching can be used to find the corresponding corner in the other image. The rest is simple geometry to calculate the depth to the corner of the box. With depth in hand, the dimensions of the box can be solved from the intrinsic angles of the camera and the relative angles between the corners themselves using the law of cosines.
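As a sketch of that last step, here is roughly the math I have in mind, with placeholder intrinsics and pixel coordinates rather than values from my actual rig: depth from disparity via Z = f*B/d, then the law of cosines between the two corner rays.

```python
import numpy as np

def depth_from_disparity(f_px, baseline_m, disparity_px):
    # Standard rectified-stereo relation: Z = f * B / d.
    return f_px * baseline_m / disparity_px

def ray_direction(u, v, f_px, cx, cy):
    # Back-project a pixel into a unit-length ray using pinhole intrinsics.
    d = np.array([u - cx, v - cy, f_px])
    return d / np.linalg.norm(d)

def edge_length(r1_dist, r2_dist, ray1, ray2):
    # Law of cosines: c^2 = a^2 + b^2 - 2ab*cos(theta), where theta is
    # the angle between the two corner rays.
    cos_theta = np.dot(ray1, ray2)
    return np.sqrt(r1_dist**2 + r2_dist**2 - 2 * r1_dist * r2_dist * cos_theta)

# Made-up numbers: two box corners seen by the left camera.
f_px, cx, cy = 700.0, 320.0, 240.0            # hypothetical intrinsics
z1 = depth_from_disparity(f_px, 0.06, 35.0)   # 6 cm baseline, 35 px disparity
z2 = depth_from_disparity(f_px, 0.06, 33.0)
r1 = ray_direction(250.0, 200.0, f_px, cx, cy)
r2 = ray_direction(420.0, 210.0, f_px, cx, cy)

# Z is depth along the optical axis; divide by the ray's z-component
# to get the true distance along the ray before applying the law of cosines.
print(edge_length(z1 / r1[2], z2 / r2[2], r1, r2))  # edge length in metres
```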
Eighth Blog Post Term 2
This week I have been spending a lot of time learning how to create a complex loss function and display different metrics from TensorFlow. Building a GUI to display the output labels from the NN was difficult because TensorFlow's back end changes what gets passed into and out of user-defined functions. In the end, I managed to get a bunch of different metrics displayed every batch in a basic tkinter GUI, along with image representations of the actual and predicted labels.
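For anyone curious, the rough shape of the idea looks like the sketch below: a custom Keras callback that pushes the per-batch logs into a tkinter label. This is a simplified stand-in, not my actual GUI code.

```python
import tkinter as tk
import tensorflow as tf

class BatchMetricsDisplay(tf.keras.callbacks.Callback):
    """Show training metrics in a tkinter window, updated every batch."""

    def __init__(self):
        super().__init__()
        self.root = tk.Tk()
        self.root.title("Training metrics")
        self.label = tk.Label(self.root, font=("Courier", 12), justify="left")
        self.label.pack(padx=10, pady=10)

    def on_train_batch_end(self, batch, logs=None):
        logs = logs or {}
        text = f"batch {batch}\n" + "\n".join(
            f"{name}: {value:.4f}" for name, value in logs.items()
        )
        self.label.config(text=text)
        self.root.update()  # process pending tkinter events each batch

# model.fit(x, y, callbacks=[BatchMetricsDisplay()])
```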
With more information being displayed about the network, it was easier to understand why my complex loss function wasn't working. After spending roughly 15 hours experimenting with different aspects of the network, I finally figured out that I was using a softmax activation on my output like an idiot. With all the experience I had gained messing with the loss functionality, I quickly changed things around so the model learns the different aspects of the output labels. This link leads to a video showing the network learning in real time:
https://www.youtube.com/watch?v=evwahPfXUXg&t=340s
The first index is the object probability, meaning that if the index is 0, there is no object in frame, which is why the difference image (the leftmost one) still has dots appearing at the end.
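For reference, the activation fix boils down to something like the sketch below (layer sizes are made up): softmax forces the whole output vector to sum to 1, which makes no sense when the vector mixes an objectness probability with coordinate regression, so the head gets split with a per-part activation.

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(64, 64, 3))
x = layers.Conv2D(16, 3, activation="relu")(inputs)
x = layers.GlobalAveragePooling2D()(x)

# Objectness wants a 0..1 probability; coordinates want unbounded regression.
obj = layers.Dense(1, activation="sigmoid", name="objectness")(x)
coords = layers.Dense(4, activation="linear", name="coords")(x)

model = tf.keras.Model(inputs, layers.Concatenate()([obj, coords]))
```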
Seventh Blog Term 2
This week I have been working almost exclusively on research and my other capstone. Although I basically finished the other capstone already, as it has accomplished every task that was set out for it, I have continued to improve on it. I home-brewed a small YOLO (You Only Look Once) CNN for multi-object detection. It's much more complex than a simple ResNet implementation, as it requires a custom loss rather than a plain MSE loss. I managed to get it working, and it gave me a lot of insight into how to create a functioning NN that is crafted for a specific scenario.
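A rough sketch of why plain MSE doesn't cut it, assuming an [objectness, x, y, w, h] label layout per cell (which only loosely matches my actual setup): coordinate error should only count in cells that really contain an object, so the loss masks it with the true objectness.

```python
import tensorflow as tf

def yolo_like_loss(y_true, y_pred):
    # Objectness: binary cross-entropy on the first channel
    # (assumes the prediction head already applies a sigmoid there).
    obj_loss = tf.keras.losses.binary_crossentropy(y_true[..., :1], y_pred[..., :1])

    # Coordinates: squared error, masked so empty cells contribute nothing.
    obj_true = y_true[..., 0]
    coord_err = tf.reduce_sum(tf.square(y_true[..., 1:] - y_pred[..., 1:]), axis=-1)

    # 5.0 is a YOLO-paper-style weighting between the two terms.
    return tf.reduce_mean(obj_loss + 5.0 * obj_true * coord_err)

# model.compile(optimizer="adam", loss=yolo_like_loss)
```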
Moving forward, I will be using a lot of these new ideas in implementing a new NN for the main capstone. However, just three hours ago, my laptop's power supply blew out, which means that priorities have shifted a bit. I am currently staying at my parents' house in Portland and don't have access to the backups on my desktop. So now it's a game of trying to get files over to this temporary laptop (my mom's) to get something working. Hopefully it will all work out and I'll be able to get some of my new code out for the code review on Sunday.
Sixth Blog Term 2
Now that our group is nearing the end of our project, we can begin to experiment with different architectures that may help in categorizing different gestures. I have been researching residual networks, which may help with the vanishing-gradient issue. I do not know how they will interact with a 3D CNN/LSTM combination, but we shall see.
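For reference, a minimal residual block sketch (2D here for brevity; the 3D version would swap in Conv3D): the skip connection gives gradients a direct path around the convolutions, which is the whole point.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    if shortcut.shape[-1] != filters:
        # 1x1 convolution so the channel counts match before adding.
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    # The addition is the skip connection that keeps gradients flowing.
    return layers.Activation("relu")(layers.Add()([y, shortcut]))
```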
I have gotten more and more frustrated trying to get a new GPU for training neural nets. Currently the market for GPUs is barren, as there are tariffs on imports and cryptocurrency is exploding again. Many people are entering the market and buying up the brand-new GPUs, and scalpers have taken advantage of the limited supply to charge exorbitant amounts (3-4x MSRP). I even asked my mentor if they had an in on GPUs, and apparently Intel is having trouble acquiring them too.
Fifth Blog Term 2
This week my team and I have been working on consolidating many different aspects of the project. This involves talking with our other team members to get the GUI set up, along with using OpenVINO to make the neural net more efficient. As we near the end of the capstone project, it makes sense to stop adding features and concentrate on polishing what we currently have. Because creating neural nets is so tedious and hard to evaluate, we don't yet have a neural net that actually performs to specification, so it is a good idea to really hammer that down before we move on to anything else.
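For reference, the OpenVINO step looks roughly like the sketch below. These calls are from the newer openvino Python package, and the exact API depends on the version we end up pinning; the SavedModel path is a placeholder.

```python
import openvino as ov

core = ov.Core()

# Convert a TensorFlow SavedModel into OpenVINO's intermediate representation.
ov_model = ov.convert_model("saved_model_dir")   # hypothetical model path
ov.save_model(ov_model, "gesture_net.xml")       # writes IR (.xml + .bin)

# Compile for the target device to get the optimized inference speedup.
compiled = core.compile_model(ov_model, "CPU")
# result = compiled(input_batch)
```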
In other news, we finally have a method of mass-generating figures for a neural net to train on, which will increase accuracy and speed up figure generation immensely.
Fourth Blog Post Term 2
I've been lazing around quite a bit this week when it comes to the capstone project. It's midterms week and I've been focusing on studying and finishing assignments for my other classes. That's not to say I haven't worked on the capstone at all; I just haven't made as much progress as I would have liked. I've been working on a label creator that takes a very small set of inputs, mutates them into several different images an NN would still recognize, and stitches them together. This would allow a small batch of 26 gestures to be expanded into a data set of 100k+ samples.
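The mutation step looks roughly like this sketch using tf.image (the parameter ranges are illustrative guesses, not the values I've settled on):

```python
import tensorflow as tf

def mutate(image):
    # Random variants that a network should still recognize as the
    # same gesture: mirror, lighting, and contrast changes.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    return image

def expand(images, labels, copies=40):
    # Each base sample spawns `copies` mutated variants, so a small
    # hand-labeled set grows by that factor.
    for image, label in zip(images, labels):
        for _ in range(copies):
            yield mutate(image), label
```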
I've also finally gotten the last parts for my second capstone. Nothing too special, just a couple of LEDs and some resistors that will indicate what state the camera is in, i.e., still, moving, or not registering anything. With this, I can submit the second capstone, move forward, and spend more time on other assignments.
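The LED logic itself is trivial; something like this sketch, with placeholder pin numbers:

```python
import RPi.GPIO as GPIO

# One LED per camera state; BCM pin numbers here are placeholders.
PINS = {"still": 17, "moving": 27, "idle": 22}

GPIO.setmode(GPIO.BCM)
for pin in PINS.values():
    GPIO.setup(pin, GPIO.OUT, initial=GPIO.LOW)

def show_state(state):
    # Light exactly one LED for the current state, turning off the rest.
    for name, pin in PINS.items():
        GPIO.output(pin, GPIO.HIGH if name == state else GPIO.LOW)

# show_state("moving")
```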
Third Blog Post Term 2
This week I have mostly been working on my second capstone project. This is for my second major, as I need some aspect of ECE in my second project so it can count for both majors. In this project, I am designing a convolutional neural network to locate the user's hand and follow it. The camera taking the photos is mounted on a servo, and both are hooked up to a Raspberry Pi. This way the Raspberry Pi can communicate with the camera, evaluate the network, and then move the servo with the camera attached to follow the user's movement. So far, as of this post, I have finished most of these tasks. However, the current setup still has errors. The CNN has a high misidentification rate if the background changes, and will sometimes target the user's head instead. The servo will also jump back and forth around the object once it has reached center frame.
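The jitter fix I'm considering is a dead zone plus a proportional step, roughly like this sketch (gains and sizes are guesses, not tuned values): once the hand is close enough to center frame, the servo stops commanding tiny corrections.

```python
FRAME_WIDTH = 640
DEAD_ZONE = 25   # pixels of acceptable off-center error
GAIN = 0.02      # degrees of servo travel per pixel of error

def next_angle(current_angle, hand_x):
    error = hand_x - FRAME_WIDTH / 2
    if abs(error) < DEAD_ZONE:
        return current_angle              # close enough: hold still
    step = GAIN * error                   # proportional correction
    return max(0.0, min(180.0, current_angle + step))  # clamp to servo range
```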
I have also managed to get the LSTM NN working properly. It will recognize a gesture only when it has followed the proper movement, and not when it is a still image. However, this also has misidentification issues: general movements that follow the same arc as the gesture will be identified as it.
Second Blog Post Term 2
This week, or really this term, has been sort of frustrating. During the first two weeks of the term, I was both working at Tektronix and handling four classes. Luckily, my advisor managed to get a credit transfer so that I did not need to take a 100-level course this term. I barely had any free time and had trouble staying on top of things, which meant I had very little time to work on the capstone project. I did manage to get some things done, however, like a generator that pulls training data directly from the hard drive instead of forcing the developer to keep everything in memory.
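The generator idea is roughly the sketch below; the file layout (one .npz per sample) is an assumption for illustration, not my exact format.

```python
import os
import numpy as np

def sample_generator(data_dir):
    # Yield one (image, label) pair at a time from disk, so only the
    # current sample ever lives in memory.
    for fname in sorted(os.listdir(data_dir)):
        if fname.endswith(".npz"):
            sample = np.load(os.path.join(data_dir, fname))
            yield sample["image"], sample["label"]

# Keras can consume this directly, or it can be wrapped in tf.data
# (tf.data.Dataset.from_generator) for batching and prefetching.
```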
Currently I have been working on the recurrent aspect of the project. We need some method of tracking temporal changes in a streamed video, so part of the NN needs to be dedicated to handling previous frames. This is where the LSTM comes in. A Long Short-Term Memory layer takes in a sequence of inputs instead of a single one, which allows it to take into account changes between frames. This is hard to implement, as we now need to generate a ton of video data.
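Shape-wise, the model I'm aiming for looks roughly like this sketch (all sizes are placeholders): a small CNN runs on every frame via TimeDistributed, and the LSTM consumes the resulting feature sequence.

```python
import tensorflow as tf
from tensorflow.keras import layers

SEQ_LEN, H, W = 16, 64, 64  # hypothetical clip length and frame size

model = tf.keras.Sequential([
    layers.Input(shape=(SEQ_LEN, H, W, 3)),
    # The same CNN is applied to each frame in the sequence.
    layers.TimeDistributed(layers.Conv2D(16, 3, activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D()),
    layers.TimeDistributed(layers.Flatten()),
    # The LSTM summarizes how the per-frame features change over time.
    layers.LSTM(64),
    layers.Dense(3, activation="softmax"),  # one unit per gesture class
])
```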
Finally, we've had a bit of trouble with one of our teammates. He's part of a different section and has failed to respond to any communication for a while now. I hope that he comes around and starts helping again, because right now it's worrying.
Introductory Blog
My current interests include several different activities. Drawing takes up most of my free time. I also practice Russian every day to improve gradually over time. Finally, I hike often enough for it to be considered a hobby, every couple of weekends.
I've been at OSU for six years now. At first I was only a CS major, but during my internship at Garmin I found that I enjoyed work that combined software with signals and systems. So I decided to take a second major in ECE for the courses that pertained to that topic.
As part of this second major, I must complete a second, smaller project. It involves using a Raspberry Pi to make a camera attached to a servo follow movement.
Ninth Blog Post
This week has been one of the lighter weeks of the term, as Thanksgiving fell during it. My workplace gave me two days off and I took another day off, so I had time to develop the neural networks our team has thought up a bit further.
Most significantly, this week I created a convolutional neural network that handles some gesture recognition. Currently it only handles three separate gestures, but the architecture is all there for using different data sets with more categories. This will be a launching-off point for our team, as we can use OpenVINO to optimize it, which is one of the requirements our client has set for us. It can also serve as a backup in case we don't get the video recognition working for the capstone.
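For a sense of scale, the classifier's shape is roughly the sketch below (filter counts and input size are placeholders); adding more gesture categories mostly means widening the final layer.

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),  # three gestures for now
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```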
Along with this, my teammate and I have been working on a white paper covering the various topics that make up the preprocessor side. The paper is currently in its rough-draft stage, so there are quite a few grammatical errors in it. However, it covers all the fundamental topics and ideas that we will be using or attempting to use when designing the preprocessor aspects of the project. The purpose of this white paper is that when we finish our tasks on the project, or drop it because the deadline was hit, the next person who picks it up will have an easier time. Not having white papers to document which approaches worked and which didn't can lead to future designers trying the same failed tactics.