Last Blog Post – My Oregon State Technology Blog

The past term has been a pretty hectic one when it comes to project management. Although the neural network side of the capstone is basically finished, the GUI still needed to be completed, which was quite a bit of work. Alongside this, I have been working on another project which is, I would say, much more complex than the capstone. It involves using a YOLO algorithm on an image and its depth-based counterpart from a stereoscopic camera to find the corners of a box and measure depth. The most difficult parts of that project have been calibrating the cameras and finding the pixel-perfect corners of a box. I have already tried many different methods, but every one runs into some problem. Originally the idea was to just use YOLO and have the network solve for depth on a monoscopic camera. However, gathering enough data to generalize over position and box attributes would take an unreasonable amount of time, as I cannot use mutation (generating new training images by transforming existing ones).

This is something that is apparently not discussed a lot, but transformations of a 2D image warp the object in 3D space, at least according to the pinhole camera model. This is due to the oblique nature of how a 3D space is projected onto a 2D image. This is normally fine for most networks if you are simply drawing dots on corners or detecting an object; however, when it comes to measuring depth, everything must be perfect. The warping is most apparent when parts of an object are obscured, since a shift along the 2D image should bring sections of the object into view that were not visible before. On top of this, there is no good guide out there for solving for depth under large numbers of shifts and rotations, going from a 2D mutation back to 3D space. I drew a ton of diagrams to try to solve this problem and decided it was just easier to get a stereoscopic camera.
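To make the warping concrete, here is a minimal sketch of a pinhole projection. It is an illustration only and not code from my project: the focal length, principal point, and corner coordinates are made-up values, and `project` and `pixel_shift_after_translation` are hypothetical helper names. It shows that moving a box sideways in 3D shifts its near and far corners by different numbers of pixels, so sliding the whole 2D image by one uniform offset does not match any real movement of the box.

```python
def project(point, focal_px=800.0, cx=320.0, cy=240.0):
    """Pinhole projection of a 3D point (x, y, z) in metres to pixel (u, v).

    focal_px, cx and cy are assumed intrinsics, not calibrated values.
    """
    x, y, z = point
    return (focal_px * x / z + cx, focal_px * y / z + cy)


def pixel_shift_after_translation(point, tx):
    """Horizontal pixel shift caused by moving the point tx metres along x."""
    u_before, _ = project(point)
    u_after, _ = project((point[0] + tx, point[1], point[2]))
    return u_after - u_before


# Two corners of the same box: one on the near face, one on the far face.
front_corner = (0.10, 0.0, 1.0)  # 1.0 m from the camera
back_corner = (0.10, 0.0, 1.5)   # 1.5 m from the camera

# Move the whole box 5 cm to the right in 3D.
shift_front = pixel_shift_after_translation(front_corner, 0.05)
shift_back = pixel_shift_after_translation(back_corner, 0.05)

print(f"front corner moves {shift_front:.1f} px, back corner moves {shift_back:.1f} px")
```

Since the shift is `focal_px * tx / z`, the nearer corner always moves further across the image than the far one, which is exactly what a plain 2D shift of the picture cannot reproduce.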

The next idea was to use Sobel edge detection along with the depth map from a calibrated stereo image to segment objects based on their adjacency. This works some of the time; however, depth maps have a lot of artifacts next to object edges, because one camera can see a section of an object whereas that section is hidden from the other camera. Not only that, but it relies on an environment where the colors are very distinct from one another. Otherwise there will be open edges, and an adjacency check algorithm will fill the whole image instead of just sections of the box. Where I am right now is that I am just going to feed the image and depth map to the network and hope it finds a good corner, training on a very limited set of angles as opposed to all possible options. Then I can just force my client to place the box in a specific way. After that, epipolar geometry and convolution matching can be used to find the matching corner in the other image. It's simple from there, as basic geometry is all that's needed to calculate the depth to the corner of the box. Then, with depth, the dimensions of the box can be solved from the intrinsic angles of the camera and the relative angles of the corners themselves using the law of cosines.
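The last two steps can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not the project code: it assumes a rectified stereo pair (so depth is focal length times baseline over disparity), it assumes the matching corner has already been found in the other image, and all function names and numbers are hypothetical. Note that the law of cosines needs the distance along each viewing ray, not the depth along the optical axis, so the sketch converts between the two.

```python
import math


def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth along the optical axis for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def ray_through_pixel(u, v, focal_px, cx, cy):
    """Viewing ray through pixel (u, v), scaled so its z component is 1."""
    return ((u - cx) / focal_px, (v - cy) / focal_px, 1.0)


def norm(vec):
    return math.sqrt(sum(c * c for c in vec))


def angle_between(a, b):
    """Angle in radians between two rays."""
    cos_theta = sum(p * q for p, q in zip(a, b)) / (norm(a) * norm(b))
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def edge_length(pixel_a, disp_a, pixel_b, disp_b, focal_px, baseline_m, cx, cy):
    """Distance between two box corners seen at pixel_a and pixel_b."""
    ray_a = ray_through_pixel(*pixel_a, focal_px, cx, cy)
    ray_b = ray_through_pixel(*pixel_b, focal_px, cx, cy)
    # Convert axis depth Z into range along each ray (ray has z = 1).
    range_a = depth_from_disparity(focal_px, baseline_m, disp_a) * norm(ray_a)
    range_b = depth_from_disparity(focal_px, baseline_m, disp_b) * norm(ray_b)
    theta = angle_between(ray_a, ray_b)
    # Law of cosines on the triangle camera - corner A - corner B.
    return math.sqrt(range_a**2 + range_b**2 - 2 * range_a * range_b * math.cos(theta))


# Made-up example: 800 px focal length, 10 cm baseline, principal point (320, 240).
length = edge_length((200.0, 240.0), 80.0, (440.0, 240.0), 80.0,
                     focal_px=800.0, baseline_m=0.10, cx=320.0, cy=240.0)
print(f"estimated edge length: {length:.3f} m")
```

In this example both corners have a disparity of 80 px, which puts them 1 m away, and the corners come out 0.30 m apart. A one-pixel error in the disparity changes the depth by more than a centimetre at that range, which is why the corners have to be pixel perfect.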
