Well, we’re five weeks into the quarter now, and I can safely say that working on this capstone project has been both challenging and rewarding so far. The project I was assigned to is a SaaS application for Fire Department 911 Risk Analysis. Essentially, we’re creating a tool that analyzes EMS response times within a jurisdiction and visualizes them so that departments can make informed decisions about their resource allocation. To do this, we need to store information in a database that can be used to trace pathways and calculate response times from a fire station to any location within its jurisdiction. The application itself is really quite brilliant, and I’m excited to be working on it! However, there have been a few roadblocks when it comes to actually storing the necessary data.
The problem we’ve been facing is that, in order to accurately trace routes throughout a jurisdiction, a large amount of data needs to be stored… a truly massive amount of data, actually. On top of that, because of the way the software processes the data, all of the data points need to live within a single entity, and that entity needs to be stored in a NoSQL database. The thing is, almost all NoSQL databases have a document size limit, including our database of choice, MongoDB. We were working with a 16MB document size limit, and the results we were calculating were consistently ten times that size.
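For a concrete picture of the wall we kept hitting, here’s a minimal sketch of what happens when a single document crosses that limit. I’m assuming PyMongo here, and the database name and payload are placeholders I made up for illustration, not our actual schema:

```python
# A plain insert_one() fails once the encoded document exceeds
# MongoDB's 16MB BSON document size limit.
from pymongo import MongoClient
from pymongo.errors import DocumentTooLarge

client = MongoClient("mongodb://localhost:27017")
collection = client["risk_analysis"]["routes"]  # hypothetical names

# A ~17MB document, just past the 16MB cap.
oversized = {"jurisdiction": "demo", "blob": "x" * (17 * 1024 * 1024)}

try:
    collection.insert_one(oversized)
except DocumentTooLarge as err:
    print(f"Too large for a single document: {err}")
```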
Enter GridFS!
Simply put, GridFS is a MongoDB specification for storing and retrieving files that exceed the 16MB document size limit. Instead of storing a file in a single document, GridFS divides the file into 255kB chunks and stores each chunk as a separate document. After a little research, I was pleasantly surprised at how easy implementing GridFS into our current code base would be. Some method prototypes will need tweaks that are a bit involved to implement, but overall, GridFS really deals with the hard part for us!
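To give a sense of how little ceremony is involved, here’s a rough sketch of an upload, again assuming PyMongo, with the same made-up database name and a placeholder filename:

```python
# Storing an oversized result through GridFS via PyMongo.
import gridfs
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["risk_analysis"]  # hypothetical database name

fs = gridfs.GridFS(db)  # backed by the fs.files / fs.chunks collections

# GridFS takes bytes (or a file-like object) and splits them into
# 255kB chunk documents behind the scenes.
payload = b"x" * (20 * 1024 * 1024)  # 20MB demo; our real results run larger
file_id = fs.put(payload, filename="jurisdiction-route-data")

print(file_id)  # ObjectId of the fs.files metadata document
```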
While the technology itself is a huge boon to our group’s progress, the thing about GridFS that truly impressed me was the documentation. As computer scientists, we’ve all come across a lot of dense, jargon-y documentation. It’s important that we know how to utilize that documentation, but sometimes we spend more time learning how to learn from documentation than we do actually learning the technology it’s documenting. This is not the case with GridFS. The documentation was extremely concise, to the point, and easy to understand. This was a huge relief, as I had very little experience with MongoDB coming into this project, and I now feel that I have the tools to complete this implementation quickly and accurately.
I have very few criticisms of GridFS. If I had to list one, it would be that it processes data very differently from a normal MongoDB insert operation. While this is to be expected (it is, after all, inserting data in a very different way), it does make migrating to GridFS slightly complicated. Instead of referencing a document by an ID number, our code looks files up in a GridFS bucket by a name string, meaning we have to establish a unique name for each file manually. It would be nice if there were a way to automatically generate a unique name for a file, and to access and store that name when the file is created. It also means that reading information back from the database works very differently from a standard MongoDB document lookup, requiring fairly extensive changes to the functions that access and process this data, as the sketch below illustrates. In a truly modular application, this should be fairly easy to deal with, but I have yet to experience the true scope of what this change will require.
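Here’s what I mean about the access pattern diverging, again as a PyMongo sketch with the same placeholder names as above:

```python
# Reading data back through GridFS vs. a standard document lookup.
import gridfs
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["risk_analysis"]
fs = gridfs.GridFS(db)

# Standard MongoDB: one query returns the whole document.
# doc = db.routes.find_one({"_id": some_object_id})

# GridFS: locate the file (we key on filename, hence the need for
# unique names), then stream its chunks back out as bytes.
grid_out = fs.get_last_version(filename="jurisdiction-route-data")
data = grid_out.read()  # reassembled from the 255kB chunk documents
```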
Overall, my use of GridFS has been incredibly enjoyable so far! I look forward to finishing this implementation and truly experiencing the scope of what this technology can accomplish.