Large Data Analysis

I’ve been tasked with analyzing our large dataset of call times from the Travel Time project to determine the distribution of the features in our model. This analysis will help us filter out outliers and give context to the machine learning models. For example, we could apply a long-tail distribution to our destination routes to see which routes have a low probability of occurring. We could also compare travel time and route length to the same destination under different conditions (apparatus type, time of day, urgency, etc.).

After exploring the data in Excel and graphing it, it became much easier to choose upper and lower bounds for our machine learning features. We were initially discarding any data with a route length under 0.3 miles, but realized that 5% of the routes are under 0.3 miles, so we moved the lower bound to 0.1 miles. This figure seemed valid after we cross-referenced route lengths within the city of Boulder, Colorado, against the fire stations: the city is very small, and the two largest fire stations are right next to the major highway. This also helped validate that 49% of the travel speeds are under 25 mph. Given the short distances traveled, these figures made sense.
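The filtering steps above can be sketched in a few lines. This is a minimal illustration, not the project's code (which is in C#). It assumes each call record has hypothetical `route`, `length_miles`, and `seconds` fields. It shows three things: how to measure the share of routes below a cutoff, how to apply the 0.1-mile lower bound, and how to compute speeds in mph. Rare routes are flagged using simple empirical frequencies, which is a stand-in for the long-tail idea rather than a fitted long-tail distribution. The sample values are made up.

```python
from collections import Counter

# Lower bound on route length the post settled on (originally 0.3 miles).
LOWER_BOUND_MILES = 0.1


def share_below(values, threshold):
    """Fraction of values strictly below the threshold."""
    values = list(values)
    return sum(1 for v in values if v < threshold) / len(values)


def speed_mph(length_miles, seconds):
    """Average travel speed in miles per hour."""
    return length_miles / (seconds / 3600.0)


def drop_short_routes(calls, min_miles=LOWER_BOUND_MILES):
    """Discard calls whose route is shorter than the lower bound."""
    return [c for c in calls if c["length_miles"] >= min_miles]


def rare_routes(calls, max_probability):
    """Empirical probability of each route; keep those at or below the cutoff."""
    counts = Counter(c["route"] for c in calls)
    total = sum(counts.values())
    return {route: n / total for route, n in counts.items()
            if n / total <= max_probability}


if __name__ == "__main__":
    # Hypothetical sample records for illustration only, not project data.
    calls = [
        {"route": "Station 1 -> A", "length_miles": 0.05, "seconds": 60},
        {"route": "Station 1 -> A", "length_miles": 0.2, "seconds": 90},
        {"route": "Station 1 -> B", "length_miles": 1.5, "seconds": 240},
        {"route": "Station 2 -> C", "length_miles": 3.0, "seconds": 300},
    ]
    print("share under 0.3 mi:", share_below((c["length_miles"] for c in calls), 0.3))
    kept = drop_short_routes(calls)
    speeds = [speed_mph(c["length_miles"], c["seconds"]) for c in kept]
    print("share under 25 mph:", share_below(speeds, 25.0))
    print("rare routes:", rare_routes(calls, 0.25))
```

Checking the share below a cutoff before moving a bound makes the trade-off explicit. In the post's data, 5% of routes fell under 0.3 miles, which was too much to throw away.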

We’ll be comparing our figures with OpenStreetMap (OSM), a free and open geographic database commonly used to build electronic maps and turn-by-turn navigation. The database is updated and maintained by volunteers who collect data from surveys, aerial imagery, and other public-domain geodata sources. Unfortunately, I had difficulty finding resources on how to use OSM with C# within our project framework.
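One way to compare route lengths against OSM is to read an OSM XML extract and sum segment lengths along a way. Below is a minimal Python sketch of that approach; the project itself uses C#. In the OSM XML format, `<node>` elements carry `lat`/`lon` and `<way>` elements list their nodes through `<nd ref="…"/>`. The sketch assumes every referenced node is present in the extract, and it measures straight-line (haversine) distance between consecutive nodes. The two-node sample is hypothetical, not real Boulder data.

```python
import math
import xml.etree.ElementTree as ET

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def way_lengths_miles(osm_xml):
    """Map each way id to its length, summed over consecutive node pairs."""
    root = ET.fromstring(osm_xml)
    nodes = {n.get("id"): (float(n.get("lat")), float(n.get("lon")))
             for n in root.iter("node")}
    lengths = {}
    for way in root.iter("way"):
        # Assumes every referenced node is included in the extract.
        coords = [nodes[nd.get("ref")] for nd in way.iter("nd")]
        lengths[way.get("id")] = sum(
            haversine_miles(*a, *b) for a, b in zip(coords, coords[1:]))
    return lengths


if __name__ == "__main__":
    # Hypothetical two-node way, for illustration only.
    sample = ('<osm version="0.6">'
              '<node id="1" lat="40.0" lon="-105.0"/>'
              '<node id="2" lat="40.01" lon="-105.0"/>'
              '<way id="10"><nd ref="1"/><nd ref="2"/></way>'
              '</osm>')
    print(way_lengths_miles(sample))
```

A real comparison would need a routed path across many ways. This sketch only shows how OSM's node and way structure turns into a length that could be checked against our recorded route lengths.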

