Matched and Oriented
The diversity of capstone project proposals runs the gamut from web services to artificial intelligence and machine learning (AI/ML). I was fortunate to land one of my top choices: Data Mining Using a Web Crawler. I particularly like this project because it dabbles in automation and AI/ML, both of which greatly interest me. We started by meeting with the sponsors to discern the outcomes they expected from the software we are developing and to align it with their specific needs. Thus far, this part of the project has been the most educational for me, as the projects we encounter in a class setting rarely expose me to the same intricacies. In class, the goals and expected outcomes of the exercises are specific and often niche in scope, whereas working with a sponsor on the capstone has been a good litmus test of the education I have accumulated through the program. Eliciting and pinning down a sponsor's specific requirements is a skill we rarely exercise in the classroom, but it was necessary for our project to progress.
The project itself is interesting: the primary goal is to scrape data from the websites of granting institutions and mine that data to generate insights that will inform the matching process between grantors and applicants. It is striking how a process I can describe in one sentence involves so many moving parts to implement. The material I learned in courses such as web development, databases, and algorithms is coming together toward completing a piece of software dedicated to a specific real-world need.
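To give a flavor of what "mining the data to inform matching" could look like, here is a deliberately naive sketch: scoring the overlap between a scraped grant description and an applicant profile. The function names and the similarity measure (Jaccard overlap of word tokens) are my own illustration, not the project's actual matching logic.

```python
# Illustrative only: a naive keyword-overlap score between a scraped
# grant description and an applicant profile. The real project's mining
# and matching approach is more involved; names here are hypothetical.

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(text.lower().split())

def match_score(grant_description: str, applicant_profile: str) -> float:
    """Jaccard similarity between the two token sets, from 0.0 to 1.0."""
    g = tokenize(grant_description)
    a = tokenize(applicant_profile)
    union = g | a
    return len(g & a) / len(union) if union else 0.0

print(match_score("funding for rural health clinics",
                  "nonprofit rural health outreach"))
```

Even this toy version shows why the one-sentence description hides so many moving parts: real matching has to handle synonyms, weighting, and far messier text than a clean pair of strings.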
Explorations and Roadmap
Explorations can take many shapes and forms. For our project, it took the form of spiking and scrutinizing the technology stacks available to us. For example, the scraping aspect of the project alone includes technologies such as Scrapy (which works best for static content), Selenium (which drives a real browser through the WebDriver protocol to extract the content of a page), and streaming/parsing approaches that are advantageous when dealing with heavy data streams. Having the time to fully explore the advantages and disadvantages of each technology, and to decide which matches the needs of the project, was a great learning experience in itself. I felt like a kid in a toy store trying to decide which toy would be the best to buy after playing with all of them. The time we dedicated to exploring the available technologies gave me both an appreciation for the process and a realization that evaluating tools is an essential skill for success in our field.
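To make the static-content case concrete, scraping a static page boils down to fetching HTML and parsing it for structured fields, which even the Python standard library can demonstrate. The snippet of grant-listing markup below is hypothetical; a real spider in Scrapy or Selenium would fetch live pages rather than a hard-coded string.

```python
from html.parser import HTMLParser

# Hypothetical snippet of a grantor's listing page (static HTML).
SAMPLE_HTML = """
<ul class="grants">
  <li><a href="/grants/101">STEM Education Grant</a></li>
  <li><a href="/grants/102">Rural Health Grant</a></li>
</ul>
"""

class GrantLinkParser(HTMLParser):
    """Collects (href, title) pairs from anchor tags as the parser feeds."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        # Remember the href of the anchor we just entered.
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        # Pair the pending href with the anchor's visible text.
        if self._href and data.strip():
            self.links.append((self._href, data.strip()))
            self._href = None

parser = GrantLinkParser()
parser.feed(SAMPLE_HTML)
print(parser.links)
```

This is exactly the kind of page where Scrapy shines; the moment the listing is rendered by JavaScript instead, the HTML never contains these anchors, and a browser-driven tool like Selenium becomes the better fit.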
I am writing this blog update after our meeting with the sponsors this week, where we discussed and settled on the specific technologies we will use for the project. Weighing the pros and cons while paying attention to details such as the longevity of the program enabled informed decision-making that both teams appreciated. Now that we have a handle on our tools and the objectives are clearly defined, the roadmap to completion is in our hands, pointing us toward a product we can be proud of. We are primed and ready to build.