I am working on a cloud-based algorithmic trading application. My focus has been on the data acquisition and data formatting services. Ultimately, this work has less to do with investing or finance than with handling large amounts of data. I have integrated four technologies into my services: two for acquiring data and two for transforming it.
Technologies
To gather stock market data, I have used two data sources: Yahoo Finance and Alpaca. Of the two, Alpaca has more reliable recent data, making it my go-to data service. Both services are easy to use and to integrate into any system that requires stock market data. The difficulty lies more in combining them. In certain situations, Alpaca is not as reliable. Therefore, I must combine the data I obtain from both sources to produce a more reliable output. Yahoo Finance has been my least favorite technology thus far. If I could change one thing about it, I would have it return better-structured data, since what it returns is an untyped blob.
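As a rough illustration of combining the two sources, here is a minimal sketch. It assumes each source has already been loaded into a pandas DataFrame indexed by date with matching column names. The sample values and the function name merge_sources are hypothetical and are not taken from my services.

```python
import numpy as np
import pandas as pd


def merge_sources(primary: pd.DataFrame, fallback: pd.DataFrame) -> pd.DataFrame:
    """Prefer values from `primary`; fill its gaps from `fallback`."""
    # combine_first keeps primary's non-null values and takes fallback's
    # values wherever primary is missing a row or holds NaN.
    return primary.combine_first(fallback).sort_index()


# Hypothetical sample data standing in for real API responses.
alpaca = pd.DataFrame(
    {"close": [100.0, np.nan, 102.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)
yahoo = pd.DataFrame(
    {"close": [99.9, 101.0, 101.9, 103.0]},
    index=pd.to_datetime(
        ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]
    ),
)

# Alpaca is treated as the primary source, Yahoo Finance as the fallback.
combined = merge_sources(alpaca, yahoo)
```

In this sketch the primary source wins whenever both have a value, so the fallback only fills missing dates and NaN entries. Which source should take priority in a given situation is a design decision that depends on where each one is more reliable.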
While looking into other data sources, I have also used the Polygon API, the Interactive Brokers API, and the EODHD API. All of these are easy to implement as well; the difficulties mainly arise in the design process. These data sources do not offer free collection of large amounts of data. Most of them required payment after fewer than 100 requests in one month. So, in designing my program, I considered creating a database to reuse collected data, but that would have been much more costly for the overall project.
Moving past data collection, I have had to cleanse, format, and interact with the data using libraries built for data handling. I chose Pandas and NumPy. NumPy was by far my favorite library, as I am a math-oriented person and enjoy performing math operations. I hope to use NumPy in future projects that involve more rigorous calculations. Pandas was easy to use as well and simplified formatting: I needed fewer than 10 lines of code to format my data in any way I wanted.
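To give a sense of what this looks like, here is a minimal sketch of typing and cleaning untyped records with Pandas, followed by a NumPy calculation. The records, the column names, and the choice of log returns as the calculation are assumptions made for illustration, not the actual code from my services.

```python
import numpy as np
import pandas as pd

# Hypothetical untyped records, standing in for a raw API response.
raw = [
    {"date": "2024-01-03", "close": "101.0", "volume": "1200"},
    {"date": "2024-01-02", "close": "100.0", "volume": "1000"},
    {"date": "2024-01-04", "close": None, "volume": "900"},
]

# Give every column a proper type; unparseable values become NaN.
df = pd.DataFrame(raw)
df["date"] = pd.to_datetime(df["date"])
df["close"] = pd.to_numeric(df["close"], errors="coerce")
df["volume"] = pd.to_numeric(df["volume"], errors="coerce").astype("int64")

# Drop rows without a closing price, then index and order by date.
df = df.dropna(subset=["close"]).set_index("date").sort_index()

# Daily log returns with NumPy: ln(p_t / p_(t-1)).
prices = df["close"].to_numpy()
log_returns = np.diff(np.log(prices))
```

The formatting portion stays well under 10 lines, and once the prices are a NumPy array, the math is a single vectorized expression.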
Overall, I researched numerous libraries and technologies that I could use in my project. I am happy with my choices and would use the same libraries again, as they are easy to use and efficient.