The Tech Stack

Why did you and your team choose the technologies you did?

There are plenty of technologies to choose from for machine learning and artificial intelligence. With the explosion of AI, and specifically OpenAI's ChatGPT, it seems like a new technology appears every day that can enhance and optimize our lives in ways large and small. Taking what is complex and turning it into a simple interface turns work that would take hours or days into minutes or even seconds.

So where do you go to find great tech for the specific task of building a crypto trading bot trained with multi-objective reinforcement learning? There are a few options at this point, and from what we gathered we just had to select a few and blend them together into our own tech stack for this project. We started with Python, the go-to language for problems like this, and with the popularity of Jupyter it felt like that duo would let us train and test models most efficiently. We also brought in Pandas, NumPy, and Scikit-learn, each a popular Python library in its own right, so we could accurately define a dataset and structure it in a meaningful way. Preprocessing is a large part of the project and can't be overlooked: if we aren't working with quality data, our results may not be accurate at all. For the data itself, we found that the ease of use and accessibility of yfinance would work well for our purposes, and it has done just that so far.
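To make that concrete, here is a minimal sketch of what that data pull and cleanup might look like with yfinance and Pandas. The ticker and date range are just placeholders for illustration:

import yfinance as yf

# Pull daily BTC-USD candles (ticker and date range are placeholders)
data = yf.download("BTC-USD", start="2020-01-01", end="2023-01-01")

# Basic preprocessing: drop rows with missing values and keep only
# the columns we care about for training
data = data.dropna()[["Open", "High", "Low", "Close", "Volume"]]

print(data.head())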

Figure 1. Reinforcement learning.

Now, once we gather the data and clean it, what's next? Training, of course. For that, following our sponsor's and instructor's guidance, we opted for TensorFlow, with OpenAI's Gym as the training environment. With so much buzz around the company and its products, it seemed like a great chance to learn more and get involved at such an early stage. Together, these technologies can all come together to build an amazing product, and we are excited to present our findings!
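To give a feel for what "using Gym as the environment" means, here is the standard agent-environment loop against one of Gym's built-in environments. Note this uses the classic pre-0.26 Gym API; on newer versions, reset() and step() return extra values:

import gym

# Standard Gym interaction loop (classic pre-0.26 API)
env = gym.make("CartPole-v1")
obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()  # random policy, just to show the loop
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()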

How will your project use them?

When we build a machine learning model, our main goal is to create a model that can make accurate predictions on new, unseen data. To achieve this, we need a model that not only fits the data it has seen but also generalizes well to new data. The dataset we have is usually divided into two parts: the training set and the testing set.

Figure 2.

Training set: This is the part of the dataset that we use to "teach" or "train" our machine learning model. During the training process, the model learns patterns and relationships between the input features (e.g., characteristics of a flower) and the output (e.g., the flower's species). Another example: the input could be a student's study hours and the output their grade. The model adjusts its internal parameters to minimize the error between its predictions and the actual output values in the training set.

Testing set: This is the part of the dataset that we use to evaluate the performance of our trained model. We do not use the testing set during the training process, so it represents new, unseen data for the model. By comparing the model’s predictions on the testing set with the actual output values, we can estimate how well the model will perform on real-world data that it has never seen before.
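To make the split concrete, here is a minimal sketch using Scikit-learn and the study-hours example from above. The numbers are made up purely for illustration:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Toy data for the study-hours example (values are made up)
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
grades = np.array([52, 58, 65, 70, 74, 81, 86, 90])

# Hold out 25% of the data as the unseen testing set
X_train, X_test, y_train, y_test = train_test_split(
    hours, grades, test_size=0.25, random_state=42)

# fit() adjusts the model's internal parameters using only the training set
model = LinearRegression()
model.fit(X_train, y_train)

# score() returns R^2 on the held-out testing set, estimating how well
# the model generalizes to data it has never seen
print(model.score(X_test, y_test))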

Figure 2 shows how each of the technologies discussed previously fits into the project. Python is the base language across the entire project, with the modeling taking place in Jupyter notebooks. Notebooks help with efficient memory usage: we can load and preprocess the data once and keep iterating on the algorithm, rather than re-running large datasets from scratch every time we make an adjustment.

The reason we use NumPy arrays instead of built-in Python lists is that NumPy arrays are more efficient for numerical computations, which are at the core of machine learning algorithms. NumPy also provides a wide range of built-in functions for performing mathematical operations on arrays; these functions are highly optimized and easy to use, making it convenient to work with arrays. Scikit-learn is a great library for ML-related tools: when we create a LinearRegression model object, we're instantiating an object from the LinearRegression class provided by Scikit-learn. This class has several methods, including fit(), which trains the model on the provided input features and target values.

OpenAI's Gym makes setting up models super easy, like five-lines-of-code easy. It's really my only exposure to training models so far, but I can't see it getting much better than this. Here is an example snippet:

import gym

# Our own project modules: a custom Gym environment wrapping the
# market data, and a multi-objective DQN agent
from my_custom_environment import MyCustomEnv
from mo_dqn import MODQN

# Create the custom environment
env = MyCustomEnv()

# Initialize the multi-objective DQN agent on that environment
agent = MODQN(env)

So, as you can see, we can quickly build on existing MORL agents and get straight to testing.
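For context, MyCustomEnv and MODQN in the snippet above are our own project modules, not off-the-shelf libraries. Here is a rough, simplified sketch of what a custom Gym trading environment might look like; the observation, action, and reward logic below are placeholder assumptions, not our final design:

import gym
import numpy as np
from gym import spaces

class MyCustomEnv(gym.Env):
    """Simplified sketch of a trading environment (placeholder logic)."""

    def __init__(self, prices=None):
        super().__init__()
        # Placeholder price series; in the real project this would come
        # from the preprocessed yfinance data
        self.prices = prices if prices is not None else np.random.rand(1000)
        self.step_idx = 0
        # Actions: 0 = hold, 1 = buy, 2 = sell
        self.action_space = spaces.Discrete(3)
        # Observation: just the current price, for simplicity
        self.observation_space = spaces.Box(
            low=0.0, high=np.inf, shape=(1,), dtype=np.float32)

    def reset(self):
        self.step_idx = 0
        return np.array([self.prices[self.step_idx]], dtype=np.float32)

    def step(self, action):
        # Placeholder reward: next-step price change, signed by the action
        change = self.prices[self.step_idx + 1] - self.prices[self.step_idx]
        reward = change if action == 1 else -change if action == 2 else 0.0
        self.step_idx += 1
        done = self.step_idx >= len(self.prices) - 1
        obs = np.array([self.prices[self.step_idx]], dtype=np.float32)
        return obs, reward, done, {}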

What are their pros and cons?

Well, initially the con is that there are paid versions of what we are trying to do. Depending on your financial situation, you could use, for example, QuantConnect and build this out in a matter of minutes, including connecting to a brokerage of your choice and executing the trades. I guess that could be considered a pro as well, but for the enjoyment of learning I will stick with calling it a con, because I want to learn how to do this rather than just use someone else's product. The pro of using all this tech comes with the failures and iterations that will surely have to be made, because they mean we are learning and building a tool suite of our own for future projects. It is one thing to say you made money with a Bitcoin bot; it is another to say you developed the bot itself. Better still would be to sell the bot, whether through a subscription or whatever else. At the end of the day, the people who made the most in the gold rush were the ones selling shovels and pans, not the gold miners themselves, and I think that is the overall pro of the tools we selected and the learning that comes along with them.

Want to see more? Check out my latest post here!