Beginning AI

image generated with Stable Diffusion

Intro

This blog will focus on AI topics, especially reinforcement learning (“RL”), as that is the focus of my current project. The project I’m working on with two other teammates is a Bitcoin trading bot. We will use RL to train our algorithm to trade Bitcoin.

How does RL fit within the world of machine learning (“ML”)? There are three main branches of machine learning and numerous sub-areas. The three main branches are: 1) Supervised Learning, 2) Unsupervised Learning, and 3) Reinforcement Learning.

Supervised Learning

Supervised Learning is where you have some authority or supervisor who teaches the algorithm. In practice, this often means you have a data set that has been labelled with the answers, as opposed to a program or person directly supervising the ML algorithm. For example, you may have 100,000 photos of cats and dogs, where each photo is labelled as either a cat or a dog. Those photos and their labels are fed into the machine learning algorithm, and it learns to distinguish the features of cats from those of dogs. After training, you would show the algorithm a picture of a cat or dog without the associated label and see if it could classify it correctly.
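To make that concrete, here is a minimal sketch of the supervised workflow in Python using scikit-learn. The “photos” and labels below are random placeholder arrays standing in for a real labelled cat/dog data set (so the accuracy number is meaningless), but the pattern is the same: train on labelled examples, then test on examples the algorithm has never seen.

```python
# Minimal supervised-learning sketch: random placeholder data stands in for
# real cat/dog photos and their labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 64 * 64))       # 1,000 "photos", each flattened to a pixel vector
y = rng.integers(0, 2, size=1000)     # labels: 0 = cat, 1 = dog

# Hold some photos back so we can test on examples the model never trained on
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)             # learn from the labelled examples
print("accuracy on unseen photos:", clf.score(X_test, y_test))
```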

Unsupervised Learning

Unsupervised Learning is where you are dealing with unlabelled data and allowing the ML algorithm to find the structure hidden within that data. For example, you might have a data set of handwritten digits that you feed to the ML algorithm. Based on the differences between the digits, it will group similar pictures together, so that when it is done it should have 10 groups (0-9), with all the 0’s in one group, all the 1’s in another, and so on.
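Here is a small sketch of that idea using k-means clustering on scikit-learn’s built-in handwritten-digit data set. The algorithm never sees the labels; it only groups images that look similar, and ideally each of the 10 clusters ends up corresponding mostly to a single digit.

```python
# Minimal unsupervised-learning sketch: cluster digit images into 10 groups
# without ever using the labels.
from sklearn.datasets import load_digits
from sklearn.cluster import KMeans

digits = load_digits()                          # 8x8 images of handwritten digits 0-9
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
clusters = kmeans.fit_predict(digits.data)      # labels are never passed in

# Each image is now assigned to one of 10 clusters
print(clusters[:20])
```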

Reinforcement Learning

Reinforcement Learning’s goal is to maximize reward by learning the optimal policy for a given situation. These types of algorithms are often applied to games, or to problems that can be framed as a kind of game with a reward for doing well. For example, if your RL algorithm is learning tic-tac-toe, it will learn that when it has two x’s in a row, placing a third x gives it a win and thus a positive reward. The reward from winning is credited back to all the moves that led to the three x’s in a row, so the entire sequence benefits from the reward and becomes more likely in the future as the algorithm updates its policy.

An RL algorithm chooses what to do at each step based on its estimate of the value of each action. For example, does it think it is more likely to win if it puts an x in the centre square or in a corner square? As more games are played, it uses the data from those games (rewards from wins and losses) to update its expected value of each move. This in turn shapes the algorithm’s policy (i.e. what action it should take in a given situation). An RL algorithm will not always pick what looks like the best option; if it did, it would fail to learn. You need to balance exploration vs exploitation: sometimes you take what appears to be a less desirable action in order to learn its true value. This is one of the central issues of RL: how much do you explore (try new things) vs exploit (pick what you think is the best move)? Playing around with parameters like the exploration rate, along with the learning rate, is an important part of fine-tuning your RL algorithm.
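To illustrate the exploration-vs-exploitation trade-off, here is a small epsilon-greedy sketch in Python. The moves and the reward function are made-up placeholders (this is not a real tic-tac-toe player), but it shows the core loop: keep a value estimate for each action, usually pick the best-looking one, and occasionally try something else so the estimates keep improving.

```python
# Epsilon-greedy sketch: value estimates per action, updated from rewards.
# The actions and reward probabilities are hypothetical placeholders.
import random

actions = ["centre", "corner", "edge"]        # hypothetical moves
values = {a: 0.0 for a in actions}            # estimated value of each move
counts = {a: 0 for a in actions}
epsilon = 0.1                                 # fraction of the time we explore

def fake_reward(action):
    # Placeholder: pretend "centre" wins a bit more often than the others.
    win_prob = {"centre": 0.6, "corner": 0.5, "edge": 0.4}[action]
    return 1.0 if random.random() < win_prob else 0.0

for _ in range(10_000):
    if random.random() < epsilon:
        a = random.choice(actions)            # explore: try a random move
    else:
        a = max(values, key=values.get)       # exploit: pick the best-looking move
    r = fake_reward(a)
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]  # incremental average of rewards

print(values)  # estimates should approach the true win probabilities
```

With epsilon = 0.1, the algorithm exploits its current best guess 90% of the time and explores a random move the other 10%, so even the moves that look worse keep getting tested and their value estimates keep improving.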

Hopefully that gave you a good background on the three major ML methodologies. We will explore RL further in subsequent posts.