Post 4: Machine Learning

This week we’re talking about the continuation of the ML Agents Breakout project. As we have hit the midway point of the project (week 5), we are starting to really get into testing. On the one hand, there is a ton of data that can be gathered in Machine Learning. On the other hand, “good” testing takes time.

In our case, we have adjusted the hyperparameters for max step size and gamma, and changed our paddle collider shape, to train different neural network brains. I’ll show some graphs for context:
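For anyone wondering what gamma actually does, here is a minimal Python sketch (not our Unity code). Gamma is the discount factor: it controls how much a future reward is worth to the agent right now. The reward values and the 50-step gap below are made up purely for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma raised to its step index."""
    total = 0.0
    # Walk backwards so each step folds in the already-discounted future.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def effective_horizon(gamma):
    """Rule-of-thumb number of steps the agent 'looks ahead': 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


# Hypothetical episode: a reward of 1.0 at step 0 and another at step 50.
rewards = [1.0] + [0.0] * 49 + [1.0]
for gamma in (0.9, 0.99):
    print(gamma, round(discounted_return(rewards, gamma), 4),
          round(effective_horizon(gamma), 1))
```

With a lower gamma the second reward barely counts, so the agent leans toward whatever pays off immediately. A higher gamma makes it weigh rewards that are many steps away.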

The above may be hard to see without zooming in, but these are different test runs with an overlay of the command window showing the test results live. The light blue and pink runs were to see how changing the paddle to a triangle shape would impact rewards. You can see that they both averaged out to around 0.5 mean reward. The gold run uses a more traditional pentagon paddle shape, like so:

The green lines above outline the paddle used for the agent trained in the gold graph. We have really been struggling with how to account for a vertical ball bounce issue, where the ball stays near a side wall and the agent learns to keep the paddle sandwiched there so it can maximize rewards. I think this paddle shape should be the best of both worlds. First, it makes a purely vertical bounce much harder to sustain, and second, it still leaves a significant area of the paddle with a flat redirection. This also allows for behavior similar to the original game, where hitting near the edge of the paddle gives the ball more of an angled deflection.
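To make the flat-versus-angled idea concrete, here is a small Python sketch of a mirror reflection, v' = v - 2(v·n)n. It assumes a perfectly elastic bounce with no spin or friction, which may not match what Unity's physics does exactly, and the 20 degree edge tilt is an assumed number, not our actual collider angle.

```python
import math


def reflect(velocity, normal):
    """Mirror a 2D velocity about a surface with the given normal."""
    vx, vy = velocity
    nx, ny = normal
    length = math.hypot(nx, ny)
    nx, ny = nx / length, ny / length  # normalise so any normal length works
    dot = vx * nx + vy * ny
    return (vx - 2 * dot * nx, vy - 2 * dot * ny)


def face_normal(tilt_degrees):
    """Normal of a paddle face tilted away from horizontal by tilt_degrees."""
    t = math.radians(tilt_degrees)
    return (math.sin(t), math.cos(t))


straight_down = (0.0, -1.0)
flat_bounce = reflect(straight_down, face_normal(0))     # flat top of the paddle
angled_bounce = reflect(straight_down, face_normal(20))  # assumed 20 degree edge
print(flat_bounce, angled_bounce)
```

A ball falling straight down onto the flat section goes straight back up, while the same ball on a tilted edge picks up sideways velocity. That sideways kick is what should break up the repeated vertical bounce near the wall.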

So the ball and the agent rewards are in, but having the paddle hit the ball only keeps the agent from losing Breakout. If you want to win Breakout, you need to break all the bricks. For that, a reward needs to be added for the agent hitting a brick, and it will need to be substantial. But how does the agent know which bricks are remaining? As a human, it is easy to see what is left and make an informed decision. As an agent, maybe not so much. That’s it for this week’s blog.
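P.S. On the brick question, one option might be to give the agent a flat list of 0s and 1s, one per brick, alongside a reward for each brick it breaks. This is purely a Python sketch of the idea and nothing from the project yet. The names, the 2 x 3 wall, and the reward value are all hypothetical.

```python
BRICK_REWARD = 1.0  # hypothetical value; would need tuning against the paddle-hit reward


def brick_observation(bricks):
    """Flatten a grid of booleans (True = brick still standing) into 0.0/1.0 floats."""
    return [1.0 if alive else 0.0 for row in bricks for alive in row]


def hit_brick(bricks, row, col):
    """Remove a brick if it is still there and return the reward for doing so."""
    if bricks[row][col]:
        bricks[row][col] = False
        return BRICK_REWARD
    return 0.0


# A tiny 2 x 3 wall for illustration.
wall = [[True, True, True], [True, True, True]]
reward = hit_brick(wall, 0, 1)
print(reward, brick_observation(wall))
```

The observation stays the same length for the whole episode, and each slot always refers to the same brick, so the network at least has a chance of learning which positions are still worth aiming for.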

Happy Coding!
