Unity ML-Agents — Part III: Training

This blog post is the third in a series on getting started with the Unity ML-Agents toolkit. I am following Code Monkey’s YouTube tutorial, and these posts roughly follow that video. The ML-Agents GitHub repository also includes example projects and code to help you get started: https://github.com/Unity-Technologies/ml-agents/

For steps on how to set up ML-Agents and resource links, check out my Part I post. For steps on getting started with ML-Agents and setting up your project, check out my Part II post.

1. Simple test

Starting where we left off last time, we can now start to train our AI Agent. Let’s start by adding a Debug line in our Agent script to print out actions as they occur:

    Debug.Log(action.DiscreteActions[0]);       // for discrete (int) or
    Debug.Log(action.ContinuousActions[0]);     // for continuous (float)
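
For reference, that line goes inside your Agent's OnActionReceived override. Roughly, the context looks like this (shown here with the discrete version; the parameter name is whatever your override declares):

    // requires: using Unity.MLAgents.Actuators;
    public override void OnActionReceived(ActionBuffers action)
    {
        // Print the first value of the first discrete action branch (an int)
        Debug.Log(action.DiscreteActions[0]);

        // ...the rest of your movement/reward logic follows as usual
    }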

In the Unity editor, make sure your Agent’s Behavior Type is set to “Default”.

To start training, go to your terminal (from inside your virtual environment) and use the mlagents-learn command. You can specify a new run ID, resume a previous run with the --resume flag, or overwrite a previous run's data with the --force flag:

    $ mlagents-learn --run-id=Test1 --force
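
For example, to pick that same run back up later instead of overwriting it:

    $ mlagents-learn --run-id=Test1 --resume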

The terminal should now direct you to go back to the Unity Editor and hit “Play” to run your game. Once you do, you should see your Agent start jiggling around the screen. Learning!

2. Contain your bot

At this point, I realized why the examples include Wall objects (instead of just an Enemy object like I had in my Part II post). If you don’t contain your Agent within a specified area, it can aimlessly jiggle around forever. I made sure to add some wall elements to my scene before I continued training. 🙂
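
For the walls to actually teach the Agent anything, the Agent also needs to get punished when it runs into one. Here's a minimal sketch of that idea; the Wall component is just a placeholder for however you identify your wall objects, and it assumes the Agent has a trigger collider:

    // Called by Unity when the Agent's trigger collider overlaps another collider
    private void OnTriggerEnter(Collider other)
    {
        // "Wall" is a placeholder component attached to each wall object
        if (other.TryGetComponent<Wall>(out _))
        {
            SetReward(-1f);   // penalize hitting a wall...
            EndEpisode();     // ...and end the episode so a new attempt can begin
        }
    }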

3. Prevent possum bot

What if our Agent learns to avoid Walls and Enemies, but becomes overly risk-averse in the process and decides to never move anywhere at all?

To force our Agent into some type of action, and to ensure each episode ends rather than running forever, we can set the “Max Step” value in our Agent’s properties. In the Inspector, under the Agent’s script component, set “Max Step” to 1000; once an episode hits that many steps, it ends and a new one begins.

4. Better smarter faster (Kage Bunshin no Jutsu)

How can we make our Agent learn more, faster?

A simple way to speed up our Agent’s learning is to create more Agents. It’ll be just like in Naruto, when Naruto is able to quickly level up and gain new skills by creating a ton of shadow clones of himself that can all train simultaneously.

Make clones

First, make your entire environment into a prefab. Create a new empty object and name it something like “Environment”. Select all the objects involved in your scene (I have my Agent JimBand, my Target Microfilm, my Enemy Shark, 4 Walls, and a Ground) and move them into your Environment object, then create a prefab from it. Now that you have an Environment prefab, plop a bunch of copies into your scene so that they can all train simultaneously.

Update your script!

Important: Now that you have a bunch of duplicate environments, you will need to update your script to use local positions instead of global positions. Replace calls to transform.position with transform.localPosition, for example:

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(microfilmTransform.localPosition);
    }
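
The same change applies anywhere else your script reads or writes the transform. If your OnActionReceived moves the Agent by updating its transform directly, switch that over to localPosition too; here's a rough sketch, assuming two continuous actions and a moveSpeed field on the Agent:

    public override void OnActionReceived(ActionBuffers action)
    {
        float moveX = action.ContinuousActions[0];
        float moveZ = action.ContinuousActions[1];

        // Move relative to this Environment's origin, not the world origin
        transform.localPosition += new Vector3(moveX, 0f, moveZ) * moveSpeed * Time.deltaTime;
    }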

You’ll probably also need to adjust your camera position if you want to see all your Agents training at once.

Get smart

Run the training for a while until your Agents get smart. It’s helpful and fun to include a visual for when each Agent “wins” or “loses”; I changed the background color for each result. Here’s what that looked like in training (Agent is purple, Target is pink, Enemy is blue; note that the background color represents the result of the previous attempt).
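
In case it's useful, here's roughly how that kind of color cue can be wired up. This is just a sketch; floorMeshRenderer, winMaterial, and loseMaterial are placeholder fields you'd assign on the Agent in the Inspector:

    [SerializeField] private MeshRenderer floorMeshRenderer;
    [SerializeField] private Material winMaterial;
    [SerializeField] private Material loseMaterial;

    // Call this right before EndEpisode(), wherever you decide the win/lose result
    private void ShowResult(bool won)
    {
        floorMeshRenderer.material = won ? winMaterial : loseMaterial;
    }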

5. How to use your brain

Once your Agents are good and smart, go ahead and stop the game, and your neural network model (the “brain”) will be saved to a .onnx file. The terminal output will let you know where to find this file and what its name is:

    [INFO] Exported results/TestParameters/GetMicrofilm/GetMicrofilm-240818.onnx

To be able to use this neural network model, copy/move the .onnx file into your project’s Assets directory. I created a new folder called “NNModels”.

In the Unity Editor, temporarily disable all your duplicate Environments. You can disable an object by un-checking the box next to its name in its properties, or you can use the keyboard shortcut Alt-Shift-A, which lets you easily deactivate (or activate) multiple selected items at once.

In your original Environment, select your original Agent, and in its properties, assign your new neural network brain to it as its “Model”.

You can leave “Behavior Type” as “Default”, or you can explicitly set it to “Inference Only”.

From here, you can simply hit Play/Run and watch your Agent use its new brain to solve its mission.

6. Environment Parameters

The hyperparameters for training your Agent are specified in a configuration file that you can pass to the mlagents-learn program. To have finer control over your training, create a folder in your Assets folder named “config”, and create a new .yaml file inside it. The behavior name inside this file needs to match your Agent’s “Behavior Name” (I named the file itself to match too: GetMicrofilm.yaml).

The ML-Agents GitHub repository includes an example config .yaml you can use: https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Create-New.md#training-the-environment. Here’s what that example .yaml config looks like:

    behaviors:
      RollerBall:
        trainer_type: ppo
        hyperparameters:
          batch_size: 10
          buffer_size: 100
          learning_rate: 3.0e-4
          beta: 5.0e-4
          epsilon: 0.2
          lambd: 0.99
          num_epoch: 3
          learning_rate_schedule: linear
          beta_schedule: constant
          epsilon_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 2
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        max_steps: 500000
        time_horizon: 64
        summary_freq: 10000

The “beta_schedule” and “epsilon_schedule” parameters in this example gave me errors, so I removed those two lines from my config file for now.

Now that you have this config file set up, you can call mlagents-learn like this:

    $ mlagents-learn ./Assets/config/GetMicrofilm.yaml --run-id=TestParameters

One of my current to-do items is to get familiar with all of these parameters: what they do and how to smartly twiddle them.

7. Get more smart with more random

Our AI Agent JimBand is now trained and can complete his given mission. But, he’s not very smart. If we move the Microfilm or his enemy the Shark and send him off with his current NN brain, poor JimBand will most likely do very poorly. This is because he has only learned to find the target and avoid obstacles as they are in their current, static positions. 🙁

To help our Agent out, we can introduce randomness into our training. Let’s start the Agent and his Target at a new random position each time a new episode begins.

I also had fun experimenting with moving my Shark object around randomly, which added an extra level of challenge for JimBand. I ended up having to set a rule to keep the Shark and the Microfilm a sufficient distance apart, to avoid creating a scenario that was unsolvable. Even with all three characters’ positions randomized, the Agent was able to learn to find the Target while avoiding the Enemy, which I found pretty impressive.
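
Here's a rough sketch of what that randomization can look like in OnEpisodeBegin. The ranges, the microfilmTransform/sharkTransform references, and the minimum-distance value are all placeholders for whatever fits your scene:

    public override void OnEpisodeBegin()
    {
        // New random start for the Agent each episode (local to this Environment prefab)
        transform.localPosition = new Vector3(Random.Range(-4f, 0f), 0f, Random.Range(-4f, 4f));

        // Re-roll the Target and Enemy until they are far enough apart, so the Agent
        // is never forced to swim through the Shark to reach the Microfilm
        do
        {
            microfilmTransform.localPosition = new Vector3(Random.Range(1f, 4f), 0f, Random.Range(-4f, 4f));
            sharkTransform.localPosition = new Vector3(Random.Range(1f, 4f), 0f, Random.Range(-4f, 4f));
        }
        while (Vector3.Distance(microfilmTransform.localPosition, sharkTransform.localPosition) < 2f);
    }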

Here’s my training in action, after adding randomness (Agent is purple, Target is pink, Enemy is blue, background color represents result of previous episode):

8. Observe your training progress

In order to observe the training process in detail, you can use TensorBoard, which will graph the progress made as you train your AI. From within your virtual environment, while your training is running, run this command from a new terminal to launch TensorBoard:

    $ tensorboard --logdir results

Then navigate to localhost:6006 in your browser to view your training stats. In the TensorBoard graphs, you should see Reward increasing (as your AI gets the goal) and Episode length decreasing (as your AI gets the goal faster).

Have fun making some smart bots!

       __==`==__
     {|  o L o  |}
     ,|  '''''  |,
   /'.|=========|.'\
  / / |.. ___ ..| \ \
 (/\) |  |   |  | (/\)
      |___\  |___\

10-21-21
