Unity ML-Agents — Part II: Getting Started

This blog post is the second in a series on getting started with the ML-Agents toolkit for Unity. I am following Code Monkey’s YouTube tutorial, and these posts roughly follow that video.

For steps on how to set up ML-Agents and resource links, check out my Part I post.

The general idea

Using ML-Agents in your Unity project, you will create an “Agent” that will follow this pattern:

  1. Observation
  2. Decision
  3. Action
  4. Reward

(Repeat)

The rewards can be either positive or negative (penalties) and can be weighted however you choose.

The ML-Agents GitHub repository includes example projects to help get you started: https://github.com/Unity-Technologies/ml-agents/

Following Code Monkey’s YouTube tutorial, we will create a game with an Agent object and a “goal”/”target” object. We can also add “enemy” objects that the agent should avoid. When the Agent triggers the “target” it will receive a positive reward, and when it triggers an “enemy” it will be penalized.

1. Create your “Target” game object

Open up your Unity project (set up with the required packages, like we did in Part I). Create a game object to act as your “target” / “goal”. I added a sphere to my scene and named it “Microfilm”. Add both a “Rigidbody” component and a “Box Collider” component (listed under “Physics”) to your Target game object, and in the collider settings, check “Is Trigger”.

2. Create your “Enemy” game object(s)

Now create a game object to act as your “enemy” / thing you want the Agent to avoid. I added a cube to my scene and named it “Shark”. Like we did with the Target object, add both a “Rigidbody” component and a “Box Collider” component to your Enemy game object, and check “Is Trigger” on the collider.

3. Create your “Agent” game object

Now create a game object to be your Agent. For this I added another cube to my scene and named him “JimBand”. Like before, add both a “Rigidbody” component and a “Box Collider” component to your Agent game object, and check “Is Trigger” on the collider. We’ll come back to our Agent Jim Band in a moment after we get some other stuff set up…

4. Make a script for your Agent

We need some code to tell Agent Jim Band what to do, so create a new C# script in your project. I’ll call my script “JimBandAgent.cs”.

Inherit from the Agent class

Instead of inheriting from the MonoBehaviour class, we need our JimBandAgent class to inherit from the Agent class:

    public class JimBandAgent : Agent
    {
        // ...
    }

Include ML-Agents

Our JimBandAgent class also needs access to the ML-Agents toolkit, so at the top of your script include:

    using Unity.MLAgents;

To follow along with Code Monkey’s tutorial, I also needed to directly include the Actuators and Sensors namespaces:

    using Unity.MLAgents.Actuators;
    using Unity.MLAgents.Sensors;

Override Agent methods

We’ll need to override some of the built-in Agent methods to be able to control Agent Jim Band exactly how we want. Include the `CollectObservations` and `OnActionReceived` methods in your script, making sure to use the “override” modifier:

    public override void CollectObservations(VectorSensor sensor)
    {
        // We'll add our observations (inputs) here in step 6.
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // We'll add our actions here in step 7.
    }

We’ll come back to these in a bit.

5. Back to your Agent game object…

Link the script

Link your new Agent script to your Agent game object by adding a script component and selecting your script.

Set Behavior Parameters

Now we’ll set the Agent object’s Behavior Parameters. In the Agent object’s Inspector, there should now be a “Behavior Parameters” component. Give the behavior a name, such as “GetMicrofilm”.

Under “Vector Action” or “Actions”, there are some options for how our actions will be represented. There are two space types to choose from: “Discrete” and “Continuous”. “Discrete” actions are represented as integers, and “Continuous” actions as floating-point values.

For either continuous or discrete, you can set the number of actions. For continuous actions, this field might be labeled “Continuous Actions” or “Space Size”, and for discrete actions it might be labeled “Discrete Branches” or “Branches Size”. This will be the number of available actions (and likewise the size of the Action Buffer array holding those actions).

For discrete actions, you can also set a “Branch Size” for each individual branch. This is the number of options for each branch (action). For example, you could choose to have 2 actions represented by 2 discrete branches, with one action of size 2 and the other of size 3. The first action/branch might represent “Accelerate” and “Brake”, and the second action/branch represents “Left”, “Right”, and “Forward” (example taken from Code Monkey tutorial).
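
We won’t use discrete actions in this demo, but for reference, here is a rough sketch of how that two-branch example could be read back inside `OnActionReceived` (the branch meanings are just the example above, not something we build here):

    // Hypothetical sketch for the two-branch discrete example above (not used in this demo).
    public override void OnActionReceived(ActionBuffers actions)
    {
        int drive = actions.DiscreteActions[0]; // branch 0 (size 2): 0 = Accelerate, 1 = Brake
        int steer = actions.DiscreteActions[1]; // branch 1 (size 3): 0 = Left, 1 = Right, 2 = Forward
        // ...use these integer choices to drive your game logic...
    }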

For this demo, we’ll select Continuous Actions of size 2, representing the x and z axes.

Add a Decision Requester

Add a Decision Requester component to your Agent game object, which is listed under “Components” > “ML Agents”. This will request a decision at regular intervals, which will allow the Agent to then take actions.

At this point we can run a test training session. Since I am going to include info on training in my next post, I am going to skip this for now, but check out the Code Monkey video tutorial for more info [17:21].

6. Observations (inputs)

Back to your Agent script…

In your Agent script, add a reference to the Target’s position and a reference to the Enemy’s position:

    public Transform microfilmTransform;
    public Transform sharkTransform;

Make sure to link the Target object and Enemy object to your script’s Transform reference variables. To do this, go into the Unity Editor, find the script component on your Agent object in the Inspector, and drag the correct objects onto each variable.

Now add the Agent’s and Target’s positions as observations (inputs) in your `CollectObservations` method:

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.position);  // pass in agent's current position
        sensor.AddObservation(microfilmTransform.position); // pass in target's position
    }

Since our input will be 2 positions (that of JimBand and that of his Microfilm target), each represented by 3 values (x, y, z), we will have 6 input values to observe. Next we have to add these to our Agent’s Behavior Parameters.

And back to your Agent object…

In your Agent game object’s Behavior Parameters, add the correct “Space Size” for “Vector Observation”. We have 6 input values to observe, so our Vector Observation Space Size is 6. The “Stacked Vectors” parameter in this same section sets how many recent observations are stacked together and fed to the Agent for each decision, which gives your AI a short memory. Cool!

7. Actions

In the script

Now we’ll fill in the `OnActionReceived` method. This is where you’ll add actions. Agent JimBand will be moving around searching for the Microfilm, so we’ll set a speed he can move at and move him to his new position using the x and z values from the Action Buffers:

    public override void OnActionReceived(ActionBuffers actions)
    {
        float moveSpeed = 2f;
        float moveX = actions.ContinuousActions[0];
        float moveZ = actions.ContinuousActions[1];
        transform.position += new Vector3(moveX, 0, moveZ) * Time.deltaTime * moveSpeed;
    }

8. Rewards and penalties

Still in the script…

Let’s add a method to handle trigger events. Here’s what we want our trigger events to do:

  • If Agent triggers Microfilm, give positive reward and reset game.
  • If Agent triggers Shark, give penalty and reset game.

In our trigger handling method, we can handle rewards with the built-in method `AddReward`, which adds to the Agent’s cumulative reward for the episode, or the built-in method `SetReward`, which overwrites it with a specific value. Here’s what my method to handle trigger events looks like:

    private void OnTriggerEnter(Collider other)
    {
        if (other.TryGetComponent(out Microfilm microfilm))
        {
            // Jim reached the Microfilm: reward him and end the episode.
            SetReward(1f);
            EndEpisode();
        }
        if (other.TryGetComponent(out Shark shark))
        {
            // Jim hit the Shark: penalize him and end the episode.
            SetReward(-1f);
            EndEpisode();
        }
    }
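
Note that `other.TryGetComponent(out Microfilm microfilm)` assumes your Target and Enemy game objects each have a small identifying component attached. If you haven’t created those yet, two empty scripts are enough. Here is a minimal sketch, assuming the class names Microfilm and Shark (each class goes in its own file, attached to the matching game object):

    using UnityEngine;

    // Microfilm.cs: attach this to the "Microfilm" (Target) game object.
    public class Microfilm : MonoBehaviour { }

    // Shark.cs: attach this to the "Shark" (Enemy) game object.
    public class Shark : MonoBehaviour { }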

We also need to override the `OnEpisodeBegin` method to reset the state of the game and move Jim back to his starting position:

    public override void OnEpisodeBegin()
    {
        transform.position = Vector3.zero;
    }
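
`OnEpisodeBegin` runs at the start of every episode (including right after each `EndEpisode()` call), so it’s also the place to reset anything else in your scene. For example, if you later want each episode to start with the Microfilm in a slightly different spot, which helps the Agent learn to chase it rather than memorize one path, you could extend the method like this (just a sketch; the spawn range and height are made-up values for my scene):

    public override void OnEpisodeBegin()
    {
        // Reset Agent Jim Band to his starting position.
        transform.position = Vector3.zero;

        // (Optional) Move the Microfilm to a random nearby spot each episode.
        // The range and height here are arbitrary; adjust them for your scene.
        microfilmTransform.position = new Vector3(Random.Range(-4f, 4f), 0.5f, Random.Range(-4f, 4f));
    }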

9. Test it out

In order to test out our code before training our Agent, we can override the method `Heuristic`. This will allow us to control the actions passed to the `OnActionReceived` method. In your Agent script include:

    public override void Heuristic(in ActionBuffers actionsOut)
    {
        ActionSegment<float> continuousActions = actionsOut.ContinuousActions;
        continuousActions[0] = Input.GetAxisRaw("Horizontal");
        continuousActions[1] = Input.GetAxisRaw("Vertical");
    }

To use your Heuristic override method for testing, in your Agent object’s Behavior Parameters, set “Behavior Type” to “Heuristic Only” (“Default” will also work if there is no ML model in use).

Now when you run your game, your input (up/down/left/right) will control Agent JimBand. Test whether your inputs and triggers are working correctly. When you drive Jim into either the Shark or the Microfilm, he should be reset to his starting position. Use `Debug.Log` to print any debugging output you need to Unity’s Console window.
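
For example, here is a hypothetical log line added to the trigger handler from step 8 (the message text is made up):

    if (other.TryGetComponent(out Microfilm microfilm))
    {
        Debug.Log("Jim found the Microfilm!"); // appears in Unity's Console window
        SetReward(1f);
        EndEpisode();
    }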

In the next segment I’ll go over training our ML Agent, so he can drive himself into the Shark and Microfilm, and hopefully into the Microfilm more than into the Shark, all on his own. 🙂

    |\____/|
    | @  @ |
>-oo| '''' |oo-<
    |______|
    |_/  \_|
    ^^    ^^

10-13-21

