Marathon Envs by Unity-Technologies

Apache License 2.0 - Created on August 23rd, 2018

Marathon Environments

A set of high-dimensional continuous control environments for use with Unity ML-Agents Toolkit.

MarathonEnvs

MarathonEnvs enables the reproduction of classic continuous-control benchmarks within Unity ML-Agents using Unity's native physics simulator, PhysX. MarathonEnvs may be useful for:

  • Video game researchers interested in applying bleeding-edge robotics research to locomotion and AI for video games.
  • Traditional academic researchers looking to leverage the strengths of Unity and ML-Agents along with the body of existing research and benchmarks provided by projects such as the DeepMind Control Suite or the OpenAI Gym MuJoCo environments.

Note: This project is the result of a contribution from Joe Booth (@Sohojo), a member of the Unity community who currently maintains the repository. As such, the contents of this repository are not officially supported by Unity Technologies.


Getting Started

* * * New Tutorial: Getting Started With MarathonEnvs * * *

The tutorial covers:

  • How to set up your development environment (Unity, MarathonEnvs + ML-Agents + TensorflowSharp)
  • How to run each agent with their pre-trained models.
  • How to retrain the hopper agent and follow training in Tensorboard.
  • How to modify the hopper reward function and train it to jump.
  • See tutorial here

Requirements

  • Unity 2018.3 (Download here).

Project Organization

  • Marathon Environments specific folders & files:
    • MLAgentsSDK\Assets\MarathonEnvs - The core project directory
    • config\marathon_envs_config.yaml - Config file for use when training with ml-agents
    • README.md - Read me for marathon-envs
  • Marathon Environments now includes ML-Agents Toolkit v0.6 (Learn more here). All other files and folders are for ML-Agents. We do not include the ML-Agents examples and documentation, to keep the repo size down.

Publications & Usage

An early version of this work was presented March 19th, 2018 at the AI Summit - Game Developer Conference 2018 - http://schedule.gdconf.com/session/beyond-bots-making-machine-learning-accessible-and-useful/856147

Active Research using ML-Agents + MarathonEnvs


Support and Contributing

Support: Post an issue if you are having problems or need help getting an XML model working.

Contributing: ML-Agents 0.6 supports the Gym interface. It would be valuable to the community to reproduce more benchmarks and create a set of sample code for various algorithms. This would be a great way for someone looking to gain experience with Reinforcement Learning. I would gladly support and/or partner. Please post an issue if you are interested. Here are some ideas:
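As a starting point, a benchmark run over the Gym interface boils down to a reset/step rollout loop. The sketch below is illustrative only: `StubMarathonEnv` is a hypothetical stand-in so the loop is runnable here, not the real ML-Agents gym wrapper.

```python
import random

class StubMarathonEnv:
    """Gym-style stand-in environment (reset/step).
    Hypothetical: the real thing would be the ML-Agents gym wrapper
    around a Unity build; this stub only makes the loop runnable."""
    def __init__(self, episode_len=10, seed=0):
        self.episode_len = episode_len
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * 4  # dummy observation vector

    def step(self, action):
        self.t += 1
        obs = [self.rng.uniform(-1.0, 1.0) for _ in range(4)]
        reward = 1.0      # placeholder alive bonus
        done = self.t >= self.episode_len
        return obs, reward, done, {}

def rollout(env, action_size=4):
    """Run one episode with zero actions; return the episode return."""
    env.reset()
    total, done = 0.0, False
    while not done:
        action = [0.0] * action_size  # a trained policy would act here
        _, r, done, _ = env.step(action)
        total += r
    return total
```

Swapping the stub for a real environment and the zero action for a policy's output is the shape most benchmark scripts take.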


Included Environments

Humanoid

DeepMindHumanoid
  • Set-up: Complex (DeepMind) Humanoid agent.
  • Goal: The agent must move its body toward the goal as quickly as possible without falling.
  • Agents: The environment contains 16 independent agents linked to a single brain.
  • Agent Reward Function:
    • Reference OpenAI.Roboschool and / or DeepMind
      • -joints at limit penalty
      • -effort penalty (ignores hip_y and knee)
      • +velocity
      • -height penalty if below 1.2m
    • Inspired by Deliberate Practice (currently, only does legs)
      • +facing upright bonus for shoulders, waist, pelvis
      • +facing target bonus for shoulders, waist, pelvis
      • -non-straight thigh penalty
      • +leg phase bonus (for height of knees)
      • +0.01 times body direction alignment with goal direction.
      • -0.01 times head velocity difference from body velocity.
  • Agent Terminate Function:
    • TerminateOnNonFootHitTerrain - Agent terminates when a body part other than foot collides with the terrain.
  • Brains: One brain with the following observation/action space.
    • Vector Observation space: (Continuous) 88 variables
    • Vector Action space: (Continuous) Size of 21 corresponding to target rotations applicable to the joints.
    • Visual Observations: None.
  • Reset Parameters: None.
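The reward terms above can be sketched as a single function. This is a hedged illustration of how such terms typically combine; the weights below are placeholders, not the values used in this repository.

```python
def humanoid_reward(velocity, efforts, joints_at_limit, height,
                    upright_bonus, facing_bonus):
    """Illustrative combination of the humanoid reward terms listed
    above. Weights (0.1, 0.2, 1.0) are placeholders for exposition."""
    reward = velocity                              # +velocity toward goal
    reward -= 0.1 * sum(abs(e) for e in efforts)   # -effort penalty
    reward -= 0.2 * joints_at_limit                # -joints-at-limit penalty
    if height < 1.2:                               # -height penalty below 1.2 m
        reward -= 1.0
    reward += upright_bonus + facing_bonus         # +upright / +facing bonuses
    return reward
```

The pattern is the same across the environments below: a velocity term drives progress, and the penalties shape the gait.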

Hopper

DeepMindHopper
  • Set-up: DeepMind Hopper agent.
  • Goal: The agent must move its body toward the goal as quickly as possible without falling.
  • Agents: The environment contains 16 independent agents linked to a single brain.
  • Agent Reward Function:
    • Reference OpenAI.Roboschool and / or DeepMind
      • -effort penalty
      • +velocity
      • +uprightBonus
      • -height penalty if below 0.65m (OpenAI) or 1.1m (DeepMind)
  • Agent Terminate Function:
    • DeepMindHopper: TerminateOnNonFootHitTerrain - Agent terminates when a body part other than foot collides with the terrain.
    • OpenAIHopper
      • TerminateOnNonFootHitTerrain
      • Terminate if height < .3m
      • Terminate if head tilt > 0.4
  • Brains: One brain with the following observation/action space.
    • Vector Observation space: (Continuous) 31 variables
    • Vector Action space: (Continuous) 4 corresponding to target rotations applicable to the joints.
    • Visual Observations: None.
  • Reset Parameters: None.
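The OpenAIHopper termination conditions above translate directly into a predicate. A minimal sketch (function and parameter names are mine, not the repository's):

```python
def openai_hopper_done(non_foot_contact, height, head_tilt):
    """Episode-termination test mirroring the OpenAIHopper conditions:
    non-foot terrain contact, height below 0.3 m, or head tilt over 0.4."""
    if non_foot_contact:      # TerminateOnNonFootHitTerrain
        return True
    if height < 0.3:          # fell below the height threshold
        return True
    if abs(head_tilt) > 0.4:  # tipped over too far
        return True
    return False
```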

Walker

DeepMindWalker
  • Set-up: DeepMind Walker agent.
  • Goal: The agent must move its body toward the goal as quickly as possible without falling.
  • Agents: The environment contains 16 independent agents linked to a single brain.
  • Agent Reward Function:
    • Reference OpenAI.Roboschool and / or DeepMind
      • -effort penalty
      • +velocity
      • +uprightBonus
      • -height penalty if below 0.65m (OpenAI) or 1.1m (DeepMind)
  • Agent Terminate Function:
    • TerminateOnNonFootHitTerrain - Agent terminates when a body part other than foot collides with the terrain.
  • Brains: One brain with the following observation/action space.
    • Vector Observation space: (Continuous) 41 variables
    • Vector Action space: (Continuous) Size of 6 corresponding to target rotations applicable to the joints.
    • Visual Observations: None.
  • Reset Parameters: None.

Ant

OpenAIAnt
  • Set-up: OpenAI Ant agent.
  • Goal: The agent must move its body toward the goal as quickly as possible without falling.
  • Agents: The environment contains 16 independent agents linked to a single brain.
  • Agent Reward Function:
    • Reference OpenAI.Roboschool and / or DeepMind
      • -joints at limit penalty
      • -effort penalty
      • +velocity
  • Agent Terminate Function:
    • Terminate if head body > 0.2
  • Brains: One brain with the following observation/action space.
    • Vector Observation space: (Continuous) 53 variables
    • Vector Action space: (Continuous) Size of 8 corresponding to target rotations applicable to the joints.
    • Visual Observations: None.
  • Reset Parameters: None.

Details

Key Files / Folders

  • MarathonEnvs - parent folder
    • Scripts/MarathonAgent.cs - Base Agent class for Marathon implementations
    • Scripts/MarathonSpawner.cs - Class for creating a Unity game object from an XML file
    • Scripts/MarathonJoint.cs - Model for mapping MuJoCo joints to Unity
    • Scripts/MarathonSensor.cs - Model for mapping MuJoCo sensors to Unity
    • Scripts/MarathonHelper.cs - Helper functions for MarathonSpawner.cs
    • Scripts/HandleOverlap.cs - helper script for detecting overlapping Marathon elements.
    • Scripts/ProceduralCapsule.cs - Creates a Unity capsule which matches MuJoCo capsule
    • Scripts/SendOnCollisionTrigger.cs - class for sending collisions to MarathonAgent.cs
    • Scripts/SensorBehavior.cs - behavior class for sensors
    • Scripts/SmoothFollow.cs - camera script
    • Environments - sample environments
      • DeepMindReferenceXml - xml model files used in DeepMind research source
      • DeepMindHopper - Folder for reproducing DeepMindHopper
      • OpenAIAnt - Folder for reproducing OpenAIAnt
      • etc
  • config
    • marathon_envs_config.yaml - trainer-config file. The hyperparameters used when training from python.
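To get a feel for what MarathonSpawner.cs walks when it builds Unity game objects, here is a small Python sketch that parses a MuJoCo-style (MJCF) snippet and lists its joints. The XML fragment is a made-up minimal example, not a file from this repository.

```python
import xml.etree.ElementTree as ET

# Minimal MuJoCo-style snippet (illustrative, not from the repo) showing
# the nested body/joint structure the spawner maps to Unity objects.
MJCF = """
<mujoco model="hopper">
  <worldbody>
    <body name="torso">
      <joint name="rooty" type="hinge"/>
      <body name="thigh">
        <joint name="hip" type="hinge"/>
      </body>
    </body>
  </worldbody>
</mujoco>
"""

def list_joints(mjcf_text):
    """Return (name, type) for every joint, in document order."""
    root = ET.fromstring(mjcf_text)
    return [(j.get("name"), j.get("type")) for j in root.iter("joint")]
```

The real spawner does far more (geoms, densities, motors, sensors), but the traversal idea is the same.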

Tuning params / Magic numbers

  • xxNamexx\Prefab\xxNamexx -> MarathonSpawner.Force2D = set to True when implementing a 2D model (hopper, walker)

  • xxNamexx\Prefab\xxNamexx -> MarathonSpawner.DefaultDensity:

    • 1000 = default (same as MuJoCo)
    • Note: may be overridden within a .xml file
  • xxNamexx\Prefab\xxNamexx -> MarathonSpawner.MotorScale = Magic number for tuning (scalar applied to all motors)

    • 1 = default
    • 1.5 used by DeepMindHopper, DeepMindWalker
  • xxNamexx\Prefab\xxNamexx -> xxAgentScript.MaxStep / DecisionFrequency:

    • 5000,5: OpenAIAnt, DeepMindHumanoid
    • 4000,4: DeepMindHopper, DeepMindWalker
    • Note: all params taken from OpenAI.Gym
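On my reading of the parameters above, DecisionFrequency N means the agent requests a new action every N physics steps, so MaxStep and DecisionFrequency together fix the number of policy decisions per episode:

```python
def decisions_per_episode(max_step, decision_frequency):
    """Number of policy decisions in one episode, assuming the agent
    acts once every `decision_frequency` physics steps (my reading of
    the MaxStep / DecisionFrequency parameters above)."""
    return max_step // decision_frequency

# The pairings listed above both give 1000 decisions per episode:
#   OpenAIAnt / DeepMindHumanoid:   5000 steps, every 5
#   DeepMindHopper / DeepMindWalker: 4000 steps, every 4
```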

Important:

  • This is not a complete implementation of MuJoCo; it is focused on doing just enough to get the locomotion environments working in Unity. See Scripts/MarathonSpawner.cs for which MuJoCo commands are ignored or partially implemented.
  • PhysX makes many tradeoffs in terms of accuracy when compared with MuJoCo. It may not be the best choice for your research project.
  • Marathon environments run at 300-500 physics simulations per second. This is significantly higher than Unity's default setting of 50 physics simulations per second.
  • Currently, Marathon does not properly simulate how MuJoCo handles joint observations; as such, it may be difficult to do transfer learning (from simulation to real-world robots).
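In Unity terms, the simulation rate maps to the Fixed Timestep setting (step length is the inverse of the rate). A quick check of the numbers quoted above:

```python
def fixed_timestep(simulations_per_second):
    """Physics step length (seconds) for a given simulation rate;
    rates taken from the note above."""
    return 1.0 / simulations_per_second

# Unity default: 50 Hz  -> 0.02 s per physics step
# Marathon envs: 300-500 Hz -> roughly 0.0033 to 0.002 s per step
```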
