Ball Balancer With Deep Q Network Unity by rkoramtin - 3


In this project we train a ball with Q-learning to stay on the platform.

Created on August 16th, 2020


Reinforcement learning:

In reinforcement learning, an agent learns to perform actions in an environment in order to maximize a reward. The key difference between reinforcement learning and supervised or unsupervised learning is the presence of two things:
• An environment
• An agent
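
The interaction between the two can be sketched as a simple loop: the agent picks an action, the environment returns a new state and a reward, and this repeats until the episode ends. The sketch below is only an illustration under stated assumptions; `ToyPlatformEnv`, its reward values, and the `run_episode` helper are hypothetical stand-ins and not the project's Unity environment.

```python
import random


class ToyPlatformEnv:
    """Hypothetical 1-D stand-in for the platform (not the Unity scene).

    The ball sits at an integer position and falls off once it moves
    further than `limit` from the centre.
    """

    def __init__(self, limit=3):
        self.limit = limit
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # Action 0 moves the ball left, action 1 moves it right.
        self.position += -1 if action == 0 else 1
        done = abs(self.position) > self.limit
        # Assumed reward scheme: small bonus for staying on, penalty for falling.
        reward = -1.0 if done else 0.1
        return self.position, reward, done


def random_policy(state, rng):
    """An untrained agent: ignores the state and acts at random."""
    return rng.randrange(2)


def run_episode(env, policy, rng, max_steps=50):
    """Run one episode and return the total reward collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state, rng)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

Learning then amounts to replacing `random_policy` with a policy that uses past rewards to choose better actions.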


Q-learning is a reinforcement learning algorithm that seeks to find the best action to take given the current state. Q-Learning is based on a Q-function.
Q(s, a) = r + γ max_a' Q(s', a')
This means that the maximum return from state "s" and action "a" is the sum of the immediate reward r and the maximum return obtainable from the next state "s'", discounted by a factor γ.
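
As a small worked example, the relationship above can be turned into the classic tabular Q-learning update. This is a minimal sketch, not the project's code: the learning rate `alpha`, the discount factor `gamma`, and the state names are assumptions introduced for illustration.

```python
from collections import defaultdict


def q_update(q_table, state, action, reward, next_state, n_actions,
             alpha=0.5, gamma=0.9, done=False):
    """Move Q(state, action) towards reward + gamma * max_a' Q(next_state, a')."""
    # A terminal next state contributes no future reward.
    best_next = 0.0 if done else max(
        q_table[(next_state, a)] for a in range(n_actions)
    )
    target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (target - q_table[(state, action)])
    return q_table[(state, action)]


# Unseen (state, action) pairs default to a Q-value of 0.0.
q = defaultdict(float)
q[("s1", 0)] = 1.0
q[("s1", 1)] = 2.0

# Target is 1.0 + 0.9 * 2.0 = 2.8, so Q("s0", 1) moves halfway from 0.0 to 2.8.
new_value = q_update(q, "s0", 1, reward=1.0, next_state="s1", n_actions=2)
```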

Deep Q-Learning:

Deep Q-learning uses a neural network to approximate the Q-function. The Deep Q-Network (DQN) algorithm was developed by DeepMind in 2015. It enhances Q-learning, a classic reinforcement learning algorithm, with deep neural networks and a technique called experience replay.

Experience Replay:

At each time step of data collection, the transitions are added to a circular buffer called the replay buffer. Then, during training, instead of using just the latest transition to compute the loss and its gradient, we compute them using a mini-batch of transitions sampled from the replay buffer. This technique, called experience replay, makes the network updates more stable and has the following benefits:
• Better data efficiency, because each transition is used in many updates.
• Better stability, because the transitions in a batch are uncorrelated.
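
A replay buffer of this kind can be sketched in a few lines. The class below is an illustrative assumption rather than the project's implementation: the names `ReplayBuffer` and `Transition`, and the choice of a `deque` with a fixed `maxlen` as the circular buffer, are not taken from the source.

```python
import random
from collections import deque, namedtuple

# One step of experience: (s, a, r, s', done).
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-size circular buffer of transitions."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest item once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks up the temporal
        # correlation between consecutive transitions.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Because sampling is uniform over the whole buffer, a transition can be drawn in many different mini-batches before it is eventually overwritten.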

Our Network:

The inputs are the platform's X rotation, the ball's Z position, and the ball's X velocity.
The outputs are two Q-values (quality values) that estimate how good it is to move the platform to the left and to the right.
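
To make the shape of such a network concrete, here is a minimal sketch of a forward pass with three inputs and two outputs, followed by epsilon-greedy action selection. Everything beyond the input and output sizes is an assumption: the single hidden layer of 8 ReLU units, the random weight initialisation, the epsilon-greedy rule, and all function names are hypothetical and not taken from the project.

```python
import random


def make_layer(n_in, n_out, rng):
    """Create a dense layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, layer, relu):
    """Apply one dense layer, optionally followed by a ReLU."""
    weights, biases = layer
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(max(0.0, z) if relu else z)
    return outputs


def q_values(state, hidden_layer, output_layer):
    """Map a 3-value state to 2 Q-values (index 0 = left, 1 = right)."""
    hidden = dense(state, hidden_layer, relu=True)
    return dense(hidden, output_layer, relu=False)


def choose_action(state, hidden_layer, output_layer, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(2)
    q = q_values(state, hidden_layer, output_layer)
    return q.index(max(q))


rng = random.Random(0)
hidden = make_layer(3, 8, rng)   # assumed hidden size
output = make_layer(8, 2, rng)

# Made-up values for platform X rotation, ball Z position, ball X velocity.
state = [0.1, -0.3, 0.05]
q = q_values(state, hidden, output)
action = choose_action(state, hidden, output, epsilon=0.0, rng=rng)
```

With `epsilon=0.0` the agent always takes the action with the higher Q-value; a larger epsilon trades some of that for exploration.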

Training Process:

[Figure: training process]
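
One common way to compute the loss for a mini-batch in deep Q-learning is sketched below. It is a generic illustration and not necessarily the exact procedure used in this project: the discount factor `gamma`, the mean-squared-error loss, and the hypothetical `q_fn` callable (which returns a list of Q-values for a state) are all assumptions.

```python
def td_targets(batch, q_fn, gamma=0.99):
    """Compute r + gamma * max_a' Q(s', a') for each transition in the batch."""
    targets = []
    for state, action, reward, next_state, done in batch:
        if done:
            # No future reward after a terminal transition.
            targets.append(reward)
        else:
            targets.append(reward + gamma * max(q_fn(next_state)))
    return targets


def mean_squared_td_error(batch, q_fn, gamma=0.99):
    """Average squared gap between the predicted Q(s, a) and its target."""
    targets = td_targets(batch, q_fn, gamma)
    errors = [
        (q_fn(state)[action] - target) ** 2
        for (state, action, _, _, _), target in zip(batch, targets)
    ]
    return sum(errors) / len(errors)
```

In a full training loop, this loss would be minimised by gradient descent on the network weights, with each batch drawn from the replay buffer.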
