In this project we train a ball with Q-learning to stay on a platform.
In reinforcement learning, agents learn to perform actions in an environment in order to maximize a reward. The key difference between reinforcement learning and supervised or unsupervised learning is the presence of two things: trial-and-error search and a delayed reward.
Q-learning is a reinforcement learning algorithm that seeks to find the best action to take given the current state.
Q-learning is based on a Q-function, Q(s, a), which estimates the return obtained by taking action a in state s.
This means that the maximum return from state s and action a is the sum of the immediate reward r and the (discounted) maximum return achievable from the next state s'.
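The relationship described above can be written as the Bellman optimality equation for the Q-function (the discount factor γ is an assumption; the text does not state it explicitly):

```latex
Q^{*}(s, a) = r + \gamma \max_{a'} Q^{*}(s', a')
```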
Deep Q-learning makes use of neural networks. The Deep Q-Network (DQN) algorithm was developed by DeepMind in 2015. It enhances Q-learning, a classic reinforcement learning algorithm, with deep neural networks and a technique called experience replay.
At each time step of data collection, the transitions are added to a circular buffer called the replay buffer. Then during training, instead of using just the latest transition to compute the loss and its gradient, we compute them using a mini-batch of transitions sampled from the replay buffer.
This is called experience replay. It makes the network updates more stable and has the following benefits:
Better data efficiency, by making use of each transition in many updates.
Better stability, by using uncorrelated transitions in a batch.
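A minimal sketch of such a replay buffer in Python (the class and method names here are illustrative, not taken from the project):

```python
import random
from collections import deque

class ReplayBuffer:
    """Circular buffer of transitions; the oldest entries are dropped when full."""

    def __init__(self, capacity):
        # deque with maxlen gives us the circular-buffer behavior for free.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        # Store one transition from a data-collection step.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniformly sample a mini-batch of (largely) uncorrelated transitions
        # for computing the loss and its gradient.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

During training, each environment step calls `add(...)`, and each update step draws a mini-batch with `sample(batch_size)` instead of using only the latest transition.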
As inputs we use the platform's X rotation, the ball's Z position, and the ball's X velocity.
The outputs are Q-values estimating how good tilting the platform to the left or to the right is.
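To make the input/output shapes concrete, here is a sketch of a forward pass with those three observations in and two Q-values out. The hidden-layer size, random weights, and action mapping are all assumptions for illustration; the project's actual network architecture is not described in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes only: 3 observations in, 2 Q-values out.
W1 = rng.standard_normal((3, 16)) * 0.1   # input -> hidden
b1 = np.zeros(16)
W2 = rng.standard_normal((16, 2)) * 0.1   # hidden -> Q-values
b2 = np.zeros(2)

def q_values(obs):
    """obs = [platform_x_rotation, ball_z_position, ball_x_velocity]."""
    h = np.maximum(0.0, obs @ W1 + b1)    # ReLU hidden layer
    return h @ W2 + b2                    # one Q-value per action

obs = np.array([0.1, 0.0, -0.3])
q = q_values(obs)
action = int(np.argmax(q))  # 0 = tilt left, 1 = tilt right (assumed mapping)
```

The agent picks the action with the highest Q-value; during training this greedy choice is usually mixed with random exploration.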