The deep Q-network (DQN) is a deep reinforcement learning algorithm developed by DeepMind in 2013.

DQN uses a deep convolutional neural network to approximate the Q-value of each action in a given state.
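
As a rough illustration, such a Q-network can be sketched in PyTorch as follows. The framework, the 84x84 four-frame input, and all names here are assumptions following the Nature DQN architecture, not necessarily the exact setup used for the results on this page:

    import torch
    from torch import nn

    class QNetwork(nn.Module):
        """CNN that maps a stack of game frames to one Q-value per action."""

        def __init__(self, n_actions: int, in_channels: int = 4):
            super().__init__()
            # Convolutional trunk roughly following the 2015 Nature DQN paper.
            self.net = nn.Sequential(
                nn.Conv2d(in_channels, 32, kernel_size=8, stride=4),
                nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2),
                nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1),
                nn.ReLU(),
                nn.Flatten(),
                nn.Linear(64 * 7 * 7, 512),  # assumes 84x84 input frames
                nn.ReLU(),
                nn.Linear(512, n_actions),   # one Q-value per action
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x)

    # Acting greedily: pick the action with the highest predicted Q-value.
    # q_net = QNetwork(n_actions=4)
    # action = q_net(torch.zeros(1, 4, 84, 84)).argmax(dim=1).item()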

Hyperparameters

Several hyperparameters can be tuned to get better results with DQN, including the learning rate, the discount factor (gamma), the epsilon-greedy exploration schedule, the replay buffer size, the batch size, and the target network update frequency. A sketch of typical starting values is shown below.
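
The values below are common starting points from the DQN literature; they are illustrative only, not the exact settings used for the training runs shown later on this page:

    # Illustrative DQN hyperparameters (typical starting points only).
    hyperparameters = {
        "learning_rate": 1e-4,          # optimizer step size
        "gamma": 0.99,                  # discount factor for future rewards
        "replay_buffer_size": 100_000,  # transitions kept for experience replay
        "batch_size": 32,               # transitions sampled per update
        "epsilon_start": 1.0,           # initial exploration rate
        "epsilon_end": 0.05,            # final exploration rate
        "epsilon_decay_frames": 1_000_000,  # frames over which epsilon anneals
        "target_update_frames": 10_000,     # how often the target network syncs
    }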

Performance metrics

We can use the average Q value and the training loss as two metrics to evaluate the performance of a DQN model:

Q value

  • An increasing average Q value is a sign that the model is getting better at the game (better performance).
  • However, a Q value that is too high (>50) can signal a poor choice of reward function design.
  • A good range for the Q value is 5 to 20, with a small and steady increasing trend (one common way to compute this metric is sketched after this list).
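
The exact definition behind a "Q value" curve is not spelled out here; a common choice is the mean of the best (maximum) Q-value over a batch of recent states, as in this hypothetical PyTorch sketch:

    import torch

    @torch.no_grad()
    def average_q_value(q_net: torch.nn.Module, states: torch.Tensor) -> float:
        """Mean of the max-over-actions Q-value for a batch of states."""
        q_values = q_net(states)  # shape: (batch, n_actions)
        return q_values.max(dim=1).values.mean().item()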

Loss

  • A decreasing loss is a sign that the model is becoming more accurate at its predictions.
  • A perfect model would have a loss of zero, meaning its predicted Q values match the TD targets (reward plus discounted next-state value) exactly.
  • A good range for the loss value is 0 to 5, with a small and steady decreasing trend (a sketch of how this loss is typically computed follows this list).
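
For reference, here is a minimal sketch of the standard DQN training loss, assuming PyTorch; the function and variable names are illustrative. The loss measures how far the predicted Q-value of the taken action is from the bootstrapped TD target:

    import torch
    import torch.nn.functional as F

    def dqn_loss(q_net, target_net, batch, gamma: float = 0.99) -> torch.Tensor:
        states, actions, rewards, next_states, dones = batch

        # Q-value predicted for the action that was actually taken.
        q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

        # TD target: reward plus discounted value of the best next action,
        # estimated with the frozen target network (zero if the episode ended).
        with torch.no_grad():
            next_q = target_net(next_states).max(dim=1).values
            q_target = rewards + gamma * (1.0 - dones) * next_q

        # Huber (smooth L1) loss is the usual choice in DQN.
        return F.smooth_l1_loss(q_pred, q_target)

Both metrics can then be written to TensorBoard during training, for example with torch.utils.tensorboard.SummaryWriter and its add_scalar method.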

Examples

DQN can be trained to play many single-player games, for example Tetris, Snake, and 2048.

This is a screenshot of TensorBoard for training DQN to play 2048 over 100M frames:

Screenshot showing TensorBoard for training DQN to play 2048

Observations on key metrics:

  • The Q value is stable at around 15 to 16, with a small and steady increasing trend.
  • The loss value is stable at around 0.1 to 0.2, with a small and steady decreasing trend.

This is a screenshot of TensorBoard for training DQN to play AI Simulator: Robot over 13M frames:

Screenshot of TensorBoard for training DQN to play AI Simulator: Robot

Observations on key metrics:

  • The Q value is stable at around 4, with a small and steady increasing trend.
  • The loss value is stable at around 0.3, with a small and steady decreasing trend.

Further reading

DQN paper

Interactive demos