Target network update frequency (sync every 1000 frames)
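As an illustration of that sync schedule, here is a minimal PyTorch-style sketch; `policy_net` and `target_net` are hypothetical placeholder networks, not part of any specific API described here.

```python
import torch

TARGET_SYNC_EVERY = 1000  # frames between target-network syncs (matches the value above)

def maybe_sync_target(frame_idx: int, policy_net: torch.nn.Module, target_net: torch.nn.Module) -> None:
    # Copy the online network's weights into the frozen target network
    # once every TARGET_SYNC_EVERY frames.
    if frame_idx % TARGET_SYNC_EVERY == 0:
        target_net.load_state_dict(policy_net.state_dict())
```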
Performance metrics
We can use the Q value and loss as two metrics to evaluate the performance
of a DQN model.
Q value
Q value measures the expected future reward for performing an
action in a given state.
An increasing average Q value is a sign that the model is getting better at the
game (better performance); a simple way to log this metric is shown in the sketch below.
A good range for the Q value is 5 to 20, with a small and steadily increasing
trend.
It is normal for the Q value to fluctuate or decrease at the start of
training.
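The sketch below shows one way this metric might be tracked during training, assuming a PyTorch setup; `policy_net` and the batch of `states` are hypothetical placeholders.

```python
import torch

def average_q(policy_net: torch.nn.Module, states: torch.Tensor) -> float:
    # states: a batch of observations, e.g. sampled from the replay buffer.
    # The mean of the per-state maximum Q value is a common monitoring metric:
    # it should rise slowly and steadily as the agent improves.
    with torch.no_grad():
        q_values = policy_net(states)                    # shape: (batch, num_actions)
        return q_values.max(dim=1).values.mean().item()
```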
Tips for Q value
The expected Q value is affected by various hyperparameters as well as by the reward
function.
Here are some common issues with Q values and tips on how to fix them:
1. Q value too low (<1)
Alpha (learning rate) might be too low. Increase alpha to make the model
learn faster.
Gamma (discount factor) might be too low. Increase gamma to make the
model account for more future reward.
The model might not be learning at all. This could be due to poor design
or conflicting weights in the reward function.
2. Q value too high (>50)
Gamma (discount factor) might be too high. Decrease gamma to avoid
compounding future reward too much (see the worked check after this list).
Weights in the reward function might be too high. Try lowering the weights
of the factors that contribute to the reward.
3. Q value is unstable and fluctuates widely
Alpha (learning rate) might be too high. Decrease alpha to make the
model learn in a more stable manner.
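As a rough sanity check on the Q value scale, note that the discounted sum of a bounded per-step reward can never exceed r_max / (1 - gamma), so a large gamma alone can push Q values far beyond the suggested 5 to 20 range. A quick back-of-the-envelope check in plain Python (the numbers are only illustrative):

```python
def q_upper_bound(r_max: float, gamma: float) -> float:
    # Upper bound on any discounted Q value:
    # r_max * (1 + gamma + gamma^2 + ...) = r_max / (1 - gamma)
    return r_max / (1.0 - gamma)

print(q_upper_bound(1.0, 0.90))   # ~10  -> Q values stay modest
print(q_upper_bound(1.0, 0.99))   # ~100 -> Q values can grow much larger
```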
Loss
Loss measures the difference between the predicted and the actual result
(how accurate the prediction is). It is the squared error
between the target Q value and the predicted Q value; a sketch of this computation is given below.
A decreasing loss is a sign that the model is making more accurate
predictions.
A perfect model would have a loss of zero, meaning it predicts Q values
exactly, without any error.
A good range for the loss is 0 to 5, with a small and steadily
decreasing trend.
It is normal for the loss to increase at the start of training, before the Q
value stabilizes.
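For concreteness, here is a minimal sketch of how this squared-error loss is typically computed in a PyTorch DQN training step; `policy_net`, `target_net`, and the batch tensors are hypothetical placeholders, not part of any specific API described here.

```python
import torch
import torch.nn.functional as F

def dqn_loss(policy_net, target_net, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch

    # Predicted Q value: the online network's estimate for the action actually taken.
    predicted_q = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Target Q value: observed reward plus the discounted best Q value of the
    # next state, taken from the (periodically synced) target network.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        target_q = rewards + gamma * next_q * (1.0 - dones)

    # Squared error between target and predicted Q values, averaged over the batch.
    return F.mse_loss(predicted_q, target_q)
```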
Tips for Loss
Here are some common issues with loss and tips on how to fix them:
1. Negative loss
This is likely due to a bug. You can report bugs on our
Discord server.
2. Loss too high (>10)
Gamma (discount factor) might be too high. Decrease gamma to avoid
compounding future reward too much.
Weights in the reward function might be too high. Try lowering the weights
of the factors that contribute to the reward.
The model might not be learning or improving. This could be due to poor
design or conflicting weights in the reward function.
3. Loss is unstable and fluctuates widely
Alpha (learning rate) might be too high. Decrease alpha to make the
model learn in a more stable manner (see the sketch below).
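One low-effort way to apply this fix is to scale down the learning rate of the existing optimizer; a minimal sketch assuming a PyTorch Adam optimizer and a hypothetical `policy_net` (the starting value of 1e-3 is only an example):

```python
import torch

optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)

def lower_alpha(optimizer: torch.optim.Optimizer, factor: float = 0.5) -> None:
    # Scale the learning rate of every parameter group in place;
    # halving alpha is a common first step when training is unstable.
    for group in optimizer.param_groups:
        group["lr"] *= factor
```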
Examples
DQN can be trained to play many single-player games, such as Tetris,
Snake, and 2048.