Dropout is a technique used to reduce over-fitting in neural networks. You should use dropout along with other techniques such as L2 regularization.
Below we have a plot of the classification error (not the loss). Observe that the test/validation error is smaller when using dropout.
As with other regularization techniques, using dropout also makes the training loss a little worse. But that is the idea: we want to trade training performance for better generalization. Remember that the more capacity you add to your model (more layers or more neurons), the more prone to over-fitting it becomes.
Below we have a plot showing both training and validation loss, with and without dropout.
During training, a random subset of the neurons on a particular layer (half of them, with the common drop probability of 0.5) is deactivated on each forward pass. This improves generalization because it forces the layer to learn the same "concept" with different neurons.
During the prediction phase, dropout is deactivated.
Deep learning models normally use dropout on the fully connected layers, but it is also possible to use dropout after the max-pooling layers, which creates a kind of image noise augmentation.
To implement this neuron deactivation, we create a mask (zeros and ones) during forward propagation. This mask is applied to the layer outputs during training and cached for later use in back-propagation. As explained before, this dropout mask is used only during training.
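The forward pass described above can be sketched in NumPy as follows. This is a minimal illustration, not a definitive implementation: the function name `dropout_forward` and the parameters `drop_prob`, `train` and `rng` are hypothetical, and the sketch assumes the "inverted dropout" variant, where the kept activations are scaled by 1/keep_prob during training so that nothing has to be rescaled at prediction time.

```python
import numpy as np

def dropout_forward(x, drop_prob=0.5, train=True, rng=None):
    """Dropout forward pass (sketch, assuming inverted dropout).

    x:         layer outputs (array-like of floats)
    drop_prob: probability of deactivating a neuron (hypothetical name)
    Returns the output and the mask to cache for back-propagation.
    """
    x = np.asarray(x, dtype=float)
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not train:
        # Prediction phase: dropout is deactivated, the input passes through.
        return x, None
    rng = np.random.default_rng() if rng is None else rng
    keep_prob = 1.0 - drop_prob
    # Mask of zeros and ones: 1 keeps a neuron, 0 deactivates it.
    # Assumption: dividing by keep_prob folds the inverted-dropout scaling
    # into the mask, so the expected value of the output matches the input.
    mask = (rng.random(x.shape) < keep_prob) / keep_prob
    out = x * mask
    return out, mask

# Training: roughly half of the entries are zeroed, the rest are scaled up.
x = np.ones((4, 6))
out_train, mask = dropout_forward(x, drop_prob=0.5, train=True,
                                  rng=np.random.default_rng(0))
# Prediction: the input is returned unchanged and no mask is produced.
out_test, _ = dropout_forward(x, train=False)
```

Folding the scaling into the mask keeps the backward pass simple, because the cached mask is all that is needed there.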
In backward propagation we are interested in the neurons that were active, which is why we need to save the mask from forward propagation. With those neurons selected, we just back-propagate dout through them. The dropout layer has no learnable parameters, only its input (X), so during back-propagation we just return "dx". In other words, dx = dout * mask, where * is the element-wise product.
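As a minimal sketch of this backward pass in NumPy: the function name `dropout_backward` and its `train` flag are hypothetical, and the mask is written by hand here purely for illustration (in practice it would be the one cached during forward propagation).

```python
import numpy as np

def dropout_backward(dout, mask, train=True):
    """Dropout backward pass (sketch).

    dout: upstream gradient, same shape as the layer output
    mask: mask cached in the forward pass (0 for deactivated neurons)
    """
    dout = np.asarray(dout, dtype=float)
    if not train:
        # Dropout is the identity at prediction time: pass the gradient through.
        return dout
    # Only neurons that were active in the forward pass receive gradient.
    dx = dout * mask
    return dx

# Hand-written mask for illustration: 1 = neuron kept, 0 = neuron dropped.
mask = np.array([[1.0, 0.0, 1.0],
                 [0.0, 1.0, 1.0]])
dout = np.array([[0.1, 0.2, 0.3],
                 [0.4, 0.5, 0.6]])
dx = dropout_backward(dout, mask)  # gradient is zero wherever mask is zero
```

If the forward pass scaled the kept activations (for example by 1/keep_prob), that scaling is already part of the cached mask, so the same multiplication applies it to the gradient as well.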
In the next chapter we will learn about the convolution layer.