We will start this chapter by explaining how to implement the ReLU layer in Python/Matlab.
In simple words, the ReLU layer applies the function max(0,x) to every element of an input tensor, without changing its spatial or depth information.
From the picture above, observe that all positive elements remain unchanged while the negative ones become zero. The spatial size and depth also stay the same.
In terms of neural networks, it is just another type of activation function, with the following features:
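As a minimal sketch of this behaviour, assuming NumPy and a made-up input tensor `x` of shape (2, 2, 3) (the values are purely illustrative and are not taken from the picture):

```python
import numpy as np

# Hypothetical input: height 2, width 2, depth 3. Values are illustrative only.
x = np.array([[[ 1.0, -2.0,  3.0],
               [-0.5,  0.0,  4.0]],
              [[-7.0,  8.0, -9.0],
               [ 2.5, -1.5,  6.0]]])

# ReLU works element by element: each value becomes max(0, value).
y = np.maximum(0, x)

# The output has exactly the same shape as the input.
print(x.shape, y.shape)
```

Positive entries of `x` appear unchanged in `y`, negative entries become zero, and the shape is untouched.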
Easy to compute (forward/backward propagation)
Suffers much less from vanishing gradients in deep models
A drawback is that ReLU units can irreversibly die (always output zero) if you use a large learning rate
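The last two points can be illustrated with a rough sketch. It deliberately ignores the weights and only multiplies the local derivatives of the activation along a hypothetical chain of 20 layers, so it is a simplification rather than a full network. The names `sigmoid_grad`, `relu_grad`, `depth` and `z_dead` are assumptions made for this example.

```python
import numpy as np

def sigmoid_grad(z):
    """Derivative of the sigmoid, used here only for comparison."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 where z > 0, otherwise 0."""
    return np.where(np.asarray(z) > 0, 1.0, 0.0)

depth = 20  # hypothetical number of stacked layers

# Best case for the sigmoid: its derivative peaks at 0.25 (at z = 0),
# so the product over many layers shrinks quickly.
sigmoid_chain = sigmoid_grad(0.0) ** depth

# An active ReLU unit (z > 0) has derivative 1, so the product stays 1.
relu_chain = relu_grad(1.0) ** depth

print(sigmoid_chain, relu_chain)

# A "dead" unit: the pre-activation is negative for every input, so the
# gradient is zero everywhere and the weights feeding it stop updating.
z_dead = np.array([-3.0, -1.2, -0.4])
print(relu_grad(z_dead))
```

One way a unit can end up in the dead state is a single very large update (for example from a big learning rate) that pushes its weights so far that the pre-activation is negative for every input.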
Change all negative elements to zero while retaining the value of the positive elements. No spatial/depth information is changed.
Basically we're just applying the max(0,x) function to every input element. From the back-propagation chapter, we can see that the gradient dx is zero where the input element is negative, and equal to the incoming (upstream) gradient where the input element is positive.
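A minimal sketch of both passes, assuming NumPy. The function names `relu_forward` and `relu_backward`, the `cache` variable, and `dout` (the gradient arriving from the next layer) are names chosen for this example. The gradient at exactly x = 0 is taken to be zero, which is a common convention.

```python
import numpy as np

def relu_forward(x):
    """Apply max(0, x) element-wise and keep x for the backward pass."""
    out = np.maximum(0, x)
    cache = x  # the backward pass only needs to know where x was positive
    return out, cache

def relu_backward(dout, cache):
    """Pass the upstream gradient where x > 0 and block it elsewhere."""
    x = cache
    dx = dout * (x > 0)
    return dx

# Illustrative values only.
x = np.array([[-1.0, 2.0],
              [ 3.0, -4.0]])
dout = np.array([[10.0, 20.0],
                 [30.0, 40.0]])

out, cache = relu_forward(x)
dx = relu_backward(dout, cache)
print(out)
print(dx)
```

The layer has no parameters to learn, so the backward pass only has to produce `dx`, which has the same shape as `x`.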
In the next chapter we will learn about Dropout layers.