We start this chapter by explaining how to implement the ReLU layer in Python and Matlab.

In simple words, the ReLU layer applies the function $f(x)=\max(0,x)$ to every element of an input tensor, without changing its spatial or depth information.

From the picture above, observe that all positive elements remain unchanged while the negative ones become zero. The spatial dimensions and depth also stay the same.

In neural network terms, ReLU is just another type of activation function, with the following features:

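Easy to compute (forward/backward propagation)

Suffers much less from vanishing gradients in deep models

A drawback is that ReLU units can irreversibly die if you use a large learning rate: a unit that outputs zero for every input also receives zero gradient, so it stops updating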
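To make the last point concrete, here is a small sketch of a "dead" unit, assuming NumPy. The single neuron, its weights `w`, and the bias `b = -50.0` are made up for illustration; the bias stands in for what one oversized update could do.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))    # 100 hypothetical inputs with 3 features each
w = np.array([0.1, -0.2, 0.3])   # illustrative weights
b = -50.0                        # bias pushed far negative (assumed, e.g. by a huge update)

z = X @ w + b                    # pre-activations: all negative for these inputs
a = np.maximum(0, z)             # ReLU output: all zeros

dout = np.ones_like(a)           # pretend upstream gradient
dz = dout * (z > 0)              # gradient through ReLU: zero wherever z <= 0
dw = X.T @ dz                    # gradient for the weights
db = dz.sum()                    # gradient for the bias

print(a.max(), dw, db)
```

Because both `dw` and `db` are zero, gradient descent cannot move the unit back into the region where it is active on these inputs.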

Forward propagation

Change all negative elements to zero while retaining the value of the positive elements. No spatial/depth information is changed.

Python forward propagation
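A minimal sketch of the forward pass, assuming NumPy. The function name `relu_forward` and the convention of returning the input as a `cache` for the backward pass are assumptions made for this example.

```python
import numpy as np

def relu_forward(x):
    """Apply max(0, x) element-wise; the output has the same shape as x."""
    out = np.maximum(0, x)   # negatives become zero, positives pass through
    cache = x                # keep the input for the backward pass
    return out, cache

# Small example: a 2x2 input keeps its shape
x = np.array([[-1.5, 2.0],
              [0.0, -3.0]])
out, cache = relu_forward(x)
print(out)
```

`np.maximum` works on arrays of any shape, so the same function can be applied to an input with extra batch, spatial, or depth dimensions without reshaping.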

Matlab forward propagation

Backward propagation

Basically, we are just applying the $\max(0,x)$ function to every element of the input $X=[x_1,x_2,x_3]$. From the back-propagation chapter, the gradient $dx_n$ is zero if the element $x_n$ is negative, and equal to $dout_n$ if the element is positive.
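This rule can be sketched in a few lines, assuming NumPy. The name `relu_backward` and the `cache` argument (the input saved during the forward pass) are assumptions made for this example. The gradient at exactly $x_n=0$ is set to zero here, which is a common convention rather than something the rule above specifies.

```python
import numpy as np

def relu_backward(dout, cache):
    """Pass the upstream gradient through where the input was positive."""
    x = cache
    dx = dout * (x > 0)   # the mask is 1 where x > 0 and 0 elsewhere
    return dx

# Example with X = [x_1, x_2, x_3]
x = np.array([-2.0, 0.5, 3.0])
dout = np.array([10.0, 20.0, 30.0])
dx = relu_backward(dout, x)
print(dx)   # the first entry is zeroed, the other two equal dout
```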