Pooling Layer

Introduction

The pooling layer is used to reduce the spatial dimensions, but not the depth, of a convolutional neural network. This is what you gain:

1. With less spatial information, computation is faster

2. Less spatial information also means fewer parameters in the layers that follow, so less chance of over-fitting

3. You get some translation invariance

Some projects don't use pooling, especially when they want to "learn" an object's specific position, for example when learning how to play Atari games.

On the diagram below we show the most common type of pooling, the max-pooling layer, which slides a window like a normal convolution and takes the largest value in the window as the output.

The most important parameters to play with:

• Input: H1 x W1 x Depth_In x N

• Stride (S): Scalar that controls how many pixels the window slides at each step.

• K: Kernel size

Regarding its output, H2 x W2 x Depth_Out x N:

$$
\begin{aligned}
W_2 &= (W_1 - K)/S + 1 \\
H_2 &= (H_1 - K)/S + 1 \\
Depth_{Out} &= Depth_{In}
\end{aligned}
$$
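As a quick check of the output-size formula, here is a minimal sketch in Python. The helper name `pool_output_size` is hypothetical (it is not from any library), and it assumes no padding and a window that tiles the input evenly.

```python
def pool_output_size(size_in, kernel, stride):
    """Spatial output size of a pooling layer: (size_in - K) / S + 1.

    Assumes no padding; raises if the window does not tile the input evenly.
    """
    if (size_in - kernel) % stride != 0:
        raise ValueError("window does not tile the input evenly")
    return (size_in - kernel) // stride + 1


# A 4x4 input with a 2x2 window and stride 2 gives a 2x2 output.
h2 = pool_output_size(4, kernel=2, stride=2)
w2 = pool_output_size(4, kernel=2, stride=2)
print(h2, w2)  # 2 2
```

The depth and the batch size N pass through unchanged, so only H and W need this calculation.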

It's also worth pointing out that there are no learnable parameters in the pooling layer, so its backpropagation is simpler.

Forward Propagation

The window movement mechanism in pooling layers is the same as in a convolution layer. The only change is that we select the largest value in the window.
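The forward pass described above could be sketched naively with NumPy as follows. This is an illustration, not a reference implementation: the function name `max_pool_forward` is hypothetical, the tensor layout (H1, W1, Depth, N) follows the parameter list above, and no padding is assumed.

```python
import numpy as np


def max_pool_forward(x, k, stride):
    """Naive max pooling over an input of shape (H1, W1, Depth, N)."""
    h1, w1, d, n = x.shape
    h2 = (h1 - k) // stride + 1
    w2 = (w1 - k) // stride + 1
    out = np.zeros((h2, w2, d, n), dtype=x.dtype)
    for i in range(h2):
        for j in range(w2):
            r, c = i * stride, j * stride
            # Window of shape (k, k, Depth, N) at the current position.
            window = x[r:r + k, c:c + k, :, :]
            # Largest value per channel and per sample.
            out[i, j] = window.max(axis=(0, 1))
    return out


# 4x4 single-channel, single-sample input holding 0..15 row by row.
x = np.arange(16, dtype=float).reshape(4, 4, 1, 1)
y = max_pool_forward(x, k=2, stride=2)
print(y[:, :, 0, 0])  # rows [5, 7] and [13, 15]
```

Each output cell keeps only the maximum of its 2x2 window, so the 4x4 input shrinks to 2x2 while depth and batch size stay the same.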

Backward Propagation

From the backpropagation chapter we learned that the max node simply acts as a router, passing the incoming gradient "dout" to the input that had the largest value.

In other words, the gradient with respect to the input of the max-pooling layer is a tensor of zeros, except at the positions that were selected during forward propagation.
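A naive sketch of this routing with NumPy might look like the code below. The function name `max_pool_backward` is hypothetical, and the layout (H, W, Depth, N) again follows the parameter list above. One assumption to note: if several values in a window tie for the maximum, this version sends the gradient to all of them.

```python
import numpy as np


def max_pool_backward(dout, x, k, stride):
    """Route dout of shape (H2, W2, Depth, N) back to the max positions of x."""
    h2, w2, d, n = dout.shape
    dx = np.zeros(x.shape, dtype=dout.dtype)
    for i in range(h2):
        for j in range(w2):
            r, c = i * stride, j * stride
            window = x[r:r + k, c:c + k, :, :]
            # True where the window holds its maximum (per channel and sample).
            mask = window == window.max(axis=(0, 1), keepdims=True)
            # Only the selected positions receive the upstream gradient.
            dx[r:r + k, c:c + k, :, :] += mask * dout[i, j]
    return dx


x = np.arange(16, dtype=float).reshape(4, 4, 1, 1)
dout = np.ones((2, 2, 1, 1))
dx = max_pool_backward(dout, x, k=2, stride=2)
print(dx[:, :, 0, 0])  # ones at (1,1), (1,3), (3,1), (3,3), zeros elsewhere
```

The mask is recomputed from the input here; an implementation could instead cache the argmax positions during the forward pass.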

Improving performance

In a future chapter we will learn a technique that improves convolution performance. Until then we will stick with the naive implementation.

Next Chapter

In the next chapter we will learn about the Batch Norm layer.
