Convolution Layer

Introduction

This chapter explains how to implement the convolution layer in Python and MATLAB.

In simple terms, the convolution layer applies the convolution operator to every image in the input tensor, and also transforms the input depth to match the number of filters. Below we explain its parameters and signals:

  1. N: Batch size (number of images in the 4D tensor)

  2. F: Number of filters in the convolution layer

  3. kW/kH: Kernel width/height (normally we use square kernels, so kW=kH)

  4. H/W: Image height/width (normally H=W)

  5. H'/W': Convolved image height/width (remains the same as the input if proper padding is used; see the sketch after this list)

  6. Stride: Number of pixels that the convolution sliding window moves at each step.

  7. Padding: Zeros added to the border of the image to keep the input and output sizes the same.

  8. Depth: Volume input depth (e.g. if the input is an RGB image, the depth will be 3)

  9. Output depth: Volume output depth (same as F)
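
These parameters are tied together by the usual output-size rule. A minimal sketch (the function name is illustrative, assuming stride S and the same padding P on both sides):

```python
def conv_output_size(in_size, kernel_size, stride=1, pad=0):
    # Standard rule: floor((InputSize - KernelSize + 2P) / S) + 1
    return (in_size - kernel_size + 2 * pad) // stride + 1

# Example: a 5x5 input with a 3x3 kernel, stride 1 and padding 1 keeps the spatial size
print(conv_output_size(5, 3, stride=1, pad=1))  # 5
# Without padding the output shrinks
print(conv_output_size(5, 3, stride=1, pad=0))  # 3
```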

Forward propagation

On the forward propagation, remember that we're going to "convolve" each input depth with a different filter, and each filter will look for something different in the image.

Here, observe that all neurons (flash-lights) from layer 1 share the same set of weights; other filters will look for different patterns in the image. Basically, we can consider the previous "convn_vanilla" function from the Convolution chapter and apply it for each depth of the input and output.

Matlab Forward propagation

Python Forward propagation

The only point to observe here is that, due to the way multidimensional arrays are represented in Python, our tensors will have a different order.
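
A minimal NumPy sketch of this naive forward pass, assuming the [N x Depth x H x W] ordering described in the implementation notes below (the function name and signature are illustrative, not the book's listing; like most deep-learning code it slides the filter without flipping it):

```python
import numpy as np

def conv_forward_naive(x, w, b, stride=1, pad=0):
    """Naive forward pass. x: [N, C, H, W], w: [F, C, kH, kW], b: [F]."""
    N, C, H, W = x.shape
    F, _, kH, kW = w.shape
    H_out = (H - kH + 2 * pad) // stride + 1
    W_out = (W - kW + 2 * pad) // stride + 1
    # Zero-pad only the spatial dimensions
    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')
    out = np.zeros((N, F, H_out, W_out))
    for n in range(N):                  # each image of the batch
        for f in range(F):              # each filter
            for i in range(H_out):      # each output row
                for j in range(W_out):  # each output column
                    patch = x_pad[n, :, i*stride:i*stride+kH, j*stride:j*stride+kW]
                    # Sliding dot product between the patch and the filter, plus the bias
                    out[n, f, i, j] = np.sum(patch * w[f]) + b[f]
    return out
```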

Back-propagation

In order to derive the convolution layer back-propagation, it's easier to think about the 1D convolution; the results will be the same for 2D.

So, doing a 1D convolution between a signal $X=[x_0,x_1,x_2,x_3,x_4]$ and $W=[w_0,w_1,w_2]$, and without padding, we will have $Y=[y_0,y_1,y_2]$, where $Y = X \ast \text{flip}(W)$. Here flip can be considered as a 180-degree rotation.
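
As a quick sanity check of this definition (an illustrative snippet, not from the original listing; np.convolve flips its second argument internally, which is exactly the flip above):

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])   # x0..x4
w = np.array([0.5, -1., 2.])         # w0..w2

# 'valid' mode keeps only the positions where the flipped kernel fully overlaps X,
# giving y_i = x_i*w2 + x_{i+1}*w1 + x_{i+2}*w0
y = np.convolve(x, w, mode='valid')
print(y)                             # -> [1.5, 3.0, 4.5]
```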

Now we convert all the "valid cases" to a computation graph; observe that we're also adding the bias, because it is used in the convolution layer.

Observe that the graphs are basically the same as in the fully connected layer; the only difference is that we have shared weights.

Now changing to the back-propagation:

If you follow the computation graphs backward, as was presented in the Backpropagation chapter, we will have the following formulas for $\frac{\partial L}{\partial X}$, which describe how the loss changes with the input X:

$$\frac{\partial L}{\partial x_0}=w_2 \cdot dout_{y_0}\\
\frac{\partial L}{\partial x_1}=w_1 \cdot dout_{y_0}+w_2 \cdot dout_{y_1}\\
\frac{\partial L}{\partial x_2}=w_0 \cdot dout_{y_0}+w_1 \cdot dout_{y_1}+w_2 \cdot dout_{y_2}\\
\frac{\partial L}{\partial x_3}=w_0 \cdot dout_{y_1}+w_1 \cdot dout_{y_2}\\
\frac{\partial L}{\partial x_4}=w_0 \cdot dout_{y_2}$$

Now consider a few things:

  1. dX must have the same size as X, so we need padding.

  2. dout must have the same size as Y, which in this case is 3 (it is the gradient coming from the next layer).

  3. To save programming effort we want to calculate the gradient as a convolution.

  4. In the dX gradient all elements are multiplied by W, so we're probably convolving W and dout.

Following the output size rule for the 1D convolution, $\text{outputSize}=(\text{InputSize}-\text{KernelSize}+2P)+1$: our desired size is 5 (the same size as X), our input dout has 3 elements, and we're going to convolve it with the W matrix, which also has 3 elements. So we need to pad our input with 2 zeros.

The convolution above implements all the calculations needed for $\frac{\partial L}{\partial X}$, so in terms of convolution:

$$\frac{\partial L}{\partial X}=\underbrace{dout}_\text{zero padded} \ast \overbrace{W}^\text{flipped K or W}$$
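
As an illustrative NumPy check of this result (names are assumptions; dx_manual follows the per-element $\frac{\partial L}{\partial x_i}$ expressions listed earlier):

```python
import numpy as np

w = np.array([0.5, -1., 2.])                # w0..w2
dout = np.array([0.1, -0.2, 0.3])           # gradient w.r.t. Y (3 elements)

# dX written out from the computation-graph expressions
dx_manual = np.array([
    w[2]*dout[0],
    w[1]*dout[0] + w[2]*dout[1],
    w[0]*dout[0] + w[1]*dout[1] + w[2]*dout[2],
    w[0]*dout[1] + w[1]*dout[2],
    w[0]*dout[2],
])

# Same values as a convolution: pad dout with kernelSize-1 = 2 zeros on each
# side and slide W over it (np.correlate is a plain sliding dot product)
dx_conv = np.correlate(np.pad(dout, 2, mode='constant'), w, mode='valid')
print(np.allclose(dx_manual, dx_conv))      # True
```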

Now let's continue with $\frac{\partial L}{\partial W}=[\partial w_0, \partial w_1, \partial w_2]$, considering that it must have the same size as W:

$$\frac{\partial L}{\partial w_0}=x_2 \cdot dout_{y_0}+x_3 \cdot dout_{y_1}+x_4 \cdot dout_{y_2}\\
\frac{\partial L}{\partial w_1}=x_1 \cdot dout_{y_0}+x_2 \cdot dout_{y_1}+x_3 \cdot dout_{y_2}\\
\frac{\partial L}{\partial w_2}=x_0 \cdot dout_{y_0}+x_1 \cdot dout_{y_1}+x_2 \cdot dout_{y_2}$$

Again, just by looking at the expressions that we took from the graph, we can see that it is possible to represent them as a convolution between dout and X. Also, as the output will have 3 elements, there is no need for padding.

So in terms of convolution, the calculations for $\frac{\partial L}{\partial W}$ will be:

$$\frac{\partial L}{\partial W}=\underbrace{\hat{X}}_\text{flipped X} \ast dout$$

Just one point to remember: if you consider X to be the kernel, and dout the signal, X will be automatically flipped.

$$\frac{\partial L}{\partial W}=dout \ast X$$
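
Continuing the same 1D example (again an illustrative check, not the book's listing), the three $\frac{\partial L}{\partial w}$ expressions above can be reproduced with a single convolution between the flipped X and dout:

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
dout = np.array([0.1, -0.2, 0.3])

# dW from the explicit formulas above
dw_manual = np.array([
    x[2]*dout[0] + x[3]*dout[1] + x[4]*dout[2],   # dL/dw0
    x[1]*dout[0] + x[2]*dout[1] + x[3]*dout[2],   # dL/dw1
    x[0]*dout[0] + x[1]*dout[1] + x[2]*dout[2],   # dL/dw2
])

# dL/dW = flip(X) * dout: np.convolve computes a true convolution (it flips its
# second argument), so passing X reversed and dout as-is gives [dw0, dw1, dw2]
dw_conv = np.convolve(x[::-1], dout, mode='valid')
print(np.allclose(dw_manual, dw_conv))            # True
```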

Now for the bias, the calculation will be similar to the one for the fully connected layer. Basically we have one bias per filter (output depth):

$$\frac{\partial L}{\partial b}=\begin{bmatrix} \sum_{batch}(dout_{y_0}) & \sum_{batch}(dout_{y_1}) & \sum_{batch}(dout_{y_2}) \end{bmatrix}$$
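
In the 2D layer the same bias of a filter is added at every spatial position of its output map, so its gradient also sums over those positions. A short illustrative NumPy sketch (shapes and names are assumptions, not the book's listing):

```python
import numpy as np

dout = np.random.randn(8, 4, 5, 5)   # [N, F, H', W'] = 8 images, 4 filters, 5x5 outputs
db = dout.sum(axis=(0, 2, 3))        # one gradient value per filter bias
print(db.shape)                      # (4,)
```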

Implementation Notes

Before jumping to the code some points need to be reviewed:

  1. If you use some parameter (e.g. stride/pad) during the forward propagation, you need to apply it on the backward propagation as well.

  2. In Python our multidimensional tensors will be "input=[N x Depth x H x W]"; in MATLAB they will be "input=[H x W x Depth x N]".

  3. As mentioned before, the gradient of an input has the same size as the input itself: "size(x)==size(dx)".

Matlab Backward propagation

Python Backward propagation
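
A minimal NumPy sketch of the backward pass matching the naive forward sketch shown earlier (again, names and signature are illustrative; stride and pad must be the same values used on the forward pass):

```python
import numpy as np

def conv_backward_naive(dout, x, w, stride=1, pad=0):
    """Naive backward pass. dout: [N, F, H', W'], x: [N, C, H, W], w: [F, C, kH, kW]."""
    N, C, H, W = x.shape
    F, _, kH, kW = w.shape
    _, _, H_out, W_out = dout.shape

    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')
    dx_pad = np.zeros_like(x_pad)
    dw = np.zeros_like(w)
    db = dout.sum(axis=(0, 2, 3))                 # one bias per filter

    for n in range(N):
        for f in range(F):
            for i in range(H_out):
                for j in range(W_out):
                    hs, ws = i * stride, j * stride
                    patch = x_pad[n, :, hs:hs+kH, ws:ws+kW]
                    # Each output pixel sends its gradient back to the patch it
                    # came from (through the weights) and accumulates into the
                    # filter gradient (through the patch values)
                    dx_pad[n, :, hs:hs+kH, ws:ws+kW] += w[f] * dout[n, f, i, j]
                    dw[f] += patch * dout[n, f, i, j]

    # Crop the padding away so that size(dx) == size(x)
    dx = dx_pad[:, :, pad:pad+H, pad:pad+W]
    return dx, dw, db
```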

Next Chapter

In the next chapter we will learn about the Pooling layer.
