Convolution is a mathematical operation that does the integral of the product of 2 functions(signals), with one of the signals flipped. For example below we convolve 2 signals f(t) and g(t).
So the first thing to do is to flip horizontally (180 degrees) the signal g, then slide the flipped g over f, multiplying and accumulating all it's values. The order that you convolve the signals does not matter for the end result, so conv(a,b)==conv(b,a)
On this case consider that the blue signal is our input signal and our kernel, the term kernel is used when you use convolutions to filter signals.
In the case of 1D convolution the output size is calculated like this:
People use convolution on signal processing for the following use cases:
Filter signals (1D audio, 2D image processing)
Check how much a signal is correlated to another
Find patterns in signals
Below we convolve two signals x = (0,1,2,3,4) with w = (1,-1,2).
To understand better the concept of convolution let's do the example above by hand. Basically we're going to convolve 2 signals (x,w). The first thing is to flip W horizontally (Or rotate to left 180 degrees)
After that we need to slide the flipped W over the input X
Observe that on steps 3,4,5 the flipped window is completely inside the input signal. Those results are called 'valid' convolutions. The cases where the flipped window is not fully inside the input window(X), we can consider to be zero, or calculate what is possible to be calculated, e.g. on step 1 we multiply 1 by zero, and the rest is simply ignored.
In order to keep the convolution result size the same size as the input, and to avoid an effect called circular convolution, we pad the signal with zeros. Where you put the zeros depends on what you want to do, ie: on the 1D case you can concatenate them on each end, but on 2D it is normally placed all the way around the original signal.
On matlab you can use the command 'padarray' to pad the input:
>> xx(:,:,1) =1 1 0 2 02 2 2 2 10 0 0 2 12 2 2 2 12 0 2 2 1x(:,:,2) =2 1 0 0 00 2 0 1 01 0 1 2 01 2 0 2 11 2 1 2 2x(:,:,3) =2 1 1 2 21 1 1 0 02 0 1 0 20 2 0 2 10 0 2 1 0>> padarray(x,[1 1])ans(:,:,1) =0 0 0 0 0 0 00 1 1 0 2 0 00 2 2 2 2 1 00 0 0 0 2 1 00 2 2 2 2 1 00 2 0 2 2 1 00 0 0 0 0 0 0ans(:,:,2) =0 0 0 0 0 0 00 2 1 0 0 0 00 0 2 0 1 0 00 1 0 1 2 0 00 1 2 0 2 1 00 1 2 1 2 2 00 0 0 0 0 0 0ans(:,:,3) =0 0 0 0 0 0 00 2 1 1 2 2 00 1 1 1 0 0 00 2 0 1 0 2 00 0 2 0 2 1 00 0 0 2 1 0 00 0 0 0 0 0 0
In order to calculate partial derivatives of every nodes inputs and parameters, it's easier to transform the operation to a computational graph. Here I'm going to transform the previous 1D convolution, but this can be extended to 2D convolution as well.
Here our graph will be created on the valid cases where the flipped kernel(weights) will be fully inserted on our input window.
We're going to use this graph in the future to infer the gradients of the inputs (x) and weights (w) of the convolution layer.
Now we extend to the second dimension. 2D convolutions are used as image filters, and when you would like to find a specific patch on an image. An example of filtering is below:
First we should flip the kernel, then slide the kernel on the input signal. Before doing this operation by hand check out the animation showing how this sliding works.
By default when we're doing convolution we move our window one pixel at a time (stride=1), but some times in convolutional neural networks we want to move more than one pixel. For example on pooling layers with kernels of size 2 we will use a stride of 2. Setting the stride and kernel size both to 2 will result in the output being exactly half the size of the input along both dimensions. Observe that below the red kernel window is moving much more than one pixel at a time.
It is useful to know what the dimensions of our output are going to be after we have performed some convolution operation to it. Luckily there is a handy formula that tells us exactly that.
If we consider convolving an input, of spatial size [H, W] padded by P, with a square kernel of size F and using stride S, then the output size of convolution is defined as:
F is the size of the kernel, normally we use square kernels, so F is both the width and height of the kernel
The example below will convolve a 5x5x3 (WxHx3) input, with a conv layer with the following parameters Stride=2, Pad=1, F=3 (3x3 kernel), and K=2 (two filters).
Our input has 3 channels, so we need a 3x3x3 kernel weight. We have 2 filters (K=2) so we will have 2 output activations at the end. Also we can calculate the size of these two outputs to be: (5 - 3 + 2)/2 + 1 = 3.
So we will get a final output volume of size (3x3x2).
Looking at this example in more detail, basically we need to calculate 2 convolutions, one for each 3x3x3 filter (w0,w1), and remembering to add the bias.
The code below (vanilla version) cannot be used in real life because it will be slow but its good for a basic understanding. Usually deep learning libraries do the convolution as one matrix multiplication, using the im2col/col2im method.
%% Convolution n dimensions% The following code is just a extension of conv2d_vanila for n dimensions.% Parameters:% Input: H x W x depth% K: kernel F x F x depth% S: stride (How many pixels he window will slide on the input)% This implementation is like the 'valid' parameter on normal convolutionfunction outConv = convn_vanilla(input, kernel, S)% Get the input size in terms of rows and cols. The weights should have% same depth as the input volume(image)[rowsIn, colsIn, ~] = size(input);% Get volume dimensiodepthInput = ndims(input);% Get the kernel size, considering a square kernel alwaysF = size(kernel,1);%% Initialize outputssizeRowsOut = ((rowsIn-F)/S) + 1;sizeColsOut = ((colsIn-F)/S) + 1;outConvAcc = zeros(sizeRowsOut , sizeColsOut, depthInput);%% Do the convolution% Convolve each channel on the input with it's respective kernel channel,% at the end sum all the channel results.for depth=1:depthInput% Select input and kernel current channelinputCurrDepth = input(:,:,depth);kernelCurrDepth = kernel(:,:,depth);% Iterate on every row and col, (using stride)for r=1:S:(rowsIn-1)for c=1:S:(colsIn-1)% Avoid sampling out of the image.if (((c+F)-1) <= colsIn) && (((r+F)-1) <= rowsIn)% Select window on input volume (patch)sampleWindow = inputCurrDepth(r:(r+F)-1,c:(c+F)-1);% Do the dot productdotProd = sum(sampleWindow(:) .* kernelCurrDepth(:));% Store resultoutConvAcc(ceil(r/S),ceil(c/S),depth) = dotProd;endendendend% Sum elements over the input volume dimension (sum all the channels)outConv = sum(outConvAcc,depthInput);end