Fully Connected Layer
This chapter will explain how to implement the fully connected layer in matlab and python, including the forward and back-propagation.
First, consider the fully connected layer as a black box with the following properties:

On the forward propagation:
1. It has 3 inputs (the input signal, the weights, and the bias).
2. It has 1 output.

On the back propagation:
1. It has 1 input (dout), which has the same size as the output.
2. It has 3 outputs (dx, dw, db), which have the same sizes as the corresponding forward inputs.

A minimal sketch of this interface is given right after this list.
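As a rough sketch of that black box (not the book's actual library code; the function names, the use of numpy, and the batched [N x D] shapes are assumptions borrowed from later in the chapter):

```python
import numpy as np

def fc_forward(x, w, b):
    # x: input signal [N x D], w: weights [D x M], b: bias [1 x M]
    # (batched, one sample per row, as discussed later in this chapter)
    out = x.dot(w) + b                      # the bias is broadcast over the N samples
    cache = (x, w, b)                       # keep the inputs for the backward pass
    return out, cache

def fc_backward(dout, cache):
    # dout: gradient of the loss w.r.t. the output, same size as the output [N x M]
    x, w, b = cache
    dx = dout.dot(w.T)                      # same size as x
    dw = x.T.dot(dout)                      # same size as w
    db = dout.sum(axis=0, keepdims=True)    # same size as b (summed over the batch)
    return dx, dw, db
```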
Just by looking at the diagram we can infer the outputs:
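For concreteness, assume a tiny layer with two inputs (x1, x2) and two outputs (y1, y2), where w_ij is the weight connecting input j to output i (the 2x2 size is only for illustration). The outputs would be:

$$y_1 = w_{11} x_1 + w_{12} x_2 + b_1 \qquad y_2 = w_{21} x_1 + w_{22} x_2 + b_2$$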
Now vectorizing (putting it in matrix form), observe that there are 2 possible versions.
Depending on the format that you choose to represent W, pay attention, because it can be confusing.
For example, if we choose X to be a column vector, our matrix multiplication must be:
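Under the same 2-input/2-output assumption, the column-vector version is:

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}, \qquad y = W \cdot x + b$$

while the other version, with X as a row vector, is y = x·W^T + b (or y = x·W + b if W is already stored transposed).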
In order to discover how each input influences the output (backpropagation), it is better to represent the algorithm as a computation graph.
Now for the backpropagation, let's focus on one of the graphs and apply what we learned so far about backpropagation.

Summarizing the calculation for the first output (y1): consider a global error L (loss) and the gradient dout that arrives from the layer above.
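Applying the chain rule to y1 (still in the 2-input toy example), the local gradients of y1 are:

$$\frac{\partial y_1}{\partial x_1} = w_{11}, \quad \frac{\partial y_1}{\partial x_2} = w_{12}, \quad \frac{\partial y_1}{\partial w_{11}} = x_1, \quad \frac{\partial y_1}{\partial w_{12}} = x_2, \quad \frac{\partial y_1}{\partial b_1} = 1$$

and each of them gets multiplied by the incoming gradient dout_1 = ∂L/∂y_1.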
Also extending to the second output (y2)
Merging the results, for dx:
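Because x1 and x2 each feed both outputs, their gradients accumulate the contributions coming from y1 and y2:

$$dx_1 = w_{11}\,dout_1 + w_{21}\,dout_2, \qquad dx_2 = w_{12}\,dout_1 + w_{22}\,dout_2$$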
In matrix form:
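With the column-vector convention used above this can be written compactly as:

$$dx = W^T \cdot dout$$

or, if X is represented as a row vector (with W stored as [inputs x outputs]), dx = dout · W^T.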
Depending on the format that you choose to represent X (as a row or column vector), pay attention, because it can be confusing.
Now for dW. It's important to note that every gradient has the same dimension as its original value; for instance, dW has the same dimension as W, and dB has the same dimension as the bias b. In other words:
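For the single-sample, column-vector case these gradients are:

$$dW = dout \cdot x^T \ (\text{same shape as } W), \qquad dB = dout \ (\text{same shape as } b)$$

(with a batch, dB becomes the sum of dout over the batch dimension).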
All the examples so far deal with a single element on the input, but normally we deal with much more than one example at a time. For instance, on GPUs it is common to have batches of 256 images at the same time. The trick is to represent the input signal as a 2d matrix [NxD], where N is the batch size and D is the dimensionality of the input signal. So if you consider the MNIST dataset, where each digit is a 28x28x1 (grayscale) image, D will be 784, and if we have 10 digits on the same batch our input will be [10x784].

For the sake of argument, let's go back to our previous example for the vector X; if we want to have a batch of 4 elements we will have:
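Keeping the two-feature toy example (the feature count is only for illustration), a batch of 4 samples stacked one per row is:

$$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \\ x_{41} & x_{42} \end{bmatrix} \qquad [4 \times 2]$$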
In this case W must be represented in a way that supports this matrix multiplication, so depending on how it was created it may need to be transposed.
Continuing, the forward propagation will be computed as:
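With one sample per row this is (the shapes again refer to the 2-feature, 2-output toy example):

$$Y = X \cdot W + b \qquad \big([4 \times 2] = [4 \times 2] \cdot [2 \times 2] + [4 \times 2]\big)$$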
One point to observe here is that the bias has to be repeated 4 times to match the product X·W, which in this case will generate a matrix [4x2]. On matlab the command "repmat" does the job; on python (numpy) broadcasting does it automatically.
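A small numpy sketch of this batched forward pass (the values are random and only the shapes matter):

```python
import numpy as np

X = np.random.randn(4, 2)   # batch of 4 samples, 2 features each
W = np.random.randn(2, 2)   # 2 inputs -> 2 outputs
b = np.random.randn(1, 2)   # one bias per output

# b is [1x2] but is broadcast (implicitly repeated 4 times) over the batch,
# which is what repmat(b, 4, 1) would do explicitly in matlab
Y = X.dot(W) + b
print(Y.shape)              # (4, 2)
```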
Before jumping to the implementation it is good to verify the operations on the Matlab or Python (sympy) symbolic engine. This will help visualize and explore the results before actually coding the functions. Here, after we define the variables which will be symbolic, we create the matrices W, X, b, then calculate the forward propagation and compare the final result with what we calculated before.
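A minimal sympy sketch of that check (the symbol names are my own choice, not necessarily the book's):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
w11, w12, w21, w22 = sp.symbols('w11 w12 w21 w22')
b1, b2 = sp.symbols('b1 b2')

X = sp.Matrix([[x1], [x2]])                 # input as a column vector
W = sp.Matrix([[w11, w12], [w21, w22]])     # one row per output neuron
b = sp.Matrix([[b1], [b2]])

Y = W * X + b                               # symbolic forward propagation
print(Y)   # w11*x1 + w12*x2 + b1 and w21*x1 + w22*x2 + b2, as expected
```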
Now we also confirm the backward propagation formulas. Observe the function "latex", which converts an expression to latex on matlab. Here I've just copied and pasted the latex result of dW from matlab.
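The backward formulas can be checked the same way; the sketch below (again, the setup is my own assumption for illustration) differentiates a scalar surrogate loss whose gradient with respect to Y is exactly dout:

```python
import sympy as sp

x1, x2, b1, b2, d1, d2 = sp.symbols('x1 x2 b1 b2 d1 d2')
w11, w12, w21, w22 = sp.symbols('w11 w12 w21 w22')

X = sp.Matrix([[x1], [x2]])
W = sp.Matrix([[w11, w12], [w21, w22]])
b = sp.Matrix([[b1], [b2]])
dout = sp.Matrix([[d1], [d2]])

Y = W * X + b
L = (dout.T * Y)[0, 0]                  # scalar whose gradient w.r.t. Y is dout

dX = sp.Matrix([[sp.diff(L, x1)], [sp.diff(L, x2)]])
dW = sp.Matrix(2, 2, lambda i, j: sp.diff(L, W[i, j]))
dB = sp.Matrix([[sp.diff(L, b1)], [sp.diff(L, b2)]])

print(sp.simplify(dX - W.T * dout))     # zero matrix -> dX = W.T * dout
print(sp.simplify(dW - dout * X.T))     # zero matrix -> dW = dout * X.T
print(sp.simplify(dB - dout))           # zero matrix -> dB = dout
print(sp.latex(dW))                     # analogous to matlab's latex() function
```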
Our library will be handling images, and most of the time we will be handling matrix operations on hundreds of images at the same time. So we must find a way to represent them; here we will represent a batch of images as a 4d tensor, or an array of 3d matrices. Below we have a batch of 4 rgb images (width: 160, height: 120). We're going to load them in matlab/python and organize them into a 4d matrix. Observe that in matlab each image becomes a matrix 120x160x3, so our tensor will be 120x160x3x4.
On Python, before we store the image on the tensor we do a transpose to convert our image from 120x160x3 to 3x120x160, then store it on a tensor of shape 4x3x120x160.
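A sketch of that batching in numpy (the file names are placeholders; any loader that returns an HxWxC array, such as matplotlib.image.imread, would do):

```python
import numpy as np
import matplotlib.image as mpimg

files = ['img0.png', 'img1.png', 'img2.png', 'img3.png']   # placeholder names
batch = np.zeros((4, 3, 120, 160))                          # [N x C x H x W]

for i, name in enumerate(files):
    img = mpimg.imread(name)[:, :, :3]     # HxWxC (120x160x3), rgb channels only
    batch[i] = img.transpose(2, 0, 1)      # CxHxW (3x120x160)

print(batch.shape)                         # (4, 3, 120, 160)
```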
One special point to pay attention to is the way that matlab represents high-dimensional arrays in contrast with numpy. Another point that may cause confusion is the fact that matlab stores data in col-major order and numpy in row-major order.
One difference in how matlab and python represent multidimensional arrays must be noticed. Say we want to create a 2x3 matrix with 4 channels: in matlab you need to create an array of size (2,3,4), while in python it needs to be (4,2,3).
As mentioned before, matlab runs the reshape command one column at a time, so if you want to change this behavior you need to transpose the input matrix first.
If you are dealing with more than 2 dimensions you need to use the "permute" command to transpose. On Python the default of the reshape command is one row at a time, and if you want you can also change the order (this option does not exist in matlab).
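For instance, in numpy the order argument of reshape selects between the two behaviors (order='C' is the row-major default, order='F' mimics matlab's column-major reshape); a tiny sketch:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.reshape(3, 2, order='C'))   # row-major (numpy default): [[1 2] [3 4] [5 6]]
print(a.reshape(3, 2, order='F'))   # column-major (matlab-like): [[1 5] [4 3] [2 6]]
```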
Below we have a reshape in row-major order as a new function:
The other option, which would avoid this permute/reshape, is to have the weight matrix in a different order and calculate the forward propagation like this:
Here x is a column vector and the weights are organized row-wise; in the example that is presented we keep using the same order as the python example.
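One way to write this option (my reconstruction, since the original equation is not reproduced here) is the column-vector convention from the beginning of the chapter:

$$y = W \cdot x + b$$

with x the flattened input as a column vector and W organized with one row per output neuron, so no extra permute/reshape of the input is needed.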
Next chapter we will learn about Relu layers