This chapter explains how to implement the fully connected layer in Matlab and Python, including both forward propagation and back-propagation.
First, consider the fully connected layer as a black box with the following properties:

On the forward propagation:
1. It has 3 inputs (input signal, weights, bias).
2. It has 1 output.

On the back propagation:
1. It has 1 input (dout), which has the same size as the output.
2. It has 3 outputs (dx, dw, db), each with the same size as the corresponding input.
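As a minimal numpy sketch of this black-box interface (the class name and calling conventions here are illustrative, not a fixed API; x is a batch of row vectors, w is [DxH], b is [H]):

```python
import numpy as np

class FullyConnected:
    """Black-box view of the layer: forward takes (x, w, b), backward takes dout."""

    def forward(self, x, w, b):
        # 3 inputs (input signal, weights, bias) -> 1 output
        self.cache = (x, w, b)
        return x.dot(w) + b

    def backward(self, dout):
        # 1 input (dout, same size as the output) ->
        # 3 outputs (dx, dw, db), same sizes as (x, w, b)
        x, w, b = self.cache
        dx = dout.dot(w.T)
        dw = x.T.dot(dout)
        db = np.sum(dout, axis=0)
        return dx, dw, db
```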
Neural network point of view
Just by looking at the diagram we can infer the outputs of the back propagation:
$$\frac{\partial L}{\partial X}=\begin{bmatrix} dout_{y1} & dout_{y2} \end{bmatrix} \cdot \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{bmatrix}, \quad \text{or} \quad \frac{\partial L}{\partial X}=\begin{bmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \\ w_{13} & w_{23} \end{bmatrix} \cdot \begin{bmatrix} dout_{y1} \\ dout_{y2} \end{bmatrix}$$
Which form to use depends on the format you choose to represent X (as a row or a column vector); pay attention to this, because it can be confusing.
Now for dW. It's important to note that every gradient has the same dimensions as its original value; for instance, dW has the same dimensions as W. In other words, with X as a row vector and $y = X \cdot W + b$:

$$\frac{\partial L}{\partial W}=X^T \cdot dout, \qquad \frac{\partial L}{\partial b}=dout$$
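As a quick numeric sanity check of those shapes, here is a toy example (sizes chosen arbitrarily, D=3 and H=2):

```python
import numpy as np

x = np.random.randn(1, 3)      # input signal [1x3]
w = np.random.randn(3, 2)      # weights [3x2]
b = np.random.randn(2)         # bias [2]
dout = np.random.randn(1, 2)   # incoming gradient, same size as the output

dx = dout.dot(w.T)             # [1x3], same shape as x
dw = x.T.dot(dout)             # [3x2], same shape as w
db = np.sum(dout, axis=0)      # [2],   same shape as b

assert dx.shape == x.shape
assert dw.shape == w.shape
assert db.shape == b.shape
```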
All the examples so far deal with single elements on the input, but normally we deal with much more than one example at a time. For instance, on GPUs it is common to have batches of 256 images at the same time. The trick is to represent the input signal as a 2D matrix [NxD], where N is the batch size and D the dimensionality of the input signal. So if you consider the MNIST dataset, where each digit is a 28x28x1 (grayscale) image, D will be 784; if we have 10 digits on the same batch, our input will be [10x784].
For the sake of argument, let's consider our previous example where the vector X was represented like $X=[x_1 \; x_2 \; x_3]$; if we want to have a batch of 4 elements, we will have:

$$X_{batch}=\begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \\ x_{41} & x_{42} & x_{43} \end{bmatrix}, \quad [4 \times 3]$$
One point to observe here is that the bias has to be repeated 4 times to accommodate the product X·W, which in this case will generate a [4x2] matrix. On Matlab the command `repmat` does the job; on Python, broadcasting does it automatically.
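Here is a small numpy sketch of this batched forward pass, showing that explicit repetition (the analogue of `repmat`) and broadcasting give the same result:

```python
import numpy as np

X = np.random.randn(4, 3)          # batch of 4 inputs, D=3
W = np.random.randn(3, 2)          # weights [3x2]
b = np.random.randn(1, 2)          # bias [1x2]

# the equivalent of Matlab's repmat: repeat the bias for each row
b_rep = np.tile(b, (4, 1))         # [4x2]
y_explicit = X.dot(W) + b_rep

# numpy broadcasting repeats the bias automatically
y_broadcast = X.dot(W) + b         # also [4x2]

print(np.allclose(y_explicit, y_broadcast))  # True
```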
Using Symbolic engine
Before jumping to the implementation, it is good to verify the operations on the Matlab or Python (sympy) symbolic engine. This will help you visualize and explore the results before actually coding the functions.
Symbolic forward propagation on Matlab
Here, after defining the symbolic variables, we create the matrices W, X, b, then calculate $y=(W \cdot X)+b$, and compare the final result with what we calculated before.
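The same check can be sketched with sympy, using the column-vector convention from the formulas above (variable names mirror the Matlab version):

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
w11, w12, w13, w21, w22, w23 = sp.symbols('w11 w12 w13 w21 w22 w23')
b1, b2 = sp.symbols('b1 b2')

W = sp.Matrix([[w11, w12, w13],
               [w21, w22, w23]])   # [2x3]
X = sp.Matrix([x1, x2, x3])        # column vector [3x1]
b = sp.Matrix([b1, b2])            # [2x1]

y = (W * X) + b
sp.pprint(y)
# y1 = w11*x1 + w12*x2 + w13*x3 + b1
# y2 = w21*x1 + w22*x2 + w23*x3 + b2
```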
Symbolic backward propagation on Matlab
Now we also confirm the backward-propagation formulas. Observe the function `latex`, which converts an expression to LaTeX on Matlab.
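Again as a sympy sketch (dout1 and dout2 stand for the incoming gradient, and sympy's `latex` function plays the role of Matlab's):

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
w11, w12, w13, w21, w22, w23 = sp.symbols('w11 w12 w13 w21 w22 w23')
dout1, dout2 = sp.symbols('dout1 dout2')

W = sp.Matrix([[w11, w12, w13], [w21, w22, w23]])
X = sp.Matrix([x1, x2, x3])
dout = sp.Matrix([dout1, dout2])

# dL/dX = W^T * dout  (chain rule through y = W*X + b)
dX = W.T * dout        # [3x1], same shape as X
# dL/dW = dout * X^T
dW = dout * X.T        # [2x3], same shape as W
# dL/db = dout
db = dout              # [2x1], same shape as b

print(sp.latex(dX))    # like Matlab's "latex" function
```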
Our library will be handling images, and most of the time we will be handling matrix operations on hundreds of images at the same time, so we must find a way to represent them. Here we will represent a batch of images as a 4D tensor, or an array of 3D matrices. Below we have a batch of 4 RGB images (width: 160, height: 120). We're going to load them on Matlab/Python and organize them in a 4D matrix.
On Python, before we store the image in the tensor, we do a transpose to convert our image from 120x160x3 to 3x120x160, and then store it in a tensor of shape 4x3x120x160.
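A sketch of that loading step in numpy (the file names here are placeholders, and PIL is just one possible way to read the images):

```python
import numpy as np
from PIL import Image

batch = np.zeros((4, 3, 120, 160), dtype=np.uint8)

for i in range(4):
    img = np.array(Image.open('img_%d.png' % i))  # (120, 160, 3)
    # move channels first: (120, 160, 3) -> (3, 120, 160)
    batch[i] = img.transpose(2, 0, 1)

print(batch.shape)  # (4, 3, 120, 160)
```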
Python Implementation
Forward Propagation
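A minimal sketch of such a forward pass (the function name `fc_forward` and the cache convention are illustrative); each input is flattened to a row so that batches of images work with the same matrix product:

```python
import numpy as np

def fc_forward(x, w, b):
    """Forward pass: y = x.w + b for a batch of N inputs.

    x can be any shape (N, d1, ..., dk); it is flattened to (N, D).
    w has shape (D, H) and b has shape (H,).
    """
    N = x.shape[0]
    x_rows = x.reshape(N, -1)   # (N, D)
    out = x_rows.dot(w) + b     # broadcasting repeats b for each row
    cache = (x, w, b)
    return out, cache
```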
Backward Propagation
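And a matching sketch of the backward pass, returning gradients with the same shapes as the corresponding inputs:

```python
import numpy as np

def fc_backward(dout, cache):
    """Backward pass: dout has shape (N, H), gradients match the inputs."""
    x, w, b = cache
    N = x.shape[0]
    x_rows = x.reshape(N, -1)            # (N, D)

    dx = dout.dot(w.T).reshape(x.shape)  # same shape as x
    dw = x_rows.T.dot(dout)              # same shape as w
    db = np.sum(dout, axis=0)            # same shape as b
    return dx, dw, db
```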
Matlab Implementation
One special point to pay attention to is the way Matlab represents high-dimensional arrays, in contrast with Python. Another point that may cause confusion is the fact that Matlab stores data in col-major order while numpy stores it in row-major order.
Multidimensional arrays in Python and Matlab
One difference in how Matlab and Python represent multidimensional arrays must be noted. Say we want to create a 2x3 matrix with 4 channels: in Matlab you need to create an array of shape (2,3,4), while on Python it needs to be (4,2,3).
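On the numpy side this looks like the following (the Matlab counterpart would be zeros(2,3,4)):

```python
import numpy as np

# 4 channels of a 2x3 matrix: channels come first in numpy
A = np.zeros((4, 2, 3))
print(A.shape)      # (4, 2, 3)
print(A[0].shape)   # (2, 3) -> one channel
```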
Matlab Reshape order
As mentioned before, Matlab runs the reshape command one column at a time, so if you want to change this behavior you need to transpose the input matrix first.
If you are dealing with more than 2 dimensions you need to use the `permute` command to transpose. On Python, the default of the reshape command is one row at a time, and if you want you can also change the order (this option does not exist in Matlab).
Below we have the row-major reshape written as a new function:
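Here is a numpy illustration of that idea (the helper name `reshape_row_major` is illustrative; numpy's default order is already row-major, while `order='F'` reproduces Matlab's column-major behavior):

```python
import numpy as np

def reshape_row_major(a, shape):
    # numpy's default 'C' order already reshapes one row at a time
    return np.reshape(a, shape, order='C')

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(reshape_row_major(a, (3, 2)))
# [[1 2]
#  [3 4]
#  [5 6]]

print(np.reshape(a, (3, 2), order='F'))  # Matlab-style (column-major)
# [[1 5]
#  [4 3]
#  [2 6]]
```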
The other option to avoid this permute/reshape is to store the weight matrix in a different order and calculate the forward propagation accordingly, as sketched below.
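The underlying identity is just a transpose; here is a numpy sketch in which `W2 = W.T` plays the role of the reordered weight matrix:

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(1, 4)   # input as a row vector [1xD]
W = np.random.randn(4, 2)   # weights [DxH]
b = np.random.randn(1, 2)   # bias [1xH]

# usual forward pass
y1 = x.dot(W) + b

# storing the weights in the transposed order (W2 = W.T) lets us
# compute the same output with the operands swapped, which avoids
# permuting/reshaping x into the other memory order
W2 = W.T                    # [HxD]
y2 = (W2.dot(x.T)).T + b

print(np.allclose(y1, y2))  # True
```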