Fully Connected Layer

Introduction

This chapter explains how to implement the fully connected layer in Matlab and Python, including both the forward and backward propagation.

First consider the fully connected layer as a black box with the following properties:

On the forward propagation
1. Has 3 inputs (input signal, weights, bias)
2. Has 1 output

On the back propagation
1. Has 1 input (dout), which has the same size as the output
2. Has 3 outputs (dx, dw, db), which have the same sizes as the corresponding inputs
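To make this shape contract concrete, here is a minimal numpy sketch of the black box described above (the dimensions N, D, H and all values are illustrative assumptions, not values from the book):

```python
import numpy as np

# Illustrative shape contract: N samples, D inputs, H outputs (assumed values).
N, D, H = 4, 3, 2
x = np.random.randn(N, D)            # input signal
w = np.random.randn(H, D)            # weights, one row per output
b = np.random.randn(H)               # bias

out = x.dot(w.T) + b                 # forward: 3 inputs -> 1 output of shape [N, H]

dout = np.random.randn(*out.shape)   # backward input, same size as the output
# The backward pass must return (dx, dw, db) with the same sizes as (x, w, b).
```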

Neural network point of view

Just by looking at the diagram we can infer the outputs:

$$y_1=[(w_{11} \cdot x_1)+(w_{12} \cdot x_2)+(w_{13} \cdot x_3)] + b_1 \\ y_2=[(w_{21} \cdot x_1)+(w_{22} \cdot x_2)+(w_{23} \cdot x_3)] + b_2$$

Now vectorizing (putting it in matrix form), observe that there are two possible versions:

$$\underbrace{\begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{bmatrix}}_{\text{One column per x dimension}} \cdot \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \end{bmatrix}+\begin{bmatrix} b_{1} \\ b_{2} \end{bmatrix}=\begin{bmatrix} y_{1} \\ y_{2} \end{bmatrix} \therefore \\ H(X) = (W \cdot x)+b^T \\ H(X) = (W^T \cdot x)+b$$

Pay attention to the format you choose to represent W, because it can be confusing.

For example, if we choose X to be a row vector, our matrix multiplication must be:

$$\left(\begin{bmatrix} x_{1} & x_{2} & x_{3} \end{bmatrix} \cdot \begin{bmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \\ w_{13} & w_{23} \end{bmatrix}\right)+\begin{bmatrix} b_{1} & b_{2}\end{bmatrix}=\begin{bmatrix} y_{1} & y_{2}\end{bmatrix} \therefore \\ H(X)=(x \cdot W^T)+b$$
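As a quick sanity check, a small numpy snippet (with arbitrary numbers, assumed only for illustration) confirms that both formulations give the same result:

```python
import numpy as np

# Arbitrary values for a 3-input / 2-output layer.
x = np.array([1.0, 2.0, 3.0])          # X as a row vector [x1 x2 x3]
W = np.array([[0.1, 0.2, 0.3],         # one row per output
              [0.4, 0.5, 0.6]])
b = np.array([0.01, 0.02])

y_row = x.dot(W.T) + b                 # H(X) = (x . W^T) + b
y_col = W.dot(x) + b                   # H(X) = (W . x) + b, with x as a column
print(np.allclose(y_row, y_col))       # True: both conventions agree
```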

Computation graph point of view

In order to discover how each input influences the output (backpropagation), it is better to represent the algorithm as a computation graph.

Now, for the backpropagation, let's focus on one of the graphs and apply what we learned so far about backpropagation.

Summarizing the calculation for the first output ($y_1$), consider a global error $L$ (loss) and $dout_{y1}=\frac{\partial L}{\partial y_1}$:

$$\frac{\partial L}{\partial x_1}=dout_{y1} \cdot w_{11}\\ \frac{\partial L}{\partial x_2}=dout_{y1} \cdot w_{12}\\ \frac{\partial L}{\partial x_3}=dout_{y1} \cdot w_{13}$$

$$\frac{\partial L}{\partial w_{11}}=dout_{y1} \cdot x_1\\ \frac{\partial L}{\partial w_{12}}=dout_{y1} \cdot x_2\\ \frac{\partial L}{\partial w_{13}}=dout_{y1} \cdot x_3$$

$$\frac{\partial L}{\partial b_1}=dout_{y1}$$

Also extending to the second output ($y_2$):

$$\frac{\partial L}{\partial x_1}=dout_{y2} \cdot w_{21}\\ \frac{\partial L}{\partial x_2}=dout_{y2} \cdot w_{22}\\ \frac{\partial L}{\partial x_3}=dout_{y2} \cdot w_{23}$$

$$\frac{\partial L}{\partial w_{21}}=dout_{y2} \cdot x_1\\ \frac{\partial L}{\partial w_{22}}=dout_{y2} \cdot x_2\\ \frac{\partial L}{\partial w_{23}}=dout_{y2} \cdot x_3$$

$$\frac{\partial L}{\partial b_2}=dout_{y2}$$

Merging the results, for dx:

$$\frac{\partial L}{\partial x_1}=[dout_{y1} \cdot w_{11}+dout_{y2} \cdot w_{21}]\\ \frac{\partial L}{\partial x_2}=[dout_{y1} \cdot w_{12}+dout_{y2} \cdot w_{22}]\\ \frac{\partial L}{\partial x_3}=[dout_{y1} \cdot w_{13}+dout_{y2} \cdot w_{23}]$$

In matrix form:

$$\frac{\partial L}{\partial X}=\begin{bmatrix} dout_{y1} & dout_{y2} \end{bmatrix} \cdot \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{bmatrix}, \quad \text{or} \quad \frac{\partial L}{\partial X}=\begin{bmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \\ w_{13} & w_{23} \end{bmatrix} \cdot \begin{bmatrix} dout_{y1} \\ dout_{y2} \end{bmatrix}$$

Depending on the format that you choose to represent X (as a row or column vector), pay attention to this because it can be confusing.

Now for dW. It is important to note that every gradient has the same dimensions as its original value; for instance, dW has the same dimensions as W. In other words:

$$W=\begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{bmatrix} \therefore \frac{\partial L}{\partial W}=\begin{bmatrix} \frac{\partial L}{\partial w_{11}} & \frac{\partial L}{\partial w_{12}} & \frac{\partial L}{\partial w_{13}} \\ \frac{\partial L}{\partial w_{21}} & \frac{\partial L}{\partial w_{22}} & \frac{\partial L}{\partial w_{23}} \end{bmatrix}$$

$$\frac{\partial L}{\partial W}=\begin{bmatrix} dout_{y1} \\ dout_{y2} \end{bmatrix} \cdot \begin{bmatrix} x_{1} & x_{2} & x_{3} \end{bmatrix}=\begin{bmatrix} \frac{\partial L}{\partial w_{11}} & \frac{\partial L}{\partial w_{12}} & \frac{\partial L}{\partial w_{13}} \\ \frac{\partial L}{\partial w_{21}} & \frac{\partial L}{\partial w_{22}} & \frac{\partial L}{\partial w_{23}} \end{bmatrix}$$

And for dB:

$$\frac{\partial L}{\partial b}=\begin{bmatrix} dout_{y1} & dout_{y2} \end{bmatrix}$$
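A short numpy sketch (with arbitrary numbers, assumed only for illustration) shows these three gradient formulas for a single sample:

```python
import numpy as np

x    = np.array([1.0, 2.0, 3.0])     # [x1 x2 x3]
W    = np.array([[0.1, 0.2, 0.3],
                 [0.4, 0.5, 0.6]])   # [2 x 3]
dout = np.array([0.7, -0.2])         # [dout_y1 dout_y2]

dx = dout.dot(W)                     # dL/dX = dout . W      -> shape (3,), same as x
dW = np.outer(dout, x)               # dL/dW = dout^T . x    -> shape (2, 3), same as W
db = dout                            # dL/db = dout          -> shape (2,), same as b
```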

Expanding for bigger batches

All the examples so far deal with a single element on the input, but normally we deal with much more than one example at a time. For instance, on GPUs it is common to process batches of 256 images at the same time. The trick is to represent the input signal as a 2d matrix [NxD], where N is the batch size and D is the dimensionality of the input signal. So if you consider the MNIST dataset, where each digit is a 28x28x1 (grayscale) image, D will be 784, and if we have 10 digits on the same batch our input will be [10x784].

For the sake of argument, let's consider our previous samples where the vector X was represented as $X=\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}$. If we want to have a batch of 4 elements, we will have:

$$X_{batch}=\begin{bmatrix} x_{1\,sample\,1} & x_{2\,sample\,1} & x_{3\,sample\,1} \\ x_{1\,sample\,2} & x_{2\,sample\,2} & x_{3\,sample\,2} \\ x_{1\,sample\,3} & x_{2\,sample\,3} & x_{3\,sample\,3} \\ x_{1\,sample\,4} & x_{2\,sample\,4} & x_{3\,sample\,4} \end{bmatrix} \therefore X_{batch}=[4,3]$$

In this case W must be represented in a way that supports this matrix multiplication, so depending on how it was created it may need to be transposed:

$$W^T=\begin{bmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \\ w_{13} & w_{23} \end{bmatrix}$$

Continuing, the forward propagation will be computed as:

$$\left(\begin{bmatrix} x_{1\,sample\,1} & x_{2\,sample\,1} & x_{3\,sample\,1} \\ x_{1\,sample\,2} & x_{2\,sample\,2} & x_{3\,sample\,2} \\ x_{1\,sample\,3} & x_{2\,sample\,3} & x_{3\,sample\,3} \\ x_{1\,sample\,4} & x_{2\,sample\,4} & x_{3\,sample\,4} \end{bmatrix} \cdot \begin{bmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \\ w_{13} & w_{23} \end{bmatrix}\right)+\begin{bmatrix} b_{1\,sample\,1} & b_{2\,sample\,1} \\ b_{1\,sample\,2} & b_{2\,sample\,2} \\ b_{1\,sample\,3} & b_{2\,sample\,3} \\ b_{1\,sample\,4} & b_{2\,sample\,4} \end{bmatrix}=\begin{bmatrix} y_{1\,sample\,1} & y_{2\,sample\,1} \\ y_{1\,sample\,2} & y_{2\,sample\,2} \\ y_{1\,sample\,3} & y_{2\,sample\,3} \\ y_{1\,sample\,4} & y_{2\,sample\,4}\end{bmatrix}$$

One point to observe here is that the bias is repeated 4 times to accommodate the product X·W, which in this case generates a [4x2] matrix. On Matlab the command "repmat" does the job; on Python, broadcasting does it automatically.
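A minimal numpy sketch of this (with assumed shapes: 4 samples, 3 inputs, 2 outputs) shows that broadcasting produces the same result as the explicit repetition done by repmat:

```python
import numpy as np

X = np.random.randn(4, 3)                      # batch [N x D]
W = np.random.randn(2, 3)                      # [outputs x inputs]
b = np.random.randn(2)

Y_broadcast = X.dot(W.T) + b                   # numpy broadcasts b to [4 x 2]
Y_repmat    = X.dot(W.T) + np.tile(b, (4, 1))  # explicit repetition, like Matlab's repmat
print(np.allclose(Y_broadcast, Y_repmat))      # True
```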

Using Symbolic engine

Before jumping to the implementation, it is good to verify the operations on the Matlab or Python (sympy) symbolic engine. This will help visualize and explore the results before actually coding the functions.
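For reference, a hypothetical sympy version of the same check could look like this (the book shows the Matlab symbolic engine; this sketch is only an equivalent illustration):

```python
import sympy as sp

# Symbolic variables for the 3-input / 2-output layer.
x1, x2, x3, b1, b2, douty1, douty2 = sp.symbols('x1 x2 x3 b1 b2 douty1 douty2')
w11, w12, w13, w21, w22, w23 = sp.symbols('w11 w12 w13 w21 w22 w23')

W = sp.Matrix([[w11, w12, w13], [w21, w22, w23]])
x = sp.Matrix([x1, x2, x3])
b = sp.Matrix([b1, b2])

y = W * x + b                                   # symbolic forward propagation
L = douty1 * y[0] + douty2 * y[1]               # surrogate loss: dout . y

dW = sp.Matrix([[sp.diff(L, w) for w in row]
                for row in ((w11, w12, w13), (w21, w22, w23))])
print(sp.latex(dW))                             # matches dout^T . x derived above
```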

Symbolic forward propagation on Matlab

Here, after we define the variables that will be symbolic, we create the matrices W, X, b, then calculate $y=(W \cdot X)+b$ and compare the final result with what we calculated before.

Symbolic backward propagation on Matlab

Now we also confirm the backward propagation formulas. Observe the function "latex", which converts an expression to LaTeX in Matlab; the result for dW ($\frac{\partial L}{\partial W}$) below was copied and pasted directly from Matlab:

$$\left(\begin{array}{ccc} \mathrm{douty1}\, \mathrm{x1} & \mathrm{douty1}\, \mathrm{x2} & \mathrm{douty1}\, \mathrm{x3}\\ \mathrm{douty2}\, \mathrm{x1} & \mathrm{douty2}\, \mathrm{x2} & \mathrm{douty2}\, \mathrm{x3} \end{array}\right)$$

Input Tensor

Our library will be handling images, and most of the time we will be handling matrix operations on hundreds of images at the same time, so we must find a way to represent them. Here we will represent a batch of images as a 4d tensor, or an array of 3d matrices. Below we have a batch of 4 RGB images (width: 160, height: 120). We're going to load them in Matlab/Python and organize them in a 4d matrix. Observe that in Matlab each image becomes a 120x160x3 matrix, so our tensor will be 120x160x3x4.

On Python, before we store the image on the tensor, we do a transpose to convert our image from 120x160x3 to 3x120x160, and then store it on a 4x3x120x160 tensor.
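A sketch of this packing step in numpy (assuming each image was already loaded as a 120x160x3 array) could be:

```python
import numpy as np

# Stand-ins for 4 loaded RGB images of height 120 and width 160 (H x W x C).
images = [np.zeros((120, 160, 3)) for _ in range(4)]

tensor = np.zeros((4, 3, 120, 160))          # batch tensor: N x C x H x W
for i, img in enumerate(images):
    tensor[i] = img.transpose(2, 0, 1)       # 120x160x3 -> 3x120x160
```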

Python Implementation

Forward Propagation
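The original code listing is not reproduced here; the following is a minimal sketch of what the forward function could look like, assuming the [N x D] input and [H x D] weight layout used above (function and variable names are illustrative):

```python
import numpy as np

def fc_forward(x, w, b):
    """Fully connected forward pass (illustrative sketch).

    x: input batch, shape [N, D]
    w: weights, shape [H, D] (one row per output)
    b: bias, shape [H]
    """
    out = x.dot(w.T) + b          # bias is broadcast along the batch dimension
    cache = (x, w, b)             # keep the inputs for the backward pass
    return out, cache
```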

Backward Propagation
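Likewise, a sketch of the backward function that applies the gradient formulas derived earlier (again with illustrative names):

```python
import numpy as np

def fc_backward(dout, cache):
    """Fully connected backward pass (illustrative sketch matching fc_forward).

    dout: upstream gradient dL/dout, shape [N, H]
    Returns dx [N, D], dw [H, D], db [H].
    """
    x, w, b = cache
    dx = dout.dot(w)              # dL/dX = dout . W
    dw = dout.T.dot(x)            # dL/dW = dout^T . X (summed over the batch)
    db = dout.sum(axis=0)         # dL/db: sum the per-sample bias gradients
    return dx, dw, db
```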

Matlab Implementation

One special point to pay attention to is the way that Matlab represents high-dimensional arrays, in contrast with numpy. Another point that may cause confusion is the fact that Matlab stores data in col-major order while numpy uses row-major order.

Multidimensional arrays in python and matlab

One difference in how Matlab and Python represent multidimensional arrays must be noticed. Suppose we want to create a 4-channel 2x3 matrix: in Matlab you need to create an array of size (2,3,4), while in Python it needs to be (4,2,3).
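On the numpy side this looks like the following small illustrative snippet:

```python
import numpy as np

A = np.zeros((4, 2, 3))     # 4 channels of a 2x3 matrix: channel dimension first
print(A.shape)              # (4, 2, 3)
print(A[0].shape)           # (2, 3): one channel
# The Matlab equivalent would be zeros(2, 3, 4), with the extra dimension last.
```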

Matlab Reshape order

As mentioned before, Matlab runs the reshape command one column at a time, so if you want to change this behavior you need to transpose the input matrix first.

If you are dealing with more than 2 dimensions you need to use the "permute" command to transpose. On Python, the default of the reshape command is one row at a time, or, if you want, you can also change the order (this option does not exist in Matlab).
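For comparison, numpy exposes both behaviours directly through the order argument of reshape (a small illustrative example):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
print(A.reshape(3, 2))             # row-major (default): [[1 2], [3 4], [5 6]]
print(A.reshape(3, 2, order='F'))  # column-major (Matlab-like): [[1 5], [4 3], [2 6]]
```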

Below we have a reshape in row-major order implemented as a new function:

The other option to avoid this permute/reshape is to have the weight matrix in a different order and calculate the forward propagation like this:

$$\begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{bmatrix} \cdot \begin{bmatrix} x_{1\,sample\,1} & x_{1\,sample\,2} \\ x_{2\,sample\,1} & x_{2\,sample\,2} \\ x_{3\,sample\,1} & x_{3\,sample\,2} \end{bmatrix}+\begin{bmatrix} b_{1} & b_{1} \\ b_{2} & b_{2} \end{bmatrix}=\begin{bmatrix} y_{1\,sample\,1} & y_{1\,sample\,2} \\ y_{2\,sample\,1} & y_{2\,sample\,2} \end{bmatrix}$$

With x as a column vector and the weights organized row-wise, in the example that is presented we keep using the same order as the Python example.
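A brief numpy illustration of this alternative layout (samples as columns, with assumed shapes only):

```python
import numpy as np

W = np.random.randn(2, 3)      # [outputs x inputs], weights organized row-wise
X = np.random.randn(3, 2)      # one column per sample
b = np.random.randn(2, 1)      # bias as a column vector, repeated across samples

Y = W.dot(X) + b               # [2 x 2]: one column of outputs per sample
```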

Forward Propagation

Backward Propagation

Next Chapter

In the next chapter we will learn about ReLU layers.
