Residual Net


This chapter presents the 2016 state of the art in object classification. The ResidualNet is basically a convolutional neural network roughly 150 layers deep, built from identical "residual" blocks.
The problem is that for really deep networks (more than 30 layers), all the known techniques (ReLU, dropout, batch-norm, etc.) are not enough to get good end-to-end training. This contrasts with the common "empirically proven knowledge" that deeper is better.
The idea of the residual network is to use blocks that re-route the input and add it to the concept learned by the layer. During learning, the next layer then receives the concept learned by the previous layer plus the input of that previous layer. This works better than learning a concept without the reference that was used to learn it.
Another way to visualize this solution is to remember that, during back-propagation, a sum node passes the incoming gradient on to each of its inputs with no degradation.
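The gradient argument can be checked on a toy case. The sketch below is only an illustration, not the network's real layers: it assumes a hypothetical element-wise residual branch F(x) = w * x, so that the block computes y = F(x) + x, and the names `residual_forward` and `residual_backward` are made up for this example.

```python
import numpy as np

def residual_forward(x, w):
    """Toy residual block: y = F(x) + x, with an assumed branch F(x) = w * x."""
    return w * x + x

def residual_backward(grad_y, w):
    """Gradient of the loss with respect to x, given the gradient with respect to y."""
    grad_branch = w * grad_y   # path through the learned branch F
    grad_shortcut = grad_y     # the sum node copies the gradient unchanged
    return grad_branch + grad_shortcut

# A "dead" branch (all-zero weights) would block the gradient in a plain cascade.
w = np.zeros(3)
grad_y = np.array([0.5, -1.0, 2.0])
grad_x = residual_backward(grad_y, w)
```

With the branch weights at zero, `grad_x` is equal to `grad_y`: the shortcut alone carries the gradient back, whatever the learned branch does.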
Below we show an example of a 34-layer residual net.
The ResidualNet creators showed empirically that it is easier to train a 34-layer residual network than a 34-layer cascaded network (like VGG).
Observe that at the end of the residual net there is only one fully connected layer, preceded by an average pool.
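The tail of the network can be sketched in a few lines of NumPy. The sizes used here (512 feature maps of 7x7, 1000 classes) are assumptions chosen for illustration, and the random weights stand in for trained ones.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Average each channel over its spatial extent: (C, H, W) -> (C,)."""
    return feature_maps.mean(axis=(1, 2))

def fully_connected(x, weights, bias):
    """Single dense layer producing one score per class."""
    return weights @ x + bias

rng = np.random.default_rng(0)
features = rng.standard_normal((512, 7, 7))       # assumed output of the last block
weights = rng.standard_normal((1000, 512)) * 0.01  # assumed 1000 classes
bias = np.zeros(1000)

pooled = global_average_pool(features)             # shape (512,)
scores = fully_connected(pooled, weights, bias)    # shape (1000,)
```

Because the pooling collapses every feature map to a single number, the only dense layer needs 512 x 1000 weights instead of 512 x 7 x 7 x 1000.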

Residual Block

At its core, the residual net is formed by the following structure.
Basically, this jump and adder create a path for back-propagation, allowing even really deep models to be trained.
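A minimal forward pass of such a block, assuming two 3x3 convolutions with ReLU and no batch-norm, could look like the sketch below. The helper names (`conv3x3`, `residual_block`) and the tensor sizes are hypothetical, and the convolution is a deliberately naive NumPy version rather than what a framework would run.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv3x3(x, w):
    """Naive 3x3 convolution, stride 1, zero padding 1.
    x: (C_in, H, W), w: (C_out, C_in, 3, 3) -> (C_out, H, W)."""
    c_out = w.shape[0]
    _, h, width = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_out, h, width))
    for i in range(3):
        for j in range(3):
            # Contribution of kernel tap (i, j), mixed across input channels.
            out += np.einsum('oc,chw->ohw', w[:, :, i, j],
                             xp[:, i:i + h, j:j + width])
    return out

def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F is conv -> relu -> conv."""
    branch = relu(conv3x3(x, w1))
    branch = conv3x3(branch, w2)
    return relu(branch + x)   # the jump (shortcut) and the adder

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 5))
w1 = rng.standard_normal((8, 8, 3, 3)) * 0.1
w2 = rng.standard_normal((8, 8, 3, 3)) * 0.1
y = residual_block(x, w1, w2)
```

Note that the addition only works because the branch keeps the same number of channels and the same spatial size as its input.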
As mentioned before, the Batch-Norm block makes the network less sensitive to its initialization, but it can be omitted for not-so-deep models (fewer than 50 layers).
Again, as in GoogLeNet, we must use bottlenecks to avoid a parameter explosion.
Remember that for the bottleneck block to work, the output of the previous layer must have the same depth as the output of the block, so that the two can be added.
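The saving from a bottleneck can be estimated with simple arithmetic. The sketch below compares two plain 3x3 convolutions with a 1x1, 3x3, 1x1 bottleneck; the channel counts (256 outside, 64 inside) are assumptions picked for illustration, and biases are ignored.

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

depth = 256    # assumed depth of the block's input and output
narrow = 64    # assumed depth inside the bottleneck

# Two plain 3x3 convolutions working at full depth.
plain = 2 * conv_params(depth, depth, 3)

# Bottleneck: 1x1 reduces the depth, 3x3 works on the narrow tensor,
# 1x1 restores the depth so the shortcut can be added.
bottleneck = (conv_params(depth, narrow, 1)
              + conv_params(narrow, narrow, 3)
              + conv_params(narrow, depth, 1))

print(plain, bottleneck, plain / bottleneck)
```

Under these assumptions the plain pair needs 1,179,648 weights and the bottleneck 69,632, roughly 17 times fewer, while the input and output depth stay equal.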

Caffe Example

Here we show 2 cascaded residual blocks from a residual net. Due to difficulties with the batch-norm layers, they were omitted, but the residual net still gives good results.