ResNet and the Power of Shortcut Connections in Deep Learning

ResNet, which stands for Residual Networks, is a seminal architecture in the field of deep learning and computer vision. It was introduced by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun in their 2015 paper "Deep Residual Learning for Image Recognition." The main innovation of ResNet is the introduction of "skip connections" or "shortcut connections," which allow the gradient to be directly backpropagated to earlier layers.

Introduction

In traditional deep neural networks, adding more layers should ideally lead to better performance. In practice, however, researchers found before ResNet that deeper networks were often harder to train because of vanishing gradients: as the gradient of the loss is backpropagated through layer after layer, it can shrink until the weights of the earliest layers barely update at all.
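
To get a rough feel for why this happens, consider a back-of-the-envelope sketch in plain Python. The per-layer factor of 0.5 is an assumed value purely for illustration; the point is only that multiplying many sub-unit factors together drives the gradient reaching the first layers toward zero.

```python
# Rough, hypothetical illustration of vanishing gradients.
# Backpropagation multiplies one gradient factor per layer (chain rule);
# if each factor is below 1, the product shrinks geometrically with depth.
per_layer_factor = 0.5  # assumed value, purely for illustration

for depth in (5, 20, 50):
    gradient_scale = per_layer_factor ** depth
    print(f"{depth:>2} layers -> gradient scale ~ {gradient_scale:.1e}")

#  5 layers -> ~3e-02
# 20 layers -> ~1e-06
# 50 layers -> ~9e-16  (far too small to update the earliest layers)
```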

ResNet was designed to address this problem. Its architecture includes shortcut connections that skip one or more layers. Instead of asking a stack of layers to learn the desired mapping H(x) directly, ResNet has it learn the residual F(x) = H(x) − x and adds the input back, so the block outputs F(x) + x. If the extra layers are not useful, the network can simply push the residual toward zero and recover the identity mapping, which is why adding more layers should not harm the network's performance.
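
A minimal sketch of that idea in plain Python, with `residual_fn` as a hypothetical placeholder for the stacked layers:

```python
def residual_block_output(x, residual_fn):
    # residual_fn stands in for the stacked layers: it learns F(x) = H(x) - x.
    # The block then outputs F(x) + x.
    return x + residual_fn(x)

# If the extra layers learn nothing useful (F(x) = 0), the block reduces to
# the identity mapping, so added depth does not degrade what came before.
assert residual_block_output(3.0, lambda x: 0.0) == 3.0
```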

Residual Block

A key component of a ResNet is the residual block. A residual block stacks a few convolutional layers, each typically followed by batch normalization and a ReLU activation, together with a shortcut connection that bypasses them.

The output of a residual block is computed by adding the input to the output of the convolutional layers; the sum is then passed through a final ReLU activation. Because of the shortcut connection, the convolutional layers only need to learn the residual function, the difference between the desired output and the input, rather than the whole mapping.
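
As a concrete sketch, here is what a basic residual block might look like in PyTorch (PyTorch is an assumption here; details such as batch normalization placement follow the common conv-BN-ReLU pattern, and the channel count is kept fixed so no projection shortcut is needed):

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """A minimal two-layer residual block: out = ReLU(F(x) + x)."""

    def __init__(self, channels):
        super().__init__()
        # F(x): two 3x3 convolutions, each followed by batch normalization.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        # Shortcut connection: add the input back before the final ReLU.
        return self.relu(residual + x)

# Quick shape check with a dummy batch.
block = BasicResidualBlock(channels=64)
y = block(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 64, 32, 32])
```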

Variants of ResNet

Several variants of ResNet have been proposed since its introduction, including ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152. The number refers to the count of weight layers (convolutional and fully connected) in the network; ResNet-50, for instance, has 49 convolutional layers plus one fully connected layer.

These variants differ primarily in their depth and in the type of block they use: ResNet-18 and ResNet-34 are built from basic two-convolution blocks, while ResNet-50 and deeper use three-layer "bottleneck" blocks. In general, deeper networks have more capacity to learn complex patterns, but they are also more prone to overfitting and more expensive to train.
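
If torchvision is available (an assumption, and the `weights` argument follows its newer API), the standard variants can be instantiated directly and their sizes compared:

```python
from torchvision import models

# Instantiate untrained ResNet variants and compare their parameter counts.
for name, builder in [("ResNet-18", models.resnet18),
                      ("ResNet-50", models.resnet50),
                      ("ResNet-152", models.resnet152)]:
    model = builder(weights=None)  # random initialization, no download
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```

On a recent torchvision release this prints roughly 11.7M, 25.6M, and 60.2M parameters for ResNet-18, ResNet-50, and ResNet-152 respectively, which gives a sense of how quickly cost grows with depth.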

Conclusion

ResNet has had a significant impact on the field of deep learning and computer vision. By introducing the concept of residual learning, it has enabled the training of very deep neural networks, leading to improved performance on a range of tasks. Its architecture has been widely adopted and adapted in many subsequent models, cementing its place as a key innovation in the field.
