Deep Learning: Using ResNets for Transfer Learning
In traditional supervised learning, we are given a labelled dataset from a particular domain, train a model on it, and evaluate the model on data from that same domain. For example, a model trained to detect pedestrians during the day will not detect pedestrians well at night. Although the problem is the same, the domains are different, so a model trained in one domain cannot simply be reused in the other, even when the two are similar. For these kinds of problems, we use something called “Transfer Learning”.
What is Transfer Learning?
Transfer learning means using a network that has already been pre-trained on a larger dataset as the starting point for your own data. In the figure below, you can see a model that was trained on a huge image dataset (ImageNet) being reused on new data with new classes, with its weights updated.
The advantages of using transfer learning are:
- It gives faster progress during training
- It can be trained with a smaller amount of data
When do you apply Transfer Learning?
- When there is a huge amount of similar data in another domain, but only a little data in the current domain
- When you have enough data for training, but not enough computational resources to train a model from scratch
How does Transfer Learning Work?
As we discussed for Convolutional Neural Networks, CNNs extract important features from an image. When a neural network learns to detect an object or classify an image, that knowledge is stored as weights throughout the network. We can reuse these weights on new data to classify new images, and this is the whole idea behind transfer learning. When you have new data and use a pre-trained network, you only have to update the weights and fine-tune the model to make it work, as in the sketch below.
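To make this concrete, here is a minimal sketch (assuming PyTorch and torchvision are installed) that loads a ResNet-50 pre-trained on ImageNet and reuses its learned weights as a generic feature extractor; the random batch of images is just a stand-in for your own data.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Load a ResNet-50 pre-trained on ImageNet; its weights already encode
# generic visual features learned from a large labelled dataset.
resnet = models.resnet50(pretrained=True)
resnet.eval()

# Drop the final classification layer so the network outputs a
# 2048-dimensional feature vector instead of 1000 ImageNet class scores.
feature_extractor = nn.Sequential(*list(resnet.children())[:-1])

# A dummy batch standing in for your own images (3-channel, 224x224).
images = torch.randn(4, 3, 224, 224)

with torch.no_grad():
    features = feature_extractor(images).flatten(1)

print(features.shape)  # torch.Size([4, 2048])
```

These feature vectors can then be fed to a small classifier trained on your own labels.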
Generally, the first layers of a CNN extract generalized features that apply to every image; only the last few layers distinguish between the specific classes. So when we apply transfer learning, we have two options (see the sketch after this list):
- Freeze the weights and biases of the initial layers and train only the last few layers (the dense, fully connected layers and their activation functions). In this case you don’t need to re-train the whole model.
- Re-train the whole network, initializing it from the learned weights and biases. While doing this, we keep the learning rate very low so that we don’t deviate drastically from the original weights.
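Here is a minimal sketch of both options, again assuming torchvision’s pre-trained ResNet-50 as the base network; `NUM_CLASSES` is a hypothetical placeholder for the number of classes in your new dataset.

```python
import torch
import torch.nn as nn
import torchvision.models as models

NUM_CLASSES = 10  # hypothetical number of classes in the new dataset

# ---- Option 1: freeze the early layers, train only a new head ----
model = models.resnet50(pretrained=True)
for param in model.parameters():
    param.requires_grad = False          # freeze every pre-trained weight

# Replace the final fully connected layer with one sized for our classes;
# its weights are newly initialized and therefore trainable.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Only the parameters of the new head are given to the optimizer.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)

# ---- Option 2: fine-tune the whole network from the learned weights ----
model_ft = models.resnet50(pretrained=True)
model_ft.fc = nn.Linear(model_ft.fc.in_features, NUM_CLASSES)

# All parameters are trainable, but the learning rate is kept very small
# so that the pre-trained weights are not destroyed during fine-tuning.
optimizer_ft = torch.optim.SGD(model_ft.parameters(), lr=1e-4, momentum=0.9)
```

Either model can then be trained with a standard classification loss such as `nn.CrossEntropyLoss()` on your own data loader.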
ImageNet:
ImageNet is an image database; the subset used for the challenge contains roughly 1.2 million training images across 1,000 classes. Training a model on it from scratch takes several days even on the best GPUs. The ImageNet project runs an annual software contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where programs compete to correctly classify and detect objects and scenes. All of the following CNN architectures were trained on the ImageNet database.
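Because these networks were trained on ImageNet, new images fed to them are normally preprocessed with ImageNet’s per-channel statistics. A minimal sketch using torchvision’s standard transforms:

```python
from torchvision import transforms

# Standard preprocessing for models pre-trained on ImageNet:
# resize, centre-crop to 224x224, convert to a tensor, and normalize
# with ImageNet's per-channel mean and standard deviation.
imagenet_preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```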
Available Transfer Learning Architectures:
- AlexNet- In 2012, there was a major breakthrough in the ILSVRC contest when the top scorer reduced the error rate from 26% to 15.3%. It used 11x11, 5x5 and 3x3 convolutions, max pooling, dropout, data augmentation, ReLU activations and SGD with momentum for training the model.
- VGGNet- In 2014, VGGNet was the runner-up in the ILSVRC competition. It consists of 16 convolutional layers and uses only 3x3 convolutions. With about 138 million trainable parameters, the model is very large. VGGNet is one of the most widely used architectures for a variety of applications.
- GoogLeNet- In 2014, GoogLeNet won the ILSVRC competition with an error rate of 6.7%. Its architecture is a 22-layer CNN, yet it reduced the number of parameters from 60 million (AlexNet) to about 4 million. This was a great breakthrough, since it cut the trainable parameters to roughly 6% of AlexNet’s.
- ResNet- In ILSVRC 2015, the Residual Neural Network (ResNet) used skip connections and heavy use of batch normalization to build a 152-layer neural network that reduced the error rate to 3.57%, beating human-level performance on the dataset.
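All four of these architectures ship with ImageNet weights in torchvision; a minimal sketch of how they can be loaded (the variable names are just illustrative):

```python
import torchvision.models as models

# Each of the architectures above is available pre-trained on ImageNet.
alexnet   = models.alexnet(pretrained=True)
vgg16     = models.vgg16(pretrained=True)
googlenet = models.googlenet(pretrained=True)
resnet152 = models.resnet152(pretrained=True)

# Any of them can be fine-tuned exactly as shown earlier; only the name of
# the final layer differs (e.g. `classifier` for VGG, `fc` for ResNet).
```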
Conclusion:
All of the above pre-trained neural network architectures can be useful for a wide variety of applications, especially when classifying new image data. For example, the Google Landmark Recognition and Google Landmark Retrieval challenges on Kaggle are good candidates for transfer learning architectures. Hope this blog was useful to you. Let me know your comments or questions below.
Thanks for reading! :) You can also contact me on LinkedIn!