Data Science
Using Pre-Trained Models Effectively
Prashant Brahmbhatt
November 16, 2020
4 min

If you have been using deep learning in your applications and problem solving for a while, especially with transfer learning, chances are strong that at some point you have used a pre-trained model to solve your specific problem. Be it face recognition, object recognition, or some image classification problem you have on your hands, you must have tried an existing architecture trained on some popular dataset.

Pre-trained models are very useful because of the research that goes into developing their architectures and the high-compute training on datasets so humongous that you may never be able to train such a model on your own machine. Sometimes the abilities of a pre-trained model are overkill for the problem you are trying to solve, but the architecture still seems useful, or better than anything you could come up with on your own.

Well, what can you do about it?

The answer is pretty simple: you channel the power of the pre-trained model to your own requirements, unlocking only as much of its potential as you need. If that is something you are not aware of, then stay tuned till the end!


Besides, there are some downsides to using a pre-trained model if you don’t regulate its capacity for your specific problem. We will discuss them later on.

Freezing and Unfreezing a Model

This is the key: through the process of freezing and unfreezing a model, you can take control of which layers you want to train and the number of parameters linked to them.

We will be using Keras’s functional API for our demonstration here rather than the Sequential one (if you don’t have a clue what we’re talking about, you may want to head here), so let’s take the example of ResNet for solving some hypothetical multi-class image classification problem (say, 4 classes).

If you initialize a base model with the ResNet50 that is available in the Keras library, you will find that it has 175 layers and about 23 million parameters, and we may not want to train all of those parameters for our problem.

Let’s get the imports in place.
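A minimal set, assuming TensorFlow 2.x with its bundled Keras:

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Input, Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
```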

Now we can define a base model.
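Something along these lines; the 224×224×3 input shape is an illustrative choice:

```python
# ResNet50 pre-trained on ImageNet, without its original classifier head
base_model = ResNet50(
    weights="imagenet",
    include_top=False,
    input_shape=(224, 224, 3),
)

print(len(base_model.layers))     # 175 layers
print(base_model.count_params())  # 23,587,712 parameters
```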

Now, each Keras model has two attributes, namely .trainable and .layers. We can use these attributes to check whether the complete model, or some specific layer, is trainable (unfrozen) or not (frozen).

We can use these attributes with some sophistication to make it work better. Have a look at the code below.
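Here is a minimal sketch of such a class; the name ResNetClassifier and its constructor defaults are illustrative:

```python
class ResNetClassifier:
    def __init__(self, num_classes=4, input_shape=(224, 224, 3)):
        self.num_classes = num_classes
        self.input_shape = input_shape

    def get_model(self, unfreeze_layers=0):
        base_model = ResNet50(
            weights="imagenet",
            include_top=False,
            input_shape=self.input_shape,
        )
        # Freeze every layer of the base model first...
        for layer in base_model.layers:
            layer.trainable = False
        # ...then unfreeze the requested number of layers from the end
        if unfreeze_layers > 0:
            for layer in base_model.layers[-unfreeze_layers:]:
                layer.trainable = True

        # Functional API: pool the base features, attach our own output layer
        inputs = Input(shape=self.input_shape)
        x = base_model(inputs)
        x = GlobalAveragePooling2D()(x)
        outputs = Dense(self.num_classes, activation="softmax")(x)
        return Model(inputs, outputs)
```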

The class above exposes an unfreeze_layers parameter in .get_model(), built on those two attributes, which lets us define the number of layers we want the model to train when we fit it on our data. The parameters in the rest of the layers are not affected during training at all.

Now take a look at the summary.
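Trimming the per-layer table, the tail of the summary for this sketch looks something like this:

```python
model = ResNetClassifier(num_classes=4).get_model(unfreeze_layers=0)
model.summary()
# ...
# Total params: 23,595,908
# Trainable params: 8,196
# Non-trainable params: 23,587,712
```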

We can clearly see that there are only 8,196 trainable parameters left in the model, which are the ones relevant to the 4 neurons in the output layer.

But now the question arises: would such a model, with only the output layer trainable, work for any practical problem? In all likelihood, no!

So what should we do? As you may have guessed, one way is to start unfreezing a few layers at a time and get some more parameters trained. But before that, we can try something else: rather than unlocking the base model’s capacity, we can add a couple of extra layers after it. The reasoning is that we want the model to learn some completely new features specific to our problem, whereas unfreezing layers would be more like fine-tuning the already-trained weights of the base model. So let’s do that.
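One way to do that, sticking with the sketch above; the Dense layer of 256 units is an assumed value here, chosen because it reproduces the parameter counts discussed below:

```python
# Same head as before, but with an extra Dense layer before the output
for layer in base_model.layers:
    layer.trainable = False

inputs = Input(shape=(224, 224, 3))
x = base_model(inputs)
x = GlobalAveragePooling2D()(x)
x = Dense(256, activation="relu")(x)         # newly added, trainable
outputs = Dense(4, activation="softmax")(x)  # output layer for our 4 classes
model = Model(inputs, outputs)
```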

Now let’s take a look at the summary again.
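Again trimming the per-layer table, the tail of the summary should now read roughly:

```python
model.summary()
# ...
# Trainable params: 525,572     (524,544 from Dense(256) + 1,028 from Dense(4))
# Non-trainable params: 23,587,712
```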

If we observe the parameters now, they have increased from 8,196 to 525,572 because of the new layers we have added. If we train now, the base model remains as it is, but the added layers get trained.

There could still be a scenario where we feel our problem is more complex and the current parameters are not sufficient to capture the required features. In that case, we can try unfreezing some of the layers from the latter part of the base model.

In our class, the unfreeze_layers parameter does just that: it unfreezes the specified number of layers counted from the end of the base model.
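For example, with the sketch class from earlier (ten is an arbitrary, illustrative value):

```python
# Keep the last 10 layers of the base model trainable
model = ResNetClassifier(num_classes=4).get_model(unfreeze_layers=10)
model.summary()  # trainable params now also include the unfrozen base layers
```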

So now we should have some of the layers unlocked for training and some additional trainable parameters.

So we have a model whose capacity is matched to the complexity of the problem we are trying to solve.

While unfreezing layers of the base model, make sure the layers you are unfreezing have some computational importance, like a Conv2D or Conv3D layer in a CNN. If you end up unfreezing 2 layers and those turn out to be some final activation or dropout layers, you may not be unlocking any capacity at all.
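A quick sanity check before picking a number is to print the name, type, and parameter count of the last few layers:

```python
# Layers with zero parameters (activations, dropout) add no trainable capacity
for layer in base_model.layers[-5:]:
    print(layer.name, type(layer).__name__, layer.count_params())
```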

Limitations of Using Unregulated Pre-Trained Models

Now, coming back to the point: why would we want to use a regulated pre-trained model and not the model at full capacity? There are a couple of reasons.

  • As you may have guessed, the model may overfit the data we have for the problem if its capacity is too large. Our data may simply not be enough to fill all that capacity.

  • If our dataset is not large enough for a model of such capacity, we are also wasting computational resources.

  • One key point is that pre-trained models are trained on large datasets. If you are using a model for transfer learning, where you want to use its previous knowledge, training it fully unfrozen would mess up the trained weights, and it may even lose most of the knowledge it gained from the large dataset, knowledge that could have served your problem if it is a similar one. For example, if you are solving a car classification problem, you would not want your model to learn from scratch what wheels or doors are, things it may already have learned from trucks and the like; you just want to tune it to suit cars better.

So that’s it for this one! You can find all the snippets and code for this post here.

This is something that I learned while working on a recent image classification problem and I hope this helps you as well with your transfer learning.

Leave any suggestions or feedback if you like. Also if you are stuck somewhere, do reach out to me. Remember…

“Help will always be given, to those who ask for it!”

Until next time! Ciao!


Tags

Deep Learning, Neural Networks, Data Science, Transfer Learning
