Data Science
Using Pre-Trained Models Effectively
Prashant Brahmbhatt
November 16, 2020
4 min

If you have been using deep learning in your applications and problem solving for a while, especially with transfer learning, chances are strong that at some point you have used a pre-trained model to solve your specific problem. Be it face recognition, object recognition, or some image classification problem you have on your hands, you must have tried an existing architecture trained on some popular dataset.

Pre-trained models are very useful because of the research that goes into developing their architectures and the high-compute training on datasets so humongous that you may never be able to train such a model on your own machine. Sometimes the abilities of a pre-trained model are overkill for the problem you are trying to solve, but the architecture still seems useful, or better than anything you could come up with on your own.

Well, what can you do about it?

The answer is pretty simple: you channel the power of the pre-trained model to your own requirements, unlocking only as much of its potential as you need. If that is something you are not aware of, then stay tuned till the end!


Besides, there are some downsides to using a pre-trained model if you don’t regulate its capacity for your specific problem. We will discuss them later on.

Freezing and Unfreezing a Model

This is the key: through the process of freezing and unfreezing a model, you can take control of which layers you want to train and the number of parameters linked to them.

We will be using Keras’s functional API for our demonstration here rather than the Sequential one (if you don’t have a clue what we’re talking about, you may want to head here), so let’s take the example of ResNet for solving some hypothetical multi-class image classification problem (say, 4 classes).

If you initialize a base model with the ResNet50 that is available in the Keras library, you will find that it has 175 layers and about 23 million parameters, and we may not want to train all of those parameters for our problem.

Let’s get the imports in place.
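A minimal set, assuming TensorFlow 2.x with its bundled Keras:

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Input, Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
```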

Now we can define a base model.
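Something along these lines; the 224×224×3 input shape is an illustrative choice:

```python
# ResNet50 pre-trained on ImageNet, without its original classifier head
base_model = ResNet50(
    weights="imagenet",
    include_top=False,
    input_shape=(224, 224, 3),
)

print(len(base_model.layers))     # 175 layers
print(base_model.count_params())  # 23,587,712 parameters
```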

Now, each Keras model has two attributes, namely .trainable and .layers. We can use these attributes to check whether the complete model, or some specific layer, is trainable (unfrozen) or not (frozen).

We can use these attributes with some sophistication to make it work better. Have a look at the code below.
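Here is a minimal sketch of such a class; the name ResNetClassifier and its constructor defaults are illustrative:

```python
class ResNetClassifier:
    def __init__(self, num_classes=4, input_shape=(224, 224, 3)):
        self.num_classes = num_classes
        self.input_shape = input_shape

    def get_model(self, unfreeze_layers=0):
        base_model = ResNet50(
            weights="imagenet",
            include_top=False,
            input_shape=self.input_shape,
        )
        # Freeze every layer of the base model first...
        for layer in base_model.layers:
            layer.trainable = False
        # ...then unfreeze the requested number of layers from the end
        if unfreeze_layers > 0:
            for layer in base_model.layers[-unfreeze_layers:]:
                layer.trainable = True

        # Functional API: pool the base features, attach our own output layer
        inputs = Input(shape=self.input_shape)
        x = base_model(inputs)
        x = GlobalAveragePooling2D()(x)
        outputs = Dense(self.num_classes, activation="softmax")(x)
        return Model(inputs, outputs)
```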

The class above exposes an unfreeze_layers parameter in .get_model(), built on those two attributes, which lets us define the number of layers we want the model to train when we fit it on our data. The parameters in the rest of the layers are not affected during training at all.

Now take a look at the summary.
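Trimming the per-layer table, the tail of the summary for this sketch looks something like this:

```python
model = ResNetClassifier(num_classes=4).get_model(unfreeze_layers=0)
model.summary()
# ...
# Total params: 23,595,908
# Trainable params: 8,196
# Non-trainable params: 23,587,712
```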

We can clearly see that there are only 8,196 trainable parameters left in the model, which are the ones relevant to the 4 neurons in the output layer.

But now the question arises: would such a model, with only the output layer trainable, work for any practical problem? In all likelihood, no!

So what should we do? As you may have guessed, one way is to start unfreezing a few layers at a time and get some more parameters trained. But before that, we can try something else: rather than unlocking the base model’s capacity, we can add a couple of extra layers after it. The reasoning is that we want the model to learn some completely new features specific to our problem, whereas unfreezing layers would be more like fine-tuning the already-trained weights of the base model. So let’s do that.
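One way to do that, sticking with the sketch above; the Dense layer of 256 units is an assumed value here, chosen because it reproduces the parameter counts discussed below:

```python
# Same head as before, but with an extra Dense layer before the output
for layer in base_model.layers:
    layer.trainable = False

inputs = Input(shape=(224, 224, 3))
x = base_model(inputs)
x = GlobalAveragePooling2D()(x)
x = Dense(256, activation="relu")(x)         # newly added, trainable
outputs = Dense(4, activation="softmax")(x)  # output layer for our 4 classes
model = Model(inputs, outputs)
```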

Now let’s take a look at the summary again.
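Again trimming the per-layer table, the tail of the summary should now read roughly:

```python
model.summary()
# ...
# Trainable params: 525,572     (524,544 from Dense(256) + 1,028 from Dense(4))
# Non-trainable params: 23,587,712
```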

If we observe the parameters now, they have increased from 8,196 to 525,572 because of the new layers we have added. If we train now, the base model remains as it is, but the added layers get trained.

There could still be a scenario where we feel our problem is more complex and the current parameters are not sufficient to capture the required features. In that case, we can try unfreezing some of the layers from the latter part of the base model.

In our class, the unfreeze_layers parameter does just that: it unfreezes the specified number of layers counted from the end of the base model.
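For example, with the sketch class from earlier (ten is an arbitrary, illustrative value):

```python
# Keep the last 10 layers of the base model trainable
model = ResNetClassifier(num_classes=4).get_model(unfreeze_layers=10)
model.summary()  # trainable params now also include the unfrozen base layers
```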

So now we should have some of the layers unlocked for training and some additional trainable parameters.

So we have a model whose capacity is matched to the complexity of the problem we are trying to solve.

While unfreezing layers of the base model, make sure the layers you are unfreezing have some computational importance, like a Conv2D or Conv3D layer in a CNN. If you end up unfreezing 2 layers and those turn out to be some final activation or dropout layers, you may not be unlocking any capacity at all.
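A quick sanity check before picking a number is to print the name, type, and parameter count of the last few layers:

```python
# Layers with zero parameters (activations, dropout) add no trainable capacity
for layer in base_model.layers[-5:]:
    print(layer.name, type(layer).__name__, layer.count_params())
```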

Limitations of Using Unregulated Pre-Trained Models

Now, coming back to the point: why would we want to use a regulated pre-trained model and not the model at full capacity? There are a couple of reasons.

  • As you may have guessed, the model may overfit the data we have for the problem if its capacity is too large. Our data may simply not be enough to fill all that capacity.

  • If our dataset is not large enough for a model of such capacity, we are also wasting computational resources.

  • One key point is that pre-trained models are trained on large datasets. If you are using a model for transfer learning, where you want to use its previous knowledge, training it fully unfrozen would mess up the trained weights, and it may even lose most of the knowledge it gained from the large dataset, knowledge that could have served your problem if it is a similar one. For example, if you are solving a car classification problem, you would not want your model to learn from scratch what wheels or doors are, things it may already have learned from trucks and the like; you just want to tune it to suit cars better.

So that’s it for this one! You can find all the snippets and code for this post here.

This is something that I learned while working on a recent image classification problem and I hope this helps you as well with your transfer learning.

Leave any suggestions or feedback if you like. Also if you are stuck somewhere, do reach out to me. Remember…

“Help will always be given, to those who ask for it!”

Until next time! Ciao!


Tags

Deep Learning, Neural Networks, Data Science, Transfer Learning
