As a deep learning practitioner, I have trained neural network models countless times, whether for work or for personal projects. Training seems like a simple process once you have settled on the data and the architecture. But while working with a senior data scientist, I learned a simple yet effective and intuitive method for training neural networks. The approach is also hiding in plain sight if you leaf through the Keras documentation (original source), but I am going to make it quick and save you some time.
If you are not familiar with freezing and unfreezing the layers of a model, you may want to head here first; it is relevant to this post and comes in handy. Anyway, let's get started.
When working with a neural network, we usually set up our dataset and start training the model on it right away; we all know that part. There is, however, an important point to keep in mind during this first phase. While training the model, keep the learning rate high: high enough for the model to learn quickly and pick up as much information as it can without overfitting. If you have no idea where to begin, something like 1e-3 is a good starting point, and you can then assess how much to adjust it. The exception is when you are training a pre-trained model with all of its layers unfrozen; in that case you may want to avoid a high learning rate, since large updates can overwrite what the model already knows. When this training is over, you have a model that has learned a lot, and you might think this is it. But there is still something more you can do. That is where phase 2, fine-tuning, comes in.
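To make the "learn quickly" point concrete, here is a minimal sketch in plain Python rather than Keras. It runs ordinary gradient descent on a hypothetical one-parameter loss, f(w) = (w - 3)^2, and counts how many updates each learning rate needs to get close to the minimum. The function name, the loss, and the two rates are assumptions chosen for this toy problem; they are not recommendations for real networks.

```python
def steps_to_converge(lr, start=0.0, target=3.0, tol=1e-2, max_steps=100_000):
    """Run gradient descent on f(w) = (w - target)**2 and return the
    number of updates needed to get within `tol` of the target."""
    w = start
    for step in range(1, max_steps + 1):
        grad = 2.0 * (w - target)  # derivative of (w - target)**2
        w -= lr * grad             # one gradient descent update
        if abs(w - target) < tol:
            return step
    return max_steps               # did not converge within the budget


# Toy rates for this toy loss only (hypothetical values).
fast = steps_to_converge(lr=1e-1)
slow = steps_to_converge(lr=1e-3)
print(f"higher rate: {fast} steps, lower rate: {slow} steps")
```

On this toy problem the higher rate reaches the neighbourhood of the minimum in far fewer updates, which is the reason to start the first phase with a relatively high learning rate.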
This phase, fine-tuning, is analogous to finding a channel on the radio. Initially we just try to break through the static; once we get some signal, we move the tuning knob very slowly to land on the exact frequency, which I guess is where the phase gets its name. In fine-tuning we run the model again on the same dataset, but the key difference is that we now keep the learning rate lower than in the training phase. For example, if you trained your model with lr = 1e-3, the fine-tuning phase could be effective with lr = 1e-5. But why? The logic is that once the model is trained, it has a good idea of the data and has learned its macroscopic features. When we re-run the model on the same data, the low learning rate helps keep the model from overfitting, so the weights learned in the first phase change enough but aren't messed up completely. It helps the model learn the finer details in the data.

This phase is especially necessary when using a pre-trained model architecture. With something like Inception or ResNet, we generally train only a few layers of the model in the first phase, so that it keeps its existing knowledge while also learning new features from our data. During fine-tuning we then open up most or all of the layers to adapt every layer to our specific problem.
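The two phases could be sketched in Keras roughly as follows. This is a minimal illustration under stated assumptions, not a definitive recipe: the data is random, the `base` model is a small randomly initialised stand-in for a real pre-trained backbone such as ResNet, and the names `base` and `head`, the layer sizes, and the epoch counts are all hypothetical. The learning rates 1e-3 and 1e-5 are the ones used as examples above.

```python
import numpy as np
from tensorflow import keras

# Hypothetical toy data: 64 samples, 8 features, binary labels.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8)).astype("float32")
y = (x.sum(axis=1) > 0).astype("float32")

# Stand-in for a pre-trained backbone (assumption: in practice this
# would be a model loaded with pre-trained weights).
base = keras.Sequential(
    [
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(16, activation="relu"),
    ],
    name="base",
)
head = keras.layers.Dense(1, activation="sigmoid", name="head")
model = keras.Sequential([keras.Input(shape=(8,)), base, head])

# Phase 1: freeze the base and train only the new head at a higher rate.
base.trainable = False
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="binary_crossentropy",
)
model.fit(x, y, epochs=3, batch_size=16, verbose=0)
n_phase1 = len(model.trainable_weights)  # only the head's kernel and bias

# Phase 2: unfreeze everything and fine-tune at a much lower rate.
# The model is recompiled so the change to `trainable` takes effect.
base.trainable = True
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),
    loss="binary_crossentropy",
)
model.fit(x, y, epochs=3, batch_size=16, verbose=0)
n_phase2 = len(model.trainable_weights)  # head plus both base layers

print(n_phase1, n_phase2)
```

Note that `compile` is called again after changing `trainable`; Keras picks up the new trainable state at compile time, so skipping that step would leave the base frozen during fine-tuning.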
These parameters aren't set in stone: a learning rate of 1e-5 could still make your model overfit during fine-tuning. You will have to experiment with the parameters to find what works for you.
I train all of my deep learning architectures with this approach and it is really a sophisticated and intuitive way to train the models. I hope it helps you with your models as well.
Leave your feedback and any questions that you may have. Until next time, ciao!