learning-rate-techniques-keras

Exploring learning rates to improve model performance

Transfer learning is a proven way to get much better results on computer vision tasks. Most pretrained architectures (ResNet, VGG, Inception, etc.) are trained on ImageNet, and how much their weights need to be altered depends on how similar your data is to the ImageNet images.

Reference

https://towardsdatascience.com/exploring-learning-rates-to-improve-model-performance-in-keras-e37f5e63f16c

Differential Learning

The phrase 'Differential Learning' refers to using different learning rates for different parts of the network.

[Figure: a network split into groups of layers, each group assigned its own learning rate]

When it comes to modifying weights, the last layers of the model usually need the most change, while the earlier layers, which are already well trained at detecting basic features (such as edges and outlines), need less.
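The repository's dlr_implementation.py achieves this by modifying the Adam optimizer itself. As a rough illustration of the idea only (not the repo's code; the layer split, learning rates, and loss are assumptions), the sketch below applies two separate Adam optimizers with different learning rates, one to a pretrained base and one to a freshly added head:

```python
import tensorflow as tf
from tensorflow import keras

# Assumed setup: a pretrained ResNet50 base plus a new classification head.
base = keras.applications.ResNet50(include_top=False, pooling="avg", weights="imagenet")
head = keras.layers.Dense(10, activation="softmax")
model = keras.Sequential([base, head])

# Differential learning rates: a small rate for the well-trained base,
# a larger rate for the randomly initialised head.
slow_opt = keras.optimizers.Adam(learning_rate=1e-5)
fast_opt = keras.optimizers.Adam(learning_rate=1e-3)
loss_fn = keras.losses.SparseCategoricalCrossentropy()

def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    base_vars = base.trainable_variables
    head_vars = head.trainable_variables
    grads = tape.gradient(loss, base_vars + head_vars)
    # Each variable group is updated by its own optimizer.
    slow_opt.apply_gradients(zip(grads[:len(base_vars)], base_vars))
    fast_opt.apply_gradients(zip(grads[len(base_vars):], head_vars))
    return loss
```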

Stochastic Gradient Descent with Restarts

It makes sense to reduce the learning rate as training progresses, so that the algorithm does not overshoot and settles as close to the minimum as possible. With cosine annealing, the learning rate is decreased following a cosine curve.

SGDR is a variant of learning rate annealing introduced by Loshchilov & Hutter in their paper "SGDR: Stochastic Gradient Descent with Warm Restarts". In this technique, the learning rate is suddenly increased from time to time. Below is an example of resetting the learning rate at three evenly spaced intervals with cosine annealing.

[Figure: learning rate reset at three evenly spaced intervals with cosine annealing]

The rationale behind suddenly increasing the learning rate is that, on doing so, gradient descent does not get stuck at a local minimum and may "hop" out of it on its way towards a global minimum. Each time the learning rate drops to its minimum point (every 100 iterations in the figure above), we call this a cycle. The authors also suggest making each new cycle longer than the previous one by some constant factor.

[Figure: each cycle made longer than the previous one by a constant factor]
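The repo implements this schedule as a Keras callback in sgdr_implementation.py. Below is a minimal sketch of such a callback, not the repo's actual code: the class name, parameter names, and default values are assumptions, and it assumes a Keras 2-style optimizer whose lr attribute can be updated with keras.backend.set_value.

```python
import math
from tensorflow import keras

class SGDRScheduler(keras.callbacks.Callback):
    """Cosine annealing with warm restarts, stepped once per epoch (sketch).

    min_lr/max_lr bound the schedule, cycle_length is the length of the first
    cycle in epochs, and mult_factor lengthens each subsequent cycle by a
    constant factor, as the authors suggest.
    """

    def __init__(self, min_lr=1e-5, max_lr=1e-2, cycle_length=10, mult_factor=2.0):
        super().__init__()
        self.min_lr = min_lr
        self.max_lr = max_lr
        self.cycle_length = cycle_length
        self.mult_factor = mult_factor
        self.epoch_in_cycle = 0

    def on_epoch_begin(self, epoch, logs=None):
        # Cosine annealing from max_lr down to min_lr within the current cycle.
        fraction = self.epoch_in_cycle / self.cycle_length
        lr = self.min_lr + 0.5 * (self.max_lr - self.min_lr) * (1 + math.cos(math.pi * fraction))
        keras.backend.set_value(self.model.optimizer.lr, lr)

    def on_epoch_end(self, epoch, logs=None):
        self.epoch_in_cycle += 1
        if self.epoch_in_cycle >= self.cycle_length:
            # Warm restart: jump back to max_lr and stretch the next cycle.
            self.epoch_in_cycle = 0
            self.cycle_length = int(self.cycle_length * self.mult_factor)
```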

Functionality Test

dlr_implementation.py: modified Adam optimizer source code implementing differential learning rates.

sgdr_implementation.py: implementation of SGDR using Keras callbacks.

test.py: trains a ResNet50 model on the CIFAR-10 dataset using both differential learning rates and SGDR.
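For orientation, a hedged sketch of a training script in the spirit of test.py follows. It is illustrative only: preprocessing, batch size, epoch count, and learning rates are assumptions, it reuses the SGDRScheduler sketched above, and it keeps a stock Adam optimizer rather than the modified one from dlr_implementation.py.

```python
from tensorflow import keras

# Load CIFAR-10 and apply ResNet50's expected preprocessing.
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
x_train = keras.applications.resnet50.preprocess_input(x_train.astype("float32"))
x_test = keras.applications.resnet50.preprocess_input(x_test.astype("float32"))

# ResNet50 backbone (CIFAR-10 images are 32x32, the smallest size it accepts)
# with a fresh 10-way classification head.
base = keras.applications.ResNet50(include_top=False, pooling="avg",
                                   weights="imagenet", input_shape=(32, 32, 3))
model = keras.Sequential([base, keras.layers.Dense(10, activation="softmax")])

model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# SGDRScheduler is the callback sketched in the previous section.
model.fit(x_train, y_train, validation_data=(x_test, y_test),
          batch_size=128, epochs=30,
          callbacks=[SGDRScheduler(min_lr=1e-5, max_lr=1e-3, cycle_length=5)])
```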
