
CosineAnnealingLR#

class torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0.0, last_epoch=-1)[source]#

Set the learning rate of each parameter group using a cosine annealing schedule.

The learning rate is updated recursively using:

\eta_{t+1} = \eta_{\min} + (\eta_t - \eta_{\min}) \cdot \frac{1 + \cos\left(\frac{(T_{cur}+1)\pi}{T_{max}}\right)}{1 + \cos\left(\frac{T_{cur}\pi}{T_{max}}\right)}

This implements a recursive approximation of the closed-form schedule proposed in SGDR: Stochastic Gradient Descent with Warm Restarts:

\eta_t = \eta_{\min} + \frac{1}{2}(\eta_{\max} - \eta_{\min})\left(1 + \cos\left(\frac{T_{cur}\pi}{T_{max}}\right)\right)

where:

  • \eta_t is the learning rate at step t

  • T_{cur} is the number of epochs since the last restart

  • T_{max} is the maximum number of epochs in a cycle

Note

Although SGDR includes periodic restarts, this implementation performs cosine annealing without restarts, so T_{cur} = t and increases monotonically with each call to step().
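Because the recursive update only rescales the gap between \eta_t and \eta_{\min}, it reproduces the closed form exactly, up to floating-point error. A minimal plain-Python sanity check, using illustrative values for eta_max, eta_min, and T_max:

>>> import math
>>> eta_max, eta_min, T_max = 0.1, 0.001, 100  # illustrative values
>>> def closed_form(t):
>>>     # eta_t = eta_min + 1/2 (eta_max - eta_min) (1 + cos(t pi / T_max))
>>>     return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(t * math.pi / T_max))
>>> eta = eta_max  # eta_0 starts at the initial learning rate
>>> for t in range(T_max):
>>>     # recursive update: depends only on the previous learning rate
>>>     ratio = (1 + math.cos((t + 1) * math.pi / T_max)) / (1 + math.cos(t * math.pi / T_max))
>>>     eta = eta_min + (eta - eta_min) * ratio
>>>     assert math.isclose(eta, closed_form(t + 1), abs_tol=1e-12)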

Parameters
  • optimizer (Optimizer) – Wrapped optimizer.

  • T_max (int) – Maximum number of iterations.

  • eta_min (float) – Minimum learning rate. Default: 0.0.

  • last_epoch (int) – The index of the last epoch. Default: -1.

Example

>>> num_epochs = 100
>>> scheduler = CosineAnnealingLR(optimizer, T_max=num_epochs)
>>> for epoch in range(num_epochs):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()
[Figure: learning rate as a function of epoch under CosineAnnealingLR]
get_last_lr()[source]#

Return the last learning rate computed by the current scheduler.

Return type

list[float]
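For instance, get_last_lr() is handy for logging the annealed rate once per epoch. A minimal sketch in the spirit of the example above, where model and train(...) are assumed placeholders:

>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
>>> scheduler = CosineAnnealingLR(optimizer, T_max=100)
>>> for epoch in range(100):
>>>     train(...)
>>>     scheduler.step()
>>>     # one entry per parameter group; a single group here
>>>     print(f"epoch {epoch}: lr = {scheduler.get_last_lr()[0]:.6f}")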

get_lr()[source]#

Retrieve the learning rate of each parameter group.

Return type

list[float]

load_state_dict(state_dict)[source]#

Load the scheduler's state.

Parameters

state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().

state_dict()[source]#

Return the state of the scheduler as a dict.

It contains an entry for every variable in self.__dict__ which is not the optimizer.

Return type

dict[str, Any]
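Used together, state_dict() and load_state_dict() let a resumed run pick up at the same point in the cosine cycle. A minimal sketch; the checkpoint path is illustrative, and optimizer and scheduler are assumed to have been rebuilt with the same arguments before loading:

>>> torch.save({
>>>     "optimizer": optimizer.state_dict(),
>>>     "scheduler": scheduler.state_dict(),  # excludes the wrapped optimizer
>>> }, "checkpoint.pt")
>>> # later: restore both states so the schedule resumes where it left off
>>> checkpoint = torch.load("checkpoint.pt")
>>> optimizer.load_state_dict(checkpoint["optimizer"])
>>> scheduler.load_state_dict(checkpoint["scheduler"])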

step(epoch=None)[source]#

Perform a step.