Milind Deore
1 min read · Apr 16, 2020

One very important step is missing from the training cycle: clearing the gradients by calling zero_grad() once per iteration, either before backward() or after the optimizer step that follows it.

zero_grad() clears the old gradients left over from the previous step; otherwise the gradients from all backward() calls would simply accumulate.
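To make the accumulation concrete, here is a minimal sketch in PyTorch. The toy weight `w`, input `x`, the helper `loss_fn`, and the small `Linear` model with its data are all hypothetical and chosen only for illustration; they are not taken from the original training code.

```python
import torch

# Hypothetical toy setup: one weight and one fixed input.
w = torch.tensor([1.0], requires_grad=True)
x = torch.tensor([2.0])

def loss_fn():
    # loss = (w * x)^2, so d(loss)/dw = 2 * w * x^2 = 8 for w = 1, x = 2
    return ((w * x) ** 2).sum()

# Two backward() calls without clearing: the gradients add up (8 + 8).
loss_fn().backward()
loss_fn().backward()
accumulated = w.grad.clone()

# Clear the stored gradient, then call backward() once more:
# only the fresh gradient (8) remains.
w.grad.zero_()
loss_fn().backward()
fresh = w.grad.clone()

# The same idea in an ordinary training loop (illustrative model and data).
torch.manual_seed(0)
model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.tensor([[1.0], [2.0]])
targets = torch.tensor([[2.0], [4.0]])

losses = []
for _ in range(3):
    optimizer.zero_grad()   # clear gradients from the previous iteration
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    loss.backward()         # compute gradients for this iteration only
    optimizer.step()        # update parameters using those gradients
    losses.append(loss.item())
```

Calling `optimizer.zero_grad()` at the top of the loop is one common placement; calling it right after `optimizer.step()` has the same effect. What matters is that it does not sit between `backward()` and `step()`, where it would wipe the gradients before they are used.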
