Apr 16, 2020
One very important step is missing from the training cycle: clearing the gradients by calling zero_grad() once the backward() call and the optimizer step are done, so that they are reset before the next backward() call.
zero_grad() clears the old gradients from the previous step; otherwise the gradients from all backward() calls would simply accumulate.
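To make the accumulation concrete, here is a minimal PyTorch sketch. The single weight `w`, the input `x`, and the SGD learning rate are toy values assumed purely for illustration; they do not come from the original training code.

```python
import torch

# Hypothetical toy setup: one trainable weight and one fixed input.
w = torch.tensor([1.0], requires_grad=True)
x = torch.tensor([2.0])
optimizer = torch.optim.SGD([w], lr=0.1)

# Without zero_grad(): two backward() calls add their gradients together.
(w * x).sum().backward()
(w * x).sum().backward()
accumulated_grad = w.grad.clone()  # d(w*x)/dw = 2.0, summed over two calls

# With zero_grad(): the old gradient is cleared before the next backward().
optimizer.zero_grad()
(w * x).sum().backward()
fresh_grad = w.grad.clone()  # gradient of this backward() call only

# The update now uses only the fresh gradient.
optimizer.step()

print(accumulated_grad, fresh_grad, w)
```

In a real loop the usual order per batch is zero_grad(), forward pass, backward(), then step(). Calling zero_grad() right after step() has the same effect, as long as it never falls between backward() and step().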