Even though the gradient norm stays below max_grad_norm throughout training, the gradients still seem to go through a scaling step. For instance, I set max_grad_norm = 1, and grad_norm consistently stays <= 0.33.
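To make the question concrete, here is a minimal sketch of norm-based clipping in plain Python. It assumes the common scheme (the one PyTorch's `clip_grad_norm_` is documented to follow) where the scale factor is `max_norm / (total_norm + eps)` clamped to at most 1. The function name `clip_by_global_norm`, the `eps` value, and the sample gradients are illustrative assumptions, not taken from any particular training setup.

```python
import math

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale a flat list of gradient values so their L2 norm is at most max_norm.

    Hypothetical helper for illustration; returns (scaled_grads, total_norm, clip_coef).
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # The coefficient is clamped to 1.0, so gradients are never scaled up.
    clip_coef = min(1.0, max_norm / (total_norm + eps))
    # The multiplication runs unconditionally; with clip_coef == 1.0 it is a no-op.
    return [g * clip_coef for g in grads], total_norm, clip_coef

# Case from the question: norm of about 0.33 with max_grad_norm = 1.
small = [0.2, -0.2, 0.17]
small_scaled, small_norm, small_coef = clip_by_global_norm(small, max_norm=1.0)

# Contrast case: norm of 5.0, which does get scaled down.
large = [3.0, 4.0]
large_scaled, large_norm, large_coef = clip_by_global_norm(large, max_norm=1.0)
large_scaled_norm = math.sqrt(sum(g * g for g in large_scaled))
```

Under these assumptions, the scaling step still executes when the norm is below the threshold, but the factor is 1.0 and the values come out unchanged. If the gradients really do change in that situation, the scaling is likely coming from somewhere else (for example loss scaling in mixed precision or gradient accumulation), which is worth checking separately.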