Abide by the law and follow the flow: conservation laws for gradient flows
Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré
Think of neural network training as a dynamical system obeying laws analogous to classical mechanics. The loss function L plays the role of a potential energy surface, and the NN weights W follow steepest-descent trajectories whose "law of motion" is the gradient-flow differential equation dW/dt = -dL/dW. The authors show that along these trajectories certain combinations of the weights are conserved, much as energy is conserved in classical mechanics.
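To make the "law of motion" concrete, here is a minimal sketch (my own, not from the paper): the gradient flow can be approximated by ordinary gradient descent with a very small step size. The quadratic toy loss below is an arbitrary choice for illustration.

```python
# Sketch only: approximate the gradient flow dW/dt = -dL/dW by Euler steps
# with a small step size dt. The quadratic loss is a made-up toy example.
import numpy as np

def grad_L(w):
    # Toy loss L(w) = 0.5 * ||w||^2, so dL/dw = w (not a loss from the paper).
    return w

w = np.array([1.0, -2.0])      # initial weights
dt = 1e-3                      # small step size approximates the continuous flow
for _ in range(10_000):
    w = w - dt * grad_L(w)     # one Euler step of dW/dt = -dL/dW

print(w)                       # weights flow toward the minimum at the origin
```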
For example, for a 1-dimensional, 2-layer ReLU network with two weights u and v, there is one conserved quantity, h = u^2 - v^2. This means the initial choice of weights matters: the trajectory, and hence the final state, is constrained to keep h at its initial value throughout training. This builds on previous work (Zhao 2022), which argues that such conservation laws induce an inductive bias towards "flat" minima of the loss function, which reduces overfitting and makes training more robust.
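As a quick numerical check (again my own sketch, not the authors' code), one can train the two-weight model f(x) = v * relu(u * x) on a made-up regression target with tiny gradient steps and verify that h = u^2 - v^2 barely moves; the data, target, and initial weights below are arbitrary choices for illustration.

```python
# Sketch only: check numerically that h = u^2 - v^2 is (nearly) conserved when
# the two-weight ReLU model f(x) = v * relu(u * x) is trained by gradient
# descent with a small step size (approximating the continuous gradient flow).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * np.maximum(x, 0.0)           # made-up "teacher" target: 2 * relu(x)

u, v = 0.7, 0.3                        # arbitrary initial weights
dt = 1e-4                              # small step size

print("h before training:", u**2 - v**2)
for _ in range(50_000):
    r = v * np.maximum(u * x, 0.0) - y            # residuals of the model
    grad_u = np.sum(r * v * (u * x > 0) * x)      # dL/du for L = 0.5 * sum(r^2)
    grad_v = np.sum(r * np.maximum(u * x, 0.0))   # dL/dv
    u -= dt * grad_u                              # Euler step of du/dt = -dL/du
    v -= dt * grad_v                              # Euler step of dv/dt = -dL/dv
print("h after training: ", u**2 - v**2)          # stays close to its initial value
```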
The paper gives a general (and rather involved) procedure for computing the conserved quantities of more complex NNs, but the slides have some nice pictures illustrating the 1-d example. I like it because it is a neat way to understand NN training using ideas from physics. It also offers a hint as to why bigger NNs with more parameters might work well.