In this preprint, we analyze properties of stochastic gradient descent by recasting the problem in terms of interacting particle systems. This approach lets us apply a collection of technical tools that were developed to study thermodynamic limits and the propagation of molecular chaos. In this setting, the parameters play the role of particle positions in a high-dimensional space. The interaction potential, in turn, is determined by the structure of the neural network and the objective function.
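
To make the particle picture concrete, here is a minimal sketch rather than the construction used in the preprint. It assumes a single-hidden-layer network written as an average of n units, a squared loss, and a step scaled by n so that each particle feels a force of order one. The names `network`, `loss`, and `forces`, the tanh units, the sine target, and the step size are all hypothetical choices made for illustration.

```python
import numpy as np

def network(theta, x):
    """Average of n units c_i * tanh(w_i . x); each row of theta is one particle."""
    c, w = theta[:, 0], theta[:, 1:]
    return (c[:, None] * np.tanh(w @ x.T)).mean(axis=0)  # shape (batch,)

def loss(theta, x, y):
    """Squared loss averaged over the batch."""
    return 0.5 * np.mean((network(theta, x) - y) ** 2)

def forces(theta, x, y):
    """Return n * dL/dtheta, the gradient rescaled by the particle count.

    theta has shape (n, 1 + d): column 0 holds the output weight c_i,
    the remaining columns hold the input weights w_i of particle i.
    """
    c, w = theta[:, 0], theta[:, 1:]
    act = np.tanh(w @ x.T)                       # (n, batch)
    resid = (c[:, None] * act).mean(axis=0) - y  # network output minus target
    grad_c = (act * resid).mean(axis=1)          # (n,)
    grad_w = ((c[:, None] * (1.0 - act ** 2) * resid) @ x) / x.shape[0]  # (n, d)
    return np.column_stack([grad_c, grad_w])

# Hypothetical demo: n particles in 1 + d dimensions fitting sin(x_0).
rng = np.random.default_rng(0)
n, d, batch, lr = 200, 3, 32, 0.1
theta = rng.normal(size=(n, 1 + d))

def target(x):
    return np.sin(x[:, 0])

x_test = rng.normal(size=(512, d))
loss_before = loss(theta, x_test, target(x_test))
for _ in range(500):
    x = rng.normal(size=(batch, d))            # fresh minibatch at every step
    theta -= lr * forces(theta, x, target(x))  # every particle moves at once
loss_after = loss(theta, x_test, target(x_test))
print(loss_before, loss_after)
```

Each particle is coupled to the others only through the residual `resid`, which depends on the average over all particles. That average is what makes the dynamics an interacting system rather than n independent ones.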

We show that, in the limit of a large number of parameters, the objective function of the parameter optimization problem becomes convex when viewed in terms of a particle density. This allows us to demonstrate both a law of large numbers and a central limit theorem, yielding information about how the error scales as the number of parameters becomes large.
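
One way to see where the convexity comes from is sketched below, under assumptions that are ours rather than taken from the preprint: the network is an average of n units φ(x, θ_i), the loss is a squared error against a target f, and μ is the data distribution. The symbols φ, ρ, μ, F, K, and C_f are notation introduced for this illustration only.

```latex
\begin{align*}
f_n(x) &= \frac{1}{n}\sum_{i=1}^{n} \varphi(x,\theta_i)
        = \int \varphi(x,\theta)\,\rho_n(d\theta),
\qquad \rho_n = \frac{1}{n}\sum_{i=1}^{n}\delta_{\theta_i},\\
\mathcal{L}[\rho] &= \frac{1}{2}\int \Bigl(f(x) - \int \varphi(x,\theta)\,\rho(d\theta)\Bigr)^{2} d\mu(x)\\
&= C_f - \int F(\theta)\,\rho(d\theta)
   + \frac{1}{2}\iint K(\theta,\theta')\,\rho(d\theta)\,\rho(d\theta'),\\
F(\theta) &= \int f(x)\,\varphi(x,\theta)\,d\mu(x),\qquad
K(\theta,\theta') = \int \varphi(x,\theta)\,\varphi(x,\theta')\,d\mu(x),\qquad
C_f = \frac{1}{2}\int f(x)^{2}\,d\mu(x).
\end{align*}
```

The network output is linear in the density ρ, so the loss is a convex quadratic functional of ρ even though it is non-convex in the individual parameters θ_i. In this form F acts like an external potential on each particle and K like a pairwise interaction, and K is positive semidefinite because it is an inner product of units over the data.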

You can find the details on the [arXiv].