Posted on February 8, 2019

In this paper, we use the mean-field perspective on “wide” neural networks to motivate a birth-death PDE that accelerates the decay of the loss for a fixed parameter distribution. We prove a number of theorems about this process; most importantly, the birth-death dynamics ensures global convergence to energy minimizers. We also compute rates of convergence and demonstrate the efficacy of this approach on some simple examples.
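To give some intuition for the mechanism, here is a toy particle sketch. It is not the algorithm from the paper. It assumes a *linear* energy $E(\mu) = \int U \, d\mu$ with the hypothetical potential $U(\theta) = (\theta - 2)^2$, so the first variation is simply $V = U$. Each particle ("neuron") takes a gradient step. It is then killed with rate proportional to $V - \bar V$ when above the population average $\bar V$, or duplicated with rate proportional to $|V - \bar V|$ when below it. A random particle is overwritten so the population size stays fixed. The names `simulate`, `energy`, `alpha`, and so on are illustrative only.

```python
import random

def energy(theta):
    # Hypothetical toy potential U(theta) = (theta - 2)^2, minimized at theta = 2.
    return (theta - 2.0) ** 2

def grad_energy(theta):
    return 2.0 * (theta - 2.0)

def mean_energy(particles):
    # Empirical estimate of E(mu) = integral of U dmu.
    return sum(energy(t) for t in particles) / len(particles)

def simulate(n=500, steps=100, dt=0.01, alpha=1.0, birth_death=True, seed=0):
    """Toy sketch (not the paper's scheme): gradient flow plus optional birth-death."""
    rng = random.Random(seed)
    particles = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    for _ in range(steps):
        # Transport: one explicit gradient step for every particle.
        particles = [t - dt * grad_energy(t) for t in particles]
        if not birth_death:
            continue
        # For a linear energy the first variation V equals U itself.
        values = [energy(t) for t in particles]
        v_bar = sum(values) / n
        old = list(particles)
        for i in range(n):
            rate = alpha * (values[i] - v_bar)
            # Jump with probability ~ dt * |rate| (clipped to a valid probability).
            if rng.random() < min(1.0, dt * abs(rate)):
                j = rng.randrange(n)
                if rate > 0:
                    # Death: above-average particle i is replaced by a copy of random j.
                    particles[i] = old[j]
                else:
                    # Birth: below-average particle i is cloned over random slot j.
                    particles[j] = old[i]
    return particles

start = mean_energy(simulate(steps=0))
gd_only = mean_energy(simulate(birth_death=False))
with_bd = mean_energy(simulate(birth_death=True))
print(start, gd_only, with_bd)
```

In this toy the birth-death step removes mass from high-energy regions and reallocates it to low-energy ones without any transport. That reallocation is why the average energy falls faster than under the gradient step alone. For an actual network the first variation $V$ depends on the current distribution, and the paper's analysis addresses that general case.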

You can find the details on the [arXiv].