Feature learning of neural networks: Generalization ability and optimization

Speaker: Taiji Suzuki

Location: 60 Fifth Avenue, Room 650

Date: Monday, December 18, 2023

In this talk, I will discuss the feature learning ability of neural networks from statistical and optimization perspectives. In particular, I will present recent developments in the theory of mean-field Langevin dynamics (MFLD) and its application to neural network training. MFLD is a nonlinear generalization of gradient Langevin dynamics (GLD) that minimizes an entropy-regularized convex functional defined on the space of probability distributions, and it naturally arises from the optimization of two-layer neural networks via (noisy) gradient descent.

In the first half, I will present convergence results for MFLD and explain how its convergence is connected to the duality gap through the log-Sobolev inequality of the so-called proximal Gibbs measure. I will also address the time-space discretization of MFLD; unlike in existing work, the discretization error can be bounded uniformly in time.

In the latter half, I will discuss the generalization error analysis of neural networks trained by MFLD. For a binary classification problem, we obtain a general test classification error bound that yields a fast learning rate via a local Rademacher complexity analysis. Applying this general framework to the k-sparse parity problem, we demonstrate how feature learning improves the sample complexity compared with kernel methods. Finally, we discuss how the anisotropic structure of the input affects the sample and computational complexities: if the data are well aligned with the target function, both are significantly reduced.
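For readers unfamiliar with MFLD, the following is a minimal sketch of the objective and dynamics referred to in the abstract, written in standard notation from the mean-field Langevin literature; the symbols F, \lambda, h_x, and p_\mu are notational choices for this sketch, not taken from the talk itself.

% Entropy-regularized objective over probability measures \mu on R^d,
% where F is a convex functional, e.g. an expected loss of a mean-field
% two-layer network f_\mu(x) = \int h_x(\theta) \, d\mu(\theta):
\min_{\mu \in \mathcal{P}(\mathbb{R}^d)} \ \mathcal{L}(\mu)
  := F(\mu) + \lambda\,\mathrm{Ent}(\mu),
\qquad \mathrm{Ent}(\mu) = \int \mu(\theta)\log\mu(\theta)\,d\theta .

% Mean-field Langevin dynamics: a McKean--Vlasov SDE whose marginal law
% \mu_t descends \mathcal{L}; the drift involves the first variation
% \delta F / \delta\mu of the functional F:
dX_t = -\nabla \frac{\delta F}{\delta\mu}(\mu_t)(X_t)\,dt
       + \sqrt{2\lambda}\,dW_t ,
\qquad \mu_t = \mathrm{Law}(X_t).

% Proximal Gibbs measure associated with \mu; its log-Sobolev constant
% is what controls the convergence of the dynamics:
p_\mu(\theta) \propto \exp\!\Big(-\tfrac{1}{\lambda}
    \frac{\delta F}{\delta\mu}(\mu)(\theta)\Big).

In this picture, training a finite-width two-layer network with noisy gradient descent corresponds to simulating finitely many interacting particles X_t (one per neuron), which is where the time-space discretization analysis mentioned in the abstract enters.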