
Mean-Field Langevin Dynamics and Energy Landscape of Neural Networks
Hu, Kaitong; Ren, Zhenjie; Siska, David; Szpruch, Lukasz (2019-05), Mean-Field Langevin Dynamics and Energy Landscape of Neural Networks. https://basepub.dauphine.fr/handle/123456789/19858
Type
Document de travail / Working paper
Date
2019-05
Publisher
Cahier de recherche CEREMADE, Université Paris-Dauphine
Series title
Cahier de recherche CEREMADE, Université Paris-Dauphine
Published in
Paris
Pages
29
Author(s)
Hu, Kaitong
Centre de Mathématiques Appliquées - Ecole Polytechnique [CMAP]
Ren, Zhenjie
CEntre de REcherches en MAthématiques de la DEcision [CEREMADE]
Siska, David
School of Mathematics - University of Edinburgh
Szpruch, Lukasz
School of Mathematics - University of Edinburgh
Abstract (EN)
We present a probabilistic analysis of the long-time behaviour of nonlocal, diffusive equations with a gradient flow structure in the 2-Wasserstein metric, namely the Mean-Field Langevin Dynamics (MFLD). Our work is motivated by a desire to provide a theoretical underpinning for the convergence of stochastic gradient type algorithms widely used for non-convex learning tasks such as the training of deep neural networks. The key insight is that a certain class of finite-dimensional non-convex problems becomes convex when lifted to the infinite-dimensional space of measures. We leverage this observation and show that the corresponding energy functional, defined on the space of probability measures, has a unique minimiser, which can be characterised by a first-order condition using the notion of linear functional derivative. Next, we show that the flow of marginal laws induced by the MFLD converges to the stationary distribution, which is exactly the minimiser of the energy functional. We show that this convergence is exponential under conditions that are satisfied for highly regularised learning tasks. At the heart of our analysis is a pathwise perspective on Otto calculus, as used in the gradient flow literature, which is of independent interest. Our proof of convergence to the stationary probability measure is novel and relies on a generalisation of LaSalle's invariance principle. Importantly, we do not assume that the interaction potential of the MFLD is of convolution type, nor that it has any particular symmetric structure. This is critical for applications. Finally, we show that the error between the finite-dimensional optimisation problem and its infinite-dimensional limit is of order one over the number of parameters.
Subjects / Keywords
Mean-Field Langevin Dynamics; Gradient Flow; Neural Networks