
hal.structure.identifier: CEntre de REcherches en MAthématiques de la DEcision [CEREMADE]
dc.contributor.author: Genevay, Aude
hal.structure.identifier: CEntre de REcherches en MAthématiques de la DEcision [CEREMADE]
dc.contributor.author: Peyré, Gabriel
HAL ID: 1211
hal.structure.identifier: Centre de Recherche en Économie et Statistique [CREST]
dc.contributor.author: Cuturi, Marco
HAL ID: 3354
ORCID: 0000-0002-1934-0588
dc.date.accessioned: 2022-11-22T15:20:17Z
dc.date.available: 2022-11-22T15:20:17Z
dc.date.issued: 2018
dc.identifier.uri: https://basepub.dauphine.psl.eu/handle/123456789/23175
dc.language.iso: en
dc.subject.ddc: 4
dc.title: Learning Generative Models with Sinkhorn Divergences
dc.type: Conference paper
dc.description.abstract: The ability to compare two degenerate probability distributions (i.e. two probability distributions supported on two distinct low-dimensional manifolds living in a much higher-dimensional space) is a crucial problem arising in the estimation of generative models for high-dimensional observations such as those encountered in computer vision or natural language. It is known that optimal transport metrics can represent a cure for this problem, since they were specifically designed as an alternative to information divergences to handle such problematic scenarios. Unfortunately, training generative machines using OT raises formidable computational and statistical challenges, because of (i) the computational burden of evaluating OT losses, (ii) the instability and lack of smoothness of these losses, and (iii) the difficulty of robustly estimating these losses and their gradients in high dimension. This paper presents the first tractable computational method to train large-scale generative models using an optimal transport loss, and tackles these three issues by relying on two key ideas: (a) entropic smoothing, which turns the original OT loss into one that can be computed using Sinkhorn fixed-point iterations; (b) algorithmic (automatic) differentiation of these iterations. These two approximations result in a robust and differentiable approximation of the OT loss with streamlined GPU execution. Entropic smoothing generates a family of losses interpolating between Wasserstein (OT) and Maximum Mean Discrepancy (MMD), thus allowing one to find a sweet spot that leverages the geometry of OT and the favorable high-dimensional sample complexity of MMD, which comes with unbiased gradient estimates. The resulting computational architecture nicely complements standard deep network generative models with a stack of extra layers implementing the loss function.
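For illustration only, below is a minimal sketch of the two ideas named in the abstract: entropic smoothing computed by Sinkhorn fixed-point iterations (here in the log domain), followed by automatic differentiation through those iterations. It is written in PyTorch with illustrative names (sinkhorn_loss, eps, n_iters) chosen for this sketch rather than taken from the authors' implementation, and it computes only the entropy-regularized OT cost between two point clouds; the paper's full loss and normalization may differ.

import math
import torch

def sinkhorn_loss(x, y, eps=0.1, n_iters=100):
    """Entropy-regularized OT cost between point clouds x (n x d) and y (m x d)."""
    C = torch.cdist(x, y, p=2) ** 2          # pairwise squared-Euclidean cost matrix
    n, m = C.shape
    log_a = torch.full((n,), -math.log(n))   # uniform weights on x, in log domain
    log_b = torch.full((m,), -math.log(m))   # uniform weights on y, in log domain
    f = torch.zeros(n)                       # dual potentials
    g = torch.zeros(m)
    for _ in range(n_iters):                 # Sinkhorn fixed-point iterations (log domain)
        f = -eps * torch.logsumexp((g[None, :] - C) / eps + log_b[None, :], dim=1)
        g = -eps * torch.logsumexp((f[:, None] - C) / eps + log_a[:, None], dim=0)
    # Transport plan implied by the potentials; <P, C> is the regularized OT cost.
    P = torch.exp((f[:, None] + g[None, :] - C) / eps + log_a[:, None] + log_b[None, :])
    return (P * C).sum()

# Because the loop is built from differentiable tensor operations, autograd can
# backpropagate through the iterations, so the loss can be stacked on top of a
# generator network like any other layer, consistent with the abstract's point
# that eps interpolates between OT-like and MMD-like behaviour.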
dc.identifier.citationpages: 1608-1617
dc.relation.ispartoftitle: Proceedings of Machine Learning Research, Volume 84: International Conference on Artificial Intelligence and Statistics
dc.relation.ispartofeditor: Amos Storkey, Fernando Perez-Cruz
dc.relation.ispartofpublname: Proceedings of Machine Learning Research (PMLR)
dc.relation.ispartofdate: 2018
dc.subject.ddclabel: General computer science
dc.relation.conftitle: AISTATS
dc.relation.confdate: 2018-04
dc.relation.confcity: Lanzarote
dc.relation.confcountry: Spain
dc.relation.forthcoming: no
dc.description.ssrncandidate: no
dc.description.halcandidate: no
dc.description.readership: research
dc.description.audience: International
dc.relation.Isversionofjnlpeerreviewed: no
dc.date.updated: 2022-11-22T15:15:38Z
hal.author.function: aut
hal.author.function: aut
hal.author.function: aut

