Mixture of von Mises-Fisher distribution with sparse prototypes
Rossi, Fabrice; Barbaro, Florian (2022), Mixture of von Mises-Fisher distribution with sparse prototypes, Neurocomputing, 501, p. 41-74. 10.1016/j.neucom.2022.05.118
Type
Article accepted for publication or published
Date
2022
Journal name
Neurocomputing
Volume
501
Publisher
Elsevier
Pages
41-74
Publication identifier
10.1016/j.neucom.2022.05.118
Author(s)
Rossi, Fabrice
CEntre de REcherches en MAthématiques de la DEcision [CEREMADE]
Barbaro, Florian
Statistique, Analyse et Modélisation Multidisciplinaire (SAmos-Marin Mersenne) [SAMM]
Abstract (EN)
Mixtures of von Mises-Fisher distributions can be used to cluster data on the unit hypersphere. This approach is particularly well suited to high-dimensional directional data such as texts. In this article, we propose to estimate a von Mises-Fisher mixture using an ℓ1-penalized likelihood. This leads to sparse prototypes that improve clustering interpretability. We introduce an expectation-maximisation (EM) algorithm for this estimation and explore the trade-off between the sparsity term and the likelihood term with a path-following algorithm. The model's behaviour is studied on simulated data, and we show the advantages of the approach on real benchmark data. We also introduce a new data set on financial reports and demonstrate the benefits of our method for exploratory analysis.
Subjects / Keywords
Clustering; Mixtures; Von Mises-Fisher; Expectation maximization; High dimensional data; Path following strategy; Model selection
Related items
Showing items related by title and author.
- Barbaro, Florian; Rossi, Fabrice (2021) Conference paper
- Barbaro, Florian; Rossi, Fabrice (2021) Conference paper
- Mengersen, Kerrie; Rousseau, Judith; McVinish, Ross (2009) Article accepted for publication or published
- Rousseau, Judith; Mengersen, Kerrie; McVinish, Ross (2005) Working paper
- Bouin, Emeric; Garnier, Jimmy; Henderson, Christopher; Patout, Florian (2018) Article accepted for publication or published