Show simple item record

hal.structure.identifier
dc.contributor.authorMorvan, Anne
hal.structure.identifier
dc.contributor.authorChoromanski, Krzysztof
hal.structure.identifier
dc.contributor.authorGouy-Pailler, Cedric
HAL ID: 6827
ORCID: 0000-0003-1298-7845
hal.structure.identifierLaboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE]
dc.contributor.authorAtif, Jamal
HAL ID: 15689
dc.date.accessioned2020-06-09T13:53:39Z
dc.date.available2020-06-09T13:53:39Z
dc.date.issued2018
dc.identifier.urihttps://basepub.dauphine.fr/handle/123456789/20861
dc.language.isoenen
dc.subjectspace constraintsen
dc.subjectresources-limited mobile devicesen
dc.subjectDBMSTCluen
dc.subjectclustering partitionen
dc.subjectSpectral Clustering methoden
dc.subjectdata clusteren
dc.subject.ddc005en
dc.titleGraph sketching-based Space-efficient Data Clusteringen
dc.typeCommunication / Conférence
dc.description.abstractenIn this paper, we address the problem of recovering arbitrary-shaped data clusters from datasets while facing high space constraints, as this is for instance the case in many real-world applications when analysis algorithms are directly deployed on resources-limited mobile devices collecting the data. We present DBMSTClu a new space-efficient density-based non-parametric method working on a Minimum Spanning Tree (MST) recovered from a limited number of linear measurements i.e. a sketched version of the dissimilarity graph between the N objects to cluster. Unlike k-means, k-medians or k-medoids algorithms, it does not fail at distinguishing clusters with particular forms thanks to the property of the MST for expressing the underlying structure of a graph. No input parameter is needed contrarily to DBSCAN or the Spectral Clustering method. An approximate MST is retrieved by following the dynamic semi-streaming model in handling the dissimilarity graph as a stream of edge weight updates which is sketched in one pass over the data into a compact structure requiring O(N polylog(N)) space, far better than the theoretical memory cost O(N2) of . The recovered approximate MST as input, DBMSTClu then successfully detects the right number of nonconvex clusters by performing relevant cuts on in a time linear in N. We provide theoretical guarantees on the quality of the clustering partition and also demonstrate its advantage over the existing state-of-the-art on several datasets.en
dc.identifier.citationpages10-18en
dc.relation.ispartoftitleProceedings of the 2018 SIAM International Conference on Data Miningen
dc.relation.ispartofeditorEster, Martin
dc.relation.ispartofeditorPedreschi, Dino
dc.relation.ispartofpublnameSIAM - Society for Industrial and Applied Mathematicsen
dc.relation.ispartofpublcityPhiladelphiaen
dc.relation.ispartofpages764en
dc.relation.ispartofurl10.1137/1.9781611975321en
dc.subject.ddclabelProgrammation, logiciels, organisation des donnéesen
dc.relation.ispartofisbn978-1-61197-532-1en
dc.relation.conftitle2018 SIAM International Conference on Data Miningen
dc.relation.confdate2018-05
dc.relation.confcitySan Diegoen
dc.relation.confcountryUnited Statesen
dc.relation.forthcomingnonen
dc.identifier.doi10.1137/1.9781611975321.2en
dc.description.ssrncandidatenonen
dc.description.halcandidatenonen
dc.description.readershiprechercheen
dc.description.audienceInternationalen
dc.relation.Isversionofjnlpeerreviewednonen
dc.relation.Isversionofjnlpeerreviewednonen
dc.date.updated2020-06-09T13:49:35Z
hal.author.functionaut
hal.author.functionaut
hal.author.functionaut
hal.author.functionaut


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record