hal.structure.identifier: Inria Saclay - Ile de France
dc.contributor.author: Cvetkov-Iliev, Alexis
hal.structure.identifier: Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE]
dc.contributor.author: Allauzen, Alexandre
HAL ID: 171266
hal.structure.identifier: Inria Saclay - Ile de France
dc.contributor.author: Varoquaux, Gaël
dc.date.accessioned: 2023-05-10T15:54:40Z
dc.date.available: 2023-05-10T15:54:40Z
dc.date.issued: 2023
dc.identifier.issn: 0885-6125
dc.identifier.uri: https://basepub.dauphine.psl.eu/handle/123456789/24724
dc.language.iso: en [en]
dc.subject: Feature engineering [en]
dc.subject: Feature enrichment [en]
dc.subject: Knowledge graph embedding [en]
dc.subject.ddc: 005.7 [en]
dc.title: Relational Data Embeddings for Feature Enrichment with Background Information [en]
dc.type: Article accepted for publication or published
dc.description.abstracten: For many machine-learning tasks, augmenting the data table at hand with features built from external sources is key to improving performance. For instance, estimating housing prices benefits from background information on the location, such as the population density or the average income. However, this information must often be assembled across many tables, requiring time and expertise from the data scientist. Instead, we propose to replace human-crafted features by vectorial representations of entities (e.g. cities) that capture the corresponding information. We represent the relational data on the entities as a graph and adapt graph-embedding methods to create feature vectors for each entity. We show that two technical ingredients are crucial: modeling well the different relationships between entities, and capturing numerical attributes. We adapt knowledge graph embedding methods that were primarily designed for graph completion. Yet, they model only discrete entities, while creating good feature vectors from relational data also requires capturing numerical attributes. For this, we introduce KEN: Knowledge Embedding with Numbers. We thoroughly evaluate approaches to enrich features with background information on 7 prediction tasks. We show that a good embedding model coupled with KEN can perform better than manually handcrafted features, while requiring much less human effort. It is also competitive with combinatorial feature engineering methods, but much more scalable. Our approach can be applied to huge databases, creating general-purpose feature vectors reusable in various downstream tasks. [en]
dc.relation.isversionofjnlname: Machine Learning
dc.relation.isversionofjnlvol: 112 [en]
dc.relation.isversionofjnldate: 2023-01
dc.relation.isversionofjnlpages: 687-720 [en]
dc.relation.isversionofdoi: 10.1007/s10994-022-06277-7 [en]
dc.identifier.urlsite: https://hal.archives-ouvertes.fr/hal-03848124 [en]
dc.relation.isversionofjnlpublisher: Springer [en]
dc.subject.ddclabel: Data organization [en]
dc.relation.forthcoming: no [en]
dc.description.ssrncandidate: no
dc.description.halcandidate: no [en]
dc.description.readership: research [en]
dc.description.audience: International [en]
dc.relation.Isversionofjnlpeerreviewed: yes [en]
dc.date.updated: 2023-04-04T13:21:12Z
hal.export.arxiv: no [en]
hal.export.pmc: no [en]
hal.hide.repec: no [en]
hal.hide.oai: no [en]
hal.author.function: aut
hal.author.function: aut
hal.author.function: aut

