hal.structure.identifier: CEntre de REcherches en MAthématiques de la DEcision [CEREMADE]
dc.contributor.author: Dutang, Christophe
HAL ID: 9174
ORCID: 0000-0001-6732-1501
hal.structure.identifier: CEntre de REcherches en MAthématiques de la DEcision [CEREMADE]
dc.contributor.author: Guibert, Quentin
HAL ID: 3858
ORCID: 0000-0002-4915-2422
dc.date.accessioned: 2022-02-23T09:11:28Z
dc.date.available: 2022-02-23T09:11:28Z
dc.date.issued: 2022
dc.identifier.issn: 0960-3174
dc.identifier.uri: https://basepub.dauphine.psl.eu/handle/123456789/22720
dc.language.iso: en (en)
dc.subject: GLM (en)
dc.subject: model-based recursive partitioning (en)
dc.subject: GLM trees (en)
dc.subject: random forest (en)
dc.subject: GLM forest (en)
dc.subject.ddc: 515 (en)
dc.title: An explicit split point procedure in model-based trees allowing for a quick fitting of GLM trees and GLM forests (en)
dc.type: Article accepted for publication or published
dc.description.abstracten: Classification and regression trees (CART) are a genuine alternative to fully parametric models such as linear models (LM) and generalized linear models (GLM). Although CART suffers from biased variable selection, it is commonly applied to a wide range of problems and used in tree ensembles and random forests because of its simplicity and computational speed. Conditional inference trees and model-based tree algorithms, in which variable selection is handled via fluctuation tests, are known to give more accurate and interpretable results than CART, but at the cost of longer computation times. Using a closed-form maximum likelihood estimator for GLM, this paper proposes a split point procedure based on the explicit likelihood in order to save time when searching for the best split for a given splitting variable. A simulation study with non-Gaussian responses is performed to assess the computational gain when building GLM trees. We also benchmark GLM trees against CART, conditional inference trees and LM trees on simulated and empirical datasets in order to identify situations where GLM trees are efficient. The approach is extended to multiway split trees and log-transformed distributions. Making GLM trees feasible through a new split point procedure allows us to investigate the use of GLM in ensemble methods. We propose a numerical comparison of GLM forests against other random forest-type approaches. Our simulation analyses show cases where GLM forests are good challengers to random forests. (en)
dc.relation.isversionofjnlname: Statistics and Computing
dc.relation.isversionofjnlvol: 32 (en)
dc.relation.isversionofjnldate: 2022
dc.relation.isversionofjnlpages: number 6 (en)
dc.relation.isversionofdoi: 10.1007/s11222-021-10059-x (en)
dc.relation.isversionofjnlpublisher: Springer (en)
dc.subject.ddclabel: Analysis (en)
dc.relation.forthcoming: no (en)
dc.description.ssrncandidate: no
dc.description.halcandidate: no (en)
dc.description.readership: research (en)
dc.description.audience: International (en)
dc.relation.Isversionofjnlpeerreviewed: yes (en)
dc.date.updated: 2022-02-23T09:09:43Z
hal.author.function: aut
hal.author.function: aut