
hal.structure.identifier:
dc.contributor.author: Nicholls, Geoff K.
hal.structure.identifier:
dc.contributor.author: Ryder, Robin J.
dc.date.accessioned: 2011-10-12T10:45:28Z
dc.date.available: 2011-10-12T10:45:28Z
dc.date.issued: 2011
dc.identifier.uri: https://basepub.dauphine.fr/handle/123456789/7185
dc.description: The attached file does not contain the text of this article, but supplementary material: This supplement to Ryder and Nicholls (2009) gives results for a second data set, by Dyen et al. (1997), as well as details of validations using synthetic data. [en]
dc.language.iso: en [en]
dc.subject: Bayesian inference [en]
dc.subject: Dating methods [en]
dc.subject: Markov chain Monte Carlo methods [en]
dc.subject: Missing data [en]
dc.subject: Phylogenetics [en]
dc.subject: Proto-Indo-European [en]
dc.subject: Rate heterogeneity [en]
dc.subject.ddc: 519 [en]
dc.title: Missing data in a stochastic Dollo model for binary trait data, and its application to the dating of Proto-Indo-European [en]
dc.type: Article accepted for publication or published
dc.description.abstracten: Nicholls and Gray have described a phylogenetic model for trait data. They used their model to estimate branching times on Indo-European language trees from lexical data. Alekseyenko and co-workers extended the model and gave applications in genetics. We extend the inference to handle data missing at random. When trait data are gathered, traits are thinned in a way that depends on both the trait and the missing data content. Nicholls and Gray treated missing records as absent traits. Hittite has 12% missing trait records. Its age is poorly predicted in their cross-validation. Our prediction is consistent with the historical record. Nicholls and Gray dropped seven languages with too much missing data. We fit all 24 languages in the lexical data of Ringe and co-workers. To model spatiotemporal rate heterogeneity we add a catastrophe process to the model. When a language passes through a catastrophe, many traits change at the same time. We fit the full model in a Bayesian setting, via Markov chain Monte Carlo sampling. We validate our fit by using Bayes factors to test known age constraints. We reject three of 30 historically attested constraints. Our main result is a unimodal posterior distribution for the age of Proto-Indo-European centred at 8400 years before Present with 95% highest posterior density interval equal to 7100–9800 years before Present. [en]
dc.relation.isversionofjnlname: Journal of the Royal Statistical Society. Series C, Applied Statistics
dc.relation.isversionofjnlvol: 60 [en]
dc.relation.isversionofjnlissue: 1 [en]
dc.relation.isversionofjnldate: 2011
dc.relation.isversionofjnlpages: 71-92 [en]
dc.relation.isversionofdoi: http://dx.doi.org/10.1111/j.1467-9876.2010.00743.x [en]
dc.description.sponsorshipprivate: yes [en]
dc.relation.isversionofjnlpublisher: Wiley [en]
dc.subject.ddclabel: Probability and applied mathematics [en]
hal.author.function: aut
hal.author.function: aut

