SCALPEL3: a scalable open-source library for healthcare claims databases

Bacry, Emmanuel; Gaiffas, Stéphane; Leroy, Fanny; Morel, Maryan; Nguyen, D.P.; Sebiat, Youcef; Sun, D. (2019), SCALPEL3: a scalable open-source library for healthcare claims databases. https://basepub.dauphine.fr/handle/123456789/20688

File
1910.07045.pdf (1.383 MB)
Type
Working paper
External document link
https://arxiv.org/abs/1910.07045
Date
2019
Series title
Cahier de recherche CEREMADE, Université Paris Dauphine-PSL
Published in
Paris
Pages
14
Author(s)
Bacry, Emmanuel
Gaiffas, Stéphane
Leroy, Fanny
Morel, Maryan
Nguyen, D.P.
Sebiat, Youcef
Sun, D.
Abstract (EN)
This article introduces SCALPEL3, a scalable open-source framework for studies involving Large Observational Databases (LODs). Its design eases medical observational studies through abstractions for concept extraction, high-level cohort manipulation, and production of data formats compatible with machine learning libraries. SCALPEL3 has been used successfully on the SNDS database (see Tuppin et al. (2017)), a huge healthcare claims database that handles the reimbursement of almost all French citizens.
SCALPEL3 focuses on scalability, easy interactive analysis, and helpers for data flow analysis to accelerate studies performed on LODs. It consists of three open-source libraries based on Apache Spark. SCALPEL-Flattening denormalizes the LOD (only SNDS for now) by joining tables sequentially into one big table. SCALPEL-Extraction provides fast concept extraction from a big table such as the one produced by SCALPEL-Flattening. Finally, SCALPEL-Analysis supports interactive cohort manipulation, monitoring of cohort flow statistics, and building datasets for use with machine learning libraries. The first two provide a Scala API, while the last provides a Python API that can be used in an interactive environment. Our code is available on GitHub.
SCALPEL3 has been used to extract complex concepts for studies such as Morel et al. (2017), and for studies with 14.5 million patients observed over three years (corresponding to more than 15 billion healthcare events and roughly 15 terabytes of data), in less than 49 minutes on a small 15-node HDFS cluster. SCALPEL3 provides sharp interactive control of data processing through legible code, which helps to build fully reproducible studies and improves the maintainability and auditability of studies performed on LODs.
Subjects / Keywords
Healthcare claims data; ETL; Large observational database; Concept extraction; Scalability; Reproducibility; Interactive data manipulation
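Illustration
The three steps named in the abstract (denormalization by sequential joins, concept extraction from the resulting big table, and cohort building) can be sketched on toy data as follows. This is only a plain-Python illustration of the idea, not the SCALPEL3 API: the function names `left_join`, `flatten`, and `extract_events`, the table layouts, and the codes `DRUG_X`, `DRUG_Y`, and `ACT_1` are all hypothetical, and the real libraries run on Apache Spark rather than on Python lists.

```python
from collections import defaultdict


def left_join(left, right, key):
    """Left-join two lists of dict rows on `key` (stand-in for a Spark join)."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    joined = []
    for row in left:
        matches = index.get(row[key])
        if matches:
            for match in matches:
                joined.append({**row, **match})
        else:
            # Keep unmatched rows, as a left join does.
            joined.append(dict(row))
    return joined


def flatten(central, others, key):
    """Denormalize by joining each side table onto the central table in turn."""
    flat = central
    for table in others:
        flat = left_join(flat, table, key)
    return flat


def extract_events(flat, column, codes):
    """Hypothetical concept extraction: keep rows whose `column` is in `codes`."""
    return [
        {"patient_id": row["patient_id"], "value": row[column], "date": row["date"]}
        for row in flat
        if row.get(column) in codes
    ]


# Toy normalized tables (made-up layout and codes).
claims = [
    {"claim_id": 1, "patient_id": "A", "date": "2015-01-03"},
    {"claim_id": 2, "patient_id": "B", "date": "2015-02-10"},
    {"claim_id": 3, "patient_id": "A", "date": "2015-03-21"},
]
drugs = [
    {"claim_id": 1, "drug_code": "DRUG_X"},
    {"claim_id": 3, "drug_code": "DRUG_Y"},
]
acts = [{"claim_id": 2, "act_code": "ACT_1"}]

flat = flatten(claims, [drugs, acts], key="claim_id")
exposures = extract_events(flat, "drug_code", {"DRUG_X"})
cohort = {event["patient_id"] for event in exposures}
```

Here `flat` holds one wide row per claim, `exposures` holds the extracted events for the chosen code, and `cohort` is the set of patients with at least one such event.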

Related items

Showing items related by title and author.

  • SCALPEL3: a scalable open-source library for healthcare claims databases
    Bacry, Emmanuel; Gaïffas, Stéphane; Leroy, Fanny; Morel, Maryan; Nguyen, Dinh-Phong; Sebiat, Youcef; Sun, Dian (2020) Article accepted for publication or published
  • Screening anxiolytics, hypnotics, antidepressants and neuroleptics for bone fracture risk among elderly: a nation-wide dynamic multivariate self-control study using the SNDS claims database
    Morel, Maryan; Bouyer, Benjamin; Guilloux, Agathe; Laanani, Moussa; Leroy, Fanny; Nguyen, Dinh Phong; Sebiat, Youcef; Bacry, Emmanuel; Gaïffas, Stéphane (2021) Working paper
  • ConvSCCS: convolutional self-controlled case-series model for lagged adverse event detection
    Morel, Maryan; Bacry, Emmanuel; Gaïffas, Stéphane; Guilloux, Agathe; Leroy, Fanny (2019) Article accepted for publication or published
  • ZiMM: a deep learning model for long term adverse events with non-clinical claims data
    Kabeshova, Anastasiia; Yu, Yiyang; Lukacs, Bertrand; Bacry, Emmanuel; Gaïffas, Stéphane (2020) Article accepted for publication or published
  • Dual Optimization for convex constrained objectives without the gradient-Lipschitz assumptions
    Bompaire, Martin; Gaïffas, Stéphane; Bacry, Emmanuel (2018) Working paper