Schema Inference for Massive JSON Datasets

Ben Lahmar, Houssem; Colazzo, Dario; Ghelli, Giorgio; Sartiani, Carlo (2017), Schema Inference for Massive JSON Datasets, 20th International Conference on Extending Database Technology (EDBT 2017), 2017-03, Venice, Italy

File
paper-62.pdf (489.5 KB)
Type
Conference communication
Date
2017
Conference title
20th International Conference on Extending Database Technology (EDBT 2017)
Conference date
2017-03
Conference city
Venice
Conference country
Italy
Publication identifier
10.5441/002/edbt.2017.21
Author(s)
Ben Lahmar, Houssem

Colazzo, Dario
Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE]
Ghelli, Giorgio

Sartiani, Carlo
Abstract (EN)
Recent years have seen the widespread use of JSON as a data format to represent massive data collections. JSON data collections are usually schemaless. While this brings several advantages, the absence of schema information has important negative consequences: the correctness of complex queries and programs cannot be statically checked, users cannot rely on schema information to quickly figure out structural properties that could speed up the formulation of correct queries, and many schema-based optimizations are not possible. In this paper we deal with the problem of inferring a schema from massive JSON datasets. We first identify a JSON type language that is simple and, at the same time, expressive enough to capture irregularities and to give complete structural information about the input data. We then present our main contribution: the design of a schema inference algorithm, its theoretical study, and its implementation based on Spark, which enables reasonable schema inference times for massive collections. Finally, we report on an experimental analysis showing the effectiveness of our approach in terms of execution time, precision and conciseness of the inferred schemas, and scalability.
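The abstract describes inferring a schema by assigning a type to each JSON value and then merging those types into one. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' algorithm or their type syntax. Everything in it is hypothetical: the type representation (a dict from a kind such as `"num"` or `"rec"` to a kind-specific payload), the helper names `infer`, `fuse`, `fuse_records` and `show`, and the fusion rules chosen here (a field missing from some records becomes optional and is marked `?`, arrays always collapse to a single starred element type `[T*]`, and different kinds at the same position combine into a union written with `+`).

```python
import json
from functools import reduce

# Hypothetical type representation (an assumption for this sketch):
# a type is a dict mapping a kind to its payload, so a dict with several
# keys is a union. Atomic kinds ("null", "bool", "num", "str") have payload None;
# "rec" maps field name -> (type, optional_flag); "arr" holds the element type
# (None for an array that has only ever been seen empty).

def infer(value):
    """Map step: infer the type of a single JSON value."""
    if value is None:
        return {"null": None}
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return {"bool": None}
    if isinstance(value, (int, float)):
        return {"num": None}
    if isinstance(value, str):
        return {"str": None}
    if isinstance(value, list):
        # Collapse all elements into one element type.
        return {"arr": reduce(fuse, map(infer, value), None)}
    if isinstance(value, dict):
        return {"rec": {k: (infer(v), False) for k, v in value.items()}}
    raise TypeError(f"not a JSON value: {value!r}")

def fuse_records(r1, r2):
    """Merge two record payloads; a field absent on one side becomes optional."""
    fields = {}
    for name in r1.keys() | r2.keys():
        if name in r1 and name in r2:
            (t1, opt1), (t2, opt2) = r1[name], r2[name]
            fields[name] = (fuse(t1, t2), opt1 or opt2)
        else:
            t, _ = r1.get(name) or r2.get(name)
            fields[name] = (t, True)
    return fields

def fuse(t1, t2):
    """Reduce step: merge two types without mutating either input (None is the identity)."""
    if t1 is None:
        return t2
    if t2 is None:
        return t1
    out = dict(t1)
    for kind, payload in t2.items():
        if kind not in out:
            out[kind] = payload  # new kind: widen the union
        elif kind == "rec":
            out[kind] = fuse_records(out[kind], payload)
        elif kind == "arr":
            out[kind] = fuse(out[kind], payload)
        # same atomic kind on both sides: nothing to merge
    return out

def show(t):
    """Render a type as text, e.g. {id: Num + Str, name?: Str}."""
    if t is None:
        return "Empty"  # element type of arrays that were always empty
    parts = []
    for kind in sorted(t):
        payload = t[kind]
        if kind == "rec":
            fs = ", ".join(f"{n}{'?' if opt else ''}: {show(ft)}"
                           for n, (ft, opt) in sorted(payload.items()))
            parts.append("{" + fs + "}")
        elif kind == "arr":
            parts.append(f"[{show(payload)}*]")
        else:
            parts.append(kind.capitalize())
    return " + ".join(parts)

docs = [
    '{"id": 1, "name": "a", "tags": ["x", "y"]}',
    '{"id": 2, "tags": [], "geo": null}',
    '{"id": "3", "name": "c"}',
]
schema = reduce(fuse, (infer(json.loads(d)) for d in docs), None)
print(show(schema))  # prints {geo?: Null, id: Num + Str, name?: Str, tags?: [Str*]}
```

In this sketch `fuse` is commutative and associative, with `None` as its identity, so the per-record types can be combined in any order or grouping. That property is what would let the reduce step be distributed, for example with Spark's `RDD.reduce`, which expects such a function. This illustrates why a fusion-based design parallelizes; it does not describe the paper's actual Spark implementation.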
Subjects / Keywords
JSON; schema inference

Related items

Showing items related by title and author.

  • Parametric schema inference for massive JSON datasets
    Baazizi, Mohamed-Amine; Colazzo, Dario; Ghelli, Giorgio; Sartiani, Carlo (2019) Article accepted for publication or published
  • Human-in-the-Loop Schema Inference for Massive JSON Datasets
    Baazizi, Mohamed-Amine; Berti, Clément; Colazzo, Dario; Ghelli, Giorgio; Sartiani, Carlo (2020) Conference communication
  • Inférence de Schémas pour Données JSON Massives [Schema Inference for Massive JSON Data]
    Baazizi, Mohamed-Amine; Ben Lahmar, Houssem; Colazzo, Dario; Ghelli, Giorgio; Sartiani, Carlo (2016) Conference communication
  • A Type System for Interactive JSON Schema Inference (Extended Abstract)
    Baazizi, Mohamed-Amine; Colazzo, Dario; Ghelli, Giorgio; Sartiani, Carlo (2019) Conference communication
  • Schemas And Types For JSON Data
    Baazizi, Mohamed-Amine; Colazzo, Dario; Ghelli, Giorgio; Sartiani, Carlo (2019) Conference communication