Reuse-based Optimization for Pig Latin
Camacho-Rodríguez, Jesús; Colazzo, Dario; Herschel, Melanie; Manolescu, Ioana; Roy Chowdhury, Soudip (2014), Reuse-based Optimization for Pig Latin, BDA'2014: 30e journées Bases de Données Avancées, 2014-10, Grenoble-Autrans, France
Type
Conference paper
External document link
https://hal.inria.fr/hal-01086497
Date
2014
Conference title
BDA'2014: 30e journées Bases de Données Avancées
Conference date
2014-10
Conference city
Grenoble-Autrans
Conference country
France
Author(s)
Camacho-Rodríguez, Jesús
Inria Saclay - Ile de France
Colazzo, Dario
Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE]
Herschel, Melanie
Inria Saclay - Ile de France
Manolescu, Ioana
Inria Saclay - Ile de France
Roy Chowdhury, Soudip
Inria Saclay - Ile de France
Abstract (EN)
Pig Latin has become a popular language within the data management community interested in the efficient parallel processing of large data volumes. The dataflow-style primitives of Pig Latin provide an intuitive way for users to write complex analytical queries, which are in turn compiled into MapReduce jobs. Currently, subexpressions occurring repeatedly in Pig Latin scripts are executed as many times as they occur, leading to avoidable MapReduce jobs. The current Pig Latin optimizer is not capable of recognizing, and thus optimizing, such repeated subexpressions. We present a novel approach for identifying and reusing common subexpressions occurring in Pig Latin scripts. In particular, we lay the foundation of our reuse-based algorithms by formalizing the semantics of the Pig Latin query language with extended nested relational algebra for bags. Our algorithm, named PigReuse, operates on the algebraic representations of Pig Latin scripts, identifies subexpression merging opportunities, selects the best ones to execute based on a cost function, and merges other equivalent expressions to share their results. Our experimental results demonstrate the efficiency and effectiveness of our reuse-based algorithms and optimization strategies.
Subjects / Keywords
experiments; PigLatin; reuse-based optimization; optimization
Related items
Showing items related by title and author.
- Camacho-Rodríguez, Jesús; Colazzo, Dario; Herschel, Melanie; Manolescu, Ioana; Chowdhury, Soudip Roy (2016) Conference paper
- Camacho-Rodríguez, Jesús; Colazzo, Dario; Manolescu, Ioana (2014) Conference paper
- Camacho-Rodríguez, Jesús; Colazzo, Dario; Manolescu, Ioana; Naranjo, Juan A. M. (2015) Conference paper
- Camacho-Rodríguez, Jesús; Colazzo, Dario; Manolescu, Ioana (2015) Article accepted for publication or published
- Camacho-Rodríguez, Jesús; Colazzo, Dario; Manolescu, Ioana (2014) Conference paper
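Illustration of the idea in the abstract. The abstract describes detecting subexpressions that occur repeatedly across Pig Latin scripts and evaluating them once. The sketch below is only a minimal illustration of that general idea and is not the PigReuse algorithm: the nested-tuple plan encoding, the operator names, and the functions `subplans`, `shared_subplans`, and `count_operators` are all hypothetical. It detects only syntactically identical sub-plans, whereas the paper works on an algebraic representation and selects among equivalent expressions with a cost function. Counting operator evaluations stands in for a real cost model here.

```python
from collections import Counter

# A plan is a nested tuple: (operator, argument, *input_plans).
# This encoding and the operator names are illustrative assumptions,
# not the algebra or data structures used by PigReuse.


def subplans(plan):
    """Yield the plan itself and every sub-plan nested inside it."""
    yield plan
    for child in plan[2:]:
        yield from subplans(child)


def shared_subplans(plans):
    """Return the sub-plans that occur more than once across all plans."""
    counts = Counter(sp for plan in plans for sp in subplans(plan))
    return {sp for sp, n in counts.items() if n > 1}


def count_operators(plans, reuse):
    """Count operator evaluations; with reuse, each distinct sub-plan runs once."""
    seen = set()
    total = 0

    def visit(plan):
        nonlocal total
        if reuse and plan in seen:
            return  # result already available, nothing below is recomputed
        seen.add(plan)
        total += 1
        for child in plan[2:]:
            visit(child)

    for plan in plans:
        visit(plan)
    return total


# Two hypothetical queries that share the same LOAD + FILTER prefix.
load = ("LOAD", "visits")
filt = ("FILTER", "year == 2014", load)
q1 = ("GROUP", "user", filt)
q2 = ("GROUP", "url", filt)

print(shared_subplans([q1, q2]) == {filt, load})  # True
print(count_operators([q1, q2], reuse=False))     # 6
print(count_operators([q1, q2], reuse=True))      # 4
```

Because tuples are hashable, structurally identical sub-plans compare equal and collapse into a single entry, which is what makes the shared prefix visible. In this toy case, sharing it reduces six operator evaluations to four.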