Reuse-based Optimization for Pig Latin
Camacho-Rodríguez, Jesús; Colazzo, Dario; Herschel, Melanie; Manolescu, Ioana; Chowdhury, Soudip Roy (2016), Reuse-based Optimization for Pig Latin, in Mukhopadhyay, Snehasis; Zhai, ChengXiang, Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM'16), ACM Press: New York, pp. 2215-2220. doi:10.1145/2983323.2983669
Type: Conference paper
Conference title: 25th ACM International on Conference on Information and Knowledge Management (CIKM'16)
Conference country: United States
Book title: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM'16)
Book author: Mukhopadhyay, Snehasis; Zhai, ChengXiang
Number of pages: 2512
Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE]
Institut für Parallele und Verteilte Systeme [IPVS]
Laboratoire d'informatique de l'École polytechnique [Palaiseau] [LIX]
Chowdhury, Soudip Roy
Abstract (EN): Pig Latin is a popular language which is widely used for parallel processing of massive data sets. Currently, subexpressions occurring repeatedly in Pig Latin scripts are executed as many times as they appear, and the current Pig Latin optimizer does not identify reuse opportunities. We present a novel optimization approach aiming at identifying and reusing repeated subexpressions in Pig Latin scripts. Our optimization algorithm, named PigReuse, identifies subexpression merging opportunities, selects the best ones to execute based on a cost function, and reuses their results as needed in order to compute exactly the same output as the original scripts. Our experiments demonstrate the effectiveness of our approach.
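The general idea of reusing repeated subexpressions can be illustrated with a small Python sketch. This is not the PigReuse algorithm: the nested-tuple representation of scripts, the function names, and the file names are all hypothetical, and the cost-based selection step described in the abstract is left out entirely. The sketch only shows how identical subexpressions shared by two scripts might be detected and then computed once.

```python
from collections import Counter

# Hypothetical toy model (not Pig's internal plan format): an expression is a
# nested tuple (operator, arg1, arg2, ...); leaves are plain strings.

def subexpressions(expr):
    """Yield expr and every nested operator subexpression."""
    if isinstance(expr, tuple):
        yield expr
        for arg in expr[1:]:
            yield from subexpressions(arg)

def repeated_subexpressions(scripts):
    """Return the subexpressions that occur more than once across all scripts."""
    counts = Counter(s for script in scripts for s in subexpressions(script))
    return {s: n for s, n in counts.items() if n > 1}

def evaluate(expr, cache, log):
    """Evaluate expr, running each distinct operator subexpression only once.

    The result is a descriptive string standing in for real data; log records
    the operators that were actually executed.
    """
    if not isinstance(expr, tuple):
        return expr  # leaf: an input name
    if expr not in cache:
        args = [evaluate(arg, cache, log) for arg in expr[1:]]
        log.append(expr[0])
        cache[expr] = f"{expr[0]}({', '.join(args)})"
    return cache[expr]

# Two scripts that share the same join over a filtered input.
shared_join = ("join", ("filter", ("load", "a.csv")), ("load", "b.csv"))
script1 = ("group", shared_join)
script2 = ("distinct", shared_join)

repeated = repeated_subexpressions([script1, script2])

cache, log = {}, []
results = [evaluate(s, cache, log) for s in (script1, script2)]
```

Here the two scripts contain ten operator occurrences in total, but with the shared cache only six operators are executed, because the join and everything beneath it is computed once and reused. A real optimizer would additionally have to decide which sharing opportunities are worth taking, which is the role of the cost function mentioned in the abstract.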
Subjects / Keywords: MapReduce; Big Data; Pig Latin; Reuse-based Optimization; Linear Programming