Fast nGram-Based String Search Over Data Encoded Using Algebraic Signatures
dc.contributor.author | Litwin, Witold | |
dc.contributor.author | Mokadem, Riad | |
dc.contributor.author | Rigaux, Philippe
HAL ID: 172888 ORCID: 0000-0002-9189-7292 | |
dc.contributor.author | Schwartz, Thomas | |
dc.date.accessioned | 2010-10-21T13:25:42Z | |
dc.date.available | 2010-10-21T13:25:42Z | |
dc.date.issued | 2007 | |
dc.identifier.uri | https://basepub.dauphine.fr/handle/123456789/4957 | |
dc.language.iso | en | en |
dc.subject | databases | en |
dc.subject | algebraic signature | en |
dc.subject | Search algorithm | en |
dc.subject.ddc | 005.7 | en |
dc.title | Fast nGram-Based String Search Over Data Encoded Using Algebraic Signatures | en |
dc.type | Communication / Conférence | |
dc.description.abstracten | We propose a novel string search algorithm for data stored once and read many times. Our search method combines the sublinear traversal of the record (as in Boyer Moore or Knuth-Morris-Pratt) with the agglomeration of parts of the record and search pattern into a single character – the algebraic signature – in the manner of Karp-Rabin. Our experiments show that our algorithm is up to seventy times faster for DNA data, up to eleven times faster for ASCII, and up to a six times faster for XML documents compared with an im- plementation of Boyer-Moore. To obtain this speed-up, we store records in encoded form, where each original character is replaced with an algebraic signature. Our method applies to records stored in databases in general and to distributed implementations of a Database As Service (DAS) in particular. Clients send records for insertion and search patterns already in encoded form and servers never operate on records in clear text. No one at a node can involuntarily discover the content of the stored data. | en |
dc.identifier.citationpages | 207-218 | en |
dc.relation.ispartoftitle | Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23-27, 2007 | en |
dc.relation.ispartofeditor | Koch, Christoph | |
dc.relation.ispartofeditor | Gehrke, Johannes | |
dc.relation.ispartofeditor | Garofalakis, Minos | |
dc.relation.ispartofeditor | Srivastava, Divesh | |
dc.relation.ispartofeditor | Aberer, Karl | |
dc.relation.ispartofeditor | Deshpande, Anand | |
dc.relation.ispartofeditor | Forescu, Daniela | |
dc.relation.ispartofeditor | Chan, Chee Yong | |
dc.relation.ispartofeditor | Ganti, Venkatesh | |
dc.relation.ispartofeditor | Kanne, Karl-Christian | |
dc.relation.ispartofeditor | Klas, Wolfgang | |
dc.relation.ispartofeditor | Neuhold, Erich J. | |
dc.relation.ispartofpublname | AMC | en |
dc.relation.ispartofdate | 2007 | |
dc.description.sponsorshipprivate | oui | en |
dc.subject.ddclabel | Organisation des données | en |
dc.relation.ispartofisbn | 978-1-59593-649-3 | en |
dc.relation.conftitle | 33rd International Conference on Very Large Data Bases VLDB 2007 | en |
dc.relation.confdate | 2007-09 | |
dc.relation.confcity | Vienne | en |
dc.relation.confcountry | Autriche | en |