Identifier
etd-06212017-142105
Degree
Doctor of Philosophy (PhD)
Department
Computer Science
Document Type
Dissertation
Abstract
Let T be a fixed text-string of length n and P be a varying pattern-string of length |P| <= n. Both T and P contain characters from a totally ordered alphabet Sigma of size sigma <= n. The suffix tree is the ubiquitous data structure for answering a pattern matching query: report all positions i in T such that T[i + k - 1] = P[k] for all 1 <= k <= |P|. Compressed data structures support pattern matching queries in far less space than the suffix tree, mainly by relying on a crucial property of the leaves in the tree. Unfortunately, in many suffix tree variants (such as the parameterized suffix tree, the order-preserving suffix tree, and the 2-dimensional suffix tree), this property does not hold. Consequently, compressed representations of these suffix tree variants have remained elusive. We present the first compressed data structures for two important variants of the pattern matching problem: (1) Parameterized Matching: report a position i in T if T[i + k - 1] = f(P[k]) for all 1 <= k <= |P|, for a one-to-one function f that renames the characters in P to the characters in T[i, i + |P| - 1], and (2) Order-preserving Matching: report a position i in T if T[i + j - 1] and T[i + k - 1] have the same relative order as P[j] and P[k] for all 1 <= j < k <= |P|. For each of these two problems, the existing suffix tree variant requires O(n*log n) bits of space and answers a query in O(|P|*log sigma + occ) time, where occ is the number of starting positions where a match exists. We present data structures that require O(n*log sigma) bits of space and answer a query in O((|P| + occ) * poly(log n)) time. As a byproduct, we obtain compressed data structures for a few other variants and introduce two new techniques (of independent interest) for designing compressed data structures for pattern matching.
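The two matching problems defined in the abstract can be illustrated with a brute-force sketch. This is not the dissertation's compressed data structure, only a naive checker that follows the definitions as stated above; the function names are hypothetical, and positions are reported 1-based to match the abstract's indexing.

```python
def parameterized_match(T, P):
    """Naive check: positions i (1-based) where the window T[i..i+|P|-1]
    equals P under some one-to-one renaming f of P's characters."""
    n, m = len(T), len(P)
    out = []
    for i in range(n - m + 1):
        fwd, bwd = {}, {}  # f and its inverse; both must stay consistent
        ok = True
        for k in range(m):
            p, t = P[k], T[i + k]
            if fwd.setdefault(p, t) != t or bwd.setdefault(t, p) != p:
                ok = False
                break
        if ok:
            out.append(i + 1)
    return out


def _ranks(s):
    """Replace each symbol by its rank among the distinct symbols of s."""
    order = {c: r for r, c in enumerate(sorted(set(s)))}
    return [order[c] for c in s]


def order_preserving_match(T, P):
    """Naive check: positions i (1-based) where every pair of symbols in the
    window has the same relative order as the corresponding pair in P."""
    m = len(P)
    target = _ranks(P)
    return [i + 1 for i in range(len(T) - m + 1)
            if _ranks(T[i:i + m]) == target]


if __name__ == "__main__":
    print(parameterized_match("aabbab", "xxy"))
    print(order_preserving_match([10, 20, 15, 30, 25, 40], [1, 3, 2]))
```

Both functions scan every window of T, so they take time at least proportional to n*|P| per query and use no index at all; they are meant only to make the match conditions concrete, in contrast to the indexed query bounds the abstract reports.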
Date
2017
Document Availability at the Time of Submission
Release the entire work immediately for access worldwide.
Recommended Citation
Ganguly, Arnab, "Succinct Data Structures for Parameterized Pattern Matching and Related Problems" (2017). LSU Doctoral Dissertations. 4370.
https://repository.lsu.edu/gradschool_dissertations/4370
Committee Chair
Shah, Rahul
DOI
10.31390/gradschool_dissertations.4370