Document Type
Article
Publication Date
10-14-2013
Abstract
Property matching is a biologically motivated problem in which the task is to find those occurrences of an online pattern P in a string text T (of size n) such that the matched part of the text satisfies some conceptual property. The property of a string is a set π of (possibly overlapping) intervals {(s1,f1), (s2,f2), ...} corresponding to parts of the text, and an occurrence of a pattern P = T[i..(i+|P|-1)] is a valid output only if T[i..(i+|P|-1)] is completely contained in at least one interval (sj,fj) ∈ π. The indexing version of this problem was introduced by A. Amir (2008), where the text is preprocessed in O(n log σ + n log log n) time and an O(n log n)-bit index, named the Property Suffix Tree (PST), is maintained. The PST can perform property matching in O(|P| log σ + occπ) time, where occπ is the number of occurrences of P in T satisfying the property. T. Kopelowitz (2010) considered the dynamic version of this problem, where intervals can be added or deleted. However, all these indexes take space linear in the size of the text (O(n log n) bits), which can be much more than the size of the text itself (n log σ bits). In this paper, we propose the first index for property matching occupying space close to the entropy-compressed space requirement of the text. Our compressed index takes |CSA| + n(2 + ε + o(1)) bits of space and answers queries in O(t(|P|) + (1/ε)(1 + occπ) tSA) time, where |CSA| is the size of the compressed suffix array of T, t(|P|) is the time for searching a pattern of length |P| in the CSA, tSA is the time for computing a suffix array value, and ε > 0 is a constant. We also introduce a dynamic index, which takes |CSA| + O(n + |π| log n) bits of space, answers queries in O(t(|P|) + (1 + occπ) log n (tSA + log n / log log n)) time, and can update (insert/delete) an interval (s,f) in O((f - s)(log n + tSA)) time. © 2013 Elsevier Inc.
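To make the problem definition concrete, the following is a minimal, non-indexed sketch of property matching. It is not the PST or the compressed index described above: occurrences are found by a plain linear scan with str.find rather than a (compressed) suffix array. It assumes 0-based, inclusive intervals (s, f) with 0 <= s <= f < n and a non-empty pattern. The helper names max_reach and property_match are hypothetical. The sketch precomputes, for each text position i, the largest end f of any interval that contains i. An occurrence starting at i is then valid exactly when i + |P| - 1 does not exceed that value.

```python
from typing import List, Tuple


def max_reach(n: int, intervals: List[Tuple[int, int]]) -> List[int]:
    """reach[i] = largest f over intervals (s, f) with s <= i <= f, or -1 if none.

    Assumes 0-based inclusive intervals with 0 <= s <= f < n (hypothetical convention).
    """
    # For each start position, keep only the farthest-reaching interval end.
    best_at_start = [-1] * n
    for s, f in intervals:
        best_at_start[s] = max(best_at_start[s], f)

    # Sweep left to right: `current` is the max end over intervals with s <= i.
    reach = [-1] * n
    current = -1
    for i in range(n):
        current = max(current, best_at_start[i])
        # Position i is covered only if some interval with s <= i also has f >= i.
        reach[i] = current if current >= i else -1
    return reach


def property_match(text: str, pattern: str, intervals: List[Tuple[int, int]]) -> List[int]:
    """Return start positions of occurrences of `pattern` in `text` that lie
    completely inside at least one interval. Naive scan, not an index."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    reach = max_reach(len(text), intervals)
    result = []
    i = text.find(pattern)
    while i != -1:
        # T[i .. i+m-1] is contained in some interval iff i+m-1 <= reach[i].
        if i + m - 1 <= reach[i]:
            result.append(i)
        i = text.find(pattern, i + 1)  # i + 1 so overlapping occurrences are kept
    return result
```

For example, property_match("abababab", "aba", [(0, 3), (4, 7)]) finds the occurrences at 0, 2 and 4 but keeps only 0 and 4, because T[2..4] straddles the two intervals. Reducing containment to a single comparison per occurrence is what lets the filtering cost stay proportional to the number of candidate occurrences; the indexes in the abstract additionally avoid scanning the text to find those candidates.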
Publication Source (Journal or Book title)
Information and Computation
First Page
10
Last Page
18
Recommended Citation
Hon, W., Patil, M., Shah, R., & Thankachan, S. (2013). Compressed property suffix trees. Information and Computation, 232, 10-18. https://doi.org/10.1016/j.ic.2013.09.001