Document Type
Article
Publication Date
8-16-2015
Abstract
We consider the problem of indexing a collection 𝒟 of D strings (documents) with a total of n characters over an alphabet of size σ, such that whenever a pattern P (of p characters) and an integer τ ∈ [1, D] arrive as a query, we can efficiently report all (i) maximal generic words and (ii) minimal discriminating words, defined as follows. A maximal generic word is a maximal extension of P occurring in at least τ documents. A minimal discriminating word is a minimal extension of P occurring in at most τ documents. These problems were introduced by Kucherov et al. (SPIRE) [8], who proposed indexes occupying O(n log n) bits with query times O(p + output) and O(p + log log n + output) for Problem (i) and Problem (ii), respectively. The query time for Problem (ii) was later improved to the optimal O(p + output) by Gawrychowski et al. (SPIRE) [6]. In this paper, we describe succinct indexes occupying n log σ + o(n log σ) + O(n) bits of space with near-optimal query time, i.e., O(p + log log n + output), for both problems.
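The two query types defined in the abstract can be illustrated with a naive brute-force sketch. It is not the succinct index described in the paper and makes no attempt to match its space or time bounds. It assumes that "extension" means a right extension (P is a prefix of the reported word) and that a discriminating word must occur in at least one document. All function names are hypothetical.

```python
def doc_frequency(docs, word):
    """Number of documents that contain `word` as a substring."""
    return sum(1 for d in docs if word in d)


def right_extensions(docs, pattern):
    """All distinct substrings of the documents that start with `pattern`."""
    exts = set()
    for d in docs:
        start = d.find(pattern)
        while start != -1:
            for end in range(start + len(pattern), len(d) + 1):
                exts.add(d[start:end])
            start = d.find(pattern, start + 1)
    return exts


def maximal_generic_words(docs, pattern, tau):
    """Right extensions of `pattern` in >= tau documents that cannot be
    extended by one more character and still occur in >= tau documents."""
    exts = right_extensions(docs, pattern)
    generic = {w for w in exts if doc_frequency(docs, w) >= tau}
    # Document frequency never increases under extension, so checking
    # one-character extensions is enough to establish maximality.
    return sorted(
        w for w in generic
        if not any(len(v) == len(w) + 1 and v.startswith(w) for v in generic)
    )


def minimal_discriminating_words(docs, pattern, tau):
    """Right extensions of `pattern` in at most tau (and at least one)
    documents whose one-character-shorter prefix is not discriminating."""
    exts = right_extensions(docs, pattern)
    disc = {w for w in exts if 1 <= doc_frequency(docs, w) <= tau}
    # Assumption: the pattern itself counts as a minimal answer if it is
    # already discriminating.
    return sorted(
        w for w in disc
        if len(w) == len(pattern) or w[:-1] not in disc
    )


docs = ["abca", "abcb", "abd"]
print(maximal_generic_words(docs, "ab", 2))         # ['abc']
print(minimal_discriminating_words(docs, "ab", 1))  # ['abca', 'abcb', 'abd']
```

In this toy collection, "ab" occurs in all three documents and "abc" in two, so with τ = 2 the only maximal generic word extending "ab" is "abc". With τ = 1, each of "abca", "abcb", and "abd" occurs in exactly one document while its one-character-shorter prefix occurs in more than one.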
Publication Source (Journal or Book title)
Theoretical Computer Science
First Page
165
Last Page
173
Recommended Citation
Biswas, S., Patil, M., Shah, R., & Thankachan, S. (2015). Succinct indexes for reporting discriminating and generic words. Theoretical Computer Science, 593, 165-173. https://doi.org/10.1016/j.tcs.2015.06.007