Document Type
Article
Publication Date
7-1-2011
Abstract
This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text $T$ of length $n$ and a positive integer $k$, we want to construct an index of $T$ such that for any input pattern $P$, we can find all its $k$-error matches in $T$ efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies $O((n \log^k n)/B)$ disk pages and finds all $k$-error matches with $O((|P| + occ)/B + \log^k n \log\log_B n)$ I/Os, where $B$ denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require $\Omega(|P| + occ + \mathrm{poly}(\log n))$ I/Os. The second index reduces the space to $O((n \log n)/B)$ disk pages, and the I/O complexity is $O((|P| + occ)/B + \log^{k(k+1)} n \log\log n)$. © 2011 Elsevier B.V. All rights reserved.
Publication Source (Journal or Book title)
Theoretical Computer Science
First Page
3579
Last Page
3588
Recommended Citation
Hon, W., Lam, T., Shah, R., Tam, S., & Vitter, J. (2011). Cache-oblivious index for approximate string matching. Theoretical Computer Science, 412(29), 3579-3588. https://doi.org/10.1016/j.tcs.2011.03.004