Compressed text indexing with wildcards

Document Type

Conference Proceeding

Publication Date

10-17-2011

Abstract

Let $T = T_1 \phi^{k_1} T_2 \phi^{k_2} \cdots \phi^{k_d} T_{d+1}$ be a text of total length $n$, where the characters of each $T_i$ are chosen from an alphabet $\Sigma$ of size $\sigma$, and $\phi$ denotes a wildcard symbol. The text indexing with wildcards problem is to index $T$ so that, given a query pattern $P$, we can locate the occurrences of $P$ in $T$ efficiently. This problem has been applied to indexing genomic sequences that contain single-nucleotide polymorphisms (SNPs), because SNPs can be modeled as wildcards. Recently, Tam et al. (2009) and Thachuk (2011) proposed succinct indexes for this problem. In this paper, we present the first compressed index for this problem, which takes only $nH_h + o(n \log \sigma) + O(d \log n)$ bits of space, where $H_h$ is the $h$th-order empirical entropy of $T$ and $h = o(\log_\sigma n)$. © 2011 Springer-Verlag.
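
To make the problem statement concrete, the following is a minimal brute-force sketch of what it means for a pattern to occur in a text with wildcards. It is not the compressed index described in the abstract; it is only a reference for the matching semantics, assuming that a wildcard position in the text matches any pattern character. The names `WILDCARD`, `build_text`, and `naive_occurrences`, and the use of `?` as a stand-in for the wildcard symbol, are illustrative assumptions and do not come from the paper.

```python
# Brute-force illustration of matching against a text with wildcards.
# Assumption: "?" stands in for the wildcard symbol and matches any character.
WILDCARD = "?"


def build_text(segments, gaps):
    """Concatenate T_1, k_1 wildcards, T_2, ..., k_d wildcards, T_{d+1}."""
    if len(segments) != len(gaps) + 1:
        raise ValueError("need exactly one more segment than wildcard groups")
    parts = []
    for segment, k in zip(segments, gaps):
        parts.append(segment)
        parts.append(WILDCARD * k)
    parts.append(segments[-1])
    return "".join(parts)


def naive_occurrences(text, pattern):
    """Return every start position where pattern matches text.

    Runs in O(n * |P|) time and uses no index at all; it only
    defines the answer that an index is expected to report.
    """
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        if all(t == WILDCARD or t == p for t, p in zip(window, pattern)):
            hits.append(i)
    return hits


text = build_text(["AC", "GT", "A"], [1, 2])  # "AC?GT??A"
print(naive_occurrences(text, "CAG"))
print(naive_occurrences(text, "TA"))
```

Here the pattern `CAG` matches only by aligning its middle character with a wildcard, which is the kind of occurrence that an ordinary text index over the segments alone would miss.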

Publication Source (Journal or Book title)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

First Page

267

Last Page

277
