Faculty Publications

Compressed dictionary matching with one error

Wing Kai Hon, National Tsing Hua University
Tsung Han Ku, National Tsing Hua University
Rahul Shah, Louisiana State University
Sharma V. Thankachan, Louisiana State University
Jeffrey Scott Vitter, KU School of Engineering

Document Type

Conference Proceeding

Publication Date

5-12-2011

Abstract

Given a set $D$ of $d$ patterns of total length $n$, the dictionary matching problem is to index $D$ such that, for any query text $T$, we can efficiently locate the occurrences of any pattern within $T$. This problem can be solved in optimal $O(|T| + occ)$ time by the classical AC automaton (Aho and Corasick, 1975), where $occ$ denotes the number of occurrences. The space requirement is $O(n)$ words. In the approximate dictionary matching problem with one error, we consider a substring $T[i..j]$ of $T$ an occurrence of a pattern $P$ whenever the edit distance between $T[i..j]$ and $P$ is at most one. For this problem, the best known indexes are by Cole et al. (2004), which requires $O(n + d\log d)$ words of space and reports all occurrences in $O(|T|\log d \log\log d + occ)$ time, and by Ferragina et al. (1999), which requires $O(n^{1+\epsilon})$ words of space and reports all occurrences in $O(|T|\log\log n + occ)$ time. Recently, there have been successes in compressing the dictionary matching index while keeping the query time optimal (Belazzougui, 2010; Hon et al., 2010). However, a compressed index for the approximate dictionary matching problem is still open. In this paper, we propose the first such index, which requires an optimal $nH_k + O(n) + o(n\log\sigma)$ bits of index space, where $H_k$ denotes the $k$th-order empirical entropy of $D$, and $\sigma$ is the size of the alphabet from which all the characters in $D$ and $T$ are drawn. The query time of our index is $O(\sigma|T|\log^3 n \log\log n + occ)$. © 2011 IEEE.
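
To make the one-error occurrence definition concrete, here is a minimal brute-force sketch in Python. It is not the compressed index proposed in the paper. It simply checks every window of length $|P|-1$, $|P|$, or $|P|+1$ for every pattern, which takes roughly $O(|T| \cdot n)$ time overall. The helper names `within_one_edit` and `one_error_occurrences` are hypothetical. The sketch assumes 0-based, half-open intervals `[i, j)` and does not report empty substrings; both choices are labeled as assumptions in the comments.

```python
from typing import List, Tuple


# Hypothetical helper: True iff edit distance(a, b) <= 1
# (at most one substitution, insertion, or deletion).
def within_one_edit(a: str, b: str) -> bool:
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: at most one substitution is allowed.
        return sum(x != y for x, y in zip(a, b)) <= 1
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter string
    # `b` is exactly one character longer: skip the common prefix,
    # then the rest of `a` must equal `b` with one character dropped.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    return a[i:] == b[i + 1:]


def one_error_occurrences(dictionary: List[str], text: str) -> List[Tuple[int, int, int]]:
    """Brute-force reference (hypothetical): report every non-empty substring
    text[i:j] within edit distance 1 of some pattern, as (i, j, pattern_index).
    Intervals are 0-based and half-open (an assumption of this sketch)."""
    occ = []
    for pid, pattern in enumerate(dictionary):
        m = len(pattern)
        # A one-error occurrence of P has length |P| - 1, |P|, or |P| + 1.
        for length in (m - 1, m, m + 1):
            if length < 1:
                continue  # assumption: empty substrings are not reported
            for i in range(len(text) - length + 1):
                if within_one_edit(text[i:i + length], pattern):
                    occ.append((i, i + length, pid))
    return sorted(occ)


if __name__ == "__main__":
    print(one_error_occurrences(["abc", "bd"], "abdc"))
```

For example, with $D = \{abc, bd\}$ and $T = abdc$, the substring `abd` (interval `[0, 3)`) is reported for both patterns: it is one substitution away from `abc` and one deletion away from `bd`.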

Publication Source (Journal or Book title)

Data Compression Conference Proceedings

First Page

113

Last Page

122

Recommended Citation

Hon, W., Ku, T., Shah, R., Thankachan, S., & Vitter, J. (2011). Compressed dictionary matching with one error. Data Compression Conference Proceedings, 113-122. https://doi.org/10.1109/DCC.2011.18
