Faster compressed top-k document retrieval
Document Type
Conference Proceeding
Publication Date
8-8-2013
Abstract
Let D = {d_1, d_2, ..., d_D} be a given collection of D string documents of total length n. Our task is to index D so that, whenever a pattern P (of length p) and an integer k arrive as a query, the k documents in which P appears most often can be listed efficiently. In this paper, we propose a compressed index taking 2|CSA| + D log(n/D) + O(D) + o(n) bits of space, which answers a query with O(t_sa log k log^ε n) per-document report time. This improves the O(t_sa log k log^{1+ε} n) per-document report time of the previously best-known index with (asymptotically) the same space requirements [Belazzougui and Navarro, SPIRE 2011]. Here, |CSA| represents the size (in bits) of the compressed suffix array (CSA) of the text obtained by concatenating all documents in D, and t_sa is the time for decoding a suffix array value using the CSA. © 2013 IEEE.
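To make the query semantics concrete, the following is a minimal sketch of a naive baseline. It is not the compressed index proposed in the paper: it scans every document for each query and uses no suffix array. The function names `count_occurrences` and `naive_top_k`, the counting of overlapping occurrences, and the tie-breaking rule are assumptions made for illustration only.

```python
import heapq


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, overlapping ones included."""
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one character so overlapping matches are counted too.
        start = text.find(pattern, start + 1)
    return count


def naive_top_k(documents: list[str], pattern: str, k: int) -> list[tuple[int, int]]:
    """Return up to k (document index, frequency) pairs, most frequent first.

    Documents that do not contain the pattern are skipped. Ties are broken
    by the smaller document index, an arbitrary choice for this sketch.
    """
    scored = []
    for doc_id, doc in enumerate(documents):
        freq = count_occurrences(doc, pattern)
        if freq > 0:
            scored.append((freq, doc_id))
    # Select the k best entries: highest frequency, then lowest index.
    best = heapq.nsmallest(k, scored, key=lambda item: (-item[0], item[1]))
    return [(doc_id, freq) for freq, doc_id in best]
```

For example, `naive_top_k(["abracadabra", "banana", "cabana", "xyz"], "a", 2)` reports documents 0 and 1. This baseline takes time proportional to the total length n on every query, which is the cost an index of the kind described in the abstract is designed to avoid.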
Publication Source (Journal or Book title)
Data Compression Conference Proceedings
First Page
341
Last Page
350
Recommended Citation
Hon, W., Shah, R., Thankachan, S., & Vitter, J. (2013). Faster compressed top-k document retrieval. Data Compression Conference Proceedings, 341-350. https://doi.org/10.1109/DCC.2013.42