Faculty Publications

Ranked Document Retrieval in External Memory

Rahul Shah, LSU College of Engineering
Cheng Sheng, Chinese University of Hong Kong
Sharma Thankachan, NC State College of Engineering
Jeffrey Vitter, School of Engineering

Document Type

Article

Publication Date

3-9-2023

Abstract

The ranked (or top-k) document retrieval problem is defined as follows: preprocess a collection {T_1, T_2, ..., T_d} of d strings (called documents) of total length n into a data structure such that, for any given query (P, k), where P is a string (called the pattern) of length p ≥ 1 and k ∈ [1, d] is an integer, the identifiers of the k documents most relevant to P can be reported, ideally in sorted order of relevance. The seminal work by Hon et al. [FOCS 2009 and Journal of the ACM 2014] presented an O(n)-space (in words) data structure with O(p + k log k) query time. The query time was later improved to O(p + k) [SODA 2012] and further to O(p/log_σ n + k) [SIAM Journal on Computing 2017] by Navarro and Nekrich, where σ is the alphabet size. We revisit this problem in the external memory model and present three data structures. The first takes O(n) space and answers queries in O(p/B + log_B n + k/B + log*(n/B)) I/Os, where B is the block size. The second takes O(n log*(n/B)) space and answers queries in the optimal O(p/B + log_B n + k/B) I/Os. In both cases, the answers are not reported in sorted order of relevance. To handle sorted top-k document retrieval, we present an O(n log(d/B))-space data structure with optimal query cost.
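To make the query (P, k) concrete, the following is a brute-force Python sketch of what a top-k query must return. It is not the external-memory data structure from the paper: it simply scans every document at query time. It assumes that relevance is term frequency (the number of occurrences of P in T_i), which the abstract does not specify, and it skips documents that do not contain P, so fewer than k identifiers may be returned. The names count_occurrences and top_k_documents are hypothetical.

```python
import heapq
from typing import List, Tuple


def count_occurrences(text: str, pattern: str) -> int:
    """Count (possibly overlapping) occurrences of pattern in text."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)  # step by one to allow overlaps
    return count


def top_k_documents(docs: List[str], pattern: str, k: int) -> List[int]:
    """Return 1-based identifiers of the k documents most relevant to pattern.

    ASSUMPTION: relevance = term frequency of pattern in the document.
    Documents with zero occurrences are skipped, so the result may be
    shorter than k. Results are sorted by decreasing relevance, with ties
    broken by the smaller identifier.
    """
    if not pattern:
        raise ValueError("pattern must have length p >= 1")
    if not 1 <= k <= len(docs):
        raise ValueError("k must lie in [1, d]")

    scored: List[Tuple[int, int]] = []  # (term frequency, document id)
    for doc_id, text in enumerate(docs, start=1):
        tf = count_occurrences(text, pattern)
        if tf > 0:
            scored.append((tf, doc_id))

    # Highest frequency first; smaller id wins ties.
    best = heapq.nsmallest(k, scored, key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in best]


if __name__ == "__main__":
    docs = ["banana", "ananas", "bandana", "cabana"]
    print(top_k_documents(docs, "ana", 2))  # [1, 2]
    print(top_k_documents(docs, "nd", 3))   # [3]
```

This scan reads all n characters of the collection on every query, which costs at least about n/B I/Os in the external memory model. The data structures described above instead answer queries in O(p/B + log_B n + k/B) I/Os (plus log*(n/B) for the linear-space variant), with only a logarithmic dependence on n.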

Publication Source (Journal or Book title)

ACM Transactions on Algorithms

Recommended Citation

Shah, R., Sheng, C., Thankachan, S., & Vitter, J. (2023). Ranked Document Retrieval in External Memory. ACM Transactions on Algorithms, 19(1). https://doi.org/10.1145/3559763