Ranked document selection
Document Type
Conference Proceeding
Publication Date
1-1-2014
Abstract
Let D be a collection of string documents of n characters in total. The top-k document retrieval problem is to preprocess D into a data structure that, given a query (P,k), can return the k documents of D most relevant to pattern P. The relevance of a document d for a pattern P is given by a predefined ranking function w(P,d). Linear space and optimal query time solutions already exist for this problem. In this paper we consider a novel problem, document selection queries, which aim to report the kth document most relevant to P (instead of reporting all top-k documents). We present a data structure using O(n log^ε n) space, for any constant ε > 0, answering selection queries in time O(log k / log log n), and a linear-space data structure answering queries in time O(log k), given the locus node of P in a (generalized) suffix tree of D. We also prove that it is unlikely that a succinct-space solution for this problem exists with poly-logarithmic query time. © 2014 Springer International Publishing.
Publication Source (Journal or Book title)
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
First Page
344
Last Page
356
Recommended Citation
Munro, J., Navarro, G., Shah, R., & Thankachan, S. (2014). Ranked document selection. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8503 LNCS, 344-356. https://doi.org/10.1007/978-3-319-08404-6_30