Top-k term-proximity in succinct space

Document Type

Article

Publication Date

1-1-2014

Abstract

Let D = {T1, T2, ..., TD} be a collection of D string documents of n characters in total, drawn from an alphabet set Σ = [σ]. The top-k document retrieval problem is to preprocess D into a data structure that, given a query (P[1..p], k), can return the k documents of D most relevant to pattern P. The relevance is captured using a predefined ranking function, which depends on the set of occurrences of P in a document Td. For example, it can be the term frequency (i.e., the number of occurrences of P in Td), the term proximity (i.e., the distance between the closest pair of occurrences of P in Td), or a pattern-independent importance score of Td such as PageRank. Linear-space solutions with optimal query time already exist for this problem. Compressed and compact space solutions are also known, but only for a few ranking functions such as term frequency and importance. However, space-efficient data structures for term-proximity-based retrieval have been elusive. In this paper we present the first sub-linear space data structure for this relevance function, which uses only o(n) bits on top of any compressed suffix array of D and solves queries in time O((p+k) polylog n).
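The term-proximity ranking defined in the abstract can be made concrete with a brute-force baseline. The sketch below is not the paper's data structure: it scans every document on each query and uses no suffix array or compressed index. The function names (`occurrences`, `term_proximity`, `top_k_by_proximity`) are hypothetical, and two conventions are assumptions made here for illustration: a smaller distance is treated as more relevant, and documents with fewer than two occurrences of the pattern are left unranked.

```python
import heapq


def occurrences(text, pattern):
    """Return the start positions of all (possibly overlapping) occurrences."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def term_proximity(text, pattern):
    """Distance between the closest pair of occurrences of pattern in text.

    Returns None when the pattern occurs fewer than two times
    (an assumption of this sketch, not a convention from the paper).
    """
    pos = occurrences(text, pattern)
    if len(pos) < 2:
        return None
    # Positions are sorted, so the closest pair is always adjacent.
    return min(b - a for a, b in zip(pos, pos[1:]))


def top_k_by_proximity(documents, pattern, k):
    """Return up to k (proximity, document index) pairs, smallest proximity first."""
    scored = []
    for doc_id, text in enumerate(documents):
        tp = term_proximity(text, pattern)
        if tp is not None:
            scored.append((tp, doc_id))
    # Ties on proximity are broken by the lower document index.
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    docs = ["abracadabra", "banana", "xyz", "aa"]
    print(top_k_by_proximity(docs, "a", 2))
```

This baseline spends time proportional to the total collection size on every query, which is the cost that an indexed solution with query time O((p+k) polylog n) is meant to avoid.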

Publication Source (Journal or Book title)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

First Page

169

Last Page

180

Recommended Citation

Munro, J., Navarro, G., Nielsen, J., Shah, R., & Thankachan, S. (2014). Top-k term-proximity in succinct space. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8889, 169-180. https://doi.org/10.1007/978-3-319-13075-0_14

Faculty Publications
