Ranked document retrieval with forbidden pattern
Document Type
Conference Proceeding
Publication Date
1-1-2015
Abstract
Let 𝒟 = {T_1, T_2, ..., T_D} be a collection of D string documents of n characters in total. The forbidden pattern document listing problem asks to report the documents 𝒟′ ⊆ 𝒟 that contain the pattern P but not the pattern Q. The top-k forbidden pattern query (P, Q, k) asks to report the k documents in 𝒟′ that are most relevant to P. For typical relevance functions (such as document importance, term frequency, and term proximity), we present a linear-space index with worst-case query time O(|P| + |Q| + √(nk)) for the top-k problem. As a corollary, we obtain a linear-space solution with O(|P| + |Q| + √(nt)) query time for the document listing problem, where t is the number of documents reported. We conjecture that any significant improvement over the results in this paper is highly unlikely.
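To make the query semantics in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the paper's linear-space index and does not achieve the stated query time: it scans every document on each query. The function names (count_occurrences, top_k_forbidden), the sample documents, the choice of term frequency as the relevance function, and the tie-breaking by document index are all assumptions made for illustration. Both patterns are assumed to be non-empty.

```python
from typing import List, Tuple


def count_occurrences(text: str, pattern: str) -> int:
    """Count (possibly overlapping) occurrences of pattern in text."""
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def top_k_forbidden(docs: List[str], p: str, q: str, k: int) -> List[Tuple[int, int]]:
    """Naive baseline for a top-k forbidden pattern query (P, Q, k).

    Returns up to k pairs (document index, term frequency of p) for the
    documents that contain p but not q, most relevant first. Relevance is
    assumed here to be the term frequency of p.
    """
    candidates = []
    for i, text in enumerate(docs):
        tf = count_occurrences(text, p)
        if tf > 0 and q not in text:
            candidates.append((i, tf))
    # Highest term frequency first; ties broken by document index (an assumption).
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:k]


# Hypothetical sample collection: "banana" is excluded because it contains "b".
docs = ["banana", "ananas", "canal", "nana"]
print(top_k_forbidden(docs, "an", "b", 2))
```

Calling the same function with k equal to the number of documents gives a naive answer to the forbidden pattern document listing problem, since every qualifying document is then reported.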
Publication Source (Journal or Book title)
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
First Page
77
Last Page
88
Recommended Citation
Biswas, S., Ganguly, A., Shah, R., & Thankachan, S. (2015). Ranked document retrieval with forbidden pattern. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9133, 77-88. https://doi.org/10.1007/978-3-319-19929-0_7