Ranked document retrieval with forbidden pattern

Document Type

Conference Proceeding

Publication Date

1-1-2015

Abstract

Let D = {T_1, T_2, ..., T_D} be a collection of D string documents of n characters in total. The forbidden pattern document listing problem asks to report those documents D′ ⊆ D that contain the pattern P but not the pattern Q. The top-k forbidden pattern query (P, Q, k) asks to report the k documents in D′ that are most relevant to P. For typical relevance functions (such as document importance, term frequency, and term proximity), we present a linear-space index with worst-case query time O(|P| + |Q| + √(nk)) for the top-k problem. As a corollary of this result, we obtain a linear-space solution with O(|P| + |Q| + √(nt)) query time for the document listing problem, where t is the number of documents reported. We conjecture that any significant improvement over the results in this paper is highly unlikely.
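To make the two query types concrete, here is a minimal brute-force sketch. It is not the paper's index: it scans every document on each query (roughly O(n) time, no preprocessing), and it assumes term frequency of P as the relevance function, with ties broken by document index. The function names `count_occurrences`, `forbidden_listing`, and `forbidden_top_k` are hypothetical and chosen only for illustration.

```python
from typing import List


def count_occurrences(text: str, pattern: str) -> int:
    """Count possibly overlapping occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)  # advance by 1 to allow overlaps
    return count


def forbidden_listing(docs: List[str], p: str, q: str) -> List[int]:
    """Indices of the documents that contain p but not q (the set D')."""
    return [i for i, t in enumerate(docs) if p in t and q not in t]


def forbidden_top_k(docs: List[str], p: str, q: str, k: int) -> List[int]:
    """Top-k documents of D', assuming relevance = term frequency of p.

    Ties are broken by document index (an arbitrary assumption).
    """
    candidates = forbidden_listing(docs, p, q)
    ranked = sorted(candidates, key=lambda i: (-count_occurrences(docs[i], p), i))
    return ranked[:k]


if __name__ == "__main__":
    docs = ["banana", "bandana", "cabana", "anagram"]
    print(forbidden_listing(docs, "ana", "band"))   # [0, 2, 3]
    print(forbidden_top_k(docs, "ana", "band", 2))  # [0, 2]
```

Here "bandana" contains P = "ana" but is excluded because it also contains the forbidden pattern Q = "band"; among the remaining documents, "banana" ranks first with two (overlapping) occurrences of P. The paper's contribution is answering the same queries in O(|P| + |Q| + √(nk)) time with a linear-space index instead of a full scan.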

Publication Source (Journal or Book title)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

First Page

77

Last Page

88

