SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
TREC-2 Routing and Ad-Hoc Retrieval Evaluation using the INQUERY System
chapter
W. Croft
J. Callan
J. Broglio
National Institute of Standards and Technology
D. K. Harman
TREC-2 Routing and Ad-Hoc Retrieval Evaluation using
the INQUERY System
Bruce Croft, James Callan, and John Broglio
Computer Science Department
University of Massachusetts
Amherst, MA 01003
1 Project Goals
The ARPA TIPSTER project, which is the source of the data and funding for TREC,
has involved four sites in the area of text retrieval and routing. The TIPSTER project
in the Information Retrieval Laboratory of the Computer Science Department, University
of Massachusetts, Amherst (which includes MCC as a subcontractor), has focused on the
following goals:
* Improving the effectiveness of information retrieval techniques for large, full-text
databases,
* Improving the effectiveness of routing techniques appropriate for long-term
information needs, and
* Demonstrating the effectiveness of these retrieval and routing techniques for Japanese
full-text databases [4].
Our general approach to achieving these goals has been to use improved representations
of text and information needs in the framework of a new model of retrieval. This model
uses Bayesian networks to describe how text and queries should be used to identify relevant
documents [6, 3, 7]. Retrieval (and routing) is viewed as a probabilistic inference process
which compares text representations based on different forms of linguistic and statistical
evidence to representations of information needs based on similar evidence from natural
language queries and user interaction. Learning techniques are used to modify the
initial queries both for short-term and long-term information needs (relevance feedback and
routing, respectively).
This approach (generally known as the inference net model and implemented in the
INQUERY system) emphasizes retrieval based on combination of evidence. Different text
representations (such as words, phrases, paragraphs, or manually assigned keywords) and
different versions of the query (such as natural language and Boolean) can be combined
in a consistent probabilistic framework. This type of "data fusion" has been known to be
effective in the information retrieval context for a number of years, and was one of the
primary motivations for developing the inference net approach.
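
As an illustration of combining evidence, the sketch below merges the beliefs
contributed by several representations of a single document into one score. It is
a minimal sketch, not the INQUERY implementation: the closed-form operators
(product for AND, complement of products for OR, weighted average for a weighted
sum) are forms commonly described for inference networks and are assumed here
rather than taken from this section. The function names, belief values, and
weights are hypothetical.

```python
from math import prod


def bel_and(beliefs):
    """Belief that all pieces of evidence hold (assumed form: product)."""
    return prod(beliefs)


def bel_or(beliefs):
    """Belief that at least one piece of evidence holds (assumed form)."""
    return 1.0 - prod(1.0 - b for b in beliefs)


def bel_not(belief):
    """Belief that the evidence does not hold."""
    return 1.0 - belief


def bel_wsum(weights, beliefs):
    """Weighted average of beliefs; weights need not sum to one."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * b for w, b in zip(weights, beliefs)) / total


# Hypothetical beliefs for one document from three text representations:
# single words, phrases, and manually assigned keywords.
word_belief, phrase_belief, keyword_belief = 0.6, 0.8, 0.4

# Hypothetical weights: phrase evidence is trusted twice as much.
combined = bel_wsum([1.0, 2.0, 1.0],
                    [word_belief, phrase_belief, keyword_belief])
print(f"combined belief: {combined:.2f}")
```

Because every operator maps beliefs in [0, 1] to a belief in [0, 1], the outputs
can be nested, so evidence from a Boolean form of a query and a natural language
form of the same query can be scored in the same framework.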
Another feature of the inference net approach is the ability to capture complex structure
in the network representing the information need (i.e. the query). A practical consequence