NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2), D. K. Harman (ed.), National Institute of Standards and Technology

TREC-2 Routing and Ad-Hoc Retrieval Evaluation using the INQUERY System

Bruce Croft, James Callan, and John Broglio
Computer Science Department
University of Massachusetts
Amherst, MA 01003

1 Project Goals

The ARPA TIPSTER project, which is the source of the data and funding for TREC, has involved four sites in the area of text retrieval and routing. The TIPSTER project in the Information Retrieval Laboratory of the Computer Science Department, University of Massachusetts, Amherst (which includes MCC as a subcontractor), has focused on the following goals:

* Improving the effectiveness of information retrieval techniques for large, full-text databases,
* Improving the effectiveness of routing techniques appropriate for long-term information needs, and
* Demonstrating the effectiveness of these retrieval and routing techniques for Japanese full-text databases [4].

Our general approach to achieving these goals has been to use improved representations of text and information needs in the framework of a new model of retrieval. This model uses Bayesian networks to describe how text and queries should be used to identify relevant documents [6, 3, 7]. Retrieval (and routing) is viewed as a probabilistic inference process that compares text representations, based on different forms of linguistic and statistical evidence, to representations of information needs based on similar evidence from natural language queries and user interaction. Learning techniques are used to modify the initial queries for both short-term and long-term information needs (relevance feedback and routing, respectively).
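The idea of scoring a document by combining beliefs derived from different kinds of evidence can be illustrated with a small sketch. It assumes INQUERY-style combination operators (probabilistic AND and OR, NOT, an unweighted average, and a weighted average); the function names, the operator formulas, and the belief values are illustrative assumptions drawn from the general inference-net literature, not details given in this section. The sketch also omits how per-term beliefs would be estimated from document text.

```python
import math

# Hypothetical belief-combination operators in the style of inference-net
# query operators. Each takes beliefs in [0, 1] contributed by different
# pieces of evidence and returns a single combined belief.

def op_and(beliefs):
    """Probabilistic AND: product of beliefs (assumes independent evidence)."""
    return math.prod(beliefs)

def op_or(beliefs):
    """Probabilistic OR: 1 minus the probability that no evidence holds."""
    return 1.0 - math.prod(1.0 - b for b in beliefs)

def op_not(belief):
    """Probabilistic NOT: complement of a single belief."""
    return 1.0 - belief

def op_sum(beliefs):
    """Unweighted average of beliefs."""
    return sum(beliefs) / len(beliefs)

def op_wsum(weighted):
    """Weighted average; `weighted` is a list of (weight, belief) pairs."""
    total = sum(w for w, _ in weighted)
    return sum(w * b for w, b in weighted) / total

# Invented per-representation beliefs for two documents (illustrative only,
# not data from the paper): two single-word features and one phrase feature.
docs = {
    "d1": {"text": 0.6, "retrieval": 0.5, "text retrieval": 0.7},
    "d2": {"text": 0.4, "retrieval": 0.4, "text retrieval": 0.4},
}

def score(beliefs):
    """Combine word-level and phrase-level evidence for one document."""
    words = op_sum([beliefs["text"], beliefs["retrieval"]])
    return op_wsum([(2.0, words), (1.0, beliefs["text retrieval"])])

# Rank documents by combined belief, highest first.
ranking = sorted(docs, key=lambda d: score(docs[d]), reverse=True)
print(ranking)
```

The weighted average lets phrase evidence contribute alongside word evidence without a single weak piece of evidence driving the whole score toward zero, which is what a product-based AND would tend to do.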
This approach (generally known as the inference net model and implemented in the INQUERY system) emphasizes retrieval based on the combination of evidence. Different text representations (such as words, phrases, paragraphs, or manually assigned keywords) and different versions of the query (such as natural language and Boolean) can be combined in a consistent probabilistic framework. This type of "data fusion" has been known to be effective in the information retrieval context for a number of years, and was one of the primary motivations for developing the inference net approach. Another feature of the inference net approach is the ability to capture complex structure in the network representing the information need (i.e., the query). A practical consequence