NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)

Application of the Automatic Message Router to the TIPSTER Collection

R. Jones, S. Leung, D.L. Pape

National Institute of Standards and Technology
Donna K. Harman

The AMR Project

The AMR Project set out to develop viable techniques that operate in an electronic mail or wire service environment. In these environments the roles of query and document are reversed relative to document retrieval. Users have a relatively long-term interest in a topic and wish to have documents relevant to that topic passed to them as soon as possible. Documents, however, are of short-term interest and lose their value rapidly with time unless they have been routed to someone with a specific interest in them. Of course, all documents may also be routed to a document retrieval system for more general historical access.

AMR reflects this exchange of roles by inverting the queries (referred to as filters) and passing the documents against them one at a time. AMR allows filters to be prepared either in a structured form, where each filter term represents a set of synonymous terms, or as a plain English statement of the information desired. The structured form was used for all the filters in the experiment. The filters are inverted into memory for performance reasons.

AMR computes the relevance of each document using a set of heuristics that take into account the number of different terms in a filter and their relative positioning at the paragraph level. Each term is automatically weighted by estimating its effectiveness as a discriminator, i.e. its ability to divide the universe of documents into two groups: those that are relevant and those that are not. Under the routing model described above, a decision on the fate of a document must be made immediately. The universe of documents is therefore not constant but changes as each document is filtered.
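
The inversion of structured filters can be illustrated with a minimal sketch. It assumes a hypothetical representation in which each filter is a list of concepts and each concept is a set of synonymous terms; the names `invert_filters` and `match_document` and the sample filters are illustrative only and do not come from AMR. The sketch covers matching alone: the paragraph-level positioning heuristics and the term weighting are not specified here and are not modelled.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?()\"'"  # characters stripped from token edges (assumed)


def invert_filters(filters):
    """Build an in-memory index: term -> set of (filter_id, concept_no).

    `filters` maps a filter id to a list of concepts; each concept is a
    set of synonymous terms (hypothetical representation).
    """
    index = defaultdict(set)
    for filter_id, concepts in filters.items():
        for concept_no, synonyms in enumerate(concepts):
            for term in synonyms:
                index[term.lower()].add((filter_id, concept_no))
    return index


def match_document(index, text):
    """Pass one document against all filters at once.

    Returns, for each filter that matched, the set of distinct concepts
    found in the document.
    """
    hits = defaultdict(set)
    for raw_token in text.lower().split():
        token = raw_token.strip(PUNCTUATION)
        for filter_id, concept_no in index.get(token, ()):
            hits[filter_id].add(concept_no)
    return dict(hits)


# Illustrative filters, not taken from the TREC topics.
filters = {
    "merger": [{"merger", "acquisition", "takeover"}, {"bank", "lender"}],
    "oil": [{"oil", "crude"}, {"price", "prices"}],
}
index = invert_filters(filters)
routed = match_document(index, "The bank announced a takeover of a rival lender.")
```

Because the index is keyed by term, the cost of handling a document depends on the terms it contains rather than on the number of filters held in memory, which is one plausible reading of why the filters are inverted.
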
To handle this dynamic environment, AIDA keeps statistics on the discriminating power of each term, both with respect to the most recent documents seen and as an average over the lifetime of the filter. In practice, the weights stabilise once some 40 documents with some degree of relevance to the filter have been processed. Thereafter, the weights are changed only by a group of documents that predominantly discuss a few aspects of a filter.

The TREC Experiments

The experiments conducted for TREC had four major objectives: to obtain an objective evaluation of AMR on a large document collection; to develop as many different filters as possible for each topic, to see how sensitive the AMR heuristic was to widely differing filters; to investigate AMR's performance and robustness; and to perform tuning and relevance normalisation on the AMR heuristics.
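
The per-term statistics described earlier in this section, kept both over recent documents and over the lifetime of a filter, could be maintained along the following lines. This is a sketch under stated assumptions, not the system's actual method: the class `TermStats`, the exponentially decayed "recent" average, the `recent_decay` parameter, and the meaning of each observation (a number measuring how well the term discriminated on one document) are all hypothetical.

```python
class TermStats:
    """Tracks one term's signal over recent documents and over the filter's lifetime."""

    def __init__(self, recent_decay=0.9):
        self.recent_decay = recent_decay  # assumed smoothing factor
        self.count = 0                    # documents observed so far
        self.lifetime_mean = 0.0          # average over every observation
        self.recent_mean = 0.0            # decayed average favouring recent documents

    def update(self, observation):
        """Fold in one per-document observation for this term."""
        self.count += 1
        # Incremental form of the arithmetic mean.
        self.lifetime_mean += (observation - self.lifetime_mean) / self.count
        if self.count == 1:
            self.recent_mean = observation
        else:
            self.recent_mean = (
                self.recent_decay * self.recent_mean
                + (1 - self.recent_decay) * observation
            )


# Illustrative observations: the term discriminates well at first, then poorly.
stats = TermStats(recent_decay=0.5)
for observation in [1.0, 1.0, 0.0, 0.0]:
    stats.update(observation)
```

With these inputs the lifetime average settles at the midpoint of the observations while the recent average falls further, showing how a run of similar documents can move the recent statistic well before it shifts the long-term one.
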