SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Application of the Automatic Message Router to the TIPSTER Collection
chapter
R. Jones
S. Leung
D.L. Pape
National Institute of Standards and Technology
Donna K. Harman
The AMR Project
The AMR Project set out to develop viable techniques that operate in an electronic mail or wire
service environment. In these environments, the roles of query and document are reversed,
compared with document retrieval. Users have a relatively long term interest in a topic and wish
to receive documents that are relevant to that topic passed to them as soon as possible. However,
documents are of short term interest and lose their value rapidly with time, unless they have been
routed to someone with a specific interest in them. Of course, all documents may also be routed to a
document retrieval system for more general historical access. AMR reflects this exchange of roles
between query and document by inverting the queries (referred to as filters) and passing the
documents against them one at a time.
AMR allows filters to be prepared either in a structured form, where each filter term represents a
set of synonymous terms, or as a plain English statement of the information that is desired. The
structured form was used for all the filters in the experiments. The inverted filters are held in
memory for performance reasons.
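The inversion of structured filters can be illustrated with a minimal sketch. It is not AMR's implementation: the function names, the example filters, and the simple word tokenisation (with no stemming) are all assumptions made for illustration. Each filter is a list of terms, each term is a set of synonyms, and the in-memory index maps a word to the filter terms it satisfies, so that a document is processed in a single pass.

```python
from collections import defaultdict
import re


def build_inverted_index(filters):
    """Map each synonym to the (filter_id, term_index) pairs it satisfies."""
    index = defaultdict(set)
    for filter_id, terms in filters.items():
        for term_index, synonyms in enumerate(terms):
            for word in synonyms:
                index[word.lower()].add((filter_id, term_index))
    return index


def route(document, index):
    """Return, for each filter, the set of filter terms found in the document."""
    matched = defaultdict(set)
    # Hypothetical tokenisation: lower-cased alphanumeric runs, no stemming.
    for word in re.findall(r"[a-z0-9]+", document.lower()):
        for filter_id, term_index in index.get(word, ()):
            matched[filter_id].add(term_index)
    return dict(matched)


# Hypothetical filters: each term is a set of synonymous words.
filters = {
    "oil_prices": [{"oil", "petroleum", "crude"}, {"price", "cost"}],
    "mergers": [{"merger", "acquisition", "takeover"}],
}
index = build_inverted_index(filters)
hits = route("Crude prices rose after the takeover bid.", index)
```

In this sketch the document matches the first term of "oil_prices" (through "crude") and the only term of "mergers" (through "takeover"). The word "prices" does not match "price" because no stemming is applied.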
AMR computes the relevance of each document by utilising a set of heuristics that take into
account the number of different terms in a filter, and their relative positioning at a paragraph
level. Each term is automatically weighted by estimating its effectiveness as a discriminator, i.e.
its ability to divide the universe of documents into two groups, those that are relevant and those
that are not. Under the routing model described above, a decision on the fate of a document must
be made immediately. Thus the universe of documents is not constant but changes as each
document is filtered. To handle this dynamic environment, AIDA keeps statistics of the
discriminating power of each term, both with respect to the most recent documents seen and
as an average over the lifetime of the filter. In practice, the weights stabilise once some 40
documents with some degree of relevance to the filter have been processed. Thereafter, the
weights change only when a group of documents predominantly discusses a few aspects of a
filter.
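The text does not give the weighting formula, so the following is only a sketch of how per-term statistics might be kept over both a recent window and the lifetime of a filter. The class name, the window size, the blend factor, and the use of "one minus the hit rate" as a stand-in for discriminating power are all assumptions, not the actual heuristic.

```python
from collections import deque


class TermStats:
    """Tracks how often one filter term occurs, recently and over its lifetime."""

    def __init__(self, window=40):
        # Assumed window size; only the most recent `window` observations are kept.
        self.recent = deque(maxlen=window)
        self.lifetime_hits = 0
        self.lifetime_seen = 0

    def update(self, term_present):
        """Record whether the term occurred in the document just filtered."""
        hit = 1 if term_present else 0
        self.recent.append(hit)
        self.lifetime_hits += hit
        self.lifetime_seen += 1

    def recent_rate(self):
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def lifetime_rate(self):
        if self.lifetime_seen == 0:
            return 0.0
        return self.lifetime_hits / self.lifetime_seen

    def weight(self, blend=0.5):
        """Hypothetical weight: rarer terms are treated as better discriminators."""
        rate = blend * self.recent_rate() + (1 - blend) * self.lifetime_rate()
        return 1.0 - rate


stats = TermStats(window=4)
for present in [True, True, False, False, False, False]:
    stats.update(present)
```

With a window of 4, the recent rate here reflects only the last four documents, while the lifetime rate covers all six. The two views therefore diverge when the document stream shifts, which is the behaviour the paragraph above describes.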
The TREC Experiments
The experiments conducted for TREC had four major objectives:
to obtain an objective evaluation of AMR on a large document collection;
to develop as many different filters as possible for each topic, to see how sensitive
the AMR heuristic was to widely differing filters;
to investigate AMR's performance and robustness;
to perform tuning and relevance normalisation on the AMR heuristics.