SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
TREC-II Routing Experiments with the TRW/Paracel Fast Data Finder
M. Mettler
National Institute of Standards and Technology
D. K. Harman
TREC-II Routing Experiments with the
TRW/Paracel Fast Data Finder
Matt Mettler
TRW Systems Development Division
Redondo Beach, CA
Fritz Nordby
Paracel Inc.
Pasadena, CA
1.0 Introduction
For TREC-II, we were interested in experimenting with improved methods of constructing
queries for the Fast Data Finder (FDF) text search coprocessor. We learned from TREC-I
that while the pattern matching ability of the FDF can sometimes be put to significant
advantage (we had the high score on 8 of the 50 routing topics in TREC-I), this wasn't
sufficient overall to overcome the weaknesses traditionally associated with the boolean
approach to text retrieval. Many of the TREC topics are too abstract and ambiguous to
respond well to a boolean query formulation.
Our goal for this year, therefore, was to apply the FDF hardware to a more statistical or soft
boolean retrieval approach without giving up our ability to make use of specific
features or patterns in the text when they are obviously important.
We experimented with two different schemes. In the first scheme, we utilized subquery
proximity to rank hit documents. We developed the subqueries manually, then determined
the optimum proximity values by test runs on the training data. The most effective values
were then used in the official routing queries. The second scheme was an FDF adaptation
of the traditional Information Retrieval (IR) term weighting approach. In addition to single
word terms, we also included two and three word phrases, and FDF subqueries designed to
detect special features in the text.
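The two schemes can be illustrated with a small software sketch. This is not the FDF query language or the scoring used in the official runs; it is a minimal approximation that assumes regular expressions stand in for FDF subqueries, that proximity is measured as the smallest window of words containing one hit from every subquery, and that a document's weighted score is the sum of the weights of the patterns it contains. All function names and parameters (smallest_span, proximity_rank, weighted_score, max_span) are hypothetical.

```python
import re
from itertools import product


def match_positions(pattern, words):
    """Word indices that fully match a regex (a stand-in for one FDF subquery)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [i for i, w in enumerate(words) if rx.fullmatch(w)]


def smallest_span(text, subqueries):
    """Size in words of the smallest window holding one hit from every subquery.

    Returns None when any subquery has no hit. The brute-force search over all
    hit combinations is acceptable only for short illustrative documents.
    """
    words = re.findall(r"\w+", text)
    hits = [match_positions(p, words) for p in subqueries]
    if any(not h for h in hits):
        return None
    return min(max(combo) - min(combo) + 1 for combo in product(*hits))


def proximity_rank(docs, subqueries, max_span):
    """Scheme 1 sketch: keep documents whose subquery hits fall within
    max_span words of each other, ordered from tightest window to loosest."""
    scored = []
    for doc_id, text in docs.items():
        span = smallest_span(text, subqueries)
        if span is not None and span <= max_span:
            scored.append((span, doc_id))
    return [doc_id for _, doc_id in sorted(scored)]


def weighted_score(text, weighted_patterns):
    """Scheme 2 sketch: sum the weights of every single-word term, phrase,
    or feature pattern that occurs at least once in the text."""
    return sum(
        weight
        for pattern, weight in weighted_patterns.items()
        if re.search(pattern, text, re.IGNORECASE)
    )
```

In this sketch, max_span plays the role of the proximity value that would be tuned by test runs on the training data, and the keys of weighted_patterns can mix single words, multi-word phrases, and arbitrary feature patterns in one weighted query.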
While in the terminology of TREC both are examples of manual query formulation with
feedback, we believe these techniques can be evolved to create queries automatically from
samples of relevant text and also to incorporate user knowledge of specific text features of
interest when it exists. We also continue to believe that a hardware accelerator such as the
Fast Data Finder enables the implementation of high-performance routing or
dissemination applications at a far lower cost than can be achieved with
conventional general-purpose processors.