NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
D. K. Harman, National Institute of Standards and Technology

TREC-II Routing Experiments with the TRW/Paracel Fast Data Finder

Matt Mettler
TRW Systems Development Division
Redondo Beach, CA

Fritz Nordby
Paracel Inc.
Pasadena, CA

1.0 Introduction

For TREC-II, we were interested in experimenting with improved methods of constructing queries for the Fast Data Finder (FDF) text search coprocessor. We learned from TREC-I that while the pattern matching ability of the FDF can sometimes be put to significant advantage (we had the high score on 8 of the 50 routing topics in TREC-I), this was not sufficient overall to overcome the weaknesses traditionally associated with the boolean approach to text retrieval. Many of the TREC topics are too abstract and ambiguous to respond well to a boolean query formulation. Our goal for this year, therefore, was to apply the FDF hardware to a more statistical or soft boolean retrieval approach while not giving up our ability to make use of specific features or patterns in the text when they are obviously important.

We experimented with two different schemes. In the first scheme, we used subquery proximity to rank hit documents. We developed the subqueries manually, then determined the optimum proximity values by test runs on the training data. The most effective values were then used in the official routing queries. The second scheme was an FDF adaptation of the traditional Information Retrieval (IR) term weighting approach. In addition to single-word terms, we also included two- and three-word phrases, and FDF subqueries designed to detect special features in the text.
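As a rough illustration of the two schemes, the following Python sketch ranks documents by how closely their subquery matches cluster and scores documents by summed term and phrase weights. It is not the FDF query language or the procedure used for the official runs: the function names (`tokenize`, `min_span`, `rank_by_proximity`, `weighted_score`), the representation of a subquery as a plain set of words, and the use of the smallest enclosing token window as the proximity measure are all assumptions made for this example.

```python
import re
from typing import Dict, List, Optional, Sequence, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; a simple stand-in for FDF pattern matching."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_span(tokens: Sequence[str], subqueries: Sequence[Set[str]]) -> Optional[int]:
    """Size of the smallest token window that matches every subquery.

    Each subquery is modeled (hypothetically) as a set of alternative words.
    Returns None when at least one subquery never matches.
    """
    if not subqueries:
        return None
    last_seen: Dict[int, int] = {}  # subquery index -> latest match position
    best: Optional[int] = None
    for pos, tok in enumerate(tokens):
        for i, terms in enumerate(subqueries):
            if tok in terms:
                last_seen[i] = pos
        if len(last_seen) == len(subqueries):
            # Tightest window ending here starts at the oldest "latest match".
            span = pos - min(last_seen.values()) + 1
            if best is None or span < best:
                best = span
    return best


def rank_by_proximity(
    docs: Dict[str, str], subqueries: Sequence[Set[str]]
) -> List[Tuple[str, int]]:
    """Scheme 1 (sketch): hit documents ordered by tightest subquery window."""
    hits = []
    for doc_id, text in docs.items():
        span = min_span(tokenize(text), subqueries)
        if span is not None:
            hits.append((doc_id, span))
    return sorted(hits, key=lambda hit: hit[1])


def weighted_score(tokens: Sequence[str], weights: Dict[str, float]) -> float:
    """Scheme 2 (sketch): sum the weights of the terms and phrases present.

    Keys of `weights` are lowercase single words or space-separated two- and
    three-word phrases; each key counts once per document.
    """
    text = " " + " ".join(tokens) + " "
    return sum(w for term, w in weights.items() if f" {term} " in text)
```

In this sketch the subqueries and weights are supplied by the caller; a maximum acceptable window could be tuned on training data by filtering the output of `rank_by_proximity`, in the spirit of the proximity values described above.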
While in the terminology of TREC both are examples of manual query formulation with feedback, we believe these techniques can be evolved to create queries automatically from samples of relevant text and also to incorporate user knowledge of specific text features of interest when it exists. We also continue to believe that the use of a hardware accelerator such as the Fast Data Finder enables the implementation of high performance routing or dissemination applications at a far lower cost than can be achieved with conventional general purpose processors.