NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
National Institute of Standards and Technology
Donna K. Harman

Text Retrieval with the TRW Fast Data Finder

Matt Mettler
TRW Systems Development Division
One Space Park R212194
Redondo Beach, CA 90278
(310) 814-4925
mat[OCRerr]wilbur.coyote.trw.com

1.0 Introduction

TRW has been building high-performance text processing and retrieval systems for a number of years. Most of these systems have involved the application of the TRW Fast Data Finder (FDF) text search hardware and have been designed to meet the requirements of specific government customers. Our goal for the TREC conference has been to consider and experiment with the FDF as a tool for more general-purpose information retrieval, and to determine the FDF's strengths and weaknesses compared to conventional information retrieval techniques.

Our experience with the TREC conference has left us encouraged about the ability of a text scanning approach to be competitive with the more involved information retrieval techniques. The inherent limitations of the FDF hardware do not prevent competitive precision and recall for general information retrieval applications when the user topics are properly understood and the topic queries are properly tuned to the dataset.

2.0 FDF Text Retrieval Approach

The Fast Data Finder is a hardware device that performs high-speed pattern matching on a stream of 8-bit data. It consists of an array of identical programmable text processing cells connected in series to form a pipeline processor. The cells are implemented using a custom VLSI chip designed and patented by TRW. In the latest implementation, each chip contains 24 processor cells and a typical system has 3,600 cells. Each cell can match a single character of a query or perform all or part of a logical operation.
The processors are interconnected with an 8-bit data path and an approximately 20-bit control path. To perform a search, a microcode program is first downloaded into the pipeline to direct each processor. The database is then streamed through the pipeline. The data bytes clock through each processor in turn until the whole database has passed through all processors. As the data is clocking through, the processors alter the state of the control lines depending on their program and the data stream values.
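
The pipeline behavior described above can be illustrated with a small software model. The Python sketch below is not the FDF microcode or the actual cell design, which this paper does not specify. It assumes a single one-bit control line (the real control path is roughly 20 bits wide), one literal query byte per cell, and no pipeline latency. The names `Cell`, `latched`, `clock`, and `search` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical character-matching cell; an assumption, not the real FDF cell."""
    target: int            # the single query byte this cell is programmed with
    latched: bool = False  # control value this cell produced for the previous byte

    def clock(self, byte: int, control_in: bool) -> bool:
        """Process one data byte and return the control value latched on the previous clock."""
        previous_out = self.latched
        # The cell raises its control output only if the upstream cells matched
        # the preceding bytes and the current byte equals its programmed byte.
        self.latched = control_in and byte == self.target
        return previous_out


def search(query: bytes, stream: bytes) -> list[int]:
    """Return the start offsets of every occurrence of query in stream."""
    if not query:
        raise ValueError("query must contain at least one byte")
    # "Download the program": one cell per query byte, connected in series.
    cells = [Cell(target=b) for b in query]
    hits = []
    # "Stream the database": every byte passes through each cell in turn.
    for offset, byte in enumerate(stream):
        control = True  # the control line entering the first cell is held high
        for cell in cells:
            control = cell.clock(byte, control)
        # The last cell's latch is set when the whole query ends at this byte.
        if cells[-1].latched:
            hits.append(offset - len(query) + 1)
    return hits


if __name__ == "__main__":
    print(search(b"data", b"the database streams data"))
```

In this simplified model the search time depends only on the length of the stream, not on the query, because every byte visits every cell exactly once. That property is what makes a scanning approach of this kind predictable in throughput.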