SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Text Retrieval with the TRW Fast Data Finder
M. Mettler
National Institute of Standards and Technology
Donna K. Harman
Matt Mettler
TRW Systems Development Division
One Space Park R212194
Redondo Beach, CA 90278
(310) 814-4925
matt@wilbur.coyote.trw.com
1.0 Introduction
TRW has been building high-performance text processing and retrieval systems for a
number of years. Most of these systems have involved the application of the TRW Fast
Data Finder (FDF) text search hardware and have been designed to meet the requirements
of specific government customers. Our goal for the TREC conference has been to consider
and experiment with the FDF as a tool for more general-purpose information retrieval, and
to determine the FDF's strengths and weaknesses compared to conventional information
retrieval techniques.

Our experience with the TREC conference has left us encouraged about the ability of a text
scanning approach to be competitive with more involved information retrieval
techniques. The inherent limitations of the FDF hardware do not prevent competitive
precision and recall for general information retrieval applications, provided the user topics
are properly understood and the topic queries are properly tuned to the dataset.
2.0 FDF Text Retrieval Approach
The Fast Data Finder is a hardware device that performs high-speed pattern matching on a
stream of 8-bit data. It consists of an array of identical programmable text processing cells
connected in series to form a pipeline processor. The cells are implemented using a custom
VLSI chip designed and patented by TRW. In the latest implementation, each chip contains
24 processor cells and a typical system will have 3,600 cells. Each cell can match a
single character of the query or perform all or part of a logical operation. The processors
are interconnected with an 8-bit data path and an approximately 20-bit control path. To
perform a search, a microcode program is first downloaded into the pipeline to direct each
processor. The database is then streamed through the pipeline. The data bytes clock
through each processor in turn until the whole database has passed through all processors.
As the data is clocking through, the processors alter the state of the control lines depending
on their program and the data stream values.
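
The pipeline behavior described above can be illustrated with a small software model. The sketch below is not the FDF microcode or its actual cell design; it is a simplified illustration that assumes each cell holds one query byte and that a single control bit (rather than the roughly 20-bit control path) travels alongside each data byte. The names Cell, step, and search are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    """One pipeline cell in a simplified, assumed model (not the real FDF cell)."""
    target: int            # the single query byte this cell compares against
    prev_in: bool = False  # control bit that arrived with the previous data byte

    def step(self, byte: int, ctrl_in: bool) -> bool:
        # The match continues only if the upstream cells matched on the
        # previous byte and this cell matches the current byte.
        ctrl_out = self.prev_in and byte == self.target
        self.prev_in = ctrl_in
        return ctrl_out


def search(pattern: bytes, data: bytes) -> List[int]:
    """Return the start offsets of every occurrence of pattern in data."""
    if not pattern:
        raise ValueError("pattern must contain at least one byte")
    # "Download the program": one cell per query character, connected in series.
    cells = [Cell(b) for b in pattern]
    cells[0].prev_in = True  # the first cell may start a match at any byte
    hits = []
    # Stream the data through the pipeline. In hardware the cells work
    # concurrently on different bytes; passing each byte through the cells
    # in order gives the same result because each cell sees the same inputs.
    for pos, byte in enumerate(data):
        ctrl = True  # control line value fed into the head of the pipeline
        for cell in cells:
            ctrl = cell.step(byte, ctrl)
        if ctrl:  # the last cell reports a match ending at this byte
            hits.append(pos - len(pattern) + 1)
    return hits


if __name__ == "__main__":
    print(search(b"abc", b"xxabcabc"))
```

In this model the search cost depends only on the length of the data stream, not on the content of the query, which is the property that makes a scanning approach attractive when the pipeline is implemented in hardware.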