NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)

TREC-II Routing Experiments with the TRW/Paracel Fast Data Finder

M. Mettler

National Institute of Standards and Technology
D. K. Harman

2.0 The FDF Text Retrieval Approach

The Fast Data Finder is a hardware device that performs high-speed pattern matching on a stream of 8-bit data. It consists of an array of identical programmable text processing cells connected in series to form a pipeline processor. The cells are implemented using a custom VLSI chip designed and patented by TRW. In the latest implementation, each chip contains 24 processor cells and a typical system will have 3,600 cells. Each cell can match a single character of the query or perform all or part of a logical operation. The processors are interconnected with an 8-bit data path and an approximately 20-bit control path.

To perform a search, a microcode program is first downloaded into the pipeline to direct each processor. The database is then streamed through the pipeline. The data bytes clock through each processor in turn until the whole database has passed through all processors. As the data clocks through, the processors alter the state of the control lines depending on their program and the data stream values. When the pipeline's processor cells detect that a series of database characters matches the desired pattern, a hit is indicated and passed by external circuitry back to the memory of the host processor and to the user. The FDF pipeline runs at a constant speed as it performs character comparisons and logical operations, regardless of query complexity.

The queries or patterns are specified in the FDF's Pattern Specification Language (PSL). The hardware directly supports all the features of PSL without the need for software post-processing. The processors in the pipeline may all be used to evaluate a single large query or may be assigned to evaluate numerous smaller queries.
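The streaming model described above can be approximated in software. The sketch below is an illustration only, not the FDF microcode or PSL: it assumes literal patterns with one hypothetical `Cell` per query byte, and a single boolean standing in for the control-line state (a Shift-And style update). The names `Cell`, `Query`, and `stream` are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """One hypothetical pipeline cell, programmed to match a single byte."""
    target: int           # byte value this cell compares against
    active: bool = False  # simplified stand-in for the control-line state


@dataclass
class Query:
    """A literal pattern that occupies one cell per pattern byte."""
    pattern: bytes
    cells: list = field(init=False)

    def __post_init__(self):
        if not self.pattern:
            raise ValueError("pattern must be non-empty")
        self.cells = [Cell(b) for b in self.pattern]


def stream(data: bytes, queries):
    """Pass every data byte by every cell once.

    Returns a list of (query_index, end_offset) pairs, one per hit.
    """
    # Clear any state left over from an earlier search.
    for q in queries:
        for cell in q.cells:
            cell.active = False

    hits = []
    for offset, byte in enumerate(data):
        for qi, q in enumerate(queries):
            # Update from the last cell back to the first, so each cell
            # reads its predecessor's state from the previous clock.
            for i in range(len(q.cells) - 1, -1, -1):
                prev_active = q.cells[i - 1].active if i > 0 else True
                q.cells[i].active = prev_active and byte == q.cells[i].target
            # The final cell going active means the whole pattern matched.
            if q.cells[-1].active:
                hits.append((qi, offset))
    return hits
```

In this model, two six-byte queries such as `Query(b"Airbus")` and `Query(b"subsid")` occupy twelve cells in total, and each data byte is compared against every cell exactly once, so the work per byte depends only on the number of cells and not on the data.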
The number of pipeline cells a query needs is proportional to the size of the query.

PSL provides numerous search functions, which may be nested in any combination, including:

* Boolean logic including negative conditions
* Proximity on any arbitrary pattern
* Wildcards and "don't cares" anywhere in the word
* Character alternation
* Term counting, thresholds, and sets
* Error tolerance (fuzzy matching)
* Term weighting
* Numeric ranges

The Fast Data Finder was originally designed and developed at TRW. In 1992, TRW licensed the FDF technology to Paracel Inc., which now sells a commercial product called the FDF-3.

3.0 Proximity Query Generation

Our first set of experiments revolved around the use of subquery proximity to rank hit documents. We began with a simple observation: topics are often a conjunction of ideas or concepts. For example, Topic 51, Airbus Subsidies, is a conjunction of the idea "Airbus" (a particular aircraft manufacturer and European consortium) and the idea "subsidy" (in particular, subsidies from the nations belonging to that consortium). Other articles about Airbus Industrie (new planes, fly-by-wire in the A320, accident reports, financial health