NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
A Boolean Approximation Method for Query Construction and Topic Assignment in TREC
P. Jacobs, G. Krupka, L. Rau
National Institute of Standards and Technology
Donna K. Harman

(none for the ad hoc test), how to make the queries practical and accurate. Our choice here was to keep the initial queries relatively simple, and to run the results of a "first pass" retrieval against the entire corpus through a statistical filter to pull out terms that would help to augment or refine the query. In addition, the matching engine would display the exact portion of each text that (correctly or incorrectly) matched the query, making it easy to correct glaring errors and refine ambiguous terms. This amounts to a peculiar sort of feedback mechanism that relies on detailed analysis of portions of the corpus instead of user input.

3.1 Detailed Method

The method brings together four key elements: (1) a language for expressing knowledge-based topics or queries, developed at GE and described in the open literature, (2) a new program to generate Boolean expressions that approximate these queries (called the rule compiler), (3) a program to match the automatically generated expressions against the text to be retrieved (called the pre-filter), and (4) a knowledge-based pattern matcher, described in the open literature [2], that takes the results of the first match and rejects texts that do not satisfy the more constrained, knowledge-based query.

Because the pattern matcher is designed as an efficient "trigger" mechanism and an aid in parsing, the knowledge-based queries are mostly simple combinations of lexical categories. The patterns largely adopt the language of regular expressions, including the following terms and operators:

* Lexical features that can be tested in a pattern:
  - token "name" (e.g. "AK-47")
  - lexical category (e.g. "adj")
  - root (e.g. "shoot")
  - conceptual category (e.g. "human")
* Logical combination of lexical feature tests:
  - OR, AND, and NOT
* Wild cards:
    $ - 0 or 1 tokens
    * - 0 or more tokens
    + - 1 or more tokens
* Variable assignment from pattern components: =
* Grouping operators:
    <> for grouping
    [] for disjunctive grouping