NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Donna K. Harman (editor), National Institute of Standards and Technology

A Boolean Approximation Method for Query Construction and Topic Assignment in TREC*

Paul S. Jacobs, George B. Krupka, and Lisa F. Rau
GE Research and Development Center
Schenectady, NY 12301
(518) 387-5059

Abstract

Word-based search is the simplest and most widely-available method for processing and retrieving volumes of information from free text. However, this common method of searching is generally cumbersome and inaccurate. The method described here automatically generates complex Boolean queries for word-based search, so that the same mechanism can be used with higher accuracy and efficiency. This practical approach worked, with good results, on the entire TREC corpus on both the "routing" and "ad hoc retrieval" tasks, with the official averages in the top group for both tasks.

1 Introduction

Full-text search is currently the simplest and most commonly-used method for locating information in large volumes of free text. Because users are accustomed to describing what they are looking for with specific words, and those words are often found in the texts, searching the text for selected words or word combinations is a natural and easy-to-implement method for information retrieval.

However, it can be very inaccurate. In some experiments, the percentage of relevant texts retrieved has been shown to be lower than 10%, and a high percentage of material that is retrieved can be irrelevant. Also, it can be very difficult for searchers to compose "queries" that combine the words that are effective in locating relevant material without finding large quantities of irrelevant information as well.
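To make the trade-off described above concrete, the following is a minimal sketch of word-based Boolean search scored against a set of relevance judgments. It is not the GE TREC system: the tuple-based query representation, the helper names (tokenize, matches, search, recall_precision), the four sample texts, and the relevance judgments are all hypothetical and chosen only for illustration.

```python
import re

def tokenize(text):
    """Lowercase a text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches(query, words):
    """Evaluate a nested Boolean query against a set of words.

    A query is either a plain string (a single word) or a tuple of the
    form ("AND" | "OR" | "NOT", operand, ...). This representation is an
    assumption made for this sketch.
    """
    if isinstance(query, str):
        return query in words
    op, *args = query
    if op == "AND":
        return all(matches(a, words) for a in args)
    if op == "OR":
        return any(matches(a, words) for a in args)
    if op == "NOT":
        return not matches(args[0], words)
    raise ValueError(f"unknown operator: {op}")

def search(query, docs):
    """Return the ids of all documents whose words satisfy the query."""
    return {doc_id for doc_id, text in docs.items()
            if matches(query, tokenize(text))}

def recall_precision(retrieved, relevant):
    """Recall: share of relevant texts retrieved.
    Precision: share of retrieved texts that are relevant."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical mini-corpus and relevance judgments for a topic
# about corporate mergers and acquisitions.
docs = {
    "d1": "The company announced a merger with its largest rival.",
    "d2": "Shareholders approved the acquisition of the software firm.",
    "d3": "The merger of two rivers creates a wide delta.",
    "d4": "Quarterly earnings rose after the takeover was completed.",
}
relevant = {"d1", "d2", "d4"}

# A single-word query misses relevant texts and retrieves an irrelevant one.
simple = "merger"

# A word combination widens the vocabulary and excludes a misleading sense.
combined = ("AND",
            ("OR", "merger", "acquisition", "takeover"),
            ("NOT", "rivers"))

for name, query in [("simple", simple), ("combined", combined)]:
    retrieved = search(query, docs)
    recall, precision = recall_precision(retrieved, relevant)
    print(name, sorted(retrieved), round(recall, 2), round(precision, 2))
```

On this toy corpus the single-word query retrieves one of the three relevant texts plus one irrelevant text, while the combined query retrieves exactly the three relevant texts. Composing such combinations by hand is the difficulty the introduction points to.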
*Special thanks to Michael Charbonneau, Mark Freeman, Geoff Gordon, John Munson and Uri Zernik, who all helped participate in the design and implementation of the GE TREC system. This research was sponsored in part by the Defense Advanced Research Project Agency. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Project Agency or the U.S. Government.