SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
A Boolean Approximation Method for Query Construction and Topic Assignment in TREC
P. Jacobs
G. Krupka
L. Rau
National Institute of Standards and Technology
Donna K. Harman
8 Summary
GE's participation in TREC involved a small implementation of a simple strategy
for compiling knowledge-based pattern-matcher rules into Boolean expressions.
A statistical corpus analyzer helped to formulate and refine queries for both
the ad hoc and routing tasks, and the resulting matching engine ran on the
entire 2.3 gigabytes of text. The simple Boolean retrieval engine performed
very well on both tasks. These results are promising, both in their accuracy
and in the simplicity with which they seem to bring knowledge-based techniques
to bear within the rudimentary framework of word-based retrieval.
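The general idea of compiling pattern rules into Boolean expressions can be illustrated with a small sketch. Everything in it is a hypothetical stand-in, not GE's actual rule language or system: the `Slot` rule format (an ordered list of adjacent word positions, each listing alternative words, some optional), the function names, and the toy documents are assumptions made for illustration. The approximation keeps only which words must co-occur and drops word order, adjacency, and optional elements, so the Boolean query accepts every text the pattern would match and possibly more.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set

@dataclass(frozen=True)
class Slot:
    """One position in a hypothetical pattern rule: alternative words, possibly optional."""
    words: FrozenSet[str]
    optional: bool = False

Rule = List[Slot]             # ordered, adjacent slots, e.g. (acquire|buy) (the|a)? (company|firm)
Query = List[FrozenSet[str]]  # AND over clauses, each clause an OR over words

def compile_rule(rule: Rule) -> Query:
    """Approximate an ordered pattern rule as a Boolean AND of ORs.

    Word order and adjacency are discarded, and optional slots are dropped
    because they can never prevent a match. Every text the pattern matches
    therefore also satisfies the query, but the query can match more.
    """
    return [slot.words for slot in rule if not slot.optional]

def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lowercased, whitespace-delimited word to the IDs of documents containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index

def evaluate(query: Query, index: Dict[str, Set[int]], all_ids: Set[int]) -> Set[int]:
    """Intersect, clause by clause, the union of postings for each clause's words."""
    result = set(all_ids)
    for clause in query:
        hits: Set[int] = set()
        for word in clause:
            hits |= index.get(word, set())
        result &= hits
    return result

def match_topic(rules: List[Rule], index: Dict[str, Set[int]], all_ids: Set[int]) -> Set[int]:
    """A topic described by several rules matches a document if any one rule's query does."""
    matched: Set[int] = set()
    for rule in rules:
        matched |= evaluate(compile_rule(rule), index, all_ids)
    return matched

docs = {
    1: "Acme agreed to acquire the firm",                   # pattern and query both match
    2: "the firm will buy new equipment",                   # order reversed: only the query matches
    3: "analysts expect Acme to purchase a rival company",  # "rival" breaks adjacency: only the query matches
    4: "Acme reported record profits",                      # neither matches
}
takeover_rule: Rule = [
    Slot(frozenset({"acquire", "buy", "purchase"})),
    Slot(frozenset({"the", "a"}), optional=True),
    Slot(frozenset({"company", "firm"})),
]
index = build_index(docs)
print(sorted(match_topic([takeover_rule], index, set(docs))))  # [1, 2, 3]
```

Documents 2 and 3 show the cost of the approximation: the ordered pattern would reject them, while the Boolean query accepts them. In exchange, the query needs nothing beyond a word-level inverted index and set operations, which is what lets a plain Boolean engine run it over a large collection.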