SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Workshop on: Use of Natural Language Processing at TREC
report of discussion group
David Lewis
Alan Smeaton
National Institute of Standards and Technology
Donna K. Harman
A tentative hypothesis put forward by some was that NLP-based methods may be best for
paragraph or sub-document retrieval (termed "nugget extraction" during discussions) and that
more traditional methods may be better for more general types of queries. It was suggested
that testing this hypothesis, and in general getting a real understanding of the effect of NLP
techniques on IR, would require a more careful analysis of the kinds of queries used (suggestions
were made about how the query set might be improved or augmented), as well as details
of how relevance judgments are made and what parts of documents are relevant.
In conclusion, it was acknowledged that the emphasis of researchers in TREC-1 had quite
reasonably been simply on getting their systems to work at all with such a large collection of
text. It was hoped that for TREC-2 more controlled comparisons and detailed analyses of
failures and successes could be done, to give us more insight into the strengths and
weaknesses of NLP methods in IR.