NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Overview of the First Text REtrieval Conference (TREC-1)
Donna K. Harman
National Institute of Standards and Technology
Gaithersburg, MD 20899
1. Introduction
There is a long history of experimentation in information retrieval. Research started with experiments in
indexing languages, such as the Cranfield I tests (Cleverdon 1962), and has continued with over 30 years of
experimentation with the retrieval engines themselves. The Cranfield II studies (Cleverdon et al. 1966) showed
that automatic indexing was comparable to manual indexing, and this and the availability of computers created a
major interest in the automatic indexing and searching of texts. The Cranfield experiments also emphasized the
importance of creating test collections and using these for comparative evaluation. The Cranfield collection,
created in the late 1960's, contained 1400 documents and 225 queries, and has been heavily used by researchers
since then. Subsequently other collections have been built, such as the CACM collection (Fox 1983), and the
NPL collection (Sparck Jones & Webster 1979).
In the 30 or so years of experimentation there have been two missing elements. First, although some
research groups have used the same collections, there has been no concerted effort by groups to work with the
same data, use the same evaluation techniques, and generally compare results across systems. The importance
of this is not to show any system to be superior, but to allow comparison across a very wide variety of
techniques, much wider than only one research group would tackle. Karen Sparck Jones in 1981 commented that:
Yet the most striking feature of the test history of the past two decades is its lack of
consolidation. It is true that some very broad generalizations have been endorsed
by successive tests: for example...but there has been a real failure at the detailed
level to build one test on another. As a result there are no explanations for these
generalizations, and hence no means of knowing whether improved systems could
be designed (p.245).
This consolidation is more likely if groups can compare results across the same data, using the same evaluation
method, and then meet to discuss openly how methods differ.
The second missing element, which has become critical in the last 10 years, is the lack of a
realistically-sized test collection. Evaluation using the small collections currently available may not reflect
performance of systems in large full-text searching, and certainly does not demonstrate any proven abilities of
these systems to operate in real-world information retrieval environments. This is a major barrier to the
transfer of these laboratory systems into the commercial world. Additionally some techniques such as the use of
phrases and the construction of automatic thesauri seem intuitively workable, but have repeatedly failed to show
improvement in performance using the small collections. Larger collections might demonstrate the effectiveness
of these procedures.
The overall goal of the Text REtrieval Conference (TREC) was to address these two missing elements. It is
hoped that by providing a very large test collection, and encouraging interaction with other groups in a friendly
evaluation forum, a new thrust in information retrieval will occur. There is also an increased interest in this
field within the DARPA community, and TREC is designed to be a showcase of the state-of-the-art in retrieval
research. NIST's goal as co-sponsor of TREC is to encourage communication and technology transfer among
academia, industry, and government.