NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Overview of the First Text REtrieval Conference (TREC-1)
Donna K. Harman
National Institute of Standards and Technology
most systems to handle. The two narratives shown below illustrate this point.
<num> Number: 051
A relevant document will cite or discuss assistance to Airbus Industrie by the French, German, British
or Spanish government(s), or will discuss a trade dispute between Airbus or the European governments
and a U.S. aircraft producer, most likely Boeing Co. or McDonnell Douglas Corp., or the U.S. govern-
ment, over federal subsidies to Airbus.
<num> Number: 058
A relevant document will either report an impending rail strike, describing the conditions which may
lead to a strike, or will provide an update on an ongoing strike. To be relevant, the document will
identify the location of the strike or potential strike. For an impending strike, the document will report
the status of negotiations, contract talks, etc. to enable an assessment of the probability of a strike. For
an ongoing strike, the document will report the length of the strike to the current date and the status of
negotiations or mediation.
In a preliminary analysis, the narratives and the factors played a strange and unpredictable role in the results
for TREC-1. Systems did as well on topics with very restrictive narratives, such as that of topic 058, as on topics
with non-restrictive narratives, such as topic 051. The subject and terms in the entire topic were more important
in determining success than the restrictiveness of the narrative. The factors also did not play a major role in
system performance. This could change in TREC-2, when groups have more time to adjust their systems to the
TREC task.
3.4 The Relevance Judgments
The relevance judgments are of critical importance to a test collection. For each topic it is necessary to
compile a list of relevant documents, ideally as comprehensive a list as possible. For the TREC task, three
possible methods for finding the relevant documents could have been used. In the first method, full relevance
judgments could have been made on all 742,611 documents, for each topic, resulting in over 74 million judg-
ments. This was clearly impossible. As a second approach, a random sample of the documents could have been
taken, with relevance judgments done on that sample only. The problem with this approach is that a random
sample that is large enough to find on the order of 200 relevant documents per topic is a very large random
sample, and is likely to result in insufficient relevance judgments. The third method, the one used in TREC,
was to make relevance judgments on the sample of documents selected by the various participating systems.
This method is known as the pooling method, and has been used successfully in creating other collections. It
was the recommended method in a 1975 proposal to the British Library to build a very large test collection
(Sparck Jones & van Rijsbergen).
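The impracticality of the second approach can be made concrete with a rough calculation. Assume, purely for illustration, that a topic has R = 500 relevant documents (the actual counts varied and are not given here) in the collection of N = 742,611 documents:

```latex
% Expected number of relevant documents in a uniform random sample of size n:
%   E[relevant] = n * (R / N)
% To expect about 200 relevant documents per topic, solve for n:
\[
  n \;=\; \frac{200\,N}{R} \;=\; \frac{200 \times 742{,}611}{500} \;\approx\; 297{,}000
\]
```

Under this assumption, nearly 300,000 documents would have to be judged for each topic, which is close to half the collection and far beyond the available assessor effort.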
To construct the pool, the following was done.
1. Divide each set of results into results for a given topic
2. For each topic within a set of results, select the top 200 ranked documents for input to the pool
3. For each topic, merge results from all systems
4. For each topic, sort results based on document numbers
5. For each topic, remove duplicate documents
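The five steps above amount to a per-topic set union of each system's top-ranked documents. The following is a minimal sketch in Python, assuming a hypothetical data format in which each run maps a topic number to a ranked list of document identifiers (the IDs shown are invented for illustration):

```python
def build_pools(runs, depth=200):
    """Pooling method: merge the top-`depth` ranked documents per topic
    across all runs, remove duplicates, and sort by document number."""
    pools = {}
    for run in runs:
        for topic, ranked_docs in run.items():
            # steps 1-3 and 5: split by topic, take top `depth`, merge;
            # the set removes duplicate documents automatically
            pools.setdefault(topic, set()).update(ranked_docs[:depth])
    # step 4: sort each topic's pool by document number
    return {topic: sorted(docs) for topic, docs in pools.items()}

# toy example: two "systems" retrieving for topic 051 (hypothetical doc IDs)
run_a = {"051": ["WSJ870108-0012", "AP880212-0161", "FR88710-0003"]}
run_b = {"051": ["AP880212-0161", "ZF07-123-456"]}
pools = build_pools([run_a, run_b], depth=200)
```

Because the pool is a union, any document retrieved near the top by at least one system is judged exactly once, regardless of how many systems retrieved it.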
Pooling proved to be an effective method. There was little overlap among the 25 systems in their retrieved
documents. Table 4 shows the overlap statistics. The first overlap statistics are for the adhoc topics (test topics
against both training documents D1 and test documents D2), and the second statistics are for the routing topics
(training topics against test documents D2 only).