NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Overview of the First Text REtrieval Conference (TREC-1)
Donna K. Harman
National Institute of Standards and Technology
construct three sets of queries. Q1 is the set of queries (probably multiple sets) created to help in adjusting a
system to this task, to create better weighting algorithms, and in general to train the system for testing. The
results of this research were used to create Q2, the routing queries to be used against the test documents. Q3 is
the set of queries created from the test topics as adhoc queries for searching against the combined documents
(both training documents and test documents). The results from searches using Q2 and Q3 were the official test
results. The documents were full-length texts from various sources such as newspapers, newswires, magazines,
and journals (see sect. 3.2 for more details).
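The relationship among the query sets and document sets described above can be summarized in a small schematic. The set names (Q1, Q2, Q3, and the training/test documents D1, D2) come from the text; the dictionary layout itself is purely illustrative.

```python
# Schematic of the TREC-1 experimental design: which queries are built
# from what, and which documents they are run against.
# D1 = training documents, D2 = test documents (names from the text).
experiment = {
    "Q1": {  # training queries, used to tune and train the system
        "built_from": "training topics",
        "run_against": "D1 (training documents)",
        "purpose": "adjust weighting algorithms, train the system",
    },
    "Q2": {  # routing queries, fixed in advance, run on new documents
        "built_from": "training topics, relevance judgments, D1",
        "run_against": "D2 (test documents)",
        "purpose": "official routing results",
    },
    "Q3": {  # adhoc queries from the test topics
        "built_from": "test topics",
        "run_against": "D1 + D2 (combined documents)",
        "purpose": "official adhoc results",
    },
}

for name, spec in experiment.items():
    print(f"{name}: run against {spec['run_against']}")
```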
2. Specific Task Guidelines
The various TREC participants used a wide variety of indexing and knowledge base building techniques, and a
wide variety of approaches to generate search queries. Therefore it was important to establish clear guidelines
for the TREC task and to develop some methods of standardized reporting to allow comparison. The guidelines
deal with the methods of indexing/knowledge base construction, and with the methods of generating the queries
from the supplied topics. In general they were constructed to reflect an actual operational environment, and to
allow as fair as possible a separation among the diverse query construction approaches.
There were guidelines for constructing and manipulating the system data structures. These structures were
defined to consist of the original documents, any new structures built automatically from the documents (such as
inverted files, thesauri, conceptual networks, etc.) and any new structures built manually from the documents
(such as thesauri, synonym lists, knowledge bases, rules, etc.). The following guidelines were provided to the
participants.
1. System data structures can be built using the initial training set (documents D1, training topics, and
relevance judgments). They may be modified based on the test documents D2, but not based on the test
topics. In particular, the processing of one test topic should not affect the processing of another test topic.
For example, it would not be allowed to update a system knowledge base based on the analysis of one test
topic in such a way that the interpretation of subsequent test topics was changed in any fashion.
2. There are several parts of the Wall Street Journal and the Ziff material (see sec. 3.2) that contain manually
assigned controlled or uncontrolled index terms. These fields are delimited by SGML tags, as specified in
the documentation files included with the data. Other parts of the TREC data contain no manual indexing.
Since the primary focus of TREC is on retrieval and routing of naturally occurring text, these manually
indexed terms should not be indiscriminately used as if they are a normal part of the text. If your group
decides to use these terms, they should be part of a specific experiment that utilizes manual indexing
terms, and their use should be declared.
3. Special care should be used in handling the routing topics. In a true routing situation, a single document
would be indexed and "passed" against the routing topics. Since most of you will be indexing the test
documents as a complete set, routing should be simulated by not using any test document information (such
as IDF based on the test collection, total frequency based on the test collection, etc.) in the searching. It is
perfectly permissible, however, to use training-set collection information. If your system bases system data
structures on the entire test data and is unable to operate in a proper routing mode, then you should either
have a different method for handling routing, or only submit results for the adhoc part of TREC.
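The routing constraint in guideline 3 can be illustrated with a small sketch: term weights (here IDF) are computed from the training collection only and then frozen, so that scoring an incoming test document uses no statistics from the test collection. The scoring function and data layout below are assumptions for illustration, not the method of any particular TREC participant.

```python
import math
from collections import Counter

def training_idf(training_docs):
    """Compute IDF from the TRAINING collection only, per guideline 3:
    no test-collection statistics may be used when routing."""
    n = len(training_docs)
    df = Counter()
    for doc in training_docs:
        df.update(set(doc.lower().split()))  # document frequency per term
    return {term: math.log(n / count) for term, count in df.items()}

def route_score(routing_query, doc, idf):
    """Score one incoming test document against a routing query,
    using only the frozen training-set IDF weights."""
    doc_terms = Counter(doc.lower().split())
    return sum(doc_terms[t] * idf.get(t, 0.0) for t in routing_query)

# The training collection fixes the weights once ...
train = ["oil prices rise", "oil exports fall", "wheat prices rise"]
idf = training_idf(train)

# ... then each test document is scored independently, simulating
# one-document-at-a-time routing against a standing query.
query = ["oil", "prices"]
score = route_score(query, "oil prices climb sharply", idf)
```

Because the IDF table is built before any test document arrives, scoring the test documents in a batch gives the same results as true one-at-a-time routing.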
Additionally there were guidelines for constructing the queries from the provided topics (see sect. 3.3 for more
on the topics). These guidelines were considered of great importance for fair system comparison and were
therefore carefully constructed. Three generic categories were defined, based on the amount and kind of manual
intervention used.
1. Method 1 -- completely automatic initial query construction.
adhoc queries -- The system will automatically extract information from the topic (the topic fields used
should be identified) to construct the query. The query will then be submitted to the system (with no
manual modifications) and the results from the system will be the results submitted to NIST. There should
be no manual intervention that would affect the results.
routing queries -- The queries should be constructed automatically using the training topics, the training
relevance judgments and the training documents. The queries should then be submitted to NIST before the