NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)

Overview of the Second Text REtrieval Conference (TREC-2)

D. K. Harman
National Institute of Standards and Technology
constructed automatically using the training topics, the training relevance judgments and the training documents. The queries should then be submitted to NIST before the test documents are released and should not be modified after that point. The unmodified queries should be run against the test documents and the results submitted to NIST.
2. MANUAL (manual initial query construction)

adhoc queries -- The query is constructed in some manner from the topic, either manually or using machine assistance. Once the query has been constructed, it will be submitted to the system (with no manual intervention), and the results from the system will be the results submitted to NIST. There should be no manual intervention after initial query construction that would affect the results. (Manual intervention is covered by the category labelled FEEDBACK.)
routing queries -- The queries should be constructed in the same manner as the adhoc queries for MANUAL, but using the training topics, relevance judgments, and training documents. They should then be submitted to NIST before the test documents are released and should not be modified after that point. The unmodified queries should be run against the test documents and the results submitted to NIST.
3. FEEDBACK (automatic or manual query construction with feedback)

adhoc queries -- The initial query can be constructed using either AUTOMATIC or MANUAL methods. The query is submitted to the system, and a subset of the retrieved documents is used for manual feedback, i.e., a human makes judgments about the relevance of the documents in this subset. These judgments may be communicated to the system, which may automatically modify the query, or the human may simply choose to modify the query himself (a sketch of one such automatic modification appears after this list). At some point, feedback should end, and the query should be accepted as final. Systems that submit runs using this method must submit several different sets of results to allow tracking of the time/cost benefit of doing relevance feedback.

routing queries -- FEEDBACK cannot be used for routing queries, as routing systems have not supported feedback.
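The overview does not prescribe how queries are built or how feedback is applied; those choices were left to each system. As an illustration only, the following sketch shows one plausible pipeline of the period: an initial query constructed automatically from the topic text, then modified with Rocchio-style weighting after a round of human relevance judgments. All names and weight settings below are hypothetical, not taken from any TREC-2 system.

    # Illustrative sketch (Python); see the hedging note above.
    import re
    from collections import Counter

    def initial_query(topic_text, stopwords=frozenset()):
        """Build a simple term-frequency query vector from a topic statement."""
        terms = re.findall(r"[a-z]+", topic_text.lower())
        return Counter(t for t in terms if t not in stopwords)

    def rocchio_feedback(query, relevant, nonrelevant,
                         alpha=1.0, beta=0.75, gamma=0.15):
        """Modify a query vector using judged documents (Rocchio weighting).

        `query` and each document are Counters mapping term -> weight.
        """
        modified = Counter({t: alpha * w for t, w in query.items()})
        # Move the query toward the centroid of the judged-relevant documents.
        for doc in relevant:
            for t, w in doc.items():
                modified[t] += beta * w / len(relevant)
        # Move the query away from the centroid of the judged-nonrelevant ones.
        for doc in nonrelevant:
            for t, w in doc.items():
                modified[t] -= gamma * w / len(nonrelevant)
        # Terms whose weight falls to zero or below are conventionally dropped.
        return Counter({t: w for t, w in modified.items() if w > 0})

Several rounds of such modification, each preceded by new human judgments, correspond to the repeated feedback the category allows; submitting results after each round is what lets NIST track the time/cost benefit of continued feedback.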
2.2 The Participants
There were 31 participating systems in TREC-2, using a wide range of retrieval techniques. The participants were able to choose from three levels of participation: Category A, full participation; Category B, full participation using a reduced dataset (1/4 of the full document set); and Category C, evaluation only (to allow commercial systems to protect proprietary algorithms). The program committee selected only 20 Category A and B groups to present talks because of limited conference time, and requested that the rest of the groups present posters. All groups were asked to submit papers for the proceedings.
Each group was provided the data and asked to turn in either one or two sets of results for each topic. When two sets of results were sent, they could be made using different methods of creating queries (AUTOMATIC, MANUAL, or FEEDBACK), or by using different parameter settings for one query creation method. Groups could choose to do the routing task, the adhoc task, or both, and were requested to submit the top 1000 documents retrieved for each topic for evaluation.
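The exact submission layout is not reproduced in this section. The sketch below writes such a ranked list in the standard TREC run format (topic number, the literal field Q0, document number, rank, score, run tag); using that format here is an assumption for illustration, and the function name is hypothetical.

    # Illustrative sketch (Python); the run-file layout is an assumption.
    def write_run(results, run_tag, path, depth=1000):
        """results: dict mapping topic id -> list of (docno, score), best first."""
        with open(path, "w") as out:
            for topic in sorted(results):
                for rank, (docno, score) in enumerate(results[topic][:depth], 1):
                    # One line per retrieved document, at most `depth` per topic.
                    out.write(f"{topic} Q0 {docno} {rank} {score:.4f} {run_tag}\n")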
3. The Test Collection
3.1 Introduction
The creation of the test collection (called the TIPSTER collection) was critical to the success of TREC. Like most traditional retrieval collections, there are three distinct parts to this collection -- the documents, the queries or topics, and the relevance judgments or "right answers." These test collection components are discussed briefly in the rest of this section. For a more complete description of the collection, see [Harman 1994].
3.2 The Documents
The documents needed to mirror the different types of documents used in the theoretical TREC application. Specifically, they had to have a varied length, a varied writing style, a varied level of editing and a varied vocabulary. As a final requirement, the documents had to cover different timeframes to show the effects of document date on the routing task.
The documents were distributed as CD-ROMs with about 1 gigabyte of data each, compressed to fit. The following shows the actual contents of each disk.
Disk 1
WSJ -- Wall Street Journal (1987, 1988, 1989)
AP -- AP Newswire (1989)