SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Design and Evaluation of the CLARIT-TREC-2 System
D. Evans
R. Lefferts
National Institute of Standards and Technology
D. K. Harman
David A. Evans(1,2) and Robert G. Lefferts(2)
(1) Laboratory for Computational Linguistics
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213-3890
1 Introduction
The CLARIT team used the opportunity of the TREC-2 evaluations to explore several facets of the CLARIT system. In particular, given the performance of the CLARIT system on TREC-1 tasks (Evans et al. 1993), we focused our attention on evaluating
1. fully automatic processing of topics and potentially relevant documents and
2. topic/query augmentation using CLARIT thesaurus-discovery techniques.
All of the results we report in this paper follow from straightforward applications of base-level CLARIT processing, utilizing essentially the same CLARIT components that were employed in the CLARIT-TREC-1 system. The general improvements we observe in CLARIT-TREC-2 processing are attributable to modifications (especially simplifications) in processing steps and in the settings of system variables.
In the following sections, we describe the CLARIT-TREC-2 system, report our official processing results, and offer a brief analysis of performance. In addition, we report on several subsequent experiments we have conducted on the TREC-2 collection that test the parameters of the CLARIT-TREC-2 system and identify sources of immediate improvements in processing.
2 CLARIT-TREC-2 System Description and Processing Method
The CLARIT-TREC-2 system reflects a reorganization of the tools and techniques employed in the CLARIT-TREC-1 system. One of our principal goals was to streamline CLARIT processing and to establish a baseline method that is amenable to parameterization and analysis. As a consequence, the flow of data in the CLARIT-TREC-2 system is simple, straightforward, and efficient; furthermore, all CLARIT processing is fully automatic.
(2) CLARIT Corporation
Suite 200A, 319 South Craig St.
Pittsburgh, Pennsylvania 15213-3726
2.1 Changes from TREC-1
The essential differences between the CLARIT-TREC-1 and TREC-2 systems lie in the preparation and evaluation of queries (TREC-2 "topics") and in the automation of steps designed to identify and process potentially relevant documents for use in query augmentation. The following summaries highlight these points.
* One-Pass Querying. The CLARIT-TREC-1 system employed a two-step process to retrieve documents: a first pass for partitioning ("evoking") and a second pass for final ranking ("discrimination"). This two-pass process has been eliminated in the CLARIT-TREC-2 system; querying takes place in one step over the entire collection using vector-space retrieval methods.
* Automatic Query Creation. The CLARIT-TREC-1 system was categorized as a "manual" system, though the required manual intervention was minimal. In particular, users were expected to assign an importance coefficient (with possible values "1", "2", or "3") to the CLARIT-parsed terms in a topic statement and possibly also to add terms to or delete terms from the CLARIT-generated list. In the CLARIT-TREC-2 system, the importance coefficient is assigned automatically by simple heuristics (described below). While users are still free to modify coefficients or terms, such intervention is not required. All "CLARTA" results reported in this paper reflect processing in which queries were prepared fully automatically by the CLARIT system, without review or modification.
* Automatic Retrieval Refinement. When processing ad-hoc queries, the CLARIT-TREC-1 system required the user to evaluate a few of the top-ranked retrieved documents. User-nominated documents were processed to identify terms for use in supplementing the source query. In the CLARIT-TREC-2 system, user evaluations are not required: initial querying is accurate enough to support the automatic processing of the highest-scoring retrieved documents without "inspection".