NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
CLARIT TREC Design, Experiments, and Results
D. Evans, R. Lefferts, G. Grefenstette, S. Handerson, W. Hersh, A. Archbold
National Institute of Standards and Technology
Edited by Donna K. Harman
Feature: Exclusions
         Total   Yes   No
  All      71    24    47
  Good     37    12    25
  Bad      34    12    22

Feature: Generalizations
         Total   Yes   No
  All      71    30    41
  Good     37    15    22
  Bad      34    15    19

Feature: Temporal Constraints
         Total   Yes   No
  All      71    11    60
  Good     37     7    30
  Bad      34     4    30

Note: None of the above shows a significant correlation.

Table 10: Topic Features x Performance
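The median-split comparison in Table 10 amounts to a 2x2 test of independence for each feature. As one way to check the "no significant correlation" note, a Pearson chi-square statistic can be computed from the Exclusions counts; the function name `chi_square_2x2` is ours, not part of the CLARIT system, and this is only an illustrative sketch:

```python
# Sketch: Pearson chi-square statistic (no continuity correction) for a
# 2x2 contingency table [[a, b], [c, d]]. The critical value 3.841 is
# chi-square with 1 degree of freedom at the 0.05 level.

def chi_square_2x2(a, b, c, d):
    """Return the chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Exclusions feature from Table 10: rows = good/bad recall, cols = yes/no.
good_yes, good_no = 12, 25
bad_yes, bad_no = 12, 22

chi2 = chi_square_2x2(good_yes, good_no, bad_yes, bad_no)
print(f"chi-square = {chi2:.3f}, significant at 0.05: {chi2 > 3.841}")
```

Running the same computation for the Generalizations and Temporal Constraints counts likewise yields statistics far below the 3.841 threshold, consistent with the note above.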
least half of the relevant documents were missing in both the ad-hoc and routing experiments.
Relative to the available relevant documents, the vector-space discrimination phase of CLARIT-
TREC processing demonstrated fairly accurate retrieval. Refinements in the techniques used
to nominate an initial partition, therefore, become a natural focus for efforts to improve overall
system performance.
In some cases, CLARIT processing missed relatively large numbers of correct documents.
While it is difficult to imagine a single-strategy IR system that would perform optimally on
all types of queries, it is important to understand why certain queries may cause failures for
a given system. To this end, we conducted several experiments to identify features
of the queries that might have caused sub-optimal performance. We examined all of the topic
statements to determine whether the presence or absence of certain features correlates with `good'
or `bad' recall rates for that topic. ("Good" here is defined as being above the median recall
rate, "bad" as being below it.) In particular, we tested three hypotheses:
1. The presence of exclusions in the topic causes poor retrieval. "Exclusions" include any
statements that specifically identify concepts or interpretations related to the topic that
are not to be considered relevant for the purposes of retrieval. For example, one query
asked for information on computer crimes, but specifically excluded computer viruses.
In CLARIT-TREC processing, no attempt was made to accommodate such exclusions
except that specifically negated phrases were deleted from the query term list during
initial manual evaluation and term weighting.