NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
CLARIT TREC Design, Experiments, and Results
D. Evans, R. Lefferts, G. Grefenstette, S. Handerson, W. Hersh, A. Archbold
National Institute of Standards and Technology
Donna K. Harman, editor

Feature: Exclusions
             Yes   No
 Total (71)   24   47
 Good         12   25
 Bad          12   22

Feature: Generalizations
             Yes   No
 Total (71)   30   41
 Good         15   22
 Bad          15   19

Feature: Temporal Constraints
             Yes   No
 Total (71)   11   60
 Good          7   30
 Bad           4   30

Note: None of the above shows significant correlation!

Table 10: Topic Features x Performance

least half of the relevant documents were missing in both the ad-hoc and routing experiments. Relative to the available relevant documents, the vector-space discrimination phase of CLARIT-TREC processing demonstrated fairly accurate retrieval. Refinements in the techniques used to nominate an initial partition therefore become a natural focus for efforts to improve overall system performance.

In some cases, CLARIT processing missed relatively large numbers of correct documents. While it is difficult to imagine a single-strategy IR system that would perform optimally on all types of queries, it is important to understand why certain queries may cause failures for a given system. To this end, we conducted several experiments to identify features in the queries that might have caused sub-optimal performance. We examined all of the topic statements to determine whether the presence or absence of certain features is correlated with "good" or "bad" recall rates for that topic. ("Good" here is defined as being above the median recall rate, "bad" as being below the median.) In particular, we tested three hypotheses:

1. The presence of exclusions in the topic causes poor retrieval. "Exclusions" include any statements that specifically identify concepts or interpretations related to the topic that are not to be considered relevant for the purposes of retrieval.
For example, one query asked for information on computer crimes, but specifically excluded computer viruses. In CLARIT-TREC processing, no attempt was made to accommodate such exclusions, except that specifically negated phrases were deleted from the query term list during initial manual evaluation and term weighting.
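The "no significant correlation" note under Table 10 can be checked directly from the 2x2 counts of feature presence versus good/bad recall. The chapter does not say which significance test was applied, so the following is only a sketch, assuming a chi-square test of independence and using the Exclusions counts from the table:

```python
# Sketch: chi-square test of independence on a 2x2 contingency table
# (feature present/absent vs. good/bad recall), as one plausible way to
# reproduce the significance check summarized under Table 10.

def chi_square_2x2(good_yes, good_no, bad_yes, bad_no):
    """Return the chi-square statistic for a 2x2 contingency table."""
    observed = [[good_yes, good_no], [bad_yes, bad_no]]
    row_totals = [sum(row) for row in observed]
    col_totals = [good_yes + bad_yes, good_no + bad_no]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Exclusions feature, counts from Table 10: Good 12 (yes) / 25 (no),
# Bad 12 (yes) / 22 (no), 71 topics in all.
stat = chi_square_2x2(12, 25, 12, 22)
CRITICAL_05 = 3.841  # chi-square critical value, df=1, alpha=0.05
print(f"chi2 = {stat:.3f}, significant at 0.05: {stat > CRITICAL_05}")
```

The statistic for the Exclusions table comes out near 0.06, far below the 3.841 critical value, consistent with the table's note; the Generalizations and Temporal Constraints counts behave the same way.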