NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Edited by Donna K. Harman, National Institute of Standards and Technology

CLARIT TREC Design, Experiments, and Results
D. Evans, R. Lefferts, G. Grefenstette, S. Handerson, W. Hersh, A. Archbold

2. The presence of generalizations in the topic causes poor retrieval. For example, one query asked for information on currently proposed acquisitions involving a U.S. company and a foreign company. Without a list of U.S. and foreign companies, one cannot accurately evaluate candidate documents about acquisitions.

3. The presence of temporal constraints in the topic causes poor retrieval. For example, one query asked for the location of presidential candidates during a certain time period. Documents describing events outside of that time period were not considered relevant.

Table 10 gives the instances of 'good' and 'bad' performance relative to the presence or absence of the specific features in topic statements. There is no correlation. The sources of difficulty that the CLARIT-TREC system experienced cannot be traced to such simple characteristics of queries. (Of course, it is still the case that such features of queries present special problems to all IR systems.)

6.4 A Collection of Known Problems

As noted previously, we made a number of obvious mistakes in processing the TREC corpus that we simply did not have time to correct. It is likely that some such mistakes contributed to poor performance. The following is a list of known problems that occurred while processing the TREC corpus:

1. Errors in the Lexicon. All CLARIT NLP depends on information derived from the lexicon; incorrect lexical entries will cause errors throughout the system. As with morphological processing, these errors result in false analyses for some words.

2. Morphological Processing Errors. A certain number of rather simple bugs have been discovered in the morphological analysis module.
The bugs have caused incorrect analyses of some words.

3. "Robust" Parsing of Training Corpus. Unfortunately, the initial parsing of half of the TREC corpus was done in a mode in which all phrases (not just NPs) in the input were retained in the output. This output therefore contained quite a bit of 'noise' in the form of verb, adverb, and adjective phrases.

4. Limited Partition Size. In creating the partition using the partitioning thesauri, we were limited to sets of 2,000 documents. As noted above, one result of the 'low' cutoff is that many relevant documents were simply not included in the partition. Using larger partitions should improve overall performance.

In addition, there are many facets of the CLARIT-TREC process that we believe are not properly calibrated or configured. These include the following:

1. Iteration of Retrieval. Automatic feedback of retrieval results can be used to expand the set of relevant documents retrieved. However, we did not have time to perform such feedback during the TREC experiment.

2. Alternative Scoring Functions. It is not clear that IDF-TF is the best scoring function to use in biased collections of text, such as our 2,000-document partitions. In fact, IDF scoring will specifically demote terms in the document that are known to be important, yet are very well distributed because of their prominence in the partitioning thesaurus.

3. Refinements to Document Partitioning Formula. TREC represents our initial attempt at using the partitioning techniques on a large corpus. The feature-scoring formula has not been validated. Additional experiments will likely lead to refinements.
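The concern about IDF in biased partitions (point 2 above) can be illustrated with a toy computation. This is a minimal sketch using the standard TF x log(N/df) formulation, not the exact CLARIT scoring function; the function names and the miniature "partition" are ours. Because a topically biased partition contains the partitioning terms in nearly every document, those terms receive an IDF near zero and are demoted, even though they are known to be important.

```python
import math

def idf(term, docs):
    # Inverse document frequency: log(N / df); zero if the term is absent.
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def tf_idf(term, doc, docs):
    # Term frequency in one document, weighted by collection-level IDF.
    return doc.count(term) * idf(term, docs)

# Toy biased "partition": every document contains the partitioning term
# "merger", so that term's IDF collapses to zero.
partition = [
    ["merger", "bank", "announce"],
    ["merger", "airline", "bid"],
    ["merger", "offer", "stock"],
]

print(tf_idf("merger", partition[0], partition))   # 0.0: the central term is demoted
print(tf_idf("airline", partition[1], partition))  # positive: a rarer term outranks it
```

The example makes the problem concrete: within a partition built around 'merger', documents cannot be distinguished by the very term that defines the partition, and incidental terms dominate the scores.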
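The feedback iteration described in point 1 above is commonly realized as Rocchio-style query expansion. The sketch below is that standard technique, not the CLARIT mechanism (which the paper does not specify); the weights, term vectors, and parameter values are illustrative assumptions. The query vector is moved toward the centroid of top-ranked documents, so that terms prominent in those documents ('takeover', 'merger') enter the expanded query.

```python
def rocchio(query, relevant, alpha=1.0, beta=0.75):
    # Rocchio feedback: new weight = alpha * original + beta * mean weight
    # in the documents assumed relevant. Vectors are {term: weight} dicts.
    terms = set(query) | {t for d in relevant for t in d}
    expanded = {}
    for t in terms:
        rel_mean = sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        expanded[t] = alpha * query.get(t, 0.0) + beta * rel_mean
    return expanded

# Illustrative first-pass results for a query about acquisitions.
query = {"acquisition": 1.0}
top_docs = [
    {"acquisition": 0.5, "takeover": 0.8},
    {"acquisition": 0.4, "merger": 0.6},
]

expanded = rocchio(query, top_docs)
# 'takeover' and 'merger' now carry weight, so a second retrieval pass
# can recover relevant documents the original query missed.
```

One iteration of such feedback would address exactly the situation noted in the paper: relevant documents that share vocabulary with the top-ranked set, but not with the original topic statement.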