NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Application of the Automatic Message Router to the TIPSTER Collection
R. Jones, S. Leung, D.L. Pape
Edited by Donna K. Harman, National Institute of Standards and Technology

Four sets of filters were run, though because of time constraints only two were submitted to the formal experiment set. These two were:

* CPGHC - Hand-crafted structured filters, written by an experienced staff member over a period of a week.

* CPGCN - Automatically generated structured filters, produced directly from the Concept field of each topic by a simple LEX program that converted each entry into an AMR term.

Experimental Procedure

The filters were run against a portion of the Disk 1 data. The primary purpose of this run was to tune the internal weights of the terms in each filter. Because of hardware problems, the filters had been submitted to TREC before these experiments were conducted, so no changes were made to the filters in the light of these runs. However, some changes were made to the AMR algorithms to stabilise the dynamic weight modification process described above.

The two sets of filters submitted, together with two other sets generated from the Disk 1 data (200 filters in all), were run against the Disk 2 collection over a weekend. Although AMR assigns each document a relevance score in the range 1 to 100, no cut-off was applied to the results submitted: if a filter returned 200 documents, all of them were submitted, regardless of the scores obtained.

Results Analysis

The results of the experiments were very encouraging, with both the manually generated and the automatically generated sets appearing in the top four routing results presented at TREC, as measured by the 11-point average precision vs. recall scores. The manually generated filters performed better than the automatically generated ones, though not markedly so.
This is probably because the Concept fields were well described and provided an adequate range of synonyms for most topics. Four of the manually generated filters produced very poor results, due to the insertion of a mandatory term in the filter. As measured by the precision figures, the filters performed better on the second 25 topics than on the first 25. This may be due to the differing nature of the topics: the first half are more fact-specific, while the second half are more generic and `information retrieval' oriented. However, the 11-point average recall-precision scores do not reflect any significant difference.
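The 11-point average precision scores referred to above can be sketched as follows. This is the standard TREC-style interpolation (precision interpolated at the recall levels 0.0, 0.1, ..., 1.0 and averaged over the 11 levels), shown here only to illustrate the measure; it is not code from the AMR system, and the function and variable names are illustrative.

```python
def eleven_point_average_precision(ranking, relevant):
    """11-point interpolated average precision for a single topic.

    ranking:  list of document ids, best-scoring first
    relevant: set of ids judged relevant for the topic
    """
    # Record (recall, precision) after each relevant document retrieved.
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))

    # Interpolated precision at recall level r is the maximum precision
    # observed at any recall >= r; average over the 11 standard levels.
    total = 0.0
    for level in (r / 10 for r in range(11)):
        candidates = [p for r, p in points if r >= level]
        total += max(candidates) if candidates else 0.0
    return total / 11
```

For example, a ranking that places the two relevant documents at ranks 2 and 4 of four retrieved gives an interpolated precision of 0.5 at every recall level, and hence a score of 0.5; a score of 1.0 requires all relevant documents at the top of the ranking.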