NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Application of the Automatic Message Router to the TIPSTER Collection
R. Jones
S. Leung
D.L. Pape
National Institute of Standards and Technology
Donna K. Harman
Four sets of filters were run, though because of time constraints, only two were submitted to the
formal experiment set. These two were:
* CPGHC - Hand-crafted structured form of filters. These were written by an experienced
staff member over a period of a week.
* CPGCN - Automated structured form, generated directly from the Concept field of each
topic. These were generated by a simple LEX program that converted each entry directly
into an AMR term.
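The original conversion was a simple LEX program; as a rough illustration only, the same idea can be sketched in Python. The field layout (numbered, comma-separated concept phrases) and the function name are assumptions for the example, not details from the paper.

```python
# Hypothetical sketch of the Concept-field conversion described above.
# Assumes each Concepts entry is a numbered line of comma-separated
# phrases; each phrase becomes one filter term.

def concepts_to_filter(concepts_field: str) -> list[str]:
    """Split a topic's Concepts field into individual filter terms."""
    terms = []
    for line in concepts_field.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop a leading "1." style enumeration if present.
        if line[0].isdigit():
            line = line.split(".", 1)[-1]
        for phrase in line.split(","):
            phrase = phrase.strip().rstrip(";.")
            if phrase:
                terms.append(phrase)
    return terms

example = "1. oil spill, tanker accident;\n2. clean-up costs"
print(concepts_to_filter(example))
# → ['oil spill', 'tanker accident', 'clean-up costs']
```

Each resulting phrase would then be wrapped as a single AMR term, so the whole filter is effectively a disjunction over the topic's concepts.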
Experimental Procedure
The filters were run against a portion of the Disk 1 data. The primary purpose of this run was to
tune the internal weights of the terms in each filter. Because of hardware problems, the filters
had been submitted to TREC before these experiments were conducted, so no changes were made
to the filters in the light of these runs. However, some changes to the AMR algorithms were
made to stabilise the dynamic weight modification process described above.
The two sets of filters submitted, together with two other sets generated from the Disk 1 data
(200 filters in all), were run against the Disk 2 collection over a weekend.
Though AMR provides a measure of the relevance of each document in the range 1 to 100, no
cut-off was applied to the results submitted. If a filter returned 200 documents, then these were
submitted, regardless of the scores obtained.
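The no-cut-off policy amounts to ranking every matched document by its AMR score and submitting the whole list. A minimal sketch, with illustrative document identifiers that are not from the paper:

```python
# Hypothetical sketch: rank every matched document by its AMR relevance
# score (1-100) and emit all of them, applying no score cut-off, as
# described in the text.

def rank_without_cutoff(scored_docs: dict) -> list:
    """scored_docs maps doc_id -> score in 1..100. Returns all doc ids,
    best score first; nothing is dropped, however low its score."""
    return sorted(scored_docs, key=lambda d: scored_docs[d], reverse=True)

print(rank_without_cutoff({"WSJ870108-0012": 92, "AP880212-0047": 3}))
# → ['WSJ870108-0012', 'AP880212-0047']
```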
Results Analysis
The results of the experiments were very encouraging, with both the manually generated and
automatic sets appearing in the top four routing results presented at TREC, as measured by the
11-point average precision vs recall scores.
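For readers unfamiliar with the measure, the 11-point score is the interpolated precision averaged over the recall levels 0.0, 0.1, ..., 1.0. The following is a hedged sketch of that computation; the official TREC evaluation software differs in detail (e.g. pooling and per-topic averaging).

```python
# Sketch of 11-point interpolated average precision for a single topic.
# ranked_relevance is a list of 0/1 relevance flags in ranked order;
# total_relevant is the number of relevant documents for the topic.

def eleven_point_avg_precision(ranked_relevance, total_relevant):
    # Precision and recall after each retrieved document.
    points = []
    hits = 0
    for i, rel in enumerate(ranked_relevance, start=1):
        hits += rel
        points.append((hits / total_relevant, hits / i))
    # Interpolated precision at a recall level is the maximum precision
    # observed at any recall >= that level.
    interp = []
    for level in (l / 10 for l in range(11)):
        ps = [p for r, p in points if r >= level]
        interp.append(max(ps) if ps else 0.0)
    return sum(interp) / 11
```

For example, a run that ranks both relevant documents first (`[1, 1]` with `total_relevant=2`) scores 1.0, while interleaving a non-relevant document lowers the score.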
The manually generated filters performed better than the automatically generated ones, though
not markedly so. This is probably because the Concepts fields were well described and
provided an adequate range of synonyms for most topics. Four of the manually generated filters
produced very poor results, owing to the insertion of a mandatory term in the filter.
As measured by the precision figures, the filters performed better on the second 25 topics than
on the first 25. This may be due to the different nature of the topics: the first half of the
topics are more fact-specific, while the second half are more generic and 'information retrieval'
oriented. However, the 11-point average precision vs recall scores do not reflect any significant
difference.