NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix A: TREC-1 Results
National Institute of Standards and Technology
Donna K. Harman
APPENDIX A
This appendix contains tables of results for all the TREC-1 participants except the TIPSTER panel, whose
tables appear in Appendix B. The tables in Appendix A and Appendix B show various measures of performance
on the adhoc and routing tasks. The adhoc results come first, followed by the routing results, with the
tables in the same order as the presentation order of the papers. The definitions of the evaluation measures are
given in the Overview, section 4, and readers unfamiliar with these measures should read that section first.
Care should be taken in comparing the tables across systems. These measures show performance only, with
no measure of user or system effort. Additionally, because of misunderstandings about the query categories,
some results may reflect manual adjustments of the queries even though the results are not formally categorized
as feedback results.
The tables contain four major boxes of statistics and two graphs.
Box 1-- Summary Statistics
line 1-- unique run identifier, data subset, and query construction method used
Data subset
full (disks 1 and 2 for adhoc, disk 2 for routing)
category B (the official subset of data, about 1/4 of the full data, consisting of the Wall Street Journal articles)
disk 1 only
disk 2 only
wsj, disk 1 only (Wall Street Journal from disk 1)
wsj, disk 2 only (Wall Street Journal from disk 2)
Query construction method
automatic (method 1)
manual (method 2)
feedback (method 3, frozen evaluation used)
line 2-- Number of topics included in averages.
line 3-- Total number of documents retrieved over all topics. Here, "retrieved" means having a rank less than 200.
line 4-- Total number of relevant documents for all topics in the collection (whether retrieved or not).
line 5-- Total number of relevant retrieved documents for this run.
Box 2-- Recall Level Averages
lines 1-11-- The average over all topics of the precision at each of the 11 recall points given. Note
that this is interpolated precision: e.g., for a particular topic, if the precision at 0.50
recall is greater than the precision at 0.40 recall, then the precision at 0.50 recall
is used for both the 0.50 and 0.40 recall levels.
line 12-- The average precision based on the 11 recall points in lines 1-11.
line 13-- The average precision based on 3 intermediate recall points (0.2, 0.5, and 0.8).
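The interpolation rule described above can be sketched in code. The following is a minimal illustration, not the official evaluation software: given one topic's ranked list of binary relevance judgments and the total number of relevant documents, it computes interpolated precision at the 11 standard recall points (interpolated precision at recall r is the maximum precision observed at any achieved recall level >= r). The function name and arguments are hypothetical.

```python
def interpolated_precision(relevance, total_relevant, recall_points=None):
    """Interpolated precision at each recall point for one topic.

    relevance      -- ranked list of 0/1 relevance judgments
    total_relevant -- number of relevant documents for the topic
    """
    if recall_points is None:
        recall_points = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
    # (recall, precision) pairs at each rank where a relevant doc appears
    hits = 0
    pr = []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            pr.append((hits / total_relevant, hits / rank))
    # interpolated precision at r = max precision at any recall >= r
    return [max((p for rec, p in pr if rec >= r), default=0.0)
            for r in recall_points]
```

Averaging the 11 returned values over all topics gives line 12 of the table; averaging the values at recall 0.2, 0.5, and 0.8 gives line 13.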
Box 3-- Document Level Averages
lines 1-5-- The average recall and precision after the given number of documents have been retrieved.
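A document-level average for one topic can be sketched the same way: count the relevant documents among the first k retrieved and divide by the number of relevant documents (recall) or by k (precision). The function name and the example cutoffs below are illustrative assumptions, not taken from the official software.

```python
def precision_recall_at(relevance, total_relevant, cutoffs=(5, 15, 30, 100, 200)):
    """Recall and precision after the first k retrieved documents.

    relevance      -- ranked list of 0/1 relevance judgments
    total_relevant -- number of relevant documents for the topic
    Returns {k: (recall, precision)} for each cutoff k.
    """
    out = {}
    for k in cutoffs:
        hits = sum(relevance[:k])          # relevant docs in the top k
        out[k] = (hits / total_relevant, hits / k)
    return out
```

The table entry for each cutoff is then the mean of these per-topic values over all topics in the run.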