NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)

Appendix A: TREC-1 Results

National Institute of Standards and Technology
Donna K. Harman

APPENDIX A

This appendix contains tables of results for all the TREC-1 participants
except the TIPSTER panel, whose tables appear in Appendix B. The tables in
Appendix A and Appendix B show various measures of the performance on the
adhoc and routing tasks. The adhoc results come first, followed by the
routing results, with the tables in the same order as the presentation order
of the papers. The definitions of the evaluation measures are given in the
Overview, section 4, and readers unfamiliar with these measures should read
that section first.

Care should be taken in comparing the tables across systems. These measures
show performance only, with no measure of user or system effort.
Additionally, because of misunderstandings about the query categories, some
results may reflect manual adjustments of the queries even though the
results are not formally categorized as feedback results.

The tables contain four major boxes of statistics and two graphs.

Box 1 -- Summary Statistics
  line 1 -- unique run identifier, data subset, and query construction
    method used
    Data subset
      full (disks 1 and 2 for adhoc, disk 2 for routing)
      category B (the official subset of data, 1/4 of the data using the
        Wall Street Journal articles)
      disk 1 only
      disk 2 only
      wsj, disk 1 only (Wall Street Journal from disk 1)
      wsj, disk 2 only (Wall Street Journal from disk 2)
    Query construction method
      automatic (method 1)
      manual (method 2)
      feedback (method 3, frozen evaluation used)
  line 2 -- Number of topics included in averages.
  line 3 -- Total number of documents retrieved over all topics. Here,
    "retrieved" means having a rank less than or equal to 200.
  line 4 -- Total number of relevant documents for all topics in the
    collection (whether retrieved or not).
  line 5 -- Total number of relevant retrieved documents for this run.

Box 2 -- Recall Level Averages
  lines 1-11 -- The average over all topics of the precision at each of the
    11 recall points given. Note that this is interpolated precision: e.g.,
    for a particular topic, if the precision at 0.50 recall is greater than
    the precision at 0.40 recall, then the precision at 0.50 recall is used
    for both the 0.50 and 0.40 recall levels.
  line 12 -- The average precision based on the 11 recall points in
    lines 1-11.
  line 13 -- The average precision based on 3 intermediate recall points
    (0.2, 0.5, and 0.8).

Box 3 -- Document Level Averages
  lines 1-5 -- The average recall and precision after the given number of
    documents have been retrieved.
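To make the interpolation rule of Box 2 and the cutoff measures of Box 3
concrete, the following is a minimal Python sketch for a single topic. It
assumes a ranked result list represented as booleans (True = relevant) and a
known count of relevant documents for the topic; the function names and the
cutoff values shown are illustrative assumptions, not the official trec_eval
code.

    # Minimal sketch of the Box 2 / Box 3 measures for one topic.
    # Assumes: ranked_rel is the run's ranked list as booleans
    # (True = relevant), num_relevant is the topic's total count of
    # relevant documents. Names and cutoffs are illustrative.

    def precision_recall_points(ranked_rel, num_relevant):
        """(recall, precision) at each relevant document retrieved."""
        points = []
        hits = 0
        for rank, is_rel in enumerate(ranked_rel, start=1):
            if is_rel:
                hits += 1
                points.append((hits / num_relevant, hits / rank))
        return points

    def interpolated_precision(points, recall_level):
        """Interpolated precision at a recall level: the maximum
        precision observed at any recall >= that level (so a higher
        precision at 0.50 recall is also used for 0.40 recall)."""
        eligible = [p for r, p in points if r >= recall_level]
        return max(eligible) if eligible else 0.0

    def eleven_point_average(points):
        """Average of interpolated precision at recall 0.0, 0.1, ...,
        1.0; averaged over topics this gives Box 2, line 12."""
        levels = [i / 10 for i in range(11)]
        return sum(interpolated_precision(points, r)
                   for r in levels) / 11

    def document_level_averages(ranked_rel, num_relevant,
                                cutoffs=(5, 15, 30, 100, 200)):
        """Recall and precision after a fixed number of documents are
        retrieved (Box 3). The cutoff values are an assumption; the
        run is assumed to have retrieved at least max(cutoffs) docs."""
        results = {}
        for k in cutoffs:
            hits = sum(ranked_rel[:k])
            results[k] = (hits / num_relevant, hits / k)
        return results

For example, with ranked_rel = [True, False, True] and num_relevant = 4,
the measured points are (0.25, 1.0) and (0.5, 0.667), so the interpolated
precision at 0.2 recall is 1.0. The tables' recall-level lines are these
per-topic values averaged over the number of topics given in Box 1, line 2.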