NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Overview of the First Text REtrieval Conference (TREC-1)
Donna K. Harman, National Institute of Standards and Technology

4. Evaluation

4.1 Existing Evaluation Methodology

An important element of TREC was to provide a common evaluation forum. Standard recall/precision figures were calculated for each system, and the tables and graphs for the results are presented in Appendix A. Figure 4 shows a typical recall/precision curve for illustration purposes. The x axis plots the recall values at fixed recall levels, where

    Recall = (number of relevant items retrieved) / (total number of relevant items in the collection)

The y axis plots the average precision values at those given recall values, where precision is calculated by

    Precision = (number of relevant items retrieved) / (total number of items retrieved)

[Figure 4. A Sample Recall/Precision Curve: precision (y axis, 0.0 to 1.0) plotted against recall (x axis, 0.00 to 1.00).]

There is a standard table and graph in Appendix A for each run for each system, with the runs identified by their unique tags. A map for matching the tags to the systems is also provided. Note that the tables for the TIPSTER panel are in Appendix B, as the results are not directly comparable to the TREC results. The tables show some total statistics for each run, plus both the recall-level and document-level recall/precision averages.

A second type of information about each system is shown in Appendix C. These standardized forms describe system features and system timing, and allow some primitive comparison of the amount of effort needed to produce the results.
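The definitions above can be sketched in a few lines of code. This is a minimal illustration, not the evaluation software used for TREC: the function names and the sample ranked list and relevance judgments are hypothetical, and the interpolation rule (precision at a recall level taken as the maximum precision at any recall at or above that level) is the standard convention for curves like Figure 4.

```python
def recall_precision(ranked, relevant):
    """(recall, precision) after each retrieved item in a ranked list."""
    relevant = set(relevant)
    points, hits = [], 0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / i))
    return points

def interpolated_precision(points, levels):
    """Precision at each fixed recall level: the max precision achieved
    at any recall greater than or equal to that level."""
    return [max((p for r, p in points if r >= lv), default=0.0)
            for lv in levels]

# Hypothetical run: 5 retrieved documents, 3 of which are relevant.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = ["d3", "d1", "d4"]
pts = recall_precision(ranked, relevant)
levels = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
curve = interpolated_precision(pts, levels)
for lv, p in zip(levels, curve):
    print(f"recall {lv:.1f}: precision {p:.2f}")
```

Averaging such interpolated values over all topics yields the kind of recall/precision curve shown in Figure 4.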