4. Evaluation
4.1 Existing Evaluation Methodology
An important element of TREC was to provide a common evaluation forum. Standard recall/precision
figures were calculated for each system, and the tables and graphs for the results are presented in Appendix A.
Figure 4 shows a typical recall/precision curve for illustration purposes. The x axis plots the recall values at
fixed recall levels, where
\[
\text{Recall} = \frac{\text{number of relevant items retrieved}}{\text{total number of relevant items in collection}}
\]
The y axis plots the average precision values at those given recall values, where precision is calculated by
\[
\text{Precision} = \frac{\text{number of relevant items retrieved}}{\text{total number of items retrieved}}
\]
[Plot for Figure 4: "Illustration of Recall-Precision Curve"; precision (y axis, 0.0 to 0.9) versus recall (x axis, 0.0 to 1.0).]
Figure 4. A Sample Recall/Precision Curve.
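As a concrete illustration of these two formulas, and of how precision values at the fixed recall levels on the x axis of Figure 4 can be obtained, the short Python sketch below computes recall and precision after each retrieved document and then interpolates precision at eleven recall levels. The function names, the example data, and the interpolation rule (maximum precision at or beyond each recall level) are assumptions made for this sketch; they are not taken from the actual TREC evaluation software.

```python
# Illustrative sketch only; names, data, and the interpolation rule are
# assumptions, not the actual TREC evaluation code.

def recall_precision_points(ranking, relevant):
    """Return (recall, precision) after each retrieved document in ranked order."""
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        recall = hits / len(relevant)   # relevant retrieved / total relevant in collection
        precision = hits / rank         # relevant retrieved / total retrieved so far
        points.append((recall, precision))
    return points

def interpolated_precision(points, levels=tuple(i / 10 for i in range(11))):
    """Precision at fixed recall levels, taken here as the maximum precision
    observed at any recall greater than or equal to the level."""
    return {
        level: max((p for r, p in points if r >= level), default=0.0)
        for level in levels
    }

# Example: five documents retrieved for a topic with three relevant documents.
points = recall_precision_points(["d3", "d7", "d1", "d9", "d4"], {"d1", "d3", "d4"})
for level, prec in sorted(interpolated_precision(points).items()):
    print(f"recall {level:.1f}: precision {prec:.2f}")
```

Averaging such interpolated values over all topics in a run produces a curve of the general shape shown in Figure 4.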
There is a standard table and graph in Appendix A for each run for each system, with the runs identified by their
unique tags. A map for matching the tags to the systems is also provided. Note that the tables for the TIPSTER
panel are in Appendix B, as those results are not directly comparable to the TREC results. The tables show
some total statistics for each run, plus both the recall-level and document-level recall/precision averages.
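To make the distinction concrete, a document-level average measures precision after a fixed number of documents has been retrieved, rather than at fixed recall levels. The sketch below is a hypothetical illustration; the cutoff values are chosen only for the example and are not the cutoffs used in the Appendix A tables.

```python
# Hypothetical illustration of document-level precision;
# the cutoff values are examples, not those used in the official tables.

def precision_at_cutoffs(ranking, relevant, cutoffs=(10, 30, 100)):
    """Precision after the first N retrieved documents, for each cutoff N."""
    return {
        n: sum(1 for doc in ranking[:n] if doc in relevant) / n
        for n in cutoffs
    }
```

Averaging either kind of measure over all topics in a run gives the per-run figures reported in the tables.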
A second type of information about each system is shown in Appendix C. These standardized forms
describe system features and system timing, and allow some primitive comparison of the amount of effort
needed to produce the results.