NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
CLARIT TREC Design, Experiments, and Results
D. Evans, R. Lefferts, G. Grefenstette, S. Handerson, W. Hersh, A. Archbold
National Institute of Standards and Technology; edited by Donna K. Harman

Figure 18: Sample of First-Pass Partitioning Results for Topic 57

    ZF09-435-245     7.720000
    WSJ870123-0031   6.470000
    FR89214-0026     6.310000
    WSJ870519-0094   6.130000
    WSJ900912-0046   5.830000
    WSJ870305-0055   5.360000
    ZF07-783-164     5.310000
    ZF07-189-244     5.100000
    ZF07-443-642     4.980000
    AP881122-0107    4.060000
    WSJ911018-0122   4.060000
    ZF07-971-724     4.050000
    ZF07-251-245     3.930000
    WSJ870421-0065   3.780000
    ZF08-084-048     3.740000
    ZF07-621-948     3.670000
    WSJ911030-0170   3.610000
    ZF09-584-807     3.570000
    ZF07-294-735     3.420000
    ZF09-526-239     3.390000
    ZF07-789-516     3.330000
    ZF07-218-520     3.300000
    ZF07-800-964     3.300000
    ZF07-495-528     3.280000
    WSJ900629-0110   3.210000
    ZF09-559-173     3.200000
    ZF07-118-812     3.170000
    WSJ871030-0149   3.160000
    ZF07-878-828     3.120000
    WSJ870309-0110   3.070000
    AP880419-0280    3.050000

Each TREC document was `scored' against the super thesaurus in a single pass (Step 6 in Figure 11): effectively, each document was scored against the routing/partitioning thesaurus for each topic in parallel. In particular, every NP in each document was matched against the NPs (terms) in the routing thesaurus; partial matches were allowed. The definitions in Figure 15 and the formula in Figure 16 (given schematically in Figure 17) were used to yield a composite score for the document based on the number of exact and partial hits, as a function of document length and term `value'. The routing/partitioning thesaurus was used to score the full database, yielding a ranking of all documents relative to all topics simultaneously. As shown in Step 7 in Figure 11, the top 2,000 documents for each topic were retained as the partition for the topic for the next stage of processing.
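The single-pass scoring and partitioning step described above can be sketched in code. The actual CLARIT scoring formula (Figure 16) is not reproduced in this excerpt, so the term values, the partial-match credit, the head-word heuristic for partial matches, and the length normalization below are illustrative assumptions only, not the published method:

```python
# Hedged sketch of first-pass document scoring against a per-topic
# routing/partitioning thesaurus. Each thesaurus maps an NP (term)
# to a `value'; documents are represented as lists of extracted NPs.
from collections import defaultdict

def score_document(doc_nps, thesaurus, partial_credit=0.5):
    """Composite score from exact and partial NP hits, normalized by
    document length (NP count). The partial-match rule used here
    (shared head word, half credit) is an assumption for illustration."""
    score = 0.0
    for np in doc_nps:
        if np in thesaurus:                          # exact hit: full term value
            score += thesaurus[np]
        else:                                        # partial hit: same head word
            for term, value in thesaurus.items():
                if np.split()[-1] == term.split()[-1]:
                    score += partial_credit * value
                    break
    return score / max(len(doc_nps), 1)

def partition(docs, thesauri, top_n=2000):
    """Score every document against every topic in one pass over the
    collection, then retain the top_n documents per topic (Step 7)."""
    ranked = defaultdict(list)
    for doc_id, nps in docs.items():
        for topic, thesaurus in thesauri.items():
            ranked[topic].append((score_document(nps, thesaurus), doc_id))
    return {t: sorted(rs, reverse=True)[:top_n] for t, rs in ranked.items()}
```

Ranking all topics in the same pass over the collection, as the text describes, avoids re-reading the (large) TREC database once per topic; only the per-topic score accumulators are kept in memory.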
Figure 18 gives sample results of the ranking of documents based on feature scoring for Topic 57. Figure 19 shows the set of `true' relevant documents chosen by manual review of the top 10-50 ranked documents.

4.7 Final `Querying'

Figure 20 gives the final steps in the process. Querying at this point involved two essential phases: building the final query vector, and querying a partition of the database to retrieve the final set of relevant documents. Note that the query was weighted based on statistics