NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
CLARIT TREC Design, Experiments, and Results
D. Evans
R. Lefferts
G. Grefenstette
S. Handerson
W. Hersh
A. Archbold
National Institute of Standards and Technology
Donna K. Harman
Document ID      Score
ZF09-435-245     7.720000
WSJ870123-0031   6.470000
FR89214-0026     6.310000
WSJ870519-0094   6.130000
WSJ900912-0046   5.830000
WSJ870305-0055   5.360000
ZF07-783-164     5.310000
ZF07-189-244     5.100000
ZF07-443-642     4.980000
AP881122-0107    4.060000
WSJ911018-0122   4.060000
ZF07-971-724     4.050000
ZF07-251-245     3.930000
WSJ870421-0065   3.780000
ZF08-084-048     3.740000
ZF07-621-948     3.670000
WSJ911030-0170   3.610000
ZF09-584-807     3.570000
ZF07-294-735     3.420000
ZF09-526-239     3.390000
ZF07-789-516     3.330000
ZF07-218-520     3.300000
ZF07-800-964     3.300000
ZF07-495-528     3.280000
WSJ900629-0110   3.210000
ZF09-559-173     3.200000
ZF07-118-812     3.170000
WSJ871030-0149   3.160000
ZF07-878-828     3.120000
WSJ870309-0110   3.070000
AP880419-0280    3.050000
Figure 18: Sample of Data: First-Pass Partitioning Results for Topic 57
Each TREC document was `scored' against the super thesaurus in a single pass (Step 6 in Figure 11):
effectively, each document was scored against the routing/partitioning thesaurus for each topic
in parallel. In particular, every NP in each document was matched against the NPs (terms)
in the routing thesaurus; partial matches were allowed. The definitions in Figure 15 and the
formula in Figure 16 (given schematically in Figure 17) were used to yield a composite score for
the document based on the number of exact and partial hits as a function of document length
and term `value'.
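The scoring step described above can be sketched roughly as follows. The actual CLARIT formula (Figures 15-17) is not reproduced here; the partial-match rule (shared head word), the partial-match weight, the length normalization, and all term values below are illustrative assumptions only, not the published method.

```python
import math

PARTIAL_WEIGHT = 0.5  # assumed discount for partial NP matches


def score_document(doc_nps, thesaurus):
    """Score a document's noun phrases (NPs) against one topic's thesaurus.

    doc_nps   -- list of NP strings extracted from the document
    thesaurus -- dict mapping NP term -> term value (importance weight)
    """
    exact = 0.0
    partial = 0.0
    for np in doc_nps:
        if np in thesaurus:
            # exact hit: full credit for the term's value
            exact += thesaurus[np]
        else:
            # assumed partial-hit rule: the NPs share a head (last) word
            for term, value in thesaurus.items():
                if np.split()[-1] == term.split()[-1]:
                    partial += PARTIAL_WEIGHT * value
                    break
    # normalize by a function of document length so long documents
    # do not accumulate hits without bound
    return (exact + partial) / math.log(2 + len(doc_nps))


# Toy example: a two-NP document against a two-term topic thesaurus
thesaurus = {"mci communications": 3.0, "long-distance carrier": 2.0}
doc = ["mci communications", "regional carrier"]
print(round(score_document(doc, thesaurus), 3))
```

Run against the super thesaurus, the same pass produces one such composite score per document per topic.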
The routing/partitioning thesaurus was used to score the full database, yielding a ranking
of all documents relative to all topics simultaneously. As shown in Step 7 in Figure 11, the top
2000 documents for each topic were retained as the partition for the topic for the next stage of
processing.
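The partition-building step can be sketched as below: given per-topic scores over the full database, keep only the top-N documents for each topic. The partition size of 2000 comes from the text; the scores and topic/document identifiers in the example are illustrative.

```python
import heapq

PARTITION_SIZE = 2000  # top-2000 documents retained per topic (Step 7)


def build_partitions(scores_by_topic, n=PARTITION_SIZE):
    """scores_by_topic: dict topic -> list of (doc_id, score) pairs.

    Returns dict topic -> list of the n highest-scoring doc_ids,
    in descending score order.
    """
    return {
        topic: [doc for doc, _ in heapq.nlargest(n, pairs, key=lambda p: p[1])]
        for topic, pairs in scores_by_topic.items()
    }


# Toy example: keep a 3-document partition for one topic
scores = {57: [("ZF09-435-245", 7.72), ("WSJ870123-0031", 6.47),
               ("FR89214-0026", 6.31), ("WSJ870519-0094", 6.13)]}
print(build_partitions(scores, n=3)[57])
# -> ['ZF09-435-245', 'WSJ870123-0031', 'FR89214-0026']
```

Because every topic's scores are produced in the same pass over the database, all partitions can be built simultaneously from one score table.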
Figure 18 gives sample results of the rankings of documents based on feature scoring for
Topic 57. Figure 19 shows the set of `true' relevants chosen by manual review of the top 10-50
ranked documents.
4.7 Final `Querying'
Figure 20 gives the final steps in the process. There were two essential phases in querying at
this point: building the final query vector and querying a partition of the database to retrieve
the final set of relevant documents. Note that the query was weighted based on statistics