SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Appendix B: System Features
Appendix
National Institute of Standards and Technology
D. K. Harman
B. CONSTRUCTION OF INDICES, KNOWLEDGE BASES, AND OTHER DATA STRUCTURES-- STATISTICS ON DATA STRUCTURES (C
NOTES:
] Occurrence statistics for the most frequently occurring (in learning set rel docs) 1000 terms for each routing query.
] For the adhoc runs, the `query regression' method was used. The query regression coefficients were computed from the query.nnn and doc.lsp-file (wh
created by polynomial regression). Afterwards reweighting of the q3-query-file. 4 query.nnn -> query.lsp.
] Because we used the UMASS INQUERY system and its indexing, all of the answers to the questions in this section for our systems are identical to tt
the UMASS system.
] Document vector files and term dictionary produced by SMART: Each individual collection was indexed separately, so sizes/times are average per col
with the range of values specified. The collection statistics are based on the summation of individual collection values so are perhaps less accura
collection size of the term dictionary cannot be effectively estimated with this approach. Term positions are not stored within the document vecto
                                          Average   Range    Collection
Document Vector Files (MB)                120       31-124   1100
Term Dictionary (MB)                      16        15-17    Unknown
Time to create both above files (Hours)   10        6-14     120
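
The Average, Range, and Collection columns above summarize per-collection measurements. A minimal sketch of that arithmetic follows. The per-collection values are hypothetical (the source reports only the summaries), and `Summary` and `summarize` are illustrative names, not part of SMART.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Summary:
    average: float  # mean over the individually indexed collections
    low: float      # smallest per-collection value (start of the range)
    high: float     # largest per-collection value (end of the range)
    total: float    # sum of per-collection values (the "Collection" column)


def summarize(values: Sequence[float]) -> Summary:
    """Reduce per-collection measurements to average, range, and total."""
    if not values:
        raise ValueError("need at least one per-collection value")
    total = sum(values)
    return Summary(
        average=total / len(values),
        low=min(values),
        high=max(values),
        total=total,
    )


# Hypothetical document vector file sizes in MB, one per collection.
# These numbers are NOT from the source; they only illustrate the arithmetic.
example_sizes_mb = [31.0, 124.0, 100.0]
summary = summarize(example_sizes_mb)
print(f"{summary.average:.0f}  {summary.low:.0f}-{summary.high:.0f}  {summary.total:.0f}")
```

Summing sizes and times across collections is straightforward. A term dictionary is different: collections share terms, so adding the per-collection dictionary sizes would count those terms more than once. This is consistent with the note that the collection-wide dictionary size cannot be estimated this way.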
5] Standard process as implemented by SMART, following parameters as in Part I, Section A.