SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Appendix B: System Features
Appendix
National Institute of Standards and Technology
D. K. Harman
CONSTRUCTION OF INDICES, KNOWLEDGE BASES AND OTHER DATA STRUCTURES -- STATISTICS ON DATA STRUCTURES (C
NOTES:
Only a few KB for the training sets used for the official scores - we used TOPIC for the actual test.
Feature extraction takes on the order of 10 seconds per document; total time for the training data (disk 2 only) was on the order of 4 hours.
Ran queries as adhoc queries against the test data (WSJ), then typed in the query sequence to be used as a filter.
Scan the data file and save the offset for each occurrence of the <DOC> string.
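The offset scan described above might look roughly like the following sketch. It is not the participants' code: the function name `doc_offsets`, the chunk size, and the choice of byte offsets are assumptions made for illustration only.

```python
import io

DOC_TAG = b"<DOC>"

def doc_offsets(stream, chunk_size=1 << 16):
    """Return the byte offset of every <DOC> occurrence in a binary stream."""
    offsets = []
    base = 0    # absolute offset of buf[0] within the stream
    buf = b""
    keep = len(DOC_TAG) - 1  # tail long enough to hold a tag split across chunks
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        pos = buf.find(DOC_TAG)
        while pos != -1:
            offsets.append(base + pos)
            pos = buf.find(DOC_TAG, pos + len(DOC_TAG))
        # Drop everything except the tail; a complete tag cannot fit in it,
        # so no occurrence is ever counted twice.
        if len(buf) > keep:
            base += len(buf) - keep
            buf = buf[-keep:]
    return offsets

# Hypothetical two-document sample in the SGML-like layout of the collection.
sample = io.BytesIO(b"<DOC>a</DOC>\n<DOC>bb</DOC>\n")
print(doc_offsets(sample))
```

With the offsets saved, a single document can later be fetched with one `seek` to its offset instead of rescanning the file.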
The neural network was used to represent the topics. Each output node is associated with a topic. Each input node is associated with a Roget ca
Document frequency for each topic for a list of 1400 candidate features.
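A per-topic document-frequency table of this kind could be built as in the sketch below. The input layout (tokenized training documents grouped by topic), the function name `document_frequencies`, and the sample words are all hypothetical; the source states only that the candidate list had 1400 features.

```python
from collections import Counter

def document_frequencies(docs_by_topic, candidate_features):
    """For each topic, count how many of its documents contain each candidate feature.

    docs_by_topic: dict mapping a topic id to a list of tokenized documents
                   (assumed layout, not taken from the source).
    candidate_features: list of candidate feature strings.
    """
    candidates = set(candidate_features)
    table = {}
    for topic, docs in docs_by_topic.items():
        counts = Counter()
        for tokens in docs:
            # A feature counts once per document, however often it occurs.
            counts.update(candidates.intersection(tokens))
        table[topic] = {feature: counts[feature] for feature in candidate_features}
    return table

# Hypothetical toy data; the real candidate list would have 1400 entries.
docs_by_topic = {
    "T1": [["oil", "price", "oil"], ["price"]],
    "T2": [["merger"]],
}
table = document_frequencies(docs_by_topic, ["oil", "price", "merger"])
print(table["T1"])
```

Features absent from a topic's documents are kept with a count of zero, so every topic row has the same columns.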