NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)

Appendix B: System Features
National Institute of Standards and Technology
D. K. Harman

CONSTRUCTION OF INDICES, KNOWLEDGE BASES AND OTHER DATA STRUCTURES -- STATISTICS ON DATA STRUCTURES

Only a few KB for the training sets used for the official scores; we used TOPIC for the actual test.

Feature extraction takes on the order of 10 seconds per document; total time for the training data (disk 2 only) was on the order of 4 hours.

Ran queries as adhoc queries against the test data (WSJ), then typed in the query sequence to be used as a filter.

Scan the data file and save the offset for each occurrence of the <DOC> string.

The neural network was used to represent the topics. Each output node is associated with a topic. Each input node is associated with a Roget ca

Document frequency for each topic for a list of 1400 candidate features.
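
The offset-scanning step mentioned above (saving the offset of each <DOC> occurrence in the data file) could be sketched roughly as follows. This is only an illustrative sketch in Python, not the participants' actual implementation: the function names find_doc_offsets and read_doc, the chunk_size parameter, and the choice to read the file in fixed-size chunks are all assumptions made for the example.

```python
from typing import List

DOC_TAG = b"<DOC>"  # marker that opens each document in the data file


def find_doc_offsets(path: str, chunk_size: int = 1 << 20) -> List[int]:
    """Return the byte offset of every <DOC> occurrence in the file.

    Hypothetical helper: the file is read in chunks, and the last few
    bytes of each chunk are carried over so that a tag split across a
    chunk boundary is still found exactly once.
    """
    offsets: List[int] = []
    overlap = len(DOC_TAG) - 1  # bytes to carry over between chunks
    tail = b""
    base = 0  # file offset of the first byte of (tail + chunk)
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            pos = buf.find(DOC_TAG)
            while pos != -1:
                offsets.append(base + pos)
                pos = buf.find(DOC_TAG, pos + 1)
            tail = buf[-overlap:]
            base += len(buf) - len(tail)
    return offsets


def read_doc(path: str, offsets: List[int], i: int) -> bytes:
    """Read the i-th document using the saved offsets (hypothetical helper)."""
    start = offsets[i]
    with open(path, "rb") as f:
        f.seek(start)
        if i + 1 < len(offsets):
            return f.read(offsets[i + 1] - start)
        return f.read()  # last document: read to end of file
```

With the offsets saved, any single document can be fetched with one seek instead of rescanning the file, which is presumably the reason for building such an index.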