SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Appendix B: System Features
Appendix
National Institute of Standards and Technology
D. K. Harman
IA. CONSTRUCTION OF INDICES, KNOWLEDGE BASES, AND OTHER DATA STRUCTURES -- METHODS USED
Binary classification trees built automatically from the original documents and the topic statements.
Searching is done on the fly, as raw text is processed. The intermediate data structure is discarded as the search is completed. What is saved, however, is a
record of all words appearing within three word positions of each query word.
Inverse Document Frequency with a base of only those documents containing at least one of the query words.
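A minimal sketch of an inverse document frequency computed over a restricted base, as described in the entry above. The entry does not give a formula, so the common form log(N / df) is an assumption here, as are the whitespace tokenization and the hypothetical function name `restricted_idf`.

```python
import math


def restricted_idf(documents, query_terms):
    """Return {term: idf}, using only documents that contain a query word.

    Assumptions (not from the source): whitespace tokenization,
    case folding, and the formula log(N / df).
    """
    query = {term.lower() for term in query_terms}
    tokenized = [set(doc.lower().split()) for doc in documents]

    # The base is restricted to documents sharing at least one query word.
    base = [words for words in tokenized if words & query]
    n = len(base)

    idf = {}
    for term in query:
        df = sum(1 for words in base if term in words)
        if df:  # skip query words that occur in no document
            idf[term] = math.log(n / df)
    return idf


docs = [
    "oil prices rise",
    "oil spill cleanup",
    "stock prices fall",
    "weather report",
]
idf = restricted_idf(docs, ["oil", "prices"])
```

In this toy collection the fourth document contains no query word, so it is left out of the base: N is 3 rather than 4, and each query word has a document frequency of 2.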
All word co-occurrences within 3 word positions of a query word are listed as word pairs.
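The windowed co-occurrence listing described above might be sketched as follows. The tokenization (lowercased, split on whitespace), the pair ordering, and the name `cooccurrence_pairs` are illustrative assumptions. The entry does not say how the original system handled punctuation or record boundaries.

```python
def cooccurrence_pairs(text, query_terms, window=3):
    """List (query_word, neighbor) pairs within `window` word positions.

    Assumptions (not from the source): whitespace tokenization,
    case folding, and a symmetric window on both sides of the query word.
    """
    tokens = text.lower().split()
    query = {term.lower() for term in query_terms}

    pairs = []
    for i, token in enumerate(tokens):
        if token not in query:
            continue
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # do not pair the query word with itself
                pairs.append((token, tokens[j]))
    return pairs


pairs = cooccurrence_pairs(
    "the price of crude oil rose sharply in early trading", ["oil"]
)
```

Because the window is applied in a single pass over the token stream, only the resulting pairs need to be kept once the raw text has been processed.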
A list of offsets to the beginning of records (articles) is generated at the beginning of the session for each data file.
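One way to build such an offset list is sketched below. It assumes each record begins on a line starting with a `<DOC>` tag. That marker, along with the hypothetical helpers `record_offsets` and `read_record`, is an assumption for illustration and not a detail given in the entry.

```python
def record_offsets(path, marker=b"<DOC>"):
    """Scan a data file once and return the byte offset of each record start.

    Assumption (not from the source): a record starts on a line that
    begins with `marker`.
    """
    offsets = []
    position = 0
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(marker):
                offsets.append(position)
            position += len(line)
    return offsets


def read_record(path, offsets, index):
    """Return the raw bytes of record `index` by seeking to its offset."""
    start = offsets[index]
    with open(path, "rb") as handle:
        handle.seek(start)
        if index + 1 < len(offsets):
            return handle.read(offsets[index + 1] - start)
        return handle.read()  # last record runs to the end of the file
```

With the list built once per session, any article can be fetched by a single seek instead of rescanning the file from the start.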
166 stop words, 122 abbreviations, 47 hyphenated words, 24 entries for abbreviations and alternate notation for months, 35 entries for legitimate words
to be prefixed, and 6 entries for legitimate prefixes.
The semantic lexicon we used is based on word senses found in Roget's Thesaurus.