SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Appendix B: System Features
National Institute of Standards and Technology
D. K. Harman
IC. CONSTRUCTION OF INDICES, KNOWLEDGE BASES, AND OTHER DATA STRUCTURES -- DATA BUILT FROM OTHER SOURCES
OTHERS:
There are around 1000 semantic categories in use. The original 1911 Roget major categories are obtained by removing the suffix from our semantic codes. For example, the semantic category 121nv.3 is shortened to 121 by ignoring nv.3.
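The suffix-stripping rule can be sketched as below. This is a minimal illustration, not the authors' software: it assumes each semantic code begins with the numeric Roget major category (e.g. `121`) and treats everything after the leading digits (e.g. `nv.3`) as the suffix to drop. The functions `major_category` and `collapse_codes` are hypothetical names.

```python
import re

# Assumed code layout (hypothetical): leading digits give the 1911 Roget
# major category; whatever follows (e.g. "nv.3") is a suffix to be ignored.
_CODE_RE = re.compile(r"^(\d+)(.*)$")


def major_category(semantic_code: str) -> str:
    """Return the Roget major category by dropping the code's suffix."""
    match = _CODE_RE.match(semantic_code.strip())
    if match is None:
        raise ValueError(f"unrecognized semantic code: {semantic_code!r}")
    return match.group(1)


def collapse_codes(codes):
    """Map fine-grained codes to the sorted set of major categories they fall in."""
    return sorted({major_category(c) for c in codes}, key=int)


if __name__ == "__main__":
    print(major_category("121nv.3"))                         # 121
    print(collapse_codes(["121nv.3", "121n.1", "45adj.2"]))  # ['45', '121']
```

Collapsing codes this way trades sense-level precision for coarser category matches, which is the effect of using only the major categories.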
Since the 1911 edition of Roget's Thesaurus recently entered the public domain, we spent approximately 16 hours writing the software to process the Thesaurus. Approximately 6 hours of processing time was required to automatically extract 20,000 lexicon entries.