SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Appendix B: System Features
Appendix
National Institute of Standards and Technology
D. K. Harman
IC. CONSTRUCTION OF INDICES, KNOWLEDGE BASES, AND OTHER DATA STRUCTURES
DATA BUILT FROM OTHER SOURCES
SYSTEM NAME | CLARIT | CLARIT | HNC | HNC | QUEENS | SYRACUSE
Internally Built Auxiliary Files | lexicon for English | Word frequency statistics for common English | Stemming exception list | Word pair list | Yes
Domain Independent or Domain Specific | Domain independent | Domain independent | Domain independent | Domain independent | Stopword file | Domain independent
Type of File | lexicon | Database of words with frequency statistics | Exception list | Word pair list | lexicon | 1) lexicon 2) knowledge bases
Storage (MB) | 2MB | 2MB | 28KB | 61KB | 0.004 | 15 [3]
Number of Concepts | >100,000 [1] | 139,481 Words | 1,300 | 3700 | 630 | 82,669 [4]
Type of Representation | Database records | Database records | List | List | [5]
Computer Build Time (Hours) | N/A | 20 min. | N/A | N/A | 0 | 20 Hours [6]
Computer Time to Modify (Hours) | None | None
Manual Time to Build | N/A | None | - 96 Hrs. | %-120 Hrs. | 48 Hours [7]
Manual Time to Modify | None | None
Use of Manual Labor (*) | [2] | None | b | b | b (lexicon & proper noun KB)
Externally-built Auxiliary Files | None | None | None | None
Type of File
Storage (MB)
Number of Concepts
Type of Representation
- - -
a) Mostly manually built using special interface
b) Mostly machine built with manual correction
c) Initial core manually built to "bootstrap" for completely machine-built completion
d) Other