SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman

System Summary and Timing
Universitaet Dortmund
Automatic routing (RPI feedback)

General Comments
The timings should be the time to replicate runs from scratch, not including trial runs, etc. The times should also be reasonably accurate. This will sometimes be difficult, such as getting the total time for document indexing of huge text sections, or manually building a knowledge base. Please do your best.

I. Construction of indices, knowledge bases, and other data structures (please describe all data structures that your system needs for searching)
A. Which of the following were used to build your data structures?
1. stopword list
a. how many words in list? 570
2. is a controlled vocabulary used? no
3. stemming: yes
a. standard stemming algorithms: which ones? SMART
b. morphological analysis
4. term weighting: in docs + queries, tf * idf; cosine normalization (ntc) (in docs, idf is based on collection frequency within doc set D1 only)
5. phrase discovery: no
6. syntactic parsing: no
7. word sense disambiguation: no
8. heuristic associations: no
9. spelling checking (with manual correction): no
10. spelling correction: no
11. proper noun identification algorithm: no
12. tokenizer (recognizes dates, phone numbers, common patterns): no
13. are the manually-indexed terms used? no
14. other techniques used to build data structures (brief description): no
B. Statistics on data structures built from TREC text (please fill out each applicable section)
1. inverted index
a. total amount of storage (megabytes): 275
b. total computer time to build (approximate number of hours): 1.9 hours (not including time to index D1 to obtain collection frequency info)
c. is the process completely automatic? yes
d. are term positions within documents stored? no
e. single terms only? yes
5. other data structures built from TREC text (what?): map from docid to text location (also gives the title for each doc)
a. total amount of storage (megabytes): 24 Mbytes
b. total computer time to build (approximate number of hours): time to create is included in the inverted file creation above
c. is the process completely automatic? yes
other data structures built from TREC text (what?)
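The "ntc" weighting and the position-free, single-term inverted index reported above can be illustrated with a small sketch. It follows the usual SMART reading of the triple: n = raw term frequency, t = idf computed as log(N / df), c = cosine normalization, applied to both documents and queries. It assumes that "collection frequency within doc set D1" means the number of D1 documents containing each term (document frequency), and that terms never seen in D1 are simply dropped. The function names (`document_frequencies`, `ntc_vector`, `build_inverted_index`, `score`) are hypothetical and do not describe the Dortmund system's actual code.

```python
import math
from collections import Counter, defaultdict


def document_frequencies(docs):
    """Count, for each term, how many documents in `docs` contain it.

    In this sketch `docs` plays the role of the reference set (D1) from
    which idf statistics are taken.
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def ntc_vector(tokens, df, n_docs):
    """SMART-style 'ntc' weights: raw tf * log(N / df), then cosine-normalized.

    Assumption: terms with no df entry (unseen in the reference set) are dropped.
    """
    tf = Counter(t for t in tokens if df.get(t, 0) > 0)
    weights = {t: f * math.log(n_docs / df[t]) for t, f in tf.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0.0:
        return {}
    return {t: w / norm for t, w in weights.items()}


def build_inverted_index(docs, df, n_docs):
    """Map each single term to a list of (doc_id, weight) postings.

    No term positions are stored, matching the form's answers above.
    """
    index = defaultdict(list)
    for doc_id, tokens in enumerate(docs):
        for term, weight in ntc_vector(tokens, df, n_docs).items():
            index[term].append((doc_id, weight))
    return index


def score(query_tokens, index, df, n_docs):
    """Cosine similarity: inner product of unit-length query and document vectors."""
    scores = defaultdict(float)
    for term, q_weight in ntc_vector(query_tokens, df, n_docs).items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return dict(scores)


if __name__ == "__main__":
    # Toy, already-stemmed documents; the tokens are invented for illustration.
    docs = [["retriev", "text", "text"], ["text", "index"], ["rout", "feedback"]]
    df = document_frequencies(docs)
    index = build_inverted_index(docs, df, len(docs))
    print(score(["text", "index"], index, df, len(docs)))
```

Because every vector is scaled to unit length, the choice of logarithm base only multiplies all weights by a constant and cancels out, so the natural log used here is an arbitrary choice. Documents sharing no terms with the query never show up in the score table, which is the main saving of scoring through an inverted index.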