NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman

System Summary and Timing
Queens College, CUNY

General Comments

The timings should be the time to replicate runs from scratch, not including trial runs, etc. The times should also be reasonably accurate. This sometimes will be difficult, such as getting total time for document indexing of huge text sections, or manually building a knowledge base. Please do your best.

I. Construction of indices, knowledge bases, and other data structures (please describe all data structures that your system needs for searching)

A. Which of the following were used to build your data structures?
   1. stopword list: yes
      a. how many words in list? 595
   2. is a controlled vocabulary used? no
   3. stemming
      a. standard stemming algorithms: yes
         which ones? Porter's Algorithm
      b. morphological analysis: no
   4. term weighting: yes
   5. phrase discovery: no
   6. syntactic parsing: no
   7. word sense disambiguation: no
   8. heuristic associations: no
   9. spelling checking (with manual correction): no
   10. spelling correction: no
   11. proper noun identification algorithm: no
   12. tokenizer (recognizes dates, phone numbers, common patterns): no
   13. are the manually-indexed terms used? no
   14. other techniques used to build data structures (brief description)
       A table of 396 manually created 2-word phrases. When these are identified in adjacent positions in documents or queries, they are used as additional index terms.

B. Statistics on data structures built from TREC text (please fill out each applicable section)
   1. inverted index
      a. total amount of storage (megabytes): 378
      b. total computer time to build (approximate number of hours): 95+11+2=108 for 500 MB (clock time)
      c. is the process completely automatic? Yes, if sufficient disk. Not in this experiment.
         if not, approximately how many hours of manual labor? 0.5
      d. are term positions within documents stored? No, but sentence yes. Can modify to capture word positions.
      e. single terms only? Yes, except for I.A.14.
   4. special routing structures (what?)
      See I.B.5. Network node, edge files. Routing using network node and edge files is straightforward.
      a. total amount of storage (megabytes): Node file: 4x7.5. Edge file: 4x4. Network segmented into 4, because of insufficient RAM.
      b. total computer time to build (approximate number of hours): 472
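
Item I.A.14 describes adding a phrase as an extra index term whenever its two words occur in adjacent positions. The following is a minimal sketch of that idea in Python, not the system's actual implementation. The phrase entries, the function name `index_terms`, and the underscore-joined form of the phrase term are all hypothetical; the real 396-entry table is not reproduced in this summary, and the sketch does not say whether matching happens before or after stopword removal and stemming.

```python
# Hypothetical stand-in for the manually created 2-word phrase table.
# The real table has 396 entries; these two are illustrative only.
PHRASE_TABLE = {
    ("information", "retrieval"),
    ("neural", "network"),
}


def index_terms(tokens, phrases=PHRASE_TABLE):
    """Return the single-word terms plus any table phrase found at adjacent positions."""
    tokens = list(tokens)
    terms = list(tokens)  # single terms are always kept
    # Pair each token with its immediate successor and look the pair up in the table.
    for first, second in zip(tokens, tokens[1:]):
        if (first, second) in phrases:
            # Assumed representation: the phrase becomes one additional index term.
            terms.append(f"{first}_{second}")
    return terms


if __name__ == "__main__":
    print(index_terms(["neural", "network", "models"]))
```

The same function would be applied to both documents and queries, so that a phrase contributes a matching term on each side. Words of a phrase that are separated by another token produce no phrase term, since only adjacent positions are checked.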