SP500207 NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman

      Time to create included in inverted file creation above.
   c. is the process completely automatic? yes

other data structures built from TREC text (what?)
      Map from internal concept to token string
   a. total amount of storage (megabytes)
      18 Mbytes
   b. total computer time to build (approximate number of hours)
      Time to create included in inverted file creation above.
   c. is the process completely automatic? yes
C. Data built from sources other than the input text
      None, other than stopword file.

II. Query construction (please fill out a section for each query construction method used)
A. Automatically built queries (ad hoc)
   1. topic fields used
      Topic, Nationality, Narrative, Concepts, Factors, Description
   2. total computer time to build query (cpu seconds)
      1.5 seconds
   3. which of the following were used?
      a. term weighting with weights based on terms in topics (idf)

III. Searching
A. Total computer time to search (cpu seconds)
      383 seconds (includes retrieval + ranking).
   1. retrieval time (total cpu seconds between when a query enters the system until a list of document numbers are obtained)
   2. ranking time (total cpu seconds to sort document list)
B. Which methods best describe your machine searching methods?
   1. vector space model
   2. probabilistic model
C. What factors are included in your ranking?
   1. term frequency
   2. inverse document frequency
   8. information theoretic weights
   9. document length

IV. What machine did you conduct the TREC experiment on?
      Sun SPARC 2
   How much RAM did it have? 64 MB
   What was the clock rate of the CPU? 40 MHz

V. Some systems are research prototypes and others are commercial.
To help compare these systems:
   1. How much "software engineering" went into the development of your system?
      About 3 person-years for the SMART system itself
      2 person-weeks for the Fuhr weighting code
   2. Given appropriate resources, could your system be made to run faster? By how much (estimate)?
      Of course! A 6 machine distributed version of SMART should be faster by a factor of 3 for both indexing and retrieval.
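
The ranking factors reported under III.C (term frequency, inverse document frequency, document length) can be illustrated with a minimal vector-space sketch. This is not the SMART system's code or its actual weighting formula, which the form does not give. The function names `tfidf_vector` and `rank`, the `log(N / df)` form of idf, and the use of cosine normalization to account for document length are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vector(tokens, doc_freq, n_docs):
    """Weight each term by tf * idf, then scale the vector to unit length.

    Assumed weighting: idf = log(n_docs / df). Dividing by the vector norm
    (cosine normalization) is one common way to factor in document length.
    """
    counts = Counter(tokens)
    weights = {
        term: tf * math.log(n_docs / doc_freq[term])
        for term, tf in counts.items()
        if doc_freq.get(term, 0) > 0  # skip terms unseen in the collection
    }
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0.0:
        return {}
    return {term: w / norm for term, w in weights.items()}


def rank(query_tokens, docs):
    """Return (doc_id, score) pairs sorted by decreasing inner product."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    query_vec = tfidf_vector(query_tokens, doc_freq, n_docs)
    scores = {}
    for doc_id, tokens in docs.items():
        doc_vec = tfidf_vector(tokens, doc_freq, n_docs)
        scores[doc_id] = sum(w * doc_vec.get(term, 0.0) for term, w in query_vec.items())
    # Ties are broken by document id so the ordering is deterministic.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical three-document collection, invented for this example.
docs = {
    "d1": ["text", "retrieval", "system"],
    "d2": ["text", "storage"],
    "d3": ["inverted", "file", "storage"],
}
ranked = rank(["retrieval", "system"], docs)
```

Here `ranked` places `d1` first because it is the only document sharing terms with the query; the other two documents score zero.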