NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman

Map from internal concept to token string
    a. total amount of storage (megabytes): 13 Mbytes
    b. total computer time to build (approximate number of hours): Time to create included in inverted file creation of D1.
    c. is the process completely automatic? yes

C. Data built from sources other than the input text
    None, other than stopword file.

II. Query Construction (please fill out a section for each query construction method used)

D. Automatically built queries (routing)
    1. topic fields used: all
    2. total computer time to build query (cpu seconds): 1300 seconds, not including time to index D1 (3.0 hours)
    3. which of the following were used in building the query?
        a. terms selected from
            (1) topic
        b. term weighting
            (1) with weights based on terms in topics
            (2) with weights based on terms in all training documents
            (3) with weights based on terms from documents with relevance judgments

III. Searching

A. Total computer time to search (cpu seconds): 312 seconds (includes retrieval + ranking).
    1. retrieval time (total cpu seconds between when a query enters the system until a list of document numbers are obtained)
    2. ranking time (total cpu seconds to sort document list)

B. Which methods best describe your machine searching methods?
    1. vector space model
    2. probabilistic model

C. What factors are included in your ranking?
    1. term frequency
    2. inverse document frequency
    8. information theoretic weights
    9. document length

IV. What machine did you conduct the TREC experiment on? Sun SPARC 2
    How much RAM did it have? 64 MB
    What was the clock rate of the CPU? 40 MHz

V. Some systems are research prototypes and others are commercial. To help compare these systems:
    1. how much "software engineering" went into the development of your system?
       About 3 person-years for the SMART system itself
    2. Given appropriate resources, could your system be made to run faster? By how much (estimate)?
       Of course! Due to algorithm flaw, CPU time for constructing routing query is about a factor of 462