SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman

6. Which of the following were used?
   b. Boolean connectors (AND, OR, NOT)
   c. proximity operators
   e. other (brief description)
      System lexicon, statistical analysis of samples matched by initial queries

III. Searching

A. Total computer time to search (CPU seconds)
   1. retrieval time (total CPU seconds between when a query enters the system until a list of document numbers is obtained)
      About 20 hours = 72,000 CPU seconds. As the documents are not pre-indexed, this includes all operations on all documents.
   2. ranking time (total CPU seconds to sort document list)
      About 300 CPU seconds

B. Which methods best describe your machine searching methods?
   5. Boolean matching
   7. free text scanning

C. What factors are included in your ranking?
   5. position in document
   15. other (specify)
       Number of hits on topic description

IV. What machine did you conduct the TREC experiment on?
    SUN SPARCstation-2
    How much RAM did it have?
    48 Meg
    What was the clock rate of the CPU?
    standard

V. Some systems are research prototypes and others are commercial. To help compare these systems:
   Our system used a pattern matcher and lexicon that have been commercially developed, but the basic Boolean document processing engine was developed for TREC in a few days.
   1. How much "software engineering" went into the development of your system?
      2 days for the basic engine
   2. Given appropriate resources, could your system be made to run faster? By how much (estimate)?
      Processing time per document could easily be improved by a factor of 2.
      Processing time for ad hoc retrieval could be improved by a factor of about 10,000 to 100,000 [the exact figure is partly illegible in the source] by using an inverted indexing strategy, at a cost of additional storage and indexing time for the corpus.
   3. What features is your system missing that it would benefit by if it had them?
      Automatic query generation, aids for compiling queries from higher-level descriptions
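
The trade-off described in the answer to V.2 (scanning every document per query versus paying storage and indexing time up front for an inverted index) can be illustrated with a minimal sketch. This is not the system described in this appendix: the function names, the whitespace tokenizer, and the three sample documents are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Build the index once: map each lowercase token to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def scan_and(docs, terms):
    """Free-text scan: every query re-reads every document (no pre-indexing)."""
    return {doc_id for doc_id, text in docs.items()
            if all(t in text.lower().split() for t in terms)}

def index_and(index, terms):
    """Boolean AND: intersect the posting sets of all query terms."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

def index_or(index, terms):
    """Boolean OR: union of the posting sets."""
    return set().union(*(index.get(t, set()) for t in terms))

def index_not(index, all_ids, term):
    """Boolean NOT: all documents except those containing the term."""
    return set(all_ids) - index.get(term, set())

# Hypothetical toy corpus, for illustration only.
docs = {1: "oil prices rise", 2: "oil spill cleanup", 3: "prices fall"}
index = build_index(docs)

and_hits = index_and(index, ["oil", "prices"])
or_hits = index_or(index, ["spill", "fall"])
not_hits = index_not(index, docs.keys(), "oil")
```

With the index in place, each query touches only the posting sets of its own terms, whereas `scan_and` must read the whole corpus every time. The index itself is the "additional storage and indexing time" the answer refers to.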