NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman
C. Feedback (ad hoc)
1. initial query built by method 1 or method 2?
Initial queries were built by human from subset of topic keywords.
2. type of person doing feedback
b. system expert (computer system analyst)
3. average time to do complete feedback
We did this manually.
a. cpu time (total cpu seconds for all iterations)
A human refining the queries for an hour might use 10 minutes of FDF time.
b. clock time from initial construction of query to completion of final query (minutes)
Feedback/query refinement was done manually. Some topics were fairly
easy, with reasonable results being achieved in less than an hour. Others
took several hours.
4. average number of iterations
a. average number of documents examined per iteration Typically 20-30.
5. minimum number of iterations Maybe 10.
6. maximum number of iterations Maybe 100.
7. what determines the end of an iteration?
Each iteration is (1) the human updates the queries, (2) the machine executes, (3) the
human reviews the retrieved documents.
We stopped working on a topic when it seemed that the results were converging to a
practical limit for our approach, i.e., adding additional synonym keywords, or
changing the query structure, wasn't going to produce more reasonable results.
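A minimal sketch of this loop's structure, in Python, is given below for illustration only; the execute_query function and the interactive prompt are hypothetical stand-ins for the actual FDF run and the analyst's manual review, not part of our system.

# Minimal sketch of the manual feedback loop described above.
# execute_query() and the input() prompt are hypothetical placeholders.
def refine_topic(initial_query, execute_query, max_iterations=100):
    query = initial_query
    for iteration in range(max_iterations):
        # (2) the machine executes the current query
        retrieved = execute_query(query)
        # (3) the human reviews the retrieved documents (typically 20-30)
        print(f"Iteration {iteration + 1}: reviewing {min(len(retrieved), 30)} of {len(retrieved)} documents")
        # (1) the human updates the query, or stops once results converge
        revised = input("Revised query (blank to stop): ").strip()
        if not revised:
            break
        query = revised
    return query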
8. feedback methods used
d. manual methods
(1) using individual judgment with no set algorithm
After working through the first dozen or so topics, we started to fall
into a semi-routine. We are still thinking about the nature of this
"routine" and what types of tools could help automate it.
E. Manually constructed queries (routing)
Same answers as for ad hoc. In fact, given our query language approach, final ad hoc
queries and routing queries are the same.
III. Searching
A. Total computer time to search (cpu seconds)
1. retrieval time (total cpu seconds between when a query enters the system until a list of
document numbers is obtained)
Time to process a single set of topic queries against 1.2GB is 2-3 minutes.
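(For reference, scanning 1.2 GB in 2-3 minutes works out to a sustained rate of roughly
7-10 MB of text per second.)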
Time to load the TIPSTER corpus (read from CD-ROM, decompress, and load onto the
FDF's disk) was less than 8 hours.
2. ranking time (total cpu seconds to sort document list) 1-2 seconds
B. Which methods best describe your machine searching methods?
7. free text scanning
To perform the actual searches, we used the Fast Data Finder (FDF) text search
hardware. The FDF implements a wide variety of pattern matching functions
including word/string/phrase matching, fuzzy matches, Boolean logic, proximity
operators, term counting, term completeness, and numeric ranging.
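As a rough software analogue of a few of these operator types (whole-word matching, word proximity, Boolean AND), the Python sketch below shows the idea; it is an illustration only and is not the FDF's query language or API.

import re

# Rough software analogue of a few operator types listed above.
# Illustration only; not the FDF query language.
def contains_word(text, word):
    # Whole-word match, case-insensitive.
    return re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE) is not None

def within_proximity(text, word_a, word_b, window=10):
    # True if word_a and word_b occur within `window` words of each other.
    tokens = re.findall(r"\w+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    return any(abs(a - b) <= window for a in pos_a for b in pos_b)

def matches(text):
    # Example Boolean query: "oil" within 5 words of "spill", AND "tanker".
    return within_proximity(text, "oil", "spill", window=5) and contains_word(text, "tanker")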
C. What factors are included in your ranking?
5. position in document
7. proximity of terms
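A toy Python sketch of how these two factors could be folded into a single score is given below; the weights and normalization are placeholders for illustration, not our system's actual parameters.

def rank_score(term_positions, doc_length, proximity_span):
    # Toy combination of the two factors checked above.
    # The equal 0.5 weights are placeholders, not actual system parameters.
    if not term_positions:
        return 0.0
    # factor 5: position in document (earlier first hit scores higher)
    position_score = 1.0 - min(term_positions) / max(doc_length, 1)
    # factor 7: proximity of terms (tighter clustering scores higher)
    proximity_score = 1.0 / (1 + proximity_span)
    return 0.5 * position_score + 0.5 * proximity_score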