NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman
C. Feedback (ad hoc)
1. initial query built by method 1 or method 2?
Initial queries were built by human from subset of topic keywords.
2. type of person doing feedback
b. system expert (computer system analyst)
3. average time to do complete feedback
We did this manually.
a. cpu time (total cpu seconds for all iterations)
A human refining the queries for an hour might use 10 minutes of FDF time.
b. clock time from initial construction of query to completion of final query (minutes)
Feedback/query refinement was done manually. Some topics were fairly
easy, with reasonable results being achieved in less than an hour. Others
took several hours.
4. average number of iterations
a. average number of documents examined per iteration Typically 20-30.
5. minimum number of iterations Maybe 10.
6. maximum number of iterations Maybe 100.
7. what determines the end of an iteration?
Each iteration is (1) the human updates the queries, (2) the machine executes, (3) the
human reviews the retrieved documents.
We stopped working on a topic when it seemed that the results were converging to a
practical limit for our approach, i.e., adding additional synonym keywords, or
changing the query structure, wasn't going to produce more reasonable results.
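A minimal sketch of this loop's structure, in Python, is given below for illustration only; the execute_query function and the interactive prompt are hypothetical stand-ins for the actual FDF run and the analyst's manual review, not part of our system.

# Minimal sketch of the manual feedback loop described above.
# execute_query() and the input() prompt are hypothetical placeholders.
def refine_topic(initial_query, execute_query, max_iterations=100):
    query = initial_query
    for iteration in range(max_iterations):
        # (2) the machine executes the current query
        retrieved = execute_query(query)
        # (3) the human reviews the retrieved documents (typically 20-30)
        print(f"Iteration {iteration + 1}: reviewing {min(len(retrieved), 30)} of {len(retrieved)} documents")
        # (1) the human updates the query, or stops once results converge
        revised = input("Revised query (blank to stop): ").strip()
        if not revised:
            break
        query = revised
    return query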
8. feedback methods used
d. manual methods
(1) using individual judgment with no set algorithm
After working through the first dozen or so topics, we started to fall
into a semi-routine. We are still thinking about the nature of this
"routine" and what types of tools could help automate it.
E. Manually constructed queries (routing)
Same answers as for ad hoc. In fact, given our query language approach, final ad hoc
queries and routing queries are the same.
III. Searching
A. Total computer time to search (cpu seconds)
1. retrieval time (total cpu seconds between when a query enters the system until a list of
document numbers is obtained)
Time to process a single set of topic queries against 1.2GB is 2-3 minutes.
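(For reference, scanning 1.2 GB in 2-3 minutes works out to a sustained rate of roughly
7-10 MB of text per second.)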
Time to load the TIPSTER corpus (read from CD-ROM, decompress, and load onto the
FDF's disk) was less than 8 hours.
2. ranking time (total cpu seconds to sort document list) 1-2 seconds
B. Which methods best describe your machine searching methods?
7. free text scanning
To perform the actual searches, we used the Fast Data Finder (FDF) text search
hardware. The FDF implements a wide variety of pattern matching functions
including word/string/phrase matching, fuzzy matches, Boolean logic, proximity
operators, term counting, term completeness, and numeric ranging.
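As a rough software analogue of a few of these operator types (whole-word matching, word proximity, Boolean AND), the Python sketch below shows the idea; it is an illustration only and is not the FDF's query language or API.

import re

# Rough software analogue of a few operator types listed above.
# Illustration only; not the FDF query language.
def contains_word(text, word):
    # Whole-word match, case-insensitive.
    return re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE) is not None

def within_proximity(text, word_a, word_b, window=10):
    # True if word_a and word_b occur within `window` words of each other.
    tokens = re.findall(r"\w+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    return any(abs(a - b) <= window for a in pos_a for b in pos_b)

def matches(text):
    # Example Boolean query: "oil" within 5 words of "spill", AND "tanker".
    return within_proximity(text, "oil", "spill", window=5) and contains_word(text, "tanker")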
C. What factors are included in your ranking?
5. position in document
7. proximity of terms
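A toy Python sketch of how these two factors could be folded into a single score is given below; the weights and normalization are placeholders for illustration, not our system's actual parameters.

def rank_score(term_positions, doc_length, proximity_span):
    # Toy combination of the two factors checked above.
    # The equal 0.5 weights are placeholders, not actual system parameters.
    if not term_positions:
        return 0.0
    # factor 5: position in document (earlier first hit scores higher)
    position_score = 1.0 - min(term_positions) / max(doc_length, 1)
    # factor 7: proximity of terms (tighter clustering scores higher)
    proximity_score = 1.0 / (1 + proximity_span)
    return 0.5 * position_score + 0.5 * proximity_score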