SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
appendix
National Institute of Standards and Technology
Donna K. Harman
6. which of the following were used?
b. Boolean connectors (AND, OR, NOT)
c. proximity operators
e. other (brief description)
system lexicon, statistical analysis of samples matched by initial queries
III. Searching
A. Total computer time to search (CPU seconds)
1. retrieval time (total CPU seconds between when a query enters the system until a list of
document numbers are obtained)
About 20 hours = 72,000 CPU seconds. As the documents are not pre-indexed, this
includes all operations on all documents
2. ranking time (total CPU seconds to sort document list) About 300 CPU seconds
B. Which methods best describe your machine searching methods?
5. Boolean matching
7. free text scanning
C. What factors are included in your ranking?
5. position in document
15. other (specify) Number of hits on topic description
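The searching and ranking answers above (free text scanning, with ranking by number of hits and position in document) can be illustrated with a minimal sketch. This is not the participant's system: the function name `scan_and_rank`, the OR-style matching, the tie-breaking rule, and the sample documents are all assumptions made for illustration.

```python
import re


def scan_and_rank(documents, terms):
    """Scan every document for the query terms and rank the matches.

    Hypothetical illustration only: documents matching any term are kept
    (a Boolean OR), ranked by hit count, with ties broken by the position
    of the first hit in the document.
    """
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    scored = []
    for doc_id, text in documents.items():
        # No index exists, so each query reads the full text of each document.
        hits = [m.start() for m in pattern.finditer(text)]
        if not hits:
            continue
        # Negated hit count sorts documents with more hits first;
        # an earlier first hit wins among documents with equal counts.
        scored.append((-len(hits), hits[0], doc_id))
    scored.sort()
    return [doc_id for _, _, doc_id in scored]


# Made-up sample documents, not TREC data.
docs = {
    "d1": "Oil prices rose. Oil exports fell.",
    "d2": "The report mentions oil once.",
    "d3": "No relevant terms here.",
}
ranking = scan_and_rank(docs, ["oil"])
```

Because every query rescans the whole collection, the cost of this approach grows with the total size of the corpus, which is consistent with the long retrieval time reported above.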
IV. What machine did you conduct the TREC experiment on? SUN SPARCstation-2
How much RAM did it have? 48 Meg
What was the clock rate of the CPU? standard
V. Some systems are research prototypes and others are commercial.
To help compare these systems:
Our system used a pattern matcher and lexicon that have been commercially developed, but
the basic Boolean document processing engine was developed for TREC in a few days
1. How much "software engineering" went into the development of your system?
2 days for the basic engine
2. Given appropriate resources, could your system be made to run faster? By how much
(estimate)?
Processing time per document could easily be improved by a factor of 2. Processing
time for ad hoc retrieval could be improved by a factor of about 100,000 by using
an inverted indexing strategy, at a cost of additional storage and indexing time for
the corpus.
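The trade-off described in this answer can be sketched as follows. This is a minimal illustration of a generic inverted index, not the indexing strategy the participant had in mind; the function names, the whitespace tokenization, and the sample documents are assumptions.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation (assumed scheme)."""
    return [w.strip(".,;:!?") for w in text.lower().split()]


def build_inverted_index(documents):
    """Map each word to the set of document ids containing it.

    This is the one-time indexing cost, and the index is the extra storage.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def boolean_and(index, terms):
    """Answer an AND query by intersecting posting sets; no document text is read."""
    postings = [index.get(t.lower(), set()) for t in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


def linear_scan_and(documents, terms):
    """Answer the same query by scanning every document, as an unindexed system must."""
    wanted = {t.lower() for t in terms}
    return {d for d, text in documents.items() if wanted <= set(tokenize(text))}


# Made-up sample documents, not TREC data.
docs = {
    "d1": "Oil prices rose. Oil exports fell.",
    "d2": "The report mentions oil once.",
    "d3": "Exports of grain increased.",
}
index = build_inverted_index(docs)
indexed_result = boolean_and(index, ["oil", "exports"])
scanned_result = linear_scan_and(docs, ["oil", "exports"])
```

Both functions return the same documents; the indexed version touches only the posting sets for the query terms, while the scan reads the full collection on every query.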
3. What features is your system missing that it would benefit by if it had them?
Automatic query generation, aids for compiling queries from higher-level
descriptions