SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Appendix B: System Features
Appendix
National Institute of Standards and Technology
D. K. Harman
V. SYSTEM COMPARISON
[...] went [...] it?

Very little TREC-specific in the search system, although TREC has spurred the development of some additional features. It is a generalized bibliographic retrieval system which has undergone continual modification to meet the requirements of a number of research projects since 1983, involving many thousands of person-hours.

[...] Our system is a [...] than a research prototype, [...] it took two person months to build.

A [...] amount. We have been interested in algorithmic aspects: speed, memory and disk requirements.

Same as CITY.

[...] hours.

About 1 [...] for TRW1 test run [...] About 1 month to [...] tools for TRW2 [...] Underlying FDF [...] been developed [...] and at TRW and PAR[...]
[...] could [...] be [...] by [...] How [...]

Disk I/O is the most serious bottleneck in searching and in outputting documents. Keeping the entire database in core would speed real time by more than an order of magnitude, CPU by much less. Perhaps 3 GB of core is too much to expect yet. More practically, faster disks and a faster bus. Indexing, on the other hand, is CPU bound most of the time. This could be distributed over N processors. Time would behave approximately like A + B/N + CN for a given database, where typically B > A, and B >> C.

By converting it to a true retrieval system, I would guess that we could improve the system speed by at least 20-fold.

Yes, it scales linearly with the size of the CM5, but even on the current size, the retrieval time could be speeded up by a factor of 2 or 3. Memory optimization is the main limiting factor for now.

Same as CITY.

A lot 30%??

Sure. We expect [...] performance increase [...] single FD-3 unit [...] variety of hardware [...] firmware improvements [...] releases of FDF software [...] include features to [...] harness multiple [...] parallel running the [...] queries. Performance [...] improve linearly [...] number of FDF's [...] Incorporating ideas [...] TREC experiments [...] automatic query [...] will reduce query [...] improve system performance [...] Such tools are under development.
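The indexing estimate A + B/N + CN given above can be explored numerically. The sketch below is illustrative only: the source gives just the form of the model, so the function names and the constants used in the usage lines are hypothetical. Reading A as a fixed cost, B/N as work that divides across processors, and CN as per-processor overhead is an interpretation, not something the source states. Under this model, setting the derivative -B/N^2 + C to zero gives a continuous minimum at N = sqrt(B/C), beyond which adding processors makes indexing slower.

```python
import math


def indexing_time(n, a, b, c):
    """Modelled indexing time A + B/N + C*N for n processors."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return a + b / n + c * n


def best_processor_count(b, c, n_max=None):
    """Integer N >= 1 that minimises B/N + C*N (A does not move the optimum).

    Assumes b > 0 and c > 0. If n_max is given, the answer is capped at it,
    which is valid because the model is decreasing below sqrt(B/C).
    """
    n_star = math.sqrt(b / c)  # continuous minimiser of B/N + C*N
    candidates = {max(1, math.floor(n_star)), max(1, math.ceil(n_star))}
    if n_max is not None:
        candidates = {min(n, n_max) for n in candidates}
    return min(candidates, key=lambda n: b / n + c * n)


# Hypothetical constants chosen so that B > A and B >> C, as in the text.
A, B, C = 100, 900, 1
n_opt = best_processor_count(B, C)
print(n_opt, indexing_time(n_opt, A, B, C), indexing_time(1, A, B, C))
```

With these made-up constants the model favours 30 processors, and the fixed term A then dominates the remaining time, which is consistent with the remark that B is typically larger than A and much larger than C.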
[...]ures are [...] at would [...] the [...]

There is an itemized list in my paper.

Normalization (document length, cosine normalization, etc.), more elaborate use of proximity information and clustering of terms, query expansion, stemming. Use of relevance feedback.

System has a relatively basic user interface. No mechanism for feedback or other query modification.

Same as CITY.

The major feature that is designed but not yet implemented is the use of feedback.