NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)

Appendix B: System Features
National Institute of Standards and Technology
D. K. Harman

V. SYSTEM COMPARISON

[Table flattened in extraction: column headings and row questions are not recoverable. Responses are grouped by table row, one per paragraph; "[...]" marks lost text.]

Row 1

Very little TREC-specific in the search system, although TREC has spurred the development of some additional features. It is a generalized bibliographic retrieval system which has undergone continual modification to meet the requirements of a number of research projects since 1983, involving many thousands of person-hours.

Our system is a [...] than a research prototype, [...] week it took two person months to build.

[...] amount.

We have been interested in algorithmic aspects: speed, and memory and disk requirements.

[...] as CITY

[...] hours.

About 1 we[...] for TRW1 test run [...] About 1 month to [...] tools for TRW2 te[...] Underlying FDF [...] been developed ov[...] at TRW and PAR[...]

Row 2

Disk I/O is the most serious bottleneck in searching and in outputting documents. Keeping entire database in core would speed real time by > order of magnitude, CPU by much less. Perhaps 3 GB of core is too much to expect yet. More practically, faster disks and faster bus. Indexing on the other hand is CPU bound most of the time. This could be distributed over N processors. Time would behave approx. like A + B/N + CN for a given database, where typically B > A, and B >> C.

By converting it to a true retrieval system, I would guess that we could improve the system speed by at least 20-fold.

Yes, it scales linearly with the size of the CM5, but even on the current size, the retrieval time could be speeded up by a factor of 2 or 3. Memory optimization is the main limiting factor for now.

Same as CITY

A lot [...] 30%??

Sure. We expect su[...] performance increa[...] single FD-3 unit (3-1 [...] variety of hardware [...] firmware improveme[...] releases of FDF soft[...] include features to [...] harness multiple [...] parallel running the queries. Performan[...] improve linearly w[...] number of FDF's u[...] Incorporating ideas [...] TREC experiments [...] automatic query ge[...] will reduce query si[...] improve system perf[...] Such tools are unde[...] development.

Row 3

There is an itemized list in my paper.

Normalization (document length, cosine normalization, etc.), more elaborate use of proximity information and clustering of terms, query expansion, stemming. Use of relevance feedback.

System has a relatively basic user interface. No mechanism for feedback or other query modification.

Same as CITY

The major feature that is designed but not yet implemented is the use of feedback.

[...]ures are [...] at would [...]r the [...]
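
One response in this table models indexing time on N processors as approximately A + B/N + CN, with B > A and B much larger than C. As a minimal sketch of what that model implies (not the respondent's code), the Python below evaluates the expression and finds the processor count that minimises it. Setting the derivative -B/N^2 + C to zero gives a continuous minimum at N = sqrt(B/C). The function names and the numeric constants are hypothetical, chosen only to satisfy the stated inequalities.

```python
import math


def indexing_time(n, a, b, c):
    """Modelled time T(N) = A + B/N + C*N for N processors."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return a + b / n + c * n


def best_processor_count(b, c):
    """Integer N >= 1 minimising T(N); the continuous minimum is sqrt(B/C)."""
    n_star = math.sqrt(b / c)
    # The integer optimum is one of the two neighbours of the continuous one.
    candidates = {max(1, math.floor(n_star)), max(1, math.ceil(n_star))}
    # A is a fixed cost and does not affect which N is best.
    return min(candidates, key=lambda n: b / n + c * n)


# Hypothetical constants (not from the source) with B > A and B >> C.
A, B, C = 10.0, 1000.0, 0.4
n_best = best_processor_count(B, C)
print(n_best, indexing_time(n_best, A, B, C))
```

Under this model, adding processors beyond sqrt(B/C) makes indexing slower, because the per-processor overhead term CN outgrows the savings from B/N.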