NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
D. K. Harman, National Institute of Standards and Technology

Appendix B: System Features

V. SYSTEM COMPARISON

NAME: GE | CLARIT | 151 | HNC | NYU | SYRACUSE

[...] went [...]?

  GE: Zero.

  CLARIT: The system used for TREC-2 processing was developed as a university-research prototype. It is engineered for robustness and flexibility, rather than speed. Most of the components of the system are less than two years old. The research-prototype code (essentially all C) is not production-quality.

  151: The 151 system was built as a research prototype to look at human interface issues, and designed to work on much smaller databases. I'd guess that about 1-2 person years were spent on various aspects of the system.

  HNC: Several years.

  NYU: Quite a lot of code rewriting was done to adjust the NIST system to handle the large index (8 times larger than without compound terms).

  SYRACUSE: Not much. It was the first prototype to [...] testing with the current functionalities.

[...] could [...] system be run [...]? By how [...]?

  GE: For routing, it could be a bit faster. For retrieval, it is compatible with any inverted indexing strategy.

  CLARIT: We anticipate at least an order of magnitude speed improvement in the system within the next six months. This will be possible due to (1) re-engineering of the system and (2) the use of optimization utilities sold for the DEC ALPHA platform. (The current OSF compiler does not optimize code appropriately for the ALPHA (64 bit) architecture, with disappointing results).

  HNC: Yes, 20%-40% searching entire database. Many orders of magnitude faster with document clustering (had an order of 15 speed-up on a foreign corpus.)

  NYU: Base IR system is much better than it was during TREC-1. However, second phase of index building is still slow and fragile.

  SYRACUSE: Yes, with careful design of the data structures and elimination of the features added for experimental purposes, the speed can be improved significantly, at least by an order of magnitude.

[...] features are [...] that would [...] your purpose [...]

  GE: This approach is very simple and has no fancy features. Better tokenization, special purpose query handling, proximity, and negation, for example, could help a lot, as would better ranking.

  CLARIT: The CLARIT TREC-2 system did not take advantage of several processing options that may have given improved results, including tokenization, sub-lexicon discovery over training sets, and EQ-class discovery for thesaurus terms.

  151: Better tokenization, including proper noun identification, phrases, and perhaps some better treatment of "nots." Precision enhancing methods would also help some.

  HNC: Word sense disambiguation (already in early development). Document cluster (speed up retrievals).

  NYU:
  - There is still a lot of room for improvement of NLP programs. A feedback mechanism would be helpful.
  - Faster indexing.