SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Appendix B: System Features Appendix
National Institute of Standards and Technology
D. K. Harman

V. SYSTEM COMPARISON

ADS

Approximately 4 person-weeks to "clean up" the last prototype. Retrofit of TREC-1 ad hoc experimental code to process filters. Both the CART algorithm and the tool we used to convert CART trees into TOPIC trees were "off-the-shelf."

Undoubtedly. If we had started out intending to use TOPIC as the actual test environment, we would have designed a system that made use of TOPIC's data preparation utilities, giving us an order-of-magnitude speedup in tree building.

We still have not experimented with external resources such as part-of-speech taggers and lexicons that might be used to expand both our feature set and the complexity of the CART trees, nor have we experimented with using low-level topics as features. All of these could be expected to give improved results.

UIC

TREC-2 upgrades involved approximately one person-month. The system was upsized to work on the large item sets generated from the large data files.

With parallel processing, an order-of-magnitude increase in speed would be expected. Without parallel processing, improvements on the order of 100% would be expected from optimizations to the current software. Restructuring the data representation would probably yield an order-of-magnitude increase in speed in serial processing mode.

A shortest path algorithm still needs to be implemented. For TREC-2, only direct pair matches were used; identifying indirect paths is under development. Tests with several topics after the official results were submitted showed that using indirect paths improves the capture of relevant documents, at 93%. Weighting, however, needs further enhancement, because improvements are at 45% for the top-1000 document cutoff.

DALHOUSIE

Strictly a research interactive system. The system is implemented in C and uses a scanner for text processing; the scanner is borrowed from the QA system built for NASA.

Yes, 100-200% faster. Because this is a prototype system, optimization of searching for multiple terms was not implemented, and lots of messages, including one per record, are still displayed on the screen as the system churns away.

Functions to screen for primitives other than simple strings, such as dates and names. Automatic analysis of the contents of the data files to help the user recognize patterns that may be useful when searching a particular file.

MEAD

Approximately 280 hours of programming have been used to develop the neural network.

Yes. Our system can easily run in parallel. The processing time can be reduced approximately by the following ratio:

$$T_N \approx \frac{T_1}{N}$$

where $T_1$ is the time required using one CPU and $N$ is the total number of CPUs.

The following software improvements would benefit retrieval performance:
1. Adding an inverted index and term frequency information for term weighting.
2. Using a larger and more accurate semantic lexicon.
3. Using training text to improve the neural network performance.

UIF

[Column truncated in the source; entries not recoverable.]
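One entry notes that a shortest path algorithm still needs to be implemented, so that indirect paths can be found in addition to direct pair matches. A minimal sketch of that idea, assuming the links are held in an unweighted term-association graph, uses breadth-first search: a path with one link is a direct pair match, and anything longer is an indirect path. The graph, its terms, and the function name `shortest_path` are hypothetical and are not taken from the system described.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Return the terms on a shortest path from source to target, or None.

    graph maps each term to the terms it is directly linked to (unweighted),
    so breadth-first search finds a path with the fewest links.
    """
    if source == target:
        return [source]
    previous = {source: None}  # term -> term it was first reached from
    queue = deque([source])
    while queue:
        term = queue.popleft()
        for neighbour in graph.get(term, ()):
            if neighbour in previous:
                continue  # already reached by an equal or shorter path
            previous[neighbour] = term
            if neighbour == target:
                # Walk back through the predecessors to rebuild the path.
                path = [target]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None  # target not reachable from source


# Hypothetical term-association graph, for illustration only.
links = {
    "car": ["automobile", "vehicle"],
    "automobile": ["car"],
    "vehicle": ["car", "truck"],
    "truck": ["vehicle"],
}
print(shortest_path(links, "car", "vehicle"))  # ['car', 'vehicle'] (direct pair)
print(shortest_path(links, "car", "truck"))    # ['car', 'vehicle', 'truck'] (indirect)
```

If the links carried weights (for example, association strengths), Dijkstra's algorithm would take the place of breadth-first search.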
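Another entry lists an inverted index with term frequency information for term weighting as a planned improvement. The sketch below shows one common way to do this, tf-idf weighting over whitespace-tokenized text. The weighting scheme, the sample documents, and the names `build_inverted_index`, `tf_idf`, and `rank` are illustrative assumptions, not that system's design.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency} (whitespace tokenization)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def tf_idf(index, n_docs, term, doc_id):
    """Term frequency times log(N / document frequency); 0.0 if absent."""
    postings = index.get(term, {})
    tf = postings.get(doc_id, 0)
    if tf == 0:
        return 0.0
    return tf * math.log(n_docs / len(postings))


def rank(index, n_docs, query):
    """Score documents by summed tf-idf of the query terms, best first."""
    scores = defaultdict(float)
    for term in query.lower().split():
        # Only documents in this term's posting list can gain score from it.
        for doc_id in index.get(term, {}):
            scores[doc_id] += tf_idf(index, n_docs, term, doc_id)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical three-document collection.
docs = {"d1": "oil price oil", "d2": "oil spill", "d3": "stock price"}
index = build_inverted_index(docs)
print([doc for doc, _ in rank(index, len(docs), "oil spill")])  # ['d2', 'd1']
```

The index lets scoring touch only the documents that contain a query term, which is what makes term weighting cheap compared with scanning every record.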
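One entry also proposes functions that screen for primitives other than simple strings, such as dates and names. As a hypothetical illustration covering dates only, a regular expression can recognize two common written forms. The patterns and the name `find_dates` are assumptions; names would need a different approach.

```python
import re

# Hypothetical patterns for two written date forms: "12/03/1993" and "March 3, 1993".
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
DATE_PATTERN = re.compile(
    r"\b(?:\d{1,2}/\d{1,2}/\d{2,4}"             # numeric form, e.g. 12/03/1993
    r"|(?:" + MONTHS + r")\s+\d{1,2},\s+\d{4}"  # spelled-out form, e.g. March 3, 1993
    r")\b"
)


def find_dates(text):
    """Return every substring of text that matches one of the date forms."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return DATE_PATTERN.findall(text)


print(find_dates("Filed 12/03/1993 and amended March 3, 1993."))
# ['12/03/1993', 'March 3, 1993']
```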
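One entry estimates parallel processing time as the time required using one CPU divided by the total number of CPUs. The function below restates that estimate under the idealized assumption that the work splits evenly into independent runs with no coordination overhead. The function name and sample figures are hypothetical.

```python
def estimated_parallel_time(time_one_cpu, n_cpus):
    """Ideal estimate: time on one CPU divided by the number of CPUs.

    Assumes the work splits into independent, equal-sized pieces
    (for example, one search run per topic) with no coordination cost.
    """
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return time_one_cpu / n_cpus


# Hypothetical figures: 120 minutes on one CPU, spread over 4 CPUs.
print(estimated_parallel_time(120.0, 4))  # 30.0
```

Real speedups fall short of this bound whenever some work cannot be split or the CPUs contend for shared resources such as disk.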