SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Appendix B: System Features
Appendix
National Institute of Standards and Technology
D. K. Harman
V. SYSTEM COMPARISON
NAME | ADS | UIC | DALHOUSIE | MEAD | UIF
[...]ich [...]e [...]ring" went [...] the [...]Kient?
Approximately 4 person-weeks to "clean up" last year's TREC-1 experimental code - both the CART algorithm and the tool we used to convert CART trees into TOPIC trees were "off-the-shelf."
TREC-2 upgrades involved approximately one person-month.
Strictly research prototype. Retrofit of adhoc interactive system to process filters. Upsize to work on the large item sets generated from the large data files.
Approximately 280 hours of programming have been used to develop the neural network. The system is implemented in C and uses a scanner for text processing. The scanner is borrowed from the QA system built for NASA.
4 pen[...] monti[...] devel([...] systeti[...]
__________
[...]propriate [...], could [...]Lem be [...] Run [...] By How [...]
Undoubtedly - if we had started out intending to use TOPIC as the actual test environment, we would have designed a system that made use of TOPIC's data preparation utilities - giving us an order of magnitude speedup in tree building.
With parallel processing, an order of magnitude increase in speed would be expected. Without parallel processing, improvements on the order of 100% would be expected from optimizations on current software. Restructuring data representation probably results in an order of magnitude increase in a serial processing mode.
Yes, 100-200% faster. Being a prototype, optimization of searching for multiple terms was not implemented and lots of messages, including one per record, are still displayed on the screen as the system churns away.
Yes. Our system can easily run in parallel. The processing time can be approximately reduced by the following ratio: $\frac{\text{time required using one CPU}}{\text{total number of CPUs}}$.
With [...] disk 5[...] could [...] decon[...] data I[...] beforE [...] runs, [...] sever[...] per te[...]
__________
[...]atures are [...]hat would [...] our [...]
We still have not experimented with external resources such as part-of-speech taggers and lexicons that might be used to both expand the feature set and the complexity of the CART trees; nor have we experimented with using low-level topics as features - all of these could be expected to give improved results.
Shortest path algorithm needs to be implemented. For TREC-2, only direct pair matches were involved. Identifying indirect paths is under development. Tests with several topics, run after official results were submitted, show that use of indirect paths results in improvements to capture of relevant documents at 93%. Weighting, however, needs further enhancements because improvements are at 45% for the top 1000 document cutoff.
Functions to screen for primitives other than simple strings, such as dates and names. Automatic analysis of contents of the data files to assist the user in recognizing patterns that 'may' be useful when searching that particular file.
The following software improvements would benefit the retrieval performance:
  1. Adding inverted index and term frequency information for term weighting.
  2. Using larger and more accurate Semantic lexicon.
  3. Using training text to improve the neural network performance.
Be[...] mc[...] rel[...] be[...] fe[...] Be[...] fe[...] exi[...] roi[...] (sy[...] ph[...] A[...] th[...] pe:[...]
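
One of the improvements listed above is an inverted index that carries term frequency information for term weighting. The following is a minimal sketch of that idea, not the system described in the table (which is reported to be written in C). The function names, the sample documents, the tokenizer, and the tf * idf weighting with idf = log(N / df) are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def rank(query, index, num_docs):
    """Score documents by summed tf * idf over the query terms.

    The tf * idf weighting is an assumed scheme, not one taken from the source.
    """
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the collection
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical three-document collection.
docs = {
    "d1": "neural network retrieval with a semantic lexicon",
    "d2": "parallel retrieval on several CPUs",
    "d3": "neural network training text improves the neural network",
}
index = build_inverted_index(docs)
ranking = rank("neural network", index, len(docs))
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Storing the term frequency in each posting means a query only touches the documents that contain its terms, and the weight can be computed without rescanning the text. Document d2 contains neither query term, so it does not appear in the ranking.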