SP500207 NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman

g. tokenizer (recognizes dates, phone numbers, common patterns)
Dates are recognized by the QA System but were not used for the TREC experiments.
i. expansion of queries using previously-constructed data structure (from part I)
(1) which structure?
Semantic lexicon described in I.C.1.

III. Searching
A. Total computer time to search (cpu seconds)
3-1[illegible] minutes per query to retrieve and rank.
1. retrieval time (total CPU seconds between when a query enters the system until a list of document numbers are obtained)
2. ranking time (total CPU seconds to sort document list)
B. Which methods best describe your machine searching methods?
1. vector space model
C. What factors are included in your ranking?
1. term frequency
2. inverse document frequency
9. document length

IV. What machine did you conduct the TREC experiment on?
We used nine IBM PS/2 Model 95 computers. These were [illegible] MHz 486 computers with 8 megabytes of RAM. Two of them had 16 megabytes of RAM. A 33 MHz 486 PC was used to distribute text to the nine IBM PCs for indexing and query processing.
How much RAM did it have? What was the clock rate of the CPU?

V. Some systems are research prototypes and others are commercial. To help compare these systems:
1. How much "software engineering" went into the development of your system?
Our QA System (built for NASA and restricted to an IBM compatible PC platform running under DOS and using no other license agreement commercial software such as a DOS extender) is a prototype and has been under development for one and a half years.
Approximately 2,000 hours of programming have been used to develop the current software. The system is implemented in C and uses B-tree structures for the inverted file structure. We felt our system was not fast enough to appear reasonable for TREC, so we designed a separate system without a pleasant user interface which used a hashing scheme to establish codes for strings to cut down on storage space; we also eliminated the use of B-trees in this separate system. We custom built a system for TREC during July and August; approximately 400 hours of programming and debugging went into this effort. The custom system generated the results which we sent in. However, we are now trying to produce some semantic results using the original QA System.
2. Given appropriate resources, could your system be made to run faster? By how much (estimate)?
Assuming we stay with DOS then we could easily run 8 to 16 times faster using the following:
Hardware Improvements:
1. New 66 MHz PCs now on the market.
2. Multiple hard drives.
3. 16 or 32 megabytes of RAM instead of 8 megabytes to be used for a larger disk cache and for our hashing algorithms.
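
The response to V.1 mentions a hashing scheme that establishes codes for strings to cut down on storage space, but gives no details. The following is a minimal sketch of one way such a scheme could work, written in Python rather than the C of the original system. The class name `StringCoder`, the hash function, the linear probing, and the table capacity are all assumptions made for illustration and are not taken from the authors' implementation.

```python
class StringCoder:
    """Assign a compact integer code to each distinct string (hypothetical helper)."""

    def __init__(self, capacity=1024):
        # Open-addressing hash table; each slot holds (string, code) or None.
        self._slots = [None] * capacity
        # Reverse mapping: position in this list is the code of the string.
        self._strings = []

    @staticmethod
    def _hash(text):
        # Simple deterministic polynomial hash over the bytes (illustrative choice).
        h = 0
        for byte in text.encode("utf-8"):
            h = (h * 31 + byte) & 0xFFFFFFFF
        return h

    def encode(self, text):
        """Return the code for text, assigning the next unused code if it is new."""
        n = len(self._slots)
        i = self._hash(text) % n
        # Linear probing: walk forward until the string or an empty slot is found.
        while self._slots[i] is not None:
            if self._slots[i][0] == text:
                return self._slots[i][1]
            i = (i + 1) % n
        # Keep one slot empty so that probing always terminates.
        if len(self._strings) >= n - 1:
            raise OverflowError("string table is full")
        code = len(self._strings)
        self._slots[i] = (text, code)
        self._strings.append(text)
        return code

    def decode(self, code):
        """Return the string that was assigned the given code."""
        return self._strings[code]


# Usage: build a tiny inverted file keyed by integer codes instead of strings.
coder = StringCoder()
postings = {}
for doc_id, text in enumerate(["space shuttle launch", "shuttle engine test"]):
    for term in text.split():
        postings.setdefault(coder.encode(term), []).append(doc_id)
```

In a sketch like this, each distinct string is stored once and every posting refers to it by a small integer, which is one plausible way that string codes could reduce storage relative to repeating the strings themselves.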