NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman
1. topic fields used <desc>, <narr>, <con>, <def>
2. total computer time to build query (cpu seconds)
takes less than [OCRerr] seconds to build the classification tree including feature extraction-
-this does depend on the size of the training set though
3. which of the following were used in building the query?
a. terms selected from
(1) topic yes
(2) all training documents no
(3) only documents with relevance judgments
yes--including some additional judgments generated by us
k. other (brief description)
feature counts--in this case these are just word counts
III. Searching
A. Total computer time to search (cpu seconds)
1. retrieval time (total cpu seconds between when a query enters the system until a list of
document numbers is obtained)
approximately 20 hours (sic) of elapsed time on the WSJ test set--no accurate
measures of CPU time available to us
2. ranking time (total cpu seconds to sort document list)
approximately 5 minutes of elapsed time--no accurate measures of CPU time
available to us
B. Which methods best describe your machine searching methods?
10. other (describe)
binary classification algorithm based on counts of feature occurrence in the TEST
document
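The answer above describes a CART-style binary tree that classifies a test document by thresholding feature (word) counts. The following is a minimal illustrative sketch, not the authors' code: the tree, the words tested, and the thresholds are all hypothetical, whereas the real trees were learned from training documents.

```python
from collections import Counter

def word_counts(text):
    """Feature extraction: the features are just word counts."""
    return Counter(text.lower().split())

def classify(text):
    """Walk a tiny hand-written binary tree of (word, threshold) tests.

    Each internal node asks whether a word's count exceeds a threshold;
    leaves give the binary relevance decision.
    """
    counts = word_counts(text)
    if counts["bank"] > 1:            # hypothetical root split
        if counts["loan"] > 0:        # hypothetical second split
            return "relevant"
        return "nonrelevant"
    return "nonrelevant"

print(classify("the bank approved the bank loan"))  # relevant
```

A learned tree would be built per topic from the judged training documents; only the traversal logic is shown here.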
C. What factors are included in your ranking?
15. other (specify)
statistical estimate of the misclassification rate (probability) of the classifier
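One simple way to estimate a classifier's misclassification probability is the error fraction on held-out judged documents; a lower estimated error then ranks a classifier's decisions higher. This is an assumed illustration of the idea, not the statistical estimator the authors actually used.

```python
def misclassification_rate(predict, labeled_docs):
    """Fraction of held-out labeled documents the classifier gets wrong."""
    errors = sum(1 for text, label in labeled_docs if predict(text) != label)
    return errors / len(labeled_docs)

# Hypothetical held-out set and a toy one-word classifier.
held_out = [
    ("bank loan approved", "relevant"),
    ("weather was sunny", "nonrelevant"),
]
toy_classifier = lambda t: "relevant" if "loan" in t else "nonrelevant"
print(misclassification_rate(toy_classifier, held_out))  # 0.0
```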
IV. What machine did you conduct the TREC experiment on?
Sun SPARCstation IPC
How much RAM did it have?
24MB
What was the clock rate of the CPU?
25MHz
V. Some systems are research prototypes and others are commercial.
To help compare these systems:
1. How much "software engineering" went into the development of your system?
approximately 4 person-weeks for the TREC infrastructure--the CART algorithm
implementation used was "off the shelf"
2. Given appropriate resources, could your system be made to run faster? By how much
(estimate)?
Absolutely! The feature extraction algorithms were not optimized for speed, and no
database or indexes were built to do the testing. With faster algorithms and a set
of inverted indexes, we estimate a document could be classified in less than 1 second.
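The speedup the answer envisions rests on inverted indexes: mapping each word to the documents containing it, so a feature lookup touches only documents where the word occurs instead of scanning every document. A minimal sketch of that structure, with made-up document IDs and text, assuming whitespace tokenization:

```python
from collections import defaultdict

# Toy collection; in the experiment this would be the WSJ test set.
docs = {
    "d1": "stocks fell on wall street",
    "d2": "the bank raised loan rates",
}

# Build the inverted index: word -> set of document IDs containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)

print(sorted(index["loan"]))  # ['d2']
```

With such an index, evaluating a word-count split in the classification tree becomes a dictionary lookup rather than a pass over the raw document text.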
3. What features is your system missing that it would benefit from?
We would like to experiment with "off the shelf" tools to assist in feature