NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman
1. topic fields used <desc>, <narr>, <con>, <def>
2. total computer time to build query (cpu seconds)
takes less than [OCRerr] seconds to build the classification tree including feature extraction-
-this does depend on the size of the training set though
3. which of the following were used in building the query?
a. terms selected from
(1) topic yes
(2) all training documents no
(3) only documents with relevance judgments
yes--including some additional judgments generated by us
k. other (brief description)
feature counts--in this case these are just word counts
III. Searching
A. Total computer time to search (cpu seconds)
1. retrieval time (total cpu seconds between when a query enters the system until a list of
document numbers is obtained)
approximately 20 hours (sic) of elapsed time on the WSJ test set--no accurate
measures of CPU time available to us
2. ranking time (total cpu seconds to sort document list)
approximately 5 minutes of elapsed time--no accurate measures of CPU time
available to us
B. Which methods best describe your machine searching methods?
10. other (describe)
binary classification algorithm based on counts of feature occurrence in the TEST
document
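The answer above describes a CART-style binary tree that classifies a test document by thresholding feature (word) counts. The following is a minimal illustrative sketch, not the authors' code: the tree, the words tested, and the thresholds are all hypothetical, whereas the real trees were learned from training documents.

```python
from collections import Counter

def word_counts(text):
    """Feature extraction: the features are just word counts."""
    return Counter(text.lower().split())

def classify(text):
    """Walk a tiny hand-written binary tree of (word, threshold) tests.

    Each internal node asks whether a word's count exceeds a threshold;
    leaves give the binary relevance decision.
    """
    counts = word_counts(text)
    if counts["bank"] > 1:            # hypothetical root split
        if counts["loan"] > 0:        # hypothetical second split
            return "relevant"
        return "nonrelevant"
    return "nonrelevant"

print(classify("the bank approved the bank loan"))  # relevant
```

A learned tree would be built per topic from the judged training documents; only the traversal logic is shown here.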
C. What factors are included in your ranking?
15. other (specify)
statistical estimate of the misclassification rate (probability) of the classifier
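One simple way to estimate a classifier's misclassification probability is the error fraction on held-out judged documents; a lower estimated error then ranks a classifier's decisions higher. This is an assumed illustration of the idea, not the statistical estimator the authors actually used.

```python
def misclassification_rate(predict, labeled_docs):
    """Fraction of held-out labeled documents the classifier gets wrong."""
    errors = sum(1 for text, label in labeled_docs if predict(text) != label)
    return errors / len(labeled_docs)

# Hypothetical held-out set and a toy one-word classifier.
held_out = [
    ("bank loan approved", "relevant"),
    ("weather was sunny", "nonrelevant"),
]
toy_classifier = lambda t: "relevant" if "loan" in t else "nonrelevant"
print(misclassification_rate(toy_classifier, held_out))  # 0.0
```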
IV. What machine did you conduct the TREC experiment on?
Sun SPARCstation IPC
How much RAM did it have?
24MB
What was the clock rate of the CPU?
25MHz
V. Some systems are research prototypes and others are commercial.
To help compare these systems:
1. How much "software engineering" went into the development of your system?
approximately 4 person-weeks for the TREC infrastructure--the CART algorithm
implementation used was "off the shelf"
2. Given appropriate resources, could your system be made to run faster? By how much
(estimate)?
Absolutely! The feature extraction algorithms were not optimized for speed, and no
database or indexes were built to do the testing. With faster algorithms and a set
of inverted indexes, we estimate a document could be classified in less than 1 second.
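The speedup the answer envisions rests on inverted indexes: mapping each word to the documents containing it, so a feature lookup touches only documents where the word occurs instead of scanning every document. A minimal sketch of that structure, with made-up document IDs and text, assuming whitespace tokenization:

```python
from collections import defaultdict

# Toy collection; in the experiment this would be the WSJ test set.
docs = {
    "d1": "stocks fell on wall street",
    "d2": "the bank raised loan rates",
}

# Build the inverted index: word -> set of document IDs containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)

print(sorted(index["loan"]))  # ['d2']
```

With such an index, evaluating a word-count split in the classification tree becomes a dictionary lookup rather than a pass over the raw document text.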
3. What features is your system missing that it would benefit from?
We would like to experiment with "off the shelf" tools to assist in feature