NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Edited by Donna K. Harman, National Institute of Standards and Technology

Query Improvement in Information Retrieval Using Genetic Algorithms: A Report on the Experiments of the TREC Project
J. Yang, R. Korfhage, E. Rasmussen

6. Statistical Data

(1) Computing time: Time is reported for two types of processing.

(a) Time for document processing: Document processing includes keyword extraction, sorting and merging, and building the inverted files. Two kinds of system were used, a SUN-670 and one or more SUN SPARC/IPCs, depending on availability. Table 1 gives the time used by each stage of document processing.

(b) Time for document retrieval per generation: The time required to retrieve documents in each iteration (generation) depends on several factors: the number of terms in the query, the generation, and the computing system used. Table 2 gives the average retrieval time for three generations, based on topics 1-50 and the first dataset (disk one).

(2) Storage space for data structures: Several types of data structures were created to facilitate retrieval: inverted files, indexed files, and address files, which record document numbers and the documents' locations in the databases. The storage space used by these files (disk one only) is shown in Table 3.

(3) Machine capability: The capabilities of our systems are:
SUN-670: 32 megabytes RAM, 40 MHz CPU clock rate;
SUN SPARC/IPC: 24 megabytes RAM, 25 MHz CPU clock rate.

(4) Manpower: Four people in total participated in the project: two Ph.D. students and two faculty members. One Ph.D. student worked full time on the project and the other part time; one faculty member worked half time and the other part time.

Table 1. Time for document processing (unit: minutes)
Collection                 DOE    AP        ZIFF         WSJ          FR
Dataset                    1st   1st   2nd   1st   2nd   1st   2nd   1st   2nd
Keyword extraction         129   148   169   130    43   182   125    19    16
Sorting and merge          293   281    NA   248    72   404   398    18    11
Create inverted files       29    NA    27    28    11    NA    34     2     3

1st/2nd: first/second dataset. NA: not available.
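The document-processing stages timed in Table 1 (keyword extraction, sorting and merging, and building inverted files), together with the address files described in item (2), can be illustrated with a small Python sketch. The report does not describe how keywords were selected or how the files were laid out, so the tokenizer, the stopword list, the term-frequency postings, the in-memory sorted runs, and the byte-offset address file below are all assumptions, and the functions `extract_keywords`, `sorted_runs`, `build_inverted_file`, and `build_address_file` are hypothetical names. It shows the general sort-and-merge pattern, not the authors' implementation.

```python
import heapq
import re
from itertools import groupby

# Hypothetical stopword list: the report does not say how keywords were chosen.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def extract_keywords(text):
    """Keyword extraction (assumed): lowercase alphabetic tokens minus stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def sorted_runs(docs, run_size):
    """Yield sorted runs of (term, doc_no) postings, each at most run_size long."""
    run = []
    for doc_no, text in docs:
        for term in extract_keywords(text):
            run.append((term, doc_no))
            if len(run) >= run_size:
                yield sorted(run)
                run = []
    if run:
        yield sorted(run)


def build_inverted_file(docs, run_size=1000):
    """Merge the sorted runs and group them into term -> [(doc_no, term_freq), ...]."""
    # All runs are held in memory here; an external sort would read them from disk files.
    merged = heapq.merge(*sorted_runs(docs, run_size))
    inverted = {}
    for term, postings in groupby(merged, key=lambda p: p[0]):
        doc_nos = [doc_no for _, doc_no in postings]
        # doc_nos is sorted, so equal document numbers are adjacent.
        inverted[term] = [(d, len(list(g))) for d, g in groupby(doc_nos)]
    return inverted


def build_address_file(docs):
    """Address file (assumed layout): doc_no -> byte offset in a concatenated collection."""
    addresses, offset = {}, 0
    for doc_no, text in docs:
        addresses[doc_no] = offset
        offset += len(text.encode("utf-8"))
    return addresses


docs = [
    (1, "Genetic algorithms improve the query"),
    (2, "The query and the documents"),
    (3, "Genetic query genetic"),
]
index = build_inverted_file(docs, run_size=2)
print(index["query"])            # [(1, 1), (2, 1), (3, 1)]
print(index["genetic"])          # [(1, 1), (3, 2)]
print(build_address_file(docs))  # {1: 0, 2: 36, 3: 63}
```

With a small `run_size` the postings are split across several sorted runs, and `heapq.merge` plays the role of the merge pass; in a full-scale run the runs would be written to disk and merged from files, since a collection such as WSJ does not fit in 24 or 32 megabytes of RAM.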