SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Query Improvement in Information Retrieval Using Genetic Algorithms - A Report on the Experiments of the TREC Project
chapter
J. Yang
R. Korfhage
E. Rasmussen
National Institute of Standards and Technology
Donna K. Harman
6. Statistical Data
(1) Computing time:
Time is given for two types of processing.
(a) Time for document processing:
Document processing includes keyword extraction, sorting and merging, and
building inverted files. Two systems were used, a SUN-670 and SUN SPARC/IPC(s),
depending on availability. Table 1 gives the time spent on each stage of document
processing.
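The three stages named above can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the stop list, and all function names are assumptions made for illustration, and the sort runs in memory rather than on disk.

```python
import re
from itertools import groupby

# Hypothetical stop list; the report does not specify one.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}

def extract_keywords(doc_id, text):
    """Stage 1: emit a (term, doc_id) pair for each non-stopword token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(tok, doc_id) for tok in tokens if tok not in STOPWORDS]

def sort_and_merge(pairs):
    """Stage 2: drop duplicate pairs and sort by term, then document."""
    return sorted(set(pairs))

def build_inverted_file(sorted_pairs):
    """Stage 3: group the sorted pairs into term -> list of document ids."""
    return {term: [doc for _, doc in group]
            for term, group in groupby(sorted_pairs, key=lambda p: p[0])}

def process_documents(docs):
    """Run all three stages over a mapping of doc_id -> text."""
    pairs = []
    for doc_id, text in docs.items():
        pairs.extend(extract_keywords(doc_id, text))
    return build_inverted_file(sort_and_merge(pairs))

docs = {1: "The retrieval of documents", 2: "Genetic algorithms in retrieval"}
index = process_documents(docs)
```

Sorting before grouping is what lets the last stage build each term's posting list in a single pass, which may be why sorting and merging appears as a separate stage in Table 1.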
(b) Time for document retrieval per generation:
The time required to retrieve documents for each iteration (generation) depends on
several factors: the number of terms in a query, the generation, and the computing system
used. Table 2 gives the average retrieval time for three generations, based on topics 1-50 and
the first dataset (disk one).
(2) Storage space for data structures:
Several types of data structures were created to facilitate retrieval: inverted
files, indexed files, and address files, the last of which consist of document numbers and
their locations in the databases. The storage space used for these files (disk one only) is
shown in Table 3.
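As a rough illustration of an address file, the sketch below maps each document number to its location in a database. The record layout (one document per line, document number and text separated by a tab), the document numbers, and the function names are all hypothetical; the report does not describe the actual file format.

```python
import io

def build_address_file(db):
    """Scan a database stream and record (offset, length) per document number.

    Assumes one document per line as b"<docno>\\t<text>\\n" (hypothetical layout).
    """
    addresses = {}
    offset = 0
    for line in db:
        docno = line.split(b"\t", 1)[0].decode("ascii")
        addresses[docno] = (offset, len(line))
        offset += len(line)
    return addresses

def fetch_document(db, addresses, docno):
    """Seek directly to a document using its recorded location."""
    offset, length = addresses[docno]
    db.seek(offset)
    return db.read(length).rstrip(b"\n").decode("ascii")

# An in-memory stand-in for a database file.
db = io.BytesIO(b"DOC-1\tfirst text\nDOC-2\tsecond text\n")
addresses = build_address_file(db)
record = fetch_document(db, addresses, "DOC-2")
```

With such a table, a document found through the inverted file can be read with a single seek instead of a scan of the whole database.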
(3) Machine capability:
The capabilities of our systems are:
SUN-670: 32 Megabytes RAM and 40 MHz CPU clock rate;
SUN SPARC/IPC: 24 Megabytes RAM and 25 MHz CPU clock rate.
(4) Manpower:
Four people participated in the project: two Ph.D. students and two faculty
members. One Ph.D. student worked on the project full time and the other part time; one
faculty member worked half time and the other part time.
Table 1. Time for document processing (Unit: min.)
                       DOE    AP            ZIFF          WSJ           FR
Dataset                1st    1st    2nd    1st    2nd    1st    2nd    1st    2nd
Keyword extraction     129    148    169    130     43    182    125     19     16
Sorting and merge      293    281     NA    248     72    404    398     18     11
Create inverted files   29     NA     27     28     11     NA     34      2      3
* NA: Not Available