NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
National Institute of Standards and Technology
Donna K. Harman, editor

Query Improvement in Information Retrieval Using Genetic Algorithms: A Report on the Experiments of the TREC Project
J. Yang, R. Korfhage, E. Rasmussen

Table 2. Average retrieval time for topics 1-50 on the first dataset (unit: minutes)

  Generation 0   Generation 1   Generation 2
  3:56           2:19           2:05

Table 3. Total amount of storage for inverted and indexed files, disk one only (unit: megabytes)

                   DOE     AP      ZIFF    WSJ
  Inverted files   162.3   199.8   143.7   223.4
  Indexed files      3.0     2.2     2.4     2.1
  Address files      4.3     1.7     1.7     2.6

7. Results

This section describes several results of applying the genetic algorithm to the TREC document collection. The examples are drawn from the training queries (topics 1 to 50) on the DOE database.

(1) Query convergence

In large document collections such as the TREC databases, the genetic algorithm caused the query variants to converge within 3 to 6 generations in most cases. For example, Table 4 shows the term weights of the query individuals in the first generation (generation 0) and the last generation (generation 5) for topic 3. For most query terms, the weights across the query individuals converged to a single value by the final generation. The few variations that remained were caused by the mutation operation. Table 5 shows a similar situation for topic 12, where the final generation is generation 4.

An interesting phenomenon is how the query term weights changed. The genetic operators select the query individuals whose performance values are higher than the average performance of all individuals, then exchange parts of their term weights through crossover and alter them through mutation. Although both operations are random, the results are interesting. The