SP500207 NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Query Improvement in Information Retrieval Using Genetic Algorithms - A Report on the Experiments of the TREC Project
J. Yang, R. Korfhage, E. Rasmussen
National Institute of Standards and Technology, Donna K. Harman

experiments show that from generation to generation the average weight of each term in the query individuals gradually moves toward its value in the final, converged generation, although small variations may exist. As an example, Table 6 shows the changes in the average term weights on topic 3, from generation 0 to generation 5. The values for generation 5 are almost identical to those in Table 4.

(2) Effects of query term weight modification

The goal of query convergence using the genetic algorithm is to find the query individual with the highest performance, that is, one that retrieves more relevant documents than its predecessors. Evidence from the experiments shows that the GA works as expected. In most cases, new relevant documents were presented to the user in each generation, until convergence at the final generation. Table 7 through Table 11 show the numbers of new relevant documents retrieved in each generation for the five databases.

Observing the results, one interesting phenomenon arises: for a given topic, the algorithm may retrieve different numbers of relevant documents from different databases. This seems reasonable, since the five databases may concentrate on different areas. Moreover, the retrieval patterns for the WSJ and AP databases, which would be expected to cover the same topics, look similar to each other.
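The per-generation counts of new relevant documents reported in Tables 7 through 11 can be illustrated with a minimal sketch. It assumes that "new" means relevant documents not already retrieved in an earlier generation; the function name, the document identifiers and the relevance set are hypothetical and are not taken from the experiments.

```python
def new_relevant_per_generation(retrieved_by_generation, relevant):
    """Count relevant documents first retrieved in each generation.

    retrieved_by_generation: list of iterables of document ids, one per generation.
    relevant: set of document ids judged relevant for the topic.
    Assumption: a document counts as "new" only the first time it is retrieved.
    """
    seen = set()      # relevant documents already shown to the user
    counts = []
    for retrieved in retrieved_by_generation:
        hits = (set(retrieved) & relevant) - seen
        counts.append(len(hits))
        seen |= hits
    return counts


# Hypothetical toy data: four generations, the last two retrieving the same set.
generations = [
    ["d1", "d2", "d3"],
    ["d2", "d4", "d5"],
    ["d4", "d5", "d6"],
    ["d4", "d5", "d6"],
]
relevant_docs = {"d1", "d3", "d4", "d6"}

counts = new_relevant_per_generation(generations, relevant_docs)
print(counts)
```

In this toy run the last generation contributes no new relevant documents, which is the pattern one would expect once the population has converged.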
Table 4 Query individuals on Topic 3 (11 terms)

Generation = 0
Individual   Term weights (terms 1 to 11)
0            0.18 0.31 0.53 0.95 0.17 0.70 0.23 0.49 0.12 0.08 0.39
1            0.28 0.37 0.98 0.54 0.77 0.65 0.77 0.78 0.82 0.15 0.63
2            0.31 0.35 0.92 0.52 0.40 0.61 0.79 0.93 0.87 0.87 0.67
3            0.76 0.58 0.39 0.36 0.20 0.83 0.42 0.46 0.98 0.13 0.21
4            0.96 0.74 0.41 0.78 0.76 0.96 0.03 0.32 0.76 0.24 0.59
5            0.04 0.96 0.32 0.06 0.44 0.92 0.57 0.12 0.57 0.25 0.50
6            0.24 0.48 0.41 0.87 0.43 0.36 0.38 0.04 0.16 0.52 0.70
7            0.10 0.40 0.77 0.24 0.34 0.23 0.30 0.30 0.89 0.04 0.65
8            0.40 0.68 0.73 0.94 0.23 0.84 0.97 0.78 0.43 0.67 0.81
9            0.16 0.28 0.14 0.86 0.75 0.21 0.14 0.29 0.80 0.22 0.56

Generation = 5
Individual   Term weights (terms 1 to 11)
0            0.28 0.37 0.98 0.54 0.77 0.76 0.03 0.45 0.69 0.24 0.78
1            0.28 0.22 0.98 0.54 0.77 0.56 0.03 0.32 0.76 0.24 0.58
2            0.28 0.22 0.98 0.54 0.77 0.66 0.03 0.45 0.69 0.24 0.78
3            0.28 0.22 0.98 0.54 0.77 0.66 0.03 0.45 0.69 0.24 0.78
4            0.28 0.22 0.98 0.54 0.77 0.66 0.03 0.45 0.69 0.24 0.78
5            0.28 0.22 0.98 0.54 0.77 0.66 0.03 0.45 0.69 0.24 0.78
6            0.28 0.42 0.98 0.54 0.77 0.66 0.13 0.45 0.69 0.24 0.78
7            0.28 0.22 0.98 0.54 0.77 0.66 0.03 0.32 0.76 0.24 0.58
8            0.28 0.37 0.98 0.54 0.77 0.67 0.13 0.32 0.76 0.24 0.55
9            0.28 0.37 0.98 0.54 0.77 0.66 0.03 0.32 0.76 0.24 0.58
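The convergence visible in Table 4 can be checked by computing, for each term position, the average weight across the ten individuals and the spread between the largest and smallest weight. The sketch below is illustrative only: the weights are transcribed from Table 4 as printed, and the helper names term_averages and term_spreads are hypothetical, not part of the original experiments.

```python
from statistics import mean

# Term weights transcribed from Table 4 (one row per query individual).
GEN0 = [
    [0.18, 0.31, 0.53, 0.95, 0.17, 0.70, 0.23, 0.49, 0.12, 0.08, 0.39],
    [0.28, 0.37, 0.98, 0.54, 0.77, 0.65, 0.77, 0.78, 0.82, 0.15, 0.63],
    [0.31, 0.35, 0.92, 0.52, 0.40, 0.61, 0.79, 0.93, 0.87, 0.87, 0.67],
    [0.76, 0.58, 0.39, 0.36, 0.20, 0.83, 0.42, 0.46, 0.98, 0.13, 0.21],
    [0.96, 0.74, 0.41, 0.78, 0.76, 0.96, 0.03, 0.32, 0.76, 0.24, 0.59],
    [0.04, 0.96, 0.32, 0.06, 0.44, 0.92, 0.57, 0.12, 0.57, 0.25, 0.50],
    [0.24, 0.48, 0.41, 0.87, 0.43, 0.36, 0.38, 0.04, 0.16, 0.52, 0.70],
    [0.10, 0.40, 0.77, 0.24, 0.34, 0.23, 0.30, 0.30, 0.89, 0.04, 0.65],
    [0.40, 0.68, 0.73, 0.94, 0.23, 0.84, 0.97, 0.78, 0.43, 0.67, 0.81],
    [0.16, 0.28, 0.14, 0.86, 0.75, 0.21, 0.14, 0.29, 0.80, 0.22, 0.56],
]
GEN5 = [
    [0.28, 0.37, 0.98, 0.54, 0.77, 0.76, 0.03, 0.45, 0.69, 0.24, 0.78],
    [0.28, 0.22, 0.98, 0.54, 0.77, 0.56, 0.03, 0.32, 0.76, 0.24, 0.58],
    [0.28, 0.22, 0.98, 0.54, 0.77, 0.66, 0.03, 0.45, 0.69, 0.24, 0.78],
    [0.28, 0.22, 0.98, 0.54, 0.77, 0.66, 0.03, 0.45, 0.69, 0.24, 0.78],
    [0.28, 0.22, 0.98, 0.54, 0.77, 0.66, 0.03, 0.45, 0.69, 0.24, 0.78],
    [0.28, 0.22, 0.98, 0.54, 0.77, 0.66, 0.03, 0.45, 0.69, 0.24, 0.78],
    [0.28, 0.42, 0.98, 0.54, 0.77, 0.66, 0.13, 0.45, 0.69, 0.24, 0.78],
    [0.28, 0.22, 0.98, 0.54, 0.77, 0.66, 0.03, 0.32, 0.76, 0.24, 0.58],
    [0.28, 0.37, 0.98, 0.54, 0.77, 0.67, 0.13, 0.32, 0.76, 0.24, 0.55],
    [0.28, 0.37, 0.98, 0.54, 0.77, 0.66, 0.03, 0.32, 0.76, 0.24, 0.58],
]


def term_averages(population):
    """Mean weight of each term position across all individuals."""
    return [mean(column) for column in zip(*population)]


def term_spreads(population):
    """Difference between the largest and smallest weight of each term."""
    return [max(column) - min(column) for column in zip(*population)]


for label, population in (("generation 0", GEN0), ("generation 5", GEN5)):
    print(label)
    print("  averages:", [round(a, 3) for a in term_averages(population)])
    print("  spreads: ", [round(s, 2) for s in term_spreads(population)])
```

A spread of zero for a term means every individual carries the same weight for it; in the generation 5 rows this holds for terms 1, 3, 4, 5 and 10, while the remaining terms still show the small variations mentioned in the text.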