SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Query Improvement in Information Retrieval Using Genetic Algorithms - A Report on the Experiments of the TREC Project
chapter
J. Yang
R. Korfhage
E. Rasmussen
National Institute of Standards and Technology
Donna K. Harman
experiments show that from generation to generation the average weight of each term in the
query individuals gradually moves to the value in the final converged generation, although
small variations may exist. As an example, Table 6 shows how the average term weights for
topic 3 change from generation 0 to generation 5. The values at generation 5 are almost
identical to those in Table 4.
(2) Effects of query term weight modification
The goal of query convergence using the genetic algorithm is to find the query
individual with the highest performance, that is, the one that retrieves more relevant
documents than its predecessors. Evidence from the experiments shows that the GA works as
expected. In most cases, new relevant documents were brought to the user in each generation, until
convergence at the final generation. Table 7 through Table 11 show the numbers of new
relevant documents retrieved in each generation for the five databases.
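The per-generation counts reported in these tables can be illustrated with a minimal sketch.
It assumes that a "new" relevant document is one that is relevant and was not retrieved as
relevant in any earlier generation; the source does not spell out the counting rule. The
function name and the document IDs below are hypothetical and are not taken from the
experiments.

```python
def new_relevant_per_generation(retrieved_by_generation, relevant):
    """Count relevant documents seen for the first time in each generation.

    retrieved_by_generation: list of lists of document IDs, one list per generation.
    relevant: set of document IDs judged relevant for the topic.
    """
    seen = set()      # relevant documents already shown to the user
    counts = []
    for retrieved in retrieved_by_generation:
        hits = (set(retrieved) & relevant) - seen
        counts.append(len(hits))
        seen |= hits
    return counts


# Hypothetical document IDs, for illustration only.
runs = [
    ["d1", "d2", "d3"],   # generation 0
    ["d1", "d4", "d6"],   # generation 1
    ["d4", "d6", "d7"],   # generation 2
]
relevant = {"d1", "d4", "d6", "d9"}

print(new_relevant_per_generation(runs, relevant))  # prints [1, 2, 0]
```

Under this assumed rule, a generation that contributes no new relevant documents, like the
last one in the toy data, is consistent with a population that has converged.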
The results reveal one interesting phenomenon: for a given topic, the algorithm may
retrieve different numbers of relevant documents from different databases. This seems
reasonable, since the five databases may concentrate on different areas. Moreover, the
retrieval patterns for the WSJ and AP databases, which should be interested in the same
topics, look similar to each other.
Table 4 Query individuals on Topic 3 (11 terms)
Generation = 0
0 0.18 0.31 0.53 0.95 0.17 0.70 0.23 0.49 0.12 0.08 0.39
1 0.28 0.37 0.98 0.54 0.77 0.65 0.77 0.78 0.82 0.15 0.63
2 0.31 0.35 0.92 0.52 0.40 0.61 0.79 0.93 0.87 0.87 0.67
3 0.76 0.58 0.39 0.36 0.20 0.83 0.42 0.46 0.98 0.13 0.21
4 0.96 0.74 0.41 0.78 0.76 0.96 0.03 0.32 0.76 0.24 0.59
5 0.04 0.96 0.32 0.06 0.44 0.92 0.57 0.12 0.57 0.25 0.50
6 0.24 0.48 0.41 0.87 0.43 0.36 0.38 0.04 0.16 0.52 0.70
7 0.10 0.40 0.77 0.24 0.34 0.23 0.30 0.30 0.89 0.04 0.65
8 0.40 0.68 0.73 0.94 0.23 0.84 0.97 0.78 0.43 0.67 0.81
9 0.16 0.28 0.14 0.86 0.75 0.21 0.14 0.29 0.80 0.22 0.56
Generation = 5
0 0.28 0.37 0.98 0.54 0.77 0.76 0.03 0.45 0.69 0.24 0.78
1 0.28 0.22 0.98 0.54 0.77 0.56 0.03 0.32 0.76 0.24 0.58
2 0.28 0.22 0.98 0.54 0.77 0.66 0.03 0.45 0.69 0.24 0.78
3 0.28 0.22 0.98 0.54 0.77 0.66 0.03 0.45 0.69 0.24 0.78
4 0.28 0.22 0.98 0.54 0.77 0.66 0.03 0.45 0.69 0.24 0.78
5 0.28 0.22 0.98 0.54 0.77 0.66 0.03 0.45 0.69 0.24 0.78
6 0.28 0.42 0.98 0.54 0.77 0.66 0.13 0.45 0.69 0.24 0.78
7 0.28 0.22 0.98 0.54 0.77 0.66 0.03 0.32 0.76 0.24 0.58
8 0.28 0.37 0.98 0.54 0.77 0.67 0.13 0.32 0.76 0.24 0.55
9 0.28 0.37 0.98 0.54 0.77 0.66 0.03 0.32 0.76 0.24 0.58
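
The convergence visible in Table 4 can be checked numerically. The following sketch computes,
for each of the 11 terms, the mean weight and the spread (population standard deviation)
across the ten query individuals in the first and in the final generation block of the table.
The weights are transcribed from Table 4; the variable and function names are illustrative
assumptions, and the standard deviation is only one possible measure of convergence, not
necessarily the one used in the experiments.

```python
from statistics import mean, pstdev

# Term weights transcribed from Table 4 (10 query individuals x 11 terms).
GEN_FIRST = [
    [0.18, 0.31, 0.53, 0.95, 0.17, 0.70, 0.23, 0.49, 0.12, 0.08, 0.39],
    [0.28, 0.37, 0.98, 0.54, 0.77, 0.65, 0.77, 0.78, 0.82, 0.15, 0.63],
    [0.31, 0.35, 0.92, 0.52, 0.40, 0.61, 0.79, 0.93, 0.87, 0.87, 0.67],
    [0.76, 0.58, 0.39, 0.36, 0.20, 0.83, 0.42, 0.46, 0.98, 0.13, 0.21],
    [0.96, 0.74, 0.41, 0.78, 0.76, 0.96, 0.03, 0.32, 0.76, 0.24, 0.59],
    [0.04, 0.96, 0.32, 0.06, 0.44, 0.92, 0.57, 0.12, 0.57, 0.25, 0.50],
    [0.24, 0.48, 0.41, 0.87, 0.43, 0.36, 0.38, 0.04, 0.16, 0.52, 0.70],
    [0.10, 0.40, 0.77, 0.24, 0.34, 0.23, 0.30, 0.30, 0.89, 0.04, 0.65],
    [0.40, 0.68, 0.73, 0.94, 0.23, 0.84, 0.97, 0.78, 0.43, 0.67, 0.81],
    [0.16, 0.28, 0.14, 0.86, 0.75, 0.21, 0.14, 0.29, 0.80, 0.22, 0.56],
]
GEN_FINAL = [
    [0.28, 0.37, 0.98, 0.54, 0.77, 0.76, 0.03, 0.45, 0.69, 0.24, 0.78],
    [0.28, 0.22, 0.98, 0.54, 0.77, 0.56, 0.03, 0.32, 0.76, 0.24, 0.58],
    [0.28, 0.22, 0.98, 0.54, 0.77, 0.66, 0.03, 0.45, 0.69, 0.24, 0.78],
    [0.28, 0.22, 0.98, 0.54, 0.77, 0.66, 0.03, 0.45, 0.69, 0.24, 0.78],
    [0.28, 0.22, 0.98, 0.54, 0.77, 0.66, 0.03, 0.45, 0.69, 0.24, 0.78],
    [0.28, 0.22, 0.98, 0.54, 0.77, 0.66, 0.03, 0.45, 0.69, 0.24, 0.78],
    [0.28, 0.42, 0.98, 0.54, 0.77, 0.66, 0.13, 0.45, 0.69, 0.24, 0.78],
    [0.28, 0.22, 0.98, 0.54, 0.77, 0.66, 0.03, 0.32, 0.76, 0.24, 0.58],
    [0.28, 0.37, 0.98, 0.54, 0.77, 0.67, 0.13, 0.32, 0.76, 0.24, 0.55],
    [0.28, 0.37, 0.98, 0.54, 0.77, 0.66, 0.03, 0.32, 0.76, 0.24, 0.58],
]


def column_stats(population):
    """Return a (mean, population std dev) pair for each term column."""
    columns = zip(*population)  # transpose: one tuple of weights per term
    return [(mean(col), pstdev(col)) for col in columns]


stats_first = column_stats(GEN_FIRST)
stats_final = column_stats(GEN_FINAL)

for term, ((m0, s0), (m1, s1)) in enumerate(zip(stats_first, stats_final), start=1):
    print(f"term {term:2d}: first mean={m0:.2f} sd={s0:.2f} | final mean={m1:.2f} sd={s1:.2f}")
```

In the final block, several terms carry the same weight in every individual, so their standard
deviation is zero, and the remaining terms vary far less than in the first block.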