SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Query Improvement in Information Retrieval Using Genetic Algorithms - A Report on the Experiments of the TREC Project
chapter
J. Yang
R. Korfhage
E. Rasmussen
National Institute of Standards and Technology
Donna K. Harman
Parallel search can continue as long as the individuals retrieving different relevant
documents have comparable power, that is, as long as the numbers of relevant documents
they retrieve differ only slightly. The results in Tables 15 and 16 come from the same
topic as Tables 13 and 14, but from the second generation (generation 1), after feedback
and genetic modification have been applied to the first generation. Because of the genetic
operations, parts of several query individuals have begun to converge. At this point the queries
group into three clusters of similar queries, (0,3,7,8,9), (1,5,6) and (2,4), as shown in Table 15. Within
each group several term weights are identical. However, parallel search still continues.
Differences in the other weights result in query individuals that retrieve completely different
sets of documents. For example, contrast query 7 with the others in its cluster. Other
examples from this study show that this effect may be observed with differences in only one or
two term weights.
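The grouping of query individuals by shared term weights can be illustrated with a minimal sketch. The text does not say how the clusters were identified, so the linking rule here (two queries belong together when they share at least `min_shared` identical weights) is an assumption, as are the function names and the toy weight vectors, which are not taken from the tables.

```python
from itertools import combinations

def shared_weight_count(a, b, tol=1e-9):
    """Count the term positions at which two queries carry the same weight."""
    return sum(1 for x, y in zip(a, b) if abs(x - y) <= tol)

def cluster_queries(population, min_shared):
    """Group queries that are linked by at least `min_shared` identical weights.

    Hypothetical rule: links are transitive, so the groups are the connected
    components of the "shares enough weights" relation (union-find).
    """
    parent = list(range(len(population)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(population)), 2):
        if shared_weight_count(population[i], population[j]) >= min_shared:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(population)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical 4-term weight vectors, for illustration only.
population = [
    [0.18, 0.31, 0.53, 0.95],
    [0.63, 0.31, 0.35, 0.92],
    [0.18, 0.31, 0.53, 0.10],
    [0.63, 0.31, 0.35, 0.40],
]
clusters = cluster_queries(population, min_shared=3)
```

In this toy population, queries 0 and 2 differ in a single weight, as do queries 1 and 3. That single difference is the kind that, according to the text, can be enough to make two queries in the same cluster retrieve different documents.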
Tables 17 and 18 show that the parallel search for this topic continued through the final
generation (generation 3), by which point almost all of the weights for each term had converged to a single value.
Query 7, however, has different values at two terms, 4 and 21, which cause it to retrieve entirely
different relevant documents from the other query individuals. This is not always the
case. In most situations, the genetic algorithm forces the query variants to converge to a
single query; that is, all of the query individuals in the final generation retrieve the same
relevant documents.
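One way to check a generation for this kind of convergence is to compare every weight with the majority value for its term. The sketch below is not the procedure used in the experiments; the function names and the three-term population are hypothetical, and term positions are counted from 0.

```python
from collections import Counter

def outlier_positions(population):
    """Map each query index to the term positions where its weight
    differs from the most common weight for that term."""
    deviations = {}
    n_terms = len(population[0])
    for t in range(n_terms):
        column = [query[t] for query in population]
        majority, _ = Counter(column).most_common(1)[0]
        for q, weight in enumerate(column):
            if weight != majority:
                deviations.setdefault(q, []).append(t)
    return deviations

def has_converged(population):
    """True when every query carries the majority weight at every term."""
    return not outlier_positions(population)

# Hypothetical final generation: query 2 departs from the rest at term 1.
final_generation = [
    [0.5, 0.2, 0.9],
    [0.5, 0.2, 0.9],
    [0.5, 0.7, 0.9],
    [0.5, 0.2, 0.9],
]
deviations = outlier_positions(final_generation)
```

A population with an empty `deviations` mapping has collapsed to a single query. A non-empty mapping picks out individuals that, like query 7 above, may still be searching a different part of the collection.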
The example on topic 24 also shows, as in previous cases, that new relevant
documents are brought to the user in each generation, as indicated by the `*` following the
document numbers.
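The asterisk convention can be reproduced with simple bookkeeping: a document is starred the first time it appears in any generation. The sketch below assumes each generation is given as a list of the relevant document numbers retrieved by the whole population; the function name and the document numbers are hypothetical.

```python
def mark_new_relevant(generations):
    """Label each relevant document with '*' if no earlier generation retrieved it."""
    seen = set()
    marked = []
    for docs in generations:
        labels = [f"{doc}*" if doc not in seen else str(doc) for doc in docs]
        marked.append(labels)
        seen.update(docs)  # only later generations treat these as already seen
    return marked

# Hypothetical relevant document numbers for three successive generations.
history = [[101, 205], [205, 317], [101, 317, 422]]
marked = mark_new_relevant(history)
```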
Table 13. Term weights of query individuals of topic 24, the first generation (generation 0)
Query  t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 t20 t21
0      .18 .31 .53 .95 .17 .70 .23 .49 .12 .08 .39 .28 .37 .98 .54 .77 .65 .77 .78 .82 .15
1      .63 .31 .35 .92 .52 .40 .61 .79 .93 .87 .87 .67 .76 .58 .39 .36 .20 .83 .42 .46 .98
2      .13 .21 .96 .74 .41 .78 .76 .96 .03 .32 .76 .24 .59 .04 .96 .32 .06 .44 .92 .57 .12
3      .57 .25 .50 .24 .48 .41 .87 .43 .36 .38 .04 .16 .52 .70 .10 .40 .77 .24 .34 .23 .30
4      .30 .89 .04 .65 .40 .68 .73 .94 .23 .84 .97 .78 .43 .67 .81 .16 .28 .14 .86 .75 .21
5      .14 .29 .80 .22 .56 .72 .20 .99 .25 .43 .76 .86 .89 .98 .40 .43 .13 .46 .24 .99 .65
6      .60 .24 .45 .79 .08 .48 .15 .25 .94 .61 .99 .48 .80 .74 .38 .48 .53 .10 .59 .35 .14
7      .78 .71 .45 .70 .10 .96 .55 .74 .58 .64 .78 .19 .30 .28 .68 .29 .57 .42 .31 .44 .57
8      .49 .61 .42 .13 .26 .04 .98 .11 .38 .65 .35 .55 .36 .57 .48 .16 .62 .17 .55 .29 .87
9      .84 .84 .90 .59 .54 .17 .65 .69 .26 .11 .81 .19 .42 .35 .84 .14 .26 .18 .48 .38 .50