SP500207 NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1) Query Improvement in INformation Retrieval Using Genetic Algorithms - A Report on the Experiments of the TREC Project chapter J. Yang R. Korfhage E. Rasmussen National Institute of Standards and Technology Donna K. Harman (3) The genetic process: After the evaluation process genetic operators are used to modify the structures of the query variants, with the aim of improving the retrieval effectiveness of the variants. (a) Selection: In this step those query individuals whose performances are greater than the average performance of all the query variants are selected and retained for further processing. Other query individuals are discarded. (b) Reproduction: In this process the survivors from the previous step are reproduced. In our algorithm, the reproduction algorithm developed by Baker (1987) is used. The number of offspring for each individual depends on the ratio of its performance to the total performance of all the survivors and the population size. Therefore, the individual survivor with the highest performance will have the most offspring. Also, in this step, the query variants are rearranged randomly in a sequence for later processing. We call the set of query variants after reproduction the semi-new generation. (c) Mutation: In the mutation step, some individuals are selected at random from the semi-new generation, and some of their genes (i.e., the term weights) are selected and assigned new values, also in the range [0,1]. The number of individuals to be selected depends on the mutation ratio chosen in the experimentation. The selection of genes and the new value assignments are also done randomly. (d) Crossover: The crossover process mates pairs of query individuals in the semi- new generation and exchanges parts of their term weights. Here, the two-point crossover method is used. For each pair, two positions in the term vectors are selected randomly, and all term weights between the two positions are exchanged for the two individuals. Thus, two new individuals are born. However, it may be the case that not all individuals are chosen to mate with others. The crossover ratio controls the number of mates. After the crossover operation, a totally new generation is created and the process goes back to step 2 and continues until a preset condition is satisfied, either a fixed number of generations has been processed or no more relevant documents are retrieved. The genetic retrieval algorithm is distinguished from other research using relevance feedback: (1) Several query individual variants exist and explore the document space simultaneously. (2) To modify the term weights, it is not required to calculate and analyze the document term weights, at least in the current study. Query term weight modification is done by the genetic mechanisms through the interchange of term weights between query individuals and the introduction of new weights by the mutation operation. (3) Only term weights are modified and no new terms are introduced and no terms are deleted. The importance of a term is shown by the value of its weight. 35