NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Query Improvement in Information Retrieval Using Genetic Algorithms - A Report on the Experiments of the TREC Project
J. Yang, R. Korfhage, E. Rasmussen
National Institute of Standards and Technology, Donna K. Harman

The assignment of term weights to the ad hoc queries is different. Only one query vector (called query individual one) is generated for each ad hoc query. The term weights in query individual one were assigned either by using the weights of the same terms in the final generation of the training topics, if they existed, or by the researchers, who referred to the weights of related terms in the final generation of the training topics or to the terms' importance for the topic. Moreover, for the ad hoc queries, a term weight in query individual one could be assigned different values depending on the database against which the query is run. Based on query individual one, another nine query individuals were generated, with the weights of each term in the nine individuals normally distributed around the corresponding term weights in query individual one.

One interesting question in query processing is how to handle the situation where a concept with a NOT requirement appears in a user's request (i.e., in the TREC topics). This is hard to handle in some retrieval systems, and the TREC topics contain several cases in which NOT appears in the concepts and the narrative items. Our solution to this problem is to assign a negative weight to a keyword if it is described by a NOT concept or narrative item. Since our system is based on a distance measure, a negative weight causes documents that include the keyword to lie at a greater distance from the query than those without it.

5. System Configuration

Some features of our system in the TREC project are described in this section.
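The negative-weight treatment of NOT concepts described above can be illustrated with a small sketch. The term names, the binary document vectors, and the use of Euclidean distance are illustrative assumptions, not the system's actual data or code; the point is only that a negatively weighted keyword pushes documents containing it farther from the query.

```python
import math

def distance(query, doc_terms):
    # Euclidean distance between a weighted query vector and a
    # binary document-term vector (simplified, hypothetical form).
    return math.sqrt(sum((w - (1.0 if t in doc_terms else 0.0)) ** 2
                         for t, w in query.items()))

# Hypothetical query with a NOT concept: "fusion" must NOT appear,
# so it carries a negative weight.
query = {"superconductivity": 0.9, "ceramic": 0.6, "fusion": -0.8}

doc_with_not = {"superconductivity", "ceramic", "fusion"}
doc_without = {"superconductivity", "ceramic"}

# A document containing the NOT term lies farther from the query,
# so it ranks lower under a distance-based retrieval criterion.
assert distance(query, doc_with_not) > distance(query, doc_without)
```

Under this measure the NOT keyword contributes (w - 1)^2 when present and only w^2 when absent, so a negative w penalizes its presence.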
(1) Window size for document retrieval

In much feedback retrieval research, a fixed window size determines the number of documents to be retrieved; the value is usually set between 5 and 20 (e.g., Salton, 1971). Aalbersberg (1992) used a window size of one in the feedback retrieval process and showed that the precision values for four standard databases are, on average, higher than those achieved with a size greater than one. However, Aalbersberg also suggested that variable relevance feedback (Ide, 1971) is worth implementing in term-weighting IR systems.

In our original experiments on small databases, a threshold value was generated for each query at the beginning of an experiment as the criterion for determining the retrieval window. If the distance between a document and a query individual was less than the threshold value, the document was viewed as relevant to the query and was retrieved. Thus we had a variable window size. In dealing with the large databases, however, we added another factor to decide the window size. As in the previous method, a threshold value is generated first as each original query is read into the system. Documents whose distance is less than the threshold are regarded as relevant to the query and are printed for evaluation. However, if more than forty documents satisfy this condition, only the top forty are retrieved for evaluation; ties at the cutoff are broken randomly. This number seems reasonable given the time constraints and focus of the study.
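The variable-window rule above (threshold test, a cap of forty, random tie-breaking at the cutoff) can be sketched as follows. The function name, the `distances` mapping from document id to distance, and the seeded random generator are assumptions for illustration, not the system's actual interface.

```python
import random

WINDOW_CAP = 40  # at most forty documents are returned for evaluation

def retrieve(distances, threshold, rng=random.Random(0)):
    # Variable-window retrieval: keep documents whose distance to the
    # query falls below the threshold; if more than WINDOW_CAP qualify,
    # keep the closest forty, choosing randomly among documents tied
    # at the cutoff distance.
    qualifying = [(d, doc) for doc, d in distances.items() if d < threshold]
    if len(qualifying) <= WINDOW_CAP:
        return [doc for _, doc in sorted(qualifying)]
    qualifying.sort(key=lambda pair: pair[0])
    cutoff = qualifying[WINDOW_CAP - 1][0]
    below = [doc for d, doc in qualifying if d < cutoff]
    tied = [doc for d, doc in qualifying if d == cutoff]
    rng.shuffle(tied)  # random selection among the tied documents
    return below + tied[: WINDOW_CAP - len(below)]
```

With fewer than forty qualifying documents the window size varies with the threshold alone; only when the threshold admits more than forty does the cap take effect.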