NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Query Improvement in Information Retrieval Using Genetic Algorithms - A Report on the Experiments of the TREC Project
J. Yang, R. Korfhage, E. Rasmussen
National Institute of Standards and Technology
Donna K. Harman

9. Failure Analysis

When we compare our results with those provided by NIST, the precision values at eleven recall points indicate that our system performs better than the median level on about half of the topics for the ad hoc queries (Figure 1). For several queries (54, 60, 79, 81, 84, 91, and 100) we sent only a few documents (fewer than 10), and for queries 59, 61, 68, 80, and 99 we sent no documents, because the threshold values limited the number of documents retrieved. The precision values for those queries are therefore zero or very low. The same situation occurred for some routing queries.

The precision values for the routing queries in our system are lower than the median level in most cases (Figure 2). Besides the threshold inhibition mentioned above, we observed that, because the queries had converged by the final generation for the training topic, most query variants retrieved the same documents. Some query individuals in the intermediate generations, which retrieved relevant documents different from those of the last generation, may not have survived. We think this is why fewer relevant documents were retrieved for the routing queries.

Problems arose with specific queries due to our pre-processing of the documents. Several circumstances were not considered in designing our system. For example, some special keywords, such as "AT&T" and "M" (used in some documents to represent million), were not processed but were significant in some topics. Another factor is that in the AP, WSJ and ZIFF databases a single document could consist of more than one text, each with a different topic.
Since we did not separate them, a keyword match could cause a document to be retrieved because keywords from different text parts matched the query, even though the document itself was not relevant.
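
The cross-text matching problem described above can be illustrated with a minimal sketch. It is not the matching function used in our system: the conjunctive match ("every query keyword must occur"), the tokenizer, the separator string, and the sample texts are all assumptions chosen only to make the effect visible.

```python
import re

def tokens(text):
    """Return the set of lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches(query_terms, text):
    """True if every query term occurs in the text (assumed conjunctive match)."""
    return set(query_terms) <= tokens(text)

# Hypothetical document made of two unrelated texts, joined by a made-up separator.
SEPARATOR = "\n---\n"
document = SEPARATOR.join([
    "The satellite launch was delayed by bad weather.",
    "The bank signed a new contract with its auditors.",
])

# Hypothetical query about satellite contracts.
query = ["satellite", "contract"]

# Matching against the unsplit document: the keywords come from different texts.
whole_document_hit = matches(query, document)

# Matching against each text separately: neither text contains both keywords.
per_text_hits = [matches(query, part) for part in document.split(SEPARATOR)]
```

Under these assumptions the unsplit document matches the query although neither of its component texts does, which is the kind of false retrieval that separating the texts before indexing would avoid.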