SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Query Improvement in Information Retrieval Using Genetic Algorithms - A Report on the Experiments of the TREC Project
chapter
J. Yang
R. Korfhage
E. Rasmussen
National Institute of Standards and Technology
Donna K. Harman
9. Failure Analysis
Comparing our results with those provided by NIST, the precision values at eleven
recall points indicate that our system performs better than the median level on about half of
the topics for the ad hoc queries (Figure 1). In several queries (54, 60, 79, 81, 84, 91, and
100) we sent only a few documents (fewer than 10), and for queries 59, 61, 68, 80 and 99 we
sent no documents, because the threshold values limited the number of documents retrieved.
Thus the precision values for those queries are zero or very low. The same situation
occurred for some routing queries.
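The effect of a strict cutoff on the eleven-point precision figures can be illustrated with a minimal sketch. It assumes the threshold is a cutoff on a per-document similarity score, which the text does not specify; the function names, scores, and document identifiers are hypothetical and are not taken from the system described here.

```python
def retrieve(scores, threshold):
    """Return ids of documents whose score meets the threshold, best first."""
    hits = [(doc, s) for doc, s in scores.items() if s >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in hits]


def eleven_point_precision(ranked, relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    points = []  # (recall, precision) observed at each relevant hit
    found = 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            found += 1
            points.append((found / len(relevant), found / rank))
    result = []
    for i in range(11):
        level = i / 10
        candidates = [p for r, p in points if r >= level]
        # With nothing retrieved at or beyond this recall level, precision is 0.
        result.append(max(candidates) if candidates else 0.0)
    return result


# Hypothetical scores and relevance judgments for one topic.
scores = {"d1": 0.9, "d2": 0.6, "d3": 0.4, "d4": 0.2}
relevant = {"d1", "d3"}

loose = eleven_point_precision(retrieve(scores, 0.3), relevant)
strict = eleven_point_precision(retrieve(scores, 0.95), relevant)
print(loose)
print(strict)
```

With the stricter cutoff no document is submitted, so the precision at every recall point is zero regardless of how good the underlying ranking was.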
The precision values for the routing queries in our system are lower than the median
level in most cases (Figure 2). Besides the threshold inhibition mentioned above, we observed
that, because the queries converged in the final generation of the training topic, most query
variants retrieved the same documents. Some individual queries in the intermediate
generations that retrieved relevant documents different from those of the last generation may
not have survived. We think this is why fewer relevant documents were retrieved for the
routing queries.
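One way to check this explanation is to compare the relevant documents retrieved by the final generation with those retrieved by any earlier generation. The following is a minimal sketch under stated assumptions: each generation is represented as a list of retrieved-document sets, one per query variant, and all names and data are hypothetical rather than taken from the experiments.

```python
def relevant_retrieved(population, relevant):
    """Union of relevant documents retrieved by any query variant in a generation."""
    found = set()
    for retrieved in population:
        found |= set(retrieved) & relevant
    return found


def lost_by_convergence(generations, relevant):
    """Relevant documents found by an earlier generation but not by the last one."""
    final = relevant_retrieved(generations[-1], relevant)
    earlier = set()
    for population in generations[:-1]:
        earlier |= relevant_retrieved(population, relevant)
    return earlier - final


# Hypothetical run: variants start out diverse and converge by the last generation.
generations = [
    [{"d1", "d2"}, {"d3", "d7"}, {"d4"}],
    [{"d1", "d2"}, {"d1", "d3"}, {"d1", "d5"}],
    [{"d1", "d2"}, {"d1", "d2"}, {"d1", "d2"}],
]
relevant = {"d1", "d3", "d4", "d5"}

lost = lost_by_convergence(generations, relevant)
print(sorted(lost))
```

A non-empty result would suggest that keeping some intermediate variants, instead of only the converged final population, could recover relevant documents for routing.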
Problems arose with specific queries due to our pre-processing of the documents.
Several circumstances were not considered in designing our system. For example, some
special keywords, such as AT&T and M (used in some documents to represent million),
were not processed, although they were significant in some topics. Another factor is that in
the AP, WSJ and ZIFF databases a single document could comprise more than one text, each
on a different topic. Since we did not separate them, the keyword match could cause a
document to be retrieved because keywords from different text parts matched the query,
even though the document itself was not relevant.
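Both pre-processing problems can be illustrated with a short sketch. The token pattern, the all-terms matching rule, and the sample texts are assumptions made for illustration only; the actual pre-processing and matching used in the system are not described at this level of detail.

```python
import re

# Assumed token pattern: an internal '&' is kept so that "AT&T" stays one token,
# and single-letter tokens such as "M" are not discarded.
TOKEN = re.compile(r"[A-Za-z0-9]+(?:&[A-Za-z0-9]+)*")


def tokenize(text):
    """Return the set of lower-cased tokens in a text."""
    return {t.lower() for t in TOKEN.findall(text)}


def matches_whole(document_parts, query_terms):
    """Match against the document as one unit, ignoring part boundaries."""
    tokens = set()
    for part in document_parts:
        tokens |= tokenize(part)
    return query_terms <= tokens


def matches_some_part(document_parts, query_terms):
    """Match only if a single text part contains every query term."""
    return any(query_terms <= tokenize(part) for part in document_parts)


# Hypothetical document made of two unrelated texts.
doc = [
    "AT&T reported earnings of 300 M for the quarter.",
    "A new satellite launch was delayed by weather.",
]
query = {"at&t", "satellite"}

print(matches_whole(doc, query))      # terms come from different parts
print(matches_some_part(doc, query))  # no single part contains both terms
```

Matching on the unseparated document retrieves it even though neither text is about both terms, whereas matching per part does not.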