[Figure 3: Program to Automatically Generate Query Log Files]

The modules in the program are as follows:

Parse Topic - Reads through the topic looking for the SGML codes (such as <description>). The locations within the topic of all words in the query are preserved in the final query log files.

Tokenize - Divides strings up into tokens.

Morphology - Locates all words in the dictionary and reduces them to root words where possible.

Idiom Processing - Collects idioms together as single terms, such as "United States."

Remove Stop Words - Removes conjunctions, determiners, auxiliary verbs, prepositions, etc.

Remove Function Words - Removes words such as "document," "relevant," and "retrieve," which are used often in TREC-2 narratives but do not help retrieval.

Expand Word Meanings - Expands all word meanings using the ConQuest semantic network and adds all expansions to the query.

Note that all of these steps occur automatically with no manual input. The program also generates other statistics, such as the count of each term in the query, a count of each term within each section of the query (the sections being the topic, description, narrative, concepts, and factors), and the total number of words in the query.

Manual Query Generation Steps

There were two manual steps used to generate queries:

1. Remove words, word meanings, and/or expansions
2. Set term weights (if necessary)

Fortunately, ConQuest has graphical user interfaces (GUIs) for removing words, word meanings, and expansions from the automatically generated queries. A user merely brings up the query and uses the mouse to select items to be deleted.

In TREC-2, terms were not weighted in the traditional sense, but rather were categorized into three sets:

1. Terms that embody the entire query, which would make good search terms if used by themselves
2. Terms which embody a necessary portion of the query, but not the entire concept
3. All other related terms

These categories provide simple guidelines for setting term weights, which makes it much easier to generate queries. Evaluations using the TREC-2 test topics determined the functions that map the categories to actual term weights. To emphasize once more, no document feedback was used for these manual steps. All query adjustments were performed without executing any query. Only after all queries were generated were the final results produced.

The TREC-2 Results

ConQuest scored very well in TREC-2. In particular, our recall percentages were quite high. Our average precision scores were not as good, but still competitive.

ConQuest submitted two sets of results for TREC-2, CnQst1 and CnQst2. Both sets used the same coarse-grain algorithm, which retrieved the best 5000 documents from the database. The difference between the two results was how these 5000 documents were sorted to derive the top 1000 documents used for the official results.

The first set (CnQst1) used fine-grain as the only sorting algorithm. This algorithm depends primarily on local proximity information, although word statistics and query structure are also incorporated.
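The paper does not spell out the fine-grain formula, so the following is only a minimal Python sketch of local proximity scoring of the kind described. The function name, the window size, and the distinct-terms-over-span weighting are illustrative assumptions, not ConQuest's actual algorithm (which also folds in word statistics and query structure).

```python
from collections import defaultdict

def fine_grain_score(doc_tokens, query_terms, max_window=50):
    """Illustrative proximity score: reward spans of the document
    where many distinct query terms cluster close together.
    A sketch only, not ConQuest's actual fine-grain algorithm."""
    # Record the positions of each query term in the document.
    positions = defaultdict(list)
    for i, tok in enumerate(doc_tokens):
        if tok in query_terms:
            positions[tok].append(i)

    # Flatten to a position-sorted list of (position, term) hits.
    hits = sorted((i, t) for t, idxs in positions.items() for i in idxs)

    best = 0.0
    # Slide over the hits: a window scores higher when it packs more
    # distinct query terms into a shorter span of the document.
    for lo in range(len(hits)):
        seen = set()
        for hi in range(lo, len(hits)):
            if hits[hi][0] - hits[lo][0] > max_window:
                break
            seen.add(hits[hi][1])
            span = hits[hi][0] - hits[lo][0] + 1
            best = max(best, len(seen) ** 2 / span)
    return best
```

The property this sketch shares with the description above is that a document where the query terms appear tightly clustered outscores a document where the same terms are scattered.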
The second set of results (CnQst2) was sorted by a weighted average of the fine-grain and coarse-grain statistics for each document. As it turned out, this combination of local (fine-grain) and global (coarse-grain) statistics provided significantly better results. The relatively modest addition of global information improved the results more than expected; previous experience had always indicated that fine-grain information, especially the proximity test, was the strongest contributor to document relevancy.

Some additional insights can be extracted from the topic analyses presented at the TREC-2 conference. Specifically, the topics on which ConQuest excelled over other systems were also those which tended to have fewer relevant documents in the database. This indicates that local proximity statistics (used by ConQuest) are more important for these queries, since most other systems in TREC-2 are heavily weighted towards global document statistics. In other words, ConQuest appears to perform better on queries where one needs to find the "needle in the haystack."

Post-TREC Analysis

After TREC-2, we had the chance to clean up our initial tests, gather new statistics, and perform some additional analysis. The first step in this process was to prove the accuracy of the coarse-grain algorithm. Remember that the initial tests attempted to improve the coarse-grain algorithm. But did the coarse-grain algorithm really need improvement? One indication that coarse-grain was accurate was provided by the CnQst2 run, which performed better than expected.
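To make the CnQst2-style two-stage ranking concrete, here is a minimal Python sketch of re-sorting a coarse-grain candidate list by a weighted average of local and global scores. The min-max normalization and the mixing weight alpha are assumptions for illustration; the paper does not report how the weighted average was actually computed.

```python
def rerank(candidates, alpha=0.7, top_k=1000):
    """Re-sort coarse-grain candidates by a weighted average of
    local (fine-grain) and global (coarse-grain) scores.

    candidates: list of (doc_id, coarse_score, fine_score) triples,
    e.g. the best 5000 documents from the coarse-grain pass.
    alpha: assumed mixing weight favoring the fine-grain score."""
    def norm(values):
        # Min-max normalize so the two score scales are comparable.
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

    coarse = norm([c for _, c, _ in candidates])
    fine = norm([f for _, _, f in candidates])
    combined = sorted(
        ((alpha * f + (1 - alpha) * c, doc_id)
         for (doc_id, _, _), c, f in zip(candidates, coarse, fine)),
        reverse=True,
    )
    return [doc_id for _, doc_id in combined[:top_k]]
```

With alpha below 1.0, the global coarse-grain score acts as the "relatively modest addition" described above: it nudges the ordering produced by the dominant fine-grain score rather than replacing it.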