Knowledge-Based Searching with TOPIC
J. Lehman and C. Reid

(From: The Second Text REtrieval Conference (TREC-2), NIST Special Publication 500-215, D. K. Harman, ed., National Institute of Standards and Technology.)

3.4.1 ROUTING TOPICS

Overall, Topic's performance on the routing topics was rather good. We count that 21 of the 50 results were at or above the median, and three were actually the best score. Most of the other results were on the low side of the median. The comparison to the median is summarized in Figure 2. The exceptions were topics 66, 67, 69, 74, 90 and 91, for which the Topic search used could be said to have failed. Several of these failures are straightforwardly explained. For example, in the case of topic 67 the wrong results were submitted; our independent scoring of the correct result set would give the Topic search a below-median score. For topic 69 there was in fact only one relevant document and, at least in our reading of the topic definition, even that document seems to be a false positive. In the case of topics 90 and 91 the Topic search definitions were, in our opinion, over-constrained. Further, in the case of topic 91 an index creation decision prevented a quite reasonable Topic definition from performing as well as it could.[2] The other two topics are of more interest. No clear pattern emerged across the types of search, although, in the routing augmentation category, the Topic performance was well above the median on 20 of 33 searches.

[2] This topic is about the acquisition of advanced weapons by the U.S. Army. One of the weapons systems mentioned in the information need statement is the M-1 tank. This was included in the Topic definition as the word "M-1"; but since the "-" symbol was interpreted as white space at database build time, there was no possibility of retrieving documents based on "M-1" as a word.

3.4.1.1 ROUTING TOPIC 66

A relevant document for this topic is one that identifies a type of natural language processing technology that is being developed or marketed in the United States. The original definition of the Topic is basically a conjunction (AND) of a natural language concept and a products/technology concept. Performance was very poor, viz:

    Relevant = 86
    Rel_ret = 1
    R-Precision = 0.0000

Inspection of the Topic revealed that one of the conjuncts (the products/technology concept) had a weight of 0.05, thus effectively limiting the scores that Topic could produce to an extremely narrow range. We changed the 0.05 to 0.5 and produced the following:

    Relevant = 86
    Rel_ret = 44
    R-Precision = 0.2442

which is a median result. We concluded that for Topics to be effective we need to ensure a sufficient range of scores to give us the discrimination needed by the TREC scoring algorithm.
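To make the weighting problem concrete, the following is a minimal sketch in Python. It assumes, purely for illustration, a weighted-AND rule in which a node scores as the minimum of weight times child score over its children; this is our own stand-in, not TOPIC's documented scoring rule, but it reproduces the score-compression effect described above.

    # Illustrative weighted-AND model (an assumption, not TOPIC's
    # documented scoring rule): a node scores as the minimum of
    # weight * child_score, with child scores in [0, 1].

    def weighted_and(children):
        """children: iterable of (weight, score) pairs."""
        return min(w * s for w, s in children)

    # Hypothetical child scores for (natural-language concept,
    # products/technology concept) in three documents.
    docs = [(0.9, 0.9), (0.9, 0.2), (0.3, 0.9)]

    # With the products/technology conjunct weighted at 0.05, every
    # document score is capped at 0.05 -- almost no discrimination.
    print([round(weighted_and([(1.0, nl), (0.05, pt)]), 3) for nl, pt in docs])
    # -> [0.045, 0.01, 0.045]

    # Raising the weight to 0.5 restores a usable score range.
    print([round(weighted_and([(1.0, nl), (0.5, pt)]), 3) for nl, pt in docs])
    # -> [0.45, 0.1, 0.3]

Under this model, any conjunct weighted at 0.05 caps every document's score at 0.05, so relevant and non-relevant documents end up nearly indistinguishable in the ranking.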
3.4.1.2 ROUTING TOPIC 74

A relevant document for this topic is one that cites an instance in which the U.S. Government propounds two conflicting or opposing policies. The routing task is complicated because the two conflicting policies may not necessarily be mentioned in the same document. In our opinion, this is a case where no amount of sophistication in Topic construction would enable Topic to do very well. The information need is simply outside the scope of a retrieval system that uses non-NLP techniques. The best one could hope for is to model a document that talks about the meta-idea of conflict (i.e., find documents that talk about the U.S. having conflicting policies, rather than documents that reference the specific conflicting policies).

This is, in fact, what was done in the original submission. The results were:

    Relevant = 323
    Rel_ret = 18
    R-Precision = 0.0464

which is, of course, rather poor. The original statement of need actually mentions three examples of conflicting policies, so, as an experiment, we ran the following query:

    * <Many><Stem> /wordtext = "tobacco"
    * <Many><Stem> /wordtext = "pesticide"
    * <Many><Phrase>
        * <Many><Stem> /wordtext = "infant"
        * <Many><Stem> /wordtext = "formula"

that is, just an ACCRUE of "tobacco", "pesticide", and "infant formula" (with the modifications that the <Stem> and <Many> operators produce). This gave the following results:

    Relevant = 323
    Rel_ret = 107
    R-Precision = 0.2660

which puts the score slightly above the median. We expect that most TREC-2 participant sites probably did just this, and those that did much better than the median found some other specific examples of a conflicting policy and modeled these in their routing queries.
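As a concrete rendering of the query above, here is a minimal Python sketch. The stem, many, and accrue functions are our own illustrative stand-ins for the <Stem>, <Many>, and ACCRUE operators (we use the common accrual formula 1 - prod(1 - s_i)); they are assumptions made for illustration, not Verity TOPIC's actual operator semantics.

    # Illustrative sketch of the ACCRUE query above. The operator
    # semantics below are assumptions, not TOPIC's implementation.
    import re

    def stem(word):
        # crude stand-in for <Stem>: strip a trailing "s"
        return word[:-1] if word.endswith("s") else word

    def many(count):
        # stand-in for <Many>: saturating credit for repeated hits
        return min(1.0, count / 3.0)

    def accrue(scores):
        # common accrual rule: 1 - prod(1 - s_i); each matching
        # child raises the score, extra matches raise it further
        result = 1.0
        for s in scores:
            result *= 1.0 - s
        return 1.0 - result

    def score(doc):
        words = [stem(w) for w in re.findall(r"[a-z]+", doc.lower())]
        text = " ".join(words)
        return accrue([
            many(words.count("tobacco")),
            many(words.count("pesticide")),
            many(text.count("infant formula")),  # the <Phrase> node
        ])

    print(round(score("Exports of banned pesticides continue while "
                      "infant formula marketing is restricted."), 3))
    # -> 0.556  (two of the three accrued terms match)

Unlike the AND of topic 66, an accrual query rewards any one of the example policies appearing in a document and rewards additional matches further, which is why even this very simple three-term query reached an above-median score.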