NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
On Expanding Query Vectors with Lexically Related Words
E. Voorhees, National Institute of Standards and Technology
D. K. Harman

3 Experiments

The training data for the routing queries was used both to refine the synsets that were included in the topic text and to select the types of relations used to expand the query vectors. In some cases a synset that appears to be a logical choice for a query is nonetheless detrimental. For example, adding the synset for death to topic 59 (weather fatalities) causes the query to retrieve far too many articles reporting on deaths that have no relation to the weather. I produced five different versions of synset-annotated topic texts, although the differences between versions are not very large. The version used in the official routing run added an average of 2.9 synsets to a topic statement, with a minimum of 0 synsets added and a maximum of 6 synsets added.

Of course, the utility of a synset depends in part on how that synset is expanded and on the relative weights given to the different link types (the α's in the similarity function above). Table 1 lists the various combinations that were evaluated using the training data. Four different expansion strategies were tried: expansion by synonyms only; expansion by synonyms plus all descendants in the is-a hierarchy; expansion by synonyms plus parents and all descendants in the is-a hierarchy; and expansion by synonyms plus any synset directly related to the given synset (i.e., a chain of length 1 for all link types). Different α values were also investigated. Assuming original query terms are more important than added terms, the α for the original-terms subvector was set to one and the α for the other subvectors was varied between zero and one as shown in Table 1.
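The four expansion strategies can be illustrated with a minimal sketch. The synset graph, link-type names, and words below are invented for illustration and are not the paper's data or code; only the four strategy definitions follow the text above.

```python
# Toy sketch of the four expansion strategies, over a hand-built
# miniature synset graph (synsets and links are invented examples).
from collections import deque

# each synset: its synonym words, plus typed links to other synset ids
SYNSETS = {
    "fatality": {"words": ["fatality", "death"],
                 "links": {"hypernym": ["event"], "hyponym": ["drowning"]}},
    "event":    {"words": ["event", "occurrence"],
                 "links": {"hyponym": ["fatality"]}},
    "drowning": {"words": ["drowning"],
                 "links": {"hypernym": ["fatality"]}},
}

def expand(seed, strategy):
    """Return the set of words a seed synset contributes under a strategy."""
    words = set(SYNSETS[seed]["words"])        # synonyms are always included
    if strategy == "synonyms":
        return words
    if strategy == "descendants":              # synonyms + all is-a descendants
        frontier = deque([seed])
        while frontier:
            s = frontier.popleft()
            for child in SYNSETS[s]["links"].get("hyponym", []):
                words.update(SYNSETS[child]["words"])
                frontier.append(child)
        return words
    if strategy == "parents+descendants":      # also immediate is-a parents
        words |= expand(seed, "descendants")
        for parent in SYNSETS[seed]["links"].get("hypernym", []):
            words.update(SYNSETS[parent]["words"])
        return words
    if strategy == "all-links-1":              # chain of length 1, any link type
        for targets in SYNSETS[seed]["links"].values():
            for t in targets:
                words.update(SYNSETS[t]["words"])
        return words
    raise ValueError(strategy)
```

Under this toy graph, "synonyms" yields only {fatality, death}, while "all-links-1" pulls in the words of every directly linked synset regardless of link type, matching the strategy that proved most effective below.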
The most effective run was the one that expanded a query synset by any synset directly related to it and used α = .5 for all added subvectors. Therefore, this strategy was used to produce the official routing queries from the final version of the annotated text. The scheme added an average of 24.7 words to a query vector (minimum 0, maximum 70), and an average of 20.2 (0, 66) words that were not part of the original text. The average number of relevant documents retrieved at rank 100 for this run is 40.7 and at rank 1000 is 133.3; the mean "average precision" is .2984. In general, the individual query results are at or slightly above the median, with a few queries significantly below the median. Of more interest to this study is how the expanded queries compare to unexpanded queries. A plot of average recall versus average precision for these two runs is given in Figure 3. As can be seen, the effectiveness of the two query sets is very similar.

Since there was no way to evaluate the relative effectiveness of different expansion schemes for the ad hoc queries, the same expansion scheme as was used for the official routing run (chains of length one for any relation type and all α's = .5) was used for the ad hoc run. Furthermore, there could be no refining of which synsets to add, so only one version of synset-annotated text was produced. An average of 2.7 (minimum 0, maximum 6) synsets was added to an ad hoc topic text. The expansion process added an average of 17.2 (0, 66) terms and 12.8 (0, 55) terms that are not part of the original text.

Siemens actually submitted two ad hoc runs. The first was the expanded queries with the α's set to 0, a run that is equivalent to no expansion and is used as a base case. The second Siemens ad hoc run used the .5 α values. A plot of the effectiveness of the two ad hoc runs is given in Figure 4.
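The role of the α weights, and why setting the added subvectors' α's to 0 reproduces the unexpanded run, can be sketched as a per-subvector similarity of the form sim(q, d) = Σ α_i · cos(q_i, d). The vectors, vocabulary, and weights below are invented for illustration; this is an assumed shape for the similarity function, not the paper's actual implementation.

```python
# Illustrative subvector similarity: each query subvector (original
# terms, added synonyms, ...) is compared to the document separately,
# and the per-subvector scores are combined with the alpha weights.
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def sim(subvectors, alphas, doc):
    """sim(q, d) = sum over subvectors i of alpha_i * cos(q_i, d)."""
    return sum(alphas[name] * cosine(vec, doc)
               for name, vec in subvectors.items())

# toy query with an original-terms subvector and an added-synonyms subvector
q = {"original": {"weather": 1.0, "fatality": 1.0},
     "synonyms": {"death": 1.0, "casualty": 1.0}}
doc = {"storm": 1.0, "death": 1.0, "fatality": 1.0}

# alpha = 0 for the added subvector reproduces the unexpanded score
unexpanded = sim(q, {"original": 1.0, "synonyms": 0.0}, doc)
expanded = sim(q, {"original": 1.0, "synonyms": 0.5}, doc)
```

With the added subvector's α at 0 its contribution vanishes, which is exactly why the first Siemens ad hoc run serves as a no-expansion base case.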
The differences in effectiveness between unexpanded and expanded queries are even smaller for the ad hoc queries than for the routing queries. The average number of relevant documents retrieved at rank 100 is 46.9 for both the unexpanded and expanded queries. The average number of relevant documents retrieved at rank 1000 is 161.4 for the unexpanded queries and 161.3 for the expanded queries. The mean "average precision" is .3408 and .3397, respectively.

A possible explanation for the little difference made by expanding the queries is that the expansion parameters used were too conservative. To test this hypothesis, additional runs were made using the same set of synsets but allowing longer chains of links and/or using greater relative link weights (the α's). Table 2 lists the additional combinations tested, using both the ad hoc queries against the documents on disks one and two, and the routing queries against the documents on disk 3. As was the case for the routing training runs, the strategy used for the official TREC-2 runs (all links of length one, α's = .5) was the most effective expansion strategy. The more aggressive expansion strategies did make larger differences in retrieval effectiveness compared to the unexpanded queries, but across the set of queries the aggregate difference was negative. Hence it is unlikely that the conservative expansion strategy is the reason for the lack of improvement.

4 Conclusion

The experimental evidence clearly shows that this query expansion technique provides little benefit in the TREC environment. The most likely reason is the completeness of the TREC topic descriptions. Query expansion is a recall-enhancing technique, and TREC topic descriptions are already large compared to queries found in traditional IR collections.
Although most of the expanded queries did have some new terms added to them, the most important terms frequently appeared in both the original term set and the set of expanded terms. This had an effect on the relative weight of those terms in the overall similarity