NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
D. K. Harman, National Institute of Standards and Technology

On Expanding Query Vectors with Lexically Related Words

Ellen M. Voorhees
Siemens Corporate Research, Inc.
755 College Road East
Princeton, NJ 08540
ellen@learning.scr.siemens.com

Abstract

Experiments performed on small collections suggest that expanding query vectors with words that are lexically related to the original query words can improve retrieval effectiveness. Prior experiments using WordNet to automatically expand vectors in the large TREC-1 collection were inconclusive regarding effectiveness gains from lexically related words, since any such effects were dominated by the choice of words to expand. This paper specifically investigates the effect of expansion by selecting query concepts to be expanded by hand. Concepts are represented by WordNet synonym sets and are expanded by following the typed links included in WordNet. Experimental results suggest that this query expansion technique makes little difference in retrieval effectiveness within the TREC environment, presumably because the TREC topic statements provide such a rich description of the information being sought.

1 Introduction

The IR group at Siemens Corporate Research is investigating how concept spaces (data structures that define semantic relationships among ideas) can be used to improve retrieval effectiveness in systems designed to satisfy large-scale information needs. As part of this research, we expanded document and query vectors automatically using selected synonyms of original text words for TREC-1 [5]. The retrieval results indicated that this expansion technique improved the performance of some queries, but degraded the performance of other queries.
We concluded that improving the consistency of the method would require both a better method for determining the important concepts of a text and a better method for determining the correct sense of an ambiguous word.

We took TREC-2 as an opportunity to investigate the effectiveness of vector expansion when good concepts are chosen to be expanded. As in TREC-1, query vectors were expanded using WordNet synonym sets. However, the synonym sets associated with each query were selected manually (by the author). These results therefore represent an upper bound on the effectiveness to be expected from a completely automatic expansion process.

The results of the TREC-2 evaluation indicate that the query expansion procedure used does not significantly affect retrieval performance even when important concepts are identified by hand. Some expanded queries are more effective than their unexpanded counterparts, but for other queries the unexpanded version is more effective. In either case, the effectiveness difference between the two versions is seldom large. Further testing suggests that more extreme expansion procedures can cause larger differences in retrieval performance, but the net effect over a set of queries is degraded performance compared to no expansion at all.

The remainder of the paper discusses the experiments in detail. The next section describes the retrieval environment, including a description of WordNet. Section 3 provides evaluation results for both the official TREC-2 runs and some additional supporting runs. The final section explores the issue of why the expansion fails to improve retrieval performance.

2 The Retrieval Environment

The expansion procedure used in this work relies heavily on the information recorded in WordNet, a manually constructed lexical system developed by George Miller and his colleagues at the Cognitive Science Laboratory at Princeton University [4].
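As a rough illustration of what expanding a query vector with lexically related words can look like, the sketch below adds the synonyms of hand-picked concepts, and the synonyms of concepts reached through selected typed links, to a term vector. The lexicon contents, the names `TOY_LEXICON` and `expand_query`, and the 0.5 weight for added words are all assumptions made for this example; they are not taken from WordNet or from the runs described in this paper.

```python
from collections import Counter

# Hypothetical toy lexicon: each concept has a synonym set and typed links
# to other concepts. The entries are illustrative, not copied from WordNet.
TOY_LEXICON = {
    "car": {
        "synonyms": ["car", "auto", "automobile"],
        "links": {"is-a": ["vehicle"], "part-of": []},
    },
    "vehicle": {
        "synonyms": ["vehicle"],
        "links": {"is-a": [], "part-of": []},
    },
}


def expand_query(terms, concepts, link_types=("is-a",), added_weight=0.5):
    """Return a term -> weight mapping for the expanded query.

    terms        -- original query words (each gets weight 1.0 per occurrence)
    concepts     -- hand-selected lexicon entries to expand
    link_types   -- which typed links to follow, one step only
    added_weight -- assumed weight for every word added by expansion
    """
    vector = Counter()
    for term in terms:
        vector[term] += 1.0

    for concept in concepts:
        entry = TOY_LEXICON[concept]
        related = list(entry["synonyms"])
        for link in link_types:
            for target in entry["links"].get(link, []):
                related.extend(TOY_LEXICON[target]["synonyms"])
        for word in related:
            if word not in terms:  # original words keep their own weight
                vector[word] += added_weight

    return dict(vector)


expanded = expand_query(["car", "safety"], concepts=["car"])
```

Only the concept chosen by hand ("car") is expanded here; "safety" is left alone, which mirrors the manual selection of concepts described above. Changing `link_types` or following links more than one step would give a more aggressive expansion.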
WordNet's basic object is a set of strict synonyms, called a synset. Synsets are organized by the lexical relations defined on them, which differ depending on part of speech. For nouns, the only part of WordNet used in this study, the lexical relations include antonymy, hypernymy/hyponymy (is-a relation) and three different meronym/holonym (part-of) relations. The is-a relation is the dominant relationship, and organizes the