SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
On Expanding Query Vectors with Lexically Related Words
chapter
E. Voorhees
National Institute of Standards and Technology
D. K. Harman
Ellen M. Voorhees
Siemens Corporate Research, Inc.
755 College Road East
Princeton, NJ 08540
ellen@learning.scr.siemens.com
Abstract
Experiments performed on small collections suggest
that expanding query vectors with words that are
lexically related to the original query words can im-
prove retrieval effectiveness. Prior experiments using
WordNet to automatically expand vectors in the large
TREC-1 collection were inconclusive regarding effectiveness
gains from lexically related words since any
such effects were dominated by the choice of words to
expand. This paper specifically investigates the effect of
expansion by selecting query concepts to be expanded
by hand. Concepts are represented by WordNet syn-
onym sets and are expanded by following the typed
links included in WordNet. Experimental results sug-
gest that this query expansion technique makes little
difference in retrieval effectiveness within the TREC en-
vironment, presumably because the TREC topic state-
ments provide such a rich description of the information
being sought.
1 Introduction
The IR group at Siemens Corporate Research is investigating
how concept spaces - data structures that
define semantic relationships among ideas - can be
used to improve retrieval effectiveness in systems de-
signed to satisfy large-scale information needs. As part
of this research, we expanded document and query vectors
automatically using selected synonyms of original
text words for TREC-1 [5]. The retrieval results
indicated that this expansion technique improved the
performance of some queries, but degraded the perfor-
mance of other queries. We concluded that improving
the consistency of the method would require both a bet-
ter method for determining the important concepts of
a text and a better method for determining the correct
sense of an ambiguous word.
We took TREC-2 as an opportunity to investigate
the effectiveness of vector expansion when good con-
cepts are chosen to be expanded. As in TREC-1, query
vectors were expanded using WordNet synonym sets.
However, the synonym sets associated with each query
were selected manually (by the author). These results
therefore represent an upper bound on the effectiveness
to be expected from a completely automatic expansion
process.
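
The expansion step described above can be illustrated with a minimal
sketch. It is not the implementation used for the TREC-2 runs. The
miniature synset network, the function name expand_query, the choice
to follow links one step only, and the added_weight parameter are all
assumptions introduced for illustration; real synsets and links would
come from WordNet itself.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of strict synonyms plus typed links to other synsets."""
    words: tuple
    links: dict = field(default_factory=dict)  # link type -> list of synset ids


# Hypothetical miniature noun network. The ids, words, and links are
# invented for illustration and are not taken from WordNet.
TOY_NET = {
    "car": Synset(("car", "automobile"),
                  {"hypernym": ["vehicle"], "meronym": ["engine"]}),
    "vehicle": Synset(("vehicle",), {"hyponym": ["car"]}),
    "engine": Synset(("engine", "motor"), {"holonym": ["car"]}),
}


def expand_query(query, chosen, net, link_types, added_weight=0.5):
    """Return a copy of `query` extended with words from the chosen
    synsets and from synsets directly linked to them.

    query        -- dict mapping term -> weight (the original query vector)
    chosen       -- ids of the hand-selected synsets
    link_types   -- typed links to follow, one step only (an assumption)
    added_weight -- weight given to new terms (an illustrative value)
    """
    expanded = dict(query)
    for sid in chosen:
        synset = net[sid]
        targets = [synset]
        for link in link_types:
            targets.extend(net[t] for t in synset.links.get(link, []))
        for target in targets:
            for word in target.words:
                # Terms already in the query keep their original weight.
                expanded.setdefault(word, added_weight)
    return expanded


query = {"car": 1.0, "price": 1.0}
expanded = expand_query(query, ["car"], TOY_NET, ["hypernym", "meronym"])
print(sorted(expanded))
```

In this sketch the person selecting synsets supplies the chosen list,
which corresponds to the manual selection step; everything after that
selection is mechanical.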
The results of the TREC-2 evaluation indicate that
the query expansion procedure used does not signifi-
cantly affect retrieval performance even when impor-
tant concepts are identified by hand. Some expanded
queries are more effective than their unexpanded coun-
terparts, but for other queries the unexpanded version
is more effective. In either case, the effectiveness differ-
ence between the two versions is seldom large. Further
testing suggests that more extreme expansion proce-
dures can cause larger differences in retrieval perfor-
mance, but the net effect over a set of queries is de-
graded performance compared to no expansion at all.
The remainder of the paper discusses the experiments
in detail. The next section describes the retrieval envi-
ronment, including a description of WordNet. Section 3
provides evaluation results for both the official TREC-2
runs and some additional supporting runs. The final
section explores the issue of why the expansion fails to
improve retrieval performance.
2 The Retrieval Environment
The expansion procedure used in this work relies
heavily on the information recorded in WordNet,
a manually-constructed lexical system developed by
George Miller and his colleagues at the Cognitive Sci-
ence Laboratory at Princeton University [4]. Word-
Net's basic object is a set of strict synonyms, called
a synset. Synsets are organized by the lexical rela-
tions defined on them, which differ depending on part
of speech. For nouns, the only part of WordNet used in
this study, the lexical relations include antonymy, hy-
pernymy/hyponymy (is-a relation) and three different
meronym/holonym (part-of) relations. The is-a relation
is the dominant relationship, and organizes the