SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
On Expanding Query Vectors with Lexically Related Words
E. Voorhees
National Institute of Standards and Technology
D. K. Harman
Expansion by synonyms plus parents and all descendants

  orig. terms (α)   synonyms (α)   descendants (α)   parents (α)
  1                 .5             .5                .5
  1                 1              .5                .5
  1                 1              1                 1
  1                 2              1                 1

Expansion by synonyms plus any directly related synset

  orig. terms (α)   synonyms (α)   other (α)
  1                 0              0
  1                 .5             .5
  1                 1              1
Table 2: Additional combinations of expansion strategies and relation weights tested.
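Table 2 lists α values for each relation type, but this section does not restate how they enter a document's score. The following is a minimal sketch, assuming an extended-vector style combination: each relation type (original terms, synonyms, descendants, parents) forms its own query subvector, every subvector is matched against the same document vector by inner product, and the partial scores are summed with weights α. The function names, relation keys, and the toy terms and weights are hypothetical; only the α rows are taken from the table.

```python
# Hypothetical sketch of scoring with per-relation weights (alpha).
# Assumption: score(q, d) = sum over relations r of alpha_r * (q_r . d),
# where q_r is the query subvector for relation r and d is the document vector.

# Alpha rows from Table 2 (synonyms plus parents and all descendants).
TABLE_2_RUNS = [
    {"orig": 1.0, "synonyms": 0.5, "descendants": 0.5, "parents": 0.5},
    {"orig": 1.0, "synonyms": 1.0, "descendants": 0.5, "parents": 0.5},
    {"orig": 1.0, "synonyms": 1.0, "descendants": 1.0, "parents": 1.0},
    {"orig": 1.0, "synonyms": 2.0, "descendants": 1.0, "parents": 1.0},
]


def inner_product(q, d):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    return sum(w * d.get(term, 0.0) for term, w in q.items())


def score(query_subvectors, doc_vector, alphas):
    """Alpha-weighted sum of each relation subvector's match with the document."""
    return sum(
        alpha * inner_product(query_subvectors.get(rel, {}), doc_vector)
        for rel, alpha in alphas.items()
    )


# Toy data: hypothetical terms and weights, for illustration only.
query = {
    "orig": {"car": 1.0},
    "synonyms": {"automobile": 1.0},
    "descendants": {"sedan": 1.0},
    "parents": {"vehicle": 1.0},
}
doc = {"car": 0.5, "automobile": 0.3, "vehicle": 0.2}

for alphas in TABLE_2_RUNS:
    print(alphas, round(score(query, doc, alphas), 3))
```

In the last row the synonym subvector gets twice the weight of the original terms, which is the kind of heavily weighted run that the concluding discussion reports as less effective.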
computed for a document, especially when some important terms had no corresponding synset. Topic 112 provides an example of this effect. The topic is concerned with worldwide investment in biotechnology. I added synonym sets for investment and capital to the topic. WordNet does not contain biotechnology, although it does contain biomedical_science. Thus, I also added the biomedical_science synset and a synset containing gene. The resulting expanded query performed considerably worse than the unexpanded version (33 relevant documents retrieved in the first 100, versus 52 for the unexpanded query). The problem is that the expanded query places too much emphasis on money and not enough on biotechnology. These results thus indicate that simply recognizing the important concepts in a query statement is not sufficient to ensure improved retrieval performance: an expansion procedure must also preserve the relative weights of those concepts.
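The Topic 112 failure can be made concrete by tracking how much of a query's total weight each concept carries before and after expansion. The sketch below is illustrative only: the concept labels, the added money words other than capital, all weights, and the rescaling remedy are assumptions, not the procedure or data used in these experiments.

```python
from collections import defaultdict

# Hypothetical sketch: each query entry is (term, concept, weight), where
# "concept" names the original query concept that caused the term to be added.


def concept_totals(query):
    """Total weight carried by each concept."""
    totals = defaultdict(float)
    for _term, concept, weight in query:
        totals[concept] += weight
    return totals


def concept_shares(query):
    """Fraction of the query's total weight carried by each concept."""
    totals = concept_totals(query)
    grand_total = sum(totals.values())
    return {concept: w / grand_total for concept, w in totals.items()}


def rescale_to_original(original, expanded):
    """One possible remedy (an assumption, not this paper's method): scale each
    concept's terms so the concept keeps the total weight it had originally."""
    before = concept_totals(original)
    after = concept_totals(expanded)
    return [(t, c, w * before[c] / after[c]) for t, c, w in expanded]


# Toy query loosely modeled on Topic 112; the added words are placeholders
# and every weight is hypothetical.
original = [("investment", "money", 1.0), ("biotechnology", "biotech", 1.0)]
expanded = original + [
    ("investing", "money", 0.5),
    ("capital", "money", 0.5),
    ("assets", "money", 0.5),
    ("funds", "money", 0.5),
    ("biomedical_science", "biotech", 0.5),
    ("gene", "biotech", 0.5),
]

print(concept_shares(original))  # each concept carries half the weight
print(concept_shares(expanded))  # money now carries most of the weight
print(concept_shares(rescale_to_original(original, expanded)))
```

With these toy weights, money rises from half to 60% of the query's weight after expansion; rescaling each concept back to its original total restores the balance. This is only one simple normalization, and it also lowers the weight of the original term within each expanded concept.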
Another possible explanation is that WordNet is not suited to this task: it was not designed to be used in this manner, and it may not contain the necessary links.
Even if this is true, however, it is unlikely that any other
broad-coverage knowledge base would be better suited.
The success of relevance feedback and other routing
techniques suggests that the most useful relations are
specific and idiosyncratic.
A second goal of this work was to characterize the effectiveness of different types of lexical relations when used to expand a query. Assuming the set of words to be expanded is well chosen, any closely related word, regardless of the type of relation, may be a good additional word. Wang et al. reached a similar conclusion [6]. Nevertheless, an added word should be weighted less than the original word that caused it to be included. Runs in which added words were weighted equally to or more heavily than original words were consistently less effective than the more conservatively weighted runs. Similarly, runs that added words only loosely related to original words (i.e., when long paths of links were followed) were consistently less effective than runs that used only near relatives.
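The two weighting findings, that added words should weigh less than the words that caused them and that only near relatives should be added, can be expressed as a depth-limited breadth-first expansion with a per-link decay. This is a sketch under stated assumptions: the relation graph, the decay factor of 0.5, and the function name are hypothetical and do not reproduce the expansion strategies tested here.

```python
from collections import deque


def expand(seeds, links, max_depth=1, decay=0.5):
    """Breadth-first query expansion over a lexical-relation graph.

    Original terms get weight 1.0; a word first reached after d links gets
    decay ** d, so with decay < 1 every added word weighs less than the word
    that caused it. Words more than max_depth links away are never added.
    """
    weights = {}
    queue = deque()
    for seed in seeds:
        weights[seed] = 1.0
        queue.append((seed, 0))
    while queue:
        word, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow longer paths of links
        for neighbor in links.get(word, ()):
            if neighbor not in weights:  # BFS: first visit is a shortest path
                weights[neighbor] = decay ** (depth + 1)
                queue.append((neighbor, depth + 1))
    return weights


# Hypothetical toy graph, loosely shaped like a synonym/hypernym chain.
links = {
    "car": ["automobile", "motor_vehicle"],
    "motor_vehicle": ["self-propelled_vehicle"],
    "self-propelled_vehicle": ["wheeled_vehicle"],
}

print(expand(["car"], links))               # near relatives only
print(expand(["car"], links, max_depth=3))  # also follows long paths
```

Setting decay to 1.0 or more makes added words weigh as much as or more than the originals, mirroring the less effective runs; raising max_depth corresponds to following long paths of links.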
Acknowledgements
Geoff Towell carefully read a draft of this paper and
suggested changes that improved its presentation and
clarity.
References
[1] Chris Buckley. Implementation of the SMART information retrieval system. Technical Report 85-686, Computer Science Department, Cornell University, Ithaca, New York, May 1985.
[2] Chris Buckley, Gerard Salton, and James Allan. Automatic retrieval with locality information using SMART. In D. K. Harman, editor, Proceedings of the First Text REtrieval Conference (TREC-1), pages 59-72. NIST Special Publication 500-207, March 1993.
[3] Edward A. Fox. Extending the Boolean and Vector Space Models of Information Retrieval with P-norm Queries and Multiple Concept Types. PhD thesis, Cornell University, 1983. University Microfilms, Ann Arbor, MI.
[4] George Miller. Special Issue, WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 1990.
[5] Ellen M. Voorhees and Yuan-Wang Hou. Vector expansion in a large collection. In D. K. Harman, editor, Proceedings of the First Text REtrieval Conference (TREC-1), pages 343-351. NIST Special Publication 500-207, March 1993.