SP500215 NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2) On Expanding Query Vectors with Lexically Related Words chapter E. Voorhees National Institute of Standards and Technology D. K. Harman

Expansion by synonyms plus parents and all descendents

  orig terms α   synonyms α   descendents α   parents α
  1              .5           .5              .5
  1              1            .5              .5
  1              1            1               1
  1              2            1               1

Expansion by synonyms plus any directly related synset

  orig terms α   synonyms α   other α
  1              0            0
  1              .5           .5
  1              1            1

Table 2: Additional combinations of expansion strategies and relation weights tested.

computed for a document, especially when some important terms had no corresponding synset. Topic 112 provides an example of this effect. The topic concerns worldwide investment in biotechnology. I added synonym sets for investment and capital to the topic. WordNet does not contain biotechnology, although it does contain biomedical_science. Thus, I also added the biomedical_science synset and a synset containing gene. The resulting expanded query performed significantly worse than the unexpanded version (33 relevant retrieved in the first 100 versus 52 relevant retrieved). The problem is that the expanded query places too much emphasis on money and not enough on biotechnology. Thus these results indicate that simply recognizing which concepts in a query statement are important is not sufficient to ensure improved retrieval performance. An expansion procedure must also preserve the relative weights of those concepts.

Another possible explanation is that WordNet is not suited for this task - it was not designed to be used in this manner and it may not contain the necessary links. Even if this is true, however, it is unlikely that any other broad-coverage knowledge base would be better suited. The success of relevance feedback and other routing techniques suggests that the most useful relations are specific and idiosyncratic.
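The Topic 112 effect described above can be illustrated with a toy calculation. The sketch below is not the procedure used in these experiments: it assumes each original term carries weight 1, each added term carries a relative weight `alpha`, and it ignores term weighting and vector normalization entirely. The function name `concept_mass` and the synonym lists are hypothetical and are not the actual WordNet synsets or the actual topic text.

```python
def concept_mass(concepts, synsets, alpha):
    """Return each concept's share of the total query weight after expansion.

    concepts: {concept name: [original terms]}; each original term has weight 1.
    synsets:  {original term: [added terms]}; each added term has weight alpha.
    """
    mass = {}
    for name, terms in concepts.items():
        added = sum(len(synsets.get(term, [])) for term in terms)
        mass[name] = len(terms) * 1.0 + alpha * added
    total = sum(mass.values())
    return {name: m / total for name, m in mass.items()}


# Hypothetical data for illustration only.
concepts = {
    "money": ["investment", "capital"],
    "biotech": ["biotechnology"],
}
synsets = {
    "investment": ["investing", "outlay", "stake"],
    "capital": ["assets", "funds", "working_capital"],
    "biotechnology": ["biomedical_science", "gene"],
}

unexpanded = concept_mass(concepts, {}, alpha=0.5)
expanded_half = concept_mass(concepts, synsets, alpha=0.5)
expanded_full = concept_mass(concepts, synsets, alpha=1.0)

for label, shares in [("unexpanded", unexpanded),
                      ("alpha=0.5", expanded_half),
                      ("alpha=1.0", expanded_full)]:
    print(label, {name: round(share, 3) for name, share in shares.items()})
```

Under these assumptions, the concept that receives more added terms gains a larger share of the query weight, and the shift grows as `alpha` grows. An expansion procedure that aims to preserve the relative weights of concepts would need to compensate for this, for example by scaling the added terms per concept.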
A second goal of this work was to characterize the effectiveness of different types of lexical relations when used to expand a query. Assuming the set of words to be expanded is well chosen, any closely related word - regardless of the type of relation - may be a good additional word. Wang et al. reached a similar conclusion [6]. Nevertheless, an added word should be weighted less than the original word that caused it to be included. Runs in which added words were equally or more heavily weighted than original words were consistently less effective than the more conservatively weighted runs. Similarly, runs that added words that were loosely related to original words (i.e., when long paths of links were followed) were consistently less effective than runs that used only near relatives.

Acknowledgements

Geoff Towell carefully read a draft of this paper and suggested changes that improved its presentation and clarity.

References

[1] Chris Buckley. Implementation of the SMART information retrieval system. Technical Report 85-686, Computer Science Department, Cornell University, Ithaca, New York, May 1985.

[2] Chris Buckley, Gerard Salton, and James Allan. Automatic retrieval with locality information using SMART. In D. K. Harman, editor, Proceedings of the First Text REtrieval Conference (TREC-1), pages 59-72. NIST Special Publication 500-207, March 1993.

[3] Edward A. Fox. Extending the Boolean and Vector Space Models of Information Retrieval with P-norm Queries and Multiple Concept Types. PhD thesis, Cornell University, 1983. University Microfilms, Ann Arbor, MI.

[4] George Miller. Special Issue, WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 1990.

[5] Ellen M. Voorhees and Yuan-Wang Hou. Vector expansion in a large collection. In D. K. Harman, editor, Proceedings of the First Text REtrieval Conference (TREC-1), pages 343-351. NIST Special Publication 500-207, March 1993.