SP500207 NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Workshop on: Automatically Generating Adhoc and Routing Queries
report of discussion group
Susan T. Dumais
National Institute of Standards and Technology
Donna K. Harman

smaller syntactic units; and automatic pronoun disambiguation.

Relevance feedback is closely related to term expansion. It is not fully automatic, in the sense that human judgements about the relevance of some small number of documents are required. However, the routing queries were specifically designed to take advantage of relevance judgements from a training corpus. More importantly, many of the same issues that arise in term expansion also occur in the context of relevance feedback.

The most common implementation of relevance feedback was to modify the query by adding some words from relevant documents. For the TREC experiments as few as 5 words and as many as 250 words were added, with most systems adding 10 to 30 words. Some systems also modified term weights, used information about words in non-relevant documents, and gave less weight to added words (compared with words in the original query). There were few comparisons of term expansion (or feedback) against no expansion in the same system. Feedback improvements were somewhat smaller than expected based on experiments with smaller test collections. It is too early to tell for sure, but part of this may simply be that the original queries were very good.

The single common theme in the discussion of query expansion was: be careful! Results were quite variable: appropriate term expansion can improve recall, but inappropriate expansion can just as easily harm performance. One major problem is that expansion is not easily limited to the intended meaning of a word. Some groups first disambiguated the word sense by hand before automatic expansion; others used automatic heuristics for disambiguation with some success.
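The feedback scheme described above (adding a limited number of words from relevant documents, and giving them less weight than the original query words) can be illustrated with a minimal sketch. This is not the method of any particular TREC-1 system: the function name `expand_query`, the regular-expression tokenizer, the choice of document frequency among relevant documents as the selection criterion, and the default values `max_added=20` and `added_weight=0.5` are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokens(text):
    """Crude tokenizer (an assumption): lowercase alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def expand_query(query_weights, relevant_docs, max_added=20, added_weight=0.5):
    """Return a copy of the weighted query with terms from relevant docs added.

    query_weights: dict mapping original query terms to their weights.
    relevant_docs: list of document strings judged relevant.
    max_added:     cap on the number of added terms (hypothetical default).
    added_weight:  weight for added terms, lower than the original terms'.
    """
    doc_freq = Counter()
    for doc in relevant_docs:
        # Count each term once per document so that a single long
        # document cannot dominate the selection.
        doc_freq.update(set(tokens(doc)))

    # Candidate terms are those not already in the query, ordered by how
    # many relevant documents contain them (ties broken alphabetically).
    candidates = sorted(
        (term for term in doc_freq if term not in query_weights),
        key=lambda term: (-doc_freq[term], term),
    )

    expanded = dict(query_weights)
    for term in candidates[:max_added]:
        expanded[term] = added_weight
    return expanded
```

For example, calling `expand_query({"oil": 1.0, "spill": 1.0}, docs, max_added=2)` on a few relevant documents keeps the two original terms at weight 1.0 and adds at most two new terms at weight 0.5. Capping the number of added terms and down-weighting them are two of the safeguards against over-expansion mentioned in the discussion.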
Other methods discussed to help limit undesirable associations included: expanding only "hot spots"; matching on smaller subtexts; giving less weight to added words relative to original query words; limiting the total number of words added; limiting the syntactic or semantic relations of added words; and limiting the influence that any single word can have in overall similarity.

Miscellaneous observations: Few systems did anything more than extract single words and phrases. A few systems removed negated words (often by hand), and a few systems automatically generated Boolean queries. Some groups used what might be called a "two-pass method", first using a standard global match to obtain a smaller group of documents which then receive more detailed processing. Some of the more detailed processing involved breaking the query down into smaller sub-units for matching.

Summary: There were few really novel methods used for automatically generating either adhoc or routing queries. There are now some general and fairly comprehensive lexical resources that might be useful. The problems with over-expanding queries were quite noticeable in the TREC application. Systems that automatically generated queries often performed quite well compared to other systems. However, there were few direct comparisons of manual vs. automatic query generation, or of individual components (term expansion vs. no expansion) within a system, and this is what is needed to understand the usefulness of such methods. Hopefully this will happen in TREC-2.
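The "two-pass method" noted under the miscellaneous observations can be sketched roughly as follows. The report does not say how any group scored documents in either pass, so everything here is an assumption for illustration: the names `two_pass_rank` and `overlap`, the use of simple term overlap for the global match, and the use of the best-matching sentence as the "more detailed processing" (a stand-in for matching on smaller subtexts).

```python
import re


def overlap(query_terms, text):
    """Number of distinct query terms that occur in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(set(query_terms) & words)


def two_pass_rank(query_terms, docs, shortlist_size=100):
    """Rank documents with a cheap global pass, then a finer local pass.

    query_terms:    list of lowercase query terms.
    docs:           dict mapping document id to document text.
    shortlist_size: how many documents survive the first pass
                    (hypothetical default).
    """
    # Pass 1: global match over the whole document text.
    shortlist = sorted(
        docs, key=lambda doc_id: -overlap(query_terms, docs[doc_id])
    )[:shortlist_size]

    # Pass 2: re-score each shortlisted document by its single best
    # sentence, so that query terms must co-occur in a small subtext.
    def best_sentence_score(doc_id):
        sentences = docs[doc_id].split(".")
        return max(overlap(query_terms, s) for s in sentences)

    return sorted(shortlist, key=lambda doc_id: -best_sentence_score(doc_id))
```

In this sketch a document whose query terms are scattered across unrelated sentences can rank high in the first pass but drop in the second, which is the kind of undesirable association that matching on smaller subtexts is meant to limit.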