SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Workshop Report: Use of training materials in constructing routing queries
report of discussion group
William S. Cooper
Stephen E. Robertson
National Institute of Standards and Technology
D. K. Harman
The participants outlined various methods of exploiting the training data for routing retrieval that had been used in
the conference. In all cases the data had been used in a topic-specific manner; i.e. each query was constructed or
expanded using relevance judgements for that particular topic only.
In some systems, terms taken directly from the topic were weighted or reweighted using the training data. In others,
terms taken from the training documents relevant to the topic were used in addition to topic terms, or were used instead
with the original topic terms playing no part. In a few cases, terms from both relevant and non-relevant documents were
added, the latter with negative weights. Relevant documents with a high preliminary retrieval ranking coefficient were
preferred as a source of expansion terms in one system. Probabilistic, feedback and ad-hoc methods had all been tried as
ways of modifying the query in response to the training data.
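
The variants described above can be illustrated with a minimal sketch. It assumes a Rocchio-style weighting scheme, which the report does not name; the function names (term_vector, centroid, expand_query) and the parameters alpha, beta and gamma are hypothetical and chosen only for illustration. Setting alpha to zero corresponds to the case in which the original topic terms play no part, and gamma controls the negative weight given to terms from non-relevant documents.

```python
from collections import Counter


def term_vector(text):
    """Term frequencies for a whitespace-tokenised, lower-cased text."""
    return Counter(text.lower().split())


def centroid(docs):
    """Mean term-frequency vector over a list of document texts."""
    total = Counter()
    for doc in docs:
        total.update(term_vector(doc))
    n = len(docs)
    return {term: count / n for term, count in total.items()} if n else {}


def expand_query(topic, relevant, non_relevant,
                 alpha=1.0, beta=0.75, gamma=0.25):
    """Build a weighted routing query for ONE topic (assumed Rocchio-style).

    alpha weights the original topic terms, beta the terms drawn from
    relevant training documents, and gamma the (negative) contribution
    of terms drawn from non-relevant training documents.
    """
    weights = {}
    for term, tf in term_vector(topic).items():
        weights[term] = weights.get(term, 0.0) + alpha * tf
    for term, w in centroid(relevant).items():
        weights[term] = weights.get(term, 0.0) + beta * w
    for term, w in centroid(non_relevant).items():
        weights[term] = weights.get(term, 0.0) - gamma * w
    # Drop terms whose contributions cancel out or were never weighted.
    return {term: w for term, w in weights.items() if w != 0.0}


if __name__ == "__main__":
    query = expand_query(
        "oil spill",
        relevant=["oil tanker spill cleanup", "tanker oil leak"],
        non_relevant=["olive oil recipe"],
    )
    for term, weight in sorted(query.items(), key=lambda kv: -kv[1]):
        print(f"{term}\t{weight:.3f}")
```

In keeping with the topic-specific use of the training data noted earlier, the sketch would be called once per topic, with only that topic's relevance judgements.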
How far might a query profitably be expanded on the basis of the training data? Though this question was not
answered definitively, some participants indicated a greater willingness to consider drastic expansion than had been
thought advisable before TREC 2.
The sample of relevance judgements for TREC 2 was thought to be adequate in size and not unrealistically large. It
was sufficiently representative in its inclusiveness of feedback generated from a wide variety of systems. However, this
variety indicates a possible lack of realism, in that a real system would probably have access only to relevant documents
retrieved by itself. Thus the use of only those relevant documents found in a search on the training data by the system in
question might be regarded as more realistic.
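
The more realistic restriction suggested here could be sketched as follows. This is an illustration only: the function name self_retrieved_relevant, the cutoff parameter and the document identifiers are hypothetical, and the report does not specify how deep into a system's own ranking such a filter should look.

```python
def self_retrieved_relevant(own_ranking, judged_relevant, cutoff=1000):
    """Keep only judged-relevant documents that this system itself retrieved.

    own_ranking     -- document ids in the order the system ranked them
                       in a search on the training data
    judged_relevant -- ids judged relevant for the topic (pooled from
                       many systems)
    cutoff          -- assumed depth of the system's own result list
    """
    relevant = set(judged_relevant)
    return [doc_id for doc_id in own_ranking[:cutoff] if doc_id in relevant]


if __name__ == "__main__":
    ranking = ["d3", "d7", "d1", "d9"]
    judged = {"d1", "d2", "d3"}
    # d2 was found only by other systems, so it is excluded.
    print(self_retrieved_relevant(ranking, judged, cutoff=3))
```

Only the documents returned by this filter would then be passed on as the relevant training set for query construction.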