SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Combining Evidence from Multiple Searches
chapter
E. Fox
M. Koushik
J. Shaw
R. Modlin
D. Rao
National Institute of Standards and Technology
Donna K. Harman
7.3 Possible enhancements
Because of the disk space problem we were not able to do many of the tests we planned to do, even
by the end of Phase 2. Work will continue in 1993 now that new disks have been received. Among
the planned tasks are:
* Phrase Identification and Matching
In the current system, phrases are handled by using an AND query term. For example,
Information AND Retrieval was used for Information-Retrieval. Due to the absence
of a proximity operator, this leads to retrieval of non-relevant documents where these words
occur widely apart. The retrieval results can be improved by providing a mechanism for
dealing with phrases explicitly, and/or the use of proximity operators.
* Better Base Methods
In addition to considering the use of phrases, further study of base runs, considering query
construction, indexing, weighting schemes, and lexical information, will be undertaken. Of
particular interest is the use of p-norm queries, which if tailor-made, might well out-perform
the vector queries in all collections. Contrasts between stemming and morphological analysis
can also be made.
* Merging Methods
While the Ad Hoc and the R-P Merge methods are not based on elaborate theory, they do
provide insight into the effects of combining results. Further refinement of the approaches,
and additional testing to obtain upper-bound performance values, will be undertaken.
* The CEO Model
The Combination of Expert Opinion (CEO) model [3, 4] of Thompson can be used
to treat the different retrieval runs as experts, and to combine their weighting probability
distributions to improve performance. This could be applied in several ways to combine
results from a variety of runs and indexing schemes.
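The phrase-matching problem described in the first item above can be illustrated with a small sketch. It is not the code of the system described in this paper. The function names, the tokenizer, the window size, and the two sample documents are all assumptions made for illustration. The sketch contrasts a Boolean AND match, which accepts a document whenever both words occur anywhere, with a simple proximity match, which also requires the two words to occur within a fixed number of positions of each other.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_and(tokens, term_a, term_b):
    """Boolean AND: both terms occur somewhere in the document."""
    return term_a in tokens and term_b in tokens


def matches_within(tokens, term_a, term_b, window=1):
    """Proximity match (hypothetical operator): the two terms occur
    at most `window` token positions apart, in either order."""
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    return any(1 <= abs(i - j) <= window for i in pos_a for j in pos_b)


# Invented sample documents: the first uses the phrase, the second
# merely contains both words far apart.
doc_phrase = tokenize("Advances in information retrieval systems")
doc_scattered = tokenize(
    "Information about the weather was hard to obtain, "
    "and retrieval of the lost luggage took a week."
)

for name, doc in [("phrase", doc_phrase), ("scattered", doc_scattered)]:
    print(
        name,
        "AND:", matches_and(doc, "information", "retrieval"),
        "WITHIN 1:", matches_within(doc, "information", "retrieval", window=1),
    )
```

In this sketch the AND query accepts both documents, while the proximity match accepts only the one in which the words are adjacent. A production system would normally store term positions in the index rather than rescan each document.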
References
[1] C. Buckley. Implementation of the SMART information retrieval system. Technical Report
85-686, Cornell University, Department of Computer Science, May 1985.
[2] FirstMark Technologies Limited. KnowledgeSEEKER User's Guide. FirstMark Technologies,
14 Concourse Gate Site 680, Ottawa, Ontario, Canada, 1990.
[3] P. Thompson. A Combination of Expert Opinion approach to probabilistic information retrieval,
Part 1: The conceptual model. Information Processing & Management, 26(3):371-382, 1990.
[4] P. Thompson. A Combination of Expert Opinion approach to probabilistic information retrieval,
Part 2: Mathematical treatment of CEO Model 3. Information Processing & Management,
26(3):383-394, 1990.