SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
An Information Retrieval Test-bed on the CM-5
chapter
B. Masand
C. Stanfill
National Institute of Standards and Technology
D. K. Harman
V. CONCLUSIONS
We have successfully implemented an in-memory compressed
inverted file text retrieval system on the CM-5 Connection
Machine system.
Queries that used both words and phrases composed of
adjacent words did better than those that used words alone.
Our experiments so far suggest that converting everything to
lower case is somewhat better for adhoc queries, but it is not
clear whether these minor differences could be removed by
further optimization of other parameters. Experiments that
take document length and term position into account suggest
that normalizing for document length and increasing the weight
of terms appearing earlier in the documents lead to significant
improvements for both routing and adhoc queries. We have also
seen that proximity scores based on nearby terms in a sentence
improve retrieval performance.
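
The scoring ideas summarized above can be illustrated with a minimal
sketch in Python. It is not the CM-5 implementation: the linear position
boost, the logarithmic length normalization, the proximity window, and
all function and parameter names are assumptions chosen for illustration,
since the weighting formulas used in the experiments are not restated in
this section.

```python
import math


def position_weight(pos, doc_len, boost=0.5):
    """Hypothetical position boost: 1 + boost at the first token,
    decaying linearly to 1 at the last token."""
    if doc_len <= 1:
        return 1.0 + boost
    return 1.0 + boost * (1.0 - pos / (doc_len - 1))


def score_document(query_terms, doc_tokens, boost=0.5):
    """Sum position-weighted query-term matches, then divide by a
    log-scaled document length (an assumed normalization)."""
    doc_len = len(doc_tokens)
    if doc_len == 0:
        return 0.0
    wanted = set(query_terms)
    raw = sum(
        position_weight(i, doc_len, boost)
        for i, tok in enumerate(doc_tokens)
        if tok in wanted
    )
    return raw / math.log(1.0 + doc_len)


def proximity_bonus(query_terms, sentence_tokens, window=3):
    """Count pairs of distinct query terms that occur within `window`
    tokens of each other inside a single sentence."""
    wanted = set(query_terms)
    hits = [(i, tok) for i, tok in enumerate(sentence_tokens) if tok in wanted]
    bonus = 0
    for a in range(len(hits)):
        for b in range(a + 1, len(hits)):
            (i, t1), (j, t2) = hits[a], hits[b]
            if t1 != t2 and j - i <= window:
                bonus += 1
    return bonus
```

Under these assumptions, a match near the start of a document scores
higher than the same match near the end, a match in a short document
scores higher than the same match in a long one, and query terms that
fall close together in a sentence earn an additional bonus that could be
added to the document score.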
VI. FUTURE WORK
We would like to compare the effects of cosine normalization
with what we have tried so far and also explore its interaction
with techniques that use term position and proximity measures.
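
For reference, cosine normalization scales each term-weight vector to
unit Euclidean length before taking the inner product. The sketch below
is a generic Python illustration that uses raw term counts as weights;
the choice of weights and the function names are assumptions, not part
of the system described here.

```python
import math
from collections import Counter


def cosine_normalize(term_weights):
    """Scale a term -> weight mapping to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in term_weights.values()))
    if norm == 0.0:
        return dict(term_weights)
    return {t: w / norm for t, w in term_weights.items()}


def cosine_score(query_tokens, doc_tokens):
    """Inner product of cosine-normalized raw term-count vectors
    (raw counts are an assumed, simplest-possible weighting)."""
    q = cosine_normalize(Counter(query_tokens))
    d = cosine_normalize(Counter(doc_tokens))
    return sum(w * d.get(t, 0.0) for t, w in q.items())
```

With this normalization, repeating a document's text several times
leaves its score unchanged, which is the length-independence property
that would be compared against the document length normalization tried
so far.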
VII. REFERENCES
[1] Cleverdon, C. W., Mills, J. & Keen, E. M. (1966). Factors Determining
the Performance of Indexing Systems, Vol. 1: Design, Vol. 2: Test
Results. Aslib Cranfield Research Project, Cranfield, England, 1966.
[2] Sparck Jones, K. & Webster, C. A. (1979). Research in Relevance
Weighting, British Library Research and Development Report 5553,
Computer Laboratory, University of Cambridge.
[3] Fox, E. (1983). Characteristics of Two New Experimental Collections in
Computer and Information Science Containing Textual and Bibliographic
Concepts. Technical Report TR 83-561, Cornell University:
Computing Science Department.
[4] Salton, G. (1989). Automatic Text Processing. New York: Addison-Wesley.
[5] Salton, G. & Buckley, C. (1988). Term-Weighting Approaches in Automatic
Text Retrieval. Information Processing and Management, 24 (5),
pp. 513-523.
[6] Harman, D. (1993). The DARPA TIPSTER Project. SIGIR Forum, 26
(2), pp. 26-28.
[7] Harman, D. (1993). Overview of the First TREC Conference. Proceedings,
SIGIR-93, pp. 36-47. Pittsburgh.
[8] Hollaar, L. A. (1992). Special-Purpose Hardware for Information
Retrieval. In Information Retrieval: Data Structures and Algorithms.
Frakes, B. & Baeza-Yates, R. eds. Englewood Cliffs, New Jersey:
Prentice Hall.
[9] Faloutsos, C. (1992). Signature Files. In Information Retrieval: Data
Structures and Algorithms. Frakes, B. & Baeza-Yates, R. eds.
Englewood Cliffs, New Jersey: Prentice Hall.
[10] Harman, D., Fox, E., Baeza-Yates, R. & Lee, W. (1992). Inverted Files.
In Information Retrieval: Data Structures and Algorithms. Frakes, B. &
Baeza-Yates, R. eds. Englewood Cliffs, New Jersey: Prentice Hall.
[11] Linoff, G. & Stanfill, C. (1993). Compression of Indexes with Full
Positional Information in Very Large Text Databases. Proceedings,
SIGIR-93, pp. 88-95. Pittsburgh.
[12] Thinking Machines Corporation (1991). Connection Machine CM-5
Technical Summary, Cambridge, MA: Thinking Machines Corporation.
[13] Lo Verso, S., Isman, M., Nanopoulos, A., Nesheim, W., Milne, E. &
Wheeler, R. (1993). SFS: A Parallel File System for the CM-5.
Proceedings, 1993 Summer USENIX Technical Conference. pp. 291-306.
Cincinnati, OH.
[14] Stanfill, C. (1992). Parallel Information Retrieval Algorithms. In
Information Retrieval: Data Structures and Algorithms. Frakes, B. &
Baeza-Yates, R. eds. Englewood Cliffs, New Jersey: Prentice Hall.
[15] Salton, G. & Buckley, C. (1992). SMART Trade-offs. Proceedings,
TREC-1. Gaithersburg, MD.
[16] Frakes, B. (1992). Stemming Algorithms. In Information Retrieval:
Data Structures and Algorithms. Frakes, B. & Baeza-Yates, R. eds.
Englewood Cliffs, New Jersey: Prentice Hall.