NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
An Information Retrieval Test-bed on the CM-5
B. Masand, C. Stanfill
National Institute of Standards and Technology
D. K. Harman

V. CONCLUSIONS

We have successfully implemented an in-memory compressed inverted file text retrieval system on the CM-5 Connection Machine system. Queries that used both words and phrases composed of adjacent words did better than those that used words alone.

While our experiments completed so far suggest that converting everything to lower case is somewhat better for adhoc queries, it is not clear whether the minor differences could be removed by further optimization of other parameters.

Experiments that take document length and term position into account suggest that normalizing for document length and increasing the weight of terms appearing earlier in the documents lead to significant improvements for both routing and adhoc queries. We have also seen that proximity scores based on nearby terms in a sentence improve retrieval performance.

VI. FUTURE WORK

We would like to compare the effects of cosine normalization with what we have tried so far and also explore its interaction with techniques that use term position and proximity measures.

VII. REFERENCES

[1] Cleverdon, C. W., Mills, J. & Keen, E. M. (1966). Factors Determining the Performance of Indexing Systems, Vol. 1: Design, Vol. 2: Test Results. Aslib Cranfield Research Project, Cranfield, England, 1966.
[2] Sparck Jones, K. & Webster, C. A. (1979). Research in Relevance Weighting. British Library Research and Development Report 5553, Computer Laboratory, University of Cambridge.
[3] Fox, E. (1983). Characteristics of Two New Experimental Collections in Computer and Information Science Containing Textual and Bibliographic Concepts. Technical Report TR 83-561, Cornell University: Computing Science Department.
[4] Salton, G. (1989). Automatic Text Processing. New York: Addison-Wesley.
[5] Salton, G. & Buckley, C. (1988). Term-Weighting Approaches in Automatic Text Retrieval. Information Processing and Management, 24 (5), pp. 513-523.
[6] Harman, D. (1993). The DARPA TIPSTER Project. SIGIR Forum, 26 (2), pp. 26-28.
[7] Harman, D. (1993). Overview of the First TREC Conference. Proceedings, SIGIR-93, pp. 36-47. Pittsburgh.
[8] Hollaar, L. A. (1992). Special-Purpose Hardware for Information Retrieval. In Information Retrieval: Data Structures and Algorithms. Frakes, B. & Baeza-Yates, R. eds. Englewood Cliffs, New Jersey: Prentice Hall.
[9] Faloutsos, C. (1992). Signature Files. In Information Retrieval: Data Structures and Algorithms. Frakes, B. & Baeza-Yates, R. eds. Englewood Cliffs, New Jersey: Prentice Hall.
[10] Harman, D., Fox, E., Baeza-Yates, R. & Lee, W. (1992). Inverted Files. In Information Retrieval: Data Structures and Algorithms. Frakes, B. & Baeza-Yates, R. eds. Englewood Cliffs, New Jersey: Prentice Hall.
[11] Linoff, G. & Stanfill, C. (1993). Compression of Indexes with Full Positional Information in Very Large Text Databases. Proceedings, SIGIR-93, pp. 88-95. Pittsburgh.
[12] Thinking Machines Corporation (1991). Connection Machine CM-5 Technical Summary. Cambridge, MA: Thinking Machines Corporation.
[13] Lo Verso, S., Isman, M., Nanopoulos, A., Nesheim, W., Milne, E. & Wheeler, R. (1993). SFS: A Parallel File System for the CM-5. Proceedings, 1993 Summer USENIX Technical Conference, pp. 291-306. Cincinnati, OH.
[14] Stanfill, C. (1992). Parallel Information Retrieval Algorithms. In Information Retrieval: Data Structures and Algorithms. Frakes, B. & Baeza-Yates, R. eds. Englewood Cliffs, New Jersey: Prentice Hall.
[15] Salton, G. & Buckley, C. (1992). SMART Trade-offs. Proceedings, TREC-1. Gaithersburg, MD.
[16] Frakes, B. (1992). Stemming Algorithms. In Information Retrieval: Data Structures and Algorithms. Frakes, B. & Baeza-Yates, R. eds. Englewood Cliffs, New Jersey: Prentice Hall.