NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)

Compression, Fast Indexing, and Structured Queries on a Gigabyte of Text

A. Kent, A. Moffat, R. Sacks-Davis, R. Wilkinson, J. Zobel

National Institute of Standards and Technology
Donna K. Harman

when compared to techniques that rank the entire collection. In light of the parallel experiments carried out on the full collection, it is questionable whether such techniques will survive.

We have seen a minor improvement in ranking as a result of taking the structure of queries into account. A more surprising result, however, is that pairs do not provide the performance gains that we have seen with much smaller collections. As a result, the vocabulary for a collection appears to be more manageable than it would be if pairs needed to be explicitly described.

Acknowledgements

We are grateful to Lachlan Andrew, Daniel Lam, and Neil Sharman for assisting with the implementation. This work was supported by the Australian Research Council.