NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Compression, Fast Indexing, and Structured Queries on a Gigabyte of Text
A. Kent
A. Moffat
R. Sacks-Davis
R. Wilkinson
J. Zobel
National Institute of Standards and Technology
Donna K. Harman
when compared to techniques that rank the entire collection. In light of the parallel
experiments carried out using the full collection, it is questionable whether such
techniques will survive.
We have seen a minor improvement in ranking as a result of taking the structure of queries
into account. A more surprising result, however, is that pairs do not provide the performance
gains that we have observed with much smaller collections. As a result, the vocabulary for a
collection would appear to be more manageable than it would be if pairs had to be described explicitly.
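To make the vocabulary argument concrete, the sketch below compares the number of distinct single terms with the number of distinct adjacent word pairs in a small set of documents. It is an illustration only, not the system described here. The tokenizer (lower-casing and splitting on non-alphanumeric characters, with no stemming or stopping), the function names `tokenize` and `vocabulary_sizes`, and the toy documents are all hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lower-case, keep runs of letters and digits.
    # No stemming or stop-word removal is applied.
    return re.findall(r"[a-z0-9]+", text.lower())


def vocabulary_sizes(docs: list[str]) -> tuple[int, int]:
    """Return (distinct single terms, distinct adjacent word pairs)."""
    terms: set[str] = set()
    pairs: set[tuple[str, str]] = set()
    for doc in docs:
        tokens = tokenize(doc)
        terms.update(tokens)
        # Each adjacent pair (w_i, w_{i+1}) would need its own vocabulary
        # entry if pairs were indexed explicitly.
        pairs.update(zip(tokens, tokens[1:]))
    return len(terms), len(pairs)


docs = [
    "compression of the index reduces disk space",
    "the index is compressed to reduce space on disk",
    "ranking the full collection with a compressed index",
]

n_terms, n_pairs = vocabulary_sizes(docs)
print(n_terms, n_pairs)  # prints: 17 20
```

On a toy input the difference is small, but the set of distinct pairs typically grows much faster than the set of distinct terms as a collection grows, which is why avoiding explicit pair entries keeps the vocabulary more manageable.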
Acknowledgements
We are grateful to Lachlan Andrew, Daniel Lam, and Neil Sharman for assisting with the
implementation. This work was supported by the Australian Research Council.