SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Retrieval Experiments with a Large Collection using PIRCS
chapter
K. Kwok
L. Papadopoulos
K. Kwan
National Institute of Standards and Technology
Donna K. Harman
results may be relevant.
5. Conclusion
Our system, PIRCS (an acronym for Probabilistic Indexing and Retrieval -Component- System), has
been shown to support storage and retrieval for a large collection of 0.5 GB. Given appropriate
hardware and some software modification, it should support collections of 2 GB. Our
approach to IR and TREC consists of: 1) use of document components to provide a more restricted context
for retrieval and feedback, and to provide an initial ICTF term weighting; 2) two-word phrases, especially
those consisting of stopwords and high-frequency stems, as a precision- and recall-enhancing tool; 3) combination
of retrieval methods, including soft-boolean, to capture cooperative effects between retrieval methods; and
4) a network implementation with learning to realize feedback, and feedback with query expansion,
resulting in a two-layer direct-connect artificial neural network with adaptive architecture. Our approach
leads to results that are better than expected.
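As an illustration of point 3, the following is a minimal sketch of soft-boolean scoring in the unweighted p-norm form of the extended boolean model (reference 7), followed by a linear combination of scores from different retrieval methods. It is not the PIRCS implementation: the function names, the default p = 2, and the linear combination with coefficients `alphas` are assumptions made for this example only.

```python
from typing import Sequence


def _check(weights: Sequence[float]) -> int:
    """Return the number of term weights, rejecting an empty clause."""
    if not weights:
        raise ValueError("a boolean clause needs at least one term weight")
    return len(weights)


def pnorm_or(weights: Sequence[float], p: float = 2.0) -> float:
    """Unweighted p-norm OR over term weights in [0, 1].

    With p = 1 this is the arithmetic mean; as p grows it approaches max().
    """
    n = _check(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)


def pnorm_and(weights: Sequence[float], p: float = 2.0) -> float:
    """Unweighted p-norm AND over term weights in [0, 1].

    With p = 1 this is the arithmetic mean; as p grows it approaches min().
    """
    n = _check(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)


def combine(scores: Sequence[float], alphas: Sequence[float]) -> float:
    """Hypothetical linear combination of per-method retrieval scores.

    `alphas` are illustrative coefficients, not values from the paper.
    """
    if len(scores) != len(alphas):
        raise ValueError("scores and alphas must have the same length")
    return sum(a * s for a, s in zip(alphas, scores))
```

For a document with term weights 1.0 and 0.0, `pnorm_or` and `pnorm_and` both reduce to the mean 0.5 at p = 1, and move apart as p increases, which is what makes the operators "soft". A combined score could then be formed as, for example, `combine([probabilistic_score, pnorm_and(weights)], [0.5, 0.5])`, where `probabilistic_score` stands for the output of another retrieval method.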
6. Acknowledgment
We would like to thank our department Chairman and the Dean of Mathematics and Natural Science for
their support throughout the project. This work is partially supported by a grant from DARPA and a PSC-
CUNY grant #6-63288.
References
1. Kwok, K.L (1989). A neural network for probabilistic information retrieval. Proc. ACM SIGIR 12th
Ann. Intl. Conf. on R&D in IR. N.J. Belkin & C.J. van Rijsbergen, eds. ACM: NY, pp.21-30.
2. Kwok, K.L (1990). Experiments with a component theory of probabilistic information retrieval based
on single terms as document components. ACM TOIS 8:363-386.
3. Porter, M.F (1980). An algorithm for suffix stripping. Program 14:130-137.
4. Smith, M (1990). Aspects of the p-norm model of information retrieval: Syntactic query generation,
efficiency, and theoretical properties. TR 90-1128, Ph.D. Thesis, Cornell University.
5. Turtle, H.R & Croft, W.B (1991). Evaluation of an inference network-based retrieval model. ACM
TOIS 9:187-222.
6. Fox, E.A; Nunn, G.L & Lee, W.C (1988). Coefficients for combining concept classes in a collection.
Proc. ACM SIGIR 11th Ann. Intl. Conf. on R&D in IR. Y. Chiaramella, ed. PUG: Grenoble,
pp.291-307.
7. Salton, G; Fox, E.A & Wu, H (1983). Extended boolean information retrieval. Comm. ACM 26:1022-
1036.
8. Kwok, K.L (1991). Query modification and expansion in a network with adaptive architecture. Proc.
ACM SIGIR 14th Ann. Intl. Conf. on R&D in IR. A. Bookstein, Y. Chiaramella, G. Salton &
V.V. Raghavan eds. ACM: NY, pp.192-201.
9. Kwok, K.L (199x). A network approach to probabilistic information retrieval. submitted for
publication.
10. Robertson, S.E & Sparck Jones, K (1976). Relevance weighting of search terms. J. ASIS. 27:129-146.
11. van Rijsbergen, C.J (1979). Information Retrieval, 2nd Ed. Butterworths: London.
12. Maron, M.E & Kuhns, L.J (1960). On relevance, probabilistic indexing and information retrieval. J.
ACM 7:216-244.
13. Salton, G (1989). Automatic Text Processing. Addison-Wesley: NY.