NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Retrieval Experiments with a Large Collection using PIRCS
K. Kwok, L. Papadopoulos, K. Kwan
National Institute of Standards and Technology, Donna K. Harman

results may be relevant.

5. Conclusion

Our system, called PIRCS (an acronym for Probabilistic Indexing and Retrieval -Component- System), has been shown to support storage and retrieval for a large collection of 0.5 GB. With appropriate hardware and some software modification, it should support collections of 2 GB. Our approach to IR and TREC consists of:

1) use of document components to provide a more restricted context for retrieval and feedback, and to provide an initial ICTF term weighting;
2) two-word phrases, especially those consisting of stopwords and high-frequency stems, as a precision- and recall-enhancing tool;
3) combination of retrieval methods, including soft-boolean, to capture cooperative effects between retrieval methods; and
4) network implementation with learning to implement feedback and feedback with query expansion, resulting in a two-layer direct-connect artificial neural network with adaptive architecture.

Our approach leads to results that are better than expected.

6. Acknowledgment

We would like to thank our department Chairman and the Dean of Mathematics and Natural Science for their support throughout the project. This work is partially supported by a grant from DARPA and a PSC-CUNY grant #6-63288.