SP500215 NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2) TREC-2 Document Retrieval Experiments using PIRCS chapter K. Kwok L. Grunfeld National Institute of Standards and Technology D. K. Harman

terms. The `No Train.' column shows results obtained without using any known relevants for training and serves as the basis for comparison. Using the measure of average precision over all recall points, it can be seen that training without query expansion improves over no training by 21%, and training with expansion at the 40 level improves over the basis by about 39%. This measure, as well as the R-precision, seems to level off from expansion level 40 onwards. However, the number of relevants retrieved improves monotonically from 6551 (no training) to 7712 (expansion level 100). A higher query expansion level appears to improve the high-recall region of the precision-recall curve without materially affecting the low-recall region, as observed in [Kwok9x] using the WSJ collection only. Precision values at the different cutoffs of documents retrieved seem to level off at the expansion level of 80. At the cutoff of 20 retrieved documents, we now achieve a precision over 0.65, meaning that on average more than 13 of the 20 documents retrieved are relevant. The tuning of parameters gives us over 10% additional improvement above the results obtained in the revised routing results of Table 2. It appears that a query expansion level of 40 achieves a compromise between good effectiveness and good efficiency for our system. We did not do massive query expansion at high levels of 200 or more. However, the results are comparable to the best of those reported in the TREC-2 conference.

9.
Conclusion

We have upgraded our PIRCS system to use dynamic network creation for learning and retrieval, and to handle files in a master-subcollection design. The former approach allows us to eliminate full inverted file creation, resulting in a space requirement of 2 x collection size, reduced `dead' time before a collection becomes searchable, and fast learning. The latter approach makes our system sufficiently flexible to handle a large number of files in a robust fashion, yet produce a ranked retrieval list as if all documents were in one file. Although our submitted results for TREC-2 were not up to expectation because of insufficient resources at the time of the experiments, the reasons for the behavior of our system were isolated. New experiments show that PIRCS can provide highly competitive retrieval effectiveness in both ad hoc and routing environments.

Acknowledgment

Ms. Lu Chi-Ni provided the program for our two-word phrase detection. We would like to acknowledge the continual support of the Department Chairman and the Dean of Mathematics and Natural Science at Queens College throughout the project. This work is partially supported by a grant from ARPA and a PSC-CUNY grant #663288.

References

[BuSA93] Buckley, C., Salton, G. & Allan, J. (1993). Automatic retrieval with locality information using SMART. In: The First Text REtrieval Conference (TREC-1). Harman, D.K. (Ed.). NIST Special Publication 500-207. pp. 59-72.
[Crof93] Croft, W.B. (1993). The University of Massachusetts Tipster project. In: The First Text REtrieval Conference (TREC-1). Harman, D.K. (Ed.). NIST Special Publication 500-207. pp. 101-105.
[KwPK93] Kwok, K.L., Papadopolous, L. & Kwan, Y.Y. (1993). Retrieval experiments with a large collection using PIRCS. In: The First Text REtrieval Conference (TREC-1). Harman, D.K. (Ed.). NIST Special Publication 500-207. pp. 153-172.
[Kwok90] Kwok, K.L. (1990).
Experiments with a component theory of probabilistic information retrieval based on single terms as document components. ACM TOIS 8:363-386.
[Kwok9x] Kwok, K.L. (199x). A network approach to probabilistic information retrieval. Accepted for publication in ACM TOIS.
[SaFW83] Salton, G., Fox, E.A. & Wu, H. (1983). Extended boolean information retrieval. Comm. ACM 26:1022-1036.
[RoSp76] Robertson, S.E. & Sparck Jones, K. (1976). Relevance weighting of search terms. J. ASIS 27:129-146.
[Spar79] Sparck Jones, K. (1979). Experiments in relevance weighting of search terms. Info. Proc. Mgmnt. 15:133-144.
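
Note on the cutoff arithmetic in the routing discussion: the statement that a precision over 0.65 at 20 retrieved documents means more than 13 relevant documents, and the change in relevants retrieved from 6551 to 7712, can be checked with a short sketch. This is only an illustration, not part of PIRCS. The function names `precision_at_k` and `relative_improvement` and the toy document identifiers are hypothetical, and dividing by the cutoff `k` is an assumed convention.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Assumed convention: divide by k, so a list shorter than k is penalised.
    return hits / k


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, as a fraction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# Hypothetical ranked list: 20 documents, of which the first 13 are relevant.
ranked = [f"doc{i}" for i in range(20)]
relevant = {f"doc{i}" for i in range(13)}
p20 = precision_at_k(ranked, relevant, 20)

# Relevants retrieved as reported in the text:
# 6551 (no training) versus 7712 (expansion level 100).
gain = relative_improvement(6551, 7712)
```

With 13 relevant documents among the top 20, `p20` equals 13/20 = 0.65. The gain in relevants retrieved works out to 1161/6551, or a little under 18%.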