SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
TREC-2 Document Retrieval Experiments using PIRCS
chapter
K. Kwok
L. Grunfeld
National Institute of Standards and Technology
D. K. Harman
terms. The `No Train.' column shows results obtained without
using any known relevants for training and serves as the basis
for comparison. Using the measure of average precision over
all recall points, it can be seen that training without query
expansion improves over no training by 21%, and training
with expansion at the 40 level improves over the basis by
about 39%. This measure, as well as the R-precision,
seems to level off from expansion level 40 onwards.
However, the number of relevants retrieved improves from
6551 (no training) to 7712 (expansion level 100) in a
monotone fashion. Higher query expansion levels appear to
improve the high-recall region of the precision-recall curve
without materially affecting the low-recall region, as
observed in [Kwok9x] using the WSJ collection only.
Precision values at the different cutoffs of documents
retrieved seem to level off at the expansion level of 80. At
a cutoff of 20 retrieved documents, we now achieve a precision
over 0.65, meaning that more than 13 of the 20 documents
retrieved are relevant on average. The tuning of
parameters gives us over 10% additional improvement
above that obtained in the revised routing results of Table
2. It appears that a query expansion level of 40 achieves a
compromise between good effectiveness and good
efficiency for our system. We did not do massive query
expansion at high levels of 200 or more. However, the
results are comparable to the best of those reported in the
TREC-2 conference.
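
As a reading aid for the measures quoted above, the following is a minimal sketch of how precision at a document cutoff, R-precision, and a relative improvement can be computed. The function names and the toy ranked list are illustrative assumptions; they are not part of PIRCS or of the TREC evaluation software. Only the counts 6551 and 7712 come from the text.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def r_precision(ranked_ids, relevant_ids):
    """Precision at cutoff R, where R is the number of known relevants."""
    r = len(relevant_ids)
    if r == 0:
        return 0.0
    return precision_at_k(ranked_ids, relevant_ids, r)


def relative_improvement(baseline, improved):
    """Improvement of `improved` over `baseline`, as a fraction of baseline."""
    return (improved - baseline) / baseline


# Toy data (hypothetical): 20 retrieved documents, 13 of them relevant.
ranked = ["d%d" % i for i in range(1, 21)]
relevant = {"d%d" % i for i in list(range(1, 11)) + [12, 14, 16]}

p20 = precision_at_k(ranked, relevant, 20)   # 13 relevant out of 20 -> 0.65
rp = r_precision(ranked, relevant)           # 11 relevant in the top 13

# Relevants retrieved, no training vs. expansion level 100 (counts from the text).
gain = relative_improvement(6551, 7712)      # about 0.177
```

In this toy case a precision of 0.65 at cutoff 20 corresponds to exactly 13 relevant documents among the 20 retrieved, which is the reading given in the text.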
9. Conclusion
We have upgraded our PIRCS system to use dynamic
network creation for learning and retrieval, and to handle
files in a master-subcollection design. The former approach
allows us to eliminate full inverted file creation, resulting in
a 2 x collection size space requirement and reduced `dead' time
before a collection becomes searchable, and it provides fast learning.
The latter approach makes our system sufficiently
flexible to handle a large number of files in a robust
fashion, yet produce a ranked retrieval list as if all
documents were in one file. Although our submitted results
for TREC-2 were not up to expectation because of
insufficient resources at the time of the experiments, the
reasons for the behavior of our system were isolated. New
experiments show that PIRCS can provide highly
competitive retrieval effectiveness in both ad hoc and
routing environments.
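
The idea of producing one ranked list "as if all documents were in one file" can be illustrated with a minimal sketch. It assumes that each subcollection returns its own list sorted by descending score and that scores from different subcollections are directly comparable; this section does not describe how PIRCS achieves that, so the comparability is an assumption of the sketch. The function name and the sample data are hypothetical.

```python
import heapq
from itertools import islice


def merge_ranked_lists(sublists, top_n):
    """Merge per-subcollection ranked lists into one global ranked list.

    Each sublist holds (score, doc_id) pairs sorted by descending score.
    Scores are assumed to be comparable across subcollections.
    """
    merged = heapq.merge(*sublists, key=lambda pair: pair[0], reverse=True)
    return list(islice(merged, top_n))


# Hypothetical results from two subcollections.
sub_a = [(0.92, "A3"), (0.55, "A1"), (0.10, "A7")]
sub_b = [(0.81, "B2"), (0.60, "B9")]

top4 = merge_ranked_lists([sub_a, sub_b], 4)
# top4 == [(0.92, "A3"), (0.81, "B2"), (0.60, "B9"), (0.55, "A1")]
```

Because the merge is lazy, only as many entries as requested are drawn from the sublists, which matters when the number of subcollections is large.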
Acknowledgment
Ms. Lu Chi-Ni provided the program for our two-word
phrase detection. We would like to acknowledge the
continual support of the Department Chairman and the Dean
of Mathematics and Natural Science at Queens College
throughout the project. This work is partially supported by
a grant from ARPA and a PSC-CUNY grant #663288.
References
[BuSA93] Buckley, C., Salton, G. & Allan, J. (1993).
Automatic retrieval with locality information using SMART.
In: The First Text REtrieval Conference (TREC-1). Harman,
D.K. (Ed.). NIST Special Publication 500-207. pp. 59-72.
[Crof93] Croft, W.B. (1993). The University of
Massachusetts Tipster project. In: The First Text REtrieval
Conference (TREC-1). Harman, D.K. (Ed.). NIST Special
Publication 500-207. pp. 101-105.
[KwPK93] Kwok, K.L., Papadopolous, L. & Kwan, Y.Y. (1993).
Retrieval experiments with a large collection using PIRCS.
In: The First Text REtrieval Conference (TREC-1). Harman,
D.K. (Ed.). NIST Special Publication 500-207. pp. 153-172.
[Kwok90] Kwok, K.L. (1990). Experiments with a
component theory of probabilistic information retrieval
based on single terms as document components. ACM TOIS
8:363-386.
[Kwok9x] Kwok, K.L. (199x). A network approach to
probabilistic information retrieval. Accepted for publication
in ACM TOIS.
[SaFW83] Salton, G., Fox, E.A. & Wu, H. (1983). Extended
boolean information retrieval. Comm. ACM 26:1022-1036.
[RoSp76] Robertson, S.E. & Sparck Jones, K. (1976).
Relevance weighting of search terms. J. ASIS 27:129-146.
[Spar79] Sparck Jones, K. (1979). Experiments in relevance
weighting of search terms. Info. Proc. Mgmnt. 15:133-144.