NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
TREC-2 Document Retrieval Experiments using PIRCS
K. Kwok
L. Grunfeld
a retrieval (ranking) of Disk1 is still expensive and perhaps not necessary; rather, we just select `nonbreak' relevants from Disk1 for training. `Nonbreak' means documents that do not get split into multiple subdocuments based on our criteria given in Section 3. The idea is that the quality of documents for training is important, and short relevants are the choice. They may not be those ranked early during a retrieval. With these simplifications, a network is produced with Disk1, the query-term edges are trained, and then stored for later routing retrieval using Disk3.
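As an illustration, the following minimal sketch shows this selection step under stated assumptions: judged_relevant and split_into_subdocuments are hypothetical helper functions standing in for the Disk1 relevance judgements and the Section 3 splitting criteria, not the actual PIRCS routines.

    # Sketch: choose `nonbreak' relevant documents from Disk1 as training samples.
    # judged_relevant and split_into_subdocuments are hypothetical stand-ins for
    # the relevance judgements and the Section 3 splitting criteria.
    def select_training_documents(disk1_docs, query_id,
                                  judged_relevant, split_into_subdocuments):
        training = []
        for doc in disk1_docs:
            if not judged_relevant(query_id, doc):
                continue                              # keep only known relevants
            if len(split_into_subdocuments(doc)) == 1:
                training.append(doc)                  # `nonbreak': document stays whole
        return training

The point of the selection is that only short, unsplit relevants are retained, so no ranking of the training collection is required.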
Ad hoc pircs4 denotes retrieval based on combining the baseline pircs3 with a soft-Boolean retrieval. The pircs4 ranking formula becomes r*W1 + s*S1 (see definitions in Section 2). Our Boolean expressions for queries are produced automatically as discussed in Section 4.3, and edge weights are used to initialize the leaf nodes of the Boolean expression tree.
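To make the combination concrete, the sketch below scores one document by evaluating a Boolean expression tree whose leaves carry edge weights and mixing the result with the baseline score. The min/max soft operators are an assumed illustration only; the actual operators, leaf weights and constants r and s are those defined in Sections 2 and 4.3, not reproduced here.

    # Sketch: combined ad hoc score r*W + s*S for one document.  W is the baseline
    # (pircs3) retrieval status value; S is a soft-Boolean score over the query's
    # expression tree.  The min/max operators are an assumed stand-in, not the
    # operators defined in the paper.
    def soft_boolean(node, doc_term_weights):
        op, payload = node                  # ('LEAF', term) or ('AND'/'OR', [children])
        if op == 'LEAF':
            return doc_term_weights.get(payload, 0.0)   # leaf initialized with edge weight
        scores = [soft_boolean(child, doc_term_weights) for child in payload]
        return min(scores) if op == 'AND' else max(scores)

    def combined_score(W, tree, doc_term_weights, r, s):
        return r * W + s * soft_boolean(tree, doc_term_weights)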
7. Discussion of Submitted Results
From our routing retrieval table in the master appendix of this volume, it can be seen that pircs2 improves over pircs1 by about 7% based on average non-interpolated precision (.266 vs .249) and about 3.8% based on relevants retrieved (6135 vs 5913), showing that our simplified method of using only the Disk1 `nonbreak' training documents still works. We did not do a retrieval and ranking. Compared with the other sites, our result is below median both using the average non-interpolated precision for individual queries (18 better, 2 equal and 30 below median), and using the relevants retrieved at 100 documents (18 better, 8 equal and 24 below median). If we assume the existence of an overall `maxi-system' that produces the best non-interpolated precision values among all sites for all 50 queries, then its average precision over all queries is 0.5054 with 8348 relevants retrieved. Our pircs2 achieves only .266/.505 = 52.7% of the average precision but 6135/8348 = 73.5% of the relevants retrieved.
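A short sketch of how such a hypothetical `maxi-system' figure can be assembled from per-query results follows; the data structure is a placeholder, mapping each run name to its per-query average precision, and every run is assumed to report all 50 queries.

    # Sketch: the `maxi-system' takes, for every query, the best non-interpolated
    # average precision achieved by any submitted run, then averages over queries.
    # `results' maps run name -> {query id -> average precision}; placeholder data,
    # and every run is assumed to cover every query.
    def maxi_system_average_precision(results):
        query_ids = next(iter(results.values())).keys()
        best = [max(run[q] for run in results.values()) for q in query_ids]
        return sum(best) / len(best)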
From the ad hoc retrieval table in the appendix of this volume it can be seen that pircs4, which is pircs3 combined with automatic soft-Boolean retrieval, improves over pircs3 only by about 1%. Processing time, however, increases substantially. Our automatic Boolean expressions are crudely formed; manual Boolean queries may do better. Compared with other sites, our result is above median both using the average non-interpolated precision for individual queries (34 better, 2 equal and 14 below median), and using the relevants retrieved at 100 documents (36 better, 4 equal and 10 below median). The `maxi-system' has an average precision over all queries of 0.4354 and 9027 relevants retrieved. pircs4 achieves about 0.298/0.435 = 68.5% of this best precision value and 7464/9027 = 82.7% of the
relevants retrieved. These are much better than the routing figures. It would be most useful and interesting if one could choose the best reported result for each query before the answers are known. For these experiments our high-frequency term cut-off is 16000, which is still too low. The next section discusses our later results.
8. Further Experimental Results
After the TREC-2 Conference, we decided to repeat both experiments. We realized that our disappointing results are due to several factors: 1) a bad high-frequency term cut-off leading to insufficient representation; 2) no query expansion; 3) insufficient training samples; and 4) parameters needing tuning. Except for 4), these are remedied as follows: the high-frequency cut-off is set at 50000, learning for routing is done from both Disk1 and Disk2 with only documents that `break' into six or fewer subdocuments being used, and query expansion is also done. The runs are named in Table 2 as:
pircs5: routing, with learning but no query expansion;
pircs6: routing, query expansion level of 20;
pircs7: routing, `upperbound', no expansion;
pircs8: ad hoc without Boolean queries.
As in TREC-1, our query expansion level of 20 actually adds fewer than 20 terms because some of the top-ranked candidate terms may already appear in the query.
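A minimal sketch of this behaviour is given below; the candidate ranking (derived from the training documents) is assumed, not computed, and the function name is hypothetical.

    # Sketch: expansion at level 20 merges the 20 top-ranked candidate terms into
    # the query; candidates already present contribute nothing, so fewer than 20
    # genuinely new terms may result.
    def expand_query(query_terms, ranked_candidates, level=20):
        expanded = set(query_terms)
        expanded.update(ranked_candidates[:level])
        return expanded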
It can be seen that the results are substantially better than those in Section 7. In particular, pircs6, routing with query expansion, has an average precision of 0.355 and 7476 relevants retrieved out of 10489. These are respectively 12% and 5% better than pircs5 (0.318, 7098), routing with learning but no query expansion, and achieve 70.3% and 89.6% of the maxi-system values. The corresponding average precision and relevants retrieved for ad hoc retrieval pircs8 are 0.344 and 8279 out of 10785, representing 79% and 91.7% of the ad hoc maxi-system respectively. At 20 documents retrieved, the precision values for routing and ad hoc are respectively 0.583 and 0.564. This means that, averaging over 50 queries, over 11 of the first 20 documents retrieved are relevant. Considering the size of these textbases, these are quite good results. These numbers are user-oriented, and users naturally hope to see 100% precision. As discussed in TREC-1, from a system point of view the precision at n documents retrieved should not be compared to the theoretical value of 1.0, but to an operational maximum value of x/n if the total number of relevants x for a query is less than n. For example, at n=100 documents retrieved, 20 routing and 16 ad hoc queries have total relevants x less than 100. The operational maximum precision averaged over 50 queries for routing is only 0.8, and that for ad hoc is 0.871.
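This ceiling can be made explicit with a short sketch; the per-query relevant counts are placeholder inputs.

    # Sketch: operational maximum precision at cutoff n.  A query with x relevant
    # documents in total can reach at most min(x, n)/n precision at n retrieved,
    # so the achievable ceiling averaged over a query set lies below 1.0.
    def operational_max_precision(total_relevants_per_query, n=100):
        ceilings = [min(x, n) / n for x in total_relevants_per_query]
        return sum(ceilings) / len(ceilings)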
At 100 documents, the routing pircs6 value of 0.439 and the ad hoc pircs8 value of 0.468 therefore achieve 54.9%