NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
TREC-2 Document Retrieval Experiments using PIRCS
K. Kwok
L. Grunfeld
a retrieval (ranking) of Disk1 is still expensive and perhaps not necessary; rather, we just select `nonbreak' relevants from Disk1 for training. `Nonbreak' means documents that do not get split into multiple subdocuments based on our criteria given in Section 3. The idea is that the quality of documents for training is important, and short relevants are the choice. They may not be those ranked early during a retrieval. With these simplifications, a network is produced with Disk1, the query-term edges are trained, and then stored for later routing retrieval using Disk3.
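As an illustration, the following minimal sketch shows this selection step under stated assumptions: judged_relevant and split_into_subdocuments are hypothetical helper functions standing in for the Disk1 relevance judgements and the Section 3 splitting criteria, not the actual PIRCS routines.

    # Sketch: choose `nonbreak' relevant documents from Disk1 as training samples.
    # judged_relevant and split_into_subdocuments are hypothetical stand-ins for
    # the relevance judgements and the Section 3 splitting criteria.
    def select_training_documents(disk1_docs, query_id,
                                  judged_relevant, split_into_subdocuments):
        training = []
        for doc in disk1_docs:
            if not judged_relevant(query_id, doc):
                continue                              # keep only known relevants
            if len(split_into_subdocuments(doc)) == 1:
                training.append(doc)                  # `nonbreak': document stays whole
        return training

The point of the selection is that only short, unsplit relevants are retained, so no ranking of the training collection is required.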
Ad hoc pircs4 denotes retrieval based on combining the baseline pircs3 with a soft-Boolean retrieval. The pircs4 ranking formula becomes r*W1 + s*S1 (see definitions in Section 2). Our Boolean expressions for queries are produced automatically as discussed in Section 4.3, and edge weights are used to initialize the leaf nodes of the Boolean expression tree.
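To make the combination concrete, the sketch below scores one document by evaluating a Boolean expression tree whose leaves carry edge weights and mixing the result with the baseline score. The min/max soft operators are an assumed illustration only; the actual operators, leaf weights and constants r and s are those defined in Sections 2 and 4.3, not reproduced here.

    # Sketch: combined ad hoc score r*W + s*S for one document.  W is the baseline
    # (pircs3) retrieval status value; S is a soft-Boolean score over the query's
    # expression tree.  The min/max operators are an assumed stand-in, not the
    # operators defined in the paper.
    def soft_boolean(node, doc_term_weights):
        op, payload = node                  # ('LEAF', term) or ('AND'/'OR', [children])
        if op == 'LEAF':
            return doc_term_weights.get(payload, 0.0)   # leaf initialized with edge weight
        scores = [soft_boolean(child, doc_term_weights) for child in payload]
        return min(scores) if op == 'AND' else max(scores)

    def combined_score(W, tree, doc_term_weights, r, s):
        return r * W + s * soft_boolean(tree, doc_term_weights)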
7. Discussion of Submitted Results
From our routing retrieval table in the master appendix of this volume, it can be seen that pircs2 improves over pircs1 by about 7% based on average non-interpolated precision (.266 vs .249) and about 3.8% based on relevants retrieved (6135 vs 5913), showing that our simplified method of using only the Disk1 `nonbreak' training documents still works. We did not do a retrieval and ranking. Compared with the other sites, our result is below median both using the average non-interpolated precision for individual queries (18 better, 2 equal and 30 below median), and using the relevants retrieved at 100 documents (18 better, 8 equal and 24 below median). If we assume the existence of an overall `maxi-system' that produces the best non-interpolated precision values among all sites for all 50 queries, then its average precision over all queries is 0.5054 with 8348 relevants retrieved. Our pircs2 achieves only .266/.505 = 52.7% of the average precision but 6135/8348 = 73.5% of the relevants retrieved.
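A short sketch of how such a hypothetical `maxi-system' figure can be assembled from per-query results follows; the data structure is a placeholder, mapping each run name to its per-query average precision, and every run is assumed to report all 50 queries.

    # Sketch: the `maxi-system' takes, for every query, the best non-interpolated
    # average precision achieved by any submitted run, then averages over queries.
    # `results' maps run name -> {query id -> average precision}; placeholder data,
    # and every run is assumed to cover every query.
    def maxi_system_average_precision(results):
        query_ids = next(iter(results.values())).keys()
        best = [max(run[q] for run in results.values()) for q in query_ids]
        return sum(best) / len(best)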
From the ad hoc retrieval table in the appendix of this volume it can be seen that pircs4, which is pircs3 combined with automatic soft-Boolean retrieval, improves over pircs3 only by about 1%. Processing time, however, increases substantially. Our automatic Boolean expressions are crudely formed; manual Boolean queries may do better. Compared with other sites, our result is above median both using the average non-interpolated precision for individual queries (34 better, 2 equal and 14 below median), and using the relevants retrieved at 100 documents (36 better, 4 equal and 10 below median). The `maxi-system' has an average precision over all queries of 0.4354 and 9027 relevants retrieved. pircs4 achieves about 0.298/0.435 = 68.5% of this best precision value and 7464/9027 = 82.7% of the
relevants retrieved. These are much better than the routing figures. It would be most useful and interesting if one could choose the best reported result for each query before the answers are known. For these experiments our high-frequency term cut-off is 16000, which is still too low. The next section discusses our later results.
8. Further Experimental Results
After the TREC-2 Conference, we decided to repeat both experiments. We realized that our disappointing results are due to several factors: 1) a bad high-frequency term cut-off leading to insufficient representation; 2) no query expansion; 3) insufficient training samples; and 4) parameters needing tuning. Except for 4), these are remedied as follows: the high-frequency cut-off is set at 50000, learning for routing is done from both Disk1 and Disk2 with only documents that `break' into six or fewer subdocuments being used, and query expansion is also done. The runs are named in Table 2 as:
pircs5: routing, with learning but no query expansion;
pircs6: routing, query expansion level of 20;
pircs7: routing, `upperbound', no expansion;
pircs8: ad hoc without Boolean queries.
As in TREC-1, our query expansion level of 20 actually adds fewer than 20 terms because some of the top-ranked candidate terms may already appear in the query.
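A minimal sketch of this behaviour is given below; the candidate ranking (derived from the training documents) is assumed, not computed, and the function name is hypothetical.

    # Sketch: expansion at level 20 merges the 20 top-ranked candidate terms into
    # the query; candidates already present contribute nothing, so fewer than 20
    # genuinely new terms may result.
    def expand_query(query_terms, ranked_candidates, level=20):
        expanded = set(query_terms)
        expanded.update(ranked_candidates[:level])
        return expanded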
It can be seen that the results are substantially better than those in Section 7. In particular, pircs6, routing with query expansion, has an average precision of 0.355 and 7476 relevants retrieved out of 10489. These are respectively 12% and 5% better than pircs5 (0.318, 7098), routing with learning but no query expansion, and achieve 70.3% and 89.6% of the maxi-system values. The corresponding average precision and relevants retrieved for ad hoc retrieval pircs8 are 0.344 and 8279 out of 10785, representing 79% and 91.7% of the ad hoc maxi-system respectively. At 20 documents retrieved, the precision values for routing and ad hoc are respectively 0.583 and 0.564. This means that, averaging over 50 queries, over 11 of the first 20 documents retrieved are relevant. Considering the size of these textbases, these are quite good results. These numbers are user-oriented, and users naturally hope to see 100% precision. As discussed in TREC-1, from a system point of view the precision at n documents retrieved should not be compared to the theoretical value of 1.0, but to an operational maximum value of x/n if the total number of relevants x for a query is less than n. For example, at n=100 documents retrieved, 20 routing and 16 ad hoc queries have total relevants x less than 100. The operational maximum precision averaged over 50 queries for routing is only 0.8, and that for ad hoc is 0.871.
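This ceiling can be made explicit with a short sketch; the per-query relevant counts are placeholder inputs.

    # Sketch: operational maximum precision at cutoff n.  A query with x relevant
    # documents in total can reach at most min(x, n)/n precision at n retrieved,
    # so the achievable ceiling averaged over a query set lies below 1.0.
    def operational_max_precision(total_relevants_per_query, n=100):
        ceilings = [min(x, n) / n for x in total_relevants_per_query]
        return sum(ceilings) / len(ceilings)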
At 100 documents, the routing pircs6 value of 0.439 and the ad hoc pircs8 value of 0.468 therefore achieve 54.9%