NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Retrieval Experiments with a Large Collection using PIRCS
K. Kwok
L. Papadopoulos
K. Kwan
National Institute of Standards and Technology
Donna K. Harman
(b) Some of the topics have very specific requirements for documents to be relevant. For example: Topic
#1 needs antitrust cases as a result of complaint, not routine review; #2 needs acquisitions between a U.S.
company and another non-U.S. company; #53 needs leveraged buyout cases valued at or above $200
million; while #60 requires a policy change from merit-pay vs. seniority or vice versa. Data like `above
$200 million' or other numerics are removed either because they are on our stopword list or because of
high frequency. Other topics involve very general concepts that require the system to understand their
specific inferences. Examples are: course of action to decrease the U.S. deficit (#7); the body of water
being polluted (#12); specific commercial applications of superconductors (#21); hypocritical and
conflicting policies of the U.S. government (#74). The possible `course of action', `commercial
applications', `conflicting policies', etc. are essentially open-ended. Yet others need synonym lists or other
aids to interpret proper terms in order not to miss documents. Examples are: Japanese, U.S. or foreign
companies (#2, #3), European Community or countries (#5, #69), third world or developing countries (#4, #6),
or economic indicators (#8). When does a proper noun, if identifiable, represent a company? And if
it does, is it a foreign company? Also, a few of the topics, like #3, 15, 53, 56, 66, have short descriptions
with many general words, so that after stemming and stop-word processing, these queries end up with few
terms. PIRCS does not have tools for these problems. Its precision values at 50% recall and at 11-pt Avg
for ad hoc and routing retrievals are tabulated below:
                         Ad Hoc               Routing
                    PIRCS1   PIRCS2      PIRCS1   PIRCS2
------------------------------------------------------------
50% Recall           .276     .278        .340     .342
11-pt Avg            .311     .322        .343     .369
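The 11-pt Avg figures in the table are interpolated average precision values. As a rough illustration (not code from the paper), the measure can be sketched as follows, where `ranked_relevance` is a hypothetical list of 0/1 relevance judgements for a ranked retrieval run:

```python
def eleven_point_avg(ranked_relevance, total_relevant):
    """Interpolated precision averaged over the eleven recall levels
    0.0, 0.1, ..., 1.0 for a single query.  Interpolated precision at
    a recall level is the maximum precision at any recall >= that level."""
    precisions, recalls = [], []
    hits = 0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
        precisions.append(hits / rank)
        recalls.append(hits / total_relevant)
    avg = 0.0
    for level in [i / 10 for i in range(11)]:
        # best precision achievable at recall >= level (0 if unreachable)
        avg += max((p for p, r in zip(precisions, recalls) if r >= level),
                   default=0.0)
    return avg / 11

# Toy ranking with 3 relevant documents in total:
print(eleven_point_avg([1, 0, 1, 1, 0], 3))
```

In the actual evaluation this per-query value is averaged over all queries.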
The ad hoc precision value of about 0.28 at 50% recall says that, averaged over 25 queries, if one wants
to retrieve half of all relevant documents, one would have to read about eleven documents to get three
relevant ones; for routing, about nine documents yield three relevant. Routing queries receive some help
from the few relevant documents provided for training purposes, just as in experiments first reported in [8]
where we simulate users posing queries equipped with some known relevant documents. The 11-pt Avg
precision values sample eleven recall points and simulate a uniform distribution of users with different
recall needs, and may reflect actual usage better. Effectiveness improves to between three in ten and three
in nine retrieved documents being relevant for ad hoc, and slightly better for routing. These results
naturally leave much room for improvement; but considering that PIRCS1 is fully automatic and relies only
on statistical methods, the results seem reasonable for this large WSJ collection. See also the analysis in (c) and (f).
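The "documents read per relevant document" interpretation above is simple arithmetic: at precision p, obtaining r relevant documents requires reading about r/p documents. A quick illustrative check (not from the paper):

```python
def docs_to_read(precision, wanted_relevant):
    """Expected number of documents one must read to see `wanted_relevant`
    relevant ones, assuming relevant documents arrive at rate `precision`."""
    return wanted_relevant / precision

# Ad hoc: precision ~0.28 at 50% recall -> about 11 reads per 3 relevant.
print(round(docs_to_read(0.28, 3)))   # 11
# Routing: precision ~0.34 -> about 9 reads per 3 relevant.
print(round(docs_to_read(0.34, 3)))   # 9
```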
(c) Another evaluation of the system is to look at the precision-recall values at different cut-off points
of 5, 15, 30, 100 and 200 retrieved documents. This may give users a better `feel' than the hypothetical
11-pt Avg. A question is what these values should be compared to. The theoretical limit is of course
1.0, for perfect recall and precision. However, this would punish the system unfairly. For example, at
15 retrieved documents, many queries have x relevant documents with x > 15. Hence the best recall at this
cut-off should be 15/x. This we call the best operational recall, in contrast to the theoretical best of 1.0.
Similarly, if x < 15, these queries would have the best operational precision at this cut-off of x/15, instead
of 1.0. We have listed below for PIRCS1 the precision-recall values at various cut-offs and also the best
operational values for comparison:
Cut-off:                 5      15      30     100     200
Ad Hoc
  Best Oper. Recall:   .146    .298    .456    .828    .916
  Recall:              .066    .130    .204    .419    .586
  Recall/Best:          45%     44%     45%     51%     64%
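The best operational measures defined above follow directly from the cut-off: a perfect system places min(cutoff, x) relevant documents in the top ranks. A minimal sketch of this calculation (illustrative, not code from the paper):

```python
def best_operational(cutoff, num_relevant):
    """Best achievable (recall, precision) at a given cut-off for a query
    with `num_relevant` relevant documents: an ideal system retrieves
    min(cutoff, num_relevant) relevant documents in the top `cutoff`."""
    best_hits = min(cutoff, num_relevant)
    return best_hits / num_relevant, best_hits / cutoff

# A hypothetical query with 40 relevant documents:
print(best_operational(15, 40))   # recall 15/40 = 0.375, precision 1.0
print(best_operational(100, 40))  # recall 1.0, precision 40/100 = 0.4
```

Averaging these per-query values over all queries gives the "Best Oper." row in the table above.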