3.4.1.3 TOPIC 67
Our analysis located examples of weak topic formulation,
such as query 67, illustrated in Figure 4. In this query, a
set of optional, auxiliary evidence was "ANDed" with a
small set of required evidence. The weight, or strength,
assigned to the auxiliary evidence was .05, which means
that even if all auxiliary terms were located, the highest
possible score for a document would be .05, severely
limiting the range of scores and thus leading to
random false hits in the top 1000.
To make a cosmetic improvement, only the weight of the
auxiliary evidence node was changed, to .5, as
shown in Figure 5. This change alone brought the Topic's
relevant document count up to the median.
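To make the effect of this weight concrete, the short Python sketch below assumes a TOPIC-like scoring model in which an AND node scores a document as the minimum of its children's weighted scores; this is an illustrative assumption on our part, not a description of Verity's actual engine.

    # Illustrative only: assumes AND = minimum of the children's
    # weighted scores, with evidence scores normalized to [0, 1].
    def and_node(child_scores):
        return min(child_scores)

    def weighted(score, weight):
        return weight * score

    required = 1.0   # required evidence fully matched
    auxiliary = 1.0  # every optional, auxiliary term matched

    # Original formulation: the .05 weight on the auxiliary node caps the score.
    print(and_node([required, weighted(auxiliary, 0.05)]))   # 0.05

    # After the change shown in Figure 5: weight .5 restores a usable range.
    print(and_node([required, weighted(auxiliary, 0.5)]))    # 0.5

Under this assumption the document score can never exceed the weight on the ANDed auxiliary node, which is why raising it from .05 to .5 widens the score range so dramatically.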
3.4.2 AD HOC TOPICS
Overall, Verity's performance on the ad hoc topics was
adequate. Performance was poorer than on the routing
topics, but this is to be expected since there was less
time available to build the Topics and no ground truth
against which to test the Topic trees. The relevant
comparison to the median is summarized in Figure 3.
We count 13 of the 50 results at or above the
median. In contrast, though, there were only two
outright failures here, topics 124 and 139. We did not
look at topic 139, but topic 124 involves searching for
documents that discuss innovative approaches to cancer
therapy that do not involve any of the traditional
treatments. This is a very hard topic because nearly all
mentions of the innovative treatments are in the context
of discussion of traditional therapies. The approach
adopted by Verity of simply looking for documents that
talk about innovative treatment produces a large number
of false hits (giving poor precision), and since there is an
artificial cut-off at 1000 documents in the TREC
experiments, this model also produces poor recall. We
do not see an obvious solution to this.
We picked three ad hoc topics to analyze in detail.
3.4.2.1 AD HOC TOPIC 109
A relevant document for this topic simply needs to
mention one of a list of six companies given in the
information need statement. A simple Topic that is the
disjunction (OR) of the company names should be all
that is needed here. However, the official result is:
Relevant = 742
Rel_ret = 192
R-Precision = 0.2588
which is well below the median. Furthermore, given the
simplicity of the topic, this is surprisingly low recall.
Examination of the official Topic showed that the company
acronyms used for three of the companies (i.e., 3M,
OTC, ISI) were given equal weight to the fully spelled-out
company names. A cursory review of the original
out company names. A cursory review of the original
hit list showed that ISI was a poor choice since it has
multiple interpretations. Less important, but for the
same reason, OTC is a poor choice in the Wall Street
Journal corpus since it can mean "over the counter", and
in the DOE corpus 3M is part of a designator for a
particular particle accelerator and is also used as an
abbreviation for "three meters".
We modified the Topic by eliminating the ISI acronym
and by giving OTC and 3M reduced weights. This
produced the following:
Relevant = 742
Rel_ret = 480
R-Precision = 0.5512
which would have been the best score.
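As a rough illustration of this modification, the sketch below assumes an OR node that scores a document as the maximum of its matched children's weights (again an assumption, not Verity's actual TOPIC syntax); the spelled-out company names are placeholders and the reduced weights are hypothetical.

    # Illustrative only: naive substring matching, OR = maximum weight
    # among the matched children.
    evidence = {
        "<spelled-out company name>": 1.0,  # full names keep full weight
        "3m": 0.4,    # reduced: accelerator designator, "three meters" in DOE
        "otc": 0.4,   # reduced: "over the counter" in the Wall Street Journal
        # "isi" removed entirely: too many unrelated interpretations
    }

    def or_node(document_text, evidence):
        text = document_text.lower()
        matched = [w for term, w in evidence.items() if term in text]
        return max(matched, default=0.0)

    # A WSJ story about drugs sold over the counter now scores 0.4 rather
    # than 1.0, so it is less likely to displace a relevant document.
    print(or_node("... sold over the counter (OTC) in most states ...", evidence))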
An interesting note here is that the original and modified
Topics had perfect precision and recall for the first 100
documents. Our conclusion is that this was indeed an
easy topic - the false hits produced by ISI were what
impacted the Topic's score.
3.4.2.2 AD HOC TOPIC 121
A relevant document for this topic had to mention
the death of a prominent U.S. citizen due to an identified
form of cancer.
This is an interesting topic consisting of two major
components - the idea of a prominent citizen, and the
idea of a specific cancer.
In the official Topic, prominence was modeled using a
number of words that indicate prominence (e.g.,
"prominent", "celebrity") together with words that
indicate prominent roles (e.g., "Nobel Prize", "actor",
"actress"). Cancer death was modeled by various
combinations of death words (e.g., "death", "died") and
cancer words (e.g., "cancer", "tumor", "leukemia"). The
official score was:
Relevant = 55
Rel_ret = 27
R-Precision = 0.1455
which, while not good in absolute terms, was well
above the median.
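The structure just described can be pictured with the same illustrative min/max model used above (AND as the minimum of its children, OR as the maximum); the term lists are abbreviated from the examples in the text, the combinations are simplified, and the sketch is not the official Topic itself.

    # Illustrative only: abbreviated term lists, binary evidence scores.
    PROMINENCE = {"prominent", "celebrity", "nobel prize", "actor", "actress"}
    DEATH = {"death", "died"}
    CANCER = {"cancer", "tumor", "leukemia"}

    def evidence(text, terms):
        return 1.0 if any(term in text for term in terms) else 0.0

    def topic_121_score(document_text):
        text = document_text.lower()
        prominent = evidence(text, PROMINENCE)         # OR over prominence words
        cancer_death = min(evidence(text, DEATH),      # death words ANDed ...
                           evidence(text, CANCER))     # ... with cancer words
        return min(prominent, cancer_death)            # AND of the two components

    print(topic_121_score("The actor died of leukemia last week."))  # 1.0
    print(topic_121_score("A new tumor therapy was announced."))     # 0.0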
We observed two problems with this definition. First, it
uses generic cancer terms rather than the specific cancer
types required by the information need statement. So,
we made all the cancer terms specific by using a list of
common cancers (e.g., lung cancer, breast cancer,
stomach cancer, etc.). We made no attempt to make