NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Knowledge-Based Searching with TOPIC
J. Lehman
C. Reid
Edited by D. K. Harman, National Institute of Standards and Technology
3.4.1 ROUTING TOPICS
Overall, Topic's performance on the routing topics was rather good. By our count, 21 of the 50 results were at or above the median, and three were actually the best score. Most of the other results were on the low side of the median. The comparison to the median is summarized in Figure 2. The exceptions were topics 66, 67, 69, 74, 90 and 91, for which the Topic search used could be said to have failed. Several of these are straightforwardly explained. For example, in the case of topic 67 the wrong results were submitted; our independent scoring of the correct result set would give the Topic search a below-median score. For topic 69 there was in fact only one relevant document, but, at least in our reading of the definition, this seems to be a false positive. In the case of topics 90 and 91 the Topic search definitions were, in our opinion, over-constrained. Further, in the case of topic 91 an index creation decision prevented a quite reasonable Topic definition from performing as well as it could.[2] The other two topics are of more interest.
No clear pattern relating performance to the type of search emerged, although, in the routing augmentation category, Topic's performance was well above the median on 20 of 33 searches.
3.4.1.1 ROUTING TOPIC 66
A relevant document for this topic is one that identifies a
type of natural language processing technology that is
being developed or marketed in the United States. The
original definition of the Topic is basically a
conjunction (AND) of a natural language concept and a
products/technology concept.
Performance was very poor, viz:
Relevant = 86
Rel_ret = 1
R-Precision = 0.0000
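(For reference, R-Precision is precision at rank R, where R is the number of relevant documents for the topic; a value of 0.0000 here means that none of the 86 top-ranked documents was relevant.)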
Inspection of the Topic revealed that one of the conjuncts (the products/technology concept) had a weight of 0.05, thus effectively compressing the scores that Topic could produce into an extremely narrow range.
[2] This topic is about the acquisition of advanced weapons by the U.S. Army. One of the weapons systems mentioned in the information need statement is the M-1 tank. This was included in the Topic definition as the word "M-1"; but since the "-" symbol was treated like a space at database build time, there was no possibility of retrieving documents based on "M-1" as a word.
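A minimal Python illustration of that build-time behavior, assuming a tokenizer that maps "-" to whitespace (the actual indexer's rules are not documented here):

    import re

    # If the indexer treats "-" as a space, "M-1" is stored as the two
    # tokens "m" and "1", so no single word "m-1" can ever match.
    tokens = re.split(r"[\s\-]+", "The M-1 tank".lower())
    print(tokens)   # prints: ['the', 'm', '1', 'tank']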
We changed the 0.05 to 0.5 and produced the following:
Relevant = 86
Rel_ret = 44
R-Precision = 0.2442
which is a median result.
We concluded that for Topics to be effective we need to
ensure a sufficient range of scores to give us the
discrimination needed for the TREC scoring algorithm.
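To make the score-range effect concrete, here is a minimal Python sketch. The combining function is an assumption made for illustration (the minimum of the weight-scaled child scores, so the weakest weighted conjunct caps the node); the details of Topic's actual scoring function may differ:

    import random

    def and_score(child_scores, weights):
        # Hypothetical AND node: scale each child's score by its weight
        # and take the minimum; one low-weighted child caps the node.
        return min(s * w for s, w in zip(child_scores, weights))

    random.seed(0)
    docs = [(random.random(), random.random()) for _ in range(1000)]

    for w_tech in (0.05, 0.5):
        scores = [and_score(d, (1.0, w_tech)) for d in docs]
        print("weight = %.2f: scores span %.4f .. %.4f"
              % (w_tech, min(scores), max(scores)))

With the products/technology conjunct weighted 0.05, every document scores in the interval [0, 0.05], leaving almost no room for discrimination; raising the weight to 0.5 restores a usable spread.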
3.4.1.2 ROUTING TOPIC 74
A relevant document for this topic is one that cites an
instance in which the U.S. Government propounds two
conflicting or opposing policies. The routing task is complicated because the conflicting policies will not necessarily be mentioned in the same document.
In our opinion, this is a case where no amount of
sophistication in Topic construction would enable Topic
to do very well. The information need is simply outside the scope of a retrieval system that uses non-NLP techniques. The best one could hope for is to model a document that talks about the meta-idea of conflict (i.e., find documents that talk about the U.S. having conflicting policies, rather than documents that reference the specific conflicting policy). This is, in fact, what was done in the original submission. The results were:
Relevant = 323
Rel_ret = 18
R-Precision = 0.0464
which is, of course, rather poor.
The original statement of need actually mentions three examples of conflicting policies, so, as an experiment, we ran the following query:
* <Many><Stem>
    /wordtext = "tobacco"
* <Many><Stem>
    /wordtext = "pesticide"
* <Many><Phrase>
    * <Many><Stem>
        /wordtext = "infant"
    * <Many><Stem>
        /wordtext = "formula"
that is, just an ACCRUE of "tobacco", "pesticide", and "infant formula" (with the modifications that the <Stem> and <Many> operators produce).
This gave the following results:
Relevant = 323
Rel_ret = 107
R-Precision = 0.2660
which puts the score slightly above the median. We suspect that most TREC-2 participant sites did just this, and that those which did much better than the median found some other specific examples of a conflicting policy and modeled these in their routing queries.
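For readers unfamiliar with the operator, an ACCRUE node accumulates evidence from its children, so that a document matching more of them scores higher. The exact combining function Topic uses is not given here; the Python sketch below uses a probabilistic sum as one plausible stand-in, with purely illustrative scores:

    def accrue(child_scores):
        # Hypothetical ACCRUE combiner: probabilistic sum,
        # 1 - prod(1 - s); each matching child raises the score
        # toward 1.0 without ever exceeding it.
        result = 1.0
        for s in child_scores:
            result *= 1.0 - s
        return 1.0 - result

    # A document matching both example policies outranks one that
    # matches only a single concept.
    print(accrue([0.8, 0.7]))  # both concepts present -> 0.94
    print(accrue([0.8, 0.0]))  # one concept present   -> 0.80

Either way, the practical point stands: enumerating the specific conflicting-policy examples from the need statement, and letting the ACCRUE reward a match on any of them, recovers far more of the relevant set than modeling the meta-idea of conflict.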