NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Text Retrieval with the TRW Fast Data Finder
M. Mettler
Edited by Donna K. Harman, National Institute of Standards and Technology
3.2 Manual Query Generation

The poor results from our initial trials with statistical query generation led us to fall back
on a purely manual (with feedback) approach. We extracted key concepts from the topic
description and added further terms from outside knowledge or from observing them in
database documents. In building our multiple queries to provide a coarse-grain ranking, we
favored documents where the subqueries matched in lead sentences or paragraphs. We
mostly ignored the May/June NIST sample judgements.
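The FDF's own pattern specification language (PSL) is not reproduced here, so as a rough illustration only, the coarse-grain ranking idea above — subqueries counting for more when they match in a document's lead sentences — might be sketched in Python as follows. The helper names, the two-sentence definition of "lead", and the 2:1 weighting are our own assumptions, not TRW's:

```python
import re

def lead_text(doc, n_sentences=2):
    """Return the first few sentences of a document (a crude lead split)."""
    sentences = re.split(r'(?<=[.!?])\s+', doc.strip())
    return ' '.join(sentences[:n_sentences])

def coarse_rank(doc, subqueries):
    """Score a document into a coarse band: subqueries matching in the
    lead sentences count double, approximating the preference described
    in the text; matches only deeper in the body count once."""
    lead = lead_text(doc).lower()
    body = doc.lower()
    score = 0
    for pattern in subqueries:
        if re.search(pattern, lead):
            score += 2   # matched in the lead: favored
        elif re.search(pattern, body):
            score += 1   # matched only deeper in the document
    return score

doc = ("AT&T announced a new switch. Analysts were surprised. "
       "The contract covers fiber links.")
subqueries = [r'at&t', r'fiber', r'satellite']
print(coarse_rank(doc, subqueries))  # at&t in lead (+2), fiber in body (+1) -> 3
```

Documents are then binned by score into the coarse ranking bands, rather than given the fine-grained ordering a statistical system would produce.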
Refinement of the queries was done manually by executing them, reviewing the results, and
modifying the queries. The easier topics required only a few iterations, while on some of
the more difficult topics we iterated several dozen times. We stopped working on a topic
when it seemed that the results were converging to a practical limit for our approach, i.e.,
when adding further synonym keywords or altering the query structure no longer produced
better results.
4.0 Results and Analysis

Table I shows our results for the TREC routing queries. Since our system does not truly
rank the retrieved documents, we think the Table I presentation is more representative of
our performance than the 11-point averages. The first and last columns are the topic number
and description. The second column, "# Rel", is the number of relevant documents as
judged by NIST in the Volume II Corpus. The next three columns give an indication of
how the field did on the topic. "TRW Rel" is the number of relevant documents we
submitted out of the "TRW Submit" we sent in for each topic. Our scores are summarized
as follows:

    High   Above Med   Median   Below Med   Low
     8         15         4         19       2
Unlike most of the TREC participants, we did not submit the full 200 allowed documents
for each topic. This turned out to be a major blunder because the TREC scoring procedure
did not reward such self-restraint. Many of our queries were too restrictive, achieving high
precision at the expense of recall, and hence of a good score for the conference.
This problem comes about because of the binary nature of the FDF's evaluation of a query
against a document. To operate properly against a routing data stream, it is necessary to
execute several queries for each topic, with each successive query aiming for higher recall.
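One way to approximate a ranking from such binary matchers is to run the strictest query first, then append the newly retrieved documents from each successively broader tier. A minimal sketch, with made-up regular expressions standing in for PSL queries (the tier patterns and helper are illustrative assumptions, not TRW's actual queries):

```python
import re

def tiered_route(docs, tiers):
    """Produce a coarse ranking from binary matchers: run the strictest
    query tier first, then successively broader tiers, appending any
    documents not already retrieved. Within a tier, arrival order holds."""
    ranked, seen = [], set()
    for pattern in tiers:  # tiers ordered strict -> broad
        for i, doc in enumerate(docs):
            if i not in seen and re.search(pattern, doc, re.IGNORECASE):
                ranked.append(i)
                seen.add(i)
    return ranked

docs = [
    "Generic telecom news with no company named.",
    "AT&T wins long-distance contract.",
    "Regional telephone carrier expands network.",
]
tiers = [
    r'AT&T.{0,40}contract',   # strict: name plus context, high precision
    r'AT&T',                  # broader: company name alone
    r'telephone|telecom',     # broadest: topic vocabulary, high recall
]
print(tiered_route(docs, tiers))  # -> [1, 0, 2]
```

Documents retrieved by an earlier (stricter) tier sit higher in the submission, so precision-oriented matches lead and recall-oriented matches fill out the allowed 200.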
When our queries were "tuned" properly, the results were quite good. Considering only
those queries where we made the full submission, the distribution is well above the median:

    High   Above Med   Median   Below Med   Low
     6         8          3          5       0
Our analysis shows that we did well on topics where the ability to find phrases, acronyms,
numbers, and alphanumerics were important. We had the high score on topics 28 and 29,
both of which involved finding references to AT&T. Since we retain and scan the full data
stream, we didn't have to worry about an indexing parser splitting "AT&T" into "AT" and
"T" and then throwing them both away. Our PSL subquery to find AT&T was