SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Text Retrieval with the TRW Fast Data Finder
chapter
M. Mettler
National Institute of Standards and Technology
Donna K. Harman
define ATT "[AT\&[|amp\;]T|AT and T|\
American Telephone [and|\&] Telegraph]" end
The "[|amp;]" notation allows an optional "amp;", as was present in the Ziff
database. We had the high score on Topic 10 "AIDS Treatments". This may be due to our
ability to easily find phrases like "acquired immune deficiency syndrome" or "AIDS
related complex" in close proximity to drug names like "ThA", "5-fluorouracil", or "AZT".
We also had the high score on Topic 13, which asked for documents about Mitsubishi Heavy
Industries. Our query, which found 111 of the 112 documents that NIST judged relevant,
simply searched for the two-word phrase "Mitsubishi Heavy". Apparently the other TREC
participants had trouble either finding phrases or determining the need to find phrases
during query generation. The following sections discuss in detail Topic 47, where we
achieved the high score, and Topic 36, where we achieved a low score.
4.1 Example of Good Performance - Topic 47
Topic 47 was to find documents discussing new contracts for computer systems in excess
of $1 million. We found 80 good documents out of 200 submitted, the high score for this
topic. We believe we did well on this topic because we were able to look for various
numeric representations of $1 million in close proximity to keywords for new contracts and
computer systems.
To be relevant, a document must identify the selection of a
source for the development or delivery of information systems
products or services valued at more than $1 million dollars.
The PSL query for this topic used three subqueries, one each for the "selection of a source",
"information systems products or services", and "more than $1 million dollars". The PSL
definitions were:
define award {3 words -> "[sign|award]*" and "contract"} end
define computer
"[computer|communic|network|phone|telecomm|mainframe|
Starlan|PBX|cyber|IBM 3090|X\-MP|Y\-MP|SCS\-40|information]" end
define million {1 word -> "[million|billion][OCRerr]dollar" or
"[|[0-9]][|[0-9]][0-9][|\.][|[0-9]][|[0-9]][|[0-9]]\
[million|billion]" or
"[|[0-9]][|[0-9]][0-9][|\,][0-9][0-9][0-9][|\,]\
[0-9][0-9][0-9]"} end
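As a rough illustration only, the two numeric alternatives in the "million" definition can be approximated with ordinary regular expressions. This is a sketch in Python, not PSL and not the Fast Data Finder's matching hardware. The names WORD_FORM, DIGIT_FORM, and mentions_large_amount are hypothetical, and the optional whitespace and case-insensitive matching are assumptions. The first alternative of the definition is partly unreadable in the source and is left out.

```python
import re

# Assumed reading of the first numeric alternative: one to three digits, an
# optional decimal point, up to three more digits, then "million" or "billion".
# The optional whitespace and case-insensitivity are assumptions.
WORD_FORM = re.compile(r"\d{1,3}\.?\d{0,3}\s?(?:million|billion)", re.IGNORECASE)

# Assumed reading of the second numeric alternative: seven to nine digits with
# optional comma separators, such as 1,000,000 or 25000000.
DIGIT_FORM = re.compile(r"\d{1,3},?\d{3},?\d{3}")


def mentions_large_amount(text: str) -> bool:
    """Return True if either numeric form appears anywhere in the text."""
    return bool(WORD_FORM.search(text) or DIGIT_FORM.search(text))
```

For example, mentions_large_amount("a $1.5 million contract") and mentions_large_amount("valued at $2,500,000") both return True, while "a $950,000 order" does not match either form.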
The "award" definition requires the root words "sign" or "award" to be within 3 words of
"contract" in the text. This word count includes stop words, acronyms, and any other
alphanumerics that were in the original text. This definition will find phrases like:
a contract was awarded
AT&T signed a new contract
Bellcore was awarded three new contracts.
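The proximity rule described above can be sketched in Python as follows. This is a minimal illustration, not the PSL implementation: the function name within_n_words, the tokenizer, and the reading of "within 3 words" as a position difference of at most 3 in either order are all assumptions. Root terms are matched as word prefixes, so "award" matches "awarded" and "contract" matches "contracts".

```python
import re


def within_n_words(text, roots, target, n=3):
    """Check whether any root term occurs within n words of the target term.

    Every alphanumeric token counts as a word, including stop words and
    acronyms. Prefix matching stands in for PSL root-word matching (assumed).
    """
    tokens = re.findall(r"[a-z0-9&]+", text.lower())
    root_positions = [
        i for i, tok in enumerate(tokens)
        if any(tok.startswith(root) for root in roots)
    ]
    target_positions = [
        i for i, tok in enumerate(tokens) if tok.startswith(target)
    ]
    # Order is not enforced: the root may precede or follow the target.
    return any(abs(i - j) <= n for i in root_positions for j in target_positions)
```

Under these assumptions, all three example phrases above satisfy within_n_words(phrase, ["sign", "award"], "contract"). Plain prefix matching is cruder than true root matching: "sign" would also match "signal" or "significant".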
The "computer" definition looks for any of the root terms shown. Note that looking for
alphanumerics such as "X-MP" or "IBM 3090", which may include multiple character white