NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1). Text Retrieval with the TRW Fast Data Finder. M. Mettler. National Institute of Standards and Technology. Donna K. Harman.

space; is no problem. The "million" subquery uses proximity and an alphanumeric sequence pattern and will find items like the following:

    a million dollar contract
    a $2.3 million system
    a $ 12 billion program
    a $2000000 machine
    a $ 2,000,000 machine

Note that the phrase "a 2,000,000 dollar award" would not be found by this definition. This was an oversight. The winning query was then simply

    [50 words -> award and computer and million]

This finds documents that contain a 50 word sliding window in which all three subqueries match. Note how the "award" subquery, which uses a 3 word sliding window, can be nested inside a query using a 50 word sliding window.

4.2 Example of Bad Performance - Topic 36

Topic 36 was to find documents discussing how rewritable optical disks work. To be relevant, a document must describe how rewritable optical disk technology works at length and in significant and comprehensive technical detail. This topic was particularly challenging because the topic narrative describes attributes the documents must have rather than specific concepts or keywords.

We started by defining a subquery to find documents mentioning rewritable optical disks.

    define optical_disk
        [10 word -> "rewrit" and "optical [disk | drive | technolog]"]
    end

To find documents that describe the technology "at length", we wrote a subquery to find places where there were at least 5000 characters between the <TEXT> marker and the </TEXT> marker.

    define LONG_TEXT
        [5000 char -> no TEXTEND]
    end

To find documents that contained "significant and comprehensive technical detail", we manually extracted a list of keywords (Table II) and required that a document have at least 10 of these terms present.
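The word-window subqueries above (the 10 word optical disk window, and likewise the 50 word window used for the earlier query) can be illustrated with a minimal Python sketch. It is not the Fast Data Finder implementation. It only mimics the matching semantics under stated assumptions: words are lowercase alphanumeric tokens, "rewrit" is treated as a word prefix, and the bracketed alternatives are treated as prefixes of the word following "optical". The helper names `tokenize`, `has_prefix`, `has_phrase`, `window_and`, and `optical_disk` are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has_prefix(prefix):
    """Build a matcher: some word in the window starts with `prefix`."""
    return lambda window: any(w.startswith(prefix) for w in window)

def has_phrase(first, second_prefixes):
    """Build a matcher: `first` is directly followed by a word that
    starts with one of `second_prefixes`."""
    prefixes = tuple(second_prefixes)
    def match(window):
        return any(a == first and b.startswith(prefixes)
                   for a, b in zip(window, window[1:]))
    return match

def window_and(tokens, width, matchers):
    """True if some run of `width` consecutive words satisfies every matcher.
    Texts shorter than `width` are checked as a single window."""
    last_start = max(len(tokens) - width, 0)
    for start in range(last_start + 1):
        window = tokens[start:start + width]
        if all(m(window) for m in matchers):
            return True
    return False

def optical_disk(text):
    """Assumed reading of the optical disk subquery: 10 word window, both parts present."""
    return window_and(
        tokenize(text),
        10,
        [has_prefix("rewrit"),
         has_phrase("optical", ["disk", "drive", "technolog"])],
    )
```

Under these assumptions, `optical_disk("A new rewritable optical disk drive was announced")` is true, while a text in which "rewritable" and "optical disk" are separated by 20 other words does not match, because no single 10 word window contains both.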
The tightest query (intended for high precision) was

    [1 document -> optical_disk and LONG and 30+ <technical terms>]

The loosest (intended for high recall) was

    [1 document -> optical_disk and 10+ <technical terms>]
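The difference between the tight and loose queries can be sketched in Python as a threshold on how many keyword-list terms a document contains, optionally combined with a length requirement. This is an illustrative approximation only. It assumes that "N+" counts distinct terms, that terms are matched as case-insensitive substrings, and that the LONG condition can be approximated by overall text length. The function names and parameters are hypothetical, and the keyword list itself (Table II) is not reproduced here, so it is passed in as an argument.

```python
def distinct_terms_present(text, terms):
    """Count how many distinct terms occur in the text
    (case-insensitive substring match, an assumption of this sketch)."""
    lowered = text.lower()
    return sum(1 for term in terms if term.lower() in lowered)

def topic36_match(text, terms, mentions_optical_disk, min_terms,
                  require_long=False, min_chars=5000):
    """Approximate the document-level query.

    mentions_optical_disk: result of the optical disk subquery, computed elsewhere.
    min_terms:             10 for the loose query, 30 for the tight one.
    require_long:          True for the tight query; LONG is approximated
                           here by total text length (an assumption).
    """
    if not mentions_optical_disk:
        return False
    if require_long and len(text) < min_chars:
        return False
    return distinct_terms_present(text, terms) >= min_terms

def tight(text, terms, mentions_optical_disk):
    """High-precision variant: long text and at least 30 terms."""
    return topic36_match(text, terms, mentions_optical_disk, 30, require_long=True)

def loose(text, terms, mentions_optical_disk):
    """High-recall variant: at least 10 terms, no length requirement."""
    return topic36_match(text, terms, mentions_optical_disk, 10)
```

Under these assumptions, any document accepted by `tight` is also accepted by `loose`, which mirrors the intended trade of recall for precision.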