NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Text Retrieval with the TRW Fast Data Finder
M. Mettler
Edited by Donna K. Harman, National Institute of Standards and Technology
3.2 Manual Query Generation

The poor results from our initial trials with statistical query generation led us to fall back
on a purely manual (with feedback) approach. We extracted key concepts from the topic
description and added further terms from outside knowledge or from observing them in
database documents. In building our multiple queries to provide a coarse-grain ranking, we
favored documents where the subqueries matched in lead sentences or paragraphs. We
mostly ignored the May/June NIST sample judgements.
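The FDF's own pattern specification language (PSL) is not reproduced here, so as a rough illustration only, the coarse-grain ranking idea above — subqueries counting for more when they match in a document's lead sentences — might be sketched in Python as follows. The helper names, the two-sentence definition of "lead", and the 2:1 weighting are our own assumptions, not TRW's:

```python
import re

def lead_text(doc, n_sentences=2):
    """Return the first few sentences of a document (a crude lead split)."""
    sentences = re.split(r'(?<=[.!?])\s+', doc.strip())
    return ' '.join(sentences[:n_sentences])

def coarse_rank(doc, subqueries):
    """Score a document into a coarse band: subqueries matching in the
    lead sentences count double, approximating the preference described
    in the text; matches only deeper in the body count once."""
    lead = lead_text(doc).lower()
    body = doc.lower()
    score = 0
    for pattern in subqueries:
        if re.search(pattern, lead):
            score += 2   # matched in the lead: favored
        elif re.search(pattern, body):
            score += 1   # matched only deeper in the document
    return score

doc = ("AT&T announced a new switch. Analysts were surprised. "
       "The contract covers fiber links.")
subqueries = [r'at&t', r'fiber', r'satellite']
print(coarse_rank(doc, subqueries))  # at&t in lead (+2), fiber in body (+1) -> 3
```

Documents are then binned by score into the coarse ranking bands, rather than given the fine-grained ordering a statistical system would produce.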
Refinement of the queries was done manually by executing them, reviewing the results, and
modifying the queries. The easier topics required only a few iterations, while on some of
the more difficult topics we iterated several dozen times. We stopped working on a topic
when it seemed that the results were converging to a practical limit for our approach, i.e.,
when adding further synonym keywords or altering the query structure no longer produced
better results.
4.0 Results and Analysis

Table I shows our results for the TREC routing queries. Since our system does not truly
rank the retrieved documents, we think the Table I presentation is more representative of
our performance than the 11-point averages. The first and last columns are the topic number
and description. The second column, "# Rel", is the number of relevant documents as
judged by NIST in the Volume II Corpus. The next three columns give an indication of
how the field did on the topic. "TRW Rel" is the number of relevant documents we
submitted out of the "TRW Submit" we sent in for each topic. Our scores are summarized
as follows:

    High   Above Med   Median   Below Med   Low
     8         15         4         19       2
Unlike most of the TREC participants, we did not submit the full 200 allowed documents
for each topic. This turned out to be a major blunder because the TREC scoring procedure
did not reward such self-restraint. Many of our queries were too restrictive, achieving high
precision at the expense of recall, and hence of a good score for the conference.
This problem comes about because of the binary nature of the FDF's evaluation of a query
against a document. To operate properly against a routing data stream, it is necessary to
execute several queries for each topic, with each successive query aiming for higher recall.
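One way to approximate a ranking from such binary matchers is to run the strictest query first, then append the newly retrieved documents from each successively broader tier. A minimal sketch, with made-up regular expressions standing in for PSL queries (the tier patterns and helper are illustrative assumptions, not TRW's actual queries):

```python
import re

def tiered_route(docs, tiers):
    """Produce a coarse ranking from binary matchers: run the strictest
    query tier first, then successively broader tiers, appending any
    documents not already retrieved. Within a tier, arrival order holds."""
    ranked, seen = [], set()
    for pattern in tiers:  # tiers ordered strict -> broad
        for i, doc in enumerate(docs):
            if i not in seen and re.search(pattern, doc, re.IGNORECASE):
                ranked.append(i)
                seen.add(i)
    return ranked

docs = [
    "Generic telecom news with no company named.",
    "AT&T wins long-distance contract.",
    "Regional telephone carrier expands network.",
]
tiers = [
    r'AT&T.{0,40}contract',   # strict: name plus context, high precision
    r'AT&T',                  # broader: company name alone
    r'telephone|telecom',     # broadest: topic vocabulary, high recall
]
print(tiered_route(docs, tiers))  # -> [1, 0, 2]
```

Documents retrieved by an earlier (stricter) tier sit higher in the submission, so precision-oriented matches lead and recall-oriented matches fill out the allowed 200.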
When our queries were "tuned" properly, the results were quite good. Considering only
those queries where we made the full submission, the distribution is well above the median:

    High   Above Med   Median   Below Med   Low
     6         8          3          5       0
Our analysis shows that we did well on topics where the ability to find phrases, acronyms,
numbers, and alphanumerics were important. We had the high score on topics 28 and 29,
both of which involved finding references to AT&T. Since we retain and scan the full data
stream, we didn't have to worry about an indexing parser splitting "AT&T" into "AT" and
"T" and then throwing them both away. Our PSL subquery to find AT&T was