SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Overview of the First Text REtrieval Conference (TREC-1)
chapter
D. Harman
National Institute of Standards and Technology
Donna K. Harman
3. The Topics
In designing the TREC (and TIPSTER) tasks, there was a conscious decision made to provide "user need"
statements rather than more traditional queries. Two major issues were involved in this decision. First, there
was a desire to allow a wide range of query construction methods by keeping the topic (the need statement)
distinct from the query (the actual text submitted to the system). The second issue was the ability to increase the
amount of information available about each topic, in particular to include with each topic a clear statement of
what criteria make a document relevant.
The topics were designed to mimic a real user's need, and were written by people who are actual users of a
retrieval system. Although the subject domain of the topics was diverse, some consideration was given to the
documents to be searched. The topics were constructed by doing trial retrievals against a sample of the
document set, and then those topics that had roughly 25 to 100 hits in that sample were used. This created a range
of broader and narrower topics.
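The selection step described above can be sketched in a few lines of Python. This is only an illustration: the function name `select_topics`, the candidate labels, and the hit counts are hypothetical, and the hard bounds of 25 and 100 are an assumption, since the text says only "roughly 25 to 100 hits".

```python
def select_topics(hit_counts, low=25, high=100):
    """Keep candidate topics whose trial-retrieval hit count lies in [low, high]."""
    return [topic for topic, hits in hit_counts.items() if low <= hits <= high]


# Hypothetical hit counts from a trial retrieval against a document sample.
# These numbers are invented for illustration; they are not TREC-1 data.
trial_hits = {
    "candidate A": 12,    # too few hits: topic too narrow for the sample
    "candidate B": 40,
    "candidate C": 100,
    "candidate D": 350,   # too many hits: topic too broad for the sample
}

selected = select_topics(trial_hits)
print(selected)
```

Filtering on a band of hit counts, rather than a single threshold, is what yields the mix of broader and narrower topics.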
The following is one of the topics used in TREC.
<top>
<head> Tipster Topic Description
<num> Number: 066
<dom> Domain: Science and Technology
<title> Topic: Natural Language Processing
<desc> Description:
Document will identify a type of natural language processing technology which
is being developed or marketed in the U.S.
<narr> Narrative:
A relevant document will identify a company or institution developing or
marketing a natural language processing technology, identify the technology,
and identify one or more features of the company's product.
<con> Concept(s):
1. natural language processing
2. translation, language, dictionary, font
3. software applications
<fac> Factor(s):
<nat> Nationality: U.S.
</fac>
<def> Definition(s):
</top>
Each topic was formatted in the same standard method to allow easier automatic construction of queries.
Besides a beginning and an end marker, each topic had a number, a short title, and a one-sentence description.
There was a narrative section which was aimed at providing a complete description of document relevance for
the assessors. Each topic also had a concepts section with a list of assorted concepts related to the topic. This
section was designed to provide a mini-knowledge base about a topic such as a real searcher might possess.
Additionally each topic could have a definitions section and/or a factors section. The definition section had one
or two of the definitions critical to a human understanding of the topic. The factors section was included to
allow easier automatic query building by listing specific items from the narrative that constrain the documents
that are relevant. Two particular factors were used in the TREC-1 topics: a time factor (current, before a given
date, etc.) and a nationality factor (either involving only certain countries or excluding certain countries).
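As a rough illustration of how the standard format supports automatic query construction, the following Python sketch splits a topic into its tagged sections and builds a naive term list from the title and description. It is a minimal sketch under stated assumptions, not the method used by any TREC participant: the function names `parse_topic` and `build_query` are hypothetical, the sample assumes the tag names `<title>` and `<def>` and the closing markers `</fac>` and `</top>`, and the parser flattens the `<nat>` factor out of its enclosing `<fac>` section.

```python
import re

# Sample topic text. The tag spellings and closing markers are assumptions
# about the intended format of the listing shown earlier.
SAMPLE_TOPIC = """<top>
<head> Tipster Topic Description
<num> Number: 066
<dom> Domain: Science and Technology
<title> Topic: Natural Language Processing
<desc> Description:
Document will identify a type of natural language processing technology which
is being developed or marketed in the U.S.
<narr> Narrative:
A relevant document will identify a company or institution developing or
marketing a natural language processing technology, identify the technology,
and identify one or more features of the company's product.
<con> Concept(s):
1. natural language processing
2. translation, language, dictionary, font
3. software applications
<fac> Factor(s):
<nat> Nationality: U.S.
</fac>
<def> Definition(s):
</top>"""

# A line that starts a section (<tag>) or ends one (</tag>).
TAG_LINE = re.compile(r"^<(/?)(\w+)>\s*(.*)$")
# A leading human-readable label such as "Number:" or "Concept(s):".
LABEL = re.compile(r"^[A-Za-z()]+:\s*")


def parse_topic(text):
    """Return a dict mapping each section tag to its list of content lines."""
    fields = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = TAG_LINE.match(line)
        if match:
            closing, tag, rest = match.groups()
            if closing or tag == "top":
                current = None  # begin/end markers carry no content
                continue
            current = tag
            rest = LABEL.sub("", rest)
            fields[tag] = [rest] if rest else []
        elif current is not None and line:
            fields[current].append(line)  # continuation of the open section
    return fields


def build_query(fields):
    """Naive automatic query: lowercased terms from the title and description."""
    text = " ".join(fields.get("title", []) + fields.get("desc", []))
    return re.findall(r"[a-z0-9]+", text.lower())


fields = parse_topic(SAMPLE_TOPIC)
query_terms = build_query(fields)
print(fields["num"], fields["nat"], query_terms[:3])
```

A real system would likely also draw on the concepts and factors sections and would remove stop words; the sketch only shows that the fixed markers make each section easy to isolate.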
While the TREC topics did not present a problem in scaling, either automatically constructing
a query, or manually constructing a query with little foreknowledge of its searching capability, was a major
challenge for TREC participants. In addition to filtering the relatively large amount of information provided in
the topics into queries, the sometimes narrow definition of relevance as stated in the narrative was difficult for