SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Site Report for the Text REtrieval Conference
chapter
P. Nelson
National Institute of Standards and Technology
Donna K. Harman
* Query By Example: It is possible to submit an entire document as a query, which is
called "query by example." Essentially, the document is used as an example, and
documents similar to it are retrieved. This function works even when the example
document is very large.
* Search Within: A query can be directed to search only within a small set of documents.
The set of documents is often selected from a previous query. This function is also
called "recursive" search or "recurrent" search. This function is especially useful when a
statistical query searches over the results of an earlier boolean query, or vice versa.
* Numeric and Date Ranges: ConQuest can find documents which contain numbers or
dates within a specified range. Numbers are subject to the standard proximity tests in the
boolean and natural language queries, just like other words.
* Fielded Searches: A search can be restricted to any particular field in a document. For
example, users often wish to search only over the "authors" field, and not over the full
body of the text.
* Document Categories: Documents in ConQuest can be categorized as appropriate. Users
can target searches to occur only over a single category, or over multiple categories.
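The way "search within" composes with fielded and numeric-range searches can be illustrated with a minimal sketch. This is not ConQuest code: the `Doc` class, the function names, and the sample documents are hypothetical, and simple token matching stands in for ConQuest's actual query processing.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: int
    fields: dict  # hypothetical layout: field name -> text

def fielded_search(docs, term, field_name):
    """Return docs whose named field contains the term as a whole token."""
    term = term.lower()
    return [d for d in docs
            if term in d.fields.get(field_name, "").lower().split()]

def numeric_range_search(docs, low, high, field_name="body"):
    """Return docs whose field holds at least one integer in [low, high]."""
    hits = []
    for d in docs:
        tokens = d.fields.get(field_name, "").split()
        if any(t.isdecimal() and low <= int(t) <= high for t in tokens):
            hits.append(d)
    return hits

# Invented sample collection, for illustration only.
docs = [
    Doc(1, {"authors": "Nelson", "body": "retrieval results from 1992"}),
    Doc(2, {"authors": "Harman", "body": "retrieval conference held in 1992"}),
    Doc(3, {"authors": "Nelson", "body": "indexing notes from 1985"}),
]

# Search within: the second query runs only over the first query's results.
by_author = fielded_search(docs, "nelson", "authors")
within = numeric_range_search(by_author, 1990, 1995)
result_ids = [d.doc_id for d in within]
```

Because each function takes a document set and returns a document set, any query can be run over the output of any earlier query, which is the property the "recursive" search description relies on.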
Evaluation Results
ConQuest had the highest overall score in TREC for Category A systems (the full 2.5
gigabyte database) using the 11-point averages. In comparing ConQuest to other systems,
we found the following two graphs to be useful.
[Graph: vertical axis runs from 100% (Best) through 0% (Average) to -100% (Worst).]
Figure 5 ConQuest vs All Systems for the 39 Non-Zero Queries
Category A, Manual Mode
This first graph (figure 5) shows the results of ConQuest for the 39 non-zero queries (the
remaining 11 queries had no relevant documents in the database). A score of 100%
represents the best of all Category A systems, and a score of -100% represents the worst of
all Category A systems. A score of 0% represents average performance.
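The text does not give the formula behind this scale. One mapping consistent with the three anchor points (best at 100%, average at 0%, worst at -100%) is a piecewise-linear rescaling, sketched below as an assumption rather than as the method actually used. The function name and the sample values are hypothetical.

```python
def relative_score(system, best, worst, average):
    """Rescale a per-query score so best -> 100, average -> 0, worst -> -100.

    Assumed piecewise-linear mapping; the report does not state its formula.
    """
    if system >= average:
        span = best - average
        # If all systems tie at the average, report 0 rather than divide by zero.
        return 100.0 * (system - average) / span if span > 0 else 0.0
    span = average - worst
    return -100.0 * (average - system) / span if span > 0 else 0.0

# Invented per-query values: best 0.75, worst 0.25, average 0.5.
halfway_up = relative_score(0.625, best=0.75, worst=0.25, average=0.5)
halfway_down = relative_score(0.375, best=0.75, worst=0.25, average=0.5)
```

Under this assumed mapping, a system halfway between the average and the best score for a query would plot at 50%, and one halfway between the average and the worst would plot at -50%.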