SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
The ConQuest System
P. Nelson
National Institute of Standards and Technology
D. K. Harman
The ConQuest System
Paul E. Nelson
VP of Research & Development
CONQUEST™ SOFTWARE
9705 Patuxent Woods Drive, Columbia, Maryland 21046
(410)-290-6290
Introduction
ConQuest Software offers a commercially available text search
and retrieval system* called "ConQuest" (for Concept
Quest). ConQuest is primarily an advanced, statistically based
search system, with processing enhancements drawn from
the field of Natural Language Processing (NLP).
ConQuest participated in Category A of TREC, and so
produced results for 50 test queries over the entire 2.3
Gigabyte database. In this category, we constructed queries
and submitted results for two different ranking functions.
These two functions tested the difference between local and
global document relevancy, and are fully described later.
In TREC-2, ConQuest had a very strong showing. Our
recall scores in particular improved by about 18 percentage
points over the adjusted TREC-1 scores. Our precision
scores were also very competitive.
The purpose of this paper is to discuss how we prepared for
TREC-2: how queries were performed, what initial
judgments were made and why, and interpretation of the
results. Then, I will cover the tests which were performed
after TREC-2, and how these tests clearly identify the areas
where ConQuest could most effectively be improved.
System Architecture
For a complete discussion of the system architecture of
ConQuest, see the TREC-1 conference proceedings, or call
the author. The following overview is meant as a brief
refresher.
ConQuest uses pre-built indexes to search text databases
quickly. In such a system, all text to be
searched must first be indexed. These indexes are then used
for all searching; the original document data is not required.
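
As a rough illustration of index-based searching in general, the sketch below builds a simple inverted index and answers lookups from the index alone. It is not ConQuest's index format, which this paper does not describe; the names build_index and search and the sample documents are hypothetical.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, term):
    """Answer a lookup from the index alone; the original text is not consulted."""
    return sorted(index.get(term.lower(), set()))

# Hypothetical sample data, for illustration only.
documents = {
    1: "text retrieval with indexes",
    2: "semantic network of word meanings",
    3: "indexes make text search fast",
}
index = build_index(documents)
```

Once the index is built, search(index, "text") can be answered without rereading the documents, which is the property described in the paragraph above.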
ConQuest uses a dictionary augmented with a semantic
network for both indexing and queries. The dictionary is a
list of words in which each word can have multiple meanings.
Each meaning carries syntactic information (part-of-speech,
feature values) and a dictionary definition.
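
One way to picture such a dictionary entry is sketched below. The class names, field names, and the sample word are assumptions made for illustration; the paper does not specify ConQuest's internal representation.

```python
from dataclasses import dataclass, field

@dataclass
class Meaning:
    """One sense of a word: syntactic information plus a definition."""
    part_of_speech: str
    definition: str
    features: dict = field(default_factory=dict)  # hypothetical feature values

@dataclass
class DictionaryEntry:
    """A word together with all of its meanings."""
    word: str
    meanings: list = field(default_factory=list)

def meanings_for(entry, part_of_speech):
    """Return the meanings of an entry that have the given part of speech."""
    return [m for m in entry.meanings if m.part_of_speech == part_of_speech]

# Hypothetical sample entry, for illustration only.
bank = DictionaryEntry("bank", [
    Meaning("noun", "a financial institution", {"countable": True}),
    Meaning("noun", "the land alongside a river", {"countable": True}),
    Meaning("verb", "to deposit money", {"transitive": True}),
])
```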
* For additional information on ConQuest, please contact the author.
The semantic network contains nodes which correspond to
meanings of words. These nodes are linked to other related
nodes. Relationships between nodes are extracted from
machine-readable dictionaries. Some example relationship
types include synonym, antonym, child-of, parent-of,
related-to, part-of, substance-of, contrasting, and similar-to.
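
A semantic network of this kind can be sketched as a graph with typed links between word meanings. The class below is a minimal illustration under assumed names (SemanticNetwork, add_link, related) and an assumed node naming scheme such as "car#1"; only the relationship type names come from the text above.

```python
from collections import defaultdict

class SemanticNetwork:
    """Nodes are word meanings; each link carries a relationship type."""

    def __init__(self):
        # node -> list of (relation, target node), kept in insertion order
        self._links = defaultdict(list)

    def add_link(self, source, relation, target):
        self._links[source].append((relation, target))

    def related(self, node, relations=None):
        """Return linked nodes, optionally restricted to some relation types."""
        return [
            target
            for relation, target in self._links.get(node, [])
            if relations is None or relation in relations
        ]

# Hypothetical sample links, for illustration only.
net = SemanticNetwork()
net.add_link("car#1", "synonym", "automobile#1")
net.add_link("car#1", "child-of", "vehicle#1")
net.add_link("wheel#1", "part-of", "car#1")
```

Restricting a lookup to selected relation types, for example net.related("car#1", {"synonym"}), is one way a system could control how far a term is broadened.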
The ConQuest dictionary was generated automatically from
several commercially available machine-readable dictionary
(MRD) sources. This gives ConQuest the most
robust and thorough coverage of English available. It is the
completeness of coverage that drives performance gains in
recall and precision.
Since ConQuest is a commercially available product, many
additional components, not required for TREC-2, are also
available, such as true client/server, graphical user
interfaces, routing and dissemination, and sophisticated
application program interfaces.
Query
Generally speaking, ConQuest attempts to refine and
enhance the user's query. The result is then matched against
the indexes to look for documents which contain similar
concepts.
Queries are not "understood" in the traditional sense of
natural language processing. ConQuest makes no attempt
to deeply understand the objects in the query, their
interaction, or the user's intent. Rather, ConQuest attempts
to understand the meaning of each individual word and the
importance of the word. It then uses the set of meanings
and their related terms (retrieved from the semantic
network) as a statistical set which is matched against
document information stored in the indexes.
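
The general idea of matching an expanded, weighted term set against an index can be sketched as follows. This is not ConQuest's ranking function: the weights (1.0 for query words, 0.5 for related terms), the additive scoring, and all names and sample data are assumptions chosen only to make the example concrete.

```python
def expand_query(words, related_terms, related_weight=0.5):
    """Build a weighted term set: query words at 1.0, related terms lower."""
    weights = {}
    for word in words:
        weights[word] = max(weights.get(word, 0.0), 1.0)
        for term in related_terms.get(word, []):
            weights[term] = max(weights.get(term, 0.0), related_weight)
    return weights

def score_documents(weights, index):
    """Sum the weights of matched terms per document; highest score first."""
    scores = {}
    for term, weight in weights.items():
        for doc_id in index.get(term, ()):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight
    # Ties are broken by document id so the ordering is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical related terms and index, for illustration only.
related_terms = {"car": ["automobile", "vehicle"]}
index = {"car": {1}, "automobile": {2}, "vehicle": {1, 3}, "price": {2}}
ranked = score_documents(expand_query(["car", "price"], related_terms), index)
```

In this toy data, document 2 never contains the word "car" but still scores through the related term "automobile", which illustrates how a statistical match over related terms can retrieve documents that share a concept rather than an exact word.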