SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Site Report for the Text REtrieval Conference
chapter
P. Nelson
National Institute of Standards and Technology
Donna K. Harman
even if this means delaying some advanced or exotic modules. Our philosophy is that
simple programs, well executed, will always out-perform complex tools poorly done.
System Architecture
ConQuest uses pre-built indexes to search text databases at high speed. In such a
system, all text to be searched must first be indexed. The indexes built in this process can
then be used by the text search engine to produce results. Both indexing and search use a
dictionary with a semantic network to perform various NLP tasks.
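The index-then-search flow described above can be illustrated with a minimal sketch. The source does not describe ConQuest's index format or matching rules, so everything here is an assumption made for illustration: the function names (`tokenize`, `build_index`, `search`), the simple inverted index, and the rule that a document matches only when it contains every query term.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?\"'()"

def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, trim punctuation.
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in tokens if token]

def build_index(documents):
    """Indexing phase: map each term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Search phase: return ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Illustrative documents, not taken from the TREC collection.
documents = {
    1: "Text retrieval with pre-built indexes.",
    2: "A dictionary with a semantic network.",
    3: "Indexes speed up text search.",
}
index = build_index(documents)
hits = search(index, "text indexes")
```

In this sketch the text is read once, at indexing time. Each query then needs only set lookups and intersections, which is the usual reason a pre-built index makes search fast.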
Figure 1 ConQuest System Architecture (diagram not reproduced; surviving labels: Queries, Results)
There are other modules in the ConQuest system not shown in Figure 1. These include the
library manager, which is responsible for system parameters, database configuration,
resource allocation, and physical partitioning of the indexes. They also include the
dictionary editor, which can be used to edit words, meanings, links, and definitions.
The Dictionary
ConQuest uses a dictionary augmented with a semantic network to perform indexing and
queries. The dictionary is a list of words in which each word can have multiple meanings.
Each meaning contains syntactic information (part of speech, feature values) and a
dictionary definition.
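One way to picture the dictionary layout just described is as a word-to-entry mapping in which each entry holds a list of meanings. This is a sketch only. The class names (`Meaning`, `Entry`), the field names, the helper `meanings_for`, and the sample entry for "bank" are all hypothetical and do not come from the ConQuest system.

```python
from dataclasses import dataclass, field

@dataclass
class Meaning:
    # Syntactic information plus a dictionary definition, as in the text.
    part_of_speech: str
    definition: str
    features: dict = field(default_factory=dict)  # hypothetical feature values

@dataclass
class Entry:
    word: str
    meanings: list = field(default_factory=list)

# Illustrative entry; the definitions are invented for this example.
dictionary = {
    "bank": Entry("bank", [
        Meaning("noun", "a financial institution", {"countable": True}),
        Meaning("noun", "the sloping land beside a river", {"countable": True}),
        Meaning("verb", "to deposit money in a bank", {"transitive": True}),
    ]),
}

def meanings_for(word, part_of_speech=None):
    """Return the meanings of a word, optionally filtered by part of speech."""
    entry = dictionary.get(word)
    if entry is None:
        return []
    return [
        meaning for meaning in entry.meanings
        if part_of_speech is None or meaning.part_of_speech == part_of_speech
    ]
```

Calling `meanings_for("bank", "verb")` would return only the verb meaning, which shows how syntactic information attached to each meaning can narrow the candidates.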
The semantic network contains nodes which correspond to meanings of words. These
nodes are linked to other related nodes. Relationships between nodes are extracted from
machine-readable dictionaries. Example relationship types include synonym,
antonym, child-of, parent-of, related-to, part-of, substance-of, contrasting, and similar-to.
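The network of meaning nodes and typed links can be sketched as a small labeled graph. The relationship names below are the ones listed in the text. The `SemanticNetwork` class, its methods, the node identifiers such as `"bank#1"`, and the sample links are assumptions made for illustration, not the system's actual representation.

```python
from collections import defaultdict

# Relationship types named in the text.
RELATION_TYPES = {
    "synonym", "antonym", "child-of", "parent-of", "related-to",
    "part-of", "substance-of", "contrasting", "similar-to",
}

class SemanticNetwork:
    def __init__(self):
        # Maps a meaning node to a set of (relation, target node) pairs.
        self.links = defaultdict(set)

    def add_link(self, source, relation, target):
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        self.links[source].add((relation, target))

    def related(self, node, relation=None):
        """Return linked nodes in sorted order, optionally for one relation."""
        return sorted(
            target for rel, target in self.links.get(node, ())
            if relation is None or rel == relation
        )

# Hypothetical node ids: a word plus a meaning number.
network = SemanticNetwork()
network.add_link("bank#1", "synonym", "depository#1")
network.add_link("bank#1", "child-of", "institution#1")
network.add_link("bank#2", "part-of", "river#1")
```

Links in this sketch are directed, so an inverse relationship such as parent-of would have to be added explicitly. Whether ConQuest stores both directions is not stated in the source.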
ConQuest uses the dictionary for morphological analysis (see below) and idiom
processing. The semantic networks are used to expand the query to include related terms.
Because connections are made between meanings of words rather than between the words
themselves, both in the dictionary and the semantic networks, processing is much more
accurate than with a simple thesaurus.
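The difference between meaning-level expansion and a plain thesaurus can be shown with a small sketch. It assumes each query term has already been resolved to a particular meaning; how that choice is made is not covered in this section. The tables `SENSE_LINKS` and `WORD_THESAURUS`, the function names, and the sample terms are hypothetical.

```python
# Hypothetical meaning-level links: (word, meaning number) -> related words.
SENSE_LINKS = {
    ("bank", 1): ["depository", "lender"],
    ("bank", 2): ["shore", "riverside"],
}

# A simple thesaurus keyed by word alone merges every meaning together.
WORD_THESAURUS = {
    "bank": ["depository", "lender", "shore", "riverside"],
}

def _append_unique(terms, expanded):
    # Add terms in order, skipping any already present.
    for term in terms:
        if term not in expanded:
            expanded.append(term)

def expand_by_sense(query_senses):
    """Expand a query whose terms are given as (word, meaning number) pairs."""
    expanded = []
    for word, sense in query_senses:
        _append_unique([word] + SENSE_LINKS.get((word, sense), []), expanded)
    return expanded

def expand_by_word(words):
    """Expand a query with a word-level thesaurus, for comparison."""
    expanded = []
    for word in words:
        _append_unique([word] + WORD_THESAURUS.get(word, []), expanded)
    return expanded

sense_expansion = expand_by_sense([("river", 1), ("bank", 2)])
word_expansion = expand_by_word(["river", "bank"])
```

For a query about a river bank, the meaning-level expansion adds only "shore" and "riverside", while the word-level expansion also pulls in "depository" and "lender", which are unrelated to the intended meaning.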