[Figure 3: Program to Automatically Generate Query Log Files. The diagram shows TREC-2 topic descriptions flowing through the processing modules (parsing SGML codes, removing stop words and other function words, expanding all word meanings) to produce the ConQuest query logs.]
The modules in the program are as follows (a sketch of the full pipeline appears after the list):
Parse Topic - Reads through the topic looking for the
SGML codes (such as <description>). The locations
within the topic of all words in the query are preserved
in the final query log files.
Tokenize - Divides up strings into tokens.
Morphology - Locates all words in the dictionary and
reduces them to root words if possible.
Idiom Processing - Collects idioms together as single
terms, such as "United States."
Remove Stop Words - Removes conjunctions,
determiners, auxiliary verbs, prepositions, etc.
Remove Function Words - Removes words such as
"document," "relevant," and "retrieve" which are used
often in TREC-2 narratives but do not help retrieval.
Expand Word Meanings - All word meanings are
expanded using the ConQuest semantic network and all
expansions are added to the query.
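Taken together, the modules form a simple linear pipeline. The following sketch is only illustrative: the word lists, dictionary, and semantic network are invented placeholders, and the SGML parsing and word-position bookkeeping of the real program are omitted.

    import re

    STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is"}
    FUNCTION_WORDS = {"document", "relevant", "retrieve"}   # common in TREC-2 narratives
    IDIOMS = {("united", "states"): "united_states"}        # idioms kept as single terms

    def tokenize(text):
        # Divide the string into lowercase word tokens.
        return re.findall(r"[a-z]+", text.lower())

    def morphology(token, dictionary):
        # Reduce a token to its dictionary root word where possible.
        return dictionary.get(token, token)

    def collect_idioms(tokens):
        # Merge multi-word idioms such as "United States" into single terms.
        out, i = [], 0
        while i < len(tokens):
            if tuple(tokens[i:i + 2]) in IDIOMS:
                out.append(IDIOMS[tuple(tokens[i:i + 2])])
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        return out

    def build_query(topic_text, dictionary, semantic_net):
        tokens = tokenize(topic_text)
        tokens = [morphology(t, dictionary) for t in tokens]
        tokens = collect_idioms(tokens)
        tokens = [t for t in tokens if t not in STOP_WORDS]       # remove stop words
        tokens = [t for t in tokens if t not in FUNCTION_WORDS]   # remove function words
        query = []
        for t in tokens:
            query.append(t)
            query.extend(semantic_net.get(t, []))   # expand all word meanings
        return query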
Note that all of these steps occur automatically with no
manual input. The program also generates other statistics,
such as the count of each term in the query, a count for each
term for each section of the query (sections being the topic,
description, narrative, concepts, and factors), and the total
number of words in the query.
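These statistics amount to straightforward counting. A minimal sketch, assuming the query terms have already been split by section (the section names follow the text above; nothing here is ConQuest code):

    from collections import Counter

    SECTIONS = ("topic", "description", "narrative", "concepts", "factors")

    def query_statistics(terms_by_section):
        # terms_by_section maps each section name to its list of query terms.
        per_section = {s: Counter(terms_by_section.get(s, [])) for s in SECTIONS}
        total = Counter()
        for counts in per_section.values():
            total.update(counts)
        return {
            "term_counts": total,               # count of each term in the query
            "section_counts": per_section,      # per-term count for each section
            "total_words": sum(total.values())  # total number of words in the query
        }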
Manual Query Generation Steps
There were two manual steps used to generate queries:
1. Remove words, word meanings, and/or expansions
2. Set term weights (if necessary)
Fortunately, ConQuest has graphical user interfaces (GUIs)
for removing words, word meanings, and expansions from
the automatically generated queries. A user merely brings
up the query and uses the mouse to select items to be
deleted.
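The underlying edit can be modeled as a deletion on a nested query structure. The representation below is an assumption for illustration, not ConQuest's actual query-log format:

    def remove_items(query_log, selections):
        # query_log: {word: {meaning: [expansions]}}.
        # selections: (word, meaning, expansion) tuples chosen in the GUI;
        # None for meaning or expansion acts as "remove everything below".
        for word, meaning, expansion in selections:
            if meaning is None:
                query_log.pop(word, None)                    # drop the whole word
            elif expansion is None:
                query_log.get(word, {}).pop(meaning, None)   # drop one meaning
            else:
                expansions = query_log.get(word, {}).get(meaning, [])
                if expansion in expansions:
                    expansions.remove(expansion)             # drop one expansion
        return query_log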
In TREC-2, terms were not weighted in the traditional
sense, but rather were categorized into three sets:
1. Terms that embody the entire query, which would
make good search terms if used by themselves
2. Terms which embody a necessary portion of the query,
but not the entire concept
3. All other related terms
These categories provide simple guidelines for setting term
weights, making it much easier to generate queries.
Evaluations using the TREC-2 test topics determined the
functions that map these categories to the actual term weights.
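In code, the categorization reduces to a small lookup. The numeric weights below are invented for illustration; as noted above, the actual weight functions were determined by evaluations on the TREC-2 test topics:

    CATEGORY_WEIGHTS = {
        1: 1.0,   # terms that embody the entire query
        2: 0.6,   # terms embodying a necessary portion of the query
        3: 0.3,   # all other related terms
    }

    def weight_terms(categorized_terms):
        # categorized_terms: list of (term, category) pairs.
        return {term: CATEGORY_WEIGHTS[category]
                for term, category in categorized_terms}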
To emphasize once more, no document feedback was used
for these manual steps. All query adjustments were
performed without executing any query. The final results
were produced only after all queries had been generated.
The TREC-2 Results
ConQuest scored very well in TREC-2. In particular, our
recall percentages were quite high. Our average precision
scores were not as good, but still competitive.
ConQuest submitted two sets of results for TREC-2,
CnQst1 and CnQst2. Both sets used the same coarse-grain
algorithm which retrieved the best 5000 documents from
the database. The difference between the two results was
how these 5000 documents were sorted to derive the top
1000 documents which were used for the official results.
The first set (CnQst1) used fine-grain as the only sorting
algorithm. This algorithm primarily depends on local
proximity information, although word statistics and query
structure are also incorporated.
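As an illustration of how local proximity can drive a document score, the sketch below uses a standard proximity measure, the smallest window of text containing every query term. This is a generic stand-in, not ConQuest's fine-grain algorithm, and it ignores the word statistics and query structure mentioned above:

    import heapq

    def min_window_span(positions_by_term):
        # positions_by_term: {term: sorted token positions in the document}.
        # Returns the smallest span covering one occurrence of every term,
        # or None if some term does not occur at all.
        lists = list(positions_by_term.values())
        if not lists or any(not p for p in lists):
            return None
        heap = [(plist[0], i, 0) for i, plist in enumerate(lists)]
        heapq.heapify(heap)
        high = max(plist[0] for plist in lists)
        best = high - heap[0][0]
        while True:
            _pos, i, j = heapq.heappop(heap)
            if j + 1 == len(lists[i]):
                return best                     # one term's occurrences exhausted
            nxt = lists[i][j + 1]
            high = max(high, nxt)
            heapq.heappush(heap, (nxt, i, j + 1))
            best = min(best, high - heap[0][0])

    def fine_grain_score(positions_by_term):
        # Tighter windows yield higher scores.
        span = min_window_span(positions_by_term)
        return 0.0 if span is None else 1.0 / (1 + span)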
The second set of results (CnQst2) was a weighted average
of the fine-grain and coarse-grain statistics for each
document. As it turned out, this combination of local (fine-
grain) and global (coarse-grain) statistics provided
significantly better results.
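A minimal sketch of this combination, assuming each candidate document already carries a fine-grain and a coarse-grain score; the mixing weight alpha is an assumption, since the paper does not give the actual weighting:

    def combined_score(fine, coarse, alpha=0.5):
        # Weighted average of local (fine-grain) and global (coarse-grain) scores.
        return alpha * fine + (1.0 - alpha) * coarse

    def rerank(candidates, alpha=0.5, top_n=1000):
        # candidates: (doc_id, fine_score, coarse_score) for the 5000
        # documents retrieved by the coarse-grain pass; returns the
        # top_n documents used for the official results.
        ranked = sorted(candidates,
                        key=lambda d: combined_score(d[1], d[2], alpha),
                        reverse=True)
        return ranked[:top_n]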
The relatively modest addition of global information
improved the results more than expected. Previous
experience had always indicated that fine-grain information,
especially the proximity test, was the strongest contributor
to document relevancy.
Some additional insights can be extracted from topic
analyses presented at the TREC-2 conference. Specifically,
the topics where ConQuest excelled over other systems
were also those which tended to have fewer relevant
documents in the database. This indicates that local
proximity statistics (used by ConQuest) are more important
for these queries, since most other systems in TREC-2 are
heavily weighted towards global document statistics. In
other words, ConQuest appears to perform better for queries
where one needs to find the "needle in the haystack."
Post-TREC Analysis
After TREC-2, we had the chance to clean up our initial
tests, gather new statistics, and perform some additional
analysis.
The first step in this process was to verify the accuracy of
the coarse-grain algorithm. Remember that initial tests
attempted to improve the coarse-grain algorithm. But did
the coarse-grain algorithm really need improvement? One
indication that coarse-grain was accurate was provided by
the CnQst2 run, which performed better than expected.