SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
TREC-2 Routing and Ad-Hoc Retrieval Evaluation using the INQUERY System
chapter
W. Croft
J. Callan
J. Broglio
National Institute of Standards and Technology
D. K. Harman
experiments, we have developed a new stemming algorithm that has a number of advantages
for operational systems. A number of recognizers written in flex are then used to identify
objects such as company names and mark their presence in the document using "meta"
index terms. A company name such as IBM in the text, for example, will result in a meta
term #COMPANY being recorded at that position in the text. The use of these meta terms
extends the range of queries that can be specified. This completes the usual processing for
document text.
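The meta-term idea can be illustrated with a short sketch. This is not the flex recognizers described above, whose rules are not given here; the company list, the function name index_with_meta_terms, and the choice to record word offsets as positions are all assumptions made for illustration.

```python
import re

# Hypothetical list of company names; the real recognizers were written
# in flex and their patterns are not reproduced in the text.
COMPANY_NAMES = ("IBM", "Apple Computer")
COMPANY_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in COMPANY_NAMES) + r")\b"
)
WORD_RE = re.compile(r"\w+")


def index_with_meta_terms(text):
    """Return sorted (word_position, term) pairs for a document.

    Ordinary words are lower-cased. Wherever a company name starts, an
    extra #COMPANY meta term is recorded at the same word position.
    """
    words = list(WORD_RE.finditer(text))
    postings = [(pos, m.group().lower()) for pos, m in enumerate(words)]
    # Map character offsets of word starts to word positions.
    char_to_pos = {m.start(): pos for pos, m in enumerate(words)}
    for match in COMPANY_RE.finditer(text):
        pos = char_to_pos.get(match.start())
        if pos is not None:
            postings.append((pos, "#COMPANY"))
    return sorted(postings)
```

With this representation a query can match #COMPANY at a position without knowing which company was named, which is the sense in which meta terms extend the range of queries.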
The document indexing process also involves building the compressed inverted files
that are necessary for efficient performance with very large databases. Since positional
information is stored, overhead rates are typically about 40% of the original database size.
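A minimal sketch of a positional inverted file follows. The compression scheme used by INQUERY is not stated in this section; gap encoding followed by variable-byte coding is assumed here only because it is a common choice, and all function names are hypothetical.

```python
from collections import defaultdict


def vbyte_encode(numbers):
    """Encode non-negative integers, 7 bits per byte, low bits first.
    The high bit is set on the final byte of each number."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


def to_gaps(positions):
    """Replace sorted positions by differences from the previous one."""
    return [p - prev for prev, p in zip([0] + positions[:-1], positions)]


def from_gaps(gaps):
    """Inverse of to_gaps."""
    positions, total = [], 0
    for gap in gaps:
        total += gap
        positions.append(total)
    return positions


def build_index(docs):
    """docs maps doc_id -> text. Returns term -> {doc_id: compressed positions}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        positions = defaultdict(list)
        for pos, term in enumerate(text.lower().split()):
            positions[term].append(pos)
        for term, plist in positions.items():
            index[term][doc_id] = vbyte_encode(to_gaps(plist))
    return index
```

Storing a position for every term occurrence is what allows proximity and phrase operators to be evaluated, and it is also the main source of the index overhead mentioned above.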
Query processing involves a series of steps to identify the important concepts
and structure describing a user's information need. INQUERY is unique in that it can
represent and use complex structured descriptions in a probabilistic framework. Many of
the steps in query processing are the same as those done in document indexing. In addition,
a part-of-speech tagger is used to identify candidate search phrases. Domain-dependent
features are recognized and meta-terms inserted into the query representation. The relative
importance of query concepts is also estimated, and relationships between concepts are
suggested based on simple grammar rules. An evaluation of some of the query processing
techniques is presented in [1].
INQUERY also has the capability of expanding the query using relationships between
concepts found by either using manually specified domain knowledge in the form of a simple
thesaurus or by corpus analysis. The WORDFINDER system is a version of INQUERY
that retrieves concepts that are related to the query. WORDFINDER is constructed by
identifying noun groups in the text and representing them by the words that are closely
associated with them (i.e. occur in the same text windows). Concept "documents" are then
stored in INQUERY. This technique of query expansion was not tested in TREC-2.
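The construction of concept "documents" can be sketched as follows. This is a simplified illustration, not the WORDFINDER implementation: noun groups are assumed to have been identified already and collapsed to single tokens, the window size is arbitrary, and retrieval is reduced to a raw co-occurrence count rather than an INQUERY ranking.

```python
from collections import Counter, defaultdict


def build_concept_documents(sentences, concepts, window=3):
    """Represent each concept by the words that occur near it.

    sentences: list of token lists.
    concepts:  set of tokens treated as noun-group concepts (assumed to
               be identified beforehand).
    Returns concept -> Counter of neighbouring words.
    """
    concept_docs = defaultdict(Counter)
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token in concepts:
                lo, hi = max(0, i - window), i + window + 1
                for neighbour in tokens[lo:i] + tokens[i + 1:hi]:
                    concept_docs[token][neighbour] += 1
    return concept_docs


def related_concepts(query_terms, concept_docs, k=5):
    """Rank concepts by how often the query terms occur in their windows."""
    scores = {
        concept: sum(words[term] for term in query_terms)
        for concept, words in concept_docs.items()
    }
    ranked = sorted(
        (c for c in scores if scores[c] > 0),
        key=lambda c: (-scores[c], c),
    )
    return ranked[:k]
```

The concepts returned for a query would then be candidates for addition to that query.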
The query evaluation process uses the inverted files and the query represented as an
inference net to produce a document ranking. The evaluation involves probabilistic inference
based on the operators defined in the INQUERY language. These operators define new
concepts and how to calculate the belief in those concepts using linguistic and statistical
evidence. We are constantly experimenting with and refining these operators (for example,
the operator defining a phrase-based concept) in order to improve retrieval performance.
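To make the idea of belief-combining operators concrete, the sketch below uses closed forms that are commonly given for inference network operators. The operator set and formulas are not stated in this section and are assumptions here, as are the function names; the phrase operator mentioned above is not covered.

```python
import math


def bel_not(p):
    """Belief that a concept is absent."""
    return 1.0 - p


def bel_and(beliefs):
    """All parent concepts hold: product of the beliefs."""
    return math.prod(beliefs)


def bel_or(beliefs):
    """At least one parent concept holds."""
    return 1.0 - math.prod(1.0 - p for p in beliefs)


def bel_sum(beliefs):
    """Unweighted average of the parent beliefs."""
    return sum(beliefs) / len(beliefs)


def bel_wsum(weighted_beliefs):
    """Weighted average; weighted_beliefs is a list of (weight, belief)."""
    total = sum(w for w, _ in weighted_beliefs)
    return sum(w * p for w, p in weighted_beliefs) / total


def bel_max(beliefs):
    """Largest parent belief."""
    return max(beliefs)
```

A structured query is then evaluated bottom-up for each document: term beliefs come from the inverted files, and each operator node combines the beliefs of its children into a belief for the new concept it defines.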
The relevance feedback process uses information from user evaluations of retrieved documents
to modify the original query in detection or routing environments. The INQUERY
system, because it can represent structured queries, supports a wide range of learning techniques
for query modification [5]. In general, new words and phrases are identified in the
sample of relevant documents. These are added to the original query and all the terms
in the query are then reweighted. With the amount of relevance information available in
TIPSTER, relatively simple automatic techniques appear to produce good levels of effectiveness.
We are also investigating the effect of using more limited information and more
complex learning techniques, such as neural networks.
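The general scheme of adding terms and then reweighting the whole query can be sketched as below. The selection and weighting formulas used in INQUERY are not given in this section; a simple Rocchio-style update over term frequencies in the relevant sample is substituted here, and the function name and the parameters n_new_terms, alpha and beta are assumptions.

```python
from collections import Counter


def expand_and_reweight(query_weights, relevant_docs,
                        n_new_terms=2, alpha=1.0, beta=0.5):
    """Add terms from relevant documents and reweight all query terms.

    query_weights: dict term -> weight for the original query.
    relevant_docs: list of token lists judged relevant.
    Returns a new dict term -> weight.
    """
    # Average term frequency over the sample of relevant documents.
    centroid = Counter()
    for doc in relevant_docs:
        for term, count in Counter(doc).items():
            centroid[term] += count / len(relevant_docs)

    # Pick the most frequent terms that are not already in the query.
    candidates = [
        term
        for term, _ in sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
        if term not in query_weights
    ]
    new_terms = candidates[:n_new_terms]

    # Reweight every term, old and new, with the same formula.
    return {
        term: alpha * query_weights.get(term, 0.0) + beta * centroid[term]
        for term in list(query_weights) + new_terms
    }
```

For example, with the query {"oil": 1.0}, the relevant documents ["oil", "spill", "tanker"] and ["oil", "spill"], and n_new_terms=1, the term "spill" is added and "oil" is reweighted upward.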