SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
TREC-2 Routing and Ad-Hoc Retrieval Evaluation using the INQUERY System
chapter
W. Croft
J. Callan
J. Broglio
National Institute of Standards and Technology
D. K. Harman
3 Query Processing
In order to clarify the query processing done for the TREC and TIPSTER experiments
with INQUERY, the following sections give more detailed descriptions.
There are two main kinds of query styles: a natural language query and a keyword or
key concept query. For example, the <desc> and <narr> fields of a TIPSTER query
represent natural language queries of varying levels of abstraction. The <con>, <title>
and <fac> fields represent key concepts in the query. The main difference between the
two types of processing is that the key concept query has more controlled information.
The phrasing and emphasis are already given and do not have to be conjectured from the
language structure. It is valuable to discover how to treat both styles of query, because a
good user interface will make it easy for a user to input both styles. For example, a user
may enter a prose query and then highlight the important words and phrases in the query
in some convenient manner. These highlighted words would then be treated as key concepts
in the query processing.
3.1 Prose query processing
Natural language query fields are tagged for syntactic category by a part-of-speech (POS)
tagger. Currently we use the tagger developed by Ken Church. We have developed our
own POS tagger, and we expect to begin using it in the fall of 1993. There are some
pre-tagging and post-tagging "housekeeping" operations, such as removing parentheses. (The
current version of INQUERY does not permit parentheses except as part of an operator,
and we do not yet make any inferences from the presence of parentheses during the text
processing.) Additionally, we change operator phrases to single words in order to simplify
later processing. An example of this simplification is replacing the phrase "in order to" with
the infinitive particle "to", or replacing "with respect to" with the word "regarding". The goal of
this replacement is to remove phrases which resemble noun phrases syntactically but which
are really syntactic operators (e.g., phrasal prepositions) with no substantive content. At
this stage, stop phrases are also removed.
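The housekeeping steps described above can be illustrated with a minimal sketch. It is not the INQUERY implementation. It assumes that all three steps run on raw text before tagging, that "removing parentheses" means dropping the parenthesis characters while keeping the enclosed words, and that phrases are matched case-insensitively. The names OPERATOR_PHRASES, STOP_PHRASES and housekeep are hypothetical. Only the two operator-phrase rewrites come from the text; the stop phrases are invented for illustration.

```python
import re

# Operator phrases rewritten to single words. These two entries are the
# examples given in the text; a real table would be larger.
OPERATOR_PHRASES = {
    "in order to": "to",
    "with respect to": "regarding",
}

# Hypothetical stop phrases, invented for this sketch.
STOP_PHRASES = ["a relevant document will", "document must"]


def _phrase_pattern(phrase):
    """Compile a case-insensitive, whole-word pattern for a phrase."""
    words = [re.escape(w) for w in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


def housekeep(text):
    """Apply the housekeeping steps to a prose query string."""
    # Assumption: drop the parenthesis characters, keep the enclosed words.
    text = re.sub(r"[()]", " ", text)
    # Replace each operator phrase with its single-word equivalent.
    for phrase, replacement in OPERATOR_PHRASES.items():
        text = _phrase_pattern(phrase).sub(replacement, text)
    # Remove stop phrases entirely.
    for phrase in STOP_PHRASES:
        text = _phrase_pattern(phrase).sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


if __name__ == "__main__":
    query = ("A relevant document will discuss measures "
             "(taken in order to limit imports) with respect to steel.")
    print(housekeep(query))
```

Replacing operator phrases before noun group capture keeps sequences such as "with respect to" from being mistaken for noun phrases by a purely syntactic pattern matcher.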
3.1.1 Noun and adjective phrase capture: orthographic and syntactic clues.
When the text is tagged and the potentially irrelevant material has been removed,
syntactically based noun group capture is performed. Certain kinds of noun phrase patterns are enclosed
in a #PHRASE operator:
1. A noun phrase which contains more than one modifying adjective and noun is enclosed
in a #PHRASE operator;
2. A head noun with no premodifiers and followed by a prepositional phrase is enclosed
in a #PHRASE operator with the head noun of the prepositional phrase;