SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
DR-LINK: A System Update for TREC-2
chapter
E. Liddy
S. Myaeng
National Institute of Standards and Technology
D. K. Harman
2.H. Topic Statement Processing for Conceptual Graph Generation
The processing of topic statements for CG generation does not use the output of the Natural Language
Query Constructor. Instead, the current system first applies the same RCD and CG generator modules to produce
topic statement (TS) CGs. Several TS-specific processing requirements have been identified; some have
been implemented as post-processing routines and others are under development.
- Elimination of concept and relation nodes corresponding to contentless meta-phrases (e.g. "Relevant
document must identify ..."). If both of the concept nodes in a concept-relation-concept (CRC) triple belong
to a meta-phrase, the CRC is ignored. When only one of them is a meta-phrase concept, the triple is
removed only if the other concept occurs in another triple.
- Handling of negated parts of topic statements. The weights are adjusted so that an occurrence of the
negated concept in a document contributes negative evidence for the relevance of the document. In
effect, the two weights for the concept are switched.
- Automatic assignment of weights to concept and relation nodes. There are several factors we consider: the
conventional way of determining the importance of terms using inverse document frequency (IDF) and total
frequency; the location of terms occurring in topic statements; the part of speech information for each term; and
indications in the topic statement sublanguage (e.g. "the document MUST contain ..."). Although we have
implemented a program that tags individual words with a degree of importance based on the sublanguage
patterns, for the evaluation we assigned concept weights based on IDF values of terms in the collection, due to
time constraints.
- Merging common concepts appearing in different sections of topic statements. Although it is not safe
in general to assume that two concepts sharing the same concept name refer to the same concept
instantiation and to merge them blindly, we have observed that this risk does not arise in the topic statements. In fact,
we believe that it is desirable to merge CG fragments using common concept nodes. This is an important process
that eliminates undesirable effects on scoring. Without it, a document containing a concept occurring repeatedly
in <desc>, <narr>, and <con> fields would be ranked unnecessarily high (or low if it is negated) because each
occurrence of the concept would make an independent contribution to the overall score.
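
The two triple-level routines in the list above (meta-phrase elimination and merging of common concepts) can be sketched as follows. This is a minimal illustration under stated assumptions, not the DR-LINK implementation: the tuple representation of a CRC triple, the `META_CONCEPTS` set, the relation names, and both function names are hypothetical.

```python
from collections import Counter

# Hypothetical set of concepts that come from contentless meta-phrases.
META_CONCEPTS = {"relevant", "document", "identify"}


def drop_meta_triples(triples, meta=META_CONCEPTS):
    """Filter (concept, relation, concept) triples by the meta-phrase rule.

    - Both concepts belong to a meta-phrase: the triple is dropped.
    - Exactly one does: the triple is dropped only if the other concept
      also occurs in another triple.
    This is a literal reading of the rule; it does not check whether that
    other triple itself survives the filter.
    """
    counts = Counter()
    for c1, _, c2 in triples:
        counts.update({c1, c2})  # count each concept once per triple
    kept = []
    for c1, rel, c2 in triples:
        m1, m2 = c1 in meta, c2 in meta
        if m1 and m2:
            continue
        if m1 != m2:
            other = c2 if m1 else c1
            if counts[other] > 1:
                continue
        kept.append((c1, rel, c2))
    return kept


def merge_common_concepts(fields):
    """Merge the triples of several topic fields into one graph.

    `fields` maps a field name (e.g. "desc", "narr", "con") to its triples.
    Concepts sharing a name become a single node, so a concept repeated
    across fields is stored, and later scored, only once.
    """
    nodes, edges = set(), set()
    for triples in fields.values():
        for c1, rel, c2 in triples:
            nodes.update((c1, c2))
            edges.add((c1, rel, c2))
    return nodes, edges
```

For example, with the triples `("document", "AGENT", "identify")`, `("identify", "OBJECT", "merger")`, and `("merger", "LOCATION", "japan")`, the first is dropped because both concepts are meta-phrase concepts, the second because "merger" also occurs in the third triple, and only the third is kept.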
Since an integrated automatic topic processing module was not available, the mechanical aspects of the process were
hand-simulated, with some parts done automatically and others done manually.
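
The weighting steps described in this section (IDF-based concept weights and the switching of weights for negated concepts) can be illustrated with a short sketch. Several points are assumptions made for the example and are not taken from the system: the IDF form log(N / df), the reading of "the two weights" as a pair of positive-evidence and negative-evidence weights, the initial value 0.0 for the negative-evidence weight, and all function and parameter names.

```python
import math


def idf(term, doc_freq, n_docs):
    """A common form of inverse document frequency: log(N / df)."""
    return math.log(n_docs / doc_freq[term])


def concept_weights(concepts, doc_freq, n_docs, negated=()):
    """Return {concept: (positive_evidence, negative_evidence)}.

    Assumption: each concept carries two weights, applied as positive and
    negative evidence when the concept is found in a document. For a
    negated concept the two weights are switched.
    """
    weights = {}
    for c in concepts:
        pair = (idf(c, doc_freq, n_docs), 0.0)
        if c in negated:
            pair = (pair[1], pair[0])  # switch the two weights
        weights[c] = pair
    return weights
```

With this pairing, a negated concept that appears in a document lowers the document's score by the amount that the same concept, un-negated, would have raised it.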
2.I. Relation-Concept Detector (RCD)
The outputs of the Complex Nominal Phraser and the Proper Noun Interpreter modules described above provide
concept-relation-concept triples directly to the Relation-Concept Detector (RCD) module. In addition, the following
RCD handlers are operative.
One of the more distinctive aspects of the DR-LINK system is its capability of extracting relations and using them in the
final CG representations of documents and topic statements. This module provides building
blocks for the CG representation by generating concept-relation-concept triples based on the domain-independent
knowledge bases we have been constructing with machine-readable resources and corpus statistics. In this module,
there are several handlers that are activated selectively depending on the input sentence.
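
The idea of handlers that are activated selectively and each contribute concept-relation-concept triples can be sketched as a simple dispatch loop. Everything in this sketch is hypothetical: the `Triple` and `Handler` types, the part-of-speech tags, the placeholder relation name "REL", and the toy handler, which is not the logic of any actual RCD handler.

```python
from typing import Callable, NamedTuple


class Triple(NamedTuple):
    concept1: str
    relation: str
    concept2: str


class Handler(NamedTuple):
    name: str
    applies: Callable[[list], bool]  # should this handler be activated?
    extract: Callable[[list], list]  # tagged tokens -> list of Triple


def run_rcd(tagged, handlers):
    """Run every handler whose activation test accepts the sentence."""
    triples = []
    for h in handlers:
        if h.applies(tagged):
            triples.extend(h.extract(tagged))
    return triples


def has_verb(tagged):
    return any(pos == "VERB" for _, pos in tagged)


def toy_verb_handler(tagged):
    # Toy rule for illustration only: link the first verb to every noun
    # with a placeholder relation.
    verb = next(w for w, pos in tagged if pos == "VERB")
    return [Triple(verb, "REL", w) for w, pos in tagged if pos == "NOUN"]
```

A sentence with no verb leaves the toy handler inactive, so `run_rcd` returns an empty list for it.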
2.I.1. Case Frame (CF) Handler
The main function of the CF Handler is to generate concept-relation-concept triples where one of the concepts
typically comes from a verb. It identifies a verb in a sentence and connects it to other constituents surrounding the verb.
Since the relations (about 50 we use currently) included in our representation originate from the theories of
linguistic case roles (Somers, 1987, and Cook, 1989) and are all semantic in nature, this module consults the