SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
DR-LINK: A System Update for TREC-2
E. Liddy
S. Myaeng
National Institute of Standards and Technology
D. K. Harman
assigned manually, using an ontology of forty-three relations. Some example complex nominals, with their assigned relations, are:
[press] <- (SOURCE) <- [commentaries]
[growth] -> (MEASURE) -> [rate]
[electronic] <- (MEANS) <- [theft]
[campaign] <- (USED_FOR) <- [finances]
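The linear notation above encodes concept-relation-concept triples, and the arrow direction matters: `[A] <- (R) <- [B]` states the same thing as `[B] -> (R) -> [A]`. A minimal Python sketch of a parser that normalizes both directions into one head -> relation -> tail tuple. The `Triple` type and `parse_triple` function are hypothetical names chosen for illustration, not part of DR-LINK.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    """A concept-relation-concept triple with arrows running head -> tail."""
    head: str
    relation: str
    tail: str


# Matches "[X] -> (R) -> [Y]" and "[X] <- (R) <- [Y]".
_LINEAR = re.compile(
    r"\[([^\]]+)\]\s*(->|<-)\s*\(([^)]+)\)\s*(->|<-)\s*\[([^\]]+)\]"
)


def parse_triple(text: str) -> Triple:
    """Parse one linear-form triple (hypothetical helper) into head -> tail order."""
    match = _LINEAR.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a linear CG triple: {text!r}")
    left, arrow_in, relation, arrow_out, right = match.groups()
    if arrow_in != arrow_out:
        raise ValueError(f"mixed arrow directions: {text!r}")
    if arrow_in == "->":
        return Triple(left, relation, right)
    # "[A] <- (R) <- [B]" is the same statement as "[B] -> (R) -> [A]".
    return Triple(right, relation, left)


for example in ["[press] <- (SOURCE) <- [commentaries]",
                "[growth] -> (MEASURE) -> [rate]"]:
    print(parse_triple(example))
# Triple(head='commentaries', relation='SOURCE', tail='press')
# Triple(head='growth', relation='MEASURE', tail='rate')
```

Normalizing direction at parse time means later matching only has to compare tuples position by position.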
The development of the complex nominal CRC knowledge base was a manual, intellectual effort for the twenty-four month
testing, but our current task is the full automation of semantic relation assignment. Although this is difficult, our
experience with the intellectual process has encouraged us to pursue appropriate NLP-based machine-learning
techniques that will enable the system to automatically recognize and code semantic relations in complex
nominals.
In CG matching, the existence of both case-frame relations and complex nominal relations makes it possible for the
system to detect conceptual similarity even when it is expressed in different grammatical structures, such as a verb
plus its arguments in a Topic Statement and a complex nominal in a document, e.g.:
"reduce the debt" = [reduc*] -> (OBJECT) -> [debt]
"debt reduction" = [debt] <- (OBJECT) <- [reduc*]
To exploit relational information as fully as possible, regardless of its grammatical realization, a further step was
necessary so that CRCs produced by verb-based analysis could be matched against CRCs produced by complex nominal
analysis. This required determining the degree of similarity between relations across the two relation sets. There are
approximately sixty relations used in case frames and approximately forty relations used in complex nominals. A
relation-similarity table was constructed that assigns a degree of similarity to twenty-eight pairs across the two
grammatically distinguished sets, as well as to pairs within the same set. The relation-similarity table is used in the
final CG matching so that when two concepts are linked in a document by a relation different from the one linking the
same two concepts in the Topic Statement, the match can still be awarded some degree of similarity. The quality and
appropriateness of the similarity table will be determined by the results of the twenty-four month testing, which will
also provide empirical evidence of the Complex Nominal Phraser's impact on performance. Sample runs have indicated
that including complex nominals has a strongly positive impact on our results in both of the ways they are
incorporated in the system.
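The cross-structure matching described above can be sketched in Python. Everything here is an illustrative assumption rather than DR-LINK's implementation: the `RELATION_SIMILARITY` entries and their values are invented (the actual table's pairs and degrees are not given), `INSTRUMENT` is a hypothetical case-frame relation name, a trailing `*` as in `reduc*` is treated as a truncation wildcard, and a graph's score is simply the sum of each topic triple's best match. Triples are written head -> tail, so "[debt] <- (OBJECT) <- [reduction]" becomes `("reduction", "OBJECT", "debt")`.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]  # (head concept, relation, tail concept)

# Hypothetical entries: the real table covers twenty-eight cross-set pairs
# plus within-set pairs, whose names and values are not reproduced here.
RELATION_SIMILARITY: Dict[FrozenSet[str], float] = {
    frozenset({"INSTRUMENT", "MEANS"}): 0.5,
    frozenset({"AGENT", "SOURCE"}): 0.25,
}


def relation_similarity(r1: str, r2: str) -> float:
    """1.0 for identical relations, the table value for listed pairs, else 0.0."""
    if r1 == r2:
        return 1.0
    return RELATION_SIMILARITY.get(frozenset({r1, r2}), 0.0)


def concept_matches(pattern: str, concept: str) -> bool:
    """Treat a trailing '*' as truncation, so 'reduc*' matches 'reduction'."""
    if pattern.endswith("*"):
        return concept.startswith(pattern[:-1])
    return pattern == concept


def triple_score(topic: Triple, doc: Triple) -> float:
    """Both concepts must match; the relation contributes a graded similarity."""
    t_head, t_rel, t_tail = topic
    d_head, d_rel, d_tail = doc
    if not (concept_matches(t_head, d_head) and concept_matches(t_tail, d_tail)):
        return 0.0
    return relation_similarity(t_rel, d_rel)


def graph_score(topic: Iterable[Triple], doc: Iterable[Triple]) -> float:
    """Sum, over topic triples, the score of the best-matching document triple."""
    doc_list = list(doc)
    return sum(max((triple_score(t, d) for d in doc_list), default=0.0)
               for t in topic)


# Topic Statement side, e.g. "reduce the debt" from a verb and its arguments.
topic_triples = [("reduc*", "OBJECT", "debt"), ("theft", "INSTRUMENT", "electronic")]
# Document side, from the complex nominals "debt reduction" and "electronic theft".
doc_triples = [("reduction", "OBJECT", "debt"), ("theft", "MEANS", "electronic")]
print(graph_score(topic_triples, doc_triples))  # 1.5
```

The first pair matches exactly (1.0) despite the different grammatical forms; the second earns only the partial credit the table assigns to the differing relations (0.5).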
2. F. Natural Language Query Constructor
We have implemented a Natural Language Query Constructor (QC) for DR-LINK, which takes as input a Topic
Statement that has been pre-processed by straightforward techniques, such as part-of-speech tagging and
SGML-tagging of the meta-language that reflects the typical request-presentation language used in Topic
Statements (e.g. "A relevant document will ..." or "To be relevant ..."). The QC produces a query that reflects the
appropriate logical combinations of the text structure, proper noun, and complex nominal requirements of a Topic
Statement. The basis of the QC is a sublanguage grammar that generalizes over the regularities exhibited in
the Topic, Description, and Narrative fields of the one hundred fifty TIPSTER Topic Statements. It should be noted
that the sublanguage grammar, with minor modifications, is capable of handling non-TIPSTER queries, which suggests
broader utility. Earlier work (Liddy et al., 1991) demonstrated that the sublanguage approach is
effective and efficient for natural language processing tasks within a particular text type, here Topic
Statements.
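The SGML-tagging of request-presentation language mentioned above can be approximated with a few regular expressions. A minimal sketch, assuming a hypothetical `<META>` tag name, a hypothetical `tag_meta_language` function, and a hand-picked phrase list; DR-LINK's actual pre-processing and the QC's sublanguage grammar are not reproduced here.

```python
import re

# Hypothetical phrase list; real Topic Statements show far more variation.
META_PHRASES = [
    r"A relevant document (?:will|must)",
    r"Relevant documents (?:will|must)",
    r"To be relevant",
]
_META = re.compile("|".join(f"(?:{p})" for p in META_PHRASES), re.IGNORECASE)


def tag_meta_language(text: str, tag: str = "META") -> str:
    """Wrap each request-presentation phrase in an SGML-style tag."""
    return _META.sub(lambda m: f"<{tag}>{m.group(0)}</{tag}>", text)


print(tag_meta_language("A relevant document will identify a country's debt reduction plan."))
# <META>A relevant document will</META> identify a country's debt reduction plan.
```

Tagging the phrases rather than deleting them keeps them visible to a downstream grammar while marking them as request framing rather than content.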
For the twenty-four month runs, the QC sublanguage grammar detects the required logical combination of text
structure components, proper nouns, and complex nominals. These are the specific entities that we consider to be
particularly revealing indicators of relevant documents. In most cases, matching on these classes produces high-
precision ranked results, although in some instances single common nouns may also be needed. After
analyzing the twenty-four month results, we will determine whether to expand the range of linguistic types that
can be used to instantiate the variables in the QC's logical assertions.
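A logical combination of required entities can be pictured as an AND/OR tree evaluated against the entities found in a document. The sketch below assumes hypothetical `Term`, `And`, and `Or` node types and a `satisfied` function, leaves out text-structure requirements, and uses invented example terms; it illustrates the idea of a logical assertion rather than the QC's actual output format.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass
class Term:
    text: str  # a proper noun or complex nominal the document must contain


@dataclass
class And:
    parts: List["Node"]


@dataclass
class Or:
    parts: List["Node"]


Node = Union[Term, And, Or]


def satisfied(node: Node, doc_terms: Set[str]) -> bool:
    """Recursively test a logical assertion against a document's extracted terms."""
    if isinstance(node, Term):
        return node.text in doc_terms
    if isinstance(node, And):
        return all(satisfied(part, doc_terms) for part in node.parts)
    return any(satisfied(part, doc_terms) for part in node.parts)


# Hypothetical assertion: (Mexico OR Brazil) AND "debt reduction"
query = And([Or([Term("Mexico"), Term("Brazil")]), Term("debt reduction")])
print(satisfied(query, {"Brazil", "debt reduction", "interest rate"}))  # True
print(satisfied(query, {"Brazil", "interest rate"}))                    # False
```

This yields only a yes/no filter; producing the ranked results mentioned above would require a separate scoring step on top of it.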