NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)

DR-LINK: A System Update for TREC-2
E. Liddy, S. Myaeng
National Institute of Standards and Technology, D. K. Harman

assigned manually, using an ontology of forty-three relations. Some example complex nominals with their relations are:

    [press] <- (SOURCE) <- [commentaries]
    [growth] -> (MEASURE) -> [rate]
    [electronic] <- (MEANS) <- [theft]
    [campaign] <- (USED_FOR) <- [finances]

The development of the complex nominal CRC knowledge base was an intellectual effort for the twenty-four month testing, but our current task is the full automation of the semantic relation assignment. Although difficult, our experience with the intellectual process has encouraged us to pursue appropriate NLP-based machine-learning techniques which will enable the system to automatically recognize and code semantic relations in complex nominals.

In CG matching, the existence of both case-frame relations and complex nominal relations makes it possible for the system to detect conceptual similarity even when it is expressed in different grammatical structures, such as a verb + arguments in a Topic Statement and a complex nominal in a document, e.g.:

    "reduce the debt" = [reduc*] -> (OBJECT) -> [debt]
    "debt reduction" = [debt] <- (OBJECT) <- [reduc*]

To achieve the fullest exploitation of relational information regardless of grammatical realization, a further step was necessary in order to match CRCs produced by verb-based analysis against CRCs produced by complex nominal analysis. This required determining the degree of relation-similarity across the two relation sets. There are approximately sixty relations used in case frames, while there are approximately forty relations used in complex nominals.
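The equivalence of the two "debt" examples above can be illustrated with a minimal Python sketch. It is not DR-LINK code: the names CRC, parse_crc, relation_similarity and crc_match_score are hypothetical, the INSTRUMENT relation and the 0.5 similarity value are invented placeholders (the paper does not list its similarity values), and the scoring rule is only one plausible reading of "degree of relation-similarity".

```python
import re
from typing import NamedTuple


class CRC(NamedTuple):
    """A concept-relation-concept triple, stored in arrow direction."""
    source: str    # concept the arrows leave from
    relation: str
    target: str    # concept the arrows point to


# Matches "[a] -> (REL) -> [b]" or "[a] <- (REL) <- [b]".
_CRC_PATTERN = re.compile(
    r"\[(?P<left>[^\]]+)\]\s*(?P<arrow1><-|->)\s*"
    r"\((?P<rel>[^)]+)\)\s*(?P<arrow2><-|->)\s*\[(?P<right>[^\]]+)\]"
)


def parse_crc(text: str) -> CRC:
    """Parse the bracket-and-arrow notation into a direction-normalized triple."""
    m = _CRC_PATTERN.fullmatch(text.strip())
    if m is None or m["arrow1"] != m["arrow2"]:
        raise ValueError(f"not a CRC: {text!r}")
    if m["arrow1"] == "->":
        return CRC(m["left"], m["rel"], m["right"])
    # Leftward arrows: the right-hand concept is the source.
    return CRC(m["right"], m["rel"], m["left"])


# Placeholder table (assumption): unordered relation pair -> similarity.
RELATION_SIMILARITY = {
    frozenset({"INSTRUMENT", "MEANS"}): 0.5,
}


def relation_similarity(r1: str, r2: str) -> float:
    """Identical relations score 1.0; unlisted pairs score 0.0."""
    if r1 == r2:
        return 1.0
    return RELATION_SIMILARITY.get(frozenset({r1, r2}), 0.0)


def crc_match_score(topic: CRC, doc: CRC) -> float:
    """Score two CRCs that link the same concepts by how similar their relations are."""
    if (topic.source, topic.target) != (doc.source, doc.target):
        return 0.0
    return relation_similarity(topic.relation, doc.relation)


verb_crc = parse_crc("[reduc*] -> (OBJECT) -> [debt]")
nominal_crc = parse_crc("[debt] <- (OBJECT) <- [reduc*]")
```

Because both notations normalize to the same triple, verb_crc and nominal_crc compare equal, and a pair that differs only in relation can still receive partial credit through the lookup table.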
A relation-similarity table was constructed that assigns a degree of similarity to twenty-eight pairs across the two grammatically distinguished sets, and a degree of similarity to pairs within the same set. The relation-similarity table is used in the final CG matching so that two concepts linked by one relation in a document and by a different relation in the Topic Statement can still be awarded some degree of similarity. The quality and appropriateness of the similarity table will be determined by the results of the twenty-four month testing, which will also provide empirical evidence of the Complex Nominal Phraser's impact on performance. Sample runs have indicated that the inclusion of complex nominals has a strongly positive impact on our results in both of its incorporations in the system.

2.F. Natural Language Query Constructor

We have implemented a Natural Language Query Constructor (QC) for DR-LINK which takes as input a Topic Statement that has been pre-processed by straightforward techniques, such as part-of-speech tagging and SGML-tagging of the meta-language which reflects the typical request-presentation language used in Topic Statements (e.g. "A relevant document will ..." or "To be relevant..."). The QC produces a query which reflects the appropriate logical combinations of the text structure, proper noun, and complex nominal requirements of a Topic Statement. The basis of the QC is a sublanguage grammar which is a generalization over the regularities exhibited in the Topic, Description, and Narrative fields of the one hundred fifty TIPSTER Topic Statements. It should be noted that the sublanguage grammar, with minor modifications, is capable of handling non-TIPSTER queries, so its generalized utility is promising.
Earlier work (Liddy et al., 1991) demonstrated that the sublanguage approach is an effective and efficient approach to natural language processing tasks within a particular text-type, here Topic Statements. For the twenty-four month runs, the QC sublanguage grammar detects the required logical combination of text structure components, proper nouns, and complex nominals. These are the specific entities which we consider to be particularly revealing indicators of relevant documents. In most cases, matching on these classes produces high-precision ranked results, although there are some instances in which single common nouns may also be needed. After analyzing the twenty-four month results, we will determine whether to expand the range of linguistic types which can be used to instantiate the variables in the QC's logical assertions.
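As a rough illustration of what a "logical combination" of these three entity classes could look like, the following Python sketch builds a small query tree and tests it against a set of features extracted from a document. Everything here is assumed for illustration: the class names Term, And, Or and Not, the satisfies function, the choice of operators, and the example values are not taken from DR-LINK, and the sketch returns a Boolean where the actual system produces ranked results.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Term:
    """One requirement: kind is 'text_structure', 'proper_noun' or 'complex_nominal'."""
    kind: str
    value: str


@dataclass(frozen=True)
class And:
    parts: Tuple["Query", ...]


@dataclass(frozen=True)
class Or:
    parts: Tuple["Query", ...]


@dataclass(frozen=True)
class Not:
    part: "Query"


Query = Union[Term, And, Or, Not]
Features = FrozenSet[Tuple[str, str]]   # (kind, value) pairs found in a document


def satisfies(query: Query, features: Features) -> bool:
    """Recursively evaluate the logical assertion against a document's features."""
    if isinstance(query, Term):
        return (query.kind, query.value) in features
    if isinstance(query, And):
        return all(satisfies(p, features) for p in query.parts)
    if isinstance(query, Or):
        return any(satisfies(p, features) for p in query.parts)
    if isinstance(query, Not):
        return not satisfies(query.part, features)
    raise TypeError(f"unknown query node: {query!r}")


# Invented example: a proper noun AND either of two complex nominals.
example_query = And((
    Term("proper_noun", "Japan"),
    Or((
        Term("complex_nominal", "debt reduction"),
        Term("complex_nominal", "trade deficit"),
    )),
))

matching_doc = frozenset({
    ("proper_noun", "Japan"),
    ("complex_nominal", "debt reduction"),
})
non_matching_doc = frozenset({("complex_nominal", "debt reduction")})
```

Keeping the entity class on each term means that widening the range of linguistic types, for example to single common nouns, would only require a new kind value rather than a new node type.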