NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
TIPSTER Panel -- DR LINK's Linguistic-Conceptual Approach to Document Detection
E. Liddy, S. Myaeng
National Institute of Standards and Technology
Donna K. Harman

[agree] -
    (A) -> [country: *1 Venezuela]
    (A) -> [creditor_bank: *2]
    (AT) -> [restructure] -
        (A) -> [country: *1 Venezuela]
        (A) -> [creditor_bank: *2]
        (P) -> [debt] -
            (ME) -> [money]
            (CH) -> [foreign].

where *1 and *2 indicate that nodes with the same number represent the same concept.

While this process of extracting relations and constructing CGs is applied to both documents and topic statements, the latter require additional processing to capture features of information needs that are particular to topic statements. Accordingly, CGs generated from topic statements carry additional features, such as importance weights on concept and relation nodes and ways of indicating whether an instantiation of a concept must exist in relevant documents. This specialized processing, which goes beyond document processing, is accomplished by treating topic statements as a sub-language and building a model for them. For example, some information on weights is revealed by phrases like "... optional" and "... must exist ...", whereas the need for an instantiation of a concept is indicated by phrases like "Identification of the company must be included".

While CG theory provides a framework in which IR entities can be represented adequately, much of the representation task involves intellectual analysis of topic statements and documents, so that we capture and store concepts and relations that are ontologically adequate for IR. For example, it is essential to choose, organize, and classify a restricted set of relations in such a way that they facilitate matching and inferencing between two CGs, one representing a document and the other a topic statement.
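The CG in the Venezuela example can be written down as a small data structure. The following is a minimal sketch in Python, not the DR-LINK implementation: the class name `Concept`, the fields `weight` and `must_instantiate`, the helper functions, and the numeric weight are all hypothetical names and values chosen for illustration. The nesting of relations under [agree], [restructure], and [debt] is one reading of the linear notation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Concept:
    """A concept node. All field names here are illustrative assumptions."""
    label: str                        # concept type, e.g. "country"
    referent: Optional[str] = None    # individual, e.g. "Venezuela"
    coref: Optional[int] = None       # coreference marker (*1, *2)
    weight: float = 1.0               # hypothetical importance weight
    must_instantiate: bool = False    # hypothetical "must be identified" flag


# A graph is a list of (source concept, relation label, target concept).
Triple = Tuple[Concept, str, Concept]


def build_example_graph() -> List[Triple]:
    """Encode the example CG, reusing one object per coreferent node."""
    agree = Concept("agree")
    country = Concept("country", referent="Venezuela", coref=1)
    bank = Concept("creditor_bank", coref=2)
    restructure = Concept("restructure")
    debt = Concept("debt")
    money = Concept("money")
    foreign = Concept("foreign")
    return [
        (agree, "A", country),
        (agree, "A", bank),
        (agree, "AT", restructure),
        (restructure, "A", country),   # same node as *1 above
        (restructure, "A", bank),      # same node as *2 above
        (restructure, "P", debt),
        (debt, "ME", money),
        (debt, "CH", foreign),
    ]


def distinct_concepts(graph: List[Triple]) -> List[Concept]:
    """Return each concept node once, in order of first appearance."""
    seen: List[Concept] = []
    for source, _, target in graph:
        for node in (source, target):
            if node not in seen:
                seen.append(node)
    return seen


def relations_from(graph: List[Triple], label: str) -> List[str]:
    """List the relation labels leaving the concept with the given label."""
    return [rel for source, rel, _ in graph if source.label == label]


# A topic-statement node might carry the extra features described above,
# e.g. for "Identification of the company must be included".
# The weight value is invented for illustration.
company = Concept("company", weight=2.0, must_instantiate=True)
```

Sharing a single `Concept` object between the [agree] and [restructure] branches mirrors what the *1 and *2 markers express in the linear notation: both branches point at the same node rather than at two copies.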
The efficacy of the relations we have chosen will be determined with full experiments and failure analyses.

Conceptual Graph Matcher

The main function of the CG matching component is to determine the degree to which two CGs share a common structure and to score each document with respect to the topic statement. This is accomplished by employing techniques necessary to model plausible inferences with CGs (Myaeng & Khoo, 1992). To allow for approximate matching between concept nodes or relation nodes, we have developed a matrix that represents similarities among the relations used in the CG representation, as well as among some concepts.

Our goal is to enhance both precision and recall. By exploiting the structure of the CGs and the nature of the relations, we attempt to meet the specific information needs in topic statements. By allowing for partial matching (e.g. between 'debt' and 'bank debt') and inexact matching (e.g. between 'debt' and 'loan', and between 'CO-AGENT' and 'AGENT') at the node level, we can increase recall.

For CG matching, we first developed and implemented a base algorithm that is flexible enough to allow for various types of partial matching between two CGs, and ran experiments to test its practicality (Myaeng and Lopez-Lopez, 1992). While the general subgraph isomorphism problem is known to be computationally intractable, matching CGs containing conceptual information (i.e. labels on nodes) appears to be practical. With improved understanding of the