SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
TIPSTER Panel -- DR LINK's Linguistic-Conceptual Approach to Document Detection
chapter
E. Liddy
S. Myaeng
National Institute of Standards and Technology
Donna K. Harman
support is selected as the correct tag for that sentence.
To convey how the Text Structurer output is used by DR-LINK, Figure 3 presents a Topic
Statement which is highlighted to show its implicit request for future-oriented prediction
information. This need for a specific type of information is recognized by the Topic Statement
Processor which maps this need for future-oriented information to EXPECTATION or MAIN,
FUTURE components in documents. The Text Structure Matcher then searches for documents
with these components. Figure 4 shows one such document, a relevant WSJ article as
structured by the DR-LINK component, in which the required information occurs in
sentences that have been correctly tagged as EXPECTATION.
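The mapping and matching steps just described can be illustrated with a minimal sketch. It assumes that a structured document is available as a list of (component tag, sentence) pairs. The names NEED_TO_COMPONENTS and matching_sentences, and the label "future-oriented", are hypothetical and are not taken from DR-LINK itself.

```python
# Hypothetical table: information need -> text-structure components that can
# satisfy it. Only the future-oriented entry is described in the text.
NEED_TO_COMPONENTS = {
    "future-oriented": {"EXPECTATION", "MAIN, FUTURE"},
}


def matching_sentences(need, tagged_sentences):
    """Return the sentences whose component tag satisfies the given need.

    tagged_sentences is a list of (component_tag, sentence_text) pairs,
    an assumed stand-in for the Text Structurer output.
    """
    wanted = NEED_TO_COMPONENTS.get(need, set())
    return [text for tag, text in tagged_sentences if tag in wanted]
```

A document would then count as a candidate for a future-oriented Topic Statement when matching_sentences("future-oriented", document) returns at least one sentence.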
Relation-Concept Detector
The main function of the RCD module is to extract relations that connect concepts that otherwise
would be treated as isolated and independent. For example, a relation REASON can be extracted
from phrases like 'because of', 'as a result of', and 'due to', which form lexical RRF, in order to
connect the two constituents occurring before and after the phrases. It should be noted that the
RRF we are developing are domain-independent since they are based on fairly universal
linguistic clues rather than a domain model. There are several types of RRF derived from a
variety of linguistic constructs including verb-oriented thematic roles, complex nominals,
proper noun appositions, nominalized verbs, adverbs, prepositional phrases, and some other ad
hoc patterns revealing relations that appear in the literature (cf. Somers, 1987) and are
expected to be suitable for IR purposes based on our preliminary analysis of the topic
statements. We intend to conduct an extensive study of the usefulness of individual relations
as part of our failure analysis.
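As an illustration of how a lexical RRF might operate, the following sketch uses the three cue phrases named above to split a sentence into the constituents before and after the cue and connect them with a REASON relation. The function name, the regular-expression approach, and the (before, REASON, after) ordering of the triple are assumptions made for this example and do not describe the actual RCD implementation.

```python
import re

# The three cue phrases come from the text; a real knowledge base of lexical
# RRF would presumably hold many more.
REASON_CUES = ("because of", "as a result of", "due to")

_CUE_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(cue) for cue in REASON_CUES) + r")\b",
    re.IGNORECASE,
)


def extract_reason(sentence):
    """Return (before, "REASON", after) for the first cue phrase, else None."""
    match = _CUE_PATTERN.search(sentence)
    if match is None:
        return None
    before = sentence[:match.start()].strip(" ,.;")
    after = sentence[match.end():].strip(" ,.;")
    if not before or not after:
        # A cue at the very start or end leaves nothing to connect.
        return None
    return (before, "REASON", after)
```

Because the rule relies only on the cue phrase and not on a domain model, the same pattern applies to text on any subject, which is the sense in which such RRF are domain-independent.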
With different types of RRF stored in a knowledge base, we go through multiple stages of partial
linguistic analysis, as opposed to holistic syntactic processing followed by a semantic
interpreter, to extract relations and eventually generate CGs. With the tagged, bracketed, and
structured text as input, various sub-modules in the RCD selectively detect implicit
relations as well as the concepts being connected, by focusing on occurrences of patterns of
interest found in the knowledge base and by bypassing portions of text irrelevant to the relation
extraction tasks. The output of the RCD component is a set of concept-relation-concept triples
where concepts are often derived from content-bearing words and relations from non-content
words or indirectly from the linguistic structure by consulting the knowledge base.
For example, the Proper Noun (PN) apposition category of RRF will help categorize the many
occurrences of PNs in text and determine semantic relations between a PN and the apposition
which either precedes or follows it. In the following sentence fragment, for instance, apposition
RRF will recognize and categorize the LOCATION relation and the PRODUCT relation of the PN,
General Development:
"... General Development, a Miami-based developer of planned communications, ..."
General Development -> (LOCATION) -> Miami
General Development -> (PRODUCT/SERVICE) -> developer of planned communications
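The apposition example above can be approximated by a small pattern-based sketch. It assumes one very narrow apposition shape: a capitalized proper noun, a comma, an article, an optional "X-based" modifier, and a descriptive phrase closed by a comma. The pattern and the function name apposition_triples are hypothetical. The actual PN apposition RRF operate on tagged and bracketed text and are not specified here.

```python
import re

# Assumed shape: "<Proper Noun>, a|an|the [<Place>-based] <description>,"
_APPOSITION = re.compile(
    r"(?P<pn>(?:[A-Z][\w&.]*\s?)+),\s+"       # run of capitalized words
    r"(?:an?|the)\s+"                          # article opening the apposition
    r"(?:(?P<loc>[A-Z][\w.]*)-based\s+)?"      # optional "Miami-based"
    r"(?P<desc>[^,]+),"                        # description up to the next comma
)


def apposition_triples(text):
    """Return concept-relation-concept triples for the first apposition found."""
    match = _APPOSITION.search(text)
    if match is None:
        return []
    pn = match.group("pn").strip()
    triples = []
    if match.group("loc"):
        triples.append((pn, "LOCATION", match.group("loc")))
    triples.append((pn, "PRODUCT/SERVICE", match.group("desc").strip()))
    return triples
```

Applied to the fragment above, this sketch yields the same two triples, one LOCATION and one PRODUCT/SERVICE, both attached to General Development.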
As another example of processing in the RCD module, the Case Frame Handler will, given a
sentence fragment which has been processed by the tagger and bracketer: