SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Incorporating Semantics Within a Connectionist Model and a Vector Processing Model
R. Boyd
J. Driscoll
National Institute of Standards and Technology
D. K. Harman
Incorporating Semantics Within a Connectionist Model
and a Vector Processing Model
Richard Boyd, James Driscoll, mien Syu
Department of Computer Science
University of Central Florida
Orlando, Florida 32816
(407)823-2341
FAX: (407)823-5419
e-mail: driscoll@cs.ucf.edu
Abstract
Semantic information obtained from the public-domain
1911 version of Roget's Thesaurus is combined with keywords
to measure similarity between natural language topics
and documents. Two approaches are explored. In the first
approach, keyword relevance and semantic relevance are
combined by calculating similarity with the vector processing
model, extending the usual single weight for a keyword to
individual weights for each of the keyword's meanings.
This approach is based on the database concept of semantic
modeling and the linguistic concept of thematic roles. It is
applicable to both routing and archival retrieval. The second
approach is especially suited for routing. It is based on an AI
connectionist model. In this approach, a probabilistic
inference network is modified using semantic information to
achieve a competitive activation mechanism that can be used
for calculating similarity.
Keywords: vector processing model, semantic data model,
semantic lexicon, inference network, connectionist model.
1. Introduction
The experiments reported here use a relatively efficient
method to detect the semantic representation of text. Our
original method is based on semantic modeling and is
described in [4,17,19].
Semantic modeling was an object of considerable database
research in the late 1970's and early 1980's. A brief overview
can be found in [3]. Essentially, the semantic modeling
approach identified concepts useful in talking informally
about the real world. These concepts included the two notions
of entities (objects in the real world) and relationships among
entities (actions in the real world). Both entities and
relationships have properties.
The properties of entities are often called attributes. There
are basic, or surface-level, attributes for entities in the real
world. Examples of surface-level entity attributes are General
Dimensions, Color, and Position. These properties are
prevalent in natural language. For example, consider the
phrase "large, black book on the table" which indicates the
General Dimensions, Color, and Position of the book.
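To make the idea concrete, the following Python sketch tags the words of the example phrase with surface-level attribute categories. The table ATTRIBUTE_LEXICON and the function detect_attributes are hypothetical illustrations, not the lexicon described in this paper; a practical detector would also have to handle ambiguous words and multiword expressions.

```python
# Hypothetical toy lexicon mapping a word to a surface-level attribute.
# These entries are illustrative assumptions, not the paper's lexicon.
ATTRIBUTE_LEXICON = {
    "large": "General Dimensions",
    "small": "General Dimensions",
    "black": "Color",
    "red": "Color",
    "on": "Position",
    "under": "Position",
}


def detect_attributes(phrase: str) -> dict[str, list[str]]:
    """Group the words of a phrase by the attribute category they signal."""
    found: dict[str, list[str]] = {}
    for token in phrase.lower().replace(",", " ").split():
        category = ATTRIBUTE_LEXICON.get(token)
        if category is not None:
            found.setdefault(category, []).append(token)
    return found


print(detect_attributes("large, black book on the table"))
# {'General Dimensions': ['large'], 'Color': ['black'], 'Position': ['on']}
```

Each detected category corresponds to one of the attributes named in the example: General Dimensions, Color, and Position.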
In linguistic research, the basic properties of relationships
are discussed and called thematic roles. Thematic roles are
also referred to in the literature as participant roles, semantic
roles, and case roles. Examples of thematic roles are
Beneficiary and Time. Thematic roles are prevalent in natural
language; they reveal how sentence phrases and clauses are
semantically related to the verbs in a sentence. For example,
consider the phrase "purchase for Mary on Wednesday"
which indicates who benefited from a purchase (Beneficiary)
and when a purchase occurred (Time).
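The following Python sketch spots the two thematic roles in the example phrase. Its trigger rules are hypothetical simplifications: "for" is always read as marking a Beneficiary (in real text it can also mark, for example, purpose or duration), and Time is recognized only when "on" or "at" precedes a word from a small, made-up set of time words.

```python
# Hypothetical set of words that can fill a Time role after "on" or "at".
TIME_WORDS = {"monday", "tuesday", "wednesday", "thursday", "friday",
              "saturday", "sunday", "noon", "midnight"}


def detect_roles(phrase: str) -> list[tuple[str, str]]:
    """Return (role, filler) pairs found by simple preposition rules."""
    tokens = phrase.lower().split()
    roles = []
    # Look at each adjacent (preposition, following word) pair.
    for prep, filler in zip(tokens, tokens[1:]):
        if prep in {"on", "at"} and filler in TIME_WORDS:
            roles.append(("Time", filler))
        elif prep == "for":
            roles.append(("Beneficiary", filler))
    return roles


print(detect_roles("purchase for Mary on Wednesday"))
# [('Beneficiary', 'mary'), ('Time', 'wednesday')]
```

Note that "on" in "book on the table" would not be tagged as Time here, because "the" is not in TIME_WORDS; the same preposition signals different relationships depending on what follows it.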
A main goal of our research has been to detect thematic
information along with attribute information contained in
natural language queries and documents. In order to use this
additional information, the concept of text relevance needs
to be modified.
In [17,19] the major modifications included the addition
of a lexicon with thematic and attribute information, and a
modified computation of a vector processing similarity
coefficient. That research concerned a Question/Answer
environment where queries were the length of a sentence and
documents were either a sentence or at most a paragraph. At
that time, our lexicon was based on 36 semantic categories,
and in that environment, our semantic approach produced a
significant improvement in retrieval performance.
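The following Python sketch shows one way a vector processing similarity coefficient could be extended with semantic information: each text becomes a sparse vector holding both keyword components and semantic-category components, and similarity is the cosine of the two vectors. The LEXICON entries, the category names, and the rule that a word's semantic weight is split evenly across its candidate meanings are assumptions made for illustration; they are not taken from [17,19].

```python
import math
from collections import Counter

# Hypothetical lexicon: word -> candidate meanings (semantic categories).
# These entries are invented for illustration only.
LEXICON = {
    "buy": ["Transaction"],
    "purchase": ["Transaction"],
    "black": ["Color"],
    "bank": ["Location", "Transaction"],  # an ambiguous word: two meanings
}


def vectorize(text: str, use_categories: bool = True) -> Counter:
    """Build a sparse vector of keyword (and optionally category) weights.

    Assumption: each word occurrence adds 1.0 to its keyword component and
    splits another 1.0 evenly across its candidate meanings.
    """
    vec: Counter = Counter()
    for word in text.lower().split():
        vec["kw:" + word] += 1.0
        meanings = LEXICON.get(word, []) if use_categories else []
        for meaning in meanings:
            vec["cat:" + meaning] += 1.0 / len(meanings)
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b[k] for k, w in a.items())  # missing keys count as 0
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


query, doc = "buy a car", "purchase of a car"
keywords_only = cosine(vectorize(query, False), vectorize(doc, False))
with_semantics = cosine(vectorize(query), vectorize(doc))
print(round(keywords_only, 3), round(with_semantics, 3))  # 0.577 0.671
```

Here "buy" and "purchase" share no keyword component, but both map to the hypothetical Transaction category, so adding category components raises the similarity. An ambiguous word such as "bank" contributes 0.5 to each of its two meanings under the even-split assumption.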
For TREC-1 [4], however, document and topic length
presented a problem and caused our semantic approach based
on 36 semantic categories to be of little value. Nevertheless, as
reported in [4], breaking the TREC documents into
paragraphs produced a significant improvement.
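A minimal Python sketch of paragraph-level scoring follows. It splits a document at blank lines, scores each paragraph against the topic with a plain keyword cosine, and keeps the best paragraph score. The blank-line splitting rule, the use of the maximum, and the helper names are assumptions for illustration; this passage does not state how [4] combined paragraph scores.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lowercased term-frequency vector over alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b[k] for k, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def paragraph_score(topic: str, document: str) -> float:
    """Score a document by its best-matching paragraph.

    Assumptions: paragraphs are separated by blank lines, and the document
    score is the maximum paragraph score.
    """
    q = term_vector(topic)
    paragraphs = [p for p in re.split(r"\n\s*\n", document) if p.strip()]
    return max((cosine(q, term_vector(p)) for p in paragraphs), default=0.0)


topic = "oil spill cleanup"
document = (
    "Quarterly earnings rose sharply.\nShares traded higher.\n\n"
    "The oil spill cleanup continued."
)
whole = cosine(term_vector(topic), term_vector(document))
best = paragraph_score(topic, document)
print(round(whole, 3), round(best, 3))  # 0.5 0.775
```

On this toy document the unrelated first paragraph dilutes the whole-document score, while the paragraph-level score isolates the relevant passage.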
This work has been supported in part by NASA KSC Cooperative Agreement NCC 10[OCRerr]3 Project 2, Florida High Technology
and Industry Council Grants 494011-28-721 and 4940-1 1-2[OCRerr]728.