SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Incorporating Semantics Within a Connectionist Model and a Vector Processing Model
chapter
R. Boyd
J. Driscoll
National Institute of Standards and Technology
D. K. Harman
In Section 2, we describe our original semantic lexicon
and an extension which uses a larger number of semantic
categories. Section 3 presents an application of an AI
connectionist model to the task of routing. Section 4 presents
an approach different from the one reported in TREC-1 [4], using our
extended semantic lexicon within the vector processing
model. Section 5 summarizes our research effort.
2. The Semantic Lexicon
Our semantic approach uses a thesaurus as a source of
semantic categories (thematic and attribute information). For
example, Roget's Thesaurus contains a hierarchy of word
classes to relate word senses [14]. In TREC-1 [4] and in
earlier research [17,19], we selected several classes from this
hierarchy to be used for semantic categories. We defined
thirty-six semantic categories as shown in Figure 1.
In order to explain the assignment of semantic categories
to a given term using Roget's Thesaurus, consider the brief
index quotation for the term "vapor":
vapor
    n.  fog               404.2
        fume              401
        illusion          519.1
        spirit            4.3
        steam             328.10
        thing imagined    535.3
    v.  be bombastic      601.6
        bluster           911.3
        boast             910.6
        exhale            310.23
        talk nonsense     547.5
The eleven different meanings of the term "vapor" are given
in terms of a numerical category. We developed a mapping
of the numerical categories in Roget's Thesaurus to the
thematic role and attribute categories given in Figure 1. In
this example, "fog" and "fume" correspond to the attribute
State; "steam" maps to the attribute Temperature; and "exhale"
is a trigger for the attribute Motion with Reference to
Direction. The remaining seven meanings associated with
"vapor" do not trigger any thematic roles or attributes. Since
there are eleven meanings associated with "vapor," we
indicated in the lexicon a probability of 1/11 each time a
category is triggered. Hence, a probability of 2/11 is assigned
to State, 1/11 to Temperature, and 1/11 to Motion with
Reference to Direction. This technique of calculating probabilities
is being used as a simple alternative to a corpus
analysis.
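
The calculation for "vapor" can be sketched in Python as follows. This is a minimal illustration and not the lexicon software itself: the sense-to-category pairs are taken from the example above, while the names `VAPOR_SENSES` and `category_probabilities` are assumptions introduced here.

```python
from collections import Counter
from fractions import Fraction

# Senses of "vapor" from the index quotation, each paired with the
# semantic category it triggers (None when it triggers no category).
VAPOR_SENSES = [
    ("fog", "State"),
    ("fume", "State"),
    ("illusion", None),
    ("spirit", None),
    ("steam", "Temperature"),
    ("thing imagined", None),
    ("be bombastic", None),
    ("bluster", None),
    ("boast", None),
    ("exhale", "Motion with Reference to Direction"),
    ("talk nonsense", None),
]


def category_probabilities(senses):
    """Assign 1/N per triggering sense, where N is the total number of senses."""
    total = len(senses)
    counts = Counter(cat for _, cat in senses if cat is not None)
    return {cat: Fraction(n, total) for cat, n in counts.items()}


probs = category_probabilities(VAPOR_SENSES)
for category, p in probs.items():
    print(f"{category}: {p}")
```

Because seven of the eleven senses trigger no category, the probabilities for a term need not sum to one; here they sum to 4/11.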
We are still experimenting
with other ways of calculating probabilities. For example, as
in [8], a probabilistic part-of-speech tagger could be used to
further restrict the different meanings of a term, and existing
lexical sources could be used to obtain an ordering based on
frequency of use for the different meanings of a term.
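
One way to picture the part-of-speech restriction is the sketch below, which redistributes probability over only those senses of "vapor" that share a given part of speech. It assumes the tagger has already committed to a single tag, which simplifies the probabilistic tagging mentioned above; the names `VAPOR_ENTRY` and `probabilities_for_pos` are hypothetical and are not taken from [8].

```python
from collections import Counter
from fractions import Fraction

# (part of speech, sense, triggered category or None) for the term "vapor".
VAPOR_ENTRY = [
    ("n", "fog", "State"),
    ("n", "fume", "State"),
    ("n", "illusion", None),
    ("n", "spirit", None),
    ("n", "steam", "Temperature"),
    ("n", "thing imagined", None),
    ("v", "be bombastic", None),
    ("v", "bluster", None),
    ("v", "boast", None),
    ("v", "exhale", "Motion with Reference to Direction"),
    ("v", "talk nonsense", None),
]


def probabilities_for_pos(entry, pos):
    """Spread probability only over the senses with the given part of speech."""
    categories = [cat for tag, _, cat in entry if tag == pos]
    if not categories:
        return {}
    counts = Counter(cat for cat in categories if cat is not None)
    return {cat: Fraction(n, len(categories)) for cat, n in counts.items()}


noun_probs = probabilities_for_pos(VAPOR_ENTRY, "n")
verb_probs = probabilities_for_pos(VAPOR_ENTRY, "v")
print(noun_probs)
print(verb_probs)
```

Under this assumption, a noun reading of "vapor" is spread over six senses instead of eleven, and a verb reading over five.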
As reported in [4], the use of 36 semantic categories caused
problems when dealing with TREC documents. When the
size of a document is large, a greater number of the 36
semantic categories are triggered in the document. Also,
when using the semantic approach described in [19], the
probability present for each category in a document is often
very close to one. Consequently, almost every one of the
Thematic Role Categories        Attribute Categories
TACM  Accompaniment             ACOL  Color
TAMT  Amount                    AEID  External and Internal Dimensions
TBNF  Beneficiary               AFRM  Form
TCSE  Cause                     AGND  Gender
TCND  Condition                 AGDM  General Dimensions
TCMP  Comparison                ALDM  Linear Dimensions
TCNV  Conveyance                AMFR  Motion Conjoined with Force
TDGR  Degree                    AGMT  Motion in General
TDST  Destination               AMDR  Motion with Reference to Direction
TDUR  Duration                  AORD  Order
TGOL  Goal                      APHP  Physical Properties
TINS  Instrument                APOS  Position
TSPL  Location/Space            ASTE  State
TMAN  Manner                    ATMP  Temperature
TMNS  Means                     AUSE  Use
TPUR  Purpose                   AVAR  Variation
TRNG  Range
TRES  Result
TSRC  Source
TTIM  Time
Figure 1. Thirty-Six Semantic Categories.