NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Incorporating Semantics Within a Connectionist Model and a Vector Processing Model
R. Boyd
J. Driscoll
National Institute of Standards and Technology
D. K. Harman
The semantic categories in our example are those shown
in Figure 1. For example, consider the word "depart" which
occurs one time in the query as shown in Figure 9. The
semantic lexicon entry for the word "depart" using the
categories of Figure 1 is as follows:
depart: NONE NONE NONE NONE NONE AMDR AMDR TACM
where NONE represents a word sense not included in the 36
semantic categories of Figure 1. If a uniform distribution is
assumed, then AMDR is triggered 1/4 of the time and TACM
is triggered 1/8 of the time. This is shown in Figure 9 as the
probabilities for each semantic category.
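As a minimal sketch, the uniform-distribution calculation above can be reproduced directly from a lexicon entry. The list-of-senses data structure is an assumption for illustration, and the eighth sense of "depart" is read as TACM from a partly garbled code in the source:

```python
from collections import Counter
from fractions import Fraction

def category_probabilities(senses):
    # Under a uniform distribution over a word's senses, the probability
    # that the word triggers a category is (number of senses tagged with
    # that category) / (total number of senses).  NONE marks a sense
    # outside the 36 categories and triggers nothing.
    counts = Counter(s for s in senses if s != "NONE")
    return {cat: Fraction(n, len(senses)) for cat, n in counts.items()}

# Lexicon entry for "depart" (eighth sense assumed to be TACM):
depart = ["NONE", "NONE", "NONE", "NONE", "NONE", "AMDR", "AMDR", "TACM"]
print(category_probabilities(depart))  # AMDR -> 1/4, TACM -> 1/8
```

With two of eight senses tagged AMDR and one tagged TACM, this yields the 1/4 and 1/8 probabilities shown in Figure 9.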
A similar category probability determination is done for
each document. Figure 10 is an alphabetized list of all the
unique words in Document #4 of Figure 6. The semantic
categories each word triggers along with probabilities are also
shown.
The text relevance determination procedure is shown in
Figure 11. The procedure uses three input lists:
a. List of words and the kif of each word, as shown in Figure
8.
b. List of words in the query and the semantic categories they
trigger along with the probability of triggering those
categories, as shown in Figure 9.
c. List of words in a document and the semantic categories
they trigger along with the probability of triggering those
categories, as shown in Figure 10.
The procedure operates as follows:
Step 1.
This step determines the common meanings between the
query and the document. Figure 12 corresponds to the output
of Step 1 for Document #4. In Step 1, a new list is created as
follows:
For each word in the query, follow either subsection (a) or
(b), whichever applies:
a. For each category the word triggers, find each word in the
document that triggers the category and output three things:
1) The word in the query and its frequency of occurrence.
2) The word in the document and its frequency of
occurrence.
3) The category.
b. If the word does not trigger a category, then look for the
word in the document and, if found, output two things (the
category field is left blank):
1) The word in the query and its frequency of occurrence.
2) The word in the document and its frequency of
occurrence.
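The two match cases of Step 1 can be sketched as follows. The dictionary shapes for the query and document lists are assumptions for illustration, not the paper's actual data structures, and the TACM code in the sample data is an assumed reading of a partly garbled category code:

```python
def step1_common_meanings(query, document):
    # `query` and `document` map each word to (frequency, {category: prob}).
    # An empty category dict means the word triggers none of the 36 categories.
    rows = []
    for q_word, (q_freq, q_cats) in query.items():
        if q_cats:
            # (a) For each category the query word triggers, pair it with
            # every document word that triggers the same category.
            for cat in q_cats:
                for d_word, (d_freq, d_cats) in document.items():
                    if cat in d_cats:
                        rows.append((q_word, q_freq, d_word, d_freq, cat))
        elif q_word in document:
            # (b) No category: match on the word itself, category left blank.
            rows.append((q_word, q_freq, q_word, document[q_word][0], None))
    return rows

# Fragment of the Figure 9 / Figure 10 data:
query = {"depart": (1, {"AMDR": 1/4, "TACM": 1/8}), "the": (1, {})}
doc4 = {"leave": (1, {"AMDR": 1/7, "TACM": 1/7}),
        "the": (1, {}),
        "until": (1, {"THM": 1.0})}
for row in step1_common_meanings(query, doc4):
    print(row)
```

Here "depart" pairs with "leave" through both AMDR and TACM, and "the", which triggers no category, matches itself directly.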
word      frequency  category     probability
hourly    1          THM          1.0
leave     1          AMDR         1/7
                     TACM         1/7
noon      1          AU)M         1/3
                     THM          2/3
the       1
station   1          APOS         3/16
                     AORD         1/8
                     TACM         1/16
                     TCNP         1/8
                     THGR         1/16
                     J[OCRerr]PL  3/16
trains    1          AORD         7/24
                     AMDR         1/12
                     AMFR         1/12
                     TACM         1/24
                     TCNV         1/12
until     1          THM          1.0

Figure 10. Words in Document #4.
Step 1 - Refer to Figure 12.
Determine common meaning
between query and the document.
Step 2 - Refer to Figure 13.
Adjust for words in the
query that are not in any
of the documents.
Step 3 - Refer to Figure 14.
Calculate the weight of a
semantic component in the query
and calculate the weight of a
semantic component in the document.
Step 4 - Refer to Figure 15.
Multiply the weight in the query
by the weight in the document.
Step 5 - Refer to Figure 15.
Sum all the individual products
of Step 4 into a single value which
is the semantic similarity coefficient.
Figure 11. Relevance Determination Procedure to Explain
Semantic Similarity.
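Steps 4 and 5 of the procedure reduce to an inner product over the semantic components common to the query and the document. The weight formulas of Steps 2 and 3 are given in Figures 13 and 14 and are not reproduced here, so the component weights below are placeholder values; only the multiply-and-sum is shown:

```python
def semantic_similarity(query_weights, doc_weights):
    # Step 4: multiply the query weight by the document weight for each
    # semantic component common to both.  Step 5: sum the products into
    # a single semantic similarity coefficient.
    return sum(q_w * doc_weights[c]
               for c, q_w in query_weights.items()
               if c in doc_weights)

# Placeholder component weights (the real ones come from Steps 2-3):
q_w = {"AMDR": 0.25, "TACM": 0.125}
d_w = {"AMDR": 0.5, "TACM": 0.25, "THM": 1.0}
print(semantic_similarity(q_w, d_w))  # 0.25*0.5 + 0.125*0.25 = 0.15625
```

Components that appear in only one of the two lists (THM above) contribute nothing, mirroring the fact that Step 1 only carries forward meanings common to both query and document.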