NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
TIPSTER Panel -- HNC's MatchPlus System
S. Gallant
R. Hecht-Nielsen
W. Caid
K. Qing
J. Carleton
D. Sudbeck
National Institute of Standards and Technology
Donna K. Harman
where [OCRerr] is some suitable positive number (e.g., 3). (See also [6].) Note that search with [OCRerr] takes the same amount of time as search with [OCRerr].
2.1 Context Vector Representations
Context vector representations (or feature space representations) have a long history in cognitive science. Work by Waltz & Pollack [7] had an especially strong influence on the work reported here. They described a neural network model for word sense disambiguation and developed context vector representations (which they termed micro-feature representations). See Gallant [2] for more background on context vector representations and word sense disambiguation.
We use context vector representations for document retrieval, with most of the representation being learned from an unlabeled corpus. A main constraint for all of this work is to keep computation and storage reasonable, even for very large corpora.
For defining context vectors, we begin by specifying a set of n features that are useful for differentiating among terms and contexts. These may be chosen in an ad hoc manner using "common sense", or they may consist of the high-frequency terms in a given corpus after removal of stopwords (a, the, and, ...). Figure 1 gives some typical examples that might be suitable for general news stories. Experiments suggest that the precise selection of features is not critical to system performance.
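The second selection strategy above (high-frequency terms minus stopwords) can be sketched in a few lines. The corpus and stopword list below are hypothetical placeholders, not data from the paper:

```python
from collections import Counter

# Hypothetical mini-corpus; in practice this would be a large news corpus.
corpus = [
    "the cat sat on the mat",
    "a dog ran in the park",
    "the dog and the cat play in the park",
]
stopwords = {"a", "the", "and", "on", "in"}

# Count every non-stopword term across the whole corpus.
counts = Counter(
    term
    for doc in corpus
    for term in doc.split()
    if term not in stopwords
)

# Keep the n most frequent remaining terms as candidate features.
n = 4
features = [term for term, _ in counts.most_common(n)]
```

In a real system the terms would first be reduced to stems, but the frequency cutoff works the same way.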
Figure 1: Some typical features. (For convenience, some features that apply to "star" have been moved to the top of the list.)

human machine politics
art science play entertainment
walk lie-down motion speak yell
research fun sad exciting boring
friend family baby country hot
cold hard soft sharp heavy
light big small red black
white blue yellow animal mammal
insect plant tree flower bush
fruit fragrant stink past present
future high low wood plastic
paper metal building house factory
work early late day night
afternoon morning sunny cloudy rain
hot cold humid bright
smart dumb truck bike
write type cook eat spicy
For any word stem k we now define its context vector, V_k, to be an n-dimensional vector, and interpret each component j of V_k as follows:

* {strongly} positive if word k is {strongly} associated with feature j
* 0 if word k is not associated with feature j
* {strongly} negative if word k {strongly} contradicts feature j.
As an example, V_astronomer might be

<+2 +1 +1 -1 -1  0 +2  0  0  0  0  0 +1 +1 +1 +2 +1 -1 +1 -1 ...>

using the features in Figure 1. Note that the interpretation of components of context vectors is exactly the same as the interpretation of weights in neural networks.
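A context vector can be held sparsely as a map from feature names to weights and expanded to a dense n-dimensional vector on demand. The feature subset and the weights below are illustrative assumptions, not values from the paper:

```python
# Hypothetical 5-feature space (a tiny subset of the kind of features in Figure 1).
features = ["science", "research", "light", "animal", "cook"]

# Components follow the interpretation above:
# positive = associated, 0 = unrelated, negative = contradicts.
v_astronomer = {"science": 2, "research": 2, "light": 1, "animal": 0, "cook": -1}

def as_vector(ctx, features):
    """Expand a sparse feature->weight map into a dense n-dimensional vector."""
    return [ctx.get(f, 0) for f in features]
```

Here `as_vector(v_astronomer, features)` yields the dense vector `[2, 2, 1, 0, -1]`.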
In addition to the "word features" there are other
"learned features" that primarily serve to increase the
dimensionality of context vectors.
Features are deliberately chosen so that they overlap. This makes context vectors less dependent upon any individual feature. Context vector representations give built-in sensitivity to similarity of meaning between terms. For example, it is likely that the context vector for "car" would be very close to the context vector for "auto", somewhat close to the context vector for "driving", and less close to the context vector for "hippopotamus" for any "reasonable" set of features and for any "reasonable" person entering the context vectors. For more on this point see the plausibility argument in Waltz & Pollack [7]. Note that overlapping features provide a distributed representation [3], and therefore help insulate against a small number of questionable entries for context vectors.
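The "close" and "less close" relations above are naturally measured by cosine similarity between context vectors. The five-feature space and the hand-entered weights below are assumptions made for illustration only:

```python
import math

def cosine(u, v):
    """Cosine similarity between two context vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Illustrative vectors over five hypothetical features
# [machine, motion, road, animal, water]; not values from the paper.
car = [2, 2, 2, -1, 0]
auto = [2, 2, 2, -1, 0]
driving = [1, 2, 2, 0, 0]
hippopotamus = [-2, 1, -1, 2, 2]
```

With any entries in this spirit, `cosine(car, auto)` exceeds `cosine(car, driving)`, which in turn exceeds `cosine(car, hippopotamus)`.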
2.2 Bootstrap Learning
Bootstrapping is a machine learning technique that
begins with hand-entered context vectors for a small
set of core stems, and then uses an unlabeled train-
ing corpus to create context vectors for all remaining
stems.
The basic idea is to define the context vector for a new stem by making it similar to the context vectors of its neighbor stems. Note that bootstrapping takes into account local word positioning when assigning the context vector representation for stems. Moreover, it is nearly invariant with respect to document divisions within the training corpus. This contrasts with methods where stem representations are determined solely by the documents in which the stem lies.
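The neighbor-based idea can be sketched as a single left-to-right pass that gives each unknown stem the normalized sum of known context vectors within a small positional window. This is a simplified illustration of the principle, not HNC's exact update rule; the window size, the core stems, and their vectors are all assumptions:

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def bootstrap(tokens, vectors, window=2, dim=5):
    """Assign each stem lacking a vector the normalized sum of its
    neighbors' vectors within +/- window positions. Stems assigned
    earlier in the pass can feed later assignments."""
    for i, stem in enumerate(tokens):
        if stem in vectors:
            continue
        acc = [0.0] * dim
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i and tokens[j] in vectors:
                acc = [a + b for a, b in zip(acc, vectors[tokens[j]])]
        vectors[stem] = normalize(acc)
    return vectors

# Core stems entered by hand; "telescope" gets its vector from nearby
# core stems. (In practice stopwords like "the" would be stripped first.)
core = {"star": [2, 0, 1, 0, 0], "sky": [1, 0, 2, 0, 0]}
out = bootstrap(["the", "star", "telescope", "sky"], dict(core))
```

Because the window is positional rather than document-based, the result depends only weakly on where document boundaries fall, matching the invariance noted above.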
We have also developed a fully-automated method
for bootstrapping that requires no initial hand entry.
This capability is very useful for specialized domains
such as the tests on traditional IR corpora presented
in the next section. We are currently researching