NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1), TIPSTER Panel -- HNC's MatchPlus System
S. Gallant, R. Hecht-Nielsen, W. Caid, K. Qing, J. Carleton, D. Sudbeck
National Institute of Standards and Technology, Donna K. Harman

where [OCRerr] is some suitable positive number (e.g., 3). (See also [6].) Note that search with [OCRerr] takes the same amount of time as search with [OCRerr].

2.1 Context Vector Representations

Context vector representations (or feature space representations) have a long history in cognitive science. Work by Waltz & Pollack [7] had an especially strong influence on the work reported here. They described a neural network model for word sense disambiguation and developed context vector representations (which they termed micro-feature representations). See Gallant [2] for more background on context vector representations and word sense disambiguation.

We use context vector representations for document retrieval, with most of the representation being learned from an unlabeled corpus. A main constraint for all of this work is to keep computation and storage reasonable, even for very large corpora.

For defining context vectors, we begin by specifying a set of n features that are useful for differentiating among terms and contexts. These may be chosen in an ad hoc manner using "common sense", or they may consist of the high-frequency terms in a given corpus after removal of stopwords (a, the, and, ...). Figure 1 gives some typical examples that might be suitable for general news stories. Experiments suggest that the precise selection of features is not critical to system performance.
[Figure 1: Some typical features. For convenience, some features that apply to star have been moved to the top of the list. Features shown include: human, machine, art, science, play, walk, lie-down, motion, speak, research, fun, sad, exciting, friend, family, baby, country, cold, hard, soft, sharp, light, big, small, red, white, blue, yellow, animal, insect, plant, tree, flower, fruit, fragrant, stink, past, future, high, low, wood, paper, metal, building, house, work, early, late, day, afternoon, morning, sunny, cloudy, hot, cold, humid, smart, dumb, truck, write, type, cook, eat, politics, entertainment, yell, boring, hot, heavy, black, mammal, bush, present, plastic, factory, night, rain, bright, bike, spicy]

For any word stem k we now define its context vector, V_k, to be an n-dimensional vector, and interpret each component j of V_k as follows:

* {strongly} positive if word k is {strongly} associated with feature j
* 0 if word k is not associated with feature j
* {strongly} negative if word k {strongly} contradicts feature j.

As an example, V_astronomer might be

<+2 +1 +1 -1 -1 0 +2 0 0 0 0 0 +1 +1 +1 +2 +1 -1 +1 -1>

using the features in Figure 1. Note that the interpretation of components of context vectors is exactly the same as the interpretation of weights in neural networks.

In addition to the "word features" there are other "learned features" that primarily serve to increase the dimensionality of context vectors.

Features are deliberately chosen so that they overlap. This makes context vectors less dependent upon any individual feature. Context vector representations give built-in sensitivity to similarity of meaning between terms. For example, it is likely that the context vector for 'car' would be very close to the context vector for 'auto', somewhat close to the context vector for 'driving', and less close to the context vector for 'hippopotamus' for any "reasonable" set of features and for any "reasonable" person entering the context vectors. For more on this point see the plausibility argument in Waltz & Pollack [7].
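The closeness claim above can be illustrated with a small sketch. The toy vectors below are hypothetical values over five of the Figure 1 features (they are not taken from the paper), and cosine similarity is used as one plausible closeness measure:

```python
import numpy as np

# Hypothetical toy context vectors over five of the Figure 1 features:
# (human, machine, motion, animal, fun). Values are illustrative only.
context = {
    "car":          np.array([ 0.0,  2.0,  2.0, -1.0,  1.0]),
    "auto":         np.array([ 0.0,  2.0,  2.0, -1.0,  0.0]),
    "driving":      np.array([ 1.0,  1.0,  2.0, -1.0,  1.0]),
    "hippopotamus": np.array([-1.0, -2.0,  1.0,  2.0,  0.0]),
}

def similarity(a, b):
    """Cosine similarity between two context vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

s_auto = similarity(context["car"], context["auto"])
s_driving = similarity(context["car"], context["driving"])
s_hippo = similarity(context["car"], context["hippopotamus"])
```

With these values, 'car' comes out most similar to 'auto', somewhat similar to 'driving', and dissimilar to 'hippopotamus', matching the intuition in the text.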
Note that overlapping features provide a distributed representation [3], and therefore help insulate against a small number of questionable entries for context vectors.

2.2 Bootstrap Learning

Bootstrapping is a machine learning technique that begins with hand-entered context vectors for a small set of core stems, and then uses an unlabeled training corpus to create context vectors for all remaining stems. The basic idea is to define the context vector for a new stem by making it similar to the context vectors of its neighbor stems.

Note that bootstrapping takes into account local word positioning when assigning the context vector representation for stems. Moreover, it is nearly invariant with respect to document divisions within the training corpus. This contrasts with those methods where stem representations are determined solely by those documents in which the stem lies.

We have also developed a fully automated method for bootstrapping that requires no initial hand entry. This capability is very useful for specialized domains such as the tests on traditional IR corpora presented in the next section. We are currently researching
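The bootstrap idea above can be sketched as follows. This is a minimal illustration, not HNC's actual algorithm: the window size, the uniform weighting of neighbors, and the unit normalization are all assumptions made for the sketch.

```python
import numpy as np

def bootstrap_vector(corpus, target, known, window=3, dim=5):
    """Sketch of bootstrap learning: build a context vector for an
    unknown stem by summing the vectors of nearby known stems, then
    unit-normalizing. `corpus` is a list of token lists; `known`
    maps stem -> context vector (hand-entered or previously learned)."""
    total = np.zeros(dim)
    count = 0
    for tokens in corpus:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            # Look at neighbors within `window` positions on each side,
            # so local word positioning influences the result.
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j == i:
                    continue
                v = known.get(tokens[j])
                if v is not None:
                    total += v
                    count += 1
    if count == 0:
        return total  # no known neighbors seen; leave as zero vector
    return total / np.linalg.norm(total)
```

Because the window slides over tokens rather than whole documents, the sketch shares the property noted above: the result barely depends on where document boundaries fall in the training corpus.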