Scientific Report No. IRS-13 Information Storage and Retrieval
Search Matching Functions
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
in one place initially, and then seeking other likely places through which
to extend the search if necessary. The same procedure is followed in using
a KWIC (Keyword-In-Context) index, which, although mechanically produced, is
usually searched manually. An example of a search request, and of some of the
strategies used to search various types of indexes, is given in Figure 1. The
main characteristic of these manual systems is that the searcher enters the
index at only one place at a time. The subject headings and classification
numbers used must therefore each represent quite complex ideas in a single
entry if they are to cope with modern knowledge.
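As a rough illustration of how a KWIC index can be produced mechanically, the Python sketch below lists each title once under every significant word, with the surrounding words kept as context on either side. The stop list, the context width, and the function names (`kwic_index`, `format_kwic`) are assumptions made for illustration and are not taken from any particular KWIC program.

```python
from typing import List, Tuple

# Illustrative stop list (an assumption); words in it never become index entries.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

Entry = Tuple[str, str, str, int]  # (keyword, left context, right context, doc id)


def kwic_index(titles: List[str], width: int = 30) -> List[Entry]:
    """Return one entry per significant word of each title, sorted by keyword."""
    entries: List[Entry] = []
    for doc_id, title in enumerate(titles, start=1):
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            left = " ".join(words[:i])[-width:]       # context preceding the keyword
            right = " ".join(words[i + 1:])[:width]   # context following the keyword
            entries.append((word.upper(), left, right, doc_id))
    # File the entries alphabetically by keyword, then by document number.
    entries.sort(key=lambda e: (e[0], e[3]))
    return entries


def format_kwic(entries: List[Entry], width: int = 30) -> List[str]:
    """Lay out entries so that the keywords line up in a single column."""
    return [f"{left:>{width}}  {kw} {right}  [{doc}]" for kw, left, right, doc in entries]


if __name__ == "__main__":
    titles = ["Retrieval of Documents", "Automatic Document Indexing"]
    for line in format_kwic(kwic_index(titles)):
        print(line)
```

Although every significant word becomes an entry point, the searcher still looks up one keyword at a time, so the result remains a single-entry index in the sense described above.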
Another type of manually searched index that has gained widespread
acceptance allows entry into several parts of the file simultaneously, and
is designed to identify the documents found in all of the places entered.
These systems are known as co-ordinate systems, or more precisely as
post-co-ordinate systems, since a document is retrieved only if the search
terms of the request are present in it in the required combinations. The
processing of search requests in such systems requires not only a decision
as to which vocabulary terms shall be used in the search, but also a
statement of how those terms are to be combined, using logical products
(AND), logical sums (OR), and logical differences (NOT). An example of such
a search formulation is given in Figure 2; although this example illustrates
a mechanized system to be described, a similar search formulation could be
used in a manually searched system.
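Post-co-ordinate matching of this kind amounts to set operations on the lists of documents filed under each term. The Python sketch below assumes an inverted index (term → document numbers) and a search formulation written as nested (operator, left, right) tuples; the function names, index terms, documents, and query are all hypothetical and do not reproduce the Figure 2 formulation.

```python
from typing import Dict, Set, Tuple, Union

# A query is a single term, or a tuple (operator, left_query, right_query).
Query = Union[str, Tuple[str, "Query", "Query"]]


def build_inverted_index(docs: Dict[int, Set[str]]) -> Dict[str, Set[int]]:
    """Map each index term to the set of documents that carry it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, terms in docs.items():
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


def evaluate(query: Query, index: Dict[str, Set[int]]) -> Set[int]:
    """Return the documents satisfying a Boolean combination of terms."""
    if isinstance(query, str):
        return set(index.get(query, set()))
    op, left, right = query
    a, b = evaluate(left, index), evaluate(right, index)
    if op == "AND":
        return a & b  # logical product: documents carrying both
    if op == "OR":
        return a | b  # logical sum: documents carrying either
    if op == "NOT":
        return a - b  # logical difference: left side without right side
    raise ValueError(f"unknown operator: {op}")


if __name__ == "__main__":
    docs = {
        1: {"retrieval", "indexing", "automatic"},
        2: {"retrieval", "manual", "indexing"},
        3: {"classification", "indexing"},
        4: {"retrieval", "automatic", "evaluation"},
    }
    index = build_inverted_index(docs)
    # retrieval AND (indexing OR evaluation) NOT manual
    query = ("NOT", ("AND", "retrieval", ("OR", "indexing", "evaluation")), "manual")
    matched = evaluate(query, index)
    print(sorted(matched))               # [1, 4]
    print(sorted(set(docs) - matched))   # [2, 3]
```

Treating NOT as a binary difference (rather than a unary complement) mirrors the "logical differences" of the text: it only ever removes documents from a set already selected by other terms.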
In the manual systems just described, each entry into the index produces
a set of documents that match the search formulation, usually called the
retrieval set; the remainder of the collection is regarded as not
retrieved. User satisfaction is related both to the finding of relevant