MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Other Potentially Related Research
chapter
Mary Elizabeth Stevens
National Bureau of Standards
6.4.1 Probabilistic Indexing - Maron, Kuhns, and Ray
The work in the area of "probabilistic indexing" involves, as in the case of Stiles'
statistical association factors, an assumption that there should be machine means avail-
able for the automatic elaboration of search requests in order that relevant documents not
indexed by the precise terms of these requests may be retrieved. Given that measures of
"closenesses" and "distances" between similar documents can be obtained, probabilistic
weighting factors between index terms assigned to documents may be made explicit.
More generally, however, the notion of probabilistic indexing is based upon the assign-
ment of weights that provide a numerical evaluation of the probable relevance of index
terms to a particular document, and of the relative importance of the various terms
used in a search request. Maron and Kuhns (1960 [397]) thus consider the following
variables important in the formulation and following out of search strategies:
1. Input - both the terms of the request and the weights assigned to them.
2. A probabilistic matrix giving dissimilarity measures between documents,
significance measures for index terms, and closeness measures between
index terms.
3. A priori probability distribution data.
4. Output - a class of retrieved documents ranked in order of their "computed
relevance numbers" and an indication of the number of documents involved
in the class.
5. Search parameter controls, such as the number of documents desired.
6. Search prescription renegotiation involving amplification of the request by
adding terms "close" to the ones in the original request and the selection
of additional documents following distance criteria for the collection. 1/
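
The way items 1, 3, 4, and 5 might fit together can be illustrated with a small sketch. It is not taken from Maron and Kuhns: the scoring rule (a priori probability multiplied by a weighted sum of matching index weights), the names `Document`, `relevance_number`, and `search`, and the sample values are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    prior: float                    # a priori probability of use (item 3)
    term_weights: dict[str, float]  # probabilistic index weight per term


def relevance_number(doc: Document, request: dict[str, float]) -> float:
    """Assumed scoring rule: prior times the weighted sum of matching index weights."""
    match = sum(w * doc.term_weights.get(term, 0.0) for term, w in request.items())
    return doc.prior * match


def search(docs: list[Document], request: dict[str, float], max_docs: int):
    """Return the top-ranked documents and the size of the retrieved class (items 4, 5)."""
    scored = [(relevance_number(d, request), d.doc_id) for d in docs]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
    return ranked[:max_docs], len(ranked)


# Hypothetical collection and weighted request (item 1).
docs = [
    Document("A", 0.5, {"radar": 0.8, "antenna": 0.2}),
    Document("B", 0.3, {"radar": 0.4}),
    Document("C", 0.2, {"optics": 0.9}),
]
request = {"radar": 1.0, "antenna": 0.5}
top, class_size = search(docs, request, max_docs=1)
```

Here `max_docs` plays the part of a search parameter control, and `class_size` reports how many documents fell in the retrieved class even when fewer are returned.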
Experiments have been reported for 40 requests run against 110 articles taken from
Science News Letter. Without search renegotiation, the "answer" document was
retrieved in only 27 of the 40 tests. Three alternative methods of request elaboration
were then tried. First, additional terms most strongly implied, statistically, by the
terms in the request were used. Secondly, those terms were added which most strongly
imply, again in a statistical sense, each of the given request terms. Thirdly, co-
efficients of association between index terms were used. Results are reported as follows.
"(1) Using the method of request elaboration via forward conditional
probabilities between index tags, we retrieved the correct answer
document in 32 cases out of the 40.
(2) Elaborating the requests via the inverse conditional probability heuristic,
we retrieved the correct document in 33 of the 40 cases.
(3) Using the coefficient of association to obtain the elaborated request we
obtained success in 33 cases of the 40.
1/ Maron and Kuhns, 1960 [397], pp. 230-231.
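
The three methods of request elaboration described above can be sketched over a small term-by-document index. This is a minimal illustration under stated assumptions, not the procedure actually programmed by Maron and Kuhns: probabilities are estimated from simple co-occurrence counts, Yule's Q stands in for the coefficient of association (the text does not say which coefficient was used), and the function names and sample index are hypothetical.

```python
def term_docs(index: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert a document -> terms mapping into term -> set of document ids."""
    inverted: dict[str, set[str]] = {}
    for doc_id, terms in index.items():
        for t in terms:
            inverted.setdefault(t, set()).add(doc_id)
    return inverted


def forward_conditional(inv, i, j):
    """P(j | i): share of documents tagged i that also carry j (i implies j)."""
    return len(inv[i] & inv[j]) / len(inv[i])


def inverse_conditional(inv, i, j):
    """P(i | j): share of documents tagged j that also carry i (j implies i)."""
    return len(inv[i] & inv[j]) / len(inv[j])


def yule_q(inv, i, j, n_docs):
    """Yule's Q from the 2x2 co-occurrence table (assumed choice of coefficient)."""
    a = len(inv[i] & inv[j])   # both tags present
    b = len(inv[i]) - a        # i only
    c = len(inv[j]) - a        # j only
    d = n_docs - a - b - c     # neither
    denom = a * d + b * c
    return (a * d - b * c) / denom if denom else 0.0


def elaborate(inv, request, score, k=1):
    """Add, for each request term i, the k other terms j with the highest score(i, j)."""
    added = set()
    for i in request:
        candidates = [t for t in inv if t not in request]
        candidates.sort(key=lambda j: score(i, j), reverse=True)
        added.update(candidates[:k])
    return set(request) | added


# Hypothetical five-document index.
index = {
    "d1": {"radar", "antenna"},
    "d2": {"radar", "antenna", "microwave"},
    "d3": {"radar", "signal"},
    "d4": {"microwave"},
    "d5": {"signal", "noise"},
}
inv = term_docs(index)
expanded = elaborate(inv, {"radar"}, lambda i, j: forward_conditional(inv, i, j))
```

Passing `inverse_conditional` or `yule_q` in the `score` argument switches the elaboration method without changing the rest of the search. For scale, the reported counts correspond to 27/40 = 67.5 percent without elaboration, against 32/40 = 80 percent and 33/40 = 82.5 percent with it.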