MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Other Potentially Related Research chapter
Mary Elizabeth Stevens
National Bureau of Standards

6.4.1 Probabilistic Indexing - Maron, Kuhns, and Ray

The work in the area of "probabilistic indexing" involves, as in the case of Stiles' statistical association factors, an assumption that there should be machine means available for the automatic elaboration of search requests, so that relevant documents not indexed by the precise terms of these requests may be retrieved. Given that measures of "closenesses" and "distances" between similar documents can be obtained, probabilistic weighting factors between index terms assigned to documents may be made explicit.

More generally, however, the notion of probabilistic indexing is based upon the assignment of weights that provide a numerical evaluation of the probable relevance of index terms to a particular document, and of the relative importance of the various terms used in a search request. Maron and Kuhns (1963 [397]) thus consider the following variables important in the formulation and following out of search strategies:

1. Input - both the terms of the request and the weights assigned to them.
2. A probabilistic matrix giving dissimilarity measures between documents, significance measures for index terms, and closeness measures between index terms.
3. A priori probability distribution data.
4. Output - a class of retrieved documents ranked in order of their "computed relevance numbers" and an indication of the number of documents involved in the class.
5. Search parameter controls, such as the number of documents desired.
6. Search prescription renegotiation involving amplification of the request by adding terms "close" to the ones in the original request and the selection of additional documents following distance criteria for the collection. 1/

Experiments have been reported for 40 requests run against 110 articles taken from Science News Letter. Without search renegotiation, the "answer" document was retrieved in only 27 of the 40 tests. Three alternative methods of request elaboration were then tried. First, additional terms most strongly implied, statistically, by the terms in the request were used. Secondly, those terms were added which most strongly imply, again in a statistical sense, each of the given request terms. Thirdly, coefficients of association between index terms were used. Results are reported as follows.

"(1) Using the method of request elaboration via forward conditional probabilities between index tags, we retrieved the correct answer document in 32 cases out of the 40. (2) Elaborating the requests via the inverse conditional probability heuristic, we retrieved the correct document in 33 of the 40 cases. (3) Using the coefficient of association to obtain the elaborated request we obtained success in 33 cases of the 40.

1/ Maron and Kuhns, 1960[397], pp. 230-231.
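The three request-elaboration heuristics described above can be illustrated with a minimal sketch. It is not the procedure Maron and Kuhns actually programmed: the toy index, the function names, and the use of simple co-occurrence counts to estimate the conditional probabilities are all assumptions made for illustration. The source does not say which coefficient of association was used, so the phi coefficient of the 2x2 co-occurrence table is substituted here as one plausible choice.

```python
import math
from itertools import chain

# Hypothetical toy index: document id -> set of assigned index terms.
INDEX = {
    "d1": {"radar", "antenna", "signal", "doppler"},
    "d2": {"radar", "signal"},
    "d3": {"antenna", "satellite"},
    "d4": {"signal", "noise"},
    "d5": {"radar", "antenna", "signal"},
}


def count(*terms):
    """Number of documents indexed by all of the given terms."""
    return sum(1 for tags in INDEX.values() if all(t in tags for t in terms))


def forward(a, b):
    """Estimate P(b | a): how strongly request term a implies term b."""
    n_a = count(a)
    return count(a, b) / n_a if n_a else 0.0


def inverse(a, b):
    """Estimate P(a | b): how strongly term b implies request term a."""
    return forward(b, a)


def association(a, b):
    """Phi coefficient of the 2x2 co-occurrence table (assumed measure)."""
    n = len(INDEX)
    n11 = count(a, b)          # documents with both terms
    n10 = count(a) - n11       # a only
    n01 = count(b) - n11       # b only
    n00 = n - n11 - n10 - n01  # neither
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def elaborate(request, score, k=1):
    """Append the k candidate terms scoring highest against any request term.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocab = set(chain.from_iterable(INDEX.values())) - set(request)
    ranked = sorted(vocab, key=lambda t: (-max(score(r, t) for r in request), t))
    return list(request) + ranked[:k]


if __name__ == "__main__":
    for name, fn in [("forward", forward), ("inverse", inverse),
                     ("association", association)]:
        print(name, elaborate(["radar"], fn))
```

On this toy index the forward heuristic adds "signal" (every "radar" document also carries it), while the inverse heuristic adds "doppler" (every "doppler" document is a "radar" document), which shows how the two directions of conditional probability can elaborate the same request differently.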