MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report Other Potentially Related Research chapter Mary Elizabeth Stevens National Bureau of Standards

1. For each term in the initial formulation of a search request, the appropriate term-profile is obtained, which gives weighted values for those other terms that had significantly co-occurred with it.
2. The profiles of each term in a multi-term request are compared and those additional terms common to all or a specified number of the profiles are selected and added to the initial set. 1/
3. The "first generation" terms resulting from step 2 are next treated as though they also were request terms, and steps 1 and 2 are repeated for them.
4. A selection is made from some reasonable proportion of the profiles associated with the first generation terms to produce the "second generation" terms. 2/
5. The expanded list of search terms is then compared with the index terms assigned to each document in the collection, and whenever a match is found the weight of the request term is assigned to the matching document term. These weights are then summed to provide a numeric measure of probable document relevance to the original request.
6. Documents responding to the expanded request are printed out in the order of document relevance scores.

Some experiments have been made using a computer program which accepts up to 300 weighted terms in an expanded request vocabulary. Representative results have been reported, in part, as follows:

... We asked a qualified engineer to examine these documents and specify which were related to `Thin Films' and which were not ... This engineer was not familiar with our project ... yet ... we found a remarkably high correlation between his evaluation and the document relevance numbers ... We then checked to see how the documents containing information on `Thin Film' had been indexed. We found that the first five documents on our list had been indexed by both `Thin' and `Film'.
Three more documents had been indexed by `Film' alone, and other related terms. Two documents had not been indexed by either `Thin' or `Film', but only by a group of related terms, yet they contained information on `Thin Films' and had a high document relevance number. By using association factors and a series of statistical steps, easily programmed for a computer, we were thus able to locate

1/ These are called "first generation terms" and tend to reflect only statistical associations without including synonyms and near-synonyms which, over the course of time, have occurred in the indexing vocabulary.
2/ Stiles, 1961 [571], p. 274: "Among these we find words closely related in meaning to the request terms." An example given in Ref. [572], pp. 200-201, is the derivation of `weathering', `fungicidal', `deterioration', and `preservatives' as second generation terms when the initial request included the terms `plastics', `fungus', `coating', and `tests'.
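
The six-step request expansion and scoring procedure described above can be illustrated with a minimal Python sketch. It is not the program used in the reported experiments. The function names, the toy term profiles, the document index, the weight of 1.0 given to original request terms, and the use of the mean profile weight for an added term are all assumptions made for illustration, since the text does not specify how weights are computed.

```python
import math
from collections import defaultdict


def select_common_terms(seed_terms, profiles, min_profiles, exclude=()):
    """Steps 1-2: look up each seed term's profile and keep the other terms
    that occur in at least `min_profiles` of those profiles.
    Assumption: an added term's weight is the mean of its profile weights."""
    hits = defaultdict(list)
    for seed in seed_terms:
        for term, weight in profiles.get(seed, {}).items():
            if term not in seed_terms and term not in exclude:
                hits[term].append(weight)
    return {t: sum(ws) / len(ws) for t, ws in hits.items()
            if len(ws) >= min_profiles}


def expand_request(request, profiles, proportion=0.5):
    """Steps 1-4: build the expanded, weighted request vocabulary."""
    # Assumption: original request terms carry a weight of 1.0.
    expanded = {term: 1.0 for term in request}
    # First generation: terms common to all request-term profiles.
    first = select_common_terms(set(request), profiles, len(request))
    expanded.update(first)
    if first:
        # Second generation: terms found in a given proportion of the
        # first generation profiles (the proportion is a free parameter).
        needed = max(1, math.ceil(proportion * len(first)))
        second = select_common_terms(set(first), profiles, needed,
                                     exclude=expanded)
        expanded.update(second)
    return expanded


def rank_documents(expanded, index):
    """Steps 5-6: sum the weights of matching index terms for each
    document and list responding documents by descending score."""
    scores = {doc: sum(expanded.get(t, 0.0) for t in terms)
              for doc, terms in index.items()}
    responding = [(doc, s) for doc, s in scores.items() if s > 0]
    return sorted(responding, key=lambda pair: pair[1], reverse=True)


# Hypothetical association profiles (invented values, not Stiles's data).
profiles = {
    "thin": {"film": 0.9, "deposition": 0.6, "vacuum": 0.5},
    "film": {"thin": 0.9, "deposition": 0.7, "coating": 0.4},
    "deposition": {"thin": 0.6, "film": 0.7,
                   "evaporation": 0.5, "substrate": 0.3},
}

# Hypothetical index terms assigned to four documents.
index = {
    "doc_a": {"thin", "film"},
    "doc_b": {"film", "deposition"},
    "doc_c": {"evaporation", "substrate"},
    "doc_d": {"fungus", "plastics"},
}

expanded = expand_request(["thin", "film"], profiles)
ranked = rank_documents(expanded, index)
for doc, score in ranked:
    print(doc, round(score, 2))
```

In this toy data, `doc_c` is indexed by neither request term yet still receives a nonzero score through the second generation terms, which mirrors the kind of retrieval reported in the quoted experiment.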