Scientific Report No. IRS-13 Information Storage and Retrieval
Summary
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
a "super-thesaurus" is generated for the whole collection by merging the
individual term groupings obtained from the subcollections. Retrieval
experiments show that such a fully-automatic super-thesaurus produces
better retrieval results than manually constructed thesauruses.
Statistical word-word association procedures are examined and
evaluated in section IX by M. E. Lesk. Associative procedures produce
groups of terms (or documents) based on the co-occurrence of the terms
within the documents of a collection, or within the sentences of a
document. The effect is then similar to that of a thesaurus, except that
the construction method is automatic. The data included in section IX
show that the associative method furnishes results which are essentially
independent of those obtained by a normal thesaurus procedure. The
associative term groups are unrelated to the thesaurus groups, and
there appears to be no basis for the conjecture that second-order term
associations are equivalent to synonym groupings.
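
The co-occurrence idea described above can be illustrated with a minimal
sketch. It assumes that each document is reduced to a list of terms, that
co-occurrence is counted at the document level, and that association
strength is measured by a cosine coefficient on binary occurrence; the
function name, the threshold, and the sample data are hypothetical and are
not taken from section IX.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def term_associations(documents, threshold=0.5):
    """Return {(term_a, term_b): cosine} for term pairs whose document-level
    co-occurrence cosine is at least `threshold` (an assumed measure)."""
    # Map each term to the set of document indices in which it occurs.
    postings = defaultdict(set)
    for doc_id, terms in enumerate(documents):
        for term in terms:
            postings[term].add(doc_id)

    associations = {}
    # Compare every unordered pair of distinct terms once.
    for a, b in combinations(sorted(postings), 2):
        shared = len(postings[a] & postings[b])
        if shared == 0:
            continue
        # Cosine of the two binary term-occurrence vectors.
        cosine = shared / sqrt(len(postings[a]) * len(postings[b]))
        if cosine >= threshold:
            associations[(a, b)] = cosine
    return associations


# Hypothetical four-document collection, for illustration only.
docs = [
    ["thesaurus", "retrieval", "term"],
    ["thesaurus", "term", "synonym"],
    ["retrieval", "precision", "recall"],
    ["precision", "recall"],
]
pairs = term_associations(docs, threshold=0.9)
```

With this sample data, only the pairs that always occur together pass the
0.9 threshold. Counting co-occurrence within sentences rather than within
documents would only require passing sentences instead of documents.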
Like the synonym groupings of a thesaurus, word-word associations
do occasionally improve the recall of a retrieval system; they also
improve the precision by promoting certain relevant documents to higher
rank positions.
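
The effect of promoting a relevant document to a higher rank position can
be made concrete with a small sketch. It assumes the usual set-based
definitions of recall and precision, computed over the top-ranked
documents up to a cutoff; the function name, document identifiers, and
rankings are hypothetical.

```python
def recall_precision(ranking, relevant, cutoff):
    """Recall and precision over the first `cutoff` documents of a ranking."""
    retrieved = ranking[:cutoff]
    hits = sum(1 for doc in retrieved if doc in relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical example: documents d1 and d4 are relevant to the request.
relevant = {"d1", "d4"}
before = ["d1", "d2", "d3", "d4"]  # d4 ranked last
after = ["d1", "d4", "d2", "d3"]   # d4 promoted to rank 2

r_before, p_before = recall_precision(before, relevant, cutoff=2)
r_after, p_after = recall_precision(after, relevant, cutoff=2)
```

In this example both measures rise from 0.5 to 1.0 at a cutoff of two
documents once d4 is promoted, although the set of documents in the full
ranking is unchanged.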
A detailed analysis of the search requests used with the ADI
documentation collection is contained in section X by E. M. Keen. Various
characteristics of the search requests are examined, including criteria
for identifying unclear request statements, requests expressing multiple
needs, requests with identifiable important words, requests with restrictive
negative statements, and so on. Using these characterizations, certain