ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Design Criteria for Automatic Information Systems
M. E. Lesk
G. Salton
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
F) Manual Indexing
The Cranfield collections were available for purposes of experimenta-
tion both in the form of abstracts and in the form of manually assigned
index terms. The indexing performed by subject experts is extremely
detailed, consisting for some documents of over fifty index terms. As
such, the indexing performance may be expected to be superior to the
subject indexing normally used for large document collections. Nevertheless,
the output of Fig. 14(a) shows that the retrieval results obtained
by matching the index terms ("index null") are only slightly superior to
those of the standard word stem matching procedure, using the words
extracted from the document abstracts.
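
The comparison described above can be illustrated with a minimal sketch, in which the same matching function is applied to two representations of each document: manually assigned index terms and word stems extracted from the abstract. Everything in it is an assumption made for illustration rather than the procedure used in the experiments: the suffix list is a crude stand-in for a real stemming process, cosine correlation is assumed as the matching function, and the names stem, stem_vector, term_vector, cosine and rank are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical suffix list: a crude stand-in for a real stemming procedure.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_vector(text):
    """Frequency-weighted word-stem vector built from free text (an abstract)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(stem(w) for w in words)


def term_vector(terms):
    """Binary vector built from manually assigned index terms."""
    return Counter({t.lower(): 1 for t in terms})


def cosine(a, b):
    """Cosine correlation between two sparse vectors (assumed matching function)."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_vec, doc_vecs):
    """Document ids sorted by decreasing correlation with the query."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


# Toy data, invented for illustration only (not taken from the Cranfield collection).
abstracts = {
    "d1": "Heat transfer in laminar boundary layers over flat plates",
    "d2": "Buckling of thin cylindrical shells under axial loading",
}
index_terms = {
    "d1": ["heat", "transfer", "laminar", "boundary", "layer"],
    "d2": ["buckling", "cylindrical", "shell", "axial", "load"],
}

stem_docs = {d: stem_vector(text) for d, text in abstracts.items()}
index_docs = {d: term_vector(terms) for d, terms in index_terms.items()}

stem_ranking = rank(stem_vector("heated plates in laminar flow"), stem_docs)
index_ranking = rank(term_vector(["heat", "laminar", "flow"]), index_docs)
```

Because both rankings come from the same rank function, any difference between them is due to the document representation alone, which is the kind of comparison reported in Fig. 14(a).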
When the manual indexing procedure is compared with the word stem
association process, it is seen in Fig. 14(b) that the word stem match
with the associated terms is superior to the index term method. The same
is true when manual indexing is compared with the regular thesaurus process.
The output produced with the Cranfield collection then leads to the
following rule:
Rule 8 : Keyword matching systems based on manually assigned
index terms are found (at least for one well-known
document collection) to be not substantially superior
to raw word matching techniques, and to be actually
inferior to statistical word association and to thesaurus
methods.
This rule is in complete contradiction to what one hears repeated
over and over again by documentation and library science specialists.
Moreover, as the collection sizes increase, the manual indexing procedure