ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
Synopsis
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
the IBM 7094 producing several classifications of a collection of 405
document index images (incorporated into the SMART system). Some
experimental results illustrating the use of the classifications for
improving search efficiency are presented. Document classification
of the type described is novel, in that it is proposed as a direct
adjunct to the query-document matching operation. Normally,
automatic classification algorithms have been considered as replace-
ments for manual classification or indexing.
The statistical basis for the evaluation of document
retrieval systems is discussed in Chapter 5. Several of the topics
considered are based on previous work which is cited by bibliographic
reference. The organization and presentation as well as some of the
conclusions drawn are original. In addition some novel performance
statistics are derived which are particularly applicable to query-
document matching operations possessing a high degree of discrimina-
tion such as the correlation measure of the assumed model. Each of
the statistics derived is capable of describing overall system
performance with a single parameter in contrast with several of the
evaluation measures in current use.