ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Design Criteria for Automatic Information Systems
chapter
M. E. Lesk
G. Salton
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
are identified. Specifically, a correlation coefficient is computed to
indicate the degree of similarity between each document and each search
request, and documents are then ranked in decreasing order of the
correlation coefficient.[3,4,5] A typical search request processed by the
system is shown in Fig. 1. Three analyzed forms of this request, produced
respectively by a word stem identification process (null thesaurus), a
synonym dictionary look-up (regular thesaurus), and a phrase identification
method (statistical phrases), are shown in Fig. 2. Finally, a typical
output product listing documents in decreasing correlation order with
the request is shown in Fig. 3.
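The ranking step described above can be sketched as follows. This is a minimal illustration, not the system's actual code: it assumes that requests and documents are represented as term-weight vectors and that the correlation coefficient is a cosine measure. The function names and the dictionary representation are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine correlation between two sparse term-weight vectors (dicts).

    Assumption: the report only says "a correlation coefficient";
    cosine is used here purely for illustration.
    """
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector correlates with nothing
    return dot / (norm_u * norm_v)


def rank(request, documents):
    """Return (coefficient, doc_id) pairs in decreasing correlation order.

    Ties are broken by document id so the ordering is deterministic.
    """
    scored = [(cosine(request, vec), doc_id) for doc_id, vec in documents.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))
```

Given a request vector and a mapping from document identifiers to vectors, `rank` yields a list analogous to the output product of Fig. 3: the documents most similar to the request come first.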
The system may be controlled by the user: a search request is first
processed in a standard mode, and the user then analyzes the output
obtained. Depending on the information returned to the system as a
result of these previous search operations, the request can be
reprocessed under altered conditions. The new output can again be
examined, and the search can be iterated until the right kind and
amount of information are obtained.[6,7]
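The iteration just described might be outlined as the loop below. It is only a sketch under stated assumptions: `search`, `is_satisfactory`, and `revise` are hypothetical callables standing in for the system's search run, the user's judgment of the output, and the user's choice of altered conditions. The `max_rounds` limit is likewise an illustrative safeguard, not something the report specifies.

```python
def iterate_search(request, search, is_satisfactory, revise, max_rounds=5):
    """Process a request in standard mode, then reprocess it under altered
    conditions until the output is judged satisfactory.

    Returns the history of (settings, output) pairs, one per round.
    """
    settings = {"mode": "standard"}  # first pass uses the standard mode
    history = []
    for _ in range(max_rounds):
        output = search(request, settings)
        history.append((dict(settings), output))
        if is_satisfactory(output):
            break  # the right kind and amount of information was obtained
        # The user examines the output and supplies altered conditions.
        settings = revise(settings, output)
    return history
```

Keeping the full history, rather than only the last output, makes it possible to inspect how each change in processing conditions affected the results.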
The SMART systems organization makes it possible to evaluate the
effectiveness of the various processing methods by comparing the output
obtained from a variety of different runs. This is achieved by processing
the same search requests against the same document collections several
times, while making selected changes in the analysis procedures between
runs. By comparing the performance of the search requests under different
processing conditions, it is then possible to determine the relative
effectiveness of the various analysis methods.
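A rough sketch of such a comparison follows. It assumes that each run yields a ranked list of document identifiers per request and that relevance judgments are available for every request. The measures used (recall and precision at a fixed cutoff, averaged over requests), the cutoff value, and all names are illustrative assumptions rather than the system's actual evaluation procedure.

```python
def recall_precision(ranked, relevant, cutoff):
    """Recall and precision of the top `cutoff` documents of one ranked list."""
    retrieved = ranked[:cutoff]
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


def compare_runs(runs, judgments, cutoff=2):
    """Average recall and precision per analysis method.

    runs:      {method: {request_id: ranked list of doc ids}}
    judgments: {request_id: set of relevant doc ids}
    The same requests and the same collection are used for every method,
    so differences in the averages reflect the analysis procedure alone.
    """
    summary = {}
    for method, results in runs.items():
        pairs = [
            recall_precision(results[request_id], relevant, cutoff)
            for request_id, relevant in judgments.items()
        ]
        if not pairs:
            summary[method] = (0.0, 0.0)
            continue
        n = len(pairs)
        summary[method] = (
            sum(r for r, _ in pairs) / n,
            sum(p for _, p in pairs) / n,
        )
    return summary
```

Holding the requests and the collection fixed while varying only the method is what allows the resulting averages to be attributed to the analysis procedure.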
The actual evaluation calculations are based on the standard recall