ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
The SMART System -- Retrieval Results and Future Plans
chapter
G. Salton
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
should syntactical relations between subject identifiers be preserved;
does the user have an important role to fulfill in controlling the search
procedure.
These and many other questions are answered by the following rules
derived from the evaluation results, and described in greater detail in the
remainder of this report. [l,[OCRerr],6] In each case the evaluation is made in
terms of two measures, known as recall and precision, which reflect,
respectively, the ability of the system to retrieve wanted material, and
its ability to reject nonwanted items:
1) The use of document titles alone for purposes of information
analysis results in poor retrieval performance compared with the
use of abstracts or full text.
2) The use of information identifiers which are weighted in accordance
with their presumed importance leads to large-scale improvements
in retrieval effectiveness, compared with the use of unweighted
terms.
3) Dictionaries providing synonym recognition are of considerable
help in improving retrieval performance, particularly when they
reflect the properties of the vocabulary under consideration.
4) Absolute accuracy in the analysis of every single item is not so
important as the accumulation of a maximum number of correctly
analyzed items. If a choice exists between a method which can
produce one guaranteed correct content indication (syntactic
analysis), and another which produces five indicators of which
four are probably correct (statistical phrase process), the second
is generally to be preferred.
5) Simple phrase generation methods lead to a definite improvement
in recall at the expense of some initial loss in precision in
the low recall region.
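
As an illustration of the two evaluation measures used in the rules above,
recall and precision for a single query may be computed as in the following
minimal sketch. The function name, the document identifiers, and the
relevance judgments are hypothetical and are not taken from the SMART
system; the sketch assumes only the usual set-based definitions (recall is
the proportion of wanted items that are retrieved, precision the proportion
of retrieved items that are wanted).

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one query.

    retrieved: identifiers of the documents returned by the search.
    relevant:  identifiers of the documents judged wanted for the query.
    Both names are illustrative assumptions, not SMART terminology.
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # wanted items actually retrieved

    # Guard against empty sets to avoid division by zero.
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical query: 4 documents retrieved, 5 wanted, 3 in common.
retrieved = ["d1", "d2", "d3", "d7"]
relevant = ["d1", "d2", "d3", "d4", "d5"]
recall, precision = recall_precision(retrieved, relevant)
print(recall, precision)
```

In this hypothetical case recall is 3/5 = 0.6 and precision is 3/4 = 0.75.
Retrieving more documents can only hold or raise recall, while it will
usually lower precision, which is why the two measures are reported together.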