ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Design Criteria for Automatic Information Systems
M. E. Lesk
G. Salton
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
3. Evaluation Results and Design Criteria
In attempting to generate useful criteria for the design of information
systems, a number of obvious questions suggest themselves: first, can
automatic text processing methods be used effectively to replace a manual
content analysis; if so, what part or parts of a document should be incor-
porated in the automatic procedure; is it necessary to provide vocabulary
normalization methods to eliminate ambiguities caused by homographs and
synonymous word groups; should such a normalization be handled by means
of a specially constructed dictionary, or is it possible to replace thesauruses
completely by statistical word association methods; what dictionaries can
most effectively be used for vocabulary normalization; is it important to
provide hierarchical arrangements of subject categories as is done in many
library classification systems; what should be the role of the user in
formulating and controlling the search procedure. These and many other
questions are considered in the evaluation process described in the
remainder of this section.
A) Indexing Depth and Document Length
In a manual system, where each information item is identified by a
few carefully chosen keywords, the presence or absence of a given keyword
becomes of crucial importance, since failure to provide a certain needed
keyword may mean the difference between a retrievable item and one which
is not. In an automatic text processing system, it is possible to generate
for each item many different information identifiers, as seen in Fig. 2
for the request of Fig. 1; the importance of each individual identifier is
then much reduced since a small number of poorly chosen terms are often
offset by the much larger number of correct ones.
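
The offsetting effect described above can be illustrated with a small numerical sketch. It is not taken from the report: the term lists are hypothetical, all identifiers carry unit weight, and the cosine correlation is assumed here as the matching function purely for illustration. Under those assumptions, one poorly chosen term halves the score of a two-keyword description but lowers the score of an eight-identifier description by only one eighth.

```python
import math

def cosine(a, b):
    """Cosine correlation between two term-weight dictionaries."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical request identifiers, all with unit weight (assumption).
terms = ["information", "retrieval", "automatic", "indexing",
         "thesaurus", "evaluation", "document", "search"]
request = {t: 1 for t in terms}

# Shallow, manual-style description: two keywords.
manual_good = {"retrieval": 1, "indexing": 1}
manual_poor = {"retrieval": 1, "library": 1}   # one keyword poorly chosen

# Deep, automatic-style description: eight identifiers.
auto_good = {t: 1 for t in terms}
auto_poor = {t: 1 for t in terms[:-1]}         # drop "search" ...
auto_poor["library"] = 1                       # ... and add one poor term

for name, doc in [("manual, good", manual_good), ("manual, poor", manual_poor),
                  ("automatic, good", auto_good), ("automatic, poor", auto_poor)]:
    print(f"{name:16s} {cosine(request, doc):.3f}")
```

With these invented term lists the manual description falls from 0.500 to 0.250, while the automatic description falls only from 1.000 to 0.875; the exact figures depend entirely on the assumed vocabulary and weighting.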