IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
An Analysis of the Documentation Requests
chapter
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Definitions of good and bad performance are arbitrary, but it is thought
that good performance requires a relevant document to be ranked 15 or
better, and anything positioned lower than this is a poor result.
Any requests which fall into groups a) and c) were thought to be particularly
useful for analysis; in practice, however, all 38 requests fall into group b).
Requests [OCRerr] and B14 perform well on nearly all options, but occasionally one
of the relevant documents falls below rank position 10. There occurs a
surprisingly large amount of change in the ranks of the relevant documents
when options are tested; Figure 2 gives an example for one request and two
relevant documents. In this request, all the options that are found on average
to be the poorest, such as titles only, the use of cosine logical, and the
"Hastie" Thesaurus, give the best results.
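The good/poor distinction and the movement of ranks across options can be illustrated with a minimal sketch. It assumes that a request performs well on an option only when every relevant document is ranked 15 or better, which is one possible reading of the definition above. The option names and rank values are invented for illustration and are not taken from Figure 2.

```python
# Hypothetical sketch: the data and option names are invented, and the
# "every relevant document within the limit" rule is an assumption.
GOOD_RANK_LIMIT = 15  # ranks 1..15 are treated as a good result


def is_good(ranks, limit=GOOD_RANK_LIMIT):
    """Return True when every relevant document is ranked within the limit."""
    return all(rank <= limit for rank in ranks)


def rank_spread(ranks_by_option):
    """For each relevant document, return (best rank, worst rank) over all options."""
    per_document = zip(*ranks_by_option.values())
    return [(min(column), max(column)) for column in per_document]


# Invented ranks of two relevant documents under three made-up options.
example = {
    "option_1": [3, 40],
    "option_2": [12, 9],
    "option_3": [27, 5],
}

verdicts = {name: is_good(ranks) for name, ranks in example.items()}
spread = rank_spread(example)
```

With these invented values only one option counts as good, and each document moves by more than twenty rank positions between its best and worst option, which is the kind of variation the text describes.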
Since the division into groups by performance achieved does not
assist in the analysis, another method of analysis is suggested: this is
to look for strong correlation between measurable request characteristics and
the use of particular performance options. A summary of possible request
characteristics is given in Figure 3, some of which have been described
previously; these can now be used to look for direct correlation between
characteristics and performance, as attempted in sections 5B, 5C, and 5D.
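One way to look for such a correlation is sketched below. The report does not say which correlation measure was used, so the Pearson product-moment coefficient here is an assumption. The function name `pearson` and any values passed to it are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between a request characteristic and a performance score.

    xs: one characteristic value per request (for example, request length).
    ys: one performance value per request (for example, normalized recall).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return sxy / math.sqrt(sxx * syy)
```

A coefficient near +1 or -1 would indicate a strong linear relation between the characteristic and performance; with only a few dozen requests, a moderate value should be read with caution.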
B) Variation in Generality, Length and Concept Frequency
Request generality refers to the number of documents in the collection
that are relevant; using this principle, the request set may be divided
into specific and general requests. With the 35 requests divided into sets of
17 and 18, request generality data is given in Figure 4 together with evaluation
results of normalized recall and precision, comparing the stem and thesaurus
dictionaries. As has been observed previously [2], the specific requests give