Scientific Report No. IRS-13 Information Storage and Retrieval
Test Environment
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Characteristics                         IRE-3         CRAN-1        ADI
---------------------------------------------------------------------------------
Subject area                            Computer      Aerodynamics  Documentation
                                        Science
---------------------------------------------------------------------------------
DOCUMENT COLLECTIONS
Number of documents in collection       780           200           82
---------------------------------------------------------------------------------
Average number of word occurrences
(all words) per document
  full text                             -             -             1380
  abstract                              88            165           59
  title                                 9             14            10
  indexing                              -             33            -
---------------------------------------------------------------------------------
Average number of word occurrences
(common words deleted) per document
  full text                             -             -             710
  abstract                              49            91            35
  title                                 5             11            7
  indexing                              -             33            -
---------------------------------------------------------------------------------
Average number of distinct words per
document, using suffix 's' dictionary
  full text                             -             -             369
  abstract                              40            65            25
  title                                 5             9             6
  indexing                              -             33            -
---------------------------------------------------------------------------------
SEARCH REQUESTS
Number of search requests               34            42            35
---------------------------------------------------------------------------------
Average number of word occurrences
(all words) per request                 20*           17            14
---------------------------------------------------------------------------------
Average number of distinct words
per request, suffix 's' dictionary      12*           8             8
---------------------------------------------------------------------------------
Request preparation
  a) Prepared by subject experts in     -             (42)          -
     course of their work
  b) Prepared by staff members          (17)          -             -
  c) Prepared by non-staff members      (17)          -             (35)
     with no knowledge of system and
     some familiarity with subject
---------------------------------------------------------------------------------
* 17 requests prepared by staff members have an average length of 24 words
  and 14 suffix 's' concepts; 17 requests prepared by non-staff persons
  have an average length of 16 words and 11 suffix 's' concepts.
Document Collection and Request Characteristics
Fig. 1
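
The starred IRE-3 request averages in Fig. 1 can be cross-checked against the subgroup figures given in the footnote. The following is a minimal sketch in Python; the function name `pooled_mean` and the tuple layout are illustrative assumptions, not part of the report. It pools the two groups of 17 requests by a size-weighted mean.

```python
# Subgroup figures taken from the footnote to Fig. 1 (IRE-3 requests).
# Tuple layout (an assumption made for this sketch):
#   (number of requests, average words, average suffix 's' concepts)
staff = (17, 24, 14)
non_staff = (17, 16, 11)


def pooled_mean(groups, field):
    """Size-weighted mean of one field across request subgroups."""
    total_requests = sum(g[0] for g in groups)
    return sum(g[0] * g[field] for g in groups) / total_requests


groups = [staff, non_staff]
mean_words = pooled_mean(groups, 1)      # average request length in words
mean_concepts = pooled_mean(groups, 2)   # average suffix 's' concepts

print(mean_words, mean_concepts)
```

The pooled request length agrees with the tabulated value of 20 words. The pooled concept count comes out at 12.5, against the tabulated 12; the difference is presumably due to rounding of the subgroup or pooled averages in the original table.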