Scientific Report No. IRS-13
Information Storage and Retrieval

Test Environment

E. M. Keen
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Characteristics                                        IRE-3      CRAN-1         ADI
-------------------------------------------------------------------------------------------
Subject area                                           Computer   Aerodynamics   Documentation
                                                       Science
-------------------------------------------------------------------------------------------
DOCUMENT COLLECTIONS

Number of documents in collection                      780        200            82
-------------------------------------------------------------------------------------------
Average number of word occurrences       full text     -          -              1380
(all words) per document                 abstract      88         16[?]          59
                                         title         9          14             10
                                         indexing      -          33             -
-------------------------------------------------------------------------------------------
Average number of word occurrences       full text     -          -              710
(common words deleted) per document      abstract      49         91             35
                                         title         5          11             7
                                         indexing      [?]        -              -
-------------------------------------------------------------------------------------------
Average number of distinct words         full text     -          -              369
per document, using suffix 's'           abstract      40         65             25
dictionary                               title         5          9              6
                                         indexing      -          33             -
-------------------------------------------------------------------------------------------
SEARCH REQUESTS

Number of search requests                              34         42             35
-------------------------------------------------------------------------------------------
Average number of word occurrences                     20*        17             14
(all words) per request
-------------------------------------------------------------------------------------------
Average number of distinct words                       12*        8              8
per request, suffix 's' dictionary
-------------------------------------------------------------------------------------------
Request preparation
  a) Prepared by subject experts in                    -          (42)           -
     course of their work
  b) Prepared by staff members                         (17)       -              -
  c) Prepared by non-staff members with                (17)       -              (35)
     no knowledge of system and some
     familiarity with subject
-------------------------------------------------------------------------------------------

* 17 requests prepared by staff members have average length of 24 words, and 14 suffix 's' concepts; 17 requests prepared by non-staff persons have average length of 16 words and 11 suffix 's' concepts.

Document Collection and Request Characteristics

Fig. 1
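
The three per-document measures tabulated in Fig. 1 (all word occurrences, occurrences after common words are deleted, and distinct words) can be illustrated with a minimal sketch. It is not the procedure used to produce the figure. The common-word list, the `strip_final_s` rule standing in for a suffix dictionary lookup, the choice to count distinct words after common-word deletion, and all function names are assumptions made for illustration only.

```python
import re

# Hypothetical stand-in for a common-word list; the list actually used is not given here.
COMMON_WORDS = {"the", "of", "a", "an", "and", "in", "to", "is", "for", "on"}


def strip_final_s(word):
    """Crude stand-in for a suffix dictionary lookup: drop a single trailing 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def text_statistics(text):
    """Return the three counts for one text (full text, abstract, title, or request)."""
    words = re.findall(r"[a-z]+", text.lower())             # all word occurrences
    content = [w for w in words if w not in COMMON_WORDS]   # common words deleted
    distinct = {strip_final_s(w) for w in content}           # distinct reduced forms
    return {
        "all_words": len(words),
        "non_common": len(content),
        "distinct": len(distinct),
    }


def collection_average(texts):
    """Average each count over a non-empty list of texts."""
    stats = [text_statistics(t) for t in texts]
    return {key: sum(s[key] for s in stats) / len(stats) for key in stats[0]}
```

For example, `text_statistics("The retrieval of documents and the indexing of documents")` counts 9 word occurrences, 4 after the hypothetical common words are deleted, and 3 distinct forms, since both occurrences of "documents" reduce to "document".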