Scientific Report No. ISR-11, Information Storage and Retrieval
Chapter: Design Consideration for Time Shared Automatic Documentation Centers
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

document sets created by the partitioning. Documents might be included in more than one cluster; this would, however, be slightly wasteful of storage space.

To derive timing estimates, let us assume disjoint document clusters, so that we have approximately n key vectors representing n clusters of 250,000/n documents each. We should expect about two or three clusters to be searched per request, but for safety's sake, let us assume that five clusters are searched per request. If we assume that five correlations can be performed per millisecond, correlating the request with the n key vectors takes n/5 msec, and correlating it with the 5 × 250,000/n documents of the five selected clusters takes 250,000/n msec. The total time required for internal operations is therefore n/5 + 250,000/n msec, or about 250/n seconds, the second term being the larger one for the values of n considered below.

The time required for external operations is the data cell access time (about 0.5 second) plus the read time for each cluster. Since each cluster contains 250,000/n documents and each document consists of 2000 bits, processed at 7 × 10⁵ bits per second, each cluster will require about 750/n seconds to read in (250,000/n × 2000 / (7 × 10⁵) ≈ 714/n seconds, rounded up). If five clusters are to be read, and one expects an average of two in each data cell, the total read time would be 2(0.5 + 750/n) seconds. Reasonable values for n would thus be n = 500 or n = 1000, which would allow the complete search to be performed in 3 to 5 seconds.

Considering the small amount of additional work (sorting the correlations and applying the cutoff and other restrictions), it is clear that 10 seconds should suffice for the complete process, and that five seconds is a more likely bound. This assumes that no other programs in memory compete for the data cell sections needed by SMART. Since the data cell sections
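The timing arithmetic above can be checked with a small script. The following is a minimal Python sketch that assumes the report's figures: 250,000 documents, five clusters searched, five correlations per millisecond, 2000-bit documents, a 7 × 10⁵ bit-per-second transfer rate, and a 0.5-second access time. The constant and function names are hypothetical. The external term reproduces the report's expression 2(0.5 + 750/n) as stated. It does not try to derive the factor of 2 from a particular data cell layout.

```python
"""Sketch of the cluster-search timing model (hypothetical names, report's figures)."""

COLLECTION_SIZE = 250_000    # documents in the collection
CLUSTERS_SEARCHED = 5        # clusters examined per request (conservative)
CORRELATIONS_PER_MS = 5      # correlations computed per millisecond
BITS_PER_DOCUMENT = 2_000    # stored size of one document
TRANSFER_RATE_BPS = 7e5      # data cell transfer rate, bits per second
ACCESS_TIME_S = 0.5          # data cell access time, seconds


def internal_time_s(n: int) -> float:
    """Correlate the request with n key vectors, then with every document
    in the searched clusters: n/5 + 250,000/n msec, returned in seconds."""
    key_ms = n / CORRELATIONS_PER_MS
    docs_ms = CLUSTERS_SEARCHED * (COLLECTION_SIZE / n) / CORRELATIONS_PER_MS
    return (key_ms + docs_ms) / 1000.0


def cluster_read_time_s(n: int) -> float:
    """Unrounded transfer time for one cluster of COLLECTION_SIZE / n documents."""
    return (COLLECTION_SIZE / n) * BITS_PER_DOCUMENT / TRANSFER_RATE_BPS


def external_time_s(n: int, rounded: bool = True) -> float:
    """Report's read-time expression 2(0.5 + 750/n), taken as stated.
    With rounded=False, the exact per-cluster time (about 714/n s) is used."""
    per_cluster = 750.0 / n if rounded else cluster_read_time_s(n)
    return 2 * (ACCESS_TIME_S + per_cluster)


def total_time_s(n: int, rounded: bool = True) -> float:
    """Internal correlation time plus external read time, in seconds."""
    return internal_time_s(n) + external_time_s(n, rounded)


if __name__ == "__main__":
    for n in (500, 1000):
        print(f"n = {n:5d}: about {total_time_s(n):.2f} s")
```

Evaluating total_time_s for n = 500 and n = 1000 gives about 4.6 and 2.95 seconds, which is in line with the 3 to 5 second range quoted above. Passing rounded=False uses the unrounded per-cluster read time of about 714/n seconds in place of 750/n.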