Scientific Report No. ISR-11
Information Storage and Retrieval

Design Consideration for Time Shared Automatic Documentation Centers

M. E. Lesk and Gerard Salton
Harvard University

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

requests simultaneously. Once the dictionary lookup is completed, the computer must promptly notify the user of any words in the request that are not included in the dictionary. Such words should also be saved for investigation by the system programmers. Users should not be permitted to enter words into the dictionaries themselves, since a large dictionary is a very complicated structure, and changes made in one part are likely to affect other parts in ways unforeseen by the casual user.

The computer must now compare the request against the document collection. The collection has also been looked up previously, and the documents in coded form are presumably stored on (say) the data cell. Each concept detected will require perhaps 15 bits for the concept number and 10 bits for the weight, or a total of 25 bits. A full document representation will consist of perhaps 50 of these (50 × 25 = 1,250 bits), plus identification and other data, so that 2,000 bits per document should be adequate for storage purposes. For 250,000 documents this amounts to 5 × 10^8 bits, or approximately [illegible] data cell sections. A full data cell holds about 3 × 10^9 bits; a full disk, about 1.5 × 10^9 bits. Thus the storage problems are well within the range of practicality.

Clearly, the system cannot read the whole collection for each request if real-time answers are to be provided. For 5 × 10^8 bits, a reading rate of 10^6 bits per second would require 500 seconds, or over eight minutes. The basic search plan would then have to involve a partitioning of the document collection, a comparison with representative
vectors to decide which partitions to search, and then a thorough search of the selected sections. These "key" vectors might be the centroid vectors of the