ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Operating Instructions for the SMART Text Processing and Document Retrieval System
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
experience on a batch-processing 7094 I. Since SMART is an experimental
system, no great effort was spent on optimizing the object code for speed,
and considerable improvements could be made in many of the programs.
Starting: Mounting two tapes, signing on, reading the specifications,
etc. requires about two minutes. This represents mostly tape mounting time.
Lookup: To look up $p$ words in a dictionary of $q$ stems takes
roughly $pq \times 10^{-7}$ minutes. Statistical phrase searching of $p$ words in a list
of $q$ phrases takes about $pq \times 10^{-5}$ minutes. Syntactic timing is exceedingly
irregular with the old syntactic programs, and is effectively so slow that
nothing useful can be accomplished in a reasonable amount of time. It is
hoped that some syntactic analysis runs can be performed with a new revised
analyzer to be distributed shortly.
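As a rough illustration of the lookup and phrase-search estimates above, the following Python sketch evaluates the two formulas for a given number of words p and a given dictionary or phrase-list size q. The function names and the sample workload are hypothetical and serve only as an example; the factors of 10^-7 and 10^-5 minutes are the ones quoted in the text.

```python
def lookup_minutes(p, q):
    """Estimated minutes to look up p words in a dictionary of q stems."""
    # Quoted estimate: p * q * 10^-7 minutes (written as a division
    # to avoid floating-point noise from the constant 1e-7).
    return p * q / 10**7


def phrase_search_minutes(p, q):
    """Estimated minutes to search p words against a list of q phrases."""
    # Quoted estimate: p * q * 10^-5 minutes.
    return p * q / 10**5


if __name__ == "__main__":
    # Hypothetical workload: 2,000 words, 5,000 stems, 300 phrases.
    print(f"lookup:  {lookup_minutes(2000, 5000):.2f} min")
    print(f"phrases: {phrase_search_minutes(2000, 300):.2f} min")
```

Under these assumed sizes the phrase search dominates, since each word-phrase pair is estimated to cost a hundred times as much as a word-stem pair.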
Correlations: To correlate $p$ requests against $q$ documents, sort the
correlations, and evaluate the results takes on the order of $pq \times 10^{-3}$ minutes.
Present experimental data exist only for $p$ between 10
and 50 and $q$ between 50 and [illegible]00; these estimates should not be trusted far
outside this range.
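The correlation estimate can be sketched the same way. The Python fragment below is a minimal illustration, not part of SMART: the function name, the returned flag, and the sample values are assumptions. It computes the estimate of p times q times 10^-3 minutes and reports whether the number of requests p lies in the 10 to 50 range covered by the measurements; only p is checked here, and a range for q could be handled in the same manner.

```python
def correlation_estimate(p, q, p_range=(10, 50)):
    """Return (estimated minutes, whether p is inside the measured range).

    p is the number of requests and q the number of documents.
    p_range defaults to the request range quoted in the text.
    """
    minutes = p * q / 10**3          # quoted estimate: p * q * 10^-3 minutes
    low, high = p_range
    p_measured = low <= p <= high    # False means the figure is an extrapolation
    return minutes, p_measured


if __name__ == "__main__":
    # Hypothetical run: 20 requests correlated against 200 documents.
    minutes, trusted = correlation_estimate(20, 200)
    note = "" if trusted else " (extrapolated, treat with caution)"
    print(f"correlation: about {minutes:.1f} min{note}")
```

Returning the flag alongside the estimate keeps the warning about extrapolation attached to the number it qualifies.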
Concept-concept correlations: If $p$ concepts are involved (i.e. $p$ = CONMAX -
CONMIN), the first correlation takes about $\tfrac{1}{2} p^{2} \times 10^{-[\text{illegible}]}$ minutes. Further iterations
should be fast, assuming reasonable cutoffs.
Hierarchy: Most of the time is spent in tape shuffling, requiring
about five minutes for collections of about 50,000 words.
9. Acknowledgments
The programs described here were written by Mark Cane, Tom Evslin,
Guy Hochgesang, Alan Lemmon, Michael Razar, George Shapiro, and the author.