ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Operating Instructions for the SMART Text Processing and Document Retrieval System
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
to instruct the operator to mount necessary tapes, and to transfer
control to a small program called CLCH[OCRerr]i which calls in the SMART
programs from the program tape.
b) The instructions for the run are read from the monitor input tape
and decoded.
c) The English language input text is read and analyzed. This includes
a lookup of the text in a thesaurus, and an optional lookup in
a statistical phrase dictionary and a syntactic phrase dictionary.
Printouts of the original text and of the words missing from the
dictionaries may also be produced.
d) Partial concept vectors are formed for these documents. These
partial vectors contain all concepts derived from individual
words and phrases within each document. The concepts appear in
their original uncombined form.
e) Partial concept vectors from documents looked up in this run
and concept vectors from documents looked up in preceding runs
are collated, and the weighting schemes specified in the
instructions are applied to produce concept vectors in the proper form
for correlation. If desired, the detailed results of the weighting
may be printed.
f) The concept vectors may now be expanded by means of statistical
concept-concept associations, or through a hierarchy associated
with the thesaurus. Various options may control the type of
expansion, and may specify the documents to be expanded.
g) The request vectors are compared with the document vectors using
the correlation procedure chosen by the user. All documents in
the collection are ranked with respect to each request.
h) The highest ranking documents are printed out for each request.
This is the basic output of the simulated retrieval system.
i) The document rank list and the highest ranking documents are
printed and compared with the list of documents previously
designated as relevant to the request. Measures are computed which