work is needed. An analysis of every instance of failure for every request
in each experiment would be an impossibly large task; a judicious selection
must therefore be made. Most of the sections in this report set out first
to present the average results for a series of experiments, and then to
make a search-by-search performance analysis to uncover details and
explanations for the search results obtained.
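The selection procedure just described may be illustrated by a short
sketch, which is not part of the original report; the request identifiers
and precision figures below are invented for illustration. The average
result for a run is reported first, and the requests falling furthest
below it are then singled out for detailed failure analysis:

    from statistics import mean

    # Hypothetical per-request precision figures for one experimental
    # run (illustrative values only; not data from the report).
    scores = {"Q1": 0.82, "Q2": 0.45, "Q3": 0.90, "Q4": 0.12, "Q5": 0.67}

    # First, the average result for the run as a whole.
    average = mean(scores.values())
    print(f"average precision over {len(scores)} requests: {average:.2f}")

    # Then, a judicious selection: the worst-performing requests are
    # singled out for detailed failure analysis.
    worst = sorted(scores.items(), key=lambda item: item[1])[:2]
    for request, score in worst:
        print(f"examine {request} (precision {score:.2f})")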
Since real user populations and currently growing collections are
not available, it is correct to describe the experimental procedures used
as "simulated search methods," as does R. V. Katter in [5]. Katter criticizes
such experimental techniques on several grounds; in particular, he says that
mechanical type matching is unnecessary and cumbersome. Since the work
reported by Katter does not tackle any problem other than human judgment
reliability, his comments do not seem to apply to experiments that deal
with a total system and are designed to evaluate performance from a user
viewpoint. The search procedures used by SMART are not cumbersome, and
simulated searches are believed to be necessary in order to provide useful
relationships to reality.
B) Variables Tested
At the input stage, the use of natural language by SMART implies
that there are few input variables to be tested, since full text processing
of documents has not been attempted in many different subject areas. Different
lengths of documents are therefore used, such as titles only, or abstracts.
Some tests using variables of this type are covered in Section V.
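As an illustration of the document-length variable, the following sketch
correlates a request with a title-only and an abstract-length version of
the same document. It is not part of the original report and does not
reproduce SMART's actual matching routines; the texts and the simple
cosine correlation are assumptions made for the example:

    from collections import Counter
    import math

    def term_vector(text):
        # Simple term-frequency vector over whitespace-delimited words.
        return Counter(text.lower().split())

    def cosine(a, b):
        # Cosine correlation between two term-frequency vectors.
        dot = sum(a[t] * b[t] for t in set(a) & set(b))
        norm = math.sqrt(sum(v * v for v in a.values())) * \
               math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    query = term_vector("automatic evaluation of document retrieval")
    title_only = term_vector("Evaluation of Retrieval Systems")
    abstract = term_vector(
        "This paper describes experiments in the automatic evaluation "
        "of document retrieval systems using simulated search methods.")

    print("title-only correlation:", round(cosine(query, title_only), 3))
    print("abstract correlation:  ", round(cosine(query, abstract), 3))

A longer input text generally offers more terms on which a match can
occur, which is why document length is treated as an input variable.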
Content analysis procedures in SMART are performed by using a series
of dictionaries which differ in construction and effectiveness. The