ISR11 Scientific Report No. ISR-11 Information Storage and Retrieval Information Analysis and Dictionary Construction chapter G. Salton M. E. Lesk Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government. Iv-32 under Tabstract trees', which in turn would come under "graph theory", a branch of algebra. It does not appear reasonable to expect that a hierarchical arrangement of concepts will serve equally well for all uses under all circ'mistances. Rather any hierarchy will serve its function, if it can be counted upon to suggest [OCRerr][OCRerr]ys of broadening or narrowing a given search request or a given interpretation of the subject matter under most of the circumstances likely to arise in practice. Dictionary ?erformance Tn order to obtain an idea of the relative effectiveness of the various dictionaries in a retrieval situation, s[OCRerr]e e[OCRerr]erimental results may be presented, based in each case on averages obtained with 17 search requests used in conjunction with a document collection of some 500 document abstracts in the computer literature. The retrieval performance is measured by two parameters, kno[OCRerr]m respectively as recall and precision. Recall is defined as the prQ[OCRerr]ortion of relevant material actually retrieved and a high recall score therefore im[OCRerr]lies that much of what is useful in a collection has actually been produced during the search operation. Precision, on the other hand, is the proportion of retrieved material which is actually relevant, and a high precision score implies that very little useless material had been obtained as a result of a given search. Clearly both of these parameters are Lmportant, and a perfect search would therefore exhibit both a high recall and a high precision. Recall and precision results can be presented in many different forms. One of the simplest [OCRerr] in which to exhibit the performance measures is in the form of recall-precision graphs. Such graphs are obtained by looking at many recall points for each search request, and computing in each case