Scientific Report No. ISR-11
Information Storage and Retrieval
Operating Instructions for the SMART Text Processing and Document Retrieval System
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

3.2. Specifications Affecting Phrase Searching

Two basic phrase searching methods are available. These are called the statistical and syntactic phrase searching procedures. Both use pre-designed dictionaries of phrases which are searched against sentences in the texts.

Statistical phrase searching is the more common of these methods. The phrase dictionary consists of a set of pre-assigned word pairs or word n-tuples (where n can be 2, 3, 4, 5 or 6). Each word n-tuple has a "phrase concept number" associated with it, which is used as the concept number representing the whole phrase (as distinguished from the individual concept numbers which give the components of the phrase). Every sentence is scanned for occurrences of phrase components, and when all of the n phrase components have been found to occur at least m times in the sentence, the phrase is considered to have occurred m times, and a weight is entered accordingly for the "phrase concept number". The method of writing the phrase dictionary on the library tape is given in part 5.1.2.

Note that this is a rather imprecise type of search, since the words searched for may occur anywhere in the sentence in any order. For example, the sentence "Despite a second, larger order of textbooks, approximately twenty-five percent of the students are still without them" will be considered as containing the phrase "second order approximation".

A more precise type of search is available through the "syntactic phrase searching" procedure.
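The counting rule of the statistical procedure described above can be illustrated with a minimal Python sketch. It is not the SMART implementation: the word-to-concept table, the concept numbers (101, 102, 103, 9001) and the function name are hypothetical, the phrase components are assumed to be distinct, and the occurrence count m is assumed to serve directly as the weight.

```python
import re
from collections import Counter

# Hypothetical word-to-concept lookup; all concept numbers are invented
# for illustration. Related word forms share one concept number.
WORD_CONCEPTS = {
    "second": 101,
    "order": 102,
    "approximation": 103,
    "approximately": 103,
}

# Hypothetical phrase dictionary: tuple of component concept numbers
# mapped to the "phrase concept number" for the whole phrase.
PHRASES = {
    (101, 102, 103): 9001,  # "second order approximation"
}


def statistical_phrase_weights(sentence, word_concepts=WORD_CONCEPTS,
                               phrases=PHRASES):
    """Return {phrase concept number: m} for one sentence.

    m is the largest number such that every component of the phrase
    occurs at least m times, in any position and in any order.
    """
    words = re.findall(r"[a-z]+", sentence.lower())
    counts = Counter(word_concepts[w] for w in words if w in word_concepts)
    weights = {}
    for components, phrase_concept in phrases.items():
        # A Counter yields 0 for a component that never occurs.
        m = min(counts[c] for c in components)
        if m > 0:
            weights[phrase_concept] = m  # assumption: weight equals m
    return weights


sentence = ("Despite a second, larger order of textbooks, approximately "
            "twenty-five percent of the students are still without them")
print(statistical_phrase_weights(sentence))  # {9001: 1}
```

Because only per-sentence counts are compared, the textbook sentence is credited with the phrase once even though its words are unrelated there. The syntactic phrase searching procedure is intended to avoid false matches of this kind.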
This uses the "criterion tree" dictionary (see 5.[OCRerr].4) in which a complete syntactic/semantic/structural specification of each phrase is given. If a syntactic phrase is to be detected, 1) the components of the phrase must have the proper semantic