NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Edited by Donna K. Harman, National Institute of Standards and Technology

Probabilistic Retrieval in the TIPSTER Collections: An Application of Staged Logistic Regression
W. Cooper, F. Gey, A. Chen

feedback' -- has been the subject of some fruitful probabilistic investigations, but the question of how to create the initial ranking seems equally significant.

Objectives of the Experiment

A principal goal of the experiment was to investigate the retrieval effectiveness of the SLR methodology in large collections. It was hoped especially that something could be learned of the soundness and retrieval power inherent in the statistical logic that underlies the SLR method. In view of this emphasis on logical foundations, and also because of the limited time and resources at the researchers' disposal, only a few simple and commonly employed frequency statistics were used as retrieval clues. No attempt was made to exploit more elaborate types of linguistic or locational evidence that would have required the incorporation of a parser, a conflator, a disambiguator, a thesaurus, a phrase identifier, etc. It is not that the latter kinds of evidence could not be exploited effectively under the SLR approach. Rather, in this particular experiment the idea was to see how far one could get on the basis of careful statistical logic alone.

Because of this `mainly logic' approach, the results of the experiment must be interpreted with special care. It is not the absolute retrieval effectiveness of the system that is of most interest, but rather its retrieval effectiveness relative to the amount of evidence used. If an SLR-based system can achieve with fewer clues the same level of effectiveness that other design approaches require more clues to reach, the objective of creating a powerful underlying retrieval logic will have been met.
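As a concrete illustration of the kind of statistical logic involved, the sketch below computes a few simple frequency statistics as clues for a query-document pair and fits a logistic function of those clues to human relevance judgements. The clue set, function names, and data here are hypothetical illustrations, not the paper's actual clue definitions, and the staged fitting procedure that gives SLR its name is omitted.

```python
import math

def frequency_clues(query_terms, doc_terms, num_docs, doc_freq):
    """A hypothetical clue vector of simple frequency statistics
    for one query-document pair (not the paper's exact clue set)."""
    matches = [t for t in query_terms if t in doc_terms]
    overlap = len(matches)  # number of query terms present in the document
    # Mean within-document frequency of the matching terms
    mean_tf = sum(doc_terms.count(t) for t in matches) / overlap if overlap else 0.0
    # Mean inverse document frequency of the matching terms
    mean_idf = (sum(math.log(num_docs / doc_freq[t]) for t in matches) / overlap
                if overlap else 0.0)
    return [overlap, mean_tf, mean_idf]

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit P(relevant | clues) = 1 / (1 + exp(-(w.x + b))) by gradient
    descent on judged training pairs, y = 1 meaning judged relevant."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            for j in range(d):
                gw[j] += (p - yi) * xi[j]
            gb += p - yi
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

# Toy training sample of judged query-document pairs (invented numbers)
X = [[3, 2.0, 3.5], [2, 1.5, 3.0], [1, 1.0, 2.0],
     [0, 0.0, 0.0], [1, 0.5, 1.5], [0, 0.0, 0.0]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)

def prob_relevant(x):
    """Estimated relevance probability for a new clue vector."""
    return 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b)))
```

Ranking a collection for a query then amounts to scoring each document with prob_relevant and sorting by the estimate, which is what makes the approach computationally modest at run-time.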
Because regression methods are hospitable to the use of almost any kind of predictive evidence that can be expressed in statistical form, there is little doubt that the performance of the system could have been improved through the use of additional clue types. In considering the present experiment, however, the reader is asked to bear in mind the philosophy of building a generalizable logical platform capable of extracting a maximum of retrieval power from whatever clues do happen to be available.

Another objective of the SLR methodology is to keep the computational aspects of retrieval reasonably simple and efficient. Some probabilistic schemes call for elaborate programming and are not impressively efficient at run-time. As a reasonable desideratum, a truly practical probabilistic method should be no more trouble to program than, and should not run substantially slower than, say, a vector-processing IR system using comparable types of evidence.

Still another goal of SLR is to partially replace traditional IR research procedures with more convenient and powerful standard statistical methods. For many years the customary experimental research paradigm in IR has been to conduct retrieval trials and apply specially invented IR effectiveness measures such as precision and recall to compare the trial results. It is reasonable to hope that much of this trial-and-error experimentation could be replaced by more efficient statistical regression analyses that use standard statistical software packages and standard measures of goodness-of-fit, thus bringing IR research more into the realm of mainstream statistical analysis.

The SLR design methodology requires the use of `training data' (`learning trials', etc.) in the form of human relevance judgements for a sample of query-document pairs representative of the collection in which the proposed IR system is to operate. It is sometimes objected that methods requiring training data are useless in situations where a