NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Probabilistic Retrieval in the TIPSTER Collections: An Application of Staged Logistic Regression
W. Cooper, F. Gey, A. Chen
National Institute of Standards and Technology, Donna K. Harman

system must be designed for a collection for which no training data is available, and as a practical matter none can be gathered. This objection has considerable force, but it overlooks the possibility of extrapolating the results of a regression analysis from one collection for which training data exists to another for which it does not. As the experiment developed, it cast some light on the question of whether such an extrapolation can be effected without unacceptable loss of retrieval power.

A final objective of the SLR methodology is to produce estimates of relevance probability that are reliable enough to present to the system users as part of the ranked output they receive. Some IR research would appear to be premised on the notion that the output ordering of the collection is all that matters: that the only purpose of generating retrieval status values ('similarity coefficients', 'ranking scores', etc.) is to achieve as effective a ranking as possible. We agree that imposing an effective order of presentation on the documents is the most essential single role of the retrieval status values, but feel that in addition the numeric scores are themselves a potentially important part of the output. Their significance lies in their ability to provide the user at each point in the search with information about whether it is likely to be worthwhile to continue the search down the ranking. Clearly, such scores will be most helpful if presented in a form that most users find readily interpretable, and interpretable moreover in a sense that bears as directly as possible on the decision of whether they should stop searching.
Probability-of-relevance estimates would appear to fit this prescription admirably.

For reasons that will become apparent, this final objective was not attained in the present experiment. However, experiments in small collections indicate that SLR is capable of producing well-calibrated probability estimates, and doing so remains one of the general objectives of the methodology.

The SLR Methodology

The theoretical foundations of the SLR approach are presented in a recent paper by Cooper, Dabney, & Gey (1992). A synthesis and extension of earlier approaches to probabilistic retrieval, the SLR method combines the commonplace theoretical stratagem of invoking statistical simplifying assumptions with the empirical technique of applying statistical regression analysis to a learning sample. The use of statistical simplifying assumptions in IR has been explored by Maron & Kuhns (1960), Robertson & Sparck Jones (1976), Yu & Salton (1976), van Rijsbergen (1979) and others (surveyed by Maron (1984) and Bookstein (1985)). Examples of the use of regression analysis are to be found in the work of Fox (1983), Fuhr (1989), and Fuhr & Buckley (1991).

A distinguishing characteristic of SLR is that it breaks the analysis of the retrieval process down into two or more distinct steps or stages. For the present experiment a simple two-stage procedure was adopted. In the first stage a learning sample was used to develop a regression equation that combines elementary retrieval clues into composite clues. In the second stage, the same empirical data was used to derive another regression equation that combines these composite clues into an estimate of the relevance probability for each query-document pair.
Thus the evidence bearing on the retrieval decision is organized first into sets of simple properties of particular descriptors, and then into combinations of such sets as determined by the particular descriptors common to the query and document under consideration.
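
The two-stage organization described above can be illustrated with a minimal sketch. It is not the regression equations used in the experiment: the two elementary clues per matching descriptor, the choice of composite clues (the sum of stage-one log-odds and the number of matching descriptors), the synthetic learning sample, and all function names (`fit_logistic`, `composite_clues`, `train_slr`, `relevance_probability`, `make_sample`) are assumptions introduced here for illustration only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(w, x):
    # Linear predictor: w[0] is the intercept, w[1:] the clue weights.
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, y, lr=0.1, epochs=500):
    # Plain batch gradient ascent on the mean log-likelihood.
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(logit(w, xi))
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def composite_clues(w1, terms):
    # Hypothetical composite clues for one query-document pair:
    # summed stage-one log-odds and the count of matching descriptors.
    scores = [logit(w1, t) for t in terms]
    return [sum(scores), float(len(scores))]


def train_slr(sample):
    # Stage 1: one row per matching descriptor, labelled with the
    # relevance of the query-document pair it came from.
    X1 = [t for terms, _ in sample for t in terms]
    y1 = [rel for terms, rel in sample for _ in terms]
    w1 = fit_logistic(X1, y1)
    # Stage 2: one row per query-document pair, same learning sample.
    X2 = [composite_clues(w1, terms) for terms, _ in sample]
    y2 = [rel for _, rel in sample]
    w2 = fit_logistic(X2, y2)
    return w1, w2


def relevance_probability(w1, w2, terms):
    # Estimated probability of relevance for one query-document pair.
    return sigmoid(logit(w2, composite_clues(w1, terms)))


def make_sample(n_pairs, seed=0):
    # Synthetic learning sample (an assumption, not TIPSTER data):
    # each pair has 1 to 4 matching descriptors with two elementary
    # clues each, shifted upward when the pair is relevant.
    rng = random.Random(seed)
    sample = []
    for _ in range(n_pairs):
        relevant = rng.random() < 0.3
        shift = 0.8 if relevant else 0.0
        terms = [[rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)]
                 for _ in range(rng.randint(1, 4))]
        sample.append((terms, 1 if relevant else 0))
    return sample


if __name__ == "__main__":
    sample = make_sample(300, seed=1)
    w1, w2 = train_slr(sample)
    terms, rel = sample[0]
    print(rel, relevance_probability(w1, w2, terms))
```

Ranking documents for a query by `relevance_probability` gives the output ordering, and the same value is the score that could be shown to the user. Whether such scores are well calibrated depends on the learning sample and on the simplifying assumptions, which this sketch does not examine.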