NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Probabilistic Retrieval in the TIPSTER Collections: An Application of Staged Logistic Regression
W. Cooper, F. Grey, A. Chen
National Institute of Standards and Technology
Donna K. Harman

which to regress. If one could trust the Linked Dependence Assumption completely, and if nonmatch as well as match events had been taken into explicit account in the computation of Z, one might have tried letting Z stand as-is as the desired logodds estimate. But because such is not the situation, this would fail to correct for dependency distortion and would give longer documents an unfair advantage. Thus it seemed advisable to perform a corrective linear transformation on Z, and moreover to normalize Z first by dividing it by some simple function of document length. It was found by trial and error that dividing Z by L raised to a power of around 0.4 seemed to remove most of the visible bias toward either very short documents or very long documents in the five collections. Logging the entire expression was found to improve the fit to the sample data.

It would have been appropriate to include a correction for query length analogous to the one developed for document length. However, for lack of time the necessary analysis could not be carried out.

Extrapolation to Other Collections

The regression analysis was confined to the WSJ data because relevance judgements were not available for most of the other collections in sufficient quantities, or for enough of the training queries, to make regression feasible. This circumstance brought with it the problem of how to extrapolate the WSJ retrieval rules to the remaining four collections.

Speaking generally, the extension of retrieval formulae to other collections is a significant problem throughout the IR field.
One would like to know how to transfer design parameters from one collection, for which there is enough relevance data, to another collection for which there may be too little data or none. If the transfer could be accomplished without too much loss of predictive power, the almost exclusive use by IR experimenters of special 'test collections' could be justified more easily. We welcomed the dearth of TIPSTER relevance data for some of the collections as an opportunity to explore this problem. To simplify the challenge and confront it in its starkest form, we elected to ignore entirely even such data as were available for collections other than WSJ.

The method used for the extrapolation was based on the well known statistical concept of standardization of variables. The standardized value of a variable in a population is obtained by subtracting from its observed value the variable's mean value in the population, then dividing this difference by its standard deviation in the population. The new standardized values have a mean of zero and a standard deviation of one. The working assumption that was made was that a regression equation such as Eq. (1) can be carried over and applied in another collection provided all variables involved have first been standardized in both collections.

Although no variable values were actually standardized, the coefficients in Eq. (1) were recalculated for each of the other four collections in such a way as to create the same effect. The values for the population means and standard deviations used in the recalculation of the coefficients were taken from random samples of triples taken from the five collections. The samples were comparable to those in the 'random' subsample of WSJ query-document triples described earlier.

The algebraic details of the transformation process will not be presented here, but the resulting modifications of the right side of the earlier WSJ form of Eq. (1) may be of