NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Edited by Donna K. Harman, National Institute of Standards and Technology
Probabilistic Retrieval in the TIPSTER Collections: An Application of Staged Logistic Regression
W. Cooper, F. Gey, A. Chen
feedback' -- has been the subject of some fruitful probabilistic investigations, but the
question of how to create the initial ranking seems equally significant.
Objectives of the Experiment
A principal goal of the experiment was to investigate the retrieval effectiveness of
the SLR methodology in large collections. It was hoped especially that something could
be learned of the soundness and retrieval power inherent in the statistical logic that under-
lies the SLR method. In view of this emphasis on logical foundations, and also because
of the limited time and resources at the researchers' disposal, only a few simple and com-
monly employed frequency statistics were used as retrieval clues. No attempt was made
to exploit more elaborate types of linguistic or locational evidence that would have
required the incorporation of a parser, a conflator, a disambiguator, a thesaurus, a phrase
identifier, etc. It is not that the latter kinds of evidence could not be exploited effectively
under the SLR approach. Rather, in this particular experiment the idea was to see how
far one could get on the basis of careful statistical logic alone.
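A clue-based probabilistic ranking of this kind can be sketched as follows. The particular frequency statistics and the linear log-odds form shown here are illustrative assumptions, not the exact clues or fitted coefficients used in the experiment:

```python
import math

def clue_vector(query, doc):
    """Compute a few simple frequency statistics ('clues') for a
    query-document pair. These particular clues are hypothetical."""
    q_terms = query.lower().split()
    d_terms = doc.lower().split()
    matches = [t for t in set(q_terms) if t in d_terms]
    # Clue 1: number of distinct query terms occurring in the document.
    x1 = len(matches)
    # Clue 2: mean log frequency of the matching terms in the document.
    x2 = (sum(math.log(1 + d_terms.count(t)) for t in matches) / len(matches)
          if matches else 0.0)
    # Clue 3: log of the document length.
    x3 = math.log(len(d_terms))
    return [x1, x2, x3]

def log_odds_of_relevance(x, coefs, intercept):
    """Linear log-odds form assumed by logistic regression; in practice
    the coefficients would be estimated from relevance-judged training
    pairs."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))

def prob_of_relevance(x, coefs, intercept):
    """Logistic transform of the log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds_of_relevance(x, coefs, intercept)))
```

Documents are then presented to the user in descending order of estimated probability of relevance to the query.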
Because of this `mainly logic' approach, the results of the experiment must be
interpreted with special care. It is not the absolute retrieval effectiveness of the system
that is of most interest, but rather its retrieval effectiveness relative to the amount of evidence used. If an SLR-based system can achieve with fewer clues the same level of effectiveness that other design approaches require more clues to reach, the objective of creating a powerful underlying retrieval logic will have been met. Because regression
methods are hospitable to the use of almost any kind of predictive evidence that can be
expressed in statistical form, there is little doubt that the performance of the system could
have been improved through the use of additional clue-types. In considering the present
experiment, however, the reader is asked to bear in mind the philosophy of building a
generalizable logical platform capable of extracting a maximum of retrieval power from
whatever clues do happen to be available.
Another objective of the SLR methodology is to keep the computational aspects of
the retrieval reasonably simple and efficient. Some probabilistic schemes call for elaborate programming and are not impressively efficient at run-time. As a reasonable desideratum, a truly practical probabilistic method should be no more trouble to program than, and should not run substantially slower than, say, a vector-processing IR system using comparable types of evidence.
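On the efficiency point: because the logistic transform is monotone, ranking by estimated probability of relevance reduces to ranking by the linear log-odds score itself, a single inner product per document, comparable in cost to a vector-space similarity score. A minimal sketch, with invented clue vectors and coefficients:

```python
# Since the logistic function is monotone increasing, no exponentiation
# is needed at ranking time: the linear score orders documents the same
# way the probability would.
def score(clues, coefs, intercept):
    return intercept + sum(b * x for b, x in zip(coefs, clues))

docs = {"d1": [2.0, 1.1, 3.4], "d2": [1.0, 0.7, 3.0]}  # hypothetical clue vectors
coefs, intercept = [0.9, 0.5, -0.2], -3.0              # hypothetical fitted values
ranking = sorted(docs, key=lambda d: score(docs[d], coefs, intercept), reverse=True)
```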
Still another goal of SLR is to partially replace traditional IR research procedures
with more convenient and powerful standard statistical methods. For many years the cus-
tomary experimental research paradigm in IR has been to conduct retrieval trials and
apply specially invented IR effectiveness measures such as precision and recall to com-
pare the trial results. It is reasonable to hope that much of this trial-and-error experimen-
tation could be replaced by more efficient statistical regression analyses that use standard
statistical software packages and standard measures of goodness-of-fit, thus bringing IR
research more into the realm of mainstream statistical analysis.
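For instance, two candidate clue sets could be compared by a standard goodness-of-fit measure such as the binomial log-likelihood of their predicted relevance probabilities against the judgements, rather than by a full retrieval trial. The predictions and judgements below are invented for illustration:

```python
import math

def log_likelihood(probs, labels):
    """Binomial log-likelihood of predicted relevance probabilities
    against binary relevance judgements; higher means a better fit."""
    return sum(math.log(p) if y else math.log(1.0 - p)
               for p, y in zip(probs, labels))

# Hypothetical predictions from two candidate clue sets on the same
# relevance-judged query-document pairs:
labels  = [1, 0, 1, 1, 0]
model_a = [0.8, 0.3, 0.7, 0.6, 0.2]
model_b = [0.6, 0.5, 0.5, 0.5, 0.4]

# The clue set with the higher log-likelihood (equivalently, the lower
# deviance) fits the judgements better -- no precision/recall trial needed.
better = "A" if log_likelihood(model_a, labels) > log_likelihood(model_b, labels) else "B"
```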
The SLR design methodology requires the use of `training data' (`learning trials',
etc.) in the form of human relevance judgements for a sample of query-document pairs
representative of the collection in which the proposed IR system is to operate. It is sometimes objected that methods requiring training data are useless in situations where a