NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1), edited by Donna K. Harman, National Institute of Standards and Technology

Optimizing Document Indexing and Search Term Weighting Based on Probabilistic Models
N. Fuhr, C. Buckley

Both probabilistic indexing methods clearly outperform the tf·idf run, with the exception of the low-recall end for the phrase run (see below). Comparing word indexing with phrase indexing shows mixed results (see also table 1): for the recall-precision averages, phrases perform better, again with the exception of the low-recall end. More information is given by the recall and precision values at different numbers of documents retrieved (see appendix of this volume). Precision for phrases is worse than or about equal to that of words, but recall is always better. This means that phrases perform better for narrow queries (those with a small number of relevant documents).

[Figure 2: Recall-precision curves for probabilistic indexing (fuhra1: words; fuhrp1: phrases) in comparison to the tf·idf run.]

In general, one would expect that using phrases in addition to single words would never decrease precision. We blame the current definition of the retrieval function (1) for the observed behaviour. This function assumes that all the terms considered are independent, which is obviously not true when we consider a phrase in addition to its two components as terms. Since the size of this error depends on the indexing weights of the components, it may explain the precision decrease for the highest-ranked documents (i.e. at low recall levels). A better strategy would be to ignore the components when the phrase itself occurs in the document. The query-wise comparison with the median (see table 1) shows large scatter in the results for the individual queries.
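The strategy suggested above can be sketched as follows. This is not the authors' implementation; it is a minimal illustration, assuming a simple additive retrieval function over indexing weights, in which a phrase's component words are suppressed whenever the phrase itself is found in the document. All term names and weights are hypothetical.

```python
def score(query_terms, doc_weights, phrases):
    """Sum indexing weights of query terms present in the document.

    query_terms: set of query terms (single words and phrases)
    doc_weights: dict mapping term -> indexing weight in the document
    phrases:     dict mapping a phrase term -> tuple of its component words
    """
    # Suppress components of any query phrase that the document contains,
    # since their contribution is already captured by the phrase weight
    # and counting them again would violate the independence assumption.
    skip = set()
    for phrase, components in phrases.items():
        if phrase in query_terms and phrase in doc_weights:
            skip.update(components)
    return sum(w for term, w in doc_weights.items()
               if term in query_terms and term not in skip)

# Hypothetical example: a naive independent sum would score
# 0.4 + 0.5 + 0.7; suppressing the components leaves only 0.7.
q = {"information", "retrieval", "information retrieval"}
d = {"information": 0.4, "retrieval": 0.5, "information retrieval": 0.7}
p = {"information retrieval": ("information", "retrieval")}
print(score(q, d, p))  # 0.7
```

If the document contains only the component words and not the phrase, nothing is suppressed and the usual sum is returned, so the fix only affects documents where phrase and components would otherwise be double-counted.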
A preliminary analysis has indicated that the relative performance for a specific query depends significantly on whether the query statement contains negated terms. For example, topic 86 contains "... it is a bank, as opposed to another financial institution such as a savings and loan or credit union ...", and topic 87 reads "... Civil actions, such as creditor claims filed for recovery of losses, are not relevant ...". In both cases, our system yields one of the worst results (for word indexing). Since our query indexing procedure extracts only the terms from the query and is not able to recognize negation, the system explicitly searches for documents containing these negated terms. Theoretically, it is obvious that negated terms should be given a negative utility weight. However, these examples show that recognizing negation automatically is a difficult task.

3.2 Routing queries

For the routing queries, the retrieval-with-probabilistic-indexing (RPI) model described in [Fuhr 89] and [Fuhr 92] was applied. This model combines query-specific relevance feedback information with