SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Combining Evidence from Multiple Searches
chapter
E. Fox
M. Koushik
J. Shaw
R. Modlin
D. Rao
National Institute of Standards and Technology
Donna K. Harman
Table 1: Summary of Retrieval Runs
Name         Model     Sim. Function   Weighting Scheme
bool         Boolean   Boolean         binary term weights
pnorm1.0     p-norm    p-norm          binary term weights, p=1.0
pnorm1.5     p-norm    p-norm          binary term weights, p=1.5
pnorm2.0     p-norm    p-norm          binary term weights, p=2.0
cosine.atn   vector    cosine          aug_norm * idf
cosine.nnn   vector    cosine          tf
inner.atn    vector    inner product   aug_norm * idf
inner.nnn    vector    inner product   tf
• Retrieval based on the p-norm model
The Boolean queries described above were also used for the p-norm runs. Retrieval runs were
made with three different p-values: 1.0, 1.5, and 2.0. No query term or clause weights were
used during the p-norm runs.
The different runs are summarized in Table 1. Note that in Phase 1 of our efforts, we used all
eight runs listed. In Phase 2, however, we focused on the pnorm1.0 case, with document weighting,
and the four vector runs.
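
As an illustration of how the p-value affects a p-norm run, the following is a minimal sketch of the standard p-norm similarity for a single OR or AND clause, assuming unit query term weights (consistent with the statement that no query term or clause weights were used). The function names pnorm_or and pnorm_and and the two-term example are hypothetical and are not taken from the system used for these runs.

```python
from typing import Sequence


def pnorm_or(doc_weights: Sequence[float], p: float) -> float:
    """P-norm similarity of a document to an OR clause.

    doc_weights holds the document's weight (in [0, 1]) for each
    query term in the clause; all query term weights are assumed to be 1.
    """
    n = len(doc_weights)
    return (sum(d ** p for d in doc_weights) / n) ** (1.0 / p)


def pnorm_and(doc_weights: Sequence[float], p: float) -> float:
    """P-norm similarity of a document to an AND clause (unit query weights)."""
    n = len(doc_weights)
    return 1.0 - (sum((1.0 - d) ** p for d in doc_weights) / n) ** (1.0 / p)


if __name__ == "__main__":
    # Binary document weights: first term present, second term absent.
    weights = [1.0, 0.0]
    for p in (1.0, 1.5, 2.0):
        print(f"p={p}: OR={pnorm_or(weights, p):.4f}  AND={pnorm_and(weights, p):.4f}")
```

At p = 1.0 the OR and AND scores coincide, so the Boolean operators no longer matter. As p grows, the OR score rises and the AND score falls, moving toward strict Boolean behavior.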
3.2 Weighting Schemes
The weighting schemes mentioned above are detailed in Table 2.
Table 2: Weighting Scheme Options
• Term frequency normalization. This has the following choices:
    (n)one       new_tf = tf
    (b)inary     new_tf = 1
    (m)ax_norm   new_tf = tf / max_tf
    (a)ug_norm   new_tf = 0.5 + 0.5 * (tf / max_tf)
• Document weights. This has the following choices:
    (n)one       new_wt = new_tf
    (t)fidf      new_wt = new_tf * log(num_docs / coll_freq)
    (p)rob       new_wt = new_tf * log((num_docs - coll_freq) / coll_freq)
• Document vector normalization. This can be either of:
    (n)one       norm_wt = new_wt
    (c)osine     norm_wt = new_wt / sqrt(sum of new_wt^2)
This allows for a very flexible approach to changing the document vector weights as can be seen
from Table 3.
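
The three-letter scheme codes used in the run names (for example atn and nnn) can be read as one choice from each of the three groups of options. The sketch below shows one way such a code might be applied to a single document. It rests on several assumptions: idf is taken as log(num_docs / coll_freq) with a natural logarithm, coll_freq is read as the number of documents containing the term, and the second normalization option is taken to be cosine. The function name weight_vector and the sample counts are hypothetical.

```python
import math
from typing import Dict


def weight_vector(tf: Dict[str, int], coll_freq: Dict[str, int],
                  num_docs: int, scheme: str) -> Dict[str, float]:
    """Apply a three-letter weighting code (e.g. "atn") to one document.

    tf maps each term to its raw frequency in the document.
    coll_freq maps each term to its (assumed) document frequency.
    """
    if not tf:
        return {}
    tf_opt, wt_opt, norm_opt = scheme
    max_tf = max(tf.values())

    weights: Dict[str, float] = {}
    for term, f in tf.items():
        # Step 1: term frequency normalization.
        if tf_opt == "n":
            new_tf = float(f)
        elif tf_opt == "b":
            new_tf = 1.0
        elif tf_opt == "m":
            new_tf = f / max_tf
        elif tf_opt == "a":
            new_tf = 0.5 + 0.5 * f / max_tf
        else:
            raise ValueError(f"unknown tf option: {tf_opt}")

        # Step 2: document (collection-based) weight.
        cf = coll_freq[term]
        if wt_opt == "n":
            new_wt = new_tf
        elif wt_opt == "t":
            new_wt = new_tf * math.log(num_docs / cf)
        elif wt_opt == "p":
            new_wt = new_tf * math.log((num_docs - cf) / cf)
        else:
            raise ValueError(f"unknown weight option: {wt_opt}")
        weights[term] = new_wt

    # Step 3: document vector normalization.
    if norm_opt == "c":
        length = math.sqrt(sum(w * w for w in weights.values()))
        if length > 0:
            weights = {t: w / length for t, w in weights.items()}
    elif norm_opt != "n":
        raise ValueError(f"unknown normalization option: {norm_opt}")
    return weights


if __name__ == "__main__":
    tf = {"retrieval": 2, "text": 1}
    coll_freq = {"retrieval": 10, "text": 100}
    for code in ("nnn", "atn", "atc"):
        print(code, weight_vector(tf, coll_freq, num_docs=1000, scheme=code))
```

Under these assumptions, nnn leaves the raw term frequencies unchanged, while atn scales the augmented frequency by idf and leaves the vector unnormalized.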