SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Retrieval Experiments with a Large Collection using PIRCS
K. Kwok
L. Papadopoulos
K. Kwan
National Institute of Standards and Technology
Donna K. Harman
B, C, D are index terms with weights b, c, d respectively; ∧ and ∨ are the Boolean AND and OR, each assigned the fixed p-norm exponent p = 2; and all clause weights are set to 1. If document d_i also has these three terms with weights 0 ≤ b', c', d' ≤ 1, then the similarity between d_i and CL can be evaluated recursively as:
$$x' = \mathrm{Sim}(d_i, X) = \sqrt{\frac{(c\,c')^2 + (d\,d')^2}{c^2 + d^2}},$$
$$\mathrm{Sim}(d_i, CL) = 1 - \sqrt{\frac{b^2(1-b')^2 + 1^2(1-x')^2}{b^2 + 1^2}} \qquad (5)$$
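Equation (5) is a two-level instance of the p-norm extended Boolean model: an OR node combines its children as a normalized weighted p-norm, and an AND node as one minus the normalized weighted p-norm distance from the all-ones point. The sketch below evaluates any such query tree recursively. It assumes CL = B ∧ X with X = C ∨ D (the structure implied by equation (5)); the classes `Term` and `Clause` and the function `sim` are hypothetical names chosen for illustration, and the weights in the usage line are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Term:
    """Leaf of the query tree: an index term with its query weight (e.g. b, c, d)."""
    name: str
    weight: float


@dataclass
class Clause:
    """Inner node: a Boolean AND/OR evaluated with the p-norm model."""
    op: str                   # "AND" or "OR"
    children: List["Node"]
    weight: float = 1.0       # clause weight; the text fixes all clause weights at 1
    p: float = 2.0            # p-norm exponent; the text fixes p = 2


Node = Union[Term, Clause]


def sim(doc: Dict[str, float], node: Node) -> float:
    """Recursively score a document (term -> weight in [0, 1]) against a query tree."""
    if isinstance(node, Term):
        # A term missing from the document contributes a document weight of 0.
        return doc.get(node.name, 0.0)
    # Pair each child's query/clause weight with its recursively computed value.
    pairs = [(child.weight, sim(doc, child)) for child in node.children]
    p = node.p
    norm = sum(a ** p for a, _ in pairs)
    if node.op == "OR":
        # OR: weighted p-norm of the child values, normalized into [0, 1].
        num = sum((a * v) ** p for a, v in pairs)
        return (num / norm) ** (1.0 / p)
    if node.op == "AND":
        # AND: one minus the normalized weighted p-norm distance from all-ones.
        num = sum((a * (1.0 - v)) ** p for a, v in pairs)
        return 1.0 - (num / norm) ** (1.0 / p)
    raise ValueError(f"unknown operator: {node.op!r}")


if __name__ == "__main__":
    # CL = B AND (C OR D) with illustrative, made-up weights.
    cl = Clause("AND", [Term("B", 0.8),
                        Clause("OR", [Term("C", 0.5), Term("D", 0.6)])])
    print(sim({"B": 0.9, "C": 0.4, "D": 0.7}, cl))
```

With p = 2 and unit clause weights, the recursion reproduces equation (5) term by term; raising p makes the operators behave more like strict Boolean AND/OR.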
All document and query term weights are taken from the edges of the net, so the system is fully
automatic once the Boolean expression has been defined manually.
Our retrieval results for automatic query construction are then based on combining methods (a) and (b), thus: $W_{i,\mathrm{auto}} = (W_{ia} + W_{ib})/2$. Those for manual query construction are based on $W_{i,\mathrm{man}} = r \cdot W_{i,\mathrm{auto}} + s \cdot \mathrm{Sim}(d_i, CL)$. Both therefore combine retrieval methods. The constants r and s are set to 0.65 and 0.35 respectively. The aim is for the added soft-Boolean structure to enhance the retrieval results of the automatic method on the same queries. Our soft-Boolean evaluation algorithm currently accounts only for terms that also appear in the network for the query; any additional terms inserted manually are ignored.
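The two combination rules above are simple per-document arithmetic. The sketch below applies them to score dictionaries keyed by document id; the function names are hypothetical, the Sim(d_i, CL) values are assumed to be precomputed (for example with a p-norm evaluation restricted to terms in the query's network), and treating a document absent from one method's output as scoring 0 is an assumption, not something the text specifies.

```python
from typing import Dict, List

R, S = 0.65, 0.35  # the constants r and s given in the text


def combine_auto(w_a: Dict[str, float], w_b: Dict[str, float]) -> Dict[str, float]:
    """W_i,auto = (W_ia + W_ib) / 2 over every document scored by either method.

    Assumption: a document missing from one method's output scores 0 there.
    """
    docs = set(w_a) | set(w_b)
    return {i: (w_a.get(i, 0.0) + w_b.get(i, 0.0)) / 2.0 for i in docs}


def combine_manual(w_auto: Dict[str, float], sim_cl: Dict[str, float],
                   r: float = R, s: float = S) -> Dict[str, float]:
    """W_i,man = r * W_i,auto + s * Sim(d_i, CL).

    Only documents in w_auto are scored; a missing Sim value counts as 0.
    """
    return {i: r * w + s * sim_cl.get(i, 0.0) for i, w in w_auto.items()}


def rank(scores: Dict[str, float]) -> List[str]:
    """Document ids sorted by decreasing retrieval status value."""
    return sorted(scores, key=scores.get, reverse=True)
```

For example, a document with W_auto = 0.5 and Sim(d_i, CL) = 1.0 receives 0.65 × 0.5 + 0.35 × 1.0 = 0.675 under the manual rule.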
2.4. Network Implementation with Learning
2.4.1 Network for Routing, Ad Hoc and Feedback without/with Query Expansion
A network provides a unified view of many retrieval algorithms and is a flexible tool for
implementation. In PIRCS, retrieval methods (a) and (b) of the previous section are implemented as
feedforward and feedbackward processing in a Query-Term-Document (Q-T-D) network, as presented in
[8,9]. A binary tree representing a Boolean expression can also be attached to the net for method (c).
These are shown in Figs. 1 and 2.
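The feedforward and feedbackward processing described above can be pictured as spreading activation across the three layers. The following is a minimal sketch assuming that each node simply sums its weighted inputs (a linear net); it is a generic illustration rather than PIRCS's exact formulation. The edge weights are placeholders, since the initial values come from Eqns. 1 and 2, which are not reproduced in this section, and `propagate`, `feedforward`, and `feedbackward` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (source node, destination node) -> edge weight. Placeholder weights only;
# the actual initial values come from Eqns. 1 and 2.
Edges = Dict[Tuple[str, str], float]


def propagate(activation: Dict[str, float], edges: Edges) -> Dict[str, float]:
    """One layer of linear spreading activation: each destination node sums
    (source activation * edge weight) over its incoming edges."""
    out: Dict[str, float] = defaultdict(float)
    for (src, dst), w in edges.items():
        if src in activation:
            out[dst] += activation[src] * w
    return dict(out)


def feedforward(query: str, q_to_t: Edges, t_to_d: Edges) -> Dict[str, float]:
    """Q -> T -> D: activation 1 on the query node; returns the value
    deposited on each document node."""
    terms = propagate({query: 1.0}, q_to_t)
    return propagate(terms, t_to_d)


def feedbackward(query: str, docs: Iterable[str],
                 d_to_t: Edges, t_to_q: Edges) -> Dict[str, float]:
    """D -> T -> Q: activation 1 on each document in turn; returns, per
    document, the value deposited on the query node."""
    scores: Dict[str, float] = {}
    for doc in docs:
        terms = propagate({doc: 1.0}, d_to_t)
        scores[doc] = propagate(terms, t_to_q).get(query, 0.0)
    return scores
```

Keeping the two directions as separate edge sets mirrors the idea that an edge such as "d_i acting on t_k" can carry a different weight from the reverse edge "t_k acting on d_i".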
Fig. 1: 3-Layer PIR Network (QTD and DTQ directions over the query (Q), term (T), and document (D) layers)
Fig. 2: Soft-Boolean Query Network (layers Q, T, D)
The edges of the net are initialized as follows: $w_{ik}$ from $d_{ik}$ as in Eqn. 2, and similarly for the other edges, including the edge for $t_k$ acting on $q_a$ (as in Eqn. 1) and $w_{ki}$ ($d_i$ acting on $t_k$). Activation on $d_i = 1$, gated through $w_{ki}$, deposits on $t_k$