NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
D. K. Harman (Ed.), National Institute of Standards and Technology

TREC-2 Document Retrieval Experiments using PIRCS
K. Kwok and L. Grunfeld
[Fig.4: Query self-learning]
We apply the same self-learning procedure to a query by adding the
query to the collection as a `document' temporarily,
resulting in self-learn initial weights for the indexing
representation of the query. These weights are used in our ad
hoc DTQ retrieval. The bottom line is that this component
consideration enables the probabilistic model to self-
bootstrap, allows term frequencies in items to be employed
instead of `binary' occurrences, and takes into account query-focused
and document-focused retrieval in a cooperative fashion.
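As a rough sketch of this step (hypothetical Python, not the PIRCS implementation), the query is appended to the collection as a temporary `document' and each of its terms receives an initial weight from its within-query frequency and the collection-wide term statistics; the weighting formula used below is an assumed inverse-collection-frequency style stand-in, not the actual self-learn estimate.

    from collections import Counter
    from math import log

    def self_learn_query_weights(query_terms, coll_term_freq, coll_size):
        """Treat the query as a temporary 'document' and derive initial
        (self-learn) weights for its terms from within-query frequencies
        and collection statistics.  The formula is an assumed stand-in."""
        tf = Counter(query_terms)            # within-query term frequencies
        length = sum(tf.values())            # length of the query 'document'
        weights = {}
        for term, freq in tf.items():
            p_local = freq / length                              # term prob. within the query itself
            p_coll = coll_term_freq.get(term, 1) / coll_size     # term prob. in the whole collection
            # favour terms frequent in the query but rare in the collection
            weights[term] = p_local * log(1.0 / p_coll)
        return weights

    # Example: initial weights for an automatically constructed query.
    query = "natural language information retrieval".split()
    w = self_learn_query_weights(query,
                                 coll_term_freq={"natural": 5000, "language": 8000,
                                                 "information": 20000, "retrieval": 1500},
                                 coll_size=1_000_000)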
In the case of routing retrieval, relevant documents are
available for training each topic. These documents are
again considered as constituted of components, and are used
in the network to estimate r_ka. The estimate should be
better than self-learning because the sample size of relevant
components is much larger. Our learning algorithm updates
the conditional probability w_ka (and r_ka) as follows (Fig.5):

    Δr_ka = η_Q (x̄_k − r_ka^old)                              (2)

Here, η_Q is a learning rate for training on the query side
and x̄_k is the average activation deposited on term t_k by the
given relevant set. If a term t_k not in the original query is
highly activated by the relevant set, it can also be linked to
q_a with an edge weight w_ka derived from its average activation x̄_k (Eqn. 3).
This implements query expansion, as was also done in
TREC-1.
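As a concrete illustration, the sketch below (hypothetical Python, not the PIRCS code) applies the update of Eqn. (2) to a query's term probabilities and links in strongly activated non-query terms as expansion terms; the expansion weight and the activation threshold are assumptions, since the exact form of Eqn. (3) is not reproduced here.

    def dtq_learn_and_expand(r, avg_activation, eta_q=0.1, expand_threshold=0.5):
        """Query-side learning step of Eqn. (2):
               delta_r_ka = eta_q * (xbar_k - r_ka_old)
        `r` maps each query term k to its current conditional probability r_ka;
        `avg_activation` maps terms to xbar_k, the average activation deposited
        by the known relevant set.  Terms not in the original query but strongly
        activated are added as expansion terms (expansion weight is assumed)."""
        updated = dict(r)
        for term, xbar in avg_activation.items():
            if term in updated:
                # Eqn. (2): move r_ka toward the observed average activation
                updated[term] += eta_q * (xbar - updated[term])
            elif xbar >= expand_threshold:
                # query expansion: link a highly activated non-query term
                updated[term] = xbar
        return updated

    # Example: one training pass for a routing topic.
    r_new = dtq_learn_and_expand(r={"bank": 0.2, "merger": 0.3},
                                 avg_activation={"bank": 0.6, "merger": 0.5, "acquisition": 0.7})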
[Fig.5: DTQ learning with expansion]
6. Retrieval Methodology
To satisfy TREC requirements, we submitted results
named as follows:
pircs1: routing, no training;
pircs2: routing, with training from Disk1 relevants,
Disk2 not used and no query expansion;
pircs3: ad hoc, no soft-Boolean added;
pircs4: ad hoc, with soft-Boolean added.
Routing allows training on the old Q2 topic set (topics 51-100)
before doing retrieval on the new Disk3 documents. Disk3
term usage statistics are not used. Ad hoc retrieval involves
using the new Q3 topic set (topics 101-150) to do retrieval on
old documents in Disk1 and Disk2. All our queries are
automatically constructed. We did not perform feedback
experiments using Q3.
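For reference, the four runs can be summarized as configuration records (a sketch only; the field names are assumptions, and showing the routing runs without soft-Boolean is an inference, not something the text states explicitly):

    # Submitted runs, following the descriptions above.
    RUNS = {
        "pircs1": {"task": "routing", "training": None,              "soft_boolean": False},
        "pircs2": {"task": "routing", "training": "Disk1 relevants", "soft_boolean": False},  # no query expansion
        "pircs3": {"task": "ad hoc",  "training": None,              "soft_boolean": False},
        "pircs4": {"task": "ad hoc",  "training": None,              "soft_boolean": True},
    }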
For the baseline routing (pircs1) and ad hoc (pircs3) retrievals,
we use the item self-learn (SL) edge weights. Routing run
pircs2 denotes retrieval based on further learning from
known relevant samples and represents an improvement over
pircs1. There can be hundreds of known relevants for each
topic of the Q2 set from documents in Disk1 and Disk2, as
given from the results of TREC-1. One way of employing
them is to do a retrieval (ranking) of Disk1 and Disk2
documents, and then make use of the first n (say n=100)
relevant documents, as for feedback learning. However, we
did not have enough resources to create a network and do
retrieval (ranking) on 2 GB of documents at that time and
had to settle on a simplified strategy. First, we decided to
use Disk1 only (1 GB) for training. Secondly, we believe