NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
D. K. Harman (Ed.), National Institute of Standards and Technology

TREC-2 Document Retrieval Experiments using PIRCS
K. Kwok and L. Grunfeld
[Fig.4: Query self-learning]
We apply the same self-learning procedure to a query by adding the
query to the collection as a `document' temporarily,
resulting in self-learn initial weights for the indexing
representation of the query. These weights are used in our ad
hoc DTQ retrieval. The bottom line is that this component
consideration enables the probabilistic model to self-
bootstrap, allows term frequencies in items to be employed
instead of `binary' occurrences, and takes into account query-focused
and document-focused retrieval in a cooperative fashion.
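As a rough sketch of this step (hypothetical Python, not the PIRCS implementation), the query is appended to the collection as a temporary `document' and each of its terms receives an initial weight from its within-query frequency and the collection-wide term statistics; the weighting formula used below is an assumed inverse-collection-frequency style stand-in, not the actual self-learn estimate.

    from collections import Counter
    from math import log

    def self_learn_query_weights(query_terms, coll_term_freq, coll_size):
        """Treat the query as a temporary 'document' and derive initial
        (self-learn) weights for its terms from within-query frequencies
        and collection statistics.  The formula is an assumed stand-in."""
        tf = Counter(query_terms)            # within-query term frequencies
        length = sum(tf.values())            # length of the query 'document'
        weights = {}
        for term, freq in tf.items():
            p_local = freq / length                              # term prob. within the query itself
            p_coll = coll_term_freq.get(term, 1) / coll_size     # term prob. in the whole collection
            # favour terms frequent in the query but rare in the collection
            weights[term] = p_local * log(1.0 / p_coll)
        return weights

    # Example: initial weights for an automatically constructed query.
    query = "natural language information retrieval".split()
    w = self_learn_query_weights(query,
                                 coll_term_freq={"natural": 5000, "language": 8000,
                                                 "information": 20000, "retrieval": 1500},
                                 coll_size=1_000_000)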
In the case of routing retrieval, relevant documents are
available for training each topic. These documents are
again considered as constituted of components, and are used
in the network to estimate r_ka. The estimate should be
better than self-learning because the sample size of relevant
components is much larger. Our learning algorithm updates
the conditional probability w_ka (and r_ka) as follows (Fig.5):

    Δr_ka = η_Q (x̄_k − r_ka^old)                              (2)

Here, η_Q is a learning rate for training on the query side
and x̄_k is the average activation deposited on term t_k by the
given relevant set. If a term t_k not in the original query is
highly activated by the relevant set, it can also be linked to
q_a with an edge weight w_ka derived from its average activation x̄_k (Eqn. 3).
This implements query expansion, as was also done in
TREC-1.
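As a concrete illustration, the sketch below (hypothetical Python, not the PIRCS code) applies the update of Eqn. (2) to a query's term probabilities and links in strongly activated non-query terms as expansion terms; the expansion weight and the activation threshold are assumptions, since the exact form of Eqn. (3) is not reproduced here.

    def dtq_learn_and_expand(r, avg_activation, eta_q=0.1, expand_threshold=0.5):
        """Query-side learning step of Eqn. (2):
               delta_r_ka = eta_q * (xbar_k - r_ka_old)
        `r` maps each query term k to its current conditional probability r_ka;
        `avg_activation` maps terms to xbar_k, the average activation deposited
        by the known relevant set.  Terms not in the original query but strongly
        activated are added as expansion terms (expansion weight is assumed)."""
        updated = dict(r)
        for term, xbar in avg_activation.items():
            if term in updated:
                # Eqn. (2): move r_ka toward the observed average activation
                updated[term] += eta_q * (xbar - updated[term])
            elif xbar >= expand_threshold:
                # query expansion: link a highly activated non-query term
                updated[term] = xbar
        return updated

    # Example: one training pass for a routing topic.
    r_new = dtq_learn_and_expand(r={"bank": 0.2, "merger": 0.3},
                                 avg_activation={"bank": 0.6, "merger": 0.5, "acquisition": 0.7})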
[Fig.5: DTQ learning with expansion]
6. Retrieval Methodology
To satisfy TREC requirements, we submitted results
named as follows:
pircs1: routing, no training;
pircs2: routing, with training from Disk1 relevants,
Disk2 not used and no query expansion;
pircs3: ad hoc, no soft-Boolean added;
pircs4: ad hoc, with soft-Boolean added.
Routing allows training on the old Q2 topic set (topics 51-100)
before doing retrieval on the new Disk3 documents. Disk3
term usage statistics are not used. Ad hoc retrieval involves
using the new Q3 topic set (topics 101-150) to do retrieval on
old documents in Disk1 and Disk2. All our queries are
automatically constructed. We did not perform feedback
experiments using Q3.
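For reference, the four runs can be summarized as configuration records (a sketch only; the field names are assumptions, and showing the routing runs without soft-Boolean is an inference, not something the text states explicitly):

    # Submitted runs, following the descriptions above.
    RUNS = {
        "pircs1": {"task": "routing", "training": None,              "soft_boolean": False},
        "pircs2": {"task": "routing", "training": "Disk1 relevants", "soft_boolean": False},  # no query expansion
        "pircs3": {"task": "ad hoc",  "training": None,              "soft_boolean": False},
        "pircs4": {"task": "ad hoc",  "training": None,              "soft_boolean": True},
    }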
For the baseline routing (pircs1) and ad hoc (pircs3) retrievals,
we use the item self-learn (SL) edge weights. Routing run
pircs2 denotes retrieval based on further learning from
known relevant samples and represents an improvement over
pircs1. There can be hundreds of known relevants for each
topic of the Q2 set from documents in Disk1 and Disk2, as
given from the results of TREC-1. One way of employing
them is to do a retrieval (ranking) of Disk1 and Disk2
documents, and then make use of the first n (say n=100)
relevant documents, as for feedback learning. However, we
did not have enough resources to create a network and do
retrieval (ranking) on 2 GB of documents at that time and
had to settle on a simplified strategy. First, we decided to
use Disk1 only (1 GB) for training. Secondly, we believe