NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Retrieval Experiments with a Large Collection using PIRCS
K. Kwok, L. Papadopoulos, K. Kwan
National Institute of Standards and Technology: Donna K. Harman

highly activated in the set of relevant documents should be highly related to the concepts and topics wanted for that query. These terms can therefore be added to the original query, as illustrated in Fig. 4, and could be expected to enhance both recall and precision, since they come from relevant samples. Experiments with small collections have shown that this is indeed the case [16,8,9]. This tool resembles an automatic thesaurus, with associated terms derived from user experience. From the network viewpoint, query expansion corresponds to growing new edges from a query to highly activated terms during relevance feedback, and weights are assigned according to the following learning algorithm, which we introduced in [8,9]:

   DTQ: w_ia = alpha * x_k;  p_ai = p_iQ * x_k,  w_ai = ln[p_ai / (1 - p_ai)] + ln[N_w / F_k]

In the TREC WSJ collection, documents can contain totally different stories and can also be exceptionally lengthy, while both feedback and feedback with query expansion require a restricted context to work. This is a major reason why we decided to break documents into sub-documents, as discussed in Section 2.

2.4.2 Implementation of Retrieval in a Network

To satisfy TREC requirements, we need to simulate different querying circumstances, as given in the following table:

   Query Construction   |      Query Type
   Method               |  ad hoc  |  routing
   ---------------------|----------|---------
   automatic            |  PIRCS1  |  PIRCS1
   manual               |  PIRCS2  |  PIRCS2
   feedback             |  PIRCS3  |  n.a.
   fdbk + qry expansion |  PIRCS4  |  n.a.

We have submitted results for all six types of experiments. The preceding sections have described the rationale and how RSVs are calculated in our approach.
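The query-expansion step described above can be sketched as follows. This is a minimal illustration, not PIRCS internals: "activation" is approximated here by summed term frequency over the relevant sample rather than by spreading activation through the three-layer network, the smoothed probability estimator is an assumption, and the second term of the edge weight takes the inverse collection term frequency as ln(N_w / F_k). All function and variable names are illustrative.

```python
from collections import Counter
from math import log

def expand_query(query_terms, relevant_docs, num_new_terms=5):
    """Add the terms most activated in the relevant documents to the query.

    Sketch only: activation is approximated by total term frequency over
    the relevant sample; terms already in the query are not re-added.
    """
    activation = Counter()
    for doc in relevant_docs:
        activation.update(doc)
    new_terms = [t for t, _ in activation.most_common()
                 if t not in query_terms][:num_new_terms]
    return list(query_terms) + new_terms

def dtq_edge_weight(r_k, R, F_k, N_w, alpha=0.5):
    """Log-odds weight for a new query-term edge, in the spirit of the
    DTQ learning rule.

    r_k : number of relevant (sub-)documents containing term k
    R   : total number of relevant (sub-)documents
    F_k : collection frequency of term k
    N_w : total term occurrences in the collection

    The smoothed estimate of P(term | relevant) and the use of
    ln(N_w / F_k) as the collection-frequency term are assumptions
    made for this sketch, not the paper's exact formula.
    """
    p = (r_k + alpha) / (R + 2 * alpha)   # smoothed P(term k | relevant)
    return log(p / (1 - p)) + log(N_w / F_k)
```

For example, expanding the query ["bank"] against two relevant sub-documents pulls in their most frequent unseen terms, and each new edge would receive a log-odds weight from `dtq_edge_weight`.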
The following discusses the processing of routing, ad hoc and feedback queries:

(a) Routing Queries: a special requirement in TREC is to do routing retrieval. This simulates the classification of new test documents with respect to a set of static queries that have been trained with past documents. The (past) training documents are the first half of the TREC collection, c(A). We process c(A) and capture the necessary statistics of term usage. The queries from the first set of topics are then processed against c(A), and a network is created using ICTF weighting. Query-focused DTQ learning is now applied with the supplied relevant (but not irrelevant) documents, and the resultant edge weights on the Q-T side of the net are saved. Note that each relevant document is split into multiple sub-documents for this learning, and because we do not have relevance judgments at the sub-document level, we choose not to expand the queries. These become our routing queries q(A). The test documents from collection B, c(B), are then processed against c(A) as if they were queries, so that only collection A statistics are used for the ICTF term weighting. The q(A) are now loaded with c(B) and the dictionary from c(A) to form a new network, from which routing retrieval results based on W_auto and W_man (Section 2.3) are obtained, for fully automatic and manual routing queries respectively.

(b) Ad Hoc Queries: collection B is now re-processed as an addition to collection A, forming a total collection c(AB) and accumulating their total term usage statistics as well as a new dictionary. The ad hoc topics are processed to form ad hoc queries q(B) against the whole c(AB). We did not perform
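The key constraint in the routing setup above is that test documents from c(B) are weighted using only the training collection's statistics, so terms outside c(A)'s dictionary simply drop out. A minimal sketch of that idea, assuming ICTF is taken as ln(N_w / F_k) and scoring is a plain dot product (function names and the scoring form are illustrative, not PIRCS internals):

```python
from math import log

def ictf_weights(doc_tf, collection_freq, collection_len):
    """Weight a new document's terms using ONLY the training collection
    c(A)'s statistics, as in the routing simulation.

    doc_tf          : {term: within-document frequency} for a c(B) document
    collection_freq : {term: F_k} collected from c(A)
    collection_len  : N_w, total term occurrences in c(A)

    Terms absent from c(A)'s dictionary receive no weight.
    """
    weights = {}
    for term, tf in doc_tf.items():
        F_k = collection_freq.get(term)
        if F_k is None:          # term unseen in c(A): not in the dictionary
            continue
        weights[term] = tf * log(collection_len / F_k)
    return weights

def rsv(query_weights, doc_weights):
    """Retrieval status value as a dot product over shared terms (sketch)."""
    return sum(w * doc_weights.get(t, 0.0) for t, w in query_weights.items())
```

Ranking c(B) documents against a trained routing query q(A) then amounts to computing `rsv` for each incoming document and sorting; a term like "zzz" that never occurred in c(A) contributes nothing to the score.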