NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)

UCLA-Okapi at TREC-2: Query Expansion Experiments
E. Efthimiadis, P. Biron
Edited by D. K. Harman, National Institute of Standards and Technology

This process resulted in one-word query terms. When appropriate, the procedure also output phrases by treating the punctuation available in these fields as the phrase delimiter. Queries were then generated automatically from the Title and Concepts fields. Exactly the same queries were used in the Routing and Ad hoc searches.

3.4 Term selection for query expansion
a) Routing searches: Query expansion in the routing searches was performed through query modification without relevance information. As indicated in the table in the methodology section that describes the construction of the runs, the number of top-ranked documents used ranged from 0 to 20, in increments of 5. These top-ranked documents were treated as relevant and analyzed to provide terms for the expansion. Expansion terms were selected by pooling all the terms and then weighting them with one of the five ranking algorithms, as specified by the run. The top 10, 20 or 30 terms were then added to the original query terms and the expanded query was searched.
b) Ad hoc searches: The term pool consisted of all the terms of the documents judged relevant. For the official Ad hoc run with relevance feedback, the top 10 terms as ranked by wpq were chosen for expansion and searched together with the initial query terms.
c) Rules for term selection: The following rules governed the inclusion or exclusion of a term during selection for query expansion: (i) numbers were excluded as terms; (ii) all terms whose frequency (n) does not exceed the number of relevant documents seen (R), i.e., n <= R, were excluded.

3.5 Search procedure
All searches, Routing and Ad hoc, were automatic and determined by the specifications made for each run.
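Because each automatic run is fully determined by its specification, the term-selection procedure of section 3.4 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Okapi implementation. The names RunSpec, wpq, select_expansion_terms and expand_query are hypothetical. Documents are assumed to be whitespace-tokenized strings. n is taken to be the number of documents in a collection of size N that contain a term, and r the number of (pseudo-)relevant documents that contain it. wpq is assumed to take Robertson's form: the relevance weight w multiplied by (p - q). Only wpq is shown; any of the other ranking algorithms would replace the scoring function.

```python
import math
from dataclasses import dataclass


@dataclass
class RunSpec:
    """Hypothetical run specification (an assumption, not Okapi's format)."""
    top_docs: int  # documents treated as relevant: 0, 5, 10, 15 or 20
    n_terms: int   # expansion terms to add: 10, 20 or 30


def wpq(r: int, n: int, R: int, N: int) -> float:
    """Assumed Robertson-style wpq: relevance weight w times (p - q)."""
    w = math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                 ((n - r + 0.5) * (R - r + 0.5)))
    p = r / R            # share of relevant documents containing the term
    q = (n - r) / (N - R)  # share of non-relevant documents containing it
    return w * (p - q)


def select_expansion_terms(relevant_docs, doc_freq, N, n_terms):
    """Pool the terms of the relevant documents, apply the exclusion
    rules, rank the survivors by wpq and return the top n_terms."""
    R = len(relevant_docs)
    if R == 0:
        return []  # nothing to analyse, so no expansion
    r_counts = {}  # term -> number of relevant documents containing it
    for doc in relevant_docs:
        for term in set(doc.split()):
            r_counts[term] = r_counts.get(term, 0) + 1
    scored = []
    for term, r in r_counts.items():
        if term.isdigit():  # rule (i): numbers are not used as terms
            continue
        n = doc_freq[term]
        if n <= R:  # rule (ii): exclude terms with n <= R
            continue
        scored.append((wpq(r, n, R, N), term))
    # Highest weight first; ties broken alphabetically for determinism.
    scored.sort(key=lambda st: (-st[0], st[1]))
    return [term for _, term in scored[:n_terms]]


def expand_query(query_terms, ranked_docs, doc_freq, N, spec: RunSpec):
    """Pseudo-relevance feedback: treat the top spec.top_docs documents
    as relevant and append the selected terms to the original query."""
    pseudo_relevant = ranked_docs[:spec.top_docs]
    extra = select_expansion_terms(pseudo_relevant, doc_freq, N, spec.n_terms)
    # Terms already in the query are not added a second time.
    return list(query_terms) + [t for t in extra if t not in query_terms]
```

With top_docs=0, no documents are treated as relevant and the original query is returned unchanged, which corresponds to the 0-document setting of the routing runs. Because terms already present in the query are not duplicated, fewer than n_terms new terms may be appended.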
There were no manual searches.

3.5.1 Ad hoc searches and searchers
For the Ad hoc searches with relevance feedback, i.e. uclaf1 (official results), relevance assessments were provided by two searchers: the odd-numbered topics were assessed by one searcher and the even-numbered topics by the other.

3.5.2 Relevance assessments
During the Ad hoc searches, the guidelines for relevance judgements were:
a) review the entire document when judging relevance, even if it seems peripheral or not relevant, because many of the articles were found to be collections of brief news stories, with the relevant part hidden in the middle or at the end of the text.
b) aim for 10 relevant documents: stop as soon as 10 are found or at the 20th document. However, if 3 relevant documents have not been found by then, continue until 3 are found (Okapi will not perform query expansion with fewer than 3 relevant documents).

3.5.3 Ad hoc additional runs
Following the TREC conference, a further set of runs was conducted on the Ad hoc queries to complete the evaluation of the five term-ranking algorithms for query expansion under study. The relevance judgements made in the official Ad hoc run uclaf1 (fdbk.bm15.phb.qey:wpq-10-10.uclagsly) were extracted and used in the subsequent runs. The process followed in these additional runs is described below:
* Four new Ad hoc runs were carried out, one for each of the remaining algorithms used to rank terms for query expansion, i.e., emim, porter, r_hilo and r_lohi.
* The same automatically generated initial query was used for all searches.
* The relevance judgements made on the initially retrieved set of the official Ad hoc run were extracted and then simulated in the additional runs.
* Query expansion terms were ranked using the algorithm designated by each run, and the 10 top-ranked terms from the pool were added to the query.
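Guideline b) of section 3.5.2 amounts to a stopping rule, and a minimal Python sketch makes its interaction of thresholds explicit. The function name documents_to_judge and its parameter names are hypothetical. The defaults (10 relevant documents, the 20th document, a minimum of 3) are the values stated in the guideline. The judgements are assumed to be a rank-ordered sequence of booleans.

```python
def documents_to_judge(judgements, target=10, cutoff=20, minimum=3):
    """Apply the stopping rule of guideline b) to rank-ordered judgements
    (True = relevant). Returns (documents examined, relevant found)."""
    seen = found = 0
    for is_relevant in judgements:
        seen += 1
        if is_relevant:
            found += 1
        if found >= target:
            break  # stop as soon as 10 relevant documents are found
        if seen >= cutoff and found >= minimum:
            break  # stop at the 20th document once at least 3 are found
    # If fewer than `minimum` were found, the list was exhausted and
    # expansion would not be possible.
    return seen, found
```

Because the function consumes a recorded sequence of judgements, the same sequence can be replayed unchanged for each ranking algorithm. This mirrors how the judgements from the official run were reused in the additional runs of section 3.5.3.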
3.6 Problems & Limitations
Lack of equipment was a major problem for our participation. To enable us to participate in TREC, SUN Microsystems provided an equipment grant (a SUN Sparc-2) in March. However, no disk was initially available, and a 1-gigabyte disk was acquired only in June. Consequently, only the Ad hoc runs were included in the official results.