NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Okapi at TREC
S. Robertson, S. Walker, M. Hancock-Beaulieu, A. Gull, M. Lau
National Institute of Standards and Technology: Donna K. Harman

Clearly one would in general expect end-users to have more domain or subject knowledge, especially for the kinds of queries provided for TREC. Highly interactive systems in general, and Okapi in particular, may be assumed to exploit such subject knowledge; clearly, relevance feedback in ad-hoc searching can only work well if it is relatively easy for the user to find some relevant items from the initial search. In this sense, we see the present experiment as to some degree unfavourable to Okapi.

5.3.2 Searching

Searchers were expected to make whatever interpretations of the topic they deemed appropriate for the purpose of searching. In other words, they could use words or phrases taken from any part of the topic, or from their own general or specific knowledge. They could also have used other reference sources. However, they were encouraged to use the system to help them refine the search, in the way that an end-user might explore the possibilities within the system and try out different combinations of search terms. The combination of these ideas with the TREC rules was a little clumsy and artificial. The procedure was as follows:

(a) The searcher was given the topic in full, as received by us.
(b) The searcher examined the topic and chose some terms as candidates for searching (possibly including terms not in the topic as received).
(c) The searcher made exploratory searches, examining the results, making tentative relevance judgements and perhaps using the semi-automatic query expansion facility (see section 3.3) to suggest new terms.
(d) Having decided on an initial formulation, the searcher then finished the exploratory session and started the definitive session.
(e) The definitive session involved two stages: an initial search and a first-iteration feedback search. The initial search was strictly in accordance with the selected initial formulation; the searcher examined the top few documents, making relevance judgements.
(f) The first-iteration feedback was purely automatic from the relevance judgements, including re-weighting and automatic expansion. No further iterations were conducted.

The guidelines to the searchers included the following:

Time: Searchers were asked to allow very roughly 30 minutes per topic. In fact, the average was nearer 50 minutes.

Feedback: The guidance was to assess about the first 20 documents retrieved by the initial search, or to stop after finding about 8 relevant documents (if that was sooner).

Relevance: If it seemed difficult to find any relevant items, searchers were encouraged to make generous relevance judgements, so as to ensure that there was some basis for feedback (see also section 6.2 below).

5.3.3 Remarks on the system

The bias in favour of initial formulation terms in the relevance feedback formula was 2 out of 3 (i.e. 3 supposed relevant documents, of which 2 were supposed to contain the term). Searchers were able to use the Boolean facility described in section 3.2, for example to treat an expression such as (A and B) as if it were a single term, to be weighted like any other. However, the emphasis was on the usual (in the Okapi context) weighted searching of single terms, and this facility was used only occasionally, and only as part of larger best-match searches. In other words, this use did not compromise the characteristic of weighted searching as truly "best match", with all the flexibility that that implies.
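The chapter does not restate the feedback formula at this point. Assuming the Robertson/Sparck-Jones relevance weight associated with Okapi, one plausible reading of the "2 out of 3" bias is that an initial-formulation term is credited with three extra supposed relevant documents, two of which are supposed to contain it. A minimal sketch under that assumption (function names are illustrative, not from the paper):

```python
import math

def rsj_weight(r, R, n, N):
    # Robertson/Sparck-Jones relevance weight with the usual 0.5
    # point estimates: r = judged-relevant documents containing the
    # term, R = judged-relevant documents, n = documents containing
    # the term, N = documents in the collection.
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

def feedback_weight(r, R, n, N, is_initial_term):
    # Assumed reading of the "2 out of 3" bias: an initial-formulation
    # term is treated as if 3 extra relevant documents had been judged,
    # 2 of them containing the term, which raises its weight relative
    # to a purely feedback-derived term with the same statistics.
    if is_initial_term:
        r, R = r + 2, R + 3
    return rsj_weight(r, R, n, N)
```

With, say, one occurrence in three judged-relevant documents, the biased weight for an initial-formulation term comes out higher than for an expansion term with identical statistics, which is the intended effect of the bias.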
5.3.4 Choice of terms

The terms chosen by the searchers may be briefly characterized by the following statistics:

  Average number of terms         12.9
  Terms appearing in the topic    10.5 (81%)
  Terms appearing in different fields:
    Description                   3A
    Narrative                     6.0
    Concept                       7.5
    Others                        2.9

(These add up to more than the total because a term may occur in more than one field.) For comparison, the Concept field has around 19-20 terms on average.
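The 81% figure is simply the ratio of the two averages in the table; a quick arithmetic check, taking the figures above as given:

```python
# Terms taken from the topic as a share of all terms chosen,
# using the averages reported in section 5.3.4.
avg_terms_total = 12.9   # average number of terms per search
avg_terms_topic = 10.5   # average number appearing in the topic

share_pct = round(100 * avg_terms_topic / avg_terms_total)
print(share_pct)  # prints 81
```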