SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
TREC-2 Routing and Ad-Hoc Retrieval Evaluation using the INQUERY System
W. Croft
J. Callan
J. Broglio
National Institute of Standards and Technology
D. K. Harman
4 The TREC Experiments
Four experiments were submitted to the TREC evaluation, two "ad-hoc" and two "routing".
In these experiments, we emphasized automatic query processing and automatic feedback
algorithms for routing. The following is a summary:
• Ad-hoc: topics 101-150 against TIPSTER volumes 1 and 2.
INQ001: Created automatically from TIPSTER topics. Contains phrases. Details of
the query processing used are described below.
INQ002: INQ001 queries, modified manually. Modifications were restricted to
eliminating words and phrases, and to adding paragraph-level operators around
existing words and phrases. This was done somewhat differently than at last
year's TREC conference, as discussed below.
• Routing: topics 51-100 against TIPSTER volume 3.
INQ003: Created automatically from TIPSTER topics and relevance judgements
from Volumes 1 and 2. Baseline queries (from a previous TIPSTER evaluation)
were modified by reweighting and adding single-word terms. The term weighting
and selection function used was df.idf, as described in [5]. Only the top 120
relevant documents found by INQUERY were used for feedback, and 30 terms were
added to each query.
INQ004: Formed by combining (using the #SUM operator) INQ001 queries and
INQRYP queries (used in the TIPSTER 18-month evaluation). The INQRYP queries
were produced automatically and then modified manually. Modifications were
restricted to eliminating words and phrases, and adding paragraph-level
operators around existing words and phrases.
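The routing feedback step described above (score candidate terms from known relevant documents, keep the top-ranked terms) can be sketched as follows. The exact df.idf formula from [5] is not reproduced here, so the scoring function and all names below are illustrative assumptions, not the INQUERY implementation:

```python
import math
from collections import Counter

def select_feedback_terms(relevant_docs, total_docs, collection_df, k=30):
    """Pick the top-k expansion terms from known relevant documents.

    Terms are scored with a df.idf-style weight (an assumption, not the
    exact function from [5]): df is the number of relevant documents
    containing the term, idf is log(N / collection document frequency).
    """
    df = Counter()
    for doc in relevant_docs:
        df.update(set(doc.split()))  # count each term once per document
    scores = {
        t: df[t] * math.log(total_docs / collection_df.get(t, 1))
        for t in df
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy example: two "relevant" documents and made-up collection statistics.
docs = ["oil spill cleanup crew", "oil spill tanker accident"]
cdf = {"oil": 50, "spill": 40, "cleanup": 10, "crew": 200,
       "tanker": 30, "accident": 120}
terms = select_feedback_terms(docs, total_docs=1000, collection_df=cdf, k=3)
print(terms)
```

In the actual experiment the analogue of `relevant_docs` was the top 120 relevant documents found by INQUERY, and k was 30.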
Query Type                      Average Precision
              5 Docs         30 Docs        100 Docs       11-Pt Avg
INQ001        .62            .57            .49            .36
INQ002        .60 (-2.6%)    .59 (+3.5%)    .51 (+4.1%)    .36 (0%)

Table 1: Results for ad-hoc queries
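The columns of Table 1 report precision after 5, 30, and 100 retrieved documents, and the 11-point interpolated average precision. A minimal sketch of how these standard measures are computed from a ranked list of relevance judgements (function names are ours, not INQUERY's):

```python
def precision_at(ranked_rel, k):
    """Precision after k retrieved documents: the fraction of the
    top k that are relevant. ranked_rel is a list of 0/1 flags in
    rank order."""
    return sum(ranked_rel[:k]) / k

def eleven_point_avg(ranked_rel, num_relevant):
    """11-point average precision: interpolated precision at recall
    levels 0.0, 0.1, ..., 1.0, averaged. Interpolated precision at
    recall r is the maximum precision at any recall >= r."""
    hits, points = 0, []
    for i, rel in enumerate(ranked_rel, start=1):
        hits += rel
        points.append((hits / num_relevant, hits / i))  # (recall, precision)
    interp = []
    for r in [x / 10 for x in range(11)]:
        ps = [p for rec, p in points if rec >= r]
        interp.append(max(ps) if ps else 0.0)
    return sum(interp) / 11

# Toy ranked list: relevant at ranks 1, 2, and 4; 3 relevant total.
ranked = [1, 1, 0, 1, 0]
p5 = precision_at(ranked, 5)
avg = eleven_point_avg(ranked, num_relevant=3)
```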
Table 1 gives the results for the ad-hoc queries. These show that there is little difference
in effectiveness between the automatically processed queries and the semi-automatically
processed queries. The query processing for the automatic queries was significantly
improved, as described in the previous section, but there is another effect. Compared
to the manual query run in the last TREC conference, paragraph-level concepts were
formed in a much more mechanistic way and were constrained to the language of the
description and the narrative. In the previous conference, the only constraint was the
vocabulary used in the queries, and the user's "world knowledge" was used to group concepts.