constructed automatically using the training topics, the training relevance judgments and the training documents. The queries should then be submitted to NIST before the test documents are released and should not be modified after that point. The unmodified queries should be run against the test documents and the results submitted to NIST.

2. MANUAL (manual initial query construction)

adhoc queries -- The query is constructed in some manner from the topic, either manually or using machine assistance. Once the query has been constructed, it will be submitted to the system (with no manual intervention), and the results from the system will be the results submitted to NIST. There should be no manual intervention after initial query construction that would affect the results. (Manual intervention is covered by the category labelled FEEDBACK.)

routing queries -- The queries should be constructed in the same manner as the adhoc queries for MANUAL, but using the training topics, relevance judgments, and training documents. They should then be submitted to NIST before the test documents are released and should not be modified after that point. The unmodified queries should be run against the test documents and the results submitted to NIST.

3. FEEDBACK (automatic or manual query construction with feedback)

adhoc queries -- The initial query can be constructed using either AUTOMATIC or MANUAL methods. The query is submitted to the system, and a subset of the retrieved documents is used for manual feedback, i.e., a human makes judgments about the relevance of the documents in this subset. These judgments may be communicated to the system, which may automatically modify the query, or the human may simply choose to modify the query himself. At some point, feedback should end, and the query should be accepted as final. Systems that submit runs using this method must submit several different sets of results to allow tracking of the time/cost benefit of doing relevance feedback. (A sketch of one such feedback round follows this list.)

routing queries -- FEEDBACK cannot be used for routing queries, as routing systems have not supported feedback.
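TREC-2 did not prescribe a particular feedback method; each system chose its own. As a purely illustrative sketch, the Python fragment below shows one round of Rocchio-style query modification, a technique commonly used for relevance feedback in systems of this period. The function name, parameter values, and toy term vectors are assumptions made for the example, not part of the TREC-2 protocol.

```python
from collections import Counter

def rocchio_update(query_vec, relevant_docs, nonrelevant_docs,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """One Rocchio round: q' = alpha*q + beta*mean(rel) - gamma*mean(nonrel)."""
    new_query = Counter({term: alpha * w for term, w in query_vec.items()})
    for docs, weight in ((relevant_docs, beta), (nonrelevant_docs, -gamma)):
        if not docs:
            continue
        for doc_vec in docs:
            for term, w in doc_vec.items():
                new_query[term] += weight * w / len(docs)
    # Terms whose weight falls to zero or below are dropped from the query.
    return {term: w for term, w in new_query.items() if w > 0}

# One feedback round: a human judges documents from the initial ranking,
# the query is modified automatically, and the search is rerun.
query = {"superconductor": 1.0, "application": 0.5}
judged_relevant = [{"superconductor": 0.8, "magnet": 0.6}]
judged_nonrelevant = [{"application": 0.9, "software": 0.7}]
query = rocchio_update(query, judged_relevant, judged_nonrelevant)
# When feedback ends, the final query is frozen and its ranking submitted.
```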
2.2 The Participants

There were 31 participating systems in TREC-2, using a wide range of retrieval techniques. The participants were able to choose from three levels of participation: Category A, full participation; Category B, full participation using a reduced dataset (1/4 of the full document set); and Category C, evaluation only (to allow commercial systems to protect proprietary algorithms). Because of limited conference time, the program committee selected only 20 Category A and B groups to present talks, and requested that the rest of the groups present posters. All groups were asked to submit papers for the proceedings.

Each group was provided the data and asked to turn in either one or two sets of results for each topic. When two sets of results were sent, they could be made using different methods of creating queries (AUTOMATIC, MANUAL, or FEEDBACK), or by using different parameter settings for one query creation method. Groups could choose to do the routing task, the adhoc task, or both, and were requested to submit the top 1000 documents retrieved for each topic for evaluation.

3. The Test Collection

3.1 Introduction

The creation of the test collection (called the TIPSTER collection) was critical to the success of TREC. Like most traditional retrieval collections, there are three distinct parts to this collection -- the documents, the queries or topics, and the relevance judgments or "right answers." These test collection components are discussed briefly in the rest of this section. For a more complete description of the collection, see [Harman 1994].

3.2 The Documents

The documents needed to mirror the different types of documents used in the theoretical TREC application. Specifically, they had to have a varied length, a varied writing style, a varied level of editing and a varied vocabulary. As a final requirement, the documents had to cover different timeframes to show the effects of document date on the routing task. The documents were distributed as CD-ROMs with about 1 gigabyte of data each, compressed to fit. The following shows the actual contents of each disk.

Disk 1

WSJ -- Wall Street Journal (1987, 1988, 1989)
AP -- AP Newswire (1989)