NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
TIPSTER Panel -- The University of Massachusetts TIPSTER Project chapter
W. B. Croft
Donna K. Harman, National Institute of Standards and Technology

                           Average Precision (50 topics)
  Query Type         5 docs           30 docs          200 docs
  T+D+C+F+phrase     .64              .52              .35
  T+D+C+F            .62 (-3.1%)      .52 (0%)         .35 (0%)
  1+N                .60 (-6.7%)      .50 (-3.8%)      .34 (-2.8%)
  T+C+phrase         .66 (+3.1%)      .53 (+1.9%)      .36 (+2.8%)
  1+man              .65 (+1.6%)      .56 (+7.7%)      .36 (+2.8%)
  1+man+para         .72 (+12.5%)     .61 (+17.3%)     .39 (+10.3%)

Table 1: TIPSTER Retrieval Results. Query types refer to topic fields used: T is topic, D is description, C is concepts, F is factors, N is narrative; phrase means phrase constructs were used; 1 refers to the baseline (the first line); man means manual modification; para means paragraph retrieval.

These results support two main conclusions: first, the effectiveness of the retrieval techniques is surprisingly good considering the difficulty of the queries; second, paragraph-level retrieval, as simulated by manual creation of "unordered window" queries, significantly improves effectiveness. Much of the short-term development of the inference net retrieval system will concentrate on techniques to accomplish paragraph-level retrieval automatically.

The major question raised by the results concerns the effectiveness of phrases. In previous experiments with medium-sized full-text collections, phrase-based retrieval led to significant effectiveness improvements. This is not evident in the results shown here. A possible explanation for this is the size of the TIPSTER topics, where queries may have more than 50 terms, but it should also be remembered that these results are very preliminary.

The routing experiments used 20 "old" topics to search the "new" database (approximately 1 GByte of text from the same sources as the "old" database, with the exception of DOE abstracts).
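Paragraph-level retrieval of the kind simulated here can be illustrated by scoring each document by its best-matching paragraph rather than by the document as a whole. The sketch below is only illustrative: it uses a simple query-term overlap score as a stand-in for the inference net belief score, and the function names (`best_paragraph_score`, `rank_by_best_paragraph`) are invented for the example, not taken from the system described.

```python
def best_paragraph_score(document_text, query_terms):
    """Score a document by its best-matching paragraph: the fraction of
    query terms that appear together in any single paragraph.  This crude
    overlap score stands in for the inference net's belief computation."""
    paragraphs = [p for p in document_text.split("\n\n") if p.strip()]
    query = [t.lower() for t in query_terms]
    best = 0.0
    for paragraph in paragraphs:
        words = set(paragraph.lower().split())
        best = max(best, sum(t in words for t in query) / len(query))
    return best

def rank_by_best_paragraph(documents, query_terms):
    """Rank documents by their best single paragraph, so that a document
    matching all query terms in one passage beats a document whose
    matches are scattered across distant paragraphs."""
    return sorted(documents,
                  key=lambda d: best_paragraph_score(d, query_terms),
                  reverse=True)
```

This captures the intuition behind the "unordered window" queries: co-occurrence of query terms within a bounded span of text is stronger evidence of relevance than co-occurrence anywhere in a long document.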
Since the aim of these experiments was to study techniques for representing and using long-term information needs, we assumed that users would be more involved in query formulation, and thus the baseline used was the "1+man+para" queries. The other query types in this experiment used variations of relevance feedback to modify the baseline queries. These modifications consist of adding concepts to the query and reweighting the query concepts based on their frequency of occurrence in the identified relevant documents. For this experiment, we had a small number of relevance judgements based on documents retrieved by another system. The techniques used to select concepts to add to the query were based on local and global application of the EMIM measure of association [5]. The number of terms added to a query was limited to 5.

The results show that, once again, the effectiveness levels are quite good (note the 50% precision value at the 200 document level). The relevance feedback techniques were not effective, except at the high-precision end of retrieval. The features selected were, on inspection, reasonable, but they do not appear to be the features required by the narrative in order to make a document relevant. No definite conclusions can be made about the
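The EMIM-based expansion step described above can be sketched roughly as follows. This is an illustrative implementation over binary term-presence counts, assuming the expected mutual information between a term's presence and a document's relevance; it is not the authors' code, and the names (`emim`, `expand_query`) are invented for the example.

```python
import math

def emim(term, relevant_docs, all_docs):
    """Expected mutual information between term presence and relevance:
    sum over the four (present/absent, relevant/non-relevant) events of
    P(event) * log(P(event) / (P(term side) * P(relevance side)))."""
    N = len(all_docs)                                    # collection size
    R = len(relevant_docs)                               # judged relevant
    n_t = sum(1 for d in all_docs if term in d)          # docs with term
    r_t = sum(1 for d in relevant_docs if term in d)     # relevant docs with term
    # (joint count, marginal term-side count, marginal relevance-side count)
    cells = [(r_t, n_t, R),
             (R - r_t, N - n_t, R),
             (n_t - r_t, n_t, N - R),
             ((N - R) - (n_t - r_t), N - n_t, N - R)]
    score = 0.0
    for joint, term_count, rel_count in cells:
        if joint == 0:
            continue  # zero cells contribute nothing (0 * log 0 -> 0)
        score += (joint / N) * math.log((joint * N) / (term_count * rel_count))
    return score

def expand_query(query_terms, relevant_docs, all_docs, k=5):
    """Add the k terms from the relevant documents with the highest EMIM
    association with relevance (k=5 matches the limit used above)."""
    candidates = {t for d in relevant_docs for t in d} - set(query_terms)
    ranked = sorted(candidates,
                    key=lambda t: emim(t, relevant_docs, all_docs),
                    reverse=True)
    return list(query_terms) + ranked[:k]
```

Here each document is modelled simply as a set of terms; a term that occurs in most relevant documents but few others receives a high EMIM score and is added to the query.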