NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
TIPSTER Panel -- The University of Massachusetts TIPSTER Project
W. B. Croft
National Institute of Standards and Technology
Donna K. Harman
                              Average Precision (50 topics)
Query Type          5 docs           30 docs          200 docs
T+D+C+F+phrase      .64              .52              .35
T+D+C+F             .62 (-3.1%)      .52 (0%)         .35 (0%)
1+N                 .60 (-6.7%)      .50 (-3.8%)      .34 (-2.8%)
T+C+phrase          .66 (+3.1%)      .53 (+1.9%)      .36 (+2.8%)
1+man               .65 (+1.6%)      .56 (+7.7%)      .36 (+2.8%)
1+man+para          .72 (+12.5%)     .61 (+17.3%)     .39 (+10.3%)
Table 1: TIPSTER Retrieval Results: Query types refer to topic fields used. T is topic, D
is description, C is concepts, F is factors, N is narrative; phrase means phrase constructs
were used, 1 refers to the baseline (the first line), man means manual modification, and
para means paragraph retrieval.
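The percentage changes in Table 1 are relative to the first-line baseline and can be recomputed directly from the precision values. The short Python sketch below (not part of the original paper) does this; small disagreements with the printed percentages would reflect rounding in the published precision figures.

```python
# Precision values from Table 1, at 5, 30, and 200 documents retrieved.
results = {
    "T+D+C+F+phrase": (0.64, 0.52, 0.35),   # baseline (first line of the table)
    "T+D+C+F":        (0.62, 0.52, 0.35),
    "1+N":            (0.60, 0.50, 0.34),
    "T+C+phrase":     (0.66, 0.53, 0.36),
    "1+man":          (0.65, 0.56, 0.36),
    "1+man+para":     (0.72, 0.61, 0.39),
}

def pct_change(value, base):
    """Relative change versus the baseline, as a percentage."""
    return 100.0 * (value - base) / base

baseline = results["T+D+C+F+phrase"]
for query_type, values in results.items():
    deltas = [pct_change(v, b) for v, b in zip(values, baseline)]
    print(query_type, ["%+.1f%%" % d for d in deltas])
```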
These results support two main conclusions: first, the effectiveness of the retrieval
techniques is surprisingly good considering the difficulty of the queries; second,
paragraph-level retrieval, as simulated by manual creation of "unordered window"
queries, significantly improves effectiveness. Much of the short-term development of the
inference net retrieval system will concentrate on techniques to accomplish paragraph-level
retrieval automatically. The major question raised by the results concerns the effectiveness
of phrases. In previous experiments with medium-sized full-text collections, phrase-based
retrieval led to significant effectiveness improvements. This is not evident in the results
shown here. A possible explanation for this is the size of the TIPSTER topics, where
queries may have more than 50 terms, but it should also be remembered that these results
are very preliminary.
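As an illustration of the idea (a hypothetical sketch, not the UMass implementation), an "unordered window" constraint can be checked with a few lines of code: all query terms must co-occur within a fixed-width window of tokens, in any order — roughly the condition the manually created queries simulated at paragraph granularity.

```python
def unordered_window_match(tokens, terms, window):
    """True if all query terms co-occur within `window` consecutive tokens, in any order."""
    terms = set(terms)
    # Slide a window of the given width over the token stream.
    for start in range(max(1, len(tokens) - window + 1)):
        if terms <= set(tokens[start:start + window]):
            return True
    return False
```

For example, a paragraph-sized window (a few dozen tokens) would accept terms scattered across neighboring sentences, while a small window approximates a phrase constraint.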
The routing experiments used 20 "old" topics to search the "new" database (approximately
1 GByte of text from the same sources as the "old" database, with the exception of
DOE abstracts). Since the aim of these experiments was to study techniques for represent-
ing and using long-term information needs, we assumed that users would be more involved
in query formulation and thus the baseline used was the "1+man+para" queries. The other
query types in this experiment used variations of relevance feedback to modify the baseline
queries. These modifications consist of adding concepts to the query and reweighting the
query concepts based on their frequency of occurrence in the identified relevant documents.
For this experiment, we had a small number of relevance judgements based on documents
retrieved by another system. The techniques used to select concepts to add to the query
were based on local and global application of the EMIM measure of association [5]. The
number of terms added to a query was limited to 5.
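For concreteness, the term-selection step can be sketched as follows (a hypothetical implementation; the paper cites EMIM [5] but gives no code): each candidate term is scored by the expected mutual information between its occurrence and relevance over the judged documents, and the top-ranked terms are added to the query.

```python
import math

def emim(n_t, r_t, n_docs, n_rel):
    """EMIM between term occurrence and relevance, from document counts.

    n_t    -- judged documents containing the term
    r_t    -- relevant documents containing the term
    n_docs -- total judged documents
    n_rel  -- total relevant documents
    """
    score = 0.0
    # The four cells of the term-presence x relevance contingency table:
    # (joint count, term-marginal count, relevance-marginal count).
    cells = [
        (r_t, n_t, n_rel),                                               # present, relevant
        (n_t - r_t, n_t, n_docs - n_rel),                                # present, non-relevant
        (n_rel - r_t, n_docs - n_t, n_rel),                              # absent, relevant
        ((n_docs - n_t) - (n_rel - r_t), n_docs - n_t, n_docs - n_rel),  # absent, non-relevant
    ]
    for joint, term_marg, rel_marg in cells:
        if joint == 0 or term_marg == 0 or rel_marg == 0:
            continue  # zero cells contribute nothing
        p_joint = joint / n_docs
        expected = (term_marg / n_docs) * (rel_marg / n_docs)
        score += p_joint * math.log(p_joint / expected)
    return score

def expansion_terms(candidates, n_docs, n_rel, k=5):
    """Rank (term, n_t, r_t) candidates by EMIM and return the top k terms."""
    ranked = sorted(candidates, key=lambda c: emim(c[1], c[2], n_docs, n_rel),
                    reverse=True)
    return [name for name, _, _ in ranked[:k]]
```

A term whose occurrence is independent of relevance scores zero, while one concentrated in the relevant set scores high; capping the additions at five, as in the experiment, keeps the expanded query close to the user's formulation.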
The results show that, once again, the effectiveness levels are quite good (note the
50% precision value at the 200 document level). The relevance feedback techniques were
not effective, except at the high precision end of retrieval. The features selected were, on
inspection, reasonable, but they do not appear to be the features required by the narrative
in order to make a document relevant. No definite conclusions can be made about the