SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Compression, Fast Indexing, and Structured Queries on a Gigabyte of Text
chapter
A. Kent
A. Moffat
R. Sacks-Davis
R. Wilkinson
J. Zobel
National Institute of Standards and Technology
Donna K. Harman
Description with Pairs
Algorithm Concepts Description
Docs. Returned 1221
Disjuncts 4.01 1063
1.94
Table 5: Further comparison of Boolean query algorithms
From these tables we can see that the least costly algorithm is the one based on finding two
key descriptors. If pairs are added, there are more rare terms, each of which contributes a single
disjunct.
3.2 Ranking
Ranking documents using the vector space model has usually treated both documents and
queries as flat structures, that is, lists of words. However, it is very simple to combine different
representations of a query in ranking. If each form of a query is represented as a unit vector
over the same vector space, we may combine the representations by vector addition. We are
thus able to combine structural information and statistical information in the same measure.
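
The combination by vector addition described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the raw term-frequency weighting, the whitespace tokenizer, and the names `unit_vector`, `add_vectors` and `cosine` are all assumptions introduced here, since the text does not specify them.

```python
import math
from collections import defaultdict


def unit_vector(text):
    """Build a term vector from text and scale it to unit Euclidean length.

    Assumption: weights are raw term frequencies; the paper does not say
    which weighting scheme was used.
    """
    counts = defaultdict(float)
    for term in text.lower().split():
        counts[term] += 1.0
    norm = math.sqrt(sum(w * w for w in counts.values()))
    if norm == 0.0:
        return {}
    return {term: w / norm for term, w in counts.items()}


def add_vectors(*vectors):
    """Combine several representations of a query by vector addition."""
    combined = defaultdict(float)
    for vector in vectors:
        for term, weight in vector.items():
            combined[term] += weight
    return dict(combined)


def cosine(query, document):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * document.get(term, 0.0) for term, w in query.items())
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    norm_d = math.sqrt(sum(w * w for w in document.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0
    return dot / (norm_q * norm_d)


# Hypothetical field texts, used only for illustration.
title = unit_vector("text compression")
description = unit_vector("compression of indexes")
query = add_vectors(title, description)
```

Because every field is normalized before the addition, a long field cannot dominate a short one merely by containing more words, while a term that occurs in several fields (here "compression") accumulates weight from each of them.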
The first series of ranking experiments was used to determine the relative usefulness of some
of the query fields. The results are given in Table 6. The third Boolean query generation
algorithm was used to generate a set of approximately 1,000 documents for each query. These
were ranked using each field successively, and then combined, ignoring structure. In all these
experiments we give the results first in terms of recall and precision, and then in terms of
precision as a function of the number of documents viewed.
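
The two ways of reporting results can be illustrated with a short sketch. It assumes, as a simplification not stated in the text, that precision at a recall level is taken at the first rank where that recall is reached, and is 0 if the ranking never reaches it. The function names are hypothetical.

```python
def precision_at_recall(ranking, relevant, percent_levels):
    """Precision at the first rank where each recall level is reached.

    ranking: list of document ids, best first.
    relevant: set of all relevant document ids for the query.
    percent_levels: recall levels as integer percentages, e.g. [10, 20].
    A level that is never reached gets precision 0.0 (an assumption).
    """
    total = len(relevant)
    result = {}
    for level in percent_levels:
        result[level] = 0.0
        found = 0
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant:
                found += 1
                # Integer comparison avoids floating-point rounding issues.
                if total > 0 and found * 100 >= level * total:
                    result[level] = found / rank
                    break
    return result


def precision_at_k(ranking, relevant, k):
    """Fraction of the first k viewed documents that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


# Hypothetical ranking: two of the four relevant documents are retrieved.
ranking = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "y", "z"}
by_recall = precision_at_recall(ranking, relevant, [25, 50, 75])
at_five = precision_at_k(ranking, relevant, 5)
```

Under this convention, a candidate set that misses some relevant documents necessarily scores zero at the highest recall levels.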
Let Vt stand for the title vector, Vd for the description vector, Vn for the
narrative vector, Vc for the concepts vector, Vf for the definition vector, Va
for the vector for all of the text, and Vp for the vector of all adjacent pairs in the query.
Recall  10%    20%    30%    40%    50%    60%    70%    80%    90%    100%   Av.
Vd      0.197  0.114  0.032  0.014  0.015  0.012  0.000  0.000  0.000  0.000  0.038
Vn      0.192  0.112  0.034  0.014  0.015  0.012  0.000  0.000  0.000  0.000  0.038
Va      0.191  0.114  0.034  0.018  0.015  0.012  0.000  0.000  0.000  0.000  0.038
Table 6: Ranking using individual fields
Table 7 shows what happens if the five fields are added with the same weight, and
then what happens if adjacent pairs are added as well. By way of comparison, we show the
effect of adding pairs to the full text vector, and of adding pairs to the narrative vector.
V1 = Vt + Vd + Vn + Vf + Vc
V2 = Vt + Vd + Vn + Vf + Vc + Vp
V3 = Va
V4 = Va + Vp
V5 = Vn
V6 = Vn + Vp