NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Compression, Fast Indexing, and Structured Queries on a Gigabyte of Text
A. Kent, A. Moffat, R. Sacks-Davis, R. Wilkinson, J. Zobel
National Institute of Standards and Technology
Donna K. Harman

[Table 5, partly legible. Algorithms: Concepts; Description; Description with Pairs. Rows: Docs. Returned; Disjuncts. Surviving value pairs: 1221 with 4.01, and 1063 with 1.94.]

Table 5: Further comparison of Boolean query algorithms

From these tables we are able to see that the least costly algorithm is the one based on finding two key descriptors. If pairs are added, there are more rare terms, each of which contributes a single disjunct.

3.2 Ranking

Ranking documents using the vector space model has usually treated both documents and queries as flat structures, that is, as lists of words. However, it is very simple to combine different representations of a query in ranking. If each form of a query is represented as a unit vector over the same vector space, we may combine the representations by vector addition. We are thus able to combine structural information and statistical information in the same measure.

The first series of ranking experiments was used to determine the relative usefulness of some of the query fields. The results are given in Table 6. The third Boolean query generation algorithm was used to generate a set of approximately 1,000 documents for each query. These were ranked using each field successively, and then combined, ignoring structure. In all these experiments we give the results first in terms of recall and precision, and then in terms of precision against the number of documents viewed.

Let Vt stand for the title vector, Vd for the description vector, Vn for the narrative vector, Vc for the concepts vector, Vf for the definition vector, Va for the vector for all of the text, and Vp for the vector of all adjacent pairs in the query.

Recall  10%    20%    30%    40%    50%    60%    70%    80%    90%    100%   Av.
Vd      0.197  0.114  0.032  0.014  0.015  0.012  0.000  0.000  0.000  0.000  0.038
Vn      0.192  0.112  0.034  0.014  0.015  0.012  0.000  0.000  0.000  0.000  0.038
Va      0.191  0.114  0.034  0.018  0.015  0.012  0.000  0.000  0.000  0.000  0.038

Table 6: Ranking using individual fields

Table 7 shows what happens if each of the five fields is added with the same weight, and then what happens if adjacent pairs are added as well. By way of comparison, we show the effect of adding pairs to the full text vector, and of adding pairs to the narrative vector.

[Legend to Table 7, partly legible: V3 = Va; the remaining vectors among V1 to V6 are defined from the sums Vt+Vd+Vn+Vf+Vc, Vt+Vd+Vn+Vf+Vc+Vp, Va+Vp, and Vn+Vp.]
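The combination by vector addition described in Section 3.2 can be illustrated with a small sketch. It is not the implementation used in these experiments. It assumes raw term-frequency weights, cosine similarity as the ranking measure, and adjacent pairs indexed as extra terms on the document side as well. All function names, the toy query fields, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def unit_vector(terms):
    """Build a sparse term-frequency vector and scale it to unit length.
    (Assumption: raw term counts; the paper's weighting is not shown here.)"""
    counts = Counter(terms)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {t: c / norm for t, c in counts.items()}


def add_vectors(*vectors):
    """Combine query representations by vector addition with equal weights."""
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return total


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def adjacent_pairs(words):
    """Treat each adjacent word pair as a single extra term."""
    return [f"{a}_{b}" for a, b in zip(words, words[1:])]


def rank(query_vec, docs):
    """Return document ids ordered by decreasing similarity to the query.
    (Assumption: documents are also indexed by their adjacent pairs.)"""
    scored = []
    for doc_id, words in docs.items():
        doc_vec = unit_vector(words + adjacent_pairs(words))
        scored.append((cosine(query_vec, doc_vec), doc_id))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in scored]


# Hypothetical query fields: a title and a description.
title = "text compression".split()
description = "fast indexing of compressed text".split()

v_t = unit_vector(title)        # plays the role of Vt
v_d = unit_vector(description)  # plays the role of Vd
v_p = unit_vector(adjacent_pairs(title) + adjacent_pairs(description))  # Vp

query = add_vectors(v_t, v_d, v_p)  # Vt + Vd + Vp

# Hypothetical candidate set, standing in for the Boolean query output.
docs = {
    "d1": "text compression improves indexing".split(),
    "d2": "boolean queries on text".split(),
    "d3": "weather report".split(),
}
ranking = rank(query, docs)
print(ranking)
```

Because each field is normalised before the addition, a long narrative does not swamp a short title. A term that occurs in several fields, such as "text" here, accumulates weight from each of them.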