the centroid of all relevant documents for some of the
standard IR test collections (Med, CISI, Cranfield,
CACM, Time). In these cases, we found an average
improvement of 107% when the query was replaced
by the centroid of all relevant documents. The
improvement was 67% when the top three relevant
documents were used, and 33% when just the first
relevant document was used. The smaller advantages
observed in TREC-2 are partially due to statistical
artifacts, and partially to the TREC topics, which are
much richer need statements than the usual IR queries.
(We also examined topic and reldocs profiles in
TREC-1. Somewhat surprisingly, the query using just
the topic terms was about 25% more accurate than the
query using relevant documents from training. This is
attributable to the small number and inaccuracy of
relevance judgements in the initial training set for
TREC-1. This had substantial impact on performance
for some topics because our reldocs queries were
based only on the relevant articles and ignored the
original topic description.)
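Concretely, replacing a query with the relevant-document centroid in the reduced LSI space amounts to something like the following sketch in Python (our illustration, not the authors' code; doc_vectors is assumed to map document ids to their LSI vectors):

    import numpy as np

    def centroid_profile(doc_vectors, relevant_ids):
        # Profile built from known relevant documents: the mean
        # (centroid) of their LSI vectors, used in place of the query.
        rel = np.stack([doc_vectors[d] for d in relevant_ids])
        return rel.mean(axis=0)

    def rank_by_cosine(profile, doc_vectors):
        # Rank all documents by cosine similarity to the profile vector.
        p = profile / np.linalg.norm(profile)
        scores = {d: float(np.dot(p, v / np.linalg.norm(v)))
                  for d, v in doc_vectors.items()}
        return sorted(scores, key=scores.get, reverse=True)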
The lsir1 and lsir2 runs provide baselines against
which various combinations of query information and
relevant document information can be measured. We
have tried a simple combination of the lsir1 and lsir2
profile vectors, in which both components have equal
weight. That is, we took the sum of the lsir1 and lsir2
profile vectors for each of the topics and used this as a
profile vector. The results of this analysis are shown
in the third column of the table, labeled r1+r2. This
combination does somewhat better than the centroid of
the relevant documents in the total number of relevant
documents returned and in average precision. (We
returned fewer than 1000 documents for 5 of the
topics, and not all documents returned by the r1+r2
method had been judged for relevance, so we suspect
that performance could be improved a bit more.) For
27 of the topics, r1+r2 was better than the maximum
of the other two methods. It was never more than
about 10% worse than the best method. Thus it
appears that this combination takes advantage of the
best of both methods.
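A minimal sketch of this equal-weight combination, assuming the lsir1 topic-word vector and the lsir2 relevant-document centroid are already available as LSI vectors (function names are illustrative, not taken from the runs themselves):

    import numpy as np

    def r1_plus_r2_profile(topic_vec, reldoc_centroid):
        # Combined profile: the sum of the topic-word vector (lsir1)
        # and the relevant-document centroid vector (lsir2).
        return np.asarray(topic_vec) + np.asarray(reldoc_centroid)

    def cosine(a, b):
        # Cosine similarity used to score documents against a profile.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

Documents are then ranked by the cosine between the combined profile and each document vector, exactly as for the single-profile runs.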
The r1+r2 method, which combines a query vector
with a vector representing the centroid of all relevant
documents, is a kind of relevance feedback. This is an
unusual variant of relevance feedback, since all the
words in relevant documents are used, words in non-
relevant documents are not down-weighted, and query
terms are not re-weighted. Interestingly, this method
appears to produce improvements that are comparable
to those obtained by Buckley, Allan and Salton (1993)
using more traditional relevance feedback methods.
Average precision for the r1+r2 method is 31% better
than for lsir1, which used only the topic words (.3457
vs. .2622), and this is quite similar to the 38%
improvement reported by Buckley, Allan and Salton
(1993) for their richest routing query expansion
method.
The lsir2 method is generally better than the lsir1
method, but there is substantial variability across
topics. The topics on which there are the largest
differences are generally those for which the cosine
between the lsir1 and lsir2 topic vectors is
smallest. The cosines between corresponding topic
vectors range from .87 to .54. The lsir2 method is
substantially better on topics: 71 (incursions by
foreign military or guerrilla groups), 73 (movement of
people from one country to another), 87 (criminal
actions against officers of failed financial institutions),
94 (crime perpetrated with the aid of a computer), and 98
(production of fiber optics equipment). There are a
few topics for which lsir1 is substantially better than
lsir2: 63 (machine translation system), 65 (information
retrieval system), 85 (actions against corrupt public
officials), and 95 (computer application to crime solving).
It is not entirely clear what distinguishes
these topics, especially topics 94 and 95, for example.
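The per-topic diagnostic mentioned above, the cosine between corresponding lsir1 and lsir2 profile vectors, can be computed as in this sketch (our illustration; dictionaries of per-topic vectors are assumed):

    import numpy as np

    def profile_agreement(lsir1_vecs, lsir2_vecs):
        # Cosine between corresponding lsir1 and lsir2 topic vectors;
        # small values flag the topics where the two methods diverge most.
        out = {}
        for topic, a in lsir1_vecs.items():
            b = lsir2_vecs[topic]
            out[topic] = float(np.dot(a, b) /
                               (np.linalg.norm(a) * np.linalg.norm(b)))
        return out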
We have not yet had time to look in detail at the
failures of the LSI system. We will examine both
misses and false alarms in more detail. A preliminary
examination of a few topics suggests that lack of
specificity is the main reason for false alarms (highly
ranked but irrelevant documents). This is not
surprising, because LSI was designed as a recall-
enhancing method, and we have not added precision-
enhancing tools, although it would be easy to do so.
We would also like to examine some query splitting
ideas. We have previously conducted experiments
which suggest that performance can be improved if
the filter is represented as several separate vectors. We
did not use this method for the TREC-2 results we
submitted, but would like to do so. (See also Kane-
Esrig et al., 1991, or Foltz and Dumais, 1992, for a
discussion of multi-point interest profiles in LSI.)
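As a hedged illustration of the query splitting idea, the sketch below scores a document against several sub-profile vectors and keeps the best match; the max-cosine combination rule is our assumption for illustration, not a method taken from this paper or from the cited work:

    import numpy as np

    def multipoint_score(sub_profiles, doc_vec):
        # Score a document against a filter represented as several
        # separate profile vectors, keeping the best cosine match.
        # (The max rule is an assumed combination strategy.)
        d = np.asarray(doc_vec, dtype=float)
        d = d / np.linalg.norm(d)
        return max(float(np.dot(np.asarray(p, dtype=float) / np.linalg.norm(p), d))
                   for p in sub_profiles)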
3.5 TREC-2: Adhoc experiments
We submitted two sets of adhoc queries - lsiasm and
lsia1. We had intended to compare the new SMART
pre-processing (lsiasm) and a single LSI space (lsia1)
with our old TREC-1 pre-processing and 9 separate
subcollection spaces. Unfortunately, there were some
serious errors in our translation between internal
document numbers and the <DOCNO> labels