NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
LSI meets TREC: A Status Report
S. Dumais
National Institute of Standards and Technology
Donna K. Harman
attempt to optimize the data storage or sorting, we decreased the time required to match a 250-
dimensional query vector against all document vectors and sort by a factor of 60 to 100.
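The matching step described above can be sketched as a brute-force cosine ranking of reduced-dimension LSI vectors. This is our illustration, not the authors' code; the function and variable names are hypothetical, and no storage or sorting optimizations are shown.

```python
import numpy as np

def rank_documents(query_vec, doc_matrix):
    """Rank documents by cosine similarity to an LSI query vector.

    query_vec:  (k,) reduced-dimension query vector (e.g., k = 250)
    doc_matrix: (n_docs, k) matrix of reduced-dimension document vectors
    Returns (order, scores): document indices from best to worst match,
    and the corresponding cosines.
    """
    # Cosine similarity is the dot product of unit-length vectors.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    cosines = d @ q
    order = np.argsort(-cosines)  # negate for descending similarity
    return order, cosines[order]
```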
3.2 Accuracy
This section on accuracy is divided into two main parts - one examining the results of the two
Adhoc and two Routing runs we submitted, and the other looking in detail at some failures of the
LSI system.
Compared to other systems, LSI performance was average on the adhoc topics and somewhat
below average on the routing topics (though see the section on misses for a discussion of sizable
improvements). Because there were so many differences between systems (tokenization, query
construction, representation, matching, amount of human effort, etc.) it is difficult to isolate
performance differences to specific, theoretically interesting components. For this reason, we
focus on our own experimental results and failure analyses as a first step in understanding and
improving performance.
3.2.1 LSI experiments
3.2.1.1 Adhoc - normalization experiment. We submitted results from two sets of adhoc
queries. The two sets of adhoc query results differed only in how the similarities (cosines) from
the 9 subcollections were combined to arrive at a single ranking. In one case, adhoc_topic
cosine, we simply used the raw cosines from the different collections. In the other case,
adhoc_topic normalized cosine, we normalized the cosines within each subcollection before
combining.
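The two combining strategies can be sketched as follows. This is our illustration, assuming per-document (id, cosine) score lists from each subcollection, with normalization written as a standard z-score (subtract the subcollection mean, divide by the standard deviation); the paper gives no code, and the function names are ours.

```python
import numpy as np

def combine_raw(subcollection_scores):
    """Merge (doc_id, cosine) lists from several subcollections,
    ranking by the raw cosines directly."""
    merged = [pair for scores in subcollection_scores for pair in scores]
    return sorted(merged, key=lambda p: p[1], reverse=True)

def combine_normalized(subcollection_scores):
    """Z-score each subcollection's cosines before merging,
    so each subcollection's scores are on a common scale."""
    merged = []
    for scores in subcollection_scores:
        cosines = np.array([c for _, c in scores])
        mu, sigma = cosines.mean(), cosines.std()
        merged += [(doc_id, (c - mu) / sigma) for doc_id, c in scores]
    return sorted(merged, key=lambda p: p[1], reverse=True)
```

Note that normalization forces the top-scoring document of every subcollection toward the same z-score regardless of how strong its raw cosine was, which is one way the effect discussed below can arise.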
The differences in accuracy between these two methods were not very large. The raw cosine
method of combining was about 15% better overall than the normalized cosine method (2786 vs.
2469 relevant articles, and .1274 vs. .1100 11-pt precision). In general, some form of
normalization is needed to correct for measurement artifacts (e.g., lower mean cosine in higher
dimensions) when combining scores from many different subcollections. Since we used the
same number of dimensions for comparisons in each subcollection this correction was
unnecessary in the present experiments. Normalization appeared to have some undesirable
consequences for a few topics. Because normalization subtracts the mean cosine and divides by
the variance, the same raw cosine will have a higher normalized score if it comes from a
subcollection with a low mean cosine and low variance - but these are precisely the
subcollections that we probably want to avoid!
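A small worked example of this pitfall, with illustrative numbers of our own choosing: the same raw cosine receives a much higher normalized score when it comes from a subcollection with a low mean and low spread.

```python
# Same raw cosine, 0.5, normalized within two different subcollections.
# (Means and spreads below are made up for illustration.)
z_typical = (0.5 - 0.45) / 0.10  # subcollection with higher mean and spread
z_inflated = (0.5 - 0.20) / 0.05  # low mean, low spread: score is inflated
```

Here the identical raw cosine scores 0.5 in the first subcollection but 6.0 in the second, so the weak subcollection's documents jump to the top of the merged ranking.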
3.2.1.2 Routing - feedback experiment. For the routing queries, we created two filters or
queries for each of the 50 training topics. In one case, routing_topic cosine, the routing query
was based on just terms in the topic statements, as if it had been an adhoc query. In the other
case, routing_reldocs cosine, we used feedback about relevant documents from the training set
and located the filter at the vector sum of the relevant documents. Our intent is to use these two
runs as baselines against which alternative methods for combining the original query and
relevant documents can be compared.
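The routing_reldocs filter described above, located at the vector sum of the relevant training documents, can be sketched as follows. The function name is ours, and the unit-length normalization at the end is an assumption made so that dot products against it are cosines.

```python
import numpy as np

def routing_filter_from_reldocs(rel_doc_vectors):
    """Build a routing filter as the vector sum of the reduced-dimension
    vectors of known-relevant training documents.

    rel_doc_vectors: (n_rel, k) matrix of relevant-document LSI vectors.
    Returns a unit-length (k,) filter vector.
    """
    filt = rel_doc_vectors.sum(axis=0)
    return filt / np.linalg.norm(filt)  # unit length: dot product = cosine
```

Incoming documents are then scored against this filter exactly as adhoc queries are scored against the collection.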