SP500207 NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
LSI meets TREC: A Status Report
S. Dumais
National Institute of Standards and Technology
Donna K. Harman

attempt to optimize the data storage or sorting, we decreased the time required to match a 250-dimensional query vector against all document vectors and sort the results by a factor of 60 to 100.

3.2 Accuracy

This section on accuracy is divided into two main parts: one examining the results of the two adhoc and two routing runs we submitted, and the other looking in detail at some failures of the LSI system. Compared to other systems, LSI performance was average on the adhoc topics and somewhat below average on the routing topics (though see the section on misses for a discussion of sizable improvements). Because there were so many differences between systems (tokenization, query construction, representation, matching, amount of human effort, etc.), it is difficult to attribute performance differences to specific, theoretically interesting components. For this reason, we focus on our own experimental results and failure analyses as a first step in understanding and improving performance.

3.2.1 LSI experiments

3.2.1.1 Adhoc - normalization experiment. We submitted results from two sets of adhoc queries. The two sets of adhoc query results differed only in how the similarities (cosines) from the 9 subcollections were combined to arrive at a single ranking. In one case, adhoc_topic cosine, we simply used the raw cosines from the different collections. In the other case, adhoc_topic normalized cosine, we normalized the cosines within each subcollection before combining. The differences in accuracy between these two methods were not very large. The raw cosine method of combining was about 15% better overall than the normalized cosine method (2786 vs. 2469 relevant articles, and .1274 vs. .1100 11-pt precision).
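The two merging strategies can be sketched as follows. This is a hypothetical illustration, not the authors' code: the function name and toy score dictionaries are invented, and the normalization follows the description given later in this section (subtract the per-subcollection mean cosine and divide by the variance).

```python
import numpy as np

def merge_rankings(subcollection_scores, normalize=False):
    """Merge per-subcollection {doc_id: cosine} dicts into one ranking.

    With normalize=False, raw cosines are compared directly across
    subcollections (the adhoc_topic cosine run).  With normalize=True,
    each cosine is first rescaled within its subcollection by
    subtracting the mean cosine and dividing by the variance (the
    adhoc_topic normalized cosine run).
    """
    merged = {}
    for scores in subcollection_scores:
        vals = np.array(list(scores.values()))
        mean, var = vals.mean(), vals.var()
        for doc_id, cos in scores.items():
            merged[doc_id] = (cos - mean) / var if normalize else cos
    # Highest combined score first.
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
```

Note that with normalize=True, a subcollection with a low mean and low variance inflates its documents' scores relative to the others, which is exactly the pathology the normalization discussion turns on.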
In general, some form of normalization is needed to correct for measurement artifacts (e.g., lower mean cosine in higher dimensions) when combining scores from many different subcollections. Since we used the same number of dimensions for comparisons in each subcollection, this correction was unnecessary in the present experiments. Normalization appeared to have some undesirable consequences for a few topics. Because normalization subtracts the mean cosine and divides by the variance, the same raw cosine will have a higher normalized score if it comes from a subcollection with a low mean cosine and low variance - but these are precisely the subcollections that we probably want to avoid!

3.2.1.2 Routing - feedback experiment. For the routing queries, we created two filters or queries for each of the 50 training topics. In one case, routing_topic cosine, the routing query was based on just the terms in the topic statements, as if it had been an adhoc query. In the other case, routing_reldocs cosine, we used feedback about relevant documents from the training set and located the filter at the vector sum of the relevant documents. Our intent is to use these two runs as baselines against which alternative methods for combining the original query and relevant documents can be compared.
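The routing_reldocs filter described above can be sketched in a few lines, assuming the LSI document vectors are available as dense NumPy rows (the function names and toy vectors are hypothetical): the filter is placed at the vector sum of the relevant training documents, and incoming documents are then ranked by cosine similarity to it.

```python
import numpy as np

def routing_filter(relevant_doc_vectors):
    # routing_reldocs-style filter: the vector sum of the LSI vectors
    # of the relevant training documents for this topic.
    return np.asarray(relevant_doc_vectors).sum(axis=0)

def rank_by_cosine(filter_vec, doc_vectors):
    # Rank incoming document vectors by cosine similarity to the filter;
    # returns document indices, best match first.
    q = filter_vec / np.linalg.norm(filter_vec)
    d = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    return np.argsort(-(d @ q))
```

Because the sum is later compared by cosine, its overall length does not matter; only its direction (the centroid direction of the relevant documents) affects the ranking.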