NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Latent Semantic Indexing (LSI) and TREC-2
S. Dumais
National Institute of Standards and Technology, D. K. Harman

documents, it is sometimes difficult to understand why a particular document was returned. One advantage of the LSI method is that documents can match queries even when they have no words in common, but this can also produce some spurious hits. Another source of false alarms could be inappropriate word sense disambiguation. LSI queries are located at the weighted vector sum of their words, so words are "disambiguated" to some extent by the other query words. Similarly, the initial SVD analysis used the context of other words in articles to determine the location for each word in the LSI space. However, since each word has only one location, it sometimes appears to be "in the middle of nowhere". A related possibility concerns long articles. Lengthy articles that discuss many distinct subtopics were averaged into a single document vector, and this can sometimes produce spurious matches. Breaking larger documents into smaller subsections and matching on these might help.

4.2.2 Misses. For this analysis we examined a random subset of relevant articles that were not in the top 1000 returned by LSI. Many of the relevant articles were fairly highly ranked by LSI, but there were also some notable failures that would be seen only by the most persistent readers. So far, we have not systematically distinguished between misses that "almost made it" and those that were much further down the list. Most of the misses we examined represent articles that were primarily about a different topic than the query but contained a small section that was relevant to the query. Because documents are located at the average of their terms in LSI space, they will generally be near their dominant theme, and this is a desirable feature of the LSI representation.
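The averaging effect described above can be illustrated with a small sketch. This is not the paper's system: the 2-D "LSI space" and the term vectors below are made up by hand (a real system would derive them from a truncated SVD of the term-document matrix), but the geometry shows why a long article mixing two subtopics lands between them while one of its subsections stays near the query.

```python
import numpy as np

# Hypothetical 2-D "LSI space" with hand-placed term vectors (an
# assumption for illustration; real vectors come from a truncated SVD).
# Axis 1 loosely represents a "finance" theme, axis 2 a "sports" theme.
terms = {
    "stock":  np.array([1.0, 0.0]),
    "market": np.array([0.9, 0.1]),
    "goal":   np.array([0.0, 1.0]),
    "league": np.array([0.1, 0.9]),
}

def vector(words):
    """Locate a document (or query) at the average of its term vectors."""
    return np.mean([terms[w] for w in words], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = vector(["stock", "market"])

# A long article covering both subtopics averages toward the middle,
# near neither theme...
mixed = vector(["stock", "market", "goal", "league"])

# ...while its finance subsection alone sits right next to the query.
finance_part = vector(["stock", "market"])

print(cosine(query, mixed))         # moderate similarity
print(cosine(query, finance_part))  # high similarity
```

Matching on subsections rather than whole documents, as suggested in the text, amounts to comparing the query against `finance_part`-style vectors instead of the diluted `mixed` vector.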
Some kind of local matching should help in identifying less central themes in documents. Some misses were also attributable to poor text (and query) pre-processing and tokenization.

4.3 Open issues. On the basis of these preliminary failure analyses, we would like to explore some precision-enhancing methods. We would also like to explore three additional areas.

4.3.1 Separate vs. combined scaling. We used 9 separate subscalings for the TREC-1 experiments. For TREC-2 we used a single scaling (based on a very small sample). We have also recently finished a complete scaling and will compare this with the subcollection scalings and the sampled full scaling.

4.3.2 Centroid query vs. many separate points of interest. A single vector was used to represent each query. In some cases the vector was the average of terms in the topic statement, and in other cases it was the average of previously identified relevant documents. A single query vector can be inappropriate if interests are multifaceted and these facets are not near each other in the LSI space. We have developed techniques that allow us to match using a controllable compromise between averaged and separate vectors (Kane-Esrig et al., 1991). In the case of the routing queries, for example, we could match new documents against each of the previously identified relevant documents separately rather than against their average.

4.3.3 Interactive interfaces. All LSI evaluations were conducted using a non-interactive system in essentially batch mode. It is well known that one can have the same underlying retrieval and matching engine but achieve very different retrieval success using different interfaces. We would like to examine the performance of real users with interactive interfaces. A number of interface features could be used to help users make faster (and perhaps more accurate) relevance judgements, or to help them explicitly reformulate queries.
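The routing alternative raised in 4.3.2 can be sketched as follows. The vectors are invented for illustration, and the simple score interpolation at the end is only one plausible way to realize a "controllable compromise"; the actual technique of Kane-Esrig et al. (1991) may differ.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical LSI vectors for previously judged relevant documents
# covering two distinct facets of a single routing profile.
relevant = np.array([
    [1.0, 0.0, 0.0],   # facet A
    [0.0, 1.0, 0.0],   # facet B
])

new_doc = np.array([0.0, 0.95, 0.05])  # clearly about facet B

# Centroid matching: score against the single averaged query vector.
centroid_score = cosine(relevant.mean(axis=0), new_doc)

# Separate matching: best score against any one relevant document.
separate_score = max(cosine(r, new_doc) for r in relevant)

# One simple compromise between the two extremes: interpolate the
# scores with a mixing weight (an illustrative choice, not the
# published method).
lam = 0.5
combined = lam * centroid_score + (1 - lam) * separate_score
print(centroid_score, separate_score, combined)
```

For a multifaceted profile like this one, the centroid sits between the facets and under-scores a document that is squarely about one of them, which is exactly the problem separate matching avoids.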
(See Dumais and Schmitt, 1991, for some preliminary results on query reformulation and relevance feedback.) Another interesting possibility involves returning something richer than a rank-ordered list of documents to users. For example, a clustering and graphical display of the top-k documents might be quite useful. We have done some preliminary experiments using clustered return sets and would like to extend this work to the TREC collections. The general idea is to provide people with useful interactive tools that let them make good use of their knowledge and skills, rather than attempting to build all the smarts into the database representation or matching components of the system.
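A clustered return-set display could be prototyped along these lines. This is a minimal k-means sketch over made-up document vectors, not the experiments described in the text; the point is only that the top-k vectors of a query often fall into a few themes that a graphical display could separate.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Minimal k-means over retrieved document vectors."""
    # Deterministic farthest-point initialization for this small demo.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign each document to its nearest cluster center.
        labels = np.argmin(
            ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        # Recompute each center as the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Hypothetical LSI vectors for the top documents returned for one query:
# two obvious themes that a clustered display would show side by side.
top_docs = np.array([
    [1.0, 0.1], [0.9, 0.0], [0.95, 0.05],   # theme 1
    [0.0, 1.0], [0.1, 0.9], [0.05, 0.95],   # theme 2
])
labels = kmeans(top_docs, k=2)
print(labels)
```

Grouping the ranked list this way lets a user skip a whole cluster that is off-topic instead of judging its documents one by one.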