NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)

Appendix B: System Features
National Institute of Standards and Technology
D. K. Harman

V. SYSTEM COMPARISON

(a) I'd guess that simple database and matching operations could be sped up by a factor of 2-3 with a rewrite of the software to do what [...] than what we thought we might want when we started).

(b) Most of the initial analysis time is spent computing the singular value decomposition (SVD) of the term-document matrix. The sparse-iterative algorithm [...] orders of magnitude faster than the dense algorithm we used 2 years ago. We might find additional improvements of 2-3 times by using mor[...] precision arithmetic. Parallel algorithms might help, but again, only by a factor of 2. This analysis is a one-time cost for relatively stable [...]

(c) Query processing is slow. Although the LSI vectors have many fewer dimensions than standard vector representations, the vectors are de[...] related to every document; it's just a matter of how much. Thus, we cannot take advantage of efficient inverted indices or other structures [...] trivial to match queries to documents in parallel. Improvements here are limited only by the number of processors we have! We are also [...] heuristic methods for finding near neighbors in high-dimensional spaces.

There are many features that have not been included in the system because it was the very first prototype. As in the paper, many improvements are und[...] in reducing errors in text processing, reducing complexity of representation, improving quality of knowledge bases, and improving the time and space [...] redesign of the data structures and implementations.
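
The query-matching cost described in item (c) can be illustrated with a minimal sketch. It is not the participants' implementation: the function names, the chunking scheme, and the 3-dimensional vectors are all hypothetical, chosen only to show why a query in a reduced LSI-style space must be scored against every document (no inverted index applies) and why that scoring splits cleanly across processors.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_chunk(query_vec, doc_chunk):
    """Score every (doc_id, vector) pair in one chunk; no document can be skipped."""
    return [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_chunk]


def rank_documents(query_vec, docs, top_n=3, n_chunks=1):
    """Return the top_n (score, doc_id) pairs, best first.

    docs is a list of (doc_id, vector) pairs. Chunks are scored independently,
    so each call to score_chunk could run on a separate processor; here they
    run one after another to keep the sketch dependency-free.
    """
    chunks = [docs[i::n_chunks] for i in range(n_chunks)]
    scored = []
    for chunk in chunks:
        scored.extend(score_chunk(query_vec, chunk))
    return heapq.nlargest(top_n, scored)


# Hypothetical 3-dimensional reduced vectors, invented for illustration.
DOCS = [
    ("d1", [1.0, 0.0, 0.0]),
    ("d2", [0.6, 0.8, 0.0]),
    ("d3", [0.0, 1.0, 0.0]),
    ("d4", [-1.0, 0.0, 0.0]),
]
QUERY = [1.0, 0.0, 0.0]

if __name__ == "__main__":
    for score, doc_id in rank_documents(QUERY, DOCS, top_n=2, n_chunks=2):
        print(doc_id, round(score, 3))
```

Because each chunk is scored without reference to the others, the ranking is the same for any value of n_chunks; only the final merge of the per-chunk scores is sequential. The total work still grows with the number of documents, which is why approximate near-neighbor heuristics are attractive for large collections.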