NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
The QA System
J. Driscoll, J. Lautenschlager, M. Zhao
National Institute of Standards and Technology (Donna K. Harman, ed.)

It should be noted that the semantic experiments reported here were crude. Our lexicon did not have enough TREC words, and the blend of keyword and semantic weights we used was not the best. So we expect that better semantic results for paragraphs can be achieved (refer to Section 6).

6. Failure Analysis

In general, our participation in the TREC experiments was impeded by the following:

PC DOS Platform. This platform has a serious memory-addressing restriction that results in memory page swapping, which seriously affects the speed of processing, especially during creation of inverted files and index structures. We can solve this problem by moving to an OS/2 or UNIX platform.

Extra Semantic Processing Time. Our semantic probabilistic and statistical calculations more than double the processing time for indexing and for statistical ranking of retrieved documents. Again, we can solve this problem by moving to an OS/2 or UNIX platform.

Time to Build Semantic Lexicon. We were only able to incorporate 1000 frequently occurring words from the training text into our semantic lexicon, and we did not have enough time to process the test text for the ad-hoc queries. This problem can be solved by having archival data distributed earlier. We suspect that with more TREC words in our semantic lexicon, better results could have been achieved in Section 5.2, where paragraphs are used as the basis for retrieval.

Unknown Blend for Semantic and Keyword Weights. There are three main aspects to our blend of semantic and keyword weights within the vector processing model:

(i) The Proper Probabilities to Use for the Semantics Triggered by a Word.
For example, we let the word "vapor" trigger State with 18% probability, Temperature with 9% probability, and Motion with Reference to Direction with 9% probability. We have several techniques for determining probabilities such as these.

(ii) The Scaling of Keyword Weights and Semantic Weights. For example, in a Question/Answer environment where queries are the length of a sentence and documents are either a sentence or at most a paragraph, we have been successful by forcing semantic similarity to be approximately 1/3 of keyword similarity when the two are combined in processing small document collections (less than 1000 documents). There was no scaling for the experiments reported in Section 5.2; we suspect better results could have been achieved with it.

(iii) Independent Semantic Weights and Keyword Weights. A valid criticism of our research has been that the semantic contribution from a word in a document should be kept independent of the word's own similarity contribution when the word is a keyword in common with the query.

The overall problem of finding the proper blend can be solved by spending more time running retrieval experiments with TREC test documents, test topics, and good test relevance judgments to establish the correct blend.

Number of Semantic Categories. Another way to solve the problem of long documents causing semantic weights to be of little value is to have more semantic categories. A large number of "semantic" categories could be obtained, for example, by using all the categories and/or subcategories found in Roget's Thesaurus [6] instead of the 36 semantic categories we use. This would be a deviation from database semantic modeling, but it probably should be examined.

Block-Split Tree-Structured Files. The QA System used B+ tree structures to implement inverted files, and this actually slowed the system in our DOS environment. The QA System also had severe storage overhead due to the storage of character strings in the B+ trees.
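As a concrete illustration of aspects (i)-(iii) above, here is a minimal Python sketch of one way such a blend could be computed. The "vapor" trigger probabilities and the 1/3 scaling factor are taken from the text; everything else (the dictionary representation, the cosine measure, and the exact combination formula) is an assumption for illustration, not the QA System's actual implementation.

```python
import math

# Hypothetical sketch of the semantic/keyword blend in (i)-(iii).
# Only the "vapor" probabilities and the 1/3 scale come from the text;
# all names and data structures here are illustrative assumptions.

# (i) Probabilities for the semantic categories triggered by a word.
LEXICON = {
    "vapor": {
        "State": 0.18,
        "Temperature": 0.09,
        "Motion with Reference to Direction": 0.09,
    },
}

def semantic_vector(words):
    """Accumulate category weights triggered by the words of a text."""
    vec = {}
    for w in words:
        for cat, p in LEXICON.get(w, {}).items():
            vec[cat] = vec.get(cat, 0.0) + p
    return vec

def cosine(u, v):
    """Cosine similarity over sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def blended_similarity(q_kw, d_kw, q_words, d_words, scale=1.0 / 3.0):
    """(ii) Scale the semantic score to roughly 1/3 of the keyword score.
    (iii) The semantic score is computed from the category vectors alone,
    independently of any keyword overlap with the query."""
    sem = cosine(semantic_vector(q_words), semantic_vector(d_words))
    return cosine(q_kw, d_kw) + scale * sem
```

Under this reading of (ii), a document whose semantic vector matches the query perfectly can contribute at most one third as much as a perfect keyword match.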
We have solved the two B+ tree problems by implementing a separate system that uses a hashing function to establish codes for strings.

TREC Document Length. Semantic experiments like those reported in Section 5.2 have shown that documents larger than a paragraph cause our semantic approach to be of little value. This problem can be corrected by considering paragraphs as the basis for document retrieval.

Finally, we spent too much time on work that was never incorporated into our experiments. We originally designed an efficient method of inverting data files, but it could not be used for routing queries. Also, trying to do semantic part-of-speech tagging experiments using SQUDS slowed us down.

References

[1] C. Buckley, SMART Evaluation Program (for TREC), Cornell SMART Group, Cornell University.
[2] C. Date, An Introduction to Database Systems, Vol. 1, Addison-Wesley, 1990.
[3] Hello Software, P.O. Box 494, Goldenrod, FL 32733.
[4] K. Sparck Jones and R. Bates, "Research on Automatic Indexing 1974-1976," Technical Report, Computer Laboratory, University of Cambridge, 1977.
[5] J. Lovins, "Development of a Stemming Algorithm," Mechanical Translation and Computational Linguistics, Vol. 11, No. 1-2, pp. 11-31, March and June, 1968.
[6] Roget's International Thesaurus, Fourth Edition, Harper & Row, New York, 1977.
[7] G. Salton, Automatic Text Processing, Addison-Wesley, 1989.
[8] D. Voss and J. Driscoll, "Text Retrieval Using a Comprehensive Semantic Lexicon," Proceedings of the ISMM First International Conference on Information and Knowledge Management, Baltimore, Maryland, November 1992.
[9] E. Wendlandt and J. Driscoll, "Incorporating a Semantic Analysis into a Document Retrieval Strategy," Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, Chicago, Illinois, pp. 270-279, October 1991.