NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
WORDIJ: A Word Pair Approach to Information Retrieval
J. Danowski
National Institute of Standards and Technology
Donna K. Harman
- query tuning based on natural language processing
- using special procedures for treating proper noun names for organizations, products, locations, etc.
- retaining and using word pairs occurring only once in documents
- stemming the documents and queries
- doing inverse document frequency (IDF) or entropy weighting on words and using these to weight query pairs
- computing additional weights based on shortest paths.
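The IDF weighting item above can be sketched as follows. This is a minimal sketch, not the WORDIJ implementation: it assumes idf(w) = log(N / df(w)) computed over already-stemmed tokens, and it assumes a query pair's weight is the product of its two words' IDF values, since the item does not say how word weights are combined into pair weights. The function names are hypothetical, and entropy weighting, the alternative the item mentions, is not shown.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Inverse document frequency, idf(w) = log(N / df(w)), for every word.

    docs: list of documents, each a list of (already stemmed) word tokens.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each word at most once per document
    return {w: math.log(n_docs / d) for w, d in df.items()}


def weight_query_pairs(query_pairs, idf):
    """Weight each (w1, w2) query pair by the product of its words' IDF values.

    Assumption for illustration: the product rule is hypothetical, and words
    absent from the collection contribute a weight of 0.0.
    """
    return {
        (w1, w2): idf.get(w1, 0.0) * idf.get(w2, 0.0)
        for w1, w2 in query_pairs
    }


# Usage on a toy four-document collection.
docs = [
    ["oil", "price", "rise"],
    ["oil", "spill"],
    ["price", "index"],
    ["stock", "index", "rise"],
]
idf = idf_weights(docs)
pair_weights = weight_query_pairs([("oil", "spill"), ("oil", "price")], idf)
# ("oil", "spill") gets twice the weight of ("oil", "price"), since "spill" is rarer.
print(pair_weights)
```

Multiplying the two IDF values means a pair is weighted highly only when both of its words are discriminating; a sum would instead let one rare word carry a pair of otherwise common terms.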
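For the shortest-path item, one possible reading is sketched here: treat word pairs as links in an undirected word network and give a query pair an extra weight of 1/d, where d is the number of links on a shortest path between its two words, found by breadth-first search. The 1/d form, the unweighted network, and all function names are assumptions for illustration only. Plain BFS suffices for unit-length links; the scaling algorithms of Gabow & Tarjan (1989) target weighted network problems and are not what this code does.

```python
from collections import defaultdict, deque


def build_word_network(word_pairs):
    """Undirected adjacency sets built from (w1, w2) word pairs."""
    graph = defaultdict(set)
    for w1, w2 in word_pairs:
        if w1 != w2:
            graph[w1].add(w2)
            graph[w2].add(w1)
    return graph


def shortest_path_length(graph, source, target):
    """Links on a shortest source-to-target path (BFS), or None if unreachable."""
    if source == target:
        return 0
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == target:
                return dist + 1  # BFS reaches the target first by a shortest path
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def path_weight(graph, w1, w2):
    """Hypothetical extra weight 1 / d; unreachable or identical words get 0.0."""
    d = shortest_path_length(graph, w1, w2)
    if not d:  # covers None (disconnected) and 0 (same word)
        return 0.0
    return 1.0 / d


# Usage: a small network built from word pairs.
pairs = [("oil", "price"), ("price", "index"), ("index", "stock"), ("oil", "spill")]
graph = build_word_network(pairs)
print(path_weight(graph, "oil", "stock"))  # oil-price-index-stock is 3 links, so 1/3
```

Under this reading, query words that are directly linked in the network receive the full extra weight, and the bonus decays as the words sit further apart.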
ACKNOWLEDGEMENTS
The author is grateful for the contributions of the
following University of Illinois at Chicago faculty, students,
and staff to this project: John Andrews, Robert Goldstein,
Alan Hinds, Nainesh Khimasia, Jin Hong Meng, Stephen
Roy, Gary Singer, Anand Sundaram, George Yanos, and
Clement Yu.
REFERENCES
Barnett, G.A. & Richards, W.D. (1991, February). A
comparison of NEGOPY's clique detection algorithm with
correspondence analysis. Paper presented to the
International Social Networks Conference, Tampa, Florida.
Burt, R.S. (1990). Structure. New York: Center for Social
Sciences, Columbia University.
Croft, B., Turtle, H. & Lewis, D. (1991). Proceedings of the
SIGIR '91, 32-45.
Danowski, J. (1982). A network-based content analysis
methodology for computer-mediated communication: An
illustration with a computer bulletin board. Communication
Yearbook, 6, 904-925.
Danowski, J. (1988). Organizational infographics and
automated auditing: Using computers to unobtrusively
gather and analyze communication. In G. Goldhaber and G.
Barnett (eds.) Handbook of organizational communication
(pp. 335-384). Norwood, NJ: Ablex.
Danowski, J. & Andrews, J. (1985, February). A method for
automated network analysis of word cooccurrences. Paper
presented to the International Social Networks Conference,
San Diego.
Danowski, J. & Martin, T.H. (1979). Evaluating the health
of information science: Research community and user
contexts. Final report to the Division of Information Science
of the National Science Foundation, no. 15Tl8-2l 130.
Deerwester, S., Dumais, S.T., Landauer, T.K., Furnas, G.W.
& Harshman, R.A. (1990). Indexing by latent semantic
analysis. Journal of the American Society for Information Science,
41:6, 391-407.
Dumais, S.T. (1992). LSI meets TREC: A status report.
Paper presented to TREC.
Fagan, J. (1989). The effectiveness of a nonsyntactic
approach to automatic phrase indexing for document
retrieval. Journal of the American Society for Information
Science, 40:2, 115-132.
Gabow, H.N. & Tarjan, R.E. (1989). Faster scaling
algorithms for network problems. SIAM Journal on
Computing, 18(Oct), 1013-1036.
Salton, G. & McGill, M. (1983). Introduction to modern
information retrieval. New York: McGraw-Hill.
Salton, G. & Buckley, C. (1991). Automatic text structuring
and retrieval: Experiments in automatic encyclopedia
searching. Proceedings of the SIGIR '91, 21-30.
van Rijsbergen, C. (1977). A theoretical basis for the use of
cooccurrence data in information retrieval. Journal of
Documentation, 33, 106-119.
Yu, C., Buckley, C., Lam, H. & Salton, G. (1983). A
generalized term dependence model in information retrieval.
Information technology: Research and development,
2, 129-154.