SP500207 NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1) WORDIJ: A Word Pair Approach to Information Retrieval chapter J. Danowski National Institute of Standards and Technology Donna K. Harman Table 4: Direct and indirect links to the word "aid" FREQUENCY aid airbus 1 fdp 1 back 1 jets 2 adams 1 board 1 crash 1 group 1 plans 1 airbus 1 boeing 2 family 1 german 1 member 1 planes 2 dispute 1 mandate 1 nations 2 partner 2 percent 1 program 2 provide 1 aircraft 1 european 1 products 1 projects 2 amendment 1 executive 1 industrie 9 initially 1 ministers 1 spokesman 2 structure 1 * subsidies 2 violating 1 consortium 5 * government 1 management 1 aid package 1 aid guarantee 1 aid consortium 1 level. Nevertheless, they are listed to show the larger context of identifying meaningful indirectness. to indirect association patternS beloW the manifest or direct level. Currently, elgenvector solutions to large matrices are more computationally limited than shortest path network solutions. There has heen more development of large scale, parallel algorithms for shortest pall's, due to the practical needs to aid routing of information in telecommunications networks. Some work, however, suggests that there is a mathematical equivalence between eigenvector and network approaches to reducing matrices of associations to a simpler underlying structure [Barnett & Richards 91]. Shortest Path Weighting. Given a set of query word pairs and a list of all documents that contain each word pair--hoth directly and indirectly-- we can take all pairs of nodes and identify the shortest path linking them in the network. These paths are measured for length according to Euclidean distance in graph terms. Such distance is a direct function of the minimum number of link steps it requires to connect two nodes on their geodesic. Directly linked nodes have a distance of one, nodes linked through one common intermediary node have a distance of two, elc. Documents are counted that were "passed through" or "activated" as each step in the shortest path is traversed. Shortest path algorithms can find these indirect paths with large data sets provided parallel algorithms and hardware are used. We are further developing such experiments. After IDF weighting, ranking, and selection of the best words, network analysis is conducted on the word pairs they form. The shortest paths linking every word in the set are found, and the word centrality in the network is indexed via the average of the minimum number of steps between that word and all other words in the sel Then, for each document, it is given a weight that is based on the centrality of the words from the query it contains. The retrieved documents found along the shortest paths between all query pairs are counted and weighted by their constituent word centrality. ank ordered for each query. Documents are then rank[OCRerr]rdered for each query. airbus 1 CONCLUSION * These are indirect links that create pairs contained in the query pairs: aid- (Airbus) -subsidies and aid-(Airbus)-government. The other indirect links are not meaningful because they do not relate to the query at the two-step Results showed that even with unexpected limitations due to the mid-project death of the lead research assistant, Nainesh Khimasia, we succeeded in processing the entire TREC collection and doing direct matching of query word pairs to document word pairs. For 15% of the topics, our results can be considered failures. Failure analysis suggests that improvements in future research may result from: 135