SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
WORDIJ: A Word Pair Approach to Information Retrieval
chapter
J. Danowski
National Institute of Standards and Technology
Donna K. Harman
PIPELINE matching of the query pairs against the pair
files for each text file executed in approximately 16
milliseconds per file per 100 sets of query pairs. Running
all 100 queries against the entire collection therefore took
approximately five hours of PIPELINE processing on the
word pair index files, or three minutes per query.
Time constraints precluded completing word-by-document and
word-pair-by-document counts on the entire collection, so
inverse document frequency or entropy weighting of words and
word pairs was not applied. Retrieved documents were ranked
from 1 to 200 by counting the number of pairs each document
shared with the query. Frequency of pair occurrence in
documents was not used as a weight, except to break ties at
the 200-document rank threshold.
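The ranking rule just described can be sketched as follows. This is a minimal illustration, not the WORDij or PIPELINE code: the function name, the dictionary of per-document pair counts, and the toy data are all assumptions made for the example. It also applies pair frequency as a secondary sort key throughout, which selects the same documents at the cutoff as breaking ties only at the threshold, although the order inside tied groups may differ.

```python
from collections import Counter

def rank_by_matching_pairs(query_pairs, doc_pair_counts, cutoff=200):
    """Rank documents by the number of distinct query pairs they contain.

    query_pairs: set of (word, word) tuples for one query (hypothetical input).
    doc_pair_counts: dict mapping doc id -> Counter of pair frequencies.
    """
    scored = []
    for doc_id, pair_counts in doc_pair_counts.items():
        matched = [p for p in query_pairs if p in pair_counts]
        if not matched:
            continue  # documents sharing no pair with the query are not retrieved
        n_matching = len(matched)                        # primary score
        frequency = sum(pair_counts[p] for p in matched)  # tie-breaker only
        scored.append((doc_id, n_matching, frequency))
    # Sort by match count, then frequency, then doc id for a stable order.
    scored.sort(key=lambda s: (-s[1], -s[2], s[0]))
    return [doc_id for doc_id, _, _ in scored[:cutoff]]

# Toy data invented for illustration.
query = {("crude", "oil"), ("oil", "price"), ("price", "trends")}
docs = {
    "d1": Counter({("crude", "oil"): 3, ("oil", "price"): 1}),
    "d2": Counter({("crude", "oil"): 1}),
    "d3": Counter({("oil", "price"): 4}),
    "d4": Counter({("rail", "strikes"): 2}),
}
print(rank_by_matching_pairs(query, docs, cutoff=2))  # ['d1', 'd3']
```

Here d2 and d3 each match one query pair, so the higher pair frequency in d3 decides which of them survives the cutoff.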
Time limitations also prevented full implementation of
the indirect matching process. Only directly matching pairs
were used for the main analysis to produce the results.
Indirect matching was, however, later tested. This will be
described after presentation of the basic results.
RESULTS
WORDij results were greater than or equal to the
median levels of performance for seven topics. Our results
were within one standard deviation on 55 topics, and within
two standard deviations on 82 topics. Performance was
significantly lower than the median for 14 topics, counting
as significant those topics whose results were more than two
standard deviations below the median. Table 1 lists the
topics in two categories, those that were better than or equal
to the median, and those that were significantly below the
median.
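The grouping of topics used here can be sketched as below. The text does not state how the standard deviation was computed, so this assumes one standard deviation per topic, taken across systems, and treats the one- and two-deviation bands as distances below the median. The function name, the category labels, and the numeric values are hypothetical.

```python
def classify_topic(result, median, sd):
    """Place one topic in a performance band relative to the median.

    result: our score on the topic; median and sd: assumed to be the
    across-system median and standard deviation for the same topic.
    """
    difference = median - result  # positive when we fall below the median
    if difference <= 0:
        return "at or above median"
    if difference <= sd:
        return "within 1 SD"
    if difference <= 2 * sd:
        return "within 2 SD"
    return "failure"  # more than two standard deviations below the median

# Illustrative values only, not taken from the official results.
examples = [(0.30, 0.25, 0.10), (0.20, 0.25, 0.10),
            (0.10, 0.25, 0.10), (0.01, 0.25, 0.10)]
for result, median, sd in examples:
    print(result, classify_topic(result, median, sd))
```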
Failure Analysis
Query Style.
Several kinds of failure analysis were performed. To
investigate whether stylistic features of queries were
associated with performance, we computed the following
variables for each query using the shareware program
PC-STYLE:
Number of Sentences
Number of Words
Words per sentence
Percentage of long words
Percentage of personal words
Percentage of action verbs
Average number of syllables per word
Reading grade level
Table 1: Topic Results Ordered by Performance
TOPIC  Difference (median - result)  Title
Better than or Equal to Median
   66  .08980  Natural Language Processing
   29  .04540  OS/2 problems
   94  .03180  Computer-aided Crime
   95  .00800  Computer-aided Crime Detection
   18  .00000  Global Stock Market Trends
   44  .00000  What Makes CASE succeed or fail
   88  .00000  Crude Oil Price Trends
  100  .00000  Controlling High Tech Transfer
   50  .00250  Virtual Reality Military Apps.
Significantly Below Median (Failures)
   22  .19590  Legal Repercus. - Agrochemicals
   58  .20740  Rail Strikes
   37  .21290  Role of Minis and Mainframes
   20  .21770  Superconductors
   77  .23290  Poaching
   17  .24350  Japanese Stock Market Trends
   93  .24560  What Backing Does the NRA Have
   13  .24780  Drug Approval
   54  .26840  Satellite Launch Contracts
   51  .29490  Airbus Subsidies
   10  .33340  Space Program
   70  .35440  Surrogate Motherhood
   78  .38240  Greenpeace
   21  .48710  Counternarcotics
These variables were correlated with a criterion variable,
which was the difference between the median and our result.
For each query, we subtracted our result from the median
result across systems on the 11-point recall-precision
averages reported in the official results for test queries
51-100. Table 2 displays these correlations. None of them is
statistically significant at the .01 level. A second
criterion variable was created to represent whether the query
was in the "failed" category, that is, more than two standard
deviations below the median. A dummy variable was
created for each query using zero to represent success and
one to represent failure. Correlations of the style variables
were also computed with the failure criterion. No
correlations were significant at the .01 level. This suggests
that query length, complexity, and other stylistic variables
are unrelated to retrieval performance.
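The two criterion variables and their correlation with a style variable can be sketched as follows. This is an illustration under stated assumptions, not the analysis actually run: the per-query numbers are invented, the per-topic standard deviation is assumed to be available, and only one style variable (number of words) is shown. Correlating a style variable with the 0/1 dummy in this way gives a point-biserial correlation, which is computed with the ordinary Pearson formula.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-query values, invented for illustration only.
words_per_query = [42, 87, 65, 120, 53]
median_score = [0.30, 0.25, 0.40, 0.35, 0.20]
our_score = [0.28, 0.05, 0.33, 0.30, 0.20]
score_sd = [0.08, 0.07, 0.10, 0.09, 0.06]

# First criterion: median minus our result, per query.
criterion = [m - r for m, r in zip(median_score, our_score)]
# Second criterion: dummy variable, 1 = failure (more than 2 SD below median).
failed = [1 if d > 2 * sd else 0 for d, sd in zip(criterion, score_sd)]

r_difference = pearson(words_per_query, criterion)
r_failure = pearson(words_per_query, failed)
print(failed, round(r_difference, 3), round(r_failure, 3))
```

A significance test at the .01 level would then be applied to each coefficient; that step is omitted here.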
Query Words.