Okapi at TREC-2

S. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, M. Gatford

In: D. K. Harman (ed.), The Second Text REtrieval Conference (TREC-2), NIST Special Publication 500-215. National Institute of Standards and Technology.
First component

Expanding the first component of equation 9 on the basis of term independence assumptions, and also making the assumption that eliteness is independent of document length (on the basis of the Verbosity hypothesis), we can obtain a formula for the weight of a term $t$ which occurs $tf$ times. This formula is similar to equation 2 in the main text, except that $\lambda$ and $\mu$ are replaced by $\lambda d/\Delta$ and $\mu d/\Delta$. The factors $d/\Delta$ in components such as $\lambda^{tf}$ cancel out, leaving only the factors of the form $e^{-\lambda d/\Delta}$.
Analysis of the behaviour of this function with varying $tf$ and $d$ is a little complex. The simple function used for the experiments (formula 4) exhibits some of the correct properties, but not all. In particular, the maximum value obtained as $d \to 0$ should be strongly dependent on $tf$; formula 4 does not have this property.
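To see this numerically, here is a minimal sketch. It assumes formula 4 has a length-normalised shape of the form $tf/(k_1 d/\Delta + tf)$; the constant $k_1$ and the average document length $\Delta$ used below are invented for illustration, not the paper's tuned values.

    # Sketch only: assume formula 4 has the length-normalised shape
    #   w(tf, d) = tf / (k1 * d / avdl + tf)
    # k1 and avdl (standing in for the average length Delta) are hypothetical.

    def formula4_weight(tf, d, k1=1.2, avdl=500.0):
        """Assumed term-frequency component of the weight for a document of length d."""
        return tf / (k1 * d / avdl + tf)

    # As d -> 0 the weight tends to 1 for every tf >= 1, so the maximum is
    # nearly independent of tf: the missing property noted above.
    for tf in (1, 2, 10):
        print(tf, [round(formula4_weight(tf, d), 3) for d in (500.0, 50.0, 5.0, 0.5)])

Under this assumed shape, the weight approaches 1 as $d \to 0$ for any $tf \geq 1$, whereas under the model the short-document maximum ought to grow with $tf$.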
B Extracts from a searcher's notes
Choice of search terms
Suitable words and phrases occurring in title, description, narrative, concept and definition fields were underlined; often this provided more than enough material to begin with. Sometimes they were supplemented by extra words, e.g. for a query on international terrorism I added "negotiate", "hostage", "hijack", "sabotage", "violence", "propaganda", as well as the names of known terrorist groups likely to fit the US bias of the exercise.
I did not look at reference books or other on-line databases, and tended to avoid very specific terms like proper names from the query descriptions, as I found they could lead the search astray. For instance, the 1986 Immigration Law was also known as the Simpson-Mazzoli Act, but the name Mazzoli also turned up in accounts of other pieces of legislation, so it was better to use a combination of "real" words about this topic.
In some queries, it was necessary to translate an abstract concept, e.g. "actual or alleged private sector economic consequences of international terrorism", into words which might actually occur in documents, e.g. "damage", "insurance claims", "bankruptcy", etc. For this purpose the use of a general (rather than domain-specific) thesaurus might be a useful adjunct to the system.
Like the other participants I was surprised at the contents of the stop-word list, e.g. "talks", "recent", "people", "new", but not "these"! However, it was usually possible to find synonyms for stop-words and their absence was not seriously detrimental to any query.
Grouping of terms, use of operators
Given the complexity of the queries, it was obviously necessary to build them up from smaller units. My original intention was to identify individual facets and create sets of single words representing each, then put them together to form the whole query. [...] For example, for a query about
the prevention of nuclear proliferation I had a set of "nuclear" words (reprocessing, plutonium, etc.), a set of "control" words (control, monitor, safeguards, etc.) and sets of words for countries (argentina, brazil, iraq, etc.) suspected of violating international regulations on this point. This proved a bad strategy: the large sets (whether ORed or BMed [7] together) had low weightings because of their collectively high frequencies, and the final query was very diffuse.
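The diffuseness has a simple arithmetic cause: presence weights fall as the number of documents a set matches grows. As a rough illustration only, the sketch below uses a standard Robertson/Sparck Jones-style weight with an invented collection size and document frequencies; it is not the paper's exact weighting function.

    import math

    N = 750_000  # hypothetical collection size

    def presence_weight(n):
        """RSJ-style presence weight for a term or set matching n of N documents."""
        return math.log((N - n + 0.5) / (n + 0.5))

    print(round(presence_weight(800), 2))      # tight phrase set: ~6.84
    print(round(presence_weight(150_000), 2))  # broad ORed facet: ~1.39

A small phrase set matches few documents and earns a high weight, while a facet of ORed common words matches a large fraction of the collection and contributes little.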
A more successful approach was to build several small, high-weighted sets using phrases with OP=ADJ or OP=SAMES[entence] (e.g. economic trends, gross national product, standard of living, growth rate, productivity gains), and then to BM them together, perhaps with a few extra singletons (e.g. decline, slump, recession). Because of the TREC guidelines, I didn't look at any documents for the small sets as I went along, although under normal circumstances I would have done so.
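For the combination step, a weighted best-match operation scores a document by the weights of whichever sets it matches; no set is mandatory. The following sketch illustrates only that idea, with invented set weights and a crude all-terms-present matching test that ignores adjacency; the actual default operation was BM15 (see footnote 7 and Section 2.6).

    def best_match_score(doc_terms, weighted_sets):
        """Sum the weights of the query sets fully present in the document."""
        return sum(w for terms, w in weighted_sets if terms <= doc_terms)

    sets = [
        ({"gross", "national", "product"}, 6.8),  # small ADJ phrase set
        ({"standard", "living"}, 5.9),            # another phrase set
        ({"recession"}, 3.1),                     # extra singleton
    ]
    doc = {"the", "gross", "national", "product", "fell", "sharply"}
    print(best_match_score(doc, sets))  # 6.8 -- only the first set matches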
Our initial instructions were to use default best-matching if at all possible, rather than explicit operators. As already suggested, ADJ and SAMES were an absolute necessity given the length of documents to be searched, but AND and OR were generally avoided; on the occasions when I tried AND (out of desperation) it was not particularly useful. For one query where I thought it might be necessary (to restrict a search to documents about the US economy) it luckily proved superfluous because of the biased nature of the database; indeed, it would have made the results worse, as the US context of these documents was implied rather than stated.
Viewing results, relevance feedback
Normally I looked at about the top 5-10 records from the first full query. If 40% or more seemed relevant, the query was considered to be fairly satisfactory and I went on down the list trying to accumulate a dozen or so records for the extraction phase. As ... noted by other participants, there was a conflict between judging a record relevant because it fitted the query, and because it was likely to yield useful new terms for the next phase. On the one hand were the "newsbyte" type of documents containing one clearly relevant paragraph amidst a great deal of potential noise, and on the other the documents which were in the right area, contained all the right words, but failed the more abstract exclusion conditions of the query. I tried to judge on query relevance, but erred on the side of permissiveness for documents containing the right sort of terms.
The competition conditions discouraged a really thorough exploration of possibilities when a query was not initially successful. In one very bad case, having seen more than 20 irrelevant records and knowing that they would appear at the head of my output list, I felt that the query would show up badly in the [results] anyway and that it was not worth exploring further, as I might have done had there been a real question to answer.
7BM = "best match"; the default weighted set combination
operation was BM15 (see Section 2.6)