SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
UCLA-Okapi at TREC-2: Query Expansion Experiments
chapter
E. Efthimiadis
P. Biron
National Institute of Standards and Technology
D. K. Harman
This process resulted in one-word query terms. When ap-
propriate, the procedure also output phrases by treating the
punctuation available in these fields as the phrase delim-
iter.
Queries were then generated automatically from the Ti-
tle and Concepts fields. Exactly the same queries were used
in the Routing and Ad hoc searches.
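As an illustration only, the phrase step might be sketched as follows. The sketch assumes that a field is a plain string and that the characters , ; : . ( ) act as delimiters. The paper does not list the delimiters actually used, and the names split_phrases and query_terms are hypothetical.

```python
import re

# Hypothetical delimiter set; the paper does not say which punctuation marks were used.
PHRASE_DELIMITERS = r"[,;:.()]"

def split_phrases(field_text):
    """Split a topic field into candidate phrases at punctuation marks."""
    parts = re.split(PHRASE_DELIMITERS, field_text)
    # Normalise whitespace and case, then drop empty fragments.
    phrases = [" ".join(p.lower().split()) for p in parts]
    return [p for p in phrases if p]

def query_terms(field_text):
    """Return (single-word terms, multi-word phrases) for one field."""
    phrases = split_phrases(field_text)
    words = [w for p in phrases for w in p.split()]
    multiword = [p for p in phrases if " " in p]
    return words, multiword
```

For example, query_terms("Oil spill, Exxon Valdez; cleanup") yields the five single words together with the phrases "oil spill" and "exxon valdez".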
3.4 Term selection for query expansion
a) Routing searches: Query expansion in the routing
searches was performed through query modification
without relevance information. As indicated in the
table describing the construction of the runs in the
methodology section, the number of top-ranked docu-
ments used ranged from 0 to 20, in increments of 5.
These top-ranked documents were treated as relevant
and were analyzed in order to provide terms for the
expansion. Expansion terms were selected by pooling
all the terms and then weighting them with one of the
five ranking algorithms, as specified by the run. The
top 10, 20 or 30 terms were then added to the original
query terms and searched.
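The routing procedure described above can be sketched as below. This is a minimal sketch, not the OKAPI implementation: documents are assumed to be lists of already-extracted terms, the function name expand_query is hypothetical, and score is a stand-in for whichever of the five ranking algorithms a run specifies.

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, top_docs, top_terms, score):
    """Expand a query without relevance information.

    ranked_docs: documents in rank order (best first), each a list of terms.
    top_docs:    how many top-ranked documents to treat as relevant (0, 5, ..., 20).
    top_terms:   how many expansion terms to add (10, 20 or 30).
    score:       callable(term, r, R) -> float; a placeholder ranking algorithm.
    """
    assumed_relevant = ranked_docs[:top_docs]
    R = len(assumed_relevant)
    # r = number of assumed-relevant documents that contain each term.
    r = Counter(t for doc in assumed_relevant for t in set(doc))
    candidates = [t for t in r if t not in query_terms]
    # Highest score first; ties are broken alphabetically for reproducibility.
    candidates.sort(key=lambda t: (-score(t, r[t], R), t))
    return list(query_terms) + candidates[:top_terms]
```

With top_docs set to 0 the pool is empty and the original query is returned unchanged, which corresponds to the unexpanded baseline.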
b) Ad hoc searches: The term pool consisted of all the
terms of the documents judged as relevant. For the
official Ad hoc searches with relevance feedback, the
top 10 terms as ranked by wpq were chosen for
expansion and were searched together with the initial
query terms.
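This section does not define wpq. The sketch below assumes the form commonly attributed to Robertson's work on term selection, wpq = w (p - q), where w is the relevance weight with 0.5 corrections, p = r/R and q = (n - r)/(N - R). The function names and the layout of stats are hypothetical, and the OKAPI implementation may differ in detail.

```python
import math

def wpq(r, n, R, N):
    """r: relevant documents containing the term; n: documents containing the term;
    R: relevant documents seen; N: documents in the collection."""
    # Relevance weight with 0.5 corrections to avoid division by zero.
    w = math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                 ((n - r + 0.5) * (R - r + 0.5)))
    p = r / R               # estimated probability of the term in a relevant document
    q = (n - r) / (N - R)   # estimated probability of the term in a non-relevant document
    return w * (p - q)

def top_terms_by_wpq(stats, R, N, k=10):
    """stats maps term -> (r, n); returns the k highest-scoring terms."""
    return sorted(stats, key=lambda t: -wpq(stats[t][0], stats[t][1], R, N))[:k]
```

Under this assumed definition, a term that occurs in every relevant document but in few other documents outranks both a very common term and a term seen in only one relevant document.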
c) Rules for term selection: The following rules were
followed for the inclusion or exclusion of a term during
selection for query expansion:
a) numbers were excluded as terms,
b) all terms whose frequency (n) is less than or equal to
the number of relevant documents seen (R), i.e., if
n <= R, were excluded.
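The two rules can be expressed as a simple filter. This is a sketch under stated assumptions: a "number" is taken to mean a term made up only of digits, the frequency condition is applied as the formula n <= R given in the text, and the function name eligible_terms is hypothetical.

```python
def eligible_terms(term_freq, R):
    """term_freq maps term -> n (frequency of the term); R is the number of
    relevant documents seen. Returns the terms that survive both rules."""
    kept = []
    for term, n in term_freq.items():
        if term.isdigit():   # rule a): numbers are excluded (assumed: digit-only strings)
            continue
        if n <= R:           # rule b): condition taken from the text as n <= R
            continue
        kept.append(term)
    return kept
```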
3.5 Search procedure
All searches, Routing and Ad hoc, were automatic and de-
termined by the specifications made for each run. There
were no manual searches.
3.5.1 Ad hoc searches and searchers
For the Ad hoc searches with relevance feedback, i.e.
uclafi (official results), relevance assessments were
provided by two searchers. One searcher assessed the
odd-numbered topics and the other assessed the
even-numbered topics.
3.5.2 Relevance assessments
During the Ad hoc searches, the guidelines for relevance
judgements were:
a) Review the entire document when judging relevance,
even if it seems to be peripheral or not relevant. Many
of the articles were found to be collections of brief
news stories, with the relevant part hidden in the
middle or at the end of the text.
b) Aim for 10 relevant documents; stop as soon as 10 are
found or at the 20th document. However, if 3 relevant
documents have not been found by then, continue until
3 are found (OKAPI will not perform an expansion with
fewer than 3 relevant documents).
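The stopping rule in guideline b) can be written out as a short function. The sketch assumes the judgements are available as a list of booleans in rank order; the function name documents_to_assess and its parameter names are hypothetical.

```python
def documents_to_assess(judgements, target=10, cutoff=20, minimum=3):
    """judgements: booleans in rank order (True = judged relevant).
    Returns how many documents a searcher examines under guideline b)."""
    relevant = 0
    for rank, is_relevant in enumerate(judgements, start=1):
        relevant += is_relevant
        if relevant >= target:
            return rank      # 10 relevant documents found: stop at once
        if rank >= cutoff and relevant >= minimum:
            return rank      # reached the 20th document with at least 3 relevant
    return len(judgements)   # ranked list exhausted before a stopping condition held
```

A topic with only three relevant documents, all near the top, stops at the 20th document; a topic whose third relevant document appears at rank 27 is assessed down to rank 27.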
3.5.3 Ad hoc additional runs
Following the TREC conference, a set of runs was con-
ducted on the Ad hoc queries in order to complete the eval-
uation of the five ranking algorithms for query expansion
that were studied.
The relevance judgements made in the Ad hoc run
uclafi (fdbk.bmlS.phb.qey:wpq-10-10.uclagsly) were ex-
tracted and used in the subsequent runs. The process fol-
lowed in these additional runs is described below:
* Four new Ad hoc runs were done, one for each of the
remaining algorithms which were used for the ranking
of terms for query expansion, i.e., emim, porter, r-hilo,
r-lohi.
* The same initial query, which was generated automat-
ically, was used for all searches.
* The relevance judgements made in the initially re-
trieved set of the official Ad hoc run were extracted
and then simulated in the additional runs.
* Query expansion terms were ranked using the algo-
rithm that was designated by each run. The 10 top
ranked terms from the pool were added to the query.
3.6 Problems & Limitations
Lack of equipment was a major problem in our partici-
pation. To enable us to participate in TREC, SUN Mi-
crosystems provided an equipment grant (SUN Sparc-2)
in March; however, no disk was available initially, and a
1-Gigabyte disk was not acquired until June. Consequently,
only the Ad hoc runs were included in the official results.