NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Text Retrieval with the TRW Fast Data Finder
M. Mettler
National Institute of Standards and Technology
Donna K. Harman
4.3 Failure Analysis - Topic 36
TABLE II - Subterms used for Topic 36
"amorphous";
"[ISO CCITT]";
"Kerr effect";
"SCSI";
"bias";
"binary";
"capacity";
"chemical";
"states";
"cycles";
"density";
"spatial";
"High Sierra";
"dye[ |\-]polymer";
"Curie temperature";
"gadolinium";
"lanthanide";
"birefringence";
"emerging technolog";
"erasable";
"fatigue";
"field";
"[frequency|Mhz]";
"inductance";
"[jukebox autochanger]";
"laser";
"crystalline";
"operation";
"phenomenon";
"polarit";
"polarized";
"principle";
"reflect";
"[sector|track|cylinder]";
"[silver gold]";
"Oersted";
"surface reflectance";
"phase[ |\-]change";
"thin film";
"terbium";
"magnetization";
"substrate";
"speed";
"transfer";
"transluscent";
"Winchester";
"[mega[ |]byte [^[az]]MB[^[az]]]";
"[giga[ |]byte [^[az]]GB[^[az]]]";
"magneto[ |\-]optical";
"media";
"magnet";
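Many of the subterms in Table II are word stems ("polarit", "emerging technolog") or bracketed lists of alternatives, so one subterm can match several surface forms. The Python sketch below approximates a few of them with ordinary regular expressions and reports which ones occur in a piece of text. It assumes that brackets list alternatives and that `\-` denotes a literal hyphen; the labels, the dictionary `SUBTERM_PATTERNS`, and the function `matched_subterms` are hypothetical, and these regexes are not the FDF pattern language itself.

```python
import re

# Hypothetical regex approximations of a few Table II subterms.
# Assumption: brackets list alternatives and "\-" is a literal hyphen.
# Matching is case-sensitive here; the FDF's case handling is not assumed.
SUBTERM_PATTERNS = {
    "amorphous": r"amorphous",
    "Kerr effect": r"Kerr effect",
    "dye polymer": r"dye[ \-]polymer",          # "dye polymer" or "dye-polymer"
    "phase change": r"phase[ \-]change",        # "phase change" or "phase-change"
    "magneto-optical": r"magneto[ \-]optical",
    "polarit": r"polarit",                      # stem: polarity, polarities
    "sector/track/cylinder": r"sector|track|cylinder",
}


def matched_subterms(text: str) -> set[str]:
    """Return the labels of subterms whose regex occurs anywhere in text."""
    return {label for label, pattern in SUBTERM_PATTERNS.items()
            if re.search(pattern, text)}


doc = ("A magneto-optical disk reads bits via the Kerr effect, while phase "
       "change media switch between amorphous and crystalline states.")
print(sorted(matched_subterms(doc)))
```

Because each subterm is searched as a substring, a stem such as "polarit" matches "polarity" but not "polarization"; choosing stems is therefore part of tuning recall.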
Unfortunately, even our high-recall query retrieved only 11 documents in the Volume II
corpus, of which 10 were judged relevant. (The 11th discussed WORM technology
and mentioned "rewritable optical drives" only in passing.)
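The figures above fix the precision of the high-recall run; recall cannot be computed from them alone, because the total number of relevant Topic 36 documents in the corpus is not given here. A minimal Python sketch of the arithmetic (the function name `precision` is illustrative):

```python
def precision(relevant_retrieved: int, retrieved: int) -> float:
    """Fraction of the retrieved documents that were judged relevant."""
    if retrieved == 0:
        raise ValueError("no documents retrieved")
    return relevant_retrieved / retrieved


# Topic 36, high-recall query on the Volume II corpus: 10 of 11 judged relevant.
p = precision(10, 11)
print(f"precision = {p:.3f}")  # precision = 0.909
```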
Examining the NIST relevance judgements, we made two observations about the
relevant documents. First, we had missed "erasable" as a synonym for
"rewritable" and "video" as a synonym for "optical". Second, the assessor accepted articles
about "optical recorders" and "optical image processing" systems. To capture these
documents, we would change the optical_disk subquery to read as follows:
define optical_disk [10 words ->
"[rewrit|erasable]" and
"[video|optical]" and
"[disk|drive|technolog|recorder|image processing]"] end
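The revised subquery requires a term from each of three groups to occur close together. The Python sketch below is a software approximation, not the FDF hardware matcher: it assumes that "[10 words -> A and B and C]" means all three groups must match within some window of 10 consecutive words, and it treats each alternative as a case-insensitive substring so that stems such as "rewrit" and "technolog" match "rewritable" and "technology". The names `GROUPS`, `WINDOW`, and `window_matches` are hypothetical.

```python
import re

# Hypothetical encoding of the revised optical_disk subquery: one list per
# bracketed group; every group needs at least one alternative in the window.
GROUPS = [
    ["rewrit", "erasable"],
    ["video", "optical"],
    ["disk", "drive", "technolog", "recorder", "image processing"],
]
WINDOW = 10  # assumed meaning of "10 words": a span of 10 consecutive words


def window_matches(text: str, groups: list[list[str]] = GROUPS,
                   window: int = WINDOW) -> bool:
    """True if some run of `window` consecutive words satisfies every group."""
    words = re.findall(r"\S+", text.lower())
    # Slide the window one word at a time; short texts form a single window.
    for start in range(max(1, len(words) - window + 1)):
        span = " ".join(words[start:start + window])
        # Substring test, so stems and the phrase "image processing" both work.
        if all(any(alt in span for alt in group) for group in groups):
            return True
    return False


print(window_matches("Sony demonstrated a rewritable video recorder."))
```

A substring test on the joined window keeps multi-word alternatives like "image processing" simple, at the cost of also matching inside longer words (for example "disk" inside "diskette"), which is consistent with how the stems in Table II are meant to behave.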
We then removed the length restriction and reran the query, requiring different numbers of
the technical terms to be present. The results of these runs are shown in Table III, which
shows two things. First, for this topic, the number of required technical terms is an excellent
"knob" for trading precision against recall. Second, the assessor was interpreting
"comprehensive technical detail" loosely. If we had completely ignored this part of