SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
TREC-2 Document Retrieval Experiments using PIRCS
chapter
K. Kwok
L. Grunfeld
National Institute of Standards and Technology
D. K. Harman
the net are used to initialize the tree leaf nodes, from which
activation spreads to the query root node. Processing at the
nodes implements soft-Boolean evaluation [SaFW83] in the
DTQ direction, resulting in a third RSV: S1. The RSVs are
later combined for ranking retrieval outputs.

Fig.1: 3-Layer PIR Network

Fig.2: Soft-Boolean Query Network

3. System Design

Our previous software for TREC-1 has extraneous
processing that produces several intermediate files for other
purposes. These consume a lot of disk space for large-scale
collections. We spent a substantial amount of time
revamping our system, resulting in a more streamlined
flowchart as follows:
Preprocess Documents --> Create Direct Files --> Initiate Network --> Network Learning --> Retrieve and Rank --> Evaluate
Preprocess Queries ---|
Relevance File -->--------
In addition, we also took the opportunity to [...] our
system, resulting in several innovative [...] described
in the following subsections:
3.1 Full Inverted File Eliminated
It is useful to view a textbase as a document-by-term
matrix. If one stores the matrix rowwise, we call it a direct
file, in contrast to an inverted file, which stores the matrix
columnwise. The inverted file is used to support fast
retrieval, while the direct file is useful for feedback learning
and query expansion when certain documents are given as being
relevant to certain queries. In addition, the raw textfile is
useful for display purposes after a ranked retrieval list is
produced. If one assumes that each of these three files is
approximately N bytes in size, we need a minimum
of 3N bytes, which is quite substantial. Removing the raw
text may not win user support during display unless the
direct file encodes all stopwords, punctuation and
paragraph structure of the original. Most systems eliminate
the direct file and re-produce a subset of it from raw text
when needed. This results in a requirement of [...] bytes.
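The row-wise versus column-wise view described above can be illustrated with a minimal sketch. The toy documents and the function names `build_direct` and `invert` are assumptions made for illustration only; they are not taken from the PIRCS software, and real direct and inverted files would be stored on disk in a compact encoding rather than as in-memory dictionaries.

```python
from collections import Counter, defaultdict

# Toy textbase (hypothetical documents, for illustration only).
docs = {
    "d1": "network learning for retrieval",
    "d2": "inverted file supports fast retrieval",
    "d3": "direct file supports feedback learning",
}

def build_direct(docs):
    """Row-wise storage of the matrix: document -> {term: frequency}."""
    return {doc_id: dict(Counter(text.split())) for doc_id, text in docs.items()}

def invert(direct):
    """Column-wise storage (the transpose): term -> {document: frequency}."""
    inverted = defaultdict(dict)
    for doc_id, row in direct.items():
        for term, freq in row.items():
            inverted[term][doc_id] = freq
    return dict(inverted)

direct = build_direct(docs)
inverted = invert(direct)

# Retrieval reads a column: which documents contain a query term?
print(sorted(inverted["retrieval"]))  # ['d1', 'd2']

# Feedback reads a row: which terms occur in a document judged relevant?
print(sorted(direct["d3"]))  # ['direct', 'feedback', 'file', 'learning', 'supports']
```

Both structures hold the same matrix entries, which is why keeping the raw text, the direct file and the inverted file together costs roughly three times the space of any one of them.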
We choose, however, to keep the direct file and produce our
network with respect to the queries dynamically as needed,
without first producing an inverted file. Our network
actually contains both direct and `inverted' data. It resides
in memory to support learning and retrieval. By this