SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Retrieval Experiments with a Large Collection using PIRCS
chapter
K. Kwok
L. Papadopoulos
K. Kwan
National Institute of Standards and Technology
Donna K. Harman
if not, approximately how many hours of manual labor?
0.5
d. are term positions within documents stored?
NO, BUT SENTENCE YES.
YES, EXCEPT FOR I.A.14.
NO
f. single terms only?
2. clusters
a. total amount of storage (megabytes)
b. total computer time to build (approximate number of hours)
c. brief description of clustering method
d. is the process completely automatic?
if not, approximately how many hours of manual labor?
3. n-grams, suffix arrays, signature files NO
a. total amount of storage (megabytes)
b. total computer time to build (approximate number of hours)
c. brief description of methods used
d. is the process completely automatic?
if not, approximately how many hours of manual labor?
4. knowledge bases
a. total amount of storage (megabytes)
b. total number of concepts represented
c. type of representation (frames, semantic nets, rules, etc.)
d. total computer time to build (approximate number of hours)
e. total manual time to build (approximate number of hours)
f. use of manual labor
(1) mostly manually built using special interface
(2) mostly machine built with manual correction
(3) initial core manually built to "bootstrap" for
completely machine-built completion
(4) other (describe)
g. auxiliary files needed for machine use
(1) machine-readable dictionary (which one?)
(2) other (identify)
5. special routing structures (what?) SEE I.B.6
NETWORK NODE, EDGE FILES.
ROUTING USING NETWORK NODE AND EDGE FILES IS STRAIGHTFORWARD.
NO
a. total amount of storage (megabytes)
NODE FILE: 4x7.5 EDGE FILE: 4x4
NETWORK SEGMENTED INTO 4, BECAUSE OF INSUFFICIENT RAM.
b. total computer time to build (approximate number of hours)
40+5+1+4x0.2=46.8, STARTING FROM TEXT FILE.
c. is the process completely automatic? YES, IF SUFFICIENT
RAM AND DISK SPACE.
d. brief description of methods used
1. PROCESS (OLD) COLLECTION A.
2. PROCESS QUERIES AGAINST COLLECTION A.
3. PROCESS NEW COLLECTION B AS IF THEY WERE QUERIES TO MAKE USE OF COLLECTION
A STATISTICS.
4. COMBINE QUERIES, (OLD) DICTIONARY AND COLLECTION B INTO NETWORK FOR
RETRIEVAL.
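
The four steps above can be illustrated with a minimal Python sketch. It is not the PIRCS implementation: the function names, the toy data, and the smoothed inverse-document-frequency weight are all assumptions made for illustration, standing in for whatever term statistics and network weights the system actually uses. The only idea taken from the description is that statistics come from the old collection A, that new collection B is weighted with those statistics rather than its own, and that queries, dictionary terms and B documents are combined into one node-and-edge structure. Segmenting the network into 4 parts is not modelled.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lower-case and split on whitespace.
    return text.lower().split()


def collection_stats(docs):
    """Step 1: collect document frequencies from the old collection A."""
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    return df, len(docs)


def weight(term, df, n_docs):
    # Assumed stand-in weight: smoothed idf computed from collection A only.
    return math.log((n_docs + 1) / (df.get(term, 0) + 0.5))


def build_network(queries, new_docs, df, n_docs):
    """Steps 2-4: link queries and collection B documents to term nodes,
    weighting every edge with collection A statistics."""
    nodes = {"query": set(queries), "term": set(), "doc": set(new_docs)}
    edges = []  # (kind, item_id, term, weight)
    for kind, items in (("query", queries), ("doc", new_docs)):
        for item_id, text in items.items():
            for term in set(tokenize(text)):
                nodes["term"].add(term)
                edges.append((kind, item_id, term, weight(term, df, n_docs)))
    return nodes, edges


def route(edges):
    """Score each B document against each query over their shared terms."""
    q_terms, d_terms = defaultdict(dict), defaultdict(dict)
    for kind, item_id, term, w in edges:
        (q_terms if kind == "query" else d_terms)[item_id][term] = w
    return {
        q: sorted(
            ((sum(w for t, w in qt.items() if t in dt), d)
             for d, dt in d_terms.items()),
            reverse=True,
        )
        for q, qt in q_terms.items()
    }


# Toy data, invented for this example.
collection_a = {"a1": "oil prices rise", "a2": "oil exports fall",
                "a3": "bank rates rise"}
queries = {"q1": "oil prices"}
collection_b = {"b1": "oil prices drop sharply", "b2": "bank merger announced"}

df, n_docs = collection_stats(collection_a)
nodes, edges = build_network(queries, collection_b, df, n_docs)
scores = route(edges)
```

In this sketch the node sets and the edge list play the role of the node and edge files; once they exist, routing reduces to a traversal of the edges, which is consistent with the remark that routing with these files is straightforward.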
6. other data structures built from TREC text (what?)