SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
appendix
National Institute of Standards and Technology
Donna K. Harman
40+5+1+4x0.2=46.8, starting from text file.
c. is the process completely automatic? Yes, if sufficient RAM and disk space.
d. brief description of methods used
1. Process (old) collection A.
2. Process queries against collection A.
3. Process new collection B as if its documents were queries--to make use of
collection A statistics.
4. Combine queries, (old) dictionary and collection B into network for
retrieval.
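Steps 1 to 3 above can be sketched as follows. This is a minimal illustration, not the system's actual code: it assumes that "collection A statistics" means per-term document frequencies and that weights are tf x idf, neither of which the questionnaire states. The tokenizer, the function names, and the sample texts are all hypothetical. Step 4 (the retrieval network) is not modeled here.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace.
    return text.lower().split()

def document_frequencies(docs):
    # Step 1: process the (old) collection A and record, for each term,
    # the number of documents that contain it.
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df

def weight_with_old_stats(text, df, n_docs):
    # Steps 2 and 3: a query, or a new collection B document treated like
    # a query, is weighted using collection A statistics only.
    # Assumption: terms missing from the old dictionary are dropped.
    tf = Counter(t for t in tokenize(text) if t in df)
    return {t: f * math.log(1 + n_docs / df[t]) for t, f in tf.items()}

collection_a = ["oil prices rise", "oil exports fall", "grain prices fall"]
df_a = document_frequencies(collection_a)

query_vec = weight_with_old_stats("oil prices", df_a, len(collection_a))
new_doc_vec = weight_with_old_stats("crude oil shipments", df_a, len(collection_a))
```

In this sketch the new document keeps only the term "oil", because "crude" and "shipments" never occurred in collection A; whether the original system discarded or retained such terms is not stated.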
5. other data structures built from TREC text (what?)
1. Subdocument file
2. Coded file
3. Docid checking file
4. Termid checking file
5. Docnum file
6. Termnum (dictionary) file
7. Direct file
8. Index to direct file
9. Node file
10. Edge file
a. total amount of storage (megabytes)
1. 481     2. 324
3. 7       4. 4
5. 11      6. 6
7. 372     8. 19
9. 4x14    10. 4x9
System was developed for experimental research, with flexibility to generate
other data. Some of the files are not necessary for retrieval.
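As a quick check on the overall footprint, the per-file figures can be summed. The sketch below assumes each storage entry reads as a file number followed by its size in megabytes, and that "4x14" and "4x9" mean four times 14 and four times 9; both readings are interpretations of the scanned figures, and the variable names are hypothetical.

```python
# Storage in megabytes, keyed by the file numbers used in the list above.
# Files 9 and 10 are taken as 4 x 14 and 4 x 9 (an assumed reading).
storage_mb = {
    1: 481, 2: 324,
    3: 7, 4: 4,
    5: 11, 6: 6,
    7: 372, 8: 19,
    9: 4 * 14, 10: 4 * 9,
}

total_mb = sum(storage_mb.values())
print(total_mb)  # 1316
```

Under those assumptions the ten files come to 1316 megabytes, most of it in the subdocument, coded, and direct files.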
b. total computer time to build (approximate number of hours)
1. 1.5
2,3,4,5,6. 95
7,8. 11
9,10. 4x0.25=1
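The build times can be totalled in the same way. This sketch takes the figures as printed (1.5, 95, and 11 hours) and reads the last entry as 4 x 0.25 = 1 hour; the grouping labels are hypothetical, and the figures themselves come from a scanned page and may be imperfect.

```python
# Build time in hours, keyed by the groups of file numbers listed above.
build_hours = {
    "1": 1.5,
    "2-6": 95,
    "7-8": 11,
    "9-10": 4 * 0.25,  # assumed reading of the last entry
}

total_hours = sum(build_hours.values())
print(total_hours)  # 108.5
```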
c. is the process completely automatic?
Yes, if sufficient RAM and disk space. For this experiment, no.
if not, approximately how many hours of manual labor? 2
d. brief description of methods used
raw text --> subdocument file
subdocument --> coded file, docid file, termid file, docnum file, termnum
(dictionary) file.
Zipf-law program truncates dictionary via user-assigned limits.
Coded, termnum --> direct file with index
direct --> inverted file
direct, inverted --> node, edge files.
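The chain of files described above can be illustrated with in-memory structures. This is a sketch under several assumptions, not the system's implementation: the "user-assigned limits" are taken to be lower and upper bounds on collection frequency, the direct file is modeled as a document-to-term mapping, and the node and edge files are modeled as a simple document-term graph. All function names and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def build_dictionary(coded_docs, min_count, max_count):
    # Count each term's collection frequency, then keep only terms whose
    # frequency lies within the user-assigned limits (the truncation step).
    freq = Counter(t for terms in coded_docs.values() for t in terms)
    kept = sorted(t for t, c in freq.items() if min_count <= c <= max_count)
    return {term: termnum for termnum, term in enumerate(kept)}

def build_direct(coded_docs, dictionary):
    # Direct file: document number -> {term number: within-document count}.
    return {
        docnum: dict(Counter(dictionary[t] for t in terms if t in dictionary))
        for docnum, terms in coded_docs.items()
    }

def invert(direct):
    # Inverted file: term number -> {document number: count}.
    inverted = defaultdict(dict)
    for docnum, postings in direct.items():
        for termnum, count in postings.items():
            inverted[termnum][docnum] = count
    return dict(inverted)

def build_network(direct, inverted):
    # Assumed layout: one node per document and per term, one weighted
    # edge per (document, term) posting.
    nodes = [("doc", d) for d in sorted(direct)]
    nodes += [("term", t) for t in sorted(inverted)]
    edges = [(("doc", d), ("term", t), c)
             for d in sorted(direct)
             for t, c in sorted(direct[d].items())]
    return nodes, edges

coded_docs = {
    0: ["the", "oil", "price"],
    1: ["the", "oil", "export"],
    2: ["the", "grain", "price"],
}
dictionary = build_dictionary(coded_docs, min_count=2, max_count=2)
direct = build_direct(coded_docs, dictionary)
inverted = invert(direct)
nodes, edges = build_network(direct, inverted)
```

With limits of 2 and 2, the very frequent term "the" and the singletons "export" and "grain" fall outside the dictionary, leaving "oil" and "price" as term numbers 0 and 1.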
C. Data built from sources other than the input text
1. internally-built auxiliary files
a. domain independent or domain specific (if two separate files, please fill
out one set of questions for each file) phrase file
b. type of file (thesaurus, knowledge base, lexicon, etc.) word pair
c. total amount of storage (megabytes) 0.005
d. total number of concepts represented 396
f. total computer time to build (approximate number of hours)
0 (this is a file created via editor).