SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman
System Summary and Timing
Queens College, CUNY
General Comments
The timings should be the time to replicate runs from scratch, not including trial runs, etc. The times should also
be reasonably accurate. This will sometimes be difficult, such as getting the total time for document indexing of huge
text sections, or for manually building a knowledge base. Please do your best.
I. Construction of indices, knowledge bases, and other data structures (please describe all data structures that
your system needs for searching)
A. Which of the following were used to build your data structures?
1. stopword list yes
a. how many words in list? 595
2. is a controlled vocabulary used? no
3. stemming
a. standard stemming algorithms yes
which ones? Porter's Algorithm
b. morphological analysis no
4. term weighting yes
5. phrase discovery no
6. syntactic parsing no
7. word sense disambiguation no
8. heuristic associations no
9. spelling checking (with manual correction) no
10. spelling correction no
11. proper noun identification algorithm no
12. tokenizer (recognizes dates, phone numbers, common patterns) no
13. are the manually-indexed terms used? no
14. other techniques used to build data structures (brief description)
A table of 396 manually created 2-word phrases. When these are identified in
adjacent positions in documents or queries, they are used as additional index terms.
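The term-building options answered "yes" above (a stopword list, standard stemming, and a manual table of two-word phrases used as extra index terms) can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the Queens College code: the short `STOPWORDS` set is a hypothetical stand-in for the 595-word list, `PHRASES` is a hypothetical two-entry stand-in for the 396-phrase table, `crude_stem` is a deliberately simplified suffix stripper standing in for Porter's algorithm (it is not Porter's algorithm), and checking phrase adjacency after stopword removal and stemming is an assumption.

```python
import re

# Hypothetical stand-in for the 595-word stopword list (only a few entries shown).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is"}

# Hypothetical stand-in for the table of 396 manually created 2-word phrases,
# stored as pairs of already-stemmed terms.
PHRASES = {("oil", "spill"), ("interest", "rate")}


def crude_stem(word: str) -> str:
    """Placeholder suffix stripper; this is NOT Porter's algorithm."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("ed") and len(word) > 4:
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def index_terms(text: str) -> list[str]:
    """Return the index terms for a document or query."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Drop stopwords first, then stem what remains.
    stems = [crude_stem(t) for t in tokens if t not in STOPWORDS]
    terms = list(stems)
    # Adjacent stem pairs found in the phrase table become additional terms;
    # the single-word terms are kept as well.
    for left, right in zip(stems, stems[1:]):
        if (left, right) in PHRASES:
            terms.append(f"{left} {right}")
    return terms


print(index_terms("The oil spills affected interest rates"))
# ['oil', 'spill', 'affect', 'interest', 'rate', 'oil spill', 'interest rate']
```

Because the phrases are appended rather than substituted, a query still matches documents on the individual words, which is consistent with the phrases being "additional" index terms.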
B. Statistics on data structures built from TREC text (please fill out each applicable section)
1. inverted index
a. total amount of storage (megabytes) 378
b. total computer time to build (approximate number of hours)
95+11+2=108 for 500MB, clock time
c. is the process completely automatic? Yes, if there is sufficient disk space. Not in this experiment.
if not, approximately how many hours of manual labor? 0.5
d. are term positions within documents stored?
No, but sentence positions are. The system can be modified to capture word positions.
e. single terms only? Yes, except for I.A.14.
4. special routing structures (what?) See I.B.5
Network node and edge files. Routing using the network node and edge files is
straightforward.
a. total amount of storage (megabytes)
Node file: 4 x 7.5   Edge file: 4 x 4
Network segmented into 4 because of insufficient RAM.
b. total computer time to build (approximate number of hours)
472
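The structures listed in I.B can be sketched as an inverted index whose postings record the sentence (not the word position) in which each term occurs, plus a term-document edge list derived from it and split into four segments so that each piece fits in memory. This is a minimal sketch under stated assumptions, not the Queens College implementation: the edge weight (within-document term frequency), the assignment of documents to segments by `doc_id % NUM_SEGMENTS`, and the naive sentence splitter and tokenizer are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict

NUM_SEGMENTS = 4  # the network was split into 4 pieces for lack of RAM


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter (assumption): break after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[tuple[int, int]]]:
    """Map each term to (doc_id, sentence_no) postings; word positions are not kept."""
    index: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for doc_id in sorted(docs):
        for sent_no, sentence in enumerate(split_sentences(docs[doc_id])):
            for term in re.findall(r"[a-z0-9]+", sentence.lower()):
                index[term].append((doc_id, sent_no))
    return dict(index)


def build_edge_segments(index: dict[str, list[tuple[int, int]]]) -> list[list[tuple[str, int, int]]]:
    """Derive (term, doc_id, weight) edges, weighting by term frequency
    (hypothetical weight), and place each edge in a segment chosen by doc_id."""
    segments: list[list[tuple[str, int, int]]] = [[] for _ in range(NUM_SEGMENTS)]
    for term in sorted(index):
        tf: dict[int, int] = defaultdict(int)
        for doc_id, _ in index[term]:
            tf[doc_id] += 1
        for doc_id in sorted(tf):
            segments[doc_id % NUM_SEGMENTS].append((term, doc_id, tf[doc_id]))
    return segments


docs = {1: "Oil prices rose. Oil spills hurt.", 2: "Rates fell."}
idx = build_inverted_index(docs)
print(idx["oil"])  # [(1, 0), (1, 1)]
segs = build_edge_segments(idx)
print(segs[2])  # [('fell', 2, 1), ('rates', 2, 1)]
```

Storing sentence numbers rather than word offsets mirrors the answer to I.B.1.d; capturing word positions would only require recording each token's offset within the document instead of its sentence number.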