SP500207 NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1) Automatic Retrieval With Locality Information Using SMART chapter C. Buckley G. Salton J. Allan National Institute of Standards and Technology Donna K. Harman

[The opening sentences of this passage are illegible in the source.] This makes the assumption that the weight of a term in a document will be less than the weight of that term in the query. This is almost always true with [illegible] weights (except for very short documents).

[Table: results at cutoff values 5, 10, 15, 25, 50, 75, 100, 150, and 200 (Full); column headers and numeric entries are illegible in the source.]

As X decreases, retrieval effectiveness and CPU time decrease pretty smoothly, with retrieval effectiveness remaining reasonable for quite a long time.
Exactly which point is suitable for any particular application is determined by the relative priorities of efficiency and effectiveness.

Phrase runs

The basic adjacency phrase approach used by SMART is described in the official run portion of the paper. We've looked at other methods of handling phrases, but our other implementations were too slow to be of use with TREC. Adjacency phrases have the advantage of being fast, simple, and producing reasonable results. They have the disadvantage that some sort of filtering operation has to be performed to come up with a good phrase list; there are just too many pairs of terms to index all of them. For these runs, we used the criterion that the phrase had to occur more than 25 times in D1, the learning document set.

We had hoped that phrases would help substantially in TREC since, as the collection grows, the need to be more specific in the query grows, and phrases would be a good way of increasing precision. We got improvement, but it remained in the range of 5-8%, about what it is on the very small conventional test collections of the past. Perhaps other phrase approaches can do better.

Phrases are indexed with [illegible] weights, but the cosine normalization of the entire vector is done over the length of the single-term subvector only. This means that the single terms end up with exactly the same weight as they would if the entire collection was indexed with only single terms. Thus, phrases only increase similarity. This seems to be quite important for some collections, although not crucial for our phrase selection on TREC.
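The phrase filtering and normalization described in this section can be illustrated with a minimal Python sketch. It is not SMART's implementation. The function names (`adjacency_phrases`, `normalize_by_single_terms`, `similarity`) are hypothetical, documents are assumed to be already tokenized (and stemmed or stopword-filtered, if desired), the threshold is assumed to count total occurrences of an ordered adjacent pair across the learning set, and the raw term and phrase weights are taken as given inputs because the weighting scheme cannot be read from the source.

```python
import math
from collections import Counter


def adjacency_phrases(docs, min_count=25):
    """Return adjacent term pairs that occur MORE than min_count times.

    docs: iterable of token lists (assumed already preprocessed).
    Counting total occurrences of ordered pairs is an assumption.
    """
    counts = Counter()
    for tokens in docs:
        # zip(tokens, tokens[1:]) yields each pair of adjacent tokens.
        counts.update(zip(tokens, tokens[1:]))
    return {pair for pair, n in counts.items() if n > min_count}


def normalize_by_single_terms(term_w, phrase_w):
    """Cosine-normalize a vector using the single-term subvector length only.

    term_w, phrase_w: dicts of raw weights (hypothetical inputs).
    Single terms receive exactly the weights they would have had if
    phrases had never been indexed; phrases share the same divisor.
    """
    norm = math.sqrt(sum(w * w for w in term_w.values()))
    if norm == 0.0:
        return {}, {}
    terms = {t: w / norm for t, w in term_w.items()}
    phrases = {p: w / norm for p, w in phrase_w.items()}
    return terms, phrases


def similarity(query, doc):
    """Inner product over the single-term and phrase subvectors."""
    q_terms, q_phrases = query
    d_terms, d_phrases = doc
    score = sum(w * d_terms.get(t, 0.0) for t, w in q_terms.items())
    score += sum(w * d_phrases.get(p, 0.0) for p, w in q_phrases.items())
    return score
```

Because the divisor ignores the phrase components, a matching phrase can only add a non-negative contribution on top of the single-term similarity (given non-negative weights), which is the "phrases only increase similarity" property noted above.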