SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Automatic Retrieval With Locality Information Using SMART
chapter
C. Buckley
G. Salton
J. Allan
National Institute of Standards and Technology
Donna K. Harman
If X good documents are to be guaranteed out of 200, then [OCRerr] the decision procedure
invoked before each term A of the query has been done is testing
(S(D_200) [OCRerr] < S(D_X))
query. This makes the assumption that the weight of a term in a document will be less than the
weight of that term in the query. This is almost always true with ntc weights (except for very short
documents).
            Time      Time      Ret       at 200
1           71        [OCRerr]  [OCRerr]  1575      [OCRerr]
5           [OCRerr]  [OCRerr]  2911      [OCRerr]  2391/3097
10          [OCRerr]  [OCRerr]  2911      [OCRerr]  [OCRerr]
15          97        [OCRerr]  [OCRerr]  [OCRerr]  [OCRerr]
25          [OCRerr]  [OCRerr]  3017      1721      [OCRerr]
50          123       255       3031      [OCRerr]  [OCRerr]
75          [OCRerr]  291       [OCRerr]  [OCRerr]  [OCRerr]
100         167       317       [OCRerr]  [OCRerr]  [OCRerr]
150         221       159       [OCRerr]  [OCRerr]  [OCRerr]
200 (Full)  371       721       3111      [OCRerr]  [OCRerr]
As X decreases, retrieval effectiveness and CPU time decrease pretty smoothly, with retrieval
effectiveness remaining reasonable for quite a long time. Exactly which point is suitable for any
particular application is determined by the relative priorities of efficiency and effectiveness.
Phrase runs
The basic adjacency phrase approach used by SMART is described in the official run portion of the
paper. We've looked at other methods of finding phrases, but our other implementation efforts were
too slow to be of use with TREC. Adjacency phrases have the advantage of being fast, simple, and
producing reasonable results. They have the disadvantage that some sort of filtering operation has
to be performed to come up with a good phrase list; there are just too many pairs of terms to
index all of them. For these runs, we used the criterion that the phrase had to occur more than 25
times in D1, the learning document set.
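
The filtering step just described can be sketched in a few lines. This is only an illustration under stated assumptions, not SMART's implementation: the function name adjacency_phrases and its parameters are hypothetical, documents are assumed to be already tokenized, and "occur more than 25 times" is read here as total occurrences across the learning set rather than the number of documents containing the pair.

```python
from collections import Counter


def adjacency_phrases(docs, min_count=25):
    """Return adjacent term pairs seen more than min_count times.

    docs is an iterable of token lists (hypothetical input format).
    Counting is over all occurrences in the collection, which is an
    assumption about how the threshold was applied.
    """
    counts = Counter()
    for tokens in docs:
        # Every pair of neighbouring tokens is one candidate phrase.
        counts.update(zip(tokens, tokens[1:]))
    # Keep only pairs that occur strictly more than the threshold.
    return {pair for pair, n in counts.items() if n > min_count}


if __name__ == "__main__":
    toy_docs = [
        ["information", "retrieval", "system"],
        ["information", "retrieval"],
        ["retrieval", "system"],
    ]
    # A threshold of 1 is used only because the toy collection is tiny.
    print(sorted(adjacency_phrases(toy_docs, min_count=1)))
```

Without such a threshold the number of candidate pairs grows with every distinct adjacent pair in the collection, which is the indexing cost the paragraph above refers to.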
We had hoped that phrases would help substantially in TREC since, as the collection grows,
the need to be more specific in the query grows and phrases would be a good way of increasing
precision. We got improvement, but it remained in the range of 5-8%, about where it is on the very
small conventional test collections of the past. Perhaps other phrase approaches can do better.
Phrases are indexed with ntc weights, but the cosine normalization of the entire vector is done
over the length of the single term subvector only. This means that the single terms end up with
exactly the same weight as they would if the entire collection was indexed with only single terms.
Thus, phrases only increase similarity. This seems to be quite important for some collections,
although not crucial for our phrase selection on TREC.
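
The normalization described above can be sketched as follows. This is a minimal illustration rather than SMART's code: the function name normalize_with_phrases and the dictionary representation are assumptions, and the input weights stand for unnormalized term weights that have already been computed.

```python
import math


def normalize_with_phrases(single_weights, phrase_weights):
    """Cosine-normalize a vector using the single-term subvector length only.

    Both arguments map a term (or phrase) to its unnormalized weight.
    The names and data layout are hypothetical, chosen for illustration.
    """
    # The length is computed from the single terms alone; phrases are ignored.
    norm = math.sqrt(sum(w * w for w in single_weights.values()))
    if norm == 0.0:
        # Nothing to normalize by; return the weights unchanged.
        return dict(single_weights), dict(phrase_weights)
    singles = {t: w / norm for t, w in single_weights.items()}
    # Phrase weights are divided by the same single-term length.
    phrases = {p: w / norm for p, w in phrase_weights.items()}
    return singles, phrases


if __name__ == "__main__":
    singles, phrases = normalize_with_phrases(
        {"information": 3.0, "retrieval": 4.0},
        {"information retrieval": 2.0},
    )
    print(singles)
    print(phrases)
```

Because the divisor ignores the phrase weights, the single-term part of the result is the same whether or not any phrases are supplied, so a matching phrase can only add to the inner-product similarity.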