NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Proximity-Correlation for Document Ranking: The PARA Group's TREC Experiment
M. Zimmerman
National Institute of Standards and Technology
Donna K. Harman

# score documents delimited by <DOCNO> lines from TREC data
# for fourth 10 questions on TREC list
# 920527-29, 0601, 0602, 0603
#
# usage: awk -f TRECscore.Q.31-40.awk
# typically will want to do something like:
#   zcat wsj/19[illegible] | tr a-z A-Z | awk -f TRECscore.Q.31-40.awk >Q.31-40.TRECscores.out
# (maybe prefixed by nohup)
# then typically will want to do something like:
#   sort -n +1 TRECscores.out | tail -1000 >Q031.best
# etc....
#
# this program reads in format:
#   <DOCNO>
# from stdin and outputs scores for each document to stdout:
#   (score1) (score2) ... (score10)

BEGIN {
    m1 = 0; m2 = 0; m3 = 0; m4 = 0; m5 = 0;
    m6 = 0; m7 = 0; m8 = 0; m9 = 0; m10 = 0;
    s1a = 0; s1b = 0; s1c = 0;
    s2a = 0; s2b = 0;
    s3a = 0; s3b = 0; s3c = 0;
    s4a = 0; s4b = 0;
    s5a = 0; s5b = 0; s5c = 0; s5d = 0;
    s6a = 0; s6b = 0; s6c = 0;
    s7a = 0; s7b = 0;
    s8a = 0; s8b = 0; s8c = 0;
    s9a = 0; s9b = 0;
    s10a = 0; s10b = 0;
    docno = "<null>";
}

/<DOCNO>/ {
    printf("%-10s %5d %5d %5d %5d %5d %5d %5d %5d %5d %5d\n",
           docno, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10);
    docno = $2;
    m1 = 0; m2 = 0; m3 = 0; m4 = 0; m5 = 0;
    m6 = 0; m7 = 0; m8 = 0; m9 = 0; m10 = 0;
    s1a = 0; s1b = 0; s1c = 0;
    s2a = 0; s2b = 0;
    s3a = 0; s3b = 0; s3c = 0;
    s4a = 0; s4b = 0;
    s5a = 0; s5b = 0; s5c = 0; s5d = 0;
    s6a = 0; s6b = 0; s6c = 0;
    s7a = 0; s7b = 0;
    s8a = 0; s8b = 0; s8c = 0;
    s9a = 0; s9b = 0;
    s10a = 0; s10b = 0;
}

# topic 031 --- advantages of OS/2
/OS\/2/ { s1a += 5; }
/ADVANTAG|STRENGTH/ { s1b += 5; }
/WINDOWS|X.WINDOWS|DOS/ { s1c += 5; }
{
    s1 = s1a * s1b * s1c;
    if (s1 > m1) m1 = s1;
    s1a *= .9; s1b *= .9; s1c *= .9;
}

# topic 032 --- outsourcing computer work
/CONTRACT.OUT|OUTSOURCING/ { s2a += 5; }
/COMPUTER|DATA|NETWORK/ { s2b += 5; }
{
    s2 = s2a * s2b;
    if (s2 > m2) m2 = s2;
    s2a *= .9; s2b *= .9;
}

# topic 033 --- companies capable of producing document management systems
/DOCUMENT/ { s3a += 5; }
/MANAGEMENT|PROCESSING|AUTOMATION|OCR|OPTICAL CHARACTER RECOGNI/ { s3b += 5; }
/COMPANY|CORP|CO\.|INC\.|LTD\.|INCORPORATED/ { s3c += 5; }
{
    s3 = s3a * s3b * s3c;
    if (s3 > m3) m3 = s3;
    s3a *= .9; s3b *= .9;