SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Proximity-Correlation for Document Ranking: The PARA Group's TREC Experiment
chapter
M. Zimmerman
National Institute of Standards and Technology
Donna K. Harman
# score documents delimited by <DOCNO> lines from TREC data
# for fourth 10 questions on TREC list
# 920527-29. 0601, 0602, 0603
# usage: awk -f TRECscore.Q.31-40.awk
# typically will want to do something like:
#   zcat ws1/l9[OCRerr]/= .1 | tr a-z A-Z | awk -f TRECscore.Q.31-40.awk > Q.31-40.TRECscores.out
# (maybe prefixed by nohup)
# then typically will want to do something like:
#   sort -n +1 TRECscores.out | tail -1000 > Q031.best
# etc....
# this program reads
# in format:
#   <DOCNO>
# from stdin and outputs scores for each document to stdout
#   [score1] [score2] ... [score10]
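BEGIN { m1 = 0;
        m2 = 0;
        m3 = 0;
        m4 = 0;
        m5 = 0;
        m6 = 0;
        m7 = 0;
        m8 = 0;
        m9 = 0;
        m10 = 0;
        s1a = 0;
        s1b = 0;
        s1c = 0;
        s2a = 0;
        s2b = 0;
        s3a = 0;
        s3b = 0;
        s3c = 0;
        s4a = 0;
        s4b = 0;
        s5a = 0;
        s5b = 0;
        s5c = 0;
        s5d = 0;
        s6a = 0;
        s6b = 0;
        s6c = 0;
        s7a = 0;
        s7b = 0;
        s8a = 0;
        s8b = 0;
        s8c = 0;
        s9a = 0;
        s9b = 0;
        s10a = 0;
        s10b = 0;
        docno = "<null>";
}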
/<DOCNO>/ { printf ("%-10s %5d %5d %5d %5d %5d %5d %5d %5d %5d %5d\n",
                docno, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10);
        docno = $2;
        m1 = 0;
        m2 = 0;
        m3 = 0;
        m4 = 0;
        m5 = 0;
        m6 = 0;
        m7 = 0;
        m8 = 0;
        m9 = 0;
        m10 = 0;
        s1a = 0;
        s1b = 0;
        s1c = 0;
        s2a = 0;
        s2b = 0;
        s3a = 0;
        s3b = 0;
        s3c = 0;
        s4a = 0;
        s4b = 0;
        s5a = 0;
        s5b = 0;
        s5c = 0;
        s5d = 0;
        s6a = 0;
        s6b = 0;
        s6c = 0;
        s7a = 0;
        s7b = 0;
        s8a = 0;
        s8b = 0;
        s8c = 0;
        s9a = 0;
        s9b = 0;
        s10a = 0;
        s10b = 0;
}
# topic 031 --- advantages of OS/2
/OS\/2/ { s1a += 5; }
/ADVANTAG|STRENGTH/ { s1b += 5; }
/WINDOWS|X.WINDOWS|DOS/ { s1c += 5; }
{ s1 = s1a * s1b * s1c;
  if (s1 > m1) m1 = s1;
  s1a *= .9;
  s1b *= .9;
  s1c *= .9; }
# topic 032 --- outsourcing computer work
/CONTRACT.OUT|OUTSOURCING/ { s2a += 5; }
/COMPUTER|DATA|NETWORK/ { s2b += 5; }
{ s2 = s2a * s2b;
  if (s2 > m2) m2 = s2;
  s2a *= .9;
  s2b *= .9; }
# topic 033 --- companies capable of producing document management systems
/DOCUMENT/ { s3a += 5; }
/MANAGEMENT|PROCESSING|AUTOMATION|OCR|OPTICAL CHARACTER RECOGNI/ { s3b += 5; }
/COMPANY|CORP|CO\.|INC\.|LTD\.|INCORPORATED/ { s3c += 5; }
{ s3 = s3a * s3b * s3c;
  if (s3 > m3) m3 = s3;
  s3a *= .9;
  s3b *= .9; }