Scientific Report No. ISR-11
Information Storage and Retrieval
Chapter: Information Analysis and Dictionary Construction
G. Salton, M. E. Lesk
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

[Figure: Concordance Excerpt, 1. A keyword-in-context listing; the context lines and occurrence counts are not legible in this copy. Legible keyword entries include SPECIFIED, SPECTRAL, SPLITTING, SQUARE, STAGE, STAGES, STARTING, STATE, STATES, and STATISTICAL, each accompanied by a "NUMBER OF OCCURRENCES" line.]