Contents

Abstract
1. Introduction   1
    1.1 Definitions and background   2
    1.2 Scope of this study   10
    1.3 Derivative vs. assignment indexing   13
2. Indexes compiled by machine   14
    2.1 Concordances and complete text processing   15
    2.2 Card catalogs, book catalogs, bibliographies and subject index listings prepared by machine   19
    2.3 Tabledex and other special purpose indexes   25
    2.4 Citation indexes   27
    2.5 Machine conversion from one index set to another   38
3. Indexes generated by machine - automatic derivative indexing   40
    3.1 KWIC indexes   40
        3.1.1 Applications of KWIC indexing techniques   41
        3.1.2 Advantages, disadvantages and operational problems of KWIC indexing   55
    3.2 Modified derivative indexing   68
        3.2.1 Title augmentation   68
        3.2.2 Book indexing by computer   71
        3.2.3 Modified derivative indexing - Baxendale's experiments   73
    3.3 Derivative indexing from automatic abstracting techniques   75
        3.3.1 Auto-condensation and auto-encoding techniques of H. P. Luhn   75
        3.3.2 Frequencies of word n-tuples - Oswald and others   79
        3.3.3 Relative frequency techniques - Edmundson and Wyllys and others   81
        3.3.4 Significant word distances   83
        3.3.5 Uses of special clues for selection   84
        3.3.6 Recent examples of mixed systems experimentation   86
    3.4 Quality of modified derivative indexing by machine   89
4. Automatic assignment indexing techniques   91
    4.1 Swanson and later work at Thompson Ramo Wooldridge   91
    4.2 Maron's automatic indexing experiments   93
    4.3 Automatic indexing investigations of Borko and Bernick   94
    4.4 Williams' discriminant analysis method   97
    4.5 SADSACT   98
    4.6 Assignment indexing from citation data   99
    4.7 Similarities and distinctions among assignment indexing experiments   100
    4.8 Other assignment indexing proposals   105
5. Automatic classification and categorization   106
    5.1 Factor analysis   108
    5.2 The theory of clumps   110
    5.3 Latent class analysis   113
    5.4 Examples of other proposed classificatory techniques   113
6. Other potentially related research   114
    6.1 Thesaurus construction, use and up-dating   114
    6.2 Statistical association techniques   118
        6.2.1 Devices to display associations: EDIAC   119
        6.2.2 Statistical association factors - Stiles   119
        6.2.3 The association map - Doyle and related work at SDC   122
        6.2.4 Work of Giuliano and associates, the ACORN devices   124
        6.2.5 Spiegel and others at Mitre Corporation   126
    6.3 Clues to index-term selection from automatic syntactic analysis   127
    6.4 Probabilistic indexing and natural language text searching   132
        6.4.1 Probabilistic indexing - Maron, Kuhns and Ray   133
        6.4.2 Natural language text searching - Swanson   134
        6.4.3 Full text searching - legal literature   135
    6.5 Other examples of related research in linguistic data processing   136
    6.6 Machine assistance in translations of subject content indications to special search and retrieval language   140
    6.7 Example of a proposed indexing-system utilizing related research techniques   142
7. Problems of evaluation   143
    7.1 Core problems   145
    7.2 Bases and criteria for evaluation of automatic indexing procedures   149
        7.2.1 The Cranfield project   150
        7.2.2 O'Connor investigations   151
        7.2.3 Questions of comparative costs   153
        7.2.4 Summary: potential advantages as bases for evaluation   156
    7.3 Findings with respect to inter-indexer and intra-indexer consistency   157
    7.4 Special factors and other suggested bases for evaluation   160
8. Operational considerations   164
    8.1 Questions of input   164
    8.2 Examples of processing considerations   168
    8.3 Output considerations   171
9. Conclusion: Appraisal of the state of the art in automatic indexing   173
Acknowledgments   182
Appendix A: List of references cited and selected bibliography   183
Appendix B: Progress and prospects in mechanized indexing   223
Appendix C: Selective bibliography of additional references   237