|
SCIENTIFIC REPORT NO. ISR-11
Information Storage and Retrieval
Table of Contents
|
|
Page
|
| SUMMARY
|
xiii
|
SECTION I
|
SALTON, G.: "The SMART System -- Retrieval Results and Future Plans"
|
|
1. Introduction |
I-1 |
|
2. Experimental Results |
I-3 |
|
3. Discussion and Future Plans |
I-5 |
SECTION II
|
LESK, M. E.: "Operating Instructions for the SMART Text Processing and Document Retrieval System"
|
|
1. Introduction |
II-1 |
|
|
1.1. Processing Summary |
II-2 |
|
|
1.2. Operating Process |
II-4 |
|
2. Basic Operating Procedures |
II-4 |
|
|
2.1. Run Outline |
II-4 |
|
|
2.2. Tape Setup |
II-6 |
|
|
2.3. Input Deck Setup |
II-7 |
|
3. Specification for the SMART Retrieval System |
II-7 |
|
|
3.1. Specifications Affecting Lookup |
II-9 |
|
|
3.2. Specifications Affecting Phrase Searching |
II-10 |
|
|
3.3. Vector Expansions by Means of Concept-Concept Correlation |
II-12 |
|
|
3.4. Vector Expansion by Means of Concept Hierarchies |
II-15 |
|
|
3.5. Vector Formation |
II-16 |
|
|
3.6. Request-Document Correlation |
II-17 |
|
|
3.7. Document-Document Expansion |
II-19 |
|
|
3.8. Other Specifications |
II-19 |
|
4. Data Input |
II-20 |
|
|
4.1. Natural Language Documents |
II-20 |
|
|
4.2. Binary Documents |
II-28 |
|
|
4.3. DOCTAPS |
II-29 |
|
|
4.4. Relevance Judgment Data |
II-29 |
|
|
4.5. Other Instruction Cards |
II-30 |
|
5. Tape Preparation Programs |
II-31 |
|
|
5.1. Writing A New Library Tape |
II-31 |
|
|
5.1.1. Thesaurus and Suffix List Formation |
II-33 |
|
|
5.1.2. Statistical Phrase Dictionary |
II-35 |
|
|
5.1.3. Syntactic Suffix List |
II-37 |
|
|
5.1.4. The Condensed Grammar File |
II-38 |
|
|
5.1.5. Criterion Tree File |
II-40 |
|
|
5.1.5.1. Criterion Tree Input Format |
II-41 |
|
|
5.1.6. Hierarchy |
II-50 |
|
|
5.2. The Document Tape |
II-51 |
|
|
5.3. The Program Tape |
II-52 |
|
6. Auxiliary Programs for Use with the SMART System |
II-52 |
|
|
6.1. THES |
II-54 |
|
|
6.2. MORVAL |
II-54 |
|
|
6.3. SØCCER |
II-55 |
|
7. A Sample Input Deck |
II-56 |
|
8. Miscellaneous |
II-59 |
|
|
8.1. Size Limits |
II-59 |
|
|
8.2. Timing |
II-59 |
|
9. Acknowledgments |
II-60 |
SECTION III
|
HOCHGESANG, G. T.: "SØCCER - A Concordance Program"
|
|
1. Introduction |
III-1 |
|
2. The Concordance |
III-2 |
|
|
A) Definitions |
III-2 |
|
|
B) The Input Text |
III-2 |
|
|
C) Processing the Text |
III-2 |
|
|
D) The Output Format |
III-6 |
|
3. Tape Usage |
III-6 |
|
|
A) Control Cards |
III-6 |
|
|
B) The INPUT, ØUTPUT, and SMRTAP Tapes |
III-6 |
|
|
C) Scratch Tapes |
III-7 |
|
4. Control Cards |
III-8 |
|
5. Examples of SØCCER Usage |
III-11 |
|
6. Subroutines used by SØCCER |
III-13 |
|
|
A) INØT |
III-13 |
|
|
B) SPECTR |
III-15 |
|
|
C) CLØCK |
III-17 |
|
7. Some Details about the SØCCER Program |
III-18 |
|
|
A) Source Deck Changes |
III-17 |
|
|
B) Timing |
III-18 |
|
Appendix |
III-19 |
SECTION IV
|
SALTON, G. and LESK, M. E.: "Information Analysis and Dictionary Construction"
|
|
1. Introduction |
IV-1 |
|
2. Language Analysis |
IV-2 |
|
3. Dictionary Construction |
IV-7 |
|
|
A) The Synonym Dictionary (Thesaurus) |
IV-7 |
|
|
B) The Null Thesaurus and Suffix List |
IV-15 |
|
|
C) The Phrase Dictionaries |
IV-21 |
|
|
D) The Concept Hierarchy |
IV-27 |
|
4. Dictionary Performance |
IV-27 |
|
|
A) The Null Thesaurus |
IV-33 |
|
|
B) The Regular Thesaurus |
IV-38 |
|
|
C) The Phrase Dictionary |
IV-42 |
|
5. Automatic Thesaurus Construction |
IV-44 |
|
|
A) Fully-Automatic Methods |
IV-48 |
|
|
B) Semi-Automatic Methods |
IV-50 |
|
|
C) Sample Thesaurus Generation |
IV-56 |
|
6. Semi-Automatic Hierarchy Formation |
IV-59 |
SECTION V
|
LESK, M. E. and SALTON, G.: "Design Criteria for Automatic Information Systems"
|
|
1. Introduction |
V-1 |
|
2. The SMART Experiments |
V-3 |
|
3. Evaluation Results and Design Criteria |
V-11 |
|
|
A) Indexing Depth and Document Length |
V-11 |
|
|
B) Synonym Recognition |
V-16 |
|
|
C) Phrase Processing |
V-19 |
|
|
D) Statistical Association Methods |
V-22 |
|
|
E) Hierarchical Subject Expansion |
V-28 |
|
|
F) Manual Indexing |
V-30 |
|
|
G) Iterative Searching |
V-32 |
|
|
H) Summary |
V-33 |
SECTION VI
|
RIDDLE, W., HORWITZ, T., and DIETZ, R.: "Relevance Feedback in an Information Retrieval System"
|
|
1. Introduction |
VI-1 |
|
2. Principal Methods |
VI-3 |
|
|
A) Determination of the Number of Documents Retrieved |
VI-5 |
|
|
B) The Effect of the Correlation Function |
VI-5 |
|
|
C) Determination of the Relevance Weighting Factors |
VI-6 |
|
|
D) Determination of the Value of [alpha] |
VI-8 |
|
|
E) Termination of the Modification Process |
VI-10 |
|
3. Experimental Results |
VI-10 |
|
4. Conclusions |
VI-12 |
|
Appendix A |
VI-16 |
|
Appendix B (by E. M. Keen) |
VI-19 |
SECTION VII
|
LESSER, V. R.: "A Modified Two-Level Search Algorithm Using Request Clustering"
|
|
1. Introduction |
VII-1 |
|
2. A Modified Clustering Algorithm and a Corresponding Two-Level Search Strategy |
VII-3 |
|
3. Advantages of the Query Clustering System |
VII-5 |
|
4. Design of an Experiment to Compare the Modified with the Normal Two-Level Search Scheme |
VII-7 |
|
|
A) Problem Areas |
VII-7 |
|
|
B) Tests to Compare the Effectiveness of Each Search Procedure |
VII-7 |
|
|
C) Implementation of the Normal and Modified Two-Level Search Schemes |
VII-9 |
|
|
D) Test Data Base |
VII-12 |
|
5. Design of an Experiment to Compare the Modified with the Normal Two-Level Search Scheme |
VII-13 |
|
|
A) Data Generated for Two-Level Search Algorithm |
VII-13 |
|
|
B) Data Generation for Modified Two-Level Search Algorithm |
VII-14 |
|
|
C) Experimental Evaluation |
VII-16 |
|
|
D) Evaluation Results |
VII-25 |
|
6. A New Criterion for Search Effectiveness |
VII-27 |
|
7. Conclusions |
VII-28 |
|
Appendix A |
VII-30 |
SECTION VIII
|
BLOMGREN, G., GOODMAN, A., and KELLY, L.: "An Experimental Investigation of Automatic Hierarchy Generation"
|
|
1. Introduction |
VIII-1 |
|
2. Automatic Construction of Hierarchies |
VIII-2 |
|
3. Outline of the Investigation |
VIII-12 |
|
Appendix A |
VIII-15 |
SECTION IX
|
BROFFIT, J. D., MORGAN, H. L., and SODEN, J. V.: "On Some Clustering Techniques for Information Retrieval"
|
|
1. Introduction |
IX-1 |
|
2. Similarity Measures |
IX-4 |
|
3. Rocchio's Procedure |
IX-5 |
|
4. Bonner's Procedure |
IX-7 |
|
5. The Experiment |
IX-10 |
|
6. Evaluation |
IX-11 |
|
7. Results and Conclusions |
IX-12 |
SECTION X
|
LESK, M. E.: "Design Considerations for Time Shared Automatic Documentation Centers"
|
|
1. Introduction |
X-1 |
|
2. Principles |
X-2 |
|
3. Methods |
X-5 |
|
4. Practicalities |
X-9 |
|
5. Conclusions |
X-17 |