
SCIENTIFIC REPORT NO. ISR-11

Information Storage and Retrieval

Table of Contents







Page
SUMMARY xiii


SECTION I

SALTON, G.: "The SMART System -- Retrieval Results and Future Plans"

1. Introduction I-1
2. Experimental Results I-3
3. Discussion and Future Plans I-5


SECTION II

LESK, M. E.: "Operating Instructions for the SMART Text Processing and Document Retrieval System"

1. Introduction II-1
1.1. Processing Summary II-2
1.2. Operating Process II-4
2. Basic Operating Procedures II-4
2.1. Run Outline II-4
2.2. Tape Setup II-6
2.3. Input Deck Setup II-7
3. Specification for the SMART Retrieval System II-7
3.1. Specifications Affecting Lookup II-9
3.2. Specifications Affecting Phrase Searching II-10
3.3. Vector Expansions by Means of Concept-Concept Correlation II-12
3.4. Vector Expansion by Means of Concept Hierarchies II-15
3.5. Vector Formation II-16
3.6. Request-Document Correlation II-17
3.7. Document-Document Expansion II-19
3.8. Other Specifications II-19
4. Data Input II-20
4.1. Natural Language Documents II-20
4.2. Binary Documents II-28
4.3. DOCTAPS II-29
4.4. Relevance Judgment Data II-29
4.5. Other Instruction Cards II-30
5. Tape Preparation Programs II-31
5.1. Writing A New Library Tape II-31
5.1.1. Thesaurus and Suffix List Formation II-33
5.1.2. Statistical Phrase Dictionary II-35
5.1.3. Syntactic Suffix List II-37
5.1.4. The Condensed Grammar File II-38
5.1.5. Criterion Tree File II-40
5.1.5.1. Criterion Tree Input Format II-41
5.1.6. Hierarchy II-50
5.2. The Document Tape II-51
5.3. The Program Tape II-52
6. Auxiliary Programs for Use with the SMART System II-52
6.1. THES II-54
6.2. MORVAL II-54
6.3. SØCCER II-55
7. A Sample Input Deck II-56
8. Miscellaneous II-59
8.1. Size Limits II-59
8.2. Timing II-59
9. Acknowledgments II-60


SECTION III

HOCHGESANG, G. T.: "SØCCER - A Concordance Program"

1. Introduction III-1
2. The Concordance III-2
A) Definitions III-2
B) The Input Text III-2
C) Processing the Text III-2
D) The Output Format III-6
3. Tape Usage III-6
A) Control Cards III-6
B) The INPUT, ØUTPUT, and SMRTAP Tapes III-6
C) Scratch Tapes III-7
4. Control Cards III-8
5. Examples of SØCCER Usage III-11
6. Subroutines used by SØCCER III-13
A) INØT III-13
B) SPECTR III-15
C) CLØCK III-17
7. Some Details about the SØCCER Program III-18
A) Source Deck Changes III-17
B) Timing III-18
Appendix III-19


SECTION IV

SALTON, G. and LESK, M. E.: "Information Analysis and Dictionary Construction"

1. Introduction IV-1
2. Language Analysis IV-2
3. Dictionary Construction IV-7
A) The Synonym Dictionary (Thesaurus) IV-7
B) The Null Thesaurus and Suffix List IV-15
C) The Phrase Dictionaries IV-21
D) The Concept Hierarchy IV-27
4. Dictionary Performance IV-27
A) The Null Thesaurus IV-33
B) The Regular Thesaurus IV-38
C) The Phrase Dictionary IV-42
5. Automatic Thesaurus Construction IV-44
A) Fully-Automatic Methods IV-48
B) Semi-Automatic Methods IV-50
C) Sample Thesaurus Generation IV-56
6. Semi-Automatic Hierarchy Formation IV-59


SECTION V

LESK, M. E. and SALTON, G.: "Design Criteria for Automatic Information Systems"

1. Introduction V-1
2. The SMART Experiments V-3
3. Evaluation Results and Design Criteria V-11
A) Indexing Depth and Document Length V-11
B) Synonym Recognition V-16
C) Phrase Processing V-19
D) Statistical Association Methods V-22
E) Hierarchical Subject Expansion V-28
F) Manual Indexing V-30
G) Iterative Searching V-32
H) Summary V-33


SECTION VI

RIDDLE, W., HORWITZ, T., and DIETZ, R.: "Relevance Feedback in an Information Retrieval System"

1. Introduction VI-1
2. Principal Methods VI-3
A) Determination of the Number of Documents Retrieved VI-5
B) The Effect of the Correlation Function VI-5
C) Determination of the Relevance Weighting Factors VI-6
D) Determination of the Value of α VI-8
E) Termination of the Modification Process VI-10
3. Experimental Results VI-10
4. Conclusions VI-12
Appendix A VI-16
Appendix B (by E. M. Keen) VI-19


SECTION VII

LESSER, V. R.: "A Modified Two-Level Search Algorithm Using Request Clustering"

1. Introduction VII-1
2. A Modified Clustering Algorithm and a Corresponding Two-Level Search Strategy VII-3
3. Advantages of the Query Clustering System VII-5
4. Design of an Experiment to Compare the Modified with the Normal Two-Level Search Scheme VII-7
A) Problem Areas VII-7
B) Tests to Compare the Effectiveness of Each Search Procedure VII-7
C) Implementation of the Normal and Modified Two-Level Search Schemes VII-9
D) Test Data Base VII-12
5. Design of an Experiment to Compare the Modified with the Normal Two-Level Search Scheme VII-13
A) Data Generated for Two-Level Search Algorithm VII-13
B) Data Generation for Modified Two-Level Search Algorithm VII-14
C) Experimental Evaluation VII-16
D) Evaluation Results VII-25
6. A New Criterion for Search Effectiveness VII-27
7. Conclusions VII-28
Appendix A VII-30


SECTION VIII

BLOMGREN, G., GOODMAN, A., and KELLY, L.: "An Experimental Investigation of Automatic Hierarchy Generation"

1. Introduction VIII-1
2. Automatic Construction of Hierarchies VIII-2
3. Outline of the Investigation VIII-12
Appendix A VIII-15


SECTION IX

BROFFIT, J. D., MORGAN, H. L., and SODEN, J. V.: "On Some Clustering Techniques for Information Retrieval"

1. Introduction IX-1
2. Similarity Measures IX-4
3. Rocchio's Procedure IX-5
4. Bonner's Procedure IX-7
5. The Experiment IX-10
6. Evaluation IX-11
7. Results and Conclusions IX-12


SECTION X

LESK, M. E.: "Design Considerations for Time Shared Automatic Documentation Centers"

1. Introduction X-1
2. Principles X-2
3. Methods X-5
4. Practicalities X-9
5. Conclusions X-17

Date created: Monday, 18-Sept-00