Title: Highlighting    Page 1 of

Date Prepared: November 21, 1995   April 3, 1996
Priority: Routine
Document Affected: Design (1.52)
Paragraphs Affected: Section 7.1.4 Document and Query Indexes
References: RFC Viewing RetrievalQueries & RoutingQueries

Change Required:
---------------
We propose two new Highlight() operations to the Tipster Architecture. These operations will highlight matching query terms and passages in Documents. One Highlight() operation will be added to the DocumentCollectionIndex and the other to the QueryCollectionIndex. A supporting GetQuery() operation is also proposed for the QueryCollectionIndex.

Documentation shall be included to stress that these new operations add Annotations to the Documents. In addition, the documentation shall stress that these highlighting Annotations are not critical and need not be maintained over the life of the Document.

Specific Recommendation:
-----------------------
Section 7.1.4 [Document & Query Indexes]

Class DocumentCollectionIndex
...
Operations

Highlight(sequence of DocumentCollectionIndex, RetrievalQuery, Document)

    Document is a Document that has been returned in the Collection created by a RetrieveDocuments() operation.
    RetrievalQuery is the query evaluated during the RetrieveDocuments() operation.
    sequence of DocumentCollectionIndex is the set of indices against which the RetrievalQuery was evaluated.

    The Highlight() operation adds Annotations to the Document to indicate the terms and passages in the Document that match the input RetrievalQuery.

        annotation type RelevantPassage {rank : String}
        annotation type RelevantTerm

    RelevantPassage Annotations can be ranked (best passage first) using the rank Attribute.

    The highlighting Annotations indicate where the RawData component of a particular Document matches the RetrievalQuery. Because these Annotations are specific to an instance of a RetrievalQuery, they are not intended to be a permanent part of the Document.
    Therefore, users may wish to delete these highlighting Annotations when they are finished reviewing their RetrieveDocuments results.

Class QueryCollectionIndex
...
Operations

Highlight(sequence of QueryCollectionIndex, RoutingQuery, Document)

    Document is a Document that was compared to a set of QueryCollectionIndexes in a RetrieveQueries() operation.
    sequence of QueryCollectionIndex is the set of QCIs used to compare the Document during the RetrieveQueries() operation.
    RoutingQuery is the current RoutingQuery representing a DetectionNeed returned by the RetrieveQueries() operation.

    The Highlight() operation adds Annotations to the Document to indicate the terms and passages in the Document that match the input RoutingQuery.

        annotation type RelevantPassage {rank : String}
        annotation type RelevantTerm {rank : String}

    The highlighting Annotations indicate where a particular Document matches a RoutingQuery. Because these Annotations are specific to an instance of a RoutingQuery, they are not intended to be a permanent part of the Document. Therefore, users may wish to delete these highlighting Annotations when they are finished reviewing their RetrieveQueries results.

GetQuery(QueryCollectionIndex, DetectionNeed): RoutingQuery or nil

    GetQuery returns the RoutingQuery in the QueryCollectionIndex whose DetectionNeed component is identical to the DetectionNeed given as an argument. If no such RoutingQuery exists in the QueryCollectionIndex, nil is returned. [See the Viewing RetrievalQueries and RoutingQueries RFC.]

Reason for Proposed Change:
--------------------------
Indicating which Document terms and passages matched the RetrievalQuery or RoutingQuery is a typical highlighting function of Detection systems. Section 7 of the Tipster Architecture defines the functionality of the Detection system, yet it does not mention these highlighting tasks as operations of any of the Detection objects.
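
To make the proposed Highlight() behaviour concrete, the following is a minimal, hypothetical Python sketch; it is not part of the Tipster Architecture. The Document and Annotation classes, the highlight() and remove_highlights() helpers, the treatment of a query as a flat set of terms, and the use of fixed-size character windows as passages are all assumptions made for illustration. The sequence of index arguments is omitted because the sketch matches query terms directly against the Document's RawData. As in the specification, the rank Attribute is stored as a String, and the highlighting Annotations can be removed once the results have been reviewed.

```python
import re
from dataclasses import dataclass, field


# Hypothetical stand-ins for Tipster objects (illustration only).
@dataclass
class Annotation:
    type: str                  # "RelevantTerm" or "RelevantPassage"
    start: int                 # character offset into raw_data
    end: int                   # exclusive end offset
    attributes: dict = field(default_factory=dict)


@dataclass
class Document:
    raw_data: str
    annotations: list = field(default_factory=list)


HIGHLIGHT_TYPES = {"RelevantTerm", "RelevantPassage"}


def highlight(doc: Document, query_terms, passage_size: int = 200) -> Document:
    """Add RelevantTerm and ranked RelevantPassage annotations to doc.

    Assumption: the query is a flat set of terms, and passages are
    non-overlapping fixed-size character windows ranked by hit count.
    """
    terms = {t.lower() for t in query_terms}
    hit_counts = {}  # passage window index -> number of matching terms
    for m in re.finditer(r"\w+", doc.raw_data):
        if m.group().lower() in terms:
            doc.annotations.append(Annotation("RelevantTerm", m.start(), m.end()))
            window = m.start() // passage_size
            hit_counts[window] = hit_counts.get(window, 0) + 1
    # Best passage (most hits) gets rank "1"; ties go to the earlier window.
    ranked = sorted(hit_counts, key=lambda w: (-hit_counts[w], w))
    for rank, window in enumerate(ranked, start=1):
        start = window * passage_size
        end = min(start + passage_size, len(doc.raw_data))
        doc.annotations.append(
            Annotation("RelevantPassage", start, end, {"rank": str(rank)}))
    return doc


def remove_highlights(doc: Document) -> None:
    """Drop the non-permanent highlighting annotations after review."""
    doc.annotations = [a for a in doc.annotations if a.type not in HIGHLIGHT_TYPES]
```

The same helper could serve both proposed operations, since each ultimately matches a query (RetrievalQuery or RoutingQuery) against one Document; a real Search Engine would instead use its own index structures and matching model.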
By adding a Highlight() operation to both the DocumentCollectionIndex and the QueryCollectionIndex objects, the Tipster Architecture provides the query-matching information users expect from an IR system.

An additional operation, GetQuery(), has been added to the QueryCollectionIndex object. A QueryCollectionIndex consists of RoutingQueries, each of which represents a DetectionNeed. The RetrieveQueries() operation returns only the Collection of DetectionNeeds matching a particular Document. To provide highlighting information for a Document and a matching DetectionNeed, the Search Engine requires the RoutingQuery representation of the DetectionNeed that caused the Document to be matched. The GetQuery() operation allows that RoutingQuery to be retrieved from the QueryCollectionIndex and then passed to the Highlight() operation.

Applications Affected:

Change Requested By:
Organization: University of Massachusetts, CIIR
Name: Kathleen S DiBella
Phone Number: (413)545-9781
Date: October 19, 1995
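
The routing workflow described in the Reason section (RetrieveQueries() yields DetectionNeeds, GetQuery() recovers each matching RoutingQuery, and Highlight() then uses it) can be sketched as below. This is a hypothetical Python model, not the architecture's interface: the class and method names, the toy word-overlap matching in retrieve_queries(), and the term-only highlight_terms() helper are assumptions. "Identical" DetectionNeeds are modelled here as equal values, and nil is modelled as None.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for Tipster objects (illustration only).
@dataclass(frozen=True)
class DetectionNeed:
    text: str


@dataclass(frozen=True)
class RoutingQuery:
    detection_need: DetectionNeed
    terms: frozenset  # assumed: a RoutingQuery reduced to a set of lowercase terms


class QueryCollectionIndex:
    def __init__(self, queries):
        self.queries = list(queries)

    def get_query(self, need: DetectionNeed) -> Optional[RoutingQuery]:
        """GetQuery: the RoutingQuery whose DetectionNeed equals `need`, else None."""
        for query in self.queries:
            if query.detection_need == need:
                return query
        return None

    def retrieve_queries(self, raw_data: str) -> list:
        """Toy RetrieveQueries: returns only the matching DetectionNeeds."""
        words = set(raw_data.lower().split())
        return [q.detection_need for q in self.queries if q.terms & words]


def highlight_terms(raw_data: str, query: RoutingQuery) -> list:
    """Toy Highlight: (start, end) spans of whitespace-separated words in the query."""
    spans, pos = [], 0
    for word in raw_data.split():
        start = raw_data.index(word, pos)
        pos = start + len(word)
        if word.lower() in query.terms:
            spans.append((start, pos))
    return spans


qci = QueryCollectionIndex([
    RoutingQuery(DetectionNeed("oil prices"), frozenset({"oil", "prices"})),
    RoutingQuery(DetectionNeed("elections"), frozenset({"vote", "ballot"})),
])
text = "Oil prices rose sharply"
for need in qci.retrieve_queries(text):
    query = qci.get_query(need)  # recover the RoutingQuery for this DetectionNeed
    print(need.text, highlight_terms(text, query))  # prints: oil prices [(0, 3), (4, 10)]
```

Without the get_query() step, the loop would hold only DetectionNeed values and would have nothing to pass to the highlighting call, which is the gap the GetQuery() proposal addresses.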