Title: Highlighting

Date Prepared: November 21, 1995
Date Needed: April 3, 1996 May 28, 1996

Paragraphs Affected: Section 7.1.4 Document and Query Indexes

References: RFC Viewing RetrievalQueries & RoutingQueries

Change Required:
---------------
We propose two new operations for the Tipster Architecture. These operations will identify the terms and passages in a Document that match a query. One MatchQueryTerms() operation will be added to the DocumentCollectionIndex and the other to the QueryCollectionIndex. Documentation shall be included to stress that these new operations add Annotations to the Documents. In addition, the documentation shall stress that these query term Annotations are not critical and need not be maintained over the life of the Document.

Specific Recommendation:
-----------------------
Section 7.1.4 [Document & Query Indexes]

Class DocumentCollectionIndex
...
Operations
    MatchQueryTerms(sequence of DocumentCollectionIndex, RetrievalQuery, Document)

        Document is a Document which has been returned in the Collection created by a RetrieveDocuments() operation.
        RetrievalQuery is the query evaluated during the RetrieveDocuments() operation.
        sequence of DocumentCollectionIndex is the set of indices against which the RetrievalQuery was evaluated.

        The MatchQueryTerms() operation will add Annotations to the Document to indicate terms and passages in the Document which match the input RetrievalQuery.

        annotation type RelevantPassage {rank : String}
        annotation type RelevantTerm {rank : String}

        The query term Annotations indicate where the RawData component of a particular Document matches the RetrievalQuery. Because these Annotations are specific to an instance of a RetrievalQuery, they are not intended to be a permanent part of the Document. Users may therefore wish to delete these query term Annotations when they have finished reviewing their RetrieveDocuments results.

Class QueryCollectionIndex
...
Operations
    MatchQueryTerms(sequence of QueryCollectionIndex, RoutingQuery, Document)

        Document is a Document which was compared to a set of QueryCollectionIndexes in a RetrieveQueries() operation.
        sequence of QueryCollectionIndex is the set of QCIs used to compare the Document during the RetrieveQueries() operation.
        RoutingQuery is the current RoutingQuery representing a DetectionNeed returned by the RetrieveQueries() operation.

        The MatchQueryTerms() operation will add Annotations to the Document to indicate terms and passages in the Document which match the input RoutingQuery.

        annotation type RelevantPassage {rank : String}
        annotation type RelevantTerm {rank : String}

        The query term Annotations indicate where a particular Document matches a RoutingQuery. Because these Annotations are specific to an instance of a RoutingQuery, they are not intended to be a permanent part of the Document. Users may therefore wish to delete these query term Annotations when they have finished reviewing their RetrieveQueries results.

    GetQuery(QueryCollectionIndex, DetectionNeed): RoutingQuery or nil

        GetQuery returns the RoutingQuery in QueryCollectionIndex whose DetectionNeed component is identical to the DetectionNeed given as an argument. If no such RoutingQuery exists in the QueryCollectionIndex, nil is returned.
        [see Viewing RetrievalQueries and RoutingQueries RFC]

Reason for Proposed Change:
--------------------------
Indicating which Document terms and passages matched the RetrievalQuery or RoutingQuery is a typical highlighting function performed by Detection systems. Section 7 of the Tipster Architecture defines the functionality of the Detection system, yet does not mention these highlighting tasks as operations of any of the Detection objects. By adding a MatchQueryTerms() operation to both the DocumentCollectionIndex and the QueryCollectionIndex objects, the Tipster Architecture provides the query matching information users expect from an IR system.
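
As an illustration only, the following Python sketch shows one way a MatchQueryTerms()-style operation could add RelevantTerm and RelevantPassage Annotations to a Document and how they could later be deleted. Everything in it is an assumption made for the example, not part of the Tipster Architecture: the Annotation and Document classes are hypothetical stand-ins, the query is reduced to a plain list of terms, the sequence of indexes is omitted, matching is naive whole-word comparison, a passage is taken to be a sentence, and rank "1" is taken to mean the best match.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    # Hypothetical stand-in for a Tipster Annotation: a type name, a
    # character span into the raw data, and a dictionary of attributes.
    type: str
    start: int
    end: int
    attributes: dict


@dataclass
class Document:
    # Hypothetical stand-in for a Tipster Document.
    raw_data: str
    annotations: list = field(default_factory=list)


# The two annotation types named in the proposal.
HIGHLIGHT_TYPES = {"RelevantTerm", "RelevantPassage"}


def match_query_terms(query_terms, document):
    """Add RelevantTerm and RelevantPassage Annotations to the document."""
    # Assumption: a term's rank is its 1-based position in the query.
    term_rank = {t.lower(): str(i) for i, t in enumerate(query_terms, start=1)}
    passages = []
    # Assumption: a passage is a sentence-like run ending in '.', '!' or '?'.
    for sent in re.finditer(r"[^.!?\s][^.!?]*[.!?]?", document.raw_data):
        hits = 0
        for word in re.finditer(r"\w+", sent.group()):
            rank = term_rank.get(word.group().lower())
            if rank is None:
                continue
            hits += 1
            start = sent.start() + word.start()
            end = sent.start() + word.end()
            document.annotations.append(
                Annotation("RelevantTerm", start, end, {"rank": rank}))
        if hits:
            passages.append((hits, sent.start(), sent.end()))
    # Assumption: passages with more matching terms rank higher.
    passages.sort(key=lambda p: (-p[0], p[1]))
    for rank, (_, start, end) in enumerate(passages, start=1):
        document.annotations.append(
            Annotation("RelevantPassage", start, end, {"rank": str(rank)}))


def delete_query_term_annotations(document):
    """Remove the query-specific highlighting Annotations after review."""
    document.annotations = [
        a for a in document.annotations if a.type not in HIGHLIGHT_TYPES]
```

Storing rank as a string follows the {rank : String} attribute declared for both annotation types. The separate delete function reflects the proposal's point that these Annotations belong to one query instance and are not a permanent part of the Document.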

An additional operation, GetQuery(), has been added to the QueryCollectionIndex object. A QueryCollectionIndex consists of RoutingQueries, where each RoutingQuery is a representation of a DetectionNeed. The RetrieveQueries() operation returns only the Collection of DetectionNeeds matching a particular Document. To provide highlighting information for a Document and one of its matching DetectionNeeds, the Search Engine requires the RoutingQuery representation of the DetectionNeed which caused the Document to be matched. The GetQuery() operation allows the RoutingQuery matching the DetectionNeed to be retrieved from the QueryCollectionIndex and used later in the MatchQueryTerms() operation.

Change Requested By:
    Organization: University of Massachusetts, CIIR
    Name: Kathleen S DiBella
    Phone Number: (413)545-9781
    Date: October 19, 1995
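
The GetQuery() rationale given above can be sketched in Python as follows. This is a hedged illustration rather than the Tipster interface: the DetectionNeed, RoutingQuery, and QueryCollectionIndex classes are hypothetical stand-ins, a RoutingQuery is reduced to a tuple of terms, "identical" DetectionNeeds are approximated by value equality, and retrieve_queries() uses naive word matching. Python's None plays the role of nil.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class DetectionNeed:
    # Hypothetical stand-in: the user's statement of an information need.
    text: str


@dataclass(frozen=True)
class RoutingQuery:
    # Hypothetical stand-in: the indexed representation of a DetectionNeed.
    detection_need: DetectionNeed
    terms: tuple


@dataclass
class QueryCollectionIndex:
    routing_queries: list = field(default_factory=list)


def retrieve_queries(index, document_text):
    """Return only the DetectionNeeds whose RoutingQuery matches the text."""
    words = set(document_text.lower().split())
    return [q.detection_need for q in index.routing_queries
            if any(term in words for term in q.terms)]


def get_query(index, detection_need) -> Optional[RoutingQuery]:
    """Return the RoutingQuery for detection_need, or None if absent."""
    for query in index.routing_queries:
        if query.detection_need == detection_need:
            return query
    return None


# Usage: recover the RoutingQuery behind each returned DetectionNeed so
# that it could then be passed to a MatchQueryTerms()-style operation.
index = QueryCollectionIndex([
    RoutingQuery(DetectionNeed("news about mergers"), ("merger", "acquisition")),
    RoutingQuery(DetectionNeed("news about floods"), ("flood", "storm")),
])
needs = retrieve_queries(index, "the storm caused a flood downtown")
queries = [get_query(index, need) for need in needs]
```

The lookup step exists because retrieve_queries() hands back DetectionNeeds only, so the caller has no RoutingQuery to highlight against until get_query() recovers it from the index.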