TIPSTER Architecture Change Request

Title: Document Lists
Page 1 of ?
Date Prepared: 9 March 1998
CR No. 12
Priority: Routine
Date Logged:
Document Affected: Design Document
Version: 2.3
Paragraphs Affected: Sections 3.1, 4.1, 4.3 and Appendices B & C
References: None

Change Required: Add DocumentList Class to the Architecture

Specific Recommendations: Modify the Architecture Design document by adding the specification to the affected sections and appendices, using the material provided.

Reason for the Proposed Change: From its inception, the Architecture has been plagued by the question of whether there should be a DocumentList Class. Various attempts have been made to provide the equivalent of a DocumentList Class. These efforts have been less than successful, so it is appropriate to resolve the issue. A document list is a commonly accepted output from a retrieval (document detection) process, since it identifies the retrieved documents without carrying all of the documents’ text in the output. A document list is also a typical input to a GUI. The Class DocumentReference will be left in the Architecture for backward compatibility.

-----------------------------------------------------------------------------------------------------

Add Section 4.3

4.3 DocumentList

In the same way that Documents are gathered into Collections, references to a group of Documents may be gathered into DocumentLists. A DocumentList may have attributes at the document-list level as well as on the individual documents, which are referenced by DocumentIds. The DocumentList provides a way to handle documents indirectly, without the burden of carrying the entire document text with the reference. DocumentLists are either named and persistent or unnamed and non-persistent.

The DocumentList will, as a minimum, contain DocumentIds; however, each DocumentId may also be associated with an AttributeSet. These sets are in addition to the attributes associated with the Document referenced by the DocumentList’s DocumentId.
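The data model described above can be pictured with a small C sketch. It is only an illustration under stated assumptions: every `sketch_` name, the fixed capacity, and the choice to hold a DocumentId as a C string are hypothetical and are not part of the Architecture or its C binding. It shows a list-level name (absent for an unnamed, non-persistent list) and an entry-level AttributeSet, possibly empty, attached to each DocumentId.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration types; none of these names come from the
   Architecture or its C binding. */
typedef struct {
    const char *name;
    const char *value;
} sketch_Attribute;

typedef struct {
    const char *doc_id;             /* the DocumentId, held as a string */
    const sketch_Attribute *attrs;  /* entry-level AttributeSet; may be NULL */
    size_t n_attrs;                 /* 0 means the set is empty */
} sketch_Entry;

#define SKETCH_MAX_ENTRIES 16       /* fixed capacity keeps the sketch short */

typedef struct {
    const char *name;               /* NULL for an unnamed, non-persistent list */
    sketch_Entry entries[SKETCH_MAX_ENTRIES];
    size_t length;                  /* number of DocumentIds in the list */
} sketch_DocumentList;

/* A list is treated as persistent exactly when it has a name. */
int sketch_is_persistent(const sketch_DocumentList *list)
{
    assert(list != NULL);
    return list->name != NULL;
}

/* Returns the value of the entry-level attribute `attr_name` attached to
   `doc_id` in this list, or NULL if the id or the attribute is absent.
   Attributes of the referenced Document itself live elsewhere. */
const char *sketch_entry_attribute(const sketch_DocumentList *list,
                                   const char *doc_id,
                                   const char *attr_name)
{
    size_t i, j;
    assert(list != NULL);
    for (i = 0; i < list->length; i++) {
        const sketch_Entry *e = &list->entries[i];
        if (strcmp(e->doc_id, doc_id) != 0)
            continue;
        for (j = 0; j < e->n_attrs; j++) {
            if (strcmp(e->attrs[j].name, attr_name) == 0)
                return e->attrs[j].value;
        }
        return NULL;                /* id found, attribute not set */
    }
    return NULL;                    /* id not in the list */
}
```

The sketch stores no document text at all, which is the point of the class: a consumer such as a GUI can display ids and entry-level attributes and fetch a Document only when it is needed.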
Class DocumentList

Type of PersistentObject, AttributedObject

Properties

Contents: sequence of DocumentId (R, W) and sequence of Attribute (R, W)
    There is an AttributeSet for each DocumentId. The set may be empty.

Operations

CreateDocumentList (name: string, attributes: sequence of Attribute): DocumentList
    creates a named, persistent document list.

CreateTemporaryDocumentList (attributes: sequence of Attribute): DocumentList
    creates an unnamed, non-persistent document list.

AddDocumentId (DocumentList, string: sequence of DocumentId OR Collection, attributes: sequence of Attribute):
    adds a DocumentId to a DocumentList. If the DocumentId is not already in the list, this operation adds the argument to the Contents. If the argument is a Collection, the ids of all documents in the Collection are placed in the DocumentList. If there is a matching DocumentId in the Contents, the operation error condition is set to FALSE and the operation terminates. Zero or more attributes may be created when AddDocumentId is performed.

RemoveDocumentId (DocumentList, sequence of DocumentIds OR Collection: string)
    removes from the list each DocumentId that matches a DocumentId in the sequence argument or the id of a document in the Collection argument. If any DocumentIds fail to match, the operation error condition is set to FALSE and the operation continues, so some DocumentIds may still have been removed because a match was found. The AttributeSet associated with each removed DocumentId is also removed.

Length (DocumentList): integer
    returns the number of DocumentIds in a document list.

MergeDocumentList (DocumentList, DocumentList): DocumentList
    returns the union of the DocumentIds, including related attributes, if any, in the two DocumentLists.

MergeDocumentId (DocumentList, DocumentList): DocumentList
    returns the union of the DocumentIds, excluding related attributes, if any, in the two DocumentLists.
FirstDocumentId (DocumentList): DocumentId OR nil
    returns the `first' DocumentId within a DocumentList and initializes data structures internal to the DocumentList so that NextDocumentId can be used to iterate through the documents in the list. Returns nil if no DocumentIds are found in the list.

NextDocumentId (DocumentList): DocumentId OR nil
    returns the `next' DocumentId within a DocumentList. Normally used to iterate through all ids in a list. Returns nil if no more ids are found in the list.

FirstDocumentId and NextDocumentId must be well behaved in the presence of calls to AddDocumentId and RemoveDocumentId. This means that a loop using FirstDocumentId and NextDocumentId must visit all ids which were in the list when FirstDocumentId was called if and only if the ids are not deleted before the loop reaches them. DocumentIds added after FirstDocumentId is called may or may not be encountered during the loop.

SelectDocumentIds (DocumentList, constraint: sequence of AnnotationTypes OR sequence of Attributes): string
    returns the (possibly empty) set of DocumentIds from the DocumentList which refer to Documents which have annotations of type type and which satisfy constraint. constraint is a sequence of attributes, where the i-th attribute has name a_i and value v_i. An annotation satisfies the constraint if, for each i, attribute a_i of the annotation has value v_i. If constraint is the empty sequence, no constraint is placed on the attributes: all annotations of the given type are selected. If type is nil, annotations of all types satisfying the attribute constraints are included.

GetDocument (DocumentList, DocumentId: string): Document OR nil
    returns the Document referenced by the given DocumentId, or nil if no such Document exists.

FirstDocument (DocumentList): Document OR nil
    returns the `first' Document referenced within a DocumentList and initializes data structures internal to the DocumentList so that NextDocument can be used to iterate through the list.
    Returns nil if no DocumentIds are found in the list.

NextDocument (DocumentList): Document OR nil
    returns the `next' Document referenced within a DocumentList. Normally used to iterate through the list. Returns nil if no more DocumentIds are found in the DocumentList.

FirstDocument and NextDocument must be well behaved in the presence of calls to AddDocumentId and RemoveDocumentId. This means that a loop using FirstDocument and NextDocument must visit all DocumentIds which were in the list when FirstDocument was called if and only if the DocumentIds are not deleted before the loop reaches them. DocumentIds added after FirstDocument is called may or may not be encountered during the loop.

For Appendix B:

Class DocumentList

Operations

CreateDocumentList (name: string, attributes: sequence of Attribute): DocumentList
CreateTemporaryDocumentList (attributes: sequence of Attribute): DocumentList
AddDocumentId (DocumentList, sequence of DocumentId OR Collection): string
Length (DocumentList): integer
MergeDocumentList (DocumentList, DocumentList): DocumentList
MergeDocumentId (DocumentList, DocumentList): DocumentList
SelectDocumentIds (DocumentList, constraint: sequence of AnnotationTypes OR sequence of Attributes): string
FirstDocumentId (DocumentList): DocumentId OR nil
NextDocumentId (DocumentList): DocumentId OR nil
GetDocument (DocumentList, DocumentId: string): Document OR nil
FirstDocument (DocumentList): Document OR nil
NextDocument (DocumentList): Document OR nil

For Appendix C:

typedef void* tip_DocumentList;
typedef void* tip_DocumentId;

tip_DocumentList CreateDocumentList (tip_string, tip_AttributeSet);
tip_DocumentList CreateTemporaryDocumentList (tip_AttributeSet);

/* Argument 2 of AddDocumentId can be a tip_stringSet or a tip_Collection */
tip_string AddDocumentId (tip_DocumentList, void*);

tip_integer Length (tip_DocumentList);
tip_DocumentList MergeDocumentList (tip_DocumentList, tip_DocumentList);
tip_DocumentList MergeDocumentId (tip_DocumentList, tip_DocumentList);

/* Argument 2 of SelectDocumentIds can be a tip_string or NULL */
tip_stringSet SelectDocumentIds (tip_DocumentList, tip_string, tip_AttributeSet);

tip_DocumentId FirstDocumentId (tip_DocumentList);
tip_DocumentId NextDocumentId (tip_DocumentList);
tip_Document GetDocument (tip_DocumentList, tip_DocumentId);
tip_Document FirstDocument (tip_DocumentList);
tip_Document NextDocument (tip_DocumentList);
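
The add, remove, and iteration rules in Section 4.3 interact, so a small self-contained sketch may help show one way they could fit together. It is an assumption-laden illustration, not the Architecture's implementation: the `sketch_` names, the fixed-capacity array, the string ids, and the integer return values standing in for the operation error condition are all hypothetical. The iteration is kept well behaved by moving the cursor back whenever an id in front of it is removed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CAPACITY 32          /* hypothetical fixed capacity */

/* Hypothetical stand-in for a DocumentList's Contents (ids only). */
typedef struct {
    const char *ids[SKETCH_CAPACITY];
    size_t length;
    size_t cursor;                  /* index of the id the next call returns */
} sketch_IdList;

/* Index of `id` in the list, or -1 if it is not present. */
static long sketch_find(const sketch_IdList *list, const char *id)
{
    size_t i;
    for (i = 0; i < list->length; i++) {
        if (strcmp(list->ids[i], id) == 0)
            return (long)i;
    }
    return -1;
}

/* AddDocumentId for a single id: returns 0 (FALSE) and leaves the list
   unchanged if the id is already present or the sketch is full. */
int sketch_add(sketch_IdList *list, const char *id)
{
    assert(list != NULL && id != NULL);
    if (sketch_find(list, id) >= 0 || list->length == SKETCH_CAPACITY)
        return 0;
    list->ids[list->length++] = id;
    return 1;
}

/* RemoveDocumentId for a sequence of ids: removes every id that matches,
   keeps going past misses, and returns 0 (FALSE) if any id did not match. */
int sketch_remove(sketch_IdList *list, const char *const *ids, size_t n)
{
    int ok = 1;
    size_t k;
    assert(list != NULL);
    for (k = 0; k < n; k++) {
        long found = sketch_find(list, ids[k]);
        size_t at;
        if (found < 0) {
            ok = 0;                 /* record the miss, but continue */
            continue;
        }
        at = (size_t)found;
        memmove(&list->ids[at], &list->ids[at + 1],
                (list->length - at - 1) * sizeof list->ids[0]);
        list->length--;
        if (at < list->cursor)
            list->cursor--;         /* a slot before the cursor closed up */
    }
    return ok;
}

/* FirstDocumentId: restarts the iteration; NULL if the list is empty. */
const char *sketch_first_id(sketch_IdList *list)
{
    assert(list != NULL);
    list->cursor = 0;
    return (list->length > 0) ? list->ids[list->cursor++] : NULL;
}

/* NextDocumentId: NULL once every remaining id has been returned. */
const char *sketch_next_id(sketch_IdList *list)
{
    assert(list != NULL);
    return (list->cursor < list->length) ? list->ids[list->cursor++] : NULL;
}
```

With ids d1 to d4 in the list, calling `sketch_first_id`, then removing d1 and d3, leaves `sketch_next_id` to return d2 and d4 before reporting the end of the list. Because new ids are appended at the end, an id added during a loop happens to be visited in this sketch; the specification permits either outcome.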