Changing IRF

Some applications may require more than extending IRF classes; changes and extensions within the framework itself may be necessary. This section touches on several such modifications.

New indexers and changing the default indexer

The type of index created by the software is currently hard-coded in the method gov.nist.nlpir.irf.document.DocCollection.setDefaultIndexingModalities(). To change the type of index created, the code must be edited and recompiled. Comments mark where the code should be changed, both at the top of the class file DocCollection.java and in the method DocCollection.setDefaultIndexingModalities().

New converters

IRF can currently handle only text in the nsgmls output format or in a TROFF bib file. Other text formats can be added, as can other types of data such as video or audio. Each new format needs a new converter that complies with the IrfConverter virtual class so that the rest of IRF can use it.

New data elements

For every new kind of data to be represented (a particular type of text, for example, or more obviously an audio or a video stream), a new data element class must be defined. This class may then contain the value and all other information needed for a data element. The DataElem interface declares every method that a new type of data element must define.

A new data element requires two classes. The first is the real object: it implements every method of DataElem with the meaning appropriate to the new type of data element. The second is a proxy class, which mainly forwards each method call to the real object after making sure that object is present (using the getRealObject() method).
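The real-object/proxy pairing described above can be sketched as follows. This is only an illustration under stated assumptions: SketchDataElem, RealTextElem, ProxyTextElem and SketchStore are hypothetical names, the two interface methods are placeholders for whatever DataElem actually declares, and the signature given to getRealObject() here is a guess rather than IRF's own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for IRF's DataElem interface (the real one
// declares its own set of methods; these two are placeholders).
interface SketchDataElem {
    String getValue();
    int length();
}

// The "real object": implements every interface method with the
// meaning appropriate to this kind of data (here, a text token).
class RealTextElem implements SketchDataElem {
    private final String value;

    RealTextElem(String value) { this.value = value; }

    public String getValue() { return value; }
    public int length() { return value.length(); }
}

// Hypothetical in-memory store standing in for the persistence layer.
class SketchStore {
    private final Map<Long, String> valuesByOid = new HashMap<>();
    int loads = 0; // counts how often a real object was materialized

    void put(long oid, String value) { valuesByOid.put(oid, value); }

    RealTextElem load(long oid) {
        loads++;
        return new RealTextElem(valuesByOid.get(oid));
    }
}

// The proxy: holds only an identifier until a method is called, then
// forwards every call to the real object.
class ProxyTextElem implements SketchDataElem {
    private final long oid;
    private final SketchStore store;
    private RealTextElem real; // null until first use

    ProxyTextElem(long oid, SketchStore store) {
        this.oid = oid;
        this.store = store;
    }

    // Assumed shape of getRealObject(): make sure the real object is
    // present, loading it on first use, and return it.
    private RealTextElem getRealObject() {
        if (real == null) {
            real = store.load(oid);
        }
        return real;
    }

    public String getValue() { return getRealObject().getValue(); }
    public int length() { return getRealObject().length(); }
}
```

In this sketch, constructing a ProxyTextElem does not touch the store; the real object is loaded once, on the first forwarded call, and reused afterwards.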

When creating a new type of data element, its proxy class must be registered in PersistentObjectManager, in four places:

New indexing features

IRF comes with three IndexingFeature classes. The first is very generic: it just links a data element to the source where it was found. Two classes extend it to deal with text. The first keeps more information about a word found in a text by recording its rank in the string where it was found. The second is more specific still: it records both the rank and the paragraph in which the word was found.

For every new type of data manipulated, there may be a need to define a new type of indexing feature. The basic one just links a data element to the document where it was found, and also keeps track of the field in which it was found. Indexing features are created by the data elements in the method getIndexingFeatures(). Thus a new data element, for example a picture, may need to give the indexing features it creates more than just the basic information. For text, the extra information was positional; for a picture it could be the analogous XY coordinates. Other types of data will need other information.
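As an illustration of the picture case, the sketch below extends a basic feature with XY coordinates and has the data element create the richer features itself. All class names are hypothetical stand-ins, and the parameters given to getIndexingFeatures() are an assumption made for the example; the real IRF classes and signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the basic IndexingFeature: links a data
// element to the document and the field where it was found.
class SketchIndexingFeature {
    private final String documentId;
    private final String fieldName;

    SketchIndexingFeature(String documentId, String fieldName) {
        this.documentId = documentId;
        this.fieldName = fieldName;
    }

    String getDocumentId() { return documentId; }
    String getFieldName() { return fieldName; }
}

// Extension for pictures: adds the XY coordinates of the occurrence,
// much as the text features add rank and paragraph.
class SketchPictureIndexingFeature extends SketchIndexingFeature {
    private final int x;
    private final int y;

    SketchPictureIndexingFeature(String documentId, String fieldName, int x, int y) {
        super(documentId, fieldName);
        this.x = x;
        this.y = y;
    }

    int getX() { return x; }
    int getY() { return y; }
}

// Hypothetical picture data element that creates the richer features.
class SketchPictureElem {
    private final List<int[]> occurrences = new ArrayList<>();

    void addOccurrence(int x, int y) {
        occurrences.add(new int[] {x, y});
    }

    // The parameters are an assumption made for this sketch.
    List<SketchIndexingFeature> getIndexingFeatures(String documentId, String fieldName) {
        List<SketchIndexingFeature> features = new ArrayList<>();
        for (int[] xy : occurrences) {
            features.add(new SketchPictureIndexingFeature(documentId, fieldName, xy[0], xy[1]));
        }
        return features;
    }
}
```

Because the subclass is still a basic feature, code that only needs the document and field can keep treating it as one, while picture-aware code can read the coordinates.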

Extending IRF may therefore involve creating a new type of IndexingFeature. This should be done by extending the class, adding the values needed, and making sure the associated data elements create the new features correctly in their getIndexingFeatures() method. The new IndexingFeature proxy class must then be registered in PersistentObjectManager, in four places:

New support for persistence

IRF is provided with basic support for persistent storage, using BufferedRandomAccessFiles. It should also be possible to use object databases, or even relational databases. While our aim has been to minimize the effort required to switch to a different sort of persistence mechanism, we have yet to put this to a complete test. See section Examples of persistence mechanism evolution for information on what we had time to attempt.

The main part to change when switching persistence mechanisms is the broker scheme. The file-based persistence scheme provided requires one broker for every class to be stored; another scheme may not. In addition, each object to be stored needs a Handle. There may be several types of handle, although some persistence mechanisms may require only one.

Note on the provided mechanism: The file-based persistence scheme given with IRF is just a tuned version of the most basic one imaginable: serialization. It is therefore compatible with serialization, and serialization is the first possibility to explore when developing a new type of storable object because it is very quick to get working.
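To show how little is needed to get a new storable object working with plain serialization, here is a minimal round-trip sketch using only the standard java.io classes. SketchStorable and SerializationSketch are hypothetical names; nothing here goes through IRF's brokers or BufferedRandomAccessFiles.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical storable object; implementing Serializable is all
// that standard Java serialization requires of it.
class SketchStorable implements Serializable {
    private static final long serialVersionUID = 1L;

    final String term;
    final int rank;

    SketchStorable(String term, int rank) {
        this.term = term;
        this.rank = rank;
    }
}

class SerializationSketch {
    // Serializes an object into a byte array.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuilds an object from bytes produced by toBytes().
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A call such as SerializationSketch.fromBytes(SerializationSketch.toBytes(obj)) yields a distinct copy with the same field values, which makes for a quick first check before any tuned storage code is written.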

Note on the PersistentObjectManager class: this class is currently used by two types of clients. The first is the BufferedRandomAccessFileBrokers, which use it to write and read proxies on disk. This aspect of the manager is strongly linked to the BrafPb scheme, and might actually belong in the top class of file brokers. The second aspect concerns the Handles; it is used only by the HandleByOid class. These two aspects have been gathered in a single class because they present the same kind of behavior, and because doing so confines to one place the changes needed to extend the framework. Changing the persistence mechanism therefore does not necessarily require creating another manager; it may be possible to do without one altogether.

Changing the PersistentDualKeyContainer: this class is the heart of the index. It manages its own persistence, and thus neither uses a broker nor has a proxy. It could still be changed to use a different persistence mechanism. The best starting point is the documentation provided with this class, which can guide the creation of a variant that works with, for instance, a relational database.

The current proxy mechanism ensures that proxies for a given object are unique. This means you cannot have two proxies pointing at the same real object, nor can a given object be materialized twice by different proxies. To work, this mechanism has to maintain a table of all the proxies present in memory at any one time, together with the number of references to each of them across the application.

That is why VirtualProxy provides two methods: addRefToInMemoryProxiesByOid() and deleteRefFromInMemoryProxiesByOid(). These methods must be called by every class that creates or destroys a DIRECT reference to a proxy. DIRECT means the variable is declared in a form like ProxySomething theVariable. When the reference is a local variable, these two calls can be skipped, but when it is an instance variable they are important.

If proxies are to be stored in a container, the container should take care of those calls. This is possible only if the container class is accessible. If it is not, for example because it is frozen or part of an API, the management can take place one level up, i.e., just before adding the proxy to the container (Vector, Hashtable, ...) or right after removing it. The trick is not to forget that replacing an object in such a container is equivalent to removing the former object and adding the new one. Thus the deleteRefFromInMemoryProxiesByOid() method must be called on the former object.
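The "one level up" bookkeeping around a container that cannot be modified might look like the sketch below. SketchProxy is a hypothetical stand-in whose two methods only borrow the names of the VirtualProxy methods; their signatures and the reference table behind them are assumptions made for illustration, as is the CountingProxyList wrapper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a proxy with the reference bookkeeping
// that VirtualProxy is described as providing.
class SketchProxy {
    // Table of in-memory proxies and their reference counts, by OID.
    private static final Map<Long, Integer> refCountsByOid = new HashMap<>();

    final long oid;

    SketchProxy(long oid) { this.oid = oid; }

    void addRefToInMemoryProxiesByOid() {
        refCountsByOid.merge(oid, 1, Integer::sum);
    }

    void deleteRefFromInMemoryProxiesByOid() {
        // Returning null removes the entry once the last reference goes.
        refCountsByOid.computeIfPresent(oid, (k, n) -> n > 1 ? n - 1 : null);
    }

    static int refCount(long oid) {
        return refCountsByOid.getOrDefault(oid, 0);
    }
}

// Wrapper that does the bookkeeping on behalf of a container class
// that cannot be changed (here a plain ArrayList).
class CountingProxyList {
    private final List<SketchProxy> items = new ArrayList<>();

    void add(SketchProxy proxy) {
        proxy.addRefToInMemoryProxiesByOid();   // before adding
        items.add(proxy);
    }

    SketchProxy remove(int index) {
        SketchProxy former = items.remove(index);
        former.deleteRefFromInMemoryProxiesByOid(); // right after removing
        return former;
    }

    // Replacement counts as removing the former proxy and adding the new one.
    SketchProxy set(int index, SketchProxy replacement) {
        replacement.addRefToInMemoryProxiesByOid();
        SketchProxy former = items.set(index, replacement);
        former.deleteRefFromInMemoryProxiesByOid();
        return former;
    }
}
```

In set(), the reference to the replacement is added before the former one is released, so replacing a proxy with itself leaves its count unchanged.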


Last updated: Tuesday, 01-Aug-2000 12:34:22 UTC

Date created: Monday, 31-Jul-00
For further information contact Paul Over (over@nist.gov) with
copy to Darrin Dimmick (ddimmick@nist.gov)