To the extent that IRF is a well-designed IR framework, it should be possible to build many applications without modifying any of the IRF class code, by creating new application classes that in some cases extend IRF classes. This section outlines the main work items required to build a new application when the application documents can be adequately described in terms of the existing set of data elements, the indexes provided (keyword and IDF-based) are appropriate, and so on. If the application needs new data elements, alternate sorts of indexing features, different index structures, a different way of supporting persistence, etc., then the section on Changing the framework is more appropriate.
To represent a new kind of document so that IRF can use it as a source type, you need to define a Java class for this type. This class must extend the Document class; thanks to inheritance, most of the features needed for a document type will already be present in the new class. You then only have to define which fields your document contains, and give each a name with an ordinary instance variable declaration. IRF does all the rest, as explained in the Document class.
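As an illustration of the idea, here is a minimal sketch of a document type whose fields are plain instance variables discovered by reflection. SketchDocument is a hypothetical stand-in for IRF's Document class, and MemoDocument and fieldNames() are invented for this example; the real Document class does considerably more.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IRF's Document class (assumption, not the real API).
abstract class SketchDocument {
    // Returns the names of the instance variables declared by the concrete subclass.
    // The order of the returned names is not guaranteed by the reflection API.
    List<String> fieldNames() {
        List<String> names = new ArrayList<>();
        for (Field f : getClass().getDeclaredFields()) {
            if (!f.isSynthetic() && !Modifier.isStatic(f.getModifiers())) {
                names.add(f.getName());
            }
        }
        return names;
    }
}

// Hypothetical application document: each field is an ordinary instance variable.
class MemoDocument extends SketchDocument {
    String title;
    String author;
    String body;
}
```

With this arrangement, adding a field to the document type is a one-line change in MemoDocument; the parent class needs no modification.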
To support persistence, the document class alone is not enough: a Proxy class must also be created for it. The proxy needs only a few features, and they are easy to define (mainly a constructor that makes sure the real document it contains is actually of the expected class).
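The constructor check described above might look roughly like the following sketch. All class names here (SketchDocument, SketchProxy, MemoDocument, MemoDocumentProxy) are hypothetical stand-ins, since the actual IRF proxy interface is not reproduced in this section.

```java
// Hypothetical stand-ins; IRF's real Document and proxy classes differ.
class SketchDocument { }

class MemoDocument extends SketchDocument {
    String title = "";
}

// Generic proxy that holds a reference to the real document.
class SketchProxy {
    private final SketchDocument real;

    SketchProxy(SketchDocument real) {
        this.real = real;
    }

    SketchDocument getReal() {
        return real;
    }
}

// Proxy for the new document type: the constructor rejects any other document class.
class MemoDocumentProxy extends SketchProxy {
    MemoDocumentProxy(SketchDocument real) {
        super(requireMemo(real));
    }

    private static SketchDocument requireMemo(SketchDocument d) {
        // instanceof is false for null, so a null document is rejected as well.
        if (!(d instanceof MemoDocument)) {
            throw new IllegalArgumentException("MemoDocumentProxy requires a MemoDocument");
        }
        return d;
    }
}
```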
The job of converting a raw data file into the new document objects is done by a converter, which can extend the IrfConverter class. A converter mainly creates documents and their constituent fields out of data elements, which is why it must be aware of the type of documents it is attempting to create. If the raw data is in SGML format, or is already the output of nsgmls, it may be a good idea to use nsgmls and to extend the Sgml2AppDocConverter class to create the new converter. In that case essentially the only work left is to create a ConversionRules class that matches nsgmls tags to field names in the defined document class.
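To make the tag-to-field matching concrete, here is a small sketch of a rules table applied to nsgmls-style output lines, in which a line starting with "(" opens an element, ")" closes it, and "-" carries character data. SketchConversionRules and SketchSgmlConverter are hypothetical names; they are not the real ConversionRules or Sgml2AppDocConverter classes, whose interfaces are not shown here. The sketch handles flat (non-nested) mapped elements only and ignores escape sequences in the data.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical rules table: maps element names in nsgmls output to document field names.
class SketchConversionRules {
    private final Map<String, String> tagToField = new HashMap<>();

    SketchConversionRules rule(String tag, String field) {
        tagToField.put(tag, field);
        return this;
    }

    // Returns the field name for a tag, or null if the tag is not mapped.
    String fieldFor(String tag) {
        return tagToField.get(tag);
    }
}

class SketchSgmlConverter {
    // Collects character data per mapped field from nsgmls-style lines.
    static Map<String, String> convert(List<String> lines, SketchConversionRules rules) {
        Map<String, String> fields = new LinkedHashMap<>();
        String current = null; // field of the most recently opened element, null if unmapped
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            char kind = line.charAt(0);
            String rest = line.substring(1);
            if (kind == '(') {
                current = rules.fieldFor(rest);
            } else if (kind == ')') {
                current = null;
            } else if (kind == '-' && current != null) {
                fields.merge(current, rest, String::concat);
            }
            // Other line types (attributes, the final "C" line, and so on) are skipped.
        }
        return fields;
    }
}
```

A rules object could then be built with calls such as new SketchConversionRules().rule("TITLE", "title").rule("TEXT", "body"), keeping everything application-specific in one place.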
Because you have defined a new kind of document, you need a broker to take care of instances of this new document type. You have to create a Broker class that extends a generic broker, chosen according to the storage scheme you want. If you extend the BufferedRandomAccessFileBroker, you don't have to define any methods at first; the default (de)serialization mechanism works fine, which lets you get something working quickly. Later, you can tune the mechanism by overriding the two methods writeObject() and readObject(), which should result in a significant speed increase. Tuning the methods is fairly easy: it consists of calling a PersistentObjectManager method (writeProxy()) with every field of the object to be stored, and doing the reverse with the buildProxy() method to rebuild the object.
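The tuning step amounts to writing each field explicitly in a fixed order and reading the fields back in the same order. Because the signatures of writeProxy() and buildProxy() are not given here, the following sketch substitutes plain java.io data streams for them; MemoRecord and SketchMemoBroker are hypothetical names, and the byte-array interface is an assumption made to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical document with two fields.
class MemoRecord {
    final String title;
    final int wordCount;

    MemoRecord(String title, int wordCount) {
        this.title = title;
        this.wordCount = wordCount;
    }
}

// Sketch of hand-tuned storage: every field is written explicitly, in a fixed order,
// instead of relying on default Java serialization.
class SketchMemoBroker {
    byte[] writeObject(MemoRecord m) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(m.title);
            out.writeInt(m.wordCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the fields back in exactly the order writeObject() wrote them.
    MemoRecord readObject(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String title = in.readUTF();
            int wordCount = in.readInt();
            return new MemoRecord(title, wordCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The speed gain in such a scheme typically comes from skipping the class metadata that default serialization writes with each object, at the cost of keeping the write and read orders in sync by hand.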
Associated with both the broker mechanism and the new document class is
a Handle class. Even if the document class extends an already existing
document class, the Handle class
Thanks to the Java Reflection API, SampleApp is dynamic: as soon as you have created a new type of converter and the document class it produces, you can test them with SampleApp, because you can supply a converter class name before a raw data file name. Debugging those two new classes can therefore be done before a complete interface is developed.
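Loading a converter from a class name supplied at run time can be done with the standard reflection calls, roughly as in this sketch. SketchConverter, EchoConverter and ConverterLoader are hypothetical names introduced for illustration; how SampleApp itself does this is not shown in this section.

```java
// Hypothetical converter contract; IRF's actual converter API is not shown here.
interface SketchConverter {
    String convert(String rawFileName);
}

// Trivial converter used only to demonstrate dynamic loading.
class EchoConverter implements SketchConverter {
    public String convert(String rawFileName) {
        return "converted " + rawFileName;
    }
}

class ConverterLoader {
    // Instantiates a converter from its fully qualified class name.
    // The class must have a no-argument constructor.
    static SketchConverter load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (SketchConverter) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load converter " + className, e);
        }
    }
}
```

A caller would pass the class name exactly as typed by the user, for example ConverterLoader.load(args[0]), so a newly written converter can be tried without recompiling the application.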
Both the new proxy class and the new Handle class will have to be written to disk, and when they come back from disk they need a way to be reconstructed easily. That is why the PersistentObjectManager exists. It can be extended by overriding its getDocumentType()/getHandleType() and buildDocument()/buildHandleType() methods. Each overriding method checks whether it is dealing with the new proxy or handle; if so, it behaves analogously to the corresponding method in PersistentObjectManager, and otherwise it invokes the parent class's method of the same name.
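The "handle the new type, otherwise defer to the parent" pattern can be sketched as follows. The method names come from the text, but their signatures, the integer type codes, and the classes BaseProxy, MemoProxy, SketchObjectManager and MemoObjectManager are all assumptions made for this example.

```java
// Hypothetical stand-ins: the real PersistentObjectManager signatures are not given here.
class BaseProxy { }

class MemoProxy extends BaseProxy { }

// Plays the role of the framework's manager, which knows only the built-in types.
class SketchObjectManager {
    static final int BASE_TYPE = 0;

    int getDocumentType(Object proxy) {
        if (proxy instanceof BaseProxy) {
            return BASE_TYPE;
        }
        throw new IllegalArgumentException("Unknown proxy: " + proxy);
    }

    Object buildDocument(int typeCode) {
        if (typeCode == BASE_TYPE) {
            return new BaseProxy();
        }
        throw new IllegalArgumentException("Unknown type code: " + typeCode);
    }
}

// Application-level manager: recognizes the new proxy, defers everything else.
class MemoObjectManager extends SketchObjectManager {
    static final int MEMO_TYPE = 100;

    @Override
    int getDocumentType(Object proxy) {
        if (proxy instanceof MemoProxy) {
            return MEMO_TYPE;                // the new type is handled here
        }
        return super.getDocumentType(proxy); // all other types go to the parent
    }

    @Override
    Object buildDocument(int typeCode) {
        if (typeCode == MEMO_TYPE) {
            return new MemoProxy();
        }
        return super.buildDocument(typeCode);
    }
}
```

Note that the check for the new type must come before the call to the parent, since in this sketch a MemoProxy is also a BaseProxy and the parent would otherwise report the wrong type.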
The way in which a document is presented can be changed by overriding, in the new document class, the relevant methods of the framework's parent document class.