Examples of persistence mechanism evolution

We wanted to see what we would have to do to make IRF use a storage mechanism different from the basic flat file one currently used. The two chosen examples were the freely available object databases storedObjects and Ozone. In both cases we were dealing with the versions current at the time (March 2000); both packages are changing rapidly, so our comments may no longer apply. In any case our findings are preliminary and are not intended as either a final endorsement or a criticism of these two packages.

Ease of use

Ozone

Ozone can be downloaded from http://www.ozone-db.org and is 100% pure Java.
It installs neatly in a directory with gnumake, provided the directory is empty beforehand. It will not install with the regular Sun make utility, which fails with a cryptic error message.
A small tutorial helps with the first steps with Ozone. This system can be used to develop a persistence-based application, but the early design of the objects must take this into account: objects to be stored using Ozone have to extend OzoneObject, and will then necessarily have a proxy. Writing code is tricky with Ozone: an application is always the client of a server (there is no possibility of embedding the server in the client), and both parts execute code, so it can be hard to tell whether some code will be executed in the client or in the server (and it may sometimes be in both of them at different steps of the program).
In Ozone an interface class exposes all the methods for every object and you use it without knowing if you're using the real class (that you have to write) or the proxy class (which is automatically generated by an Ozone tool).
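The interface / real class / proxy arrangement described above can be illustrated with a minimal sketch. It does not use the Ozone API: the names Document, DocumentImpl and ProxySketch are hypothetical, the proxy is built with the standard java.lang.reflect.Proxy class rather than generated by an Ozone tool, and the real class does not extend OzoneObject so that the sketch runs without the Ozone jar.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: the only type client code ever sees.
interface Document {
    String getTitle();
    void setTitle(String title);
}

// Hypothetical "real" class, written by hand. With Ozone it would also
// extend OzoneObject; that is omitted here (assumption for the sketch).
class DocumentImpl implements Document {
    private String title;
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
}

// Stand-in for a generated proxy: it forwards each call to the target
// and records the method name, much as a real proxy would forward the
// call to the server side.
class ProxySketch {
    static final List<String> forwarded = new ArrayList<>();

    static Document proxyFor(Document target) {
        InvocationHandler handler = (proxy, method, args) -> {
            forwarded.add(method.getName());
            return method.invoke(target, args);
        };
        return (Document) Proxy.newProxyInstance(
                Document.class.getClassLoader(),
                new Class<?>[] { Document.class },
                handler);
    }

    public static void main(String[] args) {
        // The caller only holds a Document; it cannot tell from the type
        // whether this is the real object or the proxy.
        Document doc = proxyFor(new DocumentImpl());
        doc.setTitle("IRF notes");
        System.out.println(doc.getTitle());   // prints "IRF notes"
        System.out.println(forwarded);        // prints "[setTitle, getTitle]"
    }
}
```

Because callers depend only on the interface, swapping the real object for the proxy requires no change to client code, which is also why the classes needing a proxy have to be decided early in the design.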

storedObjects

storedObjects can be downloaded from http://www.jdbms.org and is 100% pure Java.
storedObjects is very simple to install (javac *.java in the source directory). It contains lots of example programs and lots of documentation, but no tutorial yet (one is planned). At first sight, it really looks like the commercial product ObjectStore by ObjectDesign, or at least like version 3. Associated with the project's files is a schema file on which you run the Schema Generator, which creates two DB representation files (one for the server, MAIN.DBM, and one for the client, CLIENT.DBM). Afterwards, no post- or preprocessor is needed to run the application. The server can be embedded in the client thanks to the "fakeTCP" mode, so that both ends run in the same JVM.

Performance / suitability

Ozone is based on Java serialization and reflection. Thus, it runs into the problems we had at the beginning with IRF: efficiency. Creating, saving, restoring objects is very slow. It can be improved with the same kind of tricks we used for IRF (tuned serialization), but there is no guarantee it would go much further.
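As an illustration of what "tuned serialization" can mean in plain Java, here is a minimal sketch using the standard java.io.Externalizable interface, where a class writes and reads its own fields by hand instead of relying on the default reflective mechanism. The Posting class and its fields are hypothetical, and this is not necessarily the exact technique used in IRF; whether it helps in a given case would have to be measured.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical record; field names are illustrative only.
class Posting implements Externalizable {
    int docId;
    int[] positions;

    // Externalizable requires a public no-argument constructor.
    public Posting() { }

    Posting(int docId, int[] positions) {
        this.docId = docId;
        this.positions = positions;
    }

    // Write the fields explicitly instead of letting default
    // serialization discover them through reflection.
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(docId);
        out.writeInt(positions.length);
        for (int p : positions) {
            out.writeInt(p);
        }
    }

    // Read the fields back in exactly the order they were written.
    @Override
    public void readExternal(ObjectInput in) throws IOException {
        docId = in.readInt();
        positions = new int[in.readInt()];
        for (int i = 0; i < positions.length; i++) {
            positions[i] = in.readInt();
        }
    }
}

class SerializationSketch {
    // Serialize to a byte array and read the object back.
    static Posting roundTrip(Posting p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Posting) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Posting copy = roundTrip(new Posting(7, new int[] { 3, 14, 15 }));
        System.out.println(copy.docId + " " + copy.positions.length);
    }
}
```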
storedObjects is very fast: at first sight, about 100 times faster than Ozone for the same kind of test application. It also provides several ways of indexing objects by field values, managing referential integrity, etc.

Achievements

In both cases, we were able to get a test application that manages a small index containing pseudo indexing features. In both cases, retrieval works. We had no chance to test whether the indexing facilities of the two ODBMSs could be used because we have a bigger issue first: as the indexes we want to manage are huge, we need a way to keep them only partially in memory. That brings the need for on-demand materialization.
With Ozone, we couldn't write code that would restore an index only partially in memory, loading the parts needed and discarding them when necessary. When an Index was restored, it was loaded entirely. If there wasn't enough memory, the application crashed.
storedObjects is designed to allow people to use on-demand materialization. Thus, we enquired about how we could use it. It appears that this on-demand materialization didn't work properly with arrays (heavily used in Vectors and Hashtables). We submitted a bug report and received a small patch that didn't prove to be enough. So the storedObjects research stopped with this finding, but it could be resumed as soon as this problem is fixed.
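The kind of on-demand materialization we have in mind can be sketched independently of either ODBMS. In the following minimal example the index is split into numbered segments, a segment is loaded only when first requested, and the least recently used segment is dropped once a fixed number are resident. The LazyIndex class, the loader function and the segment layout are all assumptions made for illustration; neither Ozone nor storedObjects is involved.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical index that keeps at most maxResident segments in memory.
class LazyIndex {
    private final IntFunction<String[]> loader;  // reads one segment from storage
    private final Map<Integer, String[]> resident;
    int loads = 0;                               // how many times storage was hit

    LazyIndex(IntFunction<String[]> loader, int maxResident) {
        this.loader = loader;
        // Access-ordered LinkedHashMap: the eldest entry is the least
        // recently used one, and it is evicted when the limit is exceeded.
        this.resident = new LinkedHashMap<Integer, String[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String[]> eldest) {
                return size() > maxResident;
            }
        };
    }

    // Return a segment, materializing it from storage only if needed.
    String[] segment(int id) {
        String[] seg = resident.get(id);  // a hit also marks it recently used
        if (seg == null) {
            seg = loader.apply(id);
            loads++;
            resident.put(id, seg);
        }
        return seg;
    }

    boolean isResident(int id) { return resident.containsKey(id); }
    int residentCount() { return resident.size(); }
}

class LazyIndexDemo {
    public static void main(String[] args) {
        // The loader here just fabricates a segment; a real one would read
        // it from the database or from a file.
        LazyIndex index = new LazyIndex(id -> new String[] { "feature-" + id }, 2);
        index.segment(0);
        index.segment(1);
        index.segment(0);   // hit: no load, segment 0 becomes most recent
        index.segment(2);   // miss: segment 1 is evicted
        System.out.println(index.loads);          // prints 3
        System.out.println(index.isResident(1));  // prints false
    }
}
```

The design choice is that memory use is bounded by the number of resident segments rather than by the size of the whole index, at the price of reloading a segment that was evicted too early.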

What about IRF?

With neither of the two ODBMSs tried were we able to get beyond the test application.
With Ozone, the data organization really looks like the one we used to have in IRF: an interface implemented by both the proxy and the real class. In order to use it with IRF, the first step would be to cleanly define which classes need a proxy. Only instances of those classes could then be easily restored, just knowing their names (which means there will be a name per object, i.e., more than an OID). Indexes, Documents, and DEs seem to be the most appropriate classes to have proxies. The other classes currently with proxies in IRF (IndexingFeatures, FeatureLists) could be managed directly by Ozone. Ozone will itself ensure the uniqueness of restored proxies and the restoration of proxy-less objects. This shows the main drawback of using Ozone with IRF: Ozone needs to be taken into consideration from the beginning of the design phase, and thus does not make it easy to add persistence to an already existing application.
With storedObjects, the use of proxies is transparent. You never know which kind of object you're using, because the server manages this aspect. You can restore objects knowing their OID, or you can query the collection to find the one you need (knowing, for example, its class and a field content). In order to correctly manage partial restoration of objects, we think we would have to define our own Vector class (like FeatureList), because the storedObjects collections may not match this need. Otherwise, sO provides us with a hashtable class that could be enough (no need for redefinition like in IRF; getActualKey can be emulated with sO indexing/querying facilities). The work could then be easier than with Ozone. The only problem: unlike with Ozone, an application with sO always works through a DBClient. This class allows the user to talk to the DBServer, asking it to store, restore, or delete objects. Thus, all classes managing persistence projectwide must be aware of this connection. The need for an sOBroker then begins to appear, keeping track of this connection. There shouldn't be any need for a broker per class but just a general one, a bit like the way the PersistentObjectManager currently works in IRF.
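The idea of a single general broker that alone knows about the client connection can be sketched as follows. This is not the storedObjects API: StoreConnection is a hypothetical stand-in for the role DBClient plays, its method names are invented for illustration, and InMemoryConnection is a fake that lets the sketch run without a database server.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the client connection; these method names
// are illustrative and are NOT taken from storedObjects.
interface StoreConnection {
    long store(Object o);        // returns an object identifier (OID)
    Object restore(long oid);
    void delete(long oid);
}

// In-memory fake so the sketch runs without a server (assumption).
class InMemoryConnection implements StoreConnection {
    private final Map<Long, Object> objects = new HashMap<>();
    private long nextOid = 1;

    public long store(Object o) {
        long oid = nextOid++;
        objects.put(oid, o);
        return oid;
    }

    public Object restore(long oid) { return objects.get(oid); }

    public void delete(long oid) { objects.remove(oid); }
}

// One general broker for the whole project: it is the only class that
// holds the connection, so other classes never see it directly.
class GeneralBroker {
    private final StoreConnection connection;

    GeneralBroker(StoreConnection connection) {
        this.connection = connection;
    }

    long save(Object o) { return connection.store(o); }

    // Restore by OID and cast to the expected type; returns null when
    // no object is stored under that OID.
    <T> T load(long oid, Class<T> type) {
        return type.cast(connection.restore(oid));
    }

    void remove(long oid) { connection.delete(oid); }
}

class BrokerDemo {
    public static void main(String[] args) {
        GeneralBroker broker = new GeneralBroker(new InMemoryConnection());
        long oid = broker.save("index-A");
        System.out.println(broker.load(oid, String.class));  // prints "index-A"
        broker.remove(oid);
        System.out.println(broker.load(oid, String.class));  // prints "null"
    }
}
```

Keeping the connection inside one broker means that replacing the connection implementation touches only the place where the broker is constructed.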

And now?

storedObjects seems to me the best solution. On the user mailing list, after we made our problem known, the project manager outlined ways that could allow sO to work with huge numbers of objects. They may be implemented in the future, and we hope the array partial restoration problem will be addressed quickly.
The main goal of those tests was to implement a different storage mechanism to see if IRF easily accepts this change. It couldn't actually be tested, but at least we discovered how two (very) different ODBMSs can work. We think IRF should accept such a change quickly, provided we make it a little more dynamic (some kind of registration for brokers, proxies, etc., so that there is only one class to change to get a new storage mechanism, not a group of them).
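The registration idea mentioned above could look something like the following minimal sketch, in which storage mechanisms register a factory under a key and the rest of the application asks a registry for the currently selected one. All names here (StorageMechanism, StorageRegistry, FlatFileStorage, ObjectDbStorage) are hypothetical and do not correspond to existing IRF classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical abstraction over a storage mechanism.
interface StorageMechanism {
    String name();
}

class FlatFileStorage implements StorageMechanism {
    public String name() { return "flat-file"; }
}

class ObjectDbStorage implements StorageMechanism {
    public String name() { return "object-db"; }
}

// Registry of available mechanisms; selecting another key is the only
// change needed to switch storage for the whole application.
class StorageRegistry {
    private static final Map<String, Supplier<StorageMechanism>> factories = new HashMap<>();
    private static String active;

    static void register(String key, Supplier<StorageMechanism> factory) {
        factories.put(key, factory);
    }

    static void select(String key) {
        if (!factories.containsKey(key)) {
            throw new IllegalArgumentException("unknown storage mechanism: " + key);
        }
        active = key;
    }

    // Create an instance of the currently selected mechanism.
    static StorageMechanism current() {
        if (active == null) {
            throw new IllegalStateException("no storage mechanism selected");
        }
        return factories.get(active).get();
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        StorageRegistry.register("flat-file", FlatFileStorage::new);
        StorageRegistry.register("object-db", ObjectDbStorage::new);
        StorageRegistry.select("object-db");
        System.out.println(StorageRegistry.current().name());  // prints "object-db"
    }
}
```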

Other References

To get a closer look at what has been done with storedObjects, which has been the main test, you can read Using storedObjects for IRF
National Institute of Standards and Technology
Last updated: Tuesday, 01-Aug-2000 08:34:28 EDT

Date created: Monday, 31-Jul-00
For further information contact Paul Over (over@nist.gov) with
copy to Darrin Dimmick (ddimmick@nist.gov)