Although we planned to use a file-based persistence scheme for our sample application, we wanted other application developers to be able to substitute other kinds of persistence support with minimal changes to the framework. In solving this problem we drew significantly on Larman's discussion (pp. 455-486) of a persistence framework. Note too that we did not try to provide all the functionality of a full object database, only the minimum we thought a research IR system needs, with the emphasis on flexible modeling rather than realistic operational adequacy. For example, we assumed documents could be added but not deleted or changed, and we did not implement commit/rollback.
To isolate most of the code from knowledge of whether an object exists on disk, in memory, or both, and to allow for gradual, on-demand creation, we implemented the virtual proxy pattern. See Gamma, Helm, Johnson, and Vlissides (pp. 207-217) or Buschmann, Meunier, Rohnert, Sommerlad, and Stal (pp. 263-275) for more information on this pattern. Most of the important objects are dealt with indirectly via proxy objects, which act as smart references: they pass method calls to the associated real objects and fetch a real object only on demand, when it is not already present in memory. Complex objects such as those representing documents or indexing features are built using proxies and so can be materialized gradually as needed. Here are the classes of proxies that inherit directly from VirtualProxy. These classes are in turn subclassed.
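The smart-reference behavior described in the paragraph on the virtual proxy pattern can be illustrated with a minimal sketch. Everything here is hypothetical: RealDocument, DocumentProxy, and the Supplier standing in for a broker fetch are illustrative names, not IRF's actual VirtualProxy API, and the code uses modern Java features for brevity.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a real object that is costly to load. */
class RealDocument {
    private final String title;
    RealDocument(String title) { this.title = title; }
    String getTitle() { return title; }
}

/** Hypothetical virtual proxy: fetches the real object on first use only. */
class DocumentProxy {
    private final Supplier<RealDocument> loader; // assumption: stands in for a broker fetch
    private RealDocument real;                   // null until materialized
    private int loads = 0;                       // counts materializations, for illustration

    DocumentProxy(Supplier<RealDocument> loader) { this.loader = loader; }

    /** Returns the real object, materializing it only if it is not in memory. */
    private RealDocument realObject() {
        if (real == null) {
            real = loader.get();
            loads++;
        }
        return real;
    }

    /** Clients call the proxy as if it were the real object. */
    String getTitle() { return realObject().getTitle(); }

    boolean isMaterialized() { return real != null; }
    int loadCount() { return loads; }
}

class VirtualProxyDemo {
    public static void main(String[] args) {
        DocumentProxy p = new DocumentProxy(() -> new RealDocument("Proxies in IR"));
        System.out.println(p.isMaterialized()); // nothing has been fetched yet
        System.out.println(p.getTitle());       // first call triggers the fetch
        System.out.println(p.loadCount());      // later calls reuse the same object
    }
}
```

Client code holds only the proxy, so it never needs to know whether the real object is currently in memory.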
For detailed information about how persistence is supported see the main classes at the top of the hierarchies involved:
This method of a proxy must make the real object directly associated
with a proxy persistent and see that the same is done recursively for all
members of the real object that are proxies, etc.
The default method in VirtualProxy does not handle member proxies,
so every proxy class which contains one or more proxies must override this
method to make the recursive call(s) on its contained proxies. This method
has to be recursive; otherwise an object could not be reconstructed, because
some of its components would not be found in persistent storage.
Invocations beyond the first for a given proxy have no effect.
When an object with a proxy is made persistent, the broker assigns
a handle to the real object and an object identifier (Oid) is stored in
the proxy. The reference to the proxy is added to a table of in-memory
proxies by Oid. See below for information about how
this table is used to avoid duplicate instantiations as the result of materialization.
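As a rough sketch of the recursive, first-call-only behavior described above, assuming hypothetical classes (SketchBroker, PersistableProxy) and a makePersistent signature that takes the broker as an argument, which the original framework may not do:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical broker: hands out Oids and remembers in-memory proxies by Oid. */
class SketchBroker {
    private long lastOid = 0;
    final Map<Long, PersistableProxy> inMemoryByOid = new HashMap<>();

    long store(PersistableProxy proxy) {
        long oid = ++lastOid;           // writing the real object to storage is omitted
        inMemoryByOid.put(oid, proxy);  // table of in-memory proxies, keyed by Oid
        return oid;
    }
}

/** Hypothetical proxy whose real object may itself contain member proxies. */
class PersistableProxy {
    static final long NO_OID = -1;
    private long oid = NO_OID;
    private final List<PersistableProxy> members = new ArrayList<>();

    void addMember(PersistableProxy member) { members.add(member); }
    long getOid() { return oid; }
    boolean isPersistent() { return oid != NO_OID; }

    /** Makes this proxy's object persistent, then recurses into member proxies. */
    void makePersistent(SketchBroker broker) {
        if (isPersistent()) {
            return;                     // invocations beyond the first have no effect
        }
        oid = broker.store(this);       // set before recursing, so cycles terminate
        for (PersistableProxy member : members) {
            member.makePersistent(broker);
        }
    }
}
```

Because the Oid is recorded before the recursive calls, a second invocation on the same proxy returns immediately, and member proxies shared by several containers are stored once.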
This method of a proxy returns the proxy's real object. If it's not
in memory any more, this method will initiate the series of calls necessary
to get it back from persistent storage.
This method of a proxy cuts the link to the proxy's real object
so that the real object is available for garbage collection.
It shouldn't be necessary to design a recursive makeLightweight for every proxy container class, since contained objects with no references to them from outside the container become available for collection as soon as the container is available. But with the current Solaris Java 1.2 VM and garbage collector it seems possible that making contained proxies lightweight speeds their collection. Deciding which objects should have a recursive makeLightweight method isn't easy. Right now, only Documents and ProxyFeatureLists have one.
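The interplay between cutting the link to the real object and fetching it again on demand might look like the following sketch. The names SketchStore, LightweightProxy, and realObject() are assumptions made for illustration (only makeLightweight is named in the text), and a Map stands in for disk storage.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical persistent store keyed by Oid; a Map stands in for disk. */
class SketchStore {
    private final Map<Long, String> disk = new HashMap<>();
    int reads = 0;                       // counts fetches, for illustration

    void write(long oid, String value) { disk.put(oid, value); }

    String read(long oid) {
        reads++;
        return disk.get(oid);
    }
}

/** Hypothetical proxy for an already-persistent object (a String, for brevity). */
class LightweightProxy {
    private final long oid;
    private final SketchStore store;
    private String real;                 // strong reference to the real object, or null

    LightweightProxy(long oid, String real, SketchStore store) {
        this.oid = oid;
        this.real = real;
        this.store = store;
        store.write(oid, real);          // the object is persistent from the start here
    }

    /** Returns the real object, fetching it from storage if the link was cut. */
    String realObject() {
        if (real == null) {
            real = store.read(oid);
        }
        return real;
    }

    /** Cuts the link so the real object becomes eligible for garbage collection. */
    void makeLightweight() { real = null; }

    boolean isLightweight() { return real == null; }
}
```

After makeLightweight the proxy keeps only the Oid, which is all it needs to get the object back later.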
This method updates the object in storage. At present it does not
check whether the object has changed since it was last saved. Eventually
it should do so, and should attempt to write the new object in the same
place if possible.
The following two methods, provided only by VirtualProxy, maintain the table of in-memory proxies:
When a class makes a reference to a proxy or contains a Java container
(e.g., Vector) that makes a reference to a proxy, the class must call this
method of the proxy to register the reference. This could happen in various
methods such as set methods, add methods, etc. When the references are
registered, a boolean (proxyRefsCounted) is set to true.
If a class only has a local variable referring to a proxy, it shouldn't call this method on the proxy. Classes concerned, for example, are subclasses of Document, PersistentDualKeyContainer for the ProxyFeatureLists, ProxyFeatureList for the objects it contains, and IndexingFeature.
Constructors do not call addRef... because only references to proxies of persistent objects are registered - since only these objects can be materialized. The constructor cannot know whether the object being constructed will always, sometimes, or never be made persistent. The client controls this decision via the makePersistent method, which registers references to proxies at that point. This method also sets the proxyRefsCounted boolean in the real object (see finalize).
Because the proxyRefsCounted boolean now exists in every real class that contains proxies, and its default value is false, the broker must set it to true for a real object when that object comes back from disk. Otherwise a real object that came back from disk would not remove the references it holds when it is finalized.
This method must be called by the same classes that call addRef...,
both in their finalize() method and in any other method that may overwrite
an existing reference to a proxy. This way, just before disappearing, a
container class tells every proxy it contains that it will no longer be
referring to the proxy, and the reference count in the table of in-memory
proxies is adjusted accordingly.
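The reference-registration discipline described above can be sketched as follows. The names ProxyRefTable, TitleHolder, registerReference, and unregisterReference are hypothetical stand-ins for the framework's own methods, proxies are identified only by Oid for brevity, and an explicit release() replaces finalize(), which is deprecated in current Java versions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical table of in-memory proxies: Oid -> number of registered references. */
class ProxyRefTable {
    private final Map<Long, Integer> refCounts = new HashMap<>();

    void registerReference(long oid) {
        refCounts.merge(oid, 1, Integer::sum);
    }

    void unregisterReference(long oid) {
        Integer n = refCounts.get(oid);
        if (n == null) {
            return;                       // nothing was registered for this Oid
        }
        if (n <= 1) {
            refCounts.remove(oid);        // last reference gone: drop the table entry
        } else {
            refCounts.put(oid, n - 1);
        }
    }

    int count(long oid) { return refCounts.getOrDefault(oid, 0); }
}

/** Hypothetical container class holding a reference to one proxy. */
class TitleHolder {
    private final ProxyRefTable table;
    private Long titleOid;                // null means no proxy reference is held

    TitleHolder(ProxyRefTable table) { this.table = table; }

    /** A set method: unregisters the overwritten reference, registers the new one. */
    void setTitle(long newOid) {
        if (titleOid != null) {
            table.unregisterReference(titleOid);
        }
        titleOid = newOid;
        table.registerReference(newOid);
    }

    /** Explicit release, standing in for the cleanup the text performs in finalize(). */
    void release() {
        if (titleOid != null) {
            table.unregisterReference(titleOid);
            titleOid = null;
        }
    }
}
```

The count for an Oid falls to zero only when every container that registered a reference has either overwritten it or been released.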
The following methods use the table of in-memory proxies to avoid duplication of proxies. One or the other is used, never both on the same proxy:
This method is used only by brokers, when materializing proxies
contained in the real object they are responsible for. The method returns
the proxy of the given class with the given Oid. It builds the proxy from
scratch using the no-argument constructor if the proxy does not already
exist; otherwise it returns the existing one. It increments the number of
references this proxy has, so NO call to addRef...
is necessary.
Sometimes, an object comes back from disk with the usual readObject
or defaultReadObject methods. A proxy created this way may duplicate an
already existing one, and after creation must be replaced with the return
value of this method. This way, the redundant proxy can be collected and
the proxy for a given object will be unique.
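The two lookup paths described above, one for brokers and one for proxies that arrive through deserialization, could be sketched like this. OidProxy, UniqueProxyTable, getOrCreate, and unique are hypothetical names, and a Supplier stands in for invoking the no-argument constructor of the requested class. Whether the deserialization path also adjusts the reference count is not stated in the text, so the sketch leaves the count alone there.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical proxy carrying only what the sketch needs. */
class OidProxy {
    long oid;
    int refs;                 // number of registered references
    OidProxy() { }            // no-argument constructor, as brokers require
}

/** Hypothetical table guaranteeing at most one in-memory proxy per Oid. */
class UniqueProxyTable {
    private final Map<Long, OidProxy> byOid = new HashMap<>();

    /** Broker path: reuse the proxy for this Oid or build one; counts the reference. */
    OidProxy getOrCreate(long oid, Supplier<? extends OidProxy> noArgConstructor) {
        OidProxy p = byOid.get(oid);
        if (p == null) {
            p = noArgConstructor.get();
            p.oid = oid;
            byOid.put(oid, p);
        }
        p.refs++;             // so the caller makes no separate registration call
        return p;
    }

    /** Deserialization path: returns the canonical proxy for the candidate's Oid. */
    OidProxy unique(OidProxy candidate) {
        OidProxy existing = byOid.putIfAbsent(candidate.oid, candidate);
        return existing != null ? existing : candidate;
    }
}
```

A caller on the deserialization path would write something like `proxy = table.unique(proxy);` immediately after reading, so the redundant instance is dropped and can be collected.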
The instances of FileBroker we implemented are responsible for managing persistent versions of most classes of objects (data elements, IR documents, indexes, indexing feature lists, and indexing features). The sample application's "indexDir" parameter controls their location.
Basic persistent data objects produced by the installation test:

Bytes              Filename  Use
-----              --------  ----------------
  240 Dec 30 08:10 DB.HCI    HciDocs
18283 Dec 30 08:10 DB.HTML   DeHtmls
  300 Dec 30 08:10 DB.IF     IndexingFeatures
 2280 Dec 30 08:10 DB.IFWS   String feature
33925 Dec 30 08:10 DB.IFWT   Text feature
 9756 Dec 30 08:10 DB.Indx   Indexes
  612 Dec 30 08:10 DB.Per    PersonNames
 7636 Dec 30 08:10 DB.Str    Strings

But there are other classes, not derived from Broker, that manage persistent storage of themselves and their components.
The following files, as produced by the installation test, are created/managed by a PersistentDualKeyContainer, the heart of the current Index/IdxIntern class. (The sample application's "indexDir" parameter controls their location.)
Indexes (two files per index: sBv (sources by value) and vBs (value by source)):

  517 Dec 30 08:10 DBauthorindexsBv0
  517 Dec 30 08:10 DBauthorindexvBs0
13692 Dec 30 08:10 DBdocAbstractindexsBv0
  517 Dec 30 08:10 DBdocAbstractindexvBs0
 2017 Dec 30 08:10 DBtitleindexsBv0
  517 Dec 30 08:10 DBtitleindexvBs0

Feature list pools (two files per index):

  260 Dec 30 08:10 DB.pool.autho.SbV
  260 Dec 30 08:10 DB.pool.autho.VbS
19175 Dec 30 08:10 DB.pool.docAb.SbV
19175 Dec 30 08:10 DB.pool.docAb.VbS
 1560 Dec 30 08:10 DB.pool.title.SbV
 1560 Dec 30 08:10 DB.pool.title.VbS

File name codes:

  357 Dec 30 08:10 DBfileNames
The following files, as produced by the installation test, contain information used to manage IRF resources. (The sample application's "irmDir" parameter controls their location.)
The following file is the serialized last-used-Oid, saved by the IrfManager:
4 Dec 30 08:10 DB.Oid
The following file is created/managed by the HandlesByOid class to store the mapping of handles to object identifiers. If the handles were not all of the same length, there would be an additional index file (DB.HanI) as well.
54936 Dec 30 08:10 DB.HanD
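One plausible reading of why equal-length handles need no index file is that a record's position can then be computed by arithmetic alone. The sketch below illustrates that general technique only. FixedLengthHandles is a hypothetical class, the record layout is an assumption rather than the actual DB.HanD format, and a byte array stands in for the file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical fixed-length record store; a byte array stands in for the data file. */
class FixedLengthHandles {
    private final int recordLength;
    private byte[] data = new byte[0];

    FixedLengthHandles(int recordLength) { this.recordLength = recordLength; }

    /** Appends a handle and returns its record number (used here as the lookup key). */
    int append(String handle) {
        byte[] bytes = handle.getBytes(StandardCharsets.US_ASCII);
        if (bytes.length != recordLength) {
            throw new IllegalArgumentException("handle must be " + recordLength + " bytes");
        }
        int recordNo = data.length / recordLength;
        data = Arrays.copyOf(data, data.length + recordLength);
        System.arraycopy(bytes, 0, data, recordNo * recordLength, recordLength);
        return recordNo;
    }

    /** Finds a record by arithmetic alone: offset = record number * record length. */
    String get(int recordNo) {
        int offset = recordNo * recordLength;
        return new String(data, offset, recordLength, StandardCharsets.US_ASCII);
    }
}
```

With variable-length records the offset could not be computed this way, so a separate table of offsets would be needed, which is consistent with the additional index file mentioned above.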
The following file is the serialized InfoServer, created/managed by the IrfManager. It is used to find all the other persistent objects.
1031 Dec 30 08:10 DB.Info