java.lang.Object
  |
  +--gov.nist.nlpir.irf.index.braf.PersistentDualKeyContainer
This class is the "heart" of the index. It mainly contains two
PersistentIrfHashtables of feature lists. The first one, called
valuesBySource, stores all the indexing features found, classified
by source. It lets the user retrieve, for example, all the features
of a given document. The second one, called sourcesByValue, stores
the features by value, whatever their source. It lets the user
retrieve all the sources in which a given feature appears.
Each entry in a table corresponds to a list of features. In the
first table, the list for a source entry contains all the features
found for that source. In the second table, the list for a feature
entry contains all the sources in which that feature can be found.
The features are shared by the lists, i.e. each feature belongs to
two lists, one in each table.
An example of using the class is below:
    // Arguments are (DB_Directory, indexName); the values here are placeholders.
    setIndexingFeature = new PersistentDualKeyContainer("myDirectory", "myFile");

    // Add an element to the container using two keys
    setIndexingFeature.put(source, key, value);

    // More precise example
    setIndexingFeature.put("doc1", "word1", "word1InDoc1");
    setIndexingFeature.put("doc1", "word2", "word2InDoc1");
    setIndexingFeature.put("doc2", "word1", "word1InDoc2");

    // We now have:
    // setIndexingFeature.getSourceVector("word1") == {"word1InDoc1", "word1InDoc2"}
    // setIndexingFeature.getFeatureVector("doc1") == {"word1InDoc1", "word2InDoc1"}

    // But you will actually have to define more precise objects:
    // the keys have to be DeIntern, the sources have to be
    // IoAddrIntern and the values have to be ProxyIndexingFeature.

The comments of this class define a contract that any class used to replace this one must conform to. One may wish to replace the current PersistentDualKeyContainer class because it embeds its own persistence mechanism. Rather than modifying it, the easiest approach may be to write an entirely new class with the same interface. If parts of PersistentDualKeyContainer are to be reused, the way those parts are currently managed (reference counting, collection, shutdown issues) must be studied closely to avoid strange side effects, even though most of these mechanisms are commented. The contract is defined for public methods only, because the interface does not constrain the internal mechanisms.
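The idea of a replacement class that keeps the same two-table behaviour can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the class name InMemoryDualKeyContainer is hypothetical, it uses generics and java.util collections instead of PersistentIrfHashtable and ProxyFeatureList, and it has no persistence at all. It only shows how each stored value ends up in two lists, one per table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in; not the IRF implementation. */
class InMemoryDualKeyContainer<S, F, V> {
    // Values grouped by source key (for example, by document).
    private final Map<S, List<V>> valuesBySource = new HashMap<>();
    // The same value objects grouped by feature key (for example, by word).
    private final Map<F, List<V>> sourcesByValue = new HashMap<>();

    /** Stores one value under both keys; the same object is shared by both lists. */
    void put(S sourceKey, F featureKey, V value) {
        valuesBySource.computeIfAbsent(sourceKey, k -> new ArrayList<>()).add(value);
        sourcesByValue.computeIfAbsent(featureKey, k -> new ArrayList<>()).add(value);
    }

    /** All values stored with the given feature key, in insertion order. */
    List<V> getSourceVector(F featureKey) {
        return sourcesByValue.getOrDefault(featureKey, new ArrayList<>());
    }

    /** All values stored with the given source key, in insertion order. */
    List<V> getFeatureVector(S sourceKey) {
        return valuesBySource.getOrDefault(sourceKey, new ArrayList<>());
    }

    public static void main(String[] args) {
        InMemoryDualKeyContainer<String, String, String> c = new InMemoryDualKeyContainer<>();
        c.put("doc1", "word1", "word1InDoc1");
        c.put("doc1", "word2", "word2InDoc1");
        c.put("doc2", "word1", "word1InDoc2");
        System.out.println(c.getSourceVector("word1"));  // [word1InDoc1, word1InDoc2]
        System.out.println(c.getFeatureVector("doc1"));  // [word1InDoc1, word2InDoc1]
    }
}
```

A real replacement would also have to honour the rest of the contract described on this page (counters, actual keys, enumerations), which this sketch leaves out.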
See Also: DualKeyContainer, Serialized Form
Field Summary
private java.lang.String | DB_Directory | Directory containing the files that comprise the PDKC
private static int | FEATURE_VECTOR_CAPACITY_INCREMENT |
private static int | FEATURE_VECTOR_INITIAL_CAPACITY |
private java.lang.Object | lastSource |
private ProxyFeatureList | lastValues | Caching mechanism objects: this cache is only used by put().
private java.lang.String | poolNameSbV | Name of the pool file for sources-by-value
private java.lang.String | poolNameVbS | Name of the pool file for values-by-source
(package private) static long | serialVersionUID | Serial version unique identifier, declared explicitly so that Java does not generate one that may change between revisions and make it impossible to deserialize objects serialized by earlier versions
private static int | SOURCE_VECTOR_CAPACITY_INCREMENT |
private static int | SOURCE_VECTOR_INITIAL_CAPACITY |
private PersistentIrfHashtable | sourcesByValue | Table of features accessed by feature
private int | sourcesNumber | Size of valuesBySource (number of sources)
private int | uniqueValuesNumber | Size of sourcesByValue (number of different features)
private PersistentIrfHashtable | valuesBySource | Table of features accessed by document
private int | valuesNumber | Number of values stored
Constructor Summary | |
PersistentDualKeyContainer(java.lang.String DB_Directory,
java.lang.String indexName)
Constructor for PersistentDualKeyContainer Contract: This constructor is only used by IdxIntern. |
Method Summary
void | clear() | Clears both tables.
java.util.Enumeration | elements() | Returns all the features stored. Contract: The Enumeration returned contains all the ProxyIndexingFeatures stored in the PDKC.
java.lang.Object | getActualFeature(java.lang.Object feature) | When a value is stored, it appears in a FeatureList corresponding to its feature (a key of a hashtable).
java.lang.Object | getActualSource(java.lang.Object source) | When a value is stored, it appears in a FeatureList corresponding to its source (a key of a hashtable).
java.util.Vector | getAllValues() | Returns a Vector containing all the values stored in the PersistentDualKeyContainer. Contract: This Vector is the concatenation of all Vectors or Lists of ProxyIndexingFeatures that may be found in the PDKC.
int | getFeatureBinCount() | Returns the length of the base array in the PersistentIrfHashtable by feature.
ProxyFeatureList | getFeatureVector(java.lang.Object sourceKey) | Contract: The returned ProxyFeatureList must contain every ProxyIndexingFeature that was stored in the PDKC with a sourceKey matching the one given in this method's parameter.
int | getNumberOfSourcesFor(java.lang.Object feature) | Gives the number of sources containing the given feature.
int | getNumberOfValuesFor(java.lang.Object source) | Gives the number of features stored for the given source.
int | getSourceBinCount() | Returns the length of the base array in the PersistentIrfHashtable by source.
java.util.Enumeration | getSources() | Returns the enumeration of Objects used as sources in this PersistentDualKeyContainer.
int | getSourcesNumber() | Gives the number of sources in the DualKeyContainer. Contract: Each time a source is added to the PDKC, the sourcesNumber variable must be increased (see put()).
ProxyFeatureList | getSourceVector(java.lang.Object featureKey) | Returns the ProxyFeatureList associated with the given key. Contract: The returned ProxyFeatureList must contain every ProxyIndexingFeature that was stored in the PDKC with a featureKey matching the one given in this method's parameter.
int | getUniqueValuesNumber() | Returns the number of different values stored in the PersistentDualKeyContainer. Contract: The name of this method is a bit ambiguous.
java.util.Enumeration | getValues() | Returns the enumeration of Objects used as features for this PersistentDualKeyContainer. Caution: these are the actual keys, i.e. an object used as a feature may not be returned if another object equal to it (considering hashCode() and equals()) had already been used as a feature for this DualKeyContainer.
int | getValuesNumber() | Gives the total number of features stored in the table. Contract: Each time a feature is added to the PDKC, the valuesNumber variable must be increased (see put()).
boolean | isEmpty() |
void | put(java.lang.Object sourceKey, java.lang.Object featureKey, ProxyIndexingFeature proxyObject) | Puts a ProxyIndexingFeature instance in the container.
void | showHashtableStatistics() | Retrieves the statistics concerning the hashtables inside the PDKC.
void | showStatistics(int depth, int maxLengthIfDepth3) | Prints statistics about the PersistentDualKeyContainer, i.e. its size and the size of its elements.
private void | showStats(PersistentIrfHashtable table, int depth, int maxLengthIfDepth3) | Prints statistics for ONE HVtable.
java.lang.String | toString() | Classic representation method.
Methods inherited from class java.lang.Object |
|
Field Detail |
static final long serialVersionUID
private PersistentIrfHashtable sourcesByValue
private PersistentIrfHashtable valuesBySource
private java.lang.String DB_Directory
private java.lang.String poolNameSbV
private java.lang.String poolNameVbS
private int valuesNumber
private int sourcesNumber
private int uniqueValuesNumber
private static int SOURCE_VECTOR_INITIAL_CAPACITY
private static int SOURCE_VECTOR_CAPACITY_INCREMENT
private static int FEATURE_VECTOR_INITIAL_CAPACITY
private static int FEATURE_VECTOR_CAPACITY_INCREMENT
private transient ProxyFeatureList lastValues
private transient java.lang.Object lastSource
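The explicit serialVersionUID and the transient cache fields listed above rely on standard Java serialization behaviour: a declared serialVersionUID stays stable across recompilation, and transient fields are not written to the stream. The following is a minimal sketch of that behaviour only. The class CountersSnapshot and the value 1L are hypothetical and are not taken from PersistentDualKeyContainer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical class used only to illustrate serialization rules. */
class CountersSnapshot implements Serializable {
    // Declared explicitly so the compiler-derived default cannot change between revisions.
    // The value 1L is an arbitrary choice for this sketch.
    static final long serialVersionUID = 1L;

    int valuesNumber;            // regular field: written to the stream
    transient Object lastSource; // cache field: skipped by serialization

    /** Serializes the object to memory and reads it back. */
    static CountersSnapshot roundTrip(CountersSnapshot in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream oin =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CountersSnapshot) oin.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CountersSnapshot before = new CountersSnapshot();
        before.valuesNumber = 3;
        before.lastSource = "doc1";
        CountersSnapshot after = roundTrip(before);
        System.out.println(after.valuesNumber); // 3
        System.out.println(after.lastSource);   // null
    }
}
```

Because the cache comes back empty after deserialization, code that uses such a cache has to tolerate a null value on first use.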
Constructor Detail |
public PersistentDualKeyContainer(java.lang.String DB_Directory, java.lang.String indexName)
Parameters:
DB_Directory - the name of the directory
indexName - name of the index this PDKC supports, or the common root of the names of the DB files.
See Also: IdxIntern
Method Detail |
public final ProxyFeatureList getSourceVector(java.lang.Object featureKey)
Two keys match when they have the same hashCode() and return true when the method equals() is called on each other.
Parameters:
featureKey - value used to search the container.
public final ProxyFeatureList getFeatureVector(java.lang.Object sourceKey)
Two keys match when they have the same hashCode() and return true when the method equals() is called on each other.
Parameters:
sourceKey - source used to search the container.
public void put(java.lang.Object sourceKey, java.lang.Object featureKey, ProxyIndexingFeature proxyObject)
Contract: a call to this method updates three variables: sourcesNumber, uniqueValuesNumber and valuesNumber. The first two are incremented by one if sourceKey and featureKey, respectively, are encountered for the first time. The third is incremented at every call, because each call adds one feature to the PersistentDualKeyContainer.
Parameters:
sourceKey - an IoAddrIntern
featureKey - a DeIntern
proxyObject - must be a ProxyIndexingFeature. Declared as an Object only for compliance with DualKeyContainer.
See Also: DeIntern, IoAddrIntern, Document, DataElem
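The counter rules of put() can be sketched in isolation. This is an illustrative sketch, not the class's code: PutCounters is a hypothetical name, the two HashSets stand in for "has this key been seen before" checks that the real class would presumably answer from its hashtables, and no feature is actually stored.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the counter bookkeeping only; storage is omitted. */
class PutCounters {
    private final Set<Object> seenSources = new HashSet<>();
    private final Set<Object> seenFeatures = new HashSet<>();
    private int sourcesNumber;
    private int uniqueValuesNumber;
    private int valuesNumber;

    void put(Object sourceKey, Object featureKey, Object proxyObject) {
        // Set.add returns true only when the key was not already present.
        if (seenSources.add(sourceKey)) {
            sourcesNumber++;
        }
        if (seenFeatures.add(featureKey)) {
            uniqueValuesNumber++;
        }
        // Every call stores one more feature, whatever the keys are.
        valuesNumber++;
    }

    int getSourcesNumber() { return sourcesNumber; }
    int getUniqueValuesNumber() { return uniqueValuesNumber; }
    int getValuesNumber() { return valuesNumber; }

    public static void main(String[] args) {
        PutCounters c = new PutCounters();
        c.put("doc1", "word1", "word1InDoc1");
        c.put("doc1", "word2", "word2InDoc1");
        c.put("doc2", "word1", "word1InDoc2");
        System.out.println(c.getSourcesNumber());      // 2
        System.out.println(c.getUniqueValuesNumber()); // 2
        System.out.println(c.getValuesNumber());       // 3
    }
}
```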
public java.lang.String toString()
public final int getValuesNumber()
Contract: each time a feature is added to the PDKC, the valuesNumber variable must be increased (see put()). This method returns the value of this variable.
public final boolean isEmpty()
Returns: true if no key table has been initialized.
public final java.util.Enumeration elements()
See Also: DualKeyContainer.put(java.lang.Object, java.lang.Object, java.lang.Object)
public java.util.Vector getAllValues()
public void showStatistics(int depth, int maxLengthIfDepth3)
Parameters:
depth - 1, prints the size of the contained vectors for each hash table,
private void showStats(PersistentIrfHashtable table, int depth, int maxLengthIfDepth3)
Parameters:
table - the hashtable of vectors to be presented.
depth - same as the depth parameter of showStatistics().
public void showHashtableStatistics()
public final int getSourcesNumber()
Contract: each time a source is added to the PDKC, the sourcesNumber variable must be increased (see put()). This method returns the value of this variable. "A source is added" means that a ProxyIndexingFeature is given for storage with a sourceKey that is not yet the source of at least one stored ProxyIndexingFeature.
public final int getUniqueValuesNumber()
Returns the value of the uniqueValuesNumber variable. This variable counts the number of different features in the PDKC, i.e. the number of different featureKeys put() has been called with.
public final java.util.Enumeration getSources()
Caution: these are the actual keys, i.e. an object used as a source may not be returned if another object equal to it (considering hashCode() and equals()) had already been used as a source for this PersistentDualKeyContainer. Basically, it is an enumeration of IoAddrInterns.
Contract: every different IoAddrIntern passed to put() must be present in the resulting Enumeration. "Different" for two IoAddrInterns means that equals() returns false when called on one with the other.
public final java.util.Enumeration getValues()
Same as getSources(), except that it returns an Enumeration of DeInterns and not IoAddrInterns.
public final int getSourceBinCount()
public final int getFeatureBinCount()
public final java.lang.Object getActualSource(java.lang.Object source)
Two IoAddrInterns can return true when compared with equals() but still be different objects. The only condition for this is that they "represent" the same IRF_Document. But an IoAddrIntern contains more data than just a reference to a document. Thus, to gather the information an IoAddrIntern holds about a document, this method must be called to ensure that the correct IoAddrIntern is actually used.
See Also: PersistentIrfHashtable.getActualKey(java.lang.Object)
public final java.lang.Object getActualFeature(java.lang.Object feature)
Similar to getActualSource(): DeInterns and DEs can be considered equal as soon as they represent the same data, not necessarily in the same place. The storage in the PDKC must take this into account.
See Also: PersistentIrfHashtable.getActualKey(java.lang.Object)
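The distinction between an equal key and the actual stored key can be shown with a small sketch. All names here are hypothetical: ActualKeyTable and SourceKey are illustrations, not PersistentIrfHashtable or IoAddrIntern, and the assumption is that key equality depends only on the document while the key object carries extra data.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical table that remembers the first instance stored for each key. */
class ActualKeyTable {
    /** Hypothetical key: equality uses documentId only; extra is additional data. */
    static final class SourceKey {
        final String documentId;
        final String extra;

        SourceKey(String documentId, String extra) {
            this.documentId = documentId;
            this.extra = extra;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof SourceKey
                    && documentId.equals(((SourceKey) other).documentId);
        }

        @Override
        public int hashCode() {
            return documentId.hashCode();
        }
    }

    // Maps each key to the instance that was stored first.
    private final Map<SourceKey, SourceKey> actualKeys = new HashMap<>();

    /** Stores the key unless an equal one is already present; returns the stored instance. */
    SourceKey store(SourceKey key) {
        SourceKey existing = actualKeys.putIfAbsent(key, key);
        return existing != null ? existing : key;
    }

    /** Returns the stored instance equal to the given key, or null if there is none. */
    SourceKey getActualKey(SourceKey key) {
        return actualKeys.get(key);
    }

    public static void main(String[] args) {
        ActualKeyTable table = new ActualKeyTable();
        SourceKey stored = new SourceKey("doc1", "offset=0");
        SourceKey probe = new SourceKey("doc1", "offset=99");
        table.store(stored);
        System.out.println(probe.equals(stored));                // true
        System.out.println(table.getActualKey(probe) == stored); // true
        System.out.println(table.getActualKey(probe).extra);     // offset=0
    }
}
```

Reading extra from the probe object would give data that was never stored, which is why the lookup has to go through the actual key.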
public void clear()
public final int getNumberOfValuesFor(java.lang.Object source)
Parameters:
source - the source for which the number of values will be computed.
public final int getNumberOfSourcesFor(java.lang.Object feature)
Parameters:
feature - the feature for which the number of sources will be computed.