gov.nist.nlpir.irf.conversion
Class Sgml2AppDocConverter

java.lang.Object
  |
  +--gov.nist.nlpir.irf.conversion.IrfConverter
        |
        +--gov.nist.nlpir.irf.conversion.Sgml2AppDocConverter
Direct Known Subclasses:
CranfieldConverter, FbisConverter

public class Sgml2AppDocConverter
extends IrfConverter

Models generic conversion of SGML-tagged documents in a file (as formatted by the SGML parser/checker nsgmls) into proxy Documents, as dictated by conversion rules.

Version:
$Revision: 1.2 $
Author:
This software was produced by NIST, an agency of the U.S. government, and by statute is not subject to copyright in the United States. Recipients of this software assume all responsibilities associated with its operation, modification and maintenance.

Inner Class Summary
(package private)  class Sgml2AppDocConverter.InField
          Local class for objects on inFieldStack.
 
Field Summary
(package private)  Sgml2AppDocConverter.InField defaultInField
          Working InField for the start state
(package private)  ConversionRule defaultRule
          Working rule used in start state and when no rule is found for a tag.
private static int EOF
          Working constant
private  java.io.BufferedReader file
          The reader to be used to read the raw data
private  ConversionRules rules
          The rules to be used in conversion
 
Fields inherited from class gov.nist.nlpir.irf.conversion.IrfConverter
proxyDocClass, realDocClass
 
Constructor Summary
Sgml2AppDocConverter()
          Makes a new converter with no rules
Sgml2AppDocConverter(ConversionRules rules)
          Makes a new converter given a set of conversion rules
 
Method Summary
 ProxyDocument buildDoc(java.lang.Class proxyDocClass, java.lang.Class realDocClass, java.util.Vector outFields)
          Builds a ProxyDocument using the data on each field in the outFields vector.
 ProxyDocument convert1()
          Converts one document from a file opened using a BufferedReader.
 ConversionRules getConversionRules()
          Returns the conversion rules
 int ignoreN(int numToIgnore)
          Advances the reader to just beyond the end of the Nth (numToIgnore-th) document, if possible.
 void setConversionRules(ConversionRules rules)
          Sets the conversion rules
 void setRawDocLocation(java.lang.String loc)
          Gives the converter the information it needs to access the raw data for the documents to be converted.
 
Methods inherited from class gov.nist.nlpir.irf.conversion.IrfConverter
getProxyDocClass, getRealDocClass, setProxyDocClass, setRealDocClass
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
 

Field Detail

rules

private ConversionRules rules
The rules to be used in conversion

file

private java.io.BufferedReader file
The reader to be used to read the raw data

EOF

private static final int EOF
Working constant

defaultRule

ConversionRule defaultRule
Working rule used in start state and when no rule is found for a tag.

defaultInField

Sgml2AppDocConverter.InField defaultInField
Working InField for the start state

Constructor Detail

Sgml2AppDocConverter

public Sgml2AppDocConverter()
Makes a new converter with no rules

Sgml2AppDocConverter

public Sgml2AppDocConverter(ConversionRules rules)
Makes a new converter given a set of conversion rules
Parameters:
rules - collection-specific rules to guide conversion

Method Detail

setConversionRules

public void setConversionRules(ConversionRules rules)
Sets the conversion rules
Parameters:
rules - collection-specific rules to guide conversion

getConversionRules

public ConversionRules getConversionRules()
Returns the conversion rules
Returns:
the collection-specific rules that guide conversion

setRawDocLocation

public void setRawDocLocation(java.lang.String loc)
                       throws java.io.FileNotFoundException
Gives the converter the information it needs to access the raw data for the documents to be converted. For this converter, that information is a fully qualified file name.
Parameters:
loc - fully qualified name of the file with the raw doc data
Overrides:
setRawDocLocation in class IrfConverter
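
The source of this method is not part of this page. As a minimal sketch of how a fully qualified file name can become a BufferedReader while surfacing the declared FileNotFoundException, consider the stand-in below. The class name RawDocLocationSketch and the reader() accessor are hypothetical and exist only for illustration; only the method signature and the field name file come from the documentation above.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;

/** Hypothetical stand-in, not the NIST class: holds a reader opened from a file name. */
class RawDocLocationSketch {
    // Mirrors the documented private field "file".
    private BufferedReader file;

    /**
     * Opens the named file for buffered reading.
     * The FileReader constructor throws FileNotFoundException if the file
     * does not exist or cannot be opened for reading.
     */
    void setRawDocLocation(String loc) throws FileNotFoundException {
        file = new BufferedReader(new FileReader(loc));
    }

    /** Hypothetical accessor, added only so the sketch can be exercised. */
    BufferedReader reader() {
        return file;
    }
}
```

Opening the file eagerly means a bad path fails at configuration time rather than on the first call to convert1.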

ignoreN

public int ignoreN(int numToIgnore)
            throws java.io.IOException
Advances the reader to just beyond the end of the Nth (numToIgnore-th) document, if possible.
Parameters:
numToIgnore - number of documents to skip
Returns:
number of documents skipped
Overrides:
ignoreN in class IrfConverter
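
How the converter recognizes document boundaries is not documented here. The sketch below shows one plausible way to skip documents in nsgmls output, where each element end is reported on its own line as a closing parenthesis followed by the element name. It assumes a hypothetical document element named DOC, and it takes the reader as an argument (unlike the documented method) so that it can stand alone. The class name IgnoreNSketch is likewise an illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/** Hypothetical sketch, not the NIST implementation. */
class IgnoreNSketch {
    /** Assumed end-of-document line in nsgmls output; "DOC" is a hypothetical element name. */
    static final String END_OF_DOC = ")DOC";

    /**
     * Consumes lines until numToIgnore end-of-document lines have been read
     * or the input is exhausted, and returns the number of documents skipped.
     */
    static int ignoreN(BufferedReader file, int numToIgnore) throws IOException {
        int skipped = 0;
        String line;
        // The count is checked first, so no line is read once enough documents are skipped.
        while (skipped < numToIgnore && (line = file.readLine()) != null) {
            if (line.equals(END_OF_DOC)) {
                skipped++;
            }
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        String esis = "(DOC\n-first\n)DOC\n(DOC\n-second\n)DOC\n(DOC\n-third\n)DOC\n";
        BufferedReader reader = new BufferedReader(new StringReader(esis));
        System.out.println("skipped: " + ignoreN(reader, 2));
        System.out.println("next line: " + reader.readLine());
    }
}
```

Returning the count actually skipped, rather than the count requested, lets a caller detect that the input ended early, which matches the "if possible" wording in the description.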

convert1

public ProxyDocument convert1()
                       throws java.io.IOException
Converts one document from a file opened using a BufferedReader. The method takes no parameters; it reads from the converter's own reader.
Returns:
a proxy IR document or null
Overrides:
convert1 in class IrfConverter
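
The conversion logic itself is not shown on this page, but the inner class summary mentions a stack of InField objects. The sketch below illustrates that general idea under stated assumptions: it reads nsgmls output lines, pushes an element name on each start line, attaches data lines to the element on top of the stack, and returns when the document element closes. The element name DOC, the class name Convert1Sketch, the use of a Map in place of ProxyDocument, and the choice to return null when input ends before a document is complete are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch, not the NIST implementation. */
class Convert1Sketch {
    /** Hypothetical document element name; real collections define their own. */
    static final String DOC_TAG = "DOC";

    /**
     * Reads one document's worth of nsgmls output and returns a map from
     * element name to its character data, or null if the input ends before
     * a document element is closed.
     */
    static Map<String, String> convert1(BufferedReader file) throws IOException {
        Deque<String> openTags = new ArrayDeque<>();  // plays the role of a field stack
        Map<String, String> fields = new LinkedHashMap<>();
        String line;
        while ((line = file.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            char kind = line.charAt(0);
            String rest = line.substring(1);
            if (kind == '(') {
                // Start of an element: remember which field is now open.
                openTags.push(rest);
            } else if (kind == '-' && !openTags.isEmpty()) {
                // Character data: append to the innermost open element.
                // Escape sequences in the data are left undecoded in this sketch.
                fields.merge(openTags.peek(), rest, (a, b) -> a + " " + b);
            } else if (kind == ')') {
                // End of an element: close it, and finish if it was the document.
                if (!openTags.isEmpty()) {
                    openTags.pop();
                }
                if (rest.equals(DOC_TAG)) {
                    return fields;
                }
            }
            // Other line types (for example attribute lines) are ignored here.
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        String esis = "(DOC\n(DOCNO\n-FBIS3-1\n)DOCNO\n(TEXT\n-Hello world\n)TEXT\n)DOC\n";
        BufferedReader reader = new BufferedReader(new StringReader(esis));
        System.out.println(convert1(reader));
        System.out.println(convert1(reader));
    }
}
```

A caller can invoke such a method repeatedly and stop at the first null, which is one reading of the documented "a proxy IR document or null" return value.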

buildDoc

public ProxyDocument buildDoc(java.lang.Class proxyDocClass,
                              java.lang.Class realDocClass,
                              java.util.Vector outFields)
Builds a ProxyDocument using the data on each field in the outFields vector.
Parameters:
proxyDocClass - the proxy document class
realDocClass - the real document class
outFields - vector of data on the fields to be constructed
Returns:
a proxy IR document