Using IRF as-is with the sample application

Introduction to SampleApp

SampleApp is the sample application shipped with the framework. It has a simple command line interface and can be run interactively or in batch mode. It exists mainly to let developers exercise IRF code.

Running SampleApp

The application is started as follows:
java gov.nist.nlpir.irfapps.SampleApp [options]

The command line options are as follows:

Display the list of command line options.
noinput
Run in batch mode. If this option is not present, the sample application expects input from standard input; see Interactive operation and Pseudo-interactive operation. In "noinput" mode the default paths to the raw document data and to a query are relative to the current directory.
irmDir pathname
Use pathname instead of the default (current directory) to find/save the serialized IrfManager and related files. As shipped, these are contained in files with names starting "DB".
indexDir pathname
Use pathname instead of the default (current directory) to find/save the index data for a collection. As shipped, these are contained in files with names starting "DB". Saved with the IrfManager is the location of existing collections so the path to them is not needed once the IrfManager knows about them and has itself been saved.
Print detailed trace information during execution.
Print medium-level trace information during execution.
coarse
Print coarse trace information during execution.

Interactive operation

In interactive operation the user is prompted for the necessary information. Options are chosen from the main menu:
 1 - Display available collections
 2 - Choose a collection to work with
 3 - Present all indexes for current collection
 4 - Display index statistics for current collection
 5 - Display stopword statistics for current collection
 6 - Add document(s) to the current collection
 7 - Update current collection's indexes (idf etc)
 8 - Retrieve
 9 - Set indexing modalities for current collection
11 - Show IR manager
12 - Show statistics on all indexes for current collection
13 - Dump in-memory proxy table

 0 - QUIT

Some things to bear in mind:

Pseudo-interactive operation

On platforms that allow redirection of standard input, the input normally typed by a user can be supplied from a file instead. The contents of this file should mimic exactly the input SampleApp would receive if the operation were performed in interactive mode, with each input field separated by a newline. The file may also contain comments, each preceded by a "#". The irfapps directory contains a sample file for use with the HCI collection on Unix systems: testHci.inp. (It can also serve as a guide for an interactive session.) The file directs SampleApp to create a new index in the current directory, update the index, perform a retrieval operation on it using the query in the named file, and finally display the highest-ranking document before exiting. The relative paths assume the current directory is irfapps.
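The input-file layout described above (one field per line, optional "#" comment after it) can be previewed with a small shell sketch. The helper name irf_fields and the file name excerpt.inp are hypothetical, and the excerpt is only the first few lines of a session, not a complete script. SampleApp accepts the comments itself, so stripping them is purely a way to see the bare fields it would read.

```shell
# Hypothetical helper: print the input fields of a pseudo-interactive
# file with "#" comments and the whitespace before them removed.
irf_fields() {
  sed -e 's/[[:space:]]*#.*$//' "$1"
}

# A short excerpt in the same layout as testHci.inp (not a full session).
cat > excerpt.inp <<'EOF'
2	# choose/create a collection
0	# create new collection
.	# no name
EOF

# Show the bare fields, one per line, in the order SampleApp would read them.
irf_fields excerpt.inp
```

A complete file in this layout is then fed to SampleApp by redirecting standard input, as in the usage example that follows.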

Usage examples

Assuming Unix and that the current directory is irfapps:

java gov.nist.nlpir.irfapps.SampleApp coarse < testHci.inp
Once the application has been shut down, it can be started again as follows for interactive use of existing collections:
java gov.nist.nlpir.irfapps.SampleApp
To reindex, remove the DB files first.
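Removing the DB files before reindexing can be wrapped in a small helper. This is a minimal sketch: the function name irf_remove_db is hypothetical, and it assumes the serialized files live in a single directory (the current directory by default, or whatever was given for irmDir and indexDir) and have names starting with "DB", as shipped.

```shell
# Hypothetical helper: delete the serialized DB* files in a directory
# (default: current directory) so that the next run builds new indexes.
irf_remove_db() {
  dir=${1:-.}
  # -f keeps rm quiet when no DB files exist yet.
  rm -f -- "$dir"/DB*
}

# Clear the DB files in the current directory.
irf_remove_db .
```

After this, SampleApp can be started again with testHci.inp on standard input to rebuild the collection.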


This is what testHci.inp contains:

2					# choose/create a collection
0					# create new collection
.					# no name
.					# no description
.					# store in current directory
gov.nist.nlpir.irfapps.hci.Bib2AppDocConv # name of converter class
9					# set indexing modalities
d					# use default indexing modalities
0					# return to previous menu
6					# index 
hci/data/bib1000			# source of raw document data
1					# starting doc index
1000					# ending doc index
y					# index an index at a time
7					# update
8					# search
1					# weight of index
1					# weight of index
1					# weight of index
lin					# use linear combination of indexes
f					# take query from file
hci/data/q1				# source of query
1					# present first doc in result
0					# end doc display
n					# no more queries
0					# exit application

For information on weighting factors for indexing modalities and the methods for combining indexing modality results to give a final document score, see the section on retrieving.

Batch mode operation

SampleApp is run in batch, or noinput, mode by specifying the "noinput" command line option, as described above. The default noinput paths to the raw document data and to a query are relative to the current directory. Once noinput has been specified, the following additional command line parameters become available:

index
Perform indexing only.
update
Perform indexing and updating only.
retrieve
Perform retrieval only, from existing collection number 1.
from X
Start indexing at document number X. Default is 1.
to Y
End indexing at document number Y. Default is 20.
with Z
Index the document collection contained in file Z. Default is hci/data/bib1000.
query Q
Use the query contained in file Q. Default is hci/data/q1.
converter C
Use the converter class C. Default is gov.nist.nlpir.irfapps.hci.Bib2AppDocConv.
Index by indexing feature as opposed to by document, i.e., build all indexes one at a time by scanning each document only for the relevant indexing feature, as opposed to scanning each document for all indexing features and incrementing every index each time a document is scanned. Default is false.

Usage examples

Assuming Unix, that the current directory is irfapps, and that no DB* files exist:

Perform default indexing and retrieval:

java gov.nist.nlpir.irfapps.SampleApp noinput
Reindex the first 50 documents but stop after indexing:
rm DB*; java gov.nist.nlpir.irfapps.SampleApp noinput to 50 index
Reindex the first 50 documents but stop before retrieval:
rm DB*; java gov.nist.nlpir.irfapps.SampleApp noinput to 50 update
Perform one default retrieval on the first existing collection:
java gov.nist.nlpir.irfapps.SampleApp noinput retrieve
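
The batch parameters can also be combined on one command line. The sketch below only prints such a command rather than running it. The helper name irf_batch_cmd is hypothetical, and it assumes that "from", "to" and "with" may be given together ahead of the stage keyword, following the ordering used in the examples above.

```shell
# Hypothetical helper: print (not run) a batch-mode command line.
# Arguments: first doc, last doc, document file, optional stage keyword.
# Defaults mirror the documented ones: from 1, to 20, hci/data/bib1000.
irf_batch_cmd() {
  first=${1:-1}
  last=${2:-20}
  docs=${3:-hci/data/bib1000}
  stage=${4:-}
  # The stage keyword (index, update or retrieve) is appended only if given.
  echo "java gov.nist.nlpir.irfapps.SampleApp noinput from $first to $last with $docs${stage:+ $stage}"
}

# Command for indexing documents 1 to 50 of the HCI file and then stopping.
irf_batch_cmd 1 50 hci/data/bib1000 index
```

Printing the command first makes it easy to check the parameters before removing any DB* files.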

Sample document collections

Two sample document collections are provided for use with SampleApp:
Name: HCI
Description: A collection of 1,000 short documents from the publicly available HCI bibliography of Ohio State University
Location: /.../gov/nist/nlpir/irfapps/hci/data/bib1000
Converter Class: gov.nist.nlpir.irfapps.hci.Bib2AppDocConv
Name: Cranfield 
Description: A collection of 1,400 publicly available documents containing aircraft design abstracts dating from the 1960s
Location: /.../gov/nist/nlpir/irfapps/trec/cran/data/cranfield.nsgmls
Converter Class: gov.nist.nlpir.irfapps.trec.cran.CranfieldConverter
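
As an illustration of how the collection details above map onto the batch parameters, the following sketch prints (but does not run) one indexing command per sample collection. The function name irf_sample_cmds is hypothetical. The relative data paths assume the current directory is irfapps, and the "to" values assume the documents are numbered 1 through the collection size given in the descriptions.

```shell
# Hypothetical helper: print a batch-mode indexing command for each
# sample collection. Fields per line: data file | converter class | last doc.
irf_sample_cmds() {
  while IFS='|' read -r data conv last; do
    echo "java gov.nist.nlpir.irfapps.SampleApp noinput to $last with $data converter $conv index"
  done <<'EOF'
hci/data/bib1000|gov.nist.nlpir.irfapps.hci.Bib2AppDocConv|1000
trec/cran/data/cranfield.nsgmls|gov.nist.nlpir.irfapps.trec.cran.CranfieldConverter|1400
EOF
}

irf_sample_cmds
```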

Last updated: Tuesday, 01-Aug-2000 06:34:40 MDT

Date created: Monday, 31-Jul-00
For further information contact Paul Over, with copy to Darrin Dimmick.