java gov.nist.nlpir.irfapps.SampleApp [options]
The command line options are as follows:
Some things to bear in mind:
MENU:
1 - Display available collections
2 - Choose a collection to work with
3 - Present all indexes for current collection
4 - Display index statistics for current collection
5 - Display stopword statistics for current collection
6 - Add document(s) to the current collection
7 - Update current collection's indexes (idf etc)
8 - Retrieve
9 - Set indexing modalities for current collection
11 - Show IR manager
12 - Show statistics on all indexes for current collection
13 - Dump in-memory proxy table
0 - QUIT
On platforms that allow redirection of standard input, the input a user would otherwise type can be supplied from a file. The contents of this file should mimic exactly the input SampleApp would receive if the operation were performed in interactive mode, with each input field on its own line. The file may also contain comments, each of which must be preceded by a "#". The irfapps directory contains a sample file for use with the HCI collection on Unix systems: testHci.inp. (It can also serve as a guide for an interactive session.) The file directs SampleApp to create a new index in the current directory, update the index, perform a retrieval operation on it, taking the query from the named file, and finally display the highest-ranking document before exiting. The relative paths assume the current directory is irfapps.
Assuming Unix and that the current directory is irfapps:
java gov.nist.nlpir.irfapps.SampleApp coarse < testHci.inp
Once the application has been shut down, it can be started again as follows for interactive use of existing collections:
java gov.nist.nlpir.irfapps.SampleApp
To reindex, remove the DB files first.
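The redirected invocation above can also be launched from another Java program. The following is a minimal sketch, not part of the framework: the class name RunSampleAppScript and the helper scriptedRun are hypothetical, and it assumes that a java executable is on the PATH and that the framework classes are reachable through the CLASSPATH of the child process.

```java
import java.io.File;

public class RunSampleAppScript {
    // Hypothetical helper: builds (but does not start) a process equivalent to
    //   java gov.nist.nlpir.irfapps.SampleApp coarse < testHci.inp
    // run from workDir, which is assumed to be the irfapps directory.
    static ProcessBuilder scriptedRun(File workDir, String scriptName) {
        ProcessBuilder pb = new ProcessBuilder(
                "java", "gov.nist.nlpir.irfapps.SampleApp", "coarse");
        pb.directory(workDir);
        // Feed the script file to the child's standard input.
        pb.redirectInput(new File(workDir, scriptName));
        // Let the child's output and errors appear on our own console.
        pb.redirectOutput(ProcessBuilder.Redirect.INHERIT);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        return pb;
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(args.length > 0 ? args[0] : ".");
        if (!new File(dir, "testHci.inp").isFile()) {
            System.err.println("No testHci.inp found in " + dir);
            return;
        }
        Process p = scriptedRun(dir, "testHci.inp").start();
        System.exit(p.waitFor());
    }
}
```

Passing the irfapps directory as the first argument keeps the relative paths inside testHci.inp valid, since the child process is started in that directory.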
This is what testHci.inp contains:
2                                          # choose/create a collection
0                                          # create new collection
.                                          # no name
.                                          # no description
.                                          # store in current directory
gov.nist.nlpir.irfapps.hci.Bib2AppDocConv  # name of converter class
9                                          # set indexing modalities
d                                          # use default indexing modalities
0                                          # return to previous menu
6                                          # index
hci/data/bib1000                           # source of raw document data
1                                          # starting doc index
1000                                       # ending doc index
y                                          # index an index at a time
7                                          # update
8                                          # search
1                                          # weight of index
1                                          # weight of index
1                                          # weight of index
lin                                        # use linear combination of indexes
f                                          # take query from file
hci/data/q1                                # source of query
1                                          # present first doc in result
0                                          # end doc display
n                                          # no more queries
0                                          # exit application
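To make the file format concrete, the sketch below separates each line of such a script into its input field and its trailing "#" comment. It only illustrates the layout described above; it is not SampleApp's own input handling, which is not shown in this document. The class InputScript and its methods are hypothetical, and skipping lines that contain nothing but a comment is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class InputScript {
    // Returns the input field on a line: the text before the first '#', trimmed.
    static String field(String line) {
        int hash = line.indexOf('#');
        String value = hash >= 0 ? line.substring(0, hash) : line;
        return value.trim();
    }

    // Collects the fields of all lines, skipping lines with no field at all
    // (blank or comment-only lines). Skipping them is an assumption.
    static List<String> fields(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String f = field(line);
            if (!f.isEmpty()) {
                out.add(f);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "2      # choose/create a collection",
                "0      # create new collection",
                ".      # no name");
        System.out.println(fields(sample)); // prints [2, 0, .]
    }
}
```

Read this way, the first three fields of testHci.inp are the menu choice 2, the choice 0 for a new collection, and "." for an empty name, exactly what a user would type in interactive mode.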
For information on weighting factors for indexing modalities and the methods for combining indexing modality results to give a final document score, see the section on retrieving.
SampleApp is run in batch, or noinput, mode by specifying the "noinput" command line option, as described above. The default noinput path to the raw document data and a query are relative and assume the current directory is ....gov/nist/nlpir/irfapps. Once noinput has been specified, the following additional command line parameters become available:
Assuming Unix, that the current directory is irfapps, and that no DB* files exist:
Perform default indexing and retrieval:
java gov.nist.nlpir.irfapps.SampleApp noinput
Reindex the first 50 documents but stop with indexing:
rm DB*; java gov.nist.nlpir.irfapps.SampleApp noinput to 50 index
Reindex the first 50 documents but stop before retrieval:
rm DB*; java gov.nist.nlpir.irfapps.SampleApp noinput to 50 update
Perform one default retrieval on the first existing collection:
java gov.nist.nlpir.irfapps.SampleApp noinput retrieve
Name:             HCI
Description:      A collection of 1,000 short documents from the publicly available HCI bibliography of Ohio State University
Location:         /.../gov/nist/nlpir/irfapps/hci/data/bib1000
Converter Class:  gov.nist.nlpir.irfapps.hci.Bib2AppDocConv

Name:             Cranfield
Description:      A collection of 1,400 publicly available documents containing aircraft design abstracts dating from the 1960s
Location:         /.../gov/nist/nlpir/irfapps/trec/cran/data/cranfield.nsgmls
Converter Class:  gov.nist.nlpir.irfapps.trec.cran.CranfieldConverter