IRF Basics

What is IRF?

IRF is a freely available object-oriented framework for information retrieval (IR) applications. A framework is software that defines the architecture of a group of related applications and supplies many of the basic components out of which they can be built, allowing application developers to extend the basic components to meet their special needs. IRF comes with a very simple text IR application (SampleApp) built on the framework and two small text collections, HCI and Cranfield. It was developed by the Retrieval Group in the Information Access Division at the National Institute of Standards and Technology (NIST).

Acknowledgements

We gratefully acknowledge that the starting point for IRF was the proprietary FIRE system developed in C++ by researchers at what was then UBS's Information Technology Laboratory. We appreciate their support. They deserve credit for many good design decisions that survive in IRF, which started as a translation of the core elements of FIRE. They are of course not responsible for any problems we may have introduced into IRF in the translation and modifications we carried out.

Restrictions on IRF's use

Both IRF and the sample application are coded entirely in Java™. The BufferedRandomAccessFile class was developed outside of NIST and is covered by a GNU General Public License. All other classes were produced at the National Institute of Standards and Technology by employees of the Federal Government in the course of their official duties and, pursuant to Title 17, Section 105 of the United States Code, are not subject to copyright protection. They are in the public domain and can be redistributed and/or modified freely, provided that any derivative works bear some notice that they are derived from it, and any modified versions bear some notice that they have been modified.

Disclaimers

IRF is an experimental system. NIST assumes no responsibility whatsoever for its use by other parties, and makes no guarantees, expressed or implied, about its quality, reliability, or any other characteristic. We would appreciate acknowledgement if the software is used. Please note that any mention of commercial products in the Guide or other IRF documentation is for information only; it does not imply recommendation or endorsement by NIST.

What are its reasons for being?

In early 1998 the Retrieval Group at NIST began planning a follow-on to our public domain text information retrieval (IR) package ZPRISE: one that would be more portable and extensible, and would better serve us and other research groups with increasing interest in non-text and mixed-media information retrieval. IRF and applications built on it are intended to help meet the often-stated needs of IR researchers, developers, and educators for software building blocks they can use as-is or open, study, and modify rather than reinvent. We decided that the best solution would be a freely available object-oriented IR framework.

Current status

We have accomplished much of what we set out to do:

Translation to Java
Source code documentation
Removal of the dependency on ETOS
Removal of the dependency on ObjectStore
Implementation of a broker/proxy architecture design to accommodate various persistence mechanisms
Development of simple file-based persistence
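
As a rough illustration of the broker/proxy idea listed above, the sketch below separates a storage interface (the broker) from a lightweight stand-in object (the proxy) that fetches its content only on first use. Every name here (PersistenceBroker, InMemoryBroker, DocumentProxy, BrokerProxyDemo) is hypothetical and is not taken from the IRF source; an in-memory map stands in for a real persistence mechanism such as IRF's file-based one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; none are taken from the IRF source.

// The broker hides how and where objects are stored.
interface PersistenceBroker {
    void store(String id, String content);
    String load(String id);
}

// Assumed stand-in for a real mechanism (a file-based broker would
// implement the same interface).
class InMemoryBroker implements PersistenceBroker {
    private final Map<String, String> data = new HashMap<>();
    int loadCount = 0; // counts calls to load(), for demonstration only

    @Override
    public void store(String id, String content) {
        data.put(id, content);
    }

    @Override
    public String load(String id) {
        loadCount++;
        return data.get(id); // null if the id is unknown
    }
}

// The proxy stands in for a stored document and asks the broker for
// the content only when it is first needed, then caches it.
class DocumentProxy {
    private final String id;
    private final PersistenceBroker broker;
    private String cached;

    DocumentProxy(String id, PersistenceBroker broker) {
        this.id = id;
        this.broker = broker;
    }

    String getText() {
        if (cached == null) {
            cached = broker.load(id);
        }
        return cached;
    }
}

class BrokerProxyDemo {
    public static void main(String[] args) {
        InMemoryBroker broker = new InMemoryBroker();
        broker.store("doc1", "sample text");
        DocumentProxy doc = new DocumentProxy("doc1", broker);
        System.out.println(doc.getText());
        System.out.println(doc.getText()); // second call uses the cached copy
    }
}
```

Because application code in this sketch depends only on the PersistenceBroker interface, a different storage mechanism could be substituted without changing DocumentProxy or its callers.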

We also investigated the extent to which we could stretch the current implementation to handle significant amounts of text, e.g., a TREC-sized collection (500 MB to 2 GB). We did this as an experiment, knowing that the requirements for operational speed would often run counter to the main design goal of IRF. That goal is speed in (prototype) application code development due to:

good modelling for maximal reuse across IR applications dealing with various (combinations of) media
the use of a language like Java

Not surprisingly, we didn't achieve the performance we were aiming for with even one of the TREC collections (e.g., 500 MB of news).

Changes in the implementation might produce a faster application, but we are inclined to believe that the best use of IRF is in rapid prototyping of applications with small to moderate amounts of data to find the best approach. Once the approach has been determined, a largely separate real application or more nearly operational prototype can be built, sacrificing flexibility for optimal performance, as needed.

In making the code generally available, we invite others to use and improve IRF in whatever ways best serve their educational and research purposes.

Date created: Monday, 31-Jul-00
For further information contact Paul Over ([email protected]) with
copy to Darrin Dimmick ([email protected])