Use of resources

An application running on IRF is likely to require significant resources in disk space, memory, and elapsed time.

Here are some data on the resources IRF/SampleApp used to index, and retrieve from, a collection of 25,000 text documents from the Foreign Broadcast Information Service (FBIS) comprising approximately 90 megabytes of data.

The SampleApp was run on an otherwise very lightly loaded UltraSparc Model 60 (296 MHz, 512 MB of RAM, 1 GB of swap) with the Java virtual machine memory size set to 400 MB for both the initial and the maximum setting. Runs on PCs with about the same clock speed and available memory took significantly longer.

Note that the raw document text is not required after conversion and indexing, because the indexes contain it. Two indexes are built for each document field: one by document and one by term. Positional information is kept in the index.
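To make the two-index arrangement concrete, here is a minimal Python sketch of one field indexed both by document and by term, with positions. It is a toy under stated assumptions, not IRF's code: the whitespace tokenizer, the function names `build_field_indexes` and `reconstruct`, and the in-memory dictionaries are all hypothetical, and IRF's on-disk formats (the DB* files listed below) are not modeled. It also illustrates one way a positional index can make a field's token sequence recoverable without the raw text, although this toy recovers only lowercased tokens.

```python
from collections import defaultdict

# Hypothetical sketch: NOT IRF's implementation. For one document field it
# keeps two positional indexes, one keyed by document and one keyed by term.


def tokenize(text):
    """Toy tokenizer (an assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def build_field_indexes(docs):
    """Index one field of every document.

    docs maps doc_id -> the field's text. Returns (by_doc, by_term) where
      by_doc[doc_id][term]  -> positions of term within that document
      by_term[term][doc_id] -> the same positions, reached from the term side
    """
    by_doc = defaultdict(lambda: defaultdict(list))
    by_term = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            by_doc[doc_id][term].append(pos)
            by_term[term][doc_id].append(pos)
    return by_doc, by_term


def reconstruct(by_doc, doc_id):
    """Rebuild a document's token sequence from the by-document positions."""
    slots = {}
    for term, positions in by_doc[doc_id].items():
        for pos in positions:
            slots[pos] = term
    return " ".join(slots[i] for i in range(len(slots)))


docs = {
    "d1": "oil prices rose as oil exports fell",
    "d2": "exports of grain rose",
}
by_doc, by_term = build_field_indexes(docs)
print(dict(by_term["oil"]))       # {'d1': [0, 4]}
print(dict(by_term["rose"]))      # {'d1': [2], 'd2': [3]}
print(reconstruct(by_doc, "d1"))  # oil prices rose as oil exports fell
```

Keeping both directions costs space, since every position is stored twice, but lets a system go from a term to the documents containing it as well as from a document to its terms.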

   Disk space
   (in bytes) File name
  ----------- ----------------------
      440 128 DB.FBIS
  135 846 760 DB.HanD
   87 346 635 DB.HTML
       85 334 DB.IF
    3 409 987 DB.IFWS
  157 021 391 DB.IFWT
       21 525 DB.Indx
            1 DB.OID
    3 625 815 DB.Str
  166 388 586 DB.VecP
        3 092 DBdocAbstractindexsBv0
       46 642 DBdocAbstractindexvBs0
           67 DBdocNumberindexsBv0
      625 017 DBdocNumberindexvBs0
          219 DBfileNames
    2 385 092 DBtextindexsBv0
      625 017 DBtextindexvBs0
      285 242 DBtitleindexsBv0
      624 167 DBtitleindexvBs0
  -----------
  558 780 717 Total (6.2 times the size of the raw document collection)
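The total and the 6.2 ratio can be checked with a few lines of Python. The sizes are copied from the listing above; the raw collection size of 90,000,000 bytes is an assumption (the text says only "approximately 90 megabytes"), so the ratio is approximate.

```python
# File sizes in bytes, copied from the listing above.
sizes = {
    "DB.FBIS": 440_128,
    "DB.HanD": 135_846_760,
    "DB.HTML": 87_346_635,
    "DB.IF": 85_334,
    "DB.IFWS": 3_409_987,
    "DB.IFWT": 157_021_391,
    "DB.Indx": 21_525,
    "DB.OID": 1,
    "DB.Str": 3_625_815,
    "DB.VecP": 166_388_586,
    "DBdocAbstractindexsBv0": 3_092,
    "DBdocAbstractindexvBs0": 46_642,
    "DBdocNumberindexsBv0": 67,
    "DBdocNumberindexvBs0": 625_017,
    "DBfileNames": 219,
    "DBtextindexsBv0": 2_385_092,
    "DBtextindexvBs0": 625_017,
    "DBtitleindexsBv0": 285_242,
    "DBtitleindexvBs0": 624_167,
}

# Assumption: "approximately 90 megabytes" is read as 90 x 10**6 bytes.
RAW_BYTES = 90_000_000

total = sum(sizes.values())
ratio = total / RAW_BYTES
print(f"total = {total:,} bytes")  # total = 558,780,717 bytes
print(f"ratio = {ratio:.1f}")      # ratio = 6.2

# Share of the total taken by the four largest files.
largest = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:4]
for name, size in largest:
    print(f"{name:8} {size / total:6.1%}")
```

On these figures, the four largest files (DB.VecP, DB.IFWT, DB.HanD and DB.HTML) account for roughly 98% of the total.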

Elapsed time to convert and index: 21,108 seconds (5.9 hours)
Elapsed time to update:            21,549 seconds (6.0 hours)
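The hour figures are the elapsed seconds divided by 3,600. The sketch below reproduces them and also derives an average indexing rate. That rate is a derived figure, not a separate measurement, and it assumes all 25,000 documents were processed during the 21,108 seconds; the helper name `hours` is illustrative only.

```python
def hours(seconds):
    """Convert elapsed seconds to hours, rounded to one decimal place as in the text."""
    return round(seconds / 3600, 1)


INDEX_S = 21_108   # elapsed seconds to convert and index (from the text)
UPDATE_S = 21_549  # elapsed seconds to update (from the text)
N_DOCS = 25_000    # documents in the FBIS collection (from the text)

print(hours(INDEX_S), hours(UPDATE_S))  # 5.9 6.0

# Average indexing throughput, assuming every document was processed in INDEX_S.
docs_per_s = N_DOCS / INDEX_S
ms_per_doc = 1000 * INDEX_S / N_DOCS
print(f"{docs_per_s:.2f} docs/s, {ms_per_doc:.0f} ms/doc")  # 1.18 docs/s, 844 ms/doc
```

That works out to a little under one second per document on the hardware described above.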

We tested retrieval by starting the sample application, choosing the existing FBIS collection, and performing 100 retrievals: 50 whose queries were the 2- or 3-word title sections of the TREC-8 ad hoc topics (401-450), and 50 whose queries were the sentence-length description sections of the same topics. Here, for each retrieval, are the total elapsed time in seconds and the number of documents retrieved:

TREC
Topic
 |  Title      Description
 |  --------   -----------
 |  Time (s)   Time (s)
 |   |  Docs    |   Docs
 |   |   |      |    |
401 262 9073   572 17613
402   2  525   116  7779
403   0    0   144 10082
404  82 7846   208 12370
405  10 1832   221 10522
406   2  325    77  6440
407   4  925    77  6639
408   1  231    41  4816
409  13 2229   109  9496
410  38 4901   220 12472
411   1  122    74  6365
412  58 6238   252 11229
413  44 3769   193 10845
414  12 2209   341 12894
415   6 1057    93  7081
416  14 2358     9  1373
417   5 1096   118  8092
418   4 1022   170 10213
419   2  394   413 15039
420   1  222     7   999
421  31 4156    77  6070
422   3  574     9  1320
423   1  119     9  1076
424   0  115    35  4210
425   9 1894    21  2781
426  36 4111   179 11112
427   7 1370   111  8060
428  24 3264   228 11941
429   1  328     7  1134
430  13 2042    41  4785
431  10 1610   114  7652
432  13 2108    69  7229
433   3  533     4   647
434  29 3442   816 12191
435  18 3005   348 14510
436   3  657    13  2258
437   9 1530    87  7480
438  40 4533   239 12295
439   6 1135   163  9966
440   7 1464   309 13205
441   1  324    11  2429
442  17 2816   157 10396
443  21 3084   276 12290
444   1   49    65  6432
445  10  705   625 17903
446   6 1049    49  5965
447   5  882   242 12892
448   7 1552    81  7431
449   1  123   208 12143
450  40 4528   321 13921
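A short Python sketch can summarize the table. The times are copied from it; because they are whole seconds, the totals and means are approximate.

```python
import statistics

# Elapsed retrieval times in seconds, copied from the table, topics 401-450 in order.
title_s = [
    262, 2, 0, 82, 10, 2, 4, 1, 13, 38,            # 401-410
    1, 58, 44, 12, 6, 14, 5, 4, 2, 1,              # 411-420
    31, 3, 1, 0, 9, 36, 7, 24, 1, 13,              # 421-430
    10, 13, 3, 29, 18, 3, 9, 40, 6, 7,             # 431-440
    1, 17, 21, 1, 10, 6, 5, 7, 1, 40,              # 441-450
]
desc_s = [
    572, 116, 144, 208, 221, 77, 77, 41, 109, 220,  # 401-410
    74, 252, 193, 341, 93, 9, 118, 170, 413, 7,     # 411-420
    77, 9, 9, 35, 21, 179, 111, 228, 7, 41,         # 421-430
    114, 69, 4, 816, 348, 13, 87, 239, 163, 309,    # 431-440
    11, 157, 276, 65, 625, 49, 242, 81, 208, 321,   # 441-450
]

for label, times in (("title", title_s), ("description", desc_s)):
    print(f"{label:12} total {sum(times):5d} s  "
          f"mean {statistics.mean(times):6.2f} s  "
          f"median {statistics.median(times):5.1f} s  max {max(times)} s")
```

On these figures, the 50 description queries took about nine times as long in total as the 50 title queries (8,369 s versus 933 s), and their median time was 115 s against 7 s.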

National Institute of Standards and Technology. Last updated: Tuesday, 01-Aug-2000 06:34:34 MDT

Date created: Monday, 31-Jul-00
For further information contact Paul Over (over@nist.gov) with
copy to Darrin Dimmick (ddimmick@nist.gov)