SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Okapi at TREC
chapter
S. Robertson
S. Walker
M. Hancock-Beaulieu
A. Gull
M. Lau
National Institute of Standards and Technology
Donna K. Harman
Bibliographic record access is also fast because there is no indirection: the postings records directly address the bibliographic records, so again there is only one disk access per record.
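The direct addressing described above can be sketched as follows. This is a minimal illustration under assumed names, not the Okapi file format. Each hypothetical `Posting` stores the byte offset and length of its record in the bibliographic file, so fetching a record needs one seek and one read. No intermediate record-number table is involved. An in-memory `io.BytesIO` stream stands in for the disk file. `build_biblio_file` and `fetch` are illustrative helpers, not Okapi routines.

```python
import io
from dataclasses import dataclass


@dataclass
class Posting:
    # Hypothetical posting layout: the record's location in the bibliographic file.
    offset: int  # byte offset of the record
    length: int  # record length in bytes


def build_biblio_file(records):
    """Concatenate records into one byte stream and return direct-address postings."""
    buf = io.BytesIO()
    postings = []
    for text in records:
        data = text.encode("utf-8")
        postings.append(Posting(buf.tell(), len(data)))
        buf.write(data)
    return buf, postings


def fetch(biblio, posting):
    """One seek plus one read: the posting addresses the record directly."""
    biblio.seek(posting.offset)
    return biblio.read(posting.length).decode("utf-8")


biblio, postings = build_biblio_file(["first record", "second record", "third"])
print(fetch(biblio, postings[1]))  # prints: second record
```

With an indirect design, a posting would instead hold a record number. That number would need a second lookup, and potentially a second disk access, to find the record's location.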
File inversion is relatively slow and CPU-bound because of the multi-pass linguistic processing during index term extraction. As a rough guide, inversion runs at about one minute per megabyte of indexable text on a lightly loaded Sun 4/330.
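The inversion step can be sketched as building a map from each index term to the records containing it. This is a simplified illustration only. The hypothetical `extract_terms` just lowercases and splits on non-alphanumeric characters. It stands in for Okapi's multi-pass linguistic processing, which is what actually makes inversion CPU-bound. The helper `estimated_minutes` applies the one-minute-per-megabyte guide, which holds only for the Sun 4/330 figure quoted above.

```python
import re
from collections import defaultdict


def extract_terms(text):
    # Hypothetical stand-in for Okapi's linguistic processing:
    # lowercase, then split on runs of non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def invert(records):
    """Build term -> sorted list of record numbers containing that term."""
    index = defaultdict(set)
    for recno, text in enumerate(records):
        for term in extract_terms(text):
            index[term].add(recno)
    return {term: sorted(ids) for term, ids in index.items()}


def estimated_minutes(megabytes, minutes_per_mb=1.0):
    """Rough inversion time from the 'about one minute per megabyte' guide."""
    return megabytes * minutes_per_mb


index = invert(["Text retrieval at TREC", "Okapi retrieval system"])
print(index["retrieval"])      # prints: [0, 1]
print(estimated_minutes(250))  # prints: 250.0
```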
Limits
Maximum bibliographic file size: 32 gigabytes (but maximum index size: 4 gigabytes)
Number of records per database: no practical limit
Postings per index term: no practical limit
Maximum amount of data which can be treated as a "record" for retrieval purposes: a system parameter, usually set to 16 kilobytes; values up to 64 kilobytes or more are acceptable
Maximum field length: same as the record size
Maximum number of fields per record: 31
Maximum index term length: 127 characters
Maximum number of terms in a single query: 32 (interactive Okapi only)
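A client could check input against some of the limits listed above before submitting it. The sketch below is a hypothetical illustration. The constant names, `check_query` and `check_fields` are not Okapi identifiers; only the numeric values come from the list.

```python
# Values from the limits list; names are hypothetical, not Okapi identifiers.
MAX_TERM_LENGTH = 127       # characters per index term
MAX_QUERY_TERMS = 32        # terms per query (interactive Okapi only)
MAX_FIELDS_PER_RECORD = 31  # fields per record


def check_query(terms):
    """Return a list of problems with a query relative to the stated limits."""
    problems = []
    if len(terms) > MAX_QUERY_TERMS:
        problems.append(f"too many terms: {len(terms)} > {MAX_QUERY_TERMS}")
    for term in terms:
        if len(term) > MAX_TERM_LENGTH:
            problems.append(f"term too long: {len(term)} chars")
    return problems


def check_fields(fields):
    """Return a list of problems with a record's field count."""
    if len(fields) > MAX_FIELDS_PER_RECORD:
        return [f"too many fields: {len(fields)} > {MAX_FIELDS_PER_RECORD}"]
    return []


print(check_query(["okapi", "trec"]))  # prints: []
```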