<DOC> <DOCNO> SP500215 </DOCNO> <TITLE> NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2) </TITLE> <SUBTITLE> Appendix B: System Features </SUBTITLE> <TYPE> Appendix </TYPE> <PAGE CHAPTER="B" NUMBER="19"> <AUTHOR1> </AUTHOR1> <PUBLISHER> National Institute of Standards and Technology </PUBLISHER> <EDITOR1> D. K. Harman </EDITOR1> <COPYRIGHT MTH="March" DAY="" YEAR="1994" BY="National Institute of Standards and Technology"> </COPYRIGHT> <BODY>
B. CONSTRUCTION OF INDICES, KNOWLEDGE BASES, AND OTHER DATA STRUCTURES -- STATISTICS ON DATA STRUCTURES

NOTES:

1] Occurrence statistics were kept for the 1000 most frequently occurring terms (counted over the relevant documents in the learning set) for each routing query.

2] For the adhoc runs, the 'query regression' method was used. The query regression coefficients were computed from the query.nnn file and the doc.lsp file (which was created by polynomial regression). Afterwards, the q3 query file was reweighted (query.nnn -> query.lsp).

3] Because we used the UMASS INQUERY system and its indexing, all of our answers to the questions in this section are identical to those for the UMASS system.

4] Document vector files and term dictionary produced by SMART: each individual collection was indexed separately, so sizes and times are averages per collection, with the range of values given. The whole-collection statistics are based on the sum of the individual collection values, so they are perhaps less accurate. The whole-collection size of the term dictionary cannot be estimated reliably with this approach. Term positions are not stored within the document vector files.

                                         Average   Range    Collection
   Document Vector Files (MB)            120       31-124   1100
   Term Dictionary (MB)                  16        15-17    Unknown
   Time to create both files (hours)     10        6-14     120

5] Standard process as implemented by SMART, using the parameters given in Part I, Section A.
</BODY> </PAGE> </DOC>