TREC-8 Interactive Track Guidelines

Goal 
---- 

The high-level goal of the Interactive Track in TREC-8 remains the
investigation of searching as an interactive task by examining the
process as well as the outcome. To this end an experimental framework
has been designed with the following common features:

        - an interactive search task
        - 6 topics
        - a document collection to be searched
        - a required set of searcher (demographics) questionnaires
        - 6 classes of data to be collected at each site and submitted to NIST
        - 3 summary measures to be calculated by NIST for use by participants

The framework will allow groups to estimate the effect of their
experimental manipulation free and clear of the main (additive)
effects of participant and topic and it will reduce the effect of
interactions.

In TREC-8 the emphasis will be on each group's exploration of
different approaches to supporting the common searcher task and
understanding the reasons for the results they get. No formal
coordination of hypotheses or comparison of systems across sites is
planned for TREC-8, but groups are encouraged to seek out and exploit
synergies. As a first step, groups are strongly encouraged to make the
focus of their planned investigations known to other track
participants as soon as possible, preferably via the track listserv
at [email protected].  Contact track chair Bill Hersh to join.


General Description 
------------------- 

A minimum of 12 participating searchers, one experimental system, and
one control system per site will be required.  The control system can
be any IR system appropriate to the goals of the local experiment,
e.g., a variant of the local experimental system or some other baseline
system such as SMART or ZPRISE.  (See "2. Augmentation" in the
detailed experimental design for information about how to use more
than 12 searchers or more than one experimental system within this
design.)

Each searcher will perform six searches on the Financial Times of
London 1991-1994 collection (part of the TREC-8 adhoc collection),
using six topics specially chosen from the TREC-8 adhoc topics and
modified for use in the interactive track.  Each searcher will perform
half of the searches on the site's experimental system and the other
half on its control system.  The experimental design (see below)
determines the order in which each searcher searches the topics and
uses the systems (experimental and control).

In resolving experimental design questions not covered here (e.g.,
scheduling of tutorials and searches, etc.), participating sites
should try to minimize the differences between the conditions under
which a given searcher uses the control and those under which s/he
uses the experimental system. For example, running all the control
searches for a participant on one day and the searches on the
experimental system on another invites unequal, confounding
conditions.


Topics
------

Each of the topics will describe a need for information of a
particular type. Contained within the documents of the collection
to be searched will be multiple distinct examples or instances of the
needed information. The interactive topics will be modified versions
of specially selected adhoc topics. Here is an example TREC-6 adhoc
topic:

        Number: 303 

        Title: Hubble Telescope Achievements 

        Description: 
        Identify positive accomplishments of the Hubble telescope 
        since it was launched in 1991.

        Narrative: 
        Documents are relevant that show the Hubble telescope has 
        produced new data, better quality data than previously 
        available, data that has increased human knowledge of the 
        universe, or data that has led to disproving previously 
        existing theories or hypotheses.  Documents limited to the 
        shortcomings of the telescope would be irrelevant.  Details 
        of repairs or modifications to the telescope without 
        reference to positive achievements would not be relevant.


Here is an example of the same topic as it would be modified for use
in the TREC-8 interactive track. Note the addition of the "Please
save" paragraph and the removal of the usual Narrative section with
its specific criteria for relevance or non-relevance:

        Number: 303i 

        Title: Hubble Telescope Achievements 

        Description: 
        Identify positive accomplishments of the Hubble telescope 
        since it was launched in 1991.

        Instances:
        In the time allotted, please find as many DIFFERENT positive 
        accomplishments of the sort described above as you can.
        Please save at least one document for EACH such DIFFERENT 
        accomplishment.
        If one document discusses several such accomplishments, then 
        you need not save other documents that repeat those, since your 
        goal is to identify as many DIFFERENT accomplishments of the sort 
        described above as possible.


Here are the topics for TREC-8 in NUMERICAL order. See the section 
"Experimental design for a site" below for their assignment to blocks 
and the order of presentation within the experimental design.

Number: 
  408i 

Title:
  tropical storms 

Description: 
  What tropical storms (hurricanes and typhoons) have
  caused property damage and/or loss of life?

Instances:
  In the time allotted, please find as many DIFFERENT storms of 
  the sort described above as you can. Please save at least one 
  document for EACH such DIFFERENT storm.
  If one document discusses several such storms, then you need
  not save other documents that repeat those, since your goal 
  is to identify as many DIFFERENT storms of the sort described 
  above as possible.


Number: 
  414i 

Title: 
  Cuba, sugar, imports 

Description: 
  What countries import Cuban sugar?

Instances:
  In the time allotted, please find as many DIFFERENT countries of 
  the sort described above as you can. Please save at least one 
  document for EACH such DIFFERENT country.
  If one document discusses several such countries, then you need
  not save other documents that repeat those, since your goal 
  is to identify as many DIFFERENT countries of the sort described 
  above as possible.


Number: 
  428i 

Title: 
  declining birth rates 

Description:
  What countries other than the US and China have or have had
  a declining birth rate? 

Instances:
  In the time allotted, please find as many DIFFERENT countries of 
  the sort described above as you can. Please save at least one 
  document for EACH such DIFFERENT country.
  If one document discusses several such countries, then you need
  not save other documents that repeat those, since your goal 
  is to identify as many DIFFERENT countries of the sort described 
  above as possible.
 

Number: 
  431i 

Title: 
  robotic technology 

Description: 
  What are the latest developments in robotic technology 
  and in its use?

Instances:
  In the time allotted, please find as many DIFFERENT developments of 
  the sort described above as you can. Please save at least one  
  document for EACH such DIFFERENT development.
  If one document discusses several such developments, then you need
  not save other documents that repeat those, since your goal 
  is to identify as many DIFFERENT developments of the sort described 
  above as possible.


Number: 
  438i 

Title: 
  tourism, increase 

Description:
  What countries have experienced an increase in tourism? 

Instances:
  In the time allotted, please find as many DIFFERENT countries of 
  the sort described above as you can. Please save at least one 
  document for EACH such DIFFERENT country.
  If one document discusses several such countries, then you need
  not save other documents that repeat those, since your goal 
  is to identify as many DIFFERENT countries of the sort described 
  above as possible.


Number:
  446i
 
Title: 
  tourists, violence 

Description: 
  In what countries have tourists been subject to
  acts of violence causing bodily harm or death?

Instances:
  In the time allotted, please find as many DIFFERENT countries of 
  the sort described above as you can. Please save at least one 
  document for EACH such DIFFERENT country.
  If one document discusses several such countries, then you need
  not save other documents that repeat those, since your goal 
  is to identify as many DIFFERENT countries of the sort described 
  above as possible.


Searcher task
-------------

The task of the interactive searcher is to save, within a 20-minute
time limit, documents that together contain as many different
instances as possible of the type of information the topic expresses
a need for.

Searchers will be encouraged to avoid saving documents which
contribute no instances beyond those in documents already saved, but
there will be no scoring penalty for saving such documents and
searchers will be told that.

Instructions to be given to searchers
-------------------------------------

The following introductory instructions are to be given once to each
searcher before the first search:

        "Imagine that you have just returned from a visit to your doctor 
        during which it was discovered that you are suffering from high
        blood pressure. The doctor suggests that you take a new experimental
        drug, but you wonder what alternative treatments are currently 
        available.  You decide to investigate the literature on your own
        to satisfy your need for information about what different 
        alternatives are available to you for high blood pressure treatment.
        You really need only one document for each of the different 
        treatments for high blood pressure. 

        You find and save a single document that lists four treatment drugs.
        Then you find and save another two documents that each discuss a
        separate alternative treatment: one that discusses the use of
        calcium and one that talks about regular exercise.  You've run out 
        of time and stop your search. In all, you have identified six 
        different instances of alternative treatments in three documents. 

        ---

        In this experiment, you will face a similar task. You will be 
        presented with several descriptions of needed information on a 
        number of topics. In each case there can be multiple examples or 
        instances of the type of information that's needed.

        We would like you to identify as many different instances as you
        can of the needed information for each topic that will be presented 
        to you -  as many as you can in the 20 minutes you will be given 
        to search.  Please save one document for EACH DIFFERENT instance 
        of the needed information that you identify. If you save one 
        document that contains several instances, try not to save additional
        documents that contain ONLY those instances. However, you will not 
        be penalized if you save documents unnecessarily.  

        As you identify an instance of the needed information, please keep 
        track of which instances you have found: write down a word or short 
        phrase to identify the instance, or--if the system provides a 
        facility to keep track of instances--use it.
        
        Carefully read each topic to understand the type of information 
        needed. This will vary from topic to topic. On one topic you may be 
        looking for instances of a certain kind of event. On another you may 
        be searching for examples of certain sorts of people, places, or 
        things.

        Do you have any questions about 
        - what we mean by instances of needed information 
        - the way in which you are to save nonredundant documents for each
          instance?"

Searcher questionnaires (minimum)
---------------------------------

Provided by Rutgers (see track web site)


Data to be collected and submitted to NIST (emailed to [email protected])
------------------------------------------

Several sorts of result data will be collected for evaluation/analysis (for
all searches unless otherwise specified):


   ===>  Due at NIST by 30 August 1999:

        1. sparse format data   


   ===>  Due at NIST when the site reports for the conference are due:

        2. rich format data

        3. a full narrative description of one interactive session for
           whichever topic is designated as T1

        4. any further guidance or refinement of the task specification
           given to the searchers

        5. data from the common searcher questionnaires

Sparse format data for each search will comprise the list of documents
saved and the elapsed clock time of the search. Each document the
searcher selects for the final output list must be identified by its
TREC document identifier (DOCNO). The elapsed (clock) time in seconds
taken for the search, from the moment the searcher first sees the
topic until s/he declares the search to be finished, should be
recorded.  It is assumed that the
interactive search takes place in one uninterrupted session.  If a
session is unavoidably interrupted, it is recommended that it be
abandoned and the topic given to another searcher.  Sparse format data
will be the basis for the summary evaluation at NIST, which will
produce a triple for each search: instance precision, instance
recall, and elapsed clock time.

Rich format data for each search will record:

- the word or phrase each searcher records to describe each
  instance s/he identifies (no reference to the containing document(s))

- significant events in the course of the interaction and their 
  timing.  

          Rich format data are intended for analytical evaluation by the 
          experimenters.
 
          All significant events and their timing in the course of the 
          interaction should be recorded.  The events listed below are those 
          that seem to be fairly generally applicable to different systems 
          and interactive environments; however, the list may need extending 
          or modifying for specific systems and so should be taken as a 
          suggestion rather than a requirement:

          o Intermediate search formulations:  if appropriate to the 
            system, these should be recorded.

          o Documents viewed:  "viewing" is taken to mean the searcher 
            seeing a title or some other brief information about a 
            document; these events should be recorded.

          o Documents seen:  "seeing" is taken to mean the searcher 
            seeing the text of a document, or a substantial section of 
            text; these events should be recorded. 

          o Terms entered by the searcher:  if appropriate to the 
            system, these should be recorded.

          o Terms seen (offered by the system):  if appropriate to the 
            system, these should be recorded.

          o Selection/rejection:  documents or terms that the user
            selects or rejects for any later stage of the search (in
            addition to the final selection of documents).
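The event list above is a suggestion, not a required schema, so the following is only a minimal Python sketch of one way a site might log rich-format data during a search. The class names, event labels, and tab-separated output are hypothetical choices, not a NIST format; offsets are measured from the moment the searcher first sees the topic, the same starting point used for elapsed time.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class EventType(Enum):
    # Hypothetical labels for the suggested event classes above.
    QUERY = "query"                 # intermediate search formulation
    DOC_VIEWED = "doc_viewed"       # title or other brief information shown
    DOC_SEEN = "doc_seen"           # full text or a substantial section shown
    TERM_ENTERED = "term_entered"   # term typed by the searcher
    TERM_SEEN = "term_seen"         # term offered by the system
    SELECTED = "selected"           # document/term kept for a later stage
    REJECTED = "rejected"           # document/term explicitly discarded


@dataclass
class Event:
    offset_s: float   # seconds since the searcher first saw the topic
    kind: EventType
    detail: str       # e.g. query text, DOCNO, or term


@dataclass
class SearchLog:
    search_id: str
    start: float = field(default_factory=time.monotonic)
    events: List[Event] = field(default_factory=list)
    instances: List[str] = field(default_factory=list)  # searcher's own labels

    def record(self, kind: EventType, detail: str,
               now: Optional[float] = None) -> None:
        """Append an event; `now` may be passed explicitly (e.g. for replay)."""
        now = time.monotonic() if now is None else now
        self.events.append(Event(now - self.start, kind, detail))

    def lines(self) -> List[str]:
        """One tab-separated line per event, in the order recorded."""
        return [f"{self.search_id}\t{e.offset_s:.1f}\t{e.kind.value}\t{e.detail}"
                for e in self.events]
```

The `instances` list holds the searcher's word or phrase for each instance, deliberately without any link to the containing documents.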

Format of sparse data to be submitted to NIST
---------------------------------------------

TWO files from each site
        
  A. Search file

        Here a "search" is the interaction of a searcher given a topic
        and asked to carry out the interactive search task using a given 
        system against the collection - lasting at most 20 minutes.

        One line for EACH SEARCH, each line containing the 
        following blank-delimited items from left to right:

                1. Unique site ID

                2. Search ID  - site's choice (links search & document files)

                3. Searcher ID - site's choice

                4. System ID - site's choice

                5. TREC topic number
                        
                6. Elapsed time - number of secs., fractions truncated

                   Clock time from the moment the searcher sees the 
                   topic until the moment the searcher indicates the 
                   search is complete or time is up.

  B. Documents file

        One line for each document in a given search result,
        each line containing the following blank-delimited
        items from left to right:

                1. Chronological sequence number ("1", "2", ...) within a
                   search. If a document was saved more than once, use the
                   number of its last save.
        
                2. Search ID (from search file)

                3. TREC document identifier (DOCNO)     


        NOTE: Reported data items listed within each line must NOT 
        contain whitespace.     
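A minimal Python sketch of producing both sparse-format files, under stated assumptions: the site records, as `datetime` values, the moment the topic was first shown and the moment the search ended; and the chronological number counts individual save actions, so a re-saved document leaves a gap in the numbering (the guidelines do not say whether gaps are acceptable). All function names, IDs, and DOCNOs below are hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def check_item(item: str) -> str:
    """Reported items must be non-empty and must NOT contain whitespace."""
    if not item or any(ch.isspace() for ch in item):
        raise ValueError(f"invalid item {item!r}: empty or contains whitespace")
    return item


def search_file_line(site_id: str, search_id: str, searcher_id: str,
                     system_id: str, topic: int,
                     saw_topic: datetime, finished: datetime) -> str:
    """One search-file line: site, search, searcher, system, topic, elapsed secs."""
    if finished < saw_topic:
        raise ValueError("search ends before the topic was shown")
    # Clock time from first sight of the topic to the end; int() truncates.
    elapsed = int((finished - saw_topic).total_seconds())
    items = [site_id, search_id, searcher_id, system_id, str(topic), str(elapsed)]
    return " ".join(check_item(i) for i in items)


def documents_file_lines(search_id: str, saves: List[str]) -> List[str]:
    """Documents-file lines for one search.

    `saves` lists DOCNOs in the order they were saved and may repeat a DOCNO.
    Each save gets the next chronological number; a document saved more than
    once keeps the number of its LAST save.
    """
    last_seq: Dict[str, int] = {}
    for seq, docno in enumerate(saves, start=1):
        last_seq[docno] = seq  # a later save overwrites the earlier number
    ordered = sorted(last_seq.items(), key=lambda kv: kv[1])
    return [" ".join(check_item(i) for i in (str(seq), search_id, docno))
            for docno, seq in ordered]
```

For example, a search of topic 408 shown at 10:00:00 and finished at 10:19:42.9 yields the search-file line `siteA s01 u03 sysX 408 1182`, and saves of FT911-1, FT922-2, FT911-1 yield the documents-file lines `2 s01 FT922-2` and `3 s01 FT911-1`.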


Format of other data to be submitted to NIST
--------------------------------------------

Data other than that in sparse-format should be submitted as ASCII text
files.

The FA-1 score plus the questionnaire data for each searcher should be 
submitted in a separate file with format close to the following example
but with the real responses to the right of the colons. The Tutorial
Worksheet and Experimenter Note need not be submitted.


        S i t e:

        S e a r c h e r  I D:

        FA-1 score:  ?

        P r e - s e a r c h :                   (1 per searcher)

        Searcher:       id
        Condition:      ?
        Degrees:        degree major date
        Degrees:        degree major date
        Degrees:        degree major date
        Degrees:        degree major date
        Degrees:        degree major date
        Occupation:     ...
        Gender:         M | F
        Age:            nn
        Previous TREC:  Y | N
        Online searching: nn
        Q1:             1-5
        Q2:             1-5
        Q3:             1-5
        Q4:             1-5
        Q5:             1-5
        Q6:             1-5
        Q7:             1-5
        Q8:             1-5


        S e a r c h :                           (6 per searcher)

        Searcher:       id
        Condition:      ?
        Topic #:        nnn
        Q1:             1-5
        Q2:             1-5
        Q3:             1-5
        Q4:             1-5
        Q5:             1-5
        Q6:             1-5


        P o s t - s y s t e m :                 (2 per searcher)

        Searcher:       id
        Condition:      ?
        Q1:             1-5
        Q2:             1-5
        Q3:             1-5
        Comments:       ...

        S e a r c h e r   w o r k s h e e t :   (6 per searcher)

        Searcher:       id
        Condition:      ?
        Topic #:        nnn
        1.              ...
        2.              ...
        3.              ...
        .
        .
        .
        

        E x i t :                               (1 per searcher)

        Searcher:       id
        Q1:             1-5
        Q2:             1-5
        Q3:             1-5
        Q4:             one-system's-name       rank
                        other-system's-name     rank
        Q5:             one-system's-name       rank
                        other-system's-name     rank
        Q6:             one-system's-name       rank
                        other-system's-name     rank
        Q7:             ...
        Q8:             ...
        Q9:             ...
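Because the submission only needs a format close to the example, the following minimal Python sketch (the function name is hypothetical) renders a single section with a letter-spaced heading and each label padded so that answers start in column 17. It does not handle the two-line answers of Exit questions Q4-Q6.

```python
from typing import List, Tuple


def questionnaire_section(title: str, fields: List[Tuple[str, str]],
                          count_note: str = "") -> str:
    """Render one section (e.g. "Search") in the layout of the example above."""
    heading = " ".join(title) + " :"          # "Search" -> "S e a r c h :"
    if count_note:
        heading = f"{heading:<40}({count_note})"
    lines = [heading, ""]
    for label, value in fields:
        lines.append(f"{label + ':':<16}{value}")  # value starts in column 17
    return "\n".join(lines)
```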

Evaluation of data submitted to NIST
------------------------------------

Evaluation by NIST of the sparse format data will proceed as follows.
For each topic, a pool will be formed containing the unique documents
saved by at least one searcher for that topic regardless of site.
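Pool formation amounts to a per-topic set union. A minimal Python sketch, assuming each search is keyed by its (site ID, search ID) pair, since search IDs are chosen by each site and may collide across sites; the function name and sample DOCNOs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

SearchKey = Tuple[str, str]  # (site ID, search ID); search IDs are site-chosen


def build_pools(saved: Dict[SearchKey, Iterable[str]],
                topic_of: Dict[SearchKey, int]) -> Dict[int, Set[str]]:
    """Pool per topic: every DOCNO saved by at least one search, any site."""
    pools: Dict[int, Set[str]] = defaultdict(set)
    for key, docnos in saved.items():
        pools[topic_of[key]].update(docnos)  # a set keeps each DOCNO once
    return dict(pools)
```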

For each topic, the NIST assessor, normally the topic author, will be asked 
to:
        - read the topic carefully 
        - read each of the documents from the pool for that topic and 
          gradually:
           - create a list of instances of the topic's needed information
             type found somewhere in the documents
           - select and record a short phrase describing each instance found
           - determine which documents contain which instances
           - bracket each instance in the text of the document in which it 
             was found

For each search (by a given participant for a given topic at a given site), 
NIST will use the submitted list of selected documents and the assessor's
instance-document mapping for the topic to calculate:

        - the fraction of total instances (as determined by the assessor) for 
          the topic that are covered by the submitted documents (i.e., 
          instance recall)

        - the fraction of the submitted documents which contain one or more
          instances (i.e., instance precision)  

The third measure, elapsed clock time, will be taken directly from the 
submitted results for each search.
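A minimal Python sketch of the per-search triple, assuming the assessor's instance-document mapping is available as a dictionary from DOCNO to the set of instance labels found in that document. The function name is hypothetical, and returning 0.0 when no documents were saved (or the topic has no instances) is an assumed convention, not one stated in these guidelines.

```python
from typing import Dict, Iterable, Set, Tuple


def evaluate_search(saved: Iterable[str],
                    instances_in: Dict[str, Set[str]],
                    topic_instances: Set[str],
                    elapsed_s: int) -> Tuple[float, float, int]:
    """Return (instance precision, instance recall, elapsed seconds)."""
    docs = set(saved)
    covered: Set[str] = set()
    for d in docs:
        covered |= instances_in.get(d, set())  # documents with no instances add nothing
    # Fraction of saved documents containing at least one instance.
    hits = sum(1 for d in docs if instances_in.get(d))
    precision = hits / len(docs) if docs else 0.0
    # Fraction of the assessor's instances covered by the saved documents.
    recall = (len(covered & topic_instances) / len(topic_instances)
              if topic_instances else 0.0)
    return precision, recall, elapsed_s
```

With the mapping D1: {a, b}, D2: {b}, D3: {} and four instances a-d for the topic, saving D1, D2 and D3 gives instance precision 2/3 and instance recall 1/2; the redundant D2 does not lower either score.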


Experimental design for a site
------------------------------

  1. Minimal experimental matrix as run

    The design for this year's track departs from last year's.  One limitation
    of last year's balanced block design was the potential statistical
    confounding of topic and its order.  A design that controls for query order
    leads to a simpler statistical analysis of results.

    As such, this year's approach will ensure that each topic is searched in
    each position (first through sixth) with each system.  This requires a
    minimum of 12 searchers per site.  In addition, the query orders for
    each site will need to be generated in a pseudorandom fashion.  To make
    this process consistent, the query orders will be generated by the OHSU
    group.  Below is an example of system-query order for a site.  (NOTE:
    Please do not use this example, as new sets must be generated for each
    12-searcher block.)

                 Subject       Block #1           Block #2

                     1      System 1: 6-1-2    System 2: 3-4-5
                     2      System 2: 1-2-3    System 1: 4-5-6
                     3      System 2: 2-3-4    System 1: 5-6-1
                     4      System 2: 3-4-5    System 1: 6-1-2
                     5      System 1: 4-5-6    System 2: 1-2-3
                     6      System 1: 5-6-1    System 2: 2-3-4
                     7      System 2: 6-1-2    System 1: 3-4-5
                     8      System 1: 1-2-3    System 2: 4-5-6
                     9      System 1: 2-3-4    System 2: 5-6-1
                    10      System 1: 3-4-5    System 2: 6-1-2
                    11      System 2: 4-5-6    System 1: 1-2-3
                    12      System 2: 5-6-1    System 1: 2-3-4

    Query blocks should be requested from Bill Hersh as early as possible.
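The actual orders will come from the OHSU group. The following Python sketch only illustrates, assuming the example table's pattern is representative, how a block of that shape can be built (each cyclic rotation of the six topics is used twice, once with each system first, and the rows are shuffled pseudorandomly) and how a site might check that a block it receives has the stated property. All names are hypothetical, and topics are the example's labels 1-6, not TREC topic numbers.

```python
import random
from collections import Counter
from typing import List, Optional, Tuple

N_TOPICS = 6
Row = List[Tuple[int, int]]  # one searcher: (system, topic) for positions 1..6


def rotation(start: int) -> List[int]:
    """Topics start, start+1, ... wrapping from 6 back to 1."""
    return [(start - 1 + k) % N_TOPICS + 1 for k in range(N_TOPICS)]


def make_block(seed: Optional[int] = None) -> List[Row]:
    """A 12-searcher block built from the rotation pattern of the example."""
    rows: List[Row] = []
    for start in range(1, N_TOPICS + 1):
        order = rotation(start)
        for first in (1, 2):          # each rotation once with each system first
            second = 3 - first
            rows.append([(first, t) for t in order[:3]] +
                        [(second, t) for t in order[3:]])
    random.Random(seed).shuffle(rows)  # pseudorandom assignment to searchers
    return rows


def is_balanced(rows: List[Row]) -> bool:
    """True if each searcher does all six topics, three per system, and every
    topic occurs exactly once in every (system, position) cell."""
    cells: Counter = Counter()
    for row in rows:
        if sorted(t for _, t in row) != list(range(1, N_TOPICS + 1)):
            return False
        if Counter(s for s, _ in row) != Counter({1: 3, 2: 3}):
            return False
        for pos, (system, topic) in enumerate(row, start=1):
            cells[(system, pos, topic)] += 1
    return (len(cells) == 2 * N_TOPICS * N_TOPICS
            and all(c == 1 for c in cells.values()))
```

Encoding the twelve rows of the example table as (system, topic) pairs and passing them to `is_balanced` returns `True`; dropping any six of the rows makes it return `False`, which is why at least 12 searchers are needed.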

  2. Augmentation

     The design for a given site can be augmented in two ways:

       1. Participants can be added in groups of 6 using the design
          above.  Additional blocks should be requested from Bill
          Hersh.

       2. Systems can be added by adding additional groups of 6 users
          with each new system.  Additional blocks should be requested
          from Bill Hersh.

     Individual sites cannot add or remove topics.

     All augmentations other than the two listed above, however interesting, 
     are outside the scope of this design. If sites plan such adjunct 
     experiments, they are encouraged to design them for maximal synergy 
     with the track design.

  3. Analysis

     Analysis is up to each group, but all are strongly encouraged to take
     advantage of the experimental design and undertake:

        1. exploratory data analysis

           to examine the patterns of correlation, interaction, etc.
           involving the major factors. Some example plots for the TREC-6
           interactive data (recall or precision by searcher or topic)
           are available on the Interactive Track web site under
           "Interactive Track History".
           
        2. analysis of variance (ANOVA), where appropriate,

           to estimate the separate contributions of searcher, topic and 
           system as a first step in understanding why the results of one 
           search are different from those of another
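
As a hedged illustration of the analyses above, the following pure-Python sketch computes marginal means (for exploratory summaries such as recall by searcher) and fits the additive model response ~ searcher + topic + system. It relies on an assumption: in a complete 12-searcher block every searcher does every topic once, each topic is searched six times with each system, and each searcher uses each system three times, so the three factors are pairwise orthogonal and simple marginal sums of squares are exact. For incomplete data (e.g., an abandoned session reassigned to another searcher) a general linear-model routine, such as `anova_lm` in statsmodels, would be the safer choice. Record field names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_by(records: List[dict], factor: str, response: str) -> Dict:
    """Responses grouped by the levels of one factor."""
    groups: Dict = defaultdict(list)
    for r in records:
        groups[r[factor]].append(r[response])
    return groups


def marginal_means(records: List[dict], factor: str,
                   response: str = "recall") -> Dict:
    """Mean response per level of one factor (e.g. recall by searcher)."""
    return {lvl: sum(v) / len(v)
            for lvl, v in group_by(records, factor, response).items()}


def additive_anova(records: List[dict], response: str = "recall",
                   factors=("searcher", "topic", "system")) -> Dict[str, dict]:
    """Main-effects ANOVA table; exact only for pairwise-orthogonal factors."""
    ys = [r[response] for r in records]
    n = len(ys)
    grand = sum(ys) / n
    ss_total = sum((y - grand) ** 2 for y in ys)
    table: Dict[str, dict] = {}
    for f in factors:
        groups = group_by(records, f, response)
        ss = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in groups.values())
        table[f] = {"SS": ss, "df": len(groups) - 1}
    ss_res = ss_total - sum(row["SS"] for row in table.values())
    df_res = n - 1 - sum(row["df"] for row in table.values())
    ms_res = ss_res / df_res
    for row in table.values():
        row["MS"] = row["SS"] / row["df"]
        # F is undefined when the residual mean square is not positive.
        row["F"] = row["MS"] / ms_res if ms_res > 0 else float("nan")
    table["residual"] = {"SS": ss_res, "df": df_res, "MS": ms_res}
    return table
```

For one complete block (72 searches) the degrees of freedom come out as 11 for searcher, 5 for topic, 1 for system, and 54 for the residual.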
Last updated:

Date created: Monday, 31-Jul-00
For information about this webpage contact Paul Over