Notes on plans for DUC 2005 and beyond
------------------------------------------------------------------------------
A. Goals:

1) find some real need for summarization and motivate/define the
   evaluation framework from the point of view of one or more realistic
   task scenarios

2) move away from generic summaries of newspaper/wire to summaries of
   additional genres with respect to broad subject areas, but overlap
   in some ways with previous source types and tasks

3) continue working on evaluation
   a) continue to support development, use, and testing of tools for
      automatic evaluation (e.g., ROUGE)
   b) continue to explore better ways of coverage evaluation, such as
      the Columbia pyramid suggestions
   c) work hard to build up extrinsic evaluation

4) allow partial participation (by component, source type, ...)

5) be open to evolution of goals in the nature of the task (fusion,
   extraction, Q&A), input (not just text), and output (lists,
   outlines, timelines, etc.)
------------------------------------------------------------------------------
B. Characterization of summaries (i.e., situation reports) in terms of
Summarising Factors (Karen Sparck Jones, 20 May 2004 - see "Background
items" on the DUC Roadmap 2005-2007 webpage):

1) PURPOSE FACTORS

   a) the SITUATION, i.e.,
      the context within which the summary is to be used

      For DUC 2005: Situation report as of a given date, for use within
      a crisis management organization

   b) the AUDIENCE for a summary

      Can be characterised as targeted at an individual manager in the
      crisis management organization, as opposed to a news release, for
      example

   c) the USE, or function, for which the summary is intended

      To provide background, current status (problems and responses),
      and, as far as possible, likely development of the situation as
      related to the organization's role in the situation

2) INPUT FACTORS

   a) the SUBJECT TYPE of the source

      Definition: a variety of subject types possible per
      topic-situation, but drawn from a limited set known to systems at
      development time

      For DUC 2005: work within a natural disasters framework; use
      1998-2000 as the timeframe (matches the AQUAINT news data)

   b) the FORM of the source

      Definition: a variety of sources possible per topic-situation,
      but drawn from a limited set known to systems at development time

      For DUC 2005: the main source types for 2005 will be government
      (UN) documents and newspaper/wire. Additional possible sources
      include scientific documents, Usenet news threads, etc. (but see
      general issues).

      There are many tables, images of maps, graphs, etc. that could
      realistically be incorporated in a situation report, but their
      inclusion in DUC is deferred until systems are ready to work on
      handling them.

      Other groups would be encouraged to contribute and manage
      additional genres and media (non-English, speech, etc.). In a
      similar vein, additional tasks could be proposed that would fit
      into this scenario, such as "headlines" for easy click-down.
      "Manage" is the operative word here, in that groups would be
      responsible for providing the additional documents to all, and
      for evaluating the results when using non-English text or
      additional tasks. Note that it needs to be made clear how these
      additional documents or tasks fit into the situation report
      scenario.
   c) the UNITS taken as source

      Mostly multiple units per topic-situation, but there could be
      single units of a particular form

3) OUTPUT FACTORS

   a) the MATERIAL of the summary, i.e., the information it gives, in
      relation to that in the source

      For DUC 2005: use a single outline for all situations. It is
      based mainly on the WHO situation report outline, but modified
      based on the structure of other situation reports found on the
      Web. Each section will have a brief paragraph describing the
      kinds of information that belong there. (Note that this is well
      above the level of MUC templates or current factoid QA.)

      Proposed outline:
      1. What happened
      2. Geographical area affected, including information on the
         affected infrastructure (bridges, roads, land under water,
         etc.)
      3. Populations affected, including information on morbidity,
         mortality, homelessness, etc.
      4. Main needs
      5. Local/national response
      6. Regional/international response
      7. Social/political/geographical constraints
      8. Expected developments

   b) the FORMAT of the summary, i.e., the way the summary information
      is expressed

      For DUC 2005: blocks of running text summarizing information
      relevant to a given heading. These blocks of text could be
      composed of extracted sentences, extracted (long) phrases,
      constructed phrases, or generated text.

   c) the STYLE of the summary, i.e., the relationship to the content
      of the source

      Informative (maybe later: aggregative, critical)

   d) the EXPRESSION of the summary, i.e., all the linguistic features
      of the summary this subsumes

      Structured, somewhat technical, English narrative

   e) the BREVITY of the summary, i.e., the relative or absolute scale
      (length) of the summary

      Each block of text is limited to 665 bytes (up for discussion by
      the group)
------------------------------------------------------------------------------
C. EVALUATION

For DUC 2005: the evaluation will try to answer these questions using
the following means:

a) How much of the requested info does the submission contain?
   Assessors will create multiple reference reports for each section in
   the proposed outline. From these, some sort of list (hereafter
   called the infolist, and likely using the Columbia pyramid scheme)
   of main information items (reflecting the diversity of the reference
   reports if possible) will be created - again, one list for each
   section.

   1) Submissions will be evaluated using precision (how??) and recall
      against the infolist.
   2) Use of SEE, in which the model is the infolist and the peer is
      the block of text.
   3) Submissions will be evaluated against the infolist and/or the
      reference summaries using some automatic method (e.g., ROUGE).

b) How usable is the submission?

   Submissions will be evaluated in terms of the time it takes an
   assessor to find each item on the infolist or to determine it is not
   included in the report. This is an extrinsic evaluation, or a pseudo
   one. If no information is found for a given category of the report,
   the system should respond "nothing found".

c) How linguistically well-formed is the submission?

   Evaluate the submission in terms of quality questions as in DUC 2004.
------------------------------------------------------------------------------
D. PROPOSED TIMELINE (assuming a meeting at HLT in October 2005)

Starting NOW   An organized set of pilot projects looking into this
               infolist idea; note that there has to be some kind of
               convergence for evaluation

Dec 2004       NIST provides scenario template, test date range
               (1998-2000), test event types (natural disasters),
               several examples
               Groups look for additional non-newspaper/wire data
               (non-English, etc.)
               Groups train systems for the new task

June 1, 2005   NIST provides list of test events

By June 15     Groups select additional documents from
               non-newspaper/wire data and send them to NIST for
               distribution

June 15        NIST distributes all test documents

July 1         Results due at NIST two weeks later

July 30        Results from evaluation out to participants

Oct ??
               DUC 2005 meeting (at HLT)
------------------------------------------------------------------------------
E. General issues to be resolved:

1. Some scientific documents are available, but it is not clear how the
   detail they cover fits realistically into the outline. There are
   Usenet news threads, but again it is not clear how that sort of
   content would actually be used in writing a situation report.

2. infolist: this is the holy grail we are all seeking - the list that
   provides the "nuggets" of important information that need to be
   contained in a summary. This is related to the QA nuggets, to the
   Columbia pyramid scheme, etc. The reason for continuing to pursue
   this is clear; how to do it is less clear.

   One way to tackle this would be a pilot based on the Columbia
   pyramid scheme. First, Columbia would develop written guidelines,
   possibly using a sample of DUC 2004 topics. Then other groups would
   try to use those guidelines on additional topics. Finally, NIST
   would use the "final" version of the guidelines with our assessors.
   It is important that multiple groups work on this project so that
   this issue can be discussed with better understanding within the
   whole community.

3. How does component analysis fit into this?
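As a concrete illustration of the infolist/pyramid idea from sections C
and E.2, the sketch below scores a peer summary block against a pyramid
of nuggets built from multiple reference reports: each nugget's weight
is the number of reference reports that contain it, and the score is
the weight the peer captured divided by the best weight achievable with
the same number of nuggets. This is only a minimal, hypothetical
sketch - the function name, the nugget labels, and the exact-match
treatment of nuggets are all illustrative assumptions; in the real
pyramid method, matching content units to a peer requires human
annotation, and the precision/recall questions in C.a.1 remain open.

```python
from collections import Counter


def pyramid_score(peer_nuggets, reference_reports):
    """Pyramid-style score for one outline section (illustrative only).

    peer_nuggets: set of nugget labels judged present in the peer block.
    reference_reports: list of sets, one set of nugget labels per
    reference report written by an assessor.
    """
    # Build the pyramid: a nugget's weight (tier) is the number of
    # reference reports it appears in.
    weights = Counter()
    for report in reference_reports:
        for nugget in report:
            weights[nugget] += 1

    # Weight actually captured by the peer.
    captured = sum(weights[n] for n in peer_nuggets if n in weights)

    # Ideal weight: the |peer_nuggets| heaviest nuggets in the pyramid.
    top = sorted(weights.values(), reverse=True)[: len(peer_nuggets)]
    ideal = sum(top)
    return captured / ideal if ideal else 0.0


# Hypothetical nugget data for one outline section of one topic.
refs = [
    {"flooding", "dams breached", "10000 homeless"},
    {"flooding", "10000 homeless"},
    {"flooding", "roads cut"},
]
peer = {"flooding", "roads cut"}  # nuggets matched in the peer block

# Pyramid weights here: flooding=3, 10000 homeless=2, others=1.
# Captured 3+1=4 out of an ideal 3+2=5 for two nuggets.
print(pyramid_score(peer, refs))  # -> 0.8
```

One attraction of this formulation for the pilot in E.2 is that the
same nugget annotations also yield plain recall against the infolist
(captured nuggets over all nuggets), so the pilot could report both
measures from one pass of assessor judgments.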