---------------------------------------------------------------------
TRECVID 2007: Instructions for creation of video summary ground truth
Version 7

B a c k g r o u n d:

A good video summary shows the viewer segments containing examples of the main objects and events depicted in the video it summarizes, filtering out the *unclear* and the *predictable*. One way to evaluate such a summary is to have a human summarizer create a filtered list of such segments, each identified uniquely in terms of an object or event. Then the summary can be compared to the list to see how many of the desired objects/events (i.e., segments) it contains.

Your task is to watch a video, select desirable segments, and then identify each uniquely by noting an object (animate or inanimate) or event (i.e., one or more objects involved in some action) occurring in the segment. The number of segments will vary with the video. That is OK.

It is the nature of rushes that some scenes and parts of scenes will be shot multiple times. The variations in such retakes, while important to the director, will likely be below the level that matters to a highly compressed summary. That is, the summary need only include one instance. An exception might be a take in which something goes wrong, which might have a separate use from other takes that proceed mostly as expected.

A desirable segment should not cross shot boundaries. You may identify multiple such segments within a single shot. Try not to include extremely short segments separately unless they seem very interesting.

You can include segments from the unscripted portion of the video if they are substantial enough and seem as though they might be reusable. However, DO NOT include the starting/ending clap boards of scenes and takes or the color bars at the beginning.

The object/event cue for each desired segment should be as simple as possible while still identifying the segment uniquely within the video. Uniqueness is primary.
For example, if there are two women in the video and you want to include two segments (a closeup of each), you will need to specify some distinguishing modifiers in your list, e.g., "woman with glasses" versus "woman with red hair", so the person judging the summary against your list can tell when s/he has seen each of the women you designated.

Use clear, concrete language, with no specialized terminology, in each item. Each item needs to be independent of context: it should not refer to another item, e.g., "view of road from different angle". An item should remain clear even if we randomized the order of the list or used only a subset.

Many videos contain alternate shots of some object/person at different ranges. Be sure to make clear which is which. This may mean mentioning what is visible (shoulders and head vs. head only).

Each item should take one of the following forms.

 - object (no event or camera event)
     e.g., antique car
           old woman

 - object(s) + event
     e.g., red hot air balloon ascending
           people talking

 - object(s) + camera event
     e.g., pan across room
           zoom in on newspaper page

 - object(s) + event + camera event*
     e.g., zoom in on red hot air balloon ascending
           zoom in on blimp's cabin touching the water

*The set of allowable camera events is limited to the following: zoom in, zoom out, or pan.

Remember that a zoom or pan is an event. A closeup is a state.

In your annotation, list one segment, i.e., one object/event, per line.

P r o c e d u r e:

Play the video at normal speed through one take of the scene, pick the distinct segments you want to select, and enter them on the list as described above. Rewatch the scene to supplement/check the list. Fast forward through the other takes of the scene unless something really different and interesting happens. Continue in the same fashion with any remaining scenes.

C h e c k l i s t:

1. Is each line in your ground truth UNIQUE?
   (as no two lines should be the same)

2. Is each line in your ground truth INDEPENDENT?
   (as each line should stand on its own, e.g., "view of road from different angle" is NOT independent, as it assumes you know what the original angle was before it became "different")

3. Is each line/event you have listed SIGNIFICANT?
   (don't list something unless it is clear and complete enough to be useful once found, except if its presence is surprising enough to trump its obscurity or incompleteness)

4. Is there ONE OBJECT/EVENT per line?
   (there should be no more than 1)

5. Does any line have any UNNECESSARY DETAIL?
   (only the minimum amount of detail needed to uniquely describe a line should be given)

6. Is there any line with only CAMERA MOVEMENT?
   (e.g., "Camera Pans Right" probably needs more substance, as it is unlikely to be the only time in the video when the camera pans right; something like "Camera Pans Right onto an object" gives a more accurate description)

So:

1. UNIQUE?
2. INDEPENDENT?
3. SIGNIFICANT?
4. ONE OBJECT/EVENT?
5. UNNECESSARY DETAIL?
6. CAMERA MOVEMENT?
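
Some of the checklist questions can be partly pre-screened by a small script before a human review. The following is only a minimal sketch in Python, not part of the TRECVID procedure: the function name `check_ground_truth`, the `CONTEXT_WORDS` list, and the `FILLER` list are all assumptions made for illustration. It covers only checks 1, 2, and 6; whether a line is SIGNIFICANT or has UNNECESSARY DETAIL still needs human judgment, and every warning it prints is a hint to re-read the line, not a verdict.

```python
import re

# Camera events allowed by the instructions: zoom in, zoom out, pan.
CAMERA_EVENT = re.compile(r"\b(zoom(s|ing)? (in|out)|pan(s|ning)?)\b", re.IGNORECASE)
# Assumed (hypothetical) words hinting that a line depends on another line.
CONTEXT_WORDS = {"different", "another", "same", "again", "previous"}
# Assumed (hypothetical) words that do not name an object in a camera line.
FILLER = {"camera", "left", "right", "up", "down", "slowly",
          "the", "a", "an", "on", "onto", "across", "to"}


def check_ground_truth(lines):
    """Return (line_number, warning) pairs for a list of ground truth items."""
    warnings = []
    seen = {}
    for number, raw in enumerate(lines, start=1):
        item = " ".join(raw.lower().split())  # normalize case and spacing
        if not item:
            continue
        # Check 1 (UNIQUE): no two lines should be the same.
        if item in seen:
            warnings.append((number, f"UNIQUE: same as line {seen[item]}"))
        else:
            seen[item] = number
        # Check 2 (INDEPENDENT): flag words that point at other lines.
        words = re.findall(r"[a-z']+", item)
        hits = sorted(CONTEXT_WORDS.intersection(words))
        if hits:
            warnings.append((number, "INDEPENDENT: context word " + ", ".join(hits)))
        # Check 6 (CAMERA MOVEMENT): a camera event with no object named.
        if CAMERA_EVENT.search(item):
            rest = CAMERA_EVENT.sub(" ", item)
            rest_words = [w for w in re.findall(r"[a-z']+", rest) if w not in FILLER]
            if not rest_words:
                warnings.append((number, "CAMERA MOVEMENT: no object named"))
    return warnings


if __name__ == "__main__":
    sample = [
        "woman with glasses",
        "Camera Pans Right",
        "view of road from different angle",
        "woman with glasses",
        "zoom in on newspaper page",
    ]
    for number, warning in check_ground_truth(sample):
        print(f"line {number}: {warning}")
```

On the sample list, the sketch flags line 2 (camera movement with no object), line 3 (the word "different"), and line 4 (a repeat of line 1), and leaves "zoom in on newspaper page" alone because it names an object.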