In search topics/queries, "contains x" or words to that effect are shorthand for "contains x to a degree sufficient for x to be recognizable as x to a human." This means, among other things, that unless explicitly stated otherwise, partial visibility or audibility may suffice.
The fact that a segment contains video of physical objects representing the feature target, such as photos, paintings, models, or toy versions of the target, will NOT be grounds for judging the topic (query) to be true for the segment. Containing video of the target within video may be grounds for doing so.
If the topic (query) is true for some frame (or frame sequence) within the shot, then it is true for the shot, and vice versa. This simplification is adopted because it makes pooling of results easier and gives a practical basis for approximating recall.
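The shot-level rule amounts to an "any" over per-frame judgments. The sketch below assumes each frame has already received a human true/false judgment; the helper name `shot_is_true` is hypothetical and not part of any official evaluation tooling.

```python
from typing import Sequence


def shot_is_true(frame_judgments: Sequence[bool]) -> bool:
    """Hypothetical helper: a topic is true for a shot if it is true
    for at least one frame (or frame sequence) within that shot."""
    return any(frame_judgments)


# The target is recognizable only in the third frame of a five-frame shot.
print(shot_is_true([False, False, True, False, False]))  # True
# The target never appears in the shot.
print(shot_is_true([False] * 5))  # False
```

Collapsing judgments to one boolean per shot is what lets results from different systems be pooled at the shot level.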
When a topic (query) expresses the need for x and y and ..., all of these (x, y, ...) must be perceivable simultaneously in one or more frames of a shot for the shot to be considered as meeting the need.
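For a conjunctive topic, it is not enough that each required item appears somewhere in the shot. At least one single frame must contain all of them. The sketch below assumes each frame is represented by the set of concepts a human judge can perceive in it. The helper name and the concept labels are hypothetical.

```python
from typing import Iterable, Set


def shot_meets_conjunctive_need(frames: Iterable[Set[str]], required: Set[str]) -> bool:
    """Hypothetical helper: a conjunctive topic (x and y and ...) is met
    only if some single frame contains every required concept."""
    return any(required <= frame for frame in frames)  # subset test per frame


# x and y co-occur in the second frame: the shot qualifies.
shot = [{"dog"}, {"dog", "bicycle"}, {"bicycle"}]
print(shot_meets_conjunctive_need(shot, {"dog", "bicycle"}))  # True

# x and y each appear, but never in the same frame: the shot does not qualify.
split_shot = [{"dog"}, {"bicycle"}]
print(shot_meets_conjunctive_need(split_shot, {"dog", "bicycle"}))  # False
```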
News magazine, science news, news reports, documentaries, educational programming, and archival video
Airport Security Cameras & Activity Detection
Video collections from News, Sound & Vision, Internet Archive, Social Media, BBC EastEnders