Starting with the earliest...
Hi All,
The purpose of this email discussion list is to come up with a proposal
for incorporating additional rushes video into an evaluation in TRECVID
2006. Rushes are, we have heard from the BBC and others, a potentially
very valuable source of video for reuse but it is largely untapped because
it is very difficult to find out what is there. A case can perhaps also
be made that rushes share some characteristics of the "ground reconnaissance"
data of interest to various intelligence organizations.
This marks the beginning of the discussion among interested 2005
groups. More people may be added once the call for TRECVID 2006 is out
next week. The proposal will go to the entire trecvid2006 list when it
exists and must be complete well before the end of February.
Here are some facts we can't lose sight of:
1) We have another 50 hours of BBC rushes about the "French Experience" in
MPEG-1. Alan Smeaton can say more about the content and any metadata that
might be included.
2) We *may* get some data from Discovery Communications, Inc - at least
50 hours of what they call "extra" or "B-roll" video with mostly only natural
sound (poorer quality than we are used to), 300-500 kbps in Windows Media
format, 1-5 mins clips, with detailed manually-created scene descriptions
(which we could possibly mine for feature training and evaluation truth
data), and covering a very wide range of topics such as science, travel,
adventure, culture, health, history, military, natural history, biography,
etc.
3) An evaluation that is part of TRECVID needs to have all of the following:
- a real world task/scenario we are trying to model and that's important
- a simplified, laboratory version of the above: the system task
- a procedure for evaluating the system's performance of the task
against a known reference that represents desirable system
performance
We can start off brainstorming about just real world tasks or the system
versions but will very soon have to answer the questions about the other
facets of the evaluation.
4) The structure of the evaluation means we usually need
- development data
- reference data for development
- test data (possibly including problem statements, queries, etc.)
- reference data for test (created independent of system output or
based on judging submissions)
5) Right now, all the NIST judging capability is reserved for the search and
high-level feature task assessments. But it's possible that about 200 hours
of additional judging of some kind could be done in the mid-summer time frame
- that would be the usual 10 NIST assessors working 4 hrs/day for 5 days. (For
reference, they work for 14 half-days judging feature submissions, and the same
amount of effort is spent in judging search results.)
6) In designing a rushes evaluation we'd like to do something that is of
interest to researchers and to BBC and Discovery and the TRECVID sponsor (the
US Intelligence Community). E.g., some demonstrated ability to assist searching,
browsing, summarization, or even just assigning useful keywords would, as many
have heard from various sources in various forums, be much appreciated.
So having thrown all that onto the table.... let the discussion begin. Just
reply-all to keep everyone in the loop.
- Paul Over
Folks
On the BBC rushes data ... BBC sent me 115 CD-Rs of which:
- 101 had proper video files (total 34.2G, about 53 hours);
- 6 are blank;
- 8 contain inappropriate data (unrecognised files, some don't
play, some look like applications).
We've downloaded them all, and are putting them on an HDD and shipping it to
Paul next week.
Each CD has one video file only, with the file name not meaningful in
any way and each video begins with a few seconds of a testcard
(horizontal stripes of different rainbow-like colours), and then the
video. There are some shot cuts in some sequences, but long sequences
are using a single camera ... very like the stuff used in 2005. The
content is as varied as:
- a group of female jazz singers in a studio recording a song;
- a group of people tasting bananas in a lab;
- a farmer cross-pollinating bananas;
- a subject talking to an interviewer in an office setting;
Most of the people in the video are of African origin, much of the
content appears shot in a tropical climate, and ... wait for it ...
for all the content we've looked at the speech is in French.
There is no metadata or description at all and the quality of the
video seems to be very good indeed, though we haven't looked at the
statistics.
This is certainly challenging, forcing a focus on visual aspects, very
real-world.
- Alan Smeaton
Hi,
Here is one possible task for discussion. It involves the Discovery data - let's assume for discussion purposes we can get the use of it. (I don't know how to include the BBC data without adding in manual work for training data creation and results judging.)
It's based on the idea that we could (semi-)automatically create training data and truth data for testing by analyzing the given Discovery verbal clip descriptions to determine which test feature names (from LSCOM-lite (39)?, MediaMill (101)?, full LSCOM (?00) ?) are present and therefore which features should be present in each clip.
Real world task - archivist writes short verbal description (can be several phrases, even sentences) of each clip's content - an enormous effort for a large clip library.
System task - assist the archivist by automatically assigning applicable features to each test clip as in the TRECVID 2006 high-level feature task against broadcast news. (Build foundation for better browsing/search)
Evaluation - measure how well (precision/recall, average precision) systems do at finding the clips with each feature. This could be done automatically using existing software given the truth data derived from the content descriptions for the test clips.
Any thoughts?
- Paul Over
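A minimal sketch of the idea in the message above: derive truth data by checking feature names against clip descriptions, then score a ranked list with average precision. The clip ids, descriptions and feature names here are made up for illustration, and plain single-word matching is an assumption; real feature names (e.g. multi-word LSCOM concepts) would need something more careful.

```python
import re

def label_clips(descriptions, feature_names):
    """Map each feature name to the set of clip ids whose description mentions it.

    Assumption: a feature is "present" if its (single-word) name appears
    as a word in the description text.
    """
    truth = {name: set() for name in feature_names}
    for clip_id, text in descriptions.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        for name in feature_names:
            if name.lower() in words:
                truth[name].add(clip_id)
    return truth

def average_precision(ranked_clips, relevant):
    """Non-interpolated average precision of a ranked list against a truth set."""
    hits, total = 0, 0.0
    for rank, clip_id in enumerate(ranked_clips, start=1):
        if clip_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Hypothetical descriptions standing in for the Discovery clip descriptions.
descriptions = {
    "clip1": "A boat crosses the harbor at dawn",
    "clip2": "Close view of a mountain road with a car",
    "clip3": "Fishing boat and a car on the quay",
}
truth = label_clips(descriptions, ["boat", "car", "mountain"])
ap_boat = average_precision(["clip3", "clip2", "clip1"], truth["boat"])
```

If too few clips match a given feature name, that feature would simply have too little truth data to evaluate, which is the failure case the proposal itself has to rule out.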
Folks
Here is a suggestion for the rushes task.
Interested sites develop a system to allow search, browse and identify relevant shots based on a text-only topic ... mirrors the situation whereby somebody sticks their head in your office and asks you to find shots which are about "X" without having any illustrative examples, oh and by the way, I need them in 5 minutes.
Sites can run whatever analysis they want on the rushes, features, shot bounds, keyframe selection, ASR if they can get a French version.
Sites run a cut-down version of the interactive search task with a 5-minute limit per topic-search, and the task is to find as many as possible. Sites then submit only the "found" shots by giving the video file name and frame offsets for start/end of the "relevant" shots - yes, we could have problems with frame offset numbering here. Immediate disadvantage here is that we lose the common shot boundaries, making actual judgments more difficult. Also raises the issue of how much of a shot is needed for it to be relevant, and how relevant is my retrieval of 10 seconds of a shot vs. somebody else's 5 seconds of the same shot?
NIST then assess but are assessing only the (likely to be) relevant shots, which might require less judgment than the pooling of top-submitted. This might even allow greater than the usual 25 topics.
Might even be possible to do graded relevance judgments here ... relevant/very relevant/not relevant? Perhaps this could be in terms of how much (how many seconds) of the clip a run has identified?
Evaluation is in terms of how many relevant (or relevant/very relevant) shots a site finds. We could normalise this per-topic/per-site score for the more difficult topics vs. easier in terms of numbers of relshots found by ALL groups. Might seem crude but it rewards sites/systems which are good at finding a greater number of relshots in a fixed, short amount of time.
Without the usual pre-topic formation work done by NIST whereby evaluators view the content a priori and ascertain that there are actually shots in the collection which are relevant, we might find some topics have zero relevant shots, and some have very many, but that's the real world.
Advantages are that this would force us to deal with video in our systems, force us to cut out and mark boundaries of video clips as opposed to pre-defined, pre-bound keyframes.
- Alan Smeaton
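One way the per-topic normalisation suggested above might look, as a rough sketch: each site's count of relevant shots for a topic is divided by the number of distinct relevant shots found by all groups for that topic, and the ratios are averaged over topics. The site names, topic ids and the choice to skip topics with zero relevant shots are assumptions for illustration, not part of the proposal.

```python
def normalised_scores(found):
    """found[site][topic] = set of judged-relevant items that site returned.

    Returns site -> mean over topics of
    (site's relevant count) / (distinct relevant items found by ALL sites).
    Topics for which nobody found anything relevant are skipped (assumption).
    """
    topics = {t for per_site in found.values() for t in per_site}
    pooled = {
        t: set().union(*(per_site.get(t, set()) for per_site in found.values()))
        for t in topics
    }
    scores = {}
    for site, per_site in found.items():
        ratios = [
            len(per_site.get(t, set())) / len(pooled[t])
            for t in topics
            if pooled[t]
        ]
        scores[site] = sum(ratios) / len(ratios) if ratios else 0.0
    return scores

# Hypothetical judged results for two sites and two topics.
found = {
    "siteA": {"t1": {"a", "b", "c"}, "t2": {"x"}},
    "siteB": {"t1": {"b", "d"}, "t2": set()},
}
scores = normalised_scores(found)
```

With this normalisation a hard topic, where all groups together find only one relevant shot, weighs as much as an easy topic with many.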
Thanks Alan. I'm going to try to keep track of the suggestions here:
http://www-nlpir.nist.gov/projects/tv2006/rushes06.html
Some comments interspersed below - mostly just to itemize gaps we
would need to fill in eventually if we go down this road.
> Interested sites develop a system to allow search, browse and
> identify relevant shots based on a text-only topic ... mirrors the
> situation whereby somebody sticks their head in your office and asks
> you to find shots which are about "X" without having any illustrative
> examples, oh and by the way, I need them in 5 minutes. Sites can
> run whatever analysis they want on the rushes, features, shot bounds,
> keyframe selection, ASR if they can get a French version.
So we would potentially be measuring more than just the system code/
algorithms. :-(
> Sites run a cut-down version of the interactive search task with a
> 5-minute limit per topic-search, and the task is to find as many as
> possible.
Assume only interactive runs allowed? - since getting from text to
video-without-speech will require a human in the loop, interactive
system will return less junk and so take pressure off assessing, and
want to encourage interactive systems generally.
> Sites then submit only the "found" shots by giving the video file
> name and frame offsets for start/end of the "relevant" shots - yes, we
> could have problems with frame offset numbering here. Immediate
> disadvantage here is that we lose the common shot boundaries making
> actual judgments more difficult. Also raises the issue of how much of
> a shot is needed for it to be relevant, and how relevant is my
> retrieval of 10 seconds of a shot vs. somebody else's 5 seconds of
> the same shot ?
Maybe use start time of segment that meets the topic's need to avoid
frame number variation
Maybe use a strict length for each returned item: e.g., 5 secs. For
longer relevant segments, return multiple items.
> NIST then assess but are assessing only the (likely to be) relevant
> shots, which might require less judgment than the pooling of
> top-submitted. This might even allow greater than the usual 25 topics.
Who at NIST assesses, when, using what system??? (See goals section 5)
> Might even be possible to do graded relevance judgments here ...
> relevant/very relevant/not relevant ? Perhaps this could be in terms
> of how much (how many seconds) of the clip a run has identified ?
> Evaluation is in terms of how many relevant (or relevant/very
> relevant) shots a site finds. We could normalise this per-topic/per-site
> score for the more difficult topics vs. easier in terms of
> numbers of relshots found by ALL groups. Might seem crude but it
> rewards sites/systems which are good at finding a greater number of
> relshots in a fixed, short amount of time. Without the usual
> pre-topic formation work done by NIST whereby evaluators view the
> content a priori and ascertain that there are actually shots in the
> collection which are relevant, we might find some topics have zero
> relevant shots, and some have very many but that's the real world.
Who will make up the topics?
> Advantages are that this would force us to deal with video in our
> systems, force us to cut out and mark boundaries of video clips as
> opposed to pre-defined, pre-bound keyframes.
- Paul Over
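A small sketch of the "strict length for each returned item" suggestion in the comments above: a relevant segment is reported as one or more fixed-length items identified by start time only. Snapping the items to a grid that starts at 0 seconds is an added assumption (so that different sites would name the same items); the file name is hypothetical.

```python
def to_fixed_items(video, start_sec, end_sec, item_len=5.0):
    """Split [start_sec, end_sec) into consecutive items of item_len seconds.

    Each item is reported as (video, item_start_sec). Items are snapped to a
    grid starting at 0 s, which is an assumption made for this sketch.
    """
    if end_sec <= start_sec:
        return []
    first = int(start_sec // item_len)
    # Subtract a tiny epsilon so a segment ending exactly on a grid line
    # does not spill into the next item.
    last = int((end_sec - 1e-9) // item_len)
    return [(video, i * item_len) for i in range(first, last + 1)]

# A 12-second relevant segment becomes three 5-second items.
items = to_fixed_items("file042.mpg", 12.0, 24.0)
```

Reporting start times in seconds rather than frame offsets sidesteps the frame-numbering differences between decoders mentioned in the quoted text.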
Hi all,
Regarding the Rushes task for 2006: last year the plan for the 2005 Rushes task was to explore options and, based on our experiences, to have a well-defined task this year. Thus far, it seems that the task will be similar to the Interactive Search task of TrecVid 2005, which is good. From our point of view in CDVP, DCU, we hope to further develop our object segmentation tools used in the rushes task in 2005 and evaluate using an interactive system.
>> Assume only interactive runs allowed? - since getting from text to
>> video-without-speech will require a human in the loop, interactive
>> system will return less junk and so take pressure off assessing, and
>> want to encourage interactive systems generally.
I think that focusing on interactive search is a good idea. If we are modeling a real-world scenario where a user is seeking a number of video clips, then the top 1,000 does not make much sense and, as stated, fewer submitted results (high precision) will help the rushes task to be evaluated in the 200 hours available. Could the judgments become a shared task among the participants? I am not sure how participants feel about this, but with fewer submitted runs, this would require significantly less effort for evaluation.
>> Who will make up the topics?
Can we as participants suggest candidate topics once the data has been distributed, with a subset of these selected by NIST for test and development collections? Participants have suggested topics before in a previous TrecVid.
Regarding the nature of the topics, Alan's suggestion of text-only topics is sensible. Participants can allow their interactive users to formulate their queries whichever way they see fit. Last year, for example, we used Google Image search to aid query formulation, and some participants could rely on ASR through French.
>>>> Sites then submit only the "found" shots by giving the video file
>>>> name and frame offsets for start/end of the "relevant" shots - yes,
>>>> we could have problems with frame offset numbering here.
Regarding the use of start time / end time of a video segment in the result submissions, are we not better off still relying on predefined shot boundaries? It will make for easier system comparison and pooled evaluations, and lower the development effort by participants. The results submitted could comprise either a ranked list or a non-ranked set of sequential shot clusters for each topic. Or alternatively, provide the SB definitions and keyframes to lower the required development effort, but accept result submissions where the start time / end time of submitted result video segments are defined. In any case, I think that the provision of SB definitions and keyframes will be useful for participants.
regards
Cathal Gurrin
Hi all,
I do support the tasks of annotation and search on rushes. But should we assume that the users of rushes are more likely the experts (e.g. filmmakers, someone finding useful segments for composing new videos)? If this is the case, we should include more queries/topics with camera setting (e.g. camera range, camera angle, camera motion, focus/defocus object, lighting source). The examples may be like:
- Find X appearing in a close-up shot (or medium, long-distance shot)
- Find establishing shot
- Find shot with camera looking up/down at something
- Find objects X and Y, with X in focus and Y defocused
- Find shot with one object being tracked by camera
This could probably make the rushes task different from traditional high-level feature extraction and search tasks.
In addition, since we are dealing with unedited videos, we may need the sub-shot boundary as well. Example: a clip may only have one shot, and one shot may last for more than 10 min -- containing segments with fast zoom/pan, shaking artifact..... It would be useful if we can have, for instance, one keyframe for each sub-shot. This can ease annotation, search, as well as system evaluation.
Regards,
CW Ngo
Paul
Pardon me for joining this list late. I am looking at the notes so
far. It seems that you want to replicate some broadcast domain tasks on
the rushes data and then add some new rushes-specific tasks. For feature
detection, training annotation and evaluation will be an issue, and for
search, evaluation will be an issue. Do you have any feel for how likely
people are to go through another annotation task this year? Applying models
built from some other domain to Discovery to bootstrap the semi-automatic
annotation may not work given the diverse nature of the content. So before
we go for any task over any other task, it is necessary to get your opinion
on how much work you feel people are willing to put in for another round of
voluntary annotation. This is better understood by us in context of the
scope of the broadcast part of TRECVID 2006 because people will have to
split their time between the tasks on these two domains. With a rather
fixed pool of resources at all sites which I assume will not double from
2005 to 2006, we have to find out from NIST, what your priorities will be.
These answers will significantly impact the tasks that can be feasibly framed
and evaluated.
Again, my apologies if these questions have already been answered earlier.
Thanks
Sincerely
Milind Naphade
Hi Milind,
Good to hear from you.
Milind Naphade wrote:
> Pardon me for joining this list late. I am looking at the notes so
> far. It seems that you want to replicate some broadcast domain tasks on
> the rushes data and then add some new rushes-specific tasks. For feature
> detection, training annotation and evaluation will be an issue, and for
> search, evaluation will be an issue.
The proposal for a high-level feature task against Discovery data asks whether we can get the training and truth data for evaluation semi-automatically by matching some set of feature names against the words in the Discovery verbal content descriptions. If this does not yield a set of features with sufficient examples, then clearly the proposal fails and it's back to the drawing board.
As for the search proposal, yes, there are more questions about who will do what. Alan addressed some issues. I raised some more questions. Cathal made some suggestions. We have not heard from anyone else. I make no assumptions. The point of the discussion is to see if we can come up with a task *and* evaluation - including all the pieces. (In my initial note I mentioned NIST might be able to do a week of manual assessments but this would have to be earlier than the usual assessments.)
> Do you have any feel for how likely
> people are to go through another annotation task this year?
I am assuming no new annotation.
> Applying models
> built from some other domain to Discovery to bootstrap the semi-automatic
> annotation may not work given the diverse nature of the content.
OK, but this was not part of the proposal as I saw it.
> So before
> we go for any task over any other task, it is necessary to get your opinion
> on how much work you feel people are willing to put in for another round of
> voluntary annotation.
I am still assuming no new annotation.
> This is better understood by us in context of the
> scope of the broadcast part of TRECVID 2006 because people will have to
> split their time between the tasks on these two domains.
Some groups may decide not to do everything and so do not need to split their time.
> With a rather
> fixed pool of resources at all sites which I assume will not double from
> 2005 to 2006, we have to find out from NIST, what your priorities will be.
NIST's top priority has to be the completion of the 2-year cycle on news video but we are trying within limited resources to help the community start work on some other kinds of video also of interest to the TRECVID sponsors.
- Paul Over
Hi all,
here are some additional comments from our group, but first a question: Are there any examples of manual descriptions available and, in particular, are these descriptions about the content (who and what is recorded) or about the recording itself (how was it recorded, close-ups, tracking objects etc)? "Verbal" - means textual?
Overall, we would prefer queries and topic searches as proposed by CW Ngo (find close-up of person X, find object X being tracked by camera, person X is speaking, Y listens). If the descriptions mentioned above are of this kind, this search could easily be combined with Paul's first suggestion for a task. And making use of these manual descriptions might help to save evaluation time. But we are not sure whether they are suitable for training purposes (supposing a great diversity in the recordings/genres)?
Regarding Alan's suggestion:
It is a good idea/scenario as well. But, in the first run, how do systems get from a text query to visual content assuming that there is eventually no ASR? Is it intended that this kind of task forces the use of knowledge databases/ontologies? What about adding visual queries of a certain object/person of interest? Alternatively it would be difficult to assume a generic object detector.
Some more ideas for queries/search:
Retrieve sequences according to their recording quality (so that a content producer is willing to use it), quality e.g. in terms of:
- camera is absolutely still (no camera shaking), respectively
- there is a smooth camera movement
- light, sharpness, audio quality
Some further ideas:
- Retrieve sequences where person X is present
- Retrieve sequences where person X is (not) speaking.
- Retrieve sequences where person X (or any person) shows a certain emotional, facial expression.
- retrieve sequences with certain audio features (speech, music, silence, etc.)
Evaluation:
Since shots in the rushes are probably very long, we suggest using a predefined sub-shot length (of about 2-5 seconds) as the retrieval unit. A retrieved long sequence would be (virtually) divided into these sub-shots and thus retrieving a long relevant/irrelevant shot would enhance/degrade the precision measure.
Ralph Ewerth
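The sub-shot evaluation suggested above could be sketched roughly as follows. Each retrieved sequence is virtually cut into fixed-length sub-shots, and precision is the fraction of sub-shots that are relevant. Treating a sub-shot as relevant when it overlaps any reference segment at all is an assumption of this sketch; a stricter rule (e.g. majority overlap) would work the same way.

```python
def split(video, start, end, unit):
    """Yield (video, sub_start, sub_end) pieces of at most `unit` seconds."""
    t = start
    while t < end:
        yield video, t, min(t + unit, end)
        t += unit

def overlaps(a, b):
    """True if two (video, start, end) segments share any time in the same video."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def subshot_precision(retrieved, relevant, unit=5.0):
    """Precision over virtual sub-shots of the retrieved sequences."""
    subshots = [s for seg in retrieved for s in split(*seg, unit)]
    if not subshots:
        return 0.0
    good = sum(any(overlaps(s, r) for r in relevant) for s in subshots)
    return good / len(subshots)

# One 20-second retrieved sequence, of which only the second half is relevant.
precision = subshot_precision([("v1", 0.0, 20.0)], [("v1", 10.0, 20.0)])
```

Returning a long sequence that is mostly irrelevant adds many non-relevant sub-shots, so it lowers precision in proportion to its length, as described in the message.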
Hi,
Some comments from UEA on the BBC rushes scenario:
- we like Alan Smeaton's idea of using 5-second chunks rather than shots with predefined boundaries,
- relevance should be not relevant/somewhat relevant/relevant (we tried looking at some of last year's BBC rushes cut into 5-second chunks and believe we need the middle category for chunks where a small part is relevant),
- if the evaluation was based on precision and ignored recall it should make the job easier,
- if we focus on precision, evaluation could be a community task with each run evaluated by 2+ others, preferably with an evolving list of known results (e.g. if 3 judges independently agree, a segment can be automatically classified without the need for further judgements).
Dan Smith
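The "evolving list of known results" idea above might be sketched like this: a segment is settled automatically once enough independent judges have given it the same label, and otherwise stays in the queue for more judging. Reading "3 judges agree" as "some label has at least 3 votes" is an interpretation, and the segment ids and label names are made up.

```python
from collections import Counter

def settle(judgements, needed=3):
    """judgements: segment_id -> list of labels from independent judges.

    Returns (settled, pending): `settled` maps segment ids to the agreed
    label; `pending` lists segments that still need more judgements.
    """
    settled, pending = {}, []
    for seg, labels in judgements.items():
        # Most frequent label and its vote count (none if no judgements yet).
        label, count = Counter(labels).most_common(1)[0] if labels else (None, 0)
        if count >= needed:
            settled[seg] = label
        else:
            pending.append(seg)
    return settled, pending

# Hypothetical judgements on three 5-second chunks.
judgements = {
    "s1": ["relevant", "relevant", "relevant"],
    "s2": ["relevant", "somewhat relevant", "relevant"],
    "s3": [],
}
settled, pending = settle(judgements)
```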
Ralph
Quick answers to some of your questions
> Are there any examples of manual descriptions available and, in particular, are these descriptions about the content (who and what is recorded) or about the recording itself (how was it recorded, close-ups, tracking objects etc)? "Verbal" - means textual?
Nope. There is no indication in the 50 hours we got from the BBC last month of what the content is. There are only about 100+ individual MPEG-1 files of 20 minutes or more each, with no hint in the filename either.
> Regarding Alan's suggestion:
> It is a good idea/scenario as well. But, in the first run, how do systems get from a text query to visual content assuming that there is eventually no ASR? Is it intended that this kind of task forces the use of knowledge databases/ontologies? What about adding visual queries of a certain object/person of interest? Alternatively it would be difficult to assume a generic object detector.
The content doesn't appear to have any people of notable interest - there are no George Bush or Tony Blair people; it appears to be "ordinary" people, so the searches could not be for named individuals but might be for a person in front of a banana tree where the person could be anybody.
To kickstart a visual-only search, one approach would be to have the video analysed and classified a priori into an ontology of features, and another could be to source a representative image from an outside resource -- like Google images. If you do a Google image search for "person banana tee" you will get 5 screens of images and the 2nd and 3rd pages have pictures of a person (smiling face of a child) in front of a banana tree, so you could use that as a seed for an image-only search.
[Paul won't like that idea because it means the search is not repeatable and in fact the pictures of that smiling child do not appear to be at their original URL any more so you would have to use the Google cache to retrieve the full image, but if we insisted that search runs also submit any outside resources like query images that should be OK.]
> Some more ideas for queries/search:
> Retrieve sequences according to their recording quality (so that a content producer is willing to use it), quality e.g. in terms of:
> - camera is absolutely still (no camera shaking), respectively
> - there is a smooth camera movement
> - light, sharpness, audio quality
It seems the original recording quality on this content was good, using good cameras, so there isn't much camera shake and the movement appears *mostly* smooth, so it was probably recorded in high quality and digitised to MPEG-1.
> Some further ideas:
> - Retrieve sequences where person X is present
> - Retrieve sequences where person X is (not) speaking.
> - Retrieve sequences where person X (or any person) shows a certain emotional, facial expression.
These would work if person X wasn't a famous person but was somebody who was known to appear in the footage. One issue is that the people in the rushes seem to be ordinary people, passers-by, and so don't re-occur across different video files, so once you have found one instance of the person fertilizing the banana tree that person's other appearances are localised and clustered.
> - retrieve sequences with certain audio features (speech, music, silence, etc.)
> Evaluation:
> Since shots in the rushes are probably very long, we suggest using a
> predefined sub-shot length (of about 2-5 seconds) as the retrieval unit. A retrieved long sequence would be (virtually) divided into these sub-shots and thus retrieving a long relevant/irrelevant shot would
> enhance/degrade the precision measure.
That would remove the contentious issue of having a master shot reference I think.
- Alan
Hi All,
I fully agree with Alan's idea to have the BBC rushes cut into, say, 5-second chunks, and we can perhaps classify these chunks into meaningful categories. I also agree with Dan that we should give importance to precision and ignore recall; that would make the job easier for both NIST and the participants.
Lekha Chaisorn
Ralph Ewerth wrote:
> Are there any examples of manual descriptions available and in
> particular, are these descriptions about the content (who and what is
> recorded) or about the recording itself (how was it recorded,
> close-ups, tracking objects etc)? "Verbal" - means textual?
Ralph,
Were you thinking about the Discovery data when you asked the above question? If so,...
No, I don't have any examples yet. But I have seen some and they contained natural language text - a short paragraph, sentences, phrases, describing what you see when you view the clip. In some cases there are descriptions of the camera position and movement as well.
Discovery is currently working on a script to check the MediaMill 101 feature names against the text of their content descriptions to see how many hits they get for each feature.
- Paul Over
Hi,
Regarding evaluation measures, I would like to plug the T2I framework that was published at RIAO 2004 (http://www.cwi.nl/~arjen/pub/t2i.pdf). It addresses the problems caused by 'overlap' between result and reference items.
The idea here is that systems return only entry points into the videos (the starting time of their returned segments); the ground truth should have labeled correct segments (start+end). The model assumes that users view video until their so-called tolerance to irrelevance (T2I) has been reached, and then proceed to the next suggested entry point. Success can then be measured given a fixed amount of time that the user would want to waste (this principle dates back to the expected search length introduced by Cooper).
For example, one could count the number of relevant fragments found before reaching this wasted effort (this is similar to precision at N); we also gave some alternative measures in the same framework.
Best regards,
Arjen de Vries
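A much simplified simulation of the user model described above, only to make the counting concrete. It is not the definition from the RIAO 2004 paper. The assumptions: a user who starts at an entry point reaches a relevant segment if one begins (or is already playing) within `t2i` seconds; the time watched before reaching it counts as wasted; a miss wastes the full `t2i` seconds; and the user stops once the next entry would push wasted time over `budget`. All names and numbers are illustrative.

```python
def relevant_found(entries, truth, t2i=10.0, budget=30.0):
    """Count distinct relevant segments reached within a wasted-time budget.

    entries: ranked list of (video, entry_sec) returned by a system.
    truth:   list of (video, start_sec, end_sec) relevant segments.
    """
    found, wasted = set(), 0.0
    for video, t in entries:
        # Relevant segments the user would see within t2i seconds of viewing.
        reachable = [
            seg for seg in truth
            if seg[0] == video and seg[1] < t + t2i and seg[2] > t
        ]
        if reachable:
            seg = min(reachable, key=lambda s: s[1])
            cost = max(0.0, seg[1] - t)  # irrelevant video watched first
        else:
            seg, cost = None, t2i        # user gives up after t2i seconds
        if wasted + cost > budget:
            break
        wasted += cost
        if seg is not None:
            found.add(seg)
    return len(found)

truth = [("v1", 100.0, 130.0), ("v2", 40.0, 60.0)]
entries = [("v1", 95.0), ("v2", 0.0), ("v2", 45.0), ("v1", 110.0)]
n_generous = relevant_found(entries, truth, t2i=10.0, budget=15.0)
n_strict = relevant_found(entries, truth, t2i=10.0, budget=10.0)
```

Because only entry points are submitted, two runs that land in different places inside the same relevant segment are credited for the same fragment, which avoids the "my 10 seconds vs. your 5 seconds" overlap question raised earlier in the thread.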