A manually written three-sentence segment summary.
For each interview, a collection of metadata describing that interview. This includes basic biographical details (e.g., interviewee name and birthdate); a half-page free-text summary of the interview is also offered.
Czech Collection
The Czech interviews will use a no-boundary condition in which relevance assessors mark only topically relevant "playback points" during assessment. Unlike the English collection, where the task is to retrieve segments, the task in Czech is to retrieve ranked lists of candidate playback points. The collection will be formatted as closely as possible to ordinary CLEF documents, and will contain around 500 hours of speech.
Topics
25 topics, written in the usual CLEF title, description, and narrative format, will be released for the English and Czech collections. Topics will be available in Czech, English, French, German, Russian, and Spanish. The creation of topics in other languages can be arranged by those sites interested in using them.
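As a rough illustration of working with the title, description, and narrative format, the sketch below parses one topic and builds a query from selected fields. The tag names (`top`, `num`, `title`, `desc`, `narr`), the sample topic text, and the helper names `parse_topic` and `build_query` are assumptions made for this example; the released topic files may use different tags or language-specific variants.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic; tag names and content are illustrative assumptions,
# not taken from the released topic files.
SAMPLE_TOPIC = """
<top>
  <num>0000</num>
  <title>Example title</title>
  <desc>Example one-sentence description.</desc>
  <narr>Example narrative describing what makes a passage relevant.</narr>
</top>
"""

def parse_topic(xml_text):
    """Return a dict mapping each child tag of the topic to its stripped text."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: (child.text or "").strip() for child in root}

def build_query(topic, fields=("title", "desc")):
    """Join the chosen topic fields into one query string, skipping absent fields."""
    return " ".join(topic[f] for f in fields if f in topic)

topic = parse_topic(SAMPLE_TOPIC)
query = build_query(topic)
print(query)
```

Restricting `fields` to `("title",)` or extending it to include `"narr"` is one simple way to produce the short and long query variants often compared in ad-hoc runs.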
Additional Resources
In order to facilitate broad participation, the basic test collection is formatted in the same way as a typical CLEF ad-hoc test collection. The following additional resources will also be available to support system development:
A number of representative training topics, with relevance judgments for the same collections of interviews that will be used in the evaluation.
Scripts for generating alternative relevance judgments for the training topics that can be used to support detailed failure analysis (English only).
Scripts for generating richer metadata for each segment using synonymy, part-whole, and is-a thesaurus relationships. This capability can be used with the automatically assigned thesaurus categories or (for contrastive runs) with the manually assigned thesaurus categories (English only).
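The kind of thesaurus-based enrichment described in the last item can be sketched as follows. This is not the distributed script: the toy thesaurus, the relationship keys (`synonym`, `part_of`, `is_a`), and the function `expand_terms` are hypothetical, chosen only to show how a segment's assigned categories might be expanded along the three relationship types.

```python
# Hypothetical toy thesaurus: term -> relationship type -> related terms.
# The real thesaurus and its format are not described here; this is an assumption.
THESAURUS = {
    "Prague": {
        "synonym": ["Praha"],
        "part_of": ["Czechoslovakia"],
        "is_a": ["city"],
    },
    "Czechoslovakia": {"is_a": ["country"]},
}

def expand_terms(assigned, thesaurus,
                 relations=("synonym", "part_of", "is_a"), max_depth=1):
    """Expand assigned terms by following the given relations up to max_depth steps.

    The original terms come first, followed by newly reached terms in
    discovery order; each term appears at most once.
    """
    expanded = list(assigned)
    seen = set(assigned)
    frontier = list(assigned)
    for _ in range(max_depth):
        next_frontier = []
        for term in frontier:
            for rel in relations:
                for related in thesaurus.get(term, {}).get(rel, []):
                    if related not in seen:
                        seen.add(related)
                        expanded.append(related)
                        next_frontier.append(related)
        frontier = next_frontier
    return expanded

one_step = expand_terms(["Prague"], THESAURUS)
two_steps = expand_terms(["Prague"], THESAURUS, max_depth=2)
print(one_step)
print(two_steps)
```

The same function could be applied to either the automatically assigned or the manually assigned categories of a segment, so that the two resulting metadata sets can be compared in contrastive runs.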