Guidelines

Participation

Research groups or individuals interested in participating in the track should do the following:

Register
Registration for CLEF 2007 is now open.
Upon registration, every participant will receive instructions to download the appropriate document collections from the CLEF FTP site.

Email the track organisers

Doug Oard
Gareth Jones

Please indicate your wish to participate, and the languages (query and document languages) that will be used in your experiment. Once registration for CLEF is confirmed, participants will receive instructions to proceed.

Submission

Designing Official Runs

There are two tasks:

Searching English (using the English collection); and
Searching Czech (using the Czech collection).

For each task, each participant can submit up to five runs in total. For comparative purposes, at least one of these should be a fully automatic monolingual run (English → English and/or Czech → Czech) using TD (both the <title> and the <desc>) topic fields. In keeping with the goals of CLEF, the use of non-English queries is encouraged.

For example, if you used the English collection in your experiment:

You must submit one fully automatic English → English TD run, with no manual intervention during the indexing or retrieval process; and
You may submit up to four cross-lingual runs, in which any combination of fields is allowed.

Some submitted runs might be used to create pools for relevance assessment (i.e., "judged runs"); others could be scored using those judgments (i.e., "un-judged runs"). The number of judged runs for each participant will depend on the total number of submissions received.

Which Topics to Run

Test topics will be selected to contain enough known relevant segments (or playback points in the Czech collection) to avoid instability in the evaluation measures, yet few enough known relevant segments (or playback points) that we believe the judged set to be reasonably complete. We will report official evaluation results only on the selected evaluation topics.

For the Searching English task, you should submit runs for ALL 105 topics. Of these:

63 (63_qid.txt) are the 2006 training topics, for which QRELS are available (these 63 topics should be used for system tuning);
33 (33_qid.txt) are the 2006 testing topics, for which QRELS are available (you must NOT use these 33 topics for system tuning); and
The remaining 9 topics have no QRELS available.
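As an illustration of the split described above, the following is a minimal sketch in Python. It assumes the topic IDs have already been read into memory; the function name and the sample IDs in the usage note are hypothetical, and the layout of 63_qid.txt and 33_qid.txt is not specified here, so no file parsing is shown.

```python
from typing import Dict, Iterable, Set


def split_topics(all_ids: Iterable[str],
                 training_ids: Iterable[str],
                 testing_ids: Iterable[str]) -> Dict[str, Set[str]]:
    """Partition topic IDs into tuning, held-out, and unjudged groups.

    Hypothetical helper: the guidelines only state which topics may be
    used for tuning, not how the ID lists are stored.
    """
    all_set = set(all_ids)
    training = set(training_ids)
    testing = set(testing_ids)

    # A topic must not be both a tuning topic and a held-out topic.
    overlap = training & testing
    if overlap:
        raise ValueError(f"topics in both lists: {sorted(overlap)}")

    # Every listed topic should belong to the full topic set.
    unknown = (training | testing) - all_set
    if unknown:
        raise ValueError(f"unknown topic IDs: {sorted(unknown)}")

    return {
        "tuning": training,                        # QRELS available, may tune
        "held_out": testing,                       # QRELS available, no tuning
        "unjudged": all_set - training - testing,  # no QRELS available
    }
```

For instance, split_topics(["1", "2", "3"], ["1"], ["2"]) places topic "3" in the unjudged group. Runs are still submitted for all topics; the split only governs which topics are used for tuning.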

Submission Format

For each run, the top-1000 results will be accepted in the format:

Topic-ID Dummy-field Document-ID Rank Similarity-score Run-ID

Field	Description
Topic-ID	Topic number in the <NUM> field of each topic file.
Dummy-field	Field not used in evaluation; conventionally set to 'Q0'.
Document-ID	Document ID in the <DOCNO> field of the segment.
Rank	Rank assigned to the segment (1=highest).
Similarity-score	System-generated similarity value of the segment to the topic. (This field is mandatory for evaluation.)
Run-ID	A unique identifier for each run.
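The six-field format above could be produced along the following lines. This is a minimal sketch, not an official tool: the function name, the sample document IDs, and the four-decimal score formatting are assumptions, and the 1000-result limit is assumed here to apply per topic.

```python
from typing import Iterable, List, Tuple

MAX_RESULTS = 1000  # assumed to be a per-topic limit


def format_run_lines(topic_id: str,
                     scored_docs: Iterable[Tuple[str, float]],
                     run_id: str) -> List[str]:
    """Return run-file lines for one topic, highest-scoring segment first."""
    # Sort by similarity score so that rank 1 is the best match.
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    lines = []
    for rank, (doc_id, score) in enumerate(ranked[:MAX_RESULTS], start=1):
        # Topic-ID Dummy-field Document-ID Rank Similarity-score Run-ID
        lines.append(f"{topic_id} Q0 {doc_id} {rank} {score:.4f} {run_id}")
    return lines
```

Calling format_run_lines("1001", [("docA", 0.5), ("docB", 0.9)], "UipsASR1b") yields two lines, with docB at rank 1 and docA at rank 2. The topic and document IDs in this call are placeholders; real values come from the <NUM> and <DOCNO> fields.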

Participants should each adopt some standard convention for Run-ID that begins with a unique identifier for the participant and ends with a unique run identifier.

For example, the University of Ipsilanti might create the Run-ID UipsASR1b. Please avoid Run-IDs longer than about 10 characters.
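Before submitting, it may help to check each line of a run file against the format and the Run-ID advice above. The sketch below is one possible check under stated assumptions: the function name is hypothetical, fields are assumed to be whitespace-separated, and the 10-character limit is treated as a soft guideline that is reported rather than enforced.

```python
from typing import List


def check_run_line(line: str, expected_run_id: str) -> List[str]:
    """Return a list of problems found in one run-file line (empty if none)."""
    fields = line.split()
    if len(fields) != 6:
        return [f"expected 6 fields, found {len(fields)}"]

    _topic_id, _dummy, _doc_id, rank, score, run_id = fields
    problems = []

    # Rank must be an integer, with 1 as the highest rank.
    try:
        rank_ok = int(rank) >= 1
    except ValueError:
        rank_ok = False
    if not rank_ok:
        problems.append("rank must be a positive integer")

    # The similarity score is mandatory and must parse as a number.
    try:
        float(score)
    except ValueError:
        problems.append("similarity score must be numeric")

    # Every line of a run should carry the same Run-ID.
    if run_id != expected_run_id:
        problems.append("run ID differs from the expected one")
    if len(run_id) > 10:
        problems.append("run ID is longer than about 10 characters")

    return problems
```

A well-formed line such as "1001 Q0 docB 1 0.9000 UipsASR1b" produces an empty list when checked against the Run-ID UipsASR1b.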

What to Put in Your README File

Your README file should contain the following information:

Institution name;
Name of the contact person;
Complete contact information (email, phone, postal address, and fax);

Where to Send Your Submission

English runs should be submitted via email to

Czech runs should be submitted via the web

in a single compressed file (.zip or .tgz).

The compressed file should contain:

Each run in a separate file;
One README;
One completed questionnaire for all submitted runs.
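The archive described above could be assembled with a short script such as the following. This is a sketch only, using Python's standard zipfile module: the function name and all file names are hypothetical, and a .tgz archive would be equally acceptable.

```python
import zipfile
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def package_submission(archive_path: PathLike,
                       run_files: Iterable[PathLike],
                       readme: PathLike,
                       questionnaire: PathLike) -> Path:
    """Bundle run files, one README, and one questionnaire into a .zip."""
    members = [Path(p) for p in run_files] + [Path(readme), Path(questionnaire)]

    # Fail early if any expected file is absent.
    missing = [str(p) for p in members if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing files: {missing}")

    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in members:
            # Store each file at the top level of the archive.
            archive.write(path, arcname=path.name)
    return Path(archive_path)
```

Each run stays in its own file inside the archive; the script does not merge or rename runs.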

Submissions will be acknowledged within 24 hours; participants must ensure that acknowledgment is received!

If you have any questions, please email

Doug Oard

Gareth Jones

Pavel Pecina