Unsupervised Morpheme Analysis -- Morpho Challenge 2008

This is a page of the previous Morpho Challenge 2008. The current challenge is Morpho Challenge 2009.

Rules

Acceptance

The organizers retain all rights to the Challenge data, which is given to the participants for use in this challenge only. The organizers may use the data submitted to the Challenge freely, without restrictions.

Eligibility

Anyone is allowed to participate. A participant may be either a single person or a group. A single person can participate in at most two groups. A participant is allowed to submit at most three different solutions, where each solution corresponds to a particular morpheme analysis method. Each of these methods may naturally be applied to each of the test languages. If a participant submits more than three solutions, the organizers decide which of the three will be accepted.

Test languages

Data sets are provided for five languages: Arabic, English, Finnish, German and Turkish. Participants are encouraged to apply their algorithm to all of these test languages, but are free to leave some languages out, if they wish to do so.

(New languages may be added, if interested co-organizers, suitable data and evaluation analyses become available in time.)

Task

The task is the unsupervised morpheme analysis of every word form contained in a word list supplied by the organizers for each test language.

The participants will be pointed to corpora in which the words occur, so that the algorithms may utilize information about word context.

Solutions, in which a large number of parameters must be "tweaked" separately for each test language, are of little interest. This challenge aims at the unsupervised (or very minimally supervised) morpheme analysis of words. The abstracts submitted by the participants must contain clear descriptions of which steps of supervision or parameter optimization are involved in the algorithms.

Competitions

The segmentations will be evaluated in two complementary ways:

Competition 1: The proposed morpheme analyses will be compared to a linguistic "gold standard".
Competition 2: Information retrieval (IR) experiments will be performed, where the words in the documents and queries will be replaced by their proposed morpheme representations. The search will then be based on morphemes instead of words.

Competition 1 will include all five test languages. Winners will be selected separately for each language. As a performance measure, the F-measure of accuracy of suggested morpheme analyses is utilized. Should two solutions produce the same F-measure, the one with higher precision will win.

Competition 2 will include three of the test languages. The organizers will perform the IR experiments based on the morpheme analyses submitted by the participants.

Workshop and publication

All good results will be acknowledged with fame and glory. Presentations for the challenge workshop will be selected by the organizers based on the results and a paper of at most 10 pages describing the algorithm and the data submission.

Workshop papers

For your paper submission (due August 1st), please use the single-column CLEF 2007 Notebook Proceedings format. Here are a sample PDF file and a template Latex document. Detailed formatting instructions can be requested from the Morpho Challenge organizers. The maximum length of your paper is 10 pages (including references and figures). Email your paper submission to the organizers.

Arbitration

In the case of disagreement the organizers will decide the final interpretation of the rules.