Laboratory of Computer and Information Science / Neural Networks Research Centre CIS Lab Helsinki University of Technology

This is a page of the previous Morpho Challenge 2007. The current challenge is Morpho Challenge 2009.



The organizers retain all rights to the Challenge data, which is given to the participants for use in this challenge only. The organizers may use the data submitted to the Challenge freely, without restrictions.


Anyone is allowed to participate. A participant may be either a single person or a group. A single person can participate in at most two groups. A participant is allowed to submit at most three different solutions, where each solution corresponds to a particular morpheme analysis method. Each of these methods may naturally be applied to each of the test languages. If a participant submits more than three solutions, the organizers decide which three will be accepted.

Test languages

Data sets are provided for four languages: English, Finnish, German and Turkish. Participants are encouraged to apply their algorithm to all of these test languages, but are free to leave some languages out.

(New languages may be added, if interested co-organizers, suitable data and evaluation analyses become available in time.)


The task is the unsupervised morpheme analysis of every word form contained in a word list supplied by the organizers for each test language.

The participants will be pointed to corpora in which the words occur, so that the algorithms may utilize information about word context.

Solutions in which a large number of parameters must be "tweaked" separately for each test language are of little interest. This challenge aims at the unsupervised (or very minimally supervised) morpheme analysis of words. The abstracts submitted by the participants must clearly describe which steps of supervision or parameter optimization are involved in the algorithms.


The segmentations will be evaluated in two complementary ways:

Competition 1 will include all four test languages. Winners will be selected separately for each language. The performance measure is the F-measure of the accuracy of the suggested morpheme analyses. Should two solutions produce the same F-measure, the one with higher precision will win.

Competition 2 will include three of the test languages. The organizers will perform the IR experiments based on the morpheme analyses submitted by the participants.
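The Competition 1 ranking rule can be illustrated with a minimal sketch. It assumes the F-measure is the balanced harmonic mean of precision and recall, and that precision and recall have already been computed for each submission. How they are derived from the morpheme analyses is not described on this page. The `Submission` class, the function names and the numbers are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """Hypothetical record of one solution's scores for one language."""
    name: str
    precision: float
    recall: float


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (assumed): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rank(submissions):
    """Sort by F-measure, best first; equal F-measures are ordered by precision."""
    return sorted(
        submissions,
        key=lambda s: (f_measure(s.precision, s.recall), s.precision),
        reverse=True,
    )


# Made-up scores: A and B have identical F-measures, so precision decides.
entries = [
    Submission("B", precision=0.5, recall=0.8),
    Submission("A", precision=0.8, recall=0.5),
    Submission("C", precision=0.7, recall=0.7),
]
ranking = [s.name for s in rank(entries)]
print(ranking)  # ['C', 'A', 'B']
```

Since winners are selected separately for each language, such a ranking would be computed once per language.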

Workshop and publication

All good results will be acknowledged with fame and glory. Presentations for the challenge workshop will be selected by the program committee based on the results and an extended abstract of at most 6 pages.

Extended abstracts

For the extended abstract you can use the two-column ACL/COLING format (figures and tables may still span the whole width of a page). Detailed formatting instructions are available as PDF or PS. You can use the following files: LaTeX style file, LaTeX bibliography file, and a template LaTeX document. The maximum length of your paper is 6 pages (including references and figures). Email your extended abstract to the organizers by May 31.

Camera-ready submission

The final camera-ready submissions use a different format from the papers submitted for review. We are sorry about the inconvenience of having to reformat your documents. For your final paper submission (due August 15th), please use the single-column CLEF 2007 Notebook Proceedings format. Here are a sample PDF file and a template LaTeX document. Detailed formatting instructions can be requested from the Morpho Challenge Organizers. The maximum length of your paper is 10 pages (including references and figures), but you can ask for a couple of extra pages if you think they would improve the paper. Email your final paper to the organizers.


In case of disagreement, the organizers will decide the final interpretation of the rules.



Page maintained by the webmaster. Last updated Monday, 07-Apr-2008 15:09:08 EEST.