This page contains the result tables of the Morpho Challenge 2010. The full evaluation reports and the descriptions of the participating methods are published in the workshop proceedings:
Mikko Kurimo, Sami Virpioja, and Ville T. Turunen (Eds.). Proceedings of the Morpho Challenge 2010 workshop. Technical Report TKK-ICS-R37, Aalto University School of Science and Technology, Department of Information and Computer Science, Espoo, Finland, September 2010. [PDF]
One hundred randomly selected words from each submitted analysis file are available here.
The segmentation with the highest F-measure is the best.
The reference methods are the Morfessor Baseline and Morfessor Categories-MAP algorithms (see Creutz and Lagus, 2007) and letters, which simply segments each word into its individual letters. This gives the best recall available for any method based solely on segmentation.
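As a rough illustration of why the letters reference achieves the best possible recall for a pure segmentation method, consider a boundary-based precision/recall evaluation. This is only a minimal sketch, not the official Morpho Challenge evaluation script (the official evaluation compares morpheme analyses, not raw split points); the function names are our own.

```python
def boundaries(segments):
    """Internal split positions of a segmented word,
    e.g. ['talo', 'ssa'] -> {4}."""
    cuts, pos = set(), 0
    for seg in segments[:-1]:
        pos += len(seg)
        cuts.add(pos)
    return cuts

def segmentation_f_measure(predicted, gold):
    """Boundary precision, recall, and F-measure for one word."""
    p, g = boundaries(predicted), boundaries(gold)
    prec = len(p & g) / len(p) if p else 1.0
    rec = len(p & g) / len(g) if g else 1.0
    f = 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)
    return prec, rec, f
```

Segmenting "talossa" into single letters proposes every possible boundary, so it always contains the gold boundaries (recall 1.0), but at the cost of very low precision; the F-measure balances the two.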
We also provide an alternative linguistic evaluation using the EMMA method (see Spiegler and Monson, 2010). Due to the computational complexity of the method, the full Morpho Challenge test sets could not be used. Instead, the evaluation is performed by sampling 10 random subsets of 1,000 words from the test sets and calculating the scores for each subset. The result tables show the average values over the subsets.
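The subset-sampling procedure described above can be sketched as follows. This is a generic illustration, not the actual EMMA evaluation code; `score_fn` stands in for the (expensive) EMMA scorer, and the seed handling is our own assumption.

```python
import random

def sampled_average_score(test_words, score_fn,
                          n_subsets=10, subset_size=1000, seed=0):
    """Approximate an expensive evaluation metric by averaging it
    over several random subsets of the full test set."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_subsets):
        subset = rng.sample(test_words, min(subset_size, len(test_words)))
        scores.append(score_fn(subset))
    return sum(scores) / len(scores)
```

Averaging over several random subsets reduces the variance introduced by any single sample while keeping each individual evaluation tractable.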
In Competition 2, the morpheme analyses were compared by using them in an Information Retrieval (IR) task with three languages: English, German and Finnish. The experiments were performed by replacing the words in the corpus and the queries by the submitted morpheme analyses. The evaluation criterion was Mean Average Precision.
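Mean Average Precision, the evaluation criterion above, is the mean over all queries of the average precision of each ranked result list. A minimal sketch (function names and data layout are illustrative, not from the challenge's evaluation scripts):

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list: precision is
    accumulated at each rank where a relevant document appears."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Mean of per-query average precisions.
    `runs` is a list of (ranked_docs, relevant_docs) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Because average precision rewards placing relevant documents near the top of the ranking, it is sensitive to how well the morpheme analyses conflate different surface forms of the same word.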
The IR experiments were performed using the freely available LEMUR toolkit, version 4.4, with the popular Okapi BM25 ranking function. Because Okapi BM25 does not perform well on indexes that contain many very common terms, an automatic stoplist was used: any term with a collection frequency higher than 75,000 (Finnish) or 150,000 (German and English) was added to the stoplist and thus excluded from the corpus.
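The frequency-threshold stoplist described above can be sketched as a simple filter over collection frequencies (the thresholds come from the text; the function itself is an illustrative reconstruction, not the toolkit's code):

```python
from collections import Counter

def build_stoplist(corpus_terms, threshold):
    """Collect terms whose collection frequency exceeds the threshold
    (75,000 for Finnish; 150,000 for German and English in this evaluation)."""
    freq = Counter(corpus_terms)
    return {term for term, count in freq.items() if count > threshold}
```

Terms in the resulting set are dropped from the index before retrieval, which prevents over-frequent morphemes (e.g. common suffixes produced by aggressive segmentation) from dominating the BM25 scores.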
In addition to the submissions, a number of reference methods were tested.
In Competition 3, the morpheme analyses were compared by using them in two machine translation (MT) tasks: German to English and Finnish to English. The experiments were performed by replacing the words in the source language side of the parallel corpus by the submitted morpheme analyses. The target language side (English) was not modified. The final translations were obtained by Minimum Bayes Risk (MBR) combination with a standard word-based translation model. The performance was measured with BLEU scores.
Note: The results are not comparable to the results of Competition 3 in Morpho Challenge 2009 due to a different size of the translation model training data.
Page maintained by webmaster at cis.hut.fi, last updated Friday, 18-Feb-2011 17:53:47 EET