
Competition 1

In Competition 1, for each language, the morpheme segmentations proposed by the participants' algorithm will be compared against a linguistic gold standard. Samples of the gold standards used are available for download on the Datasets page.

In the final evaluation, only a subset of all words in the data will be scored. The exact constitution of this subset will be revealed to the participants only after the final submission.

In the evaluation, word frequency plays no role. All words are equally important, be they frequent or rare.

The evaluation is carried out with the following program, written in Perl: evaluation.perl. Participants can download the program and evaluate their segmentations against the gold standard samples provided. The evaluation program is invoked as follows:

evaluation.perl [-trace] -desired goldstdfile -suggested yoursegmentsfile

The file containing your segmentations should contain one word per line. The word must not be preceded by a number indicating its frequency in the data. Suggested morpheme boundaries are marked with a space character. The file must be encoded in UTF-8.

Example extract of the contents of a possible segmentation file:

sea
sea bed
sea bed s
sea bird
sea board
sea board s
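As a rough illustration of this file format, the following Python sketch (it is not part of the challenge tools; the file name and messages are made up) checks that each line is a space-separated segmentation with no leading frequency count and that the file is valid UTF-8:

# check_segmentation.py -- a minimal sketch, not part of the official tools.
# Verifies one segmented word per line, no leading frequency count, UTF-8 encoding.
import sys

def check_file(path):
    with open(path, encoding="utf-8") as f:          # raises an error on invalid UTF-8
        for lineno, line in enumerate(f, start=1):
            parts = line.strip().split(" ")
            if parts == [""]:
                print(f"line {lineno}: empty line")
            elif parts[0].isdigit():
                print(f"line {lineno}: looks like a frequency count; "
                      "lines must contain only the segmented word")

if __name__ == "__main__":
    check_file(sys.argv[1])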

The -trace argument is optional; when given, it produces separate output for every evaluated word. Regardless of whether -trace is used, the evaluation program produces output of the following kind:

evaluation.perl, Wed Aug 31 16:45:40 2005
Evaluation of segmentation in file "testsegmentation.eng" against
gold standard segmentation in file "goldstdsample.eng":
Number of words in gold standard: 532 (type count)
Number of words in data set: 111476 (type count)
Number of words evaluated: 228 (0.20% of all words in data set)
Morpheme boundary detections statistics:
F-measure:  63.77%
Precision:  67.35%
Recall:     60.55%

The participant achieving the highest F-measure will be the winner of Competition 1. In case of a tie, higher precision wins. Winners will be selected separately for each language.

Evaluation measures

The evaluation is based on the placement of morpheme boundaries.

Example. Suppose that the proposed segmentations of two English words are: boule vard and cup bearer s'. The corresponding desired (gold standard) segmentations are boulevard and cup bear er s '. Taken together, the proposed segmentations contain 2 hits (correctly placed boundaries between cup and bear, as well as between er and s). There is 1 insertion (the incorrect boundary between boule and vard) and 2 deletions (the missed boundaries between bear and er, and between the plural s and the apostrophe ' marking the possessive).

Precision is the number of hits (H) divided by the sum of the number of hits and insertions (I): Precision = H/(H+I).

Recall is the number of hits divided by the sum of the number of hits and deletions (D): Recall = H/(H+D).

F-Measure is the harmonic mean of precision and recall, which equals: F-Measure = 2H/(2H+I+D).
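The following Python sketch illustrates these definitions for a single word by comparing boundary positions (character offsets in the unsegmented word). It is only an illustration, not the official evaluation.perl scoring code, and the conventions for degenerate cases (no boundaries at all) are assumptions:

# Illustration of the definitions above (not the official evaluation.perl).
def boundaries(segmentation):
    """Return the set of character offsets where boundaries are placed,
    e.g. "cup bearer s'" -> {3, 9}."""
    positions, offset = set(), 0
    for morph in segmentation.split(" ")[:-1]:
        offset += len(morph)
        positions.add(offset)
    return positions

def score(proposed, desired):
    p, d = boundaries(proposed), boundaries(desired)
    hits = len(p & d)           # correctly placed boundaries (H)
    insertions = len(p - d)     # proposed boundaries not in the gold standard (I)
    deletions = len(d - p)      # gold-standard boundaries that were missed (D)
    precision = hits / (hits + insertions) if hits + insertions else 1.0
    recall = hits / (hits + deletions) if hits + deletions else 1.0
    f_measure = 2 * hits / (2 * hits + insertions + deletions) if hits else 0.0
    return hits, insertions, deletions, precision, recall, f_measure

# The example from the text: 2 hits, 1 insertion and 2 deletions in total.
print(score("boule vard", "boulevard"))           # 0 hits, 1 insertion, 0 deletions
print(score("cup bearer s'", "cup bear er s '"))  # 2 hits, 0 insertions, 2 deletions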

Comments

In many cases, it is difficult to come up with one single correct morpheme segmentation. However, in Competition 1 we will use the provided gold standard as the only correct answer. For some words, there are multiple interpretations in the gold standard. All of them are considered correct, and the alternative that provides the best alignment against the proposed segmentation is chosen.
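As a sketch of how such alternatives might be handled (the exact logic in evaluation.perl may differ), each alternative can be scored against the proposed segmentation and the best-aligned one kept, here taken to mean the one with the highest F-measure; the example reuses the score function from the previous sketch, and the word is hypothetical:

# Sketch: choose the gold-standard alternative that aligns best with the proposal.
def best_alternative(proposed, gold_alternatives):
    return max(gold_alternatives,
               key=lambda gold: score(proposed, gold)[5])  # index 5 = F-measure

print(best_alternative("dream t", ["dream t", "dreamt"]))  # -> "dream t"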

Note also that we are looking for a surface-level segmentation. That is, the segmented word form must contain exactly the same letters as the original, unsplit word. Linguistically, one may want to split, e.g., the English word "invited" into "invite+ed". Nonetheless, in this competition, inserted (or dropped or altered) letters are not allowed. (The desired segmentation in this particular example is "invit+ed".)
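A simple way to check this surface-level constraint is to verify that removing the boundary markers restores the original word exactly; a minimal sketch (the function name is made up):

# Sketch: a surface-level segmentation must contain exactly the same letters
# as the unsplit word, so deleting the boundary spaces must restore it.
def is_surface_segmentation(word, segmentation):
    return segmentation.replace(" ", "") == word

print(is_surface_segmentation("invited", "invit ed"))   # True
print(is_surface_segmentation("invited", "invite ed"))  # False: a letter was inserted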

Competition 2

Competition 2 does not require any extra effort from the participants. The organizers will use the segmentations provided by the participants to segment the words in a large corpus of Finnish text (possibly other languages as well). An n-gram language model will be trained on this segmented corpus, and the language model will be used in a speech recognition experiment.

The winner of Competition 2 is the participant whose segmentation produces the lowest phoneme error rate in speech recognition. The phoneme error rate is calculated as the sum of the number of substituted, inserted, and deleted phonemes divided by the number of phonemes in the correct transcription of the data.
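This kind of error rate can be illustrated with a standard edit-distance computation; the following Python sketch (not the organizers' actual scoring tool) finds the minimal number of substitutions, insertions, and deletions between a recognized phoneme sequence and the correct transcription and divides by the length of the correct transcription:

# Sketch of a phoneme error rate computation (not the organizers' scoring tool).
def phoneme_error_rate(correct, recognized):
    n, m = len(correct), len(recognized)
    # dist[i][j] = edit distance between correct[:i] and recognized[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if correct[i - 1] == recognized[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[n][m] / n

# Hypothetical example with phonemes written as single characters:
print(phoneme_error_rate(list("kissa"), list("kisa")))  # 1 deletion / 5 phonemes = 0.2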
