Unsupervised Morpheme Analysis -- Morpho Challenge 2007

This is a page of the previous Morpho Challenge 2007. The current challenge is Morpho Challenge 2009.

Frequently Asked Questions


Question: The rules page on your website does not define what counts as "unsupervised learning". I suppose this means that the program cannot be explicitly given a training file containing "example answers", and that example answers cannot be hard-coded into the program either. Can you suggest a better definition?

Answer: That sounds like a good minimum requirement. Of course, one sees solutions where people make lots of "hard-coded" assumptions about word structure, e.g., stem-final vowels that can be dropped, so at some point one wonders where to draw the line between entirely unsupervised methods, minimally supervised methods, and so on. Thus, it is important that all such assumptions be explicitly mentioned when results are reported.

Question: Looking at the competition description, it seems clear that you are looking for morpheme classification (e.g., distinguishing English plural nouns from third-person-singular verbs, both of which are regularly formed by adding "s" to a stem). I cannot see how such distinctions are possible without access to word classes. However, none of the corpora you provide include POS information. Are you expecting entrants to also write a word-classification algorithm alongside their morphology analyser/classifier, or are you allowing the use of supervised taggers?

Answer: The idea is that your algorithm works in an unsupervised fashion. Maybe you will find different distributions for different stems: if "s" (in English) goes together with "ing" and "ed", you have one kind of morpheme (a verb ending); if it does not, or if it goes together with "'s", you have another kind of morpheme (a noun ending). So, you are not allowed to use supervised taggers. However, do not let this put you off. We do not know how advanced the submitted systems will be. For instance, treating every "s" alike may be quite acceptable if your system otherwise does a good job of finding word segments accurately.
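
The distributional idea in this answer could be sketched in Python roughly as follows. This is only an illustration and not a method prescribed by the challenge. The function name classify_final_s and the labels s_VERB and s_NOUN are hypothetical, and the sketch assumes that the stem is simply the word without its final "s", which ignores spelling changes such as "tries" / "trying".

```python
def classify_final_s(vocabulary):
    """Label each word ending in "s" by the other suffixes its stem takes.

    Illustrative assumption: the stem is the word minus its final "s".
    """
    vocab = set(vocabulary)
    labels = {}
    for word in sorted(vocab):
        # Skip words without a final "s", possessives, and very short words.
        if not word.endswith("s") or word.endswith("'s") or len(word) < 3:
            continue
        stem = word[:-1]
        # Verb-like evidence: the same stem also occurs with "ing" and "ed".
        if stem + "ing" in vocab and stem + "ed" in vocab:
            labels[word] = "s_VERB"
        else:
            # Otherwise (including stems seen with "'s"), treat it as noun-like.
            labels[word] = "s_NOUN"
    return labels


if __name__ == "__main__":
    words = ["walks", "walking", "walked", "cats", "cat's", "cat"]
    print(classify_final_s(words))
```

No tagger or annotated data is involved: the only evidence is which word forms co-occur in the word list. A stem that is used both as a noun and as a verb (for example "walk") would still receive a single label here, which is one reason to treat this as a starting point only.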