An automated phonetic/phonemic transcriber supporting English, German, and Danish. It outputs transcriptions in the International Phonetic Alphabet (IPA) or in the SAMPA alphabet, which was designed for speech recognition technology.

The tool is lexicon-free and is based purely on predictive modelling derived from decision analysis. It is 100 % data-driven, i.e. the underlying decision tree has been generated automatically from data containing orthographic forms and their phonemic counterparts [1].
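
To make the idea concrete, here is a minimal sketch of how a decision tree for letter-to-sound mapping can be induced from aligned data. It is not the tool's actual implementation: the training pairs are a hypothetical, hand-aligned toy set, the feature window (previous, current, and next letter) is an assumption, and the names `features`, `build`, `predict`, and `transcribe` are made up for illustration. Splits are chosen ID3-style, by lowest remaining entropy.

```python
import math
from collections import Counter

# Toy, hand-aligned training pairs (hypothetical, NOT the tool's data).
# One phoneme symbol per letter; "_" marks a silent letter.
TRAIN = [
    ("cat",  ["k", "æ", "t"]),
    ("cot",  ["k", "ɒ", "t"]),
    ("city", ["s", "ɪ", "t", "i"]),
    ("cell", ["s", "ɛ", "l", "_"]),
]

def features(word, i):
    """Context window for letter i: (previous, current, next); '#' is a word boundary."""
    padded = "#" + word + "#"
    return (padded[i], padded[i + 1], padded[i + 2])

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def build(rows, unused):
    """Recursively build a tree. A leaf is a phoneme string; an inner node is
    a tuple (feature_index, children_by_value, majority_label)."""
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not unused:
        return majority

    def remainder(f):
        # Weighted entropy left over after splitting on feature f.
        groups = {}
        for feats, label in rows:
            groups.setdefault(feats[f], []).append(label)
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

    best = min(unused, key=remainder)
    branches = {}
    for feats, label in rows:
        branches.setdefault(feats[best], []).append((feats, label))
    rest = [f for f in unused if f != best]
    return (best, {v: build(sub, rest) for v, sub in branches.items()}, majority)

def predict(tree, feats):
    """Walk the tree; fall back to the node's majority label for unseen values."""
    while isinstance(tree, tuple):
        f, children, majority = tree
        if feats[f] not in children:
            return majority
        tree = children[feats[f]]
    return tree

def transcribe(tree, word):
    """Predict one symbol per letter and drop silent-letter markers."""
    predicted = (predict(tree, features(word, i)) for i in range(len(word)))
    return [p for p in predicted if p != "_"]

ROWS = [(features(w, i), p) for w, phones in TRAIN for i, p in enumerate(phones)]
TREE = build(ROWS, [0, 1, 2])
```

With this toy data, `transcribe(TREE, "cit")` generalises to an unseen word because the tree has learned that "c" before "i" maps to "s". A realistic system would need far more data and a prior grapheme-to-phoneme alignment step, and would probably use a wider context window.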

The transcription tool is not error-free. For "normal" native words it mostly produces correct results, but it often fails for words of foreign origin, some proper names, abbreviations, and the like. Other systems resort to hybrid solutions in which a lexicon of "exceptions" is combined with predictive mapping based on decision trees, neural networks, or similar technologies.

[1] The data-driven, predictive model is suited only for languages with alphabetic orthographies (where one grapheme largely corresponds to one phone). This excludes languages like Chinese (with a syllable-based orthography) and Hebrew (consonantal orthography). Moreover, even among languages with alphabetic orthographies, the problem of mapping graphemic symbols to phonemic ones is not equally complex. There are extremely "easy" languages like Turkish, where the problem can largely be solved simply by substituting orthographic symbols with phonemic ones without considering the context. And there are "difficult" languages like Danish, where certain historical sound changes (weakening of plosives, lowering of vowels in certain contexts, etc.) have resulted in a complex relation between orthography and pronunciation.
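
The "easy" case can be illustrated with a context-free substitution, sketched below under stated assumptions. The table `TURKISH_MAP` is a deliberately simplified, broad mapping chosen for illustration and is not taken from the tool. It assumes lowercase input and ignores "ğ", vowel length, and allophonic detail. The function name `transcribe_context_free` is hypothetical.

```python
# Simplified, illustrative letter-to-phoneme table for Turkish (an assumption,
# not the tool's data). Letters not listed are passed through unchanged.
TURKISH_MAP = {
    "c": "dʒ",
    "ç": "tʃ",
    "ş": "ʃ",
    "j": "ʒ",
    "ı": "ɯ",
    "ö": "ø",
    "ü": "y",
    "y": "j",
}

def transcribe_context_free(word, table=TURKISH_MAP):
    """Map each letter independently of its neighbours.

    The word is processed in a single pass, one character at a time, so an
    output symbol is never re-mapped (e.g. "ü" -> "y" is not turned into "j").
    Input is assumed to be lowercase already.
    """
    return "".join(table.get(ch, ch) for ch in word)

print(transcribe_context_free("çay"))
print(transcribe_context_free("şeker"))
```

No comparable table would work for Danish, because there the same letter maps to different sounds depending on its neighbours. That is why a context-sensitive model such as a decision tree is needed.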