Translating the FINREP taxonomy using a domain-specific corpus

Mihael Arcan, Susan Marie Thomas, Derek De Brandt, Paul Buitelaar

Machine Translation Summit XIV
Conference Paper
Our research investigates the use of statistical machine translation (SMT) to translate the labels of concepts in an XBRL taxonomy. Taxonomy concepts are often given labels in only one language. To enable knowledge access across languages, such monolingual taxonomies need to be translated into other languages. The primary challenge in label translation is the highly domain-specific vocabulary. To meet this challenge, we adopted an approach based on the creation of domain-specific resources. Applying this approach to the English-to-German translation of the FINREP taxonomy showed that it significantly outperforms SMT trained on general resources.