IMPROVING ACCESS TO EDUCATIONAL COURSES VIA AUTOMATIC MACHINE TRANSLATION - NEW DEVELOPMENTS IN POST-EDITING

J. Pietrzak1, A. Jauregi1, J. Van de Walle2, A. Eriksson3

1Eleka Ingeniaritza Linguistikoa (SPAIN)
2CrossLang (BELGIUM)
3Convertus AB (SWEDEN)
There is a continually increasing need for universities and other higher educational institutions to provide course syllabi documentation and educational information in English. Access to translated course syllabi and degree programmes plays a crucial role in how effectively these institutions attract students and, more importantly, has an impact on their international profile. Presenting all educational information in English is a major challenge for most higher educational institutions.

The regulatory environment in the context of the Bologna treaty, combined with budget constraints and limited human resources, makes it very difficult for higher educational institutions to deliver English (and Chinese) documentation, which affects their capacity to promote their services locally, regionally, nationally, and internationally. Confronted with the ECTS requirements, many of them now spend vast amounts of money and time on providing traditional human-translated documents.

As European Higher Education and European Research are two pillars of the knowledge-based society, the Bologna Translation Service (BTS) project received funding under the European Union's ICT Policy Support Programme. It aims to provide a solution to this problem by offering a low-cost, web-based, high-quality machine translation (MT) service for higher educational institutions. The first phase of the project will include the automatic translation of syllabi, study programmes, diploma supplements and student application forms from seven European languages (German, Spanish, Finnish, French, Dutch, Portuguese, and Turkish) into English, and from English into Chinese.

The BTS approach will be to integrate existing MT components into a web-based collaboration framework. The basis will be statistical MT engines for all language pairs. Baseline statistical MT systems created using the Moses toolkit will be further refined and improved by adding data from the educational domain and by applying domain adaptation together with automated and human post-editing. For a selected number of language pairs, system combination will be applied in order to further improve translation quality.

By making study programmes more accessible to potentially interested parties, BTS will help to make degrees and qualifications more visible to the labour market, identify career opportunities, and stimulate the research needed to increase European competitiveness.

In this paper, we first describe the motivating factors behind the provision of such a service. Following this, we provide an overview of the BTS framework and focus on the automatic post-editing implementation for Spanish and Portuguese, comparing rule-based post-editing of statistical MT with statistical post-editing performed over rule-based MT. Several ways of addressing the problem, and an evaluation of each, are presented.
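
To make the idea of rule-based automatic post-editing of statistical MT output more concrete, the following is a minimal sketch in Python. It is not the BTS implementation: the function name `post_edit` and every rule in `RULES` are hypothetical, chosen only to illustrate how an ordered list of rewrite rules might be applied to raw English MT output.

```python
import re
from typing import List, Tuple

# Hypothetical post-editing rules as (pattern, replacement) pairs.
# They are illustrative assumptions, not rules taken from the BTS project.
RULES: List[Tuple[re.Pattern, str]] = [
    # Remove whitespace that the MT system left before punctuation.
    (re.compile(r"\s+([,.;:!?])"), r"\1"),
    # Collapse an immediately repeated word ("the the" -> "the").
    (re.compile(r"\b(\w+)(\s+\1\b)+", re.IGNORECASE), r"\1"),
    # Normalise one domain term (assumed preferred terminology).
    (re.compile(r"\bECTS credit points\b"), "ECTS credits"),
    # Squeeze runs of spaces into a single space.
    (re.compile(r" {2,}"), " "),
]


def post_edit(sentence: str) -> str:
    """Apply each rule in order to one MT output sentence."""
    for pattern, replacement in RULES:
        sentence = pattern.sub(replacement, sentence)
    return sentence.strip()


if __name__ == "__main__":
    raw = "The the course is worth 6 ECTS credit points ."
    print(post_edit(raw))
```

In such a design the rules are applied in a fixed order, so later rules see the output of earlier ones. A statistical post-editing component, by contrast, would learn its corrections from parallel data of raw and corrected output rather than from hand-written patterns.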