KSU ARABIC ENGLISH PARALLEL CORPUS FOR TRANSLATION EDUCATION: OVERVIEW AND PRELIMINARY RESULTS
Parallel corpora, that is, collections of aligned source texts and their translations in two or more languages, play a significant role in translation studies (Sinclair, 1991; Baker, 1995, 1998, 1999; Biber et al., 1998; Kennedy, 1998; Laviosa, 1998; Barlow, 2000; Bowker, 2001; Hunston, 2002; Wang Kefei et al., 2005). The benefit of using corpora in translator training is evident: corpora provide learners with authentic examples and virtually unlimited language data (Bernardini et al., 2003).
According to Pearson (2003), parallel corpora allow students to observe how translators behave when constrained by a source text, revealing the strategies those translators employ. Students can examine “how much of the material in a source text is directly transferable to the target language, how much of it needs to be adapted or localized in some way, whether any of it can, or indeed should, be omitted” (Pearson, 2003: 17).
Despite the importance of such learning resources for the education and training of translators, they are scarce for Arabic. Although a limited number of free Arabic-English parallel corpora exist (see, for example, the UN parallel corpus), their major drawback is that they are restricted to particular domains, which limits their value for Arabic translation education.
This paper presents an ongoing project, supported by the Research Center for the Humanities, Deanship of Scientific Research, King Saud University, to design and construct a balanced, representative and free-to-use Arabic-English parallel corpus (‘AEPC’). The project also involves the design and implementation of Arabic-English concordance software. The proposed corpus and its tools can be integrated into translator training institutions as an educational resource for translation studies and teaching.
The first phase of the project involves compiling high-quality translated text samples. All translations are produced by human translators; no machine-translated texts are included. The corpus will cover a wide range of text types and will be accompanied by rich metadata. The target size is at least 10 million words, with the intention of expanding the corpus in the future. Once the texts are compiled, they are aligned manually (i.e., human-aided alignment), which yields more accurate results than fully automated alignment.
The second phase of the project is to make the AEPC widely accessible to translators and language researchers by providing online access. Through a web interface, users will be able to explore the content of the AEPC in both English and Arabic by means of a bilingual concordancer.
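The implementation of the AEPC concordancer is not described here. To illustrate what a bilingual concordance does, the following Python sketch assumes the corpus is stored as sentence-aligned pairs. The `AlignedPair` record and the functions `kwic` and `bilingual_concordance` are hypothetical names, not part of the AEPC. For a query on one side of the corpus, the sketch returns keyword-in-context (KWIC) lines, each paired with the aligned translation from the other side.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """One human-aligned sentence pair (hypothetical record layout)."""
    pair_id: str
    english: str
    arabic: str


def kwic(text: str, start: int, end: int, width: int) -> tuple[str, str, str]:
    """Split text into (left context, keyword, right context) around a match."""
    left = text[max(0, start - width):start]
    right = text[end:end + width]
    return left, text[start:end], right


def bilingual_concordance(pairs, query, side="english", width=20):
    """Return (pair_id, left, hit, right, aligned_translation) for every match of
    `query` on the chosen side, using plain case-insensitive substring search."""
    if side not in ("english", "arabic"):
        raise ValueError("side must be 'english' or 'arabic'")
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    results = []
    for pair in pairs:
        source = pair.english if side == "english" else pair.arabic
        target = pair.arabic if side == "english" else pair.english
        for match in pattern.finditer(source):
            left, hit, right = kwic(source, match.start(), match.end(), width)
            results.append((pair.pair_id, left, hit, right, target))
    return results


if __name__ == "__main__":
    # Two invented example pairs, for illustration only.
    corpus = [
        AlignedPair("p1", "The library opens at nine.",
                    "تفتح المكتبة في الساعة التاسعة."),
        AlignedPair("p2", "The student borrowed a book from the library.",
                    "استعار الطالب كتابًا من المكتبة."),
    ]
    for pid, left, hit, right, target in bilingual_concordance(corpus, "المكتبة", side="arabic"):
        print(f"{pid}: {left}[{hit}]{right}  ||  {target}")
```

Plain substring matching is a simplification. A concordancer for Arabic would likely also need to normalize diacritics and letter variants and to handle attached clitics, and this sketch does not attempt any of that.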