

K. Kotani1, T. Yoshimi2

1Kansai Gaidai University (JAPAN)
2Ryukoku University (JAPAN)
Recently, automatic evaluation systems for second language (L2) writing that use machine learning algorithms have been proposed (Lee et al. 2007; Sun et al. 2007; Kotani et al. 2009). Advanced learners can write sentences similar to those written by native speakers (native speaker-like sentences), whereas less advanced learners cannot. These systems therefore classify L2 learner sentences as either native speaker-like or not.
Since such evaluation systems are trained with machine learning algorithms, their classification accuracy depends on the amount of training data and on the parameter settings of the learning algorithm. It is generally assumed that classification accuracy improves as the amount of training data increases. Because it is difficult to prepare a large amount of L2 learner sentence data, we used machine translation sentences as alternative training data, following Lee et al. (2007) and Kotani et al. (2009). As for the parameter settings, unlike Kotani et al. (2009), we tuned the parameters and chose a proper setting.
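The abstract does not disclose the system's learning algorithm or features, so the training-and-tuning procedure can only be sketched in outline. The toy example below, under purely hypothetical assumptions, uses a single made-up feature score per sentence, a threshold classifier in place of the real model, and machine translation sentences standing in for the non-native class; the tuning step illustrates choosing a parameter by its accuracy on the training data.

```python
# Illustrative sketch only: the feature scores, threshold classifier,
# and data below are hypothetical stand-ins for the paper's system.

def classify(feature_value, threshold):
    """Label a sentence native speaker-like (1) if its feature
    score exceeds the threshold, otherwise not (0)."""
    return 1 if feature_value > threshold else 0

def accuracy(data, threshold):
    """Fraction of (feature_score, gold_label) pairs classified correctly."""
    correct = sum(1 for x, y in data if classify(x, threshold) == y)
    return correct / len(data)

def tune_threshold(train_data, candidates):
    """Parameter tuning: pick the threshold with the best training accuracy."""
    return max(candidates, key=lambda t: accuracy(train_data, t))

# Hypothetical training data: machine translation sentences (label 0)
# serve as the non-native class, native-corpus sentences as label 1.
train = [(0.2, 0), (0.3, 0), (0.4, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
best = tune_threshold(train, [0.1 * i for i in range(1, 10)])
print(best, accuracy(train, best))
```

A real system of this kind would replace the single score with a vector of linguistic properties and the threshold with a learned classifier, but the pipeline shape (train on substitute data, tune parameters, measure accuracy) is the same.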
In this paper, we assessed the validity of our evaluation system for L2 writing. First, we examined whether our evaluation system yields high classification accuracy. Even though it was trained on machine translation sentences, our system achieved relatively high classification accuracy (90.7%).
Second, we examined whether the rate of native speaker-like sentences increases for sentences written by advanced learners. This examination revealed that our evaluation system provided proper evaluation results. Although Lee et al. (2007) examined the classification accuracy of their evaluation system, they did not examine whether their system showed the tendency for the rate of native speaker-like sentences to increase with learner proficiency.
Finally, we compared our evaluation system with a system based on that of Lee et al. (2007). The two systems use different linguistic properties to assess the native speaker-likeness of L2 learner sentences. The experimental results showed that our evaluation system achieved higher classification accuracy than the system based on the linguistic properties proposed by Lee et al. (2007).
These results showed that (i) our evaluation system can properly classify L2 learner sentences as either native speaker-like or non-native speaker-like, and (ii) it can identify the fluency of L2 learner sentences. Hence, our evaluation system is valid as an evaluation system for L2 writing.