CLASSIFICATION OF SENTENCES WRITTEN BY LEARNERS OF ENGLISH BASED ON LINGUISTIC FEATURES AND LEARNER FEATURES
Teachers of English as a foreign language (EFL) have to identify sentences that contain errors made by their learners in order to teach appropriate usage. However, this identification task is time-consuming. The time and effort required can be reduced by automatically classifying sentences in terms of adequacy, defined here as the proportion of correctly used words in a sentence, that is, the number of correctly used words divided by the total number of words in the sentence.
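The adequacy measure described above can be illustrated with a minimal sketch. The function name `adequacy` and the example counts are assumptions made for illustration only; the abstract does not specify how correctly used words are counted or what threshold separates adequate from inadequate sentences.

```python
def adequacy(num_correct_words, num_words):
    """Return the share of correctly used words in one sentence.

    Hypothetical helper: the counts are assumed to come from a
    human annotator or an error-detection step not shown here.
    """
    if num_words <= 0:
        raise ValueError("a sentence must contain at least one word")
    if not 0 <= num_correct_words <= num_words:
        raise ValueError("correct-word count must lie between 0 and num_words")
    return num_correct_words / num_words


# Illustrative sentence with 10 words, 8 of them used correctly.
example_score = adequacy(8, 10)  # 0.8
```

A sentence with no errors scores 1.0 under this definition, and lower scores indicate a higher share of incorrectly used words.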
In addition to language teaching, automatic classification can also support language learning by facilitating self-learning, which plays an important role in language learning (Tsou et al. 2002). When automatic classification is implemented in a computer-assisted language learning system, the system provides classification results to EFL learners, who then only have to check the sentences classified as inadequate to find errors.
A previous approach to classification for language teaching and learning (Lee et al. 2007) estimated the adequacy of a sentence using linguistic features such as the distribution of parts of speech (POS). Another approach (Kotani et al. 2013) estimated the adequacy of a text using both linguistic features and learner features such as writing speed and the learner’s self-evaluation results.
Lee et al. (2007) showed that sentence classification was successful when based only on linguistic features. On the other hand, Kotani et al. (2013) showed that text classification achieved better results when using both linguistic and learner features. Taken together, these previous studies suggest that both linguistic features and learner features are useful for sentence classification.
Given the importance of sentence classification for language teaching and learning, and the potential advantage of learner features for sentence classification, the present study developed a sentence classification method that estimates the adequacy of sentences written by EFL learners using discriminant analysis. The explanatory variables were the linguistic and learner features used by Kotani et al. (2013).
Sentence classification with both linguistic features and learner features had higher accuracy than classification with linguistic features alone in leave-one-out cross-validation tests (k-fold cross-validation in which k equals the number of samples): 70.9% for classification with learner features and 68.1% for classification without learner features. These classification accuracies were compared using Pearson's chi-squared test with Yates' continuity correction, and a significant difference was found (Yates' chi-square=8.60, p<0.01). Given these results, although our classification method still needs to be improved, we consider this method useful.
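The comparison of the two accuracies can be illustrated with a small sketch of Pearson's chi-squared test with Yates' continuity correction for a 2x2 table of correct and incorrect classifications. The abstract does not report the number of sentences, so the counts in the example table are hypothetical and do not reproduce the study's data. The function names are also assumptions. Only the final check uses a value from the text, the reported statistic of 8.60.

```python
import math


def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square statistic for the 2x2 table [[a, b], [c, d]].

    Rows could be the two classifiers and columns the counts of
    correctly and incorrectly classified sentences (an assumed layout).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    # Continuity correction: shrink |ad - bc| by n/2, but not below zero.
    diff = max(abs(a * d - b * c) - n / 2.0, 0.0)
    return n * diff ** 2 / (row1 * row2 * col1 * col2)


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts, for illustration only (not the study's data).
chi2_example = yates_chi_square(10, 20, 20, 10)

# The statistic reported in the text, evaluated with 1 degree of freedom;
# the resulting p-value is below 0.01, consistent with the reported p<0.01.
p_reported = p_value_df1(8.60)
```

The p-value uses the closed form for one degree of freedom, which applies to any 2x2 table and avoids a dependency on a statistics library.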
Kotani, K., T. Yoshimi, & M. Uchida. 2013. Automatic Classification of Texts Written by Learners of English as a Foreign Language based on Linguistic Features and Learner Features. Proceedings of 7th International Technology, Education and Development Conference, 6305-6314.
Lee, J., M. Zhou, & X. Liu. 2007. Detection of non-native sentences using machine-translated training data. Proceedings of the 2007 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 93-96.
Tsou, W., W. Wang, & H.-Y. Li. 2002. How Computers Facilitate English Foreign Language Learners Acquire English Abstract Words. Computers & Education, 39(4), 415-428.