DETECTING ESL/EFL GRAMMATICAL ERRORS BASED ON N-GRAMS AND WEB RESOURCES

M.C. Lee¹, J.W. Chang², L.C. Jade²

¹Ming Chuan University (TAIWAN)
²National Cheng Kung University, Department of Engineering Science (TAIWAN)

With the trend of globalization and the emergence of the global village, English has become the most important language in the world owing to frequent international political, economic, and cultural exchanges. Moreover, English appears in the official school curricula of more than 100 ESL/EFL (English as a Second/Foreign Language) countries, such as Germany, China, Egypt, Spain, and Russia. Grammatical knowledge of a language has long been considered the core of second language (L2) proficiency development. Analyzing learners' errors, and the contexts in which they occur, is a necessary step toward a comprehensive understanding of why ESL/EFL learners commit errors. The grammatical errors made by non-native speakers may be influenced by their mother tongue.

The following are common sentence-level errors produced by Chinese-speaking novice writers, with corrections in brackets:
1. You suit this job. [You are suited for this job.]
2. Do you watch the TV? [Are you watching TV?]
3. I have no finished my homework. [I haven’t finished my homework.]

These errors are typical examples of learners’ attempts to map Chinese syntax and meanings onto English constructions. This paper proposes an n-gram-based grammar checker. Our goal is to build a fully automatic method for detecting arbitrary grammatical errors committed by ESL/EFL learners, including sentences that contain errors of several types. The n-gram conditional probabilities are estimated by maximum likelihood estimation (MLE) from the relative frequencies of part-of-speech (POS) sequences retrieved from web resources, and language model (LM) perplexity is used to determine whether grammatical errors appear in the input sentence. Two well-known ESL/EFL speaking/writing corpora are introduced and used to test the method. We also show that, by using web-scale resources, most error types can be detected accurately.
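The detection idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the model order (trigrams), the Penn Treebank tag set, the tiny hand-written reference corpus standing in for POS sequences retrieved from the web, and the perplexity threshold of 10.0 are all hypothetical choices made for this example, and the tagging step itself is omitted. Each conditional probability is the MLE relative frequency P(t_i | t_{i-2}, t_{i-1}) = C(t_{i-2}, t_{i-1}, t_i) / C(t_{i-2}, t_{i-1}), and the perplexity of a sequence with M predicted tags is PP = exp(-(1/M) * sum of log P(t_i | t_{i-2}, t_{i-1})).

```python
import math
from collections import Counter

# Hypothetical reference POS-tag sequences (Penn Treebank tags) standing in
# for tagged web text. Real systems would use far larger counts.
REFERENCE_TAGS = [
    ["PRP", "VBP", "VBN", "IN", "DT", "NN"],   # You are suited for this job
    ["VBP", "PRP", "VBG", "NN"],               # Are you watching TV
    ["PRP", "VBP", "RB", "VBN", "PRP$", "NN"], # I have not finished my homework
    ["PRP", "VBP", "VBN", "PRP$", "NN"],       # I have finished my homework
]

N = 3  # assumed model order: trigrams


def pad(tags, n=N):
    """Add n-1 start symbols and one end symbol to a tag sequence."""
    return ["<s>"] * (n - 1) + list(tags) + ["</s>"]


def count_ngrams(corpus, n=N):
    """Count n-grams and how often each (n-1)-gram history precedes a tag."""
    ngrams, histories = Counter(), Counter()
    for tags in corpus:
        padded = pad(tags, n)
        for i in range(n - 1, len(padded)):
            gram = tuple(padded[i - n + 1 : i + 1])
            ngrams[gram] += 1
            histories[gram[:-1]] += 1
    return ngrams, histories


def mle_prob(gram, ngrams, histories):
    """MLE estimate C(history, tag) / C(history); 0.0 if the history is unseen."""
    h = histories[gram[:-1]]
    return ngrams[gram] / h if h else 0.0


def perplexity(tags, ngrams, histories, n=N):
    """Perplexity of a tag sequence; infinite if any n-gram has zero probability."""
    padded = pad(tags, n)
    log_sum, m = 0.0, 0
    for i in range(n - 1, len(padded)):
        p = mle_prob(tuple(padded[i - n + 1 : i + 1]), ngrams, histories)
        if p == 0.0:
            return math.inf
        log_sum += math.log(p)
        m += 1
    return math.exp(-log_sum / m)


def flag_error(tags, ngrams, histories, threshold=10.0):
    """Hypothetical decision rule: flag the sentence if perplexity exceeds threshold."""
    return perplexity(tags, ngrams, histories) > threshold


if __name__ == "__main__":
    ngrams, histories = count_ngrams(REFERENCE_TAGS)
    # "You are suited for this job."
    print(flag_error(["PRP", "VBP", "VBN", "IN", "DT", "NN"], ngrams, histories))  # False
    # "You suit this job."
    print(flag_error(["PRP", "VBP", "DT", "NN"], ngrams, histories))  # True
```

Under this toy model, the tag sequence of "You are suited for this job" has a perplexity of about 1.22 and is not flagged, whereas "You suit this job" contains the trigram (PRP, VBP, DT), which never occurs in the reference data; its MLE probability is therefore zero, its perplexity is infinite, and the sentence is flagged. Because unsmoothed MLE assigns zero probability to every unseen n-gram, a practical checker would likely depend on very large reference counts (such as web-scale data) or some smoothing scheme; this sketch implements neither.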