A. Deveneyns, J. Tummers

Leuven University College (BELGIUM)
There is growing concern in Flanders about the deterioration of native (written) language proficiency among young people. That concern is reflected in the Department of Education’s policy, in which language proficiency is a focal point ([1], [2], [3]). Despite recent research (e.g. [4]), there is still a lack of objective data that are gathered in a natural setting and allow detailed analyses of native adults’ language proficiency. In this paper, we outline a study of the errors in texts written in Dutch by bachelor students. We pinpoint the most acute and most frequent errors in order to develop adapted language material that bridges the gap between the actual and the desired level of proficiency.

At KHLeuven, a Flemish university college of about 6,500 students in 13 bachelor programs, a project is under way to analyze the written language proficiency of students. A corpus of 346 texts was gathered by asking students of all programs at the end of year 1 to write a 500-word persuasive text using a computer and any information they deemed useful. In that corpus, all errors were identified and coded. Starting from James’ broad definition of an error as “an unsuccessful bit of language” [5], an error coding scheme was designed that, in line with learner corpus research [6], combines linguistic information (spelling; lexicon; syntax; textual structure) and error information (erroneous use; omission; redundancy).

Based on the annotated corpus, the following research questions are addressed:

(i) What are the most frequently made errors?
(ii) What are the most typical errors?

To tackle those research questions, a quantified error taxonomy is built. The first question is answered by looking at the overall corpus frequency of the different error types; the second by looking at their document frequency, i.e. the number of texts in which an error type occurs. The combination of both measures sheds light on the distribution of errors in texts by bachelor students, as well as the extent to which those errors recur.
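
The two measures can be illustrated with a minimal Python sketch. It assumes a hypothetical representation of the annotated corpus as one list of error labels per text; the labels, the sample data and the function name `error_frequencies` are illustrative assumptions and do not reproduce the study's actual coding scheme or tooling.

```python
from collections import Counter

# Hypothetical annotated corpus: one list of error labels per student text.
# The labels are illustrative only, not the study's coding scheme.
annotated_texts = [
    ["punctuation", "punctuation", "reference", "spelling"],
    ["reference", "syntax"],
    ["punctuation", "reference", "reference", "lexicon"],
]


def error_frequencies(texts):
    """Return (corpus_freq, doc_freq) counters over error types."""
    corpus_freq = Counter()  # total occurrences across all texts
    doc_freq = Counter()     # number of texts with at least one occurrence
    for errors in texts:
        corpus_freq.update(errors)
        doc_freq.update(set(errors))  # count each type once per text
    return corpus_freq, doc_freq


corpus_freq, doc_freq = error_frequencies(annotated_texts)
for error_type, total in corpus_freq.most_common():
    share = doc_freq[error_type] / len(annotated_texts)
    print(f"{error_type}: {total} occurrences, in {share:.0%} of texts")
```

In this toy data, an error type with a high corpus frequency but a low document frequency would be concentrated in a few texts, whereas a type that scores high on both measures is widespread as well as recurrent.
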
The most widespread and recurrent errors belong to the categories textual grammar (especially referential coherence), syntax, punctuation and lexical use. Unlike spelling errors, those errors typically cause interpretative problems that interrupt the reading process. The results are the starting point of a usage-based remediation process for the students’ written language proficiency, which works by raising awareness of correct language use.

