LE GRAMMAIRIEN: A NEW GEC SYSTEM FOR FRENCH L2
Concordia University (CANADA)
About this paper:
Conference name: 20th International Technology, Education and Development Conference
Dates: 2-4 March, 2026
Location: Valencia, Spain
Abstract:
There are several grammar-checking tools for French, some of which have reached a high level of maturity, such as Antidote (see Beaudry et al., 2024) and Le Bon Patron (Nadasdi and Sinclair, 2007). However, one of the main limitations of these systems is their closed nature and reliance on a rule-based approach (Affes et al., 2023), which fails to capture the specificities of second-language writing, often described as "non-orthodox" or deviating from the norms of the target language (Lynton Parslow, 2015).
This rule-based approach does not yield a satisfactory representation of the linguistic productions of second-language learners, making it difficult to detect their errors. We believe that an approach based on large language models (LLMs) would be better suited to the needs of these learners. However, LLM-based systems generally require large amounts of parallel data, i.e., human-annotated data that includes both the errors and their corrections. Moreover, the most advanced commercial systems tend to over-correct (Bryant et al., 2023): they go beyond traditional grammatical error correction (GEC) and operate on a broader sense of the term "grammar" (Chomsky, 1965). Their conversational nature also often leads them to misinterpret interrogative or imperative sentences, treating them as requests for a response rather than as constructions to be corrected.
In French, publicly available parallel data is scarce. Usable corpora, such as Lang-8, have flaws that prevent their direct use. Other corpora, artificially generated by modifying authentic and correct productions, do not reflect learners' linguistic competence, producing errors that learners themselves would likely not make.
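To make the notion of parallel data concrete, the following is a minimal sketch of what a single record might look like and how token-level edits could be derived from it. The `ParallelExample` class, the `extract_edits` helper, and the example sentence are hypothetical illustrations and are not taken from the corpus described in this paper; the alignment relies on Python's standard `difflib` module rather than on any dedicated GEC annotation tool.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class ParallelExample:
    """One hypothetical parallel record: a learner sentence and its human correction."""
    source: str  # learner sentence, possibly containing errors
    target: str  # human-corrected version of the same sentence


def extract_edits(example: ParallelExample) -> list[tuple[int, int, str, str]]:
    """Return (start, end, original, replacement) spans over whitespace tokens.

    Token indices refer to the source sentence. This is a plain sequence
    alignment, not a linguistically informed error annotation.
    """
    src = example.source.split()
    tgt = example.target.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only spans where source and target differ
            edits.append((i1, i2, " ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return edits


# Invented example: a missing contraction of "à le" into "au".
example = ParallelExample(
    source="Je suis allé à le magasin hier .",
    target="Je suis allé au magasin hier .",
)
print(extract_edits(example))
```

Under these assumptions, the only edit found for the example is the replacement of the tokens "à le" with "au"; a sentence that needs no correction yields an empty list.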
Our project aims to create both a parallel data corpus for training and evaluation (about 25,000 examples from various error corpora, corrected and annotated by humans) and an open-source model based on Llama 3.1 to perform automatic grammatical error correction (GEC) on second-language French writing. We therefore propose a pre-trained GEC system, named Le Grammairien. During its evaluation on an independent dataset, this system outperformed both rule-based correctors and general-purpose language models not explicitly trained for grammatical error correction.
Keywords:
GEC, French L2, AI.