IMPLEMENTATION AND EVALUATION OF NEW FEATURES AIMED AT IMPROVING THE ACCURACY OF A CEFR-ALIGNED CAN-DO STATEMENTS AUTOMATIC ESTIMATION SYSTEM
Shizuoka University (JAPAN)
About this paper:
Conference name: 20th International Technology, Education and Development Conference
Dates: 2-4 March, 2026
Location: Valencia, Spain
Abstract:
The CEFR, or the Common European Framework of Reference for Languages, was published by the Council of Europe in 2001 as a shared framework for language education and learning across Europe. It can be applied to many languages, including English and other European languages. Although the CEFR covers skill categories such as “reading,” “writing,” “listening,” and “speaking,” the present study focuses on “reading” ability.
Each CEFR level is defined by several Can-Do Statements (hereafter CDS), which describe what learners can do in the target language. From Pre-A1 to C2, there are 38 CDS in total. However, some CEFR levels and CDS include content that is difficult even for native speakers. Therefore, in this study, we exclude the CDS belonging to C2 and C1, as well as one of the CDS assigned to level B2.
This study concerns the application of the CEFR to Japanese. At present, there are few comprehensive studies that apply the CEFR to Japanese-language education. A major reason for this is that no CEFR-aligned text corpus has been developed for Japanese. Constructing such a corpus requires a large number of example sentences, and manually assigning CDS to each sentence is extremely costly. Thus, this study aims to reduce that cost by performing automatic CDS estimation using machine learning.
To enable automatic CDS estimation, example sentences with annotations are required as training data. Therefore, we collected 555 example sentences with CDS information from ten Japanese-language educators who have expertise in the CEFR. These examples include multi-label sentences assigned more than one CDS. Approximately 87.8% of the 555 sentences have two or more CDS, and the average number of CDS per sentence is about 2.7.
We first estimate the CEFR level of each sentence. Next, we estimate the CDS belonging to the predicted CEFR level. The proposed method allows multi-label predictions for both CEFR-level estimation and CDS estimation. We use 10-fold cross-validation to compare the predicted CDS with the gold-standard CDS for the 555 sentences. Using the counts of true positives, false positives, and false negatives, we calculate precision and recall. The harmonic mean of these values, the F-score, is used as the final evaluation metric.
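The evaluation metric described above can be illustrated with a minimal sketch. It assumes that true positives, false positives, and false negatives are pooled over all sentences (micro-averaging), which the abstract does not state explicitly; the function name `micro_prf` and the CDS identifiers such as "A1-1" are hypothetical and used only for illustration.

```python
from typing import Iterable, Set, Tuple


def micro_prf(
    gold: Iterable[Set[str]], pred: Iterable[Set[str]]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) from pooled multi-label counts."""
    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, pred):
        tp += len(gold_labels & pred_labels)  # predicted and correct
        fp += len(pred_labels - gold_labels)  # predicted but not in gold
        fn += len(gold_labels - pred_labels)  # in gold but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # The F-score is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical gold-standard and predicted CDS sets for two sentences.
gold = [{"A1-1", "A1-2"}, {"B1-3"}]
pred = [{"A1-1"}, {"B1-3", "B1-4"}]
precision, recall, f_score = micro_prf(gold, pred)
```

In this toy example there are two true positives, one false positive, and one false negative, so precision, recall, and the F-score all equal 2/3. In a 10-fold cross-validation setting, the predicted sets would be those obtained for each sentence while it is in the held-out fold.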
We added the “ratio of verbal-noun constructions (sahen nouns)” as a new feature because previous analyses of Japanese language difficulty showed a large difference in this ratio between “easy Japanese” and “ordinary Japanese,” which suggests that it is a useful indicator of difficulty. Experimental results show that adding this feature improves performance for all levels except B2. Although the goal of this study was to improve overall estimation accuracy, the F-score for A1 showed only limited improvement, and that for B2 decreased. Therefore, it is necessary to consider new features that can improve accuracy for A1 and B2. Additional features, such as the passive-voice ratio, may further enhance performance. We also plan to conduct controlled experiments using the current results to determine which features contribute most to accuracy improvement.
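As a rough illustration of the sahen-noun feature, the following sketch computes the ratio from tokens that have already been morphologically analyzed. Several points are assumptions rather than details from the study: the denominator is taken to be the total number of tokens, the part-of-speech labels follow the IPA-dictionary convention in which sahen nouns are tagged 名詞 with subtype サ変接続, and the `Token` layout and the function name `sahen_noun_ratio` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical token layout: (surface form, part of speech, POS subtype).
Token = Tuple[str, str, str]


def sahen_noun_ratio(tokens: List[Token]) -> float:
    """Share of tokens tagged as sahen nouns (assumed IPAdic-style tags)."""
    if not tokens:
        return 0.0
    sahen_count = sum(
        1 for _, pos, subtype in tokens
        if pos == "名詞" and subtype == "サ変接続"
    )
    # Assumption: the ratio is taken over all tokens in the sentence.
    return sahen_count / len(tokens)


# Pre-tagged example sentence: 図書館で勉強する ("study at the library").
tokens = [
    ("図書館", "名詞", "一般"),
    ("で", "助詞", "格助詞"),
    ("勉強", "名詞", "サ変接続"),
    ("する", "動詞", "自立"),
]
ratio = sahen_noun_ratio(tokens)
```

Here one of the four tokens is a sahen noun, so the ratio is 0.25. A different tag set or denominator (for example, nouns only) would change the value, so the definition should be fixed before comparing sentences.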
Some features used in previous research reflect linguistic “specialization,” but they are currently represented as binary flags indicating presence or absence. We propose developing numerical criteria for such features, similar to the “kanji-character ratio,” and incorporating them into the system, which we expect will further contribute to improving estimation accuracy.
Keywords:
CEFR, can-do statement, machine learning, development.