Charles University (CZECH REPUBLIC)
About this paper:
Appears in: EDULEARN18 Proceedings
Publication year: 2018
Pages: 1963-1970
ISBN: 978-84-09-02709-5
ISSN: 2340-1117
doi: 10.21125/edulearn.2018.0557
Conference name: 10th International Conference on Education and New Learning Technologies
Dates: 2-4 July, 2018
Location: Palma, Spain
The paper presents possibilities for improving students' writing skills through e-learning. It addresses the automated evaluation of students' essays and introduces improvements to EVALD (Evaluator of Discourse), a system that performs this task for Czech. With the implementation of new morpho-syntactic features, the system achieves a macro-averaged F-score of 0.63. In addition to the overall mark, the system now also provides detailed feedback to users. Using EVALD, the student learns the stronger and weaker aspects of his or her text in the following language fields: spelling, vocabulary, morphology, syntax, and discourse (in terms of coreference and discourse relations). The student (a learner of Czech) thus receives a detailed evaluation of his or her text. Because the system is available online, it makes practicing writing skills through e-learning easy.
The overall mark assigned to each text corresponds to one of the six levels of language proficiency defined in the Council of Europe document "Common European Framework of Reference for Languages" (CEFR): A1 to C2 (from basic language user to proficient language user).
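The six CEFR levels form an ordered scale, which is what makes a tolerance of "one level" meaningful when comparing marks. The following is a minimal Python sketch of that ordering; the helper names (`CEFR_LEVELS`, `level_distance`, `band`) are hypothetical and not part of EVALD, while the grouping into basic, independent, and proficient users follows the CEFR itself.

```python
# Hypothetical helpers; only the scale A1 < A2 < B1 < B2 < C1 < C2 and the
# A/B/C band names come from the CEFR.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]
LEVEL_INDEX = {level: i for i, level in enumerate(CEFR_LEVELS)}
CEFR_BANDS = {"A": "basic user", "B": "independent user", "C": "proficient user"}


def level_distance(a: str, b: str) -> int:
    """Number of CEFR steps between two marks, e.g. B1 vs B2 is one step."""
    return abs(LEVEL_INDEX[a] - LEVEL_INDEX[b])


def band(level: str) -> str:
    """CEFR band of a level, taken from its first letter."""
    return CEFR_BANDS[level[0]]


print(level_distance("B1", "B2"), band("C1"))  # 1 proficient user
```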

Methods – Machine learning experiments:
Machine learning experiments were conducted on the automatically pre-processed language data using the Random Forest algorithm implemented in the WEKA toolkit, with evaluation by 10-fold cross-validation. The automatic procedure aimed to imitate the evaluation of texts performed by human annotators.
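WEKA performs the cross-validation internally; the following is only a minimal Python sketch of how a stratified 10-fold split works, assuming each essay is identified by its index and labelled with its human-assigned CEFR level. The function names and the round-robin assignment are illustrative assumptions rather than WEKA's exact procedure, and the Random Forest training on each split is left out.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_k_folds(labels: Sequence[str], k: int = 10, seed: int = 1) -> List[List[int]]:
    """Deal example indices into k folds so each fold has a similar label distribution."""
    by_label: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    position = 0  # continues across labels, so fold sizes differ by at most one
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


def cross_validation_splits(
    labels: Sequence[str], k: int = 10, seed: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each fold is held out for testing exactly once."""
    folds = stratified_k_folds(labels, k, seed)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# Placeholder labels for illustration only (not the real EVALD data).
toy_labels = ["A1", "A2", "B1", "B2", "C1", "C2"] * 10
for train, test in cross_validation_splits(toy_labels):
    # A classifier (in the paper, WEKA's Random Forest) would be trained on `train`
    # and its predictions on `test` collected for the final scores.
    assert not set(train) & set(test)
```

Because every essay lands in exactly one test fold, the predictions collected over all ten folds cover the whole dataset once, which is what allows a single confusion matrix for the complete dataset.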
In the experiments, we compare extensions of the EVALD application with a baseline that follows the design of its original version. The results demonstrate that the system with enriched features reaches a macro-averaged F-score of 0.63, an improvement of 4 percentage points over the system using the original feature set. The full paper also provides a confusion matrix for Random Forest on the complete dataset. With a tolerance of one level (i.e. a one-level difference, e.g. B1 from the human annotator and B2 from EVALD, counts as agreement), the macro-averaged F-score is 0.88. The system was trained on a set of original essays written by learners of Czech with various degrees of language competence (945 texts altogether).
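The two reported scores can be made concrete with a small Python sketch. It builds a confusion matrix (rows are human marks, columns are predicted marks) and computes the macro-averaged F-score, i.e. the unweighted mean of the per-level F1 scores. The exact way the one-level tolerance was computed is not stated in this abstract; the sketch assumes that a prediction within one level of the human mark is simply counted as correct. All names and the toy marks are hypothetical.

```python
from typing import List, Sequence

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]
INDEX = {level: i for i, level in enumerate(CEFR_LEVELS)}


def confusion_matrix(gold: Sequence[str], pred: Sequence[str]) -> List[List[int]]:
    """Rows: level assigned by the human annotator; columns: level predicted by the system."""
    matrix = [[0] * len(CEFR_LEVELS) for _ in CEFR_LEVELS]
    for g, p in zip(gold, pred):
        matrix[INDEX[g]][INDEX[p]] += 1
    return matrix


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-level F1 over the levels that occur in gold or predictions."""
    labels = sorted(set(gold) | set(pred), key=INDEX.__getitem__)
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


def with_tolerance(gold: Sequence[str], pred: Sequence[str], tolerance: int = 1) -> List[str]:
    """Assumption: a prediction within `tolerance` levels of the human mark counts as correct."""
    return [g if abs(INDEX[g] - INDEX[p]) <= tolerance else p for g, p in zip(gold, pred)]


# Toy marks for illustration only (not results from the paper).
gold = ["A1", "A2", "B1", "B2", "C1", "C2"]
pred = ["A1", "B1", "B1", "B2", "C2", "C2"]
print(round(macro_f1(gold, pred), 3))                        # 0.556 (exact agreement)
print(round(macro_f1(gold, with_tolerance(gold, pred)), 3))  # 1.0 (one-level tolerance)
```

Macro-averaging gives every CEFR level the same weight regardless of how many essays it contains, so errors on rare levels lower the score as much as errors on frequent ones.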

Use of automated evaluation of essays in e-learning:
We discuss how automated evaluation of essays may help students who want to improve their writing skills through e-learning. With the Evaluator of Discourse for Czech, students can quickly and easily verify their competence in essay writing and gain useful feedback. Apart from the overall mark, the system also gives the student a further evaluation: a list of the weak and strong aspects of the text. In the paper, we illustrate the automated feedback on a sample text and demonstrate how it may be used in practice.

The full version of the paper further describes the initial human annotation of the essays (including inter-annotator agreement) and presents the data and their automatic pre-processing. It also describes the machine learning experiments with EVALD in detail and shows which language aspects (features) are most helpful for automated essay scoring. The paper also compares EVALD with similar projects for other languages. Most space is devoted to demonstrating the use of automated essay evaluation in e-learning.
Keywords: Essay scoring, automated text evaluation, e-learning, writing skills, language acquisition, Czech as a foreign language.