QUANTITATIVE ANALYSIS OF TEXTS WRITTEN BY LANGUAGE LEARNERS
University of South Bohemia in Ceske Budejovice, Faculty of Arts (CZECH REPUBLIC)
About this paper:
Appears in: EDULEARN21 Proceedings
Publication year: 2021
Pages: 8059-8063
ISBN: 978-84-09-31267-2
ISSN: 2340-1117
doi: 10.21125/edulearn.2021.1636
Conference name: 13th International Conference on Education and New Learning Technologies
Dates: 5-6 July, 2021
Location: Online Conference
Abstract:
In this paper, we focus on the acquisition of Czech by non-native speakers. Using authentic Czech corpus data, we quantitatively analyze texts at the A2, B1 and B2 levels as defined by the Common European Framework of Reference (CEFR), and we observe the similarities and differences between them in terms of morphology, vocabulary, syntax and discourse features. Going beyond the sentence boundary and learning the rules of composing a text in a foreign language is very difficult for language learners. A text is a broad network of relations that together ensure its cohesion and coherence. We hypothesize that texts at different CEFR levels differ in parameters detectable through quantitative analysis.

In the analytical part, we examine texts of different language levels (A2, B1 and B2) and of different genres (an answer to an email vs. a reflection essay), all taken from the MERLIN corpus (https://merlin-platform.eu/). MERLIN contains texts written by learners of Czech, German or Italian across the CEFR levels (A1–C2) and thus offers representative material for our research task.

The analysis was carried out in three steps. We compare: i) A2 and B1 texts (Basic User vs. Independent User) of the same genre, the answer to an email; ii) B1 and B2 texts (two subclasses within the Independent User level), both of the reflection essay genre; and iii) B1 texts of two different genres, the answer to an email vs. the reflection essay.

For our analysis, we use the QuitaUp application (https://www.korpus.cz/quitaup/), which enables quantitative analysis of user-supplied texts. We focus on text creation, specifically on the following indicators: tokens, types, hapaxes (words used only once in the text), verb distance, activity, descriptivity, average token length, moving average type-token ratio, and moving average morphological richness. We thus observe, for example, how texts at different CEFR levels differ in word repetition (hapaxes vs. types), in the length and complexity of sentences (verb distance), and in the ratio of various parts of speech (activity vs. descriptivity).
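To make the indicators listed above more concrete, the following Python sketch shows one way they could be computed. It is not the QuitaUp implementation. The tokenizer, the window size, the part-of-speech labels ("VERB", "ADJ") and the exact definitions of activity, descriptivity and verb distance are assumptions made for illustration, and the tool itself may define them differently. Moving average morphological richness is omitted because it requires lemmatization.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase runs of Unicode letters.
    return re.findall(r"[^\W\d_]+", text.lower())


def basic_counts(tokens):
    # Tokens, types, hapaxes and average token length for one text.
    freq = Counter(tokens)
    return {
        "tokens": len(tokens),
        "types": len(freq),
        "hapaxes": sum(1 for count in freq.values() if count == 1),
        "avg_token_length": (
            sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0
        ),
    }


def mattr(tokens, window=100):
    # Moving average type-token ratio: mean TTR over all sliding windows.
    # The window size is a free parameter; 100 is only a placeholder here.
    if not tokens:
        return 0.0
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def activity_descriptivity(pos_tags):
    # Assumed definitions: activity = V / (V + A), descriptivity = A / (V + A),
    # where V and A are the counts of verbs and adjectives.
    verbs = sum(1 for p in pos_tags if p == "VERB")
    adjectives = sum(1 for p in pos_tags if p == "ADJ")
    total = verbs + adjectives
    if total == 0:
        return 0.0, 0.0
    return verbs / total, adjectives / total


def verb_distance(pos_tags):
    # Assumed definition: mean number of tokens between consecutive verbs.
    positions = [i for i, p in enumerate(pos_tags) if p == "VERB"]
    if len(positions) < 2:
        return 0.0
    gaps = [b - a - 1 for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    sample = tokenize("Mám rád psa a psa mám doma")
    print(basic_counts(sample))
    print(mattr(sample, window=3))
```

The part-of-speech tags would have to come from an external tagger for Czech; the functions above only assume that each token has already been assigned a label.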

Based on the results, we discuss how the analyzed levels (A2, B1 and B2) differ in measurable text parameters and, at the same time, whether and how these parameters are influenced by the choice of genre (the answer to an email vs. the reflection essay). More generally, we consider how the results may contribute to a better description of the CEFR levels in the area of text creation, which may in turn be used in evaluating texts written by language learners.
Keywords:
Foreign language acquisition, discourse, text analysis, writing skills.