LEARNERSOURCING IN HUMANITIES AND SOCIAL SCIENCES
University of Zagreb, Faculty of Humanities and Social Sciences (CROATIA)
About this paper:
Conference name: 14th annual International Conference of Education, Research and Innovation
Dates: 8-9 November, 2021
Location: Online Conference
Abstract:
One of the major challenges in developing high-quality language datasets and annotations for natural language processing tasks (such as machine translation, sentiment analysis, text generation or text classification) in less-resourced languages (for which few computational data resources exist) is the large amount of financial and human resources needed to build and maintain them. In this paper, we therefore explore how learnersourcing, a form of crowdsourcing in which student-learners form the crowd and engage in a meaningful learning experience, can support the development of datasets for sentiment analysis by tapping the knowledge of students as annotators-in-training.
Learnersourcing has recently been used to transcribe speech for online lecture videos, to label sub-goals in instructional videos, to provide explanations for programming misconceptions, to create complex, peer and self-assessments in massive online classes, to generate design feedback on instructional videos and to recommend remediation in online classes. The examples from previous studies are mainly limited to courses within a single academic field.
In this paper, we present the results of learnersourcing in a multidisciplinary setting, where Linguistics students (Humanities) and Information Sciences students (Social Sciences) engaged in a sentiment annotation task. We also present the students' evaluations of the task, the platform and their own motivation.
Learnersourcing aims to transform students from passive learners into active knowledge seekers and producers who engage with various learning tasks that boost their higher-order thinking skills (HOTS). The task described in this paper was connected to the learning goals of three academic courses (Language Engineering, Translator and the Computer, and Corpus Linguistics), in which students had an opportunity to identify the features that distinguish natural language processing systems from other intelligent systems, to compare different machine translation systems and, at the same time, to reflect on semantic issues in machine translation while working on the sentiment annotations. A total of 62 students between the ages of 22 and 24 were presented with Croatian-language movie reviews from the adventure genre. These movie reviews were collected from the web portal recenzijefilmova.com. The task involved marking each sentence (~5000 sentences for each student) with one of five coarse sentiment categories (namely Positive, Negative, Neutral, Mixed and Other). The annotation campaign was carried out on an annotation platform called Inception, and each student spent an average of four hours on the task. The students also participated in a machine translation (MT) evaluation task consisting of a comprehension test and a rating of MT output, as well as in an exercise involving error quantification and classification in sample texts written in several languages and machine-translated into Croatian.
Results indicate that recruiting students as annotators-in-training for the development of language datasets in a less-resourced language benefits the students, who provide evidence of achieving their learning outcomes by engaging in dataset development and reflection. It also benefits the wider machine learning community, which gains open-source datasets for training and validating language models.
Keywords:
Learnersourcing, language datasets, natural language processing, machine translation, sentiment analysis, higher-order thinking skills.