COURSE-AGNOSTIC EARLY IDENTIFICATION OF STUDENTS AT RISK FROM MOODLE ACTIVITY PATTERNS
NOVA IMS Information Management School, Universidade Nova de Lisboa (PORTUGAL)
About this paper:
Appears in: INTED2023 Proceedings
Publication year: 2023
Pages: 4585-4592
ISBN: 978-84-09-49026-4
ISSN: 2340-1079
doi: 10.21125/inted.2023.1207
Conference name: 17th International Technology, Education and Development Conference
Dates: 6-8 March, 2023
Location: Valencia, Spain
Abstract:
Traditional summative methods, such as final exams, are commonly used in higher education to determine whether a student has met the intended learning outcomes of a course. However, these methods are often applied at later stages of the course, leaving struggling students without the opportunity to receive feedback and change their approach in time to prevent failure. Early identification of students in need could allow educators to provide feedback and promptly develop corrective measures.

Learning management systems (LMS) have become widespread in higher education and serve as intermediaries between educational content and a diverse student population. LMS create timestamped records of all student interactions with the system, potentially providing information that can be extracted and used to identify students needing support.

This study used Moodle logs from multiple courses at a Portuguese information management school to extract features describing student behaviour in each course. These features were used to train course-agnostic predictive models for the early identification of students at risk of failing the course. Six popular machine learning classifiers were trained and evaluated using stratified 10-fold cross-validation with 30 repeats: K-Nearest Neighbors (KNN), Logistic Regression (LR), Naïve Bayes (NB), Multi-Layer Perceptron (MLP), Classification and Regression Trees (CART), and Support Vector Machines (SVM).
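The evaluation protocol described above can be sketched with scikit-learn. This is a minimal illustration, not the authors' implementation: the synthetic data, the feature scaling, the Gaussian variant of Naïve Bayes, and all hyperparameters are assumptions, since the abstract does not specify them. The function name evaluate_classifiers is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for per-student activity features (assumption):
# label 1 marks a student at risk of failing, label 0 a student who passes.
X, y = make_classification(
    n_samples=200, n_features=8, weights=[0.7, 0.3], random_state=0
)

# The six classifier families named in the abstract, with default or
# illustrative hyperparameters (assumption). Scaling is added because
# KNN, LR, MLP and SVM are sensitive to feature magnitudes.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "NB": GaussianNB(),
    "MLP": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)
    ),
    "CART": DecisionTreeClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}


def evaluate_classifiers(models, X, y, n_splits=10, n_repeats=30, seed=0):
    """Return the mean ROC AUC per model under repeated stratified k-fold CV."""
    cv = RepeatedStratifiedKFold(
        n_splits=n_splits, n_repeats=n_repeats, random_state=seed
    )
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
        for name, model in models.items()
    }


# Two repeats keep this demo quick; the study reports 30 repeats.
mean_auc = evaluate_classifiers(models, X, y, n_repeats=2)
for name, auc in sorted(mean_auc.items(), key=lambda item: -item[1]):
    print(f"{name}: mean AUC = {auc:.3f}")
```

Stratification keeps the proportion of at-risk students roughly constant across folds, and repeating the procedure with different shuffles reduces the variance of the averaged score.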

The results showed that the best of the six classifiers (SVM) achieved an average area under the receiver operating characteristic (ROC) curve (AUC) of 0.715, indicating good discriminative power while only using data collected up to the midway point of each course. Additionally, when considering other performance metrics, SVM had an average accuracy of 0.669 and correctly identified 62% of the students at risk of failing. While none of the other models reached the threshold of an AUC score equal to or greater than 0.7, three (LR, MLP, KNN) achieved an AUC score greater than 0.675.
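To make the three reported metrics concrete, the sketch below computes AUC, accuracy, and the share of at-risk students correctly identified (recall on the at-risk class) for a toy set of ten students. The labels, scores, and the 0.5 decision threshold are invented for illustration and are not data or settings from the study.

```python
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

# Toy example (assumption): 1 = at risk of failing, 0 = not at risk.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical model scores; a higher score means higher predicted risk.
risk_scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.35, 0.2, 0.2, 0.1, 0.05]

# AUC is threshold-free: the probability that a randomly chosen at-risk
# student receives a higher score than a randomly chosen student not at risk.
auc = roc_auc_score(y_true, risk_scores)

# Accuracy and recall need hard predictions; 0.5 is an illustrative threshold.
y_pred = [1 if score >= 0.5 else 0 for score in risk_scores]
accuracy = accuracy_score(y_true, y_pred)
at_risk_recall = recall_score(y_true, y_pred, pos_label=1)

print(f"AUC = {auc:.3f}")                        # 21 of 24 pairs ranked correctly: 0.875
print(f"accuracy = {accuracy:.3f}")              # 7 of 10 students classified correctly: 0.700
print(f"at-risk recall = {at_risk_recall:.3f}")  # 2 of 4 at-risk students flagged: 0.500
```

Because AUC depends only on how students are ranked, it can be high even when a particular threshold misses many at-risk students, which is why recall on the at-risk class is worth reporting alongside it.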

These findings suggest that LMS usage records can be reliable indicators of student success at the early stages of a course, enabling educators to identify struggling students and provide timely feedback that may prevent avoidable failures.
Keywords:
Student performance, Early prediction, Learning management systems, Machine learning.