USING EARLY CLICKSTREAM DATA TO IDENTIFY AT-RISK STUDENTS IN HIGHER EDUCATION: AN LSTM-BASED APPROACH
Universidade Nova de Lisboa, NOVA Information Management School (NOVA IMS) (PORTUGAL)
About this paper:
Appears in: EDULEARN24 Proceedings
Publication year: 2024
Pages: 6216-6225
ISBN: 978-84-09-62938-1
ISSN: 2340-1117
doi: 10.21125/edulearn.2024.1471
Conference name: 16th International Conference on Education and New Learning Technologies
Dates: 1-3 July, 2024
Location: Palma, Spain
Abstract:
Identifying students who require additional support is a challenging task for educators at the higher education level. One of the main reasons for this difficulty is that traditional assessments of student performance, such as a final exam or project, occur at the end of the course when it is often too late to implement corrective measures that could prevent a student from failing. Learning management systems (LMS) help students engage with educational content and provide a continuous flow of data that documents student-content interactions throughout the course. Although the data collected may be incomplete and noisy, it can still provide valuable insights into student behaviour and performance.

This work uses Moodle logs to create course-agnostic Long Short-Term Memory (LSTM)-based classifiers for the early identification of students at risk of failing a course. For each student, the classifiers take as input the day-wise sequence of the number of clicks in each activity type. Using data collected up to the 50th day of each course, our LSTM-based classifiers achieved an average area under the receiver operating characteristic curve (AUC) of 0.69 while identifying close to 28% of the at-risk students. Furthermore, with minor changes to the model's hyperparameters, we created a classifier that achieved a slightly lower AUC (0.67) but identified more than 50% of the at-risk students. Models trained on only the first 25 days of click sequences achieved similar recall scores, although their overall accuracy and AUC scores were lower.
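As an illustration of the input described above, the following minimal sketch turns raw click records into one day-by-activity-type count matrix per student. It is not the authors' preprocessing code: the function name `build_click_sequences`, the tuple layout of the log rows, and the three activity types are assumptions made for this example, since the abstract does not specify the Moodle log schema or the activity types used.

```python
from collections import defaultdict

# Hypothetical activity types; the abstract does not list the ones actually used.
ACTIVITY_TYPES = ("forum", "quiz", "resource")


def build_click_sequences(log_rows, n_days=50, activity_types=ACTIVITY_TYPES):
    """Return {student_id: matrix}, where matrix[d][a] is the number of clicks
    made on course day d+1 in activity type a.

    log_rows is an iterable of (student_id, course_day, activity_type) tuples,
    one per click, with course_day counted from 1 at the start of the course
    (an assumed layout, not the real Moodle log format).
    """
    column = {name: i for i, name in enumerate(activity_types)}
    sequences = defaultdict(
        lambda: [[0] * len(activity_types) for _ in range(n_days)]
    )
    for student_id, course_day, activity_type in log_rows:
        # Skip clicks outside the observation window or of an unknown type.
        if 1 <= course_day <= n_days and activity_type in column:
            sequences[student_id][course_day - 1][column[activity_type]] += 1
    return dict(sequences)


rows = [
    ("s1", 1, "forum"),
    ("s1", 1, "forum"),
    ("s1", 3, "quiz"),
    ("s2", 2, "resource"),
    ("s2", 60, "quiz"),  # falls after day 50, so it is ignored
]
seqs = build_click_sequences(rows, n_days=50)
```

Each matrix has shape `n_days x len(activity_types)`, which is the usual (time steps, features) layout expected by sequence models such as an LSTM. Passing `n_days=25` would produce the shorter observation window mentioned above. Students with no clicks in the window do not appear in the result and would need to be added as all-zero matrices from an enrolment list.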

These results suggest that our approach can help educators identify struggling students and provide them with timely feedback to prevent avoidable failures. Future research could explore how well this approach generalises to more courses and identify the contribution of each activity type to the final prediction.
Keywords:
Student Performance, Early Prediction, Learning Management Systems, Machine Learning.