WEEKLY PREDICTION OF AT-RISK STUDENTS USING DATA MINING
Eötvös Loránd University (HUNGARY)
About this paper:
Appears in: EDULEARN22 Proceedings
Publication year: 2022
Pages: 9230-9236
ISBN: 978-84-09-42484-9
ISSN: 2340-1117
doi: 10.21125/edulearn.2022.2213
Conference name: 14th International Conference on Education and New Learning Technologies
Dates: 4-6 July, 2022
Location: Palma, Spain
Abstract:
Technological advancements and the explosion of information have made online learning increasingly popular, since it enables learners to gain new skills without a physical mentor. In particular, virtual learning proved useful during the recent pandemic crisis. Despite its advantages, distance learning faces the challenge of high dropout rates and low completion rates. Educational institutions will be able to predict students’ performance and intervene in time if an analytical strategy is in place. However, efficient prediction and proactive intervention need reliable, meaningful, and accurate data. One type of tool used in online learning is the Virtual Learning Environment (VLE), which is used for teaching and content delivery. A VLE stores students’ interaction traces, assessment scores, and demographic characteristics. The increased use of such tools by students generates more data flows. Hence, learning analytics and data mining become crucial for making use of the data.

In this study, we use data from the Open University (OU), one of the largest distance learning universities worldwide. The OU data are provided as 7 separate CSV tables. The tables contain information about students’ interactions with the VLE, assessment scores, registration, etc.

VLE engagement data were provided as a summary of daily total clicks. We first integrated the VLE interaction data and transformed them from a daily basis to a weekly basis using the Python programming language, the Pandas library, and some formulas. Similarly, the assessment tables (studentAssessment and assessments) were integrated and transformed using a formula to obtain a total weekly score based on the assessments’ weights. Students who failed or withdrew are considered at-risk and labeled 1; all others are labeled 0, i.e., we treat at-risk prediction as a binary classification problem. Assessments were available only during certain weeks; their scores were added to the corresponding VLE interaction weeks.
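The preparation steps above can be sketched roughly as follows. This is a minimal illustration on toy data, not the authors' code: the column names (`id_student`, `date`, `sum_click`, `score`, `weight`, `final_result`), the week formula (`date // 7 + 1`), the weighted-score formula, and the label values `"Fail"` and `"Withdrawn"` are all assumptions, since the abstract does not state them.

```python
import pandas as pd

# Toy stand-in for the daily VLE click summaries (column names are assumptions).
daily = pd.DataFrame({
    "id_student": [1, 1, 1, 2, 2],
    "date": [0, 3, 7, 1, 13],      # assumed: day offset from the course start
    "sum_click": [4, 6, 5, 2, 8],  # total clicks recorded on that day
})

# Assumed week formula: days 0-6 -> week 1, days 7-13 -> week 2, and so on.
daily["week"] = daily["date"] // 7 + 1

# Aggregate daily clicks into one row per student and week.
weekly_clicks = (
    daily.groupby(["id_student", "week"], as_index=False)["sum_click"]
    .sum()
    .rename(columns={"sum_click": "weekly_clicks"})
)

# Toy assessment results, already joined with their weights and mapped to weeks.
scores = pd.DataFrame({
    "id_student": [1, 2],
    "week": [2, 2],
    "score": [80.0, 50.0],   # assumed scale: 0-100
    "weight": [10.0, 10.0],  # assumed: percentage weight of the assessment
})

# Assumed formula: each assessment contributes score * weight / 100.
scores["weighted_score"] = scores["score"] * scores["weight"] / 100.0
weekly_scores = scores.groupby(
    ["id_student", "week"], as_index=False
)["weighted_score"].sum()

# Attach assessment scores to the matching VLE weeks; weeks without an
# assessment keep a missing value.
weekly = weekly_clicks.merge(weekly_scores, on=["id_student", "week"], how="left")

# Binary label: 1 = at-risk (failed or withdrew), 0 = otherwise.
final = pd.DataFrame({
    "id_student": [1, 2],
    "final_result": ["Pass", "Withdrawn"],  # assumed label values
})
final["at_risk"] = final["final_result"].isin(["Fail", "Withdrawn"]).astype(int)
weekly = weekly.merge(final[["id_student", "at_risk"]], on="id_student", how="left")

print(weekly)
```

The left merge keeps every VLE week even when no assessment falls in it, which matches the statement that assessments were available only in certain weeks.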

The knowledge discovery process was followed in the paper. After preparing the final tabular dataset, feature selection techniques were applied to the VLE data, as some VLE activities are not important. Then, multiple machine learning models were applied and evaluated using F-measure and accuracy to determine the best prediction model. The results show that, using only VLE data, the best model's F-measure was about 69.4% in the first week and 71.7% in the second week. In week 15, the prediction accuracy was 74.3%. Performance was boosted in weeks where assessments were available; for example, in week 7, the accuracy was 80.4%. This indicates that assessments are more discriminative than VLE interactions. Nevertheless, VLE data can be used to predict students’ outcomes starting from the second week.
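For readers unfamiliar with the two evaluation metrics, the sketch below shows how accuracy and F-measure could be computed for a binary at-risk classifier. It assumes the F-measure is the balanced F1 score with at-risk (label 1) as the positive class, which the abstract does not specify; the labels and predictions are made-up toy values, not results from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, with 1 = at-risk as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of students whose label was predicted correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f_measure(y_true, y_pred):
    """Balanced F1 score: harmonic mean of precision and recall."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels and predictions for eight students (illustrative values only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

acc = accuracy(y_true, y_pred)
f1 = f_measure(y_true, y_pred)
print(f"accuracy={acc:.3f}, F-measure={f1:.3f}")
```

Reporting both metrics is useful when the classes are imbalanced, because accuracy alone can look high even if few at-risk students are detected.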
Keywords:
At-risk prediction, Distance Learning, Learning analytics, Educational data mining.