EARLY STUDENT DETECTION FOR IMPROVEMENT OF ACADEMIC RESULTS USING MACHINE LEARNING MODELS
Rey Juan Carlos University (SPAIN)
About this paper:
Appears in: EDULEARN22 Proceedings
Publication year: 2022
Pages: 6177-6185
ISBN: 978-84-09-42484-9
ISSN: 2340-1117
doi: 10.21125/edulearn.2022.1454
Conference name: 14th International Conference on Education and New Learning Technologies
Dates: 4-6 July, 2022
Location: Palma, Spain
Abstract:
Data science is a multidisciplinary field that combines mathematics, statistics, programming skills, and domain expertise to extract valuable information from data. The present work uses the tools of Data Science to improve the academic results of students by identifying, early in the course, their characteristics and the type of motivation they need. This work belongs to the field of Learning Analytics, which seeks to enhance education through analytics tools such as Data Science.

In the present work, a complete framework is presented to collect data from students through online questionnaires conducted at the beginning of the course, to model the data with Machine Learning models, and to extract meaningful information from it. This information can highlight both students at risk of failing the course and the more influential students, who might motivate the rest of the class positively or negatively. Thanks to this framework, the teacher can identify these types of students at an early stage of the course and act accordingly, for example by personally mentoring the students at risk, by showing the utility of the course to raise the students’ interest in it, or by including more real-life activities.

At the moment, two types of questionnaires have been developed and tested:
(1) to detect students at risk of failing and
(2) to identify the influential students.

The first one is composed of questions regarding the students’ interest in the subject, their marks in previous related subjects, the estimated time devoted to study, the expected mark, etc. The second one focuses on the leadership skills of students, their participation in social networks with classmates, the average time spent talking or chatting about the subject with classmates, etc.

To get relevant information from the questionnaires, clustering models are applied since they reveal groups of data (students in this case) with similar characteristics. By analyzing the characteristics of the resulting groups, the teacher can easily identify the different types of students, their expected performance, how influential they are, etc.
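As an illustration of the clustering step, the sketch below groups a handful of questionnaire responses and prints the average profile of each group. It is only a minimal sketch under stated assumptions: the abstract does not name the clustering algorithm, so plain k-means is used here as a stand-in; the four features, their 0-10 scale, the six responses, and the choice of two groups are all hypothetical and are not data from the study.

```python
import math

# Hypothetical questionnaire responses (NOT data from the study).
# Assumed features, all on a comparable 0-10 scale:
# (interest in the subject, mark in previous related subjects,
#  weekly study hours, expected mark)
responses = [
    (9, 8.5, 10, 9),
    (8, 9.0, 8, 8),
    (9, 7.5, 9, 8),
    (2, 4.0, 1, 5),
    (3, 5.0, 2, 5),
    (1, 3.5, 1, 4),
]


def kmeans(points, k, max_iterations=100):
    """Plain k-means (Lloyd's algorithm) with deterministic seeding."""
    # Seed with the first point, then repeatedly add the point that is
    # farthest from the seeds chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iterations):
        # Assignment step: each student joins the nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


labels, centroids = kmeans(responses, k=2)

# The centroid of each group is its "average student", which is what a
# teacher would inspect to interpret the group.
for group, centroid in enumerate(centroids):
    size = labels.count(group)
    profile = ", ".join(f"{value:.1f}" for value in centroid)
    print(f"group {group}: {size} students, average profile = ({profile})")
```

With real questionnaires the answers would usually need to be rescaled to a common range before computing distances, and the number of groups would have to be chosen by inspection or with a cluster-quality measure; both choices are left open here.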

The framework has been assessed on both artificial and real data. First, different responses to both questionnaires were simulated, considering extreme situations, to check the framework. This experiment showed stable and explainable results, validating the framework. The real data come from 40 students of the subject Introduction to Computer Science of the Software Engineering Bachelor’s Degree at Rey Juan Carlos University in Spain. With the presented framework and the collected data, clearly distinct groups were formed and characterized. These results provided rich information to the teachers, which was taken into account during the course, with positive feedback from teachers and students and an improvement in the average score of the subject.

The proposed tool brings together the fields of Data Science and Education to improve the success of students. It has shown promising results on both artificial and real datasets. It is simple, easy to use, useful, non-invasive for students, applicable to all educational levels and, with minor modifications to the questionnaires, adaptable to other objectives.
Keywords:
Students profiling, Student support, Influential student, Data Science, Machine Learning.