A PREDICTIVE MODEL OF SCHOOL FAILURE
INVALSI (ITALY)
About this paper:
Appears in: EDULEARN24 Proceedings
Publication year: 2024
Pages: 8268-8272
ISBN: 978-84-09-62938-1
ISSN: 2340-1117
doi: 10.21125/edulearn.2024.1954
Conference name: 16th International Conference on Education and New Learning Technologies
Dates: 1-3 July, 2024
Location: Palma, Spain
Abstract:
Introduction:
School failure is often understood only as early school leaving (ESL), that is, the case of a student who leaves school during the year and remains outside the education system in the following years.
A further aspect of school failure, however, is low performance in some of the basic skills, mainly Italian (reading comprehension) and Mathematics, but also English.
Over the years, INVALSI data have also revealed another phenomenon: implicit dispersion. By definition, it concerns students who, although they obtain a high school diploma, do not have the skills needed to deal easily with adult life; in short, those who leave upper secondary school with the basic skills expected at the end of lower secondary school. Unfortunately, the data show that at the end of upper secondary school this phenomenon affects just under 10% of students.

Research object and hypothesis:
The present work aims to create a model that identifies in advance the so-called "at risk" students, i.e. those students for whom the school, through the work of teachers and school leaders, can intervene in order to reverse the forecast of school failure.
A statistical model of this type makes it possible to identify, with a reasonable margin of error, the students who may fall into one of the risk categories, namely abandonment, implicit dispersion or low performance. We also intend to analyze the phenomenon from a geographical point of view, to understand whether some areas are more at risk.

Data used:
The data used in this work are the INVALSI data of three cohorts, those leaving school in 2019, 2021 and 2022. Since these are students leaving grade 13 and each student's entire school career is traced backwards, data on absences from the Ministry of Education have also been added.
All the datasets were harmonized and appended to create a single database for building the model. For each student, previous scores and all the available information on family background, geographical context and school context over time were retrieved in order to have a dataset as complete as possible. The cohorts are distinguished through the year variable.
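As an illustration only, combining the cohort datasets into a single table with a year variable could look like the sketch below. The function name `stack_cohorts`, the field names (`student_id`, `math_g10`, `absences`) and the sample values are hypothetical and do not come from the INVALSI or Ministry data; the sketch assumes each cohort is a list of per-student records that may not share exactly the same fields.

```python
def stack_cohorts(cohorts):
    """Append per-cohort student records into one table.

    cohorts: dict mapping cohort year -> list of record dicts (hypothetical layout).
    Returns a list of rows sharing the same columns, plus a "year" column.
    """
    # Harmonize: collect every column that appears in any cohort.
    columns = sorted({key for records in cohorts.values()
                      for record in records for key in record})
    stacked = []
    for year, records in sorted(cohorts.items()):
        for record in records:
            # Fields missing in a cohort are kept as None rather than guessed.
            row = {column: record.get(column) for column in columns}
            row["year"] = year  # distinguishes the cohorts after stacking
            stacked.append(row)
    return stacked


# Made-up example records for the three cohort years.
cohorts = {
    2019: [{"student_id": "a1", "math_g10": 195.0}],
    2021: [{"student_id": "b1", "math_g10": 210.5, "absences": 12}],
    2022: [{"student_id": "c1", "math_g10": 180.0, "absences": 30},
           {"student_id": "c2", "math_g10": 221.0, "absences": 4}],
}
database = stack_cohorts(cohorts)
```

Keeping missing fields as explicit `None` values leaves the decision on how to treat incomplete records to the modelling stage.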

Method:
In this work we propose an approach based on a supervised machine learning algorithm to identify students at risk of school failure. In particular, we used a Random Forest model, one of the most widely used algorithms for classification tasks. Once the model is trained on the data of the three cohorts, it can make predictions on a new dataset and thus identify students at risk. The variables used concern both the students' context data and their results in the National Surveys of previous years. Assessing the importance of these variables in the classification gives further indications of the potential causes of school failure.
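A minimal sketch of this kind of workflow, using scikit-learn's `RandomForestClassifier`, is given below. It is not the authors' implementation: the data are synthetic, the feature names (`prev_score`, `absences`, `family_background`), the label-generating rule and the hyperparameters are all assumptions made for illustration, and the outcome is simplified to a binary "at risk" label.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_students = 600

# Synthetic, hypothetical predictors (not real INVALSI variables).
prev_score = rng.normal(200, 40, n_students)         # score in an earlier survey
absences = rng.poisson(15, n_students).astype(float)  # days absent
family_background = rng.normal(0, 1, n_students)      # standardized index
feature_names = ["prev_score", "absences", "family_background"]
X = np.column_stack([prev_score, absences, family_background])

# Synthetic label: risk made to depend mostly on a low previous score.
logit = (-0.05 * (prev_score - 200) + 0.03 * (absences - 15)
         - 0.3 * family_background - 1.0)
at_risk = (rng.random(n_students) < 1 / (1 + np.exp(-logit))).astype(int)

# Hold out part of the data to stand in for "a new dataset".
X_train, X_test, y_train, y_test = train_test_split(
    X, at_risk, test_size=0.25, random_state=0, stratify=at_risk)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

predicted = model.predict(X_test)  # 1 = flagged as at risk
# Impurity-based importances; they sum to 1 across features.
importances = dict(zip(feature_names, model.feature_importances_))
```

The impurity-based importances used here are one common choice; permutation importance is an alternative that is less biased towards high-cardinality features.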

Results:
The results show that the algorithm predicts students at risk of school failure with a good level of accuracy. The classification performance metrics should be examined thoroughly before the model is used to predict potential cases of abandonment and to design improvement interventions. The analysis of the most influential variables for the classification shows that school performance in the previous surveys makes the greatest contribution to the forecast.
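To illustrate why performance metrics deserve careful reading when the at-risk group is small, the sketch below computes accuracy, precision and recall from scratch. The labels are made up for the example (one at-risk student in ten), and the function name `classification_metrics` is hypothetical; none of the numbers are results from the paper.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when nothing is predicted/actual positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels: 1 = at risk. A classifier that never flags anyone
# is right for 9 students out of 10 but misses the one who needs support.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

In this toy case accuracy is high while recall for the at-risk class is zero, which is why accuracy alone is a weak basis for planning interventions.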
Keywords:
Learning analytics, dispersion, abandonment, forecasting, machine learning.