INTEGRATING MACHINE LEARNING AND GAUSSIAN AHP TO INFORM EDUCATIONAL POLICY: AN EXPLAINABLE ALLOCATION FRAMEWORK
University of Fortaleza (BRAZIL)
About this paper:
Conference name: 20th International Technology, Education and Development Conference
Dates: 2-4 March, 2026
Location: Valencia, Spain
Abstract:
Brazil's federal financing of basic education is redistributive and supplementary, designed to equalise educational opportunities and ensure a minimum standard of quality, as monitored by the Basic Education Development Index (IDEB). IDEB guides the federal government in allocating both technical and financial resources. Which investments most efficiently improve IDEB (for example, school infrastructure, class hours, the number of teachers, or their qualifications) remains an open question. Moreover, administrators may adopt additional objectives, such as lowering dropout rates, within a multi-objective decision-making framework. Using 2023 data from the National Institute for Educational Studies and Research (INEP), covering 31,061 public schools serving grades 6 to 9, we combine machine learning with multi-criteria decision analysis (MCDA) to estimate each school's likelihood of simultaneously achieving a high IDEB and a low dropout rate. To address class imbalance and the classification bias it induces, this study applies methods at both the data and algorithm levels and evaluates classifiers with metrics better suited to imbalanced datasets, such as G-mean and Unweighted Average Recall (UAR). Two feature selection methods, Recursive Feature Elimination (RFE) and SelectFromModel, were used to obtain the best-performing subset of the original features without transformation. Three classification models were then built using Support Vector Machines (SVM), Random Forest (RF) and eXtreme Gradient Boosting (XGBoost), and SHapley Additive exPlanations (SHAP) was employed to obtain global interpretability of the models and to quantify the direction and magnitude of the influence that variables associated with high IDEB and low dropout have on model decisions. Finally, the most significant attributes were adopted as decision criteria.
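To illustrate why G-mean and UAR are preferred over plain accuracy on imbalanced data, the following minimal sketch computes both from per-class recall. The function names and the toy labels are hypothetical and are not taken from the study; the definitions used are the standard ones (UAR as the arithmetic mean of per-class recalls, G-mean as their geometric mean).

```python
import numpy as np


def per_class_recall(y_true, y_pred):
    """Recall of each class that occurs in y_true, in sorted class order."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    return np.array([np.mean(y_pred[y_true == c] == c) for c in classes])


def uar(y_true, y_pred):
    """Unweighted Average Recall: arithmetic mean of per-class recalls."""
    return float(per_class_recall(y_true, y_pred).mean())


def g_mean(y_true, y_pred):
    """G-mean: geometric mean of per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return float(np.prod(recalls) ** (1.0 / len(recalls)))


# Toy, made-up labels: 8 majority-class schools (0), 2 minority-class schools (1).
y_true = [0] * 8 + [1] * 2
always_majority = [0] * 10  # a classifier that never predicts the minority class

accuracy = float(np.mean(np.array(y_true) == np.array(always_majority)))
print(accuracy)                          # accuracy looks acceptable
print(uar(y_true, always_majority))      # UAR exposes the missed class
print(g_mean(y_true, always_majority))   # G-mean collapses to zero
```

Because both metrics weight every class equally, a classifier that ignores the minority class cannot score well on them, which is the property the abstract relies on.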
Schools were then treated as alternatives, and an MCDA method, the Gaussian Analytic Hierarchy Process (Gaussian-AHP), was applied to identify and rank schools across the country with a higher probability of achieving a high IDEB and a low dropout rate. These schools would be prioritised for resource allocation. This study differs from previous work in that its dataset focuses on the schools themselves (their infrastructure, human resources, dropout levels, student performance and other educational indicators) rather than on the socioeconomic profile of students. Furthermore, we couple machine-learning models with MCDA to estimate and rank schools by their probability of meeting predefined targets, providing an evidence base for policymaking and school leadership. The framework yielded a ranking of public schools that, based on empirical evidence, are better positioned to achieve the predetermined objectives of a high IDEB and a low dropout rate, and it gives the federal government concrete guidance on which actions to prioritise, rather than limiting its role to generic exhortations to improve IDEB scores.
Keywords:
Educational data science, student performance, school dropout, imbalanced learning, explainable AI, multi-criteria decision analysis, Gaussian-AHP.
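The Gaussian-AHP ranking step mentioned in the abstract can be sketched as follows. The abstract does not give the computational details, so this follows the method as it is commonly described in the MCDA literature: criteria weights are derived from the coefficient of variation of each normalised criterion instead of from pairwise comparisons. The function name, the three example criteria, the school values and the use of the sample standard deviation are all illustrative assumptions, not the study's data or implementation.

```python
import numpy as np


def gaussian_ahp_scores(matrix, is_cost):
    """Score alternatives (rows) over criteria (columns) with Gaussian-AHP.

    matrix  : strictly positive values, shape (n_alternatives, n_criteria)
    is_cost : one boolean per criterion, True when lower values are better
    """
    x = np.array(matrix, dtype=float)           # copy, so the input is untouched
    is_cost = np.asarray(is_cost, dtype=bool)

    # Invert cost criteria so that larger is better in every column.
    x[:, is_cost] = 1.0 / x[:, is_cost]

    # Normalise each criterion so its column sums to 1.
    norm = x / x.sum(axis=0)

    # Gaussian factor: coefficient of variation of each normalised criterion
    # (sample standard deviation is an assumption made for this sketch).
    gaussian_factor = norm.std(axis=0, ddof=1) / norm.mean(axis=0)

    # Normalised Gaussian factors act as the criteria weights.
    weights = gaussian_factor / gaussian_factor.sum()

    # Weighted sum per alternative gives the final score.
    return norm @ weights, weights


# Hypothetical schools and criteria, for illustration only:
# share of qualified teachers, infrastructure index, dropout rate (a cost).
schools = ["School A", "School B", "School C"]
matrix = [
    [0.9, 0.8, 0.02],
    [0.6, 0.5, 0.10],
    [0.7, 0.6, 0.05],
]
scores, weights = gaussian_ahp_scores(matrix, is_cost=[False, False, True])

# List schools from highest to lowest score.
for idx in np.argsort(-scores):
    print(schools[idx], round(float(scores[idx]), 3))
```

In this weighting scheme, criteria that vary more across schools receive more weight, so a criterion on which all schools are nearly identical contributes little to the ranking. In the framework described above, the criteria columns would be the attributes that SHAP identified as most significant.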