APPLICATION OF BAYESIAN NETWORKS FOR EXTRACTING THE STUDENT DROPOUT PROFILE IN THE COMPUTER ENGINEERING DEGREE AT THE UNIVERSITY OF CASTILLA - LA MANCHA
Universidad Castilla-La Mancha (SPAIN)
About this paper:
Appears in: ICERI2015 Proceedings
Publication year: 2015
Pages: 3711-3720
ISBN: 978-84-608-2657-6
ISSN: 2340-1095
Conference name: 8th International Conference of Education, Research and Innovation
Dates: 18-20 November, 2015
Location: Seville, Spain
Abstract:
Student dropout is a problem that significantly affects all universities, since it leads to economic losses, social problems and possible psychological problems for students. It can be measured quantitatively as the number of students who, in a given period, do not re-enroll at the university or in the academic program in which they were enrolled.

Identifying the characteristics of students who are prone to abandon their studies would help universities take steps to reduce the dropout rate. As teachers at the University of Castilla-La Mancha (UCLM), we are interested in obtaining the profile of the student who drops out of the computer engineering degree, because its dropout rate at the UCLM is high.

Data mining techniques make it possible to extract relevant knowledge from data. Among the existing techniques, those based on Bayesian networks are worth highlighting. A Bayesian network is a directed acyclic graph whose nodes represent variables and whose links encode the dependencies and independencies between those variables. These dependencies are quantified by the conditional probability of each node given its parents, and are exploited by algorithms based on Bayes' theorem. There are also learning algorithms that take a database as input and produce the graph of the network, the associated probabilities, or both at once. The main advantage of Bayesian networks is their rich semantics, which makes the results easy for the user to interpret. Furthermore, there are abduction algorithms that find the configuration of the variables that maximizes the joint probability, which can be used to obtain profiles, the main aim of our research.
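To make the idea of abduction concrete, the following is a minimal sketch, not the implementation used in the paper. It assumes a hypothetical three-variable network (`grade` and `age` as parents of `dropout`) with invented probabilities, computes the joint probability as the product of each node's conditional probability given its parents, and finds the most probable configuration by exhaustive enumeration. Real abduction algorithms avoid enumerating every configuration.

```python
from itertools import product

# Hypothetical toy network: grade -> dropout <- age.
# All variable names and probabilities are illustrative assumptions,
# not values taken from the paper.
STATES = {
    "grade": ["low", "high"],
    "age": ["young", "older"],
    "dropout": ["yes", "no"],
}
PARENTS = {"grade": [], "age": [], "dropout": ["grade", "age"]}
CPT = {
    "grade": {(): {"low": 0.6, "high": 0.4}},
    "age": {(): {"young": 0.7, "older": 0.3}},
    "dropout": {
        ("low", "young"): {"yes": 0.5, "no": 0.5},
        ("low", "older"): {"yes": 0.8, "no": 0.2},
        ("high", "young"): {"yes": 0.2, "no": 0.8},
        ("high", "older"): {"yes": 0.4, "no": 0.6},
    },
}


def joint_probability(assignment):
    """Chain rule of the network: product of P(node | parents)."""
    p = 1.0
    for var, parents in PARENTS.items():
        parent_values = tuple(assignment[q] for q in parents)
        p *= CPT[var][parent_values][assignment[var]]
    return p


def most_probable_explanation(evidence):
    """Brute-force abduction: try every completion of the evidence
    and keep the one with the highest joint probability."""
    free = [v for v in STATES if v not in evidence]
    best, best_p = None, -1.0
    for values in product(*(STATES[v] for v in free)):
        assignment = {**evidence, **dict(zip(free, values))}
        p = joint_probability(assignment)
        if p > best_p:
            best, best_p = assignment, p
    return best, best_p


# Profile of the most probable student given that dropout occurred.
profile, prob = most_probable_explanation({"dropout": "yes"})
print(profile, round(prob, 3))
```

Fixing `dropout` to `"yes"` as evidence and maximizing over the remaining variables is what turns the maximization into a profile of the student who drops out.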

Thus, this paper presents the process of identifying the student dropout profile in the Computer Engineering degrees at the UCLM by applying Bayesian networks.
The methodology consists of four stages: definition of the study population, preparation of the database, application of the specific data mining techniques, and interpretation of the results. In our case, the study population comprises all students who left the different computer engineering degrees offered by the UCLM between 2008 and 2012. In the database preparation stage, the 491 initial records were reduced to 363 after a purge process.

The K2 learning algorithm was then applied to these data to obtain a Bayesian network, on which an abduction process was performed to identify the profile sought. This profile corresponds to a man who was admitted to the Technical Engineering in Computer Systems degree with a low admission grade (range 5-6), who was between 31 and 40 years old when he dropped out, who entered the degree via first-year pre-registration and the university entrance exam (selectividad), whose parents have primary education, and who abandoned the degree after 7 to 13 years.
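As an illustration of how K2 learns a structure, the sketch below implements the usual K2 metric (in log form) and the greedy parent search over a fixed variable ordering. It is a simplified sketch under stated assumptions: the abstract does not give the variable ordering, the parent limit, or the software used, so the ordering, `max_parents`, the helper names and the toy records here are all hypothetical.

```python
from collections import Counter
from math import lgamma


def k2_log_score(data, child, parents, arity):
    """Log of the K2 metric for `child` with the given parent set:
    sum_j [ log (r-1)! - log (N_ij + r - 1)! + sum_k log N_ijk! ]."""
    r = arity[child]
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    n_ijk = Counter(
        (tuple(row[p] for p in parents), row[child]) for row in data
    )
    score = sum(lgamma(r) - lgamma(n + r) for n in n_ij.values())
    score += sum(lgamma(n + 1) for n in n_ijk.values())
    return score


def k2(data, order, arity, max_parents=2):
    """Greedy K2 search: each node may only take parents that precede
    it in `order`; add the best candidate while the score improves."""
    parents = {v: [] for v in order}
    for i, child in enumerate(order):
        best = k2_log_score(data, child, parents[child], arity)
        candidates = set(order[:i])
        while candidates and len(parents[child]) < max_parents:
            top_score, top = max(
                (k2_log_score(data, child, parents[child] + [c], arity), c)
                for c in candidates
            )
            if top_score <= best:
                break  # no candidate improves the score
            parents[child].append(top)
            candidates.remove(top)
            best = top_score
    return parents


# Hypothetical toy records: dropout copies grade, age is unrelated.
rows = [
    {"grade": g, "age": a, "dropout": g}
    for g in (0, 1) for a in (0, 1) for _ in range(5)
]
arity = {"grade": 2, "age": 2, "dropout": 2}
learned = k2(rows, ["grade", "age", "dropout"], arity)
print(learned)
```

On these toy records the search keeps `grade` as the only parent of `dropout`, because adding `age` lowers the score. The learned parent sets, together with conditional probabilities estimated from the same records, would then define the network on which abduction is run.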

The main limitation of this work is that the initial data set is small. However, this does not invalidate our research, because we have worked with the data of the entire population for the period studied. In addition, the work has served to establish the methodological basis for extending and extrapolating this study in the future.