NATURAL LANGUAGE PROCESSING FOR DATA MINING IN COMPUTER SCIENCE EDUCATION
Universidad Nacional de Educacion a Distancia (UNED) (SPAIN)
About this paper:
Conference name: 12th International Conference on Education and New Learning Technologies
Dates: 6-7 July, 2020
Location: Online Conference
Abstract:
Monitoring student performance is paramount to ensuring teaching quality. In particular, identifying the topics of a subject that students find most difficult, as reflected in their results, allows teachers to determine which aspects of the teaching need to be reinforced or require more explanation, materials, additional time, etc.
For this analysis to be meaningful, a substantial amount of data about the students' results is required. Analysing the topics of a subject requires knowing which topics are involved in each question asked to the students in the evaluation tests. From the topics and the associated student results, we can then draw conclusions about the comparative difficulty of the different topics of the subject.
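As an illustration of this last step, the following is a minimal sketch, not the procedure used in the paper. It assumes that each question is already annotated with a set of topics and with the fraction of students who answered it correctly, and it takes the difficulty of a topic to be one minus the mean correct rate of its questions. The function name, the difficulty measure, and the sample data are all hypothetical.

```python
from collections import defaultdict


def topic_difficulty(questions):
    """Rank topics from hardest to easiest.

    `questions` is a list of (topics, correct_rate) pairs, where
    `correct_rate` is the fraction of students who answered correctly.
    Assumption: difficulty of a topic = 1 - mean correct rate of the
    questions that involve it.
    """
    rates = defaultdict(list)
    for topics, correct_rate in questions:
        for topic in topics:
            rates[topic].append(correct_rate)
    difficulty = {t: 1.0 - sum(r) / len(r) for t, r in rates.items()}
    return sorted(difficulty.items(), key=lambda kv: kv[1], reverse=True)


# Made-up data, for illustration only.
sample = [
    ({"dynamic programming"}, 0.40),
    ({"greedy", "graphs"}, 0.75),
    ({"dynamic programming", "graphs"}, 0.50),
    ({"heaps"}, 0.90),
]
ranking = topic_difficulty(sample)
for topic, score in ranking:
    print(f"{topic}: {score:.2f}")
```

A question that involves several topics contributes its result to each of them, which is a simplification: it cannot tell which of the topics actually caused the difficulty.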
However, manually annotating the topics involved in the test questions is a very expensive and time-consuming task. Therefore, in this paper we propose to apply natural language processing techniques to perform this task. Specifically, we propose to apply automatic techniques for detecting key phrases in the test questions. These techniques amount to identifying the expressions that, according to statistical data, are most relevant for characterizing the texts.
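The abstract does not name the specific key phrase detection techniques. As one possible illustration of a statistical approach, the sketch below scores candidate one- and two-word phrases with TF-IDF, so that phrases frequent in one question but rare across the others rank highest. TF-IDF, the stopword list, the function names, and the sample questions are assumptions made for this example, not the authors' method.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "to", "in", "is", "and", "for",
             "with", "that", "which", "what", "does", "using", "by", "on"}


def candidates(text, max_len=2):
    """Return n-grams (1..max_len words) that do not start or end with a stopword."""
    tokens = re.findall(r"[a-z]+", text.lower())
    grams = []
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            grams.append(" ".join(gram))
    return grams


def key_phrases(questions, top_k=3):
    """Return the top_k candidate phrases of each question, ranked by TF-IDF."""
    counts = [Counter(candidates(q)) for q in questions]
    # Document frequency: number of questions in which each phrase occurs.
    doc_freq = Counter(p for c in counts for p in c)
    n_docs = len(questions)
    result = []
    for c in counts:
        scores = {p: tf * math.log(n_docs / doc_freq[p]) for p, tf in c.items()}
        # Ties are broken in favour of longer phrases, then alphabetically.
        ranked = sorted(scores, key=lambda p: (-scores[p], -len(p.split()), p))
        result.append(ranked[:top_k])
    return result


# Hypothetical test questions, for illustration only.
sample_questions = [
    "Which data structure does the branch and bound scheme use to select the next node?",
    "What is the cost of inserting an element in a heap?",
    "Which scheme solves the knapsack problem using dynamic programming?",
]
for question, phrases in zip(sample_questions, key_phrases(sample_questions)):
    print(phrases, "<-", question)
```

With so few questions many candidates receive the same score, so the ranking only becomes informative on a larger collection of tests.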
To evaluate the proposal we have focused on a subject on advanced algorithms that is taught at the National University of Distance Education of Spain as part of a Computer Science degree. This is a fundamental subject in the Computer Science curriculum, with advanced-level content that requires prior knowledge of mathematics and programming.
The subject covers advanced data structures of computer science and algorithmic schemes, i.e. general principles for addressing a problem. The data structures taught in the subject include hash tables, graphs, and heaps. The algorithmic schemes include greedy algorithms, divide and conquer, backtracking, dynamic programming, and branch and bound.
For this subject we have manually annotated a set of tests with the most relevant topics related to each question. This has allowed us to evaluate the techniques presented in this work. The techniques developed and evaluated on this subject are general and applicable to any other subject and discipline. In this way it is possible to automate the learning analysis.
Keywords:
Learning analytics, computer science education, data mining, natural language processing.