AI-DRIVEN LEARNING ANALYTICS FROM MULTI-YEAR PROGRAMMING ASSIGNMENTS USING LARGE LANGUAGE MODELS
University of Alicante (SPAIN)
About this paper:
Conference name: 20th International Technology, Education and Development Conference
Dates: 2-4 March, 2026
Location: Valencia, Spain
Abstract:
This work presents a study that applies large language models (LLMs) to the large-scale, automated analysis of programming assignments submitted by students over several academic years in an introductory programming course. The objective is to extract pedagogically relevant indicators from a large corpus of historical student code in order to better understand learning patterns, common difficulties, and the progression of programming competencies across the semester.
A set of indicators was defined to capture structural, syntactic, and semantic aspects of each program, including code complexity, error patterns, use of control structures, modularity, commenting practices, memory-management strategies, and overall code organisation. An LLM-based assessment pipeline was developed to process all submissions and generate consistent analyses across cohorts. This approach enables the extraction of information that extends beyond traditional static-analysis tools, offering richer insights into students’ reasoning processes and programming habits.
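The abstract does not include implementation details, but the pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' actual system: the indicator names, the rubric prompt, and the `llm_call` parameter are all assumptions, and the model is stubbed so the example is self-contained.

```python
import json

# Illustrative subset of indicators; the study's full set also covers
# error patterns, memory management, and code organisation.
INDICATORS = ["complexity", "modularity", "commenting", "control_structures"]

def build_prompt(source_code: str) -> str:
    """Assemble a fixed rubric prompt that requests a JSON-only reply,
    so every submission is scored against the same criteria."""
    rubric = ", ".join(INDICATORS)
    return (
        "You are analysing a student program from an introductory course.\n"
        f"Rate each indicator ({rubric}) from 1 to 5 and return ONLY JSON, "
        'e.g. {"complexity": 2, ...}.\n\nCode:\n' + source_code
    )

def parse_response(raw: str) -> dict:
    """Validate the model's reply: keep only known indicators with
    integer scores in range, so downstream aggregation stays clean."""
    scores = json.loads(raw)
    return {k: v for k, v in scores.items()
            if k in INDICATORS and isinstance(v, int) and 1 <= v <= 5}

def assess(submission: str, llm_call) -> dict:
    """One pipeline step: prompt -> model -> validated indicator scores.
    `llm_call` is any callable taking a prompt string and returning text."""
    return parse_response(llm_call(build_prompt(submission)))

# Stubbed model for demonstration (a real deployment would call an LLM API):
fake_llm = lambda prompt: (
    '{"complexity": 3, "modularity": 2, "commenting": 4, "noise": 9}'
)
print(assess("int main(){return 0;}", fake_llm))
# → {'complexity': 3, 'modularity': 2, 'commenting': 4}
```

Forcing a structured, rubric-bound response and validating it before storage is what makes the per-submission analyses comparable across cohorts.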
The analyses enabled by these indicators operate at multiple levels. At the individual level, the study explores how students incorporate newly introduced concepts and how their programming style evolves throughout the course. At the group level, aggregated analytics help identify topics that systematically present greater conceptual challenges, such as pointers or dynamic memory allocation, and how these challenges manifest across different academic years. Temporal analyses explore how coding behaviours evolve month by month, identifying shifts in student engagement as the course progresses. The study also examines how the properties extracted from code submissions relate to exam performance, assessing whether patterns in programming work align with students’ broader understanding of the subject.
The proposed methodology demonstrates how LLM-based code analysis can support evidence-based improvements in teaching and curriculum design. By offering a scalable and consistent framework for analysing large volumes of student code, this work contributes to the integration of AI-powered learning analytics into computer science education. The final paper will present the full methodological details, the complete set of indicators, and an extended comparative analysis across cohorts, highlighting practical implications for instructors and course designers.
Keywords:
Learning Analytics, Artificial Intelligence, LLMs, Code Analysis, Programming Education.