CE-QUIZ: AI-ASSISTED DEBUGGING IN EXAMS
City University of Hong Kong (HONG KONG)
About this paper:
Conference name: 20th International Technology, Education and Development Conference
Dates: 2-4 March, 2026
Location: Valencia, Spain
Abstract:
Introducing Generative AI (GenAI) into large-scale computer-based programming examinations, such as the first-year undergraduate Python programming course at CityUHK with over 600 students, presents a critical challenge. While debugging with GenAI is an essential skill, unrestricted access to powerful language models risks replacing students' computational and logical thinking, as they can easily obtain complete solutions to basic programming tasks. This creates a dilemma: Although integrating learning objectives attuned to GenAI is imperative, the absence of robust assessment frameworks continues to hinder educators from endorsing its use.
To address this, we propose a controlled approach: deliberately weakening the GenAI model and leveraging our prior E-Quiz framework for self-contained, programmable exams. We introduce the GenAI-assisted collaborative E-Quiz (CE-Quiz), a method designed to train and assess computational thinking in programming and deductive reasoning in debugging. Preliminary results indicate substantial improvement: over 50 students who scored below 40% in the midterm E-Quiz on basic topics (iteration, functions) achieved average scores above 60% in the end-of-semester CE-Quiz covering advanced concepts (decorators, generators, composite data types).
Our solution incorporates a portable GenAI-assisted visual debugger we developed (demo with test code at https://yoaocopy.github.io/OPTM/). The model is purposefully weakened to ensure domain-specific security: it provides hints, but not fixes, for Python programs. This is achieved with small language models (SLMs) of 1.5B parameters, which, unlike existing approaches that require heavy infrastructure, can be easily trained and deployed. The debugger also runs in a serverless setup such as WebLLM, making it scalable to any number of users. To compensate for the reasoning limitations of SLMs, we employ a Chain-of-Thought (CoT) mechanism. Through Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO), with Qwen3-32B as the RLAIF reward model, the system is trained to pose thought-provoking questions that help students discover errors themselves, rather than jumping straight to a fix that cuts short the reasoning essential to debugging.
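The "hints, not fixes" behaviour can be illustrated with a minimal sketch: a Socratic CoT prompt for the tutor model plus a post-filter that redacts any code the model emits, so a complete solution never reaches the student. The prompt template, function names, and filtering rule here are illustrative assumptions, not the actual CE-Quiz implementation.

```python
import re

# Hypothetical prompt template: ask the SLM to reason step by step
# and respond with a guiding question only, never corrected code.
HINT_PROMPT = (
    "You are a debugging tutor. Think step by step about where the bug "
    "might be, then ask the student ONE guiding question. "
    "Never write corrected code.\n\n"
    "Student's Python program:\n{code}\n\nError message:\n{error}"
)

def build_hint_prompt(code: str, error: str) -> str:
    """Assemble the Socratic Chain-of-Thought prompt for the SLM."""
    return HINT_PROMPT.format(code=code, error=error)

def redact_fixes(reply: str) -> str:
    """Strip fenced code blocks from the model's reply as a safety net,
    so even a misbehaving model cannot hand the student a full fix."""
    return re.sub(
        r"```.*?```",
        "[code removed - try writing it yourself]",
        reply,
        flags=re.DOTALL,
    ).strip()

if __name__ == "__main__":
    reply = (
        "The loop bound looks suspicious.\n"
        "```python\nfor i in range(len(xs) - 1):\n    ...\n```\n"
        "What value does your range() stop before?"
    )
    print(redact_fixes(reply))
```

In a deployed version, `redact_fixes` would sit between the in-browser SLM (e.g. served via WebLLM) and the exam interface, complementing the fine-tuning rather than replacing it.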
Our work demonstrates an adaptable, self-contained exam platform with an AI-assisted debugging tool that encourages rather than replaces thinking. The findings suggest that a carefully designed GenAI-integrated assessment can enhance learning outcomes while preserving core problem-solving skills. The approach can be extended to non-programming courses, where SLMs can be trained and deployed to help students avoid careless mistakes and structure answers effectively, while maintaining domain-specific security.
Keywords:
GenAI, Programming course, Debugging tool, E-Quiz, Computational Thinking, Education.