PEER EVALUATIONS: DATA DRIVEN LEARNING OF DEBUGGING SKILLS
Emory University (UNITED STATES)
About this paper:
Appears in: EDULEARN24 Proceedings
Publication year: 2024
Pages: 7833-7841
ISBN: 978-84-09-62938-1
ISSN: 2340-1117
doi: 10.21125/edulearn.2024.1836
Conference name: 16th International Conference on Education and New Learning Technologies
Dates: 1-3 July, 2024
Location: Palma, Spain
Abstract:
Detecting and correcting errors in computer code, also known as debugging, is a fundamental skill for computer programmers. Novice programmers often find this skill difficult to learn. However, explicit and deliberate teaching of this skill is often overlooked in introductory programming courses, and students are left to learn it indirectly through trial and error while solving traditional code-writing assignments.

To address this issue, we designed and implemented a course activity named "Peer Evaluations," which helps students practice their debugging skills by exposing them to hundreds of faulty programs written by their peers.

During the semester, students regularly attempt to solve many programming problems, and all of these attempts are stored in our course submission system. Incorrect submissions are anonymized and distributed to multiple students in the form of grading tasks. Each student (peer grader) solves a random set of grading tasks through an interface that shows them the original question, the original student’s answer, a correct answer, and a grading rubric. The goal of a peer grader is to find the mistakes in the original solution, assign a score to the solution according to the rubric, and write informative feedback. Peer graders receive credit based on the accuracy of their grading, which is calculated automatically as a function of the distance between each individual score and the median of the scores assigned by the other peer graders to the same problem.

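The abstract does not give the exact credit formula, only that credit depends on the distance between a grader's score and the median of the other graders' scores. The following Python sketch shows one plausible shape for such a rule; the full-credit band, the linear falloff, and all names (grading_credit, full_credit_band, zero_credit_distance) are illustrative assumptions, not the authors' implementation.

import statistics

def grading_credit(my_score, other_scores, full_credit_band=0.5, zero_credit_distance=3.0):
    # Consensus is the median of the scores assigned by the OTHER peer graders
    # to the same submission, as described in the abstract (leave-one-out).
    consensus = statistics.median(other_scores)
    distance = abs(my_score - consensus)
    if distance <= full_credit_band:
        return 1.0   # close to consensus: full credit
    if distance >= zero_credit_distance:
        return 0.0   # far from consensus: no credit
    # Linear falloff between the two thresholds (an assumption; the abstract
    # only says credit is a function of the distance from the median).
    return 1.0 - (distance - full_credit_band) / (zero_credit_distance - full_credit_band)

# Example: a grader who gave 7 while the other graders gave 8, 8, and 9 is one
# point away from the consensus of 8 and earns partial credit.
print(grading_credit(7, [8, 8, 9]))   # 0.8
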
We implemented this activity in a large introductory programming course over multiple semesters and evaluated its accuracy on thousands of submissions. Our data showed very good internal accuracy, i.e., how much individual peer graders agree with each other, and fair external accuracy, i.e., how well peer graders agree with the teaching assistants. A further evaluation by independent judges found that, in a larger proportion of cases, the aggregate scores assigned by peer graders were labeled more accurate than the corresponding scores assigned by teaching assistants.
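The abstract does not name the statistics behind internal and external accuracy. Purely as an illustration, one could quantify them along the following lines, taking internal accuracy as agreement among the peer graders of a submission and external accuracy as agreement between the aggregate peer score and the teaching assistant's score; the metric choices and function names below are assumptions.

import statistics

def internal_agreement(peer_scores):
    # Mean absolute deviation of each peer score from the peer median:
    # smaller values mean the graders of one submission agree more.
    med = statistics.median(peer_scores)
    return sum(abs(s - med) for s in peer_scores) / len(peer_scores)

def external_agreement(peer_scores, ta_score):
    # Absolute difference between the aggregate (median) peer score and the
    # teaching assistant's score for the same submission.
    return abs(statistics.median(peer_scores) - ta_score)

# Example for one submission graded by four peers and one teaching assistant.
peers, ta = [8, 8, 9, 7], 8.5
print(internal_agreement(peers))      # 0.5
print(external_agreement(peers, ta))  # 0.5
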
Keywords:
Computer Science Education, Peer Grading, Debugging.