A. Vista, E. Care, P. Griffin

University of Melbourne (AUSTRALIA)
This paper presents the development of a guided peer-review system whose main goal is to increase the reliability and construct validity of the process of marking complex assessment tasks. This is done by adapting a host of technologies to improve peer review, thereby facilitating an assessment system that emphasises student interaction and engagement in conjunction with the aim of increasing the efficiency of large-scale marking of complex tasks. Currently, complex tasks incur significant marking costs, which become exorbitant for courses with large numbers of students (e.g., in massive open online courses, or MOOCs).

Large-scale assessments currently depend on automated scoring systems. However, automated scoring tends to work best when the assessment is quantitative in format, where the correctness or incorrectness of a response can be explicitly defined. While there has long been evidence (e.g., Rudman, 1977) that well-constructed quantitative-format tests can measure complex cognitive processes, such tests are limited when it comes to assessing tasks that require deeper analysis, where diverse concepts need to be expounded and connected with each other, and where holistic assessment is required.

The drawback of complex and lengthy assessments is that they tend to be tedious to mark and are generally not amenable to automated scoring (e.g., Coursera has limited its auto-scoring to tasks with well-defined responses). Developing a system to score these types of complex assessments has been a challenge (Wilkowski et al., 2014). Distributed marking offers a way to handle both the volume and the complexity of these assessments. Peer grading has been found to be a viable approach to distributed marking of complex tasks, and studies have provided evidence that structured peer grading can be reliable (Sadler & Good, 2006). However, the diversity inherent in very large classes can be a weakness of peer-grading systems because it increases the chance that peer reviewers have educational backgrounds that do not match the level of the task being assessed (Pappano, 2012), raising both reliability issues and philosophical objections to having non-experts score complex tasks.

We propose a solution in which peer scoring is assisted by an automated guidance system. The interface automatically scaffolds the target paper based on predefined rubrics and taxonomy frameworks, so that relevant content and indicators of higher-level developmental skills are framed and drawn to the attention of the marker, enabling more focused targeting of the rubric criteria. This automated facilitation serves a purpose similar to manually training peer markers on the rubrics and scoring criteria prior to marking, which has been found to increase peer-marking reliability (Sadler & Good, 2006). However, the logistics of pre-training peer markers in very large classes limit the viability of that approach. Moreover, whereas pre-training provides markers with a general preparation on the task rubrics through a trial task, the automated scaffolding system that we propose guides markers as indicators for each rubric criterion are observed in actual, individual task submissions. Ultimately, we aim to establish that the scores produced are comparable to those produced by expert raters.
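The scaffolding idea can be illustrated with a minimal sketch: each rubric criterion is associated with indicator terms, and sentences of a submission that contain those terms are tagged so the marker's attention is drawn to them. Note that this is purely illustrative; the rubric names, indicator terms, and the simple keyword-matching rule are assumptions for the example, not the system's actual implementation, which may rely on richer taxonomy frameworks and text analysis.

```python
# Illustrative sketch of rubric-guided scaffolding. All criterion
# names and indicator terms below are hypothetical examples.
RUBRIC = {
    "analysis": ["because", "therefore", "implies"],
    "synthesis": ["combines", "integrates", "connects"],
}

def scaffold(submission, rubric=RUBRIC):
    """Tag each sentence with the rubric criteria whose indicator
    terms it contains, producing a guided view for the peer marker."""
    tagged = []
    for sentence in submission.split("."):
        sentence = sentence.strip()
        if not sentence:
            continue
        # Collect every criterion with at least one indicator hit.
        hits = [criterion for criterion, terms in rubric.items()
                if any(term in sentence.lower() for term in terms)]
        tagged.append((sentence, hits))
    return tagged
```

A marker's interface could then highlight sentences tagged with each criterion when that criterion is being scored, mimicking the focus that rubric pre-training would otherwise provide.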