EVALUATING THE EFFECTIVENESS OF AI-BASED ESSAY GRADING TOOLS IN THE SUMMATIVE ASSESSMENT OF HIGHER EDUCATION
The Hong Kong Polytechnic University (HONG KONG)
About this paper:
Conference name: 16th annual International Conference of Education, Research and Innovation
Dates: 13-15 November, 2023
Location: Seville, Spain
Abstract:
Recently, new developments in Artificial Intelligence (AI)-related educational assessment have attracted increasing interest from educators. AI-based marking or grading tools can greatly ease teachers’ workload, especially when marking individual essays in courses that enroll thousands of university students. Compared with other assessment methods, essays are notably time-consuming to mark manually. Implementing AI algorithms for automated grading of essays not only reduces assessment time, but also provides a mechanism to test the robustness of human grading itself (Ramalingam et al., 2018). There are three categories of assessment in higher education: diagnostic assessment, formative assessment, and summative assessment. AI tools for essay grading are used more in formative and summative assessment than in diagnostic assessment. Compared with those in formative assessments, essays in summative assessments are usually longer and more challenging to mark. It is therefore imperative to understand how to use AI-based automated essay scoring (AES) tools to complete marking tasks effectively. Technology-enhanced assessment has been widely discussed in the pedagogy literature; however, research on AI-based assessment, especially AI assessment for summative essay grading in higher education, remains rare. This research-in-progress paper provides a solid theoretical foundation and a study plan to assess and compare the effectiveness of various existing AI-based essay grading tools in higher education. The AI-based essay grading tools include Intelligent Essay Assessor, Intellimetric, e-Rater, Copyleaks, progressay, and ASC. Specifically, in our research design, reflective essay assignments from a freshman course that involves thousands of university students will be used as the summative assessments. 800 student assignments with human-generated grades will be used as training data for the selected AI grading platforms.
Based on platform requirements and functions, the assessment rubrics and marking rules will be entered into each AI grading platform. Grades of another 200 student assignments generated by the selected AI graders will be used as testing data to examine the predictive accuracy and effectiveness of these AI algorithms. A high level of accuracy and effectiveness means scores closer to those of the human-marked essays. Finally, we will compare the accuracy and reliability of the grades generated by the different platforms and select the one with the highest predictive accuracy for future practice or further research. Statistics such as correlation analysis and t-tests will be used to test the differences between platforms. As a future research plan, upon selecting the best assessment platform, we intend to investigate how to generate tailored feedback for each individual student in other formative assessments. In the last stage, after gaining experience with the commercial platforms, we will further develop our own advanced AI programs to enhance the overall effectiveness of the AI-based assessment tool in both summative and formative assessments.
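The accuracy check described above can be sketched in Python. This is a minimal illustration, not the study's actual analysis code: the grade lists below are made-up placeholder values standing in for the 200 held-out human-marked assignments and one platform's AI-generated scores, and the `scipy.stats` functions are assumed to be available.

```python
# Sketch of the planned accuracy/effectiveness check for one AI grading
# platform: Pearson correlation measures how closely AI scores track human
# scores, and a paired t-test checks for systematic over- or under-scoring.
from scipy import stats

# Hypothetical grades for illustration only (the real study would use the
# 200 held-out human-marked assignments).
human_grades = [72, 65, 88, 91, 54, 77, 83, 69, 60, 95]
ai_grades = [70, 68, 85, 90, 58, 75, 80, 72, 61, 93]

# Correlation between AI and human scores (closer to 1 = more accurate).
r, r_pvalue = stats.pearsonr(human_grades, ai_grades)

# Paired t-test on the same essays' human vs. AI scores
# (a non-significant result suggests no systematic bias).
t, t_pvalue = stats.ttest_rel(human_grades, ai_grades)

print(f"Pearson r = {r:.3f} (p = {r_pvalue:.3f})")
print(f"Paired t  = {t:.3f} (p = {t_pvalue:.3f})")
```

Running this comparison once per platform would yield a correlation and bias estimate for each AI grader, allowing the platform whose scores align most closely with human marking to be selected.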
Reference:
[1] Ramalingam, V. V., Pandian, A., Chetry, P., & Nigam, H. (2018, April). Automated essay grading using machine learning algorithm. In Journal of Physics: Conference Series (Vol. 1000, No. 1, p. 012030). IOP Publishing.
Keywords:
Artificial Intelligence, essay grading, summative assessment, higher education.