ASSESSMENT AND MEASUREMENT IN HIGHER EDUCATION WITH MULTIPLE CRITERIA, SCALES, AND RATERS: A THEORETICAL APPROACH AND A PRACTICAL EXAMPLE
1 Edumetrics R&D (GERMANY)
2 University of Northampton (UNITED KINGDOM)
About this paper:
Conference name: 18th International Technology, Education and Development Conference
Dates: 4-6 March, 2024
Location: Valencia, Spain
Abstract:
Traditional assessment and measurement (A&M) in higher education (HE) is mostly holistic (relying on the assessor’s private quality criteria) or based on computing the sum or average of scores from students’ responses to test items (using n-point scales). It is unclear how those assessment types, or a student’s scores on various scales, may be reasonably and fairly combined. Researchers have come up with statistical models to enable A&M with multiple criteria and various scales. The acceptance of such advanced models has remained minimal: they may be too complex for educational practice. They are better suited to educational research projects with sufficient resources in suitable organizational settings.
Since 2003 we have been developing, testing, and applying more adequate approaches/frameworks for A&M in HE. Our most recent system, Q, takes the challenging work practices of teachers in HE into account. Q enables A&M with any number of quality criteria (not just one), on scales of various types (instead of a fixed type of scale), and with multiple assessors responsible for distinct quality aspects. Q was designed with flexibility/adaptability in mind. In many fields of HE, it is no longer enough to test students for their factual knowledge or artificial problem-solving ability. Instead, HE is often focused on skills that must be assessed through observation of students’ behavior and/or examination of the results of long-term projects.
Our approach offers several benefits for teachers and students:
(1) Q offers a unified vocabulary for talking about assessor roles and assessment components.
(2) Q can be adapted to different curricula, courses, and classes.
(3) Q provides a simple meta-structure to specify A&M components.
(4) The components can be re-used or further specified/modified.
(5) Quality criteria in Q are not fixed. What counts is that they are strongly tied to a course-specific assignment/test, and to a measurement scale.
(6) Q allows many types and forms of scales, such as n-point scales or (signed) percentage scales.
(7) Q refrains from specifying a fixed grading scale because the latter differs substantively between countries.
There are 3 main implications for teacher training and assessment practice:
(1) Preservice teachers need up-to-date training in assessment and evaluation (A&E) methods/tools, including frameworks like Q. Once in service, it is too late to pick up the necessary vocabulary and skills.
(2) On the job, junior teachers need sufficient support from management and staff to acquire new A&E tools and to use/evaluate them in their classes.
(3) Students need instruction and training in modern A&E concepts/techniques to understand and appreciate new ways of A&M.
Q provides scoring models for instructional types not yet covered by traditional A&M, such as group work. We have used Q to offer customizable solutions for Group-Peer-Assessment (Q/GPA), i.e., deriving individual student scores from a group score and mutual peer ratings, given two parameters: the rating impact and the group spread. Q/GPA comes with a built-in test to check that the results are correctly calculated (SJI principle). Q/GPA imposes no artificial restrictions on group size, type or number of quality criteria, or other context-specific aspects of GPA. For years, Q/GPA has been used as an Excel VB application. We are now working on the integration of Q/GPA into existing GPA tools.
Keywords:
Q, score, rating, mark, grade, bounded scale, scale operation, Group Peer Assessment (GPA), group score, group spread, peer assessments, student contribution.
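Illustrative sketch (not part of the original abstract): the abstract describes Q/GPA as deriving individual scores from a group score and mutual peer ratings, given a rating impact and a group spread, but it does not state the formula. The Python below is therefore only one plausible reading under explicit assumptions: each student's offset from the group score is the rating impact times the group spread times the student's normalized deviation from the mean peer rating, and results are clipped to a bounded scale. The function names, the linear model, the 0 to 100 bounds, and the mean-preservation check are all hypothetical; in particular, the check is a generic sanity test and is not the SJI principle mentioned in the abstract.

```python
from statistics import mean


def gpa_scores(group_score, peer_ratings, rating_impact, group_spread,
               lo=0.0, hi=100.0):
    """Hypothetical group-peer-assessment scoring (assumed linear model).

    peer_ratings maps each student to the list of ratings received from peers.
    """
    # Mean rating each student received from the other group members.
    received = {s: mean(r) for s, r in peer_ratings.items()}
    overall = mean(received.values())
    # Largest deviation from the group mean, used for normalization.
    max_dev = max(abs(v - overall) for v in received.values())

    scores = {}
    for student, value in received.items():
        # Relative standing in [-1, 1]; zero if all students are rated equally.
        rel = 0.0 if max_dev == 0 else (value - overall) / max_dev
        raw = group_score + rating_impact * group_spread * rel
        # Keep the result on the bounded scale [lo, hi].
        scores[student] = min(hi, max(lo, raw))
    return scores


def mean_is_preserved(scores, group_score, tol=1e-9):
    """Assumed sanity check: individual scores average to the group score.

    This holds for the unclipped model; clipping at lo/hi can break it.
    """
    return abs(mean(scores.values()) - group_score) <= tol


ratings = {"A": [4, 5], "B": [3, 3], "C": [1, 2]}
scores = gpa_scores(70, ratings, rating_impact=0.5, group_spread=20)
print(scores)  # {'A': 80.0, 'B': 70.0, 'C': 60.0}
print(mean_is_preserved(scores, 70))  # True
```

In this sketch, a rating impact of 0 gives every student the group score, while larger values of the two parameters widen the range of individual scores around it.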