COMPUTERIZED ADAPTIVE ASSESSMENT IN HIGHER EDUCATION: CHALLENGES AND CHANCES
The most common means of evaluating the knowledge of large groups (university students, public exam candidates, etc.) is the traditional pen-and-paper multiple-choice exam, because it is simple to produce and to mark. This exam is based on classical test theory: all examinees are presented with the same list of questions and alternative answers, and knowledge is estimated from the number of correct answers by means of a linear model. Its main limitations are the limited precision of the results and the high cost in resources and time that the use of normative groups entails.
In recent years a new generation of multiple-choice exams has emerged that overcomes the problems posed by the traditional ones: Computerized Adaptive Tests (CATs), developed to measure ability, knowledge, attitude, personality, etc. They adapt to the ability level of each candidate, and their greatest advantages are the precision of the results, the reduction in testing time and the possibility of obtaining results immediately.
To evaluate knowledge, a CAT uses an item bank calibrated with the three-parameter (3-p) model of Item Response Theory. The test begins with an item of average difficulty, selected at random from the item bank, which the examinee must answer. This starts an iterative process in which each successive item, of varying difficulty, is chosen according to the current estimate of the examinee's ability level. The process ends when the standard error of that estimate is less than or equal to a value fixed in advance by the evaluator. In this way, although different examinees answer different items, in both number and content, the ability of every candidate is estimated with the same reliability. The results of a CAT give the evaluator the examinee's ability level, the final estimation error, the items presented and the answer given to each one.
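The three-parameter model and the standard error used in the stopping rule can be illustrated with a minimal Python sketch. It assumes the usual logistic form of the 3-p model without the scaling constant D, and a standard error derived from the summed Fisher information of the administered items; the function names and the (a, b, c) tuple layout are illustrative choices, not taken from any particular software.

```python
import math

def p_correct(theta, a, b, c):
    """3-p model: probability of a correct answer at ability theta.

    a = discrimination, b = difficulty, c = pseudo-chance level.
    Assumption: logistic form with no scaling constant (D = 1).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information contributed by one 3-p item at ability theta."""
    p = p_correct(theta, a, b, c)
    q = 1.0 - p
    return a * a * (q / p) * ((p - c) / (1.0 - c)) ** 2

def standard_error(theta, items):
    """Asymptotic standard error of the ability estimate after `items`.

    `items` is a list of (a, b, c) tuples already administered.
    """
    total = sum(item_information(theta, a, b, c) for a, b, c in items)
    return 1.0 / math.sqrt(total)
```

Under these assumptions, an item is most informative when its difficulty is close to the examinee's ability, its discrimination is high and its pseudo-chance level is low, which is why the items chosen depend on the current ability estimate.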
In this study we present the procedure we followed to produce a CAT for testing the knowledge acquired by undergraduate students of Statistics as part of the degree in Psychology. The 254 items in the bank are dichotomously scored multiple-choice questions. They were calibrated according to the 3-p model, which gives the difficulty, discrimination and pseudo-chance parameters of each item. The software used is FastTest v. 2.0. The test begins with an item of average difficulty (-1 < bj < 1), and each subsequent item is the most informative one (least standard error, maximum discrimination and least pseudo-chance) for the current estimate of ability level. The test ends when the estimation error drops to 0.30. We present an example of the application and a report on the results. In our case, we can confirm that the CAT we have produced is an appropriate and innovative way of evaluating knowledge acquired in Statistics, one that satisfies the needs arising from the new paradigms of knowledge evaluation in education.
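The administration procedure described above (random start among items with -1 < bj < 1, selection of the most informative item, stop at an error of 0.30) can be sketched as follows. This is a simulation under stated assumptions, not the FastTest implementation: the item parameters are randomly generated rather than taken from the calibrated bank, ability is estimated by expected a posteriori (EAP) with a standard normal prior, the error is the posterior standard deviation, and the `max_items` cap, `run_cat` and `simulated_answer` are hypothetical additions.

```python
import math
import random

def p_correct(theta, a, b, c):
    """3-p logistic model (assumption: no scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information of one 3-p item at ability theta."""
    p = p_correct(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

GRID = [-4.0 + 0.1 * k for k in range(81)]  # ability grid from -4 to 4

def eap_estimate(responses):
    """EAP ability estimate and posterior SD from [((a, b, c), correct), ...]."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalised standard normal prior
        for (a, b, c), correct in responses:
            p = p_correct(t, a, b, c)
            w *= p if correct else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, se_target=0.30, max_items=50, rng=random):
    """Administer items until the error reaches se_target (or a safety cap)."""
    remaining = list(bank)
    # First item: chosen at random among items of average difficulty.
    start_pool = [it for it in remaining if -1.0 < it[1] < 1.0] or remaining
    item = rng.choice(start_pool)
    responses = []
    while True:
        remaining.remove(item)
        responses.append((item, answer(item)))
        theta, se = eap_estimate(responses)
        if se <= se_target or not remaining or len(responses) >= max_items:
            return theta, se, responses
        # Next item: the most informative one at the current estimate.
        item = max(remaining, key=lambda it: item_information(theta, *it))

# Hypothetical bank of 254 items with made-up (a, b, c) parameters.
rng = random.Random(0)
bank = [(rng.uniform(0.8, 2.0), rng.uniform(-3.0, 3.0), rng.uniform(0.1, 0.25))
        for _ in range(254)]
true_theta = 0.5  # simulated examinee

def simulated_answer(item):
    return rng.random() < p_correct(true_theta, *item)

theta_hat, se, administered = run_cat(bank, simulated_answer, rng=rng)
print(f"estimate={theta_hat:.2f}  error={se:.2f}  items={len(administered)}")
```

The returned values correspond to the elements of the report mentioned above: the estimated ability level, the final estimation error, and the items presented with the answer given to each one.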