THE USE OF AI IN EDUCATIONAL STANDARDISED TESTING: A SCOPING REVIEW
INVALSI (ITALY)
About this paper:
Appears in: INTED2025 Proceedings
Publication year: 2025
Pages: 3404-3411
ISBN: 978-84-09-70107-0
ISSN: 2340-1079
DOI: 10.21125/inted.2025.0868
Conference name: 19th International Technology, Education and Development Conference
Dates: 3-5 March, 2025
Location: Valencia, Spain
Abstract:
Technological advancements, particularly Artificial Intelligence (AI), have significantly influenced educational assessment, including standardised testing. AI's potential applications in this domain include stimulus and item generation, test assembly, Computer Adaptive Testing (CAT), automated scoring, and personalised assessments. These innovations promise benefits such as efficiency in item generation, accelerated scoring, streamlined test assembly, and individualisation of assessment. However, practical and ethical challenges remain to be addressed.

This scoping review investigates the extent and types of evidence on AI use in standardised testing for primary and secondary education, guided by the research question:
- What is the scope and nature of evidence on AI applications in standardised tests within primary and secondary education contexts?

A structured methodology was employed, including a comprehensive literature search across five databases (APA PsycINFO®, BEI, Education Source Ultimate, ERIC, and Scopus) using the PCC (Population, Concept, Context) framework. The general search query combined terms such as "artificial intelligence", "standardised test*", and "large-scale assessment", adapted per database.

Following deduplication, 383 records were screened in Rayyan, with inclusion criteria focusing on the following:
1. AI or Machine Learning (ML) relevance
2. Application in educational contexts
3. Connection to standardised tests or International Large-Scale Assessments (ILSAs)
4. Focus on primary and secondary education

Conflicts during screening were resolved collaboratively, resulting in 43 articles for full-text review. Of these, 12 met the inclusion criteria for in-depth analysis.

Preliminary findings reveal diverse themes, including CAT, scoring, item generation, and the validity of AI-supported assessments, with CAT and scoring being the most frequently addressed. However, item generation remains underexplored. This review highlights critical gaps and opportunities for future research in leveraging AI for standardised educational assessments.
Keywords:
Artificial intelligence, standardised test, large-scale assessment, educational setting.