COMPUTATIONAL THINKING MEASUREMENT CHALLENGES: RELIABILITY, VALIDITY, AND EQUITY IN INTERNATIONAL PILOT STUDIES
1 University of the Basque Country EHU (SPAIN)
2 Turku Research Institute for Learning Analytics, University of Turku (FINLAND)
3 KTH Royal Institute of Technology (SWEDEN)
4 Ankara University (TURKEY)
5 Eötvös Loránd University (HUNGARY)
6 Vilnius University (LITHUANIA)
About this paper:
Conference name: 20th International Technology, Education and Development Conference
Dates: 2-4 March, 2026
Location: Valencia, Spain
Abstract:
The assessment of computational thinking (CT) has become a critical challenge in contemporary education, as CT has been incorporated into the curricula of many countries and is recognized as a fundamental competence for problem-solving, algorithmic reasoning, and abstraction across disciplines. This study presents an analysis of CT assessment data collected from three age groups (9–10, 11–12, and 13–14 years), involving over 3,000 students and their teachers, in a pilot study implemented between autumn 2024 and spring 2025 in six countries: Finland, Hungary, Lithuania, Türkiye, Sweden, and Spain. CT encompasses skills such as decomposition, pattern recognition, algorithm design, and debugging, which are essential for fostering logical reasoning and digital literacy. Assessing these skills requires instruments that balance validity, reliability, and fairness.

Percentile-based indicators (p10, median, p90) were used to describe score distributions by country and age group, while Item Response Theory (IRT) provided a deeper understanding of item functioning through three parameters: difficulty, discrimination, and guessing.

The results reveal that the CT tests show strong internal consistency and acceptable structural validity, but tend to concentrate measurement accuracy at higher ability levels, especially for younger cohorts. Median scores increase with age, confirming developmental progression; however, the wide percentile ranges indicate substantial heterogeneity within countries. IRT analyses show generally well-functioning items, with discrimination values above conventional thresholds and difficulty parameters spanning the interval [−2, 2], although the scarcity of very easy items limits diagnostic sensitivity for low-ability students.

These findings underscore the need for adaptive assessment strategies and calibrated item sets that cover the entire ability continuum. From a pedagogical perspective, percentile profiles and IRT parameters allow for differentiated feedback: students near the 10th percentile benefit from tasks that emphasize basic CT processes, while those near the 90th percentile require complex algorithmic challenges. The study concludes that integrating percentile norms with IRT models improves the interpretability and fairness of CT assessment, and that incorporating process-data analysis to capture strategic behaviours is a valuable complement to item-level scores. By aligning measurement design with the multidimensional nature of CT, educational systems can ensure that CT assessment is both statistically rigorous and fair.
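As a minimal illustration of the percentile profiling described above (the data frame, country codes, and score values below are hypothetical placeholders, not the pilot data):

import pandas as pd

# Hypothetical long-format score table: one row per student.
scores = pd.DataFrame({
    "country": ["FI", "FI", "ES", "ES", "SE", "SE"],
    "age_group": ["9-10", "11-12", "9-10", "11-12", "9-10", "11-12"],
    "score": [12, 18, 9, 15, 11, 17],
})

# p10 / median / p90 summarize the score distribution within each
# country-by-age-group cell, as in the percentile profiles above.
profile = (
    scores.groupby(["country", "age_group"])["score"]
          .quantile([0.10, 0.50, 0.90])
          .unstack()
          .rename(columns={0.10: "p10", 0.50: "median", 0.90: "p90"})
)
print(profile)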
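Likewise, a short sketch of the three-parameter logistic (3PL) model behind the reported difficulty, discrimination, and guessing parameters, together with the item information function that explains why measurement accuracy concentrates where item difficulties cluster; the parameter values are illustrative, not calibrated estimates from the study:

import numpy as np

def icc_3pl(theta, a, b, c):
    """3PL item characteristic curve: probability of a correct response
    at ability theta, with discrimination a, difficulty b, and
    pseudo-guessing floor c."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def info_3pl(theta, a, b, c):
    """3PL item information; it peaks near b, so a pool lacking very
    easy items carries little information about low-ability students."""
    p = icc_3pl(theta, a, b, c)
    return a**2 * ((p - c) ** 2 / (1.0 - c) ** 2) * ((1.0 - p) / p)

theta = np.linspace(-3, 3, 7)  # ability grid from low to high
print(icc_3pl(theta, a=1.2, b=0.5, c=0.25))   # rises from ~c toward 1
print(info_3pl(theta, a=1.2, b=0.5, c=0.25))  # peaks slightly above b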
Keywords: Computational Thinking, Educational Assessment, Item Response Theory, Percentile Analysis, Adaptive Testing, Cross-Country Comparison, Educational Measurement.