EXPLORATORY EDUCATIONAL ANALYTICS OF UAE PISA TEST RESULTS
American University of Sharjah (UNITED ARAB EMIRATES)
About this paper:
Appears in: EDULEARN21 Proceedings
Publication year: 2021
Pages: 7186-7193
ISBN: 978-84-09-31267-2
ISSN: 2340-1117
doi: 10.21125/edulearn.2021.1451
Conference name: 13th International Conference on Education and New Learning Technologies
Dates: 5-6 July, 2021
Location: Online Conference
Abstract:
The Programme for International Student Assessment (PISA) is a standardized test conducted by the OECD (Organisation for Economic Co-operation and Development) to measure 15-year-olds’ ability to apply their reading, mathematics and science knowledge and skills to real-life challenges. A school-level analysis of the UAE’s 2018 PISA test results was performed. The raw PISA data were first transformed and cleaned using ETL techniques, yielding cleaned data for 259 schools (6,715 students) across the seven Emirates. Unsupervised learning algorithms, including k-means, k-medoids, hierarchical clustering and DBSCAN, were used to group schools into similar clusters. Gradient boosting was then used to determine the key features underlying each clustering. A Classification and Regression Tree (CART) was used as a visual explanatory mechanism for each clustering. External validation was performed using Purity, Entropy and the Adjusted Rand Index (ARI). Internal validation was carried out using the Dunn Index, Silhouette Analysis, the GAP Index, the Davies-Bouldin Index, the Calinski-Harabasz Pseudo F-statistic, and t-distributed Stochastic Neighbor Embedding (t-SNE). The various algorithms produced 3 to 6 clusters. According to the internal validation metrics, k-means with 6 clusters (Calinski-Harabasz Pseudo F-statistic = 122.92) was the best algorithm, followed by k-medoids with 3 clusters (Calinski-Harabasz Pseudo F-statistic = 90.55). School Gender was among the top three most important features identified across algorithms. School Zones, Council, Urban/Rural Status, and the type of Curriculum did not explain the clustering across algorithms, with ARI values between 0.01 and 0.13 (with one exception). School Gender was the best explanatory mechanism across algorithms, with the ARI ranging from 0.48 to 0.92. This suggests that whether a school is female-only, male-only or mixed was the key explanatory mechanism for clustering schools. 
Therefore, one recommendation is the exploration of differences between these types of schools in how they yield differing PISA test results. Finally, discrepancies between PISA scores and Ministry of Education’s internal exams were also found in certain clusters and warrant further investigation.
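To illustrate the external validation step described in the abstract, the following is a minimal sketch of how Purity and the Adjusted Rand Index can be computed between a cluster assignment and a categorical school attribute. It is not the authors' code: the function names, the toy cluster ids, and the `gender` and `zone` values are all hypothetical and chosen only to show that an attribute aligned with the clusters scores high while an unrelated one scores near or below zero.

```python
from collections import Counter
from math import comb


def purity(clusters, labels):
    """Fraction of items that carry the majority label of their cluster."""
    by_cluster = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    return sum(max(cnt.values()) for cnt in by_cluster.values()) / len(clusters)


def adjusted_rand_index(clusters, labels):
    """Adjusted Rand Index between two partitions of the same items."""
    n = len(clusters)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement
    # Pair counts from the contingency table cells, its rows, and its columns.
    sum_cells = sum(comb(v, 2) for v in Counter(zip(clusters, labels)).values())
    sum_rows = sum(comb(v, 2) for v in Counter(clusters).values())
    sum_cols = sum(comb(v, 2) for v in Counter(labels).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy data: 9 schools, cluster ids from some clustering algorithm,
# and two made-up school attributes (not values from the study).
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2, 2]
gender = ["F", "F", "F", "M", "M", "X", "X", "X", "X"]  # mostly aligned
zone = ["A", "B", "C", "A", "B", "C", "A", "B", "C"]    # unrelated

for name, attr in [("gender", gender), ("zone", zone)]:
    print(f"{name}: purity={purity(cluster_ids, attr):.2f} "
          f"ARI={adjusted_rand_index(cluster_ids, attr):.2f}")
```

An ARI of 1 means the attribute reproduces the clustering exactly, and values around 0 mean the agreement is no better than chance, which is the sense in which the abstract contrasts School Gender with the other attributes.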
Keywords:
PISA, educational analytics, clustering, unsupervised learning.