THE PERFORMANCE OF TURKISH AND US STUDENTS ON PISA 2012 MATHEMATICS ITEMS: A DIFFERENTIAL ITEM FUNCTIONING ANALYSIS
Indiana University (UNITED STATES)
This study aimed to understand differences in the probability of answering the PISA 2012 (Programme for International Student Assessment) mathematics questions correctly between students from two countries: Turkey and the United States. PISA is an international assessment administered by the OECD (Organisation for Economic Co-operation and Development); in 2012 it was used as a data collection strategy by the 66 participating countries and economies (OECD, 2013).
Even though students from these two countries may, in theory, be at the same level on the trait the mathematics items assess, they may still respond to the questions differently. Turkish and American students took the PISA 2012 assessment in Turkish and in English, respectively, and this might have affected their responses differently even when their skill on the items was the same.
This study identified the items that function differently across these two student groups using DIF (Differential Item Functioning) analysis. In IRT (Item Response Theory), DIF has become a substitute for the notions of item or test bias (Embretson & Reise, 2013; Zumbo, 1999). DIF occurs when an item does not have the same relationship to the latent variable across groups (Embretson & Reise, 2013). The SIBTEST computerized procedure (Shealy & Stout, 1993), a multidimensionality-based method for detecting DIF, was used to identify the DIF-flagged items.
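As an illustration of the DIF concept (not part of the study's analysis), uniform DIF under a 2PL IRT model means that two examinees at the same trait level θ have different probabilities of answering correctly because the item's difficulty differs across groups. The parameter values below are hypothetical:

```python
import math

def p_2pl(theta, a, b):
    # 2PL item response function: P(correct | theta)
    # a = discrimination, b = difficulty
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

theta = 0.5  # same latent trait level for both groups
# Hypothetical parameters: same discrimination, but the item is
# harder (higher b) for the focal group -> uniform DIF
p_ref = p_2pl(theta, a=1.2, b=0.0)  # reference group
p_foc = p_2pl(theta, a=1.2, b=0.6)  # focal group
print(round(p_ref, 3), round(p_foc, 3))
```

Despite identical ability, the focal-group examinee has a lower probability of success, which is exactly the group-by-item interaction that DIF methods are designed to detect.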
Two statistics were used to determine whether an item is flagged for DIF: the beta estimate and the standardized p-difference index. Based on the SIBTEST procedure, the p-difference index indicated that items 2, 3, 4, 6, 7, 9, 10, and 12 are DIF-flagged, and the beta estimates showed that items 2, 4, 6, and 7 favor the US students, while items 3, 9, 10, and 12 favor the Turkish students.
SIBTEST results for items 1 to 12 (p-value for DIF): Item 1: 0.523; Item 2: 0.000*; Item 3: 0.010**; Item 4: 0.003*; Item 5: 0.688; Item 6: 0.009*; Item 7: 0.000*; Item 8: 0.420; Item 9: 0.000**; Item 10: 0.001**; Item 11: 0.147; Item 12: 0.006**
* DIF-flagged items favoring US students
** DIF-flagged items favoring Turkish students
Source: Stout and Roussos (1996)
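For context, the standardized p-difference index (in the spirit of Dorans and Kulick's STD P-DIF statistic) compares the proportions of examinees answering an item correctly in the two groups after matching them on total score; SIBTEST's beta additionally applies a regression correction to the matching scores, which the simplified sketch below omits. All data and names here are hypothetical, simulated for illustration only:

```python
import numpy as np

def standardized_p_difference(item_focal, total_focal, item_ref, total_ref):
    """Standardized p-difference for one dichotomous item.

    Examinees are matched on total score; a positive value indicates the
    item favors the focal group at matched ability levels. (Simplified:
    no regression correction as in the full SIBTEST procedure.)
    """
    levels = np.union1d(np.unique(total_focal), np.unique(total_ref))
    num, den = 0.0, 0.0
    for k in levels:
        f_mask = total_focal == k
        r_mask = total_ref == k
        if f_mask.sum() == 0 or r_mask.sum() == 0:
            continue  # score level not represented in both groups
        w = f_mask.sum()                 # weight by focal-group count
        p_f = item_focal[f_mask].mean()  # proportion correct, focal group
        p_r = item_ref[r_mask].mean()    # proportion correct, reference group
        num += w * (p_f - p_r)
        den += w
    return num / den

# Hypothetical toy data: 0/1 item scores and matching total scores (0-12)
rng = np.random.default_rng(0)
total_f = rng.integers(0, 13, size=500)
total_r = rng.integers(0, 13, size=500)
# Simulate an item that is harder for the focal group at every score level
item_f = (rng.random(500) < 0.04 * total_f).astype(int)
item_r = (rng.random(500) < 0.04 * total_r + 0.10).astype(int)
print(round(standardized_p_difference(item_f, total_f, item_r, total_r), 3))
```

Because the simulated item is uniformly harder for the focal group at matched score levels, the statistic comes out negative, mirroring how the flagged PISA items show systematic group differences after conditioning on ability.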
In terms of the construct the 12 items measure, the two countries differ in their students' probabilities of answering the items correctly. Since the majority of items showed DIF, the next step will be to investigate why these items behave differently, especially with respect to language.
 Embretson, S. E., & Reise, S. P. (2013). Item response theory. Psychology Press.
 OECD. (2013). PISA 2012 Assessment and Analytical Framework: Mathematics, Reading, Science, Problem Solving and Financial Literacy. Paris: OECD Publishing.
 Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58(2), 159-194.
 Stout, W., & Roussos, L. (1996). DIF-pack SIBTEST program [Open source computer software].
 Zumbo, B. D. (1999). A Handbook on the Theory and Methods of Differential Item Functioning (DIF): Logistic Regression Modeling as a Unitary Framework for Binary and Likert-type (Ordinal) Item Scores. Ottawa ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.