PERSONALITY BY PROMPT: EVALUATING GPT-GENERATED PERSONALITY QUESTIONNAIRES FOR EDUCATIONAL ASSESSMENT
1 Roma Tre University (ITALY)
2 Sapienza University of Rome (ITALY)
About this paper:
Appears in: INTED2026 Proceedings
Publication year: 2026
Article: 1181
ISBN: 978-84-09-82385-7
ISSN: 2340-1079
doi: 10.21125/inted.2026.1181
Conference name: 20th International Technology, Education and Development Conference
Dates: 2-4 March, 2026
Location: Valencia, Spain
Abstract:
The rise of large language models (LLMs) has opened new pathways for educational assessment and learning analytics, including the automated generation of questionnaires used to support student profiling, guidance, and formative evaluation. Within educational contexts, brief personality measures are frequently employed to inform teaching practices, student support services, and research on learning-related individual differences. However, the psychometric properties of LLM-generated assessment tools remain largely unexplored.

This study examined the potential of LLMs in developing brief personality questionnaires aligned with the Big Five model, which postulates five core traits: Extraversion, Agreeableness, Conscientiousness, Emotional Stability, and Openness. A sample of 111 Italian university students (mean age = 24.6 ± 6.1 years; 87 females, 22 males, and 2 gender non-conforming participants) completed the Italian Ten Item Personality Inventory (I-TIPI) along with two parallel 10-item Big Five measures (two items per trait) generated by GPT-4 (gpt-4-turbo model). The first GPT-based questionnaire was created using a minimal “unguided” prompt, whereas the second was generated using an augmented prompt that included explicit, literature-based definitions of the Big Five traits.
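A minimal sketch of how this two-prompt generation contrast could be scripted with the OpenAI Python SDK is shown below; the prompt wording, the helper function, and the temperature setting are illustrative assumptions, since the study's exact prompts are not reproduced in this abstract.

```python
# Sketch only: contrasts the "unguided" and definition-guided prompting
# conditions described above. Prompt texts are hypothetical
# reconstructions, not the study's actual prompts.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

UNGUIDED_PROMPT = (
    "Write a 10-item Big Five personality questionnaire in Italian, "
    "two items per trait, answerable on a 7-point Likert scale."
)

GUIDED_PROMPT = UNGUIDED_PROMPT + (
    " Use these literature-based trait definitions: Extraversion = "
    "sociability and positive energy; Agreeableness = cooperation and "
    "trust; Conscientiousness = order and self-discipline; Emotional "
    "Stability = calmness and low negative affect; Openness = "
    "curiosity and aesthetic sensitivity."
)

def generate_questionnaire(prompt: str) -> str:
    """Return the model's questionnaire text for a given prompt."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,  # assumed value; not reported in the abstract
    )
    return response.choices[0].message.content

print(generate_questionnaire(UNGUIDED_PROMPT))
print(generate_questionnaire(GUIDED_PROMPT))
```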

Reliability analyses confirmed the expected low internal consistency of the I-TIPI, while the unguided GPT questionnaire yielded consistently higher reliability across traits (McDonald’s ω = .62–.90). The definition-guided version showed excellent reliability for Agreeableness and Openness (ω = .91 and .86, respectively), but weaker coefficients for Extraversion and Emotional Stability. Both GPT-based measures demonstrated moderate to strong convergence with the I-TIPI (|r| = .25–.74).
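For reference, McDonald's ω is conventionally computed from a single-factor model of each scale; the standard general formulation (not specific to this study) is:

```latex
\omega = \frac{\left(\sum_{i=1}^{k}\lambda_i\right)^{2}}
             {\left(\sum_{i=1}^{k}\lambda_i\right)^{2} + \sum_{i=1}^{k}\theta_i}
```

where, for a scale of k items, \lambda_i are the standardized factor loadings and \theta_i the item error variances.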

The unguided questionnaire generally showed equal or stronger convergent correlations for Agreeableness, Conscientiousness, and Emotional Stability, whereas the guided version outperformed it only for Openness. The two GPT-generated questionnaires were also strongly correlated with each other (r = .60–.82 across traits).
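A toy sketch of this convergent-validity check follows, assuming per-trait scale scores are stored in a table; the file name and column names are hypothetical.

```python
# Toy sketch: Pearson correlations between I-TIPI trait scores and the
# unguided GPT questionnaire's trait scores. Data layout is assumed.
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("scores.csv")  # hypothetical file of per-trait scores

traits = ["E", "A", "C", "ES", "O"]
for trait in traits:
    r, p = pearsonr(df[f"tipi_{trait}"], df[f"gpt_unguided_{trait}"])
    print(f"{trait}: r = {r:.2f}, p = {p:.3f}")
```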

Overall, the findings suggest that GPT-4 can generate educationally relevant assessment scales with robust psychometric properties, and that constraining the model with explicit theoretical definitions does not necessarily enhance their quality. From an educational perspective, these results highlight the potential of LLMs as tools to support the rapid development of context-specific questionnaires for educational research, student assessment, and personalized learning environments.
Keywords:
Large language models, educational assessment, personality, questionnaire development, learning analytics.