ASSESSING THE UTILITY OF LARGE LANGUAGE MODEL SIMULATIONS FOR CLINICAL MEDICAL EDUCATION: A PILOT STUDY
1 University of California, Irvine School of Medicine (UNITED STATES)
2 University of California, Irvine School of Medicine, Department of Emergency Medicine (UNITED STATES)
3 UCLA School of Dentistry (UNITED STATES)
About this paper:
Conference name: 19th International Technology, Education and Development Conference
Dates: 3-5 March, 2025
Location: Valencia, Spain
Abstract:
Introduction:
The integration of artificial intelligence (AI) into medical education, particularly through large language models (LLMs), offers new opportunities to enhance clinical decision-making. Unlike static resources, GPT-based simulations allow learners to engage in dynamic, decision-based training that mirrors the complexity and unpredictability of actual patient care. Such tools are particularly valuable for medical students transitioning from the preclinical to the clinical curriculum. This study aims to evaluate the effectiveness and utility of GPT-based simulations in preparing third-year medical students (MS3s) for clinical responsibilities and assessments.
Methods:
A decision-based simulation framework was developed using an acute asthma exacerbation as its chief clinical problem. Third-year medical students were then instructed to use ZotGPT, an institutionally developed large language model platform (GPT-4 Turbo) that provides additional safety measures. The simulation was designed to present clinical scenarios without requiring multiple-choice inputs and to update the patient's condition dynamically based on treatment decisions, allowing students to practice real-time decision-making. Participants were instructed to focus on treatment steps without specifying drug dosages or using conditional statements, as the simulations were pre-programmed to reflect patient responses to interventions. Participants then received feedback on the appropriateness of their actions and assessed the simulation using pre- and post-simulation surveys. For subgroup analysis, responses were stratified by the duration of survey completion (≤180, 181–600, and >600 seconds). This stratified approach helped explore potential patterns linked to the time participants engaged with the survey. The data were analyzed using a one-sample Wilcoxon signed-rank test, and statistical significance was determined at a standard alpha level of 0.05.
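The abstract does not describe how the simulation was implemented within ZotGPT. As an illustration only, the sketch below shows one way a free-text, decision-based patient simulation of this kind could be driven through a generic GPT-4-class chat API; the openai client, model string, and prompt wording are assumptions, not the study's actual implementation.

```python
# Illustrative sketch only: one way to run a free-text, decision-based patient
# simulation against a GPT-4-class chat model. The openai client, model name,
# and prompt wording are assumptions, not the study's ZotGPT implementation.
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

SYSTEM_PROMPT = (
    "You are simulating a patient presenting with an acute asthma exacerbation. "
    "The learner will type free-text treatment steps (no multiple choice, no dosages). "
    "After each step, update the patient's condition realistically based on the "
    "intervention chosen, then briefly give feedback on its appropriateness."
)

messages = [{"role": "system", "content": SYSTEM_PROMPT}]

while True:
    step = input("Next treatment step (or 'quit'): ").strip()
    if step.lower() == "quit":
        break
    messages.append({"role": "user", "content": step})
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # stand-in for the institutional GPT-4 Turbo deployment
        messages=messages,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    print(reply)
```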
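Similarly, a minimal sketch of the analysis described above: a one-sample Wilcoxon signed-rank test of survey ratings against a hypothesized neutral midpoint, repeated within survey-duration strata. The column names, file name, and the neutral value of 3 are illustrative assumptions; only the test itself, the duration cut points, and the 0.05 alpha come from the abstract.

```python
# Sketch of the stratified one-sample Wilcoxon signed-rank analysis described above.
# Column names, file name, and the Likert midpoint of 3 are assumptions for illustration.
import pandas as pd
from scipy.stats import wilcoxon

df = pd.read_csv("survey_responses.csv")  # hypothetical file, one row per respondent

# Stratify by time spent completing the survey (seconds), as in the abstract.
df["duration_group"] = pd.cut(
    df["duration_sec"],
    bins=[0, 180, 600, float("inf")],
    labels=["<=180", "181-600", ">600"],
)

NEUTRAL = 3.0   # assumed Likert midpoint used as the null median
ALPHA = 0.05    # significance threshold from the abstract

rating_cols = ["first_step", "second_line", "refractory", "final_step"]  # hypothetical items

for group, sub in df.groupby("duration_group", observed=True):
    for col in rating_cols:
        diffs = sub[col].dropna() - NEUTRAL
        diffs = diffs[diffs != 0]  # the signed-rank test discards zero differences
        if len(diffs) == 0:
            continue
        stat, p = wilcoxon(diffs)
        verdict = "significant" if p < ALPHA else "not significant"
        print(f"{group:>8} | {col:<12} | p = {p:.3f} | {verdict} at alpha = {ALPHA}")
```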
Results:
A total of 155 students participated in the study. Students rated the simulation as significantly helpful in clarifying the proper first step in acute asthma management (p = 0.001), identifying appropriate second-line treatments for unresolved respiratory distress (p = 0.002), and understanding treatments for severe exacerbations refractory to other interventions (p = 0.021). The simulation also significantly aided in clarifying the final step in treatment for recovered severe exacerbations (p = 0.005). Overall, the simulation was rated as useful as (no statistically significant difference) non-simulation learning methods (e.g., textbooks, lectures, flashcards) and in-person simulation labs. Subgroup analysis of participants who spent more than 600 seconds on the survey corroborated these findings across all domains (p = 0.004–0.037); in this subgroup, however, students rated the simulation as more useful than textbooks, flashcards, and lectures (p = 0.013).
Conclusions:
The findings of this study indicate a positive perception of the simulation and suggest that, for most students, it was as useful as gold-standard educational modalities. Notably, students who engaged longer (as indicated by survey completion time) rated the simulation as more beneficial than conventional resources, highlighting the value of immersive learning experiences. While the study focused on a single clinical scenario and relied on self-reported data, the results suggest a promising role for AI-driven tools in developing the clinical decision-making skills of medical students.
Keywords:
Medical Education, LLMs, GPT, AI, Clinical Education.