TOWARDS USING LARGE LANGUAGE MODELS TO AUTOMATICALLY GENERATE READING COMPREHENSION ASSESSMENTS FOR EARLY GRADE READING ASSESSMENT
American University of Sharjah (UNITED ARAB EMIRATES)
About this paper:
Conference name: 18th International Technology, Education and Development Conference
Dates: 4-6 March, 2024
Location: Valencia, Spain
Abstract:
Reading comprehension is a fundamental skill that plays a crucial role in education. Standardized reading assessment tests are pivotal tools in the educational landscape, serving to systematically measure and evaluate an individual's reading proficiency. The Early Grade Reading Assessment (EGRA) is a globally recognized standard for evaluating and monitoring the reading proficiency of early-grade students. Typically administered to children in the early stages of their educational journey, EGRA assesses key components of reading, including fluency, accuracy, vocabulary, and comprehension of text. It serves as a crucial diagnostic and intervention tool, helping educators identify students who may be struggling with reading and enabling targeted support to enhance literacy skills. EGRA's standardized procedures and benchmarks facilitate data-driven decision-making in the education sector, allowing for the measurement of literacy growth and the effectiveness of literacy programs. However, assessing reading comprehension can be challenging for educators, as it often requires individualized attention and extensive time investment. The creation of EGRA materials likewise presents an array of substantial challenges. First and foremost, developing reading materials that are age-appropriate, culturally sensitive, and linguistically relevant to a diverse range of early-grade learners is an intricate task. Crafting texts that are engaging yet not too challenging for the target age group requires a deep understanding of cognitive development and literacy levels. Ensuring that materials are balanced in terms of topics and themes, to avoid bias or favoritism, is an additional challenge.
Furthermore, generating comprehension questions that effectively measure reading comprehension can be intricate, as it requires clear alignment with the learning objectives and skills being assessed. Quality control and standardization in material generation are also significant challenges, since variations can affect the reliability and validity of the assessments. Additionally, keeping the materials current and responsive to evolving educational needs is an ongoing concern. Overall, the development of EGRA materials necessitates careful consideration of linguistic, cultural, cognitive, and pedagogical factors, demanding a high degree of expertise and attention to detail. This paper explores the use of state-of-the-art Large Language Models (LLMs) to automatically generate EGRA comprehension stories, associated comprehension questions, and sample answers to these questions. The basic idea is to provide these reading comprehension stories to teachers to be used in their regular teaching activities throughout the year. Specifically, the paper assesses the capabilities of state-of-the-art LLMs, such as GPT-4 or equivalent models, in generating contextually appropriate, age-relevant reading texts for early-grade students. This assessment considers factors such as linguistic complexity, cultural relevance, and alignment with EGRA reading proficiency levels. The evaluation of the generated questions focuses on question quality, diversity, and the ability to measure core reading comprehension skills.
Keywords:
Early Grade Reading Assessment, Technology-enhanced learning, Reading Assessment, Large Language Models, GPT.