University of Alicante (SPAIN)
About this paper:
Appears in: INTED2024 Proceedings
Publication year: 2024
Pages: 374-377
ISBN: 978-84-09-59215-9
ISSN: 2340-1079
doi: 10.21125/inted.2024.0144
Conference name: 18th International Technology, Education and Development Conference
Dates: 4-6 March, 2024
Location: Valencia, Spain
Conversational artificial intelligence has developed slowly but steadily in recent years. In recent months, however, it has received a strong boost from the emergence of ChatGPT and other large language models (LLMs). These systems can now hold a conversation with a human in natural language, answer a wide range of questions and, in essence, generate coherent text from any prompt.

Even though these algorithms have limitations and can make mistakes, their emergence is starting to affect traditional learning and evaluation methodologies in schools and universities.

In this work, we explore the potential of using large language models to automatically generate tests that teachers can use to assess a given topic. Specifically, we use different LLMs, such as ChatGPT, Bard and Llama, to generate questions for multiple-choice quizzes on different topics, and we discuss their accuracy, insight and suitability for studying and evaluation. The topics ranged from quantum physics to the history of pirates, and the source material included text and transcriptions of videos about each topic. Our findings show that LLMs can be a powerful tool for generating tests; however, their output requires supervision and editing before it can be used in real evaluations.
Keywords: LLM, quizzes, multiple-choice, artificial intelligence.