QUESTION GENERATION FOR TEXTBOOK FLASHCARDS
University of Pennsylvania (UNITED STATES)
About this paper:
Appears in: EDULEARN22 Proceedings
Publication year: 2022
Page: 3412 (abstract only)
ISBN: 978-84-09-42484-9
ISSN: 2340-1117
doi: 10.21125/edulearn.2022.0832
Conference name: 14th International Conference on Education and New Learning Technologies
Dates: 4-6 July, 2022
Location: Palma, Spain
Abstract:
Spaced repetition is an effective way of reviewing learning materials. In this paper, we present a novel application that lets professors automate the process of creating flashcards by generating questions and answers from textbooks, and lets students study those flashcards using spaced repetition.
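
The paper does not say which spaced-repetition scheduler the application uses; the sketch below shows SM-2, the classic algorithm behind SuperMemo- and Anki-style tools, purely as an illustration of how review intervals grow with successful recalls. All names here are illustrative, not the authors' implementation.

```python
# Illustrative SM-2 spaced-repetition scheduler (an assumption; the paper
# does not specify its scheduling algorithm).
from dataclasses import dataclass

@dataclass
class Card:
    ease: float = 2.5        # ease factor (SM-2 default)
    interval: int = 0        # days until the next review
    repetitions: int = 0     # consecutive successful reviews

def review(card: Card, quality: int) -> Card:
    """Update a card after a review graded 0 (blackout) .. 5 (perfect)."""
    if quality >= 3:                       # successful recall
        if card.repetitions == 0:
            card.interval = 1
        elif card.repetitions == 1:
            card.interval = 6
        else:
            card.interval = round(card.interval * card.ease)
        card.repetitions += 1
    else:                                  # failed recall: start over
        card.repetitions = 0
        card.interval = 1
    # Adjust the ease factor; SM-2 clamps it at 1.3.
    card.ease = max(1.3, card.ease + 0.1
                    - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return card
```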

We evaluate the software by:
a) having expert annotators rate the quality of the flashcards generated with our application against those created by other methods, and
b) deploying the application in a college-level computer science classroom.

We find that our method generates high-quality flashcards and improves grades by 0.29σ. We describe the deployed prototype of the application, which will be demoed at the conference. Finally, we make some observations about the practical aspects of deploying the application in a classroom setting.

Method:
We use a fine-tuned pre-trained language model for our QG task. Our model of choice is T5-base (a large, pre-trained language model) fine-tuned on the Stanford Question Answering Dataset (SQuAD) to perform three tasks: answer extraction, question generation, and question answering. To generate questions, we split the input text into chunks of fewer than 512 tokens such that no sentence is split across a chunk boundary and all chunks have a roughly equal number of sentences. We iteratively apply the T5 model to each chunk to extract answer-like spans of text (one per sentence) and to generate questions that would be likely to have those answers. If the model does not identify an answer-like span for a given sentence, the sentence is skipped and no question is asked. To obtain the final list of questions, we concatenate the generated question lists from all chunks.
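
As a minimal sketch, the chunk-and-generate loop could look like the following, using the Hugging Face transformers library. The checkpoint name is a placeholder and the <hl>/<sep> prompt tags follow one common T5 multitask-QG recipe; both are assumptions, not the authors' exact fine-tuned model.

```python
# Minimal sketch of the chunking + question-generation loop described above,
# assuming a T5 checkpoint fine-tuned for answer extraction and question
# generation. The checkpoint name and prompt tags are illustrative assumptions.
from nltk.tokenize import sent_tokenize                  # pip install nltk
from transformers import T5ForConditionalGeneration, T5TokenizerFast

MODEL_NAME = "your-org/t5-base-squad-qg"   # hypothetical fine-tuned checkpoint
tokenizer = T5TokenizerFast.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)
MAX_TOKENS = 512

def chunk_sentences(text):
    """Group sentences into chunks under MAX_TOKENS, never splitting a sentence."""
    chunks, current, length = [], [], 0
    for sent in sent_tokenize(text):
        n = len(tokenizer.encode(sent, add_special_tokens=False))
        if current and length + n >= MAX_TOKENS:
            chunks.append(current)
            current, length = [], 0
        current.append(sent)
        length += n
    if current:
        chunks.append(current)
    return chunks

def run(prompt):
    ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids
    out = model.generate(ids, max_new_tokens=64)
    return tokenizer.decode(out[0], skip_special_tokens=True).strip()

def generate_flashcards(text):
    cards = []
    for chunk in chunk_sentences(text):
        context = " ".join(chunk)
        for i, sent in enumerate(chunk):
            # Highlight one sentence; ask the model for an answer-like span in it.
            highlighted = " ".join(f"<hl> {s} <hl>" if j == i else s
                                   for j, s in enumerate(chunk))
            answer = run(f"extract answers: {highlighted}")
            if not answer:           # no answer-like span: skip the sentence
                continue
            question = run(f"generate question: {answer} <sep> {context}")
            cards.append((question, answer))
    return cards
```

In practice, the prompts must match whatever format the model was fine-tuned with, and a balanced splitter would equalize sentence counts across chunks rather than filling greedily as this sketch does.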

Experiments:
We conducted three experiments. In the first, we generated 675 questions from summaries of a college-level computer science course; the summaries were written by three human annotators (teaching assistants). In the second, we compared this setup against a fully automated pipeline in which summaries are generated by a BART-large model fine-tuned for summarization on the CNN/DailyMail dataset. In the third, we compared against a similar pipeline that generates questions from the raw text without summarization. A sketch of the fully automated variant appears below.
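
The fully automated variant can be approximated in a few lines: "facebook/bart-large-cnn" is the publicly available BART-large checkpoint fine-tuned for summarization on CNN/DailyMail, matching the setup described above, and generate_flashcards() refers to the QG sketch in the Method section.

```python
# Sketch of the fully automated pipeline: summarize raw text with BART-large
# (fine-tuned on CNN/DailyMail), then run question generation on the summary.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def summarize_then_generate(raw_text):
    # Note: BART truncates inputs past its context window, so long chapters
    # would need the same sentence-aware chunking used in the QG sketch.
    summary = summarizer(raw_text, max_length=150, min_length=40,
                         do_sample=False)[0]["summary_text"]
    return generate_flashcards(summary)   # QG sketch from the Method section
```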

Evaluation:
We randomly sampled 100 question-answer pairs from each of the three conditions and had three annotators evaluate them. We report inter-annotator agreement and question quality. Our results show that generating questions from summaries significantly outperforms generating questions from raw text.
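
The abstract does not name the agreement statistic; Fleiss' kappa is a common choice for a fixed panel of three annotators, and the sketch below (using statsmodels) shows how it could be computed from per-item quality labels. The metric choice and the ratings are assumptions, not the paper's data.

```python
# Illustrative inter-annotator agreement computation (Fleiss' kappa).
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# ratings[i, j] = quality label annotator j gave item i
# (e.g., 0 = unacceptable, 1 = acceptable, 2 = good) -- made-up example data
ratings = np.array([
    [2, 2, 1],
    [0, 0, 0],
    [1, 2, 1],
    [2, 2, 2],
])
table, _ = aggregate_raters(ratings)   # items x categories count table
print(f"Fleiss' kappa = {fleiss_kappa(table):.3f}")
```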

Prototype:
Our tool is designed to allow a student or an instructor to enter text (in the large text box) by simply copying and pasting. Clicking the "generate cards" button invokes the model on the back end, which analyzes the text and automatically generates flashcards. Instructors have the option of creating a class and assigning different courses to each class. Each course contains its own set of flashcards, which the instructor can make available to students or assign to them. The output questions are displayed as flashcards that can be flipped to reveal the answer, and the user can edit or reject individual questions. Instructors can manage their database of previously generated questions, and can manage and add students to a course through a similar interface.
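
The paper does not describe the prototype's server stack, so the route below is purely a hypothetical sketch of what the "generate cards" button could call, reusing generate_flashcards() from the Method sketch; Flask is used here only for illustration.

```python
# Hypothetical back-end route for the "generate cards" button (the paper
# does not specify its web framework).
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.post("/generate_cards")
def generate_cards():
    text = request.get_json()["text"]     # the pasted textbook passage
    cards = generate_flashcards(text)     # QG sketch from the Method section
    return jsonify([{"question": q, "answer": a} for q, a in cards])

if __name__ == "__main__":
    app.run(debug=True)
```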
Keywords:
Ed Tech, Assessment, Question Answering, Spaced Repetition, A.I., Natural Language Processing, Deep Learning, Online Learning, Smart Flashcards.