LEVERAGING PIX2PIX ARCHITECTURE FOR SALIENCY MAP GENERATION FROM TEACHING SLIDES
Tecnologico de Monterrey (MEXICO)
About this paper:
Appears in: EDULEARN24 Proceedings
Publication year: 2024
Pages: 3020-3026
ISBN: 978-84-09-62938-1
ISSN: 2340-1117
DOI: 10.21125/edulearn.2024.0803
Conference name: 16th International Conference on Education and New Learning Technologies
Dates: 1-3 July, 2024
Location: Palma, Spain
Abstract:
Visual elements are indispensable tools in modern education, providing teachers with varied means of engaging students. These elements include slides, posters, and diagrams, among others, and offer a way to convey information that specifically leverages visual learning styles. By supplementing textual content with diagrams and images, teachers can facilitate concept comprehension while maintaining student attention. However, poorly designed visual content can also distract students from the most relevant content in favor of elements that are more colorful or larger, producing the opposite of the intended effect. This makes it important to review visual material before presenting it to students. Such review can be slow, however, as the current mechanism for evaluating fixation in visual educational content is limited to traditional in-person fixation analysis, a time-consuming process reliant on volunteers and students.

This paper presents an application of an image-to-image translation model for the detection of visual fixations in educational presentations. This work aims to assist educators in optimizing their visual teaching materials by validating the salient sections of their slides and evaluating whether the intended focus points align with students' fixations. The proposed model was trained using the pix2pix architecture, which trains a conditional Generative Adversarial Network (cGAN) to map input images to output images using a U-Net-based generator with skip connections and a PatchGAN discriminator that penalizes structure at the scale of N x N patches to determine whether an image is real or fake. The model was trained on a dataset of 400 image pairs, each consisting of a 256x256-pixel crop and its 256x256-pixel ground truth; these images portray different sections of educational slides both with and without a heatmap overlay representing the different levels of fixation. The images were collected with the participation of 20 students in a previous study. A minimal sketch of this kind of setup follows.
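
The sketch below illustrates the pix2pix-style training loop described above: a small U-Net generator with skip connections, a PatchGAN discriminator over concatenated (slide, heatmap) pairs, and the adversarial loss combined with an L1 term as in the original pix2pix formulation. Layer counts, channel sizes, and hyperparameters are illustrative assumptions; the abstract does not specify the authors' exact configuration.

```python
# Minimal pix2pix-style cGAN sketch (PyTorch). Architecture depth, channel
# sizes, and hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn

class UNetGenerator(nn.Module):
    """Tiny U-Net: encoder-decoder with skip connections, 256x256 in/out."""
    def __init__(self, ch=64):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(3, ch, 4, 2, 1), nn.LeakyReLU(0.2))
        self.down2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 4, 2, 1),
                                   nn.BatchNorm2d(ch * 2), nn.LeakyReLU(0.2))
        self.down3 = nn.Sequential(nn.Conv2d(ch * 2, ch * 4, 4, 2, 1),
                                   nn.BatchNorm2d(ch * 4), nn.LeakyReLU(0.2))
        self.up1 = nn.Sequential(nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1),
                                 nn.BatchNorm2d(ch * 2), nn.ReLU())
        self.up2 = nn.Sequential(nn.ConvTranspose2d(ch * 4, ch, 4, 2, 1),
                                 nn.BatchNorm2d(ch), nn.ReLU())
        self.up3 = nn.Sequential(nn.ConvTranspose2d(ch * 2, 3, 4, 2, 1), nn.Tanh())

    def forward(self, x):
        d1 = self.down1(x)
        d2 = self.down2(d1)
        d3 = self.down3(d2)
        u1 = self.up1(d3)
        u2 = self.up2(torch.cat([u1, d2], dim=1))    # skip connection
        return self.up3(torch.cat([u2, d1], dim=1))  # skip connection

class PatchDiscriminator(nn.Module):
    """PatchGAN: one real/fake logit per N x N patch of the input pair."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, ch, 4, 2, 1), nn.LeakyReLU(0.2),  # slide + heatmap channels
            nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.BatchNorm2d(ch * 2), nn.LeakyReLU(0.2),
            nn.Conv2d(ch * 2, 1, 4, 1, 1),                 # grid of patch logits
        )

    def forward(self, slide, heatmap):
        return self.net(torch.cat([slide, heatmap], dim=1))

G, D = UNetGenerator(), PatchDiscriminator()
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

slide = torch.randn(1, 3, 256, 256)     # stand-in for one 256x256 slide crop
real_map = torch.randn(1, 3, 256, 256)  # stand-in for its ground-truth heatmap

# Discriminator step: push real pairs toward 1, generated pairs toward 0.
fake_map = G(slide)
pred_real = D(slide, real_map)
pred_fake = D(slide, fake_map.detach())
d_loss = bce(pred_real, torch.ones_like(pred_real)) + \
         bce(pred_fake, torch.zeros_like(pred_fake))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool D, plus the pix2pix L1 term (lambda = 100).
pred_fake = D(slide, fake_map)
g_loss = bce(pred_fake, torch.ones_like(pred_fake)) + 100.0 * l1(fake_map, real_map)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The L1 term keeps the generated heatmap close to the ground truth at the pixel level, while the PatchGAN term penalizes locally unrealistic structure, which is the division of labor pix2pix relies on.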

Utilizing a cGAN architecture for the generation of gaze heatmaps presents an advantage in the analysis of fixations for educational visual content by circumventing the lengthy process of validating individual slides with several students using the traditional in-person fixation analysis method. This allows the user to streamline the analysis and validation process, thereby enabling more efficient feedback mechanisms for the evaluation and improvement of educational visual materials, as sketched below.
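
As a rough illustration of this workflow, the sketch below loads a trained generator (reusing the hypothetical UNetGenerator defined earlier) and overlays the predicted fixation heatmap on a slide crop for review. The file names, checkpoint name, and blending weight are assumptions for illustration, not artifacts from the paper.

```python
# Inference sketch: overlay a generated fixation heatmap on a slide crop.
# "generator.pt", "slide_crop.png", and the 0.4 blend weight are illustrative.
import torch
from PIL import Image
from torchvision import transforms

to_tensor = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),                       # scale to [0, 1]
    transforms.Normalize([0.5] * 3, [0.5] * 3),  # scale to [-1, 1], matching Tanh
])

G = UNetGenerator()
G.load_state_dict(torch.load("generator.pt", map_location="cpu"))
G.eval()

slide = Image.open("slide_crop.png").convert("RGB")
with torch.no_grad():
    heatmap = G(to_tensor(slide).unsqueeze(0))[0]        # 3 x 256 x 256 in [-1, 1]
heatmap_img = transforms.ToPILImage()((heatmap + 1) / 2) # back to [0, 1]

# Blend the predicted heatmap over the slide so an educator can inspect
# which regions the model expects students to fixate on.
overlay = Image.blend(slide.resize((256, 256)), heatmap_img, alpha=0.4)
overlay.save("slide_with_fixation_heatmap.png")
```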

In conclusion, the implementation of a cGAN for generating fixation heatmaps is a promising tool for enhancing the validation process of visual material. The results obtained from our implementation are satisfactory when accompanied by an empirical evaluation for interpretation; however, numerical validation is still necessary for fine-tuning and accuracy. This approach is a step forward in leveraging computational artificial intelligence techniques to enhance educational pedagogy.
Keywords:
Educational technology, visual learning, fixation analysis, cGAN, artificial intelligence.