Universidad Politécnica de Madrid (SPAIN)
About this paper:
Appears in: ICERI2023 Proceedings
Publication year: 2023
Pages: 7741-7747
ISBN: 978-84-09-55942-8
ISSN: 2340-1095
doi: 10.21125/iceri.2023.1945
Conference name: 16th annual International Conference of Education, Research and Innovation
Dates: 13-15 November, 2023
Location: Seville, Spain
The rapid advancement of Artificial Intelligence (AI) has led to increased interest in incorporating AI skills into educational curricula. Our Educational Innovation Project, titled "DIRASEI - Design and Implementation of new tools and Resources for the application of the Challenge-Based Learning methodology to the teaching and learning of Intelligent Electronic Systems Design", aims to leverage the potential of challenge-based learning to teach AI skills in electronics laboratory subjects. Specifically, this abstract focuses on the application of embedded speech technologies, including speech recognition and understanding, to a Problem-Based Learning (PBL) laboratory subject on programming ARM microprocessors.

Enabling speech recognition on low-cost platforms like STM32 microprocessors opens up a wide range of practical applications in various domains and can make user interactions more intuitive, hands-free, and efficient. From home automation and smart devices to voice-controlled industrial machinery and assistive technologies, the possibilities for integrating speech recognition into everyday devices and systems are vast and promising. However, running speech recognition on such platforms also presents challenges because of their limited computational resources and memory. To address this, we explore the use of two key technologies provided by Picovoice: Leopard Speech-to-Text and Porcupine Wake Word. These technologies are professional and not entirely free or open-source; however, they are available for academic purposes at no cost. They offer distinct approaches to voice AI processing and provide valuable insights for students, particularly for electronics students whose curricula typically do not include AI-related topics.

Leopard Speech-to-Text employs a hybrid speech-to-text system, breaking down the speech recognition process into two steps: phoneme recognition and transducing phonemes into text. This approach is less computationally intensive and can be efficiently implemented on low-power microprocessors. Students gain insights into using Deep Neural Networks (DNNs) for phoneme recognition and weighted finite state transducers (WFSTs) for transducing phonemes into text. On the other hand, Porcupine Wake Word focuses on Keyword Spotting (KWS), aiming to detect specific keywords or phrases in audio input.
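
The two-step decomposition described above can be illustrated with a toy sketch. It is not Picovoice's implementation: the phoneme set, the frame scores (which a DNN would produce in a real system), and the lexicon are all invented for illustration, and a greedy longest-match dictionary lookup stands in for a true WFST.

```python
# Toy two-step recognizer: (1) frame-level phoneme recognition, (2) phonemes -> text.
# The phoneme set, frame scores, and lexicon are hypothetical illustration values.

PHONEMES = ["_", "HH", "AY", "OW", "N"]  # "_" marks silence/blank frames

# Hypothetical pronunciation lexicon: phoneme sequence -> word.
LEXICON = {
    ("HH", "AY"): "hi",
    ("N", "OW"): "no",
}


def recognize_phonemes(frame_scores):
    """Step 1: pick the best phoneme per frame, merge repeats, drop blanks."""
    phonemes = []
    previous = None
    for scores in frame_scores:
        best = PHONEMES[max(range(len(scores)), key=scores.__getitem__)]
        if best != previous and best != "_":
            phonemes.append(best)
        previous = best
    return phonemes


def phonemes_to_text(phonemes):
    """Step 2: greedy longest-match lookup, a simplified stand-in for a WFST."""
    words = []
    max_len = max(len(key) for key in LEXICON)
    i = 0
    while i < len(phonemes):
        for length in range(min(max_len, len(phonemes) - i), 0, -1):
            word = LEXICON.get(tuple(phonemes[i:i + length]))
            if word is not None:
                words.append(word)
                i += length
                break
        else:
            i += 1  # skip a phoneme that starts no known word
    return " ".join(words)


# One score per phoneme for each audio frame (made-up numbers).
EXAMPLE_FRAMES = [
    [0.10, 0.70, 0.10, 0.05, 0.05],  # HH
    [0.10, 0.60, 0.20, 0.05, 0.05],  # HH (repeat, merged)
    [0.10, 0.10, 0.70, 0.05, 0.05],  # AY
    [0.80, 0.05, 0.05, 0.05, 0.05],  # blank
    [0.05, 0.05, 0.05, 0.05, 0.80],  # N
    [0.05, 0.05, 0.05, 0.80, 0.05],  # OW
    [0.05, 0.05, 0.05, 0.80, 0.05],  # OW (repeat, merged)
]

print(phonemes_to_text(recognize_phonemes(EXAMPLE_FRAMES)))  # prints: hi no
```

Keeping the two steps separate means the acoustic model only has to score a small phoneme inventory, while the vocabulary lives in a lookup structure that can be changed without retraining.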

Picovoice leverages pre-trained models available for speech recognition, such as those provided by open-source frameworks like Mozilla DeepSpeech or Kaldi. Using these models, students learn about the optimization required to implement them in a lightweight form and to improve speech recognition performance.

Throughout the educational experience, students also explore and become familiar with techniques such as model compression, quantization, and pruning to reduce model size and computational requirements while maintaining acceptable accuracy. Additionally, they have the opportunity to design their own customized solutions based on neural network architectures with reduced complexity, tailored for low-cost platforms.
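
As a rough illustration of two of the techniques named above, the following sketch applies magnitude pruning and symmetric 8-bit quantization to a single weight vector. The weights and the 50% sparsity target are made-up values, and the function names are hypothetical; real toolchains apply these steps per layer and usually fine-tune the model afterwards.

```python
# Toy magnitude pruning and symmetric int8 quantization of one weight vector.
# The weights and the sparsity target are invented for illustration only.


def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude weights; `sparsity` is the fraction dropped."""
    n_drop = int(len(weights) * sparsity)
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:n_drop])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q_weights, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in q_weights]


WEIGHTS = [0.50, -0.02, 0.25, 0.01, -1.00, 0.03, 0.75, -0.04]

pruned = prune_by_magnitude(WEIGHTS, sparsity=0.5)
q_weights, scale = quantize_int8(pruned)
restored = dequantize(q_weights, scale)

# Rounding to the nearest code bounds the per-weight error by half a step.
max_error = max(abs(a - b) for a, b in zip(pruned, restored))
print(pruned)
print(q_weights, max_error <= scale / 2 + 1e-12)
```

Storing each weight as one signed byte instead of a 32-bit float cuts the storage for the weights by a factor of four, and the zeros introduced by pruning can be skipped or stored compactly, which is what makes these steps attractive on memory-constrained microprocessors.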

By integrating embedded speech technologies into an electronics laboratory subject, students not only gain valuable AI skills but also develop an understanding of real-world applications and challenges. The approach also gives them a high-level orientation, enabling them to understand the technologies used and their significance in the development of lightweight and efficient AI systems.
Keywords: Deep learning, speech technologies, Picovoice, programming laboratory subject.