DESIGNING A MANDARIN LEARNING IN DONGXIANG NATIONALITY BY ARTIFICIAL INTELLIGENT SPEECH TECHNOLOGY
Northwest Normal University (CHINA)
About this paper:
Appears in: INTED2021 Proceedings
Publication year: 2021
Pages: 2982-2989
ISBN: 978-84-09-27666-0
ISSN: 2340-1079
doi: 10.21125/inted.2021.0637
Conference name: 15th International Technology, Education and Development Conference
Dates: 8-9 March, 2021
Location: Online Conference
Abstract:
Motivation:
Mandarin is the common language of communication in China and an essential bridge between ethnic minorities. Promoting Mandarin is therefore of great significance for national harmony and cultural exchange. The Dongxiang are one of China's ethnic minorities, and Dongxiang students also need to learn Mandarin. However, the Dongxiang area lacks bilingual teachers who can teach Mandarin, and there is no intelligent language-education system suited to Dongxiang students learning Mandarin.

Method:
This paper proposes converting the Dongxiang language into Chinese characters and then into Mandarin, and builds an intelligent speech-teaching system for Dongxiang students learning Mandarin. The system consists of three components: a speech enhancement generative adversarial network (SEGAN), a Tacotron-based automatic speech recognition model (TASR), and a speech synthesizer (Tacotron 2 + WaveRNN). It takes Dongxiang phonemes as input and outputs Chinese characters and Mandarin speech. The study designs a 6,000-sentence Dongxiang corpus with phoneme and tone features as training data. Because Dongxiang has no written script, the corpus is represented in Chinese characters; it is then recorded and annotated with "SAMPA-DX." In the experiments, the Dongxiang corpus is first preprocessed by SEGAN to enhance the raw speech and improve its quality. Second, the TASR model is trained; it comprises an input layer, an embedding layer, an encoder layer, an attention mechanism, a decoder layer, and an output layer. The input-layer features are the annotated text of the Dongxiang corpus and Mel spectrograms. Finally, the Chinese characters output by TASR are synthesized into Mandarin speech by the Tacotron 2 + WaveRNN model.
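
The abstract places an attention mechanism between the TASR encoder and decoder but does not say which variant is used. The sketch below assumes a generic additive (Bahdanau-style) attention step, written in pure Python with toy, hand-set weights; it is an illustration, not the authors' implementation. Tacotron 2 itself uses location-sensitive attention, which extends this additive form with features of the previous alignment. The function names (`additive_attention`, `matvec`) and the toy inputs are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major; each inner list is one row


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Softmax with the max subtracted first for numerical stability."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(query: Vector,
                       encoder_outputs: List[Vector],
                       w_query: Matrix,
                       w_key: Matrix,
                       v: Vector) -> Tuple[Vector, Vector]:
    """One decoder step of additive (Bahdanau-style) attention (assumed variant).

    score_i = v . tanh(W_q q + W_k h_i)
    weights = softmax(scores)          # alignment over encoder frames
    context = sum_i weights_i * h_i    # passed on to the decoder
    """
    projected_query = matvec(w_query, query)
    scores = []
    for h in encoder_outputs:
        projected_key = matvec(w_key, h)
        hidden = [math.tanh(a + b) for a, b in zip(projected_query, projected_key)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_outputs))
               for d in range(dim)]
    return weights, context


# Toy example: 3 encoder frames of dimension 2, identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = additive_attention([1.0, 0.0], frames, identity, identity, [1.0, 1.0])
```

The weights sum to 1 and form one column of the alignment between decoder steps and input frames. In a trained model, `w_query`, `w_key` and `v` are learned parameters rather than fixed identity matrices.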

Result:
The experimental results show that the TASR model trained on the SEGAN-enhanced corpus has an error rate of 31%, 2% higher than the model trained without SEGAN. The mean opinion score (MOS) of the synthesized Mandarin is 4.2.
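
The abstract reports an "error rate" and a MOS without stating how either is computed. Because TASR outputs Chinese characters, a common choice is character error rate (CER): the Levenshtein edit distance between recognized and reference character sequences, divided by the number of reference characters. The sketch below assumes that definition, and takes MOS to be the plain mean of listener ratings on a 1 to 5 scale. The function names and the example strings and ratings are hypothetical toy inputs, not data from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,              # delete r
                          curr[j - 1] + 1,          # insert h
                          prev[j - 1] + (r != h))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER (assumed metric): total edits / total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must be paired one-to-one")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference transcripts are empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


def mean_opinion_score(ratings: Sequence[float]) -> float:
    """MOS: arithmetic mean of listener ratings on a 1-5 scale."""
    if not ratings or any(not 1 <= x <= 5 for x in ratings):
        raise ValueError("need at least one rating, each in [1, 5]")
    return sum(ratings) / len(ratings)


# Invented toy inputs, not data from the paper.
cer = character_error_rate(["我们学习普通话"], ["我们学普通话"])  # one deleted character
mos = mean_opinion_score([4, 5, 3, 4])
```

Computing CER over the whole test set (summing edits and characters before dividing) weights long sentences more heavily than averaging per-sentence rates would.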

Conclusion:
The proposed method can be applied to Mandarin education for Dongxiang students and can help them learn Mandarin in place of a teacher. Students speak Dongxiang, and the system translates their speech into the corresponding Chinese characters and synthesizes those characters into Mandarin speech, thereby serving a teaching role.
Keywords:
Mandarin education, language learning, intelligent speech technology, speech recognition, speech synthesis.