IMPROVING MANDARIN LEARNING FOR DUNGAN BY AUTOMATIC SPEECH RECOGNITION AND MACHINE TRANSLATION
Northwest Normal University (CHINA)
About this paper:
Appears in: INTED2021 Proceedings
Publication year: 2021
Pages: 2893-2900
ISBN: 978-84-09-27666-0
ISSN: 2340-1079
doi: 10.21125/inted.2021.0618
Conference name: 15th International Technology, Education and Development Conference
Dates: 8-9 March, 2021
Location: Online Conference
Abstract:
Motivation:
As China's "One Belt, One Road" economy develops, a growing number of Dungan people want to learn Chinese in order to understand and integrate into China. Mandarin education has expanded rapidly in recent years, driven by Dungan students studying in China. At present, Dungan students learn Mandarin grammar, vocabulary, and other knowledge from teachers in a traditional classroom setting. However, this approach requires bilingual teachers who understand both Chinese and Dungan. Teachers of Chinese for Dungan students in China are mainly Chinese, and few of them understand the Dungan language, which makes it difficult to teach Chinese to Dungan students. A tool based on artificial intelligence that translates Dungan into Chinese can therefore help teachers teach Mandarin to Dungan students.

Method:
This paper uses automatic speech recognition and machine translation to address the shortage of bilingual teachers for Dungan students learning Mandarin. First, we use end-to-end speech recognition to train an acoustic model for the Dungan language. Second, we train a Transformer-based machine translation model from Dungan to Chinese. Finally, we use the acoustic model to transcribe Dungan speech into Dungan text, and the machine translation model to translate that text into Chinese text.
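
The two-stage cascade described above can be sketched as a simple function composition. This is only an illustrative sketch: `make_cascade`, `stub_recognize`, and `stub_translate` are hypothetical names, and the stubs return placeholder strings rather than calling the trained models, which are not specified in this abstract.

```python
from typing import Callable

# Hypothetical interfaces: a recognizer maps raw audio bytes to Dungan text,
# and a translator maps Dungan text to Chinese text.
Recognizer = Callable[[bytes], str]
Translator = Callable[[str], str]


def make_cascade(recognize: Recognizer, translate: Translator) -> Callable[[bytes], str]:
    """Chain speech recognition and machine translation into one function."""
    def pipeline(audio: bytes) -> str:
        dungan_text = recognize(audio)   # stage 1: Dungan speech -> Dungan text
        return translate(dungan_text)    # stage 2: Dungan text -> Chinese text
    return pipeline


# Placeholder stages standing in for the trained models (assumptions, not real models).
def stub_recognize(audio: bytes) -> str:
    return "<dungan transcript of %d bytes>" % len(audio)


def stub_translate(text: str) -> str:
    return "<chinese translation of: " + text + ">"


speech_to_chinese = make_cascade(stub_recognize, stub_translate)
print(speech_to_chinese(b"abc"))
```

In a cascade of this kind, recognition errors in the first stage are passed on to the translation stage, which is one reason the word error rate of the recognizer matters for the quality of the final Chinese text.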

Result:
We construct a Dungan language corpus consisting of Dungan text and the corresponding Dungan speech. The text contains 4616 sentences, including initials, finals, and tones. Five native speakers of Dungan recorded the 4616 sentences (about 923 sentences per person, for 6 hours). The experimental results show that the method achieves a word error rate of 27.9% on Dungan speech recognition. Compared with traditional hidden Markov model (HMM) and deep neural network (DNN) methods, the word error rate is reduced by 3.24%. We also conducted experiments with 20 Dungan students in a Mandarin learning scenario. With the method in this paper, Dungan students can convert Dungan speech into the corresponding Chinese text.
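
For readers unfamiliar with the metric, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between the reference and the recognized text, divided by the number of reference words. The sketch below shows that standard definition. It assumes whitespace-separated tokens, and the function name `word_error_rate` is illustrative. The abstract does not state how the paper tokenizes Dungan text or scores its results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution (b -> x) and one deletion (d) over four reference words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```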

Conclusion:
This paper proposes a method for converting Dungan speech into Chinese text to support Dungan students' Mandarin learning. The results show that the method can effectively improve Dungan students' Mandarin proficiency and promote the further development of Mandarin learning among Dungan people.
Keywords:
Second-language Learning, Language Learning Innovations, Dungan Language, Automatic Speech Recognition, Machine Translation.